More efficient method for check_parallel_jacobian #1395

dallan-keylogic · 2024-04-10T21:06:46Z

Depends on #1429

Summary/Motivation:

Presently in check_parallel_jacobian, many arithmetic operations are being carried out in Python code, featuring a triple for loop for some reason. This new method is simpler and takes advantage of more vectorized operations. However, it doesn't knock out "negligible coefficients" like the old method did, and so some of the answers it produces are different given the current criterion for parallel constraints:

diff <= tolerance or diff <= tolerance * max(unorm, vnorm)

We should revisit this criterion---just using a relative tolerance might be enough.

Legal Acknowledgement

By contributing to this software project, I agree to the following terms and conditions for my contribution:

I agree my contributions are submitted under the license terms described in the LICENSE.txt file at the top level of this directory.
I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

Robbybp · 2024-04-10T23:11:24Z

idaes/core/util/model_diagnostics.py

+    # List to store pairs of parallel components
+    parallel = []
+
+    for row, col, val in zip(*find(upper_tri), strict=True):


When I try to test this, I get an error due to this keyword argument passed to zip. I assume strict should be passed to triu, but I'm not sure exactly what else is going on here.

Sorry about that. No, strict was an argument for zip. It throws an error if the iterators passed to zip are not the same length. However, it appears it was only added in Python 3.10, so I can't use it.

For reference find returns a tuple of three arrays, one containing row indices, one containing column indices, and one containing matrix values for all the nonzero entries of upper_tri. * unpacks the tuple into arguments for zip. It's a neater way of writing find_tuple = find(upper_tri) and
zip(find_tuple[0], find_tuple[1], find_tuple[2]).

Robbybp · 2024-04-11T15:25:10Z

idaes/core/util/model_diagnostics.py

+
+    # Take product of all rows/columns with all rows/columns by taking outer
+    # product of matrix with itself
+    outer = mat @ mat.transpose()


The previous implementation was designed to avoid taking every possible dot product by first sorting vectors by their nonzero structure. I would have expected this to be slow due to unnecessary dot products, but in some testing, it doesn't seem slower.

This is sparse matrix multiplication carried out by Scipy. The key here is that 1) Sorting through which products to take should be carried out in the sparse matrix multiplication routine, not manually and
2) The looping required to do this sorting and multiplication is done in compiled C++ code, not Python code, and (should) be much faster than doing it in Python code.

The scipy matrix product is taking every possible combination of non-zero dot products, while the previous implementation takes only dot products between vectors with the same sparsity structure. This is fewer dot products, although, in practice, probably not by a wide margin.

I'm afraid I don't entirely follow. Scipy's matrix multiply, as near as I can tell, does the same thing. It implements the SMMP algorithm which occurs in two steps: the first to determine the nonzero structure of the matrix, the second to actually compute the values (as near as I can tell). That's basically what you're doing, except 1) it isn't filtering out entries as being "too small to contribute" and 2) it's doing it in C++ instead of Python, which is much faster.

For example, if two rows are (1, 2, 3) and (1, 2, 0), they cannot possibly be parallel. The previous implementation will not compute this dot product, while the new implementation will.

So you're ruling out rows being parallel based on one having an element for the third index (3) and the other having zero for the third index? That can fail if the first row was (1, 2, 3e-8) instead of (1, 2, 3): (1, 2, 3e-8) is still effectively parallel to (1, 2, 0) even if they structurally differ.

We do this after filtering out small entries.

Robbybp · 2024-04-11T15:55:44Z

I just profiled this with the following script, which uses my branch: https://github.com/Robbybp/idaes-pse/tree/parallel_constraints-timing. This relies on the distill_DAE.py and distill.dat files from the Pyomo examples directory.

from pyomo.environ import *                                                                                                                                                          
from pyomo.dae import *                                                                                                                                                              
from distill_DAE import model                                                                                                                                                        
                                                                                                                                                                                     
instance = model.create_instance('distill.dat')                                                                                                                                      
                                                                                                                                                                                     
# Discretize using Finite Difference Approach                                                                                                                                        
discretizer = TransformationFactory('dae.finite_difference')                                                                                                                         
discretizer.apply_to(instance, nfe=1000, scheme='BACKWARD')                                                                                                                          
                                                                                                                                                                                     
from idaes.core.util.model_diagnostics import check_parallel_jacobian, check_parallel_jacobian_old                                                                                   
from pyomo.common.timing import TicTocTimer, HierarchicalTimer                                                                                                                       
                                                                                                                                                                                     
timer = TicTocTimer()                                                                                                                                                                
htimer = HierarchicalTimer()                                                                                                                                                         
                                                                                                                                                                                     
timer.tic()                                                                                                                                                                          
htimer.start("old")                                                                                                                                                                  
check_parallel_jacobian_old(instance, timer=htimer)                                                                                                                                  
htimer.stop("old")                                                                                                                                                                   
timer.toc("old")                                                                                                                                                                     
htimer.start("new")                                                                                                                                                                  
check_parallel_jacobian(instance, timer=htimer)                                                                                                                                      
htimer.stop("new")                                                                                                                                                                   
timer.toc("new")                                                                                                                                                                     
                                                                                                                                                                                     
print()                                                                                                                                                                              
print(htimer)

I see a profile that looks like this:

[    0.00] Resetting the tic/toc delta timer
[+  33.05] old
[+  31.19] new

Identifier            ncalls   cumtime   percall      %
-------------------------------------------------------
new                        1    31.189    31.189   48.6
     --------------------------------------------------
     check-dotprods        1     0.273     0.273    0.9
     get-jacobian          1    28.324    28.324   90.8
     matmul                1     0.012     0.012    0.0
     norms                 1     2.508     2.508    8.0
     other               n/a     0.073       n/a    0.2
     ==================================================
old                        1    33.048    33.048   51.4
     --------------------------------------------------
     get-jacobian          1    28.638    28.638   86.7
     norm             100068     0.343     0.000    1.0
     sort-by-nz            1     0.227     0.227    0.7
     vectors               1     3.682     3.682   11.1
     other               n/a     0.158       n/a    0.5
     ==================================================
=======================================================

The new implementation is indeed slightly faster, although both are dominated by the cost of getting the Jacobian.

I can make some small performance improvements to the old implementation, giving the following profile:

[    0.00] Resetting the tic/toc delta timer
[+  31.44] old                         
[+  30.82] new
                                             
Identifier            ncalls   cumtime   percall      %
-------------------------------------------------------
new                        1    30.817    30.817   49.5
     --------------------------------------------------
     check-dotprods        1     0.268     0.268    0.9               
     get-jacobian          1    27.966    27.966   90.7
     matmul                1     0.011     0.011    0.0
     norms                 1     2.501     2.501    8.1                                                                                                                              
     other               n/a     0.071       n/a    0.2
     ==================================================
old                        1    31.443    31.443   50.5
     --------------------------------------------------
     get-jacobian          1    28.255    28.255   89.9        
     norm             100068     0.330     0.000    1.1
     sort-by-nz            1     2.667     2.667    8.5
     vectors               1     0.029     0.029    0.1
     other               n/a     0.162       n/a    0.5
     ==================================================
=======================================================

Nothing too major. Both implementations seem to be dominated by the cost of extracting vectors from the Jacobian. I have no real problem with the new implementation. Do you have any performance benchmarks that motivated it?

dallan-keylogic · 2024-04-11T16:20:58Z

Thank you for profiling it @Robbybp . The main motivation was trying to read the old code, finding it unnecessarily complex, seeing a triple for loop in Python, and deciding to do an implementation myself. For a while, I did work in Octave, in which there was a huge performance difference between using built-ins and writing for-loops. I guess there isn't as big difference in Python, at least in this instance.

I'm curious about the time spent in norms, though. Is that dominated by lookup times, or by list creation? I just pushed a version that preallocates a Numpy array then fills it with norms---could you see how that performs?

Edit: Just realized you posted the script you used to profile it. Will check myself then.

dallan-keylogic · 2024-04-11T16:30:57Z

On a different note, there's the failing test. I am also having problems with test_display_near_parallel_variables, which I know is changed in #1380 .

The original test expects only (v1, v2), (v1,v4), and (v2, v4) as parallel. The new method additionally returns (v1, v3), (v4, v3), and (v2, v3). Here's the transposed Jacobian matrix, which has these vectors as rows:

matrix([[ 1.00000e+00,  1.00000e+00,  1.00000e+08],
        [-1.00000e+00,  0.00000e+00,  1.00000e+10],
        [ 9.99990e-01,  1.00001e+00,  1.00000e+08],
        [ 0.00000e+00, -1.00000e-08, -1.00000e-06]])

Frankly, I'm shocked that the (v1, v3) pair was missed by the old method (and that seems to have become undone with minor changes).

Robbybp · 2024-04-11T18:08:38Z

Frankly, I'm shocked that the (v1, v3) pair was missed by the old method (and that seems to have become undone with minor changes).

This is because, with a tolerance of 1e-4, the v3 vector was considered a zero-vector, and was not compared with other vectors. I think you could reasonably argue for considering a zero vector as parallel with everything or parallel with nothing.

dallan-keylogic · 2024-04-11T18:12:37Z

This is what I get with my latest version. Indeed, creating norms was the bottleneck, probably because it wasn't preallocated.

Identifier            ncalls   cumtime   percall      %
-------------------------------------------------------
new                        1    58.482    58.482   50.3
     --------------------------------------------------
     check-dotprods        1     4.389     4.389    7.5
     get-jacobian          1    53.934    53.934   92.2
     matmul                1     0.022     0.022    0.0
     norms                 1     0.002     0.002    0.0
     other               n/a     0.135       n/a    0.2
     ==================================================
old                        1    57.727    57.727   49.7
     --------------------------------------------------
     get-jacobian          1    51.437    51.437   89.1
     norm             100068     0.623     0.000    1.1
     sort-by-nz            1     5.356     5.356    9.3
     vectors               1     0.048     0.048    0.1
     other               n/a     0.263       n/a    0.5
     ==================================================
=======================================================

We have 4.548 s for the new method after subtracting get-jacobian and 6.290 s for the old method.

dallan-keylogic · 2024-04-11T18:14:52Z

Frankly, I'm shocked that the (v1, v3) pair was missed by the old method (and that seems to have become undone with minor changes).

This is because, with a tolerance of 1e-4, the v3 vector was considered a zero-vector, and was not compared with other vectors. I think you could reasonably argue for considering a zero vector as parallel with everything or parallel with nothing.

I ended up just screening out elements based on a norm_tolerance argument in my latest version. When it's called by the diagnostics toolbox, it's going to be given the same tolerance we're using for jacobian_rows and jacobian_columns so it gets flagged there.

dallan-keylogic · 2024-04-11T21:40:37Z

I did some (probably excessive) optimization, and now we have:

Identifier            ncalls   cumtime   percall      %
-------------------------------------------------------
new                        1     0.211     0.211    3.6
     --------------------------------------------------
     check-dotprods        1     0.145     0.145   68.5
     get-jacobian          1     0.000     0.000    0.0
     matmul                1     0.016     0.016    7.4
     norms                 1     0.002     0.002    0.8
     other               n/a     0.049       n/a   23.3
     ==================================================
old                        1     5.593     5.593   96.4
     --------------------------------------------------
     get-jacobian          1     0.000     0.000    0.0
     norm             100068     0.576     0.000   10.3
     sort-by-nz            1     4.773     4.773   85.3
     vectors               1     0.052     0.052    0.9
     other               n/a     0.192       n/a    3.4
     ==================================================
=======================================================

Note that I precalculated the Jacobian and passed it in so that it didn't take so long to run and we didn't have to subtract it out.

dallan-keylogic · 2024-04-12T13:10:38Z

The test failures in Python 3.8 are because some element of Numpy or Scipy decided to change the order it reports elements. The set of parallel vectors is the same, it's just being presented in a different order.

lbianchi-lbl · 2024-04-12T13:20:38Z

The test failures in Python 3.8 are because some element of Numpy or Scipy decided to change the order it reports elements. The set of parallel vectors is the same, it's just being presented in a different order.

Something to keep in mind that might or might not be relevant is that the versions of NumPy (and possibly SciPy as well) running on Python 3.8 will be older than for Python 3.9+, as NumPy dropped support for 3.8 a while ago. So this difference in sorting (or lack thereof) might be due to the NumPy version rather than the Python version.

andrewlee94 · 2024-04-12T13:56:24Z

A related note: Numpy is in the course of releasing v2.0 which may have impacts both here and in our code in general. The Pyomo team is tracking things on their end, but we should be aware of this coming change.

dallan-keylogic · 2024-06-18T17:23:06Z

@Robbybp @bpaul4 could one of you review this PR now that it's ready to be merged?

andrewlee94

Only a couple of minor comments to start with. That said, this PR required a lot of changes to scaling using the old API to make this work which makes me think this might have been better left until the new API is available. However, the work is done so if this passes tests and improves the diagnostics output we might as well move forward with it.

andrewlee94 · 2024-06-18T17:26:10Z

idaes/core/util/model_diagnostics.py

@@ -3746,8 +3747,6 @@ def check_parallel_jacobian(
        list of 2-tuples containing parallel Pyomo components

    """
-    # Thanks to Robby Parker for the sparse matrix implementation and


This should not be deleted, but added to.

andrewlee94 · 2024-06-18T17:27:35Z

idaes/models/flowsheets/demo_flowsheet.py

@@ -56,397 +56,40 @@ def build_flowsheet():

 def set_scaling(m):
    """Set scaling for demo flowsheet"""
-
-    iscale.set_scaling_factor(m.fs.M01.inlet_1_state[0].flow_mol_phase, 1e6)
+    params = m.fs.BT_props


Are all these changes necessary, especially seeing as you are xfailling the tests?

Originally, I was hoping that scaling would fix the problems, but the problems are deeper than that. However, I thought that writing all the scaling in a more concise way (as well as removing some bad manual scaling that's now taken care of in the BTX properties) was a change worth keeping.

andrewlee94 · 2024-06-18T17:28:53Z

idaes/models/properties/activity_coeff_models/activity_coeff_prop_pack.py

            iscale.constraint_scaling_transform(
-                self.eq_temperature_bubble, sf_T, overwrite=False
+                self.eq_temperature_bubble, sf_p, overwrite=False


A comment on why P is the correct scaling to use would be useful here so that someone does not undo this in the future.

That equation gives the bubble temperature implicitly, with the sum of partial pressures at the bubble temperature equaling the total pressure. Ideally we'd have people define scaling factors or rules to generate scaling factors immediately after they write the equation, so they don't forget what they wrote when it comes time to scale.

My point was to leave a note to that effect so that someone doesn't think this was a typo.

idaes/models/unit_models/tests/test_feed_flash.py

Robbybp · 2024-06-20T18:21:17Z

This PR still displays way more parallel constraints than I expect on my example, which has been merged into the examples repo. Here is code to reproduce:

from idaes_examples.mod.diagnostics.gas_solid_contactors.example import (
    create_square_model_with_new_variable_and_constraint
)
from idaes.core.util.model_diagnostics import DiagnosticsToolbox

m = create_square_model_with_new_variable_and_constraint()
dt = DiagnosticsToolbox(m)

dt.report_numerical_issues()
dt.display_near_parallel_constraints()

On main, this gives 11 parallel constraints. Here is the relevant output:

====================================================================================
Model Statistics

    Jacobian Condition Number: Undefined (Exactly Singular)

------------------------------------------------------------------------------------
4 WARNINGS

    WARNING: 110 Constraints with large residuals (>1.0E-05)
    WARNING: 77 Variables with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 77 Constraints with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 11 pairs of constraints are parallel (to tolerance 1.0E-08)

------------------------------------------------------------------------------------
6 Cautions

    Caution: 1254 Variables with value close to zero (tol=1.0E-08)
    Caution: 1058 Variables with extreme value (<1.0E-04 or >1.0E+04)
    Caution: 99 Variables with None value
    Caution: 1619 Variables with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 1870 Constraints with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 3553 extreme Jacobian Entries (<1.0E-04 or >1.0E+04)

------------------------------------------------------------------------------------
Suggested next steps:

    display_constraints_with_large_residuals()
    compute_infeasibility_explanation()
    display_variables_with_extreme_jacobians()
    display_constraints_with_extreme_jacobians()
    display_near_parallel_constraints()

====================================================================================
====================================================================================
The following pairs of constraints are nearly parallel:

    fs.MB.solid_super_vel[0.0], fs.MB.density_flowrate_constraint[0.0,1.0]
    fs.MB.solid_super_vel[30.0], fs.MB.density_flowrate_constraint[30.0,1.0]
    fs.MB.solid_super_vel[60.0], fs.MB.density_flowrate_constraint[60.0,1.0]
    fs.MB.solid_super_vel[90.0], fs.MB.density_flowrate_constraint[90.0,1.0]
    fs.MB.solid_super_vel[120.0], fs.MB.density_flowrate_constraint[120.0,1.0]
    fs.MB.solid_super_vel[150.0], fs.MB.density_flowrate_constraint[150.0,1.0]
    fs.MB.solid_super_vel[180.0], fs.MB.density_flowrate_constraint[180.0,1.0]
    fs.MB.solid_super_vel[210.0], fs.MB.density_flowrate_constraint[210.0,1.0]
    fs.MB.solid_super_vel[240.0], fs.MB.density_flowrate_constraint[240.0,1.0]
    fs.MB.solid_super_vel[270.0], fs.MB.density_flowrate_constraint[270.0,1.0]
    fs.MB.solid_super_vel[300.0], fs.MB.density_flowrate_constraint[300.0,1.0]

====================================================================================

Here is the output of report_numerical_issues with this branch:

====================================================================================
Model Statistics

    Jacobian Condition Number: Undefined (Exactly Singular)

------------------------------------------------------------------------------------
5 WARNINGS

    WARNING: 110 Constraints with large residuals (>1.0E-05)
    WARNING: 77 Variables with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 77 Constraints with extreme Jacobian values (<1.0E-08 or >1.0E+08)
    WARNING: 1012 pairs of constraints are parallel (to tolerance 1.0E-08)
    WARNING: 462 pairs of variables are parallel (to tolerance 1.0E-08)

------------------------------------------------------------------------------------
6 Cautions

    Caution: 1254 Variables with value close to zero (tol=1.0E-08)
    Caution: 1058 Variables with extreme value (<1.0E-04 or >1.0E+04)
    Caution: 99 Variables with None value
    Caution: 1619 Variables with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 1870 Constraints with extreme Jacobian values (<1.0E-04 or >1.0E+04)
    Caution: 3553 extreme Jacobian Entries (<1.0E-04 or >1.0E+04)

------------------------------------------------------------------------------------
Suggested next steps:

    display_constraints_with_large_residuals()
    compute_infeasibility_explanation()
    display_variables_with_extreme_jacobians()
    display_constraints_with_extreme_jacobians()
    display_near_parallel_constraints()
    display_near_parallel_variables()

====================================================================================

Note that 1012 constraints are considered parallel. I would like this to be fixed before merging.

dallan-keylogic · 2024-06-20T18:47:23Z

It isn't clear to me that there's anything there to "fix". The Jacobian is singular to machine precision. Something is clearly wrong with the example numerically. I can go through and calculate the angle between those constraints or variables by hand, but if they're getting flagged it's less than 0.01 degrees.

Do you have scaling for your example? In order to get the diagnostics toolbox to recognize them, you need to do a Pyomo scaling transformation and create a toolbox with the scaled model.

dallan-keylogic · 2024-06-20T20:37:35Z

Okay, if we tighten dt.config.parallel_component_tolerance from 1e-8 to 1e-15, we get the expeced output:

dt.config.parallel_component_tolerance = 1e-15
dt.display_near_parallel_constraints()

====================================================================================
The following pairs of constraints are nearly parallel:

    fs.MB.solid_super_vel[0.0], fs.MB.density_flowrate_constraint[0.0,1.0]
    fs.MB.solid_super_vel[30.0], fs.MB.density_flowrate_constraint[30.0,1.0]
    fs.MB.solid_super_vel[60.0], fs.MB.density_flowrate_constraint[60.0,1.0]
    fs.MB.solid_super_vel[90.0], fs.MB.density_flowrate_constraint[90.0,1.0]
    fs.MB.solid_super_vel[120.0], fs.MB.density_flowrate_constraint[120.0,1.0]
    fs.MB.solid_super_vel[150.0], fs.MB.density_flowrate_constraint[150.0,1.0]
    fs.MB.solid_super_vel[180.0], fs.MB.density_flowrate_constraint[180.0,1.0]
    fs.MB.solid_super_vel[210.0], fs.MB.density_flowrate_constraint[210.0,1.0]
    fs.MB.solid_super_vel[240.0], fs.MB.density_flowrate_constraint[240.0,1.0]
    fs.MB.solid_super_vel[270.0], fs.MB.density_flowrate_constraint[270.0,1.0]
    fs.MB.solid_super_vel[300.0], fs.MB.density_flowrate_constraint[300.0,1.0]

====================================================================================

The problem seems to be that we're presently testing 1-<u,v>/(||u||*||v||) against a tolerance of 1e-8. If there's a variable in one of them that has a scaling factor off by a factor of 10^4, the scaling error is squared in the inner product, resulting in an error of 10^8, which then shows up in the parallel constraints. (This is the same issue that you run into when solving the normal equations or performing the SVD. If you form certain intermediate matrices, you lose half your digits of precision, going from 1e-16 to 1e-8). Presently, we're just giving cautions for badly scaled vars/constraints with magnitude 10^4, so this method is much more sensitive than our other methods. I'll make the default tolerance 1e-15 to resolve the inconsistency.

This method really should only be used on a model that has already had scaling issues resolved, and the example, being a legacy model, has severe scaling issues. However, this change is good enough to get by now. Maybe later we can add autoscaling to the example when @andrewlee94 finishes with that.

Robbybp · 2024-06-20T22:58:41Z

I'm not sure how I feel about further reducing the tolerance. Then Jacobian rows (1.0, 1.0) and (1.0, 1.0+1e-14) will not be considered parallel, which doesn't seem like what we want.

This method really should only be used on a model that has already had scaling issues resolved

I'm not sure I agree with this.

Robbybp · 2024-06-20T23:21:35Z

Then Jacobian rows (1.0, 1.0) and (1.0, 1.0+1e-14) will not be considered parallel

This is not correct. I see your point that our current measure of "distance-from-parallel" scales with epsilon**2 for two vectors (1.0, 1.0) and (1.0, 1.0+epsilon). Maybe we should use sqrt(unorm*vnorm - uvnorm) as the measure? It would be nice for the tolerance to be the same order of magnitude as a perturbation that will cause two vectors to no longer be parallel.

dallan-keylogic · 2024-06-21T17:18:22Z

Maybe we should use sqrt(unorm*vnorm - uvnorm) as the measure?

Is the goal here to make the measure of colinearity extensive? Then we run into problems if we have rows/columns that are small in norm. We could also make sqrt(1 - <u,v>/(||u||*||v||) ) the measure of colinearity, which is intensive. However, I'm not sure whether using an epsilon smaller than 1e-8 would be meaningful in that case unless the vectors were parallel to machine precision due to floating point error in the calculation. There are ways of computing the angle in a more precise way (check out this StackExchange post and the PDF it links to) but they'll take longer to calculate (we'd probably revert to your version of the code, but take out the part that automatically declares two vectors "not parallel" if they're structurally different) and be overkill for our purposes.

dallan-keylogic · 2024-06-27T16:46:21Z

One of the tests we're now failing is this one:

        @pytest.mark.component
        def test_display_near_parallel_constraints(self):
            model = ConcreteModel()
            model.v1 = Var(initialize=1e-8)
            model.v2 = Var()
            model.v3 = Var()

            model.c1 = Constraint(expr=model.v1 == model.v2)
            model.c2 = Constraint(expr=model.v1 == 1e-8 * model.v3)
            model.c3 = Constraint(expr=1e8 * model.v1 + 1e10 * model.v2 == 1e-6 * model.v3)
            model.c4 = Constraint(expr=-model.v1 == -0.99999 * model.v2)

            dt = DiagnosticsToolbox(model=model)

            stream = StringIO()
            dt.display_near_parallel_constraints(stream)

            expected = """====================================================================================
    The following pairs of constraints are nearly parallel:

        c1, c4

    ====================================================================================
    """

It turns out they're parallel only to a tolerance of 1e-11

(Pdb) from numpy.linalg import norm
(Pdb) v1 = np.array([1, -1])
(Pdb) v2 = np.array([-1, 0.99999])
(Pdb) norm(v1)
1.4142135623730951
(Pdb) norm(v2)
1.414206491322961
(Pdb) v1.dot(v2)
-1.99999
(Pdb) v1.dot(v2)/(norm(v1)*norm(v2))
-0.9999999999874997
(Pdb) out = 1- np.abs(v1.dot(v2)/(norm(v1)*norm(v2)))
(Pdb) out
1.2500334101162025e-11
(Pdb)

So the solution to @Robbybp 's example might just be to turn down the tolerance there (to 1e-15) and we can leave the default value looser. Is this solution satisfactory?

Robbybp

I'm still thinking about what to do with the tolerance issue, but here are a few comments in the meantime.

Robbybp · 2024-06-27T22:50:50Z

idaes/core/util/model_diagnostics.py

+    # The expression (1 - abs(values) < tolerance) returns an array with values
+    # of ones and zeros, depending on whether the condition is fulfilled or not.
+    # We then find indices where it is filled using np.nonzero.
+    parallel_1D = np.nonzero(1 - abs(values) < tolerance)[0]


What is the motivation for checking 1 - abs(u.dot(v)) / (unorm * vnorm) instead of unorm*vnorm - abs(u.dot(v))? The previous implementation, as well as Gurobi's implementation (https://github.com/Gurobi/gurobi-modelanalyzer/blob/a8a34f70e09f3e439b1b7f308b68010a2dfac5b3/src/gurobi_modelanalyzer/results_analyzer.py#L1241), use the latter quantity, which is fewer operations (less round-off error). Division in particular seems to be risky if the denominator is small, and with the default zero_norm_tolerance here, it can approach 1e-16.

Both the previous implementation and the Gurobi implementation checked against a semi-normalized epsilon:

# # Normalize the tolerance in the test; want the same # result for vectors u and v and 1000u and 1000v # if abs(abs(dotprod) - norm1 * norm2) > partol * norm1: continue

Note that this formulation doesn't scale the tolerance with norm2 for some reason. Also, possibly for reasons of efficiency, they use 1-norms instead of 2-norms. That's a false economy, though, because it's no longer guaranteed that abs(dotprod) <= norm1 * norm2 (Cauchy-Schwartz inequality). The equation we're using comes from the formula for angle between two vectors: u.dot(v) = unorm*vnorm*cos(theta) along with Taylor expansions around theta=0 and theta=pi.

We could implement the check like unorm*vnorm - abs(u.dot(v)) < tolerance*unorm*vnorm, but that has the disadvantage of being harder to vectorize. Floating point division, even division by some number on the order of 1e-16, is well defined and can be computed accurately. From Wikipedia:

There are no cancellation or absorption problems with multiplication or division, though small errors may accumulate as operations are performed in succession.

It's addition and subtraction that can cause problems with catastrophic cancellation.

I would advocate for unorm*vnorm - abs(u.dot(v)) < tolerance*max(1, unorm*vnorm), where all norms are 2-norms. Point taken on division, but the above is still fewer operations than the current implementation.

What's the intention with the max there? Even relatively orthogonal vectors will be considered "parallel" if the rows have small norm: [1e-6, 0] and [1e-6, 1e-6)] are considered parallel by this metric.

I don't think avoiding vectorized operations is going to be a major improvement in performance, especially since we'll be adding two multiplications (unorm*vnorm and tol*unorm*vnorm) for each pair for which u.dot(v) is nonzero.

idaes/core/util/model_diagnostics.py

codecov-commenter · 2024-06-28T16:19:23Z

Codecov Report

Attention: Patch coverage is 90.82569% with 10 lines in your changes missing coverage. Please review.

Project coverage is 76.95%. Comparing base (96237b9) to head (10250a5).
Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
idaes/models/unit_models/separator.py	72.22%	4 Missing and 1 partial ⚠️
idaes/core/base/control_volume1d.py	75.00%	3 Missing and 1 partial ⚠️
idaes/models/unit_models/feed_flash.py	92.85%	0 Missing and 1 partial ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1395      +/-   ##
==========================================
- Coverage   76.99%   76.95%   -0.05%     
==========================================
  Files         382      382              
  Lines       61993    61923      -70     
  Branches    10146    10164      +18     
==========================================
- Hits        47733    47654      -79     
- Misses      11852    11864      +12     
+ Partials     2408     2405       -3

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Robbybp · 2024-06-29T00:12:21Z

@dallan-keylogic I opened a PR into this branch, dallan-keylogic#4, that attempts to implement the original algorithm in a vectorized manner. Take a look when you have a chance.

andrewlee94 · 2024-07-05T12:52:27Z

@Robbybp @dallan-keylogic We don't seem to be making any progress towards converging on a solution to this - we have lots of ideas but nothing seems to stand out as a clear winner yet. Could someone summarise what ideas have been proposed, along with the pros and cons of each so we can start to work out what the path forward should be.

Also, if the underlying issue is that these checks are inherently sensitive to scaling, then maybe we should just wait until we have the new scaling tools ready so that we can test on actual scaled models.

Robbybp · 2024-07-11T17:19:15Z

@andrewlee94 I propose we do the following:

Merge scaling tools ( Add new AutoScaler and CustomScalerBase classes #1429 )
I update my example to (a) use scaling tools and (b) still not solve the numerically singular optimization problem.
Merge this PR

dallan-keylogic · 2024-10-10T18:44:44Z

@Robbybp , the scaling tools have been merged, so you can try to update your example to use those tools.

Robbybp · 2024-10-10T19:45:32Z

Thanks, I will try to carve out some time to update the example and get this merged

ksbeattie · 2024-10-24T18:44:30Z

Thanks, I will try to carve out some time to update the example and get this merged

@Robbybp, we've set some dates for the Nov release, namely Nov 14th for the release candidate. Hopefully you can get to this by then?

Robbybp · 2024-10-25T02:58:04Z

@dallan-keylogic I seem to remember that you said auto-scaling the model from my example led to the vectorized method producing a reasonable output. If this is correct, do you have the code you used to scale the model?

Robbybp · 2024-10-25T16:20:39Z

I'm running this with AutoScaler.scale_model, and still getting 66 parallel constraints and 8 parallel variables. Here's the code I'm running:

from idaes_examples.mod.diagnostics.gas_solid_contactors.example import (
    create_square_model_with_new_variable_and_constraint
)
from idaes.core.util.model_diagnostics import DiagnosticsToolbox
from idaes.core.scaling import AutoScaler
from idaes.core.util.scaling import get_jacobian
from pyomo.common.timing import TicTocTimer
from pyomo.contrib.pynumero.interfaces.pyomo_nlp import PyomoNLP
import scipy.sparse as sps
import logging

logging.getLogger("pyomo").setLevel(logging.CRITICAL)
logging.getLogger("idaes").setLevel(logging.CRITICAL)

timer = TicTocTimer()
timer.tic()

m = create_square_model_with_new_variable_and_constraint()
timer.toc("make model")
AutoScaler().scale_model(m)
timer.toc("scale model")
dt = DiagnosticsToolbox(m)
dt.report_numerical_issues(scaled=True)
timer.toc("report numerical issues")
dt.display_near_parallel_constraints(scaled=True)
dt.display_near_parallel_variables(scaled=True)
timer.toc("display near parallel")

def display_vectors(jac, con1, con2, transpose=False):
    comps = [con1, con2]
    if transpose:
        jac = jac.transpose().tocsr()
        # transpose indicates we are using variables...
        indices = nlp.get_primal_indices(comps)
    else:
        jac = jac.tocsr()
        indices = nlp.get_equality_constraint_indices(comps)
    for i, comp in enumerate(comps):
        print(f"{comp.name}")
        print(f"{jac[indices[i], :]}")
    u = jac[indices[0], :]
    v = jac[indices[1], :]
    dotprod = abs(u.dot(v.transpose()).data[0])
    norm1 = sps.linalg.norm(u, ord="fro")
    norm2 = sps.linalg.norm(v, ord="fro")
    normprod = norm1 * norm2
    ratio = dotprod / normprod
    print(f"dotprod  = {dotprod}")
    print(f"||u||    = {norm1}")
    print(f"||v||    = {norm2}")
    print(f"normprod = {normprod}")
    print(f"ratio    = {ratio}")

jac, nlp = get_jacobian(m, scaled=True)
con1 = m.fs.MB.solid_phase_heat_transfer[300.0,1.0]
con2 = m.fs.MB.gas_phase_heat_transfer[300.0,1.0]
print()
display_vectors(jac, con1, con2)

var1 = m.fs.MB.gas_phase.heat[0.0,1.0]
var2 = m.fs.MB.gas_phase.energy_accumulation[0.0,1.0,"Vap"]
print()
display_vectors(jac, var1, var2, transpose=True)

And the relevant part of the output:

[+  23.02] scale model
====================================================================================
Model Statistics

    Jacobian Condition Number: Undefined (Exactly Singular)

------------------------------------------------------------------------------------
3 WARNINGS

    WARNING: 110 Constraints with large residuals (>1.0E-05)
    WARNING: 66 pairs of constraints are parallel (to tolerance 1.0E-10)
    WARNING: 8 pairs of variables are parallel (to tolerance 1.0E-10)

------------------------------------------------------------------------------------
5 Cautions

    ...

------------------------------------------------------------------------------------
Suggested next steps:

    ...

====================================================================================
[+   4.20] report numerical issues
====================================================================================
The following pairs of constraints are nearly parallel:

    fs.MB.solid_super_vel[0.0], fs.MB.density_flowrate_constraint[0.0,1.0]
    fs.MB.solid_super_vel[30.0], fs.MB.density_flowrate_constraint[30.0,1.0]
    fs.MB.solid_super_vel[60.0], fs.MB.density_flowrate_constraint[60.0,1.0]
    fs.MB.solid_super_vel[90.0], fs.MB.density_flowrate_constraint[90.0,1.0]
    fs.MB.solid_super_vel[120.0], fs.MB.density_flowrate_constraint[120.0,1.0]
    fs.MB.solid_super_vel[150.0], fs.MB.density_flowrate_constraint[150.0,1.0]
    fs.MB.solid_super_vel[180.0], fs.MB.density_flowrate_constraint[180.0,1.0]
    fs.MB.solid_super_vel[210.0], fs.MB.density_flowrate_constraint[210.0,1.0]
    fs.MB.solid_super_vel[240.0], fs.MB.density_flowrate_constraint[240.0,1.0]
    fs.MB.solid_super_vel[270.0], fs.MB.density_flowrate_constraint[270.0,1.0]
    fs.MB.solid_super_vel[300.0], fs.MB.density_flowrate_constraint[300.0,1.0]
    fs.MB.solid_phase_heat_transfer[0.0,0.6], fs.MB.gas_phase_heat_transfer[0.0,0.6]
    ...
    fs.MB.solid_phase_heat_transfer[300.0,1.0], fs.MB.gas_phase_heat_transfer[300.0,1.0]

====================================================================================
====================================================================================
The following pairs of variables are nearly parallel:

    fs.MB.solid_phase.heat[0.0,0.7], fs.MB.solid_phase.energy_accumulation[0.0,0.7,Sol]
    fs.MB.solid_phase.heat[0.0,0.8], fs.MB.solid_phase.energy_accumulation[0.0,0.8,Sol]
    fs.MB.solid_phase.heat[0.0,0.9], fs.MB.solid_phase.energy_accumulation[0.0,0.9,Sol]
    fs.MB.gas_phase.heat[0.0,0.6], fs.MB.gas_phase.energy_accumulation[0.0,0.6,Vap]
    fs.MB.gas_phase.heat[0.0,0.7], fs.MB.gas_phase.energy_accumulation[0.0,0.7,Vap]
    fs.MB.gas_phase.heat[0.0,0.8], fs.MB.gas_phase.energy_accumulation[0.0,0.8,Vap]
    fs.MB.gas_phase.heat[0.0,0.9], fs.MB.gas_phase.energy_accumulation[0.0,0.9,Vap]
    fs.MB.gas_phase.heat[0.0,1.0], fs.MB.gas_phase.energy_accumulation[0.0,1.0,Vap]

====================================================================================
[+   4.43] display near parallel

fs.MB.solid_phase_heat_transfer[300.0,1.0]
  (0, 199)	-0.9999999999998741
  (0, 1242)	2.8977677529252563e-07
  (0, 2651)	2.8977677529252563e-07
  (0, 7274)	2.897767752616698e-07
fs.MB.gas_phase_heat_transfer[300.0,1.0]
  (0, 199)	0.9999999999998741
  (0, 1242)	-2.8977677529252563e-07
  (0, 2651)	-2.8977677529252563e-07
  (0, 3507)	2.897767752616698e-07
dotprod  = 0.9999999999999161
||u||    = 1.0
||v||    = 1.0
normprod = 1.0
ratio    = 0.9999999999999161

fs.MB.gas_phase.heat[0.0,1.0]
  (0, 1352)	2.897767752616698e-07
  (0, 2726)	-0.5749980449253075
fs.MB.gas_phase.energy_accumulation[0.0,1.0,Vap]
  (0, 2726)	0.09017618863215587
dotprod  = 0.051851132162305365
||u||    = 0.5749980449253805
||v||    = 0.09017618863215587
normprod = 0.05185113216231195
ratio    = 0.999999999999873

Robbybp · 2024-10-25T16:57:43Z

I see these same results when using scale_variables_by_magnitude and scale_constraints_by_jacobian_norm (maybe that is what scale_model is doing under the hood)

andrewlee94 · 2024-10-25T17:02:50Z

@Robbybp Yes, scale_model just calls scale_variables_by_magnitude and scale_constraints_by_jacobian_norm - it is intended to be the easy point of entry for new users and run the basic scaling routine in one command (whilst leaving more advanced users the option to call the individual methods as they desire).

implement new method

62a6346

dallan-keylogic requested a review from Robbybp April 10, 2024 21:06

dallan-keylogic requested review from andrewlee94 and lbianchi-lbl as code owners April 10, 2024 21:06

dallan-keylogic mentioned this pull request Apr 10, 2024

Add parallel constraint/variable check to report_numerical_issues #1385

Merged

run black

6a46229

Robbybp reviewed Apr 10, 2024

View reviewed changes

Nonstrict zip

5e4da39

Robbybp reviewed Apr 11, 2024

View reviewed changes

dallan-keylogic added 2 commits April 11, 2024 12:15

only relative criterion, filter out zero norm vectors

8fbb339

Correct preallocation

da664b0

dallan-keylogic added 2 commits April 11, 2024 12:31

run Black

9b8a25d

efficiency

7fe9c5e

ksbeattie added the Priority:Normal Normal Priority Issue or PR label Apr 11, 2024

ksbeattie assigned dallan-keylogic Apr 11, 2024

dallan-keylogic added 4 commits April 11, 2024 17:27

excessive optimization

ab30049

add optional jacobian and nlp arguments plus fix test

8c9959c

Merge branch 'main' into parallel_constraints

5021f29

run black

e4ce707

fix failing test and add additional comments

8438d09

andrewlee94 reviewed Jun 18, 2024

View reviewed changes

dallan-keylogic added 4 commits June 18, 2024 18:00

andrew's changes

c922565

readd attribution

fb1bb91

merge

e226647

merge in complementarity, leave memorials to its sins

1aed06f

tighten tolerance

02db37e

Robbybp reviewed Jun 27, 2024

View reviewed changes

dallan-keylogic added 2 commits June 28, 2024 11:39

more feedback from Robby

631e7c5

Merge branch 'main' into parallel_constraints

bf5f26d

Merge branch 'main' into parallel_constraints

ffd0071

add scaled option to some methods

10250a5

More efficient method for check_parallel_jacobian #1395

Are you sure you want to change the base?

More efficient method for check_parallel_jacobian #1395

Conversation

dallan-keylogic commented Apr 10, 2024 • edited Loading

Depends on #1429

Summary/Motivation:

Legal Acknowledgement

Choose a reason for hiding this comment

dallan-keylogic Apr 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Robbybp commented Apr 11, 2024

dallan-keylogic commented Apr 11, 2024 • edited Loading

dallan-keylogic commented Apr 11, 2024

Robbybp commented Apr 11, 2024

dallan-keylogic commented Apr 11, 2024

dallan-keylogic commented Apr 11, 2024

dallan-keylogic commented Apr 11, 2024

dallan-keylogic commented Apr 12, 2024

lbianchi-lbl commented Apr 12, 2024

andrewlee94 commented Apr 12, 2024 • edited Loading

dallan-keylogic commented Jun 18, 2024

andrewlee94 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Robbybp commented Jun 20, 2024

dallan-keylogic commented Jun 20, 2024

dallan-keylogic commented Jun 20, 2024

Robbybp commented Jun 20, 2024

Robbybp commented Jun 20, 2024

dallan-keylogic commented Jun 21, 2024

dallan-keylogic commented Jun 27, 2024

Robbybp left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dallan-keylogic Jun 28, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov-commenter commented Jun 28, 2024 • edited Loading

Codecov Report

Robbybp commented Jun 29, 2024

andrewlee94 commented Jul 5, 2024

Robbybp commented Jul 11, 2024 • edited by ksbeattie Loading

dallan-keylogic commented Oct 10, 2024

Robbybp commented Oct 10, 2024

ksbeattie commented Oct 24, 2024

Robbybp commented Oct 25, 2024

Robbybp commented Oct 25, 2024

Robbybp commented Oct 25, 2024

andrewlee94 commented Oct 25, 2024

dallan-keylogic commented Apr 10, 2024 •

edited

Loading

dallan-keylogic Apr 11, 2024 •

edited

Loading

dallan-keylogic commented Apr 11, 2024 •

edited

Loading

andrewlee94 commented Apr 12, 2024 •

edited

Loading

dallan-keylogic Jun 28, 2024 •

edited

Loading

codecov-commenter commented Jun 28, 2024 •

edited

Loading

Robbybp commented Jul 11, 2024 •

edited by ksbeattie

Loading