Improve `Set` initialization performance #3302

jsiirola · 2024-06-25T22:38:06Z

Fixes # .

Summary/Motivation:

When working on some very large models (100M+ variables), we identified some performance issues in how we were initializing Set objects. This PR resolves those bottlenecks. For example, the runtime of:

import time
from pyomo.environ import *
start = time.time()
m = ConcreteModel()
m.I = Set(initialize=range(100000000))
print(time.time() - start)

is reduced from 91.3s to 13.2s, and a real model with a more complex set initializer (for 200M elements) is reduced from 305s to 69s.

One side effect is to remove the warning message for adding duplicate items to a Set (which brings the Pyomo Set behavior closer in line with the Python set behavior).
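
For reference, the built-in Python behavior that the duplicate-handling change moves toward can be sketched as follows. This uses only the standard `set` type; it does not exercise Pyomo's `Set`, whose exact return values and logging are not shown here.

```python
# Built-in Python sets accept duplicate insertions silently:
# no warning, no exception, and the set is unchanged.
s = set()
s.add(1)
s.add(1)                # duplicate: ignored
s.update([1, 2, 2, 3])  # duplicates inside the iterable are ignored too

print(sorted(s))
```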

Changes proposed in this PR:

Rework Set initialization to make initializing large data sets more efficient
Remove the warning for adding duplicate elements to a Set

Legal Acknowledgement

By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution:

I agree my contributions are submitted under the BSD license.
I represent I am authorized to make the contributions and grant the license. If my employer has rights to intellectual property that includes these contributions, I represent that I have received permission to make contributions and grant the required license on behalf of that employer.

emma58

I think this looks good. A few questions mostly out of curiosity!

emma58 · 2024-07-09T16:51:52Z

pyomo/core/base/set.py

+            # Note that we reset _ordered_values within the loop because
+            # of an old example where the initializer rule makes
+            # reference to values previously inserted into the Set
+            # (which triggered the creation of the _ordered_values)
+            self._ordered_values = None


Wait, I'm still confused why you do this for each val though? What would be the difference with doing this above the loop?

The offending example is:

def U_init(model, z):
    if z == 6:
        return Set.End
    if z == 1:
        return 1
    else:
        return model.U[z - 1] * z

model.U = Set(ordered=True, initialize=U_init)

The issue is that when z == 2, the __getitem__ call causes the _ordered_values list to be created with a single entry. If we don't clear _ordered_values, it stays a 1-element list, and when z == 3, looking up model.U[2] reads _ordered_values[1], running off the end of the list.

Ewwww, this is horrifying! But okay, makes sense.
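
The stale-cache failure discussed in this thread can be reproduced in a minimal, pure-Python sketch. `TinyOrderedSet`, its `reset_each_step` flag, and the use of `None` as a stand-in for `Set.End` are all hypothetical and are not Pyomo's implementation; only the `_ordered_values` name is borrowed from the diff.

```python
class TinyOrderedSet:
    """Hypothetical ordered set with a lazily built positional cache."""

    def __init__(self, rule, reset_each_step):
        self._values = {}            # dict keys preserve insertion order
        self._ordered_values = None  # lazy cache of list(self._values)
        z = 1
        while True:
            val = rule(self, z)      # the rule may index into this set
            if val is None:          # assumption: None stands in for Set.End
                break
            if reset_each_step:
                # Drop any cache the rule just built; it is about to go stale.
                self._ordered_values = None
            self._values[val] = None
            z += 1

    def __getitem__(self, pos):
        # 1-based positional lookup that builds the cache on first use
        if self._ordered_values is None:
            self._ordered_values = list(self._values)
        return self._ordered_values[pos - 1]


def u_init(s, z):
    if z == 6:
        return None
    if z == 1:
        return 1
    return s[z - 1] * z


good = TinyOrderedSet(u_init, reset_each_step=True)
print(list(good._values))  # [1, 2, 6, 24, 120]

try:
    TinyOrderedSet(u_init, reset_each_step=False)
except IndexError as err:
    print("stale cache:", err)
```

Without the per-step reset, the cache built while computing the second element still holds one entry when the third element asks for position 2, which is the off-the-end lookup described above.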

pyomo/core/base/set.py

emma58 · 2024-07-09T19:52:48Z

(Also I am very excited about not warning for duplicate entries! :))

codecov · 2024-07-10T00:46:19Z

Codecov Report

Attention: Patch coverage is 98.13665% with 3 lines in your changes missing coverage. Please review.

Project coverage is 88.46%. Comparing base (17f8aed) to head (ba9706d).
Report is 98 commits behind head on main.

Files	Patch %	Lines
pyomo/core/base/set.py	98.13%	3 Missing ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #3302   +/-   ##
=======================================
  Coverage   88.46%   88.46%           
=======================================
  Files         867      867           
  Lines       98218    98250   +32     
=======================================
+ Hits        86886    86919   +33     
+ Misses      11332    11331    -1

Flag	Coverage Δ
linux	`86.28% <98.13%> (+<0.01%)`	⬆️
osx	`75.57% <98.13%> (+<0.01%)`	⬆️
other	`86.47% <98.13%> (+<0.01%)`	⬆️
win	`83.77% <98.13%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown.

blnicho

I have a couple minor questions but otherwise this looks great!

blnicho · 2024-07-26T21:21:19Z

pyomo/core/base/set.py

+        # It is important that the iterator is an actual iterator
+        val_iter = iter(val_iter)


Is this needed? Seems like val_iter is guaranteed to be an iterator from line 1399 unless you expect this method to be called from elsewhere.

Great catch! Removed this, plus updated some other bugs / inconsistencies / documentation around flattening & initialization.
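
As background for why the extra `iter()` call was redundant, here is a short standard-library sketch of the difference between an iterable and an iterator. The variable names are illustrative and unrelated to the Pyomo code.

```python
from itertools import islice

data = [10, 20, 30, 40]

# A list is an iterable: every new pass starts again from the beginning.
first_two = list(islice(data, 2))
full_again = list(data)

# An iterator remembers its position between consumers.
it = iter(data)
head = list(islice(it, 2))
tail = list(it)

# Calling iter() on something that is already an iterator returns the
# same object, so the call is a no-op in that case.
same_object = iter(it) is it

print(first_two, full_again, head, tail, same_object)
```

If a value is already guaranteed to be an iterator, wrapping it in `iter()` again changes nothing, which is consistent with the reviewer's observation.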

jsiirola added 7 commits June 25, 2024 14:21

Rework Set initialization to improve performance on large sets

3904639

implement Set.first() and .last() without forcing _ordered_values construction

5eb9875

NFC: apply black

4716f9d

Restore original ordered set initialization warning

b6cb09c

Restore previous Set add()/update() return values

d5e4f05

Rework Set.End to avoid conflict with dask.multiprocessing

d0338bc

(which attempts to hash all known exception types)

Remove unused class attribute

1ad9d4e

jsiirola mentioned this pull request Jun 28, 2024

Fixes bug with IndexedSet objects and the within argument #3288

Merged

blnicho requested review from blnicho and emma58 July 2, 2024 19:04

jsiirola and others added 2 commits July 8, 2024 07:28

Merge branch 'main' into set-init-performance

ef50ea3

Merge branch 'main' into set-init-performance

ba9706d

emma58 reviewed Jul 9, 2024

emma58 approved these changes Jul 10, 2024

mrmundt approved these changes Jul 24, 2024

blnicho approved these changes Jul 26, 2024

jsiirola added 6 commits July 29, 2024 05:36

Rework _initialize and avoid redundant call to iter()

ac47903

Duplicate _update_impl workaround from ordered sets in sorted sets

9aab8c2

Catch (and test) an edge case when flattening sets

26488f3

NFC: apply black

0fc78ad

Merge branch 'main' into set-init-performance

3bc6a52

Avoid duplicate logging of unordered Set data warning

7df3801

blnicho approved these changes Jul 30, 2024

blnicho merged commit 5c85b7c into Pyomo:main Jul 31, 2024
32 checks passed

jsiirola deleted the set-init-performance branch August 2, 2024 01:12
