-
Notifications
You must be signed in to change notification settings - Fork 518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve Set
initialization performance
#3302
Conversation
(which attempts to hash all known exception types)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good. A few questions mostly out of curiosity!
# Note that we reset _ordered_values within the loop because | ||
# of an old example where the initializer rule makes | ||
# reference to values previously inserted into the Set | ||
# (which triggered the creation of the _ordered_values) | ||
self._ordered_values = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, I'm still confused why you do this for each val
though? What would be the difference with doing this above the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The offending example is:
def U_init(model, z):
if z == 6:
return Set.End
if z == 1:
return 1
else:
return model.U[z - 1] * z
model.U = Set(ordered=True, initialize=U_init)
The issue is that when z == 2
the __getitem__
causes the _ordered_values
list to be created with a single entry. If we don't clear _ordered_values
, then it will stay a 1-element list, and when z == 3
, looking for model.U[2]
looks up _ordered_values[1]
; running off the end of the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ewwww, this is horrifying! But okay, makes sense.
(Also I am very excited about not warning for duplicate entries! :)) |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3302 +/- ##
=======================================
Coverage 88.46% 88.46%
=======================================
Files 867 867
Lines 98218 98250 +32
=======================================
+ Hits 86886 86919 +33
+ Misses 11332 11331 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple minor questions but otherwise this looks great!
pyomo/core/base/set.py
Outdated
# It is important that the iterator is an actual iterator | ||
val_iter = iter(val_iter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? Seems like val_iter
is guaranteed to be an iterator from line 1399 unless you expect this method to be called from elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great catch! Removed this, plus updated some other bugs / inconsistencies / documentation around flattening & initialization.
Fixes # .
Summary/Motivation:
When working on some very large models (100M+ variables), we identified some performance issues in how we were initializing
Set
objects. This PR resolves those performance bottlenecks:is reduced from 91.3s to 13.2s, and a real model with a more complex set initializer (for 200M elements) is reduced from 305s to 69s.
One side effect is to remove the warning message for adding duplicate items to a Set (which brings the Pyomo
Set
behavior closer in line with the Pythonset
behavior).Changes proposed in this PR:
Set
initialization to make initializing large data sets more efficientLegal Acknowledgement
By contributing to this software project, I have read the contribution guide and agree to the following terms and conditions for my contribution: