
fix optimizer writeStep final (if optimization ends for reaching the limit number of iterations) #2387

Open · wants to merge 12 commits into base: devel

Conversation

@alfoa (Collaborator) commented on Oct 21, 2024


Pull Request Description

What issue does this change request address? (Use "#" before the issue to link it, i.e., #42.)

Closes #2386

What are the significant changes in functionality due to this change request?

Added a print of the out streams at the end of the finalizeSampler call in the MultiRun step, to make sure that the "final" solution is printed when an out stream is requested in that step.
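
As an illustration of the mechanism, here is a minimal, self-contained Python sketch (the names CsvOutStream, PointSet, and finalizeStepOutputs are hypothetical stand-ins, not the RAVEN API): when the step finalizes, any output that is an out stream gets one more print call, so the final solution reaches the file even when the optimizer stops on the iteration limit instead of on convergence.

class CsvOutStream:
  """Hypothetical stand-in for an out stream that prints a data object to CSV."""
  def __init__(self, name):
    self.name = name
  def addOutput(self):
    print(f'writing CSV for out stream "{self.name}"')

class PointSet:
  """Hypothetical stand-in for a data object (nothing to flush here)."""
  def __init__(self, name):
    self.name = name

def finalizeStepOutputs(outputs):
  """At the end of the step, trigger one last print on every out-stream output."""
  for out in outputs:
    if isinstance(out, CsvOutStream):  # only out streams need a final flush
      out.addOutput()

# Usage: the point set is untouched, the out stream is printed one last time.
finalizeStepOutputs([PointSet('opt_export'), CsvOutStream('opt_export_print')])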


For Change Control Board: Change Request Review

The following review must be completed by an authorized member of the Change Control Board.

  • 1. Review all computer code.
  • 2. If any changes occur to the input syntax, there must be an accompanying change to the user manual and xsd schema. If the input syntax change deprecates existing input files, a conversion script needs to be added (see Conversion Scripts).
  • 3. Make sure the Python code and commenting standards are respected (camelBack, etc.). See the wiki for details.
  • 4. Automated Tests should pass, including run_tests, pylint, manual building and xsd tests. If there are changes to Simulation.py or JobHandler.py the qsub tests must pass.
  • 5. If significant functionality is added, there must be tests added to check this. Tests should cover all possible options. Multiple short tests are preferred over one large test. If new development on the internal JobHandler parallel system is performed, a cluster test must be added that sets the node <internalParallel> to True in the XML block.
  • 6. If the change modifies or adds a requirement or a requirement based test case, the Change Control Board's Chair or designee also needs to approve the change. The requirements and the requirements test shall be in sync.
  • 7. The merge request must reference an issue. If the issue is closed, the issue close checklist shall be done.
  • 8. If an analytic test is changed/added, is the analytic documentation updated/added?
  • 9. If any test used as a basis for documentation examples (currently found in raven/tests/framework/user_guide and raven/docs/workshop) has been changed, the associated documentation must be reviewed to ensure the text matches the example.

Commit message (truncated): …is stopped not for a convergence but for reaching the limit)
@alfoa requested a review from wangcj05 on October 21, 2024
@alfoa (Collaborator, Author) commented on Oct 23, 2024

@wangcj05 @mandd ready to be reviewed

@moosebuild: Job "Test qsubs sawtooth" on 39c5a39 invalidated by @joshua-cogliati-inl; failed in fetch: fatal: fetch-pack: invalid index-pack output

(1 similar comment from @moosebuild)

@moosebuild: Job "Mingw Test" on e8b15fd invalidated by @alfoa

@moosebuild: Job "Test qsubs sawtooth" on e8b15fd invalidated by @alfoa

@moosebuild: Job "Mingw Test" on e8b15fd invalidated by @alfoa

@alfoa (Collaborator, Author) commented on Oct 24, 2024

@joshua-cogliati-inl the ray test fails on Windows (but this merge request should not influence that test). Is it a random failure?

@joshua-cogliati-inl (Contributor) replied:

> @joshua-cogliati-inl the ray test fails on Windows (but this merge request should not influence that test). Is it a random failure?

Yes, the ray test is not the most reliable test, so that is probably random.

@moosebuild: Job "Mingw Test" on e8b15fd invalidated by @alfoa

dependencies.xml (outdated)
@@ -68,7 +68,7 @@ Note all install methods after "main" take
   <dask source="pip" pip_extra="[complete]"/>
   <ray source="pip" pip_extra="[default]">2.6</ray>
   <!-- redis is needed by ray, but on windows, this seems to need to be explicitly stated -->
-  <redis source="pip" os='windows'/>
+  <redis source="pip" os='windows'>5.1</redis>
@alfoa (Collaborator, Author) commented:

This is required because redis released a new version (5.2) on Oct 24, 2024, and this new version is not compatible with ray 2.6 (causing the Windows test to fail).

See https://pypi.org/project/redis/#history

@alfoa (Collaborator, Author) commented:

Well, changing the dependency does not help... @joshua-cogliati-inl, no idea.

dependencies.xml (outdated)
@@ -68,7 +68,7 @@ Note all install methods after "main" take
   <dask source="pip" pip_extra="[complete]"/>
   <ray source="pip" pip_extra="[default]">2.6</ray>
   <!-- redis is needed by ray, but on windows, this seems to need to be explicitly stated -->
-  <redis source="pip" os='windows'/>
+  <redis source="pip" os='windows'>5.1.0</redis>
A reviewer (Collaborator) commented:

Is there a reason behind the choice of this particular version?

@alfoa (Collaborator, Author) replied:

Yeah... trying out the latest version of redis that seemed to work. Now I have tried another one.

FilePrint.py
@@ -148,6 +148,8 @@ def run(self):
     if self.options['type'] == 'csv':
       filename = dictOptions['filenameroot']
       rlzIndex = self.indexPrinted.get(filename,0)
+      if rlzIndex and rlzIndex >= len(self.sourceData[index]):
A reviewer (Collaborator) commented:

I am not sure if these two lines are needed

@alfoa (Collaborator, Author) replied:

This is needed when the printing of the data set is not finished (e.g., for a point set) and the printing is triggered in the MultiRun step (right before exiting the step). We check the index against the data length here in case the printing already finished before reaching the end of the step.
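
To make the role of that check concrete, here is a minimal stand-alone sketch (hypothetical function and variable names, not the actual FilePrint.py code; the early return when everything was already printed is an assumption about the intent of the added guard):

def printCsv(indexPrinted, sourceData, filename):
  """Print only the realizations that have not been written yet for this file."""
  rlzIndex = indexPrinted.get(filename, 0)
  if rlzIndex and rlzIndex >= len(sourceData):
    return  # assumed behavior: everything was already printed, so skip the final flush
  print(f'printing realizations {rlzIndex} to {len(sourceData) - 1} of "{filename}"')
  indexPrinted[filename] = len(sourceData)

# Usage: the second call mimics the final flush at the end of the MultiRun step
# and is skipped because the stored index already covers the whole data set.
state = {}
data = ['rlz0', 'rlz1', 'rlz2']
printCsv(state, data, 'opt_export.csv')
printCsv(state, data, 'opt_export.csv')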

@@ -1,22 +1,22 @@
trajID,sigma-A,sigma-B,decay_A,decay_B,sum,age,batchId,fitness,iteration,accepted,AHDp,conv_AHDp
A reviewer (Collaborator) commented:

While the previous CSV contains new rows at the bottom of the file, as expected, this file contains some differences throughout the rows. Any possible explanation here?

@alfoa (Collaborator, Author) replied:

The rows are swapped.

@alfoa requested a review from mandd on October 31, 2024
@moosebuild: Job "Mingw Test" on 50828f1 invalidated by @alfoa

Comment on lines +270 to +273:
+ for myLambda, outIndex in self._outputCollectionLambda:
+   if isinstance(outputs[outIndex], OutStreamEntity):
+     myLambda([None,outputs[outIndex]])
+     self.raiseAMessage(f'Finalized output "{inDictionary["Output"][outIndex].name}"')
A reviewer (Collaborator) commented:

@alfoa The proposed changes can resolve the issue. However, they can be confusing, since these lines are almost identical to the collection part in the same function. Could you provide more details on why the previous collection cannot collect the final solution? Is it possible to make some modifications inside the optimizer to enable it? I have two concerns with the proposed approach:

  • 1. Two collections in the same function, which makes it very confusing. Either add more explanations or find a way to avoid it.
  • 2. It is also very confusing in FilePrint.py, since the newly added lines that check the rlzIndex seem unnecessary. I see the changes make the final collection possible, but it is really hard to understand why these lines are needed unless the developers fully understand the collections in the steps.

@alfoa (Collaborator, Author) commented on Nov 4, 2024:

The lines in FilePrint are not unnecessary. They are required when the collection is triggered on data objects that are not "collected/created" by the Optimizers.

Basically, the "SolutionExport" in the Optimizers is "updated" with the final solution after the collection is triggered (at the beginning of the processing of the "last job"). So the OutStream is not invoked before exiting the MultiRun loop.

This modification was the "minimal viable" solution to trigger an out stream call at the end of the MultiRun.

Maybe another approach could be to split the calls to the output collection (see the sketch after this list):

  • Right after a job is finished, for data objects (DataObjects/Databases)
  • After the call to finalizeActualSampling, for OutStreams
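
A minimal, self-contained sketch of that alternative (the names PointSetData, OutStreamPrint, onJobFinished, and onSamplerFinalized are hypothetical, not the RAVEN API): data objects are collected as each job finishes, while out streams are printed only after the sampler/optimizer has finalized, so they see the updated SolutionExport.

class PointSetData:
  """Hypothetical data object, collected right after each job."""
  def collectJob(self, jobName):
    print(f'collecting results of job "{jobName}"')

class OutStreamPrint:
  """Hypothetical out stream, printed only once the sampler is finalized."""
  def writeFinal(self):
    print('printing the final SolutionExport')

def onJobFinished(jobName, outputs):
  for out in outputs:
    if not isinstance(out, OutStreamPrint):  # DataObjects / Databases
      out.collectJob(jobName)

def onSamplerFinalized(outputs):
  for out in outputs:
    if isinstance(out, OutStreamPrint):  # out streams see the final solution
      out.writeFinal()

# Usage: collection after each job, printing only at the very end of the step.
outputs = [PointSetData(), OutStreamPrint()]
onJobFinished('job_42', outputs)
onSamplerFinalized(outputs)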

Successfully merging this pull request may close these issues.

[DEFECT] Optimizers' <writeSteps>final</writeSteps> does not print any final value in optimizers (GA)