Monitor observables every N epochs #573

Open · elcorto opened this issue Aug 22, 2024 · 8 comments
Labels: question (Further information is requested)

@elcorto
Member

elcorto commented Aug 22, 2024

When using during_training_metric, the respective quantity is calculated in every epoch, which may be costly if during_training_metric="total_energy".

When using shuffled snapshots, adding the required calculation_output_file as in

data_handler.add_snapshot(
    "Be_snapshot_shuffled1.in.npy",
    data_path,
    "Be_snapshot_shuffled1.out.npy",
    data_path,
    "va",
    calculation_output_file=os.path.join(data_path, "Be_snapshot1.out"),
)

may not be valid, since the reference data in Be_snapshot1.out doesn't match the validation data. I'm not sure what data is read from this file, so this may or may not be a problem, but in any case one must provide some file here, otherwise we get Exception: Could not guess type of additional calculation data provided to MALA.

In addition, #571 and #572 make it hard to use the feature in production at the moment.

So, is there a way to do something like examples/basic/ex02_test_network.py every N epochs only, where one defines non-shuffled test snapshots plus reference data via calculation_output_file="/path/to/qe.out"? This would be independent of the validation data (one could call it a second validation data set) and would save compute as well.
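
For contrast, here is a minimal sketch of the unshuffled case I have in mind, along the lines of ex02_test_network.py: the test snapshot and the reference DFT output passed as calculation_output_file actually belong to the same calculation. The file names, data_path and the "te" role are only illustrative; the calls follow the basic MALA examples.

import os

import mala

data_path = "/path/to/Be_snapshots"  # placeholder

parameters = mala.Parameters()
data_handler = mala.DataHandler(parameters)

# Unshuffled test snapshot: here the reference output really does correspond
# to the snapshot, so observables such as the total energy can be compared
# against it.
data_handler.add_snapshot(
    "Be_snapshot2.in.npy",
    data_path,
    "Be_snapshot2.out.npy",
    data_path,
    "te",
    calculation_output_file=os.path.join(data_path, "Be_snapshot2.out"),
)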

@RandomDefaultUser
Member

RandomDefaultUser commented Oct 8, 2024

Hi @elcorto, thanks for raising this issue!
You are indeed right that using some output with shuffled snapshots is dubious at best. To be more precise: band energy and total energy as optimization metrics do NOT work with shuffled snapshots at all. There is currently no way to safeguard against this, although OpenPMD may help in this respect in the future.

I generally agree that having a mechanism that tracks a targeted metric only every N steps sounds useful. Just to double-check with @nerkulec: since you overhauled the reporting, I assume this would not clash with the current state of things, right?

@nerkulec

nerkulec commented Oct 8, 2024

@RandomDefaultUser Sure, I can integrate this; I already have it on my branch.

@RandomDefaultUser
Member

Great, thank you!

@nerkulec

I implemented it in #584 in such a way that the LDOS error is evaluated every epoch, while other metrics are evaluated only every parameters.running.validate_every_n_epochs epochs.
I usually train on shuffled data and use unshuffled data for validation. That eliminates the need for two validation sets. @elcorto does that also work for you?
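
A minimal configuration sketch of this setup (the attribute names validation_metrics and validate_every_n_epochs are taken from #584 as described here; file names, data_path and the exact values are placeholders, the surrounding calls follow the basic MALA examples):

import os

import mala

data_path = "/path/to/Be_snapshots"  # placeholder

parameters = mala.Parameters()
# The cheap "ldos" metric is evaluated every epoch, costlier ones such as
# "total_energy" only every N epochs.
parameters.running.validation_metrics = ["ldos", "total_energy"]
parameters.running.validate_every_n_epochs = 5

data_handler = mala.DataHandler(parameters)
# Shuffled data for training ...
data_handler.add_snapshot(
    "Be_snapshot_shuffled0.in.npy", data_path,
    "Be_snapshot_shuffled0.out.npy", data_path, "tr",
)
# ... and an unshuffled snapshot plus its reference output for validation,
# so that observables like the total energy can be computed.
data_handler.add_snapshot(
    "Be_snapshot1.in.npy", data_path,
    "Be_snapshot1.out.npy", data_path, "va",
    calculation_output_file=os.path.join(data_path, "Be_snapshot1.out"),
)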

@elcorto
Member Author

elcorto commented Oct 16, 2024

Thanks @nerkulec for this addition. I took the liberty of re-opening this issue so that we can discuss this (which should be quickly resolved).

So, to make sure I understand: there are two new parameters, validate_every_n_epochs and validation_metrics.

I was under the impression that both of these need to be added, in the running case, as properties of common.parameters.ParametersRunning, with a docstring such that they show up in the Sphinx docs?

So as I understand #584:

  • validate_every_n_epochs will make the val loss be calculated every validate_every_n_epochs epochs instead of every epoch (i.e. validate_every_n_epochs=1)
  • It assumes that the val snapshot is not shuffled for certain parameters.running.validation_metrics apart from "ldos", such as "total_energy"? If yes, which metrics rely on an unshuffled snapshot? I guess this comes down to what is read from calculation_output_file? This was the first point I was trying to address above, sorry for not being more clear.

Given this new feature, what is the difference between validation_metrics and during_training_metric, other than the evaluation frequency defined by validate_every_n_epochs?

I think what I had in mind was this workflow, based on using shuffled snapshots by default:

  • shuffle one or more snapshots (or randomly sub-sample one, or ...); this creates the whole dataset (descriptors, ldos)
  • do a train/val split (not a train/val/test split as usual in DL since the "test" set is a separate snapshot for which e.g. total energy is known); the fact that the train and val sets are supplied as "snapshots" is an implementation detail at this point
  • monitor train and val loss (usual stuff, detect overfitting, etc)
  • new: every N epochs, calculate observables which need a full snapshot as reference (total energy, say); this is what I meant by "second validation data set": every N epochs we get the total energy error with respect to some reference
  • do final eval on the test snapshot (or, in test runs, skip this and treat the N-epochs snapshot as an independent test set)

So, for certain validation_metrics other than "ldos", the proposed workflow in #584 is to use another full snapshot as a stand-in for a standard i.i.d.-sampled val set from the same distribution as the train set, i.e. detect overfitting of the train set by looking at the val loss computed on a (potentially) different population (which is what the MALA workflow was before shuffled snapshots and what is still reflected in the API). This is kind of OK, since we also have to do the final test eval on a different full snapshot, where we need to assume that the descriptors come from the same distribution as the train and val ones -- this is just an artifact of the data we are dealing with. I still think it would be cleaner to have a standard train process (val loss on a val set split off from the same dataset) with a separate option to kick off a metric/observable calculation every N epochs, which may require a full unshuffled snapshot. Since the val loss is calculated as often as the train loss, this would also speed up the val loss calculation given, say, an 80/20 train/val split.
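
To make the proposed separation concrete, a hypothetical sketch: the "ob" snapshot role and the observable_metrics / observe_every_n_epochs attributes do NOT exist in MALA today and only illustrate what a "second validation data set" option could look like; everything else follows the basic examples, with file names and values as placeholders.

import os

import mala

data_path = "/path/to/Be_snapshots"  # placeholder

parameters = mala.Parameters()
# Standard training: train and val loss both come from the shuffled data.
parameters.running.validation_metrics = ["ldos"]

data_handler = mala.DataHandler(parameters)
data_handler.add_snapshot(
    "Be_snapshot_shuffled0.in.npy", data_path,
    "Be_snapshot_shuffled0.out.npy", data_path, "tr",
)
data_handler.add_snapshot(
    "Be_snapshot_shuffled1.in.npy", data_path,
    "Be_snapshot_shuffled1.out.npy", data_path, "va",
)

# Hypothetical "second validation data set": an unshuffled snapshot with its
# reference output, used only for a periodic observable check. Neither the
# "ob" role nor the two attributes below exist in MALA; they only sketch the
# proposal.
data_handler.add_snapshot(
    "Be_snapshot2.in.npy", data_path,
    "Be_snapshot2.out.npy", data_path, "ob",
    calculation_output_file=os.path.join(data_path, "Be_snapshot2.out"),
)
parameters.running.observable_metrics = ["total_energy"]
parameters.running.observe_every_n_epochs = 10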

@RandomDefaultUser
Member

I am wondering if this is a discussion we should rather have during a meeting (or potentially at the design workshop?).
I feel that both the current method and what you, @elcorto, outline in your comment are reasonable, and this is more of a "what do we want to do?" question than a "how do we implement it?" question, which may be resolved quicker in person.

The current implementation only allows for either shuffled validation snapshots and no observables, or unshuffled validation snapshots and observables, just as you have mentioned.

I am wondering, though, what the intended use should be. I personally always use shuffled validation snapshots and no observables, but as I understand it, both you and @nerkulec use unshuffled validation snapshots and observables (or would at least like to incorporate that into the process). In that case, it may make sense to modify the entire interface and subsume such a change under larger modifications of the data management/training subroutines?

What do you think?

@elcorto
Member Author

elcorto commented Oct 18, 2024

I agree that this is best discussed F2F. I'd volunteer to document the current state just as you summarized above; afterwards, I think we can close this issue. To do this, there is for me still the question of what the difference between the new validation_metrics and during_training_metric is (apart from the eval frequency via validate_every_n_epochs), as both can be used to calculate "total_energy", for instance, if I understood correctly. @nerkulec feel free to approach me outside GitHub as well for this :)

@nerkulec

@elcorto The difference between validation_metrics and during_training_metric is that validation_metrics are used for logging, to monitor the training process (that's why it's possible to have multiple of them), while during_training_metric is used specifically for things like learning rate scheduling and early stopping.
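
In config terms, the split looks roughly like this (parameter names and metric strings as used in this thread; the exact values are placeholders):

import mala

parameters = mala.Parameters()

# Several metrics can be logged for monitoring the training process ...
parameters.running.validation_metrics = ["ldos", "band_energy", "total_energy"]
parameters.running.validate_every_n_epochs = 5

# ... while a single metric drives learning rate scheduling / early stopping.
parameters.running.during_training_metric = "total_energy"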
