diff --git a/docs/learning.md b/docs/learning.md
index aac132e..2d37481 100644
--- a/docs/learning.md
+++ b/docs/learning.md
@@ -38,3 +38,87 @@ Learning objects are instantiated using the following arguments:
only sensible if the labeling agrees with the given Reference instance.
(Same level of theory, same basis set, grid settings, ... ).

<figure markdown>
  ![Image title](wandb.png){ width="900" }
  <figcaption>
  Illustration of what the Weights & Biases logging looks like.
  The top graph shows the force RMSE on each data point versus a unique
  'identifier' per data point. The bottom plot shows the same data points, but now
  grouped according to which walker generated them. In this case, walkers were sorted
  according to temperature (lower walker indices correspond to lower temperatures), which is
  reflected in the fact that walkers with higher indices generated data with, on average,
  higher errors, as they explored more out-of-equilibrium configurations.
  </figcaption>
</figure>

The core business of a `Learning` instance is the following sequence of operations:

1. use walkers in a `sample()` call to generate atomic geometries
2. evaluate those atomic geometries with the provided reference to obtain the QM energy and
   forces
3. add those geometries to the training data, or discard them if their error exceeds
   `error_thresholds_for_discard`; reset walkers whose error exceeds
   `error_thresholds_for_reset`
4. train the model using the new data
5. compute metrics for the trained model across the new dataset and optionally log them to
   W&B

Currently, two variants of this loop are implemented: passive learning and active learning.

## passive learning

During passive learning, walkers are propagated using an external, 'fixed' Hamiltonian
which is not trained at any point (e.g. a pre-trained universal potential or a
Hessian-based Hamiltonian).

```py
model, walkers = learning.passive_learning(
    model,
    walkers,
    hamiltonian=MACEHamiltonian.mace_mp0(),  # fixed hamiltonian
    steps=20000,  # total number of propagation steps per walker
    step=2000,    # draw a sample every 2,000 steps
    **optional_sampling_kwargs,
)
```
In this example, walkers are propagated for a total of 20,000 steps, and samples are drawn
every 2,000 steps; these samples are evaluated with the QM reference and added to the
training data.
If the walkers contain bias contributions, their total Hamiltonian is simply the sum of
the existing bias contributions and the Hamiltonian given to the `passive_learning()`
call.
Additional keyword arguments to this function are passed directly into the `sample()`
function (e.g. for specifying the log level or the center-of-mass behavior).

The returned model is trained on all data generated in the `passive_learning()` call as well
as all data which was already present in the learning instance (for example if it had been
initialized with `initial_data`, see above).
The returned walkers are the same objects as the ones passed into the method; they are
returned explicitly to emphasize that calling `passive_learning()` does change them
internally (they are either propagated or reset, and any metadynamics bias accumulates
additional hills).

## active learning

During active learning, walkers are propagated with a Hamiltonian generated from the
current model. They are propagated for a given number of steps, after which their final
state is passed to the reference for labeling.
Unlike passive learning, active learning *does not allow for subsampling of the walkers'
trajectories*. The idea behind this is that if you wish to propagate a walker for 10 ps and
sample a structure every 1 ps so that each walker generates 10 states, it is likely much
better to instead increase the number of walkers (to cover more regions in phase space) and
propagate them in steps of 1 ps. Active learning is ideally suited for massively parallel
workflows (a maximal number of walkers, with minimal sampling time per walker) and we
encourage users to exploit this.

```py
model, walkers = learning.active_learning(
    model,  # used to generate the hamiltonian
    walkers,
    steps=2000,  # no more 'step' argument!
    **optional_sampling_kwargs,
)
```

## restarting a run

`Learning` has first-class support for restarted runs -- simply resubmit your calculation!
It will detect whether or not the corresponding output folder has already fully logged
each of the iterations, and if so, load the final state of the model, the walkers, and the
learning instance without actually doing any calculations.
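To make the above concrete, here is a minimal sketch of how the two variants are typically
combined in a single script. It only reuses the calls introduced in this document; the
construction of the `learning`, `model`, and `walkers` objects (and the import of
`MACEHamiltonian`) is assumed to have happened earlier in the script, and the iteration
counts and step numbers are arbitrary example values rather than recommendations. Because of
the restart support described above, resubmitting this same script after an interruption
should skip the iterations that have already completed.

```py
# sketch only: `learning`, `model`, `walkers`, and MACEHamiltonian are assumed
# to have been set up as described earlier in this document

# stage 1: passive learning driven by a fixed, pre-trained universal potential
model, walkers = learning.passive_learning(
    model,
    walkers,
    hamiltonian=MACEHamiltonian.mace_mp0(),
    steps=20000,  # total number of propagation steps per walker
    step=2000,    # QM-evaluate a sample every 2,000 steps
)

# stage 2: active learning, in which the current model drives the sampling;
# only the final state of each walker is QM-evaluated per iteration
for _ in range(5):  # arbitrary number of active learning iterations
    model, walkers = learning.active_learning(
        model,
        walkers,
        steps=2000,
    )
```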
diff --git a/docs/wandb.png b/docs/wandb.png new file mode 100644 index 0000000..896c4b8 Binary files /dev/null and b/docs/wandb.png differ