Commit e963d82 ("more docs"), committed by svandenhaute on Jul 29, 2024 (parent 7bcd2db).
Showing 2 changed files with 84 additions and 0 deletions.

docs/learning.md:

Learning objects are instantiated using the following arguments:
only sensible if the labeling agrees with the given Reference instance. (Same level of
theory, same basis set, grid settings, ... ).


<figure markdown="span">
![Weights &amp; Biases logging](wandb.png){ width="900" }
<figcaption> Illustration of the Weights &amp; Biases logging.
The top graph shows the force RMSE on each data point versus a unique
'identifier' per data point. The bottom plot shows the same data points, but now
grouped according to the walker that generated them. In this case, walkers were sorted
by temperature (lower walker indices correspond to lower temperatures), which is reflected
in the fact that walkers with a higher index generated data with, on average, higher errors,
as they explored more out-of-equilibrium configurations.</figcaption>
</figure>


The core business of a `Learning` instance is the following sequence of operations:

1. use walkers in a `sample()` call to generate atomic geometries
2. evaluate those atomic geometries with the provided reference to obtain QM energies and
   forces
3. add those geometries to the training data, or discard them if they exceed
   `error_thresholds_for_discard`; reset walkers if they exceed
   `error_thresholds_for_reset`
4. train the model using the new data
5. compute metrics for the trained model across the new dataset and optionally log them to
   W&B
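The sequence above can be sketched as a single loop. Everything below (class layout, function names, the scalar error check) is a simplified illustration under assumed interfaces, not the actual psiflow implementation:

```py
# Minimal, self-contained sketch of one Learning iteration.
# All names and data structures are illustrative, NOT the psiflow API.

def learning_iteration(walkers, reference, model, train, train_data,
                       discard_threshold, reset_threshold):
    for walker in walkers:
        geometry = walker.sample()             # 1. propagate and draw a geometry
        energy, forces = reference(geometry)   # 2. QM evaluation
        error = abs(model(geometry) - energy)  # scalar proxy for the RMSE check
        if error > discard_threshold:          # 3. discard overly bad samples
            continue
        if error > reset_threshold:            #    reset misbehaving walkers
            walker.reset()
        train_data.append((geometry, energy, forces))
    model = train(model, train_data)           # 4. retrain on all data
    metrics = {"num_samples": len(train_data)} # 5. metrics (optionally to W&B)
    return model, metrics
```

In the real workflow each of these steps runs asynchronously and in parallel; the loop only conveys the logical ordering.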

Two variants of this sequence are currently implemented: passive and active learning.

## passive learning

During passive learning, walkers are propagated using an external and 'fixed' Hamiltonian
which is not trained at any point (e.g. a pre-trained universal potential or a
hessian-based Hamiltonian).

```py
model, walkers = learning.passive_learning(
model,
walkers,
hamiltonian=MACEHamiltonian.mace_mp0(), # fixed hamiltonian
steps=20000,
step=2000,
**optional_sampling_kwargs,
)
```
In this example, walkers are propagated for a total of 20,000 steps, and samples are drawn
every 2,000 steps; each sample is evaluated with the reference at the QM level and added
to the training data.
If the walkers contain bias contributions, their total Hamiltonian is simply the sum of
the existing bias contributions and the Hamiltonian given to the `passive_learning()`
call.
Additional keyword arguments to this function are passed directly into the `sample()` call
(e.g. to specify the log level or the center-of-mass behavior).
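That composition of bias contributions with the sampling Hamiltonian amounts to a plain sum of energy contributions. The toy classes below illustrate the idea; they are simplified stand-ins, not psiflow's actual Hamiltonian implementation:

```py
# Toy illustration of Hamiltonian composition by summation.
# These classes are illustrative stand-ins, NOT psiflow's Hamiltonian API.

class ToyHamiltonian:
    def __init__(self, energy_fn):
        self.energy_fn = energy_fn

    def evaluate(self, geometry):
        return self.energy_fn(geometry)

    def __add__(self, other):
        # the total energy is simply the sum of both contributions
        return ToyHamiltonian(
            lambda g: self.evaluate(g) + other.evaluate(g)
        )

physical = ToyHamiltonian(lambda g: -10.0)  # fixed 'physical' Hamiltonian
bias = ToyHamiltonian(lambda g: 0.5)        # existing bias contribution
total = bias + physical                     # Hamiltonian used for propagation
```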

The returned model is the one trained on all data generated in the `passive_learning()` call, as well as all data which was already present in the learning instance (for example if it had been initialized with `initial_data`, see above).
The returned walkers are identical to the objects passed into the method; they are
returned explicitly to emphasize that calling `passive_learning` does change them
internally (they are either propagated or reset, and any metadynamics bias accumulates
more hills than before).

## active learning

During active learning, walkers are propagated with a Hamiltonian generated from the
current model. They are propagated for a given number of steps, after which their final
state is passed to the reference for labeling.
Unlike passive learning, active learning *does not allow for subsampling of the
walkers' trajectories*. The reasoning is that if you wish to propagate a walker for 10 ps
and sample a structure every 1 ps so that it generates 10 states, it is likely much better
to instead increase the number of walkers (to cover more regions of phase space) and
propagate them in steps of 1 ps. Active learning is therefore ideally suited for massively
parallel workflows (a maximal number of walkers, each with minimal sampling time), and we
encourage users to exploit this.
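The parallelism argument is simple arithmetic: both strategies produce the same number of labeled states and the same total simulated time, but wall time scales with the per-walker simulation time. A toy calculation (illustrative numbers, assuming walkers run fully in parallel and MD cost is linear in simulated time):

```py
# Toy comparison of the two sampling strategies described above.
# Assumes perfect parallelism: wall time equals the per-walker time.

def states_and_wall_time(num_walkers, ps_per_walker, states_per_walker):
    states = num_walkers * states_per_walker
    wall_time = ps_per_walker  # parallel walkers finish together
    return states, wall_time

# one walker, 10 ps, subsampled every 1 ps (passive-learning style)
subsampled = states_and_wall_time(1, 10.0, 10)
# ten walkers, 1 ps each, final state only (active-learning style)
parallel = states_and_wall_time(10, 1.0, 1)
# both yield 10 states, but the parallel campaign finishes 10x sooner
# and its walkers explore 10 independent regions of phase space
```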

```py
model, walkers = learning.active_learning(
model, # used to generate hamiltonian
walkers,
steps=2000, # no more 'step' argument!
**optional_sampling_kwargs,
)
```
## restarting a run

`Learning` has first-class support for restarted runs -- simply resubmit your calculation!
It detects whether the corresponding output folder already contains the full log of each
iteration, and if so, loads the final state of the model, the walkers, and the learning
instance without actually performing any calculations.
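The restart logic can be thought of as a check on the output folder. The sketch below assumes a hypothetical layout (`iteration_0/`, `iteration_1/`, ... each with a `finished` marker); psiflow's actual on-disk format differs, so this only illustrates the idea:

```py
from pathlib import Path

# Hypothetical restart check: an iteration counts as complete when its
# folder contains a 'finished' marker file. The folder layout and marker
# name are assumptions for illustration, not psiflow's actual format.

def completed_iterations(output_folder):
    count = 0
    while (Path(output_folder) / f"iteration_{count}" / "finished").exists():
        count += 1
    return count

def should_skip(output_folder, iteration):
    # skip the work if this iteration was already fully logged
    return iteration < completed_iterations(output_folder)
```

On resubmission, iterations for which `should_skip` returns true are loaded from disk instead of being recomputed, so the run continues exactly where it left off.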
Binary file added: docs/wandb.png