more docs

molmod · Jun 5, 2024 · 5b3da30
1 parent 96dd426
commit 5b3da30
Showing 4 changed files with 190 additions and 103 deletions.
diff --git a/docs/hamiltonian.md b/docs/hamiltonian.md
@@ -1,20 +1,77 @@
-In Born-Oppenheimer-based molecular simulation, atomic nuclei are treated as classical particles that are subject to *effective* interactions which are determined by the quantum mechanical behavior of the electrons.
-In addition to the atomic interactions, it is often useful to define additional biasing forces on the system, e.g. in order to drive a rare event or to prevent the system from exploring undesired regions in phase space.
-In addition, there exist various alchemical free energy techniques which rely on systematic changes in the hamiltonian ( = potential energy) of the system to derive free energy differences between different states.
+In Born-Oppenheimer-based molecular simulation, atomic nuclei are treated as classical particles that are subject to *effective* interactions, which are determined by the quantum mechanical behavior of the electrons.
+These interactions determine the interatomic forces which are used in a dynamic simulation to propagate the atomic positions from one timestep to the next.
+In more advanced schemes, researchers may modify these effective interactions to include biasing forces (e.g. in order to induce a phase transition), or perform an alchemical transformation between two potential energy surfaces (when computing relative free energies).
+
+The ability to combine various energy contributions in an arbitrary manner allows for a very natural definition of many algorithms in computational statistical physics.
+To accommodate all these use cases, psiflow provides a simple abstraction for *"a function which accepts an atomic geometry and returns energies and forces"*: the `Hamiltonian` class.
+Examples of Hamiltonians are a specific ML potential, a bias potential on a collective variable, or a quadratic approximation to a potential energy minimum.
+
+By far the simplest hamiltonian is the Einstein crystal, which binds atoms to a certain reference position using harmonic springs with a single, fixed force constant.
 
-To accomodate for all these use cases, psiflow provides a simple abstraction for *a function which accepts an atomic geometry and returns energies and forces*: the `Hamiltonian` class.
-The simplest hamiltonian (which is only really useful for testing purposes) is the Einstein crystal, which binds atoms using harmonic springs to a certain reference position.
 ```py
+from psiflow.geometry import Geometry
+from psiflow.hamiltonians import EinsteinCrystal
+
 
 geometry = Geometry.from_string('''
     2
     H 0.0 0.0 0.0
     H 0.0 0.0 0.8
 ''')
 
-hamiltonian = EinsteinCrystal(
-    reference_geometry=geometry.positions,
-    force_constant=0.1,
+einstein = EinsteinCrystal(
+    reference_geometry=geometry.positions,  # positions at which all springs are at rest
+    force_constant=0.1,                     # force constant, in eV / A**2
     )
 
 ```
+As mentioned earlier, the key feature of hamiltonians is that they take an atomic geometry as input and return an energy, a set of forces, and optionally the virial stress.
+Because hamiltonians might require specialized resources for their evaluation (e.g. an ML potential which gets executed on a GPU), evaluation of a hamiltonian does not necessarily happen instantly (e.g. if a GPU node is not immediately available). Similar to how `Dataset` instances return futures of a `Geometry` when a particular index is queried, hamiltonians return a future when asked to evaluate the energy/forces/stress of a particular `Geometry`:
+
+```py
+import numpy as np
+
+future = einstein.evaluate(geometry)      # returns an AppFuture of the Geometry; the call itself returns instantly
+evaluated = future.result()               # calling result makes us wait for it to actually complete
+
+assert evaluated.energy is not None                     # the energy of the hamiltonian
+assert not np.any(np.isnan(evaluated.per_atom.forces))  # a (N, 3) array with forces
+```
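The future-based pattern is not specific to psiflow. As a rough analogy only, the standard library's `concurrent.futures` behaves the same way: submitting work returns a future immediately, and `result()` blocks until the value is available. The `evaluate` function here is a made-up stand-in for an expensive energy evaluation.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(positions):
    # made-up stand-in for an expensive energy evaluation
    return sum(x * x for x in positions)


with ThreadPoolExecutor(max_workers=1) as executor:
    future = executor.submit(evaluate, [0.0, 0.0, 0.8])   # returns right away with a Future
    energy = future.result()                              # blocks until the work is done
```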
+One of the most commonly used hamiltonians is MACE, a widely used ML potential.
+There exist reasonably accurate pretrained models which can be used for exploratory purposes. 
+These are readily available in psiflow:
+
+```py
+from psiflow.hamiltonians import get_mace_mp0
+
+
+mace = get_mace_mp0()               # downloads MACE-MP0 from github
+future = mace.evaluate(geometry)    # evaluates the MACE potential on the geometry
+
+evaluated = future.result()
+forces = evaluated.per_atom.forces  # forces on each atom, in float32
+
+assert np.dot(forces[0], forces[1]) < 0              # forces in H2 always point opposite of each other
+assert np.allclose(np.sum(forces, axis=0), 0.0)      # no external force --> forces sum to zero per component
+```
+As alluded to earlier, hamiltonians can be combined in arbitrary ways to create new hamiltonians.
+Psiflow supports a concise syntax for basic arithmetic operations on hamiltonians, such as 
+multiplication by a scalar or addition of two hamiltonians:
+
+```py
+from psiflow.data import Dataset
+
+data = Dataset.load('train.xyz')
+mix = 0.5 * einstein + 0.5 * mace             # MixtureHamiltonian with E = 0.5 * E_einstein + 0.5 * E_mace
+energies_mix = mix.evaluate(data).get('energy')
+
+energies_einstein = einstein.evaluate(data).get('energy')
+energies_mace     = mace.evaluate(data).get('energy')
+assert np.allclose(
+      energies_mix.result(),
+      0.5 * energies_einstein.result() + 0.5 * energies_mace.result(),
+      )
+```
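The arithmetic above works because both energies and forces are linear in the hamiltonian. The sketch below mimics that behavior in plain Python. The `ToyHamiltonian` class, its operator overloads, and the two toy potentials are purely illustrative assumptions; they do not reflect how psiflow implements `MixtureHamiltonian`.

```python
import numpy as np


class ToyHamiltonian:
    """Illustrative stand-in: wraps a function positions -> (energy, forces)."""

    def __init__(self, function):
        self.function = function

    def evaluate(self, positions):
        return self.function(positions)

    def __rmul__(self, scalar):
        # scalar * H scales both the energy and the forces
        def scaled(positions):
            energy, forces = self.function(positions)
            return scalar * energy, scalar * forces
        return ToyHamiltonian(scaled)

    def __add__(self, other):
        # H1 + H2 sums the energies and the forces
        def summed(positions):
            e0, f0 = self.function(positions)
            e1, f1 = other.function(positions)
            return e0 + e1, f0 + f1
        return ToyHamiltonian(summed)


def spring(positions):      # harmonic well centered at the origin
    return 0.5 * np.sum(positions**2), -positions


def tilt(positions):        # constant unit force along +z on every atom
    return -np.sum(positions[:, 2]), np.tile([0.0, 0.0, 1.0], (len(positions), 1))


positions = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.8]])
mix = 0.5 * ToyHamiltonian(spring) + 0.5 * ToyHamiltonian(tilt)
energy_mix, forces_mix = mix.evaluate(positions)
```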
+This makes it very easy to introduce bias potentials into your simulations -- see for example the formic acid transition state [example](https://github.com/molmod/psiflow/tree/main/examples/formic_acid_transition.py).
+The following is a list of all available hamiltonians in psiflow:
+
+- `EinsteinCrystal`: a simple harmonic potential which binds atoms to a reference position.
+- `MACE`: an ML potential, either pretrained as available on GitHub, or trained within psiflow (see later sections).
+- `Harmonic`: a general quadratic potential based on a Hessian matrix and an optimized geometry.
+- `PlumedHamiltonian`: a bias contribution based on a PLUMED input file.
diff --git a/docs/index.md b/docs/index.md
@@ -109,98 +109,6 @@ In what follows, we assume that a suitable `context` has been initialized.
 --->
 
 
-## Atomic data
-In psiflow, a set of atomic configurations is represented using the `Dataset` class.
-It may represent training/validation data for model development, or
-a trajectory of snapshots that was generated using molecular dynamics.
-A `Dataset` instance mimics the behavior of a list of ASE `Atoms` instances:
-```py
-from psiflow.data import Dataset
-
-
-data_train  = Dataset.load('train.xyz')         # create a psiflow Dataset from a file
-data_subset = data_train[:10]                   # create a new Dataset instance with the first 10 states
-data_train  = data_subset + data_train[10:]     # combining two datasets is easy
-
-data = Dataset.load('lots_of_data.xyz')
-train, valid = data.shuffle().split(0.9)        # shuffle structures and partition into train/valid sets
-type(train)                                     # psiflow Dataset
-type(valid)                                     # psiflow Dataset
-
-```
-The main difference between a psiflow `Dataset` instance and an actual Python `list` of
-`Atoms` is that a `Dataset` can represent data __that will be generated in the future__.
-
-!!! note "Parsl 101: Apps and Futures"
-    To understand what is meant by 'generating data in the future', it is necessary
-    to introduce the core concepts in Parsl: apps and futures. In their simplest
-    form, apps are just functions, and futures are the result of an app given
-    a set of inputs. Importantly, a Future already exists before the actual calculation
-    is performed. In essence, a Future _promises_ that, at some time in the future, it will
-    contain the actual result of the function evaluation. Take a look at the following
-    example:
-
-    ```py
-    from parsl.app.app import python_app
-
-
-    @python_app # convert a regular Python function into a Parsl app
-    def sum_integers(a, b):
-        return a + b
-
-
-    sum_future = sum_integers(3, 4) # tell Parsl to generate a future that represents the sum of integers 3 and 4
-    print(sum_future)               # is an AppFuture, not an integer
-
-    print(sum_future.result())      # now compute the actual result; this will print 7 !
-
-    ```
-    The return value of Parsl apps is not the actual result (in this case, an integer), but
-    an AppFuture that will store the result of the function evaluation after it has completed.
-    The main reason for doing things this way is that this allows for asynchronous execution.
-    For more information, check out the [Parsl documentation](https://parsl.readthedocs.io/en/stable/).
-
-The actual atomic configurations are stored __as a Parsl future, in an attribute of the Dataset
-object__.
-Actually getting the data would require the user to make a `.result()` call similar
-to the trivial Parsl example above.
-Let's go back to the first example and try and get the actual list of `Atoms` instances:
-```py
-data_train = Dataset.load('train.xyz')
-atoms_list = data_train.as_list()                   # returns AppFuture
-
-isinstance(atoms_list, list)                        # returns False! 
-
-atoms_list.result()                                 # this is the actual list
-
-
-data_train[4]                   # AppFuture representing the configuration at index 4
-data_train[4].result()          # actual Atoms instance
-
-```
-If the initial XYZ file was formatted in extended XYZ format and contained the potential
-energy, forces, and stress of each atomic configuration,
-they are also loaded in the dataset:
-```py
-data_train[4].result()                      # actual Atoms instance
-data_train[4].result().info['energy']       # potential energy, float
-data_train[4].result().info['stress']       # virial stress, 2darray of shape (3, 3)
-data_train[4].result().arrays['forces']     # forces, 2darray of shape (natoms, 3)
-
-```
-While not that important for the user, it is worth mentioning that psiflow
-extends ASE's `Atoms` functionality with a few additional features, mostly
-for internal convenience. Practically speaking, this does not really change anything for the user,
-but we mention it for completeness.
-```py
-from ase import Atoms
-from psiflow.data import FlowAtoms
-
-snapshot = data_train[4].result()   # convert Future of snapshot to actual snapshot
-isinstance(snapshot, Atoms)         # True; FlowAtoms subclasses Atoms
-type(snapshot) == Atoms             # False; it is not actually an Atoms instance
-type(snapshot) == FlowAtoms         # True
-
 ```
 
 ## Trainable potentials

diff --git a/docs/sampling.md b/docs/sampling.md
@@ -0,0 +1,114 @@
+In the Born-Oppenheimer philosophy, we explore the phase space of a molecule or a material and generate samples using molecular dynamics simulations.
+Those samples are then used to evaluate time averages of some property of interest in order to predict physical observables.
+In psiflow, such simulations are executed within [i-PI](https://ipi-code.org/), a versatile and efficient code which supports an impressive number of [features](https://ipi-code.org/i-pi/features.html).
+We mention the most important ones below:
+
+- **molecular dynamics in various ensembles**: most notably NVE, NVT, and fully anisotropic NPT. There exist a variety of thermostat and barostat options, the default being Langevin. Together with the ability to combine arbitrary hamiltonians, this includes biased molecular dynamics simulations using e.g. harmonic restraints (umbrella sampling).
+- **path-integral molecular dynamics** (PIMD): allows for the simulation of the quantum behavior of light atomic nuclei. This is important for many systems involving hydrogen atoms at relatively low temperatures (at or below room temperature). Importantly, these simulations can also be performed in any of the aforementioned ensembles.
+- **geometry optimizations**: i-PI can be used to optimize the geometry of a molecule or a material using a variety of optimization algorithms.
+- **replica exchange** (parallel tempering): dramatically improves the sampling efficiency and ergodicity whenever nontrivial free energy barriers are present in the phase space of the system. In this approach, one considers replicas of the system at various temperatures and/or pressures, or optionally even with different hamiltonians.
+- **multiple walker metadynamics**: simple but powerful method to overcome free energy barriers when a suitable collective variable is known for the system of interest.
+
+
+## the `Walker` class
+Psiflow is essentially a convenient wrapper around most of i-PI's features.
+The key object which enables the execution of these simulations is the `Walker` class.
+A single walker describes a single replica of the system which evolves through phase space.
+It is initialized with a `Geometry` instance which describes the start of the simulation, and can be assigned a particular hamiltonian, a temperature and/or pressure, and a timestep.
+
+```py
+from psiflow.sampling import Walker
+from psiflow.geometry import Geometry
+from psiflow.hamiltonians import get_mace_mp0
+
+
+start = Geometry.load("start.xyz")
+walker = Walker(
+    start,
+    hamiltonian=get_mace_mp0(),
+    temperature=300.0,
+    pressure=None,  # NVT simulation
+    timestep=0.5,   # in femtoseconds, the default value
+)
+```
+In the vast majority of cases, it is necessary to run multiple simulations at slightly different conditions.
+For example, let us create ten of these walkers which are identical except for the temperature:
+
+```py
+walkers = walker.multiply(10)
+for i, w in enumerate(walkers):
+    w.temperature = 300 + i * 10
+```
+When propagated, each of these walkers will generate trajectories in phase space which correspond to their own temperature.
+In the case of temperature, such trajectories can be used to e.g. evaluate the variation of the mean energy with respect to temperature
+(and therefore, the heat capacity of the system).
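As a sketch of that last step: given the mean energy at each temperature, the heat capacity follows from a finite-difference derivative. The energies below are not simulation output. They are generated from the classical equipartition result for harmonic atoms, E = 3 N k_B T, purely so that the snippet runs on its own; `n_atoms` is an arbitrary illustrative value.

```python
import numpy as np

k_B = 8.617333262e-5                                  # Boltzmann constant, in eV / K
n_atoms = 2                                           # arbitrary illustrative value

temperatures = 300.0 + 10.0 * np.arange(10)           # same grid as the ten walkers above
mean_energies = 3 * n_atoms * k_B * temperatures      # placeholder for simulation averages, in eV

# C = d<E>/dT, estimated with finite differences on the temperature grid
heat_capacity = np.gradient(mean_energies, temperatures)
```

With real simulation averages, the statistical noise on each mean energy carries over into the derivative, so a fit over several temperatures may be preferable to pointwise differences.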
+
+## generating trajectories
+
+Walkers can be propagated in time by using the `sample` function.
+It accepts a list of walkers, each of which will be propagated in phase space according to its own parameters.
+Importantly, there are *no restrictions* on the type of walkers in this list.
+Users can mix regular NVT walkers with PIMD NVE walkers and a list of N replica exchange walkers.
+Internally, psiflow will recognize which walkers are independent and parallelize the execution as much as possible.
+Consider the following example:
+```py
+from psiflow.sampling import sample
+
+outputs = sample(
+    walkers,
+    steps=1e6,  # total number of timesteps to run the simulation for; this translates to 500 ps in this case
+    step=1e3,   # sample every 1000 timesteps
+    start=1e5,  # start sampling after 50 ps
+)
+print(outputs)  # list of `SimulationOutput` instances
+```
+In this example, the `sample` function will return a list of `SimulationOutput` instances, each of which contains the trajectory of a single walker.
+The outputs are ordered in the same way as the input walkers (i.e. `outputs[0]` corresponds to the output from `walkers[0]`).
+They provide access to the sampled trajectory of the simulation, the elapsed simulation time, and importantly, a number of *observable properties*
+which have been written out by i-PI. These properties can be used to compute averages of physical observables, such as the internal energy or the virial stress tensor.
+A full list of available properties is given in the [i-PI documentation](https://ipi-code.org/i-pi/output-tags.html). Note that psiflow adheres to the same naming convention as adopted in i-PI:
+
+- `energy`: the potential energy of the system. The actual name of this quantity is `potential{electronvolt}`
+- `temperature`: the instantaneous temperature of the system. The actual name of this quantity is `temperature{kelvin}`
+- `time`: the elapsed simulation time. The actual name of this quantity is `time{picosecond}`
+- `volume`: the volume of the simulation cell (only for periodic systems). The actual name of this quantity is `volume{angstrom3}`
+
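Returning to the `steps`, `step`, and `start` arguments of the `sample` call: the times quoted in its comments follow from the walker timestep of 0.5 fs. The check below is plain arithmetic with no psiflow calls; the sample count is only an estimate, since whether the end points are written out is not specified here.

```python
timestep_fs = 0.5          # walker timestep, in femtoseconds
steps = int(1e6)           # total number of timesteps
step = int(1e3)            # sampling interval, in timesteps
start = int(1e5)           # timesteps discarded before sampling starts

total_time_ps = steps * timestep_fs / 1000          # fs -> ps
start_time_ps = start * timestep_fs / 1000
sampling_interval_ps = step * timestep_fs / 1000
approx_samples = (steps - start) // step            # up to an off-by-one at the end points

print(total_time_ps, start_time_ps, sampling_interval_ps, approx_samples)
```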
+Similarly to the evaluation of a `Hamiltonian` or the querying of a snapshot in a `Dataset`, simulation outputs are returned as futures.
+For example, say we wanted to compute the average energy for each of the simulations:
+```py
+import numpy as np
+
+energy_futures = [output["potential{electronvolt}"] for output in outputs]
+energies = [future.result() for future in energy_futures]
+mean_energy = np.array([np.mean(energy) for energy in energies])
+```
+This example extracts the futures which contain the potential energies of all simulations, waits for them to complete (via `result()`), and then computes the mean energy for each simulation. In a very similar fashion, we can compute the bulk modulus of bcc iron simply by constructing walkers at various pressures and extracting the corresponding `volume{angstrom3}` observable -- see [here](https://github.com/molmod/psiflow/tree/main/examples/iron_bulk_modulus.py).
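A sketch of the post-processing this implies, assuming the mean volume at each pressure has already been extracted from the outputs. The volumes below are synthetic: they are generated from a model with a constant bulk modulus `B0`, and both `B0` and `V0` are made-up inputs rather than psiflow results. The fit uses the definition B = -V dP/dV = -dP/d(ln V).

```python
import numpy as np

B0 = 170.0        # GPa; made-up input for the synthetic data
V0 = 23.5         # A^3; made-up reference volume at zero pressure

pressures = np.linspace(-1.0, 1.0, 9)          # GPa, one value per walker
volumes = V0 * np.exp(-pressures / B0)         # stand-in for the simulated mean volumes

# B = -dP / d(ln V): minus the slope of a linear fit of P against ln V
log_volumes = np.log(volumes)
slope, _ = np.polyfit(log_volumes - log_volumes.mean(), pressures, 1)
bulk_modulus = -slope
```

Because the synthetic data come from a constant-`B0` model, the fit recovers `B0`; with real data the pressure range should stay small enough for the linear relation to hold.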
+
+In many cases, it is useful to save the trajectory of a simulation to disk.
+Trajectories are essentially just a series of snapshots, and as such, psiflow represents them as `Dataset` instances.
+Each of the outputs has an attribute `trajectory` which is a `Dataset` instance.
+Let us save the trajectory of the first simulation to disk:
+
+```py
+outputs[0].trajectory.save("300K.xyz")
+```
+
+As a sanity check, let us recompute the potential energies which were stored at each snapshot during the simulation using the `evaluate` functionality of our MACE hamiltonian:
+```py
+mace   = walkers[0].hamiltonian                               # the hamiltonian used in the simulations
+future = mace.evaluate(outputs[0].trajectory).get('energy')   # future of the recomputed energies as an array
+
+manual_energies_0 = future.result()                           # get the actual numpy array
+
+assert np.allclose(
+    manual_energies_0,
+    energies[0],
+    )
+```
+## walker utilities
+
+## PIMD simulations
+
+## replica exchange
+
+## metadynamics
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -61,7 +61,15 @@ markdown_extensions:
       emoji_index: !!python/name:materialx.emoji.twemoji
       emoji_generator: !!python/name:materialx.emoji.to_svg
 
+#extra_javascript:
+#  - javascripts/mathjax.js
+#  - https://polyfill.io/v3/polyfill.min.js?features=es6
+#  - https://unpkg.com/mathjax@3/es5/tex-mml-chtml.js
+
 extra_javascript:
-  - javascripts/mathjax.js
-  - https://polyfill.io/v3/polyfill.min.js?features=es6
-  - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js
+  - javascripts/katex.js
+  - https://unpkg.com/katex@0/dist/katex.min.js
+  - https://unpkg.com/katex@0/dist/contrib/auto-render.min.js
+
+extra_css:
+  - https://unpkg.com/katex@0/dist/katex.min.css