Constructing 1D histograms & free energy profiles

Construct a 1D histrogam …

There are generally speaking two ways to construct a 1D histogram from molecular simulations:

The most straightforward case is processing the data of a single unbiased molecular simulation. This will use the from_single_trajectory routine of the Histogram1D class
The most interesting case, however, is when one has performed one or more simulations biased along a certain collective variable (denoted as CV in the following). In this case, the histogram is construced from a series of biased simulation trajectories using the Weighted Histogram Analysis Method (WHAM). This will use the from_wham routine of the Histogram1D class

… from a single simulation

A histogram can be computed from a trajectory of CV values generated by a molecular simulation (such as molecular dynamics or Monte Carlo). To this end, we first need to read or compute the data series of the CV values from the molecular simulation trajectory. This can be done using the routines implemented in the trajectory module. For example, if the CV values are written to the first column in a COLVAR file (e.g. as done with using PLUMED), one can use the following code:

from thermolib.thermodynamics.trajectory import ColVarReader
reader = ColVarReader([0])
cv_data = reader('colvar.dat')

Alternatively, if the cv values need to be computed from the configurations in a XYZ trajectory file, this can be done using the CVComputer class in conjunction with the classes from the CV module. For example, if you want to compute the values of the CV defined as the distance between the atoms with indices 1 and 2 from the trajectory in the traj.xyz file, one can use the following code:

from thermolib.thermodynamics.trajectory import CVComputer
from thermolib.thermodynamics.cv import Distance
CV = Distance(1,2)
reader = CVComputer([CV])
cv_data = reader('traj.xyz')

Once the CV values are determined, we construct the histogram as follows:

from thermolib.thermodynamics.histogram import Histogram1D
histogram = Histogram1D.from_single_trajectory(cv_data, bins)

Where bins represents the argument of the same name defined in the numpy.histogram routine, hence for more information on its meaning and allowed values we refer to its documentation.

Warning

UNITS in ThermoLIB

At this point, it is important to note that all arguments you parse to a ThermoLIB routine that represent physical properties are assumed to be defined in atomic units. Therefore, if you read an observable from a file using a ThermoLIB routine, you will have the option to define in which unit the observable is defined in the file. This will auto convert it into atomic units so that you can afterwards safely parse it to another ThermoLIB routine. Similarly, if any ThermoLIB routine returns an observable, it will be in atomic units unless you can specify the units with a keyword argument.

To illustrate with the code blocks above, the from_single_trajectory routine expects its cv_data argument to be given in atomic units. However, we extract this data from a COLVAR file. If the data in this COLVAR file is given in angstrom, you should specify the argument units=['A'] in the ColVarReader constructor which will read the value and auto convert it to atomic units which you can then also safely use to parse to the from_single_trajectory routine.

reader = ColVarReader([0], units=['A'])
cv_data = reader('colvar.dat')
histogram = Histogram1D.from_single_trajectory(cv_data, bins)

… from multiple simulations using WHAM

In order to reconstruct a 1D histogram in terms of a given CV from a set of trajectories that represent simulations that have been biased along this CV, we apply the Weighted Histogram Analysis Method (WHAM). To do so, we work in two steps.

As a first step, we need to read all trajectories and define the corresponding bias potentials. This requires reading in a WHAM input file using the read_wham_input routine. Such WHAM input file (which can for example be the metadata file generated/used in PLUMED) is assumed to have the following syntax:

T = XXX K
NAME_1 CV0_1 KAPPA_1
NAME_2 CV0_2 KAPPA_2
NAME_3 CV0_3 KAPPA_3
...

where NAME_i is a label for the i-th simulation, CV0_i and KAPPA_i are the parameter values of the bias potential along the CV in the i-th simulation. As we will later specify that in the simulations we have used the Parabola1D bias potential, this will imply that the simulation with label NAME_1 has a parabolic bias potential centered around CV0_1 and with a force constant of KAPPA_1. The label NAME_1 will furthermore be used to locate the trajectory file from which the cv data of simulation i will be extracted. For more information on how to use the read_wham_input routine to read/define all required setup information, we refer to either the source code documentation of this routine, the various tutorials or the example below.

Explenation of read_wham_input by example

To further illustrate this, we consider an example of Umbrella Sampling with 30 windows (i.e. 30 umbrellas) for which all required data is stored in a directory containing the following files:

current dir:
| wham_input.txt
| COLVAR_W0.dat
| COLVAR_W1.dat
| COLVAR_W2.dat
| ...
| COLVAR_W28.dat
| COLVAR_W29.dat
| COLVAR_W30.dat

Each file COLVAR_WXX.dat represents a COLVAR file corresponding to the biased simulation in window XX and contains the CV values of that simulation as its first (and only) column. The file wham_input.txt has following content:

T=300K
W1 -1.5 500
W2 -1.4 500
W3 -1.3 500
...
W22 1.3 500
W23 1.4 500
W24 1.5 500

This file hence specifies that the simulation performed in window 1 (which has label W1 and has its CV values stored in the COLVAR_W1.dat file) had a bias potential with two parameters with values -1.5 and 500 (which are the CV0 and KAPPA value of the Parabolic1D bias potential). As such, we can now collect all required data with the read_wham_input routine as follows:

from thermolib.thermodynamics.trajectory import ColVarReader
from thermolib.tools import read_wham_input
from thermolib.thermodynamics.histogram import Histogram1D
# define variables
fn_meta = 'wham_input.txt'
colvar_reader = ColVarReader([0], units=['au'])
path_template_trajectory_files = 'COLVAR_%s.dat'
# read temperature, bias potentials and CV trajectory data
temp, biasses, trajectories = read_wham_input(
    fn_meta, colvar_reader, path_template_trajectory_files,
    bias_potential='Parabola1D', q01_unit='au', kappa1_unit='kjmol',
)

The path_template_trajectory_files variable should hold a %s placeholder. For example the path template COLVAR_%s.dat means that a simulation that was given a label W1 in the WHAM input file wham_input.txt should have its trajectory stored in the file COLVAR_W1.dat. The colvar_reader object should be able to extract the CV values from such trajectory files, that is why in this case it was defined as an instance of the ColVarReader class, here with its index argument set to [0] indicating a single CV will be extracted from column with index 0.

Once the temperature, bias potentials and trajectories have been extracted uisng this read_wham_input routine, we can use WHAM to construct a histogram using the Histogram1D.from_wham routine as illustrated below.

hist = Histogram1D.from_wham(bins, trajectories, biasses, temp)

Constructing a 1D free energy profile …

A free energy profile can be defined in ThermoLIB using the BaseFreeEnergyProfile class or its child classes (such as SimpleFreeEnergyProfile). The BaseFreeEnergyProfile class implements all basic features of 1D free energy profiles such as:

making a plot of the free energy profile
cropping the range of a collective variable
recollecting the collective variable into newly defined intervals
transformation towards other collective variables
deprojection towards a 2D FES
…

The child classes will further implement additional features that require assumptions of the free energy profile. At this moment, only one such child class is implemented, the SimpleFreeEnergyProfile which represents the FEP of a simple process with a single reactant, transition and product state. There are several ways to construct a free energy profile.

In the following sections we will illustrate how to construct a free energy profile using the BaseFreeEnergyProfile (which can hence also be applied to the SimpleFreeEnergyProfile without futher modification).

… using the constructor

from thermolib.thermodynamics.fep import BaseFreeEnergyProfile
fep = BaseFreeEnergyProfile(cv, f, temp)

Herein, cv and f arguments represent equal-length numpy arrays containing respectively the collective variable and free energy on a grid, which should all be defined in atomic units. Furthermore, temp represents the temperature (in atomic units, hence, Kelvin) at which the free energy profile is evaluated.

… reading from a text file

from thermolib.thermodynamics.fep import BaseFreeEnergyProfile
fep = BaseFreeEnergyProfile.from_txt(fn, temp)

Herein, the fn argument represents the name of a text file containing the values of the collective variable and the corresponding free energy on a grid. By default, the values of the cv and f arguments should be defined in atomic units in the first and in kjmol in the second column respectively separated by one or multiple whitespaces. More control on the columns from which to read the data, the delimiter used, units in which cv and/or f is defined and more can be found in the reference guide. As was also the case in the constructor routine, temp represents the temperature (in atomic units, hence, Kelvin) at which the free energy profile is evaluated.

… from a histogram

A 1D free energy profile can also be constructed from an instance of the Histogram1D class that was computed from one or more (possibly biased) molecular simulations as detailed before. To convert this histogram into a free energy profile, we proceed as follows:

from thermolib.thermodynamics.fep import BaseFreeEnergyProfile
temp = 300*kelvin
fep = BaseFreeEnergyProfile.from_histogram(hist, temp)