.. _seclab_ug_construct1D:

*************************************************
Constructing 1D histograms & free energy profiles
*************************************************

.. _seclab_ug_construct1D_hist:

Construct a 1D histrogam ...
============================

There are generally speaking two ways to construct a 1D histogram from molecular simulations:

1) The most straightforward case is processing the data of a single unbiased molecular simulation. This will use the :py:meth:`from_single_trajectory <thermolib.thermodynamics.histogram.Histogram1D.from_single_trajectory>` routine of the :py:class:`Histogram1D <thermolib.thermodynamics.histogram.Histogram1D>` class
2) The most interesting case, however, is when one has performed one or more simulations biased along a certain collective variable (denoted as CV in the following). In this case, the histogram is construced from a series of biased simulation trajectories using the Weighted Histogram Analysis Method (WHAM). This will use the :py:meth:`from_wham <thermolib.thermodynamics.histogram.Histogram1D.from_wham>` routine of the :py:class:`Histogram1D <thermolib.thermodynamics.histogram.Histogram1D>` class

... from a single simulation
----------------------------

A histogram can be computed from a trajectory of CV values generated by a molecular simulation (such as molecular dynamics or Monte Carlo). To this end, we first need to read or compute the data series of the CV values from the molecular simulation trajectory. This can be done using the routines implemented in the :py:mod:`trajectory <thermolib.thermodynamics.trajectory>` module. For example, if the CV values are written to the first column in a COLVAR file (e.g. as done with using PLUMED), one can use the following code:

.. code-block:: python

    from thermolib.thermodynamics.trajectory import ColVarReader
    reader = ColVarReader([0])
    cv_data = reader('colvar.dat')

Alternatively, if the cv values need to be computed from the configurations in a XYZ trajectory file, this can be done using the :py:class:`CVComputer <thermolib.thermodynamics.cv.CVComputer>` class in conjunction with the classes from the :py:mod:`CV <thermolib.thermodynamics.cv>` module. For example, if you want to compute the values of the CV defined as the distance between the atoms with indices 1 and 2 from the trajectory in the ``traj.xyz`` file, one can use the following code:

.. code-block:: python

    from thermolib.thermodynamics.trajectory import CVComputer
    from thermolib.thermodynamics.cv import Distance
    CV = Distance(1,2)
    reader = CVComputer([CV])
    cv_data = reader('traj.xyz')

Once the CV values are determined, we construct the histogram as follows:

.. code-block:: python

    from thermolib.thermodynamics.histogram import Histogram1D
    histogram = Histogram1D.from_single_trajectory(cv_data, bins)

Where ``bins`` represents the argument of the same name defined in the numpy.histogram routine, hence for more information on its meaning and allowed values we refer to `its documentation <https://numpy.org/doc/stable/reference/generated/numpy.histogram.html>`_. 

.. warning:: **UNITS in ThermoLIB**

    At this point, it is important to note that all arguments you parse to a ThermoLIB routine that represent physical properties are assumed to be defined in atomic units. Therefore, if you read an observable from a file using a ThermoLIB routine, you will have the option to define in which unit the observable is defined in the file. This will auto convert it into atomic units so that you can afterwards safely parse it to another ThermoLIB routine. Similarly, if any ThermoLIB routine returns an observable, it will be in atomic units unless you can specify the units with a keyword argument.
    
    To illustrate with the code blocks above, the :py:meth:`from_single_trajectory <thermolib.thermodynamics.histogram.Histogram1D.from_single_trajectory>` routine expects its ``cv_data`` argument to be given in atomic units. However, we extract this data from a COLVAR file. If the data in this COLVAR file is given in angstrom, you should specify the argument ``units=['A']`` in the ColVarReader constructor which will read the value and auto convert it to atomic units which you can then also safely use to parse to the :py:meth:`from_single_trajectory <thermolib.thermodynamics.histogram.Histogram1D.from_single_trajectory>` routine.

    .. code-block:: python

        reader = ColVarReader([0], units=['A'])
        cv_data = reader('colvar.dat')
        histogram = Histogram1D.from_single_trajectory(cv_data, bins)


... from multiple simulations using WHAM
----------------------------------------

In order to reconstruct a 1D histogram in terms of a given CV from a set of trajectories that represent simulations that have been biased along this CV, we apply the Weighted Histogram Analysis Method (WHAM). To do so, we work in two steps.

As a first step, we need to read all trajectories and define the corresponding bias potentials. This requires reading in a WHAM input file using the :py:meth:`read_wham_input <thermolib.tools.read_wham_input>` routine. Such WHAM input file (which can for example be the metadata file generated/used in PLUMED) is assumed to have the following syntax:

.. code-block:: python

    T = XXX K
    NAME_1 CV0_1 KAPPA_1
    NAME_2 CV0_2 KAPPA_2
    NAME_3 CV0_3 KAPPA_3
    ...
            
where NAME_i is a label for the i-th simulation, CV0_i and KAPPA_i are the parameter values of the bias potential along the CV in the i-th simulation. As we will later specify that in the simulations we have used the :py:meth:`Parabola1D <thermolib.thermodynamics.bias.Parabola1D>` bias potential, this will imply that the simulation with label NAME_1 has a parabolic bias potential centered around CV0_1 and with a force constant of KAPPA_1. The label NAME_1 will furthermore be used to locate the trajectory file from which the cv data of simulation i will be extracted. For more information on how to use the read_wham_input routine to read/define all required setup information, we refer to either :py:meth:`the source code documentation of this routine <thermolib.tools.read_wham_input>`, :ref:`the various tutorials <seclab_tut>` or the example below. 

.. admonition:: Explenation of read_wham_input by example

    To further illustrate this, we consider an example of Umbrella Sampling with 30 windows (i.e. 30 umbrellas) for which all required data is stored in a directory containing the following files:

    .. code-block:: python

        current dir:
        | wham_input.txt
        | COLVAR_W0.dat
        | COLVAR_W1.dat
        | COLVAR_W2.dat
        | ...
        | COLVAR_W28.dat
        | COLVAR_W29.dat
        | COLVAR_W30.dat

    Each file ``COLVAR_WXX.dat`` represents a COLVAR file corresponding to the biased simulation in window XX and contains the CV values of that simulation as its first (and only) column. The file ``wham_input.txt`` has following content:

    .. code-block:: python

        T=300K
        W1 -1.5 500
        W2 -1.4 500
        W3 -1.3 500
        ...
        W22 1.3 500
        W23 1.4 500
        W24 1.5 500

    This file hence specifies that the simulation performed in window 1 (which has label W1 and has its CV values stored in the ``COLVAR_W1.dat`` file) had a bias potential with two parameters with values -1.5 and 500 (which are the CV0 and KAPPA value of the Parabolic1D bias potential). As such, we can now collect all required data with the :py:meth:`read_wham_input <thermolib.tools.read_wham_input>` routine as follows:

    .. code-block:: python

        from thermolib.thermodynamics.trajectory import ColVarReader
        from thermolib.tools import read_wham_input
        from thermolib.thermodynamics.histogram import Histogram1D
        # define variables
        fn_meta = 'wham_input.txt'
        colvar_reader = ColVarReader([0], units=['au'])
        path_template_trajectory_files = 'COLVAR_%s.dat'
        # read temperature, bias potentials and CV trajectory data
        temp, biasses, trajectories = read_wham_input(
            fn_meta, colvar_reader, path_template_trajectory_files, 
            bias_potential='Parabola1D', q01_unit='au', kappa1_unit='kjmol',
        )
        

    The ``path_template_trajectory_files`` variable should hold a ``%s`` placeholder. For example the path template ``COLVAR_%s.dat`` means that a simulation that was given a label ``W1`` in the WHAM input file ``wham_input.txt`` should have its trajectory stored in the file ``COLVAR_W1.dat``. The ``colvar_reader`` object should be able to extract the CV values from such trajectory files, that is why in this case it was defined as an instance of the :py:class:`ColVarReader <thermolib.thermodynamics.cv.ColVarReader>` class, here with its index argument set to [0] indicating a single CV will be extracted from column with index 0.

Once the temperature, bias potentials and trajectories have been extracted uisng this read_wham_input routine, we can use WHAM to construct a histogram using the :py:meth:`Histogram1D.from_wham <thermolib.thermodynamics.histogram.Histogram1D.from_wham>` routine as illustrated below.

.. code-block:: python

    hist = Histogram1D.from_wham(bins, trajectories, biasses, temp)

.. _seclab_ug_construct1D_FEP:

Constructing a 1D free energy profile ...
=========================================

A free energy profile can be defined in ThermoLIB using the :py:class:`BaseFreeEnergyProfile <thermolib.thermodynamics.fep.BaseFreeEnergyProfile>` class or its child classes (such as :py:class:`SimpleFreeEnergyProfile <thermolib.thermodynamics.fep.SimpleFreeEnergyProfile>`). The BaseFreeEnergyProfile class implements all basic features of 1D free energy profiles such as:

* making a plot of the free energy profile
* cropping the range of a collective variable
* recollecting the collective variable into newly defined intervals
* transformation towards other collective variables
* deprojection towards a 2D FES
* ...
  
The child classes will further implement additional features that require assumptions of the free energy profile. At this moment, only one such child class is implemented, the :py:class:`SimpleFreeEnergyProfile <thermolib.thermodynamics.fep.SimpleFreeEnergyProfile>` which represents the FEP of a simple process with a single reactant, transition and product state. There are several ways to construct a free energy profile. 

In the following sections we will illustrate how to construct a free energy profile using the :py:class:`BaseFreeEnergyProfile <thermolib.thermodynamics.fep.BaseFreeEnergyProfile>` (which can hence also be applied to the :py:class:`SimpleFreeEnergyProfile <thermolib.thermodynamics.fep.SimpleFreeEnergyProfile>` without futher modification).

... using the constructor
-------------------------

.. code-block:: python
    
    from thermolib.thermodynamics.fep import BaseFreeEnergyProfile
    fep = BaseFreeEnergyProfile(cv, f, temp)
    
Herein, ``cv`` and ``f`` arguments represent equal-length numpy arrays containing respectively the collective variable and free energy on a grid, which should all be defined in atomic units. Furthermore, ``temp`` represents the temperature (in atomic units, hence, Kelvin) at which the free energy profile is evaluated.

... reading from a text file
----------------------------

.. code-block:: python

    from thermolib.thermodynamics.fep import BaseFreeEnergyProfile
    fep = BaseFreeEnergyProfile.from_txt(fn, temp)

Herein, the ``fn`` argument represents the name of a text file containing the values of the collective variable and the corresponding free energy on a grid. By default, the values of the ``cv`` and ``f`` arguments should be defined in *atomic units* in the first and in *kjmol* in the second column respectively separated by one or multiple whitespaces. More control on the columns from which to read the data, the delimiter used, units in which cv and/or f is defined and more can be found in :py:meth:`the reference guide <thermolib.thermodynamics.fep.BaseFreeEnergyProfile.from_txt>`. As was also the case in the constructor routine, *temp* represents the temperature (in atomic units, hence, *Kelvin*) at which the free energy profile is evaluated.

... from a histogram
--------------------

A 1D free energy profile can also be constructed from an instance of the :py:class:`Histogram1D <thermolib.thermodynamics.histogram.Histogram1D>` class that was computed from one or more (possibly biased) molecular simulations :ref:`as detailed before <seclab_ug_construct1D_hist>`. To convert this histogram into a free energy profile, we proceed as follows:

.. code-block:: python

    from thermolib.thermodynamics.fep import BaseFreeEnergyProfile
    temp = 300*kelvin
    fep = BaseFreeEnergyProfile.from_histogram(hist, temp)