# 10. Trajectory Analysis¶

The analysis of the results is (in the first place) based on the HDF5 trajectory output files. On-line analysis (during the iterative algorithm, without writing data to disk) is also possible. As a final option, trajectory data from other programs may also be converted to the HDF5 format used by Yaff, which can then be post-processed as if it was generated by Yaff.

In all the examples below, the following import statement is implicitly assumed:

```
import h5py as h5
```

## 10.1. Slicing the data¶

All the analysis routines below have at least the following four optional arguments:

`start`

: the first sample to consider for the analysis`step`

: consider only a sample each`step`

iterations.`max_sample`

: consider at most`max_sample`

number of samples.

Some routines also support the following optional argument:

`end`

: the last sample to consider for the analysis.

The `max_sample`

option is only possible when `step`

is not specified and
the total number of samples (or `end`

) is known. When `max_sample`

is given,
the corresponding optimal value for `step`

will be derived from
`max_sample`

. Some analysis may not have the max_sample argument, e.g. the
spectrum analysis, because the choice of the step size for such analysis is a
critical parameter that needs to be set carefully.

See `yaff.analysis.utils.get_slice()`

for the documentation of the generic
underlying routine that handles the slices.

## 10.2. Basic analysis¶

A few basic analysis routines are provided to quickly check the sanity of an MD simulation:

`plot_energies`

makes a plot of the kinetic and the total energy as function of time. For example:f = h5.File('output.h5') plot_energies(f) f.close()

makes a figure

`energies.png`

.`plot_temperate`

and`plot_pressure`

are similar, but they plot the temperature and pressure as function of time instead of the energies.`plot_temp_dist`

plots the distribution (both pdf and cdf) of the instantaneous atomic and system temperatures and compares these with the expected analytical result for a constant-temperature ensemble. For example:f = h5.File('output.h5') plot_temp_dist(f) f.close()

makes a figure

`temp_dist.png`

All these functions accept optional arguments to tune their behavior. See
`yaff.analysis.basic`

for more analysis routines and more details.

## 10.3. Advanced analysis¶

Yaff also includes analysis tools that can extract relevant macroscopic properties from a simulation. These analysis tools require some additional computations that can either be done in a post-processing step, or on-line.

### 10.3.1. Radial distribution functions (RDFs)¶

The following example computes and oxygen-oxygen RDF using a previously generated trajectory file:

```
f = h5.File('output.h5')
select = system.get_indexes('8') # select atoms with ATSELECT rule
rdf = RDF(f, rcut=4.8*angstrom, rspacing=0.1*angstrom, max_sample=100, select0=select)
rdf.plot()
rdf.plot_crdf()
f.close()
```

In this example, the cutoff for the RDF is 4.8 Å and the spacing of the
bins 0.1 Å. At most 100 samples are used to compute the RDF. The results are
included in the HDF5 file, and optionally plotted using matplotlib.
Alternatively, the same `RDFAnalysis`

class can be used for on-line
analysis, without the need to store huge amounts of data on disk:

```
select = system.get_indexes('O')
rdf = RDF(None, 4.8*angstrom, 0.1*angstrom, max_sample=100, select0=select)
verlet = VerletIntegrator(ff, hooks=rdf, temp0=300)
verlet.run(5000)
rdf.plot()
rdf.plot_crdf()
```

One may also construct RDFs based on two sets of particles, using two `select*`

arguments. The following example computes the inter-particle RDF for the oxygen
and hydrogen atoms, using a previously computed trajectory file:

```
f = h5.File('output.h5')
select0 = system.get_indexes('8') # select atoms with ATSELECT rule
select1 = system.get_indexes('1')
rdf = RDF(f, rcut=4.8*angstrom, rspacing=0.1*angstrom, max_sample=100,
select0=select0, select1=select1)
rdf.plot()
rdf.plot_crdf()
f.close()
```

The reference documentation of the `rdf`

module (`yaff.analysis.rdf`

)
describes also the more exotic RDF options, e.g. to exclude intra-molecular
distances from the RDF.

### 10.3.2. Vibrational spectra and correlation times¶

The (efficient) computation of vibrational spectra and autocorrelation functions are so closely related that they are carried out simultaneously. A basic vibrational spectrum (by default based on the atomic velocities) is done as follows:

```
spectrum = Spectrum(f, bsize=512)
spectrum.plot()
spectrum.plot_ac()
```

The `bsize`

argument determines the size of the blocks used for the
spectral analysis. The trajectory is cut into blocks of the given size. For
each block, the spectrum is computed, and then averaged over all blocks. The
`plot`

method makes a figure of the spectrum. The `plot_ac`

method makes
a figure of the corresponding autocorrelation function. All the results are
also available as attributes of the spectrum object. Similar to the RDF
analysis, the spectrum can be computed both on-line and off-line. One can
also estimate the IR spectrum as follows:

```
spectrum = Spectrum(f, bsize=512, path='trajectory/dipole_vel', key='ir')
spectrum.plot()
```

More details can be found in the reference documention:
`yaff.analysis.spectrum`

.

### 10.3.3. Diffusion constants¶

The diffusion constant (based on the displacements of all nuclei) is computed as follows:

```
diff = Diffusion(f, step=10, mult=5)
print diff.A
diff.plot()
```

The positions are subsampled with step size of 10. The `mult`

argument controls
between which time intervals are used to compute the mean-square displacements
(MSDs) used to carry out a fit of the Einstein diffusion equation. When set to
5, the MSDs are computed at intervals of 10, 20, 30, 40 and 50 steps. As soon
as the analysis is ready, a straight line is fitted through the MSD versus time
data. The slope (`diff.A`

) of this line corresponds to the diffusion constant.
A plot of these data and the fitted line can be made with `diff.plot()`

.

One may restrict the analysis to just a selection of atoms as follows:

```
diff = Diffusion(f, step=10, mult=5, select=[0,4,5])
```

See `yaff.analysis.diffusion`

for more details.

**TODO.** Explain (and double check) how to compute diffusion constants of
multiple molecules instead of just atoms.

### 10.3.4. Computing errors on thermodynamic averages¶

The block average method can be used to compute the error on time-dependent data that have some degree of auto-correlation. The following example computes the error on the average temperature from an MD simulation, using a HDF5 trajectory file.

```
f = h5.file('output.h5')
error, sinef = blav(f['trajectory/temperature'])
f.close()
```

Two values are returned. `error`

is the error on the average. `sinef`

is the
statistical sampling inefficiency. More details about this routine are given
here: `yaff.sampling.blav`

.

## 10.4. Post-processing external trajectory data¶

One may also use the analysis module of Yaff to process trajectories generated
with other molecular simulation codes. The following example script (see
`data/examples/002_external_trajectory/rdf.py`

) converts the xyz file to HDF5 data,
which is then used to perform the analysis

```
# Needed for python2 backward compatibility
from __future__ import print_function
# Import Yaff and Numpy libraries and setup a reference system. The reference
# system is used to recognize atom types etc.
import numpy as np
from yaff import *
system = System.from_file('trajectory.xyz', rvecs=np.diag([20.3, 20.3, 20.3])*angstrom)
# Create a HDF5 file and convert the XYZ file to arrays in the HDF5 file
import h5py as h5
with h5.File('trajectory.h5', mode='w') as f:
system.to_hdf5(f)
xyz_to_hdf5(f, 'trajectory.xyz')
# Select two lists of atom indexes based on the ATSELECT rules '1' and '8'
select0 = system.get_indexes('1')
select1 = system.get_indexes('8')
# Note. The remainder of the example may be moved to a separate script if
# that would be more convenient, e.g. in case different RDFs must be generated.
# This would avoid repetetive conversion of the XYZ file.
# Create the RDF.
rdf = RDF(10*angstrom, 0.1*angstrom, f, max_sample=100, select0=select0, select1=select1)
# One may make plots with the rdf object ...
rdf.plot()
# ... or access the results as Numpy arrays
print()
print('RDF DATA FOR THE X-AXIS [A]')
print(rdf.d/angstrom)
print()
print('RDF DATA FOR THE Y-AXIS')
print(rdf.rdf)
```

See `yaff.conversion`

for more routines to convert trajectory data.