# 4. yaff.analysis – Trajectory analysis¶

## 4.1. yaff.analysis.basic – Basic trajectory analysis routines¶

Basic trajectory analysis routines

yaff.analysis.basic.plot_energies(f, fn_png='energies.png', **kwargs)

Make a plot of the potential, total and conserved energy as f. of time

Arguments:

f An h5.File instance containing the trajectory data.

Optional arguments:

fn_png The png file to write the figure to

The optional arguments of the get_slice function are also accepted in the form of keyword arguments.

The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.

yaff.analysis.basic.plot_temperature(f, fn_png='temperature.png', **kwargs)

Make a plot of the temperature as function of time

Arguments:

f
An h5.File instance containing the trajectory data.

Optional arguments:

fn_png
The png file to write the figure to

The optional arguments of the get_slice function are also accepted in the form of keyword arguments.

The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.

yaff.analysis.basic.plot_pressure(f, fn_png='pressure.png', window=1, **kwargs)

Make a plot of the pressure as function of time

Arguments:

f
An h5.File instance containing the trajectory data.

Optional arguments:

fn_png
The png file to write the figure to
window
The window over which the pressure is averaged

The optional arguments of the get_slice function are also accepted in the form of keyword arguments.

The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.

yaff.analysis.basic.plot_temp_dist(f, fn_png='temp_dist.png', temp=None, ndof=None, select=None, **kwargs)

Plots the distribution of the weighted atomic velocities

Arguments:

f
An h5.File instance containing the trajectory data.

Optional arguments:

fn_png
The png file to write the figure to
select
A list of atom indexes that should be considered for the analysis. By default, information from all atoms is combined.
temp
The (expected) average temperature used to plot the theoretical distributions.
ndof
The number of degrees of freedom. If not specified, this is chosen to be 3*(number of atoms)
start, end, step, max_sample
The optional arguments of the get_slice function are also accepted in the form of keyword arguments.

This type of plot is essential for checking the sanity of a simulation. The empirical cumulative distribution is plotted and overlayed with the analytical cumulative distribution one would expect if the data were taken from an NVT ensemble.

This type of plot reveals issues with parts that are relatively cold or warm compared to the total average temperature. This helps to determine (the lack of) thermal equilibrium.

yaff.analysis.basic.plot_press_dist(f, temp, fn_png='press_dist.png', press=None, ndof=None, select=None, **kwargs)

Plots the distribution of the internal pressure

Arguments:

f
An h5.File instance containing the trajectory data.
temp
The (expected) average temperature used to plot the theoretical distributions.

Optional arguments:

fn_png
The png file to write the figure to
select
A list of atom indexes that should be considered for the analysis. By default, information from all atoms is combined.
press
The (expected) average pressure used to plot the theoretical distributions.
ndof
The number of degrees of freedom. If not specified, this is chosen to be 3*(number of atoms)
start, end, step, max_sample
The optional arguments of the get_slice function are also accepted in the form of keyword arguments.

This type of plot is essential for checking the sanity of a simulation. The empirical cumulative distribution is plotted and overlayed with the analytical cumulative distribution one would expect if the data were taken from an NPT ensemble.

yaff.analysis.basic.plot_volume_dist(f, fn_png='volume_dist.png', temp=None, press=None, **kwargs)

Plots the distribution of the volume

Arguments:

f
An h5.File instance containing the trajectory data.

Optional arguments:

fn_png
The png file to write the figure to
temp
The (expected) average temperature used to plot the theoretical distributions.
press
The (expected) average pressure used to plot the theoretical distributions.
start, end, step, max_sample
The optional arguments of the get_slice function are also accepted in the form of keyword arguments.

This type of plot is essential for checking the sanity of a simulation. The empirical cumulative distribution is plotted and overlayed with the analytical cumulative distribution one would expect if the data were taken from an NPT ensemble.

yaff.analysis.basic.plot_density(f, fn_png='density.png', **kwargs)

Make a plot of the mass density as function of time

Arguments:

f
An h5.File instance containing the trajectory data.

Optional arguments:

fn_png
The png file to write the figure to

The optional arguments of the get_slice function are also accepted in the form of keyword arguments.

The units for making the plot are taken from the yaff screen logger.

yaff.analysis.basic.plot_cell_pars(f, fn_png='cell_pars.png', **kwargs)

Make a plot of the cell parameters as function of time

Arguments:

f
An h5.File instance containing the trajectory data.

Optional arguments:

fn_png
The png file to write the figure to

The optional arguments of the get_slice function are also accepted in the form of keyword arguments.

The units for making the plot are taken from the yaff screen logger.

yaff.analysis.basic.plot_epot_contribs(f, fn_png='epot_contribs.png', size=1.0, **kwargs)

Make a plot of the contributions to the potential energy as f. of time

Arguments:

f
An h5.File instance containing the trajectory data.

Optional arguments:

fn_png
The png file to write the figure to

The optional arguments of the get_slice function are also accepted in the form of keyword arguments.

The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.

yaff.analysis.basic.plot_angle(f, index, fn_png='angle.png', n_int=1, xlim=None, ymax=None, angle_lim=None, angle_shift=False, oriented=False, **kwargs)

Make a plot of the angle between the given atoms as f. of time

Arguments:

f
An h5.File instance containing the trajectory data
index
A list containing the three indices of the atoms as in the h5 file

Optional arguments:

fn_png
The png file to write the figure to
n_int
The number of equidistant intervals considered in the FFT and histogram
xlim
Frequency interval of interest
ymax
Maximal intensity of interest in the FFT
angle_lim
Angle interval of interest in the histogram
angle_shift
If True, all angles are shifted towards positive values
oriented
If True, a distinction is made between positive and negative angles
yaff.analysis.basic.plot_dihedral(f, index, fn_png='dihedral.png', n_int=1, xlim=None, ymax=None, angle_lim=None, angle_shift=False, oriented=False, **kwargs)

Make a plot of the angle between the given atoms as f. of time

Arguments:

f
An h5.File instance containing the trajectory data
index
A list containing the four indices of the atoms as in the h5 file

Optional arguments:

fn_png
The png file to write the figure to
n_int
The number of equidistant intervals considered in the FFT and histogram
xlim
Frequency interval of interest
ymax
Maximal intensity of interest in the FFT
angle_lim
Angle interval of interest in the histogram
angle_shift
If True, all angles are shifted towards positive values
oriented
If True, a distinction is made between positive and negative angles

## 4.2. yaff.analysis.blav – The block-average method¶

The block-average method

yaff.analysis.blav.blav(signal, minblock=100, fn_png=None, unit=None)

Analyze the signal with the block average method.

The variance on the block average error as function of block size is fitted using the a+b/bsize model, where a is a measure for the error on the average, i.e. when the block size becomes infintely large. If the fit fails, a coarse estimate of the error on the average is returned, i.e. the largest block average error.

Arguments:

signal
An array containing time-dependent data.

Optional arguments:

minblock
The minimum number of blocks to be considered.
fn_png
When given, the data used for the fit and the fitted model are plotted.
unit
This is only relevant when a plot is made. It is used as the unit for the y-axis, e.g. log.length.

Returns:

error
The fitted error.
sinef
The fitted statistical inefficiency.
yaff.analysis.blav.inefficiency(signal, time=None, fn_png='stat_ineff.png', taus=None, eq_limits=None)

Analyze the signal to determine the statistical inefficiency. The statistical inefficiency of a signal is defined as the limiting ratio of the observed variance of its long-term averages to their expected variance. It can hence be regarded as the factor (>1) with which the sample size should be multiplied in order to compensate for correlation. This is derived in:

Friedberg, R; Cameron, J.E. J. Chem. Phys. 1970, 52, 6049-6058.

Arguments:

signal
An array containing time-dependent data.

Optional arguments:

time
An array containing the time information of the provided signal.
fn_png
Name of the plot to which the data is written.
taus
An array, where for each element tau the block averages with this block size are determined.
eq_limits
An array containing the fractions of the signal to be considered as equilibration.

Returns:

## 4.3. yaff.analysis.diffusion – Diffusion constants¶

Diffusion constants

class yaff.analysis.diffusion.Diffusion(f=None, start=0, end=-1, step=1, mult=20, select=None, bsize=None, pospath='trajectory/pos', poskey='pos', outpath=None)

Computes mean-squared displacements and diffusion constants

Optional arguments:

f
An h5.File instance containing the trajectory data. If f is not given, or it does not contain the dataset referred to with the path argument, an on-line analysis is carried out.
start, end, step
Optional arguments for the get_slice function. max_sample is not supported because the choice of the step argument is critical for a useful result.
mult
In the first place, the mean square displacement (MSD) between subsequent step is computed. The MSD is also computed between every, two, three, …, until mult steps.
select
A list of atom indexes that are considered for the computation of the MSD’s. If not given, all atoms are used.
bsize
If given, time intervals that coincide with the boundaries of the block size, will not be considered form the analysis. This is useful when there is a significant monte carlo move between subsequent blocks. If step > 1, the intervals will be left out if the overlap with boundaries of blocks with size bsize*step.
pospath
The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
poskey
In case of an on-line analysis, this is the key of the state item that contains the data from which the MSD’s are derived.
outpath
The output path for the MSD results. If not given, it defaults to ‘%s_diff’ % path. If this path already exists, it will be removed first.
init_first()
configure_online(iterative, st_pos)
configure_offline(ds_pos)
init_timestep()
read_online(st_pos)
read_offline(i, ds_pos)
compute_iteration()
compute_derived()
overlap_bsize(m)
update_msd(msd, m)
plot(fn_png='msds.png')

## 4.4. yaff.analysis.hook – Abstract hook implementation for trajectory analysis¶

Abstract hook implementation for trajectory analysis

class yaff.analysis.hook.AnalysisInput(path, key, required=True)

Bases: object

Describes the location of the time-dependent input for an analysis.

Arguments:

path
The path in the HDF5 file were the input can be found. This is only relevant for an off-line analysis. This must be a dataset in the trajectory folder.
key
The key of the state item that contains the input. This is only relevant for an on-line analysis.

Optional arguments:

required
When true, this input is mandatory. When false, it is optional.
class yaff.analysis.hook.AnalysisHook(f=None, start=0, end=-1, max_sample=None, step=None, analysis_inputs={}, outpath='trajectory/noname', do_timestep=False)

Base class for the analysis hooks.

Analysis hooks in Yaff support both off-line and on-line analysis.

Optional arguments:

f
An h5.File instance containing the trajectory data. If f is not given, or it does not contain the dataset referred to with the path argument, an on-line analysis is carried out.
start, end, max_sample, step
Optional arguments for the get_slice function.
analysis_inputs
A list with AnalysisInput instances
outpath
The output path for the analysis.
do_timestep
When True, a self.timestep attribute will be initialized as early as possible.
__call__(iterative)
compute_offline()
init_first()
init_online()
offline_loop(**datasets)
configure_online(**state_items)
configure_offline(**kwargs)
init_timestep()
read_online(**state_items)
read_offline(i, **kwargs)
compute_iteration()
compute_derived()

## 4.5. yaff.analysis.pca – Toolkit for Principal Component Analysis (PCA)¶

Toolkit for Principal Component Analysis (PCA)

yaff.analysis.pca.calc_cov_mat_internal(q, q_ref=None)

Calculates the covariance matrix of a time-dependent matrix q, containing the independent components in its columns. The rows are treated as different times at which the components are evaluated.

Arguments:

q
The txN matrix for which the NxN covariance matrix is determined

Optional arguments:

q_ref
Reference vector of size 1xN. If not provided, the ensemble average is taken.
yaff.analysis.pca.calc_cov_mat(f, q_ref=None, start=0, end=None, step=1, select=None, path='trajectory/pos', mw=True)

Calculates the covariance matrix of the given trajectory.

Arguments:

f
An h5.File instance containing the trajectory data.

Optional arguments:

q_ref
Reference vector of the positions. If not provided, the ensemble average is taken.
start
The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
end
The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
step
The spacing between the samples used for the analysis
select
A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
path
The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
mw
If mw is True, the covariance matrix is mass-weighted.
yaff.analysis.pca.calc_pca(f_target, cov_mat=None, f=None, q_ref=None, start=0, end=None, step=1, select=None, path='trajectory/pos', mw=True, temp=None)

Performs a principle component analysis of the given trajectory.

Arguments:

f_target
Path to an h5.File instance to which the results are written.

Optional arguments:

cov_mat
The covariance matrix, if already calculated. If not provided, the covariance matrix will be calculatd based on the file f.
f
An h5.File instance containing the trajectory data.
q_ref
Reference vector of the positions. If not provided, the ensemble average is taken.
start
The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
end
The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
step
The spacing between the samples used for the analysis
select
A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
path
The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
mw
If mass_weighted is True, the covariance matrix is mass-weighted.
temp
Temperature at which the simulation is carried out, necessary to determine the frequencies
yaff.analysis.pca.pca_projection(f_target, f, pm, start=0, end=None, step=1, select=None, path='trajectory/pos', mw=True)

Determines the principal components of an MD simulation

Arguments:

f_target
Path to an h5.File instance to which the results are written.
f
An h5.File instance containing the trajectory data.
pm
An array containing the principal modes in its columns

Optional arguments:

start
The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
end
The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
step
The spacing between the samples used for the analysis
select
A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
path
The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
mw
If mass_weighted is True, the covariance matrix is mass-weighted.
yaff.analysis.pca.write_principal_mode(f, f_pca, index, n_frames=100, select=None, mw=True, scaling=1.0)

Writes out one xyz file per principal mode given in index

Arguments:

f
Path to an h5.File instance containing the original data.
f_pca
Path to an h5.File instance containing the PCA, with reference structure, eigenvalues and principal modes.
index
An array containing the principal modes which need to be written out.

Optional arguments:

n_frames
The number of frames in each xyz file.
select
A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
mw
If mass_weighted is True, the covariance matrix is assumed to be mass-weighted.
scaling
Scaling factor applied to the maximum deviation of the principal mode (i.e. the maximum principal component for that mode)
yaff.analysis.pca.pca_similarity(covar_a, covar_b)

Calculates the similarity between the two covariance matrices

Arguments:

covar_a
The first covariance matrix.
covar_b
The second covariance matrix.
yaff.analysis.pca.pca_convergence(f, eq_time=0.0, n_parts=None, step=1, fn='PCA_convergence', n_bootstrap=50, mw=True)

Calculates the convergence of the simulation by calculating the pca similarity for different subsets of the simulation.

Arguments:

f
An h5.File instance containing the trajectory data.

Optional arguments:

eq_time
Equilibration time, discarded from the simulation.
n_parts
Array containing the number of parts in which the total simulation is divided.
step
Stepsize used in the trajectory.
fn
Filename containing the convergence plot.
n_bootstrap
The number of bootstrapped trajectories.
mw
If mass_weighted is True, the covariance matrix is mass-weighted.

## 4.6. yaff.analysis.rdf – Radial distribution functions¶

class yaff.analysis.rdf.RDF(rcut, rspacing, f=None, start=0, end=-1, max_sample=None, step=None, select0=None, select1=None, pairs_sr=None, nimage=0, pospath='trajectory/pos', poskey='pos', cellpath=None, cellkey=None, outpath=None)

Computes a radial distribution function (RDF)

Argument:

rcut
The cutoff for the RDF analysis. This should be lower than the spacing between the primitive cell planes, multiplied by (1+2*nimage).
rspacing
The width of the bins to build up the RDF.

Optional arguments:

f
An h5.File instance containing the trajectory data. If f is not given, or it does not contain the dataset referred to with the path argument, an on-line analysis is carried out.
start, end, max_sample, step
arguments to setup the selection of time slices. See get_slice for more information.
select0
A list of atom indexes that are considered for the computation of the rdf. If not given, all atoms are used.
select1
A list of atom indexes that are needed to compute an RDF between two disjoint sets of atoms. (If there is some overlap between select0 and select1, an error will be raised.) If this is None, an ‘internal’ RDF will be computed for the atoms specified in select0.
pairs_sr
An array with short-range pairs of atoms (shape K x 2). When given, an additional RDFs is generated for the short-range pairs (rdf_sr).
nimage
The number of cell images to consider in the computation of the pair distances. By default, this is zero, meaning that only the minimum image convention is used.
pospath
The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis. This is only needed for an off-line analysis
poskey
In case of an on-line analysis, this is the key of the state item that contains the data from which the RDF is derived.
cellpath
The path the time-dependent cell vector data. This is only needed when the cell parameters are variable and the analysis is off-line.
cellkey
The key of the stateitem that contains the cell vectors. This is only needed when the cell parameters are variable and the analysis is done on-line.
outpath
The output path for the frequency computation in the HDF5 file. If not given, it defaults to ‘%s_rdf’ % path. If this path already exists, it will be removed first.

When f is None, or when the path does not exist in the HDF5 file, the class can be used as an on-line analysis hook for the iterative algorithms in yaff.sampling package. This means that the RDF is built up as the itertive algorithm progresses. The end option is ignored and max_sample is not applicable to an on-line analysis.

configure_online(iterative, st_pos, st_cell=None)
configure_offline(ds_pos, ds_cell=None)
init_first()

Setup some work arrays

read_online(st_pos, st_cell=None)
read_offline(i, ds_pos, ds_cell=None)
compute_iteration()
compute_derived()
plot(fn_png='rdf.png')

## 4.7. yaff.analysis.spectrum – Spectral analysis and autocorrelation functions¶

Spectral analysis and autocorrelation functions

class yaff.analysis.spectrum.Spectrum(f=None, start=0, end=-1, step=1, bsize=4096, select=None, path='trajectory/vel', key='vel', outpath=None, weights=None)

Optional arguments:

f
An h5.File instance containing the trajectory data. If f is not given, or it does not contain the dataset referred to with the path argument, an on-line analysis is carried out.
start
The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
end
The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
step
The spacing between the samples used for the analysis
bsize
The size of the blocks used for individual FFT calls.
select
A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
path
The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis. The spectra are summed over the other axes.
key
In case of an on-line analysis, this is the key of the state item that contains the data from which the spectrum is derived.
outpath
The output path for the frequency computation in the HDF5 file. If not given, it defaults to ‘%s_spectrum’ % path. If this path already exists, it will be removed first.
weights
If not given, the spectrum is just a simple sum of contributions from different time-dependent functions. If given, a linear combination is made based on these weights.

The max_sample argument from get_slice is not used because the choice step value is an important parameter: it is best to choose step*bsize such that it coincides with a part of the trajectory in which the velocities (or other data) are continuous.

The block size should be set such that it corresponds to a decent resolution on the frequency axis, i.e. 33356 fs of MD data corresponds to a resolution of about 1 cm^-1. The step size should be set such that the highest frequency is above the highest relevant frequency in the spectrum, e.g. a step of 10 fs corresponds to a frequency maximum of 3336 cm^-1. The total number of FFT’s, i.e. length of the simulation divided by the block size multiplied by the number of time-dependent functions in the data, determines the noise reduction on the (the amplitude of) spectrum. If there is sufficient data to perform 10K FFT’s, one should get a reasonably smooth spectrum.

Depending on the FFT implementation in numpy, it may be interesting to tune the bsize argument. A power of 2 is typically a good choice.

When f is None, or when the path does not exist in the HDF5 file, the class can be used as an on-line analysis hook for the iterative algorithms in yaff.sampling package. This means that the spectrum is built up as the iterative algorithm progresses. The end option is ignored for an on-line analysis.

init_online()
init_timestep()
offline_loop(ds_signal)
configure_online(iterative, st_signal)
configure_offline(ds_signal)
init_first()
read_online(st_signal)
compute_iteration()
compute_derived()
plot(fn_png='spectrum.png', do_wavenum=True, xlim=None, verticals=None, thermostat=None, ndof=None)

verticals: array containing as first entry the timeconstant of the original system, and as following entries the wavenumbers of the original system.

plot_ac(fn_png='ac.png')

## 4.8. yaff.analysis.utils – Auxiliary analysis routines¶

Auxiliary analysis routines

yaff.analysis.utils.get_slice(f, start=0, end=-1, max_sample=None, step=None)

Argument:

f
A HDF5.File instance, may be None if it is not available. If it contains a trajectory group, this group will be used to determine the number of time steps in the trajectory.

Optional arguments:

start
The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
end
The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
max_sample
When given, step is set such that the number of samples does not exceed max_sample.
step
The spacing between the samples used for the analysis

The optional arguments can be given to all of the analysis routines. Just make sure you never specify step and max_sample at the same time. The max_sample argument assures that the step is set such that the number of samples does (just) not exceed max_sample. The max_sample option only works when f is not None, or when end is positive.

if f is present or start and end are positive, and max_sample and step or not given, max_sample defaults to 1000.

Returns start, end and step. When f is given, start and end are always positive.