4. yaff.analysis
– Trajectory analysis¶
4.1. yaff.analysis.basic
– Basic trajectory analysis routines¶
Basic trajectory analysis routines

yaff.analysis.basic.
plot_energies
(f, fn_png='energies.png', **kwargs)¶ Make a plot of the potential, total and conserved energy as f. of time
Arguments:
f An h5.File instance containing the trajectory data.
Optional arguments:
fn_png The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.

yaff.analysis.basic.
plot_temperature
(f, fn_png='temperature.png', **kwargs)¶ Make a plot of the temperature as function of time
Arguments:
 f
 An h5.File instance containing the trajectory data.
Optional arguments:
 fn_png
 The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.

yaff.analysis.basic.
plot_pressure
(f, fn_png='pressure.png', window=1, **kwargs)¶ Make a plot of the pressure as function of time
Arguments:
 f
 An h5.File instance containing the trajectory data.
Optional arguments:
 fn_png
 The png file to write the figure to
 window
 The window over which the pressure is averaged
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.

yaff.analysis.basic.
plot_temp_dist
(f, fn_png='temp_dist.png', temp=None, ndof=None, select=None, **kwargs)¶ Plots the distribution of the weighted atomic velocities
Arguments:
 f
 An h5.File instance containing the trajectory data.
Optional arguments:
 fn_png
 The png file to write the figure to
 select
 A list of atom indexes that should be considered for the analysis. By default, information from all atoms is combined.
 temp
 The (expected) average temperature used to plot the theoretical distributions.
 ndof
 The number of degrees of freedom. If not specified, this is chosen to be 3*(number of atoms)
 start, end, step, max_sample
 The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.
This type of plot is essential for checking the sanity of a simulation. The empirical cumulative distribution is plotted and overlayed with the analytical cumulative distribution one would expect if the data were taken from an NVT ensemble.
This type of plot reveals issues with parts that are relatively cold or warm compared to the total average temperature. This helps to determine (the lack of) thermal equilibrium.

yaff.analysis.basic.
plot_press_dist
(f, temp, fn_png='press_dist.png', press=None, ndof=None, select=None, **kwargs)¶ Plots the distribution of the internal pressure
Arguments:
 f
 An h5.File instance containing the trajectory data.
 temp
 The (expected) average temperature used to plot the theoretical distributions.
Optional arguments:
 fn_png
 The png file to write the figure to
 select
 A list of atom indexes that should be considered for the analysis. By default, information from all atoms is combined.
 press
 The (expected) average pressure used to plot the theoretical distributions.
 ndof
 The number of degrees of freedom. If not specified, this is chosen to be 3*(number of atoms)
 start, end, step, max_sample
 The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.
This type of plot is essential for checking the sanity of a simulation. The empirical cumulative distribution is plotted and overlayed with the analytical cumulative distribution one would expect if the data were taken from an NPT ensemble.

yaff.analysis.basic.
plot_volume_dist
(f, fn_png='volume_dist.png', temp=None, press=None, **kwargs)¶ Plots the distribution of the volume
Arguments:
 f
 An h5.File instance containing the trajectory data.
Optional arguments:
 fn_png
 The png file to write the figure to
 temp
 The (expected) average temperature used to plot the theoretical distributions.
 press
 The (expected) average pressure used to plot the theoretical distributions.
 start, end, step, max_sample
 The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.
This type of plot is essential for checking the sanity of a simulation. The empirical cumulative distribution is plotted and overlayed with the analytical cumulative distribution one would expect if the data were taken from an NPT ensemble.

yaff.analysis.basic.
plot_density
(f, fn_png='density.png', **kwargs)¶ Make a plot of the mass density as function of time
Arguments:
 f
 An h5.File instance containing the trajectory data.
Optional arguments:
 fn_png
 The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger.

yaff.analysis.basic.
plot_cell_pars
(f, fn_png='cell_pars.png', **kwargs)¶ Make a plot of the cell parameters as function of time
Arguments:
 f
 An h5.File instance containing the trajectory data.
Optional arguments:
 fn_png
 The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger.

yaff.analysis.basic.
plot_epot_contribs
(f, fn_png='epot_contribs.png', size=1.0, **kwargs)¶ Make a plot of the contributions to the potential energy as f. of time
Arguments:
 f
 An h5.File instance containing the trajectory data.
Optional arguments:
 fn_png
 The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.

yaff.analysis.basic.
plot_angle
(f, index, fn_png='angle.png', n_int=1, xlim=None, ymax=None, angle_lim=None, angle_shift=False, oriented=False, **kwargs)¶ Make a plot of the angle between the given atoms as f. of time
Arguments:
 f
 An h5.File instance containing the trajectory data
 index
 A list containing the three indices of the atoms as in the h5 file
Optional arguments:
 fn_png
 The png file to write the figure to
 n_int
 The number of equidistant intervals considered in the FFT and histogram
 xlim
 Frequency interval of interest
 ymax
 Maximal intensity of interest in the FFT
 angle_lim
 Angle interval of interest in the histogram
 angle_shift
 If True, all angles are shifted towards positive values
 oriented
 If True, a distinction is made between positive and negative angles

yaff.analysis.basic.
plot_dihedral
(f, index, fn_png='dihedral.png', n_int=1, xlim=None, ymax=None, angle_lim=None, angle_shift=False, oriented=False, **kwargs)¶ Make a plot of the angle between the given atoms as f. of time
Arguments:
 f
 An h5.File instance containing the trajectory data
 index
 A list containing the four indices of the atoms as in the h5 file
Optional arguments:
 fn_png
 The png file to write the figure to
 n_int
 The number of equidistant intervals considered in the FFT and histogram
 xlim
 Frequency interval of interest
 ymax
 Maximal intensity of interest in the FFT
 angle_lim
 Angle interval of interest in the histogram
 angle_shift
 If True, all angles are shifted towards positive values
 oriented
 If True, a distinction is made between positive and negative angles
4.2. yaff.analysis.blav
– The blockaverage method¶
The blockaverage method

yaff.analysis.blav.
blav
(signal, minblock=100, fn_png=None, unit=None)¶ Analyze the signal with the block average method.
The variance on the block average error as function of block size is fitted using the
a+b/bsize
model, wherea
is a measure for the error on the average, i.e. when the block size becomes infintely large. If the fit fails, a coarse estimate of the error on the average is returned, i.e. the largest block average error.Arguments:
 signal
 An array containing timedependent data.
Optional arguments:
 minblock
 The minimum number of blocks to be considered.
 fn_png
 When given, the data used for the fit and the fitted model are plotted.
 unit
 This is only relevant when a plot is made. It is used as the unit
for the yaxis, e.g.
log.length
.
Returns:
 error
 The fitted error.
 sinef
 The fitted statistical inefficiency.

yaff.analysis.blav.
inefficiency
(signal, time=None, fn_png='stat_ineff.png', taus=None, eq_limits=None)¶ Analyze the signal to determine the statistical inefficiency. The statistical inefficiency of a signal is defined as the limiting ratio of the observed variance of its longterm averages to their expected variance. It can hence be regarded as the factor (>1) with which the sample size should be multiplied in order to compensate for correlation. This is derived in:
Friedberg, R; Cameron, J.E. J. Chem. Phys. 1970, 52, 60496058.Arguments:
 signal
 An array containing timedependent data.
Optional arguments:
 time
 An array containing the time information of the provided signal.
 fn_png
 Name of the plot to which the data is written.
 taus
 An array, where for each element tau the block averages with this block size are determined.
 eq_limits
 An array containing the fractions of the signal to be considered as equilibration.
Returns:
4.3. yaff.analysis.diffusion
– Diffusion constants¶
Diffusion constants

class
yaff.analysis.diffusion.
Diffusion
(f=None, start=0, end=1, step=1, mult=20, select=None, bsize=None, pospath='trajectory/pos', poskey='pos', outpath=None)¶ Bases:
yaff.analysis.hook.AnalysisHook
Computes meansquared displacements and diffusion constants
Optional arguments:
 f
 An h5.File instance containing the trajectory data. If
f
is not given, or it does not contain the dataset referred to with thepath
argument, an online analysis is carried out.  start, end, step
 Optional arguments for the
get_slice
function. max_sample is not supported because the choice of the step argument is critical for a useful result.  mult
 In the first place, the mean square displacement (MSD) between
subsequent step is computed. The MSD is also computed between
every, two, three, …, until
mult
steps.  select
 A list of atom indexes that are considered for the computation of the MSD’s. If not given, all atoms are used.
 bsize
 If given, time intervals that coincide with the boundaries of the block size, will not be considered form the analysis. This is useful when there is a significant monte carlo move between subsequent blocks. If step > 1, the intervals will be left out if the overlap with boundaries of blocks with size bsize*step.
 pospath
 The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
 poskey
 In case of an online analysis, this is the key of the state item that contains the data from which the MSD’s are derived.
 outpath
 The output path for the MSD results. If not given, it defaults to ‘%s_diff’ % path. If this path already exists, it will be removed first.

init_first
()¶

configure_online
(iterative, st_pos)¶

configure_offline
(ds_pos)¶

init_timestep
()¶

read_online
(st_pos)¶

read_offline
(i, ds_pos)¶

compute_iteration
()¶

compute_derived
()¶

overlap_bsize
(m)¶

update_msd
(msd, m)¶

plot
(fn_png='msds.png')¶
4.4. yaff.analysis.hook
– Abstract hook implementation for trajectory analysis¶
Abstract hook implementation for trajectory analysis

class
yaff.analysis.hook.
AnalysisInput
(path, key, required=True)¶ Bases:
object
Describes the location of the timedependent input for an analysis.
Arguments:
 path
 The path in the HDF5 file were the input can be found. This is only relevant for an offline analysis. This must be a dataset in the trajectory folder.
 key
 The key of the state item that contains the input. This is only relevant for an online analysis.
Optional arguments:
 required
 When true, this input is mandatory. When false, it is optional.

class
yaff.analysis.hook.
AnalysisHook
(f=None, start=0, end=1, max_sample=None, step=None, analysis_inputs={}, outpath='trajectory/noname', do_timestep=False)¶ Bases:
yaff.sampling.iterative.Hook
Base class for the analysis hooks.
Analysis hooks in Yaff support both offline and online analysis.
Optional arguments:
 f
 An h5.File instance containing the trajectory data. If
f
is not given, or it does not contain the dataset referred to with thepath
argument, an online analysis is carried out.  start, end, max_sample, step
 Optional arguments for the
get_slice
function.  analysis_inputs
 A list with AnalysisInput instances
 outpath
 The output path for the analysis.
 do_timestep
 When True, a self.timestep attribute will be initialized as early as possible.

__call__
(iterative)¶

compute_offline
()¶

init_first
()¶

init_online
()¶

offline_loop
(**datasets)¶

configure_online
(**state_items)¶

configure_offline
(**kwargs)¶

init_timestep
()¶

read_online
(**state_items)¶

read_offline
(i, **kwargs)¶

compute_iteration
()¶

compute_derived
()¶
4.5. yaff.analysis.pca
– Toolkit for Principal Component Analysis (PCA)¶
Toolkit for Principal Component Analysis (PCA)

yaff.analysis.pca.
calc_cov_mat_internal
(q, q_ref=None)¶ Calculates the covariance matrix of a timedependent matrix q, containing the independent components in its columns. The rows are treated as different times at which the components are evaluated.
Arguments:
 q
 The txN matrix for which the NxN covariance matrix is determined
Optional arguments:
 q_ref
 Reference vector of size 1xN. If not provided, the ensemble average is taken.

yaff.analysis.pca.
calc_cov_mat
(f, q_ref=None, start=0, end=None, step=1, select=None, path='trajectory/pos', mw=True)¶ Calculates the covariance matrix of the given trajectory.
Arguments:
 f
 An h5.File instance containing the trajectory data.
Optional arguments:
 q_ref
 Reference vector of the positions. If not provided, the ensemble average is taken.
 start
 The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the start last samples.
 end
 The last sample to be considered for analysis. This may be negative to indicate that the last end sample should not be considered.
 step
 The spacing between the samples used for the analysis
 select
 A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
 path
 The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
 mw
 If mw is True, the covariance matrix is massweighted.

yaff.analysis.pca.
calc_pca
(f_target, cov_mat=None, f=None, q_ref=None, start=0, end=None, step=1, select=None, path='trajectory/pos', mw=True, temp=None)¶ Performs a principle component analysis of the given trajectory.
Arguments:
 f_target
 Path to an h5.File instance to which the results are written.
Optional arguments:
 cov_mat
 The covariance matrix, if already calculated. If not provided, the covariance matrix will be calculatd based on the file f.
 f
 An h5.File instance containing the trajectory data.
 q_ref
 Reference vector of the positions. If not provided, the ensemble average is taken.
 start
 The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the start last samples.
 end
 The last sample to be considered for analysis. This may be negative to indicate that the last end sample should not be considered.
 step
 The spacing between the samples used for the analysis
 select
 A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
 path
 The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
 mw
 If mass_weighted is True, the covariance matrix is massweighted.
 temp
 Temperature at which the simulation is carried out, necessary to determine the frequencies

yaff.analysis.pca.
pca_projection
(f_target, f, pm, start=0, end=None, step=1, select=None, path='trajectory/pos', mw=True)¶ Determines the principal components of an MD simulation
Arguments:
 f_target
 Path to an h5.File instance to which the results are written.
 f
 An h5.File instance containing the trajectory data.
 pm
 An array containing the principal modes in its columns
Optional arguments:
 start
 The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the start last samples.
 end
 The last sample to be considered for analysis. This may be negative to indicate that the last end sample should not be considered.
 step
 The spacing between the samples used for the analysis
 select
 A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
 path
 The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
 mw
 If mass_weighted is True, the covariance matrix is massweighted.

yaff.analysis.pca.
write_principal_mode
(f, f_pca, index, n_frames=100, select=None, mw=True, scaling=1.0)¶ Writes out one xyz file per principal mode given in index
Arguments:
 f
 Path to an h5.File instance containing the original data.
 f_pca
 Path to an h5.File instance containing the PCA, with reference structure, eigenvalues and principal modes.
 index
 An array containing the principal modes which need to be written out.
Optional arguments:
 n_frames
 The number of frames in each xyz file.
 select
 A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
 mw
 If mass_weighted is True, the covariance matrix is assumed to be massweighted.
 scaling
 Scaling factor applied to the maximum deviation of the principal mode (i.e. the maximum principal component for that mode)

yaff.analysis.pca.
pca_similarity
(covar_a, covar_b)¶ Calculates the similarity between the two covariance matrices
Arguments:
 covar_a
 The first covariance matrix.
 covar_b
 The second covariance matrix.

yaff.analysis.pca.
pca_convergence
(f, eq_time=0.0, n_parts=None, step=1, fn='PCA_convergence', n_bootstrap=50, mw=True)¶ Calculates the convergence of the simulation by calculating the pca similarity for different subsets of the simulation.
Arguments:
 f
 An h5.File instance containing the trajectory data.
Optional arguments:
 eq_time
 Equilibration time, discarded from the simulation.
 n_parts
 Array containing the number of parts in which the total simulation is divided.
 step
 Stepsize used in the trajectory.
 fn
 Filename containing the convergence plot.
 n_bootstrap
 The number of bootstrapped trajectories.
 mw
 If mass_weighted is True, the covariance matrix is massweighted.
4.6. yaff.analysis.rdf
– Radial distribution functions¶
Radial distribution functions

class
yaff.analysis.rdf.
RDF
(rcut, rspacing, f=None, start=0, end=1, max_sample=None, step=None, select0=None, select1=None, pairs_sr=None, nimage=0, pospath='trajectory/pos', poskey='pos', cellpath=None, cellkey=None, outpath=None)¶ Bases:
yaff.analysis.hook.AnalysisHook
Computes a radial distribution function (RDF)
Argument:
 rcut
 The cutoff for the RDF analysis. This should be lower than the spacing between the primitive cell planes, multiplied by (1+2*nimage).
 rspacing
 The width of the bins to build up the RDF.
Optional arguments:
 f
 An h5.File instance containing the trajectory data. If
f
is not given, or it does not contain the dataset referred to with thepath
argument, an online analysis is carried out.  start, end, max_sample, step
 arguments to setup the selection of time slices. See
get_slice
for more information.  select0
 A list of atom indexes that are considered for the computation of the rdf. If not given, all atoms are used.
 select1
 A list of atom indexes that are needed to compute an RDF between two disjoint sets of atoms. (If there is some overlap between select0 and select1, an error will be raised.) If this is None, an ‘internal’ RDF will be computed for the atoms specified in select0.
 pairs_sr
 An array with shortrange pairs of atoms (shape K x 2). When given, an additional RDFs is generated for the shortrange pairs (rdf_sr).
 nimage
 The number of cell images to consider in the computation of the pair distances. By default, this is zero, meaning that only the minimum image convention is used.
 pospath
 The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis. This is only needed for an offline analysis
 poskey
 In case of an online analysis, this is the key of the state item that contains the data from which the RDF is derived.
 cellpath
 The path the timedependent cell vector data. This is only needed when the cell parameters are variable and the analysis is offline.
 cellkey
 The key of the stateitem that contains the cell vectors. This is only needed when the cell parameters are variable and the analysis is done online.
 outpath
 The output path for the frequency computation in the HDF5 file. If not given, it defaults to ‘%s_rdf’ % path. If this path already exists, it will be removed first.
When f is None, or when the path does not exist in the HDF5 file, the class can be used as an online analysis hook for the iterative algorithms in yaff.sampling package. This means that the RDF is built up as the itertive algorithm progresses. The end option is ignored and max_sample is not applicable to an online analysis.

configure_online
(iterative, st_pos, st_cell=None)¶

configure_offline
(ds_pos, ds_cell=None)¶

init_first
()¶ Setup some work arrays

read_online
(st_pos, st_cell=None)¶

read_offline
(i, ds_pos, ds_cell=None)¶

compute_iteration
()¶

compute_derived
()¶

plot
(fn_png='rdf.png')¶
4.7. yaff.analysis.spectrum
– Spectral analysis and autocorrelation functions¶
Spectral analysis and autocorrelation functions

class
yaff.analysis.spectrum.
Spectrum
(f=None, start=0, end=1, step=1, bsize=4096, select=None, path='trajectory/vel', key='vel', outpath=None, weights=None)¶ Bases:
yaff.analysis.hook.AnalysisHook
Optional arguments:
 f
 An h5.File instance containing the trajectory data. If
f
is not given, or it does not contain the dataset referred to with thepath
argument, an online analysis is carried out.  start
 The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the start last samples.
 end
 The last sample to be considered for analysis. This may be negative to indicate that the last end sample should not be considered.
 step
 The spacing between the samples used for the analysis
 bsize
 The size of the blocks used for individual FFT calls.
 select
 A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
 path
 The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis. The spectra are summed over the other axes.
 key
 In case of an online analysis, this is the key of the state item that contains the data from which the spectrum is derived.
 outpath
 The output path for the frequency computation in the HDF5 file. If not given, it defaults to ‘%s_spectrum’ % path. If this path already exists, it will be removed first.
 weights
 If not given, the spectrum is just a simple sum of contributions from different timedependent functions. If given, a linear combination is made based on these weights.
The max_sample argument from get_slice is not used because the choice step value is an important parameter: it is best to choose step*bsize such that it coincides with a part of the trajectory in which the velocities (or other data) are continuous.
The block size should be set such that it corresponds to a decent resolution on the frequency axis, i.e. 33356 fs of MD data corresponds to a resolution of about 1 cm^1. The step size should be set such that the highest frequency is above the highest relevant frequency in the spectrum, e.g. a step of 10 fs corresponds to a frequency maximum of 3336 cm^1. The total number of FFT’s, i.e. length of the simulation divided by the block size multiplied by the number of timedependent functions in the data, determines the noise reduction on the (the amplitude of) spectrum. If there is sufficient data to perform 10K FFT’s, one should get a reasonably smooth spectrum.
Depending on the FFT implementation in numpy, it may be interesting to tune the bsize argument. A power of 2 is typically a good choice.
When f is None, or when the path does not exist in the HDF5 file, the class can be used as an online analysis hook for the iterative algorithms in yaff.sampling package. This means that the spectrum is built up as the iterative algorithm progresses. The end option is ignored for an online analysis.

init_online
()¶

init_timestep
()¶

offline_loop
(ds_signal)¶

configure_online
(iterative, st_signal)¶

configure_offline
(ds_signal)¶

init_first
()¶

read_online
(st_signal)¶

compute_iteration
()¶

compute_derived
()¶

plot
(fn_png='spectrum.png', do_wavenum=True, xlim=None, verticals=None, thermostat=None, ndof=None)¶ verticals: array containing as first entry the timeconstant of the original system, and as following entries the wavenumbers of the original system.

plot_ac
(fn_png='ac.png')¶
4.8. yaff.analysis.utils
– Auxiliary analysis routines¶
Auxiliary analysis routines

yaff.analysis.utils.
get_slice
(f, start=0, end=1, max_sample=None, step=None)¶ Argument:
 f
 A HDF5.File instance, may be None if it is not available. If it contains a trajectory group, this group will be used to determine the number of time steps in the trajectory.
Optional arguments:
 start
 The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the start last samples.
 end
 The last sample to be considered for analysis. This may be negative to indicate that the last end sample should not be considered.
 max_sample
 When given, step is set such that the number of samples does not exceed max_sample.
 step
 The spacing between the samples used for the analysis
The optional arguments can be given to all of the analysis routines. Just make sure you never specify step and max_sample at the same time. The max_sample argument assures that the step is set such that the number of samples does (just) not exceed max_sample. The max_sample option only works when f is not None, or when end is positive.
if f is present or start and end are positive, and max_sample and step or not given, max_sample defaults to 1000.
Returns start, end and step. When f is given, start and end are always positive.