4. yaff.analysis
– Trajectory analysis¶
4.1. yaff.analysis.basic
– Basic trajectory analysis routines¶
Basic trajectory analysis routines
-
yaff.analysis.basic.
plot_energies
(f, fn_png='energies.png', **kwargs)¶ Make a plot of the potential, total and conserved energy as f. of time
Arguments:
f An h5.File instance containing the trajectory data.
Optional arguments:
fn_png The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.
-
yaff.analysis.basic.
plot_temperature
(f, fn_png='temperature.png', **kwargs)¶ Make a plot of the temperature as function of time
Arguments:
- f
- An h5.File instance containing the trajectory data.
Optional arguments:
- fn_png
- The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.
-
yaff.analysis.basic.
plot_pressure
(f, fn_png='pressure.png', window=1, **kwargs)¶ Make a plot of the pressure as function of time
Arguments:
- f
- An h5.File instance containing the trajectory data.
Optional arguments:
- fn_png
- The png file to write the figure to
- window
- The window over which the pressure is averaged
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.
-
yaff.analysis.basic.
plot_temp_dist
(f, fn_png='temp_dist.png', temp=None, ndof=None, select=None, **kwargs)¶ Plots the distribution of the weighted atomic velocities
Arguments:
- f
- An h5.File instance containing the trajectory data.
Optional arguments:
- fn_png
- The png file to write the figure to
- select
- A list of atom indexes that should be considered for the analysis. By default, information from all atoms is combined.
- temp
- The (expected) average temperature used to plot the theoretical distributions.
- ndof
- The number of degrees of freedom. If not specified, this is chosen to be 3*(number of atoms)
- start, end, step, max_sample
- The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.
This type of plot is essential for checking the sanity of a simulation. The empirical cumulative distribution is plotted and overlayed with the analytical cumulative distribution one would expect if the data were taken from an NVT ensemble.
This type of plot reveals issues with parts that are relatively cold or warm compared to the total average temperature. This helps to determine (the lack of) thermal equilibrium.
-
yaff.analysis.basic.
plot_press_dist
(f, temp, fn_png='press_dist.png', press=None, ndof=None, select=None, **kwargs)¶ Plots the distribution of the internal pressure
Arguments:
- f
- An h5.File instance containing the trajectory data.
- temp
- The (expected) average temperature used to plot the theoretical distributions.
Optional arguments:
- fn_png
- The png file to write the figure to
- select
- A list of atom indexes that should be considered for the analysis. By default, information from all atoms is combined.
- press
- The (expected) average pressure used to plot the theoretical distributions.
- ndof
- The number of degrees of freedom. If not specified, this is chosen to be 3*(number of atoms)
- start, end, step, max_sample
- The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.
This type of plot is essential for checking the sanity of a simulation. The empirical cumulative distribution is plotted and overlayed with the analytical cumulative distribution one would expect if the data were taken from an NPT ensemble.
-
yaff.analysis.basic.
plot_volume_dist
(f, fn_png='volume_dist.png', temp=None, press=None, **kwargs)¶ Plots the distribution of the volume
Arguments:
- f
- An h5.File instance containing the trajectory data.
Optional arguments:
- fn_png
- The png file to write the figure to
- temp
- The (expected) average temperature used to plot the theoretical distributions.
- press
- The (expected) average pressure used to plot the theoretical distributions.
- start, end, step, max_sample
- The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.
This type of plot is essential for checking the sanity of a simulation. The empirical cumulative distribution is plotted and overlayed with the analytical cumulative distribution one would expect if the data were taken from an NPT ensemble.
-
yaff.analysis.basic.
plot_density
(f, fn_png='density.png', **kwargs)¶ Make a plot of the mass density as function of time
Arguments:
- f
- An h5.File instance containing the trajectory data.
Optional arguments:
- fn_png
- The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger.
-
yaff.analysis.basic.
plot_cell_pars
(f, fn_png='cell_pars.png', **kwargs)¶ Make a plot of the cell parameters as function of time
Arguments:
- f
- An h5.File instance containing the trajectory data.
Optional arguments:
- fn_png
- The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger.
-
yaff.analysis.basic.
plot_epot_contribs
(f, fn_png='epot_contribs.png', size=1.0, **kwargs)¶ Make a plot of the contributions to the potential energy as f. of time
Arguments:
- f
- An h5.File instance containing the trajectory data.
Optional arguments:
- fn_png
- The png file to write the figure to
The optional arguments of the
get_slice
function are also accepted in the form of keyword arguments.The units for making the plot are taken from the yaff screen logger. This type of plot is essential for checking the sanity of a simulation.
-
yaff.analysis.basic.
plot_angle
(f, index, fn_png='angle.png', n_int=1, xlim=None, ymax=None, angle_lim=None, angle_shift=False, oriented=False, **kwargs)¶ Make a plot of the angle between the given atoms as f. of time
Arguments:
- f
- An h5.File instance containing the trajectory data
- index
- A list containing the three indices of the atoms as in the h5 file
Optional arguments:
- fn_png
- The png file to write the figure to
- n_int
- The number of equidistant intervals considered in the FFT and histogram
- xlim
- Frequency interval of interest
- ymax
- Maximal intensity of interest in the FFT
- angle_lim
- Angle interval of interest in the histogram
- angle_shift
- If True, all angles are shifted towards positive values
- oriented
- If True, a distinction is made between positive and negative angles
-
yaff.analysis.basic.
plot_dihedral
(f, index, fn_png='dihedral.png', n_int=1, xlim=None, ymax=None, angle_lim=None, angle_shift=False, oriented=False, **kwargs)¶ Make a plot of the angle between the given atoms as f. of time
Arguments:
- f
- An h5.File instance containing the trajectory data
- index
- A list containing the four indices of the atoms as in the h5 file
Optional arguments:
- fn_png
- The png file to write the figure to
- n_int
- The number of equidistant intervals considered in the FFT and histogram
- xlim
- Frequency interval of interest
- ymax
- Maximal intensity of interest in the FFT
- angle_lim
- Angle interval of interest in the histogram
- angle_shift
- If True, all angles are shifted towards positive values
- oriented
- If True, a distinction is made between positive and negative angles
4.2. yaff.analysis.blav
– The block-average method¶
The block-average method
-
yaff.analysis.blav.
blav
(signal, minblock=100, fn_png=None, unit=None)¶ Analyze the signal with the block average method.
The variance on the block average error as function of block size is fitted using the
a+b/bsize
model, wherea
is a measure for the error on the average, i.e. when the block size becomes infintely large. If the fit fails, a coarse estimate of the error on the average is returned, i.e. the largest block average error.Arguments:
- signal
- An array containing time-dependent data.
Optional arguments:
- minblock
- The minimum number of blocks to be considered.
- fn_png
- When given, the data used for the fit and the fitted model are plotted.
- unit
- This is only relevant when a plot is made. It is used as the unit
for the y-axis, e.g.
log.length
.
Returns:
- error
- The fitted error.
- sinef
- The fitted statistical inefficiency.
-
yaff.analysis.blav.
inefficiency
(signal, time=None, fn_png='stat_ineff.png', taus=None, eq_limits=None)¶ Analyze the signal to determine the statistical inefficiency. The statistical inefficiency of a signal is defined as the limiting ratio of the observed variance of its long-term averages to their expected variance. It can hence be regarded as the factor (>1) with which the sample size should be multiplied in order to compensate for correlation. This is derived in:
Friedberg, R; Cameron, J.E. J. Chem. Phys. 1970, 52, 6049-6058.Arguments:
- signal
- An array containing time-dependent data.
Optional arguments:
- time
- An array containing the time information of the provided signal.
- fn_png
- Name of the plot to which the data is written.
- taus
- An array, where for each element tau the block averages with this block size are determined.
- eq_limits
- An array containing the fractions of the signal to be considered as equilibration.
Returns:
4.3. yaff.analysis.diffusion
– Diffusion constants¶
Diffusion constants
-
class
yaff.analysis.diffusion.
Diffusion
(f=None, start=0, end=-1, step=1, mult=20, select=None, bsize=None, pospath='trajectory/pos', poskey='pos', outpath=None)¶ Bases:
yaff.analysis.hook.AnalysisHook
Computes mean-squared displacements and diffusion constants
Optional arguments:
- f
- An h5.File instance containing the trajectory data. If
f
is not given, or it does not contain the dataset referred to with thepath
argument, an on-line analysis is carried out. - start, end, step
- Optional arguments for the
get_slice
function. max_sample is not supported because the choice of the step argument is critical for a useful result. - mult
- In the first place, the mean square displacement (MSD) between
subsequent step is computed. The MSD is also computed between
every, two, three, …, until
mult
steps. - select
- A list of atom indexes that are considered for the computation of the MSD’s. If not given, all atoms are used.
- bsize
- If given, time intervals that coincide with the boundaries of the block size, will not be considered form the analysis. This is useful when there is a significant monte carlo move between subsequent blocks. If step > 1, the intervals will be left out if the overlap with boundaries of blocks with size bsize*step.
- pospath
- The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
- poskey
- In case of an on-line analysis, this is the key of the state item that contains the data from which the MSD’s are derived.
- outpath
- The output path for the MSD results. If not given, it defaults to ‘%s_diff’ % path. If this path already exists, it will be removed first.
-
init_first
()¶
-
configure_online
(iterative, st_pos)¶
-
configure_offline
(ds_pos)¶
-
init_timestep
()¶
-
read_online
(st_pos)¶
-
read_offline
(i, ds_pos)¶
-
compute_iteration
()¶
-
compute_derived
()¶
-
overlap_bsize
(m)¶
-
update_msd
(msd, m)¶
-
plot
(fn_png='msds.png')¶
4.4. yaff.analysis.hook
– Abstract hook implementation for trajectory analysis¶
Abstract hook implementation for trajectory analysis
-
class
yaff.analysis.hook.
AnalysisInput
(path, key, required=True)¶ Bases:
object
Describes the location of the time-dependent input for an analysis.
Arguments:
- path
- The path in the HDF5 file were the input can be found. This is only relevant for an off-line analysis. This must be a dataset in the trajectory folder.
- key
- The key of the state item that contains the input. This is only relevant for an on-line analysis.
Optional arguments:
- required
- When true, this input is mandatory. When false, it is optional.
-
class
yaff.analysis.hook.
AnalysisHook
(f=None, start=0, end=-1, max_sample=None, step=None, analysis_inputs={}, outpath='trajectory/noname', do_timestep=False)¶ Bases:
yaff.sampling.iterative.Hook
Base class for the analysis hooks.
Analysis hooks in Yaff support both off-line and on-line analysis.
Optional arguments:
- f
- An h5.File instance containing the trajectory data. If
f
is not given, or it does not contain the dataset referred to with thepath
argument, an on-line analysis is carried out. - start, end, max_sample, step
- Optional arguments for the
get_slice
function. - analysis_inputs
- A list with AnalysisInput instances
- outpath
- The output path for the analysis.
- do_timestep
- When True, a self.timestep attribute will be initialized as early as possible.
-
__call__
(iterative)¶
-
compute_offline
()¶
-
init_first
()¶
-
init_online
()¶
-
offline_loop
(**datasets)¶
-
configure_online
(**state_items)¶
-
configure_offline
(**kwargs)¶
-
init_timestep
()¶
-
read_online
(**state_items)¶
-
read_offline
(i, **kwargs)¶
-
compute_iteration
()¶
-
compute_derived
()¶
4.5. yaff.analysis.pca
– Toolkit for Principal Component Analysis (PCA)¶
Toolkit for Principal Component Analysis (PCA)
-
yaff.analysis.pca.
calc_cov_mat_internal
(q, q_ref=None)¶ Calculates the covariance matrix of a time-dependent matrix q, containing the independent components in its columns. The rows are treated as different times at which the components are evaluated.
Arguments:
- q
- The txN matrix for which the NxN covariance matrix is determined
Optional arguments:
- q_ref
- Reference vector of size 1xN. If not provided, the ensemble average is taken.
-
yaff.analysis.pca.
calc_cov_mat
(f, q_ref=None, start=0, end=None, step=1, select=None, path='trajectory/pos', mw=True)¶ Calculates the covariance matrix of the given trajectory.
Arguments:
- f
- An h5.File instance containing the trajectory data.
Optional arguments:
- q_ref
- Reference vector of the positions. If not provided, the ensemble average is taken.
- start
- The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
- end
- The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
- step
- The spacing between the samples used for the analysis
- select
- A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
- path
- The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
- mw
- If mw is True, the covariance matrix is mass-weighted.
-
yaff.analysis.pca.
calc_pca
(f_target, cov_mat=None, f=None, q_ref=None, start=0, end=None, step=1, select=None, path='trajectory/pos', mw=True, temp=None)¶ Performs a principle component analysis of the given trajectory.
Arguments:
- f_target
- Path to an h5.File instance to which the results are written.
Optional arguments:
- cov_mat
- The covariance matrix, if already calculated. If not provided, the covariance matrix will be calculatd based on the file f.
- f
- An h5.File instance containing the trajectory data.
- q_ref
- Reference vector of the positions. If not provided, the ensemble average is taken.
- start
- The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
- end
- The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
- step
- The spacing between the samples used for the analysis
- select
- A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
- path
- The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
- mw
- If mass_weighted is True, the covariance matrix is mass-weighted.
- temp
- Temperature at which the simulation is carried out, necessary to determine the frequencies
-
yaff.analysis.pca.
pca_projection
(f_target, f, pm, start=0, end=None, step=1, select=None, path='trajectory/pos', mw=True)¶ Determines the principal components of an MD simulation
Arguments:
- f_target
- Path to an h5.File instance to which the results are written.
- f
- An h5.File instance containing the trajectory data.
- pm
- An array containing the principal modes in its columns
Optional arguments:
- start
- The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
- end
- The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
- step
- The spacing between the samples used for the analysis
- select
- A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
- path
- The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis.
- mw
- If mass_weighted is True, the covariance matrix is mass-weighted.
-
yaff.analysis.pca.
write_principal_mode
(f, f_pca, index, n_frames=100, select=None, mw=True, scaling=1.0)¶ Writes out one xyz file per principal mode given in index
Arguments:
- f
- Path to an h5.File instance containing the original data.
- f_pca
- Path to an h5.File instance containing the PCA, with reference structure, eigenvalues and principal modes.
- index
- An array containing the principal modes which need to be written out.
Optional arguments:
- n_frames
- The number of frames in each xyz file.
- select
- A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
- mw
- If mass_weighted is True, the covariance matrix is assumed to be mass-weighted.
- scaling
- Scaling factor applied to the maximum deviation of the principal mode (i.e. the maximum principal component for that mode)
-
yaff.analysis.pca.
pca_similarity
(covar_a, covar_b)¶ Calculates the similarity between the two covariance matrices
Arguments:
- covar_a
- The first covariance matrix.
- covar_b
- The second covariance matrix.
-
yaff.analysis.pca.
pca_convergence
(f, eq_time=0.0, n_parts=None, step=1, fn='PCA_convergence', n_bootstrap=50, mw=True)¶ Calculates the convergence of the simulation by calculating the pca similarity for different subsets of the simulation.
Arguments:
- f
- An h5.File instance containing the trajectory data.
Optional arguments:
- eq_time
- Equilibration time, discarded from the simulation.
- n_parts
- Array containing the number of parts in which the total simulation is divided.
- step
- Stepsize used in the trajectory.
- fn
- Filename containing the convergence plot.
- n_bootstrap
- The number of bootstrapped trajectories.
- mw
- If mass_weighted is True, the covariance matrix is mass-weighted.
4.6. yaff.analysis.rdf
– Radial distribution functions¶
Radial distribution functions
-
class
yaff.analysis.rdf.
RDF
(rcut, rspacing, f=None, start=0, end=-1, max_sample=None, step=None, select0=None, select1=None, pairs_sr=None, nimage=0, pospath='trajectory/pos', poskey='pos', cellpath=None, cellkey=None, outpath=None)¶ Bases:
yaff.analysis.hook.AnalysisHook
Computes a radial distribution function (RDF)
Argument:
- rcut
- The cutoff for the RDF analysis. This should be lower than the spacing between the primitive cell planes, multiplied by (1+2*nimage).
- rspacing
- The width of the bins to build up the RDF.
Optional arguments:
- f
- An h5.File instance containing the trajectory data. If
f
is not given, or it does not contain the dataset referred to with thepath
argument, an on-line analysis is carried out. - start, end, max_sample, step
- arguments to setup the selection of time slices. See
get_slice
for more information. - select0
- A list of atom indexes that are considered for the computation of the rdf. If not given, all atoms are used.
- select1
- A list of atom indexes that are needed to compute an RDF between two disjoint sets of atoms. (If there is some overlap between select0 and select1, an error will be raised.) If this is None, an ‘internal’ RDF will be computed for the atoms specified in select0.
- pairs_sr
- An array with short-range pairs of atoms (shape K x 2). When given, an additional RDFs is generated for the short-range pairs (rdf_sr).
- nimage
- The number of cell images to consider in the computation of the pair distances. By default, this is zero, meaning that only the minimum image convention is used.
- pospath
- The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis. This is only needed for an off-line analysis
- poskey
- In case of an on-line analysis, this is the key of the state item that contains the data from which the RDF is derived.
- cellpath
- The path the time-dependent cell vector data. This is only needed when the cell parameters are variable and the analysis is off-line.
- cellkey
- The key of the stateitem that contains the cell vectors. This is only needed when the cell parameters are variable and the analysis is done on-line.
- outpath
- The output path for the frequency computation in the HDF5 file. If not given, it defaults to ‘%s_rdf’ % path. If this path already exists, it will be removed first.
When f is None, or when the path does not exist in the HDF5 file, the class can be used as an on-line analysis hook for the iterative algorithms in yaff.sampling package. This means that the RDF is built up as the itertive algorithm progresses. The end option is ignored and max_sample is not applicable to an on-line analysis.
-
configure_online
(iterative, st_pos, st_cell=None)¶
-
configure_offline
(ds_pos, ds_cell=None)¶
-
init_first
()¶ Setup some work arrays
-
read_online
(st_pos, st_cell=None)¶
-
read_offline
(i, ds_pos, ds_cell=None)¶
-
compute_iteration
()¶
-
compute_derived
()¶
-
plot
(fn_png='rdf.png')¶
4.7. yaff.analysis.spectrum
– Spectral analysis and autocorrelation functions¶
Spectral analysis and autocorrelation functions
-
class
yaff.analysis.spectrum.
Spectrum
(f=None, start=0, end=-1, step=1, bsize=4096, select=None, path='trajectory/vel', key='vel', outpath=None, weights=None)¶ Bases:
yaff.analysis.hook.AnalysisHook
Optional arguments:
- f
- An h5.File instance containing the trajectory data. If
f
is not given, or it does not contain the dataset referred to with thepath
argument, an on-line analysis is carried out. - start
- The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
- end
- The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
- step
- The spacing between the samples used for the analysis
- bsize
- The size of the blocks used for individual FFT calls.
- select
- A list of atom indexes that are considered for the computation of the spectrum. If not given, all atoms are used.
- path
- The path of the dataset that contains the time dependent data in the HDF5 file. The first axis of the array must be the time axis. The spectra are summed over the other axes.
- key
- In case of an on-line analysis, this is the key of the state item that contains the data from which the spectrum is derived.
- outpath
- The output path for the frequency computation in the HDF5 file. If not given, it defaults to ‘%s_spectrum’ % path. If this path already exists, it will be removed first.
- weights
- If not given, the spectrum is just a simple sum of contributions from different time-dependent functions. If given, a linear combination is made based on these weights.
The max_sample argument from get_slice is not used because the choice step value is an important parameter: it is best to choose step*bsize such that it coincides with a part of the trajectory in which the velocities (or other data) are continuous.
The block size should be set such that it corresponds to a decent resolution on the frequency axis, i.e. 33356 fs of MD data corresponds to a resolution of about 1 cm^-1. The step size should be set such that the highest frequency is above the highest relevant frequency in the spectrum, e.g. a step of 10 fs corresponds to a frequency maximum of 3336 cm^-1. The total number of FFT’s, i.e. length of the simulation divided by the block size multiplied by the number of time-dependent functions in the data, determines the noise reduction on the (the amplitude of) spectrum. If there is sufficient data to perform 10K FFT’s, one should get a reasonably smooth spectrum.
Depending on the FFT implementation in numpy, it may be interesting to tune the bsize argument. A power of 2 is typically a good choice.
When f is None, or when the path does not exist in the HDF5 file, the class can be used as an on-line analysis hook for the iterative algorithms in yaff.sampling package. This means that the spectrum is built up as the iterative algorithm progresses. The end option is ignored for an on-line analysis.
-
init_online
()¶
-
init_timestep
()¶
-
offline_loop
(ds_signal)¶
-
configure_online
(iterative, st_signal)¶
-
configure_offline
(ds_signal)¶
-
init_first
()¶
-
read_online
(st_signal)¶
-
compute_iteration
()¶
-
compute_derived
()¶
-
plot
(fn_png='spectrum.png', do_wavenum=True, xlim=None, verticals=None, thermostat=None, ndof=None)¶ verticals: array containing as first entry the timeconstant of the original system, and as following entries the wavenumbers of the original system.
-
plot_ac
(fn_png='ac.png')¶
4.8. yaff.analysis.utils
– Auxiliary analysis routines¶
Auxiliary analysis routines
-
yaff.analysis.utils.
get_slice
(f, start=0, end=-1, max_sample=None, step=None)¶ Argument:
- f
- A HDF5.File instance, may be None if it is not available. If it contains a trajectory group, this group will be used to determine the number of time steps in the trajectory.
Optional arguments:
- start
- The first sample to be considered for analysis. This may be negative to indicate that the analysis should start from the -start last samples.
- end
- The last sample to be considered for analysis. This may be negative to indicate that the last -end sample should not be considered.
- max_sample
- When given, step is set such that the number of samples does not exceed max_sample.
- step
- The spacing between the samples used for the analysis
The optional arguments can be given to all of the analysis routines. Just make sure you never specify step and max_sample at the same time. The max_sample argument assures that the step is set such that the number of samples does (just) not exceed max_sample. The max_sample option only works when f is not None, or when end is positive.
if f is present or start and end are positive, and max_sample and step or not given, max_sample defaults to 1000.
Returns start, end and step. When f is given, start and end are always positive.