stacie.utils module

Utilities for preparing inputs.

exception PositiveDefiniteError[source]

Bases: ValueError

Raised when a matrix is not positive definite.

class UnitConfig(*, acint_symbol='\\\\mathcal{I}', acint_unit_str='', acint_unit=1.0, acint_fmt='.2e', freq_unit_str='', freq_unit=1.0, freq_fmt='.2e', time_unit_str='', time_unit=1.0, time_fmt='.2e', clevel=0.95)[source]

Bases: object

Unit configuration for functions that print or plot values.

This class never influences numerical values in STACIE’s computations, such as attributes of a result object or other variables. It only affects printed or plotted values.

The acint_unit, freq_unit, and time_unit attributes should be set as follows:

  • The values of variables in STACIE (and your scripts using STACIE) are always in “internal units”.

  • The *_unit attributes are assumed to have the value of a “display unit” in the same internal units.

For example, if your internal time unit is 1 ps and you want to display times in ns, set time_unit=1000.0, because your display unit (1 ns) is 1000 internal units (1 ps).

To make these conventions easy to follow (and to avoid unit hell in general), it is recommended to pick consistent internal units for your system. For example, use atomic units or SI units throughout your code:

  • As soon as you load data from a file, immediately convert it to internal units.

  • Only just before printing or plotting, convert to display units, which is also how this class is used in STACIE.

For example, when all variables are in SI base units and you want to display time in ns, frequency in THz, and autocorrelation integrals in cm^2/s, then create a UnitConfig as follows:

units = UnitConfig(
    time_unit=1e-9,
    time_unit_str="ns",
    freq_unit=1e12,
    freq_unit_str="THz",
    acint_unit=1e-4,
    acint_unit_str="cm^2/s",
)
Parameters:
acint_fmt: str

The format string for an autocorrelation integral.

acint_symbol: str

The symbol used for the autocorrelation integral.

acint_unit: float

The unit of an autocorrelation integral.

acint_unit_str: str

The text used for the autocorrelation integral unit.

property clb: float

The confidence lower bound used to plot confidence intervals.

clevel: float

The confidence level used to plot confidence intervals.

property cub: float

The confidence upper bound used to plot confidence intervals.

freq_fmt: str

The format string for a frequency.

freq_unit: float

The unit of frequency.

freq_unit_str: str

The text used for the frequency unit.

time_fmt: str

The format string for a time value.

time_unit: float

The unit of time.

time_unit_str: str

The text used for the time unit.

block_average(sequences, size)[source]

Reduce input sequences by taking block averages.

This reduces the maximum frequency of the frequency axis of the spectrum, which may be useful when the time step is much shorter than the exponential autocorrelation time.

A time step \(h = \tau_\text{exp} / (20 \pi)\) (after taking block averages) is recommended, not larger.

Parameters:
  • sequences (ndarray[tuple[Any, ...], dtype[float]]) – Input sequence(s) to be block averaged, with shape (nseq, nstep). A single sequence with shape (nstep, ) is also accepted.

  • size (int) – The block size

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

Returns:

blav_sequences – Sequences of block averages, with shape (nseq, nstep // size)

label_unit(label, unit_str)[source]

Format a label with the unit string as label [unit].

When the unit is "" or None, the unit is omitted.

Parameters:
  • label (str) – The label text.

  • unit_str (str | None) – The unit string.

Return type:

str

mixture_stats(means, covars, weights)[source]

Compute the statistics of the (Gaussian) mixture distribution.

Parameters:
  • means (ndarray[tuple[Any, ...], dtype[float]]) – The means of the mixture components. Weighted averages are taken over the first index. Shape is (ncomp, nfeature) or (ncomp,). If the shape is (ncomp,), the means are interpreted as scalars. If the shape is (ncomp, nfeature), the means are interpreted as vectors.

  • covars (ndarray[tuple[Any, ...], dtype[float]]) – The covariances of the mixture components. If the shape matches that of the means argument, this array is interpreted as a diagonal covariance matrix. If the shape is (ncomp, nfeature, nfeature), this array is interpreted as full covariance matrices.

  • weights (ndarray[tuple[Any, ...], dtype[float]]) – The weights of the mixture components. Shape is (ncomp,). The weights are normalized to sum to 1.

Returns:

  • mean – The mean of the mixture distribution. Shape is (nfeature,).

  • covar – If the input covariance matrix is diagonal, the output covariance matrix is also diagonal and has shape (nfeature,). If the input covariance matrix is full, the output covariance matrix is also full and has shape (nfeature, nfeature).

robust_dot(scales, evals, evecs, other)[source]

Compute the dot product of a robustly diagonalized matrix with another matrix.

  • The first three arguments are the output of robust_posinv.

  • To multiply with the inverse, just use element-wise inversion of scales and evals.

Parameters:
  • scales – The scales used to precondition the matrix.

  • evals – The eigenvalues of the preconditioned matrix.

  • evecs – The eigenvectors of the preconditioned matrix.

  • other – The other matrix to be multiplied. 1D or 2D arrays are accepted.

Returns:

result – The result of the dot product.

robust_posinv(matrix)[source]

Compute the eigenvalues, eigenvectors and inverse of a positive definite symmetric matrix.

This function is a robust version of numpy.linalg.eigh and numpy.linalg.inv that can handle large variations in order of magnitude of the diagonal elements. If the matrix is not positive definite, a ValueError is raised.

Parameters:

matrix (ndarray[tuple[Any, ...], dtype[float]]) – Input matrix to be diagonalized.

Return type:

tuple[ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]], ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]]

Returns:

  • scales – The scales used to precondition the matrix.

  • evals – The eigenvalues of the preconditioned matrix.

  • evecs – The eigenvectors of the preconditioned matrix.

  • inverse – The inverse of the original.

split(sequences, nsplit)[source]

Split input sequences into shorter parts of equal length.

This reduces the resolution of the frequency axis of the spectrum, which may be useful when the sequence length is much longer than the exponential autocorrelation time.

Parameters:
  • sequences (ndarray[tuple[Any, ...], dtype[float]]) – Input sequence(s) to be split, with shape (nseq, nstep). A single sequence with shape (nstep, ) is also accepted.

  • nsplit (int) – The number of splits.

Return type:

ndarray[tuple[Any, ...], dtype[TypeVar(_ScalarT, bound= generic)]]

Returns:

split_sequences – Splitted sequences, with shape (nseq * nsplit, nstep // nsplit).