Ionic Electrical Conductivity of Molten Sodium Chloride at 1100 K (OpenMM)

Warning

This example notebook is work in progress. There are still some issues with the MD results obtained with OpenMM, which are discussed in the notebook.

This notebook shows how to post-process trajectories from OpenMM simulations to calculate the ionic electrical conductivity. The OpenMM trajectories are converted to NPZ files within the Jupyter Notebooks of the simulation, making the approach here easily adaptable to other codes or physical systems. All OpenMM simulation notebooks can be found in the directory docs/data/openmm_salt in STACIE’s source repository. The required theoretical background is explained the Ionic Electrical Conductivity section.

The MD simulations are performed using the Born-Huggins-Mayer-Tosi-Fumi potential, which is a popular choice for molten salts. [TF64] This potential does not use mixing rules and it is not natively implemented in OpenMM, but it can be incorporated using the CustomNonbondedForce and some creativity, see docs/data/openmm_salt/bhmft.py in the Git repository. The molten salt was simulated with a 3D periodic box of 1728 ions (864 Na\(^+\) and 864 Cl\(^-\)). The time step in all simulations was 5 fs.

Following the Recommendations for MD Simulations, an initial block size of 10 steps (50 fs) was used. Because there is little prior knowledge on the structure of the spectrum, the exponential polynomial model (ExpPoly) with degrees \(S=\{0, 1\}\) was used initially, i.e. with \(P=2\) parameters. As explained in the section on block averages, \(400 P\) blocks were collected in the initial production runs, amounting to 8000 steps (40 ps) of simulation time.

In total 100 NVE production runs were performed. For each run, the system was first equilibrated in the NVT and later NPT ensemble. According to the section How to Prepare Sufficient Inputs for STACIE?, 100 runs should be sufficient to obtain a relative error on the ionic conductivity of about 1%:

\[ \epsilon_\text{rel} \approx \frac{1}{\sqrt{20 P M}} \approx 0.0091\]

where \(P\) is the number of parameters in the model and \(M = 100 \times 3\) is the number of independent input sequences. (100 trajectories, 3 Cartesian components of the charge current per trajectory)

Note

The results in this example were obtained using OpenMM 8.2.0. Minor differences may arise when using a different version of OpenMM, or even the same version compiled with a different compiler.

Library Imports and Configuration

import os
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from path import Path
import scipy.constants as sc
from scipy.stats import chi2
from stacie import (
    ExpPolyModel,
    PadeModel,
    UnitConfig,
    compute_spectrum,
    estimate_acint,
    plot_fitted_spectrum,
    plot_extras,
)
from utils import plot_instantaneous_percentiles, plot_cumulative_temperature_histogram
mpl.rc_file("matplotlibrc")
%config InlineBackend.figure_formats = ["svg"]
# You normally do not need to change this path.
# It only needs to be overridden when building the documentation.
DATA_ROOT = Path(os.getenv("DATA_ROOT", "./")) / "openmm_salt/output/"

Analysis of the NpT Equilibration Runs

To validate that the equilibration runs have reached to proper temperature distribution, the following cell implements a plot of the percentiles (over the 100 trajectories) of a thermodynamic quantity (temperature or volume).

def plot_openmm_percentiles(
    ensemble: str,
    field: str,
    unitstr: str,
    unit: float = 1,
    npart: int = 1,
    ntraj: int = 100,
    expected: None = None,
    ymin: float | None = None,
    ymax: float | None = None,
):
    """Plot the temperature of the NpT equilibration runs."""
    time = None
    natom = None
    sequences = []
    time = None
    for itraj in range(ntraj):
        row = []
        if itraj == 0:
            time = []
        for ipart in range(npart):
            path_npz = DATA_ROOT / f"sim{itraj:04d}_part{ipart:02d}_{ensemble}_traj.npz"
            if not path_npz.exists():
                print(f"File {path_npz} not found, skipping.")
                row = None
                break
            data = np.load(path_npz)
            natom = len(data["atnums"])
            if itraj == 0:
                time.append(data["time"])
            row.append(data[field])
        if row is None:
            continue
        if itraj == 0:
            time = np.concatenate(time)
        row = np.concatenate(row)
        sequences.append(row)
    sequences = np.array(sequences)

    percents = np.array([95, 80, 50, 20, 5])
    if field == "temperature":
        temp_d = 1100
        ndof = 3 * natom - 3
        expected = chi2.ppf(percents / 100, ndof) * temp_d / ndof
        ymin = chi2.ppf(0.01, ndof) * temp_d / ndof
        ymax = chi2.ppf(0.99, ndof) * temp_d / ndof
    else:
        expected = None
    time_unit = 1e-12
    num = f"percentiles_{field}_{ensemble}"
    plt.close(num)
    _, ax = plt.subplots(num=num)
    plot_instantaneous_percentiles(
        ax,
        time / time_unit,
        sequences / unit,
        percents,
        None if expected is None else expected / unit,
        ymin,
        ymax,
    )
    ax.set_title(f"{field.title()} percentiles during the {ensemble.upper()} run")
    ax.set_xlabel("Time [ps]")
    ax.set_ylabel(f"{field.title()} [{unitstr}]")

The following cell plots the temperature percentiles for the NVT and NPT equilibration runs.

plot_openmm_percentiles("nvt", "temperature", "K")
plot_openmm_percentiles("npt", "temperature", "K")
../_images/86ae556cef1339624d5ec5ce9b0c57c5f038a57e385fb9b8d6dc65ceb894b42b.svg ../_images/e322f440fc3198fa5eb36bd1858442e8f2cd6fbc926642162582f8a06beced82.svg

The percentiles look good for the equilibration runs: they quickly reach their theoretical values (black dotted lines) and then fluctuate around them. The following cell plots the temperature percentiles for the initial NVE production runs.

plot_openmm_percentiles("nve", "temperature", "K")
../_images/50f3d3bfe25096a7b0c99dabc563c6ebfae0dc57425186bb22fad399ab1a6727.svg

This is clearly not the correct temperature distribution! There is a known problem with restart files in the NVE ensemble in OpenMM. Due to a bug, it tends to lower the temperature of the system. More details on this issue can be found here: https://github.com/openmm/openmm/issues/4948

The following cell plots the percentiles of the volume (over the 100 trajectories), which we can only use to validate that the volume distribution converges. However, we cannot trivially compare these percentiles to an expected distribution.

plot_openmm_percentiles("npt", "volume", "nm$^3$", unit=1e-27, expected="last")
../_images/aea955312e2baef51362f903e20fa685b9211394b75b9854732a40a4182b102b.svg

This plot is not completely satisfactory either, as it suggests that the volume fluctuations of the 100 runs exhibit synchronized fluctuations, while they should be independent. (They use different random seeds for their MC Barostat.)

Reusable Code for the Analysis of the Production Runs

The analyze function takes a few parameters to apply the same analysis with STACIE to different inputs (initial and extended production runs). After the analysis, it generates screen output and figures as discussed in the minimal example.

BOLTZMANN_CONSTANT = sc.value("Boltzmann constant")  # J/K


def analyze(model, npart: int = 1, ntraj: int = 100) -> float:
    """Analyze MD trajectories to compute the ionic conductivity.

    Parameters
    ----------
    model
        The model fitted to the spectrum.
    npart
        The number of parts in the simulation to load.
        The default value of 1 corresponds to only loading the initial production runs.

    Returns
    -------
    acint
        The estimated ionic conductivity, mainly used for regression testing.
    """
    # Get the time step from the first NPZ file.
    time = np.load(DATA_ROOT / "sim0000_part00_nve_traj.npz")["time"]
    timestep = time[1] - time[0]

    def iter_sequences():
        """A generator that only loads one MD trajectory at a time in memory."""
        for itraj in range(ntraj):
            paths_npz = [
                DATA_ROOT / f"sim{itraj:04d}_part{ipart:02d}_nve_traj.npz"
                for ipart in range(npart)
            ]
            if not all(path_npz.exists() for path_npz in paths_npz):
                print(f"Some of {paths_npz} not found, skipping.")
                continue
            dipole = []
            for path_npz in paths_npz:
                data = np.load(path_npz)
                dipole.append(data["dipole"])
            dipole = np.concatenate(dipole, axis=1)
            data = np.load(paths_npz[0])
            prefactor = 1.0 / (
                data["volume"][0] * data["temperature"].mean() * BOLTZMANN_CONSTANT
            )
            # The finite difference is equivalent to a block-averaged charge current.
            current = np.diff(dipole, axis=1) / timestep
            yield prefactor, current

    # Configure units for output
    uc = UnitConfig(
        acint_symbol=r"\sigma",
        acint_unit_str=r"S/m",
        acint_fmt=".1f",
        time_unit=1e-15,
        time_unit_str="fs",
        time_fmt=".3f",
        freq_unit=1e12,
        freq_unit_str="THz",
    )

    # Perform the analysis with STACIE
    spectrum = compute_spectrum(
        iter_sequences(),
        timestep=timestep,
        prefactors=None,
        include_zero_freq=False,
    )
    result = estimate_acint(spectrum, model, verbose=True, uc=uc)

    # Plot some basic analysis figures.
    prefix = "conductivity"
    plt.close(f"{prefix}_spectrum")
    _, ax = plt.subplots(num=f"{prefix}_fitted")
    plot_fitted_spectrum(ax, uc, result)
    plt.close(f"{prefix}_extras")
    _, axs = plt.subplots(2, 2, num=f"{prefix}_extras")
    plot_extras(axs, uc, result)

    # Return the ionic conductivity.
    return result.acint

Analysis of the Initial Production Simulation

The following cell computes the ionic conductivity of the molten salt at 1100 K, from the initial production runs (8000 steps each).

conductivity_1_01 = analyze(ExpPolyModel([0, 1]))
CUTOFF FREQUENCY SCAN cv2l(125%)
     neff   criterion  fcut [THz]
---------  ----------  ----------
     10.0       -30.5    2.56e-01
     10.7       -30.6    2.73e-01
     11.4       -30.7    2.90e-01
     12.2       -30.7    3.09e-01
     13.0       -30.7    3.29e-01
     13.8       -30.7    3.50e-01
     14.8       -30.6    3.73e-01
     15.8       -30.6    3.97e-01
     16.8       -30.5    4.22e-01
     17.9       -30.4    4.50e-01
     19.1       -30.2    4.79e-01
     20.4       -29.9    5.09e-01
     21.7       -29.4    5.42e-01
     23.2       -28.8    5.77e-01
     24.7       -28.2    6.14e-01
     26.3       -27.7    6.54e-01
     28.0       -27.5    6.96e-01
     29.9       -27.5    7.41e-01
     31.8       -27.8    7.89e-01
     33.9       -28.1    8.40e-01
     36.1       -28.3    8.94e-01
     38.5       -28.3    9.52e-01
     41.0       -28.0    1.01e+00
     43.7       -27.5    1.08e+00
     46.5       -26.8    1.15e+00
     49.6       -26.1    1.22e+00
     52.8       -25.3    1.30e+00
     56.2       -24.4    1.38e+00
     59.9       -23.3    1.47e+00
     63.8       -21.9    1.57e+00
     67.9       -19.9    1.67e+00
     72.4       -17.3    1.78e+00
     77.1       -13.6    1.89e+00
     82.1        -8.3    2.01e+00
     87.4        -0.6    2.14e+00
     93.1        10.8    2.28e+00
     99.1        27.5    2.43e+00
    105.5        52.4    2.59e+00
    112.3        89.3    2.75e+00
Cutoff criterion exceeds incumbent + margin: -30.7 + 100.0.

INPUT TIME SERIES
    Time step:                     50.000 fs
    Simulation time:               39950.000 fs
    Maximum degrees of freedom:    600.0

MAIN RESULTS
    Autocorrelation integral:      347.0 ± 10.9 S/m
    Integrated correlation time:   25.116 ± 0.788 fs

SANITY CHECKS (weighted averages over cutoff grid)
    Effective number of points:    15.6 (ideally > 40)
    Regression cost Z-score:       0.1 (ideally < 2)
    Cutoff criterion Z-score:      -0.1 (ideally < 2)

MODEL exppoly(0, 1) | CUTOFF CRITERION cv2l(125%)
    Number of parameters:          2
    Average cutoff frequency:      3.94e-01 THz
/home/toon/univ/molmod/stacie/src/stacie/model.py:323: RuntimeWarning: overflow encountered in exp
  amplitudes_model = np.exp(np.dot(design_matrix, pars))
../_images/07bb3aea7f56c9c0e03c346c17da9ccad3b9fbc181e7ec7604c3f2d56fd6cc75.svg ../_images/70f2c80bb516c7a8b3d16734470c0874e2b752f812d614764ba3f2a6a06e8500.svg

The analysis of the initial production runs shows that the trajectories are not yet sufficient for a reliable interpretation of the autocorrelation integrals:

  • Only 15 effective points are used for the fitting.

  • The relative error is 3.1%, while higher than the coarse estimate of 0.9%, it is of the right order of magnitude.

The extra plots reveal another reason for extending the MD simulations. The cutoff weight is significant at the lowest cutoff frequency, suggesting that a finer grid with lower frequencies could reveal new details. Hence, we extended the production runs by 8000 additional steps to refine the frequency grid, of which the results are discussed in the following subsection.

Analysis of the Extended Production Simulation (8000 + 8000 steps)

We simply call the same analyze() function, but now with npart=2, which loads the initial production runs and their first extension.

conductivity_2_01 = analyze(ExpPolyModel([0, 1]), npart=2)
CUTOFF FREQUENCY SCAN cv2l(125%)
     neff   criterion  fcut [THz]
---------  ----------  ----------
     10.0       -28.0    1.28e-01
     10.7       -27.7    1.36e-01
     11.4       -27.4    1.45e-01
     12.2       -27.3    1.54e-01
     13.0       -27.5    1.64e-01
     13.8       -27.9    1.75e-01
     14.8       -28.5    1.86e-01
     15.8       -29.2    1.98e-01
     16.8       -29.8    2.11e-01
     17.9       -30.3    2.25e-01
     19.1       -30.5    2.39e-01
     20.4       -30.6    2.55e-01
     21.7       -30.6    2.71e-01
     23.2       -30.7    2.88e-01
     24.7       -30.9    3.07e-01
     26.3       -31.2    3.27e-01
     28.0       -31.6    3.48e-01
     29.9       -31.6    3.70e-01
     31.8       -31.2    3.94e-01
     33.9       -30.2    4.20e-01
     36.1       -28.7    4.47e-01
     38.5       -26.9    4.76e-01
     41.0       -25.1    5.06e-01
     43.7       -23.6    5.39e-01
     46.5       -22.7    5.74e-01
     49.6       -22.3    6.11e-01
     52.8       -22.3    6.50e-01
     56.2       -22.4    6.92e-01
     59.9       -22.6    7.36e-01
     63.8       -22.8    7.84e-01
     67.9       -22.9    8.35e-01
     72.4       -23.1    8.88e-01
     77.1       -23.3    9.46e-01
     82.1       -23.5    1.01e+00
     87.4       -23.6    1.07e+00
     93.1       -23.4    1.14e+00
     99.1       -22.9    1.21e+00
    105.5       -22.0    1.29e+00
    112.3       -20.6    1.38e+00
    119.6       -18.8    1.46e+00
    127.4       -16.7    1.56e+00
    135.6       -13.9    1.66e+00
    144.4       -10.0    1.77e+00
    153.7        -4.1    1.88e+00
    163.7         5.1    2.00e+00
    174.3        19.1    2.13e+00
    185.6        40.9    2.27e+00
    197.6        74.1    2.41e+00
Cutoff criterion exceeds incumbent + margin: -31.6 + 100.0.

INPUT TIME SERIES
    Time step:                     50.000 fs
    Simulation time:               79950.000 fs
    Maximum degrees of freedom:    600.0

MAIN RESULTS
    Autocorrelation integral:      354.3 ± 8.0 S/m
    Integrated correlation time:   25.671 ± 0.576 fs

SANITY CHECKS (weighted averages over cutoff grid)
    Effective number of points:    25.7 (ideally > 40)
    Regression cost Z-score:       0.6 (ideally < 2)
    Cutoff criterion Z-score:      -0.4 (ideally < 2)

MODEL exppoly(0, 1) | CUTOFF CRITERION cv2l(125%)
    Number of parameters:          2
    Average cutoff frequency:      3.19e-01 THz
../_images/a488851945a7f108be6285bfb40016b10d7378122acbad4386469ad676e6bd0c.svg ../_images/7c72a571dbf8719d8f59e3945a40c4e4b5b8355079bf366552f8752e23210034.svg

The extended analysis shows that the results are starting to converge. However, the number of fitted points is still only 26, which is relatively low. To get more robust results, we extended the simulations once more. We added 184000 more steps, resulting in a simulation time of 1 ns for each of the 100 trajectories.

Analysis of the Extended Production Simulation (8000 + 8000 + 184000 steps)

We simply call the same analyze() function, but now with npart=3, which loads the initial production runs and their first and second extensions.

conductivity_3_01 = analyze(ExpPolyModel([0, 1]), npart=3)
CUTOFF FREQUENCY SCAN cv2l(125%)
     neff   criterion  fcut [THz]
---------  ----------  ----------
     10.0       -27.0    1.02e-02
     10.7       -27.1    1.09e-02
     11.4       -27.3    1.16e-02
     12.2       -27.4    1.23e-02
     13.0       -27.5    1.31e-02
     13.8       -27.6    1.40e-02
     14.8       -27.8    1.49e-02
     15.8       -27.9    1.58e-02
     16.8       -28.0    1.69e-02
     17.9       -28.0    1.80e-02
     19.1       -28.0    1.91e-02
     20.4       -28.1    2.04e-02
     21.7       -28.2    2.17e-02
     23.2       -28.4    2.31e-02
     24.7       -28.5    2.45e-02
     26.3       -28.6    2.61e-02
     28.0       -28.7    2.78e-02
     29.9       -28.8    2.96e-02
     31.8       -28.9    3.15e-02
     33.9       -29.0    3.36e-02
     36.1       -29.2    3.57e-02
     38.5       -29.3    3.80e-02
     41.0       -29.5    4.05e-02
     43.7       -29.6    4.31e-02
     46.5       -29.6    4.59e-02
     49.6       -29.7    4.88e-02
     52.8       -29.7    5.20e-02
     56.2       -29.8    5.53e-02
     59.9       -29.9    5.89e-02
     63.8       -30.1    6.27e-02
     67.9       -30.3    6.67e-02
     72.4       -30.4    7.10e-02
     77.1       -30.5    7.56e-02
     82.1       -30.7    8.05e-02
     87.4       -30.8    8.57e-02
     93.1       -30.9    9.12e-02
     99.1       -31.1    9.71e-02
    105.5       -31.2    1.03e-01
    112.3       -31.4    1.10e-01
    119.6       -31.5    1.17e-01
    127.4       -31.6    1.25e-01
    135.6       -31.7    1.33e-01
    144.4       -31.9    1.41e-01
    153.7       -32.0    1.50e-01
    163.7       -32.2    1.60e-01
    174.3       -32.4    1.70e-01
    185.6       -32.5    1.81e-01
    197.6       -32.6    1.93e-01
    210.3       -32.7    2.06e-01
    223.9       -32.6    2.19e-01
    238.4       -32.2    2.33e-01
    253.8       -31.6    2.48e-01
    270.2       -30.7    2.64e-01
    287.7       -29.4    2.81e-01
    306.3       -27.9    2.99e-01
    326.0       -26.4    3.18e-01
    347.1       -25.0    3.39e-01
    369.5       -23.7    3.61e-01
    393.4       -22.5    3.84e-01
    418.8       -21.3    4.09e-01
    445.8       -20.0    4.35e-01
    474.6       -18.5    4.63e-01
    505.3       -16.8    4.93e-01
    537.9       -15.1    5.25e-01
    572.6       -13.5    5.59e-01
    609.6       -12.2    5.95e-01
    648.9       -11.3    6.33e-01
    690.8       -11.0    6.74e-01
    735.4       -11.0    7.17e-01
    782.8       -11.2    7.64e-01
    833.3       -11.5    8.13e-01
    887.1       -11.7    8.65e-01
    944.4       -11.6    9.21e-01
   1005.3       -11.0    9.81e-01
Reached the maximum number of effective points (1000).

INPUT TIME SERIES
    Time step:                     50.000 fs
    Simulation time:               999950.000 fs
    Maximum degrees of freedom:    600.0

MAIN RESULTS
    Autocorrelation integral:      352.9 ± 3.7 S/m
    Integrated correlation time:   25.451 ± 0.264 fs

SANITY CHECKS (weighted averages over cutoff grid)
    Effective number of points:    165.4 (ideally > 40)
    Regression cost Z-score:       0.1 (ideally < 2)
    Cutoff criterion Z-score:      -0.3 (ideally < 2)

MODEL exppoly(0, 1) | CUTOFF CRITERION cv2l(125%)
    Number of parameters:          2
    Average cutoff frequency:      1.62e-01 THz
../_images/d969b4cb345abca089cf457de08d034484af2ad8a0408ba6e13f00e904be831e.svg ../_images/6662352a15f8c34f78747eaf02625f9b2f7697c7bc78eca2a9ff80cea7f9800e.svg

The analysis of the full extended production runs leads to a modest improvement. However, the utility of the first-order term of the model is questionable, given that the slope is nearly zero and could go either way according to the confidence intervals of the model (green dashed curves). Hence, we first test a constant (white noise) model to the first part of the spectrum:

conductivity_3_0 = analyze(ExpPolyModel([0]), npart=3)
CUTOFF FREQUENCY SCAN cv2l(125%)
     neff   criterion  fcut [THz]
---------  ----------  ----------
      5.0        -2.5    5.36e-03
      5.4        -2.4    5.71e-03
      5.7        -2.4    6.07e-03
      6.1        -2.3    6.47e-03
      6.6        -2.2    6.88e-03
      7.0        -2.1    7.33e-03
      7.5        -2.1    7.80e-03
      8.0        -2.2    8.30e-03
      8.6        -2.3    8.84e-03
      9.1        -2.4    9.41e-03
      9.8        -2.5    1.00e-02
     10.4        -2.6    1.07e-02
     11.1        -2.7    1.13e-02
     11.9        -2.7    1.21e-02
     12.7        -2.8    1.29e-02
     13.5        -2.8    1.37e-02
     14.4        -2.9    1.46e-02
     15.4        -2.9    1.55e-02
     16.4        -2.9    1.65e-02
     17.5        -3.0    1.76e-02
     18.7        -3.0    1.87e-02
     19.9        -3.1    1.99e-02
     21.2        -3.1    2.12e-02
     22.6        -3.2    2.26e-02
     24.1        -3.2    2.40e-02
     25.7        -3.2    2.56e-02
     27.4        -3.1    2.72e-02
     29.2        -3.1    2.90e-02
     31.1        -3.0    3.08e-02
     33.2        -2.9    3.28e-02
     35.4        -2.8    3.50e-02
     37.7        -2.7    3.72e-02
     40.1        -2.6    3.96e-02
     42.7        -2.6    4.22e-02
     45.5        -2.7    4.49e-02
     48.5        -2.7    4.78e-02
     51.7        -2.7    5.09e-02
     55.0        -2.8    5.41e-02
     58.6        -2.9    5.76e-02
     62.4        -3.0    6.13e-02
     66.5        -3.2    6.53e-02
     70.8        -3.3    6.95e-02
     75.4        -3.4    7.40e-02
     80.3        -3.6    7.88e-02
     85.5        -3.7    8.38e-02
     91.1        -3.8    8.93e-02
     97.0        -3.8    9.50e-02
    103.2        -3.9    1.01e-01
    109.9        -3.9    1.08e-01
    117.1        -4.0    1.15e-01
    124.6        -4.0    1.22e-01
    132.7        -4.0    1.30e-01
    141.3        -4.0    1.38e-01
    150.4        -4.1    1.47e-01
    160.2        -4.1    1.57e-01
    170.5        -4.1    1.67e-01
    181.6        -4.1    1.77e-01
    193.3        -4.0    1.89e-01
    205.8        -4.0    2.01e-01
    219.1        -3.9    2.14e-01
    233.3        -3.7    2.28e-01
    248.4        -3.3    2.43e-01
    264.4        -2.7    2.58e-01
    281.5        -1.7    2.75e-01
    299.7         0.0    2.93e-01
    319.1         2.8    3.12e-01
    339.7         7.0    3.32e-01
    361.6        13.2    3.53e-01
    385.0        22.0    3.76e-01
    409.8        34.1    4.00e-01
    436.3        50.2    4.26e-01
    464.4        71.6    4.53e-01
    494.4        99.8    4.82e-01
Cutoff criterion exceeds incumbent + margin: -4.1 + 100.0.

INPUT TIME SERIES
    Time step:                     50.000 fs
    Simulation time:               999950.000 fs
    Maximum degrees of freedom:    600.0

MAIN RESULTS
    Autocorrelation integral:      353.4 ± 3.8 S/m
    Integrated correlation time:   25.486 ± 0.271 fs

SANITY CHECKS (weighted averages over cutoff grid)
    Effective number of points:    100.0 (ideally > 20)
    Regression cost Z-score:       0.6 (ideally < 2)
    Cutoff criterion Z-score:      -0.4 (ideally < 2)

MODEL exppoly(0) | CUTOFF CRITERION cv2l(125%)
    Number of parameters:          1
    Average cutoff frequency:      9.80e-02 THz
../_images/2c31019fa516521db0e981dea65ec1232456a7049b57af0e310cf367e89be644.svg ../_images/27fdfd8e8d18c2bb502ef7b5a8d6b147fe6ac9a4b2a7cf33e08159f7bfa20e1f.svg

Another model to consider is the Pade model, not because we expect the ACF to decay exponentially, but because it features well-behaved high-frequency limits, which can facilitate the regression.

conductivity_3_p = analyze(PadeModel([0, 2], [2]), npart=3)
CUTOFF FREQUENCY SCAN cv2l(125%)
     neff   criterion  fcut [THz]
---------  ----------  ----------
     15.0         inf    1.51e-02  (cv2l: Linear dependencies in basis. evals=array([1.63889329e-06, 1.02817551e-03, 2.99897019e+00]))
     16.0         inf    1.61e-02  (cv2l: Linear dependencies in basis. evals=array([1.13716711e-06, 7.92211478e-04, 2.99920665e+00]))
     17.1         inf    1.71e-02  (cv2l: Linear dependencies in basis. evals=array([7.86089625e-07, 6.12365220e-04, 2.99938685e+00]))
     18.2         inf    1.82e-02  (cv2l: Linear dependencies in basis. evals=array([5.42866962e-07, 4.75274695e-04, 2.99952418e+00]))
     19.4         inf    1.94e-02  (cv2l: Linear dependencies in basis. evals=array([3.76196721e-07, 3.69469427e-04, 2.99963015e+00]))
     20.7         inf    2.06e-02  (cv2l: Linear dependencies in basis. evals=array([2.59666420e-07, 2.89088199e-04, 2.99971065e+00]))
     22.0         inf    2.20e-02  (cv2l: Linear dependencies in basis. evals=array([1.80689157e-07, 2.27594112e-04, 2.99977223e+00]))
     23.5         inf    2.34e-02  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-2.74279924e-05,  4.30590654e-01,  2.56943677e+00]))
     25.0       -89.1    2.49e-02
     26.7       -89.7    2.65e-02
     28.4       -90.2    2.82e-02
     30.3       -90.3    3.00e-02
     32.3       -90.0    3.20e-02
     34.4       -89.9    3.40e-02
     36.7       -89.8    3.62e-02
     39.1       -89.7    3.86e-02
     41.6       -89.4    4.11e-02
     44.3       -89.2    4.37e-02
     47.2       -89.0    4.65e-02
     50.3       -88.7    4.95e-02
     53.6       -88.4    5.27e-02
     57.1       -88.0    5.61e-02
     60.8       -87.6    5.97e-02
     64.7       -87.1    6.36e-02
     68.9       -86.7    6.77e-02
     73.4       -86.3    7.21e-02
     78.2       -85.9    7.67e-02
     83.3       -85.5    8.17e-02
     88.7       -85.1    8.69e-02
     94.4       -84.7    9.25e-02
    100.5       -84.2    9.85e-02
    107.1       -83.7    1.05e-01
    114.0       -83.3    1.12e-01
    121.4       -82.9    1.19e-01
    129.2       -82.5    1.26e-01
    137.6       -82.1    1.35e-01
    146.5       -81.6    1.43e-01
    156.0         inf    1.53e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-1.54359370e-06,  4.41366618e-01,  2.55863493e+00]))
    166.1         inf    1.62e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-1.49142797e-06,  4.41582231e-01,  2.55841926e+00]))
    176.8         inf    1.73e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-2.91330612e-06,  4.41639799e-01,  2.55836311e+00]))
    188.3         inf    1.84e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-8.02534064e-06,  4.41998992e-01,  2.55800903e+00]))
    200.4         inf    1.96e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-1.49369521e-05,  4.42334093e-01,  2.55768084e+00]))
    213.4         inf    2.09e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-2.27159654e-05,  4.42747757e-01,  2.55727496e+00]))
    227.2         inf    2.22e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-2.87488075e-05,  4.43190437e-01,  2.55683831e+00]))
    241.9         inf    2.36e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-4.04863273e-05,  4.43778776e-01,  2.55626171e+00]))
    257.5         inf    2.52e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-4.33777791e-05,  4.44331986e-01,  2.55571139e+00]))
    274.2         inf    2.68e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-4.06989167e-05,  4.44918800e-01,  2.55512190e+00]))
    291.9         inf    2.85e-01  (opt: Hessian matrix has non-positive eigenvalues: evals=array([-2.40531661e-05,  4.45570259e-01,  2.55445379e+00]))
    310.7      -102.2    3.03e-01
    330.8      -102.7    3.23e-01
    352.2      -103.3    3.44e-01
    374.9      -103.9    3.66e-01
    399.1      -104.4    3.90e-01
    424.9      -104.9    4.15e-01
    452.3      -105.3    4.41e-01
    481.5      -105.7    4.70e-01
    512.6      -105.8    5.00e-01
    545.7      -105.9    5.32e-01
    580.9      -105.9    5.67e-01
    618.4      -105.8    6.03e-01
    658.4      -105.9    6.42e-01
    700.9      -106.2    6.84e-01
    746.1      -106.8    7.28e-01
    794.2      -107.5    7.75e-01
    845.5      -108.0    8.25e-01
    900.1      -108.2    8.78e-01
    958.1      -107.9    9.35e-01
   1020.0      -107.4    9.95e-01
Reached the maximum number of effective points (1000).

INPUT TIME SERIES
    Time step:                     50.000 fs
    Simulation time:               999950.000 fs
    Maximum degrees of freedom:    600.0

MAIN RESULTS
    Autocorrelation integral:      349.9 ± 1.3 S/m
    Integrated correlation time:   25.232 ± 0.091 fs

SANITY CHECKS (weighted averages over cutoff grid)
    Effective number of points:    829.1 (ideally > 60)
    Regression cost Z-score:       -0.5 (ideally < 2)
    Cutoff criterion Z-score:      -0.1 (ideally < 2)

MODEL pade(0, 2; 2) | CUTOFF CRITERION cv2l(125%)
    Number of parameters:          3
    Average cutoff frequency:      8.09e-01 THz
../_images/95c4064661a291496b2bcbc8469561b7c86e6e8198c0fe12c03e930a4a817497.svg ../_images/11c4f538a82e7dd9a5b1bdd303ddc77d409cc33edb21808fde1ece0aa1dc4f5f.svg

This is indeed a successful regression, with 829 effective points for a three-parameter model. The relative error estimate on the final result is 0.37%.

Density

To enable a proper comparison with the experimental and other simulation results, we also need to estimate the density of the system. This is done by averaging the density over the NpT trajectories from the production runs.

def estimate_density(ntraj: int = 100):
    densities = []
    molar_vols = []
    masses = {11: 22.990, 17: 35.45}  # g/mol
    avogadro = 6.02214076e23  # 1/mol
    for itraj in range(ntraj):
        path_npz = DATA_ROOT / f"sim{itraj:04d}_part00_npt_traj.npz"
        if not path_npz.exists():
            print(f"File {path_npz} not found, skipping.")
            continue
        data = np.load(path_npz)
        mass = sum(masses[atnum] for atnum in data["atnums"]) / avogadro
        volume = data["volume"] * 10**6  # from m³ to cm³
        densities.append(mass / volume)
        molar_vols.append(2 * avogadro * volume / len(data["atnums"]) / 2)
    density = np.mean(densities)
    print(f"Mass density: {density:.3f} ± {np.std(densities):.3f} g/cm³")
    print(f"Molar volume: {np.mean(molar_vols):.4f} ± {np.std(molar_vols):.4f} cm³/mol")
    return density


density = estimate_density()
Mass density: 1.454 ± 0.014 g/cm³
Molar volume: 20.1042 ± 0.1977 cm³/mol

Comparison to Literature Results

Transport properties for this system are challenging to compute accurately. Consequently, simulation results from the literature may exhibit some variation. While the results should be broadly comparable to some extent, deviations may arise due to the differences in post-processing techniques, and the absence of reported error bars in some studies. Furthermore, in [WSLY14] smaller simulation cells were used (512 ions instead of 1728), which may also contribute to discrepancies.

In the table below, we included some more results obtained with STACIE than those discussed above. We also computed the conductivity with the Pade model for all cases, which was a better choice in retrospect.

Ensemble

Simulated time [ns]

Density [g/cm3]

Conductivity [S/m]

Reference

NpT+NVE

4

1.454 ± 0.014

347 ± 10.9

init expoly(0,1)

NpT+NVE

8

1.454 ± 0.014

354 ± 8.0

ext1 expoly(0,1)

NpT+NVE

100

1.454 ± 0.014

353 ± 3.7

ext2 expoly(0,1)

NpT+NVE

100

1.454 ± 0.014

353 ± 3.8

ext2 expoly(0)

NpT+NVE

4

1.454 ± 0.014

343 ± 5.4

init pade(0, 2; 2)

NpT+NVE

8

1.454 ± 0.014

346 ± 3.7

ext1 pade(0, 2; 2)

NpT+NVE

100

1.454 ± 0.014

349 ± 1.3

ext2 pade(0, 2; 2)

NpT+NVT

6

1.456

348 ± 7

[WDZ+20]

NpT+NVT

> 5

1.444

≈ 310

[WSLY14]

Experiment

N.A.

1.542 ± 0.006

366 ± 3

[JDL+68] [BH61]

The comparison shows that the results obtained with STACIE align reasonably well with the literature. In terms of statistical efficiency, STACIE achieves comparable or smaller error bars for about the same simulation time. The deviation from experiment is attributed to the approximations in the NaCl potential. [WDZ+20]

Finally, this example also shows why transport properties can be difficult to compute. As more data is collected, a more detailed spectrum is obtained. Simple models can struggle to explain the increasing amount of information. When extending the total simulation time from 8 ns to 100 ns, the effective number of points in the fit does not grow accordingly. As a result, the uncertainties decrease rather slowly with increasing simulation time.

Technical Details of the Analysis of the Literature Data

References for the experimental data:

The following cell converts a molar ionic conductivity from the literature back to a conductivity.

def convert_molar_conductivity():
    """Convert a specific conductance to a conductivity."""
    # Parameters taken from Wang 2020 (https://doi.org/10.1063/5.0023225)
    # and immediately converted to SI units
    molar_conductivity = 140 * 1e-4  # S m²/mol
    molar_conductivity_std = 3 * 1e-4  # S m²/mol
    density = 1.456 * 1e3  # kg/m³
    molar_mass = (22.990 + 35.45) * 1e-3  # kg/mol
    molar_volume = molar_mass / density  # m³/mol
    conductivity = molar_conductivity / molar_volume
    conductivity_std = molar_conductivity_std / molar_volume
    print("Conductivity [S/m]", conductivity)
    print("Conductivity std [S/m]", conductivity_std)


convert_molar_conductivity()
Conductivity [S/m] 348.8021902806297
Conductivity std [S/m] 7.474332648870638

Validation of the Production Runs

To further establish that our NVE runs together represent the NpT ensemble, the following two cells perform additional validation checks.

  • A plot of the conserved quantity of the separate NVE runs, to detect any drift.

  • The distribution of the instantaneous temperature, which should match the desired NpT distribution. For each individual NVE run and for the combined NVE runs, cumulative distributions are plotted. The function also plots the expected cumulative distribution of the NpT ensemble.

def plot_total_energy(npart: int = 3, ntraj: int = 100):
    time = None
    energies = []
    for itraj in range(ntraj):
        if itraj == 0:
            time = []
        energies_traj = []
        for ipart in range(npart):
            path_npz = DATA_ROOT / f"sim{itraj:04d}_part{ipart:02d}_nve_traj.npz"
            if not path_npz.exists():
                print(f"File {path_npz} not found, skipping.")
                continue
            data = np.load(path_npz)
            if itraj == 0:
                time.append(data["time"])
            energies_traj.append(data["total_energy"])
        if itraj == 0:
            time = np.concatenate(time)
        energies.append(np.concatenate(energies_traj))

    num = "total_energy"
    plt.close(num)
    _, ax = plt.subplots(num=num)
    for energies_traj in energies:
        plt.plot(time, energies_traj)
    plt.title("Total energy of the NVE production runs")
    plt.xlabel("Time [ps]")
    plt.ylabel("Total energy [kJ/mol]")


plot_total_energy()
../_images/ea97b0549bb403a046538426f12d8eaa4db35c4e2d4386b35dde8da3e899ed07.svg

There is no noticeable drift in the total energy of the NVE runs. Apart from the usual (and acceptable) numerical noise, the total energy is conserved perfectly.

def plot_temperature_production(npart: int = 3, ntraj: int = 100):
    """Plot cumulative distributions of the instantaneous temperature."""
    # Load the temperature data from the NVE production runs.
    natom = None
    temps = []
    for itraj in range(ntraj):
        temps.append([])
        for ipart in range(npart):
            path_npz = DATA_ROOT / f"sim{itraj:04d}_part{ipart:02d}_nve_traj.npz"
            if not path_npz.exists():
                print(f"File {path_npz} not found, skipping.")
                continue
            data = np.load(path_npz)
            natom = len(data["atnums"])
            temps[-1].append(data["temperature"])
    temps = np.array([np.concatenate(t) for t in temps])

    # Plot the instantaneous and desired temperature distribution.
    plt.close("tempprod")
    _, ax = plt.subplots(num="tempprod")
    ndof = 3 * natom - 3
    temp_d = 1100
    plot_cumulative_temperature_histogram(ax, temps, temp_d, ndof, "K")


plot_temperature_production()
../_images/582a6f437c61392714c101fcdb5449663ba2d6dcbb576fed4c3cd427fbdf8e21.svg

Alas, as mentioned above, there is still a small mismatch between the obtained and expected NVT temperature distributions. This notebook will be updated after OpenMM issue #4948 has been resolved.

Regression Tests

If you are experimenting with this notebook, you can ignore any exceptions below. The tests are only meant to pass for the notebook in its original form.

if abs(conductivity_1_01 - 347) > 10:
    raise ValueError(f"wrong conductivity (production): {conductivity_1_01:.0f}")
if abs(conductivity_2_01 - 354) > 8:
    raise ValueError(f"wrong conductivity (production): {conductivity_2_01:.0f}")
if abs(conductivity_3_01 - 353) > 7:
    raise ValueError(f"wrong conductivity (production): {conductivity_3_01:.0f}")
if abs(conductivity_3_0 - 353) > 5:
    raise ValueError(f"wrong conductivity (production): {conductivity_3_0:.0f}")
if abs(conductivity_3_p - 349) > 3:
    raise ValueError(f"wrong conductivity (production): {conductivity_3_p:.0f}")
if abs(density - 1.449) > 0.02:
    raise ValueError(f"wrong density (production): {density:.3f}")