Correlation Time Analysis of Cloud Cover Data¶
This example is inspired by the work of P. A. Jones [Jon92] on the analysis of temporal (and spatial) correlations in cloud cover data. Jones observed an exponential decay of the autocorrelation function, with half-lives ranging from 5 to 40 hours.
Here, we analyze a time series of cloud cover data obtained from the Open-Meteo platform. The data consist of hourly cloud cover observations (as a percentage of the sky covered by clouds) in Ghent, Belgium, from January 1, 2010 to January 1, 2020.
Library Imports and Matplotlib Configuration¶
import os
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
from path import Path
from stacie import (
    compute_spectrum,
    estimate_acint,
    LorentzModel,
    UnitConfig,
    plot_extras,
    plot_fitted_spectrum,
)
from stacie.utils import split
mpl.rc_file("matplotlibrc")
%config InlineBackend.figure_formats = ["svg"]
Cloud Cover Data¶
In this example, cloud cover is expressed as the fraction of the sky covered by clouds, ranging from 0 (clear sky) to 1 (completely overcast).
# You normally do not need to change this path.
# It only needs to be overridden when building the documentation.
DATA_ROOT = Path(os.getenv("DATA_ROOT", "./")) / "cloud-cover/"
# Load data and convert to a fraction from 0 to 1.
cover = (
    np.loadtxt(
        DATA_ROOT / "cloud-cover-ghent-2010-2020.csv",
        delimiter=",",
        usecols=1,
        skiprows=4,
    )
    / 100
)
print(f"Number of measurements: {cover.shape}")
Number of measurements: (87672,)
Sample of the Cloud Cover Data¶
To get a first impression of the data, we plot a small sample.
def plot_sample():
    plt.close("sample")
    _, ax = plt.subplots(num="sample")
    time = np.arange(240)
    ax.plot(time, cover[:240], "C0-")
    ax.set_xlabel("Time [h]")
    ax.set_ylabel("Cloud Cover Fraction")
    ax.set_title("Sample of Cloud Cover Data (Ghent, First 10 Days of 2010)")
    ax.set_ylim(-0.05, 1.05)
plot_sample()
Histogram of Cloud Cover Data¶
The histogram below shows the distribution of cloud cover values.
def plot_histogram():
    plt.close("histogram")
    _, ax = plt.subplots(num="histogram")
    ax.hist(cover, bins=np.linspace(0, 1, 21), density=True, ec="w")
    ax.set_xlabel("Cloud Cover Fraction")
    ax.set_ylabel("Probability Density")
    ax.set_xlim(0, 1)
    ax.set_title("Histogram of Cloud Cover Data (Ghent, 2010-2020)")
plot_histogram()
This histogram reflects the typical weather pattern in Belgium, with plenty of cloudy days. This is also reflected in the normalized standard deviation (NS), as defined by Jones [Jon92]:
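The cell that produced the numbers below is not shown here; the following minimal sketch reproduces them, assuming Jones's normalized standard deviation is the standard deviation divided by \(\sqrt{\mu(1-\mu)}\), its largest possible value for a quantity bounded between 0 and 1 with mean \(\mu\) (an assumption consistent with the printed values).
# Sketch of the omitted cell, assuming NS = std / sqrt(mean * (1 - mean)).
mean = cover.mean()
std = cover.std()
print(f"Mean cloud cover: {mean:.3f}")
print(f"Standard deviation: {std:.3f}")
print(f"Normalized std. dev.: {std / np.sqrt(mean * (1 - mean)):.3f}")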
Mean cloud cover: 0.668
Standard deviation: 0.369
Normalized std. dev.: 0.783
Compared to the values in Figure 10 of Jones’s work, this is a relatively high NS, which has been associated with longer correlation times.
Autocorrelation Function¶
Before applying STACIE to the data, let’s first compute and plot the autocorrelation function (ACF) directly.
Mind the normalization: due to the zero padding in np.correlate,
the number of terms contributing to the ACF decreases with lag time.
Furthermore, the ACF is not normalized to 1 in this plot,
as we want to keep the amplitude information.
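As a small aside (not part of the original analysis), the lag-dependent term count of np.correlate can be made explicit with a constant sequence: the value at lag k is simply the number of overlapping products, which is exactly the factor divided out in plot_acf below.
# Illustration of the lag-dependent term count in np.correlate.
x = np.ones(8)
c = np.correlate(x, x, mode="same")
print(c[len(c) // 2:])  # [8. 7. 6. 5.]: at lag k, only 8 - k products contribute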
AMP_EXP = 0.0347 # From STACIE Lorentz B parameter, see below.
TAU_EXP = 57.6 # From STACIE corrtime_exp, see below.
def plot_acf():
    plt.close("acf")
    _, ax = plt.subplots(num="acf")
    delta = cover - np.mean(cover)
    acf = np.correlate(delta, delta, mode="same")
    nkeep = acf.size // 2
    # Divide each lag by its number of contributing terms.
    # (On the right-hand side, len(acf) is still the full length of the series.)
    acf = acf[nkeep:] / (len(acf) - np.arange(nkeep))
    time = np.arange(240)
    ax.plot(time, acf[:240], "C0-")
    ax.plot(
        time,
        AMP_EXP * np.exp(-time / TAU_EXP),
        "C1--",
        label=r"$\exp(-t/\tau_\mathrm{exp})$",
    )
    ax.legend()
    xticks = np.arange(0, 241, 24)
    ax.set_xlim(-1, 240)
    ax.set_xticks(xticks)
    ax.grid(True, axis="x")
    ax.set_xlabel("Lag Time [h]")
    ax.set_ylabel("Autocorrelation Function")
    ax.set_title("Autocorrelation Function of Cloud Cover Data (Ghent, 2010-2020)")
plot_acf()
The ripples in the ACF are due to diurnal cycles, resulting in weak correlations at multiples of 24 hours. Superimposed on these diurnal effects, an overall exponential decay of the ACF can be observed. However, it is difficult to fit an exponential function to the ACF due to (i) the ripples and (ii) non-exponential short-time effects. Hence, fitting an exponential function can at best provide a rough estimate of the correlation time.
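Purely for illustration (and not used in the remainder of this example), such a direct fit could look as follows; it assumes SciPy is available, and trying a couple of lag windows gives a feel for how robust (or not) the resulting estimate is.
from scipy.optimize import curve_fit

# Recompute the normalized ACF over the first 240 lags, as in plot_acf above.
delta = cover - np.mean(cover)
acf = np.correlate(delta, delta, mode="same")
acf = acf[acf.size // 2:][:240] / (len(cover) - np.arange(240))
lags = np.arange(240.0)
for start in (24, 48):
    # Skip the first day(s) to reduce the influence of short-time effects.
    popt, _ = curve_fit(
        lambda t, amp, tau: amp * np.exp(-t / tau),
        lags[start:],
        acf[start:],
        p0=(0.05, 50.0),
    )
    print(f"Exponential fit from lag {start} h onward: tau = {popt[1]:.1f} h")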
Below, we show how STACIE can instead be used to perform a more robust analysis. The exponential decay shown in the plot is derived from STACIE's output below and is only expected to be representative at long lag times.
Autocorrelation Time¶
The following cells perform a standard time correlation analysis using STACIE.
The prefactors argument sets the normalization of the spectrum, such that the resulting autocorrelation integral is directly proportional to the variance of the mean cloud cover.
To reduce the noise of the spectrum, the data is split into 20 blocks,
and the resulting spectra are averaged.
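As a quick check (not part of the original notebook), one can inspect the block layout produced by split; assuming it returns one equal-length block per entry, the 87672 hourly values should give 20 blocks of 4383 hours each (matching the simulation time reported in the output below), with the 12 leftover values unused.
# Inspect the block layout; expected: 20 blocks of 4383 samples each.
blocks = split(cover, 20)
print(len(blocks), len(blocks[0]))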
# Compute spectrum
spectrum = compute_spectrum(
    split(cover, 20),
    include_zero_freq=False,
    prefactors=2.0 / len(cover),
)
# Estimate autocorrelation time
uc = UnitConfig(
    time_unit_str="h",
    freq_unit_str="1/h",
    time_fmt=".1f",
    freq_fmt=".3f",
)
result = estimate_acint(spectrum, LorentzModel(), verbose=True, uc=uc)
CUTOFF FREQUENCY SCAN cv2l(125%)
neff criterion fcut [1/h]
--------- ---------- ----------
15.0 32.7 0.003
16.0 31.2 0.004
17.1 29.9 0.004
18.2 29.0 0.004
19.4 28.4 0.004
20.7 28.0 0.005
22.0 27.6 0.005
23.5 27.2 0.005
25.0 27.0 0.006
26.7 27.0 0.006
28.4 27.2 0.006
30.3 27.5 0.007
32.3 27.8 0.007
34.4 27.9 0.008
36.7 27.9 0.008
39.1 27.9 0.009
41.6 27.8 0.009
44.3 27.7 0.010
47.2 27.8 0.011
50.3 28.2 0.011
53.6 28.8 0.012
57.1 29.6 0.013
60.8 30.6 0.014
64.7 31.5 0.015
68.9 32.5 0.015
73.4 33.3 0.016
78.2 34.0 0.018
83.3 34.6 0.019
88.7 35.2 0.020
94.4 35.8 0.021
100.5 36.5 0.022
107.1 37.1 0.024
114.0 37.6 0.025
121.4 38.0 0.027
129.2 38.9 0.029
137.6 41.2 0.031
146.5 46.5 0.033
156.0 56.6 0.035
166.1 72.8 0.037
176.8 94.2 0.039
188.3 118.1 0.042
200.4 139.9 0.045
Cutoff criterion exceeds incumbent + margin: 27.0 + 100.0.
INPUT TIME SERIES
Time step: 1.0 h
Simulation time: 4383.0 h
Maximum degrees of freedom: 40.0
MAIN RESULTS
Autocorrelation integral: 5.16e+00 ± 5.66e-01
Integrated correlation time: 19.0 ± 2.1 h
SANITY CHECKS (weighted averages over cutoff grid)
Effective number of points: 31.7 (ideally > 60)
Regression cost Z-score: -0.4 (ideally < 2)
Cutoff criterion Z-score: 0.0 (ideally < 2)
MODEL lorentz() | CUTOFF CRITERION cv2l(125%)
Number of parameters: 3
Average cutoff frequency: 0.007 1/h
Exponential correlation time: 57.6 ± 16.9 h
RECOMMENDED SIMULATION SETTINGS (EXPONENTIAL CORR. TIME)
Block time: 18.1 ± 9.4 h
Simulation time: 3618.5 ± 133.6 h
As expected, the correlation times are on the order of one to a few days. The exponential correlation time, about 60 hours, is the longer of the two because it captures the slowest relaxation process; it corresponds to a half-life of \(\tau_\mathrm{exp}\ln 2 \approx 40\) hours, at the upper end of the range reported by Jones [Jon92]. The integrated correlation time is notably shorter, at about 20 hours.
In the code below, we also derive the amplitude of the exponential decay from the fitted Lorentz model parameters (through its \(B\) parameter); this amplitude, together with the exponential correlation time, was used to plot the dashed line in the ACF figure above.
# Parameters of the fitted Lorentz model.
pars = result.props["pars"]
# Exponential correlation time obtained from the fit.
tau_exp = result.corrtime_exp
# Amplitude of the long-time exponential decay of the ACF.
amp_exp = (pars[0] - pars[1] / pars[2]) / (2 * tau_exp)
print(f"TAU_EXP = {tau_exp:.4f} h")
print(f"AMP_EXP = {amp_exp:.4f}")
TAU_EXP = 57.5898 h
AMP_EXP = 0.0347
The plots below show the fitted spectrum and additional diagnostics. These plots can be used to evaluate the quality of the fit and confirm that the Lorentz model is appropriate in this case.
plt.close("fitted")
_, ax = plt.subplots(num="fitted")
plot_fitted_spectrum(ax, uc, result)
plt.close("extras")
_, axs = plt.subplots(2, 2, num="extras")
plot_extras(axs, uc, result)
Regression Tests¶
If you are experimenting with this notebook, you can ignore any exceptions below. The tests are only meant to pass for the notebook in its original form.
if abs(result.acint - 5.1612) > 5e-3:
    raise ValueError(f"Wrong acint: {result.acint:.4e}")
if abs(result.corrtime_exp - 57.590) > 5e-2:
    raise ValueError(f"Wrong corrtime_exp: {result.corrtime_exp:.4e}")