
Mathematical Models for Time Series

Mathematical Concept of Time Series [1]

Aside from exploratory data analysis, i.e. summary statistics and plots, the primary objective of time series analysis is to develop mathematical models that provide plausible descriptions for sample data. In order to provide a statistical setting for describing the characteristics of data that seemingly fluctuate in a random fashion over time, we assume that a time series can be considered as the realization of random variables that are indexed over time.

Discrete Stochastic Processes

Random Walk [2]

Assume that a person starts walking from the coordinate origin with constant speed in the $x$-direction. At each step, however, the person decides at random either to walk 1 m to the left or to the right. This is the simplest instance of a random walk. The probabilistic model for this random walk is the following:

  1. Choose $n$ independent Bernoulli random variables $D_1, \dots, D_n$ that take on the values 1 and -1 with equal probability, i.e. $p = 0.5$.

  2. Define the random variables $X_i = D_1 + \dots + D_i$ for each $1 \leq i \leq n$. Then $\{X_1, X_2, \dots\}$ is a discrete stochastic process modeling the random walk.

The following Python code computes a particular instance of this process, i.e. a time series $\{x_1, x_2, \dots\}$. Each time this code is re-run, a new time series is displayed. The function np.cumsum() computes the cumulative sum of a given vector.

Source
import matplotlib.pyplot as plt
import numpy as np

nsamp = 10000
# Random samples d and cumulative sum x
d = np.random.choice(a=[-1,1], size=nsamp, replace=True)

x = np.cumsum(d)

# Plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(x)
ax.grid()
plt.xlabel("Time")
plt.ylabel("y-deviation in [m]")

plt.show()
(Figure: simulated random walk; y-deviation in m against time.)

From the definition of the process it is clear that the following recursive definition is equivalent

$$X_i = X_{i-1} + D_i, \quad X_0 = 0.$$
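The equivalence can be checked directly in a few lines; the following is a small sketch of mine (not from the source) comparing the recursive formulation with the cumulative-sum formulation used above.

import numpy as np

rng = np.random.default_rng(0)
d = rng.choice([-1, 1], size=100)

# Recursive formulation: X_0 = 0, X_i = X_{i-1} + D_i
x_rec = np.zeros(len(d))
x_rec[0] = d[0]
for i in range(1, len(d)):
    x_rec[i] = x_rec[i - 1] + d[i]

# Vectorized formulation via the cumulative sum
x_cum = np.cumsum(d)

print(np.array_equal(x_rec, x_cum))  # True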

If in each step a fixed constant $\delta$ is added to the series, i.e.

$$Y_i = \delta + Y_{i-1} + D_i, \quad Y_0 = 0,$$

then we obtain a random walk with drift. The figure shows an observed time series of such a process. Random walk with drift models are used to capture trends in time series data.

Source
# Set random seed
np.random.seed(0)
# Random samples d and cumulative sum x
d = np.random.choice(a=[-1,1], size=nsamp, replace=True)
x = np.cumsum(d)
# Add drift delta
delta = 5e-2
y = x + delta * np.arange(1, nsamp + 1)   # Y_i = i*delta + X_i

# Plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(x, label="random walk")
ax.plot(y, label="random walk with drift")
ax.grid()
plt.xlabel("Time")
plt.ylabel("y-deviation in [m]")
plt.legend()

plt.show()
(Figure: random walk and random walk with drift.)

White Noise Process (WN) [3]

A white noise process consists of independent and identically distributed (i.i.d.) random variables $\{W_1, W_2, \dots\}$ where each $W_i$ has mean 0 and variance $\sigma^2$.

Source
import matplotlib.pyplot as plt
import numpy as np

# White noise signal
w = np.random.normal(size=1000)

# plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(w)
ax.grid()
plt.xlabel("Time")
plt.ylabel("Value")

plt.show()
(Figure: realization of a Gaussian white noise process.)

If, in addition, the individual random variables $W_i$ are normally distributed, then the process is called Gaussian white noise. These models are used to describe noise in engineering applications. The term white is chosen in analogy to white light and indicates that all possible periodic oscillations are present in a time series originating from a white noise process with equal strength.
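As a quick illustration of this analogy (a sketch of mine, not part of the source code), the periodogram of simulated Gaussian white noise scatters its power roughly evenly over all frequencies.

import matplotlib.pyplot as plt
import numpy as np

# Periodogram of Gaussian white noise: the power is spread roughly evenly
# over all frequencies, which motivates the term "white".
rng = np.random.default_rng(1)
wn = rng.normal(size=4096)

freqs = np.fft.rfftfreq(len(wn), d=1.0)
power = np.abs(np.fft.rfft(wn))**2 / len(wn)

fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(freqs[1:], power[1:])
ax.grid()
plt.xlabel("Frequency")
plt.ylabel("Power")
plt.show()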

Moving Average Process (MA) [4]

If we apply a sliding-window filter to the white noise process $\{W_1, W_2, \dots\}$ from the previous example, we obtain a moving average process. For example, if we choose the window length to be 3, we obtain

$$V_i = \frac{1}{3}(W_{i-1} + W_i + W_{i+1}).$$

We choose $V_1 = W_1$ and $V_2 = 0.5(W_1 + W_2)$. The resulting process is smoother, i.e. the higher-order oscillations are smoothed out. In Python, a sliding window can be applied with .rolling() and the mean computed with .mean().

Source
import pandas as pd

# Convert to DataFrame
w = pd.DataFrame(w)
# Filter with window = 3
v = w.rolling(window=3).mean()

# plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(v)
ax.grid()
plt.xlabel("Time")

plt.show()
(Figure: moving average of the white noise series with window length 3.)
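Note that .rolling(window=3).mean() uses a trailing window and produces NaN for the first two entries, which differs slightly from the centered definition of $V_i$ above. A centered variant (my own adjustment, not the source's choice) would be:

# Centered window with partial windows at the boundaries; closer to the
# centered definition of V_i above, though the first and last entries are
# still handled differently from the V_1, V_2 convention in the text.
v_centered = w.rolling(window=3, center=True, min_periods=1).mean()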

Autoregressive Process (AR) [5]

We consider again the white noise process $\{W_1, W_2, \dots\}$ and recursively compute the following sequence

$$X_i = 1.5 X_{i-1} - 0.9 X_{i-2} + W_i.$$

In other words, the value of the process at time instance $i$ is modelled as a linear combination of the past two values plus some random component. Therefore this process is called autoregressive. The definition of the initial conditions is subtle because the process strongly depends on them. For the time being we ignore the issue of initial conditions.

Source
# Autoregressive filter:
ar = np.zeros(1000)
for i in range(2, 1000):
    # w is a one-column DataFrame, so pick out the scalar entry explicitly
    ar[i] = 1.5 * ar[i-1] - 0.9 * ar[i-2] + w.iloc[i, 0]
    
# plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(ar)
ax.grid()
plt.xlabel("Autoregressive Process")

plt.show()
(Figure: realization of the autoregressive process.)

The figure shows a realization of the autoregressive process above. The oscillatory behaviour becomes clearly visible. Another interpretation of the autoregressive process above is via differential equations. The finite difference scheme for the second order equation

$$\ddot{x} + 2\delta \dot{x} + \omega_0^2 x = W(t)$$

is given by

$$\frac{x_{i-2} - 2x_{i-1} + x_i}{\Delta t^2} + 2\delta\,\frac{x_i - x_{i-1}}{\Delta t} + \omega_0^2 x_i = W_i$$

Here $\delta$ is the damping term and $\omega_0$ the frequency of the homogeneous equation. Setting $\Delta t = 1$, $\omega_0^2 = 0.4$ and $\delta = 0.05$ gives -- after some rearrangements -- the autoregressive process above. Hence it can be seen as a harmonic oscillator with random input. The wavelength of the exact solution is $T = 2\pi\,\omega_0^{-1} \approx 10.0$, which matches the previous situation pretty well.
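As a quick sanity check of the stated period (my own check, not part of the source):

import numpy as np

omega0 = np.sqrt(0.4)       # omega_0^2 = 0.4 as chosen above
T = 2 * np.pi / omega0      # period of the homogeneous oscillator
print(round(T, 2))          # approximately 9.93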

Measures of Dependence

Aside from the individual (also called marginal) distributions $F_i$ of the random variables in the process, we define first and second order moments to analyse the whole process. We start with the mean sequence.

Mean Sequence

The mean sequence of a process $\{X_1, X_2, \dots\}$ is defined as $\mu_X(i) = E[X_i]$, i.e. the expected value of the process at each time instance $i$.
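For instance, for the random walk with drift above, $\mu_Y(i) = \delta\, i$. The following sketch (mine, not from the source) checks this by averaging over many independent realizations.

import numpy as np

rng = np.random.default_rng(0)

# Monte-Carlo estimate of the mean sequence mu_Y(i) = E[Y_i] of the random
# walk with drift: average over many independent realizations.
nrep, n, delta = 20000, 100, 0.05
d = rng.choice([-1.0, 1.0], size=(nrep, n))
y = np.cumsum(delta + d, axis=1)

for i in [10, 50, 100]:
    print(i, round(y[:, i - 1].mean(), 2), delta * i)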

Autocovariance and Autocorrelation [6]

Next we move to second order moments of various sorts. We only consider the covariance of observations within a single process; cross-covariance between two individual series is not considered in this course. The autocovariance sequence of a process is defined as $\gamma_X(i, j) = \mathrm{Cov}(X_i, X_j) = E\big[(X_i - \mu_X(i))(X_j - \mu_X(j))\big]$, and the autocorrelation sequence as $\rho_X(i, j) = \gamma_X(i, j) / \sqrt{\gamma_X(i, i)\,\gamma_X(j, j)}$.

If it is clear from the context, we will skip the subscript $X$. It is important to note that both the autocovariance and the autocorrelation are symmetric, i.e. $\gamma(i, j) = \gamma(j, i)$. The autocovariance measures the linear dependence of two points on the same process observed at different times. If a series is very smooth, the autocovariance is large even if $i$ and $j$ are far apart. Note that $\gamma(i, j) = 0$ only means that $X_i$ and $X_j$ are not linearly dependent; they may still be dependent in some nonlinear way. For $i = j$ the autocovariance reduces to the variance of $X_i$.

The autocorrelation can be described in much the same spirit; however, it is normalized, i.e. $\rho(i, j) \in [-1, 1]$. Put differently, if there is a linear relationship between $X_i$ and $X_j$, then $\rho(X_i, X_j) = \pm 1$. To be more precise, if $X_j = \beta_0 + \beta_1 X_i$, then the autocorrelation is 1 if $\beta_1 > 0$ and -1 otherwise. The autocorrelation hence gives a rough measure of how well the series at time $i$ can be forecast by the value at time $j$.
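To make the ensemble view concrete, the following sketch (mine, not from the source) approximates $\gamma(i, j)$ for the drift-free random walk by simulating many independent realizations; since $\mathrm{Var}(D_k) = 1$, theory gives $\gamma(i, j) = \min(i, j)$.

import numpy as np

rng = np.random.default_rng(0)

# Monte-Carlo approximation of gamma(i, j) for the random walk.
nrep, n = 20000, 50
d = rng.choice([-1.0, 1.0], size=(nrep, n))
x = np.cumsum(d, axis=1)          # each row is one realization X_1, ..., X_n

i, j = 10, 30                     # time instances (1-based as in the notation above)
xi, xj = x[:, i - 1], x[:, j - 1]
gamma_hat = np.mean((xi - xi.mean()) * (xj - xj.mean()))

print(round(gamma_hat, 2))        # close to min(i, j) = 10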

Stationarity [7]

As we have stated above, a time series can be seen as a single realization of a multivariate random variable $\{X_1, X_2, \ldots\}$, which in this special setting we call a stochastic process. As we already know from elementary statistics, there is no way to do sound statistical analysis on a single observation. Hence we need a concept of regularity that allows us to infer information from a single time series: stationarity.

Strict stationarity

A process $\{X_1, X_2, \dots\}$ is called strictly stationary if, for every collection of time instances $i_1, \dots, i_n$ and every shift $h$, the joint distribution of $(X_{i_1}, \dots, X_{i_n})$ coincides with that of $(X_{i_1+h}, \dots, X_{i_n+h})$. Loosely speaking, this means that the probabilistic character of the process does not change over time: any selection of observations from the series is a typical realization of a selection of the process with the same time pattern, shifted to any other point in time. Strict stationarity is often too strong an assumption for many applications and is hard to assess from a single data set.

In particular, for the special case $n = 1$, i.e. when the “collection” of time instances consists of just one sample, the definition of strict stationarity implies that

$$P(X_i \le c) = P(X_j \le c)$$

for all $i, j$ and for all $c$. This amounts to saying that the marginal distributions $F_i = F$ of the process coincide, which in particular implies that $\mu_X(i) = \mu_X(j)$, or in other words, that the mean sequence is constant.

Further, we can consider the case $n = 2$. Then the definition of strict stationarity implies that

$$P(X_i \le c_1, X_j \le c_2) = P(X_{i+h} \le c_1, X_{j+h} \le c_2).$$

The joint distribution of each pair of time instances in the process depends only on the time difference $h$, and hence the autocovariance satisfies

$$\gamma(i, j) = \gamma(i + h, j + h).$$

These two conclusions from strict stationarity are sufficient for most applications in order to come up with reasonable statistical models. Hence they give rise to the definition of a weaker form of stationarity.

Weak stationarity

A process is called weakly stationary if its mean sequence is constant, i.e. $\mu_X(i) = \mu$ for all $i$, and its autocovariance depends only on the time difference, i.e. $\gamma(i, j) = \gamma(i + h, j + h)$ for all $h$.

From the discussion above it is clear that each strictly stationary time series is also weakly stationary. But the opposite is in general not true. It can be shown that for Gaussian processes, i.e. when all finite collections of random variables in the process have a joint normal distribution, the two notions of stationarity are equivalent.
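As a concrete example (mine, not from the source): white noise is weakly stationary, whereas the random walk is not, because $\mathrm{Var}(X_i) = i$ grows with time. The following sketch estimates this variance across many simulated realizations.

import numpy as np

rng = np.random.default_rng(0)

# Variance of the random walk at different time instances, estimated over
# many realizations: it grows roughly like i, so the process cannot be
# weakly stationary.
nrep, n = 20000, 200
x = np.cumsum(rng.choice([-1.0, 1.0], size=(nrep, n)), axis=1)

for i in [10, 50, 100, 200]:
    print(i, round(x[:, i - 1].var(), 1))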

Testing Stationarity [8]

We study the STL decomposition of two time series from the chapter Introduction to Time Series. We begin with the quarterly Australian electricity production. We read the data into Python and decompose the logarithm of the data into a trend, a seasonal component and a remainder. We plot only the remainder of the series.

Source
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

# Load data
AusEl = pd.read_csv('data/AustralianElectricity.csv', sep =";")
# Create pandas DateTimeIndex
dtindexE = pd.DatetimeIndex(data=pd.to_datetime(AusEl["Quarter"]), 
                            freq='infer')

# Set as Index
AusEl.set_index(dtindexE, inplace=True)
AusEl.drop("Quarter", axis=1, inplace=True)

# Decomposition on log-model using STL
decomp = STL(np.log(AusEl), seasonal=15)
decomp = decomp.fit()

# plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(decomp.resid)
plt.xlabel("Quarter")
plt.ylabel("Remainder")

plt.show()
(Figure: remainder of the STL decomposition of the log Australian electricity production.)

The series does not exhibit any seasonal pattern, has a constant mean of approximately 0 and a roughly constant variance. From visual inspection one would conclude that the underlying process is stationary.
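A more formal check, not used in the source, is the augmented Dickey-Fuller test from statsmodels; its null hypothesis is that the series has a unit root (i.e. is non-stationary), so a small p-value supports stationarity. The sketch below reuses the decomp object from the code above.

from statsmodels.tsa.stattools import adfuller

# Augmented Dickey-Fuller test on the remainder of the STL decomposition.
stat, pvalue, *_ = adfuller(decomp.resid.dropna())
print(f"ADF statistic: {stat:.2f}, p-value: {pvalue:.4f}")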

The next example uses the air temperature measurements. We again read the data, select a time window of appropriate size and coerce it into a time series. Afterwards we decompose the data (without log transform) by STL and study the remainder sequence.

Source
# Load data
AirQ = pd.read_csv('data/AirQualityUCI.csv',
                   parse_dates=True, decimal=",", sep=';')

# Combine Date and Time Columns:
AirQ["Time"] = AirQ["Time"].str.replace(".", ":")
AirQ["Date"] = pd.to_datetime(AirQ["Date"] + " " + AirQ["Time"], dayfirst=True)
dtindex = pd.DatetimeIndex(data=pd.to_datetime(AirQ["Date"]), freq='infer')

# Set as Index
AirQ.set_index(dtindex, inplace=True)
AirQ = AirQ.sort_index()
# Only keep temperature for given period
AirT = AirQ.loc["2004-3-10":"2004-3-30", "T"]
# AirT = pd.DataFrame(AirQ["T"])

# Decomposition using STL (no log transform)
decomp = STL(AirT, period=24, seasonal=9)
decomp = decomp.fit()

# plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
ax.plot(decomp.resid)
plt.xlabel("Quarter")
plt.ylabel("Remainder")

plt.show()
(Figure: remainder of the STL decomposition of the air temperature series.)

The remainder still exhibits some seasonality, so stationarity of the underlying process seems unlikely. Different choices of the seasonal argument in STL() will change the behaviour of the remainder sequence.
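As a quick experiment (mine, not from the source), one can compare the remainder for several choices of the seasonal smoother length; this sketch reuses AirT from the code above and simply prints the remainder variance for each choice.

from statsmodels.tsa.seasonal import STL

# Compare the remainder variance for different seasonal smoother lengths
# (the seasonal argument must be an odd integer >= 3).
for s in [7, 9, 15, 25]:
    res = STL(AirT, period=24, seasonal=s).fit()
    print(s, round(res.resid.var(), 3))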

Estimation of Correlation [9]

We have introduced the most important measures for serial dependence in time series. However, these concepts are restricted to the theoretical models and hence make use of the probabilistic behaviour at each time instance. In practice, we are confronted with a single realization of the process, i.e. at each time instance only one value is available. From the viewpoint of classical statistics, this is a problem. The assumption of stationarity is crucial in this context: it paves the way to sound statistical estimators for the mean, autocovariance and cross-covariance sequences.

Throughout this chapter we will assume that the time series $\{x_1, x_2, \ldots, x_n\}$ is a realization of a weakly stationary process $\{X_1, X_2, \ldots, X_n\}$.

Estimation of the Autocovariance [10]

The theoretical autocovariance function of a stationary process is estimated by the sample autocovariance sequence

$$\hat{\gamma}(h) = \frac{1}{n} \sum_{i=1}^{n-h} (x_{i+h} - \bar{x})(x_i - \bar{x}), \qquad h = 0, 1, \dots, n-1,$$

where $\bar{x}$ denotes the sample mean. The sample autocorrelation is obtained by normalization, $\hat{\rho}(h) = \hat{\gamma}(h)/\hat{\gamma}(0)$.

Note that the sum runs over a restricted range of $n - h$ points, but we nevertheless normalize by $n$ rather than $n - h$. Neither choice, however, results in an unbiased estimator.
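As a small illustration of this definition (a sketch of mine, not from the source), the estimator can be written in a few lines and compared against acovf() from statsmodels, which uses the same $1/n$ normalization by default.

import numpy as np
from statsmodels.tsa.stattools import acovf

def sample_acovf(x, max_lag):
    # Sample autocovariance with 1/n normalization, as defined above.
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    return np.array([np.sum((x[h:] - xbar) * (x[:n - h] - xbar)) / n
                     for h in range(max_lag + 1)])

rng = np.random.default_rng(0)
z = rng.normal(size=500)
print(np.allclose(sample_acovf(z, 5), acovf(z, fft=False, nlag=5)))  # True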

Example 1 [11]

We compute the estimators above for simulated data. First we simulate a realization of a moving average process.

Source
import numpy as np
import pandas as pd

np.random.seed(0)

# White noise signal
w = np.random.normal(size=1000, loc=2, scale=0.1)
w = pd.DataFrame(w)
# Filter with window = 3
w = w.rolling(window=3).mean()

In Python, the autocovariance and autocorrelation can be computed with acovf() and acf() (from statsmodels.tsa.stattools), respectively. During the application of the moving average (in the rolling command), missing values are created at the boundaries. To omit the NaN entries, the argument missing is set to 'drop'.

Source
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acovf

# Autocovariance
ac = acovf(w, fft=False, missing='drop', nlag=50)

# Theoretical autocovariance
sigma = 0.1
act = np.zeros(len(ac))
act[0] = 3 / 9 * sigma**2
act[1] = 2 / 9 * sigma**2
act[2] = 1 / 9 * sigma**2


# plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)

width = 0.4
x = np.arange(len(ac))   # integer lag positions
ax.bar(x - width / 2, ac, width=width, label='Estimated')
ax.bar(x + width / 2, act, width=width, label='Theoretical')
plt.xlabel("Lag")
plt.ylabel("Autocovariance")
plt.legend()

plt.show()
(Figure: estimated and theoretical autocovariance of the moving average series.)

The nlag parameter sets the maximal lag up to which the autocovariance should be computed. The figure shows the autocovariance of the moving average data. It can be seen that the lags $h = 0, 1, 2$ produce a significant value and that for $h > 2$ the autocovariance is very small. This is in accordance with the theoretical results. We add the values of the true autocovariance for comparison. Nevertheless, the sample autocovariance shows an oscillating pattern which we will comment on below.

Example 2 [12]

We again consider the moving average process. The following command produces the correlogram.

Source
from statsmodels.graphics.tsaplots import plot_acf

# plot
fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(1, 1, 1)
plot_acf(w, missing='drop', lags=50, ax=ax, c='C1')
plt.xlabel("Lag")
plt.ylabel("Autocorrelation")
plt.show()
(Figure: correlogram of the moving average series with two-sigma confidence bands.)

The two-sigma confidence bands are drawn automatically, and we see that from lag 3 onwards all lags produce an autocorrelation that lies within the band. In practice, once the estimated autocorrelation stays within these bands, one sets it to 0. (A more formal alternative is the Ljung-Box test, which tests the null hypothesis that a number of autocorrelation coefficients are zero; it is implemented in statsmodels.stats.diagnostic.acorr_ljungbox().)
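As a brief illustration (mine, not from the source), the test can be applied to the moving average series from Example 1; the NaN entries from the rolling window must be removed first, so they are dropped explicitly.

from statsmodels.stats.diagnostic import acorr_ljungbox

# Ljung-Box test for the null hypothesis that the autocorrelations up to the
# given lags are zero; small p-values indicate significant autocorrelation.
lb = acorr_ljungbox(w.dropna().squeeze(), lags=[5, 10, 20])
print(lb)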

Footnotes
  1. Birbaumer et al., 2025, p. 356

  2. Birbaumer et al., 2025, Example 13.1.1

  3. Birbaumer et al., 2025, Example 13.1.2

  4. Birbaumer et al., 2025, Example 13.1.3

  5. Birbaumer et al., 2025, Example 13.1.4

  6. Birbaumer et al., 2025, p. 364

  7. Birbaumer et al., 2025, p. 366

  8. Birbaumer et al., 2025, Example 13.3.2

  9. Birbaumer et al., 2025, p. 371

  10. Birbaumer et al., 2025, p. 372

  11. Birbaumer et al., 2025, Example 13.4.1

  12. Birbaumer et al., 2025, Example 13.4.2

References
  1. Birbaumer, M., Frick, K., Büchel, P., & van Hemert, S. (2025). Predictive Modeling - Lecture Notes.