Time series data analysis

Isai Guizar

Disclaimer:

This document is intended for educational purposes only. It does not constitute business advice.


In the field of business intelligence time series analysis plays a central role because much of the data that organizations generate and monitor is time-dependent, such as sales, customer demand, or economic/financial indicators. Understanding how indicators evolve over time allows managers to detect trends, anticipate seasonal patterns, and respond proactively to changes. In this module, we divide our work into two sections. In the first section, we focus on the fundamentals, we will learn how to decompose a series, apply transformations, and identify correlations across time. These tools provide the foundation for making sense of sequential data. In the second section, we turn to forecasting, applying statistical and machine learning methods to predict future values. We will illustrate these methods using macroeconomic indicators, while simultaneously practice with business-related datasets that highlight the value of forecasting for decision-making across industries.


1 Fundamentals of time series data analysis

In this first part of the class, we will introduce the basic tools of time series analysis.
The focus is on building intuition: understanding what patterns exist in time series data, how to transform them, and how to detect persistence or seasonality.

1.1 Time series components

Trend: When there is a persistent long-term increase or decrease in the time series

Seasonality: When fluctuations of the time series occurs in fixed periods of time such as weekends, holidays or Christmas

Cycle: Unlike seasonality fluctuations of the time series occur over long periods of time and are not associated to fixed dates in the calendar

Decomposing a series helps managers separate systematic patterns from irregular variation. The components of a time series are: Trend, Cycle, Seasonality, and the Remainder, where the remainder absorbs any information in the series that is not contained in the other components. The trend and cycle are combined into a single component called trend/cycle, or simply trend.

An additive decomposition is expressed:

\[ y_t = T_t + S_t + R_t \]

while a multiplicative decomposition is: \[ y_t = T_t \times S_t \times R_t \]

1.2 Application

We will work with Mexico’s quarterly GDP data from 1990 to 2025 Q2 as our running example. Data will be obtained from the FRED, make sure you have installed the library (!pip install pandas_datareader), you only have to do this once

import pandas as pd
import numpy  as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from datetime import datetime
# import yfinance as yf
# !pip install pandas-datareader
from pandas_datareader import data as pdr


GDP = pdr.DataReader('NGDPRNSAXDCMXQ', 'fred', start='1993Q1', end = '2025Q1')
GDP = GDP.rename(columns={'NGDPRNSAXDCMXQ': 'Values'})
GDP = GDP.reset_index()
GDP = GDP.rename(columns={'DATE': 'Date'})
GDP['Date'] = pd.to_datetime(GDP['Date'], dayfirst=True) + pd.offsets.QuarterEnd(0)
GDP['Values'] = GDP['Values']/1000 # Mil millones

1.2.1 Original times series

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Values'])
plt.xlabel('Date')
plt.ylabel('Billions of Mexican Pesos')
plt.title('Mexico Quartely GDP: 1993 to 2025')
plt.tight_layout()
# plt.grid(True)
plt.show()

decomp = sm.tsa.seasonal_decompose(GDP['Values'], model='additive', period=4)

1.2.2 Trend/cycle

plt.figure(figsize=(7, 5))
decomp.trend.plot(title='Trend/Cycle Component', ylabel='Trend')
# plt.grid(True)
plt.tight_layout()
plt.show()

Reflects long-term economic growth.

1.2.3 Seasonality

plt.figure(figsize=(7, 5))
decomp.seasonal.plot(title='Seasonal Component', ylabel='Seasonality')
# plt.grid(True)
plt.tight_layout()
plt.show()

Captures recurring within-year patterns.

1.2.4 Remainder

plt.figure(figsize=(7, 5))
decomp.resid.plot(title='Remainder Component', ylabel='Residual')
# plt.grid(True)
plt.tight_layout()
plt.show()

Shows unpredictable variation.

In a single plot:

fig, axes = plt.subplots(4, 1, figsize=(7, 7), sharex=True)

GDP['Values'].plot(ax=axes[0], title='Mexico Quarterly GDP', ylabel='Level')
decomp.trend.plot(ax=axes[1], title='Trend/Cycle')
decomp.seasonal.plot(ax=axes[2], title='Seasonality')
decomp.resid.plot(ax=axes[3], title='Remainder')

for ax in axes:
    ax.grid(False)
    
plt.tight_layout()
plt.show()

To better analyze underlying economic cycles, it is useful to remove the seasonal component. This is especially helpful for managers and policymakers, since decisions are made on underlying trends rather than seasonal swings.

\[ y_t - S_T \]

1.2.5 Seasonally adjusted

GDP['Deseasonalized'] = GDP['Values'] - decomp.seasonal

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Values'], label='GDP', linewidth=2,  color = 'steelblue')
plt.plot(GDP['Date'], GDP['Deseasonalized'], label='Seasonally Adjusted',color = 'slategray', linewidth=2)
plt.xlabel('Date')
plt.ylabel('GDP (Billions of Pesos)')
plt.title('Original vs. Seasonally Adjusted GDP')
plt.legend()
# plt.grid(True)
plt.tight_layout()
plt.show()

1.3 Differences, growth, and moving averages

Many forecasting models require stationary series (no trend, stable variance). We often use the following transformations to achieve this:

  1. First difference: removes trend by subtracting the previous observation.

  2. Seasonal difference: removes recurring seasonal patterns.

  3. Growth rates: express relative changes, often more interpretable for managers.

1.3.1 First differences

The first difference of a series, \(\Delta y_t\), is its change between periods \(t − 1\) and \(t\): \[ \Delta y_t = y_t - y_{t-1}=y_t - y_{t-1} = (1-L)y_t \] helps to get rid of the trend, but not seasonality.

where \(L\) is the lag operator: \(Ly_t \equiv y_{t-1}\) is the first lag (\(y_t - y_{t-1}\))

\(L^ky_t \equiv y_{t-k}\) is the \(k^{th}\) lag \((y_t - y_{t-k})\)


GDP['First_Diff'] = GDP['Values'].diff(1)

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['First_Diff'])
plt.title('First Difference of GDP')
plt.xlabel('Date')
plt.ylabel('Billions of Mexican Pesos')
plt.tight_layout()
plt.show()

1.3.2 Seasonal differences

In general, \(\Delta ^k y_t = (1-L^k)y_t\) helps to get rid of trend and seasonality

GDP['Quarter_Diff'] = GDP['Values'].diff(4)

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Quarter_Diff'])
plt.title('Quarterly Difference of GDP')
plt.xlabel('Date')
plt.ylabel('Billions of Mexican Pesos')
plt.tight_layout()
plt.show()

1.3.3 Growth rate

The growth rate of \(y_t\)is: \[ \Delta\% y_t \equiv \frac{y_t - y_{t-1}}{y_{t-1}} \times 100 \]

Limitation: changes are not symmetric. For 100 to 125 no equal to 125 to 100. Use logs: \[ \Delta\%y_t \approx 100 [ln(y_t)-ln(y_{t-1})] \]

GDP['Growth'] = GDP['Values'].pct_change(1) * 100
# GDP['Growth_YoY'] = GDP['Values'].pct_change(4) * 100 # wrt to previous quarter

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Growth'])
plt.title('Quarterly Growth of GDP')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.tight_layout()
plt.show()

GDP['Log_Growth'] = np.log(GDP['Values']).diff() * 100

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Log_Growth'])
plt.title('Quarterly Growth of GDP')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.tight_layout()
plt.show()

Note that:

\[ \begin{align*}\Delta_4\% y_t &\approx \left(\ln y_t - \ln y_{t-4}\right)\times 100 \\ &= \left(\ln y_{t} - \ln y_{t-1} + \ln y_{t-1} - \ln y_{t-2} + \ln y_{t-2} - \ln y_{t-3} + \ln y_{t-3} - \ln y_{t-4}\right)\times 100 \\ &= \Delta\% y_{t} + \Delta\% y_{t-1} + \Delta\% y_{t-2} + \Delta\% y_{t-3}\end{align*} \]

So the inter-annual growth rate can be calculated as:

GDP['Year_Growth'] = np.log(GDP['Values']).diff(4) * 100

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Year_Growth'])
plt.title('Inter-Annual Growth of GDP')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.tight_layout()
plt.show()

Using seasonally adjusted data to calculate annual grow-rates:

GDP['L_Growth_SA'] = np.log(GDP['Deseasonalized']).diff() * 100
GDP['Year'] = GDP['Date'].dt.year

annual_growth = GDP.groupby('Year')['L_Growth_SA'].sum()
print(annual_growth)
Year
1993   -1.725411
1994    5.014293
1995   -6.619130
1996    8.110851
1997    7.506747
1998    2.696247
1999    3.398140
2000    3.285277
2001   -1.083423
2002    1.767168
2003    1.196723
2004    3.795951
2005    3.251152
2006    3.591131
2007    2.210061
2008   -0.552113
2009   -2.047394
2010    3.556814
2011    3.840775
2012    2.771775
2013    0.899393
2014    3.007726
2015    2.219495
2016    2.278858
2017    1.450301
2018    1.143102
2019   -1.027717
2020   -3.600958
2021    1.896961
2022    4.552885
2023    2.482857
2024    0.360684
2025   -1.141510
Name: L_Growth_SA, dtype: float64

1.4 Moving average smoothing

A simple moving average (MA) is calculated by taking the arithmetic mean of a given set of values over a specified number of periods. A MA smooths short-term fluctuations, helping us visualize underlying cycles. Managers often rely on moving averages to detect long-term growth beyond seasonal noise.

\[ y^n_t \equiv \tfrac{1}{n}\left(y_t + y_{t-1} + y_{t-2} + y_{t-3} + \cdots +y_{t-n-1}\right) \]

The average eliminates seasonality in the data, leaving a smooth trend-cycle component

GDP['MA_4'] = GDP['Values'].rolling(window=4).mean()

plt.figure(figsize=(7, 5))
plt.plot(GDP.index, GDP['Values'], label='Original GDP', color = 'steelblue', linewidth=2)
plt.plot(GDP.index, GDP['MA_4'], label='4-Period Moving Average', color = 'slategray', linewidth=2)
plt.xlabel('Date')
plt.ylabel('GDP (Billions of Pesos)')
plt.title('GDP and 4-Period Moving Average')
plt.legend()
plt.tight_layout()
plt.show()

1.5 Autocorrelation

The correlation of a series with its own lagged values is called autocorrelation or serial correlation. The first autocorrelation of \(y_t\), \(\rho_1\), is: \[ \rho_1 = Corr(y_t,y_{t-1})=\frac{Cov(y_t,y_{t-1})}{\sqrt{Var(y_t)} \sqrt{Var(y_{t-1})} } \] The \(jth\) sample autocorrelation coefficient, \(\rho_j\) is the correlation between \(y_t\) and \(y_{t-j}\):

\[ \rho_j = Corr(y_t,y_{t-j})=\frac{Cov(y_t,y_{t-j})}{\sqrt{Var(y_t)} \sqrt{Var(y_{t-j})} } \]

rho1 = GDP['Values'].autocorr(lag=1)
rho2 = GDP['Values'].autocorr(lag=2)
rho3 = GDP['Values'].autocorr(lag=3)
rho4 = GDP['Values'].autocorr(lag=4)

# Store results
autocorr_values = {
    'rho 1': rho1,
    'rho 2': rho2,
    'rho 3': rho3,
    'rho 4': rho4
}

autocorr_table = pd.DataFrame.from_dict(autocorr_values, orient='index', columns=['Autocorrelation'])
autocorr_table.index.name = 'Lag'

autocorr_table
Autocorrelation
Lag
rho 1 0.978871
rho 2 0.973691
rho 3 0.964568
rho 4 0.967576

The set of autocorrelation coefficients is called the autocorrelation function or ACF

1.5.1 Autocorrelation function (ACF)

from statsmodels.graphics.tsaplots import plot_acf

plot_acf(GDP['Values'], lags=36, zero=False)
plt.title('Autocorrelation Function (ACF) of Mexico GDP')
plt.xlabel('Lag')
plt.ylabel('Autocorrelation')
plt.tight_layout()
plt.show()

1.5.2 Partial Autocorrelation function (PACF)

The autocorrelation between \(y_t\) and \(y_{t-2}\), could be due to effect of \(y_{t-1}\), rather than because \(y_{t-2}\) contains useful information to forecast \(y_t\). Partial autocorrelations measure the relationship between \(y_t\) and \(y_{t-k}\) after removing the effects of the variable in between, \(\{y_{t-1}, y_{t-2}, \dots, y_{t-k+1}\}\).

from statsmodels.graphics.tsaplots import plot_pacf

plot_pacf(GDP['Values'], lags=36, zero=False, method = 'ols')
plt.title('Partial autocorrelation Function (PACF) of Mexico GDP')
plt.xlabel('Lag')
plt.ylabel('Autocorrelation')
plt.tight_layout()
plt.show()

2 Forecasting algorithms

By predicting future values of key indicators, organizations can improve planning, anticipate risks, and allocate resources more effectively. In this section, we move from exploring and transforming data to building forecasting models.

2.1 Application

We will illustrate the process with Mexico’s quarterly GDP growth, a widely used macroeconomic indicator. You are invited to practice with a managerial dataset (e.g., retail sales, hotel occupancy, or energy demand), highlighting how forecasting informs decisions across industries.

GDP = pdr.DataReader('NGDPRNSAXDCMXQ', 'fred', start='1993Q1', end = '2025Q1')
GDP = GDP.rename(columns={'NGDPRNSAXDCMXQ': 'Values'})
GDP = GDP.reset_index()
GDP = GDP.rename(columns={'DATE': 'Date'})

GDP['Date'] = pd.to_datetime(GDP['Date'])
GDP['Date'] = pd.to_datetime(GDP['Date'], dayfirst=True) + pd.offsets.QuarterEnd(0)
GDP.set_index(pd.PeriodIndex(GDP['Date'], freq='Q'), inplace=True)

GDP['Values'] = GDP['Values']/1000 # Mil millones
GDP['y'] = np.log(GDP['Values']).diff() * 100
GDP = GDP.dropna()

2.2 Moving Average (MA)

A moving average of order \(k\), \(MA(k)\), is computed by:

\[ \hat{y}_{t+1}= \frac{y_t + y_{t-1}+\dots+y_{t-k-1}}{k} \]

The moving average for time period \(t\) is the arithmetic mean of the \(k\) most recent observations. Equal weights are assigned to each observation. Each new data point is included in the average as it becomes available, and the earliest data point is discarded.

Choosing \(k\):

  • Small : captures changes quickly, but more volatile [greater responsiveness]
  • Large : smoother forecast, but may lag behind changes [greater stability]
pers = 4
GDP['MA_Forecast'] = GDP['y'].rolling(window=pers).mean().shift(1)

plt.figure(figsize=(7,5))
plt.plot(GDP['Date'], GDP['y'], label='Actual', linewidth=2, alpha =0.5)
plt.plot(GDP['Date'], GDP['MA_Forecast'], label=f'{pers}-Period MA Forecast', color = 'slategray', linestyle='--')
plt.title('Moving Average Forecast of GDP growth')
plt.legend()
plt.tight_layout()
plt.show()

ma4_next = GDP['y'].iloc[-4:].mean()
next_qtr = GDP.index[-1] + 1
print(f"Forecast with an MA(4) for {next_qtr}: {ma4_next:.4f} %")

GDP['MA4_error'] = (GDP['y'] - GDP['MA_Forecast']).where(GDP['MA_Forecast'].notna())
rmse_ma4 = np.sqrt(np.nanmean(GDP['MA4_error']**2))
print(f"RMSE MA(4): {rmse_ma4:.4f} pp")

Forecast with an MA(4) for 2025Q2: 0.1990 %
RMSE MA(4): 3.8224 pp

Next-quarter forecast:

ma4_next = GDP['y'].iloc[-4:].mean()
next_qtr = GDP.index[-1] + 1
print(f"Forecast with an MA(4) for {next_qtr}: {ma4_next:.4f} %")
Forecast with an MA(4) for 2025Q2: 0.1990 %

Calculate the RMSE:

GDP['MA4_error'] = (GDP['y'] - GDP['MA_Forecast']).where(GDP['MA_Forecast'].notna())
rmse_ma4 = np.sqrt(np.nanmean(GDP['MA4_error']**2))
print(f"RMSE MA(4): {rmse_ma4:.4f} pp")
RMSE MA(4): 3.8224 pp

2.3 Exponential Smoothing

Whereas the method of moving averages takes into account only the most recent observations, simple exponential smoothing provides an exponentially weighted moving average of all previously observed values. This method is based on averaging (smoothing) past values of a series in an exponentially decreasing manner The most recent observation receives the largest weight, \(\alpha \in(0,1)\); the next most recent observation receives less weight, \(\alpha(1-\alpha)\); the observation two time periods in the past receives even less weight \(\alpha(1-\alpha)^2\); and so forth. \[ \hat{Y}_{t+1}=\alpha Y_t + (1-\alpha)\hat{Y}_t \] the smoothing constant, \(\alpha\), serves as the weighting factor.

The value of a determines the extent to which the current observation influences the forecast of the next observation. If \(\alpha=1\), the new forecast will be essentially the current observation. As \(\alpha\) approaches to zero, the new forecast will be very similar to the old forecast, and the current observation will have very little impact

To start the algorithm, an initial value for the old smoothed series must be set. A common approach is to set the first smoothed value equal to the first observation.

from statsmodels.tsa.holtwinters import SimpleExpSmoothing

es_model = SimpleExpSmoothing(GDP['y']).fit(smoothing_level=0.3, optimized=False)
GDP['ES_Forecast'] = es_model.fittedvalues.shift(1)  # one-step-ahead forecast

plt.figure(figsize=(7,5))
plt.plot(GDP['Date'], GDP['y'], label='Actual', linewidth=2, alpha=0.5)
plt.plot(GDP['Date'], GDP['ES_Forecast'], label='ES Forecast', color='tomato', linestyle='--')
plt.title('Simple Exponential Smoothing Forecast of GDP Growth')
plt.legend()
plt.tight_layout()
plt.show()

Next quarter forecast:

ses_next = es_model.forecast(1)[0]
next_qtr = GDP.index[-1] + 1
print(f"Exp Smoth Forecast for {next_qtr}: {ses_next:.4f} %")
Exp Smoth Forecast for 2025Q2: -0.6002 %
/var/folders/wl/12fdw3c55777609gp0_kvdrh0000gn/T/ipykernel_58067/3350117772.py:1: FutureWarning:

Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`

Calculate the RMSE:

GDP['ES_error'] = (GDP['y'] - GDP['ES_Forecast']).where(GDP['ES_Forecast'].notna())
rmse_ses = np.sqrt(np.nanmean(GDP['ES_error']**2))
print(f"RMSE Exp Smoth: {rmse_ses:.4f} pp")
RMSE Exp Smoth: 3.5713 pp

Observe, with we set ‘optimized=False’, so the function does not try to estimate \(\alpha\), it just uses the given value (here 0.3). If ‘optimized=True’ (the default), it will choose \(\alpha\) that minimizes an error criterion (usually SSE or likelihood). In your next activity you will be asked to move from simple ES to a full Holt-Winters (ETS) with optimized \(\alpha\).

2.4 Autoregression

An autoregression is a regression model in which \(y_t\) is regressed against its own lagged values. The forecasting model uses past values, \(y_{t-1}, y_{t-2}, y_{t-3}, \dots\), to forecast \(y_t\).

The number of lags used as regressors is called the order of the autoregression. An autorregresive model of order \(p\), \(AR(p)\), is written as: \[ y_{t} = \phi_0 + \phi_{1}y_{t-1} + \phi_{2}y_{t-2} + \dots + \phi_{p}y_{t-p} + u_{t} \]

where \(u_t\) is white noise and restrictions on the values of \(\phi\) apply due to stationarity requirements.

2.4.1 Stationarity test

from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.ar_model import AutoReg
adf_p  = adfuller(GDP['y'])[1]
kpss_p = kpss(GDP['y'], regression='c', nlags='auto')[1]

print(f"ADF p-value: {adf_p:.4f}")
print(f"KPSS p-value: {kpss_p:.4f}")
ADF p-value: 0.0000
KPSS p-value: 0.1000
/var/folders/wl/12fdw3c55777609gp0_kvdrh0000gn/T/ipykernel_58067/3253650088.py:2: InterpolationWarning:

The test statistic is outside of the range of p-values available in the
look-up table. The actual p-value is greater than the p-value returned.

Tests suggest the time series is stationary.

2.4.2 Forecast with an AR(1) model

df_ar1 = pd.DataFrame({
    'y': GDP['y'],
    'y_lag1': GDP['y'].shift(1)
}).dropna()

df_ar1 = sm.add_constant(df_ar1)

ols_model = sm.OLS(df_ar1['y'], df_ar1[['const', 'y_lag1']]).fit()
print(ols_model.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.127
Model:                            OLS   Adj. R-squared:                  0.120
Method:                 Least Squares   F-statistic:                     18.13
Date:                Thu, 18 Sep 2025   Prob (F-statistic):           4.01e-05
Time:                        20:28:54   Log-Likelihood:                -327.47
No. Observations:                 127   AIC:                             658.9
Df Residuals:                     125   BIC:                             664.6
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.6370      0.288      2.210      0.029       0.067       1.207
y_lag1        -0.3577      0.084     -4.259      0.000      -0.524      -0.191
==============================================================================
Omnibus:                      103.570   Durbin-Watson:                   2.030
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1584.044
Skew:                          -2.559   Prob(JB):                         0.00
Kurtosis:                      19.527   Cond. No.                         3.47
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

The lagged growth rate is a useful predictor of the current GDP growth rate:

GDP.tail(4)
Date Values y MA_Forecast MA4_error ES_Forecast ES_error
Date
2024Q2 2024-06-30 6429.4663 3.950500 0.378054 3.572446 1.155292 2.795207
2024Q3 2024-09-30 6393.5398 -0.560346 0.552322 -1.112668 -0.326772 -0.233574
2024Q4 2024-12-31 6441.7739 0.751588 0.401697 0.349891 0.956410 -0.204822
2025Q1 2025-03-31 6229.8144 -3.345741 0.089205 -3.434946 0.501383 -3.847124

\[ GDPGR_{25:Q1|24:Q4} = 0.6370 - 0.3577\cdot(-3.3457)= 1.8337 \% \]

last_y = GDP['y'].iloc[-1]

phi_0 = ols_model.params['const']
phi_1 = ols_model.params['y_lag1']

y_hat = phi_0 + phi_1 * last_y
print(f"2025 Q1 Forecast: {y_hat:.4f}%")
2025 Q1 Forecast: 1.8337%

2.4.3 Forecast with an AR(2) model

df_ar2 = pd.DataFrame({
    'y': GDP['y'],
    'y_lag1': GDP['y'].shift(1),
    'y_lag2': GDP['y'].shift(2)
}).dropna()

df_ar2 = sm.add_constant(df_ar2)

ols_model2 = sm.OLS(df_ar2['y'], df_ar2[['const', 'y_lag1','y_lag2']]).fit()
print(ols_model2.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.130
Model:                            OLS   Adj. R-squared:                  0.115
Method:                 Least Squares   F-statistic:                     9.154
Date:                Thu, 18 Sep 2025   Prob (F-statistic):           0.000197
Time:                        20:28:54   Log-Likelihood:                -325.17
No. Observations:                 126   AIC:                             656.3
Df Residuals:                     123   BIC:                             664.8
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.6798      0.297      2.287      0.024       0.091       1.268
y_lag1        -0.3784      0.090     -4.182      0.000      -0.558      -0.199
y_lag2        -0.0585      0.090     -0.646      0.519      -0.238       0.121
==============================================================================
Omnibus:                      105.615   Durbin-Watson:                   2.022
Prob(Omnibus):                  0.000   Jarque-Bera (JB):             1564.126
Skew:                          -2.676   Prob(JB):                         0.00
Kurtosis:                      19.410   Cond. No.                         4.11
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
y_l1 = GDP['y'].iloc[-1]
y_l2 = GDP['y'].iloc[-2]

phi2_0 = ols_model2.params['const']
phi2_1 = ols_model2.params['y_lag1']
phi2_2 = ols_model2.params['y_lag2']

y_hat = phi2_0 + phi2_1 * y_l1 + phi2_2 * y_l2
print(f"2025 Q1 Forecast: {y_hat:.4f}%")
2025 Q1 Forecast: 1.9019%

2.4.4 Root Mean Squared Forecast Error (RMSFE)

To start a pseudo out-of-sample (POOS), split the sample into ‘train’ and ‘test’ sub-samples.

y_roll = GDP['y'] 
T = len(y_roll)
P = int(0.8 * T) # 80% train

Fit the model recursively and use it to forecast one step ahead

forecasts = []
actuals   = []

# Rolling POOS forecast
for s in range(P, T - 1):
    # Fit model on data up to time s
    model = AutoReg(y_roll[:s+1], lags=1, old_names=False).fit()

    # Forecast y_{s+1}
    y_pred = model.predict(start=s+1, end=s+1).iloc[0]

    forecasts.append(y_pred)
    actuals.append(y_roll.iloc[s+1])

Compute forecast errors:

errors = np.array(actuals) - np.array(forecasts)

# Compute RMSE
rmsfe = np.sqrt(np.mean(errors**2))
print(f"POOS RMSE AR(1): {rmsfe:.4f} pp")
POOS RMSE AR(1): 5.5539 pp

Recall the forecast of GDP growth using an AR(1) model

y_hat = phi_0 + phi_1 * last_y
print(f"2025 Q2 Forecast: {y_hat:.4f}%")
2025 Q2 Forecast: 1.8337%

which can also be obtained as:

final_model = AutoReg(GDP['y'], lags=1).fit()
forecast_T1 = final_model.params['const'] + final_model.params['y.L1'] * GDP['y'].iloc[-1]
print(f"2025 Q2 Forecast: {forecast_T1:.4f}%")
2025 Q2 Forecast: 1.8337%

Construct confidence intervals:

from scipy.stats import norm

# For a 80% confidence interval
alpha = 0.20
z = norm.ppf(1-alpha/2)

ci_lower = forecast_T1 - z * rmsfe
ci_upper = forecast_T1 + z * rmsfe

print(f"2025 Q1 Forecast: {forecast_T1:.4f}")
print(f"80% CI (based on RMSFE): [{ci_lower:.4f}, {ci_upper:.4f}]")
2025 Q1 Forecast: 1.8337
80% CI (based on RMSFE): [-5.2839, 8.9514]

Forecasting the next 4 quarters:

forecast_horizon = 4
f_values = []
current_y = last_y

for _ in range(forecast_horizon):
    next_y = final_model.params['const'] + final_model.params['y.L1'] * current_y
    f_values.append(next_y)
    current_y = next_y


ci_upper = [f + z * rmsfe for f in f_values]
ci_lower = [f - z * rmsfe for f in f_values]
GDP['Date'] = pd.to_datetime(GDP['Date'])
last_date   = GDP['Date'].iloc[-1]
forecast_dates = pd.date_range(start=last_date + pd.offsets.QuarterEnd(), periods=4, freq='QE')

extended_dates  = [GDP['Date'].iloc[-1]] + list(forecast_dates)
extended_values = [GDP['y'].iloc[-1]] + f_values

# Plot
plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['y'], label='Observed')
plt.plot(extended_dates, extended_values, label='Forecast', color='orange')
plt.fill_between(forecast_dates, ci_lower, ci_upper, color='orange', alpha=0.3, label='80% CI')
plt.title('Forecast of AR(1) for GDP Growth with 80% Confidence Interval')
plt.xlabel('Date')
plt.ylabel('y')
plt.legend()
plt.tight_layout()
plt.show()

```