import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from datetime import datetime
# import yfinance as yf
# !pip install pandas-datareader
from pandas_datareader import data as pdr
Time series data analysis
Disclaimer:
This document is intended for educational purposes only. It does not constitute business advice.
In the field of business intelligence, time series analysis plays a central role because much of the data that organizations generate and monitor is time-dependent, such as sales, customer demand, or economic and financial indicators. Understanding how indicators evolve over time allows managers to detect trends, anticipate seasonal patterns, and respond proactively to changes. This module is divided into two sections. In the first, we focus on the fundamentals: how to decompose a series, apply transformations, and identify correlations across time. These tools provide the foundation for making sense of sequential data. In the second, we turn to forecasting, applying statistical and machine learning methods to predict future values. We illustrate these methods using macroeconomic indicators, while also practicing with business-related datasets that highlight the value of forecasting for decision-making across industries.
1 Fundamentals of time series data analysis
In this first part of the class, we will introduce the basic tools of time series analysis.
The focus is on building intuition: understanding what patterns exist in time series data, how to transform them, and how to detect persistence or seasonality.
1.1 Time series components
Trend: a persistent long-term increase or decrease in the time series
Seasonality: fluctuations of the time series that occur at fixed periods of the calendar, such as weekends, holidays, or Christmas
Cycle: unlike seasonality, cyclical fluctuations occur over longer horizons and are not tied to fixed dates in the calendar
Decomposing a series helps managers separate systematic patterns from irregular variation. The components of a time series are: Trend, Cycle, Seasonality, and the Remainder, where the remainder absorbs any information in the series that is not contained in the other components. The trend and cycle are combined into a single component called trend/cycle, or simply trend.
An additive decomposition is expressed as:
\[ y_t = T_t + S_t + R_t \]
while a multiplicative decomposition is: \[ y_t = T_t \times S_t \times R_t \]
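The additive form suits series whose seasonal swings stay roughly constant in size; the multiplicative form suits swings that grow with the level of the series. As a minimal sketch, assuming the quarterly GDP series loaded in Section 1.2, statsmodels exposes both forms through the model argument:

# Sketch: multiplicative decomposition (assumes the GDP series from Section 1.2)
decomp_mult = sm.tsa.seasonal_decompose(GDP['Values'], model='multiplicative', period=4)
print(decomp_mult.seasonal.head(4))  # quarterly seasonal factors: ratios around 1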
1.2 Application
GDP = pdr.DataReader('NGDPRNSAXDCMXQ', 'fred', start='1993Q1', end='2025Q1')
GDP = GDP.rename(columns={'NGDPRNSAXDCMXQ': 'Values'})
GDP = GDP.reset_index()
GDP = GDP.rename(columns={'DATE': 'Date'})
GDP['Date'] = pd.to_datetime(GDP['Date'], dayfirst=True) + pd.offsets.QuarterEnd(0)
GDP['Values'] = GDP['Values'] / 1000  # billions of pesos (mil millones)
1.2.1 Original time series
plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Values'])
plt.xlabel('Date')
plt.ylabel('Billions of Mexican Pesos')
plt.title('Mexico Quarterly GDP: 1993 to 2025')
# plt.grid(True)
plt.tight_layout()
plt.show()
decomp = sm.tsa.seasonal_decompose(GDP['Values'], model='additive', period=4)
1.2.2 Trend/cycle
plt.figure(figsize=(7, 5))
decomp.trend.plot(title='Trend/Cycle Component', ylabel='Trend')
# plt.grid(True)
plt.tight_layout()
plt.show()
Reflects long-term economic growth.
1.2.3 Seasonality
plt.figure(figsize=(7, 5))
decomp.seasonal.plot(title='Seasonal Component', ylabel='Seasonality')
# plt.grid(True)
plt.tight_layout()
plt.show()
Captures recurring within-year patterns.
1.2.4 Remainder
plt.figure(figsize=(7, 5))
decomp.resid.plot(title='Remainder Component', ylabel='Residual')
# plt.grid(True)
plt.tight_layout()
plt.show()
Shows unpredictable variation.
In a single plot:
fig, axes = plt.subplots(4, 1, figsize=(7, 7), sharex=True)

GDP['Values'].plot(ax=axes[0], title='Mexico Quarterly GDP', ylabel='Level')
decomp.trend.plot(ax=axes[1], title='Trend/Cycle')
decomp.seasonal.plot(ax=axes[2], title='Seasonality')
decomp.resid.plot(ax=axes[3], title='Remainder')

for ax in axes:
    ax.grid(False)

plt.tight_layout()
plt.show()
To better analyze underlying economic cycles, it is useful to remove the seasonal component. This is especially helpful for managers and policymakers, since decisions are made on underlying trends rather than seasonal swings.
\[ y_t - S_t \]
1.2.5 Seasonally adjusted
GDP['Deseasonalized'] = GDP['Values'] - decomp.seasonal

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Values'], label='GDP', linewidth=2, color='steelblue')
plt.plot(GDP['Date'], GDP['Deseasonalized'], label='Seasonally Adjusted', color='slategray', linewidth=2)
plt.xlabel('Date')
plt.ylabel('GDP (Billions of Pesos)')
plt.title('Original vs. Seasonally Adjusted GDP')
plt.legend()
# plt.grid(True)
plt.tight_layout()
plt.show()
1.3 Differences, growth, and moving averages
Many forecasting models require stationary series (no trend, stable variance). We often use the following transformations to achieve this:
First difference: removes trend by subtracting the previous observation.
Seasonal difference: removes recurring seasonal patterns.
Growth rates: express relative changes, often more interpretable for managers.
1.3.1 First differences
The first difference of a series, \(\Delta y_t\), is its change between periods \(t-1\) and \(t\): \[ \Delta y_t = y_t - y_{t-1} = (1-L)y_t \]
where \(L\) is the lag operator: \(Ly_t \equiv y_{t-1}\) is the first lag, and \(L^k y_t \equiv y_{t-k}\) is the \(k^{th}\) lag. First differencing helps to remove the trend, but not seasonality.
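As a quick sketch of these operators in pandas, shift(k) implements \(L^k\) and diff(k) implements \((1-L^k)\):

# Sketch: shift(k) is the lag operator L^k, diff(k) the k-period difference
lag1 = GDP['Values'].shift(1)   # L y_t   = y_{t-1}
lag4 = GDP['Values'].shift(4)   # L^4 y_t = y_{t-4}
check = GDP['Values'].diff(4).equals(GDP['Values'] - lag4)
print(check)  # True: diff(4) == y_t - L^4 y_t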
GDP['First_Diff'] = GDP['Values'].diff(1)

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['First_Diff'])
plt.title('First Difference of GDP')
plt.xlabel('Date')
plt.ylabel('Billions of Mexican Pesos')
plt.tight_layout()
plt.show()
1.3.2 Seasonal differences
In general, the seasonal difference \(\Delta_k y_t = (1-L^k)y_t = y_t - y_{t-k}\) helps to remove both trend and seasonality.
GDP['Quarter_Diff'] = GDP['Values'].diff(4)

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Quarter_Diff'])
plt.title('Quarterly Difference of GDP')
plt.xlabel('Date')
plt.ylabel('Billions of Mexican Pesos')
plt.tight_layout()
plt.show()
1.3.3 Growth rate
The growth rate of \(y_t\) is: \[ \Delta\% y_t \equiv \frac{y_t - y_{t-1}}{y_{t-1}} \times 100 \]
Limitation: simple growth rates are not symmetric; a move from 100 to 125 is not the mirror image of a move from 125 to 100. Log differences avoid this: \[ \Delta\%y_t \approx 100 [\ln(y_t)-\ln(y_{t-1})] \]
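A minimal numerical check of this asymmetry, and of how log differences treat both directions symmetrically:

# Sketch: simple growth rates are asymmetric, log differences are not
up   = (125 - 100) / 100 * 100               # +25.00 %
down = (100 - 125) / 125 * 100               # -20.00 %
log_up   = 100 * (np.log(125) - np.log(100)) # +22.31 %
log_down = 100 * (np.log(100) - np.log(125)) # -22.31 %
print(up, down, log_up, log_down)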
GDP['Growth'] = GDP['Values'].pct_change(1) * 100
# GDP['Growth_YoY'] = GDP['Values'].pct_change(4) * 100  # wrt same quarter of previous year

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Growth'])
plt.title('Quarterly Growth of GDP')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.tight_layout()
plt.show()
GDP['Log_Growth'] = np.log(GDP['Values']).diff() * 100

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Log_Growth'])
plt.title('Quarterly Log Growth of GDP')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.tight_layout()
plt.show()
Note that:
\[ \begin{align*}\Delta_4\% y_t &\approx \left(\ln y_t - \ln y_{t-4}\right)\times 100 \\ &= \left(\ln y_{t} - \ln y_{t-1} + \ln y_{t-1} - \ln y_{t-2} + \ln y_{t-2} - \ln y_{t-3} + \ln y_{t-3} - \ln y_{t-4}\right)\times 100 \\ &= \Delta\% y_{t} + \Delta\% y_{t-1} + \Delta\% y_{t-2} + \Delta\% y_{t-3}\end{align*} \]
So the inter-annual growth rate can be calculated as:
GDP['Year_Growth'] = np.log(GDP['Values']).diff(4) * 100

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['Year_Growth'])
plt.title('Inter-Annual Growth of GDP')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.tight_layout()
plt.show()
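A minimal check of this identity in pandas: the 4-quarter log difference coincides with the rolling sum of the four most recent quarterly log differences (up to floating-point error):

# Sketch: year-over-year log growth = sum of the four quarterly log growths
yoy_direct = np.log(GDP['Values']).diff(4) * 100
yoy_summed = (np.log(GDP['Values']).diff() * 100).rolling(4).sum()
print(np.allclose(yoy_direct.dropna(), yoy_summed.dropna()))  # True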
Using seasonally adjusted data to calculate annual growth rates:
GDP['L_Growth_SA'] = np.log(GDP['Deseasonalized']).diff() * 100
GDP['Year'] = GDP['Date'].dt.year

annual_growth = GDP.groupby('Year')['L_Growth_SA'].sum()
print(annual_growth)
Year
1993 -1.725411
1994 5.014293
1995 -6.619130
1996 8.110851
1997 7.506747
1998 2.696247
1999 3.398140
2000 3.285277
2001 -1.083423
2002 1.767168
2003 1.196723
2004 3.795951
2005 3.251152
2006 3.591131
2007 2.210061
2008 -0.552113
2009 -2.047394
2010 3.556814
2011 3.840775
2012 2.771775
2013 0.899393
2014 3.007726
2015 2.219495
2016 2.278858
2017 1.450301
2018 1.143102
2019 -1.027717
2020 -3.600958
2021 1.896961
2022 4.552885
2023 2.482857
2024 0.360684
2025 -1.141510
Name: L_Growth_SA, dtype: float64
1.4 Moving average smoothing
A simple moving average (MA) is calculated by taking the arithmetic mean of a given set of values over a specified number of periods. A MA smooths short-term fluctuations, helping us visualize underlying cycles. Managers often rely on moving averages to detect long-term growth beyond seasonal noise.
\[ y^n_t \equiv \tfrac{1}{n}\left(y_t + y_{t-1} + y_{t-2} + y_{t-3} + \cdots + y_{t-n+1}\right) \]
With \(n\) equal to the length of the seasonal period (four quarters here), the average eliminates seasonality in the data, leaving a smooth trend-cycle component.
GDP['MA_4'] = GDP['Values'].rolling(window=4).mean()

plt.figure(figsize=(7, 5))
plt.plot(GDP.index, GDP['Values'], label='Original GDP', color='steelblue', linewidth=2)
plt.plot(GDP.index, GDP['MA_4'], label='4-Period Moving Average', color='slategray', linewidth=2)
plt.xlabel('Date')
plt.ylabel('GDP (Billions of Pesos)')
plt.title('GDP and 4-Period Moving Average')
plt.legend()
plt.tight_layout()
plt.show()
1.5 Autocorrelation
The correlation of a series with its own lagged values is called autocorrelation or serial correlation. The first autocorrelation of \(y_t\), \(\rho_1\), is: \[ \rho_1 = Corr(y_t,y_{t-1})=\frac{Cov(y_t,y_{t-1})}{\sqrt{Var(y_t)} \sqrt{Var(y_{t-1})} } \] The \(j^{th}\) sample autocorrelation coefficient, \(\rho_j\), is the correlation between \(y_t\) and \(y_{t-j}\):
\[ \rho_j = Corr(y_t,y_{t-j})=\frac{Cov(y_t,y_{t-j})}{\sqrt{Var(y_t)} \sqrt{Var(y_{t-j})} } \]
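As a quick sketch, \(\rho_1\) computed directly from the definition matches pandas' built-in autocorr, which we use below:

# Sketch: first autocorrelation from the definition vs. the pandas built-in
y_s = GDP['Values']
rho1_manual = y_s.corr(y_s.shift(1))   # Corr(y_t, y_{t-1})
print(f"Manual rho_1:   {rho1_manual:.6f}")
print(f"Built-in rho_1: {y_s.autocorr(lag=1):.6f}")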
rho1 = GDP['Values'].autocorr(lag=1)
rho2 = GDP['Values'].autocorr(lag=2)
rho3 = GDP['Values'].autocorr(lag=3)
rho4 = GDP['Values'].autocorr(lag=4)

# Store results
autocorr_values = {
    'rho 1': rho1,
    'rho 2': rho2,
    'rho 3': rho3,
    'rho 4': rho4
}
autocorr_table = pd.DataFrame.from_dict(autocorr_values, orient='index', columns=['Autocorrelation'])
autocorr_table.index.name = 'Lag'
autocorr_table
| Lag   | Autocorrelation |
|-------|-----------------|
| rho 1 | 0.978871        |
| rho 2 | 0.973691        |
| rho 3 | 0.964568        |
| rho 4 | 0.967576        |
The set of autocorrelation coefficients is called the autocorrelation function (ACF).
1.5.1 Autocorrelation function (ACF)
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(GDP['Values'], lags=36, zero=False)
plt.title('Autocorrelation Function (ACF) of Mexico GDP')
plt.xlabel('Lag')
plt.ylabel('Autocorrelation')
plt.tight_layout()
plt.show()
1.5.2 Partial Autocorrelation function (PACF)
The autocorrelation between \(y_t\) and \(y_{t-2}\) could be due to the effect of \(y_{t-1}\), rather than because \(y_{t-2}\) contains useful information for forecasting \(y_t\). Partial autocorrelations measure the relationship between \(y_t\) and \(y_{t-k}\) after removing the effects of the variables in between, \(\{y_{t-1}, y_{t-2}, \dots, y_{t-k+1}\}\).
from statsmodels.graphics.tsaplots import plot_pacf
plot_pacf(GDP['Values'], lags=36, zero=False, method='ols')
plt.title('Partial Autocorrelation Function (PACF) of Mexico GDP')
plt.xlabel('Lag')
plt.ylabel('Partial autocorrelation')
plt.tight_layout()
plt.show()
2 Forecasting algorithms
By predicting future values of key indicators, organizations can improve planning, anticipate risks, and allocate resources more effectively. In this section, we move from exploring and transforming data to building forecasting models.
2.1 Application
GDP = pdr.DataReader('NGDPRNSAXDCMXQ', 'fred', start='1993Q1', end='2025Q1')
GDP = GDP.rename(columns={'NGDPRNSAXDCMXQ': 'Values'})
GDP = GDP.reset_index()
GDP = GDP.rename(columns={'DATE': 'Date'})

GDP['Date'] = pd.to_datetime(GDP['Date'], dayfirst=True) + pd.offsets.QuarterEnd(0)
GDP.set_index(pd.PeriodIndex(GDP['Date'], freq='Q'), inplace=True)

GDP['Values'] = GDP['Values'] / 1000  # billions of pesos (mil millones)
GDP['y'] = np.log(GDP['Values']).diff() * 100
GDP = GDP.dropna()
2.2 Moving Average (MA)
A moving average of order \(k\), \(MA(k)\), is computed by:
\[ \hat{y}_{t+1}= \frac{y_t + y_{t-1}+\dots+y_{t-k+1}}{k} \]
The moving average for time period \(t\) is the arithmetic mean of the \(k\) most recent observations. Equal weights are assigned to each observation. Each new data point is included in the average as it becomes available, and the earliest data point is discarded.
Choosing \(k\) (see the sketch below):
- Small \(k\): captures changes quickly, but more volatile [greater responsiveness]
- Large \(k\): smoother forecast, but may lag behind changes [greater stability]
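A minimal sketch of this tradeoff (the window sizes here are illustrative), comparing the in-sample one-step-ahead RMSE for a few values of \(k\):

# Sketch: one-step-ahead MA(k) forecast errors for several window sizes
for k in [2, 4, 8]:                                    # illustrative choices of k
    fc_k = GDP['y'].rolling(window=k).mean().shift(1)  # forecast made at t for t+1
    rmse_k = np.sqrt(((GDP['y'] - fc_k) ** 2).mean())  # mean() skips NaNs
    print(f"MA({k}) one-step RMSE: {rmse_k:.4f} pp")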
pers = 4
GDP['MA_Forecast'] = GDP['y'].rolling(window=pers).mean().shift(1)

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['y'], label='Actual', linewidth=2, alpha=0.5)
plt.plot(GDP['Date'], GDP['MA_Forecast'], label=f'{pers}-Period MA Forecast', color='slategray', linestyle='--')
plt.title('Moving Average Forecast of GDP growth')
plt.legend()
plt.tight_layout()
plt.show()
Next-quarter forecast:
ma4_next = GDP['y'].iloc[-4:].mean()
next_qtr = GDP.index[-1] + 1
print(f"Forecast with an MA(4) for {next_qtr}: {ma4_next:.4f} %")
Forecast with an MA(4) for 2025Q2: 0.1990 %
Calculate the RMSE:
GDP['MA4_error'] = (GDP['y'] - GDP['MA_Forecast']).where(GDP['MA_Forecast'].notna())
rmse_ma4 = np.sqrt(np.nanmean(GDP['MA4_error']**2))
print(f"RMSE MA(4): {rmse_ma4:.4f} pp")
RMSE MA(4): 3.8224 pp
2.3 Exponential Smoothing
Whereas the method of moving averages takes into account only the \(k\) most recent observations, simple exponential smoothing provides an exponentially weighted moving average of all previously observed values. The method averages (smooths) past values of the series with exponentially decreasing weights: the most recent observation receives the largest weight, \(\alpha \in(0,1)\); the next most recent observation receives less weight, \(\alpha(1-\alpha)\); the observation two time periods in the past receives even less weight, \(\alpha(1-\alpha)^2\); and so forth. \[ \hat{Y}_{t+1}=\alpha Y_t + (1-\alpha)\hat{Y}_t \] The smoothing constant, \(\alpha\), serves as the weighting factor.
The value of \(\alpha\) determines the extent to which the current observation influences the forecast of the next observation. If \(\alpha=1\), the new forecast is essentially the current observation. As \(\alpha\) approaches zero, the new forecast will be very similar to the old forecast, and the current observation will have very little impact.
To start the algorithm, an initial value for the old smoothed series must be set. A common approach is to set the first smoothed value equal to the first observation.
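Before calling statsmodels, here is a minimal hand-rolled sketch of the recursion, assuming \(\alpha = 0.3\) and initializing at the first observation:

# Sketch: simple exponential smoothing implemented directly (alpha assumed = 0.3)
alpha_ses = 0.3
y_arr = GDP['y'].to_numpy()
smooth = np.empty_like(y_arr)
smooth[0] = y_arr[0]  # initialize at the first observation
for t in range(1, len(y_arr)):
    # hat{Y}_{t+1} = alpha * Y_t + (1 - alpha) * hat{Y}_t
    smooth[t] = alpha_ses * y_arr[t] + (1 - alpha_ses) * smooth[t - 1]
print(f"Hand-rolled next-period forecast: {smooth[-1]:.4f} %")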
from statsmodels.tsa.holtwinters import SimpleExpSmoothing
es_model = SimpleExpSmoothing(GDP['y']).fit(smoothing_level=0.3, optimized=False)
GDP['ES_Forecast'] = es_model.fittedvalues.shift(1)  # one-step-ahead forecast

plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['y'], label='Actual', linewidth=2, alpha=0.5)
plt.plot(GDP['Date'], GDP['ES_Forecast'], label='ES Forecast', color='tomato', linestyle='--')
plt.title('Simple Exponential Smoothing Forecast of GDP Growth')
plt.legend()
plt.tight_layout()
plt.show()
Next quarter forecast:
ses_next = es_model.forecast(1).iloc[0]
next_qtr = GDP.index[-1] + 1
print(f"Exp. Smoothing Forecast for {next_qtr}: {ses_next:.4f} %")
Exp. Smoothing Forecast for 2025Q2: -0.6002 %
Calculate the RMSE:
GDP['ES_error'] = (GDP['y'] - GDP['ES_Forecast']).where(GDP['ES_Forecast'].notna())
rmse_ses = np.sqrt(np.nanmean(GDP['ES_error']**2))
print(f"RMSE Exp. Smoothing: {rmse_ses:.4f} pp")
RMSE Exp. Smoothing: 3.5713 pp
Note that we set ‘optimized=False’, so the function does not try to estimate \(\alpha\); it simply uses the given value (here 0.3). If ‘optimized=True’ (the default), it will choose the \(\alpha\) that minimizes an error criterion (usually SSE or the likelihood). In your next activity you will be asked to move from simple exponential smoothing to a full Holt-Winters (ETS) model with optimized \(\alpha\).
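As a sketch of the optimized variant (the estimated \(\alpha\) is whatever best fits the in-sample errors):

# Sketch: estimate alpha instead of fixing it (optimized=True is the default)
es_opt = SimpleExpSmoothing(GDP['y']).fit(optimized=True)
print(f"Estimated smoothing level alpha: {es_opt.params['smoothing_level']:.3f}")
print(f"Next-quarter forecast: {es_opt.forecast(1).iloc[0]:.4f} %")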
2.4 Autoregression
An autoregression is a regression model in which \(y_t\) is regressed against its own lagged values. The forecasting model uses past values, \(y_{t-1}, y_{t-2}, y_{t-3}, \dots\), to forecast \(y_t\).
The number of lags used as regressors is called the order of the autoregression. An autoregressive model of order \(p\), \(AR(p)\), is written as: \[ y_{t} = \phi_0 + \phi_{1}y_{t-1} + \phi_{2}y_{t-2} + \dots + \phi_{p}y_{t-p} + u_{t} \]
where \(u_t\) is white noise and restrictions on the values of \(\phi\) apply due to stationarity requirements.
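To see why stationarity matters, here is a minimal simulation sketch with illustrative values of \(\phi_1\): shocks die out when \(|\phi_1| < 1\) but accumulate when the condition fails:

# Sketch: simulate AR(1) paths for a stationary and a non-stationary phi_1
rng = np.random.default_rng(42)
u = rng.normal(size=200)            # white-noise shocks
for phi1 in [0.5, 1.02]:            # illustrative values
    y_sim = np.zeros(200)
    for t in range(1, 200):
        y_sim[t] = phi1 * y_sim[t - 1] + u[t]
    print(f"phi1={phi1}: std of last 50 obs = {np.std(y_sim[-50:]):.2f}")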
2.4.1 Stationarity test
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.ar_model import AutoReg
adf_p = adfuller(GDP['y'])[1]
kpss_p = kpss(GDP['y'], regression='c', nlags='auto')[1]

print(f"ADF p-value: {adf_p:.4f}")
print(f"KPSS p-value: {kpss_p:.4f}")
ADF p-value: 0.0000
KPSS p-value: 0.1000
(KPSS raises an InterpolationWarning here: the test statistic falls outside the range of the look-up table, so the actual p-value is greater than the 0.10 reported.)
Both tests suggest the time series is stationary: the ADF test rejects its unit-root null hypothesis, and the KPSS test fails to reject its stationarity null.
2.4.2 Forecast with an AR(1) model
df_ar1 = pd.DataFrame({
    'y': GDP['y'],
    'y_lag1': GDP['y'].shift(1)
}).dropna()
df_ar1 = sm.add_constant(df_ar1)

ols_model = sm.OLS(df_ar1['y'], df_ar1[['const', 'y_lag1']]).fit()
print(ols_model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.127
Model: OLS Adj. R-squared: 0.120
Method: Least Squares F-statistic: 18.13
Date: Thu, 18 Sep 2025 Prob (F-statistic): 4.01e-05
Time: 20:28:54 Log-Likelihood: -327.47
No. Observations: 127 AIC: 658.9
Df Residuals: 125 BIC: 664.6
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.6370 0.288 2.210 0.029 0.067 1.207
y_lag1 -0.3577 0.084 -4.259 0.000 -0.524 -0.191
==============================================================================
Omnibus: 103.570 Durbin-Watson: 2.030
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1584.044
Skew: -2.559 Prob(JB): 0.00
Kurtosis: 19.527 Cond. No. 3.47
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
The lagged growth rate is a useful predictor of the current GDP growth rate:
GDP.tail(4)

|        | Date       | Values    | y         | MA_Forecast | MA4_error | ES_Forecast | ES_error  |
|--------|------------|-----------|-----------|-------------|-----------|-------------|-----------|
| 2024Q2 | 2024-06-30 | 6429.4663 | 3.950500  | 0.378054    | 3.572446  | 1.155292    | 2.795207  |
| 2024Q3 | 2024-09-30 | 6393.5398 | -0.560346 | 0.552322    | -1.112668 | -0.326772   | -0.233574 |
| 2024Q4 | 2024-12-31 | 6441.7739 | 0.751588  | 0.401697    | 0.349891  | 0.956410    | -0.204822 |
| 2025Q1 | 2025-03-31 | 6229.8144 | -3.345741 | 0.089205    | -3.434946 | 0.501383    | -3.847124 |
\[ GDPGR_{25:Q2|25:Q1} = 0.6370 - 0.3577\cdot(-3.3457)= 1.8337 \% \]
last_y = GDP['y'].iloc[-1]

phi_0 = ols_model.params['const']
phi_1 = ols_model.params['y_lag1']

y_hat = phi_0 + phi_1 * last_y
print(f"2025 Q2 Forecast: {y_hat:.4f}%")
2025 Q2 Forecast: 1.8337%
2.4.3 Forecast with an AR(2) model
df_ar2 = pd.DataFrame({
    'y': GDP['y'],
    'y_lag1': GDP['y'].shift(1),
    'y_lag2': GDP['y'].shift(2)
}).dropna()
df_ar2 = sm.add_constant(df_ar2)

ols_model2 = sm.OLS(df_ar2['y'], df_ar2[['const', 'y_lag1', 'y_lag2']]).fit()
print(ols_model2.summary())
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.130
Model: OLS Adj. R-squared: 0.115
Method: Least Squares F-statistic: 9.154
Date: Thu, 18 Sep 2025 Prob (F-statistic): 0.000197
Time: 20:28:54 Log-Likelihood: -325.17
No. Observations: 126 AIC: 656.3
Df Residuals: 123 BIC: 664.8
Df Model: 2
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.6798 0.297 2.287 0.024 0.091 1.268
y_lag1 -0.3784 0.090 -4.182 0.000 -0.558 -0.199
y_lag2 -0.0585 0.090 -0.646 0.519 -0.238 0.121
==============================================================================
Omnibus: 105.615 Durbin-Watson: 2.022
Prob(Omnibus): 0.000 Jarque-Bera (JB): 1564.126
Skew: -2.676 Prob(JB): 0.00
Kurtosis: 19.410 Cond. No. 4.11
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
y_l1 = GDP['y'].iloc[-1]
y_l2 = GDP['y'].iloc[-2]

phi2_0 = ols_model2.params['const']
phi2_1 = ols_model2.params['y_lag1']
phi2_2 = ols_model2.params['y_lag2']

y_hat = phi2_0 + phi2_1 * y_l1 + phi2_2 * y_l2
print(f"2025 Q2 Forecast: {y_hat:.4f}%")
2025 Q2 Forecast: 1.9019%
2.4.4 Root Mean Squared Forecast Error (RMSFE)
To perform a pseudo out-of-sample (POOS) evaluation, split the sample into ‘train’ and ‘test’ sub-samples.
y_roll = GDP['y']
T = len(y_roll)
P = int(0.8 * T)  # 80% train
Fit the model recursively and use it to forecast one step ahead
forecasts = []
actuals = []

# Rolling POOS forecast
for s in range(P, T - 1):
    # Fit model on data up to time s
    model = AutoReg(y_roll.iloc[:s+1], lags=1, old_names=False).fit()

    # Forecast y_{s+1}
    y_pred = model.predict(start=s+1, end=s+1).iloc[0]

    forecasts.append(y_pred)
    actuals.append(y_roll.iloc[s+1])
Compute forecast errors:
errors = np.array(actuals) - np.array(forecasts)

# Compute RMSE
rmsfe = np.sqrt(np.mean(errors**2))
print(f"POOS RMSE AR(1): {rmsfe:.4f} pp")
POOS RMSE AR(1): 5.5539 pp
Recall the forecast of GDP growth using an AR(1) model:
y_hat = phi_0 + phi_1 * last_y
print(f"2025 Q2 Forecast: {y_hat:.4f}%")
2025 Q2 Forecast: 1.8337%
which can also be obtained as:
final_model = AutoReg(GDP['y'], lags=1).fit()
forecast_T1 = final_model.params['const'] + final_model.params['y.L1'] * GDP['y'].iloc[-1]
print(f"2025 Q2 Forecast: {forecast_T1:.4f}%")
2025 Q2 Forecast: 1.8337%
Construct confidence intervals:
from scipy.stats import norm

# For an 80% confidence interval
alpha = 0.20
z = norm.ppf(1 - alpha/2)

ci_lower = forecast_T1 - z * rmsfe
ci_upper = forecast_T1 + z * rmsfe

print(f"2025 Q2 Forecast: {forecast_T1:.4f}")
print(f"80% CI (based on RMSFE): [{ci_lower:.4f}, {ci_upper:.4f}]")
2025 Q2 Forecast: 1.8337
80% CI (based on RMSFE): [-5.2839, 8.9514]
Forecasting the next 4 quarters:
forecast_horizon = 4
f_values = []
current_y = last_y

for _ in range(forecast_horizon):
    next_y = final_model.params['const'] + final_model.params['y.L1'] * current_y
    f_values.append(next_y)
    current_y = next_y

ci_upper = [f + z * rmsfe for f in f_values]
ci_lower = [f - z * rmsfe for f in f_values]
GDP['Date'] = pd.to_datetime(GDP['Date'])
last_date = GDP['Date'].iloc[-1]
forecast_dates = pd.date_range(start=last_date + pd.offsets.QuarterEnd(), periods=4, freq='QE')

extended_dates = [GDP['Date'].iloc[-1]] + list(forecast_dates)
extended_values = [GDP['y'].iloc[-1]] + f_values

# Plot
plt.figure(figsize=(7, 5))
plt.plot(GDP['Date'], GDP['y'], label='Observed')
plt.plot(extended_dates, extended_values, label='Forecast', color='orange')
plt.fill_between(forecast_dates, ci_lower, ci_upper, color='orange', alpha=0.3, label='80% CI')
plt.title('Forecast of AR(1) for GDP Growth with 80% Confidence Interval')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.legend()
plt.tight_layout()
plt.show()