Module 8 Discussion

Author

Teddy Kelly

Part I

For this discussion, I have chosen to analyze the US Real GDP time series from Fred. Below, I have set my Fred API key, loaded in the time series, and converted it into a tsibble to using the fpp3 package.

rm(list=ls())
library(fredr)
library(fpp3)
library(patchwork)
library(kableExtra)

# Set Fred API key
fredr_set_key(Sys.getenv('fred_api_key'))

rgdp <- fredr(series_id = 'GDPC1',
              observation_start = as.Date('2005-01-01'),
              observation_end = as.Date('2025-10-01'))

# Convert time series into a tsibble
rgdp <- rgdp |>
  mutate(quarter = yearquarter(date)) |>
  select(quarter, value) |>
  as_tsibble(index = quarter)

# Initial Plot of the Real GDP time series
rgdp |> autoplot(value) +
  labs(title = "US Real GDP since 2005",
       x = "Time (Quarter)",
       y = "Real GDP")

  • The plot above of the Real GDP time series shows a clear upward trend over time, indicating that the time series is not mean stationary and will likely require differencing.
  • Below, I have used the unitroot_ndiffs function to confirm that the time series will require a first difference to achieve stationarity.
rgdp |> features(value, unitroot_ndiffs) |> 
  rename('Number of Differences' = 'ndiffs') |> 
  kable()
Number of Differences
1
  • Therefore, the optimal \(ARIMA(p,d,q)\) model will likely have \(d=1\).

  • Also, since the time series displays a clear upward trend, the optimal ARIMA model will probably include a drift component to allow for the forecasts to match this upward linear trend.

Now, I will use ACF and PACF charts to determine the optimal choice of the ARIMA parameters to provide the most accurate forecasts of US Real GDP.

# ACF Plots
rgdp |>
  ACF(value, lag_max = 20) |>
  autoplot() +
  labs(title = 'ACF Plot of US Real GDP',
       x = 'Lags (Quarter)',
       y = 'ACF')

  • The goal of using an ACF plot is to determine the order of the MA part of an Arima model. There will be a spike at the lag number on the ACF plot which corresponds to how many previous error terms to include in an ARIMA model.

  • According to the ACF plot above, there is no significant spike, but instead there is a gradual decay in the autocorrelation function as lag number increases.

  • Therefore, the optimal \(ARIMA(p,d,q)\) will likely have a value of 0 for \(q\), indicating that the forecasts will not account for past forecasting errors when generating new forecasts. This means that much of the forecasting signal is already captured in the Real GDP time series and not much noise is contained in external factors that could be influencing Real GDP.

Next, I have included a PACF plot below to determine the optimal value of \(p\) in the \(ARIMA(p,d,q)\) model.

# PACF Plot
rgdp |>
  PACF(value, lag_max = 20) |>
  autoplot() +
  labs(title = 'PACF Plot of US Real GDP',
       x = 'Lags (Quarter)',
       y = 'PACF')

  • The PACF plot helps us determine the number of past values of Real GDP to use in our ARIMA model.

  • According to the PACF plot above, there is a significant spike in the PACF at lag 1 and no other significant spikes.

  • This means that the value of p should be 1, meaning that the value of US real GDP in the previous quarter should be used to make forecasts.

After graphing the original time series and visualizing the ACF PACF plots, I have concluded that the optimal ARIMA model will be \(ARIMA(1,1,0)\). Forecasts generated using this ARIMA specification will difference the time series once, use past Real GDP observed values from one previous quarter, and no past forecasting errors to generate on step ahead forecasts.

Now, we will automatically fit an ARIMA model using the fpp3 package and confirm if the optimal ARIMA specification matches the \(ARIMA(1,1,0)\) specification that I determined above.

# Splitting into 80% for train and 20% for test
rgdp_train <- rgdp |> filter_index(~'2021 Q3')
rgdp_test <- rgdp |> filter_index('2021 Q4'~.)

# Fitting the ARIMA model
rgdp_fit <- rgdp_train |>
  model(
    arima = ARIMA(value)
  )

# Displaying the ARIMA model specification
rgdp_fit |> report()
Series: value 
Model: ARIMA(0,1,1) w/ drift 

Coefficients:
          ma1  constant
      -0.2849   86.6126
s.e.   0.1412   25.4190

sigma^2 estimated as 84890:  log likelihood=-467.2
AIC=940.39   AICc=940.78   BIC=946.96
  • According to the model report of the automatic ARIMA model, the optimal specification is an \(ARIMA(0,1,1)\) with a drift which is different from my conclusion from using the ACF and PACF charts.

  • This means that the automatic ARIMA model will use no past observed real GDP values to make forecasts, but will instead use past forecasting errors of one previous period to generate forecasts.

I have generated forecasts for the automatic ARIMA model below:

# Generating forecasts for automatic ARIMA model
rgdp_fc <- rgdp_fit |>
  forecast(h = nrow(rgdp_test))

# Generating accuracy metric table
rgdp_metrics <- accuracy(rgdp_fc, rgdp_test) |>
  select(-MASE, -RMSSE, -ACF1, -.type) |>
  rename('Model' = '.model')

# Graphing the forecasts
rgdp_fc |> autoplot(rgdp_train) +
  geom_line(data = rgdp_test, mapping = aes(y = value), linetype = 'dashed') +
  labs(title = 'Real GDP ARIMA(0,1,1) Forecasts',
       x = 'Time (Quarter)',
       y = 'Real GDP') 

rgdp_metrics |> kable()
Model ME RMSE MAE MPE MAPE
arima 600.9214 671.4701 600.9214 2.581523 2.581523
  • The forecasting graph shows that the ARIMA forecasts consistently underestimate the actual Real GDP values in the testing set period. This can be seen in the metric table where the Mean Error (ME) is about 600, confirming that the model underestimates the observed values since errors are calculated as actual - predicted.

  • More specifically, the ME is exactly the same as the Mean Absolute Error (MAE) and the Mean Percentage Error (MPE) is exactly the same as the Mean Absolute Percentage Error (MAPE). This happens because the ARIMA model always underestimates the observed Real GDP values and therefore causes the errors to always be positive.

  • The 95% confidence intervals are fairly narrow and stable indicating the certainty of the ARIMA model and the MAPE of only just above 2 indicates that the predictions from the ARIMA are fairly accurate as we can see in the graph where the ARIMA forecasts capture the general trend Real GDP.

Part II

Now, we experiment with using seasonal ARIMA and aus_arrivals time series from the fpp3 package. This time series contains quarterly data on international arrivals to Australia from several different countries.

I have loaded in the aus_arrivals time series below and have included a graph showing specifically international arrivals into Australia from New Zealand.

nz_aus <- aus_arrivals |> filter(Origin == 'NZ')

nz_aus |> autoplot() +
  labs(title = 'New Sealand Arrivals into Australia')
Plot variable not specified, automatically selected `.vars = Arrivals`

  • We can clearly see the seasonality in the data above where New Zealand arrivals into Australia typically peak during Quarter 3, corresponding to the summer months for both Australia and New Zealand.

  • In contrast, arrivals from New Zealand into Australia are usually the lowest during the winter months in Quarter 1.

We will now fit a seasonal ARIMA model using automatic selection in the fpp3 package and discuss the model parameters and coefficients.

nz_aus_fit <- nz_aus |> 
  model(
  sarima = ARIMA(Arrivals)
)

nz_aus_fit |> report()
Series: Arrivals 
Model: ARIMA(2,0,0)(1,1,2)[4] w/ drift 

Coefficients:
         ar1     ar2     sar1    sma1     sma2   constant
      0.5830  0.1772  -0.7022  0.2447  -0.5612  3091.1032
s.e.  0.0898  0.0901   0.1866  0.1736   0.0964   762.7256

sigma^2 estimated as 1.51e+08:  log likelihood=-1331.03
AIC=2676.07   AICc=2677.04   BIC=2695.75

Parameter and Coefficient Interpretations

  • As we can see from the model report above, the optimal ARIMA specification is a seasonal ARIMA model with 2 non-seasonal autoregressive terms, no 1st differencing, no non-seasonal forecasting errors, 1 seasonal autoregressive term, 1 seasonal difference, and 2 seasonal forecasting errors with a drift to capture the general upward trend in arrivals from New Zealand into Australia over time.

  • \(ar1=0.583\) and \(ar2=0.1772\)

    • Both non-seasonal autoregressive coefficients are positive which suggest that the time series is more persistent and won’t oscillate very much.

    • Looking at the time series, this is true since the general trend of the number of arrivals from New Zealand to Australia steadily increases over time.

  • \(sar1=-0.7022\)

    • This seasonal autoregressive coefficient coefficient is negative which suggests that a low value in a specific quarter in the current year will predict high values in that same quarter for the next year.

    • This seems counterintuitive to what we found in the time series graph where arrivals seem to always peak during the summer months and drop during the winter months.

    • However, this could reflect the general upward trend in arrivals. For example, arrivals in Quarter 3 of this year are likely higher than they were in Quarter 3 of the previous year, and lower than they will be in the next year.

  • \(sma1=0.2447\) and \(sma2=-0.5612\)

    • The positive seasonal moving average coefficient suggests that past seasonal forecasting errors in the previous season persist in the same direction.

    • However, the negative seasonal moving average coefficient suggests that the past seasonal forecasting error from two seasons ago reverse the shock in the next corresponding season. For example, positive shocks from two seasons ago will be corrected and the model will expect a low value in the current season.

  • \(constant=3,091\)

    • A positive value for the constant term indicates that the model forecasts will drift upwards which is exactly what we see in the forecast graph.

Now that we have discussed the seasonal ARIMA parameters and interpreted their coefficients, it’s time to generate forecasts and plot them.

nz_aus_fc <- nz_aus_fit |>
  forecast(h = 8)

nz_aus_fc |> autoplot(nz_aus) +
  labs(title = 'Australian Arrivals from New Zealand')

  • Above, we can see that the seasonal ARIMA forecasts are able to capture the seasonality of the Australian arrivals data very well and even capture the general upward trend quite well.

Part III

We will now compare white noise, random walk, and random walk with a drift time series. We will compare each of their equations to see how they are related mathematically and then compare how they look visually using graphs to plot out those time series.

Mathematical Equation Comparison

  • White Noise

    \[ y_t = \varepsilon_t \]

    • where \(y_t\) is the current value of the measured variable and \(\varepsilon_t\) is some random component.

    • The white noise model assumes that the value that \(y_t\) takes on is completely random and does not depend on any historical values of \(y\).

    • White noise models also have a mean of zero.

  • Random Walk

    \[ y_t=y_{t-1}+\varepsilon_t \]

    • where \(y_{t-1}\) is the observed value of \(y\) from the previous period.

    • Random walk models assume that the value that \(y_t\) takes on is the value of its past observed value from the most recent period plus some random component.

    • If you difference a random walk model (\(y_t-y_{t-1}\)) then you will obtain a white noise series.

    • Random walks will usually have long upward or downward trends and can change directly very suddenly.

  • Random Walk w/ Drift

    \[ y_t=c+y_{t-1}+\varepsilon_t \]

    • where \(c\) is a constant term that acts as the average between consecutive observations.

    • If \(c\) is positive, then the series will drift upwards and if \(c\) is negative, then the series will drift downwards.

    • Differencing a random walk with drift series will result in a white noise series with a non-zero mean if the constant \(c\neq0\).

Visual Comparison

Now that we have compared each of the equations of white noise, random walks, and random walks with a drift series and their properties, it’s now time to visualize them.

I have simulated data and graphed all three of the series on the same figure for easier comparison.

# Setting a seed
set.seed(123)

# White Noise Series
wn <- tsibble(
  time = 1:100,
  value = rnorm(100),
  index = time
)

wn_graph <- wn |> autoplot(value) +
  labs(title = 'White Noise Simulated Time Series',
       x = 'Time',
       y = 'Value')

# Random Walk Series
rw <- tsibble(
  time = 1:100,
  value = cumsum(rnorm(100)),
  index = time
)

rw_graph <- rw |> autoplot(value) +
  labs(title = 'Random Walk Simulated Time Series',
       x = 'Time',
       y = 'Value')

# Random Walk with Drift Time Series
rw_d <- tsibble(
  time = 1:100,
  value = cumsum(rnorm(100, mean = 0.25)),
  index = time
)

rw_d_graph <- rw_d |> autoplot(value) +
  labs(title = 'Random Walk With Drift Simulated Time Series',
       x = 'Time',
       y = 'Value')

# Graphing all the time series on same figure
(wn_graph + rw_graph) / rw_d_graph

Above are the graphs of the simulated white noise (top left), random walk (top right), and random walk with drift (bottom) time series. We will now dissect the characteristics of each simulated time series series.

Trend

  • White Noise: No consistent trend and the time series fluctuates greatly around zero.

  • Random Walk: The series has a general downward trend, but occasionally will trend upward randomly at times.

  • Random Walk w/ drift: Clear upward trend over time because the constant value is 0.25 which is positive.

Shock

  • White Noise: Shocks or random changes occur at almost every point in time in the series.

  • Random Walk: Random changes occur quite frequently, although they are not as drastic as for the white noise series.

  • Random Walk w/ drift: Shocks are relatively infrequent and mild as the series has a clear and consistent upward trend over time. There is some willingness in the series however, indicating some slight shocks.

Memory

  • White Noise: There is zero memory for white noise since the shocks occur at random at every point in time. A shock in the previous period has no influence on the value of the time series at the current period.

  • Random Walk: The shocks are fairly short-lived since the value of the current period does somewhat depend on the observed value at the previous period.

  • Random Walk w/ drift: This series has permanent memory since the general upward trend of the time series will go on forever because of the positive constant value of 0.25.

Stationarity

  • White Noise: The time series is stationary since the time series’ mean, variance, and covariance stay constant over time.

  • Random Walk: The variance and covariance of the time series appear to remain constant over time, however, the mean of the series seems to depend on the point in time. Therefore, it is not stationary.

  • Random Walk w/ drift: This series is definitely not stationary because of its clear upward trend, meaning that the mean of the series will change over time and not stay constant.

Forecast

# White noise forecast
wn_fit <- wn |>
  model(
    arima = ARIMA(value)
  )
wn_fc <- wn_fit |> forecast(h = 10)

wn_fc_graph <- wn_fc |>
  autoplot(wn) +
  labs(title = 'White Noise ARIMA Forecast',
       x = 'Time',
       y = 'Value')

# Random Walk forecast
rw_fit <- rw |>
  model(
    arima = ARIMA(value)
  )
rw_fc <- rw_fit |> forecast(h = 10)

rw_fc_graph <- rw_fc |>
  autoplot(rw) +
  labs(title = 'Random Walk ARIMA Forecast',
       x = 'Time',
       y = 'Value')

# Random Walk with Drift forecast
rw_d_fit <- rw_d |>
  model(
    arima = ARIMA(value)
  )
rw_d_fc <- rw_d_fit |> forecast(h = 10)

rw_d_fc_graph <- rw_d_fc |>
  autoplot(rw_d) +
  labs(title = 'Random Walk w/ Drift ARIMA Forecast',
       x = 'Time',
       y = 'Value')

# Plotting the forecasts together
(wn_fc_graph + rw_fc_graph) / rw_d_fc_graph

  • White Noise: Above, we can see that using an ARIMA model to forecast the white noise results in a flat and constant forecast because the values are always oscillating around the mean of zero.

  • Random Walk: The graph on the top right also shows that the ARIMA forecasts are constant and flat. Even though the random walk series has a general downward trend, because of the random nature of the series, this trend will not likely continue and the model predicts the future values to just be the mean value of the historical values.

  • Random Walk w/ drift: The graph on the bottom clearly shows that the ARIMA forecasts have an upward trend which reflect the increasing trend of the random walk time series with a drift.