library(fpp2) #in case of fpp3 package issues
library(fpp3)
library(forecast)
library(ggplot2)
library(urca)

Background

The purpose of this assignment was to explore the ARIMA exercises from Forecasting: Principles and Practice.


9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

  • Explain the differences among these figures. Do they all indicate that the data are white noise?
  • Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The differences between these figures are that the size of the spikes decreases from the first figure to the last, and that the location of the spikes differs between figures because the numbers are generated at random. Since there are no clear patterns in any of the figures, yes, they are all indicative of white noise.

The critical values are at different distances because, as the sample size T increases, the critical values shrink; they are approximately ±1.96 / sqrt(T). The autocorrelations differ because each figure is based on a different set of randomly generated numbers: in a small sample the chance correlations between observations are larger and more visible, while over a larger sample they average out toward zero.
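
As a quick illustration (a minimal base-R sketch, not part of the original answer), the approximate 95% bounds shrink as T grows:

#approximate 95% critical values for the ACF of white noise: +/- 1.96 / sqrt(T)
T_vals <- c(36, 360, 1000)
data.frame(T = T_vals, bound = 1.96 / sqrt(T_vals))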

Note: The autocorrelation function measures the correlation of a time series with itself, where observations in question are separated by k units in time. The autocorrelation function can be useful in identifying ARIMA models because we can use it to examine the significance of spikes at each lag and determine whether there are patterns.

Refs:

  1. https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/time-series/how-to/autocorrelation/interpret-the-results/autocorrelation-function-acf/

  2. https://stats.stackexchange.com/questions/185425/how-to-determine-the-critical-values-of-acf

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

#daily closing price, ACF, and PACF for Amazon stock
gafa_stock %>%
    filter(Symbol == 'AMZN') %>%
    gg_tsdisplay(Close, plot_type = 'partial')

A stationary series has no predictable patterns in the long term: no trend and no seasonality. From the plots above we observe:

  • Amazon’s closing price follows a clear upward trend in the daily closing price plot,
  • the ACF shows a large, positive r1 with autocorrelations that decrease only slowly, and
  • the PACF has a large initial spike, highlighting the strong correlation between consecutive observations.

The case for differencing, thus, is a strong one. Differencing can help stabilize the mean of the time series by removing changes in level, thereby reducing or eliminating trend and seasonality.
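
As a rough check (a sketch using tsibble’s difference(); not part of the original answer), we can difference the closing price and re-examine the ACF and PACF; the trend should be removed, though the variance of daily changes may still vary over time:

#sketch: difference Amazon's closing price and re-check the ACF and PACF
gafa_stock %>%
    filter(Symbol == 'AMZN') %>%
    mutate(diff_close = difference(Close)) %>%
    gg_tsdisplay(diff_close, plot_type = 'partial')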

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

  • Turkish GDP from global_economy.
  • Accommodation takings in the state of Tasmania from aus_accommodation.
  • Monthly sales from souvenirs.

We visit the untransformed and transformed version of Turkish GDP data for comparison:

#filter data for Turkish economy
turkey_econ <- global_economy %>%
    filter(Country == 'Turkey')

#Plot un-transformed GDP
turkey_econ %>% autoplot(GDP)

#Transformed Turkish economy GDP
##calculate lambda
#BoxCox.lambda(turkey_econ$GDP) #0.1571804

##apply BoxCox and plot
turkey_econ %>%
    autoplot(BoxCox(GDP, BoxCox.lambda(turkey_econ$GDP)))

The Box-Cox transformation, with a lambda of 0.1571804, reduces the variation in the data (i.e., the GDP spike from roughly 2000-2010) and results in a more linear relationship between GDP and Year.

With regard to differencing, we use the ndiffs() function to determine how many differences are required to obtain stationary data:

ndiffs(turkey_econ$GDP)
## [1] 2

Based on the output of this function we determine that 2 differences are required to obtain stationary data.
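
As a cross-check (a sketch whose output is not shown here; the transformed series may call for a different number of differences than the raw series), the fpp3-native unit root feature can be applied to the Box-Cox transformed GDP:

#sketch: differencing requirement for the transformed Turkish GDP
turkey_econ %>%
    mutate(bc_gdp = BoxCox(GDP, 0.1571804)) %>%
    features(bc_gdp, unitroot_ndiffs)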

Next, we visit the untransformed and transformed version of Tasmanian accommodation data for comparison:

##filter data for Tasmanian accommodation
tasmanian_accom <- aus_accommodation %>%
    filter(State == 'Tasmania')

#simplify dataset and plot
#tasmanian_takings <- subset(tasmanian_accom, select = -c(State,Occupancy,CPI) )
tasmanian_accom %>% autoplot(Takings)

##calculate lambda
#BoxCox.lambda(tasmanian_takings$Takings) #-0.005076712

##apply BoxCox and plot
tasmanian_accom %>%
    autoplot(BoxCox(Takings, BoxCox.lambda(tasmanian_accom$Takings)))

The Box-Cox transformation, with a lambda of -0.005076712 (essentially a log transformation), reduces the variation in the data and results in a more linear relationship between Takings and Quarter.

With regard to differencing, we use the ndiffs() function to determine how many differences are required to obtain stationary data:

ndiffs(tasmanian_accom$Takings)
## [1] 1

Based on the output of this function we determine that 1 difference is required to obtain stationary data.
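
Since Takings is quarterly, it may also be worth checking whether a seasonal difference is needed. A sketch (output not shown) using the feasts unit root features:

#sketch: check whether a seasonal difference is also warranted
tasmanian_accom %>%
    features(Takings, list(unitroot_nsdiffs, unitroot_ndiffs))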

Next, we visit the untransformed and transformed version of monthly sales from souvenirs for comparison:

#untransformed plot
souvenirs %>% autoplot(Sales)

##calculate lambda
#BoxCox.lambda(souvenirs$Sales) #-0.2444328

##apply BoxCox and plot
souvenirs %>%
    autoplot(BoxCox(Sales, BoxCox.lambda(souvenirs$Sales)))

The Box-Cox transformation with lambda -0.2444328 does not appear to have substantially reduced the variation in the souvenir sales data. While there is a slight upward trend, the seasonality is what’s most pronounced in this series.

With regard to differencing, we use the ndiffs() function to determine how many differences are required to obtain stationary data:

ndiffs(souvenirs$Sales)
## [1] 1

Based on the output of this function we determine that 1 difference is required to obtain stationary data.
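
As an eyeball check (a sketch; the lambda value is the one reported above), we can view the transformed and once-differenced souvenir sales alongside their ACF and PACF:

#sketch: transformed and differenced souvenir sales
souvenirs %>%
    gg_tsdisplay(difference(BoxCox(Sales, -0.2444328)), plot_type = 'partial')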

9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

First, we revisit our retail data series:

set.seed(333)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

#autoplot(myseries)

myseries %>%
    gg_tsdisplay(Turnover, plot_type = 'partial')

A stationary series has no predictable patterns in the long term: no trend and no seasonality. Our data shows a fluctuating trend and clear seasonality, and numerous lagged values spike well above the significance threshold. Our aim with differencing is to stabilize the mean of the time series by removing changes in level, thereby reducing or eliminating trend and seasonality.

We determine the number of differences:

ndiffs(myseries$Turnover)
## [1] 1

From here we proceed to differencing ONE time and determine if our data is stationary:

myseries$Turnover %>% 
    diff(differences = 1) %>% 
    ur.kpss() %>% 
    summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0408 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739

Applying a KPSS unit root test helps determine whether the time series is stationary or non-stationary due to a unit root (i.e., a root equal to one).

Since the null hypothesis of the KPSS test is stationarity, our small test statistic (well below even the 10% critical value of 0.347) is indicative of stationarity. Our retail data required only one difference to reach stationarity.
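
As a visual confirmation (a sketch, not part of the original answer), the once-differenced turnover can be inspected directly; note that for monthly retail data a seasonal difference is often considered as well (see unitroot_nsdiffs or forecast::nsdiffs):

#sketch: plot the once-differenced turnover and its ACF/PACF
myseries %>%
    gg_tsdisplay(difference(Turnover), plot_type = 'partial')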

9.6

Simulate and plot some data from simple ARIMA models.

  1. Use the following R code to generate data from an AR(1) model with ϕ = 0.6 and σ^2 = 1. The process starts with y1 = 0.
  2. Produce a time plot for the series. How does the plot change as you change ϕ?
  3. Write your own code to generate data from an MA(1) model with θ = 0.6 and σ^2 = 1.
  4. Produce a time plot for the series. How does the plot change as you change θ?
  5. Generate data from an ARMA(1,1) model with ϕ = 0.6, θ = 0.6 and σ^2 = 1.
  6. Generate data from an AR(2) model with ϕ1 = -0.8, ϕ2 = 0.3 and σ^2 = 1. (Note that these parameters will give a non-stationary series.)
  7. Graph the latter two series and compare them.

First, we produce a time series with ϕ = 0.6, then with ϕ = 0 and ϕ = 1.0, to observe the effects of decreasing and increasing ϕ:

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

#ϕ=0.6, σ^2 = 1, y=0
sim %>% autoplot(y)

#decrease ϕ --> ϕ=0
for(i in 2:100)
  y[i] <- 0*y[i-1] + e[i]

sim_lower <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_lower %>% autoplot(y)

#increase ϕ --> ϕ=1
for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]
sim_higher <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_higher %>% autoplot(y)

Reducing ϕ reduces the magnitude of the series and pushes it toward the shape of white noise, while increasing ϕ increases the magnitude and pushes it toward the shape of a random walk (as mentioned in Section 9.3 of the course text).
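
To make it easier to try other values of ϕ, the simulation can be wrapped in a small helper (a hypothetical function, sim_ar1(), not used elsewhere in this assignment):

#hypothetical helper: simulate an AR(1) series for a given phi
sim_ar1 <- function(phi, n = 100) {
  e <- rnorm(n)
  y <- numeric(n)
  for (i in 2:n)
    y[i] <- phi * y[i - 1] + e[i]
  tsibble(idx = seq_len(n), y = y, index = idx)
}

#e.g. an intermediate value of phi
sim_ar1(0.3) %>% autoplot(y)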

Next, we produce an MA(1) model with θ = 0.6 then 0 and 0.9 (since we want |θ| < 1):

#MA(1) model with θ = 0.6 and σ ^ 2 = 1

for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i] #1st order moving average model 

sim_ma1 <- tsibble(idx = seq_len(100), y = y, index = idx)

#θ=0.6, σ^2 = 1
sim_ma1 %>% autoplot(y)

#decrease θ --> θ=0
for(i in 2:100)
  y[i] <- 0*e[i-1] + e[i]

sim_ma1_lower <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_ma1_lower %>% autoplot(y)

#increase θ --> θ=0.9
for(i in 2:100)
  y[i] <- 0.9*e[i-1] + e[i]

sim_ma1_higher <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_ma1_higher %>% autoplot(y)

Reducing θ reduces the magnitude and increases the apparent frequency of spikes (with θ = 0 the series is pure white noise and only the current error matters), while increasing θ to 0.9 has no dramatic effect on the magnitude and the plot more closely resembles our original MA(1) plot.

Next, we generate data for an ARMA(1,1) model with ϕ = 0.6, θ = 0.6 and σ^2 = 1, and an AR(2) model with ϕ1 = -0.8, ϕ2 = 0.3 and σ^2 = 1:

#ARMA(1,1) model with ϕ = 0.6, θ = 0.6 and σ^2 = 1 
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i] #combine AR-MA models

sim_arma <- tsibble(idx = seq_len(100), y = y, index = idx)

#AR(2) model with ϕ1 = -0.8, ϕ2 = 0.3 and σ^2 = 1
for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i] #2nd order autoregressive model

sim_ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Finally, we graph and compare the latter two series:

sim_arma %>%
    gg_tsdisplay(y, plot_type = 'partial')

sim_ar2 %>%
    gg_tsdisplay(y, plot_type = 'partial')

What we see above are two dramatically different plots: the ARMA(1,1) series is stationary, while the AR(2) series (as the exercise notes) is non-stationary.

  • The ARMA(1,1) series shows no trend or seasonality. Its ACF has a large, positive initial r1 that then decreases and fluctuates between positive and (mostly) negative values, and its PACF has a large initial spike highlighting the strong correlation between consecutive observations.
  • The AR(2) series oscillates between positive and negative values with an amplitude that grows roughly exponentially at later indices. Its ACF values alternate in sign and decrease until they fall near the threshold, and its PACF shows one large initial spike and little of note afterwards.

I believe the point here may have been to show the need for differencing and the purpose of an ARIMA model. From Section 9.5 of the course text: “If we combine differencing with autoregression and a moving average model, we obtain a non-seasonal ARIMA model”.

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

  1. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
  2. Write the model in terms of the backshift operator.
  3. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
  4. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
  5. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

We apply the ARIMA() function to find an appropriate model and forecast for the next 10 periods:

fit <- aus_airpassengers %>%
  model(ARIMA(Passengers))

#what model was chosen
report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65
#check that the residuals look like white noise
#(checkresiduals() is from the forecast package and does not work on a fable
#model table; gg_tsresiduals() from feasts is the fpp3 equivalent)
fit %>% gg_tsresiduals()

#plot forecast for next 10 periods
fit %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "Millions of Passengers", title = "Australian Air Travel")

An ARIMA(0,2,1) model was selected and we observe a forecast of over 90 million by 2025. Little could anyone have predicted the effects of COVID …
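
To complement the residual plots, a portmanteau test can also be applied (a sketch following the same pattern used in 9.8 below; dof = 1 matches the single MA coefficient):

#sketch: Ljung-Box test on the residuals; a large p-value suggests white noise
augment(fit) %>%
  features(.innov, ljung_box, lag = 10, dof = 1)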

In terms of the backshift operator, the model is: (1 - B)^2 y_t = (1 - 0.8963B) ε_t.

From here we proceed to plot forecasts from an ARIMA(0,1,0) model with drift:

fit2 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0))) #make the drift constant explicit

#plot forecast for next 10 periods
fit2 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "Millions of Passengers", title = "Australian Air Travel")

It appears that our second forecast (ARIMA(0,1,0) with drift v. the initial ARIMA(0,2,1)) has a more gradual slope. The drift term projects the average historical rise over run as a straight line, whereas the twice-differenced model lets the forecast slope reflect the more recent, steeper increases.

Next, we plot forecasts from an ARIMA(2,1,2) model with drift:

fit3 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

report(fit3)
## Series: Passengers 
## Model: NULL model 
## NULL model
#plot forecast for next 10 periods
fit3 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "Millions of Passengers", title = "Australian Air Travel")

The ARIMA(2,1,2) specification returned a NULL model, so the output plot includes no forecasts. This may be due to how the model was specified (the drift/constant was left to the automatic search) or to the estimation itself failing, a possible shortcoming of automated techniques.
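
One possible workaround (a sketch, not verified here) is to make the drift constant explicit rather than leaving it to the automatic search; exercise part d then also asks what happens when the constant is removed (0 + pdq(2,1,2)), which may fail to estimate at all:

#sketch (unverified): ARIMA(2,1,2) with an explicit drift constant
fit3_drift <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

fit3_drift %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "Millions of Passengers", title = "Australian Air Travel")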

As for the final request, note that an ARIMA(0,2,1) model, without a constant, was the model selected automatically in part a, and we can refer back to that section for the plot discussion. Part e, however, asks for an ARIMA(0,2,1) model with a constant; including a constant when d = 2 induces a quadratic (or higher-order polynomial) trend, which fable warns against (see the sketch below).
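
For completeness, a sketch (unverified) of the ARIMA(0,2,1)-with-constant fit that part e asks for; because d = 2, fable is expected to warn that the constant induces a quadratic trend:

#sketch (unverified): ARIMA(0,2,1) with a constant
fit4 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

fit4 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "Millions of Passengers", title = "Australian Air Travel")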

9.8

For the United States GDP series (from global_economy):

  1. if necessary, find a suitable Box-Cox transformation for the data;
  2. fit a suitable ARIMA model to the transformed data using ARIMA();
  3. try some other plausible models by experimenting with the orders chosen;
  4. choose what you think is the best model and check the residual diagnostics;
  5. produce forecasts of your fitted model. Do the forecasts look reasonable?
  6. compare the results with what you would obtain using ETS() (with no transformation).

We start by filtering for American economic data and visiting the corresponding time series plot:

us_econ <- global_economy %>%
    filter(Code == "USA")

us_econ %>%
    autoplot(GDP) +
    labs(title="Un-transformed American GDP",
         y="GDP")

Although an upward trend is apparent, the data are annual so there is no seasonality, and the variation does not appear to grow with the level of the series, so there is no obvious need for a Box-Cox transformation.
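
As a quick sanity check (a sketch; the Guerrero feature from feasts selects a lambda automatically, and a value near 1 would support skipping the transformation):

#sketch: Guerrero-selected Box-Cox lambda for US GDP
us_econ %>%
    features(GDP, features = guerrero)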

We proceed to apply an ARIMA model:

fit <- us_econ %>%
  model(ARIMA(GDP))

#what model was chosen
report(fit)
## Series: GDP 
## Model: ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.4206  -0.3048
## s.e.   0.1197   0.1078
## 
## sigma^2 estimated as 2.615e+22:  log likelihood=-1524.08
## AIC=3054.15   AICc=3054.61   BIC=3060.23

Our first model is an ARIMA(0,2,2) model:

  • the autoregressive part (p) = 0
  • the number of differences taken (d) = 2
  • the order of the moving average part (q) = 2
  • in terms of the backshift operator, the model is: (1 - B)^2 y_t = (1 - 0.4206B - 0.3048B^2) ε_t
  • the AICc = 3054.61. This value is not meaningful on its own; AICc is only useful for comparing models of the same class fitted with the same amount of differencing, which is exactly how we use it below. Since the Hyndman-Khandakar algorithm optimizes this criterion, the automatic choice is more-likely-than-not a strong fit.

From here we vary the orders chosen to see if there’s a positive impact on AICc. We’ll ultimately use the AICc to determine which model holds the strongest predictive potential.

To start, we check the optimal number of differences:

ndiffs(us_econ$GDP)
## [1] 2

It’s 2 and with this in mind, we hold d = 2 steady as we vary the order of p (the autoregressive portion):

#vary the order of the autoregressive portion
##p = 1
fit_p1 <- us_econ %>%
  model(ARIMA(GDP ~ pdq(1,2,2)))

#what model was chosen

report(fit_p1)
## Series: GDP 
## Model: ARIMA(1,2,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.2053  -0.5912  -0.1928
## s.e.  0.3008   0.2886   0.2102
## 
## sigma^2 estimated as 2.646e+22:  log likelihood=-1523.86
## AIC=3055.72   AICc=3056.51   BIC=3063.82
##p = 2
fit_p2 <- us_econ %>%
  model(ARIMA(GDP ~ pdq(2,2,2)))

#what model was chosen

report(fit_p2)
## Series: GDP 
## Model: ARIMA(2,2,2) 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2
##       1.3764  -0.4780  -1.9659  1.0000
## s.e.  0.1216   0.1354   0.0723  0.0719
## 
## sigma^2 estimated as 2.283e+22:  log likelihood=-1521.14
## AIC=3052.27   AICc=3053.47   BIC=3062.4

From above we observe that:

  • p=1 produces a weaker model, higher AICc and
  • p=2 produces a stronger model, lower AICc.

Proceeding with d=2 and p=2, we vary the q variable (the moving average portion):

#vary the order of the moving average portion
##q = 0
fit_q0 <- us_econ %>%
  model(ARIMA(GDP ~ pdq(2,2,0)))

report(fit_q0)
## Series: GDP 
## Model: ARIMA(2,2,0) 
## 
## Coefficients:
##           ar1      ar2
##       -0.2088  -0.2059
## s.e.   0.1320   0.1317
## 
## sigma^2 estimated as 2.984e+22:  log likelihood=-1527.51
## AIC=3061.02   AICc=3061.48   BIC=3067.09
##q = 1
fit_q1 <- us_econ %>%
  model(ARIMA(GDP ~ pdq(2,2,1)))

report(fit_q1)
## Series: GDP 
## Model: ARIMA(2,2,1) 
## 
## Coefficients:
##          ar1      ar2      ma1
##       0.4321  -0.1605  -0.8028
## s.e.  0.1537   0.1405   0.0908
## 
## sigma^2 estimated as 2.619e+22:  log likelihood=-1523.61
## AIC=3055.22   AICc=3056   BIC=3063.32

From above we observe that reducing our q value to 0 or 1 produces weaker models with higher AICc values.

As such, we elect the ARIMA(2,2,2) model as the model with the greatest predictive potential and proceed to checking that the residuals look like white noise:

fit_p2 %>% gg_tsresiduals()

augment(fit_p2) %>%
  features(.innov, ljung_box, lag = 10, dof = 3)
## # A tibble: 1 x 4
##   Country       .model                    lb_stat lb_pvalue
##   <fct>         <chr>                       <dbl>     <dbl>
## 1 United States ARIMA(GDP ~ pdq(2, 2, 2))    10.7     0.155

The ACF plot of the residuals from the ARIMA(2,2,2) model shows that all autocorrelations are within the threshold limits, indicating behavior like white noise. Additionally, the portmanteau test returns a relatively large p-value (>0.05 threshold) also suggesting that the residuals are white noise.

Thus, it looks like we’ve got a favorable model and so we make our forecast:

#plot forecast for next 10 periods
fit_p2 %>% forecast(h=10) %>%
  autoplot(us_econ) +
  labs(y = "GDP", title = "American GDP")

Next, we compare to ETS:

#BELOW CHUNK - adapted from course text - DID NOT WORK
# us_econ %>%
#   slice(-n()) %>%
#   stretch_tsibble(.init = 10) %>%
#   model(
#     ETS(GDP),
#     ARIMA(GDP ~ pdq(2,2,2))
#   ) %>%
#   forecast(h = 1) %>%
#   accuracy(us_econ) %>%
#   select(.model, RMSE:MAPE)

#ets_model <- us_econ %>% 
#    model(ETS(GDP))

ets_model <- us_econ$GDP %>%
    ets()

ets_model
## ETS(M,A,N) 
## 
## Call:
##  ets(y = .) 
## 
##   Smoothing parameters:
##     alpha = 0.9991 
##     beta  = 0.5012 
## 
##   Initial states:
##     l = 448093333333.703 
##     b = 64917355686.8708 
## 
##   sigma:  0.026
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

I’d initially tried to compare the ETS model to our ARIMA(2,2,2) model by adapting the code for the course text but that hadn’t worked for me. As an adaptation, I built the ETS model and checked the AICc as a means of measuring predictive promise.

After producing our ETS model, we observe that our ARIMA(2,2,2) model has a far lower AICc. Strictly speaking, AICc values are not comparable between ARIMA and ETS models (they belong to different model classes, and the ARIMA model is estimated on differenced data), so this comparison should be treated with caution; a fairer comparison would use a hold-out set or time series cross-validation, as sketched below. With that caveat, the ARIMA(2,2,2) model still looks like the stronger choice here.
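
A fairer head-to-head (a sketch, not run here; the cut-off year and horizon are arbitrary choices) would hold back the last few years of data and compare forecast accuracy on the same scale:

#sketch (unverified): compare ETS and ARIMA(2,2,2) on a hold-out set
train <- us_econ %>% filter(Year <= 2010)

train %>%
  model(ets = ETS(GDP),
        arima = ARIMA(GDP ~ pdq(2,2,2))) %>%
  forecast(h = "7 years") %>%
  accuracy(us_econ) %>%
  select(.model, RMSE, MAE, MAPE)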