DATA624_HW6_CameronGray

library(fpp3)
library(tidyverse)
library(fable)
library(latex2exp)

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

Since each of these graphs represent vastly different quantities of data we have large disparities in the dashed blue lines since they are dependent on the size of the time series. In addition the values get smaller with more data because the variance will decrease in the calculated acf due to having more data, this produces a smaller average over the course of all the lags. Since for all of these graphs the values all appear within the range of the dashed blue lines it is reasonable to assume that the data are all white noise.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical distances are determined by the length of the time series, specifically by the expression \(\pm\frac{1.96}{\sqrt{T}}\) where \(T\) is the length of the time series. Thus by increasing the length of the time series we have decreased the distance of the critical values. Since white noise is different they should be expected to be different because white noise should not be discernable from random changes.

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The first plot of just the closing price shows that there is an upward trend to the data which would disqualify it from being stationary. The ACF plot shows that for all the lags the acf is outside the bounds of the dashed blue lines, this indicates that there is trend to the data since it is not just white noise. Similarly for the PACF there are at least 4 different lag values with an pacf outside the blue bounds indicating that the data is not white noise and thus non-stationary.

amzn_close <- gafa_stock |>
  filter(Symbol == 'AMZN') |>
  select(Close) |>
  print()

## # A tsibble: 1,258 x 2 [!]
##    Close Date      
##    <dbl> <date>    
##  1  398. 2014-01-02
##  2  396. 2014-01-03
##  3  394. 2014-01-06
##  4  398. 2014-01-07
##  5  402. 2014-01-08
##  6  401. 2014-01-09
##  7  398. 2014-01-10
##  8  391. 2014-01-13
##  9  398. 2014-01-14
## 10  396. 2014-01-15
## # ℹ 1,248 more rows

amzn_close |>
  gg_tsdisplay(plot_type = "partial")

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

#Showing initial data
turkish_gdp <- global_economy |>
  filter(Country == 'Turkey') |> 
  select(GDP)

turkish_gdp |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Turkish GDP"))

#Applying Box-Cox transform
lambda <- turkish_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

bc_turkish_gdp <- turkish_gdp |>
  mutate(GDP = box_cox(turkish_gdp$GDP, lambda))

bc_turkish_gdp |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Turkish GDP w/ Box-Cox = ", as.character(lambda)))

#Testing Box-Cox transform
bc_turkish_gdp |>
  features(GDP, unitroot_kpss) |>
  print()

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      1.50        0.01

#Finding suggested difference

bc_turkish_gdp |>
  features(GDP, unitroot_ndiffs) |>
  print()

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

#Applying difference
diff_bc_turkish_gdp <- bc_turkish_gdp |> 
  mutate(GDP = difference(GDP)) 

diff_bc_turkish_gdp |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Turkish GDP w/ Box-Cox = ", as.character(lambda), " & Diff = 1"))

#Testing differenced data
diff_bc_turkish_gdp |>
  features(GDP, unitroot_kpss) |>
  print()

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0889         0.1

Accommodation takings in the state of Tasmania from aus_accommodation.

#Showing initial data
taz_take <- aus_accommodation |>
  filter(State == 'Tasmania') |>
  select(Takings)

taz_take |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle("Tasmania Takings")

#Applying Box-Cox transform
lambda <- taz_take |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)

bc_taz_take <- taz_take |>
  mutate(Takings = box_cox(taz_take$Takings, lambda))

bc_taz_take |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Tasmania Takings w/ Box-Cox = ", as.character(lambda)))

#Testing Box-Cox transform
bc_taz_take |>
  features(Takings, unitroot_kpss) |>
  print()

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      1.84        0.01

#Finding suggested difference

bc_taz_take |>
  features(Takings, unitroot_ndiffs) |>
  print()

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

#Applying difference
diff_bc_taz_take <- bc_taz_take |> 
  mutate(Takings = difference(Takings)) 

diff_bc_taz_take |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Tasmania Takings w/ Box-Cox = ", as.character(lambda), " & Diff = 1"))

#Testing differenced data
diff_bc_taz_take |>
  features(Takings, unitroot_kpss) |>
  print()

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.258         0.1

Monthly sales from souvenirs.

#Showing initial data
souvenirs |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle("Souvenirs")

#Applying Box-Cox transform
lambda <- souvenirs |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero)

bc_souvenirs <- souvenirs |>
  mutate(Sales = box_cox(souvenirs$Sales, lambda))

bc_souvenirs |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Tasmania Sales w/ Box-Cox = ", as.character(lambda)))

#Testing Box-Cox transform
bc_souvenirs |>
  features(Sales, unitroot_kpss) |>
  print()

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      1.79        0.01

#Finding suggested difference

bc_souvenirs |>
  features(Sales, unitroot_ndiffs) |>
  print()

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

#Applying difference
diff_bc_souvenirs <- bc_souvenirs |>
  mutate(Sales = difference(Sales))

diff_bc_souvenirs |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Tasmania Sales w/ Box-Cox = ", as.character(lambda), " & Diff = 1"))

#Testing differenced data
diff_bc_souvenirs |>
  features(Sales, unitroot_kpss) |>
  print()

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0631         0.1

9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) |>
  select(Turnover)

myseries |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle("Australian Turnover")

#Testing initial data
myseries |>
  features(Turnover, unitroot_kpss) |>
  print()

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      5.72        0.01

#Applying Box-Cox transform
lambda <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

bc_myseries <- myseries |>
  mutate(Turnover = box_cox(myseries$Turnover, lambda))

bc_myseries |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Australian Turnover w/ Box-Cox = ", as.character(lambda)))

#Testing Box-Cox transform
bc_myseries |>
  features(Turnover, unitroot_kpss) |>
  print()

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      5.43        0.01

#Finding suggested difference
bc_myseries |>
  features(Turnover, unitroot_ndiffs) |>
  print()

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

#Applying difference
diff_bc_myseries <- bc_myseries |>
  mutate(Turnover = difference(Turnover))

diff_bc_myseries |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Australian Turnover w/ Box-Cox = ", as.character(lambda), " & Diff = 1"))

#Testing differenced data
diff_bc_myseries |>
  features(Turnover, unitroot_kpss) |>
  print()

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0243         0.1

9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\)

set.seed(314156295)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

sim |> autoplot()

for(i in 2:100)
  y[i] <- 0.0*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 0$'))

for(i in 2:100)
  y[i] <- 0.2*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 0.2$'))

for(i in 2:100)
  y[i] <- 0.4*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 0.4$'))

sim |> autoplot() + ggtitle(TeX('$\\phi_1 = 0.6$'))

for(i in 2:100)
  y[i] <- 0.8*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 0.8$'))

for(i in 2:100)
  y[i] <- 1.0*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\phi_1 = 1.0$'))

Write your own code to generate data from an MA(1) model with\(\phi_1 = 0.6\) and \(\sigma^2 = 1\).

set.seed(314156295)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

sim |> autoplot()

for(i in 2:100)
  y[i] <- 0.0*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 0$'))

for(i in 2:100)
  y[i] <- 0.2*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 0.2$'))

for(i in 2:100)
  y[i] <- 0.4*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 0.4$'))

sim |> autoplot() + ggtitle(TeX('$\\theta_1 = 0.6$'))

for(i in 2:100)
  y[i] <- 0.8*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 0.8$'))

for(i in 2:100)
  y[i] <- 1.0*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |> autoplot() + ggtitle(TeX('$\\theta_1 = 1.0$'))

Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

set.seed(314156295)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
arma11 <- tsibble(idx = seq_len(100), y = y, index = idx)

Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

set.seed(314156295)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Graph the latter two series and compare them.

arma11 |> autoplot() + ggtitle(TeX('ARMA(1,1) $\\phi_1 = 0.6$, $\\theta_1 = 0.6$, $\\sigma^2 = 1$'))

ar2 |> autoplot() + ggtitle(TeX('AR(2) $\\phi_1 = -0.8$, $\\phi_2 = 0.3$, $\\sigma^2 = 1$'))

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

ARIMA(0,2,1) was selected and the residuals do indeed look like white noise with a normal distribution and the ACF being within the dashed bounds.

arima_air <- aus_airpassengers|>
  model(arima = ARIMA(Passengers)) |> 
  print()

## # A mable: 1 x 1
##            arima
##          <model>
## 1 <ARIMA(0,2,1)>

arima_air |>
  select(arima) |>
  gg_tsresiduals()

arima_air |>
  forecast(h=10) |>
  filter(.model=='arima') |>
  autoplot(aus_airpassengers)

Write the model in terms of the backshift operator. \[\begin{align*} (1-B)^2y_t &= c+(1+\theta_1B)\varepsilon_t \\ (1-2B+B^2)y_t &= c+(1+\theta_1B)\varepsilon_t \\ y_t-2By_t+B^2y_t &= c+\varepsilon_t+\theta_1B\varepsilon_t \\ y_t-2y_{t-1}+y_{t-2} &= c+\varepsilon_t+\theta_1\varepsilon_{t-1} \\ y_t &= 2y_{t-1}-y_{t-2} + c+\varepsilon_t+\theta_1\varepsilon_{t-1} \end{align*}\]
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

The slope of the forecast is slightly shallower and the error projection has a different shape but it remains largely the same.

arima_air <- aus_airpassengers|>
  model(arima = ARIMA(Passengers~ 1+pdq(0,1,0)))

arima_air |>
  forecast(h=10) |>
  filter(.model=='arima') |>
  autoplot(aus_airpassengers)

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

This model looks a lot like c but instead of being linear it has a bit of a wobble likely due to the increase in the moving average order. Removing the constant leads to an error in the ARIMA model, the constant is needed for this order selection.

arima_air <- aus_airpassengers|>
  model(arima = ARIMA(Passengers~ 1+pdq(2,1,2)))

arima_air |>
  forecast(h=10) |>
  filter(.model=='arima') |>
  autoplot(aus_airpassengers)

arima_air <- aus_airpassengers|>
  model(arima = ARIMA(Passengers~ pdq(2,1,2)))

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The model appears to have a bit more of an exponential shape to it rather than the linear models from a and c. The error bounds appear to be smaller as well but it is hard to tell if this is from the model itself or because the scale of the axes changed.

arima_air <- aus_airpassengers|>
  model(arima = ARIMA(Passengers~ 1+pdq(0,2,1)))

arima_air |>
  forecast(h=10) |>
  filter(.model=='arima') |>
  autoplot(aus_airpassengers)

9.8

For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data

us_gdp <- global_economy |>
  filter(Country == 'United States') |>
  select(GDP)

us_gdp |> autoplot()

lambda <- us_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

bc_us_gdp <- us_gdp |>
  mutate(GDP = box_cox(us_gdp$GDP, lambda))

bc_us_gdp |>
  gg_tsdisplay(plot_type = "partial") +
  ggtitle(paste("Australian Turnover w/ Box-Cox = ", as.character(lambda)))

fit a suitable ARIMA model to the transformed data using ARIMA()

arima_us_gdp <- bc_us_gdp |>
  model(arima = ARIMA()) |>
  print()

## # A mable: 1 x 1
##                     arima
##                   <model>
## 1 <ARIMA(1,1,0) w/ drift>

try some other plausible models by experimenting with the orders chosen

arima_us_gdp <- bc_us_gdp |>
  model(arima100 = ARIMA(GDP ~ 1+pdq(1,0,0)),
        arima011 = ARIMA(GDP ~ 1+pdq(0,1,1)),
        arima001 = ARIMA(GDP ~ 1+pdq(0,0,1)),
        arima210 = ARIMA(GDP ~ 1+pdq(2,1,0)),
        arima220 = ARIMA(GDP ~ 1+pdq(2,2,0)),
        arima120 = ARIMA(GDP ~ 1+pdq(1,2,0)),
        arima = ARIMA()) |> 
  print()

## # A mable: 1 x 7
##                 arima100                arima011               arima001
##                  <model>                 <model>                <model>
## 1 <ARIMA(1,0,0) w/ mean> <ARIMA(0,1,1) w/ drift> <ARIMA(0,0,1) w/ mean>
## # ℹ 4 more variables: arima210 <model>, arima220 <model>, arima120 <model>,
## #   arima <model>

arima_us_gdp |>
  glance() |>
  arrange(AICc) |>
  print()

## # A tibble: 7 × 8
##   .model     sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>       <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 arima       5479.   -325.  657.  657.  663. <cpl [1]> <cpl [0]>
## 2 arima220    6592.   -324.  656.  657.  665. <cpl [2]> <cpl [0]>
## 3 arima120    6898.   -326.  658.  658.  664. <cpl [1]> <cpl [0]>
## 4 arima011    5689.   -326.  659.  659.  665. <cpl [0]> <cpl [1]>
## 5 arima210    5580.   -325.  659.  659.  667. <cpl [2]> <cpl [0]>
## 6 arima100   57172.   -402.  810.  811.  817. <cpl [1]> <cpl [0]>
## 7 arima001 4263605.   -526. 1058. 1058. 1064. <cpl [0]> <cpl [1]>

choose what you think is the best model and check the residual diagnostics

I chose the recommended ARIMA model because it has the best values for AIC, AICc and BIC. The residuals looks good and the acf values are well within the bounds of the blue dashed lines.

arima_us_gdp |>
  select(arima) |>
  gg_tsresiduals()

e) produce forecasts of your fitted model. Do the forecasts look reasonable?

The forecast looks extremely reasonable, like a natural extension of the graph.

arima_us_gdp |>
  forecast(h=10) |>
  filter(.model=='arima') |>
  autoplot(bc_us_gdp)

f) compare the results with what you would obtain using ETS() (with no transformation).

It seems that ARIMA has much tighter error bounds for the forecast than the ETS forecast.

bc_us_gdp |>
  model(ETS(GDP)) |>
  forecast(h = 10) |>
  autoplot(bc_us_gdp)

DATA624_HW6_CameronGray

Cameron Gray

2025-02-24

9.1

9.2

9.3

9.5

9.6

9.7

9.8