Forecasting ETS and ARIMA 2.0

Introduction

These are exercises on chapter 8 and 9 from “Forecasting Principles and Practice”

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer

All the data is white noise since the ACF (autocorrelation) chart shows nothing falls outside the blue limits. When there are fewer observations, we can notice higher autocorrelation.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer

As stated before, as N (number of observations) increases the critical value decreases in the equation.

knitr::include_graphics("9.1.jpg")

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

Answer

By just plotting the chart we can notice there is a clear upward trend. At first glance we can say it is not stationary, and therefore it must be differenced. ACF shows this with all the lines showing the high degree of autocorrelation, and that these changes (differences) are not centered around a mean — there is not a constant mean.

#Bring in data
Amzn <- 
  gafa_stock %>%
  filter(Symbol == "AMZN")
 
Amzn %>%
  autoplot(Close) +
  labs(y = "Close Price"
       , x="Date"
       , title = "Amazon Stock daily closing prices") + 
  theme_ipsum_ps()

# ACF

Amzn %>% 
  ACF(Close) %>%
    autoplot() +
    labs(y = "ACF", 
         title = "ACF Amazon close price") +
  theme_ipsum_es()

# PACF

Amzn %>% 
  PACF(Close) %>%
    autoplot() +
    labs(y = "PACF", 
         title = "PACF Amazon close price") +
    theme_ipsum_es()

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

9.3.a

Turkish GDP from global_economy.

Answer

Data is not stationary. The tests show that differencing is needed, and when transformed and differenced, then we can go ahead and model.

# explore data
tky_gdp <- 
  global_economy %>%
  filter(Country == "Turkey") %>%
  mutate(gdp_mill = GDP/1000000)

tky_gdp %>%
  autoplot(gdp_mill) +
  labs(y = "GDP in Millions"
       , x = "Date"
       , title = "Turkey GDP"
       , subtitle = "from 1960 - 2017") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  scale_y_continuous(labels = scales::dollar)

# a. box-cos transformation

lambda_tky <- tky_gdp %>%
  features(gdp_mill, features = guerrero) %>%
  pull(lambda_guerrero)

# plot transformation

tky_gdp %>%
autoplot(box_cox(gdp_mill,lambda_tky)) +
  labs(y = "GDP transformed"
       ,x = "Date"
       ,title = latex2exp::TeX(paste0("Transformed GDP with $\\lambda$ = ",round(lambda_tky,4)))) + 
  theme_ipsum_es() + 
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  scale_y_continuous(labels = scales::dollar)

######### check if it is actually stationary or not (Consider Dickey Fuller)

#unit root test

#differences
tky_gdp %>%
  features(box_cox(gdp_mill, lambda_tky), unitroot_ndiffs) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

Country	ndiffs
Turkey	1

#Kpss
tky_gdp %>%
  features(box_cox(gdp_mill, lambda_tky), unitroot_kpss) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

Country	kpss_stat	kpss_pvalue
Turkey	1.498625	0.01

#plot differences with box cox --checking residuals transformed vs. non-transformed

tky_gdp %>%
  gg_tsdisplay(gdp_mill, plot_type = 'partial') +
  labs(y = "GDP in Millions"
       , x = "Date"
       , title = "Turkey GDP"
       , subtitle = "from 1960 - 2017")

tky_gdp %>%
  gg_tsdisplay(difference(box_cox(gdp_mill, lambda_tky)), plot_type = 'partial') +
  labs(title = TeX(paste0("Differenced Turkish GDP with $\\lambda$ = ", round(lambda_tky,2))))

9.3.b

Accomodation takings in the state of Tasmania from aus_accommodation.

Answer

At first, it looks highly seasonal, with a trend and cycle. Because of this, we might have to transform this to take the trend out and make it stationary. When performing the box-cox transformation, it reduced the variance towards the lat observations and made it more uniform. However, still more transformation could be necessary.

However, given the heavy seasonality, it looks like every 4 quarters, there could be high autocorrelation. When differencing every 4, we can remove the seasonality. According to ACF the last 3 observations could help provide a trend, once it is stationary.

#explore data
tas_acc <- aus_accommodation %>%
  filter(State == "Tasmania")

#plot data
tas_acc %>%
  autoplot(Takings) +
  theme_ipsum_rc() +
  labs(y = "Occupancy in % of rooms"
       , x = "Date"
       , title = "Accomodation Takings in Tasmania"
       , subtitle = "1998 - 2016, Quarterly") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  scale_y_continuous(labels = scales::number)

#box-cox transformation

#find lambda (not nemo)

tas_lambda <- 
  tas_acc %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

tas_acc %>%
  autoplot(box_cox(Takings, tas_lambda)) +
  labs(y = "Occupancy in % of rooms"
       , x = "Date"
       , title = "Accomodation Takings in Tasmania"
       , subtitle = TeX(paste0("Transformed Tasmanian takings with $\\lambda$ = ",round(tas_lambda,4)))) +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  scale_y_continuous(labels = scales::number)

#perform stationary unit root tests

tas_acc %>%
  mutate(tas_log = log(Takings)) %>%
  features(tas_log, unitroot_nsdiffs)

## # A tibble: 1 x 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

tas_acc %>%
  mutate(tas_log = log(Takings)) %>%
  features(tas_log, unitroot_kpss)

## # A tibble: 1 x 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania      1.84        0.01

# back to residuals

tas_acc %>%
  gg_tsdisplay(Takings, plot_type = 'partial') +
  labs(title = TeX(paste0("Non-transformed Tasmania Accomodation Takings")))

tas_acc %>%
  gg_tsdisplay(difference(box_cox(Takings, tas_lambda)), plot_type = 'partial') + # played with 1,2,4
  labs(title = TeX(paste0("Differenced Tasmania Accomodation Takings with $\\lambda$ = ",
         round(tas_lambda,2))))

tas_acc %>%
  gg_tsdisplay(difference(box_cox(Takings, tas_lambda), 4), plot_type = 'partial') + # played with 1,2,4
  labs(title = TeX(paste0("Differenced Tasmania Accomodation Takings with $\\lambda$ = ",
         round(tas_lambda,2))))

9.3.c

Monthly sales from souvenirs.

Answer

Another dataseries that is highly seasonal and with a trend. I performed similar transformations to the previous two questions and tried something similar but at 12 months of differencing from the box-cox transformation since it is monthly data and it looks, from the ACF, highly autocorrelated at 12. The lambda value was 0, exactly, so we need to do a log transformation of the data.

#explore data
souv <- souvenirs

knitr::kable(head(souv),"html")

Month	Sales
1987 Jan	1664.81
1987 Feb	2397.53
1987 Mar	2840.71
1987 Apr	3547.29
1987 May	3752.96
1987 Jun	3714.74

souv %>%
  autoplot(Sales) +
  labs(y = "Sales in AUD"
       , x = "Date"
       , title = "Souvenir Sales"
       , subtitle = "Queensland, Australa") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  scale_y_continuous(labels = scales::dollar)

souv %>%
  gg_tsdisplay(Sales, plot_type = 'partial') +
  labs(title = TeX(paste0("Non-transformed Souvenir Sales")))

# box-cox

souv_lambda <-
  souv %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souv %>%
  features(box_cox(Sales, souv_lambda), unitroot_nsdiffs) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

nsdiffs
1

souv %>%
  features(box_cox(Sales, souv_lambda), unitroot_kpss) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

kpss_stat	kpss_pvalue
1.790425	0.01

souv %>%
  gg_tsdisplay(difference(box_cox(Sales, souv_lambda), 12), plot_type = 'partial') + # played with 1,2,4
  labs(title = TeX(paste0("Differenced Souvenir Sales in Queensland $\\lambda$ = ",
         round(souv_lambda,2))))

9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

Answer

Again, the data is seasonal and with a trend. After tranforming the data with the Box-Cox method, we find that a log transformation could be in the cards but not forcefully necessary.

# return

set.seed(54321)
turnover_aus <- 
  aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

autoplot(turnover_aus,Turnover) +
  labs(title = "South Australia retail turnover by Month") +
  labs(y = "$Million AUD"
       , x = "Date"
       , title = "South Australia retail turnover by Month") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  scale_y_continuous(labels = scales::dollar)

turnover_lambda <-
  turnover_aus %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero )

paste0("lambda = ", round(turnover_lambda, 4)) %>%
  kbl() %>%
  kable_material(c("striped","hover", full_width = T))

x
lambda = -0.1028

# log

turnover_log <-
turnover_aus %>%
  mutate(turn_aus_log = log(Turnover))

#head(turnover_log)

turnover_log %>%
  gg_tsdisplay(turn_aus_log) + 
  labs(title = "Log transformation")

9.6

9.6.a

Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0

# borrow from Douglas

set.seed(13579)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

9.6.b

Produce a time plot for the series. How does the plot change as you change ϕ1?

Answer

Data is clearly stationary, meaning, no trend is clear. Can perform tests, but one can see it. Some seasonality could be happening, but more tests should be performed.

sim %>% autoplot(y) +
  labs(y = "Random"
       , x = "Index Time"
       , title = " AR (1) Model with phi 0.6 and sigma sq 1") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  scale_y_continuous(labels = scales::number)

9.6.c

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

Answer

This results in another stationary model, with potential cycles, but no clear seasonality. Good for modeling moving averages without a trend, testing lambda and possibly using logs.

set.seed(24680)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
sim_theta <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_theta %>% autoplot(y) + 
  labs(title = "MA(1) sim with theta = 0.6") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  geom_smooth(formula = y ~ x, method = "lm", color = "blue", size = 0.3) +
  scale_y_continuous(labels = scales::number)

9.6.d

Produce a time plot for the series. How does the plot change as you change θ1?

Answer

This change made the trend flatter. Mathematically a smaller theta suggests larger increases in spikes, but the changes in the plots suggests the effects take place over time assuming a trend. However, this means that these errors are closer to the mean, if this is constant.

set.seed(13579)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1.0*e[i-1] + e[i]
sim_theta1 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_theta1 %>% autoplot(y) + 
  labs(title = "MA(1) sim with theta = 1.0") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  geom_smooth(formula = y ~ x, method = "lm", color = "blue", size = 0.3) +
  scale_y_continuous(labels = scales::number)

9.6.e

Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6, and σ2=1.

Answer

set.seed(1234)
y <- numeric(100)

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]

arma1_1 <- tsibble(idx = seq_len(100), y = y, index = idx)

autoplot(arma1_1, y) + 
  labs(title = "ARMA(1) sim with phi =0.6 , theta = 0.6, and sigma =1") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  geom_smooth(formula = y ~ x, method = "lm", color = "blue", size = 0.3) +
  scale_y_continuous(labels = scales::number)

9.6.f

Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3, and σ2=1. (Note that these parameters will give a non-stationary series.)

Comments

Variance increases considerably towards the end.

set.seed(1234)
y <- numeric(100)

for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)

autoplot(ar2, y) + 
  labs(title = "AR(2) sim with phi = -0.8 , theta = 0.3, and sigma = 1") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  geom_smooth(formula = y ~ x, method = "lm", color = "blue", size = 0.3) +
  scale_y_continuous(labels = scales::number)

9.6.g

Graph the latter two series and compare them.

autoplot(arma1_1, y, color = "blue") +
  autolayer(ar2, y, color = "magenta") +
  labs(title = "AR and ARMA comparison of random data") +
  theme_ipsum_es()

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

9.7.a

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

Answer

# Finally, ARIMA

aus_pass <- aus_airpassengers

arima_pass <- 
  aus_pass %>%
  model(ARIMA(Passengers))

report(arima_pass)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

#plot forecast
arima_pass %>%
  forecast(h=10) %>%
  autoplot(aus_pass) + 
  labs(y = "Passengers in Millions"
       , x = "Date"
       , title = "Air Passengers Forecast"
       , subtitle = "ARIMA") +
  theme_ipsum_es() +
  scale_y_continuous(labels = scales::number)

arima_pass %>%
  gg_tsresiduals() +
  labs(title = "Residuals of Arima Model"
       , subtitle = "Air Passengers")

9.7.b

Write the model in terms of the backshift operator.

Answer

yt = −0.8963εt − 1 + εt

(1−B) 2yt = (1−0.8963B)εt

9.7.c

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

Comment

The model doesnt seem to be terrible. The trend is close and within the lo and high 80 CI levels.

arima_pass2 <- aus_pass %>%
  filter(Year < 2011) %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

arima_pass2 %>% 
  forecast(h=10) %>%
  autoplot(aus_pass) +
  labs(title = "Air Passengers Forecast"
       , subtitle = "ARIMA(0,1,0)"
       , y = "Passengers in Millions") + 
  theme_ipsum_rc()

arima_pass2 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,1,0)")

9.7.c

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

Answer

Nothing happens when removing the constant. Either it is an error, or I couldn’t produce it correctly.

arima_pass3 <-aus_pass %>%
  filter(Year < 2011) %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

arima_pass3 %>% 
  forecast(h=10) %>%
  autoplot(aus_pass) +
  labs(title = "Air Passenger Forecast"
       , subtitle = "ARIMA(2,1,2)"
       , y = "Passengers in millions") +
    theme_ipsum_rc()

arima_pass3 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(2,1,2)")

#removing constant
arima_pass4 <-aus_pass %>%
  filter(Year < 2011) %>%
  model(ARIMA(Passengers ~ pdq(2,1,3), include.constant = FALSE))

#report(arima_pass4)

arima_pass4 %>% 
  forecast(h=10) %>%
  autoplot(aus_pass) +
  labs(title = "Air Passenger Forecast"
       , subtitle = "ARIMA(2,1,2)"
       , y = "Passengers in millions") +
  theme_ipsum_rc()

9.8

For the United States GDP series (from global_economy):

9.8.a if necessary, find a suitable Box-Cox transformation for the data;

Comment

Plotting and transformation was done. Box-Cox transformation reduced some of the variance, but because variance in general does not appear large, a transformation here might not be necessary.

# explore and plot

gdp_us<-
  global_economy %>%
  filter(Code == "USA") %>%
  mutate(gdp_bill = GDP/1000000000)

gdp_us %>%
  gg_tsdisplay(gdp_bill, plot_type='partial') +
  labs(title = "Non-transformed United States GDP")

# box-cox

gdp_us_lambda <-
  gdp_us %>%
  filter(Code == "USA") %>%
  features(gdp_bill, features = guerrero) %>%
  pull(lambda_guerrero)

gdp_us %>%
  autoplot(box_cox(gdp_bill, gdp_us_lambda)) +
  labs(title = TeX(paste0("Transformed US GDP with $\\lambda$ = ", round(gdp_us_lambda,4)))
       , subtitle = "Box-Cox Transformation"
       , y = "Billions US") +
  theme_ipsum_rc()

9.8.b

fit a suitable ARIMA model to the transformed data using ARIMA()

Answer

The ARIMA(1,1,0) w/ drift from the automated model showed reasonable results.

gdp_us_arima <- 
  gdp_us %>%
  model(ARIMA(box_cox(gdp_bill,gdp_us_lambda)))

report(gdp_us_arima)

## Series: gdp_bill 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(gdp_bill, gdp_us_lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586    0.3428
## s.e.  0.1198    0.0276
## 
## sigma^2 estimated as 0.0461:  log likelihood=7.72
## AIC=-9.43   AICc=-8.98   BIC=-3.3

gdp_us_arima %>%
  forecast(h=10) %>%
  autoplot(gdp_us) +
  labs(title = "US GDP"
       , subtitle = "ARIMA forecast"
       , y = "Billions US") +
  theme_ipsum_rc()

9.8.c

try some other plausible models by experimenting with the orders chosen;

The list of ARIMA’s are below.

all_arima <- gdp_us %>%
  model(ARIMA100 = ARIMA(gdp_bill ~ pdq(1,0,0)),
        ARIMA210 = ARIMA(gdp_bill ~ pdq(2,1,0)),
        ARIMA010 = ARIMA(gdp_bill ~ pdq(0,1,0)),
        ARIMA110 = ARIMA(gdp_bill ~ pdq(1,1,0)),
        ARIMA011 = ARIMA(gdp_bill ~ pdq(0,1,1)),
        ARIMA011c = ARIMA(gdp_bill ~ pdq(0,1,1)+1),
        ARIMA021 = ARIMA(gdp_bill ~ pdq(0,2,1)),
        ARIMA021c = ARIMA(gdp_bill ~ pdq(0,2,1)+1),
        search = ARIMA(gdp_bill, stepwise=FALSE))

glance(all_arima) %>% 
  arrange(AICc) %>%
  kbl()%>%
  kable_material(c("striped","hover", full_width = T))

Country	.model	sigma2	log_lik	AIC	AICc	BIC	ar_roots	ma_roots
United States	ARIMA021c	24238.82	-363.1436	732.2872	732.7488	738.3633		1.000002+0i
United States	search	26149.94	-363.5725	733.1449	733.6064	739.2210		1.248340-0i, -2.628247+0i
United States	ARIMA021	29190.96	-367.0479	738.0957	738.3221	742.1464		1.718758+0i
United States	ARIMA110	28372.78	-372.4723	750.9447	751.3975	757.0738	1.353121+0i
United States	ARIMA210	28811.65	-372.3921	752.7842	753.5534	760.9564	1.297121+0i, -14.059197+0i
United States	ARIMA011	37328.30	-380.0944	766.1889	766.6417	772.3180		-1.745319+0i
United States	ARIMA011c	37328.30	-380.0944	766.1889	766.6417	772.3180		-1.745319+0i
United States	ARIMA010	57844.51	-392.8922	789.7844	790.0066	793.8705

all_arima %>%
  forecast(h=10) %>%
  autoplot(gdp_us, PI = FALSE) +
  labs(title = "US GDP"
       , subtitle = "ARIMA forecast"
       , y = "Billions US") +
  theme_ipsum_rc()

9.8.d

choose what you think is the best model and check the residual diagnostics;

Answer

The model with the better result based on AIC, and almost any other metric, was ARIMA(0,2,1) with a constant.

9.8.e

Answer

The plots are below. The forecasts seem reasonable, all of them, including the first one that is automated.

gdp_us_arima021 <- 
  gdp_us %>%
  model(ARIMA(gdp_bill ~ pdq(0,2,1)+1))

gdp_us_arima021 %>%
  forecast(h=10) %>%
  autoplot(gdp_us) +
  labs(title = "US GDP"
       , subtitle = "ARIMA forecast(0,2,1) + c"
       , y = "Billions US") +
  theme_ipsum_rc()

gdp_us_arima021 %>%
  gg_tsresiduals() +
  labs(title = "Residuals US GDP")

9.9

Consider aus_arrivals, the quarterly number of international visitors to Australia from several countries for the period 1981 Q1 – 2012 Q3

9.9.a

Select one country and describe the time plot.

Chose the U.S.

aus_us <- 
  aus_arrivals %>% 
  filter(Origin=='US') %>%
  mutate(arr = Arrivals/1000)

aus_us %>% 
  autoplot(arr) + 
  ggtitle("Australian Arrivals from the US") +
  labs(y = "thousand of passengers"
       , x = "Date"
       , title = "Arrivals from the US to Australia") +
  theme_ipsum_es() +
  geom_smooth(formula = y ~ x, method = "loess", color = "orange", size = 0.3) +
  scale_y_continuous(labels = scales::number)

9.9.b

Use differencing to obtain stationary data

aus_us_ts <-ts(aus_us$arr, start =c(1981,1), frequency = 4) #use Dr. Fulton method of ts

aus_lambda <- BoxCox.lambda(aus_us_ts) #variance is out of control

aus_us_bc <- 
  aus_us_ts %>% 
  BoxCox(aus_lambda) %>% 
  diff(1) %>% 
  diff(12) # seasonality

ggtsdisplay(aus_us_bc, title = "Differencing") + 
  labs(title = "Residuals of Differenced Box Cox Transformation")

9.9.c

What can you learn from the ACF graph of the differenced data?

Answer

That the data is is highly auto correlated at minus 1, and -12, meaning, the seasonality is strong

9.9.d

What can you learn from the PACF graph of the differenced data?

Answer

Very similar to PACF, though there appears to be some partial correlation at 4t too. This can be discarded, however, since the multiple of this number does not seem to be repeated thereafter.

9.9.e

What model do these graphs suggest?

Answer Potentially an ETS or an ARIMA model of 12 (monthly seasonality), but we should test different results for each potential model.

9.9.f

Does ARIMA() give the same model that you chose? If not, which model do you think is better?

Answer

Will only try auto.arima. By the looks of it, the best performance came at ARIMA(2,1,0)(1,1,1) and 4 – not 12.

aus_us_bc_2 <-
  aus_us_ts %>%
  BoxCox(aus_lambda)


aus_us_arima2 <- 
  auto.arima(aus_us_bc_2)

summary(aus_us_arima2)

## Series: aus_us_bc_2 
## ARIMA(2,1,0)(1,1,1)[4] 
## 
## Coefficients:
##           ar1      ar2    sar1     sma1
##       -0.4711  -0.3837  0.2600  -0.9417
## s.e.   0.0838   0.0842  0.1061   0.0779
## 
## sigma^2 estimated as 0.04529:  log likelihood=14.13
## AIC=-18.26   AICc=-17.75   BIC=-4.24
## 
## Training set error measures:
##                       ME      RMSE       MAE       MPE     MAPE      MASE
## Training set -0.01981134 0.2051329 0.1416355 -0.309484 2.120754 0.6458966
##                      ACF1
## Training set -0.001670053

aus_us_arima2 %>%
  forecast(h=10) %>%
  plot(aus_us_ts, title = "Forecast US GDP, transformed")