Homework 6 JBurns

Assignment:

Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman. Please submit both the Rpubs link as well as your .rmd file.

library(tsibble)
library(fpp3)
library(readxl)
library(seasonal)

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a.

Explain the differences among these figures. Do they all indicate that the data are white noise?

While all graphs of varying random ACF numbers do change with an increasing N, since all of the spices are within the bounded area these can all be deemed as white noise.

b.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are different distances from the mean of zero because mathematically, their calculations are based off of the T (length of the time series). The time series size also has an effect on the autocorrelations across the different N’s. As the length of the time series gets longer the chance of autocorrelation between variables decreases.

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock %>% 
    filter(Symbol == "AMZN") %>% 
    autoplot(Close) +
    labs(title = "AMZN Closing Price")

gafa_stock %>% 
    filter(Symbol == "AMZN") %>% 
    gg_tsdisplay(Close, plot_type = 'partial')+
    labs(title = 'Partial Plot of Amazon Closing Price')

In the acf lag plot there is pretty clear indications that the closing prices are not white noise. A static time series would show most spikes within the acf boundaries.

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.
Accommodation takings in the state of Tasmania from aus_accommodation.
Monthly sales from souvenirs. ### a. Turkish GDP from global_economy.

global_economy %>% 
    filter(Country == "Turkey") %>% 
    gg_tsdisplay(GDP, plot_type = 'partial') +
    labs(title = 'Pre Box Cox Transformation Turkish GDP')

turk_lambda <- global_economy %>% 
    filter(Country == "Turkey") %>% 
    features(GDP, features = guerrero) %>% 
    pull(lambda_guerrero)
global_economy %>% 
    filter(Country == "Turkey") %>% 
    features(box_cox(GDP, turk_lambda), unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

Box Cox Transformation Viz:

global_economy %>% 
    filter(Country == "Turkey") %>% 
    gg_tsdisplay(difference(box_cox(GDP,turk_lambda)), plot_type = 'partial') +
    labs(title = 'Post Box Cox Transformation')

Following the box cox transformation both acf and pacf graphs indicate that the transformed time series is stationary. So one differencing was the correct move.

b. Accommodation takings in the state of Tasmania from aus_accommodation.

aus_accommodation %>% 
    filter(State == "Tasmania") %>% 
    gg_tsdisplay(Takings, plot_type = 'partial') +
    labs(title = 'Pre Box Cox Transformation Tasmanisan Accomodation Takings')

aus_lambda <- aus_accommodation %>% 
    filter(State == "Tasmania") %>% 
    features(Takings, features = guerrero) %>% 
    pull(lambda_guerrero)
aus_accommodation %>% 
    filter(State == "Tasmania") %>% 
    features(box_cox(Takings, aus_lambda), unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

aus_accommodation %>% 
    filter(State == "Tasmania") %>% 
    gg_tsdisplay(difference(box_cox(Takings,aus_lambda), 4), plot_type = 'partial') +
    labs(title = 'Post Box Cox Transformation')

This data looks at the aus_accomodation dataset, specifically the takings in Tasmania.The test suggests only one transformation for the data. Following the transformation seems to make the data more stationary, better suiting the requirements of ARIMA models.

c. Monthly sales from souvenirs.

souvenirs %>% 
    gg_tsdisplay(Sales, plot_type = 'partial') +
    labs(title = 'Pre Box Cox Transformation Souvenir Sales')

souv_lambda <- souvenirs %>% 
    features(Sales, features = guerrero) %>% 
    pull(lambda_guerrero)
souvenirs %>% 
    features(box_cox(Sales, souv_lambda), unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

souvenirs %>% 
    gg_tsdisplay(difference(box_cox(Sales, souv_lambda), 12), plot_type = 'partial')+
    labs(title = 'Post Box Cox Transformation')

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

After differing once (at the suggestion of KPSS) the data looks to be stationary.

9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(223412)
aus_retail_orig <- aus_retail %>% 
    filter(`Series ID` == sample(`Series ID`,1))
aus_retail_orig %>% 
    gg_tsdisplay(Turnover, plot_type = 'partial')+
    labs(title = 'Pre Transform Retail Data')

aus_ret_lambda <- aus_retail_orig %>% 
    features(Turnover, features = guerrero) %>%
    pull(lambda_guerrero)
aus_retail_orig %>% 
    features(box_cox(Turnover, aus_ret_lambda), unitroot_nsdiffs)

## # A tibble: 1 × 3
##   State           Industry         nsdiffs
##   <chr>           <chr>              <int>
## 1 South Australia Liquor retailing       1

aus_retail_orig %>% 
    gg_tsdisplay(difference(box_cox(Turnover,aus_ret_lambda), 12), plot_type='partial')+
    labs(title = 'Post Box Cox Transformation Turnover')

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

The KPSS test recomends that differencing occur only once to this time series. As a result the ACF and PACF reflect a newely stationary state.

9.6

a. Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ2 = 1. The process starts with y1 = 0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b. Produce a time plot for the series. How does the plot change as you change ϕ1?

# Theta = 0.6
sim %>% 
  autoplot(y)+
  labs(title = "Theta = 0.6 Model")

# Theta = 1
for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]
sim_1 <- tsibble(idx = seq_len(100), y = y, index = idx) %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = y`

sim_1

# Theta = 0
for(i in 2:100)
  y[i] <- 0*y[i-1] + e[i]
sim_2 <- tsibble(idx = seq_len(100), y = y, index = idx) %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = y`

sim_2

c. Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ2 = 1

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
sim_c <- tsibble(idx = seq_len(100), y = y, index = idx)

d.

# Theta = 0.6
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
sim_3 <- tsibble(idx = seq_len(100), y = y, index = idx) %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = y`

sim_3

# Theta = 1
for(i in 2:100)
  y[i] <- 1*e[i-1] + e[i]
sim_4 <- tsibble(idx = seq_len(100), y = y, index = idx) %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = y`

sim_4

# Theta = 0
for(i in 2:100)
  y[i] <- 0*e[i-1] + e[i]
sim_5 <- tsibble(idx = seq_len(100), y = y, index = idx) %>% 
  autoplot()

## Plot variable not specified, automatically selected `.vars = y`

sim_5

### e. Generate data from an ARMA(1,1) model with ϕ1=0.6 , θ1=0.6 and σ2=1.

set.seed(02180)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i] 
sim_e_arma <- tsibble(idx = seq_len(100), y = y, index = idx)

f.

Generate data from an AR(2) model with ϕ1=−0.8, ϕ2 = 0.3 and σ2=1.(Note that these parameters will give a non-stationary series.)

set.seed(02180)
y <- numeric(100)
e <- rnorm(100)
for(i in 5:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i] 
sim_f_ar <- tsibble(idx = seq_len(100), y = y, index = idx)

g. Graph the latter two series and compare them.

sim_e_arma %>% 
  gg_tsdisplay(y, plot_type = 'partial')+
  labs(title = 'ARMA(1,1) model with ϕ1=0.6 , θ1=0.6 and σ2=1')

sim_f_ar %>% 
  gg_tsdisplay(y, plot_type = 'partial')+
  labs(title = 'AR(2) model with ϕ1=−0.8, ϕ2 = 0.3 and σ2=1')

The ARMA (1,1) model appears to be stationary, while the AR model is not. The AR model explodes as the idx increases, and the arma model is stationary.

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

aus_pass_fit <- aus_airpassengers %>% 
  model(ARIMA(Passengers))
report(aus_pass_fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

aus_pass_fit %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "AUS Aircraft Passengers with ARIMA", y = "Passengers")

aus_pass_fit %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA Model")

### b.

Write the model in terms of the backshift operator.

\[ (1 - B)^2 y_t = c + (1 + \theta_1 B) e_t \]

c.

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

aus_fit_c <-aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

aus_fit_c %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "AUS Aircraft Passengers with ARIMA pdq(0,1,0)", y = "Passengers")

aus_fit_c %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA")

d

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and

c. Remove the constant and see what happens.

fit3 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

fit3 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with pdq(2,1,2)", y = "Passengers")

fit3 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA")

#removing constant
fit4 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

e.

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The residuals have become white noise in the ARIMA model (0,2,1) and the lag plot distribution centers around zero. Adding a constant keeps the trajectory on an upward trend.

fit_5 <- aus_airpassengers %>%
  filter(Year < 2012)
fit_5_arima <- fit_5 %>% model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

fit_5_arima %>%
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Air Passengers Forecast using ARIMA(2,1,2) add constant")

fit_5_arima %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA pdq(0,2,1) with constant")