Data 624 Homework 6

Exercise 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

(a) Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer:

As the number of random numbers increases, the ACF boundaries become narrower. Each of the above ACF plots tell us that the data are white noise because over 95% of the lag data for each chart falls within the ACF boundaries.

(b) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer:

The critical values are at different distances from the mean of zero due to the law of large numbers. As the number of observations increase, the number of outliers from the mean decreases. Additionally, the autocorrelations are different in each figure due to the fact that the series are composed of random numbers. As a result, the autocorrelations will also show signs of randomness.

Exercise 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

ggtsdisplay(gafa_stock$Close)

acf_results <- acf(gafa_stock$Close, plot = FALSE)
acf_results

## 
## Autocorrelations of series 'gafa_stock$Close', by lag
## 
##     0     1     2     3     4     5     6     7     8     9    10    11    12 
## 1.000 0.998 0.996 0.995 0.993 0.991 0.990 0.988 0.986 0.984 0.983 0.981 0.979 
##    13    14    15    16    17    18    19    20    21    22    23    24    25 
## 0.977 0.975 0.973 0.971 0.969 0.967 0.965 0.962 0.960 0.958 0.956 0.954 0.953 
##    26    27    28    29    30    31    32    33    34    35    36    37 
## 0.951 0.949 0.947 0.945 0.943 0.941 0.939 0.937 0.935 0.933 0.930 0.928

Answer:

The above ACF plot shows us that the autocorrelation values decrease over time (we can confirm this observation by looking at the results of calling the acf() function on the data). Both the ACF and PACF plots seem to display seasonality and most of the lags fall outside of the boundaries suggesting that the data is not white noise. Additionally, the series graph shows an upward trend in the data, followed by a sharp drop, and then a continued upward trend further suggesting that this is a non-stationary series.

Exercise 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

(a) Turkish GDP from global_economy.

# Extract Turkish GDP data from the global_economy dataset.
turkishGDP <- global_economy %>% filter(Country == 'Turkey')
ggtsdisplay(turkishGDP$GDP)

# BoxCox and difference the data.
BoxCoxedData <- BoxCox(turkishGDP$GDP, lambda = BoxCox.lambda(turkishGDP$GDP))
DifferencedData <- diff(BoxCoxedData)

# Explore the data after BoxCoxing and differencing to check for improvement.
ggtsdisplay(DifferencedData)

(b) Accommodation takings in the state of Tasmania from aus_accommodation.

# Extract Tasmania Accomodation Takings data from the aus_accommodation dataset.
tasmaniaAccomodationTakings <- aus_accommodation %>% filter(State == 'Tasmania')

# Explore the data before BoxCoxing and differencing.
ggtsdisplay(tasmaniaAccomodationTakings$Takings)

# BoxCox and difference the data.
TasmaniaBoxCoxedData <- BoxCox(tasmaniaAccomodationTakings$Takings, lambda = BoxCox.lambda(tasmaniaAccomodationTakings$Takings))
TasmaniaDifferencedData <- diff(TasmaniaBoxCoxedData)

# Explore the data after BoxCoxing and differencing to check for improvement.
ggtsdisplay(TasmaniaDifferencedData)

# Extract the sales data from the souvenirs dataset.
souvenirSales <- souvenirs$Sales

# Explore the data before BoxCoxing and differencing.
ggtsdisplay(souvenirSales)

# BoxCox and difference the data.
SalesBoxCoxedData <- BoxCox(souvenirSales, lambda = BoxCox.lambda(souvenirSales))
SalesDifferencedData <- diff(SalesBoxCoxedData)

# Explore the data after BoxCoxing and differencing to check for improvement.
ggtsdisplay(SalesDifferencedData)

Exercise 9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

# Load the series and plot the data.
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

autoplot(myseries) +
  labs(title = 'Original Retail Data Plot')

Answer:

Looking at the original time series plot above, we can see that the data has an upward trend with increasing variance. This suggests that we may need to perform differencing, and Box-Cox the data to stabalize it.

# Check the number of differences that should be performed.
retailData <- ts(myseries$Turnover, frequency = 12, start = c(1982, 4))
ndiffs(retailData)

## [1] 1

Running the series through the ndiffs() function tells us that we need to perform 1 difference to achieve stationarity.

kpss.test(diff(diff(BoxCox(retailData, BoxCox.lambda(retailData)), lag = 12)))

## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(retailData, BoxCox.lambda(retailData)), lag = 12))
## KPSS Level = 0.026139, Truncation lag parameter = 5, p-value = 0.1

Exercise 9.6

Simulate and plot some data from simple ARIMA models.

(a) Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ2 = 1. The process starts with y1 = 0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim)

(b) Produce a time plot for the series. How does the plot change as you change ϕ1?

ar1_model_simulation <- function(phi1_value) {
  y <- ts(numeric(100))
  e <- rnorm(100)
  for (i in 2:100){
    y[i] <- phi1_value*y[i-1] + e[i]
  }
  return(y)
}

autoplot(ar1_model_simulation(0.1), series = '0.3') + 
  autolayer(ar1_model_simulation(0.6), series = '0.6') + 
  autolayer(ar1_model_simulation(0.9), series = '0.9')

Answer:

As the value of ϕ1 is increased, the chart variation becomes more dispersed.

(c) Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ2 = 1.

ma1_model_simulation <- function(theta1_value) {
  y <- ts(numeric(100))
  e <- rnorm(100)
  for (i in 2:100) {
    y[i] <- theta1_value*e[i-1] + e[i]
  }
  return(y)
}

(d) Produce a time plot for the series. How does the plot change as you change θ1?

autoplot(ma1_model_simulation(0.1), series = '0.3') + 
  autolayer(ma1_model_simulation(0.6), series = '0.6') + 
  autolayer(ma1_model_simulation(0.9), series = '0.9')

Answer:

As the value of θ1 is increased, the variation of Y is increased also.

(e) Generate data from an ARMA(1,1) model with ϕ1 = 0.6, θ1 = 0.6 and σ2 = 1.

y1 <- ts(numeric(100))
e <- rnorm(100, sd=1)
for (i in 2:100)
  y1[i] <- 0.6*y1[i-1] + 0.6*e[i-1] + e[i]

autoplot(y1)

(f) Generate data from an AR(2) model with ϕ1 = −0.8, ϕ2 = 0.3 and σ2 = 1. (Note that these parameters will give a non-stationary series.)

y2 <- ts(numeric(100))
e <- rnorm(100, sd=1)
for (i in 3:100)
  y2[i] <- -0.8*y2[i-1] + 0.3*y2[i-2] + e[i]
autoplot(y2)

(g) Graph the latter two series and compare them.

autoplot(y1, series = 'ARMA(1,1)') + 
  autolayer(y2, series = 'AR(2)')

Answer:

From the above comparison plot, The ARMA(1,1) model appears to be stationary whilst the AR(2) model diverges exponentially.

Exercise 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

(a) Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

# Run the series through the ARIMA() function and have it automatically select an appropriate ARIMA model. 
arimaAirPassengers <- aus_airpassengers %>%
  model(ARIMA(Passengers))

# Print out the results of the selection. 
arimaAirPassengers

## # A mable: 1 x 1
##   `ARIMA(Passengers)`
##               <model>
## 1      <ARIMA(0,2,1)>

Running the aus_airpassengers series through the ARIMA() function using the default stepwise procedure, results in an ARIMA(0,2,1) model being selected. A quick glance at the model reveals that it has a fairly low AICc value of 198. telling us that it is a good fit.

glance(arimaAirPassengers)

## # A tibble: 1 x 8
##   .model            sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>              <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 ARIMA(Passengers)   4.31   -97.0  198.  198.  202. <cpl [0]> <cpl [1]>

Next we will check that the residuals look like white noise.

# Plot the residuals to check for white noise. 
gg_tsresiduals(arimaAirPassengers)

# Perform a portmanteau test on the residuals.
augment(arimaAirPassengers) %>%
  filter(.model == 'ARIMA(Passengers)') %>%
  features(.innov, ljung_box, lag = 10, dof = 3)

## # A tibble: 1 x 3
##   .model            lb_stat lb_pvalue
##   <chr>               <dbl>     <dbl>
## 1 ARIMA(Passengers)    6.70     0.461

The ACF plot of the residuals above reveals that all of the autocorrelations are within the thereshold limits which confirms that the residuals are white noise. The high P value of 0.4611594 returned by the portmanteau test further confirms this so we will go ahead and plot forecasts for the next 10 periods.

# Plot forecasts on the time series for the next 10 periods.
arimaAirPassengers %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) +
  labs(title = 'Australian Air Passengers Forecast Plot')

The forecast for the time series follows the general upward trend of the data.

(b) Write the model in terms of the backshift operator.

Answer: \((1 − ϕ1B) (1−B)2_{yt} = c + εt\)

(c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

# Create and forecast an ARIMA(0,1,0) model with drift.
arima010AirPassengersDrift <- forecast(Arima(aus_airpassengers$Passengers, order = c(0, 1, 0), include.drift = TRUE))
autoplot(arima010AirPassengersDrift) +
  labs(title = 'ARIMA(0,1,0) Model With Drift Forecast Plot')

Answer:

Dropping the constant results in a forecast that is less steep than the forecast in part a (Australian Air Passengers Forecast Plot), and a reduction in confidence intervals.

(d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

# Create and forecast an ARIMA(2,1,2) model with drift.
arima212AirPassengersDrift <- forecast(Arima(aus_airpassengers$Passengers, order = c(2, 1, 2), include.constant = TRUE))
autoplot(arima212AirPassengersDrift) +
  labs(title = 'ARIMA(2,1,2) Model With Drift')

Remove the constant and see what happens.

# Now remove the constant to see what happens. I have commented this code out as removing the constant results in the following error:
# "Error in stats::arima(x = x, order = order, seasonal = seasonal, include.mean = include.mean, : non-stationary AR part from CSS".

# arima212AirPassengersNoConstant <- forecast(Arima(aus_airpassengers$Passengers, order = c(2, 1, 2), include.constant = FALSE))
# autoplot(arima212AirPassengersNoConstant) +
#  labs(title = 'ARIMA(2,1,2) Model With No Constant')

Answer:

Removing the constant from the model results in a "non-stationary AR part from CSS" R error, and the model does not run.

(e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

# Create and forecast an ARIMA(0,2,1) model with constant.
arima021AirPassengersConstant <- forecast(Arima(aus_airpassengers$Passengers, order = c(0, 2, 1), include.constant = TRUE))
autoplot(arima021AirPassengersConstant) +
  labs(title = 'ARIMA(0,2,1) Model With Constant Forecast Plot')

Answer:

The forecast now better resembles the original "Australian Air Passengers Forecast" plot in part a.

Data 624 Homework 6

Week 8 ARIMA

Stephen Haslett

10/15/2021

Exercise 9.1

Exercise 9.2

Exercise 9.3

Exercise 9.5

Exercise 9.6

Exercise 9.7