ARIMA models

library(tsibble)

## 
## Attaching package: 'tsibble'

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union

library(tsibbledata)
library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(feasts)

## Loading required package: fabletools

## 
## Attaching package: 'fabletools'

## The following objects are masked from 'package:forecast':
## 
##     accuracy, forecast

library(fable)
library(fpp3)

## ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──

## ✓ tibble    3.1.1      ✓ lubridate 1.7.10
## ✓ dplyr     1.0.6      ✓ ggplot2   3.3.3 
## ✓ tidyr     1.1.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## x lubridate::date()      masks base::date()
## x dplyr::filter()        masks stats::filter()
## x fabletools::forecast() masks forecast::forecast()
## x tsibble::intersect()   masks base::intersect()
## x lubridate::interval()  masks tsibble::interval()
## x dplyr::lag()           masks stats::lag()
## x tsibble::setdiff()     masks base::setdiff()
## x tsibble::union()       masks base::union()

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

# All three figures show that the data is white noise. We would know that this is true even without looking at a graph, because the instructions for this question tell us that we are looking at random numbers. Random numbers are independent and identically distributed. IID data will have no autocorrelation for lags. The larger the sample, the closer the autocorrelation gets to zero, which is seen as part of the difference in the three graphs.

# We see in the ACF plots that the block columns that are the ACF bars are within the blue lines which are the intervals, outside of which there is statistical significance (in this case, the statistical significance would mean that the data is not white noise). 

# These ACF plots all look different from each other because the sample size is different. The width of the blue lines would look the same if we instead choose to change the y-axis values.

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock <- gafa_stock

unique(gafa_stock$Symbol)

## [1] "AAPL" "AMZN" "FB"   "GOOG"

amzn_stock <- gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  select(Symbol, Close)

amzn_stock %>% autoplot() +
  labs(title = "Amazon Closing Stock Price", subtitle = "") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
  theme(plot.subtitle = element_text(hjust = 0.5)) +
  labs(caption = "")

## Plot variable not specified, automatically selected `.vars = Close`

amzn_stock %>%
  ACF(Close) %>%
  autoplot() +
  labs(title = "Amazon Closing Stock Price", subtitle = "ACF Plot") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
  theme(plot.subtitle = element_text(hjust = 0.5)) +
  labs(caption = "")

## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.

amzn_stock %>%
  PACF(Close) %>%
  autoplot() +  
  labs(title = "Amazon Closing Stock Price", subtitle = "PACF Plot") +
  theme(plot.title = element_text(hjust = 0.5, face = "bold")) +
  theme(plot.subtitle = element_text(hjust = 0.5)) +
  labs(caption = "")

## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.

# Plot everything together
ggtsdisplay(amzn_stock$Close)

# Explain how each plot shows that the series is non-stationary and should be differenced.

# A stationary time series would look the same no matter which time frame we are observing, which means we don't see seasonality or even a trend. Our book also says that in a stationary time series there is no predictable long term pattern. 

# The ACF plot is useful in determining whether or not a time series is stationary. In our ACF plot we see the ACF is much closer to 1 than it is to 0. It's moving towards 0 along the x-axis but only gradually. Also, all the black lines are outside the blue intervals indicating a low p-value which means that the data fails the null hypothesis that the data is stationary. 

# The PACF removes the effects of the lags from the ACF plot.

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy. Accommodation takings in the state of Tasmania from aus_accommodation. Monthly sales from souvenirs.

??fpp

turkey_economy <- global_economy %>%
  filter(Code == "TUR")

ggtsdisplay(turkey_economy$GDP)

tasmania <- aus_accommodation %>% filter(State == "Tasmania")
ggtsdisplay(tasmania$Takings)

9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0. y <- numeric(100) e <- rnorm(100)for(i in 2:100) y[i] <- 0.6*y[i-1] + e[i] sim <- tsibble(idx = seq_len(100), y = y, index = idx)
Produce a time plot for the series. How does the plot change as you change ϕ1?
Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.
Produce a time plot for the series. How does the plot change as you change θ1
Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.
Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)
Graph the latter two series and compare them.

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods. Write the model in terms of the backshift operator. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

## structure(list(Year = c(1970, 1971, 1972, 1973, 1974, 1975, 1976, 
## 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 
## 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 
## 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 
## 2010, 2011, 2012, 2013, 2014, 2015, 2016), Passengers = c(7.3187, 
## 7.3266, 7.7956, 9.3846, 10.6647, 11.0551, 10.8643, 11.3065, 12.1223, 
## 13.0225, 13.6488, 13.2195, 13.1879, 12.6015, 13.2368, 14.4121, 
## 15.4973, 16.8802, 18.8163, 15.1143, 17.5534, 21.8601, 23.8866, 
## 26.9293, 26.8885, 28.8314, 30.0751, 30.9535, 30.1857, 31.5797, 
## 32.577569, 33.477398, 39.021581, 41.386432, 41.596552, 44.657324, 
## 46.951775, 48.728837, 51.488427, 50.026967, 60.640913, 63.3603103378, 
## 66.355274, 68.197955, 68.1232376693, 69.7793454797, 72.597700806
## )), row.names = c(NA, -47L), key = structure(list(.rows = structure(list(
##     1:47), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
## "list"))), row.names = c(NA, -1L), class = c("tbl_df", "tbl", 
## "data.frame")), index = structure("Year", ordered = TRUE), index2 = "Year", interval = structure(list(
##     year = 1, quarter = 0, month = 0, week = 0, day = 0, hour = 0, 
##     minute = 0, second = 0, millisecond = 0, microsecond = 0, 
##     nanosecond = 0, unit = 0), .regular = TRUE, class = c("interval", 
## "vctrs_rcrd", "vctrs_vctr")), class = c("tbl_ts", "tbl_df", "tbl", 
## "data.frame"))

## Plot variable not specified, automatically selected `.vars = Passengers`

## Plot variable not specified, automatically selected `y = Passengers`

## # A mable: 5 x 2
## # Key:     names [5]
##   names                     values
##   <chr>                    <model>
## 1 arima010 <ARIMA(0,1,0) w/ drift>
## 2 arima212 <ARIMA(2,1,2) w/ drift>
## 3 arima021          <ARIMA(0,2,1)>
## 4 stepwise          <ARIMA(0,2,1)>
## 5 search            <ARIMA(0,2,1)>

## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 search    6.00     0.539

## # A tibble: 1 x 3
##   .model   lb_stat lb_pvalue
##   <chr>      <dbl>     <dbl>
## 1 arima010    4.92     0.670

## # A tibble: 1 x 3
##   .model   lb_stat lb_pvalue
##   <chr>      <dbl>     <dbl>
## 1 arima212    2.75     0.907

## # A tibble: 1 x 3
##   .model   lb_stat lb_pvalue
##   <chr>      <dbl>     <dbl>
## 1 arima021    6.00     0.539

9.8

For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data;
fit a suitable ARIMA model to the transformed data using ARIMA();
try some other plausible models by experimenting with the orders chosen;
choose what you think is the best model and check the residual diagnostics;
produce forecasts of your fitted model. Do the forecasts look reasonable?
compare the results with what you would obtain using ETS() (with no transformation).

## 
##  Fitting models using approximations to speed things up...
## 
##  ARIMA(2,0,2) with non-zero mean : Inf
##  ARIMA(0,0,0) with non-zero mean : 87508.09
##  ARIMA(1,0,0) with non-zero mean : 45033.15
##  ARIMA(0,0,1) with non-zero mean : 36186.37
##  ARIMA(0,0,0) with zero mean     : 124043.8
##  ARIMA(1,0,1) with non-zero mean : 7587.691
##  ARIMA(2,0,1) with non-zero mean : Inf
##  ARIMA(1,0,2) with non-zero mean : Inf
##  ARIMA(0,0,2) with non-zero mean : 29097
##  ARIMA(2,0,0) with non-zero mean : 45122.56
##  ARIMA(1,0,1) with zero mean     : Inf
## 
##  Now re-fitting the best model(s) without approximations...
## 
##  ARIMA(1,0,1) with non-zero mean : 46053.52
## 
##  Best model: ARIMA(1,0,1) with non-zero mean

## Plot variable not specified, automatically selected `.vars = GDP`

## Warning in sqrt(diag(best$var.coef)): NaNs produced

## Warning: It looks like you're trying to fully specify your ARIMA model but have not said if a constant should be included.
## You can include a constant using `ARIMA(y~1)` to the formula or exclude it by adding `ARIMA(y~0)`.

## Warning: 1 error encountered for arima212
## [1] Could not find an appropriate ARIMA model.
## This is likely because automatic selection does not select models with characteristic roots that may be numerically unstable.
## For more details, refer to https://otexts.com/fpp3/arima-r.html#plotting-the-characteristic-roots

## Warning in report.mdl_df(.): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.

## # A tibble: 2 x 9
##   Country       .model    sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <fct>         <chr>      <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 United States arima010 5.78e22  -1574. 3152. 3152. 3156. <cpl [0]> <cpl [0]>
## 2 United States arima021 2.92e22  -1528. 3059. 3059. 3063. <cpl [0]> <cpl [1]>

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf

## Warning: Removed 15 row(s) containing missing values (geom_path).

ARIMA models

Josef

10/11/2021

9.1

9.2

9.3

9.5

9.6

9.7

9.8