ARIMA

1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

The difference among the graph is that the length of the time series is smaller and smaller causing the ACF bounds to become narrower and narrower. Each graph indicates the data is white noise as the spikes remain within the bounds.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The ACF bounds is determined by the length of the time series. As the time series length gets smaller, so do the bounds.

2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The autoplot graph of Amazon closing stock shows that there are clear trends and seasonal pattern. In other words, the plot is not roughly horizontal.

In the case of the ACF graphs, you can see that the closing price is well beyond the ACF boundaries, therefore the data is not white noise. If the data is stationary, the spikes would almost completely stay within the ACF boundaries.

gafa_stock %>%
  filter(Symbol == 'AMZN') %>%
  autoplot(Close) +
  labs(title='Amazon Closing Stock Prices')

gafa_stock %>%
  filter(Symbol == 'AMZN') %>%
  gg_tsdisplay(Close, plot_type = 'partial')

## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.

## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.

3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

turkey_gdp <- global_economy %>%
  filter(Country == 'Turkey') %>%
  select(Country, GDP)

lambda_gdp <- turkey_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

turkey_gdp %>%
  mutate(GDP = box_cox(GDP, lambda_gdp)) %>%
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 x 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

Accommodation takings in the state of Tasmania from aus_accommodation.

tasmania_takings <- aus_accommodation %>%
  filter(State == 'Tasmania') %>%
  select(State, Takings)

lambda_takings <- tasmania_takings %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

tasmania_takings %>%
  mutate(Takings = box_cox(Takings, lambda_takings)) %>%
  features(Takings, unitroot_ndiffs)

## # A tibble: 1 x 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

Monthly sales from souvenirs.

lambda_sales <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs %>%
  mutate(Sales = box_cox(Sales, lambda_sales)) %>%
  features(Sales, unitroot_ndiffs)

## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      1

5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(614)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

lambda_turnover <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  mutate(Turnover = box_cox(Turnover, lambda_turnover)) %>%
  features(Turnover, unitroot_ndiffs)

## # A tibble: 1 x 3
##   State                        Industry                       ndiffs
##   <chr>                        <chr>                           <int>
## 1 Australian Capital Territory Supermarket and grocery stores      1

6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2=1\). The process starts with \(y_1 = 0\).

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

sim %>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = y`

Increasing the \(\phi_1\) reduces the randomness and forms a trend.

for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot()

## Plot variable not specified, automatically selected `.vars = y`

for(i in 2:100)
  y[i] <- 1.2*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot()

## Plot variable not specified, automatically selected `.vars = y`

Write your own code to generate data from an MA(1) model with \(\phi_1 = 0.6\) and \(\sigma^2=1\).

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

Increasing \(\phi\) in the MA model smooths the time series but the trend remains around 0.

sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = y`

for(i in 2:100)
  y[i] <- 0.9*e[i-1] + e[i]

sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = y`

Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\Theta_1 = 0.6\) and \(\sigma^2=1\).

phi <- 0.6
theta <- 0.6
sigma <- 1
y1 <- ts(numeric(100))
e <- rnorm(1000, sigma)

for(i in 2:100)
  y1[i] <- phi*y1[i-1] + theta*e[i-1] + e[i]

Generate data from an AR(2) model with \(\phi_1=0.6\), \(\phi_2=0.3\) and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)

phi_1 <- 0.6
phi_2 <- 0.3
sigma <- 1
y2 <- ts(numeric(100))
e <- rnorm(100, sigma)

for(i in 3:100)
  y2[i] <- phi_1*y2[i-1] + phi_2*y2[i-2] + e[i]

Graph the latter two series and compare them.

You can see the AR(2) model produces a time series with a developing trend line.

sim1 <- tsibble(idx = seq_len(100), y = y1, index = idx)
sim1 %>% autoplot()

## Plot variable not specified, automatically selected `.vars = y`

sim2 <- tsibble(idx = seq_len(100), y = y2, index = idx)
sim2 %>% autoplot()

## Plot variable not specified, automatically selected `.vars = y`

7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit <- aus_airpassengers %>%
  model(
    search = ARIMA(Passengers, stepwise = FALSE)
  )

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

glance(fit)

## # A tibble: 1 x 8
##   .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 search   4.31   -97.0  198.  198.  202. <cpl [0]> <cpl [1]>

fit %>%
  gg_tsresiduals()

augment(fit) %>%
  features(.innov, ljung_box, lag = 10, dof = 3)

## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 search    6.70     0.461

fit %>%
  forecast(h=10) %>%
  autoplot(aus_airpassengers)

Write the model in terms of the backshift operator.

\[(1-B)^2y_t=c +(1+\theta_1B)e_t\]

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

ARIMA(0,1,0) w/ drift compared the auto generated has a larger AIC and BIC values. Therefore, the auto generated ARIMA model outperforms this newer model.

fit2 <- aus_airpassengers %>%
  model(
    arima010 = ARIMA(Passengers ~ pdq(0,1,0))
  )

report(fit2)

## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97

glance(fit2)

## # A tibble: 1 x 8
##   .model   sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 arima010   4.27   -98.2  200.  201.  204. <cpl [0]> <cpl [0]>

fit2 %>%
  gg_tsresiduals()

augment(fit2) %>%
  features(.innov, ljung_box, lag = 10, dof = 3)

## # A tibble: 1 x 3
##   .model   lb_stat lb_pvalue
##   <chr>      <dbl>     <dbl>
## 1 arima010    6.77     0.453

fit2 %>%
  forecast(h=10) %>%
  autoplot(aus_airpassengers)

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

This results in a NULL model.

fit3 <- aus_airpassengers %>%
  model(
    arima212 = ARIMA(Passengers ~ 0 + pdq(2,1,2))
  )

## Warning: 1 error encountered for arima212
## [1] non-stationary AR part from CSS

report(fit3)

## Series: Passengers 
## Model: NULL model 
## NULL model

glance(fit3)

## # A tibble: 0 x 1
## # ... with 1 variable: .model <chr>

# 
# fit3 %>%
#   gg_tsresiduals()
# 
# augment(fit3) %>%
#   features(.innov, ljung_box, lag = 10, dof = 3)
# 
# fit3 %>%
#   forecast(h=10) %>%
#   autoplot(aus_airpassengers)

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

This is the same as the auto generated ARIMA model.

fit4 <- aus_airpassengers %>%
  model(
    arima021 = ARIMA(Passengers ~ pdq(0,2,1))
  )

report(fit4)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

glance(fit4)

## # A tibble: 1 x 8
##   .model   sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 arima021   4.31   -97.0  198.  198.  202. <cpl [0]> <cpl [1]>

fit4 %>%
  gg_tsresiduals()

augment(fit4) %>%
  features(.innov, ljung_box, lag = 10, dof = 3)

## # A tibble: 1 x 3
##   .model   lb_stat lb_pvalue
##   <chr>      <dbl>     <dbl>
## 1 arima021    6.70     0.461

fit4 %>%
  forecast(h=10) %>%
  autoplot(aus_airpassengers)

8

For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data;

us_gdp <- global_economy %>%
  filter(Code == 'USA') %>%
  select(Country, GDP)

lambda_gdp <- turkey_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

us_gdp <- us_gdp %>%
  mutate(GDP = box_cox(GDP, lambda_gdp))

fit a suitable ARIMA model to the transformed data using ARIMA();

fit <- us_gdp %>%
  model(
    arima = ARIMA(GDP, stepwise = FALSE, approx = FALSE)
  )
report(fit)

## Series: GDP 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.6534
## s.e.   0.1266
## 
## sigma^2 estimated as 3.881:  log likelihood=-117.16
## AIC=238.31   AICc=238.54   BIC=242.36

try some other plausible models by experimenting with the orders chosen;

The below test show that the d = 2 and q = 1 as a possible ARIMA model - the same model that was auto generated by the ARIMA model. For the purpose of these questions, I will generate a second ARIMA model.

us_gdp %>%
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 x 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      2

us_gdp %>%
  gg_tsdisplay(GDP, plot_type = 'partial')

fit2 <- us_gdp %>%
  model(
    arima1520 = ARIMA(GDP ~ pdq(2,2,1))
    )

report(fit2)

## Series: GDP 
## Model: ARIMA(2,2,1) 
## 
## Coefficients:
##          ar1      ar2      ma1
##       0.2310  -0.0871  -0.7454
## s.e.  0.1966   0.1611   0.1575
## 
## sigma^2 estimated as 3.865:  log likelihood=-116.02
## AIC=240.05   AICc=240.83   BIC=248.15

choose what you think is the best model and check the residual diagnostics;

The auto generated model has the lower AIC and BIC. The below test shows a high p - value indicating that the data is white noise. Furthermore, the ACF graph show that the spikes remain within boundaries.

augment(fit) %>%
  features(.innov, ljung_box, lag = 10, dof = 3)

## # A tibble: 1 x 4
##   Country       .model lb_stat lb_pvalue
##   <fct>         <chr>    <dbl>     <dbl>
## 1 United States arima     6.12     0.526

fit %>%
  gg_tsresiduals()

produce forecasts of your fitted model. Do the forecasts look reasonable?

The below graph shows the forecast on the US GDP time series. The forecast is within reason on the time series.

fit %>%
  forecast(h = 10) %>%
  autoplot(us_gdp)

compare the results with what you would obtain using ETS() (with no transformation).

With no transformations, the forecast using ETS() generates a forecast with a much wider confidence level.

global_economy %>%
  filter(Code == 'USA') %>%
  select(Country, GDP) %>%
  model(
    ETS(GDP)
  ) %>%
  forecast(h = 10) %>%
  autoplot(global_economy)

ARIMA

Diego Correa

10/22/2021

1

2

3

5

6

7

8