DATA 624 - Homework 6

9.1 Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

The plots differ in the size of the autocorrelations and the confidence intervals (blue lines) for each. The confidence interval narrows as there is more observations. The autocorrelations tend to approach zero when there are more observations.

There are no significant lags in the ACF plots. All of the ACF plots indicate that the data is white noise.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

As the sample size increases, the confidence interval narrows due to more precise estimates. Consequently, as the sample size progresses from 36 to 360 to 1,000 samples, the critical values approach zero, reflecting the increased certainty in parameter estimates.

Similarly, although each plot represents white noise, smaller sample sizes may yield seemingly larger autocorrelations due to random chance. As the sample size increases, the estimations tend to converge toward zero.

9.2 A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

 knitr::opts_chunk$set(warning = FALSE, message = FALSE)  

# Load the package
library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.4     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(tsibbledata)
library(tsibble)
library(caret)

## Loading required package: lattice

## 
## Attaching package: 'caret'

## The following objects are masked from 'package:fabletools':
## 
##     MAE, RMSE

library(dplyr)   
library(ggplot2) 
library(feasts) 
library(forecast)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(fable)

gafa_stock |>
  filter(Symbol == "AMZN") |>
  gg_tsdisplay(Close, plot_type='partial') +
  labs(title = "Amazon Closing Stock Price")

The plots above show that the Amazon Closing Price is non stationary because the ACF plot has lag values that are large and decreasing slowly. A stationary time series would have an ACF plot that quickly converges to zero. The time series plot shows an increasing trend, therefore it is not stationary. The PACF for a stationary time series would be 0 for all of the lags. The PACF graph in this example shows a lag value at 1.

9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

For this plot, the Box-Cox transformation resulted in a lambda = 0

turkey_data <- global_economy %>% 
  filter(Country == 'Turkey')

turkey_data %>% autoplot(GDP)

bc <- BoxCoxTrans(turkey_data$GDP)

(lambda <- bc$lambda)

## [1] 0

turkey_data %>%
  mutate(GDP_transformed = log(GDP)) %>%
  autoplot(GDP_transformed)

turkey_data %>%
  autoplot(difference(difference(log(GDP))))

b. Accommodation takings in the state of Tasmania from aus_accommodation.

For this plot, the appropriate lambda was 0.3

data("aus_accommodation")

tasmania_accomodations <- aus_accommodation %>% 
  filter(State == 'Tasmania')
tasmania_accomodations %>%
  autoplot(Takings)

bc <- BoxCoxTrans(tasmania_accomodations$Takings)

(lambda <- bc$lambda)

## [1] 0.3

tasmania_accomodations %>%
  mutate(Takings_transformed = (Takings^lambda-1)/lambda) %>%
  autoplot(Takings_transformed)

tasmania_accomodations %>%
  mutate(Takings_transformed = (Takings^lambda-1)/lambda) %>%
  autoplot(difference(difference(Takings_transformed,4)))

c. Monthly sales from souvenirs

For this plot, the appropriate lambda was -0.2

souvenirs %>% autoplot(Sales)

bc <- BoxCoxTrans(souvenirs$Sales)
(lambda <- bc$lambda)

## [1] -0.2

souvenirs %>%
  mutate(Sales_transformed = (Sales^lambda-1)/lambda) %>%
  autoplot(Sales_transformed)

souvenirs %>%
  mutate(Sales_transformed = (Sales^lambda-1)/lambda) %>%
  autoplot(difference(difference(difference(Sales_transformed,12))))

9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

For this data, I used a seasonal difference of 12 and then a secondar difference of 1

set.seed(9079180)

myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1))

# Apply the Box-Cox transformation
bc <- BoxCoxTrans(myseries$Turnover)

# Extract the lambda parameter
lambda <- bc$lambda

# Transform the Turnover and plot the results
myseries %>%
  mutate(Turnover_trans = (Turnover^lambda - 1) / lambda) %>%
  autoplot(Turnover_trans)

myseries %>%
  mutate(Turnover_trans = (Turnover^lambda-1)/lambda) %>%
  autoplot(difference(difference(Turnover_trans,12)))

9.6 Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim

## # A tsibble: 100 x 2 [1]
##      idx      y
##    <int>  <dbl>
##  1     1  0    
##  2     2 -0.808
##  3     3  0.333
##  4     4 -0.274
##  5     5 -0.216
##  6     6 -0.854
##  7     7 -1.39 
##  8     8 -1.29 
##  9     9 -1.07 
## 10    10  1.35 
## # ℹ 90 more rows

b. Produce a time plot for the series. How does the plot change as you change phi_1

sim %>% autoplot()

# phi=0.2
for(i in 2:100)
  y[i] <- 0.2*y[i-1] + e[i]
sim02 <- tsibble(idx = seq_len(100), y = y, index = idx)

# phi=1.0
for(i in 2:100)
  y[i] <- 1.0*y[i-1] + e[i]
sim10 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim02 %>% autoplot(y)

sim10 %>% autoplot(y)

Lower the ϕ1, the more oscillations occur around zero, the higher the ϕ1 value, the fewer oscillations and thus the plot doesn’t center around zero.

c. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100)
  y[i] <- e[i] + 0.6*e[i-1]

sim_ma1 <- tsibble(idx = seq_len(100), y = y, index = idx)

d. Produce a time plot for the series. How does the plot change as you change θ1?

sim_ma1 %>% autoplot(y)

# theta is 0.2
for(i in 2:100)
  y[i] <- e[i] + 0.2*e[i-1]
sim_ma02 <- tsibble(idx = seq_len(100), y = y, index = idx)

# theta is 1.0
for(i in 2:100)
  y[i] <- e[i] + 1.0*e[i-1]
sim_ma10 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_ma02 %>% autoplot(y)

sim_ma10 %>% autoplot(y)

The lower the value of θ1, the closer the plot stays around zero, but with the higher value of θ1, the plot still remains around zero, but absolute values of y tend to be larger.

e. Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

y <- numeric(100)
e <- rnorm(100)

phi <- 0.6
theta <- 0.6

for(i in 2:100)
  y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]

sim_arma11 <- tsibble(idx = seq_len(100), y = y, index = idx)

f. Generate data from an AR(2) model with ϕ1=−0.8 , ϕ2=0.3 and σ2=1 . (Note that these parameters will give a non-stationary series.)

y <- numeric(100)
e <- rnorm(100)

for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]

sim_ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)

g. Graph the latter two series and compare them.

sim_arma11 %>% autoplot(y)

sim_ar2 %>% autoplot(y)

The plot for ARMA(1,1) model appears close to stationary data. The variance does appear to increase as the plot moves from left to right.

As for the AR(2) model, the plot is certainly not stationary. The variance clearly grows as the plot moves from left to right.

9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

train <- aus_airpassengers %>%
  filter(Year <= 2011)

test <- aus_airpassengers %>%
  filter(Year > 2011)

train %>%
  gg_tsdisplay(difference(log(Passengers)), plot_type='partial')

fit <- train %>%
  model(arima110 = ARIMA(log(Passengers) ~ pdq(1,1,0)),
        stepwise = ARIMA(log(Passengers)),
        search = ARIMA(Passengers, stepwise=FALSE))

glance(fit)

## # A tibble: 3 × 8
##   .model    sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>      <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 arima110 0.00576    48.6 -91.2 -90.5 -86.0 <cpl [1]> <cpl [0]>
## 2 stepwise 0.00562    48.5 -93.1 -92.8 -89.7 <cpl [0]> <cpl [0]>
## 3 search   4.67      -87.8 180.  180.  183.  <cpl [0]> <cpl [1]>

fit_new <- train %>%
  model(arima021 = ARIMA(log(Passengers) ~ pdq(0,2,1)))

fit_new %>%
  forecast(h=10) %>%
  autoplot(train)

The best model was the model selected by the function, which was an ARIMA(0,2,1) model

b. Write the model in terms of the backshift operator.
yt+1=(1−B)2∗log(Passengers)

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
The forecast appears pretty much the same as the model selected in part a

fit <- train %>%
  model(ARIMA(log(Passengers) ~ pdq(0,1,0)))

fit %>% forecast(h=10) %>%
  autoplot(train)

d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit1 <- train %>%
  model(ARIMA(log(Passengers) ~ 1 + pdq(2,1,2)))

fit2 <- train %>%
  model(ARIMA(log(Passengers) ~ 0 + pdq(2,1,0)))


fit1 %>% forecast(h=10) %>%
  autoplot(train)

fit2 %>% forecast(h=10) %>%
  autoplot(train)

e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

When I use a constant in the model, I receive a warning

fit <- train %>%
  model(ARIMA(log(Passengers) ~ pdq(0,2,1)))

fc <- fit %>% forecast(h=10) 

fit %>% 
  forecast(h=10) %>%
  autoplot(train)

9.8 For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data;

us_economy <- global_economy %>%
  filter(Country == 'United States') %>%
  mutate(GDP = GDP/1e6)


us_economy %>% autoplot(GDP)

bc <- BoxCoxTrans(us_economy$GDP)
lambda <- bc$lambda

us_economy <- us_economy %>%
  mutate(GDP_trans = (GDP^lambda-1)/lambda)

b. fit a suitable ARIMA model to the transformed data using ARIMA();

us_economy %>%
  gg_tsdisplay(difference(difference(GDP_trans)), plot_type='partial')

model_fit <- us_economy %>%
  model(arima220 = ARIMA(GDP_trans ~ 0 + pdq(2,2,0)), 
        arima021 = ARIMA(GDP_trans ~ 0 + pdq(0,2,1)),
        arima120 = ARIMA(GDP_trans ~ 0 + pdq(1,2,0)),
        step = ARIMA(GDP_trans),
        search = ARIMA(GDP_trans, stepwise = FALSE))

glance(model_fit) %>% arrange(AICc)

## # A tibble: 5 × 9
##   Country       .model   sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <fct>         <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 United States step      0.180   -31.1  68.2  68.7  74.3 <cpl [1]> <cpl [0]>
## 2 United States search    0.180   -31.1  68.2  68.7  74.3 <cpl [1]> <cpl [0]>
## 3 United States arima021  0.191   -32.8  69.6  69.9  73.7 <cpl [0]> <cpl [1]>
## 4 United States arima220  0.206   -34.2  74.5  75.0  80.6 <cpl [2]> <cpl [0]>
## 5 United States arima120  0.216   -36.1  76.2  76.4  80.2 <cpl [1]> <cpl [0]>

model_fit

## # A mable: 1 x 6
## # Key:     Country [1]
##   Country         arima220       arima021       arima120                    step
##   <fct>            <model>        <model>        <model>                 <model>
## 1 United S… <ARIMA(2,2,0)> <ARIMA(0,2,1)> <ARIMA(1,2,0)> <ARIMA(1,1,0) w/ drift>
## # ℹ 1 more variable: search <model>

c. try some other plausible models by experimenting with the orders chosen;

Please see above

d. choose what you think is the best model and check the residual diagnostics;

fit <- us_economy %>%
  model(ARIMA(GDP_trans ~ pdq(1,1,0) ))


fit %>% gg_tsresiduals()

e. produce forecasts of your fitted model. Do the forecasts look reasonable?

Yes, the forecast appears reasonable.

fit %>% forecast(h=10) %>%
  autoplot(us_economy)

f. compare the results with what you would obtain using ETS() (with no transformation).

fit <- us_economy %>%
  model(ETS(GDP_trans))

fit %>% forecast(h=10) %>% 
  autoplot(us_economy)

report(fit)

## Series: GDP_trans 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9999 
##     beta  = 0.3590179 
## 
##   Initial states:
##     l[0]      b[0]
##  64.3256 0.8176971
## 
##   sigma^2:  0
## 
##      AIC     AICc      BIC 
## 142.0720 143.2259 152.3742

DATA 624 - Homework 6

Bikash Bhowmik

23 Mar 2025