DATA624-HW6

library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.3
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.3.2
## ✔ lubridate   1.9.3     ✔ fable       0.3.4
## ✔ ggplot2     3.5.1     ✔ fabletools  0.4.2
## Warning: package 'tsibble' was built under R version 4.3.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ readr   2.1.5
## ✔ purrr   1.0.2     ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(forecast)
## Warning: package 'forecast' was built under R version 4.3.3
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(urca)
## Warning: package 'urca' was built under R version 4.3.3

9.1 Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

The figures represent ACFs for three series of random numbers with a different length. The three lengths we see here are 36, 360 and 1,000. The main difference between these figures is the length of the time series and how it affects the spikes and the bounded area which are the confidence intervals. As the time series increases the spikes become smaller and the bounded area around zero also becomes narrower. Despite these differences, all the figures indicate that the data are white noise because the autocorrelation values generally fall within the two blue lines. None of the autocorrelation values are significantly outside these intervals, which shows that the data points are uncorrelated and behave as white noise.

Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers. Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are at different distances from the mean of zero because the length of the time series affects the width of the confidence intervals.This shrinking of the confidence bounds occurs because more data (for example 1,000 observations) leads to a more precise estimate of the true autocorrelations, reducing random fluctuations. The differences in autocorrelation values across the figures arise because, with smaller time series (for example the 36 observations), random fluctuations have a larger effect on the calculated autocorrelations, leading to more variation. In contrast, as the time series grows (for example 1,000 observations), the random fluctuations are less defined, and the autocorrelation values tend to hover closer to zero. In this case, being that all the data are white noise, these variations are purely random and do not indicate any underlying pattern or correlation.

# Generate random white noise series of different observation lengths
set.seed(1234)
y_36 <- rnorm(36)
y_360 <- rnorm(360)
y_1000 <- rnorm(1000)

# Plot ACFs for the three series
par(mfrow = c(1, 3))
acf(y_36, main = "ACF for 36 Random Numbers")
acf(y_360, main = "ACF for 360 Random Numbers")
acf(y_1000, main = "ACF for 1000 Random Numbers")

9.2 A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The time series in this case of amazon close prices has a clear uptrend which makes it non-stationary. The ACF plot is decreasing at a very slow rate. The PACF plot also shows a lag around 1 which makes this non-stationary.

# Amazon stock daily closing prices
amazon <- gafa_stock %>% filter(Symbol == "AMZN")

# Time plot
amazon %>% autoplot(Close) + labs(title = "Amazon Closing Stock Price", y = "Price")

# ACF Plot
amazon %>% ACF(Close) %>% autoplot() + labs(title = "ACF of Amazon Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

# PACF Plot
amazon %>% PACF(Close) %>% autoplot() + labs(title = "PACF of Amazon Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

# Turkish GDP (global_economy)
global_economy %>% filter(Country == "Turkey") %>%
  autoplot(GDP) + labs(title = "Turkish GDP")

lambda <- global_economy %>%
  filter(Country == "Turkey") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

global_economy %>%
  filter(Country == "Turkey") %>%
  mutate(GDP_trans = box_cox(GDP, lambda)) %>%
  features(GDP_trans, unitroot_nsdiffs)
## # A tibble: 1 × 2
##   Country nsdiffs
##   <fct>     <int>
## 1 Turkey        0

Accommodation takings in the state of Tasmania from aus_accommodation.

# Accommodation takings in the state of Tasmania from aus_accommodation plot
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(Takings, plot_type='partial') +
  labs(title = "Non-transformed Tasmania Accomodation Takings")

lambda <- aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

aus_accommodation %>%
  filter(State == "Tasmania") %>%
  mutate(Takings_trans = box_cox(Takings, lambda)) %>%
  features(Takings_trans, unitroot_nsdiffs)
## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1
# Box-Cox transformation and differencing for Tasmania accommodation data
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  model(ETS(Takings ~ trend("A"))) %>%
  report()
## Series: Takings 
## Model: ETS(M,A,M) 
##   Smoothing parameters:
##     alpha = 0.6668456 
##     beta  = 0.000369898 
##     gamma = 0.3137473 
## 
##   Initial states:
##      l[0]      b[0]     s[0]     s[-1]     s[-2]    s[-3]
##  21.51555 0.5531155 1.104761 0.7202077 0.8728188 1.302213
## 
##   sigma^2:  0.0018
## 
##      AIC     AICc      BIC 
## 397.9205 400.7330 418.6571

Monthly sales from souvenirs.

# Monthly sales from souvenirs
souvenirs %>% autoplot(Sales)

# Transformation example:
souvenirs %>% features(Sales, features = guerrero) # Find lambda for Box-Cox
## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1         0.00212
# Apply Box-Cox with lambda = 0.5 and differencing
souvenirs %>% mutate(Sales_trans = box_cox(Sales, lambda = 0.5)) %>%
  autoplot(Sales_trans)

souvenirs %>% autoplot(Sales)

lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs %>%
  mutate(Sales_trans = box_cox(Sales, lambda)) %>%
  features(Sales_trans, unitroot_nsdiffs)
## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

# Plot aus_retail data from department stores
retail_data <- aus_retail %>%
  filter(Industry == "Department stores")

retail_data %>% autoplot(Turnover)

 # Check stationarity
retail_data %>% features(Turnover, unitroot_kpss)
## # A tibble: 6 × 4
##   State                        Industry          kpss_stat kpss_pvalue
##   <chr>                        <chr>                 <dbl>       <dbl>
## 1 Australian Capital Territory Department stores      6.36        0.01
## 2 New South Wales              Department stores      6.63        0.01
## 3 Queensland                   Department stores      7.02        0.01
## 4 South Australia              Department stores      5.63        0.01
## 5 Victoria                     Department stores      6.93        0.01
## 6 Western Australia            Department stores      7.05        0.01
# Apply differencing 
retail_data_diff <- retail_data %>%
  mutate(Turnover_diff = difference(Turnover, lag = 1))
retail_data_diff %>% autoplot(Turnover_diff)
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

9.6 Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with
Ï• 1 = 0.6 and
σ 2 = 1 . The process starts with
y 1 = 0 .

y <- numeric(100) e <- rnorm(100) for(i in 2:100) y[i] <- 0.6*y[i-1] + e[i] sim <- tsibble(idx = seq_len(100), y = y, index = idx) Produce a time plot for the series. How does the plot change as you change
Ï• 1 ?

y <- numeric(100)
e <- rnorm(100)
for (i in 2:100) {
  y[i] <- 0.6 * y[i - 1] + e[i]
}
sim_ar <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim_ar)
## Plot variable not specified, automatically selected `.vars = y`

Write your own code to generate data from an MA(1) model with
θ 1 = 0.6 and
σ 2 = 1 .

Produce a time plot for the series. How does the plot change as you change
θ 1 ?

y <- numeric(100)
e <- rnorm(100)
for (i in 2:100) {
  y[i] <- e[i] + 0.6 * e[i - 1]
}
sim_ma <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim_ma)
## Plot variable not specified, automatically selected `.vars = y`

Generate data from an ARMA(1,1) model with
Ï• 1 = 0.6 ,
θ 1 = 0.6 and
σ 2 = 1 .

y <- numeric(100)
e <- rnorm(100)
for (i in 2:100) {
  y[i] <- 0.6 * y[i - 1] + e[i] + 0.6 * e[i - 1]
}
sim_arma <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim_arma)
## Plot variable not specified, automatically selected `.vars = y`

Generate data from an AR(2) model with
ϕ 1 = − 0.8 ,
Ï• 2 = 0.3 and
σ 2 = 1 . (Note that these parameters will give a non-stationary series.)

y <- numeric(100)
e <- rnorm(100)
for (i in 3:100) {
  y[i] <- -0.8 * y[i - 1] + 0.3 * y[i - 2] + e[i]
}
sim_ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim_ar2)
## Plot variable not specified, automatically selected `.vars = y`

Graph the latter two series and compare them.

9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods. Write the model in terms of the backshift operator. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

# Fit ARIMA model to aus_airpassengers data
fit <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ PDQ(0,1,0)))
report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65
# Check residuals 
fit %>% gg_tsresiduals()

# Plot forecasts
fit %>% forecast(h = 10) %>% autoplot(aus_airpassengers)

9.8 For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data; fit a suitable ARIMA model to the transformed data using ARIMA(); try some other plausible models by experimenting with the orders chosen; choose what you think is the best model and check the residual diagnostics; produce forecasts of your fitted model. Do the forecasts look reasonable? compare the results with what you would obtain using ETS() (with no transformation).

# Plot US GDP data
us_gdp <- global_economy %>% filter(Country == "United States")
us_gdp %>% autoplot(GDP)

lambda <- us_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# Box-Cox transformation and ARIMA model
us_gdp %>% features(GDP, features = guerrero)
## # A tibble: 1 × 2
##   Country       lambda_guerrero
##   <fct>                   <dbl>
## 1 United States           0.282
fit_arima <- us_gdp %>%
  model(ARIMA(box_cox(GDP, lambda = 0.3)))
report(fit_arima)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda = 0.3) 
## 
## Coefficients:
##          ar1  constant
##       0.4572  200.4845
## s.e.  0.1201   16.1779
## 
## sigma^2 estimated as 15874:  log likelihood=-355.64
## AIC=717.28   AICc=717.73   BIC=723.41
# Forecast and compare with ETS model
fc_arima <- fit_arima %>% forecast(h = 10)
fc_ets <- us_gdp %>% model(ETS(GDP)) %>% forecast(h = 10)

autoplot(fc_arima) + autolayer(fc_ets, series = "ETS")
## Warning in ggdist::geom_lineribbon(without(intvl_mapping, "colour_ramp"), :
## Ignoring unknown parameters: `series`
## Warning in geom_line(mapping = without(mapping, "shape"), data =
## unpack_data(object[single_row[["FALSE"]], : Ignoring unknown parameters:
## `series`
## Scale for fill_ramp is already present.
## Adding another scale for fill_ramp, which will replace the existing scale.