Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman. Please submit both the Rpubs link as well as your .rmd file.

library(fpp3)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.3     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.2
## ✔ lubridate   1.9.3     ✔ fable       0.3.4
## ✔ ggplot2     3.5.1     ✔ fabletools  0.4.2
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
library(ggplot2)
library(urca)
library(ggfortify)
## Registered S3 methods overwritten by 'ggfortify':
##   method                 from    
##   autoplot.Arima         forecast
##   autoplot.acf           forecast
##   autoplot.ar            forecast
##   autoplot.bats          forecast
##   autoplot.decomposed.ts forecast
##   autoplot.ets           forecast
##   autoplot.forecast      forecast
##   autoplot.stl           forecast
##   autoplot.ts            forecast
##   fitted.ar              forecast
##   fortify.ts             forecast
##   residuals.ar           forecast
library(dplyr)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine

#9.1 - Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

#Explain the differences among these figures. Do they all indicate that the data are white noise?

Figure 9.32 shows ACFs for white noise series of 36, 360, and 1,000 observations (left to right). The most visible difference is the width of the critical-value bounds: they shrink as the sample size grows, because the standard error of a sample autocorrelation is roughly 1/sqrt(T), putting the bounds at about ±1.96/sqrt(T). The estimated autocorrelations are also more scattered for the small sample and settle closer to zero as T increases, reflecting more stable estimates with more data. Despite these differences, all three figures indicate white noise: the spikes stay within the bounds and show no discernible pattern in any of the series.

#Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values sit at different distances because they depend on the sample size T: the bounds are approximately ±2/sqrt(T), so larger T gives tighter bounds. The autocorrelations themselves differ because each figure is computed from a different random sample; sample autocorrelations are only estimates of the true (zero) autocorrelations, and with fewer observations those estimates have larger variance, so the small-sample ACF strays further from zero even though every series is white noise.
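
As a quick illustration (a sketch using the packages loaded above; the exact spikes will differ from the book's figure since the draws are random), the pattern can be reproduced by simulating white noise of each length and plotting the ACFs side by side:

set.seed(9)
# helper: simulate n white-noise observations and return their ACF plot
wn_acf <- function(n) {
  tsibble(idx = seq_len(n), y = rnorm(n), index = idx) %>%
    ACF(y) %>%
    autoplot() + labs(title = paste0("T = ", n))
}
gridExtra::grid.arrange(wn_acf(36), wn_acf(360), wn_acf(1000), nrow = 1)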

#9.2 - A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

#head(gafa_stock)
am_stock <- gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  dplyr::select(Date, Close)

am_stock %>%
  autoplot(Close) +
  ggtitle("AMZN Daily prices")

# ACF
am_stock %>%
  ACF(Close) %>%
  autoplot() + ggtitle("ACF: AMZN Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

#PACF
am_stock %>%
  PACF(Close) %>%
  autoplot() + ggtitle("PACF: AMZN Stock Prices")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

The ACF decays very slowly, with large positive autocorrelations across many lags, indicating that the series is dominated by a trend and is non-stationary. The PACF has a single large spike at lag 1 (close to 1), meaning each closing price is strongly related to the previous day's price. Both patterns show that the series is non-stationary and should be differenced, which stabilizes the mean of the time series.
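
To support this, a minimal sketch that differences the closing prices and re-plots the ACF (the same irregular-interval warning will apply); after one difference the spikes should collapse toward the white-noise bounds:

am_stock %>%
  mutate(diff_close = difference(Close)) %>%  # first difference of Close
  ACF(diff_close) %>%
  autoplot() + ggtitle("ACF: Differenced AMZN Close")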

#9.3 - For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy. Accommodation takings in the state of Tasmania from aus_accommodation. Monthly sales from souvenirs.

#filter Turkish GDP, take a look
turkey_econ <- global_economy %>%
    filter(Country == 'Turkey')
autoplot(turkey_econ)
## Plot variable not specified, automatically selected `.vars = GDP`

# find lambda and transform
BoxCox.lambda(turkey_econ$GDP)
## [1] 0.1571804
#box cox
turkey_econ %>%
    autoplot(BoxCox(GDP, BoxCox.lambda(turkey_econ$GDP)), colour = "#E30A17")
## Warning: Use of `turkey_econ$GDP` is discouraged.
## ℹ Use `GDP` instead.

The Box-Cox transformation (lambda ≈ 0.157) stabilizes the variance and makes the growth in GDP roughly linear over time.

Let’s use ndiffs to determine how many differences are needed to achieve stationarity.

ndiffs(turkey_econ$GDP)
## [1] 2

Two differences are required.
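
As a sanity check, here is a sketch applying the Box-Cox transformation and both differences; the result should fluctuate around a constant mean with no remaining trend:

lambda_tr <- BoxCox.lambda(turkey_econ$GDP)
turkey_econ %>%
  # Box-Cox transform, then difference twice, per the ndiffs result above
  mutate(gdp_stat = difference(difference(BoxCox(GDP, lambda_tr)))) %>%
  autoplot(gdp_stat)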

#Accommodation takings in the state of Tasmania from aus_accommodation.
tas_acc <- aus_accommodation %>%
    filter(State == 'Tasmania')
tas_acc %>% autoplot(Takings, colour="#00008B")

#lambda
BoxCox.lambda(tas_acc$Takings)
## [1] -0.005076712
#box-cox & plot
tas_acc %>%
    autoplot(BoxCox(Takings, BoxCox.lambda(tas_acc$Takings)), colour="#00008B")
## Warning: Use of `tas_acc$Takings` is discouraged.
## ℹ Use `Takings` instead.

#differences
ndiffs(tas_acc$Takings)
## [1] 1

The Box-Cox transformation (lambda ≈ -0.005, essentially a log transformation) helps control the growing variation and produces a more linear relationship between Takings and time. The differencing assessment indicates that only one first difference is necessary to achieve stationarity.
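
One caveat: ndiffs only considers first differences, and the quarterly Takings are strongly seasonal. A sketch of the feasts unit-root check for seasonal differencing (my expectation, an assumption until run, is that it suggests one seasonal difference):

tas_acc %>%
  # transform first, then ask how many seasonal differences are needed
  mutate(bc_takings = BoxCox(Takings, BoxCox.lambda(tas_acc$Takings))) %>%
  features(bc_takings, unitroot_nsdiffs)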

#Monthly sales from souvenirs.

souvenirs %>% autoplot()
## Plot variable not specified, automatically selected `.vars = Sales`

#lambda
BoxCox.lambda(souvenirs$Sales)
## [1] -0.2444328
#plot boxcox and compare to above image

souvenirs %>%
    autoplot(BoxCox(Sales, BoxCox.lambda(souvenirs$Sales)))
## Warning: Use of `souvenirs$Sales` is discouraged.
## ℹ Use `Sales` instead.

ndiffs(souvenirs$Sales)
## [1] 1

With the lambda applied correctly, the transformation noticeably stabilizes the seasonal variation, which grows with the level in the original plot. What remains most apparent is the strong seasonality over time, so a seasonal difference would also be worth considering alongside the single first difference that ndiffs suggests.
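
Given that seasonality dominates, a sketch that log-transforms and seasonally differences the monthly sales, targeting that pattern directly:

souvenirs %>%
  # lag-12 difference of the logged series removes the annual seasonal pattern
  mutate(ds_sales = difference(log(Sales), 12)) %>%
  autoplot(ds_sales)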

#9.5 - For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(432)
aus_series <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
aus_series %>%
    gg_tsdisplay(Turnover, plot_type = 'partial', lag_max = 36) +
  labs(title= "Australian retail turnover", y = NULL)

aus_series %>% 
  transmute(
    `Turnover` = Turnover,
    `log(Turnover)` = log(Turnover),
    `log(Turnover) | Ann. Change` = difference(log(Turnover), 12),
        `DD log(Turnover)` =
                     difference(difference(log(Turnover), 12), 1)
  )%>%
  pivot_longer(-Month, names_to="Type", values_to="Turnover") %>%
  mutate(
    Type = factor(Type, levels = c(
      "Turnover",
      "log(Turnover)",
      "log(Turnover) | Ann. Change",
      "DD log(Turnover)"))
  ) %>%
  ggplot(aes(x = Month, y = Turnover)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title= "Monthly Australian retail turnover", y = NULL)

The series trends upward with strong, growing seasonality. The log transformation stabilizes the variance but leaves the trend; seasonally differencing the logged series removes most of the seasonal pattern, and the further first difference ("DD log") yields a series with no obvious trend or predictable pattern, i.e., one that looks stationary.
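
To confirm, a sketch applying the KPSS unit-root feature to the doubly differenced log series; a p-value reported at the 0.1 cap would be consistent with stationarity:

aus_series %>%
  # same transformation as the "DD log(Turnover)" panel above
  mutate(dd_log = difference(difference(log(Turnover), 12), 1)) %>%
  features(dd_log, unitroot_kpss)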

#9.6 - Simulate and plot some data from simple ARIMA models.

#a. Use the following R code to generate data from an AR(1) model with  ϕ1 = 0.6 and σ^2 =1. The process starts with y1=0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

head(sim)
## # A tsibble: 6 x 2 [1]
##     idx       y
##   <int>   <dbl>
## 1     1  0     
## 2     2  0.601 
## 3     3  0.994 
## 4     4  1.07  
## 5     5  1.03  
## 6     6 -0.0519
#b. Produce a time plot for the series. How does the plot change as you change ϕ1?

sim %>% autoplot(y) +
  labs(title = expression("AR(1) model with" ~ phi[1] ~ "=" ~ 0.6 ~ "," ~ sigma^2 ~ "=" ~ 1 ~ "," ~ y[1] ~ "=" ~ 0))

set.seed(432)
y2 <- numeric(100)
y3 <- numeric(100)
e2 <- rnorm(100)
e3 <- rnorm(100)

for(i in 2:100){
    y2[i] <- 0.1*y2[i-1] + e2[i]
    y3[i] <- 0.9*y3[i-1] + e3[i]
    
}

sim <- tsibble(idx = seq_len(100), y = y, y2 = y2, y3 = y3, index = idx)

plot1 <- sim %>% autoplot(y2) + labs( title = "Phi = 0.1")
plot2 <- sim %>% autoplot(y) + labs( title = "Phi = 0.6")
plot3 <- sim %>% autoplot(y3) + labs( title = "Phi = 0.9")

grid.arrange(plot1, plot2, plot3, nrow = 2)

Comparing the three plots: as ϕ1 increases toward 1, the series becomes more persistent, with larger swings that last longer, so the plot shows slow, drifting movements. As ϕ1 decreases toward 0, the autocorrelation fades and the series looks more and more like white noise, oscillating quickly around zero.

#c - Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ^2 = 1.

set.seed(432)
y4 <- numeric(100)
e4 <- rnorm(100)

for(i in 2:100)
  y4[i] <- 0.6*e4[i-1] + e4[i]
  
sim2 <- tsibble(idx = seq_len(100), y4 = y4, index = idx)
#d - Produce a time plot for the series. How does the plot change as you change θ1?

set.seed(432)
y5 <- numeric(100)
y6 <- numeric(100)
e5 <- rnorm(100)
e6 <- rnorm(100)

for(i in 2:100){
  y5[i] <- 0.1*e5[i-1] + e5[i]
  y6[i] <- 0.9*e6[i-1] + e6[i]
}
sim2 <- tsibble(idx = seq_len(100), y4 = y4, y5 = y5, y6 = y6, index = idx)

plot4 <- sim2 %>% autoplot(y5) + labs( title = "Theta = 0.1")
plot5 <- sim2 %>% autoplot(y4) + labs( title = "Theta = 0.6")
plot6 <- sim2 %>% autoplot(y6) + labs( title = "Theta = 0.9")

grid.arrange(plot4, plot5, plot6, nrow = 2)

Changing θ1 has only a modest effect on the plots. An MA(1) process carries just one period of memory, so the series remains mean-reverting with quick oscillations regardless of θ1. A larger θ1 slightly increases the variance (which equals (1 + θ1^2)σ^2) and makes adjacent observations more correlated, but the overall range and wavelength stay fairly consistent.

#e - Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ^2=1

# reuse the innovations e from part (a); start again from y[1] = 0
y <- numeric(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]

sim_ARMA <- tsibble(idx = seq_len(100), y = y, index = idx)
#f - Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ^2=1. (Note that these parameters will give a non-stationary series.)
p_1 <- -0.8
p_2 <- 0.3

# start from a fresh series (otherwise leftover values from part (e) would
# seed y[1] and y[2]); reuse the innovations e once more
y <- numeric(100)
for(i in 3:100)
  y[i] <- p_1*y[i-1] + p_2*y[i-2] + e[i]

AR_sim <- tsibble(idx = seq_len(100), y = y, index = idx)
#g - Graph the latter two series and compare them.

plt_ARMA <- sim_ARMA %>% autoplot(y)+
  labs(title = expression("ARMA:" ~ phi[1] ~ "=" ~ 0.6 ~ "," ~ theta[1] ~ "=" ~ 0.6 ~ "," ~ sigma^2 ~ "=" ~ 1))

plt_AR <- AR_sim %>% autoplot(y)+
  labs(title = expression("AR(2):" ~ phi[1] ~ "=" ~ -0.8 ~ "," ~ phi[2] ~ "=" ~ 0.3 ~ "," ~ sigma^2 ~ "=" ~ 1))

gridExtra::grid.arrange(plt_ARMA, plt_AR, ncol = 1, nrow = 2)

These two series behave very differently. The ARMA(1,1) series is stationary: it fluctuates around zero with roughly constant variance and no trend. The AR(2) series with ϕ1 = −0.8 and ϕ2 = 0.3 is non-stationary: it oscillates between positive and negative values with exponentially growing amplitude toward the later indexes. The contrast shows why the stationarity conditions on the AR coefficients matter, and why non-stationary data need differencing before an ARMA model can be fitted.
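
The non-stationarity of the AR(2) parameters can also be verified algebraically: stationarity requires the roots of 1 − ϕ1·z − ϕ2·z^2 to lie outside the unit circle, and here one of them does not. A one-line check:

# roots of 1 + 0.8z - 0.3z^2 (i.e., 1 - phi1*z - phi2*z^2 with phi1 = -0.8,
# phi2 = 0.3); one modulus is about 0.93 < 1, so the process is non-stationary
abs(polyroot(c(1, 0.8, -0.3)))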

#9.7 - Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

#a - Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit <- aus_airpassengers %>%
  model(ARIMA(Passengers~pdq(0,1,0)))
fit %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "passengers", title = "Australia: Air Travel")

fit %>% gg_tsresiduals()

report(fit)
## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97

I specified an ARIMA(0,1,0) model, which fable fits with a drift term (constant 1.4191). Judging by the ACF panel of gg_tsresiduals(), the residuals are consistent with white noise: the spikes stay within the bounds with no visible pattern.
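
A Ljung-Box test can back up the visual check (a sketch; ARIMA(0,1,0) has no ARMA coefficients, so no degrees-of-freedom adjustment is needed):

augment(fit) %>%
  features(.innov, ljung_box, lag = 10)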

#b - Write the model in terms of the backshift operator.

(1 − B)Yt = 1.4191 + ϵt, which expands to Yt = Yt−1 + 1.4191 + ϵt.

#c - Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

Not applicable: the model fitted in part (a) was already ARIMA(0,1,0) with drift, so the forecasts are identical.

#d - Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit_2 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(2,1,2) + PDQ(0,0,0) + 1))
report(fit_2)
## Series: Passengers 
## Model: ARIMA(2,1,2) w/ drift 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2  constant
##       -0.5518  -0.7327  0.5895  1.0000    3.2216
## s.e.   0.1684   0.1224  0.0916  0.1008    0.7224
## 
## sigma^2 estimated as 4.031:  log likelihood=-96.23
## AIC=204.46   AICc=206.61   BIC=215.43
#remove the constant (note: this fit also drops the AR terms, giving an ARIMA(0,1,2) with no constant)

fit_2.1 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(0,1,2) + PDQ(0,0,0) + 0))

report(fit_2.1)
## Series: Passengers 
## Model: ARIMA(0,1,2) 
## 
## Coefficients:
##          ma1     ma2
##       0.1555  0.2913
## s.e.  0.1402  0.1429
## 
## sigma^2 estimated as 5.613:  log likelihood=-104.02
## AIC=214.05   AICc=214.62   BIC=219.53
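
For the comparison the question asks for, a sketch fitting parts (a) and (d) in one mable so both forecast sets appear on a single plot:

aus_airpassengers %>%
  model(
    `ARIMA(0,1,0) w/ drift` = ARIMA(Passengers ~ pdq(0,1,0)),
    `ARIMA(2,1,2) w/ drift` = ARIMA(Passengers ~ pdq(2,1,2) + PDQ(0,0,0) + 1)
  ) %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers, level = NULL)  # point forecasts only, for clarity
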
#e - Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit3 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(0,2,1)))

report(fit3)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65
fit3 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "assengers", title = "Travel: AUS")

Judging by AICc (198.32, the lowest among the models fitted so far), this is arguably the best-fitting model. Note, though, that the code above omits the constant the question asks about; see the sketch below for the version with the constant included.
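
For completeness, a sketch that does include the constant. My expectation (an assumption until run) is that fable warns the specification induces a quadratic or higher-order polynomial trend, which is generally discouraged, and the forecasts curve upward accordingly:

aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1))) %>%  # "+ 1" forces the constant in
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)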

#9.8 - For the United States GDP series (from global_economy):

#a - if necessary, find a suitable Box-Cox transformation for the data;

usa_econ <-global_economy %>%
    filter(Code == "USA") 

usa_econ %>%
    autoplot(GDP) +
    labs(title="US GDP",
         y="GDP")

This is annual data, so there is no seasonality to remove. The variance does grow with the level of the series, so a Box-Cox transformation with a small λ could be applied, but I proceed without transforming; see the sketch below for what λ would be.
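
For reference, a sketch estimating a Box-Cox λ anyway via the guerrero feature; an estimate well away from 1 would make a transformation defensible:

usa_econ %>%
  features(GDP, features = guerrero)  # returns lambda_guerrero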

#b - fit a suitable ARIMA model to the transformed data using ARIMA();

fit <- usa_econ %>%
  model(ARIMA(GDP))
report(fit)
## Series: GDP 
## Model: ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.4206  -0.3048
## s.e.   0.1197   0.1078
## 
## sigma^2 estimated as 2.615e+22:  log likelihood=-1524.08
## AIC=3054.15   AICc=3054.61   BIC=3060.23
#ndiffs(usa_econ$GDP)

ARIMA() automatically selected an ARIMA(0,2,2).

#c - try some other plausible models by experimenting with the orders chosen;
ndiffs(usa_econ$GDP)
## [1] 2

The number of differences, 2, justifies keeping d=2 in our next ARIMA selections.

fit_other1<- usa_econ %>%
  model(ARIMA(GDP ~ pdq(1,2,2)))

fit_other2<- usa_econ %>%
  model(ARIMA(GDP ~ pdq(2,2,0)))

fit_other3<- usa_econ %>%
  model(ARIMA(GDP ~ pdq(2,2,2)))

report(fit_other1)
## Series: GDP 
## Model: ARIMA(1,2,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.2053  -0.5912  -0.1928
## s.e.  0.3008   0.2886   0.2102
## 
## sigma^2 estimated as 2.646e+22:  log likelihood=-1523.86
## AIC=3055.72   AICc=3056.51   BIC=3063.82
report(fit_other2)
## Series: GDP 
## Model: ARIMA(2,2,0) 
## 
## Coefficients:
##           ar1      ar2
##       -0.2088  -0.2059
## s.e.   0.1320   0.1317
## 
## sigma^2 estimated as 2.984e+22:  log likelihood=-1527.51
## AIC=3061.02   AICc=3061.48   BIC=3067.09
report(fit_other3)
## Series: GDP 
## Model: ARIMA(2,2,2) 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2
##       1.3764  -0.4780  -1.9659  1.0000
## s.e.  0.1216   0.1354   0.0723  0.0719
## 
## sigma^2 estimated as 2.283e+22:  log likelihood=-1521.14
## AIC=3052.27   AICc=3053.47   BIC=3062.4

Comparing the reports: dropping the MA terms entirely (ARIMA(2,2,0)) gives the weakest fit with the highest AICc (3061.48), while ARIMA(2,2,2) achieves the lowest AICc (3053.47) of the models tried, slightly better than the automatically selected ARIMA(0,2,2) (AICc 3054.61).
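
The same comparison is easier to read with all the candidates in one mable; a sketch:

usa_econ %>%
  model(
    auto_022 = ARIMA(GDP),
    p122 = ARIMA(GDP ~ pdq(1,2,2)),
    p220 = ARIMA(GDP ~ pdq(2,2,0)),
    p222 = ARIMA(GDP ~ pdq(2,2,2))
  ) %>%
  glance() %>%                       # one row of fit statistics per model
  arrange(AICc) %>%
  select(.model, AIC, AICc, BIC)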

#d - choose what you think is the best model and check the residual diagnostics;

fit_other3 %>% gg_tsresiduals()

The residual diagnostics look acceptable, so the model appears adequate; I’ll move on to producing forecasts.
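
A portmanteau test as a numeric check on the diagnostics (dof = 4 for the four estimated ARMA coefficients):

augment(fit_other3) %>%
  features(.innov, ljung_box, lag = 10, dof = 4)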

#e - produce forecasts of your fitted model. Do the forecasts look reasonable?

fit_other3 %>% forecast(h=10) %>%
  autoplot(usa_econ) +
  labs(y = "GDP")

The forecasts continue the established upward trend of the annual series and look reasonable.

#f - compare the results with what you would obtain using ETS() (with no transformation).

ets_comparison1 <- usa_econ %>%
    model(ETS(GDP))

ets_comparison1 %>% forecast(h = 10) %>%
  autoplot(usa_econ) 

ets_comparison1 %>% gg_tsresiduals()

Comparing the two sets of output, the ARIMA(2,2,2) appears to perform a bit better; in particular, the ETS forecasts come with noticeably wider prediction intervals. Note that AICc values cannot be compared between ARIMA and ETS models, so the comparison rests on the forecast plots and residual diagnostics.
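
To make the comparison concrete, a sketch plotting both sets of forecasts on one set of axes (80% intervals only, to keep the plot readable):

usa_econ %>%
  model(
    arima_222 = ARIMA(GDP ~ pdq(2,2,2)),
    ets = ETS(GDP)
  ) %>%
  forecast(h = 10) %>%
  autoplot(usa_econ, level = 80)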