Data 624 Homework 6

library(tidyverse)
library(fpp3)

## Warning: package 'tsibble' was built under R version 4.2.3

library(tseries)
library(forecast)
library(kableExtra)
library(reactable)
library(seasonal)
library(tsibble)

1. Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

knitr::include_graphics("/Users/mohamedhassan/Downloads/Figure 9.32.png")

Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

As the number of observations increases, the confidence intervals narrow. The figure with 36 random numbers have wide confidence intervals, and as the amount of random numbers increase, the confidence intervals narrow towards zero. Each figure does have data that indicates white noise, since the autocorrelations fall within the confidence intervals and they appear to be randomized with no distinct pattern.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The formula for critical values is \(\pm1.96 \sqrt{T}\), where \(T\) is the number of observations. As sample size \(T\) increases, the critical values decrease. Therefore, the confidence intervals for 360 and 1,000 random numbers have smaller confidence intervals than the figure for 36 random numbers.

2. A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in `gafa_stock`), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

amazon_stock <- gafa_stock |>
  filter(Symbol == "AMZN")

amazon_stock |>
  autoplot(Close)

amazon_stock |>
  gg_tsdisplay(Close, plot_type = "partial")

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

When examining the autoplot, it’s clear that the data is not stationary due to its change over time. If the data was stationary, we would see a roughly horizontal pattern with constant variance. While there appears to be constant variance, the data shows that there is predictably patterns with its upward diagonal trajectory. The ACF plot shows the data barely decreasing along the lags, which is indicative of non-stationarity. The PACF plot has one long spike at lag(1), with smaller spikes beyond the confidence intervals. Differencing can help stabilize the mean by removing changes in the level of the time series, and therefore eliminate (or reduce) trend and/or seasonality (Hyndman, Athanasopoulos 9.1).

Using the KPSS test, we can check if the data is non-stationary:

amazon_stock |>
  features(Close, unitroot_kpss)

## # A tibble: 1 × 3
##   Symbol kpss_stat kpss_pvalue
##   <chr>      <dbl>       <dbl>
## 1 AMZN        14.0        0.01

The kpss_pvalue is 0.01, which indicates that the null hypothesis is rejected and that the data is not stationary. We can use the unitroot_ndiffs function to determine how many times the data should be differenced in order to achieve stationarity:

amazon_stock |>
  features(Close, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Symbol ndiffs
##   <chr>   <int>
## 1 AMZN        1

amazon_stock |>
  gg_tsdisplay(difference(Close), plot_type = "partial")

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

## Warning: Removed 1 row containing missing values (`geom_line()`).

## Warning: Removed 1 rows containing missing values (`geom_point()`).

Differencing the data has made it stationary, with the autoplot appearing to be roughly horizontal and looking the same over time. Although there are spikes beyond the confidence intervals in the ACF and PACF plots, both plots appear to show white noise and be more randomized with no distinct pattern.

3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. Turkish GDP from `global_economy`.

turkey_gdp <- global_economy |>
  filter(Country == "Turkey")

turkey_gdp |>
  autoplot(GDP)

turkey_gdp |>
  gg_tsdisplay(GDP, plot_type = "partial")

Box-Cox Transformation

lambda_a <- turkey_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

turkey_gdp |>
  gg_tsdisplay((box_cox(GDP, lambda_a)), plot_type = "partial")

turkey_gdp |>
  features(box_cox(GDP, lambda_a), unitroot_kpss)

## # A tibble: 1 × 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey       1.50        0.01

turkey_gdp |>
  features(box_cox(GDP, lambda_a), unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

turkey_gdp |>
  features(difference(box_cox(GDP, lambda_a)), unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       0

turkey_gdp |>
  gg_tsdisplay(difference(box_cox(GDP, lambda_a)), plot_type = "partial")

## Warning: Removed 1 row containing missing values (`geom_line()`).

## Warning: Removed 1 rows containing missing values (`geom_point()`).

turkey_gdp |>
  autoplot(difference(box_cox(GDP, lambda_a))) +
  labs(title="Turkey GDP After Transformation and Differencing") +
  theme(plot.title = element_text(hjust=0.5))

## Warning: Removed 1 row containing missing values (`geom_line()`).

b. Accommodation takings in the state of Tasmania from `aus_accommodation`.

This dataset is seasonal, since the observations are quarterly:

tasmania_takings <- aus_accommodation |>
  filter(State == "Tasmania")

tasmania_takings |>
  autoplot(Takings)

tasmania_takings |>
  gg_tsdisplay(Takings, plot_type = "partial", lag=8)

Box-Cox Transformation

lambda_b <- tasmania_takings |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)

tasmania_takings |>
  features(box_cox(Takings, lambda_b), unitroot_kpss)

## # A tibble: 1 × 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania      1.84        0.01

tasmania_takings |>
  features(box_cox(Takings, lambda_b), unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

tasmania_takings |>
  features(difference(box_cox(Takings, lambda_b)), unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      0

tasmania_takings |>
  autoplot(difference(box_cox(Takings, lambda_b), 4)) +
  labs(title="Accomodation Takings in Tasmania After Transformation") +
  theme(plot.title = element_text(hjust=0.5))

## Warning: Removed 4 rows containing missing values (`geom_line()`).

tasmania_takings |>
  gg_tsdisplay(difference(box_cox(Takings, lambda_b), 4), plot_type = "partial", lag = 8)

## Warning: Removed 4 rows containing missing values (`geom_line()`).

## Warning: Removed 4 rows containing missing values (`geom_point()`).

c. Monthly sales from `souvenirs`.

This dataset is also seasonal, since the data is monthly:

souvenirs |>
  autoplot(Sales)

souvenirs |>
  gg_tsdisplay(Sales, plot_type = "partial", lag=24)

Box-Cox Transformation

lambda_c <- souvenirs |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero)

souvenirs |>
  features(box_cox(Sales, lambda_c), unitroot_kpss)

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      1.79        0.01

souvenirs |>
  features(box_cox(Sales, lambda_c), unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

souvenirs |>
  features(difference(box_cox(Sales, lambda_c)), unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0

souvenirs |>
  autoplot(difference(box_cox(Sales, lambda_c), 12)) +
  labs(title="Monthly Souvenir Sales After Transformation") +
  theme(plot.title = element_text(hjust=0.5))

## Warning: Removed 12 rows containing missing values (`geom_line()`).

souvenirs |>
  gg_tsdisplay(difference(box_cox(Sales, lambda_c), 12), plot_type = "partial", lag = 36)

## Warning: Removed 12 rows containing missing values (`geom_line()`).

## Warning: Removed 12 rows containing missing values (`geom_point()`).

5. For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

Since the data is monthly, we know this is seasonal data.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries |>
  autoplot(Turnover)

myseries |>
  gg_tsdisplay(Turnover, plot_type = "partial")

Based on the autoplot, it appears that the data should be transformed to stabilize the variance:

Box-Cox Transformation

lambda_retail <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

myseries |>
  features(box_cox(Turnover, lambda_retail), unitroot_kpss)

## # A tibble: 1 × 4
##   State              Industry                              kpss_stat kpss_pvalue
##   <chr>              <chr>                                     <dbl>       <dbl>
## 1 Northern Territory Clothing, footwear and personal acce…      5.43        0.01

myseries |>
  features(box_cox(Turnover, lambda_retail), unitroot_ndiffs)

## # A tibble: 1 × 3
##   State              Industry                                            ndiffs
##   <chr>              <chr>                                                <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing      1

myseries |>
  gg_tsdisplay(difference(box_cox(Turnover, lambda_retail), 12), plot_type = "partial", lag = 36)

## Warning: Removed 12 rows containing missing values (`geom_line()`).

## Warning: Removed 12 rows containing missing values (`geom_point()`).

6. Simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1=0\).

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b. Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

sim |>
  autoplot(y)

sim |>
  gg_tsdisplay(y, plot_type = "partial") +
  ggtitle("Using 0.6 as Phi")

For an AR(1) model: \(-1 < \phi < 1\)

\(\phi_{1} = 1\)

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]
sim_2 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_2 |>
  autoplot(y)

sim_2 |>
  gg_tsdisplay(y, plot_type = "partial") +
  ggtitle("Using 1 as Phi")

\(\phi_{1} = -1\)

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -1*y[i-1] + e[i]
sim_3 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_3 |>
  autoplot(y)

sim_3 |>
  gg_tsdisplay(y, plot_type = "partial") +
  ggtitle("Using -1 as Phi")

\(\phi_{1} = 0.1\)

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
sim_4 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_4 |>
  autoplot(y)

sim_4 |>
  gg_tsdisplay(y, plot_type = "partial")+
  ggtitle("Using 0.1 as Phi")

\(\phi_{1} = -0.6\)

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]
sim_5 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_5 |>
  autoplot(y)

sim_5 |>
  gg_tsdisplay(y, plot_type = "partial")+
  ggtitle("Using -0.6 as Phi")

There are interesting observations from each of the five AR models. The model when \(\phi = 0.1\) shows the closest to white noise, with both ACF and PACF plots appearing randomized with no distinct pattern and an autoplot that shows stationarity. This makes sense, since AR(1) models that have a constant of zero and \(\phi\) of zero are equivalent to white noise. The models that had a \(\phi\) of 0.6 and 1 showed oscillation in the ACF plot, showing less stationarity and longer distance between data points in the autoplot, which indicates a random walk. The models that had a \(\phi\) of -0.6 and -1 showed more prominent spikes in the ACF plot, in particular when \(\phi = 1\). The autoplots of each show more of up-and-down trend and less stationarity.

c. Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6\) and \(\sigma^2=1\)

MA(1): \(y_t = c + \epsilon_t + \theta_{1} \epsilon_{t-1}\)

set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- e[i] + 0.6*e[i-1]
ma_sim <- tsibble(idx = seq_len(100), y = y, index = idx)

d. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

ma_sim |>
  autoplot(y)

ma_sim |>
  gg_tsdisplay(y, plot_type = "partial") +
  ggtitle("Using 0.6 as Theta")

For an MA(1) model: \(-1 < \theta_1 < 1\)

\(\theta_{1} = 1\)

set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- e[i] + 1*e[i-1]
ma_sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)

ma_sim2 |>
  autoplot(y)

ma_sim2 |>
  gg_tsdisplay(y, plot_type = "partial") +
  ggtitle("Using 1 as Theta")

\(\theta_{1} = -1\)

set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- e[i] + -1*e[i-1]
ma_sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)

ma_sim3 |>
  autoplot(y)

ma_sim3 |>
  gg_tsdisplay(y, plot_type = "partial") +
  ggtitle("Using -1 as Theta")

\(\theta_{1} = 0.1\)

set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- e[i] + 0.1*e[i-1]
ma_sim4 <- tsibble(idx = seq_len(100), y = y, index = idx)

ma_sim4 |>
  autoplot(y)

ma_sim4 |>
  gg_tsdisplay(y, plot_type = "partial")+
  ggtitle("Using 0.1 as Theta")

\(\theta_{1} = -0.6\)

set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- e[i] + -0.6*e[i-1]
ma_sim5 <- tsibble(idx = seq_len(100), y = y, index = idx)

ma_sim5 |>
  autoplot(y)

ma_sim5 |>
  gg_tsdisplay(y, plot_type = "partial")+
  ggtitle("Using -0.6 as Theta")

When comparing the five autoplots, there doesn’t appear to be any distinct differences between each of them. They mostly show different variations over time with each theta value, but nothing that would suggest a clear pattern. When comparing the ACF and PACF plots, the theta value of 0.1 shows the most white noise, which makes sense since it is the closest value to zero among the other values used. Nevertheless, there doesn’t appear to be any suggestion of oscillation or distinct pattern in the plots. The plots that have 1 and -1 theta values appears to show more spikes in the lags than the other values.

e. Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_1=0.6\), and \(\sigma^2=1\).

set.seed(123)
y <- numeric(100)
e <- rnorm(100)
phi <- 0.6
theta <- 0.6
for(i in 2:100)
  y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]
arma11_sim <- tsibble(idx = seq_len(100), y = y, index = idx)

f. Generate data from an AR(2) model with \(\phi_1=-0.8\), \(\phi_2=0.3\), and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)

set.seed(123)
y <- numeric(100)
e <- rnorm(100)
phi1 <- -0.8
phi2 <- 0.3
for(i in 3:100)
  y[i] <- phi1*y[i-1] + phi2*y[i-2] + e[i]
ar2_sim <- tsibble(idx = seq_len(100), y = y, index = idx)

g. Graph the latter two series and compare them.

arma11_sim |>
  autoplot(y)

arma11_sim |>
  gg_tsdisplay(y, plot_type = "partial")

ar2_sim |>
  autoplot(y)

ar2_sim |>
  gg_tsdisplay(y, plot_type = "partial")

The ARMA(1,1) model shows more stationarity in its autoplot than the AR(2) model. The ACF plot of the ARMA(1,1) model shows oscillation, while the AR(2) model has spikes beyond the confidence intervals and are decreasing slowly. The autoplot of the AR(2) model is mostly flat until the data points start to go up and down after value 50 and begins rapidly widen after value 75.

7. Consider `aus_airpassengers`, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

aus_airpassengers |>
  gg_tsdisplay(Passengers, plot_type = "partial")

fit <- aus_airpassengers |>
  model(ARIMA(Passengers))
report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

Using the ARIMA() function, the model that was selected was a MA model with a \(q\) value of 1 and differencing \(d\) of 2.

fit |> 
  gg_tsresiduals()

fit |> 
  forecast(h=10) |>
  autoplot(aus_airpassengers)

b. Write the model in terms of the backshift operator.

ARIMA(0,2,1): \(y_t = -0.8963 * \epsilon_{t-1} + \epsilon_{t}\)

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit2 <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))

report(fit2)

## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97

fit2 |> 
  gg_tsresiduals()

fit2 |> 
  forecast(h=10) |>
  autoplot(aus_airpassengers)

There appears to be less white noise in the ARIMA(0,1,0) model than in the suggested ARIMA(0,2,1) model. The second model has an AIC value of 200.31, slightly more than the first model’s AIC of 198.04. When comparing the forecast plots of each, the first model’s confidence interval appears slightly wider than the second model. Nevertheless, each forecast appears to show an upward trajectory.

d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit3 <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))

report(fit3)

## Series: Passengers 
## Model: ARIMA(2,1,2) w/ drift 
## 
## Coefficients:
##           ar1      ar2     ma1     ma2  constant
##       -0.5518  -0.7327  0.5895  1.0000    3.2216
## s.e.   0.1684   0.1224  0.0916  0.1008    0.7224
## 
## sigma^2 estimated as 4.031:  log likelihood=-96.23
## AIC=204.46   AICc=206.61   BIC=215.43

fit3 |>
  gg_tsresiduals()

fit3 |> 
  forecast(h=10) |>
  autoplot(aus_airpassengers)

When examining the third model’s plot with those in parts a and c, the thin blue line appears to not be straight, showing less of an upward trajectory and more of a flatter forecast. Its confidence interval looks to be roughly the same as the confidence interval produced in the plot for part c.

Without the constant

fit4 <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

report(fit4)

## Series: Passengers 
## Model: NULL model 
## NULL model

Removing the constant produces an error stating that the non-stationary AR part from the CSS (conditional sum of squares). This may be due to the values falling outside the region for stationary processes.

e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit5 <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

report(fit5)

## Series: Passengers 
## Model: ARIMA(0,2,1) w/ poly 
## 
## Coefficients:
##           ma1  constant
##       -1.0000    0.0571
## s.e.   0.0585    0.0213
## 
## sigma^2 estimated as 3.855:  log likelihood=-95.1
## AIC=196.21   AICc=196.79   BIC=201.63

We get a warning that the model specification induces a quadratic or higher order polynomial trend, with a suggestion to remove the constant or reduce the number of differences. This may be caused by the fact that the model is differenced twice with a constant of 1.

fit5 |>
  gg_tsresiduals()

fit5 |> 
  forecast(h=10) |>
  autoplot(aus_airpassengers)

Despite the warning, the model still produces a forecast, which shows an upward trajectory with a more narrow confidence interval than the previous forecasts.

For the United States GDP series (from global_economy):

a. if necessary, find a suitable Box-Cox transformation for the data;

us_gdp <- global_economy |>
  filter(Country == "United States")

us_gdp |>
  autoplot(GDP)

us_gdp |>
  gg_tsdisplay(GDP, plot_type = "partial")

lambda_us <- us_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

us_gdp |> 
  autoplot(box_cox(GDP, lambda_us))

us_gdp |>
  gg_tsdisplay(box_cox(GDP, lambda_us), plot_type = "partial")

b. fit a suitable ARIMA model to the transformed data using ARIMA();

fit <- us_gdp |>
  model(ARIMA(box_cox(GDP, lambda_us)))
report(fit)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda_us) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

us_gdp |>
  features(box_cox(GDP, lambda_us), unitroot_kpss)

## # A tibble: 1 × 3
##   Country       kpss_stat kpss_pvalue
##   <fct>             <dbl>       <dbl>
## 1 United States      1.55        0.01

us_gdp |>
  features(box_cox(GDP, lambda_us), unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1

us_gdp |>
  gg_tsdisplay(difference(box_cox(GDP, lambda_us)), plot_type = "partial")

## Warning: Removed 1 row containing missing values (`geom_line()`).

## Warning: Removed 1 rows containing missing values (`geom_point()`).

c. try some other plausible models by experimenting with the orders chosen;

fit_models <- us_gdp |>
  model(
    "ARIMA110" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(1,1,0)),
    "ARIMA011" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(0,1,1)),
    "ARIMA210" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(2,1,0)),
    "ARIMA010" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(0,1,0)),
    "ARIMA012" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(0,1,2)), 
    "ARIMA212" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(2,1,2))
      )

glance(fit_models) |>
  arrange(AIC) |>
  select(.model:BIC)

## # A tibble: 6 × 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA110  5479.   -325.  657.  657.  663.
## 2 ARIMA210  5580.   -325.  659.  659.  667.
## 3 ARIMA011  5689.   -326.  659.  659.  665.
## 4 ARIMA012  5634.   -326.  659.  660.  667.
## 5 ARIMA212  5734.   -325.  662.  664.  674.
## 6 ARIMA010  6774.   -332.  668.  668.  672.

d. choose what you think is the best model and check the residual diagnostics;

I chose the model that had the lowest AIC value, which was the ARIMA110 model that was recommended by the ARIMA() function.

fit_models |>
  select(ARIMA110) |>
  gg_tsresiduals()

e. produce forecasts of your fitted model. Do the forecasts look reasonable?

fit_models |>
  forecast(h=10) |>
  filter(.model=='ARIMA110') |>
  autoplot(us_gdp) + 
  labs(y = "GDP", title = "Forecast of U.S. GDP Using ARIMA110") +
  theme(plot.title = element_text(hjust=0.5))

The forecast model look reasonable. The confidence interval is not wide and the forecast shows an upward trajectory that’s consistent with the data represented in the plot.

f. compare the results with what you would obtain using ETS() (with no transformation).

fit_ets <- us_gdp |>
  model(ETS(GDP))
report(fit_ets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

fit_ets |> 
  forecast(h=10) |>
  autoplot(us_gdp) +
  labs(y = "GDP", title = "Forecast of U.S. GDP Using ETS") +
  theme(plot.title = element_text(hjust=0.5))

The ETS forecast model shows a wider confidence interval and less of an upward trajectory than the recommended ARIMA model. The AIC value of the ETS model is 3190.787, which is significantly higher than the AIC of the selected ARIMA model, which was 656.65.

Data 624 Homework 6

Mohamed Hassan-El Serafi

2024-03-17

1. Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

2. A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in `gafa_stock`), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. Turkish GDP from `global_economy`.

b. Accommodation takings in the state of Tasmania from `aus_accommodation`.

c. Monthly sales from `souvenirs`.

5. For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

6. Simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1=0\).

b. Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

c. Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6\) and \(\sigma^2=1\)

d. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

e. Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_1=0.6\), and \(\sigma^2=1\).

f. Generate data from an AR(2) model with \(\phi_1=-0.8\), \(\phi_2=0.3\), and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)

g. Graph the latter two series and compare them.

7. Consider `aus_airpassengers`, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

b. Write the model in terms of the backshift operator.

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

a. if necessary, find a suitable Box-Cox transformation for the data;

b. fit a suitable ARIMA model to the transformed data using ARIMA();

c. try some other plausible models by experimenting with the orders chosen;

d. choose what you think is the best model and check the residual diagnostics;

e. produce forecasts of your fitted model. Do the forecasts look reasonable?

f. compare the results with what you would obtain using ETS() (with no transformation).

Data 624 Homework 6

Mohamed Hassan-El Serafi

2024-03-17

1. Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

2. A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. Turkish GDP from global_economy.

b. Accommodation takings in the state of Tasmania from aus_accommodation.

c. Monthly sales from souvenirs.

5. For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

6. Simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1=0\).

b. Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

c. Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6\) and \(\sigma^2=1\)

d. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

e. Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_1=0.6\), and \(\sigma^2=1\).

f. Generate data from an AR(2) model with \(\phi_1=-0.8\), \(\phi_2=0.3\), and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)

g. Graph the latter two series and compare them.

7. Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

b. Write the model in terms of the backshift operator.

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

a. if necessary, find a suitable Box-Cox transformation for the data;

b. fit a suitable ARIMA model to the transformed data using ARIMA();

c. try some other plausible models by experimenting with the orders chosen;

d. choose what you think is the best model and check the residual diagnostics;

e. produce forecasts of your fitted model. Do the forecasts look reasonable?

f. compare the results with what you would obtain using ETS() (with no transformation).

2. A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in `gafa_stock`), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

a. Turkish GDP from `global_economy`.

b. Accommodation takings in the state of Tasmania from `aus_accommodation`.

c. Monthly sales from `souvenirs`.

7. Consider `aus_airpassengers`, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.