DATA624HW6

DATA 624 Homework 5

library(fpp3)
library(tidyverse)

Question 9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Figure 9.32. Since the spikes lie within the 95% interval bounds the series can be considered white noise. For smaller sample size such as 36 random numbers several lags may appear significant just by chance while some may not. In more moderate sample size such as 360 the fluctuations are smaller and most autocorrelations are closer to zero. And lastly in larger sample size such as 1000 the ACF values are cluster tightly around zero, very few appear significant.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are at different distances from the mean of zero because as our sample sizes increases we can be more confident that the true autocrrelations are near zero allowing the boundaries to be more narrow. The autocorrelations are different in each figure becuase each random sample produces different random fluctuations around zero. And also as the sample sizes become larger these random differences average out and the ACFs look closer to the theoretical expectations flat at zero.

Question 9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

amazon_stock <- gafa_stock %>%
  filter(Symbol == "AMZN")

amazon_stock %>%
  autoplot(Close) +
  labs(title = "Amazon Daily Closing Prices",
       y = "Closing Price (USD)",
       x = "Date")

amazon_stock %>%
  ACF(Close) %>%
  autoplot() +
  labs(title = "ACF of Amazon Closing Prices")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

amazon_stock %>%
  PACF(Close) %>%
  autoplot() +
  labs(title = "PACF of Amazon Closing Prices")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

The series is non-stationary because the there is a clear upward trend over time and the vairance and mean change over time. In the ACF plot the values decay slowly and remain significantly positive across many lags. And in the PACF plot there is a significant spike at lag 1 with smaller values in the subsequent lags. This supports that differencing once might help remove the trend.

Question 9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

turkey_gdp <- global_economy %>%
  filter(Country == "Turkey")

turkey_gdp %>%
  autoplot(GDP) +
  labs(title = "Turkish GDP")

turkey_gdp %>%
  features(GDP, features = guerrero)

## # A tibble: 1 × 2
##   Country lambda_guerrero
##   <fct>             <dbl>
## 1 Turkey            0.157

turkey_gdp <- turkey_gdp %>%
  mutate(GDP_trans = box_cox(GDP, 0.1572187))

turkey_gdp %>%
  autoplot(GDP_trans)

turkey_gdp %>%
  features(GDP_trans, unitroot_kpss)

## # A tibble: 1 × 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey       1.50        0.01

turkey_gdp %>%
  mutate(diff_gdp = difference(GDP_trans)) %>%
  ACF(diff_gdp, na.action = na.exclude) %>%
  autoplot()

Accommodation takings in the state of Tasmania from aus_accommodation.

tas_accom <- aus_accommodation %>%
  filter(State == "Tasmania")

tas_accom %>%
  autoplot(Takings)

tas_accom %>%
  features(Takings, features = guerrero)

## # A tibble: 1 × 2
##   State    lambda_guerrero
##   <chr>              <dbl>
## 1 Tasmania         0.00182

tas_accom <- tas_accom %>%
  mutate(Takings_trans = box_cox(Takings, 0.001819643))

tas_accom %>%
  autoplot(Takings_trans)

tas_accom %>%
  features(Takings_trans, unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

tas_accom %>%
  features(Takings_trans, unitroot_nsdiffs)

## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

tas_accom_diff <- tas_accom %>%
  mutate(diff_takings = difference(Takings_trans, lag = 12))

tas_accom_diff %>%
  autoplot(diff_takings)

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

Monthly sales from souvenirs.

souvenirs %>%
  autoplot(Sales)

souvenirs %>%
  features(Sales, features = guerrero)

## # A tibble: 1 × 1
##   lambda_guerrero
##             <dbl>
## 1         0.00212

souvenirs <- souvenirs %>%
  mutate(Sales_trans = box_cox(Sales, 0.002118221))

souvenirs %>%
  features(Sales_trans, unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

souvenirs %>%
  features(Sales_trans, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

souvenirs_diff <- souvenirs %>%
  mutate(Sales_diff = difference(difference(Sales_trans, lag = 12), lag = 1))

souvenirs_diff %>%
  autoplot(Sales_diff)

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

Question 9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(624)

retail <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

retail |>
  autoplot(Turnover) +
  labs(title = "Retail Turnover")

retail %>%
  features(Turnover, features = guerrero)

## # A tibble: 1 × 3
##   State           Industry               lambda_guerrero
##   <chr>           <chr>                            <dbl>
## 1 New South Wales Takeaway food services         0.00214

retail <- retail %>%
  mutate(Turnover_trans = box_cox(Turnover, 0.002144737))

retail %>%
  features(Turnover_trans, unitroot_nsdiffs)

## # A tibble: 1 × 3
##   State           Industry               nsdiffs
##   <chr>           <chr>                    <int>
## 1 New South Wales Takeaway food services       1

retail %>%
  features(Turnover_trans, unitroot_ndiffs)

## # A tibble: 1 × 3
##   State           Industry               ndiffs
##   <chr>           <chr>                   <int>
## 1 New South Wales Takeaway food services      1

Both have 1 so well apply both

retail_diff <- retail %>%
  mutate(
    Turnover_diff = difference(difference(Turnover_trans, lag = 12), lag = 1)
  )

retail_diff %>%
  autoplot(Turnover_diff) +
  labs(title = "Differenced & Transformed Retail Turnover")

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

retail_diff %>%
  ACF(Turnover_diff, na.action = na.exclude) %>%
  autoplot()

retail_diff %>%
  PACF(Turnover_diff, na.action = na.exclude) %>%
  autoplot()

The stationary series is

difference(difference(box_cox(Turnover, 0.002144737), lag = 12), lag = 1)

Question 9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with phi 1 = 0.6 and sigma 2 = 1 . The process starts with y 1 = 0.

set.seed(624)

y <- numeric(100)

e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change phi1?

sim %>%
  autoplot(y) +
  labs(title = "AR(1) (phi = 0.6)")

when phi is 0.9 it is more persistent and slower decay when phi is 0.2 less persistent quicker delay and noisy when phi is -0.6 alternating values oscillatory patterns

Write your own code to generate data from an MA(1) model with theta1=0.6 and sigma2=1.

set.seed(624)
e <- rnorm(101, mean = 0, sd = 1)
y <- numeric(100)
for(i in 1:100) {
  y[i] <- e[i + 1] + 0.6 * e[i]
}
sim_ma1 <- tsibble(idx = 1:100, y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change theta1 ?

sim_ma1 %>%
  autoplot(y) +
  labs(title = "MA(1) (theta = 0.6)")

when theta is 0 it becomes white noise when theta is 0.9 short bursts of correlation when theta is -0.6 more negative correlation between successive points

Generate data from an ARMA(1,1) model with phi1=0.6 , theta1=0.6 and sigma2=1.

set.seed(624)
e <- rnorm(101)
y <- numeric(100)
for(i in 2:100) {
  y[i] <- 0.6 * y[i - 1] + e[i] + 0.6 * e[i - 1]
}
sim_arma11 <- tsibble(idx = 1:100, y = y, index = idx)

sim_arma11 %>%
  autoplot(y) +
  labs(title = "ARMA(1,1) (phi = 0.6, theta = 0.6)")

Generate data from an AR(2) model with phi1=-0.8 , phi2=0.3 and sigma2=1. (Note that these parameters will give a non-stationary series.)

set.seed(624)
e <- rnorm(100)
y <- numeric(100)
y[1] <- 0
y[2] <- 0
for(i in 3:100) {
  y[i] <- -0.8 * y[i - 1] + 0.3 * y[i - 2] + e[i]
}
sim_ar2_ns <- tsibble(idx = 1:100, y = y, index = idx)

sim_ar2_ns %>%
  autoplot(y) +
  labs(title = "AR(2) (phi1 = -0.8, phi2 = 0.3)")

Graph the latter two series and compare them.

library(patchwork)

p1 <- sim_arma11 %>%
  autoplot(y) +
  labs(title = "ARMA(1,1)")

p2 <- sim_ar2_ns %>%
  autoplot(y) +
  labs(title = "Non-Stationary AR(2)")

p1 / p2

ARMA(1,1) mean reverting, stationary, fluctuations centered around 0 AR(2) non stationary values drift, thye explode and oscillate doesnt settle back to the mean

Question 9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit_auto <- aus_airpassengers %>%
  model(ARIMA(Passengers))

report(fit_auto)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

fit_auto %>%
  gg_tsresiduals()

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

fit_auto %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Forecast: Automatic ARIMA Model")

Write the model in terms of the backshift operator.

(1-B)^2 [yt] = (1+0.8963B)log(Passenger)

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit_b <- aus_airpassengers |>
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))

fit_b %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(0,1,0) with Drift")

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

air <- aus_airpassengers %>%
  filter(Year >= 1970, Year <= 2011)

fit_with_drift <- air %>%
  model(ARIMA(Passengers ~ pdq(2,1,2) + drift()))

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(2, 1, 2) + drift())
## [1] could not find function "drift"

fit_no_drift <- air %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

fc_with_drift <- fit_with_drift %>% forecast(h = 10)
fc_no_drift   <- fit_no_drift %>% forecast(h = 10)

library(patchwork)

p1 <- fc_with_drift %>%
  autoplot(air) +
  labs(title = "ARIMA(2,1,2) with Drift",
       y = "Passengers (millions)")

p2 <- fc_no_drift %>%
  autoplot(air) +
  labs(title = "ARIMA(2,1,2) without Drift",
       y = "Passengers (millions)")

p1 / p2

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

Doesn’t forecast with drift but does without it.

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit_d <- air %>%
  model(ARIMA(Passengers ~ pdq(0,2,1) + drift()))

## Warning: 1 error encountered for ARIMA(Passengers ~ pdq(0, 2, 1) + drift())
## [1] could not find function "drift"

fit_d %>%
  forecast(h = 10) %>%
  autoplot(air) +
  labs(title = "Part D: ARIMA(0,2,1) with Constant",
       y = "Passengers (millions)")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

It doesn’t work with drift but it does without it.

fit_d <- air %>%
  model(ARIMA(Passengers ~ pdq(0,2,1)))

fit_d %>%
  forecast(h = 10) %>%
  autoplot(air) +
  labs(title = "Part D: ARIMA(0,2,1) with Constant",
       y = "Passengers (millions)")

Question 9.8

For the United States GDP series (from global_economy): if necessary, find a suitable Box-Cox transformation for the data; fit a suitable ARIMA model to the transformed data using ARIMA();

us_gdp <- global_economy %>%
  filter(Country == "United States") %>%
  select(Year, GDP)

autoplot(us_gdp, GDP) +
  labs(title = "US GDP over Time", y = "GDP (USD Billions)")

lambda <- us_gdp %>% features(GDP, features = guerrero) %>% pull(lambda_guerrero)
lambda

## [1] 0.2819443

fit_arima <- us_gdp %>%
  model(ARIMA(box_cox(GDP, lambda)))

report(fit_arima)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

try some other plausible models by experimenting with the orders chosen; choose what you think is the best model and check the residual diagnostics; produce forecasts of your fitted model. Do the forecasts look reasonable? compare the results with what you would obtain using ETS() (with no transformation).

fit_simple <- us_gdp %>%
  model(ARIMA(box_cox(GDP, lambda) ~ pdq(1,1,0)))

report(fit_simple)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

fit_alt1 <- us_gdp %>%
  model(ar210 = ARIMA(box_cox(GDP, lambda) ~ pdq(2,1,0)),
        ar111 = ARIMA(box_cox(GDP, lambda) ~ pdq(1,1,1)),
        ar211 = ARIMA(box_cox(GDP, lambda) ~ pdq(2,1,1)))

glance(fit_alt1)

## # A tibble: 3 × 8
##   .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 ar210   5580.   -325.  659.  659.  667. <cpl [2]> <cpl [0]>
## 2 ar111   5580.   -325.  659.  659.  667. <cpl [1]> <cpl [1]>
## 3 ar211   5647.   -325.  660.  661.  671. <cpl [2]> <cpl [1]>

fit_alt1 %>%
  select(ar111) %>%
  gg_tsresiduals()

fit_alt1 %>%
  select(ar111) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10)

## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 ar111     3.84     0.954

fit_alt1 %>%
  select(ar111) %>%
  forecast(h = 10) %>%
  autoplot(us_gdp) +
  labs(title = "Forecast: US GDP using ARIMA(1,1,1)",
       y = "GDP (USD Billions)")

fit_ets <- us_gdp %>%
  model(ets = ETS(GDP))

forecast(fit_ets, h = 10) %>%
  autoplot(us_gdp) +
  labs(title = "Forecast: US GDP using ETS Model")

fc_combined <- bind_rows(
  forecast(fit_alt1 %>% select(ar111), h = 10) %>% mutate(Model = "ARIMA(1,1,1)"),
  forecast(fit_ets, h = 10) %>% mutate(Model = "ETS")
)

fc_combined %>%
  autoplot(us_gdp, level = NULL) +
  facet_wrap(~Model) +
  labs(title = "US GDP Forecasts: ARIMA vs ETS",
       y = "GDP (USD Billions)")