Data 624:HW 6

Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman. Please submit both the Rpubs link as well as your .rmd file.

1. Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

The difference among these figures is the length of the lags for each ACF plot and the size of the bounds. The lags of the plots decrease in length as sample size increases in number (36 numbers to 1000 numbers). The bounds also decrease in size as the sample size increases in number (36 numbers to 1000 numbers). This is probably because larger sample sizes lead to more accurate estimates. The lags also get close and closer to zero as we increase the sample size - although, all of the lags are still within bounds, therefore, all the figures/plots indicate that the data is white noise and is no significant autocorrelation.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are at different distances from the mean of zero because the critical values or bounds are dependent on the sample size. Thinking about the equation for critical values, part of the formula includes sample mean. When the sample size is smaller, the bounds are wider and the estimates are less accurate compared to when the sample size is larger, where the bounds are narrow because the estimates are more accurate. The autocorrelations are different in each figure when they each refer to white noise because there is more random fluctuation in smaller sample sizes compared to larger sample sizes.

2. A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

The series is non-stationary because we can see from the first plot of the closing stock prices that there is an upward trend with some fluctuation in between - time series with trends or with seasonality are not stationary. We can also see from the ACF plot that the time series is very, very slowing decaying, even though it is a little difficult to see - which also means it is non-stationary. Lastly, from the PACF plot we see that the first lag is very large and outside of the bounds with other lags in the plot also outside the bounds, indicating significant correlation, which also show that this time series is non-stationary.

amazon_stock <- gafa_stock |>
  filter(Symbol == "AMZN")

amazon_stock |>
  autoplot(Close)

amazon_stock |> ACF(Close) |>
  autoplot() + labs(subtitle = "ACF of Amazon closing stock price")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

amazon_stock |> PACF(Close) |>
  autoplot() + labs(subtitle = "PACF of Amazon closing stock price")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. Turkish GDP from global_economy. For Turkish GDP from global_economy, we could see an overall upward trend and the ACF and PACF plot shows that the data is non-stationary since the ACF plot is slowly decaying and the PACF plot having the first lag significantly out of bounds. The appropriate Box-Cox transformation is lambda = 0.1572187 and the uni_root tell us to take one difference - which gave us stationary data.

turk_gdp <- global_economy |>
  filter(Country == 'Turkey') 

turk_gdp |>
  autoplot(GDP)

turk_gdp |> ACF(GDP) |>
  autoplot()

turk_gdp |> PACF(GDP) |>
  autoplot()

lambda <- turk_gdp |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)
lambda

## [1] 0.1572187

turk_gdp |>
  mutate(gdp_box = box_cox(GDP, lambda)) |>
  features(gdp_box, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

turk_gdp |>
  autoplot(box_cox(GDP, lambda) |>
  difference(1))

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(difference(box_cox(GDP,lambda)), plot_type='partial') +
  labs(y = "",
    title = latex2exp::TeX(paste0("Box-cox and ordinary difference of GDP in US $\\lambda$ = ", round(lambda,2))))

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

b. Accommodation takings in the state of Tasmania from aus_accommodation.

For Accommodation takings in the state of Tasmania from aus_accommodation, there is an upward trend and there seems to be seasonality here. The appropriate Box-Cox transformation is lambda = 0.001819643 and the uni_root tells us to take one seasonal difference and one ordinary difference. I used a difference of 4 (Quarterly data) to see if only a seasonal difference would give us stationary data before taking an ordinary difference. However, this time series needed a seasonal and ordinary difference to make it stationary.

tas_take <- aus_accommodation |>
  filter(State == 'Tasmania') 

tas_take  |> 
  autoplot(Takings)

tas_take  |> ACF(Takings) |>
  autoplot()

tas_take  |> 
  PACF(Takings) |>
  autoplot()

lambda1 <- tas_take  |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)
lambda1

## [1] 0.001819643

tas_take  |>
  mutate(take_box = box_cox(Takings, lambda1)) |>
  features(take_box, unitroot_ndiffs) #ordinary difference

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

tas_take  |>
  mutate(take_box = box_cox(Takings, lambda1)) |>
  features(take_box, unitroot_nsdiffs) #seasonal difference

## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

aus_accommodation |>
  filter(State == 'Tasmania')  %>%
  gg_tsdisplay(difference(box_cox(Takings,lambda1),4), plot_type='partial') +
  labs(y = "",
    title = latex2exp::TeX(paste0("Box-cox and Seasonal Difference of Takings in Australia $\\lambda1$ = ", round(lambda1,2))))

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

aus_accommodation |>
  filter(State == 'Tasmania')  %>%
  gg_tsdisplay(difference(difference(box_cox(Takings,lambda1),4),1), plot_type='partial') +
  labs(y = "",
    title = latex2exp::TeX(paste0("Box-cox and double difference (Seasonal and Ordinary) of Takings in Australia $\\lambda1$ = ", round(lambda1,2))))

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_point()`).

c. Monthly sales from souvenirs.

For Monthly sales from souvenirs, there is an upward trend and there seems to be seasonality here. The appropriate Box-Cox transformation is lambda = 0.002118221 and the uni_root tell us to take one seasonal difference and one ordinary difference. I used a difference of 12 (monthly data) to see if only a seasonal difference would give us stationary data before taking an ordinary difference. However, this time series also needed a seasonal and ordinary difference to make it somehwat stationary.

souvenirs |> 
  autoplot(Sales)

souvenirs  |> ACF(Sales) |>
  autoplot()

souvenirs  |> 
  PACF(Sales) |>
  autoplot()

lambda2 <- souvenirs |>
  features(Sales, features = guerrero) |>
  pull(lambda_guerrero)
lambda2

## [1] 0.002118221

souvenirs  |>
  mutate(sale_box = box_cox(Sales, lambda2)) |>
  features(sale_box, unitroot_ndiffs) #ordinary

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

souvenirs  |>
  mutate(sale_box = box_cox(Sales, lambda2)) |>
  features(sale_box, unitroot_nsdiffs) #seasonal

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

souvenirs |>
  gg_tsdisplay(difference(box_cox(Sales,lambda2),12), plot_type='partial') +
  labs(y = "",
    title = latex2exp::TeX(paste0("Box-cox and ordinary difference of Monthly Souvenir sales $\\lambda2$ = ", round(lambda2,2))))

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

souvenirs |>
  gg_tsdisplay(difference(difference(box_cox(Sales,lambda2),12),1), plot_type='partial') +
  labs(y = "",
    title = latex2exp::TeX(paste0("Box-cox and double difference (Seasonal and Ordinary) of Monthly Souvenir sales $\\lambda2$ = ", round(lambda2,2))))

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).

5. For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

For my retail data (from Exercise 7 in Section 2.10), there is an upward trend and there seems to be seasonality here as well as variation. Since there is variation, I did a box-cox transformation and the uni_root tells us to take one seasonal difference and one ordinary difference. I used a difference of 12 (monthly data) to see if only a seasonal difference would give us stationary data before taking an ordinary difference. However, this time series as well, needed a seasonal and ordinary difference to make it somewhat stationary - after both, there is no pattern but some lags are still outside of the bounds.

set.seed(23532)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

myseries |> 
  autoplot(Turnover)

lambda3 <- myseries |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)
lambda3

## [1] 0.7947362

myseries |>
  mutate(turn_box = box_cox(Turnover, lambda3)) |>
  features(turn_box, unitroot_ndiffs)

## # A tibble: 1 × 3
##   State           Industry                                         ndiffs
##   <chr>           <chr>                                             <int>
## 1 New South Wales Hardware, building and garden supplies retailing      1

myseries |>
  mutate(turn_box = box_cox(Turnover, lambda3)) |>
  features(turn_box, unitroot_nsdiffs)

## # A tibble: 1 × 3
##   State           Industry                                         nsdiffs
##   <chr>           <chr>                                              <int>
## 1 New South Wales Hardware, building and garden supplies retailing       1

myseries |>
  gg_tsdisplay(difference(box_cox(Turnover,lambda3),12), plot_type='partial') +
    labs(y = "",
    title = latex2exp::TeX(paste0("Box-cox and seasonal difference of myseries $\\lambda3$ = ", round(lambda3,2))))

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

myseries |>
  gg_tsdisplay(difference(difference(box_cox(Turnover,lambda3),12),1), plot_type='partial') +
    labs(y = "",
    title = latex2exp::TeX(paste0("Box-cox and double difference (Seasonal and Ordinary) of myseries $\\lambda3$ = ", round(lambda3,2))))

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).

6. Simulate and plot some data from simple ARIMA models. a. Use the following R code to generate data from an AR(1) model with phi=0.6 and sigma=1. The process starts with y1=0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b. Produce a time plot for the series. How does the plot change as you change phi?

Depending on how we change phi, the plot changes accordingly. At phi=1, y is equal to random walk and the plot looks like a trend. However, when when phi=-1, it oscillates between positive and negative numbers around zero. Lastly, when phi is around 0, y is equal to white noise. It seem as though when the value of phi is larger and positive, we are more likely to see a pattern/tread and random walk, while smaller positive numbers produce more noise and less clear trend. On the other hand, negative numbers produce oscillation while closer to zero does not have a pattern/more white noise.

sim %>%
  autoplot(y) +
  labs(title = "Sim Time Series AR(1): Phi = 0.6")

for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)

for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)

for(i in 2:100)
  y[i] <- -1*y[i-1] + e[i]
sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim3 %>%
  autoplot(y) +
  labs(title = "Sim Time Series AR(1): Phi = -1")

sim1 %>%
  autoplot(y) +
  labs(title = "Sim Time Series AR(1): Phi = 0.1")

sim2 %>%
  autoplot(y) +
   labs(title = "Sim Time Series AR(1): Phi = 1")

c. Write your own code to generate data from an MA(1) model with theta=0.6 and sigma=1.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 1*e[i] + 0.6*e[i-1] 
# y[t] = e[i] + theta * e[i-1]

gen_sim <- tsibble(idx = seq_len(100), y = y, index = idx)

d. Produce a time plot for the series. How does the plot change as you change theta?

Similar to the AR(1) model, the time plot changes depending on how we change phi. At theta=1, y is equal to random walk but we don’t see much of a trend here. When theta=-1, it oscillates between positive and negative numbers around zero, not as much are the AR(1) model. Lastly, when theta is around 0, y is equal to white noise and appears this way. It seem as though the bigger the value of theta, the more likely we are to see less random noise, except when you move to a negative number, when the oscillation occurs.

gen_sim %>%
  autoplot(y) +
  labs(title = "Sim Time Series MA(1): Theta = 0.6 and Phi = 1")

for(i in 2:100)
  y[i] <- 1*e[i] + 0.1*e[i-1] 
gen_sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)

gen_sim1 %>%
  autoplot(y) +
  labs(title = "Sim Time Series MA(1): Theta = 0.1 and Phi = 1")

for(i in 2:100)
  y[i] <- 1*e[i] - 1*e[i-1] 
gen_sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)

gen_sim2 %>%
  autoplot(y) +
  labs(title = "Sim Time Series MA(1): Theta = -1 and Phi = 1")

for(i in 2:100)
  y[i] <- 1*e[i] + 1*e[i-1] 
gen_sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)

gen_sim3 %>%
  autoplot(y) +
  labs(title = "Sim Time Series MA(1): Theta = 1 and Phi = 1")

e. Generate data from an ARMA(1,1) model with phi=0.6, theta=0.6 and sigma=1.

set.seed(1234)
y <- numeric(100)
e <- rnorm(100)
  
for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]

arma_1_1 <- tsibble(idx = seq_len(100), y = y, index = idx)

f. Generate data from an AR(2) model with phi1=-0.8, phi2=0.3 and sigma=1. (Note that these parameters will give a non-stationary series.)

set.seed(1234)
y <- numeric(100)
e <- rnorm(100)

for(i in 3:100)
  y[i] <- e[i] - 0.8 * y[i-1] + 0.3 * y[i-2]

ar_2 <- tsibble(idx = seq_len(100), y = y, index = idx)

g. Graph the latter two series and compare them.

The ARMA(1,1) model appears to be somewhat stationary and the AR(2) model appears to be non-stationary. The ARMA(1,1) model overall plot does not show a trend, the ACF plot is decreasing at a fast pace, however the PACF plot seems to have autocorrelation due the very large first lag. The AR(2) model shows oscillation around the values of zero, while also increasing in the size of oscillation as the index increases. The ACF plot has the same values of lags on the both sides of zero, while decaying and the PACF plot has one long negative first lag. The output we see in the plot is probably because phi2-phi1<1 is not met for AR(2) to be stationary.

arma_1_1 %>%
   gg_tsdisplay(y, plot_type='partial') +
  labs(title = "ARMA(1,1) model with phi=0.6, theta=0.6 and sigma=1")

ar_2 %>%
   gg_tsdisplay(y, plot_type='partial') +
  labs(title = "AR(2) model with phi1=-0.8, phi2=0.3 and sigma=1")

7. Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

The appropriate ARIMA model is ARIMA(0,2,1). The histogram of residuals looks somewhat unimodal and normally distributed. The ACF of the residuals shows that none of the lags are out of bound and all the lags are somewhat close to zero with no obvious patterns in the model. The residuals do look like its white noise since there appears to be some autocorrelation in the residual series.

From the Box-Pierce test and the Ljung-Box test, we see that the results are not significant since the p-values are 0.8823761 and 0.8149997. We can conclude that the residuals are not distinguishable from white noise. This means that there is no significant autocorrelation in the residuals and the residuals are white noise.

aus_clean <- aus_airpassengers|>
  filter(Year < 2012)

fit <- aus_clean|>
  model(ARIMA(Passengers))

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

fit %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Passengers: ARIMA(0,2,1)")

fit %>% 
  gg_tsresiduals()

augment(fit) |> features(.innov, box_pierce, lag=10)

## # A tibble: 1 × 3
##   .model            bp_stat bp_pvalue
##   <chr>               <dbl>     <dbl>
## 1 ARIMA(Passengers)    5.13     0.882

augment(fit) |> features(.innov, ljung_box, lag=10)

## # A tibble: 1 × 3
##   .model            lb_stat lb_pvalue
##   <chr>               <dbl>     <dbl>
## 1 ARIMA(Passengers)    6.00     0.815

b. Write the model in terms of the backshift operator.

ARIMA(0, 2, 1) in terms of backshift operator:

\[ (1 - B)^2 Y_t = -0.8756 (1 - B) \epsilon_t \]

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

The forecast of an ARIMA(0,1,0) model with drift produced a forecast with more passengers in the next 10 years compared to part a. It looks like the ARIMA(0,2,1) model in part a was slightly over predicting values and the ARIMA(0,1,0) model was slightly under predicting values. The residuals appear different, however both seem to be white noise. The ARIMA(0,2,1) model is still slightly better than the ARIMA(0,1,0) model with an AICs of 179.61 and 182.17m respectively.

fit2 <- aus_clean|>
  model(ARIMA(Passengers ~ pdq(0,1,0))) #already has drift

report(fit2)

## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.3669
## s.e.    0.3319
## 
## sigma^2 estimated as 4.629:  log likelihood=-89.08
## AIC=182.17   AICc=182.48   BIC=185.59

fc2 <- fit2 %>% 
  forecast(h=10)

fc2%>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Passengers with ARIMA(0,1,0) w/ Drift)")

fit2 %>% 
  gg_tsresiduals()

augment(fit2) |> features(.innov, box_pierce, lag = 10)

## # A tibble: 1 × 3
##   .model                           bp_stat bp_pvalue
##   <chr>                              <dbl>     <dbl>
## 1 ARIMA(Passengers ~ pdq(0, 1, 0))    3.98     0.948

augment(fit2) |> features(.innov, ljung_box, lag=10)

## # A tibble: 1 × 3
##   .model                           lb_stat lb_pvalue
##   <chr>                              <dbl>     <dbl>
## 1 ARIMA(Passengers ~ pdq(0, 1, 0))    4.92     0.897

d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

The forecast of an ARIMA(2,1,2) model with drift and without a constant is non-stationary, therefore cannot produce all parts of the output. We are now unable to see the forecast, residuals, and AIC.

fit3 <- aus_clean|>
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

report(fit3)

## Series: Passengers 
## Model: NULL model 
## NULL model

fit3 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Passengers with ARIMA(0,2,1) w/ drift, no constant")

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).

#fit3 %>% 
#  gg_tsresiduals()

augment(fit3) |> features(.innov, box_pierce, lag = 10)

## # A tibble: 1 × 3
##   .model                               bp_stat bp_pvalue
##   <chr>                                  <dbl>     <dbl>
## 1 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))      NA        NA

augment(fit3) |> features(.innov, ljung_box, lag=10)

## # A tibble: 1 × 3
##   .model                               lb_stat lb_pvalue
##   <chr>                                  <dbl>     <dbl>
## 1 ARIMA(Passengers ~ 0 + pdq(2, 1, 2))      NA        NA

e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The forecast of an ARIMA(0,2,1) model with a constant produced a forecast with less passengers in the next 10 years compared to part c. The residuals appear different, once again, however all still seem to be white noise. The ARIMA(0,2,1) appears to be the best with the lowest AIC, but I do not think this is the best model because for d>1, a constant is not recommended because a quadratic or higher order trend is particularly dangerous when forecasting.

fit4 <- aus_clean %>%
  model(arima3 = ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

report(fit4)

## Series: Passengers 
## Model: ARIMA(0,2,1) w/ poly 
## 
## Coefficients:
##           ma1  constant
##       -1.0000    0.0721
## s.e.   0.0709    0.0260
## 
## sigma^2 estimated as 4.086:  log likelihood=-85.74
## AIC=177.48   AICc=178.15   BIC=182.55

fit4 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Passengers with ARIMA(0,2,1) w/drift")

fit4 %>% 
  gg_tsresiduals()

augment(fit4) |> features(.innov, box_pierce, lag = 10)

## # A tibble: 1 × 3
##   .model bp_stat bp_pvalue
##   <chr>    <dbl>     <dbl>
## 1 arima3    5.27     0.872

augment(fit4) |> features(.innov, ljung_box, lag=10)

## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 arima3    6.19     0.799

8. For the United States GDP series (from global_economy):

a. if necessary, find a suitable Box-Cox transformation for the data;

I did a Box-Cox transformation and the lambda is 0.2819443.

us_econ <- global_economy |>
  filter(Country == "United States")

us_econ |>
  autoplot(GDP)

lambda4 <- us_econ |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)
lambda4

## [1] 0.2819443

us_econ |>
  autoplot(box_cox(GDP, lambda4)) +
    labs(y = "",
    title = latex2exp::TeX(paste0("Transformed GDP of US $\\lambda4$ = ", round(lambda4,2))))

b. fit a suitable ARIMA model to the transformed data using ARIMA();

The suitable ARIMA model is model from the report is ARIMA(1,1,0) w/ drift, with an AIC of 656.65.

fit5 <- us_econ|>
  model(ARIMA(box_cox(GDP, lambda4))) 

report(fit5)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda4) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

fit5 %>% 
  forecast(h=10) %>%
  autoplot(us_econ) +
  labs(title = "ARIMA(1,1,0) w/ drift")

c. try some other plausible models by experimenting with the orders chosen;

I first tried differencing the box-cox transformed data, but it did not help to make the data stationary. I then experimented with different ARIMA models.

us_econ |> ACF(GDP) |>
  autoplot()

us_econ |> PACF(GDP) |>
  autoplot()

us_econ  |>
  mutate(us_box = box_cox(GDP, lambda4)) |>
  features(us_box, unitroot_ndiffs) #ordinary

## # A tibble: 1 × 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1

us_econ |>
  gg_tsdisplay(difference(box_cox(GDP,lambda4),1), plot_type='partial') +
  labs(y = "",
    title = latex2exp::TeX(paste0("Box-cox and ordinary difference of US GDP $\\lambda$ = ", round(lambda4,2))))

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

us_fit <- us_econ |>
  model(arima110 = ARIMA(box_cox(GDP,lambda4) ~ pdq(1,1,0)),
        arima021 = ARIMA(box_cox(GDP,lambda4) ~ pdq(0,2,1)),
        arima210 = ARIMA(box_cox(GDP,lambda4) ~ pdq(2,1,0)),
        arima010 = ARIMA(box_cox(GDP,lambda4) ~ pdq(0,1,0)),
        arima013 = ARIMA(box_cox(GDP,lambda4) ~ pdq(0,1,3)),
        arima1101 = ARIMA(box_cox(GDP,lambda4) ~ 0 + pdq(1,1,0)),
        stepwise = ARIMA(GDP),
        search = ARIMA(GDP, stepwise = FALSE))

d. choose what you think is the best model and check the residual diagnostics;

The best model appears to be ARIMA(2,1,0) with an AIC of 649.8444. The histogram of the residuals looks unimodal and somwehat normally distributed. The ACF of the residuals shows that none of the lags are out of bound and all the lags are somewhat close to zero and there is no obvious patterns in the model. The residuals do look like its white noise since there appears to be some autocorrelation in the residual series.

From the Box-Pierce test and the Ljung-Box test, we see that the results are not significant since the p-values are 0.9637426 and 0.9348524. We can conclude that the residuals are not distinguishable from white noise. This means that there is no significant autocorrelation in the residuals and the residuals are white noise.

glance(us_fit) |> arrange(AICc) |> select(.model:BIC)

## # A tibble: 8 × 6
##   .model     sigma2 log_lik   AIC  AICc   BIC
##   <chr>       <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima021  6.02e 3   -323.  650.  650.  654.
## 2 arima110  5.48e 3   -325.  657.  657.  663.
## 3 arima210  5.58e 3   -325.  659.  659.  667.
## 4 arima013  5.71e 3   -325.  661.  662.  671.
## 5 arima010  6.77e 3   -332.  668.  668.  672.
## 6 arima1101 7.04e 3   -334.  672.  672.  676.
## 7 stepwise  2.61e22  -1524. 3054. 3055. 3060.
## 8 search    2.61e22  -1524. 3054. 3055. 3060.

fit6 <- us_econ |>
  model(arima012 = ARIMA(box_cox(GDP,lambda4) ~ pdq(0,1,2)))
        
fit6 %>% 
  gg_tsresiduals()

augment(fit6) |> features(.innov, box_pierce, lag = 10)

## # A tibble: 1 × 4
##   Country       .model   bp_stat bp_pvalue
##   <fct>         <chr>      <dbl>     <dbl>
## 1 United States arima012    3.60     0.964

augment(fit6) |> features(.innov, ljung_box, lag=10)

## # A tibble: 1 × 4
##   Country       .model   lb_stat lb_pvalue
##   <fct>         <chr>      <dbl>     <dbl>
## 1 United States arima012    4.26     0.935

e. produce forecasts of your fitted model. Do the forecasts look reasonable?

The forecast looks reasonable to me, especially since the confidence interval is pretty narrow which means the model has low variability and it is certain about the forecasted predictions.

fit6 %>% 
  forecast(h=10) %>%
  autoplot(us_econ) +
  labs(title = "ARIMA(0,1,2)")

f. compare the results with what you would obtain using ETS() (with no transformation).

The ETS() model has a wider confidence interval which means it is less confident in it’s predictions. The model also has a much larger AIC, therefore, I think the ARIMA(0,1,2) model from the previous question is a better model to predict US GDP.

fit_ets <- us_econ |>
  model(ETS(GDP)) 

report(fit_ets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

fit_ets |>
  gg_tsresiduals(lag_max = 16)

fit_ets|>
  forecast(h=10) |>
  autoplot(us_econ) +
  labs(title = "ETS US GDP")

Data 624:HW 6

Nakesha Fray

2025-03-13

Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman. Please submit both the Rpubs link as well as your .rmd file.