Import Libraries

library(fpp3)
## Warning: package 'fpp3' was built under R version 4.3.3
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.0 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.5
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.3.2
## ✔ lubridate   1.9.3     ✔ fable       0.3.4
## ✔ ggplot2     3.5.1     ✔ fabletools  0.4.2
## Warning: package 'ggplot2' was built under R version 4.3.3
## Warning: package 'tsibble' was built under R version 4.3.3
## Warning: package 'tsibbledata' was built under R version 4.3.3
## Warning: package 'feasts' was built under R version 4.3.3
## Warning: package 'fabletools' was built under R version 4.3.3
## Warning: package 'fable' was built under R version 4.3.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

A. Explain the differences among these figures. Do they all indicate that the data are white noise?

The main difference between the figures is that each one shows the ACF (autocorrelation function) for a different number of random values, and the spikes become smaller as the series length grows. The blue lines mark the significance bounds, calculated as ±1.96/√N, where N is the length of the time series. In all three plots the autocorrelations fall within these bounds, so each series is consistent with white noise.

B. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are ±1.96/√N, so their distance from zero depends on N, the length of the time series: as N increases, the bounds move closer to zero because 1.96 is divided by a larger number. The sample autocorrelations also shrink because the variance of the sample autocorrelation of a white noise series is approximately 1/N, so longer series give estimates that sit closer to the true value of zero.
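As a quick check, the width of those blue bands can be computed directly. The sketch below simply evaluates ±1.96/√N for the three series lengths in Figure 9.32 (the variable names are illustrative only).

# Width of the 95% significance bands for each series length
N <- c(36, 360, 1000)
tibble(N = N, bound = 1.96 / sqrt(N))
# The bounds are roughly 0.33, 0.10 and 0.06, matching the
# progressively narrower bands from left to right in Figure 9.32.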

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

data("gafa_stock")

# Time Series of ACF and PACF
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type='partial') +
  labs(title = "Amazon Final Stock Price")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

# Unit Root Test
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  features(Close, unitroot_ndiffs) 
# KPSS 
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  features(Close, unitroot_kpss) 

A quick glance at the time series plot suggests that it is non-stationary because of its upward trend over time. In the ACF plot, r1 is large and positive and the values decrease very slowly, unlike a stationary series, where they would drop quickly to zero. The PACF shows a single large spike at lag 1, which is also typical of a non-stationary series. The KPSS-based unitroot_ndiffs feature agrees, indicating that the closing price should be differenced once to achieve stationarity.
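To see the effect of that single difference, the sketch below (reusing the AMZN subset from above) plots the differenced closing price and re-runs the KPSS test; we would expect the differenced series to vary around zero and the KPSS p-value to be large. The same irregular-interval warning applies here.

# Difference the closing price once and re-check stationarity
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  features(difference(Close), unitroot_kpss)

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(difference(Close), plot_type = 'partial') +
  labs(title = "Differenced Amazon Closing Price")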

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy. Accommodation takings in the state of Tasmania from aus_accommodation. Monthly sales from souvenirs.

A. Turkish GDP from global_economy.

data("global_economy")

# Lambda
Turkey_lambda <- global_economy %>%
  filter(Country == "Turkey") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# Unit Root Test KPSS
global_economy %>%
  filter(Country == "Turkey") %>%
  features(box_cox(GDP,Turkey_lambda), unitroot_ndiffs)
# Time Series of ACF and PACF
global_economy %>% 
  filter(Country =="Turkey") %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Turkish GDP")

# Box-Cox Transformation
global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(difference(box_cox(GDP, Turkey_lambda)), plot_type = 'partial') +
  labs(title = paste("Differenced Turkish GDP with lambda = ", round(Turkey_lambda, 2)))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Turkey’s GDP is non-seasonal and non-stationary. According to the KPSS test, applying a single differencing step is recommended to make the data stationary.
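As a sanity check, the sketch below re-runs the KPSS test on the differenced, transformed series (reusing Turkey_lambda from above); a small test statistic with a p-value above 0.05 would confirm that one difference is enough.

# Re-check stationarity after differencing the Box-Cox transformed GDP
global_economy %>%
  filter(Country == "Turkey") %>%
  features(difference(box_cox(GDP, Turkey_lambda)), unitroot_kpss)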

B. Accommodation takings in the state of Tasmania from aus_accommodation.

data("aus_accommodation")

# Lambda
Tas_lambda <-aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

# Seasonal differencing test (unitroot_nsdiffs)
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(box_cox(Takings,Tas_lambda), unitroot_nsdiffs) 
# Time Series of ACF and PACF
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(Takings, plot_type='partial') +
  labs(title = "Tasmania Accommodation Revenue")

# Box-Cox Transformation
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(difference(box_cox(Takings,Tas_lambda), 4), plot_type='partial') +
  labs(title = paste("Differenced Tasmania Accomodation Takings with lambda = ",round(Tas_lambda,2)))
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

Tasmania’s accommodation takings show both seasonality and non-stationarity. The unitroot_nsdiffs feature recommends one seasonal difference of the Box-Cox transformed data. After the lag-4 seasonal difference, the series is much more centred around zero, the ACF declines quickly, and the PACF cuts off after the first lag.
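To confirm whether a further first difference is needed on top of the seasonal one, the sketch below (reusing Tas_lambda) applies unitroot_ndiffs to the seasonally differenced series.

# Does the seasonally differenced series need an additional first difference?
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(difference(box_cox(Takings, Tas_lambda), 4), unitroot_ndiffs)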

C. Monthly sales from souvenirs.

# Lambda
Souv_lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

# Time Series of ACF and PACF
souvenirs %>%
  gg_tsdisplay(Sales, plot_type='partial', lag = 24) +
  labs(title = "Monthly Souvenir Sales")

# Seasonal differencing test (unitroot_nsdiffs)
souvenirs %>%
  features(box_cox(Sales,Souv_lambda), unitroot_nsdiffs) 
# Box-Cox Transformation
souvenirs %>%
  gg_tsdisplay(difference(box_cox(Sales,Souv_lambda), 12), plot_type='partial', lag = 24) +
   labs(title = paste("Differenced Monthly Souvenir Sales with lambda = ",round(Souv_lambda,2)))
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

Monthly souvenir sales show both seasonality and non-stationarity. The unitroot_nsdiffs feature recommends a single seasonal difference (lag 12), and a lambda value near 0 suggests a log-like transformation. After this seasonal difference, the plots suggest the data are close to stationary: the ACF declines quickly and the PACF cuts off after the first lag.
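Because the Guerrero lambda is close to zero, a plain log transformation behaves almost identically to the Box-Cox transformation. The sketch below uses the log directly and checks whether the seasonally differenced series still needs a further first difference.

# Lambda is near 0, so log(Sales) is a close substitute for box_cox(Sales, lambda);
# check whether an additional first difference is required after the seasonal one
souvenirs %>%
  features(difference(log(Sales), 12), unitroot_ndiffs)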

9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

# Exercise 7 from Section 2.10
set.seed(334)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

# Time Series/ACF/PACF
myseries %>%
  gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
  labs(title = "Retail Sales")

# Lambda
lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# Seasonal differencing test (unitroot_nsdiffs)
myseries %>%
  features(box_cox(Turnover, lambda), unitroot_nsdiffs) 
# Box-Cox Transformation
myseries %>%
  gg_tsdisplay(difference(box_cox(Turnover,lambda), 12), plot_type='partial', lag = 36) +
  labs(title = paste("Differenced Tasmania Accomodation Takings with lambda = ",round(lambda,2)))
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).
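The exercise asks for the full order of differencing, so the sketch below (reusing lambda from above) checks whether the seasonally differenced series still needs an ordinary first difference; combined with the unitroot_nsdiffs result above, this gives the recommended pair of seasonal and first differences for this series.

# After the lag-12 seasonal difference, is a further first difference required?
myseries %>%
  features(difference(box_cox(Turnover, lambda), 12), unitroot_ndiffs)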

9.6

A. Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ² = 1. The process starts with y1 = 0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

B. Produce a time plot for the series. How does the plot change as you change ϕ1?

# Base Phi
sim %>%
  autoplot(y) +
  labs(title = "AR(1) model with Phi = 0.6")

# Phi_1 = 1
for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "AR(1) model with Phi = 1")

# Phi_1 = -0.6
for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "AR(1) model with Phi = -0.6")

With ϕ1 = 0.6 the series looks like correlated noise varying around zero. As ϕ1 increases towards 1, the series behaves more like a random walk, wandering away from the mean for long stretches. With a negative value such as ϕ1 = −0.6, successive observations are negatively correlated, so the series oscillates rapidly around the mean, with sharper peaks and troughs.
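One way to make the effect of ϕ1 more visible is to compare the sample ACFs of the three simulated series side by side. The sketch below regenerates the series with a small helper function (sim_ar1 is a hypothetical name, and the code reuses the error vector e created earlier).

# Regenerate the three AR(1) series and compare their sample ACFs
sim_ar1 <- function(phi, e) {
  y <- numeric(length(e))
  for (i in 2:length(e)) y[i] <- phi * y[i - 1] + e[i]
  y
}

bind_rows(lapply(c(-0.6, 0.6, 1), function(p)
  tibble(idx = seq_along(e), phi = p, y = sim_ar1(p, e)))) %>%
  as_tsibble(index = idx, key = phi) %>%
  ACF(y) %>%
  autoplot() +
  labs(title = "Sample ACFs of simulated AR(1) series for several values of phi")

The ACF for ϕ1 = 1 decays very slowly (random-walk behaviour), while the ACF for ϕ1 = −0.6 alternates in sign, reflecting the oscillation seen in the time plot.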

C. Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ² = 1.

# MA(1): y_t = e_t + 0.6 * e_{t-1}, reusing the errors e generated above
y <- numeric(100)
y[1] <- e[1]
for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]

sim_2 <- tsibble(idx = seq_len(100), y = y, index = idx)

D. Produce a time plot for the series. How does the plot change as you change θ1?

# Base Theta
sim_2 %>%
  autoplot(y) +
  labs(title = "MA(1) model with Theta = 0.6")

# Theta_1 = 1
for(i in 2:100)
  y[i] <- e[i] + 1 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "MA(1) model with Theta = 1")

# Theta_1 = -0.6
for(i in 2:100)
  y[i] <- e[i] + -0.6 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "MA(1) model with Theta = -0.6")

As θ1 increases, neighbouring observations become more strongly positively correlated, so the series is smoother with larger swings away from the mean. When θ1 is negative, neighbouring observations are negatively correlated, so the series oscillates more rapidly and the spikes appear compressed along the x-axis.
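For an MA(1) process all of the serial dependence is at lag 1, and the theoretical lag-1 autocorrelation is r1 = θ1 / (1 + θ1²). The short sketch below evaluates this for the three values of θ1 used above.

# Theoretical lag-1 autocorrelation of an MA(1) process: theta / (1 + theta^2)
theta <- c(-0.6, 0.6, 1)
tibble(theta = theta, r1 = theta / (1 + theta^2))
# r1 is roughly -0.44, 0.44 and 0.50: the sign of theta sets the sign of r1,
# and |r1| can never exceed 0.5 for an MA(1).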

E. Generate data from an ARMA(1,1) model with ϕ1 = 0.6, θ1 = 0.6 and σ² = 1.

set.seed(334)
e <- rnorm(100)   # regenerate the errors so the seed actually takes effect
y <- numeric(100)

# ARMA(1,1): y_t = 0.6 * y_{t-1} + e_t + 0.6 * e_{t-1}
for(i in 2:100)
  y[i] <- 0.6 * y[i-1] + e[i] + 0.6 * e[i-1]

ARMA_11 <- tsibble(idx = seq_len(100), y = y, index = idx)

F. Generate data from an AR(2) model with ϕ1 = −0.8, ϕ2 = 0.3 and σ² = 1. (Note that these parameters will give a non-stationary series.)

set.seed(334)
e <- rnorm(100)   # regenerate the errors so the seed takes effect
y <- numeric(100)

# AR(2): y_t = -0.8 * y_{t-1} + 0.3 * y_{t-2} + e_t
for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

AR_2 <- tsibble(idx = seq_len(100), y = y, index = idx)

G. Graph the latter two series and compare them.

ARMA_11 %>%
  gg_tsdisplay(y, plot_type = 'partial') +
  labs(title = "ARMA(1,1) model with Phi = 0.6, Theta = 0.6, and Sigma^2 = 1")

AR_2 %>%
  gg_tsdisplay(y, plot_type = 'partial') +
  labs(title = "AR(2) model with Phi = -0.8, Thi = 0.3, and Sigma^2 = 1")

The ARMA(1,1) series shows signs of stationarity: it looks random, its ACF decays quickly, and its PACF cuts off after the first lag. In contrast, the AR(2) series is non-stationary: it oscillates around the mean with a variance that grows as the index increases, the ACF also oscillates, and the PACF shows a large negative spike at lag 1 with little beyond it. This behaviour is expected because the parameters violate the AR(2) stationarity constraint ϕ2 − ϕ1 < 1: here ϕ2 − ϕ1 = 0.3 − (−0.8) = 1.1.

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

A. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
B. Write the model in terms of the backshift operator.
C. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part A.
D. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts A and C. Remove the constant and see what happens.
E. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers)) 

report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99
fit %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1)", y = "Passengers (in millions)")

fit %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1)")

B. In terms of the backshift operator, the fitted ARIMA(0,2,1) model is (1 − B)²yt = (1 − 0.8756B)ϵt; equivalently, the twice-differenced series satisfies (1 − B)²yt = ϵt − 0.8756ϵt−1.

fit2 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

fit2 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")

fit2 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,1,0)")

The ARIMA(0,2,1) model from part A forecasts values that lie above the observed figures, while the ARIMA(0,1,0) with drift forecasts sit below them; the drift model's forecast trend is also more gradual.

fit3 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

fit3 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")

fit3 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(2,1,2)")

#removing constant
fit4 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
report(fit4)
## Series: Passengers 
## Model: NULL model 
## NULL model

The ARIMA(2,1,2) with drift forecasts closely mirror those from the earlier parts, and the residuals again look like white noise. When the constant is removed, fable fails to estimate the model (it reports a non-stationary AR part from the CSS starting values), so a NULL model is returned. When the constant is kept with d = 1, the long-term forecasts follow a straight line (a polynomial trend of order d).

fit5 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
fit5 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant", y = "Passengers (in millions)")

fit5 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1) with constant")

Including a constant when d = 2 induces a quadratic (order-d) polynomial trend in the long-term forecasts, so a warning is issued recommending either removing the constant or reducing the number of differences. The resulting forecasts curve upwards more steeply than in the earlier models.

9.8

For the United States GDP series (from global_economy):

A. If necessary, find a suitable Box-Cox transformation for the data.
B. Fit a suitable ARIMA model to the transformed data using ARIMA().
C. Try some other plausible models by experimenting with the orders chosen.
D. Choose what you think is the best model and check the residual diagnostics.
E. Produce forecasts of your fitted model. Do the forecasts look reasonable?
F. Compare the results with what you would obtain using ETS() (with no transformation).

# plot
global_economy %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Gross Domestic Product of the United States")

The GDP is non-stationary and shows an upward trend. I think applying a Box-Cox transformation will help stabilize the data.

# calculate lambda
lambda <-global_economy %>%
  filter(Code == "USA") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit6 <- global_economy %>%
  filter(Code == "USA") %>%
  model(ARIMA(box_cox(GDP, lambda))) 

report(fit6)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

ARIMA(1,1,0) with a drift was appropriate for the transformed data.

# transformed plot
global_economy %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(box_cox(GDP,lambda), plot_type='partial') +
  labs(title = "Transformed GDP of the United States")

# unit root test
global_economy %>%
  filter(Code == "USA") %>%
  features(box_cox(GDP,lambda), unitroot_ndiffs) 
usa_fit <- global_economy %>%
  filter(Code == "USA") %>%
  model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
        arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
        arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
        arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
        arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))

glance(usa_fit) %>% arrange(AICc) %>% select(.model:BIC)

The unit root test indicates that a single difference is sufficient. The ACF of the transformed series declines slowly, and the PACF has a single large spike at lag 1, suggesting an AR(1) term, i.e. ARIMA(1,1,0), which matches the automatic selection in part B. Among the candidate models, ARIMA(1,2,0) has the lowest AICc, followed closely by ARIMA(1,1,0); note, though, that AICc values are only strictly comparable between models with the same order of differencing.

usa_fit %>%
  select(arima120) %>%
  gg_tsresiduals() +
  ggtitle("Residuals for ARIMA(1,2,0)")

usa_fit %>%
  forecast(h=5) %>%
  filter(.model=='arima120') %>%
  autoplot(global_economy)

The forecasts appear to be reasonable, as the trend is progressing at a very similar angle.

fit_ets <- global_economy %>%
  filter(Code == "USA") %>%
  model(ETS(GDP))

report(fit_ets)
## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089
fit_ets %>%
  forecast(h=5) %>%
  autoplot(global_economy)

The ETS() model reports a much larger AICc, but AICc values cannot be compared directly between ETS and ARIMA models: they belong to different model classes, and here the ARIMA models were also fitted to Box-Cox transformed data. Judging which approach forecasts better requires an out-of-sample comparison, such as a hold-out test set or time series cross-validation.
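Because the AICc values are not comparable across model classes, a fairer comparison is out-of-sample accuracy. The sketch below holds out the last ten years as a test set (the split is an assumption; the exercise does not specify one) and compares the two approaches with accuracy(), reusing the lambda computed above.

# Hold-out comparison of the ARIMA and ETS models (training data up to 2007)
usa <- global_economy %>% filter(Code == "USA")
train <- usa %>% filter(Year <= 2007)

train %>%
  model(arima = ARIMA(box_cox(GDP, lambda)),
        ets   = ETS(GDP)) %>%
  forecast(h = 10) %>%
  accuracy(usa) %>%
  select(.model, RMSE, MAE, MAPE)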