Question 1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a) Explain the differences among these figures. Do they all indicate that the data are white noise?

The three ACF plots display white noise series of different lengths: 36, 360, and 1,000 numbers. In the plot with 36 numbers, there are several spikes that exceed the confidence bounds due to greater variability in the small sample. As the sample size increases to 360 and 1,000 numbers, the spikes become smaller, and almost all autocorrelations fall within the confidence bounds, better reflecting the expected behavior of white noise. Overall, all three plots indicate white noise, but the larger samples (360 and 1,000) provide clearer evidence with fewer random fluctuations compared to the smaller sample.

b) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

In the ACF plots, the critical values (blue dashed lines) decrease as the sample size increases because the standard error of the autocorrelations is smaller for larger samples. This makes the confidence intervals narrower. In smaller samples, like the plot with 36 numbers, random variations cause the autocorrelations to appear larger, which is why more spikes exceed the confidence bounds. As the sample size grows to 360 and 1,000 numbers, these random fluctuations become less noticeable, and the autocorrelations are closer to zero, which is expected for white noise.

Question 2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type='partial') 
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

Question 3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a) Turkish GDP from global_economy

Turkish GDP is non-seasonal and non-stationary.

gdp_turkey <- global_economy %>%
  filter(Country == "Turkey") %>%
  select(GDP)
 gdp_turkey %>% gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Original Turkish GDP plot")

lambda <- global_economy %>%
  filter(Country == "Turkey") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
 
gdp_turkey %>%   gg_tsdisplay(difference(box_cox(GDP,lambda)), plot_type='partial') +
  labs(title = paste0("Transformed Turkish GDP with lambda = ",
         round(lambda,2)))

b)Accommodation takings in the state of Tasmania from aus_accommodation.

Accommodation takings in the state of Tasmania is seasonal and non-stationary

tas_accommodation <- aus_accommodation %>%
  filter(State == "Tasmania") %>%
  select(Takings)
tas_accommodation %>% gg_tsdisplay(Takings, plot_type='partial') +
  labs(title = "Original Tasmania Accomodation Takings plot")

# calculate lambda
lambda <-aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

tas_accommodation %>% gg_tsdisplay(difference(box_cox(Takings,lambda), 4), plot_type='partial') +
  labs(title = paste0("Transformed Tasmania Accomodation Takings plot with lambda = ",
         round(lambda,2)))

c) Monthly sales from souvenirs.

Monthly souvernir sales is seasonal and non-stationary.

souvenirs %>%
  gg_tsdisplay(Sales, plot_type='partial', lag = 36) +
  labs(title = "Original Monthly Souvenir Sales plot")

# calculate lambda
lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs %>%
  gg_tsdisplay(difference(box_cox(Sales,lambda), 12), plot_type='partial', lag = 36) +
  labs(title = paste0("Transformed Monthly Souvenir Sales plot with lambda = ",
         round(lambda,2)))

Question 5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(2601)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`, 1)) 

nsdiffs = 1 indicates that the series requires one seasonal difference to achieve stationarity.

myseries_diff <- myseries %>%
  mutate(
    transformed_diff = difference(difference(box_cox(Turnover, lambda), 12))
  )

# Step 2: Visualize the differenced series
myseries_diff %>%
  gg_tsdisplay(transformed_diff, plot_type = 'partial', lag = 36) +
  labs(title = paste0("Transformed and Differenced Data (lambda = ", round(lambda, 2), ")"))

# Step 3: Perform the ADF test to confirm stationarity
adf_result <- myseries_diff %>%
  features(transformed_diff, unitroot_kpss)

print(adf_result)
## # A tibble: 1 × 4
##   State             Industry                               kpss_stat kpss_pvalue
##   <chr>             <chr>                                      <dbl>       <dbl>
## 1 Western Australia Footwear and other personal accessory…    0.0107         0.1

It is notable in the top plot that the fluctuations in the differenced series now appear to be stationary, with no visible trend or seasonality.The ACF shows no significant autocorrelations beyond the 95% confidence bounds, indicating that the series has minimal autocorrelation and the PACF also shows no significant spikes, confirming that the remaining structure is white noise. Those are both good signs that the data is now stationary.The low KPSS Statistic (0.0107) confirms that the series is stationary after transformation and differencing.

Question 6

Simulate and plot some data from simple ARIMA models.

a)

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b) Produce a time plot for the series. How does the plot change as you change TeX(“\(\\phi_1\)”) ?

simulate_and_plot <- function(phi, n = 100, sigma2 = 1) {
  y <- numeric(n)  # Initialize time series
  e <- rnorm(n, mean = 0, sd = sqrt(sigma2))  # Generate white noise
  
  # Simulate the AR(1) process
  for (i in 2:n) {
    y[i] <- phi * y[i - 1] + e[i]
  }
  
  # Create a tsibble and generate the plot
  sim <- tsibble(idx = seq_len(n), y = y, index = idx)
  sim %>%
    autoplot(y) +
    labs(
      title = TeX(paste0("AR(1) model with $\\phi_1$ = ", phi)),
      x = "Time", y = "y"
    )
}

# Generate plots for different phi_1 values
phi_values <- c(0.6, 1, 0, -0.6)

# Plot all models in a 2x2 grid layout
par(mfrow = c(2, 2))  

for (phi in phi_values) {
  print(simulate_and_plot(phi))  # Print each plot in the loop
}

TeX(“\(\\phi_1\)”) = 0.6 -> A moderately persistent process with some autocorrelation.

TeX(“\(\\phi_1\)”) = 1 -> This is a random walk with high persistence with no reversion to the mean.

TeX(“\(\\phi_1\)”) = 0 -> The series is just white noise with no autocorrelation.

TeX(“\(\\phi_1\)”) = -0.6 -> It shows negative autocorrelation.

c) Write your own code to generate data from an MA(1) model with TeX(“\(\\theta_1\)”) = 0.6 and TeX(“\(\\sigma^2\)”) =1

simulate_ma1 <- function(theta, n = 100, sigma2 = 1) {
  # Generate white noise (mean 0, variance sigma2)
  e <- rnorm(n + 1, mean = 0, sd = sqrt(sigma2))  
  y <- numeric(n)  

  # Simulate the MA(1) process: y_t = e_t + theta * e_(t-1)
  for (t in 2:n) {
    y[t] <- e[t] + theta * e[t - 1]
  }

  sim2  <- tsibble(idx = seq_len(n), y = y, index = idx)
  return(sim)
}

# Simulate the MA(1) process with theta = 0.6
sim2 <- simulate_ma1(theta = 0.6)

d) Produce a time plot for the series. How does the plot change as you change TeX(“\(\\theta_1\)”) ?

# Plot the simulated time series
sim2 %>%
  autoplot(y) +
  labs(
    title = expression(paste("MA(1) Model with ", theta[1], " = 0.6")),
    x = "Time", y = "y"
  )

theta_values <- c(0.6, 1, 0, -0.6)

# Generate and store plots for each theta_1 value
plots <- lapply(theta_values, function(theta) {
  sim <- simulate_ma1(theta)
  sim %>%
    autoplot(y) +
    labs(
      title = TeX(paste0("MA(1) Model with $\\theta_1 = ", theta, "$")),
      x = "Time", y = "y"
    )
})

# Display all plots in a 2x2 grid layout
grid.arrange(grobs = plots, ncol = 2)

With TeX(“\(\\theta_1\)”) = 0.6 the series displays moderate persistence, resulting in smooth fluctuations. When TeX(“\(\\theta_1\)”) increases to 1, the plot resembles a random walk, showing strong positive autocorrelation. A TeX(“\(\\theta_1\)”) of 0 eads to a purely random, uncorrelated series similar to white noise. In contrast, a TeX(“\(\\theta_1\)”) of -0.6 introduces negative autocorrelation. In conclusion, changing TeX(“\(\\theta_1\)”) significantly influences the degree and nature of autocorrelation in the resulting time series.

e)

set.seed(2001)

# Initialize parameters
n <- 100
y <- numeric(n)
e <- rnorm(n, mean = 0, sd = 1)  # Generate white noise

# Simulate ARMA(1,1) process
for (i in 2:n) {
  y[i] <- e[i] + 0.6 * e[i - 1] + 0.6 * y[i - 1]
}

# Create a tsibble for plotting
arma1 <- tsibble(idx = seq_len(n), y = y, index = idx)

f)

set.seed(4510)

# Initialize parameters
n <- 100
y <- numeric(n)
e <- rnorm(n, mean = 0, sd = 1)  # Generate white noise

# Simulate AR(2) process
for (i in 3:n) {
  y[i] <- -0.8 * y[i - 1] + 0.3 * y[i - 2] + e[i]
}

# Create a tsibble for plotting
ar2 <- tsibble(idx = seq_len(n), y = y, index = idx)

g) Graph the latter two series and compare them.

 arma1 %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = TeX("ARMA(1,1) model with $\\phi_1 = 0.6$, $\\theta_1 = 0.6$, and $\\sigma^2 = 1$"))

 ar2 %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = TeX("AR(2) model with $\\phi_1 = -0.8$, $\\phi_2 = 0.3$, and $\\sigma^2 = 1$"))

The ARMA(1,1) model shows stable, moderate oscillations with diminishing autocorrelations, suggesting an overall stationarity. In contrast, the AR(2) model starts stable but quickly becomes unstable with large, growing oscillations, reflecting a potential non-stationarity. The autocorrelations in the AR(2) model persist longer than in the ARMA(1,1), and the AR(2) model’s instability arises from its parameter choices, making it less reliable for forecasting compared to the more controlled ARMA(1,1) model.

Question 7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a) Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

ARIMA (0,2,1) was picked and the residuals do resemble white noise.

data("aus_airpassengers")

# Fit an ARIMA model using auto.arima()
fit <- auto.arima(aus_airpassengers)
checkresiduals(fit)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,1)
## Q* = 6.5774, df = 8, p-value = 0.5828
## 
## Model df: 1.   Total lags used: 9
forecasted_values <- forecast(fit, h = 10)
autoplot(forecasted_values)

b) Write the model in terms of the backshift operator.

plot(TeX("$y_t = -0.87 \\epsilon_{t-1} + \\epsilon_t$"))

plot(TeX("(1 - B)^2 y_t = (1 - 0.87B) \\epsilon_t"))

c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

The ARIMA model from part a forecasted values that were consistently higher than the actual values, whereas the current ARIMA model forecasts lower than the actual values. Additionally, the slope of the forecasts in this model appears to be more gradual compared to the previous one

fit2 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

fit2 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(0,1,0)", y = "Passengers (in millions)")

fit2 %>% 
  gg_tsresiduals() 

d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit3 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

fit3 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = " ARIMA(2,1,2)", y = "Passengers (in millions)")

fit3 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(2,1,2)")

# removing the constant
fit4 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

report(fit4)
## Series: Passengers 
## Model: NULL model 
## NULL model

I plotted the forecasts from an ARIMA(2,1,2) model with a drift and compared them to the ones from parts a and c. The forecasts were similar to those from part a, and the residuals looked like white noise, indicating a good fit. However, when I removed the constant, the model became a null model. This change introduced a polynomial trend, making the forecasts non-stationary. This shows that including a constant is important for keeping the forecasts stable

e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit5 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
fit5 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "ARIMA(0,2,1) with constant", y = "Passengers (in millions)")

fit5 %>% 
  gg_tsresiduals() +
  labs(title = "ARIMA(0,2,1) with constant")

The slope becomes steeper, indicating a stronger trend in the forecasts. Additionally, a warning is issued, stating that the model specifies a higher-order polynomial trend due to the constant. This suggests that the model may be overly complex and could benefit from the removal of the constant to better capture the underlying data dynamics.

Question 8

For the United States GDP series (from global_economy):

a)if necessary, find a suitable Box-Cox transformation for the data;

gdp_us <- global_economy %>%
  filter(Code == "USA")

gdp_us %>% gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Non-transformed United States GDP")

Based on the ACF and PACF plots in the image, we can see strong positive autocorrelation in the ACF, with a slow decay, indicating that the series is non-stationary. The PACF shows a significant spike at lag 1 and then quickly drops off. Applying a Box-Cox transformation could help stabilize the variance.

lambda <- gdp_us %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

lambda
## [1] 0.2819443
gdp_transformed <- BoxCox(gdp_us$GDP, lambda = 0.2819443)

# Plot the transformed series
autoplot(ts(gdp_transformed, start = c(1960, 1), frequency = 1)) + 
  ggtitle("Transformed United States GDP (Guerrero Method)") +
  xlab("Year") + 
  ylab("Transformed GDP") +
  theme_minimal()

b) fit a suitable ARIMA model to the transformed data using ARIMA();

fit_arima <- auto.arima(gdp_transformed)

#summary of the fitted ARIMA model
summary(fit_arima)
## Series: gdp_transformed 
## ARIMA(1,1,0) with drift 
## 
## Coefficients:
##          ar1     drift
##       0.4586  218.2818
## s.e.  0.1198   17.5551
## 
## sigma^2 = 5479:  log likelihood = -325.32
## AIC=656.65   AICc=657.1   BIC=662.78
## 
## Training set error measures:
##                    ME     RMSE      MAE          MPE     MAPE      MASE
## Training set 1.539081 72.08254 54.30345 0.0007732005 0.428452 0.2423566
##                     ACF1
## Training set -0.02317213
checkresiduals(fit_arima)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,1,0) with drift
## Q* = 3.8137, df = 9, p-value = 0.9232
## 
## Model df: 1.   Total lags used: 10

We have successfully fitted an ARIMA(1,1,0) with drift model to the Box-Cox transformed United States GDP series. Then we checked the residuals of the model to ensure there is no significant autocorrelation (i.e. they resemble white noise ).

c) try some other plausible models by experimenting with the orders chosen;

To experiment with other ARIMA models, we can manually adjust the orders of p, d, and q in the ARIMA model.

# ARIMA(2,1,0)
fit_arima_210 <- Arima(gdp_transformed, order = c(2, 1, 0))
summary(fit_arima_210)
## Series: gdp_transformed 
## ARIMA(2,1,0) 
## 
## Coefficients:
##          ar1     ar2
##       0.6950  0.2485
## s.e.  0.1281  0.1288
## 
## sigma^2 = 6719:  log likelihood = -332.05
## AIC=670.1   AICc=670.55   BIC=676.23
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE     MASE        ACF1
## Training set 15.49463 79.82007 63.01628 0.1445514 0.484831 0.281242 -0.07720077
# ARIMA(0,1,1)
fit_arima_011 <- Arima(gdp_transformed, order = c(0, 1, 1))
summary(fit_arima_011)
## Series: gdp_transformed 
## ARIMA(0,1,1) 
## 
## Coefficients:
##          ma1
##       0.7673
## s.e.  0.0674
## 
## sigma^2 = 23338:  log likelihood = -367.47
## AIC=738.93   AICc=739.16   BIC=743.02
## 
## Training set error measures:
##                    ME     RMSE      MAE       MPE     MAPE      MASE       ACF1
## Training set 123.4948 150.1108 130.9554 0.9868126 1.034531 0.5844546 -0.4256826

d) choose what you think is the best model and check the residual diagnostics;

The ARIMA(2,1,0) model has lower BIC and AIC than the ARIMA(0,1,1) model indicating a better model fit. Moving on we’ll check their ressiduals to verify if they are uncorrelated and resemble white noise.

checkresiduals(fit_arima_210)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,1,0)
## Q* = 11.107, df = 8, p-value = 0.1957
## 
## Model df: 2.   Total lags used: 10
checkresiduals(fit_arima_011)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,1)
## Q* = 36.047, df = 9, p-value = 3.889e-05
## 
## Model df: 1.   Total lags used: 10

The residuals of the ARIMA(2,1,0) model appear to oscillate around zero, suggesting that the model fits the data fairly well without any clear trend.The ACF plot shows several spikes, but most of them fall within the blue dashed lines.The histogram of the residuals appears to be roughly bell-shaped, suggesting that the residuals are normally distributed.

For the ARIMA(0,1,1) model the residuals oscillate around zero, but there seems to be a bit more variability compared to the ARIMA(2,1,0) model.The ACF plot for this model shows a more significant spike at lag 1 and then decreases more slowly, suggesting that there is more autocorrelation present in the residuals compared to the other model.The histogram for the residuals also resembles a normal distribution but shows more variability, and there is a bit of skewness.

In conclusion,based on the residual diagnostics and the AIC and BIC , ARIMA(2,1,0) seems to be the better model due to its more favorable residual behavior (random scatter) , less autocorrelation, and more normality in the residuals.

e) produce forecasts of your fitted model. Do the forecasts look reasonable?

forecasts <- forecast(fit_arima_210, h = 10)

# Plot the forecasts
autoplot(forecasts) +
  ggtitle("Forecasts from ARIMA(2,1,0) Model") +
  xlab("Year") +
  ylab("Transformed GDP")

The forecast follows the upward trend observed in the data, which suggests that the model has captured the long-term growth trend well and the prediction line and confidence intervals seem smooth without wild fluctuations, indicating that the model is not overfitted. Also, the blue confidence intervals look appropriate—neither too narrow nor excessively wide—indicating the model has captured uncertainty reasonably. Overall, the forecasts do look reasonable.

f) compare the results with what you would obtain using ETS() (with no transformation).

ets_fit <- ets(gdp_us$GDP)
summary(ets_fit)
## ETS(M,A,N) 
## 
## Call:
## ets(y = gdp_us$GDP)
## 
##   Smoothing parameters:
##     alpha = 0.9991 
##     beta  = 0.5012 
## 
##   Initial states:
##     l = 448093333333.703 
##     b = 64917355686.8708 
## 
##   sigma:  0.026
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089 
## 
## Training set error measures:
##                       ME         RMSE          MAE       MPE     MAPE      MASE
## Training set 20997049199 166906260796 104653662551 0.4843392 1.914575 0.3067446
##                   ACF1
## Training set 0.1532127
ets_forecasts <- forecast(ets_fit, h = 10)

# Plot the ETS forecasts
autoplot(ets_forecasts) +
  ggtitle("Forecasts from ETS Model (No Transformation)") +
  xlab("Year") +
  ylab("GDP")

The AIC and BIC values for the ARIMA(2,1,0) model are much lower than those for the ETS model, indicating a better fit for the ARIMA model. Both ARIMA models have lower AIC and BIC values compared to the ETS model, suggesting they explain the data more efficiently. The RMSE and MAE for the ARIMA(2,1,0) model are also significantly lower than those for the ETS model. Overall it looks like the ETS model is nowhere near being a good fit fot the data.