Exercises:
9.1 Explain ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
9.2 Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
9.4 For the souvenirs data, write down the differences you chose above using backshift operator notation.
9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
9.6 Simulate and plot some data from simple ARIMA models.
9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
9.8 For the United States GDP series (from global_economy):

#Import needed libraries

library(tsibbledata) #to use the time series data in it for the exercises.
library(tsibble) # to use datasets and function as_tsibble
library(tibble) # to use view function
library(ggplot2)
library(feasts) # to use the functions for graphics like autoplot()

library(readr) # to uses read_csv function
library(dplyr) # to use Filter, mutate, arrange function etc
library(tidyr) # to use pivot_longer function

library(fpp3)  # to use us_gasoline dataset 
library(seasonal) # X-13ARIMA-SEATS decomposition
library(feasts)
library(fable) # for pdq() function

library(corrplot)

Exercises:

9.1 Explain ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise? Answer:

The difference among these figures has to do with the magnitude of lag/spikes and the blue lines.As the random numbers increase, the length of the lag and the distance betwen the blue lines decreases.
There is no significant autocorrelation in residuals.
However,they all indicate that the data are white noise because 95% of the spikes in the ACF lies within ± 1.96 of the blue line.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they refer to white noise? Answer:

The critical values at different distances from the mean of zero, because as the population of the random numbers increases, the mean of the population decreases.
This also means the autocorrelation is also decreasing.
Hence larger the series, smaller the critical value.
The autocorrelations are different in each figure because of the different population of random numbers in the given plot.
It is all white noise however, because the numbers are random.

9.2 Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

Answer:

A convenient way to produce a time plot, ACF plot and PACF plot in one command is to use the gg_tsdisplay() function with plot_type = “partial”.
The time plot shows that the closing price have a positive increasing trend over time. The mean of the values is not consistent over time. There is no variation observed. Hence the time series is not stationary.
The ACF plot shows the lags are positive but decreasing over time.It is not quickly converging to 0. The value of r1 is large and positive.There is a strong autocorrelation among the lags.The data is not white noise because 95% of the spikes in the ACF does not lie within ± 1.96 of the blue line. Hence the closing price should be differenced.
The PACF plot shows a strong autocorrelation at lag1.The PACF for a stationary time series would be 0 for all of the lags. Hence the time series is not stationary.
A very low value of p(0.01) indicates that we can reject the null hypothesis.The data is not stationary.
Post performing the differencing checks on the closing price,the value indicates teh need for only first-order differencing.

# Data preparation
amazon_2015 <- gafa_stock |>
  filter(Symbol == "AMZN", year(Date) == 2015)

# Original Plot
amazon_2015 |> 
  autoplot(Close) + labs(subtitle = "Amazon daily closing stock price")

# Time plot with ACF and PACF
amazon_2015 |> 
gg_tsdisplay(Close, plot_type='partial') + labs(subtitle = "Amazon daily closing stock price")

# p value
amazon_2015 |>
  features(Close, unitroot_kpss)

## # A tibble: 1 × 3
##   Symbol kpss_stat kpss_pvalue
##   <chr>      <dbl>       <dbl>
## 1 AMZN        4.02        0.01

# number of differences needed
amazon_2015 |>
  features(Close, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Symbol ndiffs
##   <chr>   <int>
## 1 AMZN        1

# Differenced plot
amazon_2015 |> ACF(difference(Close)) |>
  autoplot() + labs(subtitle = "Changes in Amazon closing stock price(differenced)")

#  p value post differencing
amazon_2015 |>
  mutate(diff_close = difference(Close)) |>
  features(diff_close, ljung_box, lag = 10)

## # A tibble: 1 × 3
##   Symbol lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 AMZN      19.7    0.0324

9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a)Turkish GDP from global_economy.

Answer:

The time plot shows a positive exponential trend which is increasing over time.There is no seasonality or variation.The time series is not stationary.
The lambda value of 0.157 indicates that the data needs a strong transformation as a log transformation (λ = 0)to make it more suitable for analysis and forecasting.
Based on the unitroot_ndiffs function, first order non seasonal differencing is only needed.

turkey_gdp <- global_economy |>
  filter(Code == "TUR") 

turkey_gdp |>
  autoplot(GDP) +
  labs(y = " GDP", title = "Turkish GDP")

# box cox transformation
lambda <- turkey_gdp  |>    # this is to find the lamba which is needed for box cox formula
  features(GDP, features = guerrero) |> # guerrero find s the best lambda at the peak
  pull(lambda_guerrero)

turkey_gdp  |>
  autoplot(box_cox(GDP, lambda)) +
  labs(y = "",title = paste("Turkish GDP with log transformation λ = ",round(lambda,4)))

# number of differences needed
turkey_gdp |>
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

# number of seasonal differences needed
turkey_gdp |>
  features(GDP, unitroot_nsdiffs)

## # A tibble: 1 × 2
##   Country nsdiffs
##   <fct>     <int>
## 1 Turkey        0

# Differenced plot
turkey_gdp  |> autoplot(difference(log(GDP))) + labs(subtitle = "Changes in Turkish GDP (log transformed and differenced)")

b)Accommodation takings in the state of Tasmania from aus_accommodation.

Answer:

The time plot shows no trend over time.There is seasonality and a increasing variation over time.The time series is not stationary.
The lambda value of 0.0018 indicates that the data needs a strong transformation as a log transformation (λ = 0)to make it more suitable for analysis and forecasting.
Based on the unitroot_ndiffs function, first order differencing is needed.
Based on the unitroot_nsdiffs function, first order seasonal differencing is also needed.

tasmania_acc <- aus_accommodation |>
  filter(State == "Tasmania") 

tasmania_acc  |>
  autoplot(Takings) +
  labs(y = "Takings", title = "Tasmania Accomodation Takings")

# box cox transformation
lambda <- tasmania_acc   |>    # this is to find the lamba which is needed for box cox formula
  features(Takings, features = guerrero) |> # guerrero find s the best lambda at the peak
  pull(lambda_guerrero)

tasmania_acc  |>
  autoplot(box_cox(Takings, lambda)) +
  labs(y = "Takings",title = paste("Tasmania Accomodation Takings with log transformation λ = ",round(lambda,4)))

# number of differences needed
tasmania_acc |>
  features(Takings, unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

# number of differences needed
tasmania_acc |>
  features(Takings, unitroot_nsdiffs)

## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

# Applying seasonal differencing first and then regular differencing
tasmania_acc   |> autoplot(difference(difference(log(Takings),4))) + labs(subtitle = "Tasmania Accomodation Takings (log transformed, seasonal differenced, regular differenced)")

c) Monthly sales from souvenirs.

Answer:

The time plot shows a increasing trend over time.There is seasonality( spike in december) and a increasing variation over time.The time series is not stationary.
The lambda value of 0.0021 indicates that the data needs a strong transformation as a log transformation (λ = 0)to make it more suitable for analysis and forecasting.
Based on the unitroot_ndiffs function, first order differencing is needed.
Based on the unitroot_nsdiffs function, first order seasonal differencing is also needed.

souvenirs  |>
  autoplot(Sales) +
  labs(y = "Monthly Sales", title = "Monthly Sale of souvenirs")

# box cox transformation
lambda <- souvenirs   |>    # this is to find the lamba which is needed for box cox formula
  features(Sales, features = guerrero) |> # guerrero find s the best lambda at the peak
  pull(lambda_guerrero)

souvenirs  |>
  autoplot(box_cox(Sales, lambda)) +
  labs(y = "Monthly Sales",title = paste("Monthly Sale of souvenirs with log transformation λ = ",round(lambda,4)))

# number of differences needed
souvenirs  |>
  features(Sales, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

# number of differences needed
souvenirs  |>
  features(Sales, unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

# Applying seasonal differencing first and then regular differencing
souvenirs  |> autoplot(difference(difference(log(Sales),12))) + labs(subtitle = "Monthly Sale of souvenirs(log transformed, seasonal differenced, regular differenced)")

9.4 For the souvenirs data, write down the differences you chose above using backshift operator notation.

Answer:

The complete transformation in backshift operator notation would be:

(1-B)(1-B^12)(log Y_t)

where:

Y_t is the original souvenirs sales series

log Y_t is the Box-Cox transformed series (since λ ≈ 0 is effectively a log transformation)

(1-B) represents the regular first difference

(1-B^12) represents the seasonal difference with period 12 (for monthly data)

This notation indicates that we first apply the Box-Cox (log) transformation to stabilize variance, then apply both seasonal and regular differencing to eliminate trend and seasonality, resulting in a stationary series.

9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

Answer:

The time plot shows a increasing trend over time.There is seasonality and a increasing variation over time.The time series is not stationary.
The lambda value of 0.083 indicates that the data needs a strong transformation as a log transformation (λ = 0)to make it more suitable for analysis and forecasting.
Based on the unitroot_ndiffs function, first order differencing is needed.
Based on the unitroot_nsdiffs function, first order seasonal differencing is also needed.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries |>
  autoplot(Turnover) +
  labs(y = "Retail turnover in $Million AUD ", title = "Turnover: Australian retail industry")

# box cox transformation
lambda <- myseries   |>    # this is to find the lamba which is needed for box cox formula
  features(Turnover, features = guerrero) |> # guerrero find s the best lambda at the peak
  pull(lambda_guerrero)

myseries  |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(y = "Monthly Sales",title = paste("Monthly Sale of souvenirs with log transformation λ = ",round(lambda,4)))

# number of differences needed
myseries  |>
  features(Turnover, unitroot_ndiffs)

## # A tibble: 1 × 3
##   State              Industry                                            ndiffs
##   <chr>              <chr>                                                <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing      1

# number of differences needed
myseries  |>
  features(Turnover, unitroot_nsdiffs)

## # A tibble: 1 × 3
##   State              Industry                                            nsdiffs
##   <chr>              <chr>                                                 <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing       1

# Applying seasonal differencing first and then regular differencing
myseries  |> autoplot(difference(difference(log(Turnover),12))) + labs(subtitle = "Monthly Sale of souvenirs(log transformed, seasonal differenced, regular differenced)")

9.6 Simulate and plot some data from simple ARIMA models.

a)Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1.The process starts with y1=0

#data preparation
set.seed(123)
phi_1_func <- function(phi_1) {
  y <- numeric(100)
  e <- rnorm(100)
  for(i in 2:100)
    y[i] <- phi_1*y[i-1] + e[i]
  sim <- tsibble(idx = seq_len(100), y = y, index = idx)
  
  return(sim)
}

b) Produce a time plot for the series. How does the plot change as you change phi_1

Answer:

When i change the phi_1 parameter in the AR(1) model, the behavior of the time series changes significantly.

When φ₁ = 0:

The series becomes pure white noise. No autocorrelation between observations. The plot looks like random fluctuations around zero with no pattern.

When 0 < φ₁ < 1 (like our φ₁ = 0.6):

The series shows positive autocorrelation. Values tend to persist in the same direction for several time periods. Creates “local trends” or apparent cycles.

When φ₁ = 1:

The series becomes a random walk. Can wander far from its starting point. No tendency to return to a mean value. Shows strong persistent behavior and pronounced trends.

When φ₁:-0.5: The series shows negative autocorrelation Values tend to alternate between positive and negative Creates a “choppy” or oscillating appearance

When φ₁ = -1: The series alternates perfectly between positive and negative values Creates a strong oscillating pattern

When |φ₁|=2: The series becomes non-stationary and explosive Values grow exponentially in magnitude over time The plot would show extreme values that increase without bound

phi_1_func(0.6) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= 0.6 (Moderate Persistence)")

phi_1_func(0) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= 0 (White Noise)")

phi_1_func(1) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= 1 (Random Walk)")

phi_1_func(-0.5) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= -0.5 (Oscillating)")

phi_1_func(-1) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= -1 (Strong Oscillating)")

phi_1_func(2) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= 2 (Exponential)")

c) Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1

set.seed(124)
phi_1_own_func <- function(theta) {
  y <- numeric(100)
  e <- rnorm(100)
   for(i in 2:100)
    y[i] <- theta*e[i-1] + e[i]
  sim_own <- tsibble(idx = seq_len(100), y = y, index = idx)
  
  return(sim_own)
}

d)Produce a time plot for the series. How does the plot change as you change θ1?

Answer:

As mentioned earlier, when θ1 is closer to 0 it more resembles white noise and when θ1 is closer to 1,it more closely resembles a dampened oscillatory pattern.When θ1 is negative, the plot may show an amplified oscillatory behavior.

phi_1_own_func(0.6) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= 0.6 (Moderate Persistence)")

phi_1_own_func(0) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= 0 (White Noise)")

phi_1_own_func(1) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= 1 (Moderate Persistence)")

phi_1_own_func(-0.5) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= -0.5 (Oscillating)")

phi_1_own_func(-1) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= -1 (Strong Oscillating)")

phi_1_own_func(2) |>
  autoplot(y) + labs( title = "Random Series with ϕ1= 2 (Moderate Persistence)")

e)Generate data from an ARMA(1,1) model with phi_1 = 0.6, theta_1 = 0.6, and sigma^2 = 1.

set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100){
  y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
}

arma_1_1 <- tsibble(idx = seq_len(100), y = y, index = idx)

f) Generate data from an AR(2) model with ϕ1=−0.8,ϕ2=0.3, σ2=1.(Note that these parameters will give a non-stationary series.)

set.seed(123)
yy <- numeric(100)
ee <- rnorm(100)

for(i in 3:100){
  yy[i] <- -0.8 * yy[i-1] + 0.3 * yy[i-2] + ee[i]
}

arma_2 <- tsibble(idx = seq_len(100), yy = yy, index = idx)

g) Graph the latter two series and compare them.

Answer:

ARMA(1,1) with ϕ1=0.6 combines autoregressive and moving average components. Shows smoother patterns with moderate persistence from the AR component. Has short-term correction effects from the MA component. Stationary since |φ₁| = 0.6 < 1

AR(2) with ϕ1=−0.8,ϕ2=0.3 is a pure autoregressive model with two lags. The negative φ₁ creates oscillatory behavior with alternating values. The positive φ₂ adds some longer-term persistence. Not Stationary since -0.8 + 0.3 = -0.5 < 1, φ₂ - φ₁ = 0.3 - (-0.8) = 1.1 >1, and |φ₂| = 0.3 < 1

 autoplot(arma_1_1) +labs(title =" ARMA(1,1)with ϕ1=0.6")

autoplot(arma_2)+labs(title =" ARMA(2) with ϕ1=−0.8,ϕ2=0.3")

9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a)Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

Answer:

The timeplot shows a positive increasing trend with no seasonality or variation over time. The lambda value of -0.176 shows the need of a log transformation. The ndiffs value of 2 indicates we need to perform a second order differencing. nsdiffs value of 0 confirms that there is no seasonal differencing needed.

The ACF suggests an MA(1) model; so a candidate is an ARIMA(0,2,1). We fit an ARIMA(0,2,1) along with two automated model selections, one using the default stepwise procedure, and one working harder to search a larger model space.They both generated the same ARIMA(0,2,1) ARIMA model fitted also reported a ARIMA(0,2,1). Hence its consistent with what i observed.

The residual plot shows there is not autocorrelation of the lags, normal distribution, the lags are within the blue line and hence the data is white noise.

Plot forecasts for the next 10 years shows a relative exponential trend consistent with past error terms.

aus_airpassengers |>
  autoplot() +labs( title = "Australian air passengers" , y ="total number of passengers (in millions) ")

# number of differences needed
aus_airpassengers |>
  features(Passengers, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      2

# number of seasonal differences needed
aus_airpassengers |>
  features(Passengers, unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       0

# Applying second order differencing
aus_airpassengers_new <- aus_airpassengers |>
  mutate(Passengers_new = difference(difference(Passengers)))

# Review the timeplot, ACF and PACF plot
aus_airpassengers_new |>
  gg_tsdisplay(Passengers_new, plot_type='partial')+ labs(subtitle = "Australian air passengers (second order differenced)")

# Checking the fit
aus_airpassengers_fit <- aus_airpassengers |>
  filter(Year < 2012) |>
  model(ARIMA(Passengers)) 

report(aus_airpassengers_fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

# Review the residuals 
aus_airpassengers_fit  |>
  gg_tsresiduals()

# plot forecasts
fc <- aus_airpassengers_fit |>
  forecast(h="10 years")

fc |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecase for Australian Air Passengers")+
  scale_x_continuous(breaks = seq(min(aus_airpassengers$Year), max(aus_airpassengers$Year), by = 5))

b)Write the model in terms of the backshift operator.

Answer:

Theta is −0.8756 as seen above, MA(2) model:

y[t]=(−0.8756)e[t−1]+e[t]

d = 2 due to second order differencing, The backshift operator notation would be:

(1−B)^2 y[t]=(1−0.8756B)e[t]

This can be rewritten to isolate y[t] as :

(1-2B+B^2)y[t]=(1−0.8756B)e[t]

y[t] -2y[t-1]+ y[t-2] = e[t]−0.8756e[t-1]

y[t] = 2y[t-1]- y[t-2]+e[t]−0.8756e[t-1]

c)Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

Answer:

Fitted an ARIMA(0,1,0) model (which is equivalent to a random walk without drift) to the Passengers variable
The + 1 adds the constant term, making it a random walk with drift model.
The residual plot shows there is not autocorrelation of the lags, normal distribution, the lags are within the blue line and hence the data is white noise.
Comparing the forecasts of a) and c) , i observe that a) forecasts correctly till 2013 and then higher than the actual time series like a exponential trend whereas c) forecasts lower than the actual time series throughout. hence c) is a dampened forecast.

# Checking the fit
aus_airpassengers_fit2 <- aus_airpassengers |>
  filter(Year < 2012) |>
  model(arima010 = ARIMA(Passengers ~ 1+ pdq(0,1,0))) 

report(aus_airpassengers_fit2)

## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.3669
## s.e.    0.3319
## 
## sigma^2 estimated as 4.629:  log likelihood=-89.08
## AIC=182.17   AICc=182.48   BIC=185.59

# Review the residuals 
aus_airpassengers_fit2  |>
  gg_tsresiduals()

# forecast for next 10 years
fc2 <- aus_airpassengers_fit2 |>
  forecast(h="10 years")

# plot the forecasts
fc2 |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecase for Australian Air Passengers")+
  scale_x_continuous(breaks = seq(min(aus_airpassengers$Year), max(aus_airpassengers$Year), by = 5))

d)Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

I observe that a) forecasts correctly till 2013 and then higher than the actual time series, c) forecasts lower than the actual time series throughout, d) forecasts lower than the actual time series till 2013 and then forecasts higher than the actual time series.
when i remove the constant, it becomes a NULL model, non stationary and hence invalid.

# Checking the fit
aus_airpassengers_fit3 <- aus_airpassengers |>
  filter(Year < 2012) |>
  model(arima212 = ARIMA(Passengers ~ 1 + pdq(2,1,2))) 

report(aus_airpassengers_fit3)

## Series: Passengers 
## Model: ARIMA(2,1,2) w/ drift 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2  constant
##       1.4694  -0.5103  -1.5736  0.6780    0.0650
## s.e.  0.3780   0.3558   0.3081  0.2688    0.0294
## 
## sigma^2 estimated as 4.748:  log likelihood=-87.74
## AIC=187.47   AICc=189.94   BIC=197.75

# Review the residuals 
aus_airpassengers_fit3  |>
  gg_tsresiduals()

# forecast for next 10 years
fc3 <- aus_airpassengers_fit3 |>
  forecast(h="10 years")

# plot the forecasts
fc3 |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecase for Australian Air Passengers") +
  scale_x_continuous(breaks = seq(min(aus_airpassengers$Year), max(aus_airpassengers$Year), by = 5))

e)Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

Answer:

Fitted an ARIMA(0,2,1) model to the Passengers variable
The + 1 adds the constant term, making it quadratic trend (polynomial of order 2).
The constant term is 0.0721 which represents acceleration in the trend rather than a simple drift term.
The forecast plot is showing a quadratic forecast.

# Checking the fit
aus_airpassengers_fit4 <- aus_airpassengers |>
  filter(Year < 2012) |>
  model(arima021 = ARIMA(Passengers ~ 1 + pdq(0,2,1))) 

report(aus_airpassengers_fit4)

## Series: Passengers 
## Model: ARIMA(0,2,1) w/ poly 
## 
## Coefficients:
##           ma1  constant
##       -1.0000    0.0721
## s.e.   0.0709    0.0260
## 
## sigma^2 estimated as 4.086:  log likelihood=-85.74
## AIC=177.48   AICc=178.15   BIC=182.55

# Review the residuals 
aus_airpassengers_fit4  |>
  gg_tsresiduals()

# forecast for next 10 years
fc4 <- aus_airpassengers_fit4 |>
  forecast(h="10 years")

# plot the forecasts
fc4 |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecase for Australian Air Passengers") +
  scale_x_continuous(breaks = seq(min(aus_airpassengers$Year), max(aus_airpassengers$Year), by = 5))

9.8 For the United States GDP series (from global_economy):

a)if necessary, find a suitable Box-Cox transformation for the data;

The United States GDP time series shows a exponential trend with no seasonality or variation.

Lambda value of 0.2819 shows the need for a log or square root transformation to provide accurate forecast.

usa_gdp <- global_economy |>
  filter(Code == "USA") |>
  select(Year,GDP)


usa_gdp |>
  autoplot(GDP) +
  labs(y = " GDP", title = "USA GDP")

# box cox transformation
lambda <- usa_gdp  |>    # this is to find the lamba which is needed for box cox formula
  features(GDP, features = guerrero) |> # guerrero find s the best lambda at the peak
  pull(lambda_guerrero)

usa_gdp  |>
  autoplot(box_cox(GDP, lambda)) +
  labs(y = "",title = paste("USA GDP with log transformation λ = ",round(lambda,4)))

b)fit a suitable ARIMA model to the transformed data using ARIMA();

Answer: The fit shows a ARIMA(1,1,0) w/ drift .

fit <- usa_gdp|>
  model(ARIMA(box_cox(GDP, lambda))) 

report(fit)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

c)try some other plausible models by experimenting with the orders chosen;

ndiffs() shows the need for first order differencing Other plausible models were tried but all have higher AICc than the ARIMA(1,1,0) w/ drift .

# number of differences needed
usa_gdp |>
  features(box_cox(GDP,lambda), unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

usa_gdp |>
  gg_tsdisplay(box_cox(GDP,lambda),plot_type='partial')

fit <- usa_gdp |>
  select(GDP, Year) |>
  model(
    arima011 = ARIMA(box_cox(GDP,lambda) ~ pdq(0,1,1)),
    arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
    arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)),
    arima010 = ARIMA(box_cox(GDP,lambda) ~ pdq(0,1,0)),
    arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
    arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
    arima121 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,1)),
    
    auto = ARIMA(box_cox(GDP,lambda), stepwise = FALSE, approx = FALSE)
  )

fit |> pivot_longer(everything(), names_to = "Model name",
                     values_to = "Orders")

## # A mable: 8 x 2
## # Key:     Model name [8]
##   `Model name`                  Orders
##   <chr>                        <model>
## 1 arima011     <ARIMA(0,1,1) w/ drift>
## 2 arima110     <ARIMA(1,1,0) w/ drift>
## 3 arima111     <ARIMA(1,1,1) w/ drift>
## 4 arima010     <ARIMA(0,1,0) w/ drift>
## 5 arima212     <ARIMA(2,1,2) w/ drift>
## 6 arima120              <ARIMA(1,2,0)>
## 7 arima121              <ARIMA(1,2,1)>
## 8 auto         <ARIMA(1,1,0) w/ drift>

glance(fit) |> arrange(AICc) |> select(.model:BIC)

## # A tibble: 8 × 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima121  5761.   -321.  648.  649.  655.
## 2 arima120  6780.   -326.  656.  656.  660.
## 3 arima110  5479.   -325.  657.  657.  663.
## 4 auto      5479.   -325.  657.  657.  663.
## 5 arima011  5689.   -326.  659.  659.  665.
## 6 arima111  5580.   -325.  659.  659.  667.
## 7 arima212  5734.   -325.  662.  664.  674.
## 8 arima010  6774.   -332.  668.  668.  672.

d)choose what you think is the best model and check the residual diagnostics;

Answer: The innovation residuals are centered around O. The ACF plot shows no autocorrelationa and the lags are all within the blue line. Hence its all white noise.

The large p-value (0.63) confirms that the residuals are similar to white noise.

Hence ARIMA(1,2,1) is the best model.

fit |>
  select(arima121) |>
  gg_tsresiduals()

augment(fit) |>
  filter(.model == "arima121") |>
  features(.innov, ljung_box, lag=10, dof=4)

## # A tibble: 1 × 3
##   .model   lb_stat lb_pvalue
##   <chr>      <dbl>     <dbl>
## 1 arima121    4.36     0.629

e)produce forecasts of your fitted model. Do the forecasts look reasonable?

Yes the forecast looks reasonable and shows similar exponential trend.

forecast(fit, h=10) |>
  filter(.model=='arima121') |>
  autoplot(usa_gdp)+
  labs(title = "Forecasted United States GDP using ARIMA(1,2,1)")

f)compare the results with what you would obtain using ETS() (with no transformation).

As the trend is exponential with no seasonality i will try the ETS MMN model. However ARIMA121 model is more accurate than it as it has lower AICc value.

fit_compare <- usa_gdp |>
  select(GDP, Year) |>
  model(
    
    arima121 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,1)),
    ETS = ETS(GDP),
    ETSMMN = ETS(GDP ~ error("M") + trend("M") + season("N"))
   
    )

fit_compare |> pivot_longer(everything(), names_to = "Model name",
                     values_to = "Orders")

## # A mable: 3 x 2
## # Key:     Model name [3]
##   `Model name`         Orders
##   <chr>               <model>
## 1 arima121     <ARIMA(1,2,1)>
## 2 ETS            <ETS(M,A,N)>
## 3 ETSMMN         <ETS(M,M,N)>

glance(fit_compare) |> arrange(AICc) |> select(.model:BIC)

## # A tibble: 3 × 6
##   .model        sigma2 log_lik   AIC  AICc   BIC
##   <chr>          <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima121 5761.         -321.  648.  649.  655.
## 2 ETSMMN      0.000611  -1588. 3186. 3187. 3196.
## 3 ETS         0.000678  -1590. 3191. 3192. 3201.

fit_compare %>%
  forecast(h = 10) %>%
  autoplot(usa_gdp, levels(NULL)) +
  ggtitle("US GDP Forecast")+
  scale_x_continuous(breaks = seq(min(usa_gdp$Year), max(usa_gdp$Year), by = 5))

Homework 6 - ARIMA Models

Dhanya Nair

2025-03-22

Exercises:

9.1 Explain ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

9.2 Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a)Turkish GDP from global_economy.

b)Accommodation takings in the state of Tasmania from aus_accommodation.

c) Monthly sales from souvenirs.

9.4 For the souvenirs data, write down the differences you chose above using backshift operator notation.

9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

9.6 Simulate and plot some data from simple ARIMA models.

a)Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1.The process starts with y1=0

b) Produce a time plot for the series. How does the plot change as you change phi_1

c) Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1

d)Produce a time plot for the series. How does the plot change as you change θ1?

e)Generate data from an ARMA(1,1) model with phi_1 = 0.6, theta_1 = 0.6, and sigma^2 = 1.

f) Generate data from an AR(2) model with ϕ1=−0.8,ϕ2=0.3, σ2=1.(Note that these parameters will give a non-stationary series.)

g) Graph the latter two series and compare them.

9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a)Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

b)Write the model in terms of the backshift operator.

c)Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

d)Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

e)Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

9.8 For the United States GDP series (from global_economy):

a)if necessary, find a suitable Box-Cox transformation for the data;

b)fit a suitable ARIMA model to the transformed data using ARIMA();

c)try some other plausible models by experimenting with the orders chosen;

d)choose what you think is the best model and check the residual diagnostics;

e)produce forecasts of your fitted model. Do the forecasts look reasonable?

f)compare the results with what you would obtain using ETS() (with no transformation).