Data_624_HW_6

library(fpp3)

## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr

## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.6
## ✔ dplyr       1.1.4     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.1     ✔ feasts      0.4.1
## ✔ lubridate   1.9.3     ✔ fable       0.4.1
## ✔ ggplot2     3.5.1

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

The data does look like white noise since there are no lags, the difference between ACF plots is the confidence interbal for each one of them. As we increase the number of observations the confidence interval narrows.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

So as we increase the sample size, the confidence interval decreases and the critical values approach zero

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock

## # A tsibble: 5,032 x 8 [!]
## # Key:       Symbol [4]
##    Symbol Date        Open  High   Low Close Adj_Close    Volume
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
##  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
##  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
##  3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
##  4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
##  5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
##  6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
##  7 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
##  8 AAPL   2014-01-13  75.7  77.5  75.7  76.5      64.9  94623200
##  9 AAPL   2014-01-14  76.9  78.1  76.8  78.1      66.1  83140400
## 10 AAPL   2014-01-15  79.1  80.0  78.8  79.6      67.5  97909700
## # ℹ 5,022 more rows

amazon_data <- gafa_stock %>% filter(Symbol == "AMZN")

amazon_data %>%
  gg_tsdisplay(Close, plot_type='partial') +
  labs(title = "Amazon Data")

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

The time series plot of Amazon’s daily closing prices shows a strong upward trend over time, indicating that the mean of the series is not constant and suggesting non-stationarity. The ACF plot confirms this, as the autocorrelations remain very high across many lags, without a rapid decay toward zero. The PACF plot shows a large spike at lag 1, this means that the data would requires differencing to achieve stationarity.

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.
Accommodation takings in the state of Tasmania from aus_accommodation.
Monthly sales from souvenirs.

#Turkey GDP 
turkey_data <- global_economy %>%
  filter(Country == "Turkey")

turkey_data %>%
  gg_tsdisplay(GDP, plot_type='partial') + 
  labs(title = "Turkey GDP Data")

#box-cox transformaiton
turkey_lambda <- turkey_data %>%
    features(GDP, features = guerrero) %>%
    pull(lambda_guerrero)

turkey_data_transformed <- turkey_data |>
    mutate(GDP = box_cox(GDP, turkey_lambda))

turkey_data_transformed %>%
  gg_tsdisplay(GDP, plot_type='partial') + 
  labs(title = "Turkey GDP Data Transformed")

# kpss test for stationarity

turkey_data_transformed |>
  features(GDP, unitroot_kpss)

## # A tibble: 1 × 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey       1.50        0.01

#result: kpss_pvalue = 0.01(may be smaller than that), concludes that the data is non-stationary, need to prefrom the differencing

turkey_data_transformed |>
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

# ndiffs = 1, one order of differencing is required to obtain stationary data

For Turkey GDP data we would need to apply one order of differencing to obtain stationary data.

#Accommodation takings in the state of Tasmania from aus_accommodation.
tasmania_data <- aus_accommodation %>%
  filter(State == "Tasmania")

tasmania_data %>%
  gg_tsdisplay(Takings, plot_type='partial') + 
  labs(title = "Tasmania Takings Data")

#box-cox transformaiton
tasmania_lambda <- tasmania_data |>
    features(Takings, features = guerrero) |>
    pull(lambda_guerrero)

tasmania_data_transformed <- tasmania_data |>
    mutate(Takings = box_cox(Takings, tasmania_lambda))

tasmania_data_transformed |>
    features(Takings, unitroot_kpss)

## # A tibble: 1 × 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania      1.84        0.01

#kpss test results: kpss_pvalue = 0.01(may be smaller than that), data is non-stationary

tasmania_data_transformed |>
  features(Takings, unitroot_nsdiffs)

## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

Since Tasmania Takings data is quarterly, I checked for seasonal differencing and got that we would need to apply 1 seasonal differencing.

#Monthly sales from souvenirs

souvenirs %>%
  gg_tsdisplay(Sales, plot_type='partial') + 
  labs(title = "Souvenirs sales data ")

souvenirs_lambda <- souvenirs |> 
  features(Sales, features = guerrero) |> 
  pull(lambda_guerrero)

souvenirs_transformed <- souvenirs |> 
  mutate(Sales = box_cox(Sales, souvenirs_lambda))

souvenirs_transformed |> 
  features(Sales, unitroot_kpss)

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      1.79        0.01

# kpss_pvalue = 0.01, non-stationary

souvenirs_transformed |>
  features(Sales, unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

Similarly to Tasmania data, for the souvenirs data would need to apply 1 seasonal differencing.

9.5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  autoplot(Turnover) + 
  labs(title = "Northern Territory Turnover data")

#box-cox transformation
myseries_lambda <- myseries %>%
    features(Turnover, features = guerrero) %>%
    pull(lambda_guerrero)

myseries_transformed <- myseries %>%
    mutate(Turnover = box_cox(Turnover, myseries_lambda))

myseries_transformed %>%
  gg_tsdisplay(Turnover, plot_type='partial') + 
  labs(title = "Transformed Northern Territory Turnover data")

myseries_transformed %>%
    features(Turnover, unitroot_kpss) #kpss_pvalue = 0.01, data is non-stationary, need to apply differencing

## # A tibble: 1 × 4
##   State              Industry                              kpss_stat kpss_pvalue
##   <chr>              <chr>                                     <dbl>       <dbl>
## 1 Northern Territory Clothing, footwear and personal acce…      5.43        0.01

#order of seasonal differencing? 
myseries_transformed %>%
    features(Turnover, unitroot_nsdiffs) # nsdiffs = 1

## # A tibble: 1 × 3
##   State              Industry                                            nsdiffs
##   <chr>              <chr>                                                 <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing       1

myseries_differenced <- myseries_transformed %>%
    mutate(turn_sdifferenced = difference(Turnover, 12))

# testing differenced data
myseries_differenced  %>%
    features(turn_sdifferenced, unitroot_kpss) #kpss_pvalue = 0.074

## # A tibble: 1 × 4
##   State              Industry                              kpss_stat kpss_pvalue
##   <chr>              <chr>                                     <dbl>       <dbl>
## 1 Northern Territory Clothing, footwear and personal acce…     0.407      0.0741

myseries_differenced %>%
  gg_tsdisplay(turn_sdifferenced, plot_type='partial') + 
  labs(title = "Differenced and Transformed Northern Territory Turnover data")

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_point()`).

I used the Box-Cox transformation to stabilize the variance. The kpss test showed the data was not stationary, so I checked for seasonal differencing and found that one seasonal difference was needed. After applying seasonal differencing (lag = 12), the kpss test showed the data was now stationary (kpss_pvalue = 0.074).

9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with … = 0.6 and …2 = 1. The process starts with y1 = 0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change …?

sim |>
  autoplot(y) + 
  ggtitle("Phi = 0.6")

#phi = 0.3
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.3*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>%
  autoplot(y) +
  ggtitle("Phi = 0.3")

#phi = 0.9
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>%
  autoplot(y) +
  ggtitle("Phi = 0.9")

As phi increases there is less variability in the values.

Write your own code to generate data from an MA(1) model with …1=0.6 and …2=1 .

y <- numeric(100)
e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

ma_sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change …1 ?

ma_sim %>%
  autoplot(y) +
  ggtitle("MA(1) model(theta = 0.6")

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.3*e[i-1] + e[i]
ma_sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)

ma_sim2 %>% 
  autoplot(y) +
  ggtitle("Theta = 0.3")

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.9*e[i-1] + e[i]
ma_sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)

ma_sim3 %>% 
  autoplot(y) +
  ggtitle("Theta = 0.9")

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- -0.9*e[i-1] + e[i]
ma_sim4 <- tsibble(idx = seq_len(100), y = y, index = idx)

ma_sim4 %>% 
  autoplot(y) +
  ggtitle("Theta = -0.9")

As Theta increases, the variation decreases. - Generate data from an ARMA(1,1) model with phi1=0.6 , 1=0.6 and 2=1 .

y <- numeric(100)
e <- rnorm(100)

for (i in 2:100)
    y[i] <- (0.6 * y[i-1]) + (0.6 * e[i-1]) + e[i]

arma_sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Generate data from an AR(2) model with phi1=-0.8 , phi2=0.3 and 2=1 . (Note that these parameters will give a non-stationary series.)

y <- numeric(100)
for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

ar2_sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Graph the latter two series and compare them.

arma_sim %>%
  autoplot(y) + 
  ggtitle("ARMA(1,1) Model")

ar2_sim %>%
  autoplot(y) + 
  ggtitle("AR(2) Model")

The non-stationary AR(2) model displays the expected oscillatory behavior associated with negative phi values. ## 9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

autoplot(aus_airpassengers)

## Plot variable not specified, automatically selected `.vars = Passengers`

aus_fit <- aus_airpassengers %>%
    model(ARIMA(Passengers))
report(aus_fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

aus_fit %>%
  gg_tsresiduals() +
  labs(title = "Residual Diagnostics for ARIMA Model")

aus_fc <- aus_fit %>%
  forecast(h = 10)

aus_fc %>%
  autoplot(aus_airpassengers) +
  labs(title = "Forecast: Australian Air Passengers")

ARIMA(0,2,1) was automatically selected. Based on the residual plots it does look like white noise.

Write the model in terms of the backshift operator.

**(1-B)^2 * y_t= (1 - 0.8756B)*E_t**

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

aus_fit2 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))

aus_fc2 <- aus_fit2 %>%
  forecast(h="10 years")

aus_fc2 %>%
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecast with Drift")

#difficult to compare the forecasts on separete plots, I will combine all of them on one plot

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fits <- aus_airpassengers %>%
  model(
    auto_arima = ARIMA(Passengers),                          # part a
    arima_010_drift = ARIMA(Passengers ~ 1 + pdq(0,1,0)),    # part c
    arima_212_drift = ARIMA(Passengers ~ 1 + pdq(2,1,2)),    # ARIMA(2,1,2) with drift
    arima_021_const = ARIMA(Passengers ~ 1 + pdq(0,2,1))      # ARIMA(0,2,1) with constant
  )

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

# Forecast both for 10 years
fc_combined <- fits %>%
  forecast(h = 10)

# Plot both forecasts on the same graph
fc_combined %>%
  autoplot(aus_airpassengers,level = NULL)+
  labs(
    title = "Forecast Comparison",
    y = "Passengers (millions)",
    x = "Year"
  )

The ARIMA(0,1,0) with drift model shows a steady, linear increase in passengers over time. The ARIMA(2,1,2) with drift and auto ARIMA models both closely follow the historical trend and project continued growth, suggesting they effectively capture the data’s structure. In contrast, the ARIMA(0,2,1) with a constant produces a noticeably steeper forecast. Overall, models with drift tend to produce more plausible long-term forecasts.

9.8

For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data;
fit a suitable ARIMA model to the transformed data using ARIMA();
try some other plausible models by experimenting with the orders chosen;
choose what you think is the best model and check the residual diagnostics;
produce forecasts of your fitted model. Do the forecasts look reasonable?
compare the results with what you would obtain using ETS() (with no transformation).

usa_gdp_data <- global_economy %>%
  filter(Code == "USA")

usa_gdp_data %>%
  gg_tsdisplay(GDP,plot_type='partial') + 
  labs(title = "USA GDP Data") # data is not stationary

#Box-Cox transformation
usa_lambda <- usa_gdp_data %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

usa_gdp_data <- usa_gdp_data %>%
  mutate(GDP_boxcox = box_cox(GDP, usa_lambda))

usa_gdp_data %>%
  autoplot(GDP_boxcox)

#ARIMA model
usa_fit <- usa_gdp_data %>%
  model(ARIMA(box_cox(GDP, usa_lambda)))

report(usa_fit) # ARIMA(1,1,0) w drift AICc = 657.1

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, usa_lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

#other ARIMA models
usa_fit2 <- usa_gdp_data %>%
  model(ARIMA(box_cox(GDP, usa_lambda) ~ 1 + pdq(1,1,1)))

report(usa_fit2) #ARIMA(1,1,1) - AICc = 659.41

## Series: GDP 
## Model: ARIMA(1,1,1) w/ drift 
## Transformation: box_cox(GDP, usa_lambda) 
## 
## Coefficients:
##          ar1      ma1  constant
##       0.4736  -0.0190  114.8547
## s.e.  0.2851   0.3286    9.3486
## 
## sigma^2 estimated as 5580:  log likelihood=-325.32
## AIC=658.64   AICc=659.41   BIC=666.82

usa_fit3 <- usa_gdp_data %>%
  model(ARIMA(box_cox(GDP, usa_lambda) ~ 0 + pdq(2,1,1)))

report(usa_fit3) #ARIMA(2,1,1) - AICc = 664.38

## Series: GDP 
## Model: ARIMA(2,1,1) 
## Transformation: box_cox(GDP, usa_lambda) 
## 
## Coefficients:
##          ar1      ar2      ma1
##       1.3142  -0.3175  -0.8318
## s.e.  0.1800   0.1776   0.1145
## 
## sigma^2 estimated as 5846:  log likelihood=-327.8
## AIC=663.61   AICc=664.38   BIC=671.78

The model that was automatically generated by ARIMA function has the lowest AICc and is the best model to forecast.

usa_fit %>%
  gg_tsresiduals()

ACF values are within the range, but the histogram doesn’t look normally distributed, left-skewed. Looks like there are some outliers.

usa_fc <- usa_fit %>%
  forecast(h="10 years") 

usa_fc |>
  autoplot(usa_gdp_data) +
  labs("10 Year United States GDP Prediction")

#ETS model

ets_usa_fit <- usa_gdp_data %>%
  model(ETS(GDP))

report(ets_usa_fit) # AICc = 3191.941

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

ets_fc <- ets_usa_fit %>%
  forecast(h="10 years")

ets_fc |>
  autoplot(usa_gdp_data)

ETS’s AICc, BIC values are much higher than the ARIMA’s model.

Data_624_HW_6_YR

2025-03-23

9.1

9.2

9.3

9.5

9.6

9.8