library("fpp3")
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.3 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.5
## ✔ purrr 1.0.2 ✔ stringr 1.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library("ggplot2")
library("fable")
Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman. Please submit both the Rpubs link as well as your .rmd file.
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a.) Explain the differences among these figures. Do they all indicate that the data are white noise?
Moving from the left plot to the right, the autocorrelation spikes shrink. In the first plot the sample size is only 36 random numbers, so the sample autocorrelations fluctuate strongly and a few spikes approach the critical bounds even though the data are random. The second plot has 360 random numbers and the fluctuations have decreased significantly, consistent with white noise. The third plot, with 1,000 random numbers, shows even smaller autocorrelations, again indicating that the data are random and have no correlation with past values. All three figures are consistent with white noise; they differ only in the amount of sampling variation.
b.) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
With a larger sample size we get more precise estimates of the autocorrelations. The blue dashed critical bounds are placed at ±1.96 / √N, so with a small sample like the first plot, where N is 36, the bounds sit far from zero; with N = 1,000 they sit much closer to zero. Autocorrelations near zero indicate that past values do not affect future values, and we can estimate this more reliably as the sample size grows. As N increases, the bounds narrow and the estimated autocorrelations of each white-noise series cluster tightly around zero.
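As a quick sketch (my own illustration, not part of the exercise), simulating white noise at two sample sizes shows the ±1.96 / √N bounds narrowing:
# simulate white noise and compare the ACF bounds at n = 36 and n = 1000
set.seed(42)
tsibble(idx = 1:36, y = rnorm(36), index = idx) %>%
  ACF(y) %>%
  autoplot() +
  labs(title = "White noise, n = 36")
tsibble(idx = 1:1000, y = rnorm(1000), index = idx) %>%
  ACF(y) %>%
  autoplot() +
  labs(title = "White noise, n = 1000")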
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
# loading the data for the gafa stock
data(gafa_stock)
gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
## 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
## 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
## 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
## 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
## # ℹ 5,022 more rows
# Chapter 9.5 notes that we can get the series, ACF, and PACF in one command with gg_tsdisplay(plot_type = "partial")
gafa_stock %>%
filter(Symbol == "AMZN") %>%
gg_tsdisplay(Close, plot_type='partial') +
labs(title = "Daily Closing Stock Price of Amazon")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
In the “Daily Closing Stock Price of Amazon” plot we can observe an increasing trend and growing variance from 2014 to 2018. The daily closing price fluctuates day to day with a clear upward trend, and the magnitude of the fluctuations increases over the years. In the ACF plot the autocorrelations are large and decay very slowly, showing that past values have a significant relationship with current values, so the series is not white noise. In the PACF we see a large spike at lag 1, which differencing should remove. Both the ACF and PACF support the data being non-stationary, meaning the series exhibits a strong trend over the years. We can address this with differencing and, if needed, a transformation such as a log.
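As a quick check (my own sketch, not required by the exercise), differencing the closing price removes the trend, and the ACF spikes largely disappear:
# the differenced series should look much closer to white noise
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(difference(Close), plot_type='partial') +
  labs(title = "Differenced Daily Closing Stock Price of Amazon")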
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
a.) Turkish GDP from global_economy.
# We will plot the data with gg_tsdisplay() (as in the previous chunk) to see the ACF and PACF plots
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = " Pre - Transformed Turkish GDP Data")
Chapter 3 discusses the Box-Cox transformation with the guerrero feature; we can pull out the lambda, and use the unitroot_ndiffs feature from Section 9.1 to find the order of differencing.
# guerrero selects the Box-Cox lambda
lambda_turkey = global_economy %>%
filter(Country == "Turkey") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
# use a unit root test to find how many differences make the data stationary
turkey_unit_test = global_economy %>%
filter(Country == "Turkey") %>%
features(box_cox(GDP,lambda_turkey), unitroot_ndiffs)
We now plot the transformed and differenced Turkish GDP data.
# apply the Box-Cox transformation and differencing inside gg_tsdisplay()
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(difference(box_cox(GDP,lambda_turkey)), plot_type='partial') +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed / Differenced Turkish GDP with $\\lambda$ = ",
round(lambda_turkey,2))))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
Using the unit root test we arrive at a difference order of 1, so one difference is needed to make the data stationary. After applying the difference, the new ACF and PACF look much closer to random data, approaching white noise.
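As a follow-up sketch (not part of the original answer), a KPSS test on the differenced series can confirm stationarity:
# KPSS test: a large p-value (capped at 0.1) suggests the differenced series is stationary
global_economy %>%
  filter(Country == "Turkey") %>%
  features(difference(box_cox(GDP, lambda_turkey)), unitroot_kpss)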
b.) Accommodation takings in the state of Tasmania from aus_accommodation.
# plot the accommodation takings graph
aus_accommodation %>%
filter(State == 'Tasmania') %>%
gg_tsdisplay(Takings, plot_type='partial') +
labs(title = " Pre Transformation of Tasmania Accomodation Takings Data ")
We again find the lambda and use a unit root test to determine the order of differencing.
# finding the lambda
lambda_aus = aus_accommodation %>%
filter(State == 'Tasmania') %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
# using the unit root test
aus_accommodation %>%
filter(State == "Tasmania") %>%
features(box_cox(Takings,lambda_aus), unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
aus_accommodation %>%
filter(State == 'Tasmania') %>%
gg_tsdisplay(difference(box_cox(Takings,lambda_aus)), plot_type='partial') +
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed / Differenced Tasmania Accomodation Takings Model Data $\\lambda$ = ",
round(lambda_aus,2))))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
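One caveat: unitroot_nsdiffs tests for a seasonal difference (lag 4 for quarterly data), while the plot above applies a first difference. A sketch of the seasonal-difference route, checking whether a further first difference is also needed:
# apply the seasonal difference at lag 4, then test for an additional first difference
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(difference(box_cox(Takings, lambda_aus), 4), unitroot_ndiffs)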
c.) Monthly sales from souvenirs.
souvenirs %>%
gg_tsdisplay(Sales, plot_type = 'partial') +
labs(title = "Monthly Souvenirs Sales Model Data")
lambda_sale = souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
souvenirs_unit_test = souvenirs %>%
features(Sales, unitroot_ndiffs)
souvenirs %>%
gg_tsdisplay(difference(box_cox(Sales,lambda_sale)), plot_type='partial')+
labs(y = "",
title = latex2exp::TeX(paste0(
"Transformed / Differenced Monthly Souvenirs Sales Model Data $\\lambda$ = ",
round(lambda_sale,2))))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
In the pre-transformation plot we observe strong seasonality and an upward trend, with fluctuations that grow over time. The Box-Cox transformation gives a lambda that rounds to zero; a lambda close to 0 indicates a log transformation, which stabilizes the variance of the fluctuations across the trend. After differencing, the ACF and PACF show the correlations decaying.
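Since lambda rounds to 0, a plain log transformation is essentially equivalent and easier to interpret; a quick sketch:
# lambda is approximately 0, so log() stabilizes the variance just as box_cox() does
souvenirs %>%
  gg_tsdisplay(difference(log(Sales)), plot_type='partial') +
  labs(title = "Differenced Log Monthly Souvenirs Sales")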
#Question 9.5
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of difference (after transformation if necessary) to obtain stationary data.
# set a seed so the random series selection is reproducible
set.seed(879)
myseries = aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
# plot the data to get the ACF and PACF for the non-transformed data
myseries %>%
gg_tsdisplay(Turnover, plot_type='partial') +
labs(title = " Retail Turnover Data")
# finding the lambda
lambda_turnover = myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
# seasonal unit root test
myseries %>%
features(box_cox(Turnover, lambda_turnover), unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Western Australia Other recreational goods retailing 1
myseries %>%
gg_tsdisplay(difference(box_cox(Turnover,lambda_turnover)), plot_type='partial') +
labs(title = " Transformation of Retail Turnover Data")
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
Before the transformation we observed strong seasonality in the retail turnover data, since turnover differs around holidays and other factors like vacations. The unit root test indicates one difference; after the Box-Cox transformation and differencing, the data become stationary. The ACF and PACF decay significantly after the transformation, with the series now fluctuating around a mean close to zero.
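As with the Tasmania series, unitroot_nsdiffs indicates one seasonal difference (lag 12 for monthly data); a sketch applying it and testing whether a further first difference is needed:
# apply the seasonal difference at lag 12, then test for an additional first difference
myseries %>%
  features(difference(box_cox(Turnover, lambda_turnover), 12), unitroot_ndiffs)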
a.) Simulate and plot some data from simple ARIMA models.
Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ2 = 1. The process starts with y1 = 0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
b.) Produce a time plot for the series. How does the plot change as you change ϕ1?
sim %>%
autoplot(y) +
labs(title = "AR(1) model , phi_1 = 0.6")
# change phi_1 to 0; we don't need to save the tsibble, just pipe it into autoplot()
for(i in 2:100)
y[i] <- 0*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "AR(1) model , phi_1 = 0")
After changing phi_1 to 0, the AR(1) model reduces to y_t = e_t, which is pure white noise: the series shows no autocorrelation, trend, or seasonality. Each data point is independent and has no effect on future data points. Increasing phi_1 toward 1 has the opposite effect: the series becomes smoother and wanders further from its mean, since each value depends more strongly on the previous one.
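For contrast, a sketch with phi_1 = 0.9 (my own choice of value) shows the smoother, more persistent behaviour as phi_1 approaches 1:
# phi_1 close to 1: the series wanders further from its mean
for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "AR(1) model, phi_1 = 0.9")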
c.) Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1]
sim_2 <- tsibble(idx = seq_len(100), y = y, index = idx)
d.) Produce a time plot for the series. How does the plot change as you change θ1?
# original plot with theta_1 = 0.6
sim_2 %>%
autoplot(y) +
labs(title = "MA(1) model , theta = 0.6")
Now changing theta_1 to 0 and to -1:
for(i in 2:100)
y[i] <- 0*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "MA(1) model, theta_1 = 0")
for(i in 2:100)
y[i] <- -1*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "MA(1) model , theta_1 = -1")
Setting theta_1 equal to 0 produces pure white noise, where each point is independent and randomly placed. Setting theta_1 equal to -1 creates negative dependence between consecutive values: a high value tends to be followed rapidly by a low one, producing a frequent zig-zag pattern in the series.
e.) Generate data from an ARMA(1,1) model with ϕ1 = 0.6, θ1 = 0.6 and σ2 = 1.
set.seed(456)
y <- numeric(100)
e <- rnorm(100) # draw fresh errors so the seed takes effect
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]
ARMA1 = tsibble(idx = seq_len(100), y = y, index = idx)
f.) Generate data from an AR(2) model with ϕ1 = −0.8, ϕ2 = 0.3 and σ2 = 1. (Note that these parameters will give a non-stationary series.)
set.seed(456)
y <- numeric(100)
e <- rnorm(100) # draw fresh errors so the seed takes effect
for(i in 3:100)
y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
ARMA2 <- tsibble(idx = seq_len(100), y = y, index = idx)
g.) Graph the latter two series and compare them.
ARMA1 %>%
gg_tsdisplay(y, plot_type='partial') +
labs(title = ( " The ARMA(1,1) Model Data"))
ARMA2 %>%
gg_tsdisplay(y, plot_type='partial') +
labs(title = (" The ARMA(2) Model Data"))
In the ARMA(1,1) plot we observe a stationary series: there is no clear trend, the series fluctuates around its mean, and the ACF and PACF decay over time. However, the AR(2) model does not appear stationary; the graph oscillates with an amplitude that grows over time.
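As a quick check (a sketch, not part of the original answer), the roots of the AR(2) polynomial 1 + 0.8B − 0.3B² confirm non-stationarity, since one root lies inside the unit circle:
# stationarity requires all roots of 1 - phi1*B - phi2*B^2 to lie outside the unit
# circle; here phi1 = -0.8 and phi2 = 0.3, so the polynomial is 1 + 0.8B - 0.3B^2
abs(polyroot(c(1, 0.8, -0.3)))
# one modulus is about 0.93 < 1, so the AR(2) process is non-stationary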
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
a.) Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
air_fit = aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers))
report(air_fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
air_fit %>%
gg_tsresiduals() +
labs(title = " The Residuals Plot for ARIMA(0,2,1) Model")
air_fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian AirPassengers Using ARIMA (0,2,1) Model ", y = "Millions of Passengers")
b.) Write the model in terms of the backshift operator.
The fitted ARIMA(0,2,1) model in backshift notation is (1−𝐵)²𝑦𝑡 = (1−0.87𝐵)𝜀𝑡, which expands to 𝑦𝑡 = 2𝑦𝑡−1 − 𝑦𝑡−2 + 𝜀𝑡 − 0.87𝜀𝑡−1.
c.) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
air_fit_2 = aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ pdq(0,1,0)))
report(air_fit_2)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.3669
## s.e. 0.3319
##
## sigma^2 estimated as 4.629: log likelihood=-89.08
## AIC=182.17 AICc=182.48 BIC=185.59
air_fit_2 %>%
gg_tsresiduals() +
labs(title = " The Residuals Plot for ARIMA(0,1,0)")
air_fit_2 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Air Passengers with ARIMA(0,1,0)", y = "Millions of Passengers")
d.) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
air_fit_3 = aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ pdq(2,1,2)))
report(air_fit_3)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## 1.4694 -0.5103 -1.5736 0.6780 0.0650
## s.e. 0.3780 0.3558 0.3081 0.2688 0.0294
##
## sigma^2 estimated as 4.748: log likelihood=-87.74
## AIC=187.47 AICc=189.94 BIC=197.75
air_fit_3 %>%
gg_tsresiduals() +
labs(title = " The Residuals Plot for ARIMA(2,1,2)")
air_fit_3 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Air Passengers with ARIMA(2,1,2)", y = "Millions of Passengers")
air_fit_4 = aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
report(air_fit_4)
## Series: Passengers
## Model: NULL model
## NULL model
air_fit_4 %>%
gg_tsresiduals() +
labs(title = "The Residuals Plot for ARIMA(2,1,2) Without the Constant")
air_fit_4 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Air Passengers with ARIMA(0,1,2) Remvoing Constant", y = "Millions of Passengers")
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 rows containing missing values or values outside the scale range
## (`geom_line()`).
Removing the constant makes the AR part non-stationary under the CSS starting values, so ARIMA() returns a NULL model; the forecasts are all missing, which is why the warnings above appear and no forecast line is drawn.
e.) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
air_fit_5 = aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(air_fit_5)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0721
## s.e. 0.0709 0.0260
##
## sigma^2 estimated as 4.086: log likelihood=-85.74
## AIC=177.48 AICc=178.15 BIC=182.55
air_fit_5 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Air Passengers with ARIMA(0,1,2) Model Adding a constant", y = "Millions of Passengers")
Because we are using a second-order difference together with a constant, the model induces a quadratic trend in the forecasts (hence the warning). The forecast trend curves upward, so the constant makes the predicted passenger numbers grow increasingly steeply.
#Question 9.8
For the United States GDP series (from global_economy):
a.) if necessary, find a suitable Box-Cox transformation for the data;
global_economy %>%
filter(Code == "USA") %>%
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = " United States GDP Model Data")
b.) fit a suitable ARIMA model to the transformed data using ARIMA()
lambda_us = global_economy %>%
filter(Code == "USA") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
us_fit = global_economy %>%
filter(Code == "USA") %>%
model(ARIMA(box_cox(GDP, lambda_us)))
report(us_fit)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda_us)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
c.) try some other plausible models by experimenting with the orders chosen;
global_economy %>%
filter(Code == "USA") %>%
features(box_cox(GDP,lambda_us), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 United States 1
global_economy %>%
filter(Code == "USA") %>%
gg_tsdisplay(box_cox(GDP,lambda_us), plot_type='partial') +
labs(title = "United States GDP Model Data Transformed/Differ")
# modeling several
us_fit_2 = global_economy %>%
filter(Code == "USA") %>%
model(
arima111 = ARIMA(box_cox(GDP,lambda_us) ~ pdq(1,1,1)),
arima210 = ARIMA(box_cox(GDP,lambda_us) ~ pdq(2,1,0)),
arima212 = ARIMA(box_cox(GDP,lambda_us) ~ pdq(2,1,2)),
arima120 = ARIMA(box_cox(GDP,lambda_us) ~ pdq(1,2,0)),
arima013 = ARIMA(box_cox(GDP,lambda_us) ~ pdq(0,1,3)))
glance(us_fit_2) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 5 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima120 6780. -326. 656. 656. 660.
## 2 arima111 5580. -325. 659. 659. 667.
## 3 arima210 5580. -325. 659. 659. 667.
## 4 arima013 5708. -325. 661. 662. 671.
## 5 arima212 5734. -325. 662. 664. 674.
d.) choose what you think is the best model and check the residual diagnostics;
Although the ARIMA(1,2,0) model has the lowest AICc here, the values are all close; we examine the residuals of the ARIMA(1,1,1) model below.
us_fit_2 %>%
select(arima111) %>%
gg_tsresiduals() +
ggtitle("Residuals for ARIMA(1,1,1)")
e.) produce forecasts of your fitted model. Do the forecasts look reasonable?
us_fit_2 %>%
forecast(h=10) %>%
filter(.model=='arima111') %>%
autoplot(global_economy)
The forecasts continue the historical upward trend of US GDP with widening prediction intervals, which looks reasonable for this series.
f.) compare the results with what you would obtain using ETS() (with no transformation).
us_ets = global_economy %>%
filter(Code == "USA") %>%
model(ETS(GDP))
report(us_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
us_ets %>%
forecast(h=10) %>%
autoplot(global_economy)
The ETS(M,A,N) forecasts also trend upward, but with noticeably wider prediction intervals than the ARIMA model. Note that the AIC/AICc values are not comparable between ETS and ARIMA models, so the comparison should be made visually or with out-of-sample accuracy.
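To compare the two model classes more directly, a sketch using a simple train/test split, holding out the last 10 years (my own choice of split, not part of the exercise):
# fit both models on data up to 2007 and compare forecast accuracy on 2008-2017
us_train <- global_economy %>%
  filter(Code == "USA", Year <= 2007)
us_train %>%
  model(
    arima = ARIMA(box_cox(GDP, lambda_us)),
    ets = ETS(GDP)
  ) %>%
  forecast(h = 10) %>%
  accuracy(global_economy %>% filter(Code == "USA"))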