library(fpp3)
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble      3.2.1     ✔ tsibble     1.1.4
## ✔ dplyr       1.1.3     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.1
## ✔ lubridate   1.9.3     ✔ fable       0.3.3
## ✔ ggplot2     3.4.4     ✔ fabletools  0.4.0
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()
library(urca)
library(fabletools)

Exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8

Exercise 1

A.)

All of these figures indicate that the data are white noise: in each one the autocorrelations fall within the 95% critical-value boundaries. The autocorrelations span a narrower range in the longer series because, as more observations are added, coincidental lag correlations become less likely.

B.)

The critical values for a time series are defined by +/- 1.96/sqrt(T), where T is the length of the series. Since the time series with more random values is longer, its critical values are smaller. The autocorrelations themselves also differ between the figures because each series is a separate random sample, so its coincidental autocorrelations fall at different lags.
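
As a quick check (assuming the three series have roughly 36, 360, and 1,000 observations, as in the textbook figure), the bounds shrink as T grows:

## 95% critical values for white-noise autocorrelations at each length
1.96 / sqrt(c(36, 360, 1000))
## roughly 0.327, 0.103, 0.062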

Exercise 2

gafa_stock %>% filter(Symbol == "AMZN") %>%
  autoplot(Adj_Close)

gafa_stock %>% 
  filter(Symbol == "AMZN") %>% 
  gg_tsdisplay(Adj_Close, plot_type = "partial")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
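
As a quick confirmation (this check is not part of the original output, and features() on an irregular tsibble deserves the same caution as the ACF above), a unit root test should indicate that at least one difference is needed before modelling:

gafa_stock %>% 
  filter(Symbol == "AMZN") %>% 
  features(Adj_Close, unitroot_ndiffs)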

Exercise 3

A.)

## Plot the Turkish GDP
turkish <- global_economy %>% filter(Country == "Turkey")
turkish %>%
  autoplot(GDP) + labs(title = "Turkish GDP") + xlab("Year")

## Apply a Box-Cox transformation:
lambda <- turkish %>% features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
turkish <- turkish %>% mutate(box_cox = box_cox(GDP, lambda))
## Determine the number of differences needed:
turkish %>% features(box_cox, unitroot_ndiffs)
## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

According to the unit root test, one order of differencing is optimal.

turkish <- turkish %>% mutate(diff_close = difference(box_cox))
turkish %>% autoplot(diff_close)
## Warning: Removed 1 row containing missing values (`geom_line()`).

Check whether the differenced data is stationary:

turkish %>% 
  gg_tsdisplay(diff_close, plot_type = "partial")
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).

Based on the time series and ACF plots, the data now appears stationary.
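
A KPSS test on the differenced series can back this up (a quick sketch, not part of the original output; a large p-value means we cannot reject stationarity):

turkish %>% features(diff_close, unitroot_kpss)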

B.)

tasmania <- aus_accommodation %>% filter(State == "Tasmania")
tasmania %>% autoplot(Takings)

Apply a Box-Cox transformation:

lambda <- tasmania %>% features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

tasmania <- tasmania %>% mutate(box_cox = box_cox(Takings, lambda))
tasmania %>% autoplot(box_cox) + labs(title = "Plot of Tasmania Data with Box-Cox transformation")

tasmania %>% gg_tsdisplay(box_cox, plot_type = "partial")

Determine the number of differences needed:

tasmania %>% features(box_cox, unitroot_ndiffs)
## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1
tasmania %>% features(box_cox, unitroot_nsdiffs)
## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

Both regular and seasonal differencing need to be applied to this data.

tasmania <- tasmania %>% mutate(diff_takings = difference(box_cox))
tasmania %>% autoplot(diff_takings)
## Warning: Removed 1 row containing missing values (`geom_line()`).


## Seasonal difference (lag 4) followed by a first difference
tasmania <- tasmania %>% mutate(seasonal_diff = difference(difference(box_cox, lag = 4)))
tasmania %>% 
  gg_tsdisplay(seasonal_diff, plot_type = "partial")
## Warning: Removed 5 rows containing missing values (`geom_line()`).
## Warning: Removed 5 rows containing missing values (`geom_point()`).

tasmania %>% autoplot(seasonal_diff)
## Warning: Removed 5 rows containing missing values (`geom_line()`).

Although some autocorrelation still appears after taking a seasonal and a first difference, the data is far closer to stationary than before.
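
Re-running the unit root tests on the differenced series gives a sanity check (a sketch, not in the original output; both should return 0 if no further differencing is needed):

tasmania %>% features(seasonal_diff, unitroot_ndiffs)
tasmania %>% features(seasonal_diff, unitroot_nsdiffs)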

C.)

souvenirs %>%
  autoplot(Sales)

Apply a Box-Cox transformation:

lambda <- souvenirs %>% features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs <- souvenirs %>% mutate(box_cox = box_cox(Sales, lambda))
souvenirs %>% autoplot(box_cox) + labs(title = "Plot of Souvenir Data with Box-Cox transformation")

Determine which differences need to be applied:

souvenirs %>% features(box_cox, unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1
souvenirs %>% features(box_cox, unitroot_nsdiffs)
## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

A first difference and a seasonal difference should be applied.

souvenirs <- souvenirs %>% mutate(seasonal_dif = difference(difference(box_cox, lag = 12), lag = 1))
souvenirs %>% autoplot(seasonal_dif)
## Warning: Removed 13 rows containing missing values (`geom_line()`).

souvenirs %>% 
  gg_tsdisplay(seasonal_dif, plot_type = "partial")
## Warning: Removed 13 rows containing missing values (`geom_line()`).
## Warning: Removed 13 rows containing missing values (`geom_point()`).

The data now appears stationary.

Exercise 5

set.seed(123478)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>% autoplot(Turnover)

First we apply a Box-Cox transformation:

lambda <- myseries %>% features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries <- myseries %>% mutate(box_cox = box_cox(Turnover, lambda))
myseries %>% autoplot(box_cox) + labs(title = "Plot of Turnover Data with Box-Cox transformation")

myseries %>% gg_tsdisplay(Turnover, plot_type = "partial")

Now the optimal amount of differencing must be determined:

myseries %>% features(Turnover, unitroot_ndiffs)
## # A tibble: 1 × 3
##   State      Industry                       ndiffs
##   <chr>      <chr>                           <int>
## 1 Queensland Supermarket and grocery stores      1
myseries %>% features(Turnover, unitroot_nsdiffs)
## # A tibble: 1 × 3
##   State      Industry                       nsdiffs
##   <chr>      <chr>                            <int>
## 1 Queensland Supermarket and grocery stores       1
myseries <- myseries %>% mutate(seasonal_dif = difference(box_cox, lag = 12))
myseries %>% autoplot(seasonal_dif)
## Warning: Removed 12 rows containing missing values (`geom_line()`).

myseries %>% 
  gg_tsdisplay(seasonal_dif, plot_type = "partial")
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).

Although the unit root checks say that the data should have both a first-order and a seasonal difference, the data appears more stationary with just the seasonal difference.
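
To support that judgment (a sketch, not in the original output), the KPSS test can be applied directly to the seasonally differenced series; a statistic below the 5% critical value would mean no further differencing is required:

myseries %>% features(seasonal_dif, unitroot_kpss)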

Exercise 6

A.)

## Simulate an AR(1) process with phi = 0.6
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y)

B.)

## AR(1) with phi = 0.9
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>% autoplot(y)

## AR(1) with phi = 0.1
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

sim %>% autoplot(y)

Decreasing phi increases how often the series crosses zero around the x-axis, making it look more like white noise; increasing phi does the opposite, producing a smoother, more persistent series.
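
A small helper function makes it easy to compare several values of phi side by side (simulate_ar1() is my own sketch, not part of the exercise code):

simulate_ar1 <- function(phi, n = 100) {
  ## Generate an AR(1) series: y[i] = phi * y[i-1] + e[i]
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n)
    y[i] <- phi * y[i - 1] + e[i]
  tsibble(idx = seq_len(n), y = y, index = idx)
}

simulate_ar1(0.3) %>% autoplot(y) + labs(title = "AR(1) with phi = 0.3")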

C.)

## Simulate an MA(1) process with theta = 0.6
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y)

D.)

## MA(1) with theta = 0.1
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.1*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y) + labs(title = "Theta at 0.1")


## Reuse the same errors with theta = 0.9 so that only theta changes
for(i in 2:100)
  y[i] <- 0.9*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>% autoplot(y) + labs(title = "theta at 0.9")

As theta decreases, the data bounces around 0 more often; increasing theta does the opposite.
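
This matches the theory: an MA(1) process has lag-1 autocorrelation theta/(1 + theta^2), so a larger theta means stronger correlation between consecutive values and fewer zero crossings. For example:

## Theoretical lag-1 autocorrelation of an MA(1) for several values of theta
theta <- c(0.1, 0.6, 0.9)
theta / (1 + theta^2)
## roughly 0.099, 0.441, 0.497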

E.)

The autoregressive model AR(p) can be written as y(t) = c + phi1*y(t-1) + phi2*y(t-2) + … + e(t)

The moving average model MA(q) can be written as y(t) = c + e(t) + theta1*e(t-1) + theta2*e(t-2) + …

These can be combined into an ARMA(1,1) model with phi = 0.6 and theta = 0.6:

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- e[i] + 0.6*e[i-1] + 0.6*y[i-1]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim1 %>% autoplot(y)

F.)

## AR(2) with phi1 = -0.8 and phi2 = 0.3; reset y so no stale values carry over
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
  y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim2 %>% autoplot(y)

G.)

sim1 %>% 
  gg_tsdisplay(y, plot_type = "partial") + labs(title ="ARMA(1,1) model with phi at 0.6 and theta at 0.6")


sim2 %>% 
  gg_tsdisplay(y, plot_type = "partial") + labs(title = "AR(2) model with phi1 at -0.8 and phi2 at 0.3")

The ARMA(1,1) series is far more stable and looks stationary, while the AR(2) model is clearly non-stationary: it oscillates with ever-increasing amplitude. The ARMA(1,1) autocorrelations also decay much faster than those of the AR(2) model.
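
The non-stationarity of the AR(2) model can also be confirmed from its characteristic polynomial 1 + 0.8B - 0.3B^2, since stationarity requires all roots to lie outside the unit circle (a quick check, not part of the original code):

## Moduli of the AR(2) characteristic roots; any value <= 1 implies non-stationarity
Mod(polyroot(c(1, 0.8, -0.3)))
## roughly 0.93 and 3.59 -- one root lies inside the unit circle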

Exercise 7

A.)

passengers <- aus_airpassengers %>% filter(Year < 2012)

fit <- passengers %>% model(ARIMA(Passengers))
report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

Based on the residuals plot (reproduced below), the residuals look like white noise.
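
The residual diagnostics behind that claim can be reproduced with gg_tsresiduals() (the call itself does not appear in the original code, so this is assumed):

fit %>% gg_tsresiduals() + labs(title = "ARIMA(0,2,1) Residuals")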

B.)

The model can be written as follows:

(1-B)^2 y(t) = (1-0.8756B)ε(t)

C.)

fit2 <- passengers %>% model(arima = ARIMA(Passengers ~ pdq(0,1,0)))

fit2 %>% forecast(h = 10) %>%
  autoplot(passengers) + labs(title = "Australian Passenger Numbers with ARIMA(0,1,0)")


fit2 %>% gg_tsresiduals() + labs(title = "ARIMA(0,1,0) Residuals")

The forecast plot from the ARIMA(0,1,0) model looks nearly identical, although the prediction intervals look slightly narrower and the residual autocorrelation peaks appear at different lags.

fit3 <- passengers %>% model(arima = ARIMA(Passengers ~ pdq(2,1,2)))
fit3 %>% forecast(h = 10) %>%
  autoplot(passengers) + labs(title = "Australian Passenger Numbers with ARIMA(2,1,2)")

fit3 %>% gg_tsresiduals() + labs(title = "ARIMA(2,1,2) Residuals")

fit4 <- passengers %>% model(arima = ARIMA(Passengers ~ pdq(0,2,1)))
fit4 %>% forecast(h = 10) %>%
  autoplot(passengers) + labs(title = "Australian Passenger Numbers with ARIMA(0,2,1)")

fit4 %>% gg_tsresiduals() + labs(title = "ARIMA(0,2,1) Residuals")

Exercise 8

usa <- global_economy %>% filter(Code == "USA")
usa %>% autoplot(GDP)

## Apply a Box-Cox transformation:

lambda <- usa %>% features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
usa <- usa %>% mutate(box_cox = box_cox(GDP, lambda))

usa %>% autoplot(box_cox) + labs(title = "USA data with Box-Cox applied")

fit <- usa %>% model(ARIMA(box_cox))
report(fit)
## Series: box_cox 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

The automatically selected model for the data is an ARIMA(1,1,0) with drift.

fit %>% forecast(h = 10) %>% 
  autoplot(usa) + labs(title = "ARIMA(1,1,0) Model")

fit %>% gg_tsresiduals() + labs(title = "ARIMA(1,1,0) Residuals")

Now we can try several models to see which works best:

fit <- usa %>% model(arima_110 = ARIMA(box_cox ~ pdq(1,1,0)),
             arima_111 = ARIMA(box_cox ~ pdq(1,1,1)),
             arima_211 = ARIMA(box_cox ~ pdq(2,1,1)),
             arima_212 = ARIMA(box_cox ~ pdq(2,1,2)),
             arima_210 = ARIMA(box_cox ~ pdq(2,1,0)),
             arima_112 = ARIMA(box_cox ~ pdq(1,1,2)),
             arima_120 = ARIMA(box_cox ~ pdq(1,2,0)))

glance(fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 7 × 6
##   .model    sigma2 log_lik   AIC  AICc   BIC
##   <chr>      <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima_120  6780.   -326.  656.  656.  660.
## 2 arima_110  5479.   -325.  657.  657.  663.
## 3 arima_111  5580.   -325.  659.  659.  667.
## 4 arima_210  5580.   -325.  659.  659.  667.
## 5 arima_112  5630.   -325.  660.  661.  670.
## 6 arima_211  5647.   -325.  660.  661.  671.
## 7 arima_212  5734.   -325.  662.  664.  674.

D.)

The ARIMA(1,2,0) has a slightly lower AICc than the ARIMA(1,1,0), so it should be chosen.

fit %>% select(arima_120) %>% gg_tsresiduals() + labs(title = "ARIMA(1,2,0) Residuals")

The residuals appear random, and their distribution is closer to normal than that of the ARIMA(1,1,0) model, so this should be the best choice.
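
A Ljung-Box test gives a more formal check that the residuals are white noise (a sketch; lag = 10 and dof = 1, for the single AR coefficient, are my assumptions):

fit %>% select(arima_120) %>% augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 1)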

E.)

fit %>% forecast(h = 10) %>% filter(.model == 'arima_120') %>%
  autoplot(usa) + labs(title ="USA Data with ARIMA(1,2,0) model applied")

fit2 <- usa %>% 
  model(ETS(GDP))
report(fit2)
## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

The AIC is far greater than that of the ARIMA model that was selected, so this would be a poor choice. Note, however, that AIC values are not directly comparable between ETS and ARIMA models (the ARIMA model here is fit to differenced, Box-Cox transformed data), so a test-set comparison is a sounder basis for the decision.
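
Since the AIC comparison across model classes is unreliable, a holdout comparison is a better guide (a sketch; the split at 2010 and the object names are my own choices):

usa_train <- usa %>% filter(Year <= 2010)
usa_fits <- usa_train %>% 
  model(arima = ARIMA(box_cox(GDP, lambda) ~ pdq(1,2,0)),
        ets = ETS(GDP))
usa_fits %>% forecast(h = 7) %>% accuracy(usa)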

fit2  %>% forecast(h = 10) %>%
  autoplot(usa) + labs(title ="USA Data with ETS model applied")