BrianSingh_Data624

9.1 Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Per Hyndman, for white noise series, we expect each autocorrelation to be close to zero and mostly in bounds of the dotted lines. This is the case with these three plots.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

This is due to the varying amounts of observations, which defines the length of the time series. In a series with a larger T, hence larger number of observations, the bounds will be smaller and coefficients of the autocorrelation will also decrease.

9.2 A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

library(fpp3)

## Warning: package 'fpp3' was built under R version 4.2.3

## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──

## ✔ tibble      3.2.1     ✔ tsibble     1.1.4
## ✔ dplyr       1.1.3     ✔ tsibbledata 0.4.1
## ✔ tidyr       1.3.0     ✔ feasts      0.3.1
## ✔ lubridate   1.9.3     ✔ fable       0.3.3
## ✔ ggplot2     3.4.4     ✔ fabletools  0.4.0

## Warning: package 'tibble' was built under R version 4.2.3

## Warning: package 'dplyr' was built under R version 4.2.3

## Warning: package 'tidyr' was built under R version 4.2.3

## Warning: package 'lubridate' was built under R version 4.2.3

## Warning: package 'ggplot2' was built under R version 4.2.3

## Warning: package 'tsibble' was built under R version 4.2.3

## Warning: package 'tsibbledata' was built under R version 4.2.3

## Warning: package 'feasts' was built under R version 4.2.3

## Warning: package 'fabletools' was built under R version 4.2.3

## Warning: package 'fable' was built under R version 4.2.3

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(dplyr)
library(ggplot2)
library(urca)

## Warning: package 'urca' was built under R version 4.2.3

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type='partial')

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

There is an upwards trend but no seasonality, which is required for a stationary series. The ACF plot is not normally distributed and the coefficients are all outside of the upper bounds and has a slight decrease towards zero over time indicating non-stationary data. Per Hyndman, in the PACF, a significant spike at lag p, as shown above, is also indicative.

9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. Turkish GDP from global_economy

tk_gdp <- global_economy %>% 
  filter(Country=='Turkey')

lambda_tk_gdp <- tk_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

tk_gdp  %>%
  mutate(GDP = box_cox(GDP, lambda_tk_gdp)) %>%
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

The lambda returned for an appropriate box-cox transformation on the turkey gdp data is 0.1572. In addition, the unit root test on the transformed data returns 1 difference required to make the series stationary.

Accommodation takings in the state of Tasmania from aus_accommodation.

tasmania_accommodation <- aus_accommodation %>% 
  filter(State == 'Tasmania')

lambda_tasmania <- tasmania_accommodation %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

tasmania_accommodation %>%
  mutate(Takings = box_cox(Takings, lambda_tasmania)) %>%
  features(Takings, unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

The lambda returned for an appropriate box-cox transformation on the Tasmania accommodation takings data is 0.002. In addition, the unit root test on the transformed data returns 1 difference required to make the series stationary.

c. Monthly sales from souvenirs.

lambda_souvenirs <- souvenirs %>% 
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs %>%
  mutate(Sales = box_cox(Sales, lambda_souvenirs)) %>%
  features(Sales, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

The lambda returned for an appropriate box-cox transformation on the souvenirs data is 0.002. In addition, the unit root test on the transformed data returns 1 difference required to make the series stationary.

9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(12345678)
myseries <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

myseries %>%
  gg_tsdisplay(Turnover, plot_type='partial')

We can apply transformations and find ndiffs required to make it stationary.

lambda_myseries <- myseries %>% 
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  mutate(Turnover = box_cox(Turnover, lambda_myseries)) %>%
  features(Turnover, unitroot_ndiffs)

## # A tibble: 1 × 3
##   State              Industry                                            ndiffs
##   <chr>              <chr>                                                <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing      1

The lambda returned for an appropriate box-cox transformation on the retao;data is 0.27. In addition, the unit root test on the transformed data returns 1 difference required to make the series stationary.

9.6 Simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with phi = 0.6 and phi^2 = 1. The process starts with y1 = 0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

b. Produce a time plot for the series. How does the plot change as you change ϕ1?

#at phi = 0.6
sim %>%
  autoplot(y)

# at phi = 0.1
for(i in 2:100)
  y[i] <- 0.1*y[i-1] + e[i]
sim_0.1 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_0.1 %>% autoplot(y)

# at phi = 0.5
for(i in 2:100)
  y[i] <- 0.5*y[i-1] + e[i]
sim_0.5 <- tsibble(idx = seq_len(100), y = y, index = idx)

sim_0.5 %>% autoplot(y)

# at phi = 1
for(i in 2:100)
  y[i] <- y[i-1] + e[i]
sim_1 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_1 %>% autoplot(y)

There is more seasonality in phi = 0.1 than in phi = 0.5 and more in phi = 0.5 than phi = 1, indicating more stationarity as phi is lower.

c. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i] 

ma.1 <- tsibble(idx = seq_len(100), y = y, index = idx)

d. Produce a time plot for the series. How does the plot change as you change θ1?

for(i in 2:100)
  y[i] <- 0.1*e[i-1] + e[i] 
ma1.b <- tsibble(idx = seq_len(100), y = y, index = idx)
ma1.b %>%
  autoplot(y)

for(i in 2:100)
  y[i] <- .5*e[i-1] + e[i] 
ma1.c <- tsibble(idx = seq_len(100), y = y, index = idx)
ma1.c %>%
  autoplot(y)

for(i in 2:100)
  y[i] <- e[i-1] + e[i] 
ma1.d<- tsibble(idx = seq_len(100), y = y, index = idx)
ma1.d %>%
  autoplot(y)

There is not much change as θ increases or decreases, which means the data is probably stationary.

e. Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

e <- rnorm(100)
y <- numeric(100)
for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]
arma <- tsibble(idx = seq_len(100), y = y, index = idx)
arma %>% autoplot(y)

### f. Generate data from an AR(2) model with ϕ1=-0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

ar <- tsibble(idx = seq_len(100), y = y, index = idx)
ar %>% autoplot(y)

### g. Graph the latter two series and compare them.

gg_tsdisplay(arma,plot_type = "partial")

## Plot variable not specified, automatically selected `y = y`

gg_tsdisplay(ar,plot_type = "partial")

## Plot variable not specified, automatically selected `y = y`

The ARMA model appears to be stationary while AR appears to be non-stationary since we can observe the variance is not constant but increasing when as the series goes on.

9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit <- aus_airpassengers %>%
  model(ARIMA(Passengers))

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

ARIMA(0,2,1)

fit %>% 
  gg_tsresiduals()

The residuals are white noise

fit %>% forecast(h=10) %>%
  autoplot(aus_airpassengers)

b. Write the model in terms of the backshift operator.

yt=−0.8963 x E(t−1)+E(t)

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit_010 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1+pdq(0,1,0))) %>%
  forecast(h=10) %>%
  autoplot(aus_airpassengers)

fit_010

d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit_212 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1+pdq(2,1,2))) %>%
  forecast(h=10) %>%
  autoplot(aus_airpassengers)

fit_212

fit_212_noconstant <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

Receiving an error

e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit_021 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

We get a warning when using a constant on the ARIMA (0,2,1) model.

9.8 For the United States GDP series (from global_economy)

a. if necessary, find a suitable Box-Cox transformation for the data;

us_gdp <- global_economy %>% 
  filter(Country == "United States")
us_gdp%>% 
  autoplot(GDP)

lambda_usgdp <- us_gdp %>%
  features(GDP,features=guerrero) %>%
  pull(lambda_guerrero)
lambda_usgdp

## [1] 0.2819443

us_gdp %>% 
  autoplot(box_cox(GDP, lambda_usgdp))

b. fit a suitable ARIMA model to the transformed data using ARIMA();

fit_usgdp <- us_gdp %>%
  model(ARIMA(box_cox(GDP, lambda_usgdp)))

report(fit_usgdp)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda_usgdp) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

ARIMA(1,1,0) w/drift seems to be the best fit

c. try some other plausible models by experimenting with the orders chosen;

fit_a <- us_gdp %>%
  model(ARIMA(GDP ~ pdq(1,2,2)))

report(fit_a)

## Series: GDP 
## Model: ARIMA(1,2,2) 
## 
## Coefficients:
##          ar1      ma1      ma2
##       0.2053  -0.5912  -0.1928
## s.e.  0.3008   0.2886   0.2102
## 
## sigma^2 estimated as 2.646e+22:  log likelihood=-1523.86
## AIC=3055.72   AICc=3056.51   BIC=3063.82

fit_b <- us_gdp %>%
  model(ARIMA(GDP ~ pdq(2,2,2)))

report(fit_b)

## Series: GDP 
## Model: ARIMA(2,2,2) 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2
##       1.3764  -0.4780  -1.9659  1.0000
## s.e.  0.1216   0.1354   0.0723  0.0719
## 
## sigma^2 estimated as 2.283e+22:  log likelihood=-1521.14
## AIC=3052.27   AICc=3053.47   BIC=3062.4

fit_c <- us_gdp %>%
  model(ARIMA(GDP ~ pdq(1,1,0)))

report(fit_c)

## Series: GDP 
## Model: ARIMA(1,1,0) 
## 
## Coefficients:
##          ar1
##       0.9160
## s.e.  0.0522
## 
## sigma^2 estimated as 3.044e+22:  log likelihood=-1556.73
## AIC=3117.47   AICc=3117.69   BIC=3121.55

Trying other values, the lowest AIC is still from part b of the question with the ARIMA(1,1,0) w/drift.

d. choose what you think is the best model and check the residual diagnostics;

fit_110 <- us_gdp %>%
  model(ARIMA(box_cox(GDP, lambda_usgdp) ~ 1 + pdq(1, 1, 0))) 

fit_110 %>% 
  report()

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda_usgdp) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

fit_110 %>% 
  gg_tsresiduals()

The residuals are right skewed however there is no autocorrelation, denoting white noise.

e. produce forecasts of your fitted model. Do the forecasts look reasonable?

fit_110 %>% forecast(h=10) %>%
  autoplot(us_gdp)

Appears reasonable given the continuous upwards trend

f. compare the results with what you would obtain using ETS() (with no transformation).

fit_ets <- us_gdp %>% 
  model(ETS(GDP))

report(fit_ets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

fit_ets %>% forecast(h = 10) %>%
  autoplot(us_gdp)

Looking at the AIC, the ARIMA(1,1,0) model is lower, suggesting a better performing model.

BrianSingh_Data624_HW6

2024-03-20