HW2 CH9

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise? The three plots below indicate that the data are white noise. The ACF bars do not exceed the bounds (dashed lines) which represent that the data is white noise. The height of the bars depends on the sample size. The smaller the sample, the taller the bars and vice versa.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise? The bounds (dashed lines) are calculated from +/- 1.96 / sqrt(N). As N gets larger, the critical value (bound) decreases so samples that are smaller may show higher correlation than larger samples. To balance out that issue, the absolute value of the bounds are larger for the smaller samples.

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

#View(gafa_stock)
amzn <- gafa_stock %>%
  filter(Symbol=="AMZN")

amzn %>%
  autoplot(Close) +
  labs(title="Amazon Closing Stock Price") + 
  theme(plot.title = element_text(hjust = 0.5))

ggAcf(amzn$Close, lag.max = 12, 
    type = c("correlation", 
             "covariance", 
             "partial"), 
    plot = TRUE, 
    na.action = na.contiguous, 
    demean = TRUE
    )

ggPacf(amzn$Close, 
     lag.max = 12, 
     plot = TRUE, 
     na.action = na.contiguous, 
     demean = TRUE
     )

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

gdp_turkey <- global_economy %>%
  filter(Country=="Turkey")

gdp_turkey %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Turkish GDP")

gdp_lambda <- gdp_turkey %>% features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

gdp_turkey %>% features(box_cox(GDP, gdp_lambda), unitroot_ndiffs)

## # A tibble: 1 x 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

gdp_turkey %>% gg_tsdisplay(difference(box_cox(GDP, gdp_lambda))
                            , plot_type = 'partial') +
  labs(title = TeX(paste0("Turkish GDP using $\\lambda$ = ",
         round(gdp_lambda,4))))

## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_point).

Accommodation takings in the state of Tasmania from aus_accommodation.

aus_accom <- aus_accommodation %>% filter(State=="Tasmania")
#View(aus_accom)
aus_accom %>%
  gg_tsdisplay(Takings, plot_type='partial') +
  labs(title = "Tasmanian Accomodations")

accom_lambda <- aus_accom %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

aus_accom %>%
  features(box_cox(Takings, accom_lambda), unitroot_nsdiffs)

## # A tibble: 1 x 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

aus_accom %>% gg_tsdisplay(difference(box_cox(Takings, accom_lambda)
                                      , 4)
                           , plot_type = 'partial') +
  labs(title = TeX(paste0("Tasmania Accomodation using $\\lambda$ = ",
         round(accom_lambda,4))))

## Warning: Removed 4 row(s) containing missing values (geom_path).

## Warning: Removed 4 rows containing missing values (geom_point).

Monthly sales from souvenirs.

souvenirs %>%
  gg_tsdisplay(Sales
               , plot_type = 'partial'
               , lag=36) +
    labs(title = "Monthly Sales from Souvenirs")

souv_lambda <- souvenirs %>% features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs %>% features(box_cox(Sales,souv_lambda)
                       , unitroot_nsdiffs)

## # A tibble: 1 x 1
##   nsdiffs
##     <int>
## 1       1

souvenirs %>% gg_tsdisplay(difference(box_cox(Sales, souv_lambda)
                                      , 12)
                           , plot_type='partial'
                           , lag = 36) +
  labs(title = TeX(paste0("Monthly Souvenir Sales using $\\lambda$ = ",
         round(souv_lambda,4))))

## Warning: Removed 12 row(s) containing missing values (geom_path).

## Warning: Removed 12 rows containing missing values (geom_point).

For the souvenirs data, write down the differences you chose above using backshift operator notation.

With a backshift notation of: Byt=yt−1 and using the unitroot_nsdiffs test, it looks like we need a difference of 1, so the notation would be ((1 - B)^1)*yt

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

#View(aus_retail)
set.seed(1234)
aus_turnover <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

aus_turnover %>%
  gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
  labs(title = "Non-transformed Retail Turnover")

turnover_lambda <- aus_turnover %>% features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

aus_turnover %>% features(box_cox(Turnover, turnover_lambda)
                          , unitroot_nsdiffs)

## # A tibble: 1 x 3
##   State    Industry                                      nsdiffs
##   <chr>    <chr>                                           <int>
## 1 Tasmania Cafes, restaurants and takeaway food services       1

aus_turnover %>% 
  gg_tsdisplay(difference(box_cox(Turnover,turnover_lambda)
                          , 12)
               , plot_type='partial'
               , lag = 36) +
  labs(title = TeX(paste0("Differenced Tasmanian Accomodation Takings with $\\lambda$ = ",
         round(turnover_lambda,4))))

## Warning: Removed 12 row(s) containing missing values (geom_path).

## Warning: Removed 12 rows containing missing values (geom_point).

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1 . The process starts with y1=0 .

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change ϕ1?

sim %>% autoplot(.vars=y)

#0
for(i in 2:100)
  y[i] <- 0*y[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim2 %>% autoplot(.vars=y)

#2
for(i in 2:100)
  y[i] <- 2*y[i-1] + e[i]
sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim3 %>% autoplot(.vars=y)

#-0.45
for(i in 2:100)
  y[i] <- -.45*y[i-1] + e[i]
sim4 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim4 %>% autoplot(.vars=y)

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]

sim6c <- tsibble(idx = seq_len(100), y = y, index = idx)
sim6c %>% autoplot(.vars=y)

Produce a time plot for the series. How does the plot change as you change θ1 ?

sim6c %>% autoplot(.vars=y)

# 0
for(i in 2:100)
  y[i] <- e[i] + 0 * e[i-1]

sim6 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim6 %>% autoplot(.vars=y)

# 2
for(i in 2:100)
  y[i] <- e[i] + 2 * e[i-1]

sim7 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim7 %>% autoplot(.vars=y)

# -0.45
for(i in 2:100)
  y[i] <- e[i] + -0.45 * e[i-1]

sim8 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim8 %>% autoplot(.vars=y)

Generate data from an ARMA(1,1) model with ϕ1=0.6 , θ1=0.6 and σ2=1.

set.seed(1234)
y <- numeric(100)

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]

arma6e <- tsibble(idx = seq_len(100), y = y, index = idx)
arma6e %>% autoplot(.vars=y)

Generate data from an AR(2) model with ϕ1=−0.8 , ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

set.seed(1234)
y <- numeric(100)

for(i in 2:100)
  y[i] <- e[i] + -0.8 * e[i-1] + 0.3 * y[i-1]

ar6f <- tsibble(idx = seq_len(100), y = y, index = idx)
ar6f %>% autoplot(.vars=y)

Graph the latter two series and compare them.

arma6e %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = TeX("ARMA(1,1) model with $\\phi_1 = 0.6$, $\\theta_1 = 0.6$, and $\\sigma^2 = 1$"))

ar6f %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = TeX("AR(2) model with $\\phi_1 = -0.8$, $\\phi_2 = 0.3$, and $\\sigma^2 = 1$"))

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

aus_pass <- aus_airpassengers %>% filter(Year < 2012)
aus_pass_arima <- aus_pass %>% model(ARIMA(Passengers))
report(aus_pass_arima)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

aus_pass_arima %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) + 
  labs(title="Australian Air Passengers Forecast (using ARIMA(0,2,1))") + 
  theme(plot.title = element_text(hjust = 0.5))

Write the model in terms of the backshift operator.

(1−B)2yt=c+(1+θ1B)et

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

aus_pass2 <-aus_airpassengers %>% filter(Year < 2012)
aus_pass_arima2 <-  aus_pass2 %>% model(ARIMA(Passengers ~ pdq(0,1,0)))

aus_pass_arima2 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Air Passengers Forecast (using ARIMA(0,1,0))") +
  theme(plot.title = element_text(hjust = 0.5))

aus_pass_arima2 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,1,0)")

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

aus_pass3 <- aus_airpassengers %>% filter(Year < 2012)
aus_pass_arima3 <- aus_pass3 %>% model(ARIMA(Passengers~pdq(2,1,2))
                                       )
aus_pass_arima3 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Air Passengers Forecast using ARIMA(2,1,2)")

aus_pass_arima3 %>% gg_tsresiduals() + labs(title = "Residuals for Australian Air Passengers using ARIMA(2,1,2)")

aus_pass4 <- aus_airpassengers %>% filter(Year < 2012)
aus_pass_arima4 <- aus_pass4 %>% model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The upward trend continues.

aus_pass5 <- aus_airpassengers %>%
  filter(Year < 2012)
aus_pass_arima5 <- aus_pass5 %>% model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

aus_pass_arima5 %>%
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Air Passengers Forecast using ARIMA(2,1,2) with constant")

For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data;

us_gdp <- global_economy %>% 
  filter(Code=="USA") 
us_gdp %>% gg_tsdisplay(GDP, plot_type = 'partial') + labs(title = "United States GDP")

us_gdp_lambda <- us_gdp %>% features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit a suitable ARIMA model to the transformed data using ARIMA();

us_gdp_arima <- us_gdp %>% model(ARIMA(box_cox(GDP, us_gdp_lambda)))
report(us_gdp_arima)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, us_gdp_lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

try some other plausible models by experimenting with the orders chosen;

us_gdp %>% gg_tsdisplay(box_cox(GDP,us_gdp_lambda), plot_type='partial') +
  labs(title = "Box Cox Transformed United States GDP")

us_gdp %>% features(box_cox(GDP,us_gdp_lambda), unitroot_ndiffs)

## # A tibble: 1 x 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1

gdp_arimas <- us_gdp %>%
  model(arima110 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(1,1,0)),
        arima120 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(1,2,0)),
        arima210 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(2,1,0)),
        arima212 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(2,1,2)),
        arima111 = ARIMA(box_cox(GDP,us_gdp_lambda) ~ pdq(1,1,1))
        )

glance(gdp_arimas) %>% arrange(AICc) %>% select(.model:BIC)

## # A tibble: 5 x 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima120  6780.   -326.  656.  656.  660.
## 2 arima110  5479.   -325.  657.  657.  663.
## 3 arima111  5580.   -325.  659.  659.  667.
## 4 arima210  5580.   -325.  659.  659.  667.
## 5 arima212  5734.   -325.  662.  664.  674.

choose what you think is the best model and check the residual diagnostics;

gdp_arimas %>% select(arima120) %>% gg_tsresiduals()

e. produce forecasts of your fitted model. Do the forecasts look reasonable?

gdp_arimas %>% forecast(h=10) %>%
  filter(.model=='arima120') %>%
  autoplot(global_economy) + labs(title = "Forecasted United States GDP using ARIMA(1,2,0)") +
  theme(plot.title = element_text(hjust = 0.5))

compare the results with what you would obtain using ETS() (with no transformation).

ets_compare <- us_gdp %>% model(ETS(GDP))
report(ets_compare)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

ets_compare %>% forecast(h=10) %>%
  autoplot(global_economy) + labs(title = "Forecasted United States GDP using ETS") +
  theme(plot.title = element_text(hjust = 0.5))

Consider aus_arrivals, the quarterly number of international visitors to Australia from several countries for the period 1981 Q1 – 2012 Q3.

Select one country and describe the time plot. There was an increasing trend of arrivals throughout the 90s but started to decline in the early 2000s.

#View(aus_arrivals)
aus_jpn <- aus_arrivals %>% filter(Origin=='Japan')
aus_jpn %>% autoplot(.vars=Arrivals) + labs(title = "Australian Arrivals from Japan") +
  theme(plot.title = element_text(hjust = 0.5))

c. What can you learn from the ACF graph of the differenced data?

It looks like the data are not white noise since the bars exceed the bounds.

ggAcf(aus_jpn$Arrivals, lag.max = 12, 
    type = c("correlation", 
             "covariance", 
             "partial"), 
    plot = TRUE, 
    na.action = na.contiguous, 
    demean = TRUE
    )

What can you learn from the PACF graph of the differenced data? Like part c, it looks like the data are white noise but the decrease as the lag increases.

ggPacf(aus_jpn$Arrivals, 
     lag.max = 12, 
     plot = TRUE, 
     na.action = na.contiguous, 
     demean = TRUE
     )

Choose a series from us_employment, the total employment in different industries in the United States.

Produce an STL decomposition of the data and describe the trend and seasonality. There does not seem to be any predictable trend or seasonality.

#View(us_employment)
us_employment %>% filter(Title == "Mining and Logging") %>%
  model(STL(Employed)) %>%
  components() %>% autoplot()

Do the data need transforming? If so, find a suitable transformation. Yes, I took the log of the Employed values.

us_employment %>% filter(Title == "Mining and Logging") %>%
  model(STL(log(Employed))) %>%
  components() %>% autoplot()

Are the data stationary? If not, find an appropriate differencing which yields stationary data.

us_employment %>% filter(Title == "Mining and Logging")  %>%
  autoplot(Employed)

us_employment %>% filter(Title == "Mining and Logging") %>%
  features(Employed, unitroot_kpss)

## # A tibble: 1 x 3
##   Series_ID     kpss_stat kpss_pvalue
##   <chr>             <dbl>       <dbl>
## 1 CEU1000000001      2.83        0.01

us_employment %>% filter(Title == "Mining and Logging") %>%
  features(Employed, unitroot_ndiffs)

## # A tibble: 1 x 2
##   Series_ID     ndiffs
##   <chr>          <int>
## 1 CEU1000000001      1

us_employment %>% filter(Title == "Mining and Logging") %>%
  mutate(log_employed = difference(log(Employed), 12)) %>%
  features(log_employed, unitroot_ndiffs)

## # A tibble: 1 x 2
##   Series_ID     ndiffs
##   <chr>          <int>
## 1 CEU1000000001      0

Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AICc values? The stepwise and auto produce the same results (same AIC values) so either model could be chosen.

us_employment %>%
  filter(Title == "Mining and Logging")  %>%
  gg_tsdisplay(difference(Employed), plot_type='partial')

## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_point).

us_fit <- us_employment %>%
  filter(Title == "Mining and Logging") %>%
  model(arima310 = ARIMA(Employed ~ pdq(3,1,0)),
        arima210 = ARIMA(Employed ~ pdq(2,1,0)),
        stepwise = ARIMA(Employed),
        auto = ARIMA(Employed, stepwise=FALSE))
report (us_fit)

## Warning in report.mdl_df(us_fit): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 4 x 9
##   Series_ID     .model   sigma2 log_lik   AIC  AICc   BIC ar_roots   ma_roots 
##   <chr>         <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl> <list>     <list>   
## 1 CEU1000000001 arima310   910.  -4670. 9351. 9351. 9381. <cpl [27]> <cpl [0]>
## 2 CEU1000000001 arima210   977.  -4704. 9416. 9416. 9436. <cpl [14]> <cpl [0]>
## 3 CEU1000000001 stepwise   900.  -4664. 9342. 9342. 9376. <cpl [28]> <cpl [0]>
## 4 CEU1000000001 auto       900.  -4664. 9342. 9342. 9376. <cpl [28]> <cpl [0]>

Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better. Generally, the data are white noise as the ACF plot shows the bars settling within the bounds. The residuals are distributed normally, with some outliers.

fit2 <- us_employment %>%
  filter(Title == "Mining and Logging") %>%
  model(ARIMA(Employed)) %>%
  report(fit2)

## Series: Employed 
## Model: ARIMA(4,1,0)(2,0,0)[12] 
## 
## Coefficients:
##           ar1      ar2      ar3     ar4    sar1    sar2
##       -0.3209  -0.2162  -0.0969  0.1100  0.1188  0.2564
## s.e.   0.0321   0.0336   0.0343  0.0328  0.0327  0.0327
## 
## sigma^2 estimated as 900.2:  log likelihood=-4664.08
## AIC=9342.15   AICc=9342.27   BIC=9376.28

fit2 %>% gg_tsresiduals()

augment(fit2) %>% features(.innov, ljung_box, lag=10, dof=2)

## # A tibble: 1 x 4
##   Series_ID     .model          lb_stat lb_pvalue
##   <chr>         <chr>             <dbl>     <dbl>
## 1 CEU1000000001 ARIMA(Employed)    4.60     0.800

Forecast the next 3 years of data. Get the latest figures from https://fred.stlouisfed.org/categories/11 to check the accuracy of your forecasts.

ml_arima <- us_employment %>%
  filter(Title == "Mining and Logging") %>%
  model(ARIMA(Employed))

ml_arima %>%
  forecast(h = "3 years") %>%
  autoplot(us_employment, level = NULL) + labs(title = "US Mining & Logging Employment Forecast") +
  theme(plot.title = element_text(hjust = 0.5))

Eventually, the prediction intervals are so wide that the forecasts are not particularly useful. How many years of forecasts do you think are sufficiently accurate to be usable? I would say that 1 year would be an accurate forecast.

pred_int <- ml_arima %>%
  forecast(h = "3 years") %>%
  mutate(PI = hilo(Employed, level = 95))
pred_int

## # A fable: 36 x 6 [1M]
## # Key:     Series_ID, .model [1]
##    Series_ID     .model          Month     Employed .mean                     PI
##    <chr>         <chr>           <mth>       <dist> <dbl>                 <hilo>
##  1 CEU1000000001 ARIMA(Emplo~ 2019 Oct  N(759, 900)  759. [699.8304, 817.4398]95
##  2 CEU1000000001 ARIMA(Emplo~ 2019 Nov N(759, 1315)  759. [687.4222, 829.5843]95
##  3 CEU1000000001 ARIMA(Emplo~ 2019 Dec N(757, 1603)  757. [678.3775, 835.3445]95
##  4 CEU1000000001 ARIMA(Emplo~ 2020 Jan N(753, 1901)  753. [667.7366, 838.6365]95
##  5 CEU1000000001 ARIMA(Emplo~ 2020 Feb N(756, 2390)  756. [660.2835, 851.9288]95
##  6 CEU1000000001 ARIMA(Emplo~ 2020 Mar N(759, 2781)  759. [655.3179, 862.0389]95
##  7 CEU1000000001 ARIMA(Emplo~ 2020 Apr N(761, 3145)  761. [651.2540, 871.0766]95
##  8 CEU1000000001 ARIMA(Emplo~ 2020 May N(765, 3520)  765. [648.4519, 881.0059]95
##  9 CEU1000000001 ARIMA(Emplo~ 2020 Jun N(770, 3927)  770. [646.8510, 892.4950]95
## 10 CEU1000000001 ARIMA(Emplo~ 2020 Jul N(771, 4314)  771. [642.2205, 899.6761]95
## # ... with 26 more rows

Choose one of the following seasonal time series: the Australian production of electricity, cement, or gas (from aus_production).

Do the data need transforming? If so, find a suitable transformation. The data seems to be decent and no need for a transformation. I will do a log transformation to test.

#View(aus_production)
aus_production %>% autoplot(Electricity) + labs(title = "Australian Electricity Production") +
  theme(plot.title = element_text(hjust = 0.5))

aus_elec <- aus_production %>% select("Quarter", "Electricity")
aus_elec$Electricity <- log(aus_elec$Electricity)
aus_elec %>% autoplot(.vars=Electricity) +
  labs(title = "Australian Electricity Production with Log Transformation") +
  theme(plot.title = element_text(hjust = 0.5))

Are the data stationary? If not, find an appropriate differencing which yields stationary data.

aus_elec %>%
  features(Electricity, unitroot_kpss)

## # A tibble: 1 x 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1      4.26        0.01

aus_elec %>% 
  features(Electricity, unitroot_ndiffs)

## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      2

aus_elec %>%
  mutate(log_elec = difference(log(Electricity), 12)) %>%
  features(log_elec, unitroot_ndiffs)

## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      1

Identify a couple of ARIMA models that might be useful in describing the time series. Which of your models is the best according to their AIC values? The ARIMA(3,1,0) model performs best as it has the lowest AIC value

aus_elec %>%
  gg_tsdisplay(difference(Electricity), plot_type='partial')

## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_point).

elec_fit <- aus_elec %>%
  model(arima310 = ARIMA(Electricity ~ pdq(3,1,0)),
        arima210 = ARIMA(Electricity ~ pdq(2,1,0)),
        stepwise = ARIMA(Electricity),
        auto = ARIMA(Electricity, stepwise=FALSE))
report (elec_fit)

## Warning in report.mdl_df(elec_fit): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 4 x 8
##   .model     sigma2 log_lik    AIC   AICc    BIC ar_roots   ma_roots 
##   <chr>       <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <list>     <list>   
## 1 arima310 0.000357    545. -1076. -1076. -1053. <cpl [11]> <cpl [4]>
## 2 arima210 0.000350    546. -1080. -1080. -1060. <cpl [6]>  <cpl [8]>
## 3 stepwise 0.000359    543. -1079. -1078. -1065. <cpl [1]>  <cpl [5]>
## 4 auto     0.000347    547. -1083. -1082. -1063. <cpl [8]>  <cpl [9]>

Estimate the parameters of your best model and do diagnostic testing on the residuals. Do the residuals resemble white noise? If not, try to find another ARIMA model which fits better.

fit11d <- aus_elec %>%
  model(ARIMA(Electricity)) %>%
  report(fit11d)

## Series: Electricity 
## Model: ARIMA(1,1,1)(0,1,1)[4] 
## 
## Coefficients:
##          ar1      ma1     sma1
##       0.4537  -0.8002  -0.6075
## s.e.  0.1320   0.0969   0.0527
## 
## sigma^2 estimated as 0.0003586:  log likelihood=543.29
## AIC=-1078.59   AICc=-1078.4   BIC=-1065.14

fit11d %>% gg_tsresiduals()

augment(fit11d) %>% features(.innov, ljung_box, lag=10, dof=2)

## # A tibble: 1 x 3
##   .model             lb_stat lb_pvalue
##   <chr>                <dbl>     <dbl>
## 1 ARIMA(Electricity)    12.8     0.119

Forecast the next 24 months of data using your preferred model.

elec_arima <- aus_elec %>%
  model(ARIMA(Electricity))

elec_arima %>%
  forecast(h = "2 years") %>%
  autoplot(aus_elec, level = NULL) + labs(title = "24 Month Australian Electricity Production Forecast") +
  theme(plot.title = element_text(hjust = 0.5))

Compare the forecasts obtained using ETS().

ets_compare11 <- aus_elec %>% model(ETS(Electricity))
report(ets_compare11)

## Series: Electricity 
## Model: ETS(M,A,A) 
##   Smoothing parameters:
##     alpha = 0.5160242 
##     beta  = 0.02948315 
##     gamma = 0.3091674 
## 
##   Initial states:
##      l[0]       b[0]        s[0]      s[-1]      s[-2]      s[-3]
##  8.328679 0.02053009 -0.03579961 0.08180292 0.02721019 -0.0732135
## 
##   sigma^2:  0
## 
##       AIC      AICc       BIC 
## -546.2358 -545.3705 -515.7754

ets_compare11 %>% forecast(h=24) %>%
  autoplot(aus_elec) + labs(title = "Forecasted Australian Electricity Production using ETS") +
  theme(plot.title = element_text(hjust = 0.5))

For the same time series you used in the previous exercise, try using a non-seasonal model applied to the seasonally adjusted data obtained from STL. Compare the forecasts with those obtained in the previous exercise. Which do you think is the best approach?

aus_elec_decomp <- stl(aus_elec, t.window = 13, s.window = "periodic")

aus_elec_sznadj <- seasadj(aus_elec_decomp)

aus_elec_stl <- stlf(aus_elec_sznadj, method = "arima")
aus_elec_fc <- forecast(aus_elec_stl)
plot(aus_elec_fc)

For the Australian tourism data (from tourism):

Fit ARIMA models for each time series.

#View(tourism)
aus_tour13 <- tourism %>% model(ARIMA(Trips))

## Warning in sqrt(diag(best$var.coef)): NaNs produced

report(aus_tour13)

## Warning in report.mdl_df(aus_tour13): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 304 x 11
##    Region         State Purpose .model sigma2 log_lik   AIC  AICc   BIC ar_roots
##    <chr>          <chr> <chr>   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>  
##  1 Adelaide       Sout~ Busine~ ARIMA~ 1.19e3   -395.  799.  799.  808. <cpl [4~
##  2 Adelaide       Sout~ Holiday ARIMA~ 5.38e2   -365.  738.  738.  747. <cpl [4~
##  3 Adelaide       Sout~ Other   ARIMA~ 1.86e2   -318.  642.  643.  650. <cpl [0~
##  4 Adelaide       Sout~ Visiti~ ARIMA~ 8.79e2   -384.  776.  776.  785. <cpl [4~
##  5 Adelaide Hills Sout~ Busine~ ARIMA~ 1.85e1   -230.  463.  463.  468. <cpl [0~
##  6 Adelaide Hills Sout~ Holiday ARIMA~ 3.80e1   -256.  516.  516.  521. <cpl [0~
##  7 Adelaide Hills Sout~ Other   ARIMA~ 2.11e0   -141.  293.  293.  304. <cpl [0~
##  8 Adelaide Hills Sout~ Visiti~ ARIMA~ 1.09e2   -298.  599.  599.  604. <cpl [0~
##  9 Alice Springs  Nort~ Busine~ ARIMA~ 4.29e1   -260.  527.  527.  534. <cpl [0~
## 10 Alice Springs  Nort~ Holiday ARIMA~ 1.01e2   -285.  575.  576.  582. <cpl [0~
## # ... with 294 more rows, and 1 more variable: ma_roots <list>

aus_tour13 %>% forecast(h=10) %>%
  autoplot(tourism)

Produce forecasts of your fitted models.

aus_tour13 %>% forecast(h=10) %>%
  autoplot(tourism)

Check the forecasts for the “Snowy Mountains” and “Melbourne” regions. Do they look reasonable?

aus_tour13 %>% forecast(h=10) %>%
  filter(Region=="Snowy Mountains") %>%
  autoplot(tourism)

aus_tour13 %>% forecast(h=10) %>%
  filter(Region=="Melbourne") %>%
  autoplot(tourism)

For your retail time series (Exercise 5 above):

develop an appropriate seasonal ARIMA model;

#View(aus_turnover)
fit14 <- aus_turnover %>%
  model(
    arima012011 = ARIMA(Turnover ~ pdq(0,1,2) + PDQ(0,1,1)),
    arima210011 = ARIMA(Turnover ~ pdq(2,1,0) + PDQ(0,1,1)),
    auto = ARIMA(Turnover, stepwise = FALSE, approx = FALSE)
  )
#fit14 %>% pivot_longer(everything(), names_to = "Model name",
#                     values_to = "Orders")  
glance(fit14) %>% arrange(AICc) %>% select(.model:BIC)

## # A tibble: 3 x 6
##   .model      sigma2 log_lik   AIC  AICc   BIC
##   <chr>        <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 auto          1.98   -758. 1527. 1527. 1551.
## 2 arima012011   2.08   -767. 1542. 1542. 1558.
## 3 arima210011   2.10   -769. 1546. 1546. 1562.

fit14 %>% select(auto) %>% gg_tsresiduals(lag=36)

b. compare the forecasts with those you obtained in earlier chapters;

fit14 %>% forecast(h=1) %>% autoplot()

Consider the number of Snowshoe Hare furs traded by the Hudson Bay Company between 1845 and 1935 (data set pelt).

Produce a time plot of the time series.

#View(pelt)
hare <- pelt %>% select("Year", "Hare")
#View(hare)
pelt %>% select("Year", "Hare") %>% autoplot(.vars=Hare) +
  labs(title = "Hare Trades") +
  theme(plot.title = element_text(hjust = 0.5))

Assume you decide to fit the following model: yt=c+ϕ1yt−1+ϕ2yt−2+ϕ3yt−3+ϕ4yt−4+εt, where εt is a white noise series. What sort of ARIMA model is this (i.e., what are p , d, and q)?

fit15b <- pelt %>%
  model(ARIMA(Hare)) %>%
  report(fit15b)

## Series: Hare 
## Model: ARIMA(2,0,1) w/ mean 
## 
## Coefficients:
##          ar1      ar2      ma1   constant
##       1.3811  -0.7120  -0.5747  14936.234
## s.e.  0.1016   0.0784   0.1240   1072.718
## 
## sigma^2 estimated as 583278195:  log likelihood=-1046.07
## AIC=2102.15   AICc=2102.85   BIC=2114.7

P = 2 | D = 0 | Q = 1

By examining the ACF and PACF of the data, explain why this model is appropriate.

#ggtsdisplay(diff(hare))
gg_tsdisplay(hare, plot_type='partial')

## Plot variable not specified, automatically selected `y = Hare`

The ACF shows decreasing autocorrelation and PACF shows that lags 1 and 2 are above the bounds.

The last five values of the series are given below: The estimated parameters are c=30993 , ϕ1=0.82 , ϕ2=−0.29 , ϕ3=−0.01 , and ϕ4=−0.22. Without using the forecast() function, calculate forecasts for the next three years (1936–1939).

hare1 = 30993 + 19520 + 0.82*(19250-82110) - 0.29*(82110-89760) - 0.01*(89760-81660) - 0.22*(81660-15760)
hare2 = hare1 + 0.82*(hare1 - 19250) - 0.29*(19250-82110) - 0.01*(82110-89760) - 0.22*(89760-81660)
hare3 = hare2 + 0.82*(hare2-hare1) - 0.29*(hare1-19250) - 0.01*(19250-82110) - 0.22*(82110-89760)
c(hare1, hare2, hare3)

## [1] -13392.70 -23635.81 -20257.18

Now fit the model in R and obtain the forecasts using forecast(). How are they different from yours? Why? My manual calculation must be miscalculated because it was negative and you can have negative hare fur trades. They are also very off from the forecasted values.

#hare <- as.matrix(hare)
#hare_fc <- forecast(ARIMA(hare ~ pdq(3,1,0)))
hare_fc <- hare %>% model(ARIMA(Hare))
hare_fc %>% forecast(h=3)

## # A fable: 3 x 4 [1Y]
## # Key:     .model [1]
##   .model       Year              Hare  .mean
##   <chr>       <dbl>            <dist>  <dbl>
## 1 ARIMA(Hare)  1936  N(6634, 5.8e+08)  6634.
## 2 ARIMA(Hare)  1937 N(12878, 9.6e+08) 12878.
## 3 ARIMA(Hare)  1938 N(27999, 1.1e+09) 27999.

report(hare_fc)

## Series: Hare 
## Model: ARIMA(2,0,1) w/ mean 
## 
## Coefficients:
##          ar1      ar2      ma1   constant
##       1.3811  -0.7120  -0.5747  14936.234
## s.e.  0.1016   0.0784   0.1240   1072.718
## 
## sigma^2 estimated as 583278195:  log likelihood=-1046.07
## AIC=2102.15   AICc=2102.85   BIC=2114.7

#hare_fc %>% autoplot()

The population of Switzerland from 1960 to 2017 is in data set global_economy.

Produce a time plot of the data.

#View(global_economy)
swiss_pop <- global_economy %>% filter(Country=="Switzerland")
swiss_pop %>% autoplot(.vars=Population) +
  labs(title = "Swiss Population") +
  theme(plot.title = element_text(hjust = 0.5))

Assume you decide to fit the following model: yt=c+ϕ1yt−1+ϕ2yt−2+ϕ3yt−3+ϕ4yt−4+εt, where εt is a white noise series. What sort of ARIMA model is this (i.e., what are p , d, and q)?

fit16b <- swiss_pop %>%
  model(ARIMA(Population)) %>%
  report(fit16b)

## Series: Population 
## Model: ARIMA(1,1,1) w/ drift 
## 
## Coefficients:
##          ar1     ma1   constant
##       0.8152  0.8944  10956.514
## s.e.  0.0750  0.0950   2508.242
## 
## sigma^2 estimated as 121695933:  log likelihood=-611.7
## AIC=1231.4   AICc=1232.17   BIC=1239.58

P = 1 | D = 1 | Q = 1

Explain why this model was chosen using the ACF and PACF of the differenced series.

#ggtsdisplay(diff(hare))
gg_tsdisplay(swiss_pop, y=Population, plot_type='partial')

The last five values of the series are given below. The estimated parameters are c=0.0053 , ϕ1=1.64 , ϕ2=−1.17 , and ϕ3=0.45. Without using the forecast() function, calculate forecasts for the next three years (2018–2020).

pop1 = 0.0053 + 8.09 + 1.64*(8.09-8.19) - 1.17*(8.19-8.28) + 0.45*(8.28-8.37) 
pop2 = pop1 + 1.64*(pop1-8.09) - 1.17*(8.09-8.19) + 0.45*(8.19-8.28)
pop3 = pop2 + 1.64*(pop2-pop1) - 1.17*(pop1-8.09) + 0.45*(8.09-8.19)
c(pop1, pop2, pop3)

## [1] 7.996100 7.918604 7.856374

Now fit the model in R and obtain the forecasts from the same model. How are they different from yours? Why?

#pop_fc <- forecast(ARIMA(swiss_pop ~ pdq(3,1,0)))
pop_fc <- swiss_pop %>% model(ARIMA(Population))
pop_fc %>% forecast(h=3)

## # A fable: 3 x 5 [1Y]
## # Key:     Country, .model [1]
##   Country     .model             Year          Population    .mean
##   <fct>       <chr>             <dbl>              <dist>    <dbl>
## 1 Switzerland ARIMA(Population)  2018 N(8561084, 1.2e+08) 8561084.
## 2 Switzerland ARIMA(Population)  2019   N(8649542, 1e+09) 8649542.
## 3 Switzerland ARIMA(Population)  2020 N(8732611, 3.1e+09) 8732611.

report(pop_fc)

## Series: Population 
## Model: ARIMA(1,1,1) w/ drift 
## 
## Coefficients:
##          ar1     ma1   constant
##       0.8152  0.8944  10956.514
## s.e.  0.0750  0.0950   2508.242
## 
## sigma^2 estimated as 121695933:  log likelihood=-611.7
## AIC=1231.4   AICc=1232.17   BIC=1239.58

#hare_fc %>% autoplot()

Before doing this exercise, you will need to install the Quandl package in R using

#install.packages("Quandl")
#library("Quandl")

Select a time series from Quandl. Then copy its short URL and import the data using

#y <- as_tsibble(Quandl::Quandl("https://data.nasdaq.com/tables/RTAT/NDAQ-RTAT10"
#                              , api_key = G5-RhwAvaZvKwTcxTVwN), index = Date)

Plot graphs of the data, and try to identify an appropriate ARIMA model.

aus_production %>% autoplot(Electricity) + labs(title = "Australian Electricity Production") +
  theme(plot.title = element_text(hjust = 0.5))

aus_elec <- aus_production %>% select("Quarter", "Electricity")
aus_elec$Electricity <- log(aus_elec$Electricity)
aus_elec %>% autoplot(.vars=Electricity) +
  labs(title = "Australian Electricity Production with Log Transformation") +
  theme(plot.title = element_text(hjust = 0.5))

aus_elec %>%
  gg_tsdisplay(difference(Electricity), plot_type='partial')

## Warning: Removed 1 row(s) containing missing values (geom_path).

## Warning: Removed 1 rows containing missing values (geom_point).

elec_fit <- aus_elec %>%
  model(arima310 = ARIMA(Electricity ~ pdq(3,1,0)),
        arima210 = ARIMA(Electricity ~ pdq(2,1,0)),
        stepwise = ARIMA(Electricity),
        auto = ARIMA(Electricity, stepwise=FALSE))
report (elec_fit)

## Warning in report.mdl_df(elec_fit): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.

## # A tibble: 4 x 8
##   .model     sigma2 log_lik    AIC   AICc    BIC ar_roots   ma_roots 
##   <chr>       <dbl>   <dbl>  <dbl>  <dbl>  <dbl> <list>     <list>   
## 1 arima310 0.000357    545. -1076. -1076. -1053. <cpl [11]> <cpl [4]>
## 2 arima210 0.000350    546. -1080. -1080. -1060. <cpl [6]>  <cpl [8]>
## 3 stepwise 0.000359    543. -1079. -1078. -1065. <cpl [1]>  <cpl [5]>
## 4 auto     0.000347    547. -1083. -1082. -1063. <cpl [8]>  <cpl [9]>

Do residual diagnostic checking of your ARIMA model. Are the residuals white noise?

fit17c <- aus_elec %>%
  model(ARIMA(Electricity)) %>%
  report(fit17c)

## Series: Electricity 
## Model: ARIMA(1,1,1)(0,1,1)[4] 
## 
## Coefficients:
##          ar1      ma1     sma1
##       0.4537  -0.8002  -0.6075
## s.e.  0.1320   0.0969   0.0527
## 
## sigma^2 estimated as 0.0003586:  log likelihood=543.29
## AIC=-1078.59   AICc=-1078.4   BIC=-1065.14

fit17c %>% gg_tsresiduals()

augment(fit17c) %>% features(.innov, ljung_box, lag=10, dof=2)

## # A tibble: 1 x 3
##   .model             lb_stat lb_pvalue
##   <chr>                <dbl>     <dbl>
## 1 ARIMA(Electricity)    12.8     0.119

Use your chosen ARIMA model to forecast the next four years.

#fit17c %>% forecast(h=4) %>% autoplot()
elec_arima <- aus_elec %>%
  model(ARIMA(Electricity))

elec_arima %>%
  forecast(h = "4 years") %>%
  autoplot(aus_elec, level = NULL) + labs(title = "4 Year Australian Electricity Production Forecast") +
  theme(plot.title = element_text(hjust = 0.5))

Now try to identify an appropriate ETS model.

ets_compare17 <- aus_elec %>% model(ETS(Electricity))
report(ets_compare17)

## Series: Electricity 
## Model: ETS(M,A,A) 
##   Smoothing parameters:
##     alpha = 0.5160242 
##     beta  = 0.02948315 
##     gamma = 0.3091674 
## 
##   Initial states:
##      l[0]       b[0]        s[0]      s[-1]      s[-2]      s[-3]
##  8.328679 0.02053009 -0.03579961 0.08180292 0.02721019 -0.0732135
## 
##   sigma^2:  0
## 
##       AIC      AICc       BIC 
## -546.2358 -545.3705 -515.7754

Do residual diagnostic checking of your ETS model. Are the residuals white noise?

ets_compare17 %>% gg_tsresiduals()

augment(ets_compare17) %>% features(.innov, ljung_box, lag=10, dof=2)

## # A tibble: 1 x 3
##   .model           lb_stat lb_pvalue
##   <chr>              <dbl>     <dbl>
## 1 ETS(Electricity)    16.8    0.0323

Use your chosen ETS model to forecast the next four years.

ets_compare17 %>% forecast(h=48) %>%
  autoplot(aus_elec) + labs(title = "Forecasted Australian Electricity Production using ETS") +
  theme(plot.title = element_text(hjust = 0.5))

Which of the two models do you prefer? We will choose the ARIMA model because it has a lower AIC value, so it will perform better than the ETS model.

HW2 CH9

Sal Pollastro

4/14/2022