Chapter 8: Exponential Smoothing

1. Consider the number of pigs slaughtered in Victoria, available in the aus_livestock dataset.

a. Use the ETS() function to estimate the equivalent model for simple exponential smoothing. Find the optimal values of α and ℓ0 and generate forecasts for the next four months.

pigs <- aus_livestock %>%
  filter(Animal == "Pigs", State == "Victoria")
# aus_livestock is monthly from July 1972. The original run omitted the State
# filter, stacking every state's pigs series into one vector and pushing the
# time index out to year 2344 in the interval output below (left as knitted).
pigsts <- ts(pigs$Count, start = c(1972, 7), frequency = 12)
pigs_ets <- ets(pigsts)
pigs_predict <- forecast(pigs_ets, h = 4)   # the exercise asks for four months
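
The exercise asks for fable's ETS(); a minimal equivalent sketch on the tsibble (the ANN specification is simple exponential smoothing; output not shown):

pigs_fable <- pigs %>%
  model(SES = ETS(Count ~ error("A") + trend("N") + season("N")))
report(pigs_fable)               # optimal alpha and l[0]
pigs_fable %>% forecast(h = 4)   # forecasts for the next four months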

b. Compute a 95% prediction interval for the first forecast using ŷ ± 1.96s, where s is the standard deviation of the residuals. Compare your interval with the interval produced by R.

s <- sd(pigs_predict$residuals)
pigs_predict$mean +1.96*s
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2344 74371.36 78554.53 86493.45 75184.13 80144.59 88372.69 78285.38 76098.74
## 2345 74371.36 78554.53 86493.45 75184.13 80144.59 88372.69 78285.38 76098.74
## 2346 74371.36 78554.53 86493.45 75184.13 80144.59 88372.69 78285.38 76098.74
##           Sep      Oct      Nov      Dec
## 2344 84922.65 74466.26 75891.19 87579.62
## 2345 84922.65 74466.26 75891.19 87579.62
## 2346 84922.65 74466.26 75891.19 87579.62
pigs_predict$mean -1.96*s
##           Jan      Feb      Mar      Apr      May      Jun      Jul      Aug
## 2344 47689.53 51872.70 59811.62 48502.30 53462.75 61690.86 51603.55 49416.91
## 2345 47689.53 51872.70 59811.62 48502.30 53462.75 61690.86 51603.55 49416.91
## 2346 47689.53 51872.70 59811.62 48502.30 53462.75 61690.86 51603.55 49416.91
##           Sep      Oct      Nov      Dec
## 2344 58240.82 47784.43 49209.36 60897.78
## 2345 58240.82 47784.43 49209.36 60897.78
## 2346 58240.82 47784.43 49209.36 60897.78
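
To compare with the interval R produces, the bounds can be read directly off the forecast object (a sketch, assuming pigs_predict from above; output not shown):

pigs_predict$lower[1, "95%"]   # R's lower 95% bound for the first forecast
pigs_predict$upper[1, "95%"]   # R's upper 95% bound

The two intervals are typically very close, since R's interval is also based on the residual variance.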

Write your own function to implement simple exponential smoothing. The function should take arguments y (the time series), alpha (the smoothing parameter α) and level (the initial level ℓ0).

It should return the forecast of the next observation in the series. Does it give the same forecast as ETS()?

ETSnew <- function(y, alpha, level){
  y_hat <- level                 # start from the initial level l0
  for(index in 1:length(y)){
    # SES update: weighted average of the new observation and the previous level
    y_hat <- alpha*y[index] + (1 - alpha)*y_hat
  }
  y_hat                          # return the forecast of the next observation
}
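
A quick check against the ets() fit above (a sketch; coef() pulls alpha and the initial level from pigs_ets, assuming the automatic selection was an (A,N,N) model; force one with ets(pigsts, model = "ANN") otherwise):

pars <- coef(pigs_ets)   # named vector containing "alpha" and the initial level "l"
ETSnew(as.numeric(pigsts), alpha = pars[["alpha"]], level = pars[["l"]])
forecast(pigs_ets, h = 1)$mean   # should agree up to rounding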

5. Data set global_economy contains the annual Exports from many countries. Select one country to analyse.

a) Plot the Exports series and discuss the main features of the data.

arg_exp <- global_economy %>% filter(Code == 'ARG')
arg_exp %>% autoplot(Exports)

Exports (as a percentage of GDP) show no steady trend but large swings, including a sharp spike around 2002 when the peso was devalued.

b) Use an ETS(A,N,N) model to forecast the series, and plot the forecasts.

fit <- arg_exp %>% 
  model(ANN = ETS(Exports ~ error("A") + trend("N") + season("N")))
arg_fc <- fit %>% forecast(h = 4)
arg_fc %>% autoplot(arg_exp) +
  labs(title="Forecast: Argentina's Exports (ANN)")

c. Compute the RMSE values for the training data. The training RMSE is 2.776 (2.78 in the rounded output below).

fit %>% accuracy()
## # A tibble: 1 × 11
##   Country   .model .type        ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <fct>     <chr>  <chr>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 Argentina ANN    Training 0.0762  2.78  1.62 -1.73  15.7 0.983 0.986 0.00902

d) Compare the results to those from an ETS(A,A,N) model. (Remember that the trended model is using one more parameter than the simpler model.) Discuss the merits of the two forecasting methods for this data set.

Adding an additive trend leaves the RMSE essentially unchanged (2.78 for both models), while the MAE and MAPE are slightly worse. The training errors of the two models are so similar that either could reasonably be used, depending on whether a trend is expected in the forecast period.

fit2 <- arg_exp %>% 
  model(AAN = ETS(Exports ~ error("A") + trend("A") + season("N")))

fit2 %>% accuracy()
## # A tibble: 1 × 11
##   Country   .model .type         ME  RMSE   MAE   MPE  MAPE  MASE RMSSE   ACF1
##   <fct>     <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
## 1 Argentina AAN    Training 0.00795  2.78  1.64 -2.51  15.9 0.994 0.986 0.0271

e. Compare the forecasts from both methods. Which do you think is best?

The forecasts from the two models are shown below. The AAN forecast is slightly higher since the additive trend lets the level keep increasing.

We can also see from the reports that while the AIC values are close, the ANN model has the lower AIC, AICc, and BIC. Since the two models are so similar on the other metrics, I would choose the simpler ANN model in this case.

comparison <- arg_exp %>%
  model(
    ANN = ETS(Exports ~ error("A") + trend("N") + season("N")),
    AAN = ETS(Exports ~ error("A") + trend("A") + season("N"))
    )
accuracy(comparison)
## # A tibble: 2 × 11
##   Country   .model .type         ME  RMSE   MAE   MPE  MAPE  MASE RMSSE    ACF1
##   <fct>     <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 Argentina ANN    Training 0.0762   2.78  1.62 -1.73  15.7 0.983 0.986 0.00902
## 2 Argentina AAN    Training 0.00795  2.78  1.64 -2.51  15.9 0.994 0.986 0.0271
comparison %>% 
  forecast(h=4) %>% 
  autoplot(arg_exp, level=NULL) +
  labs(title="Forecast Comparison: Belgium's Exports")

report(fit)
## Series: Exports 
## Model: ETS(A,N,N) 
##   Smoothing parameters:
##     alpha = 0.8855249 
## 
##   Initial states:
##      l[0]
##  7.407428
## 
##   sigma^2:  7.9819
## 
##      AIC     AICc      BIC 
## 359.9466 360.3911 366.1279
report(fit2)
## Series: Exports 
## Model: ETS(A,A,N) 
##   Smoothing parameters:
##     alpha = 0.8629392 
##     beta  = 0.0001 
## 
##   Initial states:
##      l[0]       b[0]
##  6.794833 0.07159122
## 
##   sigma^2:  8.2795
## 
##      AIC     AICc      BIC 
## 363.9608 365.1147 374.2630

6. Forecast the Chinese GDP from the global_economy data set using an ETS model. Experiment with the various options in the ETS() function to see how much the forecasts change with damped trend, or with a Box-Cox transformation. Try to develop an intuition of what each is doing to the forecasts.

Based on the forecasts and the glance below, the AAN model fits best of the ones tested (lowest AIC, AICc, and BIC). A sketch of the damped-trend and Box-Cox variants the question mentions follows the report.

chn_gdp <- global_economy %>% filter(Code == 'CHN')
chn_gdp %>% autoplot(GDP)

chn_fit <- chn_gdp %>%
  model(ANN = ETS(GDP ~ error("A") + trend("N") + season("N")),
        AAN = ETS(GDP ~ error("A") + trend("A") + season("N")),
        AMN = ETS(GDP ~ error("A") + trend("M") + season("N")))


chn_fit %>% 
  forecast(h=10) %>% 
  autoplot(chn_gdp, level=NULL) +
  labs(title="Forecast Comparison: Chinese GDP")

report(chn_fit)
## Warning in report.mdl_df(chn_fit): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 3 × 10
##   Country .model  sigma2 log_lik   AIC  AICc   BIC     MSE    AMSE           MAE
##   <fct>   <chr>    <dbl>   <dbl> <dbl> <dbl> <dbl>   <dbl>   <dbl>         <dbl>
## 1 China   ANN    1.79e23  -1669. 3345. 3345. 3351. 1.73e23 7.22e23 213215048409.
## 2 China   AAN    3.88e22  -1624. 3258. 3259. 3268. 3.61e22 1.31e23  95916434827.
## 3 China   AMN    5.36e22  -1633. 3277. 3278. 3287. 4.99e22 3.07e23 107433931132.
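
A minimal sketch of those variants (not run above; lambda_chn and the model names are illustrative, with the Box-Cox lambda estimated via the Guerrero feature):

lambda_chn <- chn_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

chn_extra <- chn_gdp %>%
  model(damped = ETS(GDP ~ error("A") + trend("Ad") + season("N")),
        boxcox = ETS(box_cox(GDP, lambda_chn) ~ error("A") + trend("A") + season("N")))
chn_extra %>% forecast(h = 10) %>% autoplot(chn_gdp, level = NULL)

Damping bends the trend toward a constant level, while the Box-Cox transformation linearizes the exponential growth before fitting, which changes how aggressively the back-transformed forecasts grow.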

7. Find an ETS model for the Gas data from aus_production and forecast the next few years. Why is multiplicative seasonality necessary here? Experiment with making the trend damped. Does it improve the forecasts?

In this case, damping makes the fit worse (higher AICc in the report below). Multiplicative seasonality is necessary because "the seasonal variations are changing proportional to the level of the series" (Hyndman, Section 8.3).

aus_production %>% autoplot(Gas)

aus_fit <- aus_production %>%
  filter(Quarter > yearquarter("1990 Q4")) %>%
  model(etsaus = ETS(Gas),
        damped = ETS(Gas ~ trend("Ad")))
report(aus_fit)
## Warning in report.mdl_df(aus_fit): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 2 × 9
##   .model sigma2 log_lik   AIC  AICc   BIC   MSE  AMSE   MAE
##   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 etsaus   28.5   -296.  611.  613.  632.  25.6  32.0  3.85
## 2 damped   29.9   -298.  615.  619.  639.  26.5  34.2  3.75
aus_fit %>%
  forecast(h = "3 years") %>%
  autoplot(aus_production)

8. Recall your retail time series data (from Exercise 8 in Section 2.10).

a. Why is multiplicative seasonality necessary for this series?

Multiplicative seasonality is necessary because the seasonal variations grow in proportion to the level of the series.

b. Apply Holt-Winters’ multiplicative method to the data. Experiment with making the trend damped.

The Holt-Winters multiplicative model gives a noticeably better fit, with less error than the purely additive model (see the accuracy comparison under part c).

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries
## # A tsibble: 441 x 5 [1M]
## # Key:       State, Industry [1]
##    State    Industry                               `Series ID`    Month Turnover
##    <chr>    <chr>                                  <chr>          <mth>    <dbl>
##  1 Tasmania Cafes, restaurants and takeaway food … A3349520V   1982 Apr      5.4
##  2 Tasmania Cafes, restaurants and takeaway food … A3349520V   1982 May      5.5
##  3 Tasmania Cafes, restaurants and takeaway food … A3349520V   1982 Jun      5.1
##  4 Tasmania Cafes, restaurants and takeaway food … A3349520V   1982 Jul      5.5
##  5 Tasmania Cafes, restaurants and takeaway food … A3349520V   1982 Aug      5.5
##  6 Tasmania Cafes, restaurants and takeaway food … A3349520V   1982 Sep      5.7
##  7 Tasmania Cafes, restaurants and takeaway food … A3349520V   1982 Oct      5.9
##  8 Tasmania Cafes, restaurants and takeaway food … A3349520V   1982 Nov      5.9
##  9 Tasmania Cafes, restaurants and takeaway food … A3349520V   1982 Dec      6.7
## 10 Tasmania Cafes, restaurants and takeaway food … A3349520V   1983 Jan      5.5
## # … with 431 more rows
holtfit <- myseries %>%
  model(holtfit = ETS(Turnover ~ error("M") + trend("A") + season("M")))

justadd <- myseries %>%
  model(additive = ETS(Turnover ~ error("A") + trend("A") + season("A")))

c. Compare the RMSE of the one-step forecasts from the two methods. Which do you prefer?

The Holt-Winters multiplicative fit gives the lower one-step (training) RMSE, so I prefer it. A sketch of the comparison, including a damped variant per part b, follows.
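
A minimal sketch of the comparison (it reuses myseries from above; the model names are illustrative, and output is not shown):

myseries %>%
  model(hw        = ETS(Turnover ~ error("M") + trend("A") + season("M")),
        hw_damped = ETS(Turnover ~ error("M") + trend("Ad") + season("M")),
        additive  = ETS(Turnover ~ error("A") + trend("A") + season("A"))) %>%
  accuracy()   # compare one-step training RMSE across the three specifications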

d. Check that the residuals from the best method look like white noise.

The residuals look like white noise. The components plot below shows the fitted states; a direct residual check follows it.

components(holtfit) %>%
  autoplot() +
  labs(title = "ETS(M,A,M) components")
## Warning: Removed 12 row(s) containing missing values (geom_path).
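
The components plot shows the decomposition rather than the residuals themselves; a direct diagnostic (a sketch, output not shown):

holtfit %>% gg_tsresiduals()   # time plot, ACF and histogram of the innovation residuals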

Chapter 9: ARIMA Models

Note: I inadvertently used the 2nd edition for the first half of this assignment. It’s mostly the same but in some cases uses different data.

1. Explain the differences among these figures. Do they all indicate that the data are white noise?

a. All of the series appear to be white noise. While series x1 shows larger autocorrelations, none of the spikes cross the critical-value bounds marked by the blue dashed lines, so in each case the data are consistent with white noise.

b. The autocorrelations differ because each is estimated from a different random sample, and the bounds sit at different distances from zero because they depend on the length of the series: per Hyndman's formula for critical values, the bounds lie at ±1.96/√T, so longer series have tighter bounds. A quick computation of those bounds follows.
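
A quick check of the bounds (a sketch; 36, 360 and 1,000 are the series lengths given in the exercise):

1.96 / sqrt(c(36, 360, 1000))   # the bounds shrink as the series gets longer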

2.A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

While a time plot alone says only so much about stationarity, the ACF and PACF plots make the non-stationarity clear. The ACF decays very slowly from values near one, the signature of a non-stationary series, and the single large spike at lag 1 in the PACF points the same way, so the series should be differenced.

ibm <- ibmclose
autoplot(ibm)

ggAcf(ibm)    # the ACF decays very slowly from values near one

ggPacf(ibm)   # a single large spike at lag 1
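
A quick numerical confirmation (sketch, output not shown):

ndiffs(ibmclose)        # number of first differences suggested
ggAcf(diff(ibmclose))   # the differenced series should look like white noise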

3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

library(urca)
(lambda <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
autoplot(BoxCox(usnetelec,lambda))

usnetelec %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 1.464 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
usnetelec %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 3 lags. 
## 
## Value of test-statistic is: 0.1585 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(usnetelec)
## [1] 1
#one difference needed to make stationary
(lambda2 <- BoxCox.lambda(usgdp))
## [1] 0.366352
autoplot(BoxCox(usgdp,lambda2))

usgdp %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 4.6556 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
usgdp %>% diff() %>%  ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 1.7909 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(usgdp)
## [1] 2
# ndiffs() shows two differences needed; after twice differencing, the KPSS
# statistic (0.021) is well below every critical value, so stationarity is not rejected.
usgdp %>% diff() %>% diff() %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 0.021 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
(lambda3 <- BoxCox.lambda(mcopper))
## [1] 0.1919047
autoplot(BoxCox(mcopper,lambda3))

mcopper %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 5.01 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(mcopper)
## [1] 1
# ndiffs() shows one difference needed; the differenced series passes the KPSS test below.
mcopper %>% diff() %>%  ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 6 lags. 
## 
## Value of test-statistic is: 0.1843 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
en<-enplanements
(lambda4 <- BoxCox.lambda(en))
## [1] -0.2269461
autoplot(BoxCox(en,lambda4))

en %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 4.4423 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(en)
## [1] 1
# ndiffs() shows one difference needed. The original knit mistakenly re-ran the
# mcopper test here; the intended call is below (its output is not shown).
en %>% diff() %>% ur.kpss() %>% summary()
(lambda5 <- BoxCox.lambda(visitors))
## [1] 0.2775249
autoplot(BoxCox(visitors,lambda5))

visitors %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 4 lags. 
## 
## Value of test-statistic is: 4.6025 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(visitors)
## [1] 1
# ndiffs() shows one difference needed. As above, the original knit re-ran the
# mcopper test; the intended call is below (its output is not shown).
visitors %>% diff() %>% ur.kpss() %>% summary()

4. For the enplanements data, write down the differences you chose above using backshift operator notation.

Using backshift notation, the single first difference chosen above is written \[(1-B)y_t.\]
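
In R, (1-B) corresponds to diff(); a sketch (the names are illustrative, and the second line shows the seasonal-plus-first difference (1-B)(1-B^12)y_t that monthly data like these often need as well):

first_diff  <- diff(enplanements)                  # (1 - B) y_t
double_diff <- diff(diff(enplanements, lag = 12))  # (1 - B)(1 - B^12) y_t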

5. For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

After a Box-Cox transformation, ndiffs() recommends first-order differencing, and the differenced series passes the KPSS test, as shown below.

retail <- readxl::read_excel("/Users/Luke/Documents/BC/Predictive Analytics/Assignment 2/retail.xlsx", skip=1)
ret_ts <- ts(retail[,"A3349873A"],
  frequency=12, start=c(1982,4))
ret_ts
##        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
## 1982                    62.4  63.1  59.6  61.9  60.7  61.2  62.1  68.3 104.0
## 1983  63.9  64.8  70.0  65.3  68.9  65.7  66.9  70.4  71.6  74.9  83.4 122.8
## 1984  69.0  71.8  74.9  70.2  76.6  68.7  70.1  74.6  70.6  80.5  87.2 121.3
## 1985  73.3  71.1  75.7  76.0  86.1  75.2  83.4  85.3  81.3  93.9 104.7 143.8
## 1986  88.5  85.2  86.2  92.4 100.9  90.1  96.1  97.2  96.8 107.7 110.9 161.0
## 1987  98.1  94.5  97.7  99.3 106.3  98.5 107.1 105.9 108.5 117.1 121.4 170.1
## 1988 109.0 110.7 115.5 105.7 114.3 107.5 108.8 109.6 118.4 125.5 151.8 232.4
## 1989 129.4 120.6 133.2 129.3 142.8 127.6 126.0 136.7 144.5 147.8 168.4 242.6
## 1990 141.2 139.8 152.1 135.8 148.0 135.8 138.7 144.8 139.9 151.6 163.9 215.8
## 1991 135.1 135.5 142.4 137.3 146.5 137.6 147.0 152.9 157.5 169.3 184.8 250.1
## 1992 164.4 169.8 171.0 167.5 173.2 150.8 160.9 164.5 173.6 182.7 196.9 255.5
## 1993 156.1 152.6 162.0 151.5 160.5 144.9 147.0 151.5 161.6 169.4 186.7 270.1
## 1994 159.6 161.0 171.3 152.6 159.5 157.4 156.9 169.6 186.2 206.3 198.3 269.5
## 1995 176.6 170.8 179.7 174.9 174.9 169.1 184.9 192.5 201.5 210.5 227.9 316.5
## 1996 202.2 210.0 204.5 203.3 209.4 194.8 215.7 228.6 226.6 229.8 242.6 336.5
## 1997 228.4 212.9 222.3 217.2 225.4 217.2 228.2 227.9 234.9 257.6 280.7 390.1
## 1998 235.6 224.4 219.1 242.2 239.6 230.5 240.5 233.9 242.7 227.3 243.9 337.8
## 1999 211.2 197.0 194.3 218.5 222.6 195.0 215.2 222.7 232.6 236.7 252.2 364.6
## 2000 219.2 215.2 221.0 212.6 228.6 239.4 201.0 211.4 241.1 253.9 261.2 362.6
## 2001 244.9 236.1 249.7 263.4 268.1 248.9 253.3 266.0 262.2 291.6 316.8 445.0
## 2002 268.6 248.4 272.4 261.5 283.1 254.4 265.3 284.9 291.2 299.7 332.0 454.8
## 2003 271.8 261.3 266.7 275.8 287.3 277.5 285.4 297.1 314.4 323.0 346.5 456.0
## 2004 268.5 256.8 270.7 250.9 266.4 255.2 261.0 263.9 276.3 291.2 304.8 427.0
## 2005 279.4 255.7 268.3 260.6 260.1 254.4 249.9 262.4 269.9 277.8 303.0 417.3
## 2006 265.8 248.7 273.1 261.0 266.3 260.4 268.3 275.9 278.2 284.1 299.2 429.1
## 2007 266.0 251.1 269.9 261.7 273.7 254.8 275.2 290.4 306.7 309.8 324.3 472.0
## 2008 285.9 286.8 275.3 257.2 285.8 259.7 261.2 273.4 275.2 300.5 323.5 457.3
## 2009 290.8 285.2 300.6 294.4 304.9 292.5 305.3 289.1 296.2 298.6 321.0 408.9
## 2010 266.2 240.0 267.5 260.7 272.8 260.5 268.5 277.0 278.7 279.0 319.3 400.2
## 2011 296.2 302.5 310.8 274.8 267.0 263.8 294.6 317.8 320.4 308.6 427.5 463.9
## 2012 288.6 287.1 315.6 291.2 309.3 330.0 327.0 331.1 344.6 366.0 534.2 535.4
## 2013 364.5 360.1 400.3 379.4 395.1 373.6 400.1 384.1 388.4 418.2 577.9 564.3
(ret_lambda<- BoxCox.lambda(ret_ts))
## [1] 0.1276369
autoplot(BoxCox(ret_ts,ret_lambda))

BoxCox(ret_ts,ret_lambda) %>% ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 5.8234 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
ndiffs(BoxCox(ret_ts,ret_lambda))
## [1] 1
# ndiffs() shows one difference needed; the differenced series passes the KPSS
# test below (statistic 0.0198, well under every critical value).
BoxCox(ret_ts,ret_lambda) %>% diff() %>%  ur.kpss() %>% summary()
## 
## ####################### 
## # KPSS Unit Root Test # 
## ####################### 
## 
## Test is of type: mu with 5 lags. 
## 
## Value of test-statistic is: 0.0198 
## 
## Critical value for a significance level of: 
##                 10pct  5pct 2.5pct  1pct
## critical values 0.347 0.463  0.574 0.739
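
Monthly retail data usually need a seasonal difference as well; a quick check (sketch, output not shown):

nsdiffs(BoxCox(ret_ts, ret_lambda))   # number of seasonal differences suggested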

6. Use R to simulate and plot some data from simple ARIMA models.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change ϕ1?

autoplot(sim, y)

As ϕ1 moves toward one the series becomes smoother and wanders further from its mean; near zero it looks like white noise. Re-simulating with a larger coefficient (a sketch, reusing e from above):

y2 <- numeric(100)
for(i in 2:100)
  y2[i] <- 0.9*y2[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y2, index = idx)
autoplot(sim2, y)

7. Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

The selected model is an ARIMA(0,2,1).

aap <- aus_airpassengers
aap_ts <- as.ts(aap)          # convert the tsibble for the forecast package

aap_fit <- auto.arima(aap_ts)
autoplot(aap)
## Plot variable not specified, automatically selected `.vars = Passengers`

ggAcf(aap_ts)

ggPacf(aap_ts)

checkresiduals(aap_fit)       # residual diagnostics: should look like white noise

aap_predict <- forecast(aap_fit, h = 10)
autoplot(aap_predict)

b. Write the model in terms of the backshift operator.

\[(1-B)^2 y_t = (1+\theta_1 B)\varepsilon_t\]

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

aap_arimacomp <- aap %>%
  model(arima010 = ARIMA(Passengers ~ 1 + pdq(0,1,0)),   # random walk with drift
        auto     = ARIMA(Passengers))                    # the part-a model
aap_arimacomp %>%
  forecast(h = 10) %>%
  autoplot(aap, level = NULL)

The drift forecasts follow a straight line whose slope is the average historical growth, which comes out a little flatter than the ARIMA(0,2,1) forecasts from part a.

8. For the United States GDP series (from global_economy):

a. If necessary, find a suitable Box-Cox transformation for the data. (Credit: used Sean Connin’s code for assistance on this section.)

usgdp<-global_economy%>%
  filter(Country %in% 'United States')%>%
  mutate(gdp_capita = GDP/Population)%>%
  select(Year, gdp_capita)

usgdp%>%
  autoplot(color='steelblue')+
  labs(title = 'Figure 34. Per Capita US GDP')+
  theme_classic()
## Plot variable not specified, automatically selected `.vars = gdp_capita`

# (A first attempt used BoxCox.lambda(usgdp$GDP), but GDP was dropped by the
# select() above; the Guerrero feature below gives the lambda actually used.)
lambda_gdp<-usgdp%>%
  features(gdp_capita, features=guerrero)%>%
  pull(lambda_guerrero)

b. Fit a suitable ARIMA model to the transformed data using ARIMA().

gdp.fit<-usgdp %>%
  model(ARIMA(box_cox(gdp_capita, lambda_gdp)))

report(gdp.fit)
## Series: gdp_capita 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(gdp_capita, lambda_gdp) 
## 
## Coefficients:
##          ar1  constant
##       0.4663    1.2236
## s.e.  0.1191    0.1235
## 
## sigma^2 estimated as 0.9269:  log likelihood=-77.82
## AIC=161.64   AICc=162.09   BIC=167.77

c. Try some other plausible models by experimenting with the orders chosen.

alternatives<-usgdp%>%
  model(gdp110 = ARIMA(box_cox(gdp_capita, lambda_gdp) ~ 1 + pdq(1,1,0)),
        gdp011 = ARIMA(box_cox(gdp_capita, lambda_gdp) ~ 1 + pdq(0,1,1)),
        gdp022 =ARIMA(gdp_capita ~ 1+pdq(0,2,2)),  
        stepwise=ARIMA(box_cox(gdp_capita, lambda_gdp)),
        search=ARIMA(box_cox(gdp_capita, lambda_gdp), stepwise=FALSE))
## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(alternatives)
## Warning in report.mdl_df(alternatives): Model reporting is only supported for
## individual models, so a glance will be shown. To see the report for a specific
## model, use `select()` and `filter()` to identify a single model.
## # A tibble: 5 × 8
##   .model       sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>         <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 gdp110        0.927   -77.8  162.  162.  168. <cpl [1]> <cpl [0]>
## 2 gdp011        0.959   -78.8  164.  164.  170. <cpl [0]> <cpl [1]>
## 3 gdp022   258222.     -429.   865.  866.  873. <cpl [0]> <cpl [2]>
## 4 stepwise      0.927   -77.8  162.  162.  168. <cpl [1]> <cpl [0]>
## 5 search        0.927   -77.8  162.  162.  168. <cpl [1]> <cpl [0]>

9. Consider aus_arrivals, the quarterly number of international visitors to Australia from several countries for the period 1981 Q1 – 2012 Q3.

aus_arrivals
## # A tsibble: 508 x 3 [1Q]
## # Key:       Origin [4]
##    Quarter Origin Arrivals
##      <qtr> <chr>     <int>
##  1 1981 Q1 Japan     14763
##  2 1981 Q2 Japan      9321
##  3 1981 Q3 Japan     10166
##  4 1981 Q4 Japan     19509
##  5 1982 Q1 Japan     17117
##  6 1982 Q2 Japan     10617
##  7 1982 Q3 Japan     11737
##  8 1982 Q4 Japan     20961
##  9 1983 Q1 Japan     20671
## 10 1983 Q2 Japan     12235
## # … with 498 more rows
aus_jap <- aus_arrivals %>% filter(Origin == "Japan")
aj <- aus_jap %>% select(-Origin)

autoplot(aj)
## Plot variable not specified, automatically selected `.vars = Arrivals`

aj %>% ACF(Arrivals) %>% autoplot()

aj %>% PACF(Arrivals) %>% autoplot()

b. Use differencing to obtain stationary data.

# urca's ur.kpss() expects a numeric vector, which is where the original attempt
# ran into trouble; feasts' unit-root features work on the tsibble directly
# (a sketch, output not shown).
aj %>% features(Arrivals, unitroot_nsdiffs)   # seasonal differences needed
aj %>% features(Arrivals, unitroot_ndiffs)    # first differences needed
aj %>%
  mutate(diff_arr = difference(difference(Arrivals, 12))) %>%
  features(diff_arr, unitroot_kpss)
aj %>% model(ARIMA(Arrivals)) %>% report()    # fable equivalent of auto.arima(aj)