Hw#6

library(fpp3)

## ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──

## ✔ tibble      3.1.8     ✔ tsibble     1.1.2
## ✔ dplyr       1.0.9     ✔ tsibbledata 0.4.0
## ✔ tidyr       1.2.0     ✔ feasts      0.2.2
## ✔ lubridate   1.8.0     ✔ fable       0.3.1
## ✔ ggplot2     3.3.6

## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date()    masks base::date()
## ✖ dplyr::filter()      masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval()  masks lubridate::interval()
## ✖ dplyr::lag()         masks stats::lag()
## ✖ tsibble::setdiff()   masks base::setdiff()
## ✖ tsibble::union()     masks base::union()

library(Quandl)

## Warning: package 'Quandl' was built under R version 4.2.2

## Loading required package: xts

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following object is masked from 'package:tsibble':
## 
##     index

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

## 
## Attaching package: 'xts'

## The following objects are masked from 'package:dplyr':
## 
##     first, last

Question 1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a.) Explain the differences among these figures. Do they all indicate that the data are white noise?

All three ACF graphs are white noise. None of the lags cross the critical value range.

b.) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The number of observations in each series is increasing and changing the critical value range. The mean and variance is remaining constant, however, and is why the autocorrelations are changing.

Question 2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

#Plotting the stock's price
amazon <- gafa_stock%>%
  filter(Symbol == "AMZN") %>%
  select(Date, Close)

amazon %>%
  ggplot(aes(x = Date)) +
  geom_line(aes(y = Close)) +
  labs(title = "Amazon", y = "Closing Price")

#ACF and PACF graphs
amazon %>%
  gg_tsdisplay(Close, plot_type = 'partial')

## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.

Looking at the graph of the data, there is an upward trend. This alone is reason enough to know I’d need to difference this before proceeding with an ARIMA model. The ACF and PACF graphs confirm this. The ACF shows autocorrelation at every lag, and the PACF graph shows perfect autocorrelation in the second lag.

Question 3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data. If any type of difference were taken, please write down the differences you chose above using backshift operator notation.

a.) Turkish GDP from global_economy.

#plotting the data
turk <- global_economy %>%
  filter(Country == "Turkey") %>%
  mutate(GDP = (GDP/1000000)) %>%
  select(Year, GDP) 

turk %>%
  ggplot(aes(x = Year, y = GDP)) +
  geom_line(aes(y = GDP)) +
  labs(y = 'GDP (Million)', title = "GDP of Turkey")

#performing a box cox transformation
turkguer <- turk %>%
  features(GDP, features = guerrero)

turkboxcox <- turk %>%
  mutate(GDP = box_cox(GDP, turkguer))

turkboxcox %>%
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

#Taking and then graphing the first difference with the box cox transformation
turkboxcox %>%
  ggplot(aes(x = Year)) +
  geom_line(aes(y = difference(GDP))) +
  labs(title = "First Difference")

## Warning: Removed 1 row(s) containing missing values (geom_path).

turkboxcox %>%
  features(difference(GDP), unitroot_kpss)

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0889         0.1

The graph of the data shows an upward trend with random variance. The box cox transformation reduces the random variane and the first difference makes the series stationary. The low tstat value suggest the data is stationary after the tranformation and differencing. The notation for this is (1-B)yt

b.) Accommodation takings in the state of Tasmania from aus_accommodation.

tasmania <- aus_accommodation %>%
  filter(State == "Tasmania") %>%
  select(Date, Takings)

tasmania %>%
  ggplot(aes(x = Date, y = Takings)) +
  geom_line(aes(y = Takings))

tasmania %>%
  gg_tsdisplay(Takings, plot_type = 'partial')

tasmania %>%
  features(Takings, unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

tasmaniaseadiff<-tasmania %>%
  mutate(Takings = difference(Takings, 4))
  

tasmaniaseadiff %>%
  features(Takings, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0

tasmaniaseadiff %>%
  ggplot(aes(x = Date, y = Takings)) +
  geom_line(aes(y = Takings)) +
  labs(title = "Seasonally Difference")

## Warning: Removed 4 row(s) containing missing values (geom_path).

tasmaniaseadiff %>%
  features(Takings, unitroot_kpss)

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.196         0.1

The graph of the data is very seasonal and on an upward trend. The ACF and PACF show high autocorrelation in the data. This suggest the need for a seasonal difference and since the seasonal period is quarters, I took the 4th difference. I did not have to take the difference after taking the seasonal difference so the notation is (1-(B1)-(B4)+(B^5))yt.

c.) Monthly sales from souvenirs.

souvenirs %>%
  ggplot(aes(x = Month, y = Sales)) +
  geom_line(aes(y = Sales))

souvenirs %>%
  gg_tsdisplay(Sales, plot_type = 'partial')

souvgurrero <- souvenirs %>%
  features(Sales, features = guerrero)

souvboxcox <- souvenirs %>%
  mutate(Sales = box_cox(Sales, souvgurrero))

souvboxcox %>%
  features(Sales, unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

souvseadiff <- souvboxcox %>%
  mutate(Sales = difference(Sales, 12))

souvseadiff %>%
  features(Sales, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      0

souvseadiff %>%
  ggplot(aes(x = Month, y = Sales)) +
  geom_line(aes(y = Sales))

## Warning: Removed 12 row(s) containing missing values (geom_path).

souvseadiff %>%
  features(Sales, unitroot_kpss)

## # A tibble: 1 × 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1     0.240         0.1

The data has an upward trend with a yearly seasonality. There is autocorrelation in both the ACF and PACF graphs.The seasonal variation increased through time, which required a box-cox transformation. After the seasonal difference was taken, there was no need to take another difference since the data was stationary.

Question 4

Simulate and plot some data from simple ARIMA models.

a.) Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.

ar1 <- function(phi, n = 100) {
  y <- numeric(n)
  e <- rnorm(n)
  for (i in 2:n) {
    y[i] <- phi * y[i - 1] + e[i]
  }
  tsibble(idx = seq_len(n), y = y, index = idx)
}
ar1(0.5)

## # A tsibble: 100 x 2 [1]
##      idx      y
##    <int>  <dbl>
##  1     1  0    
##  2     2 -0.830
##  3     3 -0.556
##  4     4  0.134
##  5     5 -0.242
##  6     6  0.522
##  7     7  0.281
##  8     8  0.974
##  9     9 -0.939
## 10    10  2.69 
## # … with 90 more rows

autoplot(ar1(0.5))

## Plot variable not specified, automatically selected `.vars = y`

autoplot(ar1(0.6))

## Plot variable not specified, automatically selected `.vars = y`

Both graphs here could be stationary, however, the graph for ar1(0.5) has an upward trend in the beginning. This does not mean it is not stationary, it needs to be tested further. The 0.6 graph is stationary.

b.) Produce a time plot for the series. How does the plot change as you change ϕ1?

autoplot(ar1(-1))

## Plot variable not specified, automatically selected `.vars = y`

autoplot(ar1(0))

## Plot variable not specified, automatically selected `.vars = y`

autoplot(ar1(1))

## Plot variable not specified, automatically selected `.vars = y`

When phi is -1, the data is very seasonal and is not stationary. When phi is 0, the data appears to be white noise. When phi is 1, there is a significant downward trend which means it is not stationary.

c.) Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

ma1 = function(param = 0.6)
{
  seed = 0747
  y <- ts(numeric(100))
  forecast.err <- rnorm(100)
  rand.err = rnorm(100)
  for(i in 2:100)
    y[i] <- param *  forecast.err[i-1] + rand.err[i]
  return(y)
}

ma1(0.6) %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "MA1(0.6)")

## Plot variable not specified, automatically selected `.vars = value`

d.) Produce a time plot for the series. How does the plot change as you change θ1?

ma1(-1) %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "MA1(-1)")

## Plot variable not specified, automatically selected `.vars = value`

ma1(0) %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "MA1(0)")

## Plot variable not specified, automatically selected `.vars = value`

ma1(1) %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "MA1(1)")

## Plot variable not specified, automatically selected `.vars = value`

When I changed phi, there was little change in the graphs. All three of these resemble a stationary series.

e.) Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

arma1 = function(alpha = 0.6, theta = 0.6)
{
  seed = 0747
  y <- ts(numeric(100))
  e <- rnorm(100)
  rand.error = rnorm(100)
  for(i in 2:100)
    y[i] <- alpha *  y[i-1] + theta * e[i] + rand.error[i]
  return(y)
}
arma1() %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "ARMA(1,1)")

## Plot variable not specified, automatically selected `.vars = value`

f.) Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

ar2 = function(alpha1 = -0.8,  alpha2 = 0.3)
{
  seed = 0747
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 3:100)
    y[i] <- alpha1 *  y[i-1] + alpha2 *  y[i-2] + e[i]
  return(y)
}

ar2() %>%
  as_tsibble() %>%
  autoplot() +
  labs(title = "AR(2)")

## Plot variable not specified, automatically selected `.vars = value`

g.) Graph the latter two series and compare them.

The graph from part e looks stationary due to the lack of a trend, seasonality, or cycle. The graph from part f is not stationary at all. There is a high degree of autocorrelation and the variance is increasing exponentally.

Question 5

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

a.) Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

pass <- aus_airpassengers %>% 
  slice(1:42) 
pass

## # A tsibble: 42 x 2 [1Y]
##     Year Passengers
##    <dbl>      <dbl>
##  1  1970       7.32
##  2  1971       7.33
##  3  1972       7.80
##  4  1973       9.38
##  5  1974      10.7 
##  6  1975      11.1 
##  7  1976      10.9 
##  8  1977      11.3 
##  9  1978      12.1 
## 10  1979      13.0 
## # … with 32 more rows

autoplot(pass)

## Plot variable not specified, automatically selected `.vars = Passengers`

pass %>% features(Passengers, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      2

pass1 <- pass %>% 
  mutate(Passengers=difference(Passengers, 2))
autoplot(pass1)

## Plot variable not specified, automatically selected `.vars = Passengers`

## Warning: Removed 2 row(s) containing missing values (geom_path).

fitpass <- pass %>% 
  model(ARIMA(Passengers))
report(fitpass)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

gg_tsresiduals(fitpass)

fitpass %>% 
  forecast(h=10) %>% 
  autoplot(pass)

Despite the unitroot_ndiffs function suggesting a second difference, the data still does not appear stationary since it has an overall upward trend. The residuals do no appear to be white noise either. The

b.) Write the model in terms of the backshift operator.

(1-2B+B^2)Yt

c.) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

pass101 <- aus_airpassengers %>%
  model(
    ARIMA(Passengers~1+pdq(0,1,0)))

pass101 %>%
  gg_tsresiduals()

augment(pass101) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)

## # A tibble: 1 × 3
##   .model                               lb_stat lb_pvalue
##   <chr>                                  <dbl>     <dbl>
## 1 ARIMA(Passengers ~ 1 + pdq(0, 1, 0))    6.77     0.747

pass101 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

This forecast has some of the same problems as the one is part A. The residuals are not white noise. The prediction interval is much smaller here as well.

d.) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to part b. Remove the constant and see what happens.

pass212con <- aus_airpassengers %>%
  model(
    ARIMA(Passengers ~ 1 + pdq(2,1,1)))

pass212con %>%
  gg_tsresiduals()

augment(pass212con) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)

## # A tibble: 1 × 3
##   .model                               lb_stat lb_pvalue
##   <chr>                                  <dbl>     <dbl>
## 1 ARIMA(Passengers ~ 1 + pdq(2, 1, 1))    4.35     0.930

pass212con %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

pass212 <- aus_airpassengers %>%
  model(ARIMA(Passengers~0 + pdq(2,1,1)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 1))
## [1] non-stationary AR part from CSS

augment(pass212) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)

## # A tibble: 1 × 3
##   .model                               lb_stat lb_pvalue
##   <chr>                                  <dbl>     <dbl>
## 1 ARIMA(Passengers ~ 0 + pdq(2, 1, 1))      NA        NA

pass212 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf

## Warning: Removed 10 row(s) containing missing values (geom_path).

The ARIMA(2,1,2) with the constant has residuals that look similar to the previous graphs. I attempted to view the residuals of the ARIMA(2,1,2) without the constant but kept getting an error. I think this is caused by the series not being stationary when removing the constant.

e.) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

pass021con <- aus_airpassengers %>%
  model(
    ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

pass021con %>%
  gg_tsresiduals()

augment(pass021con) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)

## # A tibble: 1 × 3
##   .model                               lb_stat lb_pvalue
##   <chr>                                  <dbl>     <dbl>
## 1 ARIMA(Passengers ~ 1 + pdq(0, 2, 1))    6.91     0.734

pass021con %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

Adding the constant tightened the prediction interval, steepened the point forecast line, and slightly skwed the residuals to the left. Since the residuals do not follow a normal distribution, I think it is more appropriate to leave the constant out.

Question 6

For the United States GDP series (from global_economy):

a.) If necessary, find a suitable Box-Cox transformation for the data;

usagdp <- global_economy %>%
  filter(Code == "USA") %>%
  mutate(GDP = (GDP/1000000000)) %>%
  select(Year, GDP)

usagdp %>%
  ggplot(aes(x = Year, y = GDP)) +
  geom_line(aes(y = GDP)) +
  labs(title = "US GDP", y = "GDP ($ in Billions)")

#boxcox
usagdpguerro <- usagdp %>%
  features(GDP, features = guerrero)

usagdpbc <- usagdp %>%
  mutate(GDP = box_cox(GDP, usagdpguerro))

usagdpbc %>%
  autoplot()

## Plot variable not specified, automatically selected `.vars = GDP`

b.) fit a suitable ARIMA model to the transformed data using ARIMA();

usagdp %>%
  gg_tsdisplay(GDP, plot_type = 'partial')

usagdp %>%
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      2

usagdparima <- usagdp %>%
  model(
    'Auto' = ARIMA(GDP))

usagdparima %>%
  report()

## Series: GDP 
## Model: ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.4206  -0.3048
## s.e.   0.1197   0.1078
## 
## sigma^2 estimated as 26150:  log likelihood=-363.57
## AIC=733.14   AICc=733.61   BIC=739.22

c.) try some other plausible models by experimenting with the orders chosen;

arimamodels <- usagdp %>%
  model(
    'Auto' = ARIMA(GDP),
    '120' = ARIMA(GDP ~ pdq(1,2,0)),
    '020' = ARIMA(GDP ~ pdq(1,2,0)),
    "021" = ARIMA(GDP ~ pdq(0,2,1)),
    "121" = ARIMA(GDP ~ pdq(1,2,1)))

arimamodels %>%
  report()

## Warning in report.mdl_df(.): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.

## # A tibble: 5 × 8
##   .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 Auto   26150.   -364.  733.  734.  739. <cpl [0]> <cpl [2]>
## 2 120    30623.   -368.  740.  741.  744. <cpl [1]> <cpl [0]>
## 3 020    30623.   -368.  740.  741.  744. <cpl [1]> <cpl [0]>
## 4 021    29191.   -367.  738.  738.  742. <cpl [0]> <cpl [1]>
## 5 121    26342.   -364.  733.  734.  740. <cpl [1]> <cpl [1]>

d.) choose what you think is the best model and check the residual diagnostics;

usagdp121 <- usagdp %>%
  model(
    "AR + Diff + MA" = ARIMA(GDP ~ pdq(1,2,1))
  )
usagdparima %>%
  gg_tsresiduals()

usagdp121 %>%
  gg_tsresiduals()

augment(usagdparima) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)

## # A tibble: 1 × 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 Auto      12.2     0.273

augment(usagdp121) %>%
  features(.resid, ljung_box, lag = 10, dof = 0)

## # A tibble: 1 × 3
##   .model         lb_stat lb_pvalue
##   <chr>            <dbl>     <dbl>
## 1 AR + Diff + MA    14.1     0.169

The automatic ARIMA model produced an AMRIMA(0,2,2). This model had the lowest AICc score followed closely by the ARIMA(1,2,1) model. After plotting the residuals of both, I relized they both have a non normal distribution that is skewed to the right. The ARIMA(1,2,1) model has some autocorrelation which is why I think the ARIMA(0,2,2) is the best model.

e.) produce forecasts of your fitted model. Do the forecasts look reasonable?

usagdparima %>%
  forecast(h = 10) %>%
  autoplot(usagdp)

f.) compare the results with what you would obtain using ETS() (with no transformation).

usetscompare <- usagdp %>%
  model('Auto' = ETS(GDP),
        'Multiplicative Trend' = ETS(GDP ~ error("A") + trend("M") + season("N")))
usetscompare %>%
  report()

## Warning in report.mdl_df(.): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.

## # A tibble: 2 × 9
##   .model                  sigma2 log_lik   AIC  AICc   BIC    MSE   AMSE     MAE
##   <chr>                    <dbl>   <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl>
## 1 Auto                   4.55e-4   -377.  764.  765.  774. 27947. 1.28e5 1.68e-2
## 2 Multiplicative Trend   3.19e+4   -416.  843.  844.  853. 29698. 1.45e5 1.02e+2

Question 7

Consider aus_arrivals, the quarterly number of international visitors to Australia from several countries for the period 1981 Q1 – 2012 Q3.

a.) Select one country and describe the time plot.

arrival <- aus_arrivals %>%
  filter(Origin == "UK")

autoplot(arrival)

## Plot variable not specified, automatically selected `.vars = Arrivals`

b.) Use differencing to obtain stationary data.

arrivalseadiff <- arrival %>%
  gg_tsdisplay(difference(Arrivals, 4) %>% 
                 difference(), plot_type = "partial" , lag = 16)
arrivalseadiff

## Warning: Removed 5 row(s) containing missing values (geom_path).

## Warning: Removed 5 rows containing missing values (geom_point).

c.)What can you learn from the ACF graph of the differenced data?

Looking at the first lag, I can tell data is highly correlated with the previous observation. The next highly correlated mark is at the fourth lag, this makes sense since it is quarterly data. So the data is highly correlated with the previous period as well as the same quarter a year ago.

d.)What can you learn from the PACF graph of the differenced data?

The PACF tells a similar story as the ACF. Yt is highly correlated with the previous quarter and the quarter a year ago. The main change to this graph is the correlation between Yt and the second lag.

e.) What model do these graphs suggest?

From these graphs, I know that I would need to do a seasonal differencing.

f.) Does ARIMA() give the same model that you chose? If not, which model do you think is better?

arrivalsfit <- arrival %>%
  model(ARIMA(Arrivals))

report(arrivalsfit)

## Series: Arrivals 
## Model: ARIMA(2,0,2)(0,1,2)[4] 
## 
## Coefficients:
##          ar1      ar2      ma1     ma2     sma1    sma2
##       1.5155  -0.5924  -1.4193  0.7519  -0.7658  0.5231
## s.e.  0.1272   0.1284   0.0961  0.1127   0.1015  0.1174
## 
## sigma^2 estimated as 90018881:  log likelihood=-1299.44
## AIC=2612.87   AICc=2613.85   BIC=2632.56

arrivalsarima <- arrival %>%
  model(ARIMA(Arrivals ~ pdq(0,1,1) + PDQ(1,1,1)))

report(arrivalsarima)

## Series: Arrivals 
## Model: ARIMA(0,1,1)(1,1,1)[4] 
## 
## Coefficients:
##           ma1     sar1     sma1
##       -0.7660  -0.1992  -0.2512
## s.e.   0.0565   0.1570   0.1384
## 
## sigma^2 estimated as 99932239:  log likelihood=-1296.22
## AIC=2600.43   AICc=2600.78   BIC=2611.65

arrivalsarima2 <- arrival %>%
  model(ARIMA(Arrivals ~ pdq(1,1,0) + PDQ(1,1,1)))

report(arrivalsarima2)

## Series: Arrivals 
## Model: ARIMA(1,1,0)(1,1,1)[4] 
## 
## Coefficients:
##           ar1     sar1     sma1
##       -0.5194  -0.2305  -0.1849
## s.e.   0.0813   0.1563   0.1396
## 
## sigma^2 estimated as 126853251:  log likelihood=-1310.24
## AIC=2628.47   AICc=2628.81   BIC=2639.69

g.) Write the model in terms of the backshift operator, then without using the backshift operator.

Question 8

a.) Select a time series from Quandl. Then copy its short URL and import the data. Make sure you select a data set that is Free.

whywontthiswork <- Quandl("BP/GEO_CAP_TAFR", api_key="Zu7ectruWmSzVxsDU2Eo")

b.) Plot graphs of the data, and try to identify an appropriate ARIMA model.

whywontthiswork <- Quandl("BP/GEO_CAP_TAFR", api_key="Zu7ectruWmSzVxsDU2Eo")  %>%
  mutate(Date=rev(Date)) %>% 
  mutate(Value=rev(Value)) %>% 
  mutate(num=row_number()) %>% 
  as_tsibble(index=num) %>% 
  select(-Date)
autoplot(whywontthiswork)

## Plot variable not specified, automatically selected `.vars = Value`

whywontthiswork %>%
  features(Value, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

whywontthiswork%>%
  features(Value, unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       0

#based off of these two, I will need a first difference
 
whywontthiswork %>% 
  gg_tsdisplay(Value,plot_type = "partial")

compare <- whywontthiswork %>% 
  model(
    'Auto' = ARIMA(Value),
    '0,1,0' = ARIMA(Value ~ pdq(0,1,0)),
    '1,1,0' = ARIMA(Value ~ pdq(1,1,0)),
    "0,1,1" = ARIMA(Value ~ pdq(0,1,1)),
    "1,1,1" = ARIMA(Value ~ pdq(1,1,1))) 
compare %>% 
  report()

## Warning in report.mdl_df(.): Model reporting is only supported for individual
## models, so a glance will be shown. To see the report for a specific model, use
## `select()` and `filter()` to identify a single model.

## # A tibble: 5 × 8
##   .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 Auto    4249.   -123.  249.  250.  251. <cpl [0]> <cpl [0]>
## 2 0,1,0   4249.   -123.  249.  250.  251. <cpl [0]> <cpl [0]>
## 3 1,1,0   4203.   -122.  250.  251.  253. <cpl [1]> <cpl [0]>
## 4 0,1,1   4092.   -122.  249.  251.  253. <cpl [0]> <cpl [1]>
## 5 1,1,1   4303.   -122.  251.  254.  256. <cpl [1]> <cpl [1]>

The ARIMA(0,1,0) model has the lowest AICc.

c.) Do residual diagnostic checking of your ARIMA model. Are the residuals white noise?

quan010 <- whywontthiswork%>%
  model('Auto' = ARIMA(Value))
quan010%>%
  gg_tsresiduals()

This model does not produce white noise in the residuals. The residuals also do not follow a normal distribution.

d.) Use your chosen ARIMA model to forecast the next four years.

quan010%>%
  forecast(h=4)%>%
  autoplot(whywontthiswork)

e.) Now try to identify an appropriate ETS model.

whywontthiswork %>% 
  model(
    'Auto' = ETS(Value),
    'Simple' = ETS(Value ~ error("A") + trend("N") + season("N")),
    'Holt' = ETS(Value ~ error("A") + trend("A") + season("N")),
    'Multiplicative Trend' = ETS(Value ~ error("A") + trend("M") + season("N"))) %>% 
  report(whywontthiswork)

## Warning in report.mdl_df(., whywontthiswork): Model reporting is only supported
## for individual models, so a glance will be shown. To see the report for a
## specific model, use `select()` and `filter()` to identify a single model.

## # A tibble: 4 × 9
##   .model                  sigma2 log_lik   AIC  AICc   BIC   MSE   AMSE    MAE
##   <chr>                    <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl>
## 1 Auto                    0.0769   -125.  260.  263.  265. 4204. 11236.  0.173
## 2 Simple               5587.       -134.  274.  276.  278. 5101. 15541. 34.2  
## 3 Holt                 4871.       -132.  273.  277.  279. 4024. 10851. 38.7  
## 4 Multiplicative Trend 4697.       -131.  272.  276.  278. 3880.  9580. 40.0

f.) Do residual diagnostic checking of your ETS model. Are the residuals white noise?

etsquan<-whywontthiswork%>%
  model('Auto' = ETS(Value))
etsquan%>%
  gg_tsresiduals()

g.) Use your chosen ETS model to forecast the next four years.

etsquan%>%
  forecast(h=4)%>%
  autoplot(whywontthiswork)

h.) Which of the two models do you prefer?

Although I wouldn’t bet on either model, I prefer the Auto ETS model. It has a more reasonable residual distribution.