Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Explain the differences among these figures. Do they all indicate that the data are white noise?
Each figure shows different spikes, and the widths of the bounds differ. All three indicate white noise because every autocorrelation lies within the blue critical-value bounds at \(\pm1.96/\sqrt{T}\), where \(T\) is the length of the time series.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
From the formula above, the critical values shrink as \(T\) increases, which is why the bounds sit at different distances from zero. The sample autocorrelations differ because each panel is computed from a different random sample; with more observations, each estimated autocorrelation gets closer to zero.
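As a quick sketch of the formula (not part of the original output), the bounds for the three sample sizes can be computed directly in R:
# critical-value bounds at +/- 1.96/sqrt(T)
T <- c(36, 360, 1000)
data.frame(T = T, bound = 1.96 / sqrt(T))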
A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
gafa_stock %>%
filter(Symbol == "AMZN") %>%
gg_tsdisplay(Close, plot_type='partial') +
labs(title = "Amazon Closing Stock Price")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
gafa_stock %>%
filter(Symbol == "AMZN") %>%
features(Close, unitroot_ndiffs)
## # A tibble: 1 × 2
## Symbol ndiffs
## <chr> <int>
## 1 AMZN 1
The time plot shows a strong upward trend and no seasonality, indicating the series is non-stationary. The ACF decays very slowly from a large, positive \(r_1\), whereas the ACF of a stationary series drops to zero relatively quickly.
The PACF has a first-lag spike near 1, another sign of non-stationarity. The series should be differenced to stabilise the mean.
The KPSS-based unitroot_ndiffs feature confirms that one difference is needed for stationarity.
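As confirmation, a sketch (not in the original) displaying the differenced series, following the same pattern used in later exercises:
gafa_stock %>%
filter(Symbol == "AMZN") %>%
gg_tsdisplay(difference(Close), plot_type='partial') +
labs(title = "Differenced Amazon Closing Stock Price")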
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Turkish GDP from global_economy.
# plot
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "Non-transformed Turkish GDP")
# calculate lambda
lambda <- global_economy %>%
filter(Country == "Turkey") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
global_economy %>%
filter(Country == "Turkey") %>%
features(box_cox(GDP,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
Turkish GDP is non-seasonal and non-stationary. The KPSS-based test recommends a single difference; since the data are annual, there is no seasonal differencing to consider, so we take one first difference.
# transformed plot
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(difference(box_cox(GDP,lambda)), plot_type='partial')
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
After one difference of the Box-Cox transformed series, the plots suggest the data are stationary.
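An optional confirmation (not in the original): a KPSS test on the differenced series should give a small test statistic and a large p-value:
global_economy %>%
filter(Country == "Turkey") %>%
features(difference(box_cox(GDP, lambda)), unitroot_kpss)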
Accommodation takings in the state of Tasmania from aus_accommodation.
# plot
aus_accommodation %>%
filter(State == "Tasmania") %>%
gg_tsdisplay(Takings, plot_type='partial') +
labs(title = "Non-transformed Tasmania Accomodation Takings")
# calculate lambda
lambda <- aus_accommodation %>%
filter(State == "Tasmania") %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
#unit root test
aus_accommodation %>%
filter(State == "Tasmania") %>%
features(box_cox(Takings,lambda), unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
Accommodation takings in Tasmania are seasonal and non-stationary. The unitroot_nsdiffs feature recommends one seasonal difference; since the data are quarterly, we difference at lag 4.
aus_accommodation %>%
filter(State == "Tasmania") %>%
gg_tsdisplay(difference(box_cox(Takings, lambda), 4), plot_type='partial')
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
After one seasonal difference the plots suggest the series is stationary: the ACF decays and the PACF cuts off after the first lag.
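It is also worth confirming that no further first difference is needed after the seasonal difference; a sketch using the same lambda (not in the original):
aus_accommodation %>%
filter(State == "Tasmania") %>%
features(difference(box_cox(Takings, lambda), 4), unitroot_ndiffs)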
Monthly sales from souvenirs.
# plot
souvenirs %>%
gg_tsdisplay(Sales, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Monthly Souvenir Sales")
# calculate lambda
lambda <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
souvenirs %>%
features(box_cox(Sales,lambda), unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
Monthly souvenir sales are seasonal and non-stationary. The unitroot_nsdiffs feature recommends one seasonal difference; since the data are monthly, we difference at lag 12.
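The differenced series was not plotted in the original; a sketch mirroring the earlier examples:
souvenirs %>%
gg_tsdisplay(difference(box_cox(Sales, lambda), 12), plot_type='partial', lag = 36)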
It appears stationary after one seasonal difference: the ACF decays quickly and the PACF cuts off after the first lag.
For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(43)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
# plot
myseries %>%
gg_tsdisplay(Turnover, plot_type='partial', lag = 12) +
labs(title = "Non-transformed Retail Turnover")
# lambda calculation
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
myseries %>%
features(box_cox(Turnover, lambda), unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Western Australia Hardware, building and garden supplies retailing 1
myseries %>%
gg_tsdisplay(difference(box_cox(Turnover,lambda), 12), plot_type='partial', lag = 12)
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).
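The write-up stops at the seasonal difference; a sketch (not in the original) to confirm whether an additional first difference is needed, where a result of 0 means none is:
myseries %>%
features(difference(box_cox(Turnover, lambda), 12), unitroot_ndiffs)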
Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).
set.seed(54)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6 * y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
# phi = 0.6
sim %>%
autoplot(y) +
labs(title = "ϕ = 0.6")
# phi = 1
for(i in 2:100)
y[i] <- 1*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "ϕ = 1")
# phi = 0
for(i in 2:100)
y[i] <- 0*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "ϕ = 0")
# phi = -1
for(i in 2:100)
y[i] <- -1*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "ϕ = -1")
Changing \(\phi_1\) changes the pattern markedly. When \(\phi_1 = 0\) the series is white noise; when \(\phi_1 = 0.6\) successive values are positively correlated, so the series is smoother; when \(\phi_1 = 1\) it becomes a random walk (non-stationary); and when \(\phi_1 = -1\) the series oscillates around the mean, alternating between highs and lows.
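The effect of \(\phi_1\) also shows up in the ACF; for example (a supplementary check, not in the original):
sim %>%
ACF(y) %>%
autoplot() +
labs(title = "ACF of the simulated AR(1) with ϕ = 0.6")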
Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
set.seed(34)
e <- rnorm(100)
y <- numeric(100)
y[1] <- e[1]
for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim2 %>%
autoplot(y)
Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
# theta = 1
for(i in 2:100)
  y[i] <- e[i] + 1 * e[i-1]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim2 %>%
autoplot(y) +
labs(title = "θ = 1")
# theta = 0
for(i in 2:100)
  y[i] <- e[i] + 0 * e[i-1]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim2 %>%
autoplot(y) +
labs(title = "θ = 0")
As \(\theta_1\) decreases toward 0 the series looks more like white noise; larger values of \(\theta_1\) increase the correlation between consecutive observations, smoothing the series around its mean.
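As a supplementary check (not in the original), rebuilding the \(\theta_1 = 0.6\) series and plotting its ACF should show the MA(1) signature of a single significant spike at lag 1:
set.seed(34)
e <- rnorm(100)
y <- numeric(100)
y[1] <- e[1]
for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
ACF(y) %>%
autoplot() +
labs(title = "ACF of the simulated MA(1) with θ = 0.6")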
Generate data from an ARMA(1,1) model with \(\phi_1=0.6,\space \theta_1=0.6, \space\sigma^2=1\)
set.seed(76)
e <- rnorm(100)
y <- numeric(100)
for(i in 2:100)
  y[i] <- 0.6 * y[i-1] + e[i] + 0.6 * e[i-1]
arma1_1 <- tsibble(idx = seq_len(100), y = y, index = idx)
Generate data from an AR(2) model with \(\phi_1=-0.8,\space \phi_2=0.3, \space\sigma^2=1\). (Note that these parameters will give a non-stationary series.)
set.seed(761)
e <- rnorm(100)
y <- numeric(100)
for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
Graph the latter two series and compare them.
arma1_1 %>%
gg_tsdisplay(y, plot_type='partial') +
labs(title = "ARMA(1,1)")
ar2 %>%
gg_tsdisplay(y, plot_type='partial') +
labs(title = "AR(2)")
The ARMA(1,1) series is stationary, while the AR(2) series is not: it oscillates with exponentially growing amplitude towards the end.
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
fit <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
ARIMA(0,2,1) was the selected model.
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers for ARIMA(0,2,1)", y = "Passengers (in millions)")
fit %>%
gg_tsresiduals() +
labs(title = "Residualsfor ARIMA(0,2,1)")
The residuals do look like white noise, hovering around a mean of approximately 0.
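A formal residual check (not shown in the original) is the Ljung-Box test; with one estimated MA coefficient we set dof = 1:
augment(fit) %>%
features(.innov, ljung_box, lag = 10, dof = 1)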
Write the model in terms of the backshift operator. The fitted ARIMA(0,2,1) takes a second difference of \(y_t\) and models it as an MA(1) with \(\theta_1 = -0.8756\):
\[(1 - B)^2 y_t = (1 - 0.8756\,B)\,\varepsilon_t\]
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
fit2 <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ pdq(0,1,0)))
fit2 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "ARIMA(0,1,0)", y = "Passengers (in millions)")
fit2 %>%
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(0,1,0)")
The forecasts from part a sit above the actual values, while the ARIMA(0,1,0) with drift forecasts sit below them; the drift model's forecast trend is also flatter than in part a.
Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
fit3 <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ pdq(2,1,2)))
fit3 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers for ARIMA(2,1,2)", y = "Passengers (in millions)")
fit3 %>%
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(2,1,2)")
The forecasts fall between those of parts a and c, and the residuals still resemble white noise.
#removing constant
fit4 <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 0 + pdq(2,1,2), method="ML"))
report(fit4)
## Series: Passengers
## Model: ARIMA(2,1,2)
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 1.8130 -0.8156 -1.9641 0.9998
## s.e. 0.1321 0.1320 0.1294 0.1309
##
## sigma^2 estimated as 4.203: log likelihood=-88.19
## AIC=186.37 AICc=188.09 BIC=194.94
fit4 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers for ARIMA(2,1,2)", y = "Passengers (in millions)and 0 constant")
fit4 %>%
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(2,1,2) and 0 constant")
When the constant is removed, the default estimation returns a null model, so estimation has to be forced with method = "ML". With d = 1 and no constant, the forecast function includes a polynomial trend of order d−1 = 0, so the long-run forecasts level off rather than continuing the historical upward trend in passenger numbers.
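One way to see this (a hypothetical comparison, not in the original) is to extend the forecast horizon for both fits:
fit3 %>% forecast(h = 30) %>% autoplot(aus_airpassengers) + labs(title = "ARIMA(2,1,2) with constant, h = 30")
fit4 %>% forecast(h = 30) %>% autoplot(aus_airpassengers) + labs(title = "ARIMA(2,1,2), no constant, h = 30")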
Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit5 <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
fit5 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers for ARIMA(0,2,1) with a constant", y = "Passengers (in millions)")
fit5 %>%
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(0,2,1) with a constant")
Including a constant with d = 2 induces a quadratic trend in the forecast function (hence the warning), which is generally discouraged. Compared with part a, the forecasts now curve upward more steeply and sit above the recent observations, whereas without the constant they tracked the data more closely.
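In backshift notation the model with a constant has the form below; double-integrating the constant \(c\) is what produces the quadratic trend in the forecasts:
\[(1 - B)^2 y_t = c + (1 + \theta_1 B)\,\varepsilon_t\]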
For the United States GDP series (from global_economy):
if necessary, find a suitable Box-Cox transformation for the data;
us_gdp <- global_economy %>%
filter(Code == "USA")
us_gdp %>%
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "United States GDP")
The series grows roughly exponentially, which motivates a Box-Cox transformation; the slowly decaying ACF also shows the series is non-stationary:
lambda <- forecast::BoxCox.lambda(us_gdp$GDP)
us_gdp$BCGDP = forecast::BoxCox(us_gdp$GDP, lambda)
us_gdp %>%
gg_tsdisplay(BCGDP, plot_type='partial') +
labs(title = "Transformed United States GDP")
The transformed data are now closer to linear than exponential, but differencing is still needed for stationarity.
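A quick check of the required order of differencing for the transformed series (not in the original output):
us_gdp %>%
features(BCGDP, unitroot_ndiffs)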
fit a suitable ARIMA model to the transformed data using ARIMA();
fit <- us_gdp %>%
model(ARIMA(BCGDP))
report(fit)
## Series: BCGDP
## Model: ARIMA(1,1,0) w/ drift
##
## Coefficients:
## ar1 constant
## 0.4586 118.2764
## s.e. 0.1198 9.5123
##
## sigma^2 estimated as 5488: log likelihood=-325.37
## AIC=656.74 AICc=657.19 BIC=662.87
An ARIMA(1,1,0) model with drift was fitted to the transformed data.
try some other plausible models by experimenting with the orders chosen;
Let’s try a few:
usa_fit <- us_gdp %>%
model(ARIMA110 = ARIMA((BCGDP) ~ pdq(1,1,0)),
ARIMA1100 = ARIMA((BCGDP) ~ 0 + pdq(1,1,0)),
ARIMA1101 = ARIMA((BCGDP) ~ 1 + pdq(1,1,0)),
ARIMA120 = ARIMA((BCGDP) ~ pdq(1,2,0)),
ARIMA210 = ARIMA((BCGDP) ~ pdq(2,1,0)),
ARIMA212 = ARIMA((BCGDP) ~ pdq(2,1,2)),
ARIMA111 = ARIMA((BCGDP) ~ pdq(1,1,1)))
glance(usa_fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 7 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA120 6791. -326. 656. 656. 660.
## 2 ARIMA110 5488. -325. 657. 657. 663.
## 3 ARIMA1101 5488. -325. 657. 657. 663.
## 4 ARIMA111 5589. -325. 659. 660. 667.
## 5 ARIMA210 5589. -325. 659. 660. 667.
## 6 ARIMA212 5744. -325. 662. 664. 674.
## 7 ARIMA1100 7049. -334. 672. 672. 676.
The models have similar AIC and AICc values, with ARIMA(1,2,0) lowest and ARIMA(1,1,0) second. Note, however, that information criteria are only comparable between models with the same order of differencing, so the d = 2 value is not directly comparable with the d = 1 models.
choose what you think is the best model and check the residual diagnostics;
We will look at ARIMA(1,2,0) since it had the best AICc (bearing in mind the caveat above).
usa_fit %>%
select(ARIMA120) %>%
gg_tsresiduals() +
ggtitle("Residuals for ARIMA(1,2,0)")
The residuals resemble white noise, and their histogram is roughly normal with a slight left skew.
produce forecasts of your fitted model. Do the forecasts look reasonable?
usa_fit %>%
forecast(h=5) %>%
filter(.model=='ARIMA120') %>%
autoplot(us_gdp)
The forecasts look reasonable: they continue the increasing trend of the series.
compare the results with what you would obtain using ETS() (with no transformation).
fit_ets <- us_gdp %>%
model(ETS(GDP))
report(fit_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fit_ets %>%
forecast(h=5) %>%
autoplot(global_economy)
The ETS AICc is much larger, but AICc values are not comparable between ETS and ARIMA models (and here the ARIMA was fitted to Box-Cox transformed data), so this does not by itself show the ARIMA model is better. A fair comparison would use a test set or time series cross-validation.
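A sketch of such a comparison using one-step time series cross-validation (assuming the lambda computed above; this block was not in the original):
us_gdp %>%
stretch_tsibble(.init = 30, .step = 1) %>%
model(ARIMA120 = ARIMA(box_cox(GDP, lambda) ~ pdq(1,2,0)),
      ETS = ETS(GDP)) %>%
forecast(h = 1) %>%
accuracy(us_gdp)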