library(fpp3)
library(urca)
library(gridExtra)
All of the spikes lie within the blue dashed bounds, so all three plots are consistent with white noise. From the Hyndman book,
The dashed blue lines indicate whether the correlations are significantly different from zero. For a white noise series, we expect 95% of the spikes in the ACF to lie within ±2/√T where T is the length of the time series. If one or more large spikes are outside these bounds, or if substantially more than 5% of spikes are outside these bounds, then the series is probably not white noise.
The three plots differ in the width of the bounds and in the size of the sample autocorrelations. The critical values are at ±2/√T, so with more data (36 vs 360 vs 1000 observations) the bounds sit closer to zero, and the sample autocorrelations themselves are smaller because they are estimated from more observations. The plot with 1000 numbers therefore gives the most confidence that the data are not autocorrelated.
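A minimal sketch of how plots like these can be reproduced, assuming the three series are independent standard normal draws of lengths 36, 360, and 1000:
set.seed(1)
wn <- bind_rows(
  tibble(idx = 1:36,   y = rnorm(36),   n = "36 numbers"),
  tibble(idx = 1:360,  y = rnorm(360),  n = "360 numbers"),
  tibble(idx = 1:1000, y = rnorm(1000), n = "1000 numbers")
) %>%
  as_tsibble(index = idx, key = n)
# The ACF critical values shrink as the series length grows (±2/sqrt(T))
wn %>% ACF(y) %>% autoplot()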
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
azn <- gafa_stock %>% filter(Symbol == "AMZN")
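The exercise also asks for a time plot of the daily closing prices; as a sketch, the raw series can be plotted directly. The trending level, with no constant mean, is itself evidence of non-stationarity.
azn %>% autoplot(Close)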
azn %>% ACF(Close) %>% autoplot()
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
For a stationary series, the ACF drops to zero relatively quickly. Here the autocorrelations are large, positive, and decay very slowly, so the series is not stationary.
azn %>% PACF(Close) %>% autoplot()
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
For non-stationary data, the first partial autocorrelation is typically very large and positive (close to 1), as seen here. Both plots therefore indicate that the series should be differenced.
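As a sketch beyond the original answer, the suggested number of first differences and the differenced series can be checked directly:
azn %>% features(Close, unitroot_ndiffs)
azn %>% autoplot(difference(Close))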
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Turkish GDP from global_economy. Accommodation takings in the state of Tasmania from aus_accommodation. Monthly sales from souvenirs.
Turkish GDP
turkey_gdp <- global_economy %>% filter(Country=="Turkey")
turkey_gdp %>% autoplot(GDP)
turkey_gdp %>% ACF(GDP) %>% autoplot()
turkey_gdp %>%
features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
# box-cox transformation + difference
lambda <- turkey_gdp %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
turkey_gdp %>%
autoplot(difference(box_cox(GDP, lambda)))
## Warning: Removed 1 row containing missing values (`geom_line()`).
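A quick check, not part of the original output, that the transformed and differenced series is now stationary, using the KPSS statistic:
turkey_gdp %>%
  mutate(diff_gdp = difference(box_cox(GDP, lambda))) %>%
  features(diff_gdp, unitroot_kpss)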
Tasmania, Takings
tasmania_accom <- aus_accommodation %>%
filter(State=="Tasmania")
tasmania_accom %>%
autoplot(Takings)
tasmania_accom %>% features(Takings, unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
# box-cox transformation + difference
lambda <- tasmania_accom %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
tasmania_accom %>%
autoplot(difference(box_cox(Takings, lambda)))
## Warning: Removed 1 row containing missing values (`geom_line()`).
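Takings are quarterly and strongly seasonal, so it is also worth checking (a sketch beyond the original answer) whether a seasonal difference is suggested for the transformed series:
tasmania_accom %>%
  mutate(bc_takings = box_cox(Takings, lambda)) %>%
  features(bc_takings, unitroot_nsdiffs)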
Monthly sales
souv <- souvenirs
souv %>% autoplot(Sales)
lambda <- souvenirs %>%
features(Sales,features=guerrero) %>%
pull(lambda_guerrero)
souv %>% features(Sales, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
souv %>%
PACF(difference(box_cox(Sales, lambda), 12)) %>%
autoplot()
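A sketch of checking whether a further first difference is needed after the seasonal difference:
souv %>%
  mutate(diff_sales = difference(box_cox(Sales, lambda), 12)) %>%
  features(diff_sales, unitroot_ndiffs)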
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(50)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>% autoplot(Turnover)
myseries %>% ACF(Turnover) %>% autoplot()
myseries %>% features(Turnover, unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Queensland Other recreational goods retailing 1
lambda <- myseries %>%
  features(Turnover, features=guerrero) %>%
  pull(lambda_guerrero)
myseries %>% autoplot(difference(box_cox(Turnover, lambda), 12))
## Warning: Removed 12 rows containing missing values (`geom_line()`).
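As a sketch, a KPSS check that the transformed, seasonally differenced turnover is stationary (if it were not, a further first difference would be needed):
myseries %>%
  mutate(diff_turnover = difference(box_cox(Turnover, lambda), 12)) %>%
  features(diff_turnover, unitroot_kpss)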
Simulate and plot some data from simple ARIMA models.
Use the following R code to generate data from an AR(1) model with \(\phi_{1}\)=0.6 and \(\sigma^{2}\)=1. The process starts with \(y_{1}\)=0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
Produce a time plot for the series. How does the plot change as you change \(\phi_{1}\)?
sim %>% autoplot(y)
\(\phi_{1}\) controls how persistent the series is. As \(\phi_{1}\) approaches 1 the series wanders more smoothly and slowly, like a random walk; as \(\phi_{1}\) approaches 0 it looks more like white noise; and for negative \(\phi_{1}\) the series tends to oscillate around its mean.
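A sketch illustrating this, using a hypothetical helper sim_ar1() to simulate AR(1) series for several values of \(\phi_{1}\):
sim_ar1 <- function(phi, n = 100) {
  # simulate y_t = phi * y_{t-1} + e_t with y_1 = 0
  y <- numeric(n)
  e <- rnorm(n)
  for(i in 2:n)
    y[i] <- phi*y[i-1] + e[i]
  tsibble(idx = seq_len(n), y = y, index = idx)
}
p_a <- sim_ar1(0.1) %>% autoplot(y) + labs(title="phi1 = 0.1")
p_b <- sim_ar1(0.9) %>% autoplot(y) + labs(title="phi1 = 0.9")
p_c <- sim_ar1(-0.8) %>% autoplot(y) + labs(title="phi1 = -0.8")
grid.arrange(p_a, p_b, p_c)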
Write your own code to generate data from an MA(1) model with \(\theta_{1}\)=0.6 and \(\sigma^{2}\)=1.
yma1 <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
yma1[i] <- 0.6*e[i-1] + e[i]
ma1_data <- tsibble(idx = seq_len(100), y = yma1, index = idx)
Produce a time plot for the series. How does the plot change as you change \(\theta_{1}\)?
ma1_data %>% autoplot(y)
When |\(\theta_{1}\)| increases, the previous period's error has a larger influence on the current observation, so the lag-1 autocorrelation \(\theta_{1}/(1+\theta_{1}^{2})\) increases and the series looks smoother (for negative \(\theta_{1}\), more jagged). In the inverted AR(\(\infty\)) representation, |\(\theta_{1}\)| > 1 means observations further back in time have a greater influence on the current error, and |\(\theta_{1}\)| = 1 means all past observations have an equal influence, which is why we require |\(\theta_{1}\)| < 1 for invertibility.
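A matching sketch with a hypothetical helper sim_ma1() to compare different values of \(\theta_{1}\):
sim_ma1 <- function(theta, n = 100) {
  # simulate y_t = theta * e_{t-1} + e_t
  y <- numeric(n)
  e <- rnorm(n)
  for(i in 2:n)
    y[i] <- theta*e[i-1] + e[i]
  tsibble(idx = seq_len(n), y = y, index = idx)
}
grid.arrange(
  sim_ma1(0.1) %>% autoplot(y) + labs(title="theta1 = 0.1"),
  sim_ma1(0.9) %>% autoplot(y) + labs(title="theta1 = 0.9")
)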
Generate data from an ARMA(1,1) model with \(\phi_{1}\)=0.6, \(\theta_{1}\)=0.6, and \(\sigma^{2}\)=1.
y_arma <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y_arma[i] <- 0.6*y_arma[i-1] +0.6*e[i-1] + e[i]
yarma_data <- tsibble(idx = seq_len(100), y = y_arma, index = idx)
Generate data from an AR(2) model with \(\phi_{1}\)=-0.8, \(\phi_{2}\)=0.3, \(\sigma^{2}\)=1. (Note that these parameters will give a non-stationary series.)
yar2 <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
yar2[i] <- -0.8*yar2[i-1] + 0.3*yar2[i-2] + e[i]
yar2_data <- tsibble(idx = seq_len(100), y = yar2, index = idx)
Graph the latter two series and compare them.
ARMA(1,1)
p1 <- yarma_data %>% autoplot(y) + labs(title="ARMA(1,1)")
p2 <- yar2_data %>% autoplot(y) + labs(title="AR(2)")
grid.arrange(p1,p2)
The AR(2) series is non-stationary: with \(\phi_{1}\)=-0.8 and \(\phi_{2}\)=0.3 the stationarity condition \(\phi_{2}-\phi_{1}<1\) is violated, so the series oscillates in sign with swings that grow over time. Its variance is consequently far larger than that of the stationary ARMA(1,1) series.
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods. Write the model in terms of the backshift operator. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
Plot time series
aus_airpassengers %>% autoplot()
## Plot variable not specified, automatically selected `.vars = Passengers`
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
ARIMA(0,2,1) was selected automatically.
fit %>% gg_tsresiduals()
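As a sketch of a more formal check, a Ljung-Box test on the innovation residuals (dof = 1 for the single MA coefficient) should give a large p-value if the residuals resemble white noise:
augment(fit) %>% features(.innov, ljung_box, lag = 10, dof = 1)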
Forecast the next 10 periods:
train <- aus_airpassengers %>% filter_index("2000" ~ "2016")
passengers_fc <- fit %>% forecast(h=10)
passengers_fc %>%
autoplot(train, level = NULL) +
autolayer(
filter_index(aus_airpassengers, "2017" ~ .),
colour = "black"
) +
labs(
y = "Passengers (millions)",
title = "Australia Air Passengers Forecast"
) +
guides(colour = guide_legend(title = "Forecast"))
## Plot variable not specified, automatically selected `.vars = Passengers`
ARIMA(0,2,1) written out:
\(y_{t} = 2y_{t-1} - y_{t-2} - 0.8963\epsilon_{t-1} + \epsilon_{t}\), where \(\epsilon_{t}\) is white noise with \(\sigma^{2} = 4.308\).
Backshift operator version:
\((1-B)^{2}y_{t} = (1 - 0.8963B)\epsilon_{t}\)
fit2 <- aus_airpassengers %>%
model(`ARIMA(0,1,0) w/ drift` = ARIMA(Passengers ~ 1 + pdq(0,1,0)),
`ARIMA(2,1,2) w/ drift` = ARIMA(Passengers ~ 1 + pdq(2,1,2)),
`ARIMA(0,2,1) w/ constant 20` = ARIMA(Passengers ~ 20 + pdq(0,2,1))
)
## Warning: 1 error encountered for ARIMA(0,2,1) w/ constant 20
## [1] invalid model formula in ExtractVars
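The error arises because `~ 20` is not valid model syntax in fable; a constant is requested with `1`, not a numeric value. A sketch of the intended specification (note that with d = 2 a constant induces a quadratic trend, which fable warns against):
aus_airpassengers %>%
  model(`ARIMA(0,2,1) w/ constant` = ARIMA(Passengers ~ 1 + pdq(0,2,1)))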
fc2 <- fit2 |> forecast(h = 10)
fc2 %>%
autoplot(train, level = NULL) +
autolayer(
filter_index(aus_airpassengers, "2018" ~ .),
colour = "black"
) +
labs(
y = "Passengers (millions)",
title = "Australia, Air Passengers"
) +
guides(colour = guide_legend(title = "Forecast"))
## Plot variable not specified, automatically selected `.vars = Passengers`
## Warning: Removed 10 rows containing missing values (`()`).
For the United States GDP series (from global_economy):
if necessary, find a suitable Box-Cox transformation for the data; fit a suitable ARIMA model to the transformed data using ARIMA(); try some other plausible models by experimenting with the orders chosen; choose what you think is the best model and check the residual diagnostics; produce forecasts of your fitted model. Do the forecasts look reasonable? compare the results with what you would obtain using ETS() (with no transformation).
us_gdp <- global_economy %>% filter(Country=="United States")
us_gdp %>% autoplot(GDP)
There is no evidence of changing variance in the original data, so we will not apply a Box-Cox transformation at this stage.
us_gdp %>%
features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 United States 2
double_diff_gdp <- us_gdp %>% transmute(GDP = difference(difference(GDP)))
double_diff_gdp %>% features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 0
double_diff_gdp %>% autoplot(GDP)
## Warning: Removed 2 rows containing missing values (`geom_line()`).
lambda <- double_diff_gdp %>%
  filter_index("1962" ~ .) %>%
  features(GDP, features=guerrero) %>%
  pull(lambda_guerrero)
A Box-Cox transformation of the double-differenced GDP removes some of the remaining changes in variance.
double_diff_gdp %>% autoplot(box_cox(GDP, lambda))
## Warning: Removed 2 rows containing missing values (`geom_line()`).
fit3 <- double_diff_gdp %>%
model(ARIMA(box_cox(GDP, lambda)))
report(fit3)
## Series: GDP
## Model: ARIMA(1,0,1)
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 ma1
## 0.4891 -0.7693
## s.e. 0.1833 0.1175
##
## sigma^2 estimated as 8.418e+14: log likelihood=-1041.84
## AIC=2089.67 AICc=2090.12 BIC=2095.85
Residual diagnostics
fit3 %>% gg_tsresiduals()
## Warning: Removed 2 rows containing missing values (`geom_line()`).
## Warning: Removed 2 rows containing missing values (`geom_point()`).
## Warning: Removed 2 rows containing non-finite values (`stat_bin()`).
Forecasts
double_diff_gdp
## # A tsibble: 58 x 2 [1Y]
## Year GDP
## <dbl> <dbl>
## 1 1960 NA
## 2 1961 NA
## 3 1962 21800000000
## 4 1963 -8300000000
## 5 1964 13700000000
## 6 1965 10700000000
## 7 1966 13400000000
## 8 1967 -24600000000
## 9 1968 34100000000
## 10 1969 -3400000000
## # ℹ 48 more rows
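The printout above is just the double-differenced series itself; a minimal sketch of producing and plotting forecasts from the fitted model, to judge whether they look reasonable:
fit3 %>%
  forecast(h = 10) %>%
  autoplot(double_diff_gdp)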
ETS
fit4 <- double_diff_gdp %>% filter_index("1962" ~ .) %>%
model(ETS(GDP))
report(fit4)
## Series: GDP
## Model: ETS(A,N,N)
## Smoothing parameters:
## alpha = 0.0001001392
##
## Initial states:
## l[0]
## 7907460002
##
## sigma^2: 3.204469e+22
##
## AIC AICc BIC
## 3131.383 3131.844 3137.459
The ARIMA model reports a much lower AIC and residual variance, although the comparison is not exact because the ARIMA model was fitted to the Box-Cox transformed series while the ETS model was fitted to the untransformed (double-differenced) data.
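Because the two models were fitted on different scales, a fairer comparison is out-of-sample accuracy on the original GDP series; a sketch, assuming a hold-out of the final years of the data:
gdp_train <- us_gdp %>% filter_index(. ~ "2010")
gdp_fits <- gdp_train %>%
  model(arima = ARIMA(GDP), ets = ETS(GDP))
# compare forecast accuracy of both models on the held-out years
gdp_fits %>%
  forecast(h = 7) %>%
  accuracy(us_gdp)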