These plots have different confidence intervals for their autocorrelation coefficients. The plots do indicate that the data are white noise because the autocorrelation lines are randomly distributed have no obvious pattern and no lines breach the 95% confidence interval.
The length of the time series, T, is part of the calculation for critical values. The underlying time series of each plot has a different T, which leads to varying critical values.
amzn_close_px <- gafa_stock %>%
filter(Symbol=='AMZN')
amzn_close_px %>%
gg_tsdisplay(Close, plot_type="partial")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
The AMZN closing stock price plot has a clear trend and levels that are dependent on which period the time series is observed.
The ACF plot shows a high level of positive correlation in closing price out to 30 lags, with autocorrelation values of 1 or close to 1. All ACF values are far outside the 95% confidence interval. The PACF chart has some statistically significant autocorrelation spikes that are sort of random.
Removing the upward trend and the positive, direct relationship in value between lags are reasons to difference this data.
turkish_gdp <- global_economy %>%
filter(Country=="Turkey")
turkish_gdp %>%
gg_tsdisplay(GDP, plot_type = "partial")
There is an upward trend and the ACF and PACF both indicate non-stationary data (high positive autocorrelation). There is no apparent seasonality.
lambda <- BoxCox.lambda(turkish_gdp$GDP, method = "guerrero")
turkish_gdp %>%
gg_tsdisplay(difference(box_cox(GDP,lambda),1),plot_type = "partial")
## Warning: Removed 1 row(s) containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).
The differenced and transformed data appears stationary, with relatively constant variance and no discernible trend or seasonality. There are no significant autocorrelations in either ACF plot.
tas_takings <- aus_accommodation %>%
filter(State=="Tasmania")
tas_takings %>%
gg_tsdisplay(Takings,plot_type="partial")
There is clear seasonality, an upward trend, and an increasing variance depending on level. The autocorrelation plots show statistically significant autocorrelations and partial autocorrelations out to a relatively high number.
Let’s try a seasonal differencing.
lambda <- BoxCox.lambda(tas_takings$Takings, method = "guerrero")
tas_takings %>%
gg_tsdisplay(difference(box_cox(Takings,lambda),12) %>% difference(), plot_type="partial", lag=36)
## Warning: Removed 13 row(s) containing missing values (geom_path).
## Warning: Removed 13 rows containing missing values (geom_point).
Data required a second differencing with a BoxCox transformation to appear stationary. Probably could have saved time with an nsdiffs test on the log of Takings and the differenced Takings.
souvenirs %>%
gg_tsdisplay(plot_type = "partial")
## Plot variable not specified, automatically selected `y = Sales`
Souvenir sales data is seasonal, has increasing variance, and an upward trend. Clear patterns in ACF plots.
Let’s find lambda and test for seasonality and differencing transformation indicators.
lambda <- BoxCox.lambda(souvenirs$Sales, method = "guerrero")
souvenirs %>%
mutate(boxcox_sales = box_cox(Sales,lambda)) %>%
features(boxcox_sales, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
souvenirs %>%
mutate(boxcox_sales = difference(box_cox(Sales,lambda),12)) %>%
features(boxcox_sales, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 0
These results indicate that only a seasonal differencing is necessary, no additional.
souvenirs %>%
gg_tsdisplay(difference(box_cox(Sales,lambda),12),plot_type = "partial",lag=36)
## Warning: Removed 12 row(s) containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_point).
Data now appears stationary and the noticeable spikes in the ACF and PACF plots would give us clues as to which configuration should work for ARIMA.
set.seed(81023948)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
gg_tsdisplay(Turnover,plot_type="partial")
Not stationary.
lambda <- BoxCox.lambda(myseries$Turnover, method = "guerrero")
myseries %>%
mutate(boxcox_turnover = box_cox(Turnover,lambda)) %>%
features(boxcox_turnover, unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Tasmania Clothing retailing 1
myseries %>%
mutate(boxcox_turnover = difference(box_cox(Turnover,lambda),12)) %>%
features(boxcox_turnover, unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Tasmania Clothing retailing 0
These tests indicate a seasonal differencing only.
myseries %>%
gg_tsdisplay(difference(log(Turnover),12) %>%
difference(),plot_type = "partial")
## Warning: Removed 13 row(s) containing missing values (geom_path).
## Warning: Removed 13 rows containing missing values (geom_point).
Although the tests appeared to indicate only a seasonal differencing was needed, a stationary state was reached by using a first order differencing in addition to the seasonal differencing.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)
for(i in 2:100)
y[i] <- 0.05*y[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
for(i in 2:100)
y[i] <- 0.99*y[i-1] + e[i]
sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim1 %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = y`
sim2 %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = y`
sim3 %>%
autoplot()
## Plot variable not specified, automatically selected `.vars = y`
A higher ϕ1 reduces variation
for(i in 2:100)
y[i] <- e[i] + 0.6*e[i-1]
sim4 <- tsibble(idx = seq_len(100), y=y, index=idx)
for(i in 2:100)
y[i] <- e[i] + 0.05*e[i-1]
sim5 <- tsibble(idx = seq_len(100), y=y, index=idx)
for(i in 2:100)
y[i] <- e[i] + 0.99*e[i-1]
sim6 <- tsibble(idx = seq_len(100), y=y, index=idx)
sim4 %>%
autoplot
## Plot variable not specified, automatically selected `.vars = y`
sim5 %>%
autoplot
## Plot variable not specified, automatically selected `.vars = y`
sim6 %>%
autoplot
## Plot variable not specified, automatically selected `.vars = y`
A higher θ1 introduces more variation
for(i in 2:100)
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]
sim7 <- tsibble(idx = seq_len(100), y=y, index=idx)
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
sim8 <- tsibble(idx = seq_len(100), y=y, index=idx)
sim7 %>% autoplot
## Plot variable not specified, automatically selected `.vars = y`
sim8 %>% autoplot
## Plot variable not specified, automatically selected `.vars = y`
The AR model (trumpet shaped) has a zero mean throughout the series but variance that is changing with the level. The ARMA model appears stationary.
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
The ARIMA() function selects an ARIMA(0,2,1) model.
fit %>%
gg_tsresiduals()
The residuals do look like white noise.
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
P = 0
D = 2
Q = 1
(1-B)^2 Yt = c + (1+θ1B)Et
fit2 <- aus_airpassengers %>%
model(arima010 = ARIMA(Passengers ~ 1 + pdq(0,1,0)))
report(fit2)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
fit2 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
The forecast appears roughly the same.
fit3 <- aus_airpassengers %>%
model(arima212 = ARIMA(Passengers ~ 1 + pdq(2,1,2)))
report(fit3)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fit3 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
Removing the constant:
fit4 <- aus_airpassengers %>%
model(arima212 = ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for arima212
## [1] non-stationary AR part from CSS
report(fit3)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fit4 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning
## -Inf
## Warning: Removed 10 row(s) containing missing values (geom_path).
No forecast is produced due to a warning. Without the constant the data must not be stationary.
fit5 <- aus_airpassengers %>%
model(arima021 = ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit3)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fit5 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
The slope of the forecast seems to be steeper, like it hasn’t been as dampened as the others.
us_gdp <- global_economy %>%
filter(Country=="United States")
us_gdp %>%
autoplot(GDP)
lambda <- BoxCox.lambda(us_gdp$GDP,method="guerrero")
fit_us_gdp <- us_gdp %>%
model(ARIMA(box_cox(GDP,lambda)))
report(fit_us_gdp)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.2764
## s.e. 0.1198 9.5123
##
## sigma^2 estimated as 5488: log likelihood=-325.37
## AIC=656.74 AICc=657.19 BIC=662.87
ARIMA(1,1,0) selected.
us_gdp_multi_fit <- us_gdp %>%
model(arima_orig = ARIMA(box_cox(GDP,lambda)),
arima011 = ARIMA(box_cox(GDP,lambda) ~ pdq(0,1,1)),
arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
arima010 = ARIMA(box_cox(GDP,lambda) ~ pdq(0,1,0)))
glance(us_gdp_multi_fit)
## # A tibble: 4 × 9
## Country .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <fct> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 United States arima_orig 5488. -325. 657. 657. 663. <cpl [1]> <cpl [0]>
## 2 United States arima011 5698. -326. 659. 659. 665. <cpl [0]> <cpl [1]>
## 3 United States arima210 5589. -325. 659. 660. 667. <cpl [2]> <cpl [0]>
## 4 United States arima010 6785. -332. 668. 668. 672. <cpl [0]> <cpl [0]>
The auto-selected model appears to be the best based on having the lowed AICc value (though the others don’t appear to be far off).
fit_us_gdp %>%
gg_tsresiduals()
Residuals appear to be white noise.
fit_us_gdp %>%
forecast(h=10) %>%
autoplot(us_gdp)
The forecast does appear reasonable, the trend continues wth a similar slope and an exponential shape.
fit_ets <- us_gdp %>%
model(ETS(GDP))
report(fit_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fit_ets %>%
forecast(h=10) %>%
autoplot(us_gdp)
ETS model appears to have much wider confidence intervals compared to the ARIMA model.