Per Hyndman, for white noise series, we expect each autocorrelation to be close to zero and mostly in bounds of the dotted lines. This is the case with these three plots.
This is due to the varying amounts of observations, which defines the length of the time series. In a series with a larger T, hence larger number of observations, the bounds will be smaller and coefficients of the autocorrelation will also decrease.
library(fpp3)
## Warning: package 'fpp3' was built under R version 4.2.3
## ── Attaching packages ────────────────────────────────────────────── fpp3 0.5 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.4
## ✔ dplyr 1.1.3 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.0 ✔ feasts 0.3.1
## ✔ lubridate 1.9.3 ✔ fable 0.3.3
## ✔ ggplot2 3.4.4 ✔ fabletools 0.4.0
## Warning: package 'tibble' was built under R version 4.2.3
## Warning: package 'dplyr' was built under R version 4.2.3
## Warning: package 'tidyr' was built under R version 4.2.3
## Warning: package 'lubridate' was built under R version 4.2.3
## Warning: package 'ggplot2' was built under R version 4.2.3
## Warning: package 'tsibble' was built under R version 4.2.3
## Warning: package 'tsibbledata' was built under R version 4.2.3
## Warning: package 'feasts' was built under R version 4.2.3
## Warning: package 'fabletools' was built under R version 4.2.3
## Warning: package 'fable' was built under R version 4.2.3
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
library(dplyr)
library(ggplot2)
library(urca)
## Warning: package 'urca' was built under R version 4.2.3
gafa_stock %>%
filter(Symbol == "AMZN") %>%
gg_tsdisplay(Close, plot_type='partial')
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
There is an upwards trend but no seasonality, which is required for a
stationary series. The ACF plot is not normally distributed and the
coefficients are all outside of the upper bounds and has a slight
decrease towards zero over time indicating non-stationary data. Per
Hyndman, in the PACF, a significant spike at lag p, as shown above, is
also indicative.
tk_gdp <- global_economy %>%
filter(Country=='Turkey')
lambda_tk_gdp <- tk_gdp %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
tk_gdp %>%
mutate(GDP = box_cox(GDP, lambda_tk_gdp)) %>%
features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
The lambda returned for an appropriate box-cox transformation on the turkey gdp data is 0.1572. In addition, the unit root test on the transformed data returns 1 difference required to make the series stationary.
tasmania_accommodation <- aus_accommodation %>%
filter(State == 'Tasmania')
lambda_tasmania <- tasmania_accommodation %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
tasmania_accommodation %>%
mutate(Takings = box_cox(Takings, lambda_tasmania)) %>%
features(Takings, unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
The lambda returned for an appropriate box-cox transformation on the Tasmania accommodation takings data is 0.002. In addition, the unit root test on the transformed data returns 1 difference required to make the series stationary.
lambda_souvenirs <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
souvenirs %>%
mutate(Sales = box_cox(Sales, lambda_souvenirs)) %>%
features(Sales, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
The lambda returned for an appropriate box-cox transformation on the souvenirs data is 0.002. In addition, the unit root test on the transformed data returns 1 difference required to make the series stationary.
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
gg_tsdisplay(Turnover, plot_type='partial')
There is an upwards trend but no seasonality, which is required for a stationary series. The ACF plot is not normally distributed and the coefficients are all outside of the upper bounds and has a slight decrease towards zero over time indicating non-stationary data. Per Hyndman, in the PACF, a significant spike at lag p, as shown above, is also indicative.
We can apply transformations and find ndiffs required to make it stationary.
lambda_myseries <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries %>%
mutate(Turnover = box_cox(Turnover, lambda_myseries)) %>%
features(Turnover, unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing 1
The lambda returned for an appropriate box-cox transformation on the retao;data is 0.27. In addition, the unit root test on the transformed data returns 1 difference required to make the series stationary.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
#at phi = 0.6
sim %>%
autoplot(y)
# at phi = 0.1
for(i in 2:100)
y[i] <- 0.1*y[i-1] + e[i]
sim_0.1 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_0.1 %>% autoplot(y)
# at phi = 0.5
for(i in 2:100)
y[i] <- 0.5*y[i-1] + e[i]
sim_0.5 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_0.5 %>% autoplot(y)
# at phi = 1
for(i in 2:100)
y[i] <- y[i-1] + e[i]
sim_1 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_1 %>% autoplot(y)
There is more seasonality in phi = 0.1 than in phi = 0.5 and more in phi
= 0.5 than phi = 1, indicating more stationarity as phi is lower.
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
ma.1 <- tsibble(idx = seq_len(100), y = y, index = idx)
for(i in 2:100)
y[i] <- 0.1*e[i-1] + e[i]
ma1.b <- tsibble(idx = seq_len(100), y = y, index = idx)
ma1.b %>%
autoplot(y)
for(i in 2:100)
y[i] <- .5*e[i-1] + e[i]
ma1.c <- tsibble(idx = seq_len(100), y = y, index = idx)
ma1.c %>%
autoplot(y)
for(i in 2:100)
y[i] <- e[i-1] + e[i]
ma1.d<- tsibble(idx = seq_len(100), y = y, index = idx)
ma1.d %>%
autoplot(y)
There is not much change as θ increases or decreases, which means the
data is probably stationary.
e <- rnorm(100)
y <- numeric(100)
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]
arma <- tsibble(idx = seq_len(100), y = y, index = idx)
arma %>% autoplot(y)
### f. Generate data from an AR(2) model with ϕ1=-0.8, ϕ2=0.3 and σ2=1.
(Note that these parameters will give a non-stationary series.)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
ar <- tsibble(idx = seq_len(100), y = y, index = idx)
ar %>% autoplot(y)
### g. Graph the latter two series and compare them.
gg_tsdisplay(arma,plot_type = "partial")
## Plot variable not specified, automatically selected `y = y`
gg_tsdisplay(ar,plot_type = "partial")
## Plot variable not specified, automatically selected `y = y`
The ARMA model appears to be stationary while AR appears to be
non-stationary since we can observe the variance is not constant but
increasing when as the series goes on.
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
ARIMA(0,2,1)
fit %>%
gg_tsresiduals()
The residuals are white noise
fit %>% forecast(h=10) %>%
autoplot(aus_airpassengers)
yt=−0.8963 x E(t−1)+E(t)
fit_010 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1+pdq(0,1,0))) %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
fit_010
fit_212 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1+pdq(2,1,2))) %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
fit_212
fit_212_noconstant <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
Receiving an error
fit_021 <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
We get a warning when using a constant on the ARIMA (0,2,1) model.
us_gdp <- global_economy %>%
filter(Country == "United States")
us_gdp%>%
autoplot(GDP)
lambda_usgdp <- us_gdp %>%
features(GDP,features=guerrero) %>%
pull(lambda_guerrero)
lambda_usgdp
## [1] 0.2819443
us_gdp %>%
autoplot(box_cox(GDP, lambda_usgdp))
fit_usgdp <- us_gdp %>%
model(ARIMA(box_cox(GDP, lambda_usgdp)))
report(fit_usgdp)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda_usgdp)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
ARIMA(1,1,0) w/drift seems to be the best fit
fit_a <- us_gdp %>%
model(ARIMA(GDP ~ pdq(1,2,2)))
report(fit_a)
## Series: GDP
## Model: ARIMA(1,2,2)
##
## Coefficients:
## ar1 ma1 ma2
## 0.2053 -0.5912 -0.1928
## s.e. 0.3008 0.2886 0.2102
##
## sigma^2 estimated as 2.646e+22: log likelihood=-1523.86
## AIC=3055.72 AICc=3056.51 BIC=3063.82
fit_b <- us_gdp %>%
model(ARIMA(GDP ~ pdq(2,2,2)))
report(fit_b)
## Series: GDP
## Model: ARIMA(2,2,2)
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 1.3764 -0.4780 -1.9659 1.0000
## s.e. 0.1216 0.1354 0.0723 0.0719
##
## sigma^2 estimated as 2.283e+22: log likelihood=-1521.14
## AIC=3052.27 AICc=3053.47 BIC=3062.4
fit_c <- us_gdp %>%
model(ARIMA(GDP ~ pdq(1,1,0)))
report(fit_c)
## Series: GDP
## Model: ARIMA(1,1,0)
##
## Coefficients:
## ar1
## 0.9160
## s.e. 0.0522
##
## sigma^2 estimated as 3.044e+22: log likelihood=-1556.73
## AIC=3117.47 AICc=3117.69 BIC=3121.55
Trying other values, the lowest AIC is still from part b of the question with the ARIMA(1,1,0) w/drift.
fit_110 <- us_gdp %>%
model(ARIMA(box_cox(GDP, lambda_usgdp) ~ 1 + pdq(1, 1, 0)))
fit_110 %>%
report()
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda_usgdp)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
fit_110 %>%
gg_tsresiduals()
The residuals are right skewed however there is no autocorrelation,
denoting white noise.
fit_110 %>% forecast(h=10) %>%
autoplot(us_gdp)
Appears reasonable given the continuous upwards trend
fit_ets <- us_gdp %>%
model(ETS(GDP))
report(fit_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fit_ets %>% forecast(h = 10) %>%
autoplot(us_gdp)
Looking at the AIC, the ARIMA(1,1,0) model is lower, suggesting a better
performing model.