library(tidyverse)
library(fpp3)
## Warning: package 'tsibble' was built under R version 4.2.3
library(tseries)
library(forecast)
library(kableExtra)
library(reactable)
library(seasonal)
library(tsibble)
knitr::include_graphics("/Users/mohamedhassan/Downloads/Figure 9.32.png")
Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.
As the number of observations increases, the confidence intervals narrow. The figure with 36 random numbers have wide confidence intervals, and as the amount of random numbers increase, the confidence intervals narrow towards zero. Each figure does have data that indicates white noise, since the autocorrelations fall within the confidence intervals and they appear to be randomized with no distinct pattern.
The formula for critical values is \(\pm1.96 \sqrt{T}\), where \(T\) is the number of observations. As sample size \(T\) increases, the critical values decrease. Therefore, the confidence intervals for 360 and 1,000 random numbers have smaller confidence intervals than the figure for 36 random numbers.
gafa_stock), along with the ACF and PACF. Explain how each
plot shows that the series is non-stationary and should be
differenced.amazon_stock <- gafa_stock |>
filter(Symbol == "AMZN")
amazon_stock |>
autoplot(Close)
amazon_stock |>
gg_tsdisplay(Close, plot_type = "partial")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
When examining the autoplot, it’s clear that the data is not stationary due to its change over time. If the data was stationary, we would see a roughly horizontal pattern with constant variance. While there appears to be constant variance, the data shows that there is predictably patterns with its upward diagonal trajectory. The ACF plot shows the data barely decreasing along the lags, which is indicative of non-stationarity. The PACF plot has one long spike at lag(1), with smaller spikes beyond the confidence intervals. Differencing can help stabilize the mean by removing changes in the level of the time series, and therefore eliminate (or reduce) trend and/or seasonality (Hyndman, Athanasopoulos 9.1).
Using the KPSS test, we can check if the data is non-stationary:
amazon_stock |>
features(Close, unitroot_kpss)
## # A tibble: 1 × 3
## Symbol kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 AMZN 14.0 0.01
The kpss_pvalue is 0.01, which indicates that the null hypothesis is rejected and that the data is not stationary. We can use the unitroot_ndiffs function to determine how many times the data should be differenced in order to achieve stationarity:
amazon_stock |>
features(Close, unitroot_ndiffs)
## # A tibble: 1 × 2
## Symbol ndiffs
## <chr> <int>
## 1 AMZN 1
amazon_stock |>
gg_tsdisplay(difference(Close), plot_type = "partial")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
Differencing the data has made it stationary, with the autoplot appearing to be roughly horizontal and looking the same over time. Although there are spikes beyond the confidence intervals in the ACF and PACF plots, both plots appear to show white noise and be more randomized with no distinct pattern.
global_economy.turkey_gdp <- global_economy |>
filter(Country == "Turkey")
turkey_gdp |>
autoplot(GDP)
turkey_gdp |>
gg_tsdisplay(GDP, plot_type = "partial")
Box-Cox Transformation
lambda_a <- turkey_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
turkey_gdp |>
gg_tsdisplay((box_cox(GDP, lambda_a)), plot_type = "partial")
turkey_gdp |>
features(box_cox(GDP, lambda_a), unitroot_kpss)
## # A tibble: 1 × 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 Turkey 1.50 0.01
turkey_gdp |>
features(box_cox(GDP, lambda_a), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
turkey_gdp |>
features(difference(box_cox(GDP, lambda_a)), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 0
turkey_gdp |>
gg_tsdisplay(difference(box_cox(GDP, lambda_a)), plot_type = "partial")
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
turkey_gdp |>
autoplot(difference(box_cox(GDP, lambda_a))) +
labs(title="Turkey GDP After Transformation and Differencing") +
theme(plot.title = element_text(hjust=0.5))
## Warning: Removed 1 row containing missing values (`geom_line()`).
aus_accommodation.This dataset is seasonal, since the observations are quarterly:
tasmania_takings <- aus_accommodation |>
filter(State == "Tasmania")
tasmania_takings |>
autoplot(Takings)
tasmania_takings |>
gg_tsdisplay(Takings, plot_type = "partial", lag=8)
Box-Cox Transformation
lambda_b <- tasmania_takings |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
tasmania_takings |>
features(box_cox(Takings, lambda_b), unitroot_kpss)
## # A tibble: 1 × 3
## State kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 Tasmania 1.84 0.01
tasmania_takings |>
features(box_cox(Takings, lambda_b), unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
tasmania_takings |>
features(difference(box_cox(Takings, lambda_b)), unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 0
tasmania_takings |>
autoplot(difference(box_cox(Takings, lambda_b), 4)) +
labs(title="Accomodation Takings in Tasmania After Transformation") +
theme(plot.title = element_text(hjust=0.5))
## Warning: Removed 4 rows containing missing values (`geom_line()`).
tasmania_takings |>
gg_tsdisplay(difference(box_cox(Takings, lambda_b), 4), plot_type = "partial", lag = 8)
## Warning: Removed 4 rows containing missing values (`geom_line()`).
## Warning: Removed 4 rows containing missing values (`geom_point()`).
souvenirs.This dataset is also seasonal, since the data is monthly:
souvenirs |>
autoplot(Sales)
souvenirs |>
gg_tsdisplay(Sales, plot_type = "partial", lag=24)
Box-Cox Transformation
lambda_c <- souvenirs |>
features(Sales, features = guerrero) |>
pull(lambda_guerrero)
souvenirs |>
features(box_cox(Sales, lambda_c), unitroot_kpss)
## # A tibble: 1 × 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 1.79 0.01
souvenirs |>
features(box_cox(Sales, lambda_c), unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
souvenirs |>
features(difference(box_cox(Sales, lambda_c)), unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 0
souvenirs |>
autoplot(difference(box_cox(Sales, lambda_c), 12)) +
labs(title="Monthly Souvenir Sales After Transformation") +
theme(plot.title = element_text(hjust=0.5))
## Warning: Removed 12 rows containing missing values (`geom_line()`).
souvenirs |>
gg_tsdisplay(difference(box_cox(Sales, lambda_c), 12), plot_type = "partial", lag = 36)
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).
Since the data is monthly, we know this is seasonal data.
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries |>
autoplot(Turnover)
myseries |>
gg_tsdisplay(Turnover, plot_type = "partial")
Based on the autoplot, it appears that the data should be transformed to stabilize the variance:
Box-Cox Transformation
lambda_retail <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries |>
features(box_cox(Turnover, lambda_retail), unitroot_kpss)
## # A tibble: 1 × 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Northern Territory Clothing, footwear and personal acce… 5.43 0.01
myseries |>
features(box_cox(Turnover, lambda_retail), unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Northern Territory Clothing, footwear and personal accessory retailing 1
myseries |>
gg_tsdisplay(difference(box_cox(Turnover, lambda_retail), 12), plot_type = "partial", lag = 36)
## Warning: Removed 12 rows containing missing values (`geom_line()`).
## Warning: Removed 12 rows containing missing values (`geom_point()`).
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |>
autoplot(y)
sim |>
gg_tsdisplay(y, plot_type = "partial") +
ggtitle("Using 0.6 as Phi")
For an AR(1) model: \(-1 < \phi < 1\)
\(\phi_{1} = 1\)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 1*y[i-1] + e[i]
sim_2 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_2 |>
autoplot(y)
sim_2 |>
gg_tsdisplay(y, plot_type = "partial") +
ggtitle("Using 1 as Phi")
\(\phi_{1} = -1\)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- -1*y[i-1] + e[i]
sim_3 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_3 |>
autoplot(y)
sim_3 |>
gg_tsdisplay(y, plot_type = "partial") +
ggtitle("Using -1 as Phi")
\(\phi_{1} = 0.1\)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.1*y[i-1] + e[i]
sim_4 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_4 |>
autoplot(y)
sim_4 |>
gg_tsdisplay(y, plot_type = "partial")+
ggtitle("Using 0.1 as Phi")
\(\phi_{1} = -0.6\)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- -0.6*y[i-1] + e[i]
sim_5 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim_5 |>
autoplot(y)
sim_5 |>
gg_tsdisplay(y, plot_type = "partial")+
ggtitle("Using -0.6 as Phi")
There are interesting observations from each of the five AR models. The model when \(\phi = 0.1\) shows the closest to white noise, with both ACF and PACF plots appearing randomized with no distinct pattern and an autoplot that shows stationarity. This makes sense, since AR(1) models that have a constant of zero and \(\phi\) of zero are equivalent to white noise. The models that had a \(\phi\) of 0.6 and 1 showed oscillation in the ACF plot, showing less stationarity and longer distance between data points in the autoplot, which indicates a random walk. The models that had a \(\phi\) of -0.6 and -1 showed more prominent spikes in the ACF plot, in particular when \(\phi = 1\). The autoplots of each show more of up-and-down trend and less stationarity.
MA(1): \(y_t = c + \epsilon_t + \theta_{1} \epsilon_{t-1}\)
set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + 0.6*e[i-1]
ma_sim <- tsibble(idx = seq_len(100), y = y, index = idx)
ma_sim |>
autoplot(y)
ma_sim |>
gg_tsdisplay(y, plot_type = "partial") +
ggtitle("Using 0.6 as Theta")
For an MA(1) model: \(-1 < \theta_1 < 1\)
\(\theta_{1} = 1\)
set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + 1*e[i-1]
ma_sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
ma_sim2 |>
autoplot(y)
ma_sim2 |>
gg_tsdisplay(y, plot_type = "partial") +
ggtitle("Using 1 as Theta")
\(\theta_{1} = -1\)
set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + -1*e[i-1]
ma_sim3 <- tsibble(idx = seq_len(100), y = y, index = idx)
ma_sim3 |>
autoplot(y)
ma_sim3 |>
gg_tsdisplay(y, plot_type = "partial") +
ggtitle("Using -1 as Theta")
\(\theta_{1} = 0.1\)
set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + 0.1*e[i-1]
ma_sim4 <- tsibble(idx = seq_len(100), y = y, index = idx)
ma_sim4 |>
autoplot(y)
ma_sim4 |>
gg_tsdisplay(y, plot_type = "partial")+
ggtitle("Using 0.1 as Theta")
\(\theta_{1} = -0.6\)
set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + -0.6*e[i-1]
ma_sim5 <- tsibble(idx = seq_len(100), y = y, index = idx)
ma_sim5 |>
autoplot(y)
ma_sim5 |>
gg_tsdisplay(y, plot_type = "partial")+
ggtitle("Using -0.6 as Theta")
When comparing the five autoplots, there doesn’t appear to be any distinct differences between each of them. They mostly show different variations over time with each theta value, but nothing that would suggest a clear pattern. When comparing the ACF and PACF plots, the theta value of 0.1 shows the most white noise, which makes sense since it is the closest value to zero among the other values used. Nevertheless, there doesn’t appear to be any suggestion of oscillation or distinct pattern in the plots. The plots that have 1 and -1 theta values appears to show more spikes in the lags than the other values.
set.seed(123)
y <- numeric(100)
e <- rnorm(100)
phi <- 0.6
theta <- 0.6
for(i in 2:100)
y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]
arma11_sim <- tsibble(idx = seq_len(100), y = y, index = idx)
set.seed(123)
y <- numeric(100)
e <- rnorm(100)
phi1 <- -0.8
phi2 <- 0.3
for(i in 3:100)
y[i] <- phi1*y[i-1] + phi2*y[i-2] + e[i]
ar2_sim <- tsibble(idx = seq_len(100), y = y, index = idx)
arma11_sim |>
autoplot(y)
arma11_sim |>
gg_tsdisplay(y, plot_type = "partial")
ar2_sim |>
autoplot(y)
ar2_sim |>
gg_tsdisplay(y, plot_type = "partial")
The ARMA(1,1) model shows more stationarity in its autoplot than the AR(2) model. The ACF plot of the ARMA(1,1) model shows oscillation, while the AR(2) model has spikes beyond the confidence intervals and are decreasing slowly. The autoplot of the AR(2) model is mostly flat until the data points start to go up and down after value 50 and begins rapidly widen after value 75.
aus_airpassengers, the total number of
passengers (in millions) from Australian air carriers for the period
1970-2011.aus_airpassengers |>
gg_tsdisplay(Passengers, plot_type = "partial")
fit <- aus_airpassengers |>
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
Using the ARIMA() function, the model that was selected was a MA model with a \(q\) value of 1 and differencing \(d\) of 2.
fit |>
gg_tsresiduals()
fit |>
forecast(h=10) |>
autoplot(aus_airpassengers)
ARIMA(0,2,1): \(y_t = -0.8963 * \epsilon_{t-1} + \epsilon_{t}\)
fit2 <- aus_airpassengers |>
model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
report(fit2)
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
fit2 |>
gg_tsresiduals()
fit2 |>
forecast(h=10) |>
autoplot(aus_airpassengers)
There appears to be less white noise in the ARIMA(0,1,0) model than in the suggested ARIMA(0,2,1) model. The second model has an AIC value of 200.31, slightly more than the first model’s AIC of 198.04. When comparing the forecast plots of each, the first model’s confidence interval appears slightly wider than the second model. Nevertheless, each forecast appears to show an upward trajectory.
fit3 <- aus_airpassengers |>
model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
report(fit3)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## -0.5518 -0.7327 0.5895 1.0000 3.2216
## s.e. 0.1684 0.1224 0.0916 0.1008 0.7224
##
## sigma^2 estimated as 4.031: log likelihood=-96.23
## AIC=204.46 AICc=206.61 BIC=215.43
fit3 |>
gg_tsresiduals()
fit3 |>
forecast(h=10) |>
autoplot(aus_airpassengers)
When examining the third model’s plot with those in parts a and c, the thin blue line appears to not be straight, showing less of an upward trajectory and more of a flatter forecast. Its confidence interval looks to be roughly the same as the confidence interval produced in the plot for part c.Â
Without the constant
fit4 <- aus_airpassengers |>
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
report(fit4)
## Series: Passengers
## Model: NULL model
## NULL model
Removing the constant produces an error stating that the non-stationary AR part from the CSS (conditional sum of squares). This may be due to the values falling outside the region for stationary processes.
fit5 <- aus_airpassengers |>
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit5)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0571
## s.e. 0.0585 0.0213
##
## sigma^2 estimated as 3.855: log likelihood=-95.1
## AIC=196.21 AICc=196.79 BIC=201.63
We get a warning that the model specification induces a quadratic or higher order polynomial trend, with a suggestion to remove the constant or reduce the number of differences. This may be caused by the fact that the model is differenced twice with a constant of 1.
fit5 |>
gg_tsresiduals()
fit5 |>
forecast(h=10) |>
autoplot(aus_airpassengers)
Despite the warning, the model still produces a forecast, which shows an upward trajectory with a more narrow confidence interval than the previous forecasts.
global_economy):us_gdp <- global_economy |>
filter(Country == "United States")
us_gdp |>
autoplot(GDP)
us_gdp |>
gg_tsdisplay(GDP, plot_type = "partial")
lambda_us <- us_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
us_gdp |>
autoplot(box_cox(GDP, lambda_us))
us_gdp |>
gg_tsdisplay(box_cox(GDP, lambda_us), plot_type = "partial")
fit <- us_gdp |>
model(ARIMA(box_cox(GDP, lambda_us)))
report(fit)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda_us)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
us_gdp |>
features(box_cox(GDP, lambda_us), unitroot_kpss)
## # A tibble: 1 × 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 United States 1.55 0.01
us_gdp |>
features(box_cox(GDP, lambda_us), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 United States 1
us_gdp |>
gg_tsdisplay(difference(box_cox(GDP, lambda_us)), plot_type = "partial")
## Warning: Removed 1 row containing missing values (`geom_line()`).
## Warning: Removed 1 rows containing missing values (`geom_point()`).
fit_models <- us_gdp |>
model(
"ARIMA110" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(1,1,0)),
"ARIMA011" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(0,1,1)),
"ARIMA210" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(2,1,0)),
"ARIMA010" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(0,1,0)),
"ARIMA012" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(0,1,2)),
"ARIMA212" = ARIMA(box_cox(GDP, lambda_us) ~ 1 + pdq(2,1,2))
)
glance(fit_models) |>
arrange(AIC) |>
select(.model:BIC)
## # A tibble: 6 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 ARIMA110 5479. -325. 657. 657. 663.
## 2 ARIMA210 5580. -325. 659. 659. 667.
## 3 ARIMA011 5689. -326. 659. 659. 665.
## 4 ARIMA012 5634. -326. 659. 660. 667.
## 5 ARIMA212 5734. -325. 662. 664. 674.
## 6 ARIMA010 6774. -332. 668. 668. 672.
I chose the model that had the lowest AIC value, which was the ARIMA110 model that was recommended by the ARIMA() function.
fit_models |>
select(ARIMA110) |>
gg_tsresiduals()
fit_models |>
forecast(h=10) |>
filter(.model=='ARIMA110') |>
autoplot(us_gdp) +
labs(y = "GDP", title = "Forecast of U.S. GDP Using ARIMA110") +
theme(plot.title = element_text(hjust=0.5))
The forecast model look reasonable. The confidence interval is not wide and the forecast shows an upward trajectory that’s consistent with the data represented in the plot.
fit_ets <- us_gdp |>
model(ETS(GDP))
report(fit_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fit_ets |>
forecast(h=10) |>
autoplot(us_gdp) +
labs(y = "GDP", title = "Forecast of U.S. GDP Using ETS") +
theme(plot.title = element_text(hjust=0.5))
The ETS forecast model shows a wider confidence interval and less of an upward trajectory than the recommended ARIMA model. The AIC value of the ETS model is 3190.787, which is significantly higher than the AIC of the selected ARIMA model, which was 656.65.