Instructions Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in
Hyndman.
Please submit both the Rpubs link as well as your .rmd file.
A. Explain the differences among these figures. Do they all indicate that the data are white noise?
As we examine the graphs, we see that the spikes in the lags diminish as n (the length of the time series) increases, which also leads to a decrease in the significance threshold. As n increases, the bounded area (represented by the blue-dotted line) decreases. All the lag spikes fall within the bounded area of ±1.96n±n1.96. This indicates that the auto-correlations are white noise.
B.Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values can be described by the formula ±1.96n±n1.96, where n is the length of the time series. As n increases, the critical values become smaller and closer to zero because the standard error of the autocorrelation estimates decreases. Therefore, a larger sample size n correlates to narrower confidence intervals around the mean of zero, leading to critical values that reflect this increased precision.
A classic example of a non-stationary series are stock prices. Plot
the daily closing prices for Amazon stock (contained
in gafa_stock
), along with the ACF and PACF. Explain how
each plot shows that the series is non-stationary and should be
differenced.
gafa_stock |>
filter(Symbol == "AMZN") |>
gg_tsdisplay(Close, plot_type='partial') +
labs(title = "AMZN Closing Stock Price")
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
From the quick plot of the time series for AMZN closing prices, we can see that there’s a clear trend in the data. The plot shows a continuous upward curve, indicating the presence of a trend. For data to be considered stationary, it should look more horizontal, with no clear seasonality or trend. This signals the need for at least one order of differencing.
The ACF and PACF plots back this up. For truly stationary data, the ACF should drop to zero quickly, but here we see the ACF decreasing slowly, which confirms non-stationarity. The PACF shows a significant spike at lag 1, indicating strong autocorrelation, and then smaller significant spikes at regular intervals, possibly hinting at some seasonality.
Altogether, this shows that the data should be differenced to remove the trend and make it stationary.
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
A. Turkish Data
global_economy |>
filter(Country == "Turkey") |>
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "Non-transformed Turkish GDP")
lambda_turkey <- global_economy |>
filter(Country == "Turkey") |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero) |>
round(4)
global_economy |>
filter(Country == "Turkey") |>
features(GDP, unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
global_economy |>
filter(Country == "Turkey") |>
features(GDP, unitroot_nsdiffs)
## # A tibble: 1 × 2
## Country nsdiffs
## <fct> <int>
## 1 Turkey 0
global_economy |>
filter(Country == "Turkey") |>
features(GDP, unitroot_kpss)
## # A tibble: 1 × 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 Turkey 1.22 0.01
# Print the lambda value
print(paste("lambda =", lambda_turkey))
## [1] "lambda = 0.1572"
turkey_transform <- global_economy |>
filter(Country == "Turkey") |>
mutate(GDP_transformed_diff = difference(box_cox(GDP, lambda_turkey))) # Apply Box-Cox and differencing
# Plot the transformed and differenced data
turkey_transform |>
gg_tsdisplay(GDP_transformed_diff) +
labs(title = TeX(paste0("Differenced GDP with $\\lambda$ = ", lambda_turkey)))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
The Turkish data needs one order of difference and does not require a seasonal difference
B. Accommodation takings in the state of Tasmania from
aus_accommodation
.
## Visualize data ACF and APCF
aus_accommodation |>
filter(State == "Tasmania") |>
gg_tsdisplay(Takings, plot_type='partial',) +
labs(title = "Tasmania Accomodation Takings")
# Lambda
lambda_tasmania <-aus_accommodation |>
filter(State == "Tasmania") |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)|>
round(4)
print(paste("lamda=", lambda_tasmania))
## [1] "lamda= 0.0018"
aus_accommodation |>
filter(State == "Tasmania") |>
features(box_cox(Takings,lambda_tasmania), unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
aus_accommodation |>
filter(State == "Tasmania") |>
features(box_cox(Takings,lambda_tasmania), unitroot_nsdiffs)
## # A tibble: 1 × 2
## State nsdiffs
## <chr> <int>
## 1 Tasmania 1
The Tasmania data need both a first difference and seasonal difference. I chose M=4 for this quarterly time series.
aus_accommodation %>%
filter(State == "Tasmania") %>%
gg_tsdisplay(difference(box_cox(Takings,lambda_tasmania))|>
difference(4), plot_type='partial',
) +
labs(title = TeX(paste0("Differenced Tasmania Accomodation Takings with $\\lambda$ = ",
lambda_tasmania)))
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_point()`).
C. Monthly from souvenirs
# Plot Data Souvenirs monthlysales and ACF/PACF
souvenirs |>
gg_tsdisplay(Sales, plot_type = 'partial') +
labs(title = "Monthly Souvenirs sales")
# Lambda
lambda_monthly_souvenirs <-souvenirs |>
features(Sales, features = guerrero) |>
pull(lambda_guerrero)|>
round(4)
print(paste("LAmbda= ",lambda_monthly_souvenirs))
## [1] "LAmbda= 0.0021"
# Unit Root Test seasonality
souvenirs|>
features(Sales, unitroot_nsdiffs)
## # A tibble: 1 × 1
## nsdiffs
## <int>
## 1 1
souvenirs|>
features(
Sales, unitroot_ndiffs
)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
souvenirs|>
gg_tsdisplay(difference(box_cox(Sales,lambda_monthly_souvenirs))|>
difference(12), plot_type='partial',
) +
labs(title = TeX(paste0("Differenced Monthly sales with $\\lambda$ = ",
lambda_monthly_souvenirs)))
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).
The Souvenirs data needs a first difference as well as a seasonal difference, went with m=12 since it is monthly data.
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(3231)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries_train <- myseries %>%
filter(year(Month) < 2011)
autoplot(myseries, Turnover) +
autolayer(myseries_train, Turnover, colour = "red")
myseries|>
gg_tsdisplay(Turnover, plot_type = "partial")
my_series_lamda<- myseries|>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)|>
round(4)
print(my_series_lamda)
## [1] 0.0902
myseries|>
features(Turnover, unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Tasmania Food retailing 1
myseries|>
features(Turnover, unitroot_nsdiffs)
## # A tibble: 1 × 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Tasmania Food retailing 1
We need one difference and a seasonal difference
myseries|>
gg_tsdisplay(difference(box_cox(Turnover, my_series_lamda))|>
difference(12), plot_type='partial',
) +
labs(title = TeX(paste0("Differenced Monthly Turnover with $\\lambda$ = ",
my_series_lamda)))
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_point()`).
Simulate and plot some data from simple ARIMA models.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
B. Plot the series
sim %>% autoplot(y)+
labs(title = "AR(1) Model with Phi = 0.6", x = "Time", y = "Value")
Changing the ϕ1 to -0.6
for(i in 2:100)
y[i] <- -1*0.6*y[i-1] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
sim2 |>
autoplot(y) +
labs(title = "AR(1) Model with Phi = -0.6", x = "Time", y = "Value")
AR(1) model with ϕ=−0.6= -0.6ϕ=−0.6. As expected, we can see the alternating pattern in the time series due to the negative autocorrelation (ϕ=−0.6= -0.6ϕ=−0.6).
C ,D
MA model
# Simulate an MA(1) process with theta = 0.6
y <- numeric(100) # Initialize the vector
e <- rnorm(100) # Generate white noise
# Simulate the MA(1) process
for(i in 2:100) {
y[i] <- e[i] + 0.6 * e[i-1] # MA(1) model with theta = 0.6
}
# Create a tsibble
sim_4 <- tsibble(idx = seq_len(100), y = y, index = idx)
# Plot the data
MA_1<-sim_4 |>
gg_tsdisplay(y, plot_type = "partial") +
labs(title = "MA(1) Model with Theta = 0.6", x = "Time", y = "Value")
MA_1
The ACF plot shows a significant spike at lag 1 with values dropping close to zero at higher lags, which is typical of an MA(1) process. The PACF plot has no significant spikes beyond lag 1, further confirming the suitability of the MA(1) model for this data.
Generate data from an ARMA(1,1) model with ϕ1=0.6ϕ1=0.6, θ1=0.6θ1=0.6 and σ2=1σ2=1.
n <- 100 # Length of the time series
phi <- 0.6 # AR(1) coefficient
theta <- 0.6 # MA(1) coefficient
sigma <- 1 # Standard deviation of the white noise (sigma^2 = 1)
# Generate white noise with standard deviation sigma = 1
e <- rnorm(n, mean = 0, sd = sigma)
# Simulate an ARMA(1,1) model with AR(1) and MA(1) components
y <- numeric(n)
for(i in 2:n) {
y[i] <- phi * y[i-1] + e[i] + theta * e[i-1] # ARMA(1,1) process
}
# Create a tsibble for the simulated data
library(tsibble)
sim_5 <- tsibble(idx = seq_len(n), y = y, index = idx)
# Plot the simulated data
arma_1_1<-sim_5|>
gg_tsdisplay(y, plot_type = "partial")+
labs(title = expression(paste(
"Simulated ARMA(1,1) Process with ", phi[1], " = 0.6, ",
theta[1], " = 0.6, and ", sigma^2, " = 1")),
x = "Time", y = "Value")
arma_1_1
Generate AR(2) with
0.8ϕ1=−0.8, ϕ2=0.3ϕ2=0.3 and σ2=1σ2=1
phi1= -0.8
phi2= 0.3
e <- rnorm(n, mean = 0, sd = sigma)
# Initialize a vector for the AR(2) process
y <- numeric(n)
e <- rnorm(n, mean = 0, sd = sigma)
# Initialize a vector for the AR(2) process
y <- numeric(n)
# Simulate the AR(2) process:
#y[i] depends on y[i-1], y[i-2], and white noise e[i]
for(i in 3:n) {
y[i] <- phi1 * y[i-1] + phi2 * y[i-2] + e[i] # AR(2) model
}
sim_Ar2 <- tsibble(idx = seq_len(n), y = y, index = idx)
sim_Ar2|>
gg_tsdisplay(y, plot_type = "partial")
The ARMA(1,1) plot indicates stationarity with its stable, mean-reverting behavior. The rapid decline in the ACF and the sharp cut-off in the PACF after lag 1 confirm this.
In contrast, the AR(2) plot shows a non-stationary process due to its unstable, diverging pattern and persistent auto-correlations over several lags.
Consider aus_airpassengers
, the total number of
passengers (in millions) from Australian air carriers for the period
1970-2011.
Use ARIMA()
to find an appropriate ARIMA model. What
model was selected. Check that the residuals look like white noise. Plot
forecasts for the next 10 periods.
Write the model in terms of the backshift operator.
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
1.
fit <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
fit|>
gg_tsresiduals()
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Aus Aircraft Passengers with ARIMA(0,2,1)", y = "Passengers Mil")
The residuals and ACF for the selected ARIMA(0,2,1) model indicate white noise, and the forecast aligns well with the test data, falling within the confidence intervals.
2.
yt=−0.8756ϵt−1+ϵtyt=−0.8756ϵt−1+ϵt
(1−B)2yt=(1−0.87B)ϵt
3.
fitdrift <-aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers ~ pdq(0,1,0)))
fitdrift|>
forecast(h=10)|>
autoplot(aus_airpassengers)+
labs(
title = "AUS Arircraft Passenges with Arima(0,1,0)",
y = "Passengers Mil"
)
fitdrift|>
gg_tsresiduals()+
labs(
title = "AUS Arircraft Passenges with Arima(0,1,0)")
4.
fit212drift<- aus_airpassengers|>
filter(Year <2012)|>
model(ARIMA(Passengers ~ pdq(2,1,2)))
report(fit212drift)
## Series: Passengers
## Model: ARIMA(2,1,2) w/ drift
##
## Coefficients:
## ar1 ar2 ma1 ma2 constant
## 1.4694 -0.5103 -1.5736 0.6780 0.0650
## s.e. 0.3780 0.3558 0.3081 0.2688 0.0294
##
## sigma^2 estimated as 4.748: log likelihood=-87.74
## AIC=187.47 AICc=189.94 BIC=197.75
fit212drift|>
gg_tsresiduals()+
labs(
title = "AUS Arircraft Passenges with Arima(2,1,2) W/ Drift",)
fit212drift|>
forecast(h=10)|>
autoplot(aus_airpassengers)+
labs(
title = "AUS Arircraft Passenges with Arima(2,1,2) W/ Drift",
y = "Passengers Mil"
)
fit_no_constant212<-aus_airpassengers|>
filter(Year < 2012)|>
model(ARIMA(Passengers ~ 0 +pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
report(fit_no_constant212)
## Series: Passengers
## Model: NULL model
## NULL model
When we remove the Constant we have a null model.
5.
fit_021C<- aus_airpassengers|>
filter(Year<2012)|>
model(ARIMA(Passengers ~ 1+ pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
report(fit_021C)
## Series: Passengers
## Model: ARIMA(0,2,1) w/ poly
##
## Coefficients:
## ma1 constant
## -1.0000 0.0721
## s.e. 0.0709 0.0260
##
## sigma^2 estimated as 4.086: log likelihood=-85.74
## AIC=177.48 AICc=178.15 BIC=182.55
We receive a warning because the model uses second differencing (d=2d=2d=2), which implies that we’re modeling a trend of at least quadratic order. In this case, ARIMA(0,2,1) assumes a trend that grows at a nonlinear rate, but no explicit polynomial trend has been included in the model
fit_021C|>
forecast(h=50) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant",
y = "Passengers Mil")
“As we can see, the inclusion of the constant is causing the forecast values to explode in a quadratic curve, which is not ideal for real-world forecasting. This kind of growth is unlikely for the number of passengers and suggests that the model may need refinement to avoid unrealistic projections.
fit_021C|>
gg_tsresiduals()+
labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant")
For the United States GDP series (from
global_economy
):
if necessary, find a suitable Box-Cox transformation for the data;
fit a suitable ARIMA model to the transformed data using
ARIMA()
;
try some other plausible models by experimenting with the orders chosen;
choose what you think is the best model and check the residual diagnostics;
produce forecasts of your fitted model. Do the forecasts look reasonable?
compare the results with what you would obtain using
ETS()
(with no transformation).
1.
usa_gdp<- global_economy|>
filter(Country == "United States")
usa_gdp|>
autoplot()
## Plot variable not specified, automatically selected `.vars = GDP`
usa_gdp|>
autoplot(sqrt(GDP))
lambda_usagdp <- usa_gdp |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)|>
round(4)
print(lambda_usagdp)
## [1] 0.2819
usa_gdp|>
mutate(transformed_gdp = box_cox( GDP,lambda_usagdp))|>
gg_tsdisplay(transformed_gdp, plot_type = "partial")+
labs(
title = paste("transformed US GDP with a lamda", lambda_usagdp
))
2.
fitUSA<- usa_gdp|>
model(ARIMA(box_cox(GDP,lambda_usagdp)))
report(fitUSA)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda_usagdp)
##
## Coefficients:
## ar1 constant
## 0.4586 118.0287
## s.e. 0.1198 9.4923
##
## sigma^2 estimated as 5465: log likelihood=-325.25
## AIC=656.5 AICc=656.95 BIC=662.63
usa_gdp|>
features(box_cox(GDP, lambda_usagdp), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 United States 1
usa_gdp|>
features(box_cox(GDP, lambda_usagdp), unitroot_nsdiffs)
## # A tibble: 1 × 2
## Country nsdiffs
## <fct> <int>
## 1 United States 0
usa_gdp|>
features(box_cox(GDP, lambda_usagdp), unitroot_kpss)
## # A tibble: 1 × 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 United States 1.55 0.01
usa_fit <- usa_gdp |>
filter(Year< 2005)|>
model(
arima_110 = ARIMA(box_cox(GDP, lambda_usagdp) ~ pdq(1,1,0)),
arima_120 = ARIMA(box_cox(GDP, lambda_usagdp) ~ pdq(1,2,0)),
arima_210 = ARIMA(box_cox(GDP, lambda_usagdp) ~ pdq(2,1,0)),
arima_212 = ARIMA(box_cox(GDP, lambda_usagdp) ~1+ pdq(2,1,2)),
arima_111 = ARIMA(box_cox(GDP, lambda_usagdp) ~ pdq(1,1,1))
)
glance(usa_fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 5 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima_120 4982. -244. 491. 492. 495.
## 2 arima_111 4187. -245. 497. 498. 504.
## 3 arima_110 4365. -246. 498. 498. 503.
## 4 arima_210 4412. -246. 499. 500. 506.
## 5 arima_212 4087. -244. 499. 502. 510.
usa_fit %>%
select(arima_212) %>%
gg_tsresiduals() +
ggtitle("Residuals for ARIMA(1,1,0)")
The histogram of residuals is centered around zero, although there seems to be some skewness on the left side. This suggests that the residuals might not be perfectly normally distributed, but the deviation is minor.
The ACF plot of the residuals shows no significant spikes beyond the confidence bounds. This suggests that the residuals are uncorrelated, which is what we expect from a good model (i.e., the residuals resemble white noise).
usa_fit %>%
forecast(h=25) %>%
filter(.model=='arima_212') %>%
autoplot(usa_gdp)+
labs(
title = paste("GDP boxcox lamda:",lambda_usagdp," Arima model (2,1,2) \nconstant 1,\nTrained < 2005"
))
usa_fit %>%
forecast(h=25) %>%
autoplot(usa_gdp,level =NULL, )+
labs(title = paste("GDP boxcox lamda:",lambda_usagdp, "\nForcasts for USA transformed and trained < 2005 "))
fit_usaets <- usa_gdp|>
filter(Year<2005)|>
model(ETS(GDP))
report(fit_usaets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9203841
## beta = 0.5312597
##
## Initial states:
## l[0] b[0]
## 448093333334 63642663151
##
## sigma^2: 8e-04
##
## AIC AICc BIC
## 2436.235 2437.773 2445.268
fit_usaets|>
forecast(h = 25)|>
autoplot(usa_gdp,)+
labs(
title = "non transformed forcast ETSMAM\nTrained < 2005"
)
The ETS model using (MAN) generates a much wider forecast due to the multiplicative errors. While this proves advantageous in the short run—capturing events like the 2008 recession within the confidence intervals—it leads to a much more unpredictable forecast as the horizon (H) increases