library(fpp3)
library(feasts)
library(latex2exp)DATA 624 Homework 6
1-) Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a. Explain the differences among these figures. Do they all indicate that the data are white noise?Figure 9.32: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.
The figures differ in the lengths of each spike and the size of the confidence bands (area between blue dotted lines). As the time series progresses, the spikes tend to shorten and the bounded area diminishes. For white noise, the autocorrelations should be close to zero for all lags.This suggests that the data are white noise, as they fall within two blue lines defined by ±1.96/√T, where T represents the length of the time series and close to zero. the ACF plots show no significant correlation at any lag.
b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values vary in their distance from the mean of zero because they are calculated as ±1.96/√T, where T is the length of the time series. As T increases, the critical values decrease, bringing them closer to zero. The autocorrelations differ due to larger series sizes of random numbers, which reduce the likelihood of autocorrelation.
2-) A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
gafa_stock |>
filter(Symbol == "AMZN") |>
gg_tsdisplay(Close, plot_type='partial') +
labs(title = "Amazon Closing Stock Price")gafa_stock |>
filter(Symbol == "AMZN") |>
features(Close, unitroot_ndiffs)# A tibble: 1 × 2
Symbol ndiffs
<chr> <int>
1 AMZN 1
Comment: The time series appears to have an increasing trend, indicating it is non-stationary. The ACF plot shows a slow decrease with a large, positive r1, whereas a stationary time series would drop to zero relatively quickly. The PACF plot has a first lag around 1, further indicating non-stationarity. To stabilize the mean, the series should be differenced. According to the KPSS test, the Amazon closing price needs to be differenced once to achieve stationarity.
3-) For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
a. Turkish GDP from global_economy.
# plot
global_economy |>
filter(Country == "Turkey") |>
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "Non-transformed Turkish GDP")# calculate lambda
lambda <- global_economy |>
filter(Country == "Turkey") |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
# unit root test
global_economy |>
filter(Country == "Turkey") |>
features(box_cox(GDP,lambda), unitroot_ndiffs) # A tibble: 1 × 2
Country ndiffs
<fct> <int>
1 Turkey 1
# transformed plot
global_economy |>
filter(Country == "Turkey") |>
gg_tsdisplay(difference(box_cox(GDP,lambda)), plot_type='partial') +
labs(title = TeX(paste0("Differenced Turkish GDP with $\\lambda$ = ",
round(lambda,2))))Comment: The Turkish GDP is non-seasonal and non-stationary. The KPSS test suggests that differencing once is sufficient, and the graphs indicate that it appears stationary after this single differencing.
b. Accommodation takings in the state of Tasmania from aus_accommodation.
# plot
aus_accommodation |>
filter(State == "Tasmania") |>
gg_tsdisplay(Takings, plot_type='partial') +
labs(title = "Non-transformed Tasmania Accomodation Takings")# calculate lambda
lambda <-aus_accommodation |>
filter(State == "Tasmania") |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
#unit root test
aus_accommodation |>
filter(State == "Tasmania") |>
features(box_cox(Takings,lambda), unitroot_nsdiffs) # A tibble: 1 × 2
State nsdiffs
<chr> <int>
1 Tasmania 1
# transformed plot
aus_accommodation |>
filter(State == "Tasmania") |>
gg_tsdisplay(difference(box_cox(Takings,lambda), 4), plot_type='partial') +
labs(title = TeX(paste0("Differenced Tasmania Accomodation Takings with $\\lambda$ = ",
round(lambda,2))))Comment: Accommodation takings in Tasmania are seasonal and non-stationary. The KPSS test suggests that differencing once is sufficient, and the graphs indicate stationarity after this single differencing. The ACF decays quickly, and the PACF plot is truncated after the first lag.
c. Monthly sales from souvenirs.
# plot
souvenirs |>
gg_tsdisplay(Sales, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Monthly Souvenir Sales")# calculate lambda
lambda <- souvenirs |>
features(Sales, features = guerrero) |>
pull(lambda_guerrero)
# unit root test
souvenirs |>
features(box_cox(Sales,lambda), unitroot_nsdiffs) # A tibble: 1 × 1
nsdiffs
<int>
1 1
# transformed plot
souvenirs |>
gg_tsdisplay(difference(box_cox(Sales,lambda), 12), plot_type='partial', lag = 36) +
labs(title = TeX(paste0("Differenced Monthly Souvenir Sales with $\\lambda$ = ",
round(lambda,2))))Comment: Monthly souvenir sales are seasonal and non-stationary. The KPSS test suggests that differencing once is sufficient, and a lambda value close to 0 indicates a natural log transformation. The graphs show that the series appears stationary after this single differencing. The ACF decays quickly, and the PACF plot is truncated after the first lag.
4-) For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(0007)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
# plot
myseries |>
gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Retail Turnover")# lambda calculation
lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
# unit root test
myseries |>
features(box_cox(Turnover, lambda), unitroot_nsdiffs) # A tibble: 1 × 3
State Industry nsdiffs
<chr> <chr> <int>
1 Queensland Food retailing 1
# transformed plot
myseries |>
gg_tsdisplay(difference(box_cox(Turnover,lambda), 12), plot_type='partial', lag = 36) +
labs(title = TeX(paste0("Differenced Queensland Food Retailing with $\\lambda$ = ",
round(lambda,2))))Comment: The retail data is seasonal and non-stationary. A Box-Cox transformation was applied to appropriately transform the data. The seasonal KPSS unit root test indicates that differencing once is sufficient. After this differencing and transformation, the data appears stationary, as it is centered around zero, and the ACF and PACF plots confirm stationarity.
5-) Simulate and plot some data from simple ARIMA models.
a. Use the following R code to generate data from an AR(1) model with ϕ1 = 0.6 and σ2 = 1. The process starts with y1 = 0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)b. Produce a time plot for the series. How does the plot change as you change ϕ1?
sim |>
autoplot(y) +
labs(title = TeX(" AR(1) model with $\\phi_i$ = 0.6"))# phi_1 = 1
for(i in 2:100)
y[i] <- 1*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y) +
labs(title = TeX(" AR(1) model with $\\phi_i$ = 1"))# phi_1 = 0
for(i in 2:100)
y[i] <- 0*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y) +
labs(title = TeX(" AR(1) model with $\\phi_i$ = 0"))# phi_1 = -0.6
for(i in 2:100)
y[i] <- -0.6*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y) +
labs(title = TeX(" AR(1) model with $\\phi_i$ = -0.6"))Comment: Changing ϕ1 results in different time series patterns. When ϕ1=0, it resembles white noise, and when ϕ1=1, it resembles a random walk. When ϕ1 becomes negative, the series tends to oscillate around the mean. As ϕ1 decreases, the variation increases, producing more spikes.
c. Write your own code to generate data from an MA(1) model with θ1 = 0.6 and σ2 = 1.
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1]
sim_2 <- tsibble(idx = seq_len(100), y = y, index = idx)d. Produce a time plot for the series. How does the plot change as you change θ1.
sim_2 |>
autoplot(y) +
labs(title = TeX(" MA(1) model with $\\theta_i$ = 0.6"))# theta_1 = 1
for(i in 2:100)
y[i] <- e[i] + 1 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y) +
labs(title = TeX(" MA(1) model with $\\theta_1$ = 1"))# theta_1 = 0
for(i in 2:100)
y[i] <- e[i] + 0 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y) +
labs(title = TeX(" MA(1) model with $\\theta_1$ = 0"))# theta_1 = -0.6
for(i in 2:100)
y[i] <- e[i] + -0.6 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y) +
labs(title = TeX(" MA(1) model with $\\theta_1$ = -0.6"))Comment: As θ1 decreases, the frequency of the spikes increases. The shapes of the graphs remain largely unchanged in terms of magnitude. It appears that as θ1 decreases, the centering around the mean improves.
e. Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.
set.seed(0008)
y <- numeric(100)
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]
arma_1_1 <- tsibble(idx = seq_len(100), y = y, index = idx)f. Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)
set.seed(0009)
y <- numeric(100)
for(i in 3:100)
y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
ar_2 <- tsibble(idx = seq_len(100), y = y, index = idx)g. Graph the latter two series and compare them.
arma_1_1 |>
gg_tsdisplay(y, plot_type='partial') +
labs(title = TeX("ARMA(1,1) model with $\\phi_1 = 0.6$, $\\theta_1 = 0.6$, and $\\sigma^2 = 1$"))ar_2 |>
gg_tsdisplay(y, plot_type='partial') +
labs(title = TeX("AR(2) model with $\\phi_1 = -0.8$, $\\phi_2 = 0.3$, and $\\sigma^2 = 1$"))Comment: The ARMA(1,1) model appears to be stationary as it seems random, the ACF plot decreases quickly, and the PACF is truncated after the first lag. In contrast, the AR(2) model is not stationary; it oscillates around the mean and its variance increases exponentially with the index. The PACF plot shows a negative first lag with no significant values afterward, while the ACF plot also oscillates. This occurs because the parameter constraints are not satisfied, specifically ϕ2−ϕ1<1 is not met.
6-) Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
a. Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
fit <- aus_airpassengers |>
filter(Year < 2012) |>
model(ARIMA(Passengers))
report(fit)Series: Passengers
Model: ARIMA(0,2,1)
Coefficients:
ma1
-0.8756
s.e. 0.0722
sigma^2 estimated as 4.671: log likelihood=-87.8
AIC=179.61 AICc=179.93 BIC=182.99
fit |>
forecast(h=10) |>
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1)", y = "Passengers (in millions)")fit |>
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(0,2,1)")Comment: The ARIMA (0,2,1) model was chosen, and the residuals appear to resemble white noise.
b. Write the model in terms of the backshift operator.
yt=−0.87εt−1+εt
(1−B)2yt=(1−0.87B)εt
c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
fit2 <-aus_airpassengers |>
filter(Year < 2012) |>
model(ARIMA(Passengers ~ pdq(0,1,0)))
fit2 |>
forecast(h=10) |>
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")fit2 |>
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(0,1,0)")Comment: The ARIMA model from part (a) predicted higher values than the actual ones, whereas this ARIMA model predicted lower values than the actual ones. Additionally, the slope of the forecasts appears to be more gradual.
d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
fit3 <-aus_airpassengers |>
filter(Year < 2012) |>
model(ARIMA(Passengers ~ pdq(2,1,2)))
fit3 |>
forecast(h=10) |>
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")fit3 |>
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(2,1,2)")#removing constant
fit4 <-aus_airpassengers |>
filter(Year < 2012) |>
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))report(fit4)Series: Passengers
Model: NULL model
NULL model
Comment: It is more similar to part (a), and the residuals appear to be white noise. Removing the constant results in a null model. When the constant is omitted, the forecast includes a polynomial trend of order d−1, which in this case is 0, making it non-stationary.
e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit5 <-aus_airpassengers |>
filter(Year < 2012) |>
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))Comment: The slope becomes steeper, and a warning indicates that the model specifies a higher-order polynomial trend, suggesting the removal of the constant.
7-) For the United States GDP series (from global_economy):
a. If necessary, find a suitable Box-Cox transformation for the data;
# plot
global_economy |>
filter(Code == "USA") |>
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "Non-transformed United States GDP")Comment: The GDP shows an upward trend and is non-stationary. A Box-Cox transformation was applied to stabilize the data.
b. Fit a suitable ARIMA model to the transformed data using ARIMA();
fit6 <- global_economy |>
filter(Code == "USA") |>
model(ARIMA(box_cox(GDP, lambda)))
report(fit6)Series: GDP
Model: ARIMA(0,2,1)
Transformation: box_cox(GDP, lambda)
Coefficients:
ma1
-0.6439
s.e. 0.1196
sigma^2 estimated as 0.03444: log likelihood=15.33
AIC=-26.66 AICc=-26.43 BIC=-22.61
Comment: ARIMA(1,1,0) with a drift was fitted to the tansformed data.
c. Try some other plausible models by experimenting with the orders chosen;
# transformed plot
global_economy |>
filter(Code == "USA") |>
gg_tsdisplay(box_cox(GDP,lambda), plot_type='partial') +
labs(title = "Transformed United States GDP")# unit root test
global_economy |>
filter(Code == "USA") |>
features(box_cox(GDP,lambda), unitroot_ndiffs) # A tibble: 1 × 2
Country ndiffs
<fct> <int>
1 United States 2
# modeling several
usa_fit <- global_economy |>
filter(Code == "USA") |>
model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))
glance(usa_fit) |>
arrange(AICc) |>
select(.model:BIC)# A tibble: 5 × 6
.model sigma2 log_lik AIC AICc BIC
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 arima111 0.0339 16.9 -25.9 -25.1 -17.7
2 arima110 0.0353 15.3 -24.7 -24.2 -18.5
3 arima210 0.0351 16.0 -24.1 -23.3 -15.9
4 arima212 0.0345 17.5 -22.9 -21.2 -10.7
5 arima120 0.0392 11.9 -19.8 -19.5 -15.7
Comment: The unit root test indicates that differencing twice will be sufficient. The ACF plot shows a decreasing trend, and the PACF plot has a significant first lag, suggesting an AR(1) model, or ARIMA(1,1,0). This is consistent with part (b).
The models have comparable AICc values, with ARIMA(1,2,0) having the lowest, closely followed by ARIMA(1,1,0).
d. Choose what you think is the best model and check the residual diagnostics;
usa_fit |>
select(arima120) |>
gg_tsresiduals() +
ggtitle("Residuals for ARIMA(1,2,0)")Comment: Since ARIMA(1,2,0) has the lowest AICc value, that model was chosen. The residuals seems to be white noise.
e. Produce forecasts of your fitted model. Do the forecasts look reasonable?
usa_fit |>
forecast(h=5) |>
filter(.model=='arima120') |>
autoplot(global_economy)Comment: The forecasts appear reasonable, as the trend continues with a similar slope.
f. Compare the results with what you would obtain using ETS() (with no transformation).
ets_fit <- global_economy |>
filter(Code == "USA") |>
model(ETS(GDP))
report(ets_fit)Series: GDP
Model: ETS(M,A,N)
Smoothing parameters:
alpha = 0.9990876
beta = 0.5011949
Initial states:
l[0] b[0]
448093333334 64917355687
sigma^2: 7e-04
AIC AICc BIC
3190.787 3191.941 3201.089
ets_fit |>
forecast(h=5) |>
autoplot(global_economy)Comment: The ETS() model has a significantly higher AICc, indicating that it performs worse.