Code
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, error = FALSE)knitr::opts_chunk$set(echo = TRUE, warning = FALSE, error = FALSE)Do exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman.
9.1 Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
ANSWER: These do look like white noise because we expect the autocorrelation to be close to zero. We expect 95% of spikes to be in the AcF to lie within the formula below to be defined as white noise. Additionally, These graphs do not have any spikes outside of the bound, therefore these are white noise.
From the book: “For white noise series, we expect each autocorrelation to be close to zero. Of course, they will not be exactly equal to zero as there is some random variation. For a white noise series, we expect 95% of the spikes in the ACF to lie within
±1.96/√T where T is the length of the time series. It is common to plot these bounds on a graph of the ACF (the blue dashed lines above). If one or more large spikes are outside these bounds, or if substantially more than 5% of spikes are outside these bounds, then the series is probably not white noise.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
9.2 A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
ANSWER:
Amazon and stock price have Trends and changing levels, therefore these are not stationary data. For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly.Here we can see that the ACF has not dropped to 0 quickly therefore this is non-stationary data.PACF of differenced data will show more like a white noise , however in the graph we can see we see a significant spike in the beginning, therefore this is not white noise yet. We can consider differencing.
amazon <- gafa_stock |>
filter(Symbol == 'AMZN') |>
mutate(Day = row_number()) |>
update_tsibble(index=Day, regular=T)
plot1 <- amazon |>
autoplot(Close) +
labs(title="Amazon Daily Stock Closing Prices", x = "Day")
plot2 <- amazon |>
ACF(Close) |>
autoplot() +
labs(title = "ACF Plot")
plot3 <- amazon |>
PACF(Close) |>
autoplot() +
labs(title = "PACF Plot")
bottom_row <- plot_grid(plot2, plot3)
plot_grid(plot1, bottom_row, ncol=1)9.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Turkish GDP from global_economy.
ANSWER: Plot given below. Since the ACF shows a statinary series, we would not have to go for differencing.For a stationary time series, the ACF will drop to zero relatively quickly as seen below and there is a pattern as well.
turkey <- global_economy |>
filter(Country == "Turkey")
turkey |>
autoplot(GDP) +
labs(title = "Turkish GDP", x = "Year")turkey |>
autoplot(log(GDP)) +
labs(title = "Turkish GDP: Log ", x = "Year") turkey |>
ACF(GDP) |>
autoplot() +
labs(title = "ACF Plot")aus_accommodation.ANSWER: Plot given below
tasmania <- aus_accommodation |>
filter(State == "Tasmania")
tasmania |>
autoplot(Takings) +
labs(title = "Tasmanian Accomodation")ANSWER:Box-Cox
lambda <- tasmania |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
tasmania |>
autoplot(box_cox(Takings, lambda)) +
labs(title = "Box-Cox Transformed Tasmanian Accomodation")ANSWER: first order difference
plot1 <- tasmania |>
mutate(Trans = box_cox(Takings, lambda)) |>
autoplot(difference(Trans, lag=4)) +
labs(title = "First Order Difference Tasmanian")
plot2 <- tasmania |>
mutate(Trans = box_cox(Takings, lambda)) |>
ACF(difference(Trans, lag=4)) |>
autoplot() +
labs(title = "ACF Plot: First Order Difference Tasmania")
plot_grid(plot1, plot2, ncol=1)ANSWER:Second order differencing has created a better ACF plot, however, there are 2 spikes outside of the boundary.
plot1 <- tasmania |>
mutate(Trans = box_cox(Takings, lambda)) |>
autoplot(difference(Trans, lag=4) |> difference()) +
labs(title = "Second Order Difference Tasmania ")
plot2 <- tasmania |>
mutate(Trans = box_cox(Takings, lambda)) |>
ACF(difference(Trans, lag=4) |> difference()) |>
autoplot() +
labs(title = "ACF Plot: Second Order Difference Tasmania")
plot_grid(plot1, plot2, ncol=1)Monthly sales from souvenirs.
ANSWER:
souvenirs |>
autoplot(Sales) +
labs(title = "Monthly Souvenir Sales")souvenirs |>
autoplot(log(Sales)) +
labs(title = "Log Transformed Monthly Souvenir Sales")ANSWER: We can try to difference the data, since this is a seasonal and non-stationary series.
plot1 <- souvenirs |>
autoplot(difference(log(Sales), lag=16)) +
labs(title = "First Order Difference ")
plot2 <- souvenirs |>
ACF(difference(log(Sales), lag=16)) |>
autoplot() +
labs(title = "ACF Plot: First Order Difference ")
plot_grid(plot1, plot2, ncol=1)ANSWER: The logarithms stabilise the variance, while the seasonal differences remove the seasonality and trend. The ACF plot shows much better data here with very limited outlier.
plot1 <- souvenirs |>
autoplot(difference(log(Sales), lag=12) |> difference()) +
labs(title = "Second Order Difference")
plot2 <- souvenirs |>
ACF(difference(log(Sales), lag=12) |> difference()) |>
autoplot() +
labs(title = "ACF Plot: Second Order Difference")
plot_grid(plot1, plot2, ncol=1)9.5 For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
ANSWER:
set.seed(12345678)
retail <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
retail |>
autoplot(Turnover) +
labs(title = "Retail Turnover")lambda <- retail |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
retail |>
autoplot(box_cox(Turnover, lambda)) +
labs(title = "Transformed Retail Turnover")ANSWER: The ACF shows that the plot is not yet white noise, so we can consider proceeding with 2nd order differenciation.
plot1 <- retail |>
autoplot(difference(box_cox(Turnover, lambda), 12)) +
labs(title = "First Order Difference: Transformed Retail Turnover")
plot2 <- retail |>
ACF(difference(box_cox(Turnover, lambda), 12)) |>
autoplot() +
labs(title = "ACF Plot First Order Difference: Transformed Retail Turnover")
plot_grid(plot1, plot2, ncol=1)ANSWER: The 2nd order difference seems a bit better except with 2 spikes and outliers outside of the boundary box.
plot1 <- retail |>
autoplot(difference(box_cox(Turnover, lambda), 12) |> difference()) +
labs(title = "Second Order Difference: Transformed Retail Turnover")
plot2 <- retail |>
ACF(difference(box_cox(Turnover, lambda), 12) |> difference()) |>
autoplot() +
labs(title = "ACF Plot Second Order Difference: Transformed Retail Turnover")
plot_grid(plot1, plot2, ncol=1)9.6 Simulate and plot some data from simple ARIMA models.
ANSWER:
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)ANSWER:
When phi_1 is changed and 0, the series resembles more of a white noise. When it is 1, it will be more of a random walk. For phi <0 it will oscillate around the mean.
sim |>
autoplot(y) +
labs(title = "Time Series (Phi = 0.6)")y1 <- numeric(100)
y2 <- numeric(100)
y3 <- numeric(100)
y4 <- numeric(100)
y5 <- numeric(100)
y6 <- numeric(100)
for(i in 2:100){
y1[i] <- -0.6*y[i-1] + e[i]
y2[i] <- -0.2*y[i-1] + e[i]
y3[i] <- 0.1*y[i-1] + e[i]
y4[i] <- 0.2*y[i-1] + e[i]
y5[i] <- 0.6*y[i-1] + e[i]
y6[i] <- 0.9*y[i-1] + e[i]
}
sim <- tsibble(idx = seq_len(100), y1 = y1, y2 = y2, y3 = y3, y4 = y4, y5 = y5, y6 = y6, index = idx)
plot1 <- sim |>
autoplot(y1) +
labs(title =
TeX("$\\phi_1 = -0.6$"))
plot2 <- sim |>
autoplot(y2) +
labs(title = TeX("$\\phi_1 = -0.2$"))
plot3 <- sim |>
autoplot(y3) +
labs(title = TeX("$\\phi_1 = 0.1$"))
plot4 <- sim |>
autoplot(y4) +
labs(title = TeX("$\\phi_1 = 0.2$"))
plot5 <- sim |>
autoplot(y5) +
labs(title = TeX("$\\phi_1 = 0.3$"))
plot6 <- sim |>
autoplot(y6) +
labs(title = TeX("$\\phi_1 = 0.9$"))
plot_grid(plot1, plot2, plot3, plot4, plot5, plot6, ncol=2)ANSWER: Code below shows that for MA(1) model we work with the error term rather than the Y.
y <- numeric(100)
e <- rnorm(100,1)
theta_1 = 0.6
for(i in 2:100)
y[i] <- theta_1*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)sim |>
autoplot(y) +
labs(title = "Time Series MA(1) (Theta = 0.6)")y1 <- numeric(100)
y2 <- numeric(100)
y3 <- numeric(100)
y4 <- numeric(100)
y5 <- numeric(100)
y6 <- numeric(100)
for(i in 2:100){
y1[i] <- -0.6*e[i-1] + e[i]
y2[i] <- -0.2*e[i-1] + e[i]
y3[i] <- 0.1*e[i-1] + e[i]
y4[i] <- 0.2*e[i-1] + e[i]
y5[i] <- 0.3*e[i-1] + e[i]
y6[i] <- 0.9*e[i-1] + e[i]
}
sim <- tsibble(idx = seq_len(100), y1 = y1, y2 = y2, y3 = y3, y4 = y4, y5 = y5, y6 = y6, index = idx)
plot1 <- sim |>
autoplot(y1) +
labs(title = TeX("$\\theta_1 = -0.6$"))
plot2 <- sim |>
autoplot(y2) +
labs(title = TeX("$\\theta_1 = -0.2$"))
plot3 <- sim |>
autoplot(y3) +
labs(title = TeX("$\\theta_1 = 0.1$"))
plot4 <- sim |>
autoplot(y4) +
labs(title = TeX("$\\theta_1 = 0.2$"))
plot5 <- sim |>
autoplot(y5) +
labs(title = TeX("$\\theta_1 = 0.3$"))
plot6 <- sim |>
autoplot(y6) +
labs(title = TeX("$\\theta_1 = 0.9$"))
plot_grid(plot1, plot2, plot3, plot4, plot5, plot6, ncol=2)Generate data from an ARMA(1,1) model with ϕ1=0.6ϕ1=0.6, θ1=0.6θ1=0.6 and σ2=1σ2=1.
set.seed(123)
Y <- numeric(100)
e <- rnorm(100,1)
phi_1 <- 0.6
theta_1 <- 0.6
for(i in 2:100)
y[i] <- phi_1*y[i-1] + theta_1*e[i-1] + e[i]
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)Generate data from an AR(2) model with ϕ1=−0.8ϕ1=−0.8, ϕ2=0.3ϕ2=0.3 and σ2=1σ2=1. (Note that these parameters will give a non-stationary series.)
set.seed(123)
y <- numeric(100)
e <- rnorm(100,1)
phi_1 <- -0.8
phi_2 <- 0.3
for(i in 3:100)
y[i] <- phi_1*y[i-1] + phi_2*y[i-2] + e[i]
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)Graph the latter two series and compare them.
ANSWER: ar(2) appears to have a grandual increase , whereas the ar(1) doesn’t have a pattern.
plot1 <- sim1 |>
autoplot(y) +
labs(title = TeX("ARMA(1,1) [$\\phi_1 = 0.6$, $\\theta_1 = 0.6$]"))
plot2 <- sim2 |>
autoplot(y) +
labs(title = TeX("AR(2) [$\\phi_1 = -0.8$, $\\phi_2 = 0.3$]"))
plot_grid(plot1, plot2, ncol=1)9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
head(aus_airpassengers)# A tsibble: 6 x 2 [1Y]
Year Passengers
<dbl> <dbl>
1 1970 7.32
2 1971 7.33
3 1972 7.80
4 1973 9.38
5 1974 10.7
6 1975 11.1
ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.fit <- aus_airpassengers |>
model(ARIMA(Passengers))
report(fit)Series: Passengers
Model: ARIMA(0,2,1)
Coefficients:
ma1
-0.8963
s.e. 0.0594
sigma^2 estimated as 4.308: log likelihood=-97.02
AIC=198.04 AICc=198.32 BIC=201.65
train <- aus_airpassengers %>%
filter(Year <= 2011)
fit3 <- train %>%
model(arima110 = ARIMA(log(Passengers) ~ pdq(1,1,0)),
stepwise = ARIMA(log(Passengers)),
search = ARIMA(Passengers, stepwise=FALSE))
glance(fit)# A tibble: 1 × 8
.model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
1 ARIMA(Passengers) 4.31 -97.0 198. 198. 202. <cpl [0]> <cpl [1]>
ANSWER: Yes the residuals do look like white noise.
# check if white noise
fit |>
gg_tsresiduals()# plot forecasts
fc <- fit |>
forecast(h="10 years")
fc |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecast ")fit_new <- train %>%
model(arima021 = ARIMA(log(Passengers) ~ pdq(0,2,1)))
fit_new %>%
forecast(h=10) %>%
autoplot(train)y_t = (1-B_2) (1 + {}B){_t}
#ANSWER: "y_t = (1-B_2) \times (1 + {\theta}B){\epsilon_t}"fit <- aus_airpassengers |>
model(ARIMA(log(Passengers) ~ 1 + pdq(0,1,0)))
fc <- fit |>
forecast(h="10 years")
fc |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecast with Drift")fit <- aus_airpassengers |>
model(ARIMA(log(Passengers) ~ 1 + pdq(2,1,2)))
fc <- fit |>
forecast(h="10 years")
fc |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecast with Drift")fit1 <- aus_airpassengers |>
model(ARIMA(log(Passengers) ~ pdq(2,1,2)))
fc1 <- fit1 |>
forecast(h="10 years")
fc1 |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecast with Drift")ANSWER : We get a warning - Warning: Model specification induces a quadratic or higher order polynomial trend. This is generally discouraged, consider removing the constant or reducing the number of differences.
fit <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
fc <- fit %>% forecast(h=10)
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)9.8 For the United States GDP series (from global_economy):
united_states <- global_economy |>
filter(Country == "United States")plot1 <- united_states |>
autoplot(GDP) +
labs(title = "United States GDP")
lambda <- united_states |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
united_states <- united_states |>
mutate(BoxCox_GDP = box_cox(GDP, lambda))
plot2 <- united_states |>
autoplot(BoxCox_GDP) +
labs(title = "United States GDP BoxCox Transformed")
plot_grid(plot1, plot2, ncol=1)ARIMA();fit1 <- united_states |>
model(ARIMA(box_cox(GDP, lambda)))
report(fit1)Series: GDP
Model: ARIMA(1,1,0) w/ drift
Transformation: box_cox(GDP, lambda)
Coefficients:
ar1 constant
0.4586 118.1822
s.e. 0.1198 9.5047
sigma^2 estimated as 5479: log likelihood=-325.32
AIC=656.65 AICc=657.1 BIC=662.78
fit2 <- united_states |>
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0,1,1)))
report(fit2)Series: GDP
Model: ARIMA(0,1,1) w/ drift
Transformation: box_cox(GDP, lambda)
Coefficients:
ma1 constant
0.4026 219.6195
s.e. 0.1098 13.6953
sigma^2 estimated as 5689: log likelihood=-326.37
AIC=658.73 AICc=659.18 BIC=664.86
fit3 <- united_states |>
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2,1,1)))
report(fit3)Series: GDP
Model: ARIMA(2,1,1) w/ drift
Transformation: box_cox(GDP, lambda)
Coefficients:
ar1 ar2 ma1 constant
1.1661 -0.2792 -0.7356 24.3346
s.e. 0.3418 0.2108 0.3077 2.5572
sigma^2 estimated as 5647: log likelihood=-325.14
AIC=660.29 AICc=661.46 BIC=670.5
fit4 <- united_states |>
model(ARIMA(box_cox(GDP, lambda) ~ 0 + pdq(0,1,1)))
report(fit4)Series: GDP
Model: ARIMA(0,1,1)
Transformation: box_cox(GDP, lambda)
Coefficients:
ma1
0.7673
s.e. 0.0674
sigma^2 estimated as 23338: log likelihood=-367.47
AIC=738.93 AICc=739.16 BIC=743.02
ANSWER: FIT1 and FIT2 look like they have lower AIC scores. Additionally the ACF plot seem like they are white noise on the error. Therefore we have selected these 2.
fit1 |>
gg_tsresiduals()fit2 |>
gg_tsresiduals()ANSWER - The forecast shows upward trend similar to the original model.
fc <- fit1 |>
forecast(h="10 years")
fc |>
autoplot(united_states) +
labs("10 Year United States GDP Prediction")ETS() (with no transformation).ANSWER: ETS also shows a similar upward trend in GDP.
fit6 <- united_states |>
model(ETS(GDP))
report(fit6)Series: GDP
Model: ETS(M,A,N)
Smoothing parameters:
alpha = 0.9990876
beta = 0.5011949
Initial states:
l[0] b[0]
448093333334 64917355687
sigma^2: 7e-04
AIC AICc BIC
3190.787 3191.941 3201.089
fc6 <- fit6 |>
forecast(h="10 years")
fc6 |>
autoplot(united_states)