Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a) Explain the differences among these figures. Do they all indicate that the data are white noise?
While the ACF plots do have spikes, those spikes stay within the confidence limits in all three plots. In the small sample, random variation causes the autocorrelations to fluctuate a fair amount, but as the sample size increases the confidence bounds narrow as the uncertainty decreases; by 1,000 observations the ACF is nearly flat with no significant lags. The variation at small sample sizes, though visible, is attributable to the sample size alone, so all three series still qualify as white noise.
b) Why are the critical values at different distances from the mean of zero? Why are the auto-correlations different in each figure when they each refer to white noise?
The critical values are a function of the sample size: for a white noise series we expect 95% of the autocorrelations to lie within ±1.96/sqrt(T), so the bounds move closer to zero as T increases. The sample autocorrelations are close to zero but not exactly zero, and they differ between the figures because each realisation is subject to its own random variation.
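As a quick illustration (my addition, using the standard ±1.96/sqrt(T) approximation), the bounds for the three sample sizes can be computed directly:
# 95% critical values for a white noise ACF shrink like 1/sqrt(T)
T_obs <- c(36, 360, 1000)
round(1.96 / sqrt(T_obs), 3)
## [1] 0.327 0.103 0.062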
A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(fpp3)
gafa_stock %>%
filter(Symbol == "AMZN") %>%
autoplot(Close)
We see a steady upward trend in the closing price through most of the series, followed by a sharp decline near the end of 2018. This is clearly not stationary data, given the obvious trend.
# Note: gafa_stock has an irregular (trading-day) index; feasts may warn when
# computing the ACF. Re-indexing by trading day with update_tsibble() avoids this.
gafa_stock %>%
filter(Symbol == "AMZN") %>%
ACF(Close) %>%
autoplot()
gafa_stock %>%
filter(Symbol=="AMZN") %>%
PACF(Close) %>%
autoplot()
The ACF shows all lags well outside the confidence bounds: r1 is large and positive, and the autocorrelations decrease very slowly, consistent with non-stationary data; for a stationary series the ACF would drop to zero relatively quickly. The PACF shows a single huge spike at lag 1, indicating that the series needs to be differenced, most likely just once, to become stationary.
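As an extra check beyond what the question asks, a KPSS unit-root test should also flag the non-stationarity; a sketch, assuming unitroot_kpss is fine with the irregular trading-day index (it only uses the ordered values):
# Small p-value (reported values are capped below at 0.01) => reject stationarity
gafa_stock %>%
filter(Symbol == "AMZN") %>%
features(Close, unitroot_kpss)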
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
a) Turkish GDP from global_economy.
turkish <- global_economy %>%
filter(Country=="Turkey")
turkish %>%
autoplot(GDP)
turkish %>%
ACF(GDP) %>%
autoplot()
turkish %>%
PACF(GDP) %>%
autoplot()
lambda <- turkish %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
turkish <- turkish %>%
mutate(lambdagdp = box_cox(GDP, lambda))
turkish <- turkish %>%
mutate(diff1gdp = difference(lambdagdp))
autoplot(turkish, diff1gdp)
turkish %>%
ACF(diff1gdp) %>%
autoplot()
turkish %>%
PACF(diff1gdp) %>%
autoplot()
We were able to get the data close to stationary with a Box-Cox transformation and one round of differencing.
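This choice can be cross-checked with the unit-root features from feasts; a sketch using the turkish tsibble and columns created above:
# How many first differences does the transformed series need?
turkish %>% features(lambdagdp, unitroot_ndiffs)
# KPSS on the differenced series: a large p-value is consistent with stationarity
turkish %>% features(diff1gdp, unitroot_kpss)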
b) Accommodation takings in the state of Tasmania from aus_accommodation.
tas <- aus_accommodation %>%
filter(State == "Tasmania") %>%
select(Takings)
autoplot(tas, Takings)
Here we will have to difference to deal with the trend as well as the seasonality, after a Box-Cox transformation.
lambda <- tas %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
tas <- tas %>%
mutate(takings_bc = box_cox(Takings, lambda))
We take a first difference to handle the trend
tas <- tas %>%
mutate(diff1_takings = difference(takings_bc))
And a seasonal difference (lag 4, as the data are quarterly) to handle the seasonality
tas <- tas %>%
mutate(diff_seasonal = difference(diff1_takings, lag = 4))
autoplot(tas, diff_seasonal)
tas %>%
ACF(diff_seasonal) %>%
autoplot()
tas %>%
PACF(diff_seasonal) %>%
autoplot()
We were able to achieve near-stationary data with two rounds of differencing here.
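The differencing orders can again be confirmed with unit-root features; a sketch using the tas tsibble from above:
# Number of seasonal differences suggested for the transformed takings
tas %>% features(takings_bc, unitroot_nsdiffs)
# After the seasonal difference, how many first differences remain necessary?
tas %>% features(difference(takings_bc, lag = 4), unitroot_ndiffs)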
c) Monthly sales from souvenirs.
sou <- souvenirs %>%
select(Month, Sales)
autoplot(sou, Sales)
lambda <- sou %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
sou <- sou %>%
mutate(sales_bc = box_cox(Sales, lambda))
Seasonal differencing (lag 12, as the data are monthly)
sou <- sou %>%
mutate(diff1 = difference(sales_bc, lag = 12))
autoplot(sou, diff1)
sou %>%
ACF(diff1) %>%
autoplot()
sou %>%
PACF(diff1) %>%
autoplot()
Then a first difference for the remaining trend
sou <- sou %>%
mutate(diff2 = difference(diff1))
autoplot(sou, diff2)
sou %>%
ACF(diff2) %>%
autoplot()
sou %>%
PACF(diff2) %>%
autoplot()
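As before, the choices can be backed up with unit-root features; a sketch using the columns created above:
# One seasonal difference should be suggested for the transformed sales
sou %>% features(sales_bc, unitroot_nsdiffs)
# KPSS on the twice-differenced series: a large p-value is consistent with stationarity
sou %>% features(diff2, unitroot_kpss)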
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(90483484)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
autoplot(Turnover)
lambda <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
myseries <- myseries %>%
mutate(turnover_bc = box_cox(Turnover, lambda))
autoplot(myseries, turnover_bc)
myseries <- myseries %>%
mutate(diff1 = difference(turnover_bc))
autoplot(myseries, diff1)
myseries %>%
ACF(diff1) %>%
autoplot()
myseries %>%
PACF(diff1) %>%
autoplot()
myseries <- myseries %>%
mutate(diff2 = difference(diff1, lag = 12))
autoplot(myseries, diff2)
myseries %>%
ACF(diff2) %>%
autoplot()
myseries %>%
PACF(diff2) %>%
autoplot()
A Box-Cox transformation, a first difference to remove the trend, and a seasonal difference (lag 12) to remove the seasonality seem to make this data close to stationary.
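A KPSS test on the twice-differenced series backs up this visual judgement; a sketch (note the reported p-value is interpolated within [0.01, 0.1], so a value of 0.1 just means "at least 0.1"):
# Large p-value => no evidence against stationarity after the two differences
myseries %>%
features(diff2, unitroot_kpss)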
a) Use the following R code to generate data from an AR(1) model with phi1 = 0.6 and sigma^2 = 1
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
autoplot(sim, y)
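To see the AR(1) signature (my addition; the exact spikes depend on the random draw), the ACF should decay geometrically and the PACF should show a single spike at lag 1:
sim %>% ACF(y) %>% autoplot()
sim %>% PACF(y) %>% autoplot()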
b) Produce a time plot for the series. How does the plot change as you change phi1?
y_neg <- numeric(100)
y_zero <- numeric(100)
y_1 <- numeric(100)
e <- rnorm(100)
for(i in 2:100){
y_neg[i] <- -0.6 * y_neg[i-1] + e[i]
y_zero[i] <- 0 * y_zero[i-1] + e[i]
y_1[i] <- 1 * y_1[i-1] + e[i]
}
sim <- tsibble(
idx = seq_len(100),
phi_neg = y_neg,
phi_pos = y,
phi_0 = y_zero,
phi_1 = y_1,
index = idx
)
sim %>%
autoplot(phi_neg) +
labs(title = "phi = -0.6")
sim %>%
autoplot(phi_0) +
labs(title = "phi = 0")
sim %>%
autoplot(phi_pos) +
labs(title = "phi = +0.6")
sim %>%
autoplot(phi_1) +
labs(title = "phi = +1")
We notice that as phi approaches 0 the series more closely resembles white noise: the variance stays roughly constant and the data remain stationary. As phi moves away from 0 towards 1 the series starts to resemble a random walk: the variance changes over time, trends appear to emerge, and the data become less and less stationary.
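These claims can be checked on the simulated series themselves (a sketch; results vary with the seed):
# phi = 0: the ACF should show no significant spikes (white noise)
sim %>% ACF(phi_0) %>% autoplot()
# phi = 1: the ACF of a random walk dies out very slowly
sim %>% ACF(phi_1) %>% autoplot()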
c) Write your own code to generate data from an MA(1) model with theta1 = 0.6 and sigma^2 = 1
yma1 <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
yma1[i] <- e[i] + 0.6*e[i-1]
d) Produce a time plot for the series. How does the plot change as you change theta1?
sima1 <- tsibble(idx = seq_len(100), yma1 = yma1, index = idx)
sima1 %>%
autoplot(yma1)
n <- 100
e <- rnorm(n)
yma1_neg <- numeric(n)
yma1_0 <- numeric(n)
yma1_1 <- numeric(n)
for(i in 2:n){
yma1_neg[i] <- -0.6*e[i-1] + e[i]
yma1_0[i] <- 0*e[i-1] + e[i]
yma1_1[i] <- 1*e[i-1] + e[i]
}
sima1 <- tsibble(
idx = seq_len(n),
yma1_neg = yma1_neg,
yma1_0 = yma1_0,
yma1_1 = yma1_1,
index = idx
)
sima1 %>% autoplot(yma1_neg) + labs(title = "theta = -0.6")
sima1 %>% autoplot(yma1_0) + labs(title = "theta = 0")
sima1 %>% autoplot(yma1_1) + labs(title = "theta = 1")
As in the AR case, the series looks most like white noise when theta = 0. Unlike the AR(1), however, an MA(1) process is stationary for every value of theta: moving theta away from 0 only strengthens the lag-1 autocorrelation, making the series smoother (for positive theta) or more jagged (for negative theta), but it never becomes a random walk.
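A quick calculation supports this: the lag-1 autocorrelation of an MA(1) is theta/(1 + theta^2), which is bounded at 0.5 in absolute value:
theta <- c(-0.6, 0, 0.6, 1)
round(theta / (1 + theta^2), 3)
## [1] -0.441  0.000  0.441  0.500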
e) Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6 and sigma^2 = 1
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- (0.6*y[i-1]) + e[i] + (0.6*e[i-1])
simarma <- tsibble(idx = seq_len(100), y = y, index = idx)
f) Generate data from an AR(2) model with phi1 = -0.8, phi2 = 0.3 and sigma^2 = 1 (these parameters give a non-stationary series)
y <- numeric(100)
e <- rnorm(100)
for (i in 3:100)
y[i] <- (-0.8*y[i-1]) + e[i] + (0.3*y[i-2])
simar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
g) Graph the latter two series and compare them.
simarma %>%
autoplot(y)
simar2 %>%
autoplot(y)
The ARMA(1,1) series looks stationary, while the AR(2) series oscillates with steadily growing amplitude in a consistent cyclic pattern, as expected since its parameters violate the stationarity conditions.
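The non-stationarity of the AR(2) can be verified from the roots of its characteristic polynomial 1 - phi1*B - phi2*B^2 (my addition); stationarity requires all roots to lie outside the unit circle:
# Roots of 1 + 0.8*z - 0.3*z^2 (phi1 = -0.8, phi2 = 0.3)
# Moduli are roughly 0.93 and 3.59; the root inside the unit circle makes it non-stationary
Mod(polyroot(c(1, 0.8, -0.3)))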
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
a) Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
fit <- aus_airpassengers %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
An ARIMA(0,2,1) model was selected.
fit %>%
gg_tsresiduals()
The residuals look like white noise: roughly normally distributed around 0, all ACF spikes within the confidence bounds, and no discernible patterns.
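A Ljung-Box test makes this check formal; following the usual fpp3 convention of lag = 10 for non-seasonal data and dof = 1 for the single MA coefficient:
# Large p-value => no evidence of remaining autocorrelation in the residuals
augment(fit) %>%
features(.innov, ljung_box, lag = 10, dof = 1)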
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
b) Write the model in terms of the backshift operator.
An ARIMA(0,2,1) model can be written as
(1 − B)^2 y_t = c + (1 + theta1 B) e_t, where B y_t = y_{t−1}
Here theta1 = −0.8963 and c = 0, so
(1 − B)^2 y_t = (1 − 0.8963 B) e_t
c) Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
fit <- aus_airpassengers |>
model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers)
The forecasts again increase linearly, but with a slope equal to the average historical growth rather than the steeper recent trend, so they rise less steeply than in part a); the prediction intervals are also narrower.
d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
fit_constant <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(2,1,2)))
fit_constant %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
fit_withoutconstant <- aus_airpassengers %>%
model(ARIMA(Passengers ~ pdq(2,1,2)))
fit_withoutconstant %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
With the constant, the forecasts are quite close to those from the automatically chosen model in part a). Without the constant, estimation fails: fable returns a null model (reporting a non-stationary AR part), so no forecasts are produced.
e) Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit <- aus_airpassengers %>%
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
fit %>%
forecast(h = 10) %>%
autoplot(aus_airpassengers)
Including a constant when d = 2 induces a quadratic trend in the forecasts (fable warns that this is generally discouraged), so the forecast curves upward much more sharply than in part a), and the prediction intervals are narrower.
For the United States GDP series (from global_economy):
a) if necessary, find a suitable Box-Cox transformation for the data;
usa <- global_economy %>%
filter(Country == "United States")
usa %>%
autoplot(GDP)
lambda <- usa %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
usa <- usa %>%
mutate(GDP_bc = box_cox(GDP, lambda))
usa %>%
autoplot(GDP_bc)
b) fit a suitable ARIMA model to the transformed data using ARIMA();
fitusa <- usa %>%
model(ARIMA(box_cox(GDP, lambda)))
report(fitusa)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
c) try some other plausible models by experimenting with the orders chosen;
Keeping the drift as recommended and d = 1 to account for the trend, we vary p and q:
fit1 <- usa %>%
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(1,1,1)))
report(fit1)
## Series: GDP
## Model: ARIMA(1,1,1) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 ma1 constant
## 0.4736 -0.0190 114.8547
## s.e. 0.2851 0.3286 9.3486
##
## sigma^2 estimated as 5580: log likelihood=-325.32
## AIC=658.64 AICc=659.41 BIC=666.82
fit2 <- usa %>%
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0,1,1)))
report(fit2)
d) choose what you think is the best model and check the residual diagnostics;
The best model is the automatically selected one, ARIMA(1,1,0) with drift: it has the lowest AICc (657.1) and the smallest residual variance sigma^2.
fitusa %>% gg_tsresiduals()
The plots confirm this: the residuals resemble white noise, with no discernible patterns, an approximately normal distribution centred on 0, and autocorrelations well within the bounds.
e) produce forecasts of your fitted model. Do the forecasts look reasonable?
forecast_arima <- fitusa %>%
forecast(h = 10)
autoplot(forecast_arima, usa)
Given the historical data, the forecasts look very reasonable and follow the trend line.
f) compare the results with what you would obtain using ETS() (with no transformation).
fitets <- usa |>
model(ETS(GDP))
report(fitets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
forecast_ets <- fitets %>%
forecast(h=10)
autoplot(forecast_ets, usa)
The recommended ETS(M,A,N) model produces more conservative forecasts than the ARIMA(1,1,0) with drift, which follows the trend more closely. Note that the higher AIC of the ETS model is not directly comparable to the ARIMA model's: the two belong to different model classes, and the ARIMA was fitted to Box-Cox transformed data.
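Since the AIC values are not comparable, a fairer comparison is out-of-sample accuracy via time series cross-validation; a sketch, assuming the usa tsibble and lambda from above (with .init = 30 chosen arbitrarily):
# Compare ETS and ARIMA on one-step-ahead cross-validated accuracy rather than AIC
usa %>%
stretch_tsibble(.init = 30, .step = 1) %>%
model(
arima = ARIMA(box_cox(GDP, lambda)),
ets = ETS(GDP)
) %>%
forecast(h = 1) %>%
accuracy(usa) %>%
select(.model, RMSE, MAE)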