Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Answer:
The main difference between these figures is that as the number of random numbers increases, the spikes become smaller. In other words, as the sample size increases, the sample autocorrelations move closer to zero, which is expected for white noise.
Answer:
The critical values are further from zero when the number of data points is smaller because the estimate of the autocorrelation is less precise. The critical values are \(\pm 1.96/\sqrt{T}\), where \(T\) is the length of the time series. Even though all three series are white noise, their sample autocorrelations differ because each series is a different realization: in a finite sample, the observed autocorrelations vary randomly around zero.
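As a quick numerical check, the bounds for the three sample sizes can be computed directly (a minimal sketch in base R):
# 95% critical values for white-noise autocorrelations at T = 36, 360, 1000
T_lengths <- c(36, 360, 1000)
round(1.96 / sqrt(T_lengths), 3)
# gives approximately 0.327, 0.103, 0.062: larger samples give tighter bounds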
A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
Answer:
library(fpp3)  # loads tsibble, dplyr, ggplot2, tsibbledata, feasts and fable
data(gafa_stock)
gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key: Symbol [4]
## Symbol Date Open High Low Close Adj_Close Volume
## <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AAPL 2014-01-02 79.4 79.6 78.9 79.0 67.0 58671200
## 2 AAPL 2014-01-03 79.0 79.1 77.2 77.3 65.5 98116900
## 3 AAPL 2014-01-06 76.8 78.1 76.2 77.7 65.9 103152700
## 4 AAPL 2014-01-07 77.8 78.0 76.8 77.1 65.4 79302300
## 5 AAPL 2014-01-08 77.0 77.9 77.0 77.6 65.8 64632400
## 6 AAPL 2014-01-09 78.1 78.1 76.5 76.6 65.0 69787200
## 7 AAPL 2014-01-10 77.1 77.3 75.9 76.1 64.5 76244000
## 8 AAPL 2014-01-13 75.7 77.5 75.7 76.5 64.9 94623200
## 9 AAPL 2014-01-14 76.9 78.1 76.8 78.1 66.1 83140400
## 10 AAPL 2014-01-15 79.1 80.0 78.8 79.6 67.5 97909700
## # ℹ 5,022 more rows
Plotting the daily closing prices for Amazon stock:
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type = 'partial') +
  labs(title = "Amazon Daily Closing Stock Price")
The stock price shows a clear upward trend over time. There are also changes in the variance over time, as seen by the amplitude of the price changes, which seems to increase with the level of the series, indicating that the series may also be heteroskedastic, another indicator of non-stationarity.
The ACF plot displays a very slow decay, which indicates that past values are strongly correlated with future values even at long lags. For a stationary series we would expect the ACF to drop to zero quickly, which is not the case here. The PACF shows a significant spike at lag 1 and then cuts off, which is typical of a non-stationary series with a stochastic trend.
As we can see, the plots suggest that the time series is non-stationary, so differencing should be applied to make it stationary.
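One way to verify the differencing suggestion is to re-index the series by trading day (since gafa_stock has irregular calendar dates, the ACF and PACF need a regular index) and display the first difference; a minimal sketch:
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE) %>%
  gg_tsdisplay(difference(Close), plot_type = 'partial') +
  labs(title = "Differenced Amazon Daily Closing Stock Price")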
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Answer:
# plotting
global_economy %>%
filter(Country == "Turkey") %>%
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "Non-transformed Turkish GDP")
# calculating lambda
lambda <- global_economy %>%
filter(Country == "Turkey") %>%
features(GDP, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
global_economy %>%
filter(Country == "Turkey") %>%
features(box_cox(GDP,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
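To confirm visually, the transformed series can be differenced once and re-displayed; a minimal sketch reusing the lambda computed above:
# visual check: Box-Cox transformed Turkish GDP after one difference
global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(difference(box_cox(GDP, lambda)), plot_type = 'partial') +
  labs(title = "Transformed and differenced Turkish GDP")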
Answer:
# plotting
aus_accommodation %>%
filter(State == 'Tasmania') %>%
gg_tsdisplay(Takings, plot_type='partial') +
labs(title = "Non-transformed Tasmania Accomodation Takings")
# calculating lambda
lambda <- aus_accommodation %>%
filter(State == 'Tasmania') %>%
features(Takings, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
aus_accommodation %>%
filter(State == 'Tasmania') %>%
features(box_cox(Takings,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
## State ndiffs
## <chr> <int>
## 1 Tasmania 1
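Since Takings are quarterly and strongly seasonal, one might also check whether a seasonal difference is needed; a minimal sketch using unitroot_nsdiffs:
# seasonal differencing check on the transformed series
aus_accommodation %>%
  filter(State == 'Tasmania') %>%
  features(box_cox(Takings, lambda), unitroot_nsdiffs)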
Answer:
# plotting
souvenirs %>%
gg_tsdisplay(Sales, plot_type = 'partial') +
labs(title = "Monthly Souvenirs sales")
# calculating lambda
lambda <- souvenirs %>%
features(Sales, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
souvenirs %>%
mutate(Sales = box_cox(Sales, lambda)) %>%
features(Sales, unitroot_ndiffs)
## # A tibble: 1 × 1
## ndiffs
## <int>
## 1 1
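Because souvenir sales are strongly seasonal, a seasonal difference may also be worth inspecting before the first difference; a minimal sketch displaying the seasonally differenced, transformed series:
# seasonally difference (lag 12) the Box-Cox transformed sales
souvenirs %>%
  gg_tsdisplay(difference(box_cox(Sales, lambda), 12), plot_type = 'partial', lag = 24) +
  labs(title = "Seasonally differenced, transformed souvenir sales")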
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
Answer:
set.seed(123)
myseries <- aus_retail %>%
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
# plotting
myseries %>%
gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
labs(title = "Non-transformed Retail Turnover")
# calculating lambda
lambda_turnover <- myseries %>%
features(Turnover, features = guerrero) %>%
pull(lambda_guerrero)
# unit root test
myseries %>%
mutate(Turnover = box_cox(Turnover, lambda_turnover)) %>%
features(Turnover, unitroot_ndiffs)
## # A tibble: 1 × 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Victoria Household goods retailing 1
The retail data shows trend and seasonality, so it is non-stationary. A Box-Cox transformation was applied to stabilize the variance. The KPSS unit root test recommends a single difference. After transforming and differencing once, the series appears stationary: it is centered around zero, and the ACF and PACF no longer display the slow decay associated with a trend (see the sketch below).
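The stationarity claim can be inspected directly by displaying the transformed series after one difference; a minimal sketch:
# ACF/PACF of the Box-Cox transformed Turnover after one difference
myseries %>%
  gg_tsdisplay(difference(box_cox(Turnover, lambda_turnover)), plot_type = 'partial', lag = 36) +
  labs(title = "Transformed and differenced Retail Turnover")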
Simulate and plot some data from simple ARIMA models.
Answer:
# simulate an AR(1) process y_t = 0.6*y_{t-1} + e_t with standard normal errors
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
Answer:
sim %>%
autoplot(y) +
labs(title = "AR(1) model with phi_1 = 0.6")
Changing the \(\phi_1\) to -0.6:
for(i in 2:100)
y[i] <- -0.6*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "AR(1) model with phi_1 = -0.6")
Changing the \(\phi_1\) to 0.9:
for(i in 2:100)
y[i] <- 0.9*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "AR(1) model with phi_1 = 0.9")
When \(\phi_1 = 0.6\), the plot shows a smoother series in which increases tend to follow increases and decreases tend to follow decreases. When \(\phi_1 = -0.6\), the series is oscillatory, switching back and forth across the mean more frequently. When \(\phi_1 = 0.9\), the plot is smoother still and the swings from one time point to the next are larger, because the values are more persistent.
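For a direct comparison, the three series can be simulated from the same error sequence and plotted together; a minimal sketch (the helper sim_ar1 is illustrative, not part of the original code):
# simulate AR(1) series for several phi values with shared errors
sim_ar1 <- function(phi, e) {
  y <- numeric(length(e))
  for (i in 2:length(e)) y[i] <- phi * y[i-1] + e[i]
  y
}
set.seed(123)
e <- rnorm(100)
tibble(idx = rep(seq_len(100), times = 3),
       phi = rep(c(-0.6, 0.6, 0.9), each = 100),
       y = c(sim_ar1(-0.6, e), sim_ar1(0.6, e), sim_ar1(0.9, e))) %>%
  as_tsibble(index = idx, key = phi) %>%
  autoplot(y) +
  labs(title = "AR(1) simulations with shared errors")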
# simulate an MA(1) process y_t = e_t + 0.6*e_{t-1}
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim %>%
  autoplot(y) +
  labs(title = "MA(1) model with theta_1 = 0.6")
Changing the \(\theta_1\) to -0.6:
for(i in 2:100)
y[i] <- -0.6*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "AR(1) model with theta_1 = -0.6")
Changing the \(\theta_1\) to 0.9:
for(i in 2:100)
y[i] <- 0.9*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
autoplot(y) +
labs(title = "AR(1) model with theta_1 = 0.9")
As \(\theta_1\) decreases, the series oscillates more frequently. The shapes of the graphs do not change much, as the three series have similar magnitudes. Smaller values of \(\theta_1\) appear to center the series more tightly around the mean.
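The similar magnitudes are consistent with theory: an MA(1) process with unit error variance has variance \(1+\theta_1^2\), so \(\theta_1 = \pm 0.6\) give identical variances and \(\theta_1 = 0.9\) only slightly more. A quick check:
# theoretical standard deviations of MA(1) series with sigma = 1
theta <- c(-0.6, 0.6, 0.9)
round(sqrt(1 + theta^2), 3)
# 1.166 1.166 1.345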
# simulate an ARMA(1,1) process with phi_1 = 0.6 and theta_1 = 0.6
set.seed(123)
y <- numeric(100)
e <- rnorm(100)  # regenerate the errors so the seed actually takes effect
for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]
arma1_1 <- tsibble(idx = seq_len(100), y = y, index = idx)
# simulate the AR(2) process y_t = -0.8*y_{t-1} + 0.3*y_{t-2} + e_t
# (these parameters yield a non-stationary series)
set.seed(123)
y <- numeric(100)
e <- rnorm(100)
for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
arma1_1 %>%
  gg_tsdisplay(y, plot_type = 'partial') +
  labs(title = "ARMA(1,1) model")
ar2 %>%
  gg_tsdisplay(y, plot_type = 'partial') +
  labs(title = "AR(2) model")
The ARMA(1,1) plot shows a stable, mean-reverting process, whereas the AR(2) plot shows an unstable, diverging process. In the ACF and PACF plots, the ARMA(1,1) process shows a more rapid decline in autocorrelations and a sharp cut-off in the PACF after lag 1, while the AR(2) process shows significant autocorrelation at the first two lags and a more complex ACF pattern.
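The instability of the AR(2) process can be confirmed by checking the roots of its characteristic polynomial \(1 - \phi_1 z - \phi_2 z^2\); stationarity requires all roots to lie outside the unit circle:
# roots of 1 - phi1*z - phi2*z^2 with phi1 = -0.8, phi2 = 0.3
Mod(polyroot(c(1, 0.8, -0.3)))
# one root has modulus < 1 (about 0.93), so the process is non-stationary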
Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
fit <- aus_airpassengers %>%
filter(Year < 2012) %>%
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
fit %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1)", y = "Passengers (in millions)")
fit %>%
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(0,2,1)")
In terms of the backshift operator, the fitted ARIMA(0,2,1) model is \((1-B)^2 y_t = (1+\theta_1 B)e_t\), with \(\hat{\theta}_1 = -0.8756\); no constant is included since \(d = 2\).
fit2 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))
fit2 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")
fit2 %>%
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(0,1,0)")
fit3 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))
fit3 %>%
forecast(h=10) %>%
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")
fit3 %>%
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(2,1,2)")
# removing the constant
fit4 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
report(fit4)
## Series: Passengers
## Model: NULL model
## NULL model
Without the constant, fable is unable to estimate an ARIMA(2,1,2) model for this series: estimation fails and a NULL model is returned (with a warning).
fit5 <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
fit5 %>%
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant", y = "Passengers (in millions)")
fit5 %>%
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1) with constant")
Including a constant when d = 2 induces a quadratic trend in the forecasts; fable warns that this is generally discouraged.
For the United States GDP series (from global_economy):
# plotting
global_economy %>%
filter(Code == "USA") %>%
gg_tsdisplay(GDP, plot_type='partial') +
labs(title = "Non-transformed United States GDP")
# calculating lambda
lambda <- global_economy %>%
  filter(Code == "USA") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
fit6 <- global_economy %>%
filter(Code == "USA") %>%
model(ARIMA(box_cox(GDP, lambda)))
report(fit6)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
global_economy %>%
filter(Code == "USA") %>%
gg_tsdisplay(box_cox(GDP,lambda), plot_type='partial') +
labs(title = "Transformed United States GDP")
global_economy %>%
filter(Code == "USA") %>%
features(box_cox(GDP,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 United States 1
usa_fit <- global_economy %>%
  filter(Code == "USA") %>%
  model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
        arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
        arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
        arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
        arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))
glance(usa_fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 5 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima120 6780. -326. 656. 656. 660.
## 2 arima110 5479. -325. 657. 657. 663.
## 3 arima111 5580. -325. 659. 659. 667.
## 4 arima210 5580. -325. 659. 659. 667.
## 5 arima212 5734. -325. 662. 664. 674.
The models have similar AICc values, with ARIMA(1,2,0) having the lowest, closely followed by ARIMA(1,1,0). The unit root test suggests differencing only once. The ACF decays slowly and the PACF has a significant spike at lag 1, which suggests an AR(1) component after differencing, i.e. ARIMA(1,1,0), the same model selected automatically in part b.
usa_fit %>%
select(arima120) %>%
gg_tsresiduals() +
ggtitle("Residuals for ARIMA(1,2,0)")
Since ARIMA(1,2,0) has the lowest AICc, that model was chosen. The residuals appear to be white noise.
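The white-noise impression can be checked formally with a Ljung-Box test on the innovation residuals; a minimal sketch (the lag and dof choices are illustrative):
# Ljung-Box test on ARIMA(1,2,0) residuals; dof = 1 for the single AR coefficient
usa_fit %>%
  select(arima120) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 1)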
usa_fit %>%
forecast(h=5) %>%
filter(.model=='arima120') %>%
autoplot(global_economy)
fit_ets <- global_economy %>%
filter(Code == "USA") %>%
model(ETS(GDP))
report(fit_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fit_ets %>%
forecast(h=5) %>%
autoplot(global_economy)
With no transformation, the ETS() model produces forecasts with much wider prediction intervals. Its AICc is also far greater, although AICc values are not directly comparable between ETS and ARIMA models, or between models fitted to transformed and untransformed data.
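Since AICc cannot settle the comparison across model classes, a fairer test is out-of-sample accuracy; a hedged sketch with an illustrative training cut-off at 2007:
# hold out the last decade and compare forecast accuracy of ARIMA vs ETS
usa <- global_economy %>% filter(Code == "USA")
usa %>%
  filter(Year <= 2007) %>%
  model(arima = ARIMA(box_cox(GDP, lambda)),
        ets = ETS(GDP)) %>%
  forecast(h = 10) %>%
  accuracy(usa)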