exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8
Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Explain the differences among these figures. Do they all indicate that the data are white noise?
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
A classic example of a non-stationary series are stock prices. Plot
the daily closing prices for Amazon stock (contained in
gafa_stock
), along with the ACF and PACF. Explain how each
plot shows that the series is non-stationary and should be
differenced.
The time series increases until around 2019, the decreases, but overall increasing trend, so non-stationary. ACF plot is decreasing and has a large height that goes beyond the blue line, even though stationary time series goes to zero quickly. PACF plot starts off with 1.00, shows its non-stationary. It should be differenced because it will help the mean be stabilized.
library(fpp3)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
## ── Attaching packages ──────────────────────────────────────────── fpp3 1.0.1 ──
## ✔ tibble 3.2.1 ✔ tsibble 1.1.6
## ✔ dplyr 1.1.4 ✔ tsibbledata 0.4.1
## ✔ tidyr 1.3.1 ✔ feasts 0.4.1
## ✔ lubridate 1.9.4 ✔ fable 0.4.1
## ✔ ggplot2 3.5.1
## ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
## ✖ lubridate::date() masks base::date()
## ✖ dplyr::filter() masks stats::filter()
## ✖ tsibble::intersect() masks base::intersect()
## ✖ tsibble::interval() masks lubridate::interval()
## ✖ dplyr::lag() masks stats::lag()
## ✖ tsibble::setdiff() masks base::setdiff()
## ✖ tsibble::union() masks base::union()
gafa_stock |>
filter(Symbol == "AMZN") |>
gg_tsdisplay(Close, plot_type='partial')
## Warning: Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
## Provided data has an irregular interval, results should be treated with caution. Computing ACF by observation.
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Turkish GDP from global_economy
.
Accommodation takings in the state of Tasmania from
aus_accommodation
.
Monthly sales from souvenirs
.
global_economy |>
filter(Country == "Turkey") |>
gg_tsdisplay(GDP, plot_type='partial')
lambda <- global_economy |>
filter(Country == "Turkey") |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
global_economy |>
filter(Country == "Turkey") |>
features(box_cox(GDP,lambda), unitroot_ndiffs)
global_economy |>
filter(Country == "Turkey") |>
gg_tsdisplay(difference(box_cox(GDP,lambda)), plot_type='partial')
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
aus_accommodation |>
filter(State == "Tasmania") |>
gg_tsdisplay(Takings, plot_type='partial')
lambda <-aus_accommodation |>
filter(State == "Tasmania") |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
aus_accommodation |>
filter(State == "Tasmania") |>
features(box_cox(Takings,lambda), unitroot_nsdiffs)
For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
The data is both seasonal and non stationary, and the differencing is only suggested once, by the KPSS unit root testing. ACF and PACF is around zero, so it should be stationary.
set.seed(12345678)
myseries <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries |>
gg_tsdisplay(Turnover, plot_type='partial', lag = 50)
lambda <- myseries |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
myseries |>
features(box_cox(Turnover, lambda), unitroot_nsdiffs)
myseries |>
gg_tsdisplay(difference(box_cox(Turnover,lambda), 15), plot_type='partial', lag = 50)
## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 15 rows containing missing values or values outside the scale range
## (`geom_point()`).
Simulate and plot some data from simple ARIMA models.
Use the following R code to generate data from an AR(1) model witho1=0.6 and p2=1. The process starts with y1=0
Produce a time plot for the series. How does the plot change as you change o1
Different o1 has different plot, like as o1 decrease, the plot becomes bigger in height, negative value makes the plotting even larger.
Write your own code to generate data from an MA(1) model with x1=0.6 and p2=1
Produce a time plot for the series. How does the plot change as you change 01
It seems to have the opposite effect with 01, the more the value is, the larger the height, and vice versa, helps the centering around the mean.
Generate data from an ARMA(1,1) model with o1=0.6, x1=0.6 and p2=1
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
set.seed(1111)
y <- numeric(100)
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y)
Generate data from an AR(2) model with o1=-0.8 o2=0.3 and p2=1
set.seed(1111)
y <- numeric(100)
for(i in 3:100)
y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y)
Graph the latter two series and compare them.
ARMA(1,1) looks to be stationary and random. ACF plot seems to have alot of up and down and only the last value goes beyond the blue line, and PACF is the same. AR model doesn’t seem stationary, ACF is up and down as well but the size consistently decreases overtime. PACF plot has only 1 negative value.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |>
autoplot(y)
for(i in 2:100)
y[i] <- 0.2*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y)
for(i in 2:100)
y[i] <- -0.8*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y)
for(i in 2:100)
y[i] <- 0*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y)
for(i in 2:100)
y[i] <- e[i] + 0.6 * e[i-1]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |>
autoplot(y)
for(i in 2:100)
y[i] <- e[i] + 0.2 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y)
for(i in 2:100)
y[i] <- e[i] + -0.8 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y)
for(i in 2:100)
y[i] <- e[i] + 0 * e[i-1]
tsibble(idx = seq_len(100), y = y, index = idx) |>
autoplot(y)
set.seed(1111)
y <- numeric(100)
tsibble(idx = seq_len(100), y = y, index = idx) |>
gg_tsdisplay(y, plot_type='partial')
## Warning: Removed 20 rows containing missing values or values outside the scale range
## (`geom_segment()`).
## Removed 20 rows containing missing values or values outside the scale range
## (`geom_segment()`).
for(i in 3:100)
y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) |>
gg_tsdisplay(y, plot_type='partial')
Consider aus_airpassengers
, the total number of
passengers (in millions) from Australian air carriers for the period
1970-2011.
Use ARIMA()
to find an appropriate ARIMA model. What
model was selected. Check that the residuals look like white noise. Plot
forecasts for the next 10 periods.
Residuals look to be ARMA(0,2,1)
Write the model in terms of the backshift operator.
Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
The model has higher forcasting in comparison, also the slope looks more slow.
Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
The model is more similar other parts, and they seem to be white noise. Removing the constant makes it not stationary.
Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
The slope is steeper after applying the model.
nextperiods <- aus_airpassengers |>
filter(Year <= 2011) |>
model(ARIMA(Passengers))
report(nextperiods)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8756
## s.e. 0.0722
##
## sigma^2 estimated as 4.671: log likelihood=-87.8
## AIC=179.61 AICc=179.93 BIC=182.99
nextperiods |>
gg_tsresiduals()
nextperiodss <-aus_airpassengers |>
filter(Year <= 2011) |>
model(ARIMA(Passengers ~ pdq(0,1,0)))
nextperiodss |>
forecast(h=15) |>
autoplot(aus_airpassengers)
nextperiodss |>
gg_tsresiduals()
nextperiodsss <-aus_airpassengers |>
filter(Year <= 2011) |>
model(ARIMA(Passengers ~ pdq(2,1,2)))
nextperiodsss |>
forecast(h=15) |>
autoplot(aus_airpassengers)
nextperiodsss |>
gg_tsresiduals()
nextperiodssss <-aus_airpassengers |>
filter(Year <= 2011) |>
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
nextperiodsssss <-aus_airpassengers |>
filter(Year <= 2011) |>
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
nextperiodsssss |>
forecast(h=10) |>
autoplot(aus_airpassengers)
nextperiodsssss |>
gg_tsresiduals()
For the United States GDP series (from
global_economy
):
if necessary, find a suitable Box-Cox transformation for the data;
GDP has increasing trend and is not stationary, and suitable box-cos is 0.2819443
fit a suitable ARIMA model to the transformed data using
ARIMA()
;
ARIMA(1,1,0)
try some other plausible models by experimenting with the orders chosen;
The difference is only once with unit root test. The ACF plot has a decreasing trend and only one value in PACF plot, also with ARIMA(1,2,0) having the lowest.
choose what you think is the best model and check the residual diagnostics;
The lowest ARIMA’s residuals seems to be white noise.
produce forecasts of your fitted model. Do the forecasts look reasonable?
The forecasts seem reasonable. The trend is has a consistent slope.
compare the results with what you would obtain using
ETS()
(with no transformation).
The model has a larger AICc, so this model is worse.
global_economy |>
filter(Code == "USA") |>
gg_tsdisplay(GDP, plot_type='partial')
lambda <-global_economy |>
filter(Code == "USA") |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
lambda
## [1] 0.2819443
fit <- global_economy |>
filter(Code == "USA") |>
model(ARIMA(box_cox(GDP, lambda)))
report(fit)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
global_economy |>
filter(Code == "USA") |>
gg_tsdisplay(box_cox(GDP,lambda), plot_type='partial')
global_economy |>
filter(Code == "USA") |>
features(box_cox(GDP,lambda), unitroot_ndiffs)
fit <- global_economy |>
filter(Code == "USA") |>
model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))
glance(fit)
fit |>
select(arima120) |>
gg_tsresiduals()
fit |>
forecast(h=10) |>
filter(.model=='arima120') |>
autoplot(global_economy)
ets <- global_economy |>
filter(Code == "USA") |>
model(ETS(GDP))
report(ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
ets |>
forecast(h=10) |>
autoplot(global_economy)