Q1.Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Answer: The difference between 3 ACF plots are the lengths of spikes and the area size inbetween blue dashed lines. As observations increase from 36 to 1000, the spikes generally decreases, as well as the bounded area between dashed lines. There are no significant lags in all 3 ACF plots, and spikes are all within the dashed line area, therefore all data seems to be white noise.
Answer: Section 2.9 of the book described whitenoise as :“For a white noise series, we expect 95% of the spikes in the ACF to lie within ±1.96/√T where T is the length of the time series” therefore it is normal that there are some varaition from the mean of zero. In this questions the autocorrelations are different among 3 plots due to different observations size, it makes sense that larger series sizes of random numbers result in decreases of autocorrelation.
Q2: A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
Answer: The plot shows increasing trend on the stock closeing prices ove the time which indicates its non-stationary. ACF plot shows strong autocorrelation slowly decreases. Similarly, the PACF plot has a very high first lag values close to 1 and few other spikes that are outside dashed line indicate data to be non-stationary, it should be differenced to help stabilize the mean.
library(fpp3)
library(patchwork)
amazon_stock <- gafa_stock |>
filter(Symbol == "AMZN") |>
select(Date, Close)
p1 <- amazon_stock |>
autoplot(Close) +
labs(title = "Amazon Closing Price",
y = "Price (USD)")
p2 <- amazon_stock |>
ACF(Close) |>
autoplot() +
labs(title = "ACF of Amazon Closing Price",
y = "ACF")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
p3 <- amazon_stock |>
PACF(Close) |>
autoplot() +
labs(title = "PACF of Amazon Closing Price",
y = "PACF")
## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.
p1 / (p2 | p3)
Q3.For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Turkish GDP data is non-seasonal and non-stationary. The test suggested 1 differencing to be apply. I used lamda 0.16 for the box_cox transformation and first order differncing. As shown in the plots below, after differencing once, data seems to be stationary as all spikes within two blue dashedlines.
library(fpp3)
library(latex2exp)
global_economy |>
filter(Country == "Turkey") |>
gg_tsdisplay(GDP) +
labs(title = "Non-transformed Turkish GDP")
## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
# get lambda
lambda <- global_economy |>
filter(Country == "Turkey") |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
# unit root test
global_economy |>
filter(Country == "Turkey") |>
features(box_cox(GDP,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 Turkey 1
global_economy |>
filter(Country == "Turkey") |>
gg_tsdisplay(difference(box_cox(GDP,lambda))) +
labs(title = TeX(paste0("Differenced Turkish GDP with $\\lambda$ = ",
round(lambda,2))))
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).
For this one, I tried differenced twice. The data showing to be seasonal and non-stationary. After first differencing it seems to be somewhat stationary, the ACF is fast decaying however, after second differencing, as shown below the ACF plot showing data seems to be better centered around 0.
library(patchwork)
tasmania <- aus_accommodation |>
filter(State == "Tasmania")
tasmania |>
autoplot(Takings) +
labs(title = "Tasmanian oritional")
#find lambda
lambda <- tasmania |>
features(Takings, features = guerrero) |>
pull(lambda_guerrero)
tasmania |>
autoplot(box_cox(Takings, lambda)) +
labs(title = "Box-Cox Transformed Tasmanian")
#first order
plot1 <- tasmania |>
mutate(Trans = box_cox(Takings, lambda)) |>
autoplot(difference(Trans, lag=4)) +
labs(title = "First Order Difference Tasmanian")
plot2 <- tasmania |>
mutate(Trans = box_cox(Takings, lambda)) |>
ACF(difference(Trans, lag=4)) |>
autoplot() +
labs(title = "ACF Plot: First Order Difference Tasmanian")
#second order
plot3 <- tasmania |>
mutate(Trans = box_cox(Takings, lambda)) |>
autoplot(difference(Trans, lag=4) |> difference()) +
labs(title = "Second Order Difference Tasmanian Test Taking")
plot4 <- tasmania |>
mutate(Trans = box_cox(Takings, lambda)) |>
ACF(difference(Trans, lag=4) |> difference()) |>
autoplot() +
labs(title = "ACF Plot: Second Order Difference Tasmanian")
plot1/plot2
## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).
plot3/plot4
## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).
For the montly sale from souvenirs, I used log transformation and applied first order differencing, it somewhat becomes more stationary. I tried second order differencing and ACF plot showing a better result. Therefore second order differencing is a better fit in this case.
souvenirs |>
autoplot(Sales) +
labs(title = "Monthly Souvenir Sales")
#log transformed
souvenirs |>
autoplot(log(Sales)) +
labs(title = "Log Transformed")
#first order
plot1 <- souvenirs |>
autoplot(difference(log(Sales), lag=12)) +
labs(title = "First Order Difference Souvenir Sales")
plot2 <- souvenirs |>
ACF(difference(log(Sales), lag=12)) |>
autoplot() +
labs(title = "ACF Plot: First Order Difference Souvenir Sales")
#second order
plot3 <- souvenirs |>
autoplot(difference(log(Sales), lag=12) |> difference()) +
labs(title = "Second Order Difference Souvenir Sales")
plot4 <- souvenirs |>
ACF(difference(log(Sales), lag=12) |> difference()) |>
autoplot() +
labs(title = "ACF Plot: Second Order Difference Souvenir Sales")
plot1/plot2
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
plot3/plot4
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
Q5. For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
The retail data series from aus_retail is showing seasonal and non-stationary. After find optimal lambda for box-cox transformation, I applied first order differncing. I tried second order differencing, with only one exception around lag12, the ACF plot is showing more stationary than the first differencing.
set.seed(613)
retail <- aus_retail |>
filter(`Series ID` == sample(aus_retail$`Series ID`,1))
retail |>
autoplot(Turnover) +
labs(title = "Retail Turnover")
#find lambda and apply box_cox
lambda <- retail |>
features(Turnover, features = guerrero) |>
pull(lambda_guerrero)
retail |>
autoplot(box_cox(Turnover, lambda)) +
labs(title = "Transformed Retail Turnover")
#first order differencing
plot1 <- retail |>
autoplot(difference(box_cox(Turnover, lambda), 12)) +
labs(title = "First Order Difference: Transformed Retail Turnover")
plot2 <- retail |>
ACF(difference(box_cox(Turnover, lambda), 12)) |>
autoplot() +
labs(title = "ACF Plot First Order Difference: Transformed Retail Turnover")
#second order differencing
plot3 <- retail |>
autoplot(difference(box_cox(Turnover, lambda), 12) |> difference()) +
labs(title = "Second Order Difference: Transformed Retail Turnover")
plot4 <- retail |>
ACF(difference(box_cox(Turnover, lambda), 12) |> difference()) |>
autoplot() +
labs(title = "ACF Plot Second Order Difference: Transformed Retail Turnover")
plot1/plot2
## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).
plot3/plot4
## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).
Q6. Simulate and plot some data from simple ARIMA models.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
As showing below, I made different plots with different ϕ1 number. When ϕ1=0,it showing close to white noise, and when ϕ1=1, it resembles a random walk. As ϕ1 decreases, the variationproducing more spikes.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
y1 <- numeric(100)
y2 <- numeric(100)
y3 <- numeric(100)
y4 <- numeric(100)
y5 <- numeric(100)
y6 <- numeric(100)
for(i in 2:100){
y1[i] <- -0.6*y[i-1] + e[i]
y2[i] <- -0.2*y[i-1] + e[i]
y3[i] <- 0.01*y[i-1] + e[i]
y4[i] <- 0.2*y[i-1] + e[i]
y5[i] <- 0.6*y[i-1] + e[i]
y6[i] <- 0.9*y[i-1] + e[i]
}
sim <- tsibble(idx = seq_len(100), y1 = y1, y2 = y2, y3 = y3, y4 = y4, y5 = y5, y6 = y6, index = idx)
plot1 <- sim |>
autoplot(y1) +
labs(title = TeX("$\\phi_1 = -0.6$"))
plot2 <- sim |>
autoplot(y2) +
labs(title = TeX("$\\phi_1 = -0.2$"))
plot3 <- sim |>
autoplot(y3) +
labs(title = TeX("$\\phi_1 = 0.01$"))
plot4 <- sim |>
autoplot(y4) +
labs(title = TeX("$\\phi_1 = 0.2$"))
plot5 <- sim |>
autoplot(y5) +
labs(title = TeX("$\\phi_1 = 0.6$"))
plot6 <- sim |>
autoplot(y6) +
labs(title = TeX("$\\phi_1 = 0.9$"))
plot1/plot2
plot3/plot4
plot5/plot6
c)Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
sim |>
autoplot(y) +
labs(title = "Time Series (Theta = 0.6)")
y1 <- numeric(100)
y2 <- numeric(100)
y3 <- numeric(100)
y4 <- numeric(100)
y5 <- numeric(100)
y6 <- numeric(100)
for(i in 2:100){
y1[i] <- -0.6*e[i-1] + e[i]
y2[i] <- -0.2*e[i-1] + e[i]
y3[i] <- 0.01*e[i-1] + e[i]
y4[i] <- 0.2*e[i-1] + e[i]
y5[i] <- 0.6*e[i-1] + e[i]
y6[i] <- 0.9*e[i-1] + e[i]
}
sim <- tsibble(idx = seq_len(100), y1 = y1, y2 = y2, y3 = y3, y4 = y4, y5 = y5, y6 = y6, index = idx)
plot1 <- sim |>
autoplot(y1) +
labs(title = TeX("$\\theta_1 = -0.6$"))
plot2 <- sim |>
autoplot(y2) +
labs(title = TeX("$\\theta_1 = -0.2$"))
plot3 <- sim |>
autoplot(y3) +
labs(title = TeX("$\\theta_1 = 0.01$"))
plot4 <- sim |>
autoplot(y4) +
labs(title = TeX("$\\theta_1 = 0.2$"))
plot5 <- sim |>
autoplot(y5) +
labs(title = TeX("$\\theta_1 = 0.6$"))
plot6 <- sim |>
autoplot(y6) +
labs(title = TeX("$\\theta_1 = 0.9$"))
plot1/plot2
plot3/plot4
plot5/plot6
All e,f,g answers are showing below. The AR(2) appears to have a gradual increase while the ARMA(1,1) does not show any clear apparent trend.
y <- numeric(100)
e <- rnorm(100)
for (i in 2:100){
y[i] <- (0.6 * y[i-1]) + (0.6 * e[i-1]) + e[i]
}
sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)
y <- numeric(100)
e <- rnorm(100)
for (i in 3:100){
y[i] <- (-0.8 * y[i-1]) + (0.3 * y[i-2]) + e[i]
}
sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)
plot1 <- sim1 |>
autoplot(y) +
labs(title = TeX("ARMA(1,1) [$\\phi_1 = 0.6$, $\\theta_1 = 0.6$]"))
plot2 <- sim2 |>
autoplot(y) +
labs(title = TeX("AR(2) [$\\phi_1 = -0.8$, $\\phi_2 = 0.3$]"))
plot1/plot2
9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.
fit <- aus_airpassengers |>
model(ARIMA(Passengers))
report(fit)
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
# check if white noise
fit |>
gg_tsresiduals()
## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
#forecasts for 10 periods
fc <- fit |>
forecast(h=10)
fc |>
autoplot(aus_airpassengers) +
labs(title = "10 Year Forecase for Australian Air Passengers")
b)Write the model in terms of the backshift operator.
yt=−0.87εt−1+εt (1−B)2yt=(1−0.87B)εt
The forecast distribution in the model used in a is larger than that of the ARIMA(0,1,0) model. The forecast is also slightly dampened in the ARIMA(0,1,0) model and does not increase as drastically.
fit2 <-aus_airpassengers |>
model(ARIMA(Passengers ~ pdq(0,1,0)))
fit2 |>
forecast(h=10) |>
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")
fit2 |>
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(0,1,0)")
d) Plot
forecasts from an ARIMA(2,1,2) model with drift and compare these to
parts a and c. Remove the constant and see what happens.
It is more similar to part and the residuals seem to be white noise. When the constant is removed, a null model is produced. When a constant is omitted, the forecast included a polynomial trend of order d−1, or in this case, 0 and making it non-stationary.
fit3 <-aus_airpassengers |>
filter(Year < 2012) |>
model(ARIMA(Passengers ~ pdq(2,1,2)))
fit3 |>
forecast(h=10) |>
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")
fit3 |>
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(2,1,2)")
#removing constant
fit4 <-aus_airpassengers |>
filter(Year < 2012) |>
model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS
The slope become steeper and a warning is given that the model specifies a higher order polynomial trend and to remove the constant.
fit5 <-aus_airpassengers |>
filter(Year < 2012) |>
model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))
## Warning: Model specification induces a quadratic or higher order polynomial trend.
## This is generally discouraged, consider removing the constant or reducing the number of differences.
fit5 |>
forecast(h=10) |>
autoplot(aus_airpassengers) +
labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant", y = "Passengers (in millions)")
fit5 |>
gg_tsresiduals() +
labs(title = "Residuals for ARIMA(0,2,1) with constant")
Q8. For the United States GDP series (from global_economy):
a.if necessary, find a suitable Box-Cox transformation for the data;
united_states <- global_economy |>
filter(Country == "United States")
plot1 <- united_states |>
autoplot(GDP) +
labs(title = "Origional United States GDP")
lambda <- united_states |>
features(GDP, features = guerrero) |>
pull(lambda_guerrero)
united_states <- united_states |>
mutate(BoxCox_GDP = box_cox(GDP, lambda))
plot2 <- united_states |>
autoplot(BoxCox_GDP) +
labs(title = "United States GDP BoxCox Transformed")
plot1/ plot2
b.fit a suitable ARIMA model to the transformed data using ARIMA();
best fit ARIMA model is: ARIMA(1,1,0)
fit1 <- united_states |>
model(ARIMA(box_cox(GDP, lambda)))
report(fit1)
## Series: GDP
## Model: ARIMA(1,1,0) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 constant
## 0.4586 118.1822
## s.e. 0.1198 9.5047
##
## sigma^2 estimated as 5479: log likelihood=-325.32
## AIC=656.65 AICc=657.1 BIC=662.78
fit2 <- united_states |>
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0,1,1)))
report(fit2)
## Series: GDP
## Model: ARIMA(0,1,1) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ma1 constant
## 0.4026 219.6195
## s.e. 0.1098 13.6953
##
## sigma^2 estimated as 5689: log likelihood=-326.37
## AIC=658.73 AICc=659.18 BIC=664.86
fit3 <- united_states |>
model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2,1,1)))
report(fit3)
## Series: GDP
## Model: ARIMA(2,1,1) w/ drift
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ar1 ar2 ma1 constant
## 1.1661 -0.2792 -0.7356 24.3346
## s.e. 0.3418 0.2108 0.3077 2.5572
##
## sigma^2 estimated as 5647: log likelihood=-325.14
## AIC=660.29 AICc=661.46 BIC=670.5
fit4 <- united_states |>
model(ARIMA(box_cox(GDP, lambda) ~ 0 + pdq(0,1,1)))
report(fit4)
## Series: GDP
## Model: ARIMA(0,1,1)
## Transformation: box_cox(GDP, lambda)
##
## Coefficients:
## ma1
## 0.7673
## s.e. 0.0674
##
## sigma^2 estimated as 23338: log likelihood=-367.47
## AIC=738.93 AICc=739.16 BIC=743.02
global_economy |>
filter(Code == "USA") |>
features(box_cox(GDP,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
## Country ndiffs
## <fct> <int>
## 1 United States 1
usa_fit <- global_economy |>
filter(Code == "USA") |>
model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))
glance(usa_fit) |> arrange(AICc) %>% select(.model:BIC)
## # A tibble: 5 × 6
## .model sigma2 log_lik AIC AICc BIC
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 arima120 6780. -326. 656. 656. 660.
## 2 arima110 5479. -325. 657. 657. 663.
## 3 arima111 5580. -325. 659. 659. 667.
## 4 arima210 5580. -325. 659. 659. 667.
## 5 arima212 5734. -325. 662. 664. 674.
e.choose what you think is the best model and check the residual diagnostics;
From questions c’sresult, among all the model that I explored, ARIMA(1,2,0) has the lowest AICc value therefore that model was chosen. The residuals seems to be white noise.
usa_fit |>
forecast(h=5) |>
filter(.model=='arima120') |>
autoplot(global_economy)
The ETS() model has a higher AICc which suggests this model is not as good as ARIMA in forcasting.
fit_ets <- global_economy |>
filter(Code == "USA") |>
model(ETS(GDP))
report(fit_ets)
## Series: GDP
## Model: ETS(M,A,N)
## Smoothing parameters:
## alpha = 0.9990876
## beta = 0.5011949
##
## Initial states:
## l[0] b[0]
## 448093333334 64917355687
##
## sigma^2: 7e-04
##
## AIC AICc BIC
## 3190.787 3191.941 3201.089
fit_ets |>
forecast(h=5) |>
autoplot(global_economy)