HW6

Q1.Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer: The difference between 3 ACF plots are the lengths of spikes and the area size inbetween blue dashed lines. As observations increase from 36 to 1000, the spikes generally decreases, as well as the bounded area between dashed lines. There are no significant lags in all 3 ACF plots, and spikes are all within the dashed line area, therefore all data seems to be white noise.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer: Section 2.9 of the book described whitenoise as :“For a white noise series, we expect 95% of the spikes in the ACF to lie within ±1.96/√T where T is the length of the time series” therefore it is normal that there are some varaition from the mean of zero. In this questions the autocorrelations are different among 3 plots due to different observations size, it makes sense that larger series sizes of random numbers result in decreases of autocorrelation.

Q2: A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

Answer: The plot shows increasing trend on the stock closeing prices ove the time which indicates its non-stationary. ACF plot shows strong autocorrelation slowly decreases. Similarly, the PACF plot has a very high first lag values close to 1 and few other spikes that are outside dashed line indicate data to be non-stationary, it should be differenced to help stabilize the mean.

library(fpp3)
library(patchwork)

amazon_stock <- gafa_stock |>
  filter(Symbol == "AMZN") |>
  select(Date, Close)

p1 <- amazon_stock |>
  autoplot(Close) +
  labs(title = "Amazon Closing Price",
       y = "Price (USD)")

p2 <- amazon_stock |>
  ACF(Close) |>
  autoplot() +
  labs(title = "ACF of Amazon Closing Price",
       y = "ACF")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

p3 <- amazon_stock |>
  PACF(Close) |>
  autoplot() +
  labs(title = "PACF of Amazon Closing Price",
       y = "PACF")

## Warning: Provided data has an irregular interval, results should be treated
## with caution. Computing ACF by observation.

p1 / (p2 | p3)

Q3.For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

Turkish GDP data is non-seasonal and non-stationary. The test suggested 1 differencing to be apply. I used lamda 0.16 for the box_cox transformation and first order differncing. As shown in the plots below, after differencing once, data seems to be stationary as all spikes within two blue dashedlines.

library(fpp3)
library(latex2exp)  

global_economy |>
  filter(Country == "Turkey") |>
  gg_tsdisplay(GDP) +
  labs(title = "Non-transformed Turkish GDP")

## Warning: `gg_tsdisplay()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsdisplay()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

# get lambda
lambda <- global_economy |>
  filter(Country == "Turkey") |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

# unit root test
global_economy |>
  filter(Country == "Turkey") |>
  features(box_cox(GDP,lambda), unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

global_economy |>
  filter(Country == "Turkey") |>
  gg_tsdisplay(difference(box_cox(GDP,lambda))) +
  labs(title = TeX(paste0("Differenced Turkish GDP with $\\lambda$ = ",
         round(lambda,2))))

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_line()`).

## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_point()`).

Accommodation takings in the state of Tasmania from aus_accommodation.

For this one, I tried differenced twice. The data showing to be seasonal and non-stationary. After first differencing it seems to be somewhat stationary, the ACF is fast decaying however, after second differencing, as shown below the ACF plot showing data seems to be better centered around 0.

library(patchwork)

tasmania <- aus_accommodation |>
  filter(State == "Tasmania")

tasmania |>
  autoplot(Takings) +
  labs(title = "Tasmanian oritional")

#find lambda 
lambda <- tasmania |>
  features(Takings, features = guerrero) |>
  pull(lambda_guerrero)

tasmania |>
  autoplot(box_cox(Takings, lambda)) +
  labs(title = "Box-Cox Transformed Tasmanian")

#first order 
plot1 <- tasmania |>
  mutate(Trans = box_cox(Takings, lambda)) |>
  autoplot(difference(Trans, lag=4)) +
  labs(title = "First Order Difference Tasmanian")

plot2 <- tasmania |>
  mutate(Trans = box_cox(Takings, lambda)) |>
  ACF(difference(Trans, lag=4)) |>
  autoplot() +
  labs(title = "ACF Plot: First Order Difference Tasmanian")

#second order
plot3 <- tasmania |>
  mutate(Trans = box_cox(Takings, lambda)) |>
  autoplot(difference(Trans, lag=4) |> difference()) +
  labs(title = "Second Order Difference Tasmanian Test Taking")

plot4 <- tasmania |>
  mutate(Trans = box_cox(Takings, lambda)) |>
  ACF(difference(Trans, lag=4) |> difference()) |>
  autoplot() +
  labs(title = "ACF Plot: Second Order Difference Tasmanian")

plot1/plot2

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

plot3/plot4

## Warning: Removed 5 rows containing missing values or values outside the scale range
## (`geom_line()`).

Monthly sales from souvenirs.

For the montly sale from souvenirs, I used log transformation and applied first order differencing, it somewhat becomes more stationary. I tried second order differencing and ACF plot showing a better result. Therefore second order differencing is a better fit in this case.

souvenirs |>
  autoplot(Sales) +
  labs(title = "Monthly Souvenir Sales")

#log transformed
souvenirs |>
  autoplot(log(Sales)) +
  labs(title = "Log Transformed")

#first order
plot1 <- souvenirs |>
  autoplot(difference(log(Sales), lag=12)) +
  labs(title = "First Order Difference Souvenir Sales")

plot2 <- souvenirs |>
  ACF(difference(log(Sales), lag=12)) |>
  autoplot() +
  labs(title = "ACF Plot: First Order Difference Souvenir Sales")

#second order
plot3 <- souvenirs |>
  autoplot(difference(log(Sales), lag=12) |> difference()) +
  labs(title = "Second Order Difference Souvenir Sales")

plot4 <- souvenirs |>
  ACF(difference(log(Sales), lag=12) |> difference()) |>
  autoplot() +
  labs(title = "ACF Plot: Second Order Difference Souvenir Sales")


plot1/plot2

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

plot3/plot4

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

Q5. For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

The retail data series from aus_retail is showing seasonal and non-stationary. After find optimal lambda for box-cox transformation, I applied first order differncing. I tried second order differencing, with only one exception around lag12, the ACF plot is showing more stationary than the first differencing.

set.seed(613)

retail <- aus_retail |>
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

retail |>
  autoplot(Turnover) +
  labs(title = "Retail Turnover")

#find lambda and apply box_cox
lambda <- retail |>
  features(Turnover, features = guerrero) |>
  pull(lambda_guerrero)

retail |>
  autoplot(box_cox(Turnover, lambda)) +
  labs(title = "Transformed Retail Turnover")

#first order differencing
plot1 <- retail |>
  autoplot(difference(box_cox(Turnover, lambda), 12)) +
  labs(title = "First Order Difference: Transformed Retail Turnover")

plot2 <- retail |>
  ACF(difference(box_cox(Turnover, lambda), 12)) |>
  autoplot() +
  labs(title = "ACF Plot First Order Difference: Transformed Retail Turnover")

#second order differencing

plot3 <- retail |>
  autoplot(difference(box_cox(Turnover, lambda), 12) |> difference()) +
  labs(title = "Second Order Difference: Transformed Retail Turnover")

plot4 <- retail |>
  ACF(difference(box_cox(Turnover, lambda), 12) |> difference()) |>
  autoplot() +
  labs(title = "ACF Plot Second Order Difference: Transformed Retail Turnover")

plot1/plot2

## Warning: Removed 12 rows containing missing values or values outside the scale range
## (`geom_line()`).

plot3/plot4

## Warning: Removed 13 rows containing missing values or values outside the scale range
## (`geom_line()`).

Q6. Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1.The process starts with y1=0

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change ϕ1?

As showing below, I made different plots with different ϕ1 number. When ϕ1=0,it showing close to white noise, and when ϕ1=1, it resembles a random walk. As ϕ1 decreases, the variationproducing more spikes.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

y1 <- numeric(100)
y2 <- numeric(100)
y3 <- numeric(100)
y4 <- numeric(100)
y5 <- numeric(100)
y6 <- numeric(100)

for(i in 2:100){
  y1[i] <- -0.6*y[i-1] + e[i]
  y2[i] <- -0.2*y[i-1] + e[i]
  y3[i] <- 0.01*y[i-1] + e[i]
  y4[i] <- 0.2*y[i-1] + e[i]
  y5[i] <- 0.6*y[i-1] + e[i]
  y6[i] <- 0.9*y[i-1] + e[i]
}

sim <- tsibble(idx = seq_len(100), y1 = y1, y2 = y2, y3 = y3, y4 = y4, y5 = y5, y6 = y6, index = idx)

plot1 <- sim |>
  autoplot(y1) +
  labs(title = TeX("$\\phi_1 = -0.6$"))

plot2 <- sim |>
  autoplot(y2) +
  labs(title = TeX("$\\phi_1 = -0.2$"))

plot3 <- sim |>
  autoplot(y3) +
  labs(title = TeX("$\\phi_1 = 0.01$"))

plot4 <- sim |>
  autoplot(y4) +
  labs(title = TeX("$\\phi_1 = 0.2$"))

plot5 <- sim |>
  autoplot(y5) +
  labs(title = TeX("$\\phi_1 = 0.6$"))

plot6 <- sim |>
  autoplot(y6) +
  labs(title = TeX("$\\phi_1 = 0.9$"))

plot1/plot2

plot3/plot4

plot5/plot6

c)Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.

y <- numeric(100)

e <- rnorm(100)

for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change θ1?

sim |>
  autoplot(y) +
  labs(title = "Time Series (Theta = 0.6)")

y1 <- numeric(100)
y2 <- numeric(100)
y3 <- numeric(100)
y4 <- numeric(100)
y5 <- numeric(100)
y6 <- numeric(100)

for(i in 2:100){
  y1[i] <- -0.6*e[i-1] + e[i]
  y2[i] <- -0.2*e[i-1] + e[i]
  y3[i] <- 0.01*e[i-1] + e[i]
  y4[i] <- 0.2*e[i-1] + e[i]
  y5[i] <- 0.6*e[i-1] + e[i]
  y6[i] <- 0.9*e[i-1] + e[i]
}

sim <- tsibble(idx = seq_len(100), y1 = y1, y2 = y2, y3 = y3, y4 = y4, y5 = y5, y6 = y6, index = idx)

plot1 <- sim |>
  autoplot(y1) +
  labs(title = TeX("$\\theta_1 = -0.6$"))

plot2 <- sim |>
  autoplot(y2) +
  labs(title = TeX("$\\theta_1 = -0.2$"))

plot3 <- sim |>
  autoplot(y3) +
  labs(title = TeX("$\\theta_1 = 0.01$"))

plot4 <- sim |>
  autoplot(y4) +
  labs(title = TeX("$\\theta_1 = 0.2$"))

plot5 <- sim |>
  autoplot(y5) +
  labs(title = TeX("$\\theta_1 = 0.6$"))

plot6 <- sim |>
  autoplot(y6) +
  labs(title = TeX("$\\theta_1 = 0.9$"))

plot1/plot2

plot3/plot4

plot5/plot6

Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6,and σ2=1.
Generate data from an AR(2) model with ϕ1=−0.8, phi2=0.3 and σ2=1.(Note that these parameters will give a non-stationary series.)
Graph the latter two series and compare them.

All e,f,g answers are showing below. The AR(2) appears to have a gradual increase while the ARMA(1,1) does not show any clear apparent trend.

y <- numeric(100)

e <- rnorm(100)

for (i in 2:100){
    y[i] <- (0.6 * y[i-1]) + (0.6 * e[i-1]) + e[i]
}

sim1 <- tsibble(idx = seq_len(100), y = y, index = idx)

y <- numeric(100)
e <- rnorm(100)

for (i in 3:100){
    y[i] <- (-0.8 * y[i-1]) + (0.3 * y[i-2]) + e[i]
}

sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)


plot1 <- sim1 |>
  autoplot(y) +
  labs(title = TeX("ARMA(1,1) [$\\phi_1 = 0.6$, $\\theta_1 = 0.6$]"))

plot2 <- sim2 |>
  autoplot(y) +
  labs(title = TeX("AR(2) [$\\phi_1 = -0.8$, $\\phi_2 = 0.3$]"))

plot1/plot2

9.7 Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit <- aus_airpassengers |>
  model(ARIMA(Passengers))

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

# check if white noise
fit |>
  gg_tsresiduals()

## Warning: `gg_tsresiduals()` was deprecated in feasts 0.4.2.
## ℹ Please use `ggtime::gg_tsresiduals()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

#forecasts for 10 periods
fc <- fit |>
  forecast(h=10)

fc |>
  autoplot(aus_airpassengers) +
  labs(title = "10 Year Forecase for Australian Air Passengers")

b)Write the model in terms of the backshift operator.

yt=−0.87εt−1+εt (1−B)2yt=(1−0.87B)εt

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

The forecast distribution in the model used in a is larger than that of the ARIMA(0,1,0) model. The forecast is also slightly dampened in the ARIMA(0,1,0) model and does not increase as drastically.

fit2 <-aus_airpassengers |>
  model(ARIMA(Passengers ~ pdq(0,1,0)))

fit2 |>
  forecast(h=10) |>
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")

fit2 |>
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,1,0)")

d) Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

It is more similar to part and the residuals seem to be white noise. When the constant is removed, a null model is produced. When a constant is omitted, the forecast included a polynomial trend of order d−1, or in this case, 0 and making it non-stationary.

fit3 <-aus_airpassengers |>
  filter(Year < 2012) |>
  model(ARIMA(Passengers ~ pdq(2,1,2)))

fit3 |>
  forecast(h=10) |>
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")

fit3 |>
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(2,1,2)")

#removing constant
fit4 <-aus_airpassengers |>
  filter(Year < 2012) |>
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The slope become steeper and a warning is given that the model specifies a higher order polynomial trend and to remove the constant.

fit5 <-aus_airpassengers |>
  filter(Year < 2012) |>
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

fit5 |>
  forecast(h=10) |>
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant", y = "Passengers (in millions)")

fit5 |>
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1) with constant")

Q8. For the United States GDP series (from global_economy):

a.if necessary, find a suitable Box-Cox transformation for the data;

united_states <- global_economy |>
  filter(Country == "United States")

plot1 <- united_states |>
  autoplot(GDP) +
  labs(title = "Origional United States GDP")

lambda <- united_states |>
  features(GDP, features = guerrero) |>
  pull(lambda_guerrero)

united_states <- united_states |>
  mutate(BoxCox_GDP = box_cox(GDP, lambda))

plot2 <- united_states |>
  autoplot(BoxCox_GDP) +
  labs(title = "United States GDP BoxCox Transformed")

 plot1/ plot2

b.fit a suitable ARIMA model to the transformed data using ARIMA();

best fit ARIMA model is: ARIMA(1,1,0)

fit1 <- united_states |>
  model(ARIMA(box_cox(GDP, lambda)))

report(fit1)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

try some other plausible models by experimenting with the orders chosen;

fit2 <- united_states |>
  model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(0,1,1)))

report(fit2)

## Series: GDP 
## Model: ARIMA(0,1,1) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ma1  constant
##       0.4026  219.6195
## s.e.  0.1098   13.6953
## 
## sigma^2 estimated as 5689:  log likelihood=-326.37
## AIC=658.73   AICc=659.18   BIC=664.86

fit3 <- united_states |>
  model(ARIMA(box_cox(GDP, lambda) ~ 1 + pdq(2,1,1)))

report(fit3)

## Series: GDP 
## Model: ARIMA(2,1,1) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1      ar2      ma1  constant
##       1.1661  -0.2792  -0.7356   24.3346
## s.e.  0.3418   0.2108   0.3077    2.5572
## 
## sigma^2 estimated as 5647:  log likelihood=-325.14
## AIC=660.29   AICc=661.46   BIC=670.5

fit4 <- united_states |>
  model(ARIMA(box_cox(GDP, lambda) ~ 0 + pdq(0,1,1)))

report(fit4)

## Series: GDP 
## Model: ARIMA(0,1,1) 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ma1
##       0.7673
## s.e.  0.0674
## 
## sigma^2 estimated as 23338:  log likelihood=-367.47
## AIC=738.93   AICc=739.16   BIC=743.02

choose what you think is the best model and check the residual diagnostics;

global_economy |>
  filter(Code == "USA") |>
  features(box_cox(GDP,lambda), unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1

usa_fit <- global_economy |>
  filter(Code == "USA") |>
  model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
        arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
        arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
        arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
        arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))

glance(usa_fit) |> arrange(AICc) %>% select(.model:BIC)

## # A tibble: 5 × 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima120  6780.   -326.  656.  656.  660.
## 2 arima110  5479.   -325.  657.  657.  663.
## 3 arima111  5580.   -325.  659.  659.  667.
## 4 arima210  5580.   -325.  659.  659.  667.
## 5 arima212  5734.   -325.  662.  664.  674.

e.choose what you think is the best model and check the residual diagnostics;

From questions c’sresult, among all the model that I explored, ARIMA(1,2,0) has the lowest AICc value therefore that model was chosen. The residuals seems to be white noise.

usa_fit |>
  forecast(h=5) |>
  filter(.model=='arima120') |>
  autoplot(global_economy)

compare the results with what you would obtain using ETS() (with no transformation).

The ETS() model has a higher AICc which suggests this model is not as good as ARIMA in forcasting.

fit_ets <- global_economy |>
  filter(Code == "USA") |>
  model(ETS(GDP))

report(fit_ets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

fit_ets |>
  forecast(h=5) |>
  autoplot(global_economy)

HW6

XiaoFei Mei

2025-10-18