DATA 624 Homework 6

9.1

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

Figure 9.32: Left: ACF for a white noise series of 36 numbers.
Middle: ACF for a white noise series of 360 numbers.
Right: ACF for a white noise series of 1,000 numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

The difference between these figures are the lengths of each spike and the size of the bounded area. As the time series increases, the spikes seem to decrease and the size of the bounded area decreases. They all indicate that the data are white noise as they all lie within two blue lines which can be described as \(\pm 1.96 / \sqrt{T}\), where \(T\) is the length of the time series.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are at different distance from the mean of zero because the critical values are calculated as \(\pm 1.96 / \sqrt{T}\), where \(T\) is the length of the time series. As \(T\) increases, the critical value decreases. As the series size increases, the critical values get closer to zero. The autocorrelations are different due to larger series sizes of random numbers, which decreases the chance of autocorrelation.

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type='partial') +
  labs(title = "Amazon Closing Stock Price")

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  features(Close, unitroot_ndiffs)

## # A tibble: 1 x 2
##   Symbol ndiffs
##   <chr>   <int>
## 1 AMZN        1

The time series seems to have an increasing trend for the most part which make it non-stationary. The ACF plot is slowly decreasing and has a \(r_1\) that is large and positive, whereas a stationary time series will drop to 0 relatively quick. The PACF plot has a first lag of around 1 proving it to be non-stationary. It should be differenced to help stabilize the mean. According to the the KPSS test, the Amazon closing price would need to be differenced once to be stationary.

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

Turkish GDP from global_economy.

Turkish GDP is non-seasonal and non-stationary. The KPSS test recommends only differencing once and it seems to be stationary according to the graphs after differencing once.

# plot
global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Non-transformed Turkish GDP")

# calculate lambda
lambda <- global_economy %>%
  filter(Country == "Turkey") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# unit root test
global_economy %>%
  filter(Country == "Turkey") %>%
  features(box_cox(GDP,lambda), unitroot_ndiffs)

## # A tibble: 1 x 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

# transformed plot
global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(difference(box_cox(GDP,lambda)), plot_type='partial') +
  labs(title = TeX(paste0("Differenced Turkish GDP with $\\lambda$ = ",
         round(lambda,2))))

Accommodation takings in the state of Tasmania from aus_accommodation.

Accommodation takings in the state of Tasmania is seasonal and non-stationary. The KPSS test recommends only differencing once and it seems to be stationary according to the graphs after differencing once. The ACF is fast decaying and the PACF plot is truncated after the first lag. However, the data seems to be better centered around 0 wen twice differenced.

# plot
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(Takings, plot_type='partial') +
  labs(title = "Non-transformed Tasmania Accomodation Takings")

# calculate lambda
lambda <-aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

#unit root test
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(box_cox(Takings,lambda), unitroot_nsdiffs)

## # A tibble: 1 x 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

# transformed plot
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(difference(box_cox(Takings,lambda), 4), plot_type='partial') +
  labs(title = TeX(paste0("Differenced Tasmania Accomodation Takings with $\\lambda$ = ",
         round(lambda,2))))

Monthly sales from souvenirs.

Monthly souvernir sales is seasonal and non-stationary. The KPSS test recommends only differencing once and a \(lambda\) of close to 0 suggests a natural log transformation. It seems to be stationary according to the graphs after differencing once. The ACF is fast decaying and the PACF plot is truncated after the first lag.

# plot
souvenirs %>%
  gg_tsdisplay(Sales, plot_type='partial', lag = 36) +
  labs(title = "Non-transformed Monthly Souvenir Sales")

# calculate lambda
lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

# unit root test
souvenirs %>%
  features(box_cox(Sales,lambda), unitroot_nsdiffs)

## # A tibble: 1 x 1
##   nsdiffs
##     <int>
## 1       1

# transformed plot
souvenirs %>%
  gg_tsdisplay(difference(box_cox(Sales,lambda), 12), plot_type='partial', lag = 36) +
  labs(title = TeX(paste0("Differenced Monthly Souvenir Sales with $\\lambda$ = ",
         round(lambda,2))))

9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

The retail data is seasonal and non-stationary. A box-cox transformation was applied to transform the data as needed. The seasonal KPSS unit root test suggests differencing only once. After differencing once and transforming the data, the data seems to be stationary as it is centered around zero and the ACF and PACF plots show stationarity.

set.seed(1234)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

# plot
myseries %>%
  gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
  labs(title = "Non-transformed Retail Turnover")

# lambda calculation
lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# unit root test
myseries %>%
  features(box_cox(Turnover, lambda), unitroot_nsdiffs)

## # A tibble: 1 x 3
##   State    Industry                                      nsdiffs
##   <chr>    <chr>                                           <int>
## 1 Tasmania Cafes, restaurants and takeaway food services       1

# transformed plot
myseries %>%
  gg_tsdisplay(difference(box_cox(Turnover,lambda), 12), plot_type='partial', lag = 36) +
  labs(title = TeX(paste0("Differenced Tasmania Accomodation Takings with $\\lambda$ = ",
         round(lambda,2))))

9.6

Simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1=0\)/

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

Changing the \(\phi_1\) results in different time series patterns. When \(\phi_1 = 0\), it resemebles white noise, and when \(\phi_1 = 1\), it resembles a random walk. When it becomes negative, it tends to oscillate around the mean. As \(\phi_1\) decreases, the variation, producing more spikes.

sim %>%
  autoplot(y) +
  labs(title = TeX(" AR(1) model with $\\phi_i$ = 0.6"))

# phi_1 = 1
for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = TeX(" AR(1) model with $\\phi_i$ = 1"))

# phi_1 = 0
for(i in 2:100)
  y[i] <- 0*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = TeX(" AR(1) model with $\\phi_i$ = 0"))

# phi_1 = -0.6
for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = TeX(" AR(1) model with $\\phi_i$ = -0.6"))

Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]

sim2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Produce a time plot for the series. How does the plot change as you change \(\theta_1\)

As \(\theta_1\) decreases, the frequency of the spikes increase. The shapes of the graphs do not change much as they have similar magnitude. It seems that as it decreases, it improves the centering around the mean.

sim2 %>%
  autoplot(y) +
  labs(title = TeX(" MA(1) model with $\\theta_i$ = 0.6"))

# theta_1 = 1
for(i in 2:100)
  y[i] <- e[i] + 1 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = TeX(" MA(1) model with $\\theta_1$ = 1"))

# theta_1 = 0
for(i in 2:100)
  y[i] <- e[i] + 0 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = TeX(" MA(1) model with $\\theta_1$ = 0"))

# theta_1 = -0.6
for(i in 2:100)
  y[i] <- e[i] + -0.6 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = TeX(" MA(1) model with $\\theta_1$ = -0.6"))

Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\), and \(\sigma^2 = 1\).

set.seed(1234)
y <- numeric(100)

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]

arma1_1 <- tsibble(idx = seq_len(100), y = y, index = idx)

Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\), and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

set.seed(1234)
y <- numeric(100)

for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)

Graph the latter two series and compare them.

The ARMA(1,1) model seems to be stationary as it appears to be random, the ACF plot is quickly decreasing and the PACF is truncated after the first lag. The AR(2) model is not not stationary is it oscillates about the mean and exponentially increases in variance as the index increases. The PACF plot show a negative first lag and nothing afterwards while the ACF plot also oscillates. This is because the constraints on the values of the parameters are not satisfied. In this case, \(\phi_2 - \phi_1 < 1\) is not satisfied.

arma1_1 %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = TeX("ARMA(1,1) model with $\\phi_1 = 0.6$, $\\theta_1 = 0.6$, and $\\sigma^2 = 1$"))

ar2 %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = TeX("AR(2) model with $\\phi_1 = -0.8$, $\\phi_2 = 0.3$, and $\\sigma^2 = 1$"))

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

ARIMA (0,2,1) was selected and the residuals do resemble white noise.

fit <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers)) 

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

fit %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1)", y = "Passengers (in millions)")

fit %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1)")

Write the model in terms of the backshift operator.

\(y_t = -0.87 \varepsilon_{t-1} + \varepsilon_t\)

\((1-B)^2 y_t = (1-0.87B)\varepsilon_t\)

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

The ARIMA model from part a forecasted higher than the actual values while this ARIMA model dorecasted lower than the actual values. The slope of the forecasts also seems to be more gradual.

fit2 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

fit2 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")

fit2 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,1,0)")

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

It is more similar to part and the residuals seem to be white noise. When the constant is removed, a null model is produced. When a constant is omitted, the forecast included a polynomial trend of order \(d-1\), or in this case, 0 and making it non-stationary.

fit3 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

fit3 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")

fit3 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(2,1,2)")

#removing constant
fit4 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

report(fit4)

## Series: Passengers 
## Model: NULL model 
## NULL model

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The slope become steeper and a warning is given that the model specifies a higher order polynomial trend and to remove the constant.

fit5 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

fit5 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant", y = "Passengers (in millions)")

fit5 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1) with constant")

9.8

For the United States GDP series (from global_economy):

if necessary, find a suitable Box-Cox transformation for the data;

The GDP has an increasing trend and is not stationary. A Box-Cox transformation was applied to stabilize the data.

# plot
global_economy %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Non-transformed United States GDP")

# calculate lambda
lambda <-global_economy %>%
  filter(Code == "USA") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit a suitable ARIMA model to the transformed data using ARIMA();

ARIMA(1,1,0) with a drift was fitted to the tansformed data.

fit6 <- global_economy %>%
  filter(Code == "USA") %>%
  model(ARIMA(box_cox(GDP, lambda))) 

report(fit6)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

try some other plausible models by experimenting with the orders chosen;

The unit root test suggests to difference only once. There is a decreasing trend on the ACF plot and a significant first lag on the PACF plot, so an AR(1) model is suggested, or ARIMA(1,1,0). This is the same as in part b.

The models have similar AICc values, with ARIMA(1,2,0) having the lowest, closely followed by ARIMA(1,1,0).

# transformed plot
global_economy %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(box_cox(GDP,lambda), plot_type='partial') +
  labs(title = "Transformed United States GDP")

# unit root test
global_economy %>%
  filter(Code == "USA") %>%
  features(box_cox(GDP,lambda), unitroot_ndiffs)

## # A tibble: 1 x 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1

# modeling several 
usa_fit <- global_economy %>%
  filter(Code == "USA") %>%
  model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
        arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
        arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
        arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
        arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))

glance(usa_fit) %>% arrange(AICc) %>% select(.model:BIC)

## # A tibble: 5 x 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima120  6780.   -326.  656.  656.  660.
## 2 arima110  5479.   -325.  657.  657.  663.
## 3 arima111  5580.   -325.  659.  659.  667.
## 4 arima210  5580.   -325.  659.  659.  667.
## 5 arima212  5734.   -325.  662.  664.  674.

choose what you think is the best model and check the residual diagnostics;

Since ARIMA(1,2,0) has the lowest AICc value, that model was chosen. The residuals seems to be white noise.

usa_fit %>%
  select(arima120) %>%
  gg_tsresiduals() +
  ggtitle("Residuals for ARIMA(1,2,0)")

produce forecasts of your fitted model. Do the forecasts look reasonable?

The forecasts seem reasonable as the trend is continuing at a similar slope.

usa_fit %>%
  forecast(h=5) %>%
  filter(.model=='arima120') %>%
  autoplot(global_economy)

compare the results with what you would obtain using ETS() (with no transformation).

The ETS() model has a significantly greater AICc, which suggests this model is worse.

fit_ets <- global_economy %>%
  filter(Code == "USA") %>%
  model(ETS(GDP))

report(fit_ets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

fit_ets %>%
  forecast(h=5) %>%
  autoplot(global_economy)

DATA 624 Homework 6

Orli Khaimova

3/27/2022

9.1

9.2

9.3

9.5

9.6

9.7

9.8