Question 9.1:

Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

  1. Explain the differences among these figures. Do they all indicate that the data are white noise?

Answer:

The main difference among these figures is that as the number of random values increases, the spikes in the ACF become smaller. In other words, as the sample size grows, the sample autocorrelations move closer to zero, which is expected for white noise. Yes, all three figures indicate white noise, since in each case the autocorrelations stay within the critical bounds.

  2. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Answer:

The critical values are further from zero when the number of data points is smaller because the estimate of the autocorrelation is less precise. We can calculate the critical value as \(±1.96/\sqrt{T}\), where \(T\) is the length of the time series. Even though all series refer to white noise, the autocorrelations can differ due to random variation. Each series is a different realization of white noise, and due to the randomness, the actual autocorrelations observed in a finite sample can vary.
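
To see the effect of \(T\) directly, the following sketch (an addition to the original answer; it assumes the fpp3 package is loaded and uses an arbitrary seed) simulates white noise at the three sample sizes and plots the ACFs. The bounds sit at \(\pm 1.96/\sqrt{T}\), i.e. roughly ±0.33, ±0.10 and ±0.06.

library(fpp3)
set.seed(42)  # arbitrary seed; each draw produces different spikes

# helper: a white-noise tsibble of length n
wn <- function(n) tsibble(idx = seq_len(n), y = rnorm(n), index = idx)

wn(36)   %>% ACF(y) %>% autoplot() + labs(title = "White noise, T = 36")
wn(360)  %>% ACF(y) %>% autoplot() + labs(title = "White noise, T = 360")
wn(1000) %>% ACF(y) %>% autoplot() + labs(title = "White noise, T = 1000")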

Question 9.2:

A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

Answer:

data(gafa_stock)
gafa_stock
## # A tsibble: 5,032 x 8 [!]
## # Key:       Symbol [4]
##    Symbol Date        Open  High   Low Close Adj_Close    Volume
##    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
##  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
##  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
##  3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
##  4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
##  5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
##  6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
##  7 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
##  8 AAPL   2014-01-13  75.7  77.5  75.7  76.5      64.9  94623200
##  9 AAPL   2014-01-14  76.9  78.1  76.8  78.1      66.1  83140400
## 10 AAPL   2014-01-15  79.1  80.0  78.8  79.6      67.5  97909700
## # ℹ 5,022 more rows

Plotting closing prices for Amazon Stock:

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type='partial') +
  labs(title = "Amazon Daily Closing Stock Price")

The stock price shows a clear upward trend over time. There are also changes in the variance over time, as seen by the amplitude of the price changes, which seems to increase with the level of the series, indicating that the series may also be heteroskedastic, another indicator of non-stationarity.

The ACF plot displays a very slow decay, which indicates that past values are strongly correlated with future values even at high lags. In a stationary series, we would expect the ACF to drop to zero quickly, which is not the case here. The PACF shows a significant spike at lag 1 and then cuts off, which is typical of a non-stationary series with a stochastic trend.

As we can see, these plots all suggest that the time series is non-stationary, so differencing should be applied to make it stationary.
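
A quick sketch of the differenced series (an addition to the original answer; re-indexing to a trading-day index is an extra step taken so the ACF is computed on a regular interval):

gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  mutate(trading_day = row_number()) %>%
  update_tsibble(index = trading_day, regular = TRUE) %>%
  gg_tsdisplay(difference(Close), plot_type = 'partial') +
  labs(title = "Differenced Amazon Daily Closing Price")

After one difference, the trend should disappear and the ACF should die out quickly.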

Question 9.3:

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

  1. Turkish GDP from global_economy.

Answer:

# plotting
global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Non-transformed Turkish GDP")

# calculating lambda
lambda <- global_economy %>%
  filter(Country == "Turkey") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# unit root test
global_economy %>%
  filter(Country == "Turkey") %>%
  features(box_cox(GDP,lambda), unitroot_ndiffs) 
## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1
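
The test suggests a single order of first differencing after the Box-Cox transformation. A quick visual check of the result (a sketch, not part of the original answer):

global_economy %>%
  filter(Country == "Turkey") %>%
  autoplot(difference(box_cox(GDP, lambda))) +
  labs(title = "Differenced Box-Cox transformed Turkish GDP")
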
  2. Accommodation takings in the state of Tasmania from aus_accommodation.

Answer:

# plotting
aus_accommodation %>%
  filter(State == 'Tasmania') %>%
  gg_tsdisplay(Takings, plot_type='partial') +
  labs(title = "Non-transformed Tasmania Accomodation Takings")

# calculating lambda
lambda <- aus_accommodation %>%
  filter(State == 'Tasmania') %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

# unit root test
aus_accommodation %>%
  filter(State == 'Tasmania') %>%
  features(box_cox(Takings,lambda), unitroot_ndiffs)
## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1
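
Again a single first difference is suggested. Since Takings is quarterly and strongly seasonal, it is also worth asking whether a seasonal difference is needed, an extra check not in the original answer:

aus_accommodation %>%
  filter(State == 'Tasmania') %>%
  features(box_cox(Takings, lambda), unitroot_nsdiffs)
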
  3. Monthly sales from souvenirs.

Answer:

# plotting
souvenirs %>%
  gg_tsdisplay(Sales, plot_type = 'partial') + 
  labs(title = "Monthly Souvenirs sales")

# calculating lambda
lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

# unit root test
souvenirs %>%
  mutate(Sales = box_cox(Sales, lambda)) %>%
  features(Sales, unitroot_ndiffs)
## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1
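
As a sanity check (an addition to the original answer), a KPSS test on the transformed, once-differenced sales should fail to reject stationarity if one difference is enough:

souvenirs %>%
  mutate(diff_sales = difference(box_cox(Sales, lambda))) %>%
  features(diff_sales, unitroot_kpss)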

Question 9.5:

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

Answer:

set.seed(123)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

# plotting
myseries %>%
  gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
  labs(title = "Non-transformed Retail Turnover")

# calculating lambda
lambda_turnover <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# unit root test
myseries %>%
  mutate(Turnover = box_cox(Turnover, lambda_turnover)) %>%
  features(Turnover, unitroot_ndiffs)
## # A tibble: 1 × 3
##   State    Industry                  ndiffs
##   <chr>    <chr>                      <int>
## 1 Victoria Household goods retailing      1

The retail data is trending and strongly seasonal, so a Box-Cox transformation was applied to stabilise the variance. The KPSS-based unit root test then recommends a single order of differencing. After transforming and differencing once, the series appears stationary: the ACF and PACF no longer show a slow decay and the data are centred around zero.
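
A minimal sketch of those diagnostics (they were not shown in the original answer); given the strong seasonality, unitroot_nsdiffs could also be consulted:

myseries %>%
  gg_tsdisplay(difference(box_cox(Turnover, lambda_turnover)),
               plot_type = 'partial', lag = 36) +
  labs(title = "Box-Cox transformed and differenced Retail Turnover")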

Question 9.6:

Simulate and plot some data from simple ARIMA models.

  1. Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).

Answer:

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
  2. Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

Answer:

sim %>%
  autoplot(y) +
  labs(title = "AR(1) model with phi_1 = 0.6")

Changing the \(\phi_1\) to -0.6:

for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "AR(1) model with phi_1 = -0.6")

Changing the \(\phi_1\) to 0.9:

for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "AR(1) model with phi_1 = 0.9")

With \(\phi_1 = 0.6\), the series is fairly smooth: increases tend to follow increases and decreases tend to follow decreases. With \(\phi_1 = -0.6\), the plot takes on an oscillatory pattern, switching back and forth across the mean far more frequently. With \(\phi_1 = 0.9\), the series is smoother still and the swings from one time point to the next are larger, because the values are more persistent.
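
A compact way to experiment with other values (a sketch, not part of the original answer) is to wrap the simulation in a function; each call draws fresh errors, so the exact path will vary:

# simulate an AR(1) process y_t = phi * y_{t-1} + e_t, starting from y_1 = 0
sim_ar1 <- function(phi, n = 100) {
  e <- rnorm(n)      # standard normal errors, sigma^2 = 1
  y <- numeric(n)
  for (i in 2:n) y[i] <- phi * y[i-1] + e[i]
  tsibble(idx = seq_len(n), y = y, index = idx)
}

sim_ar1(0.3) %>%
  autoplot(y) +
  labs(title = "AR(1) model with phi_1 = 0.3")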

  3. Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
y <- numeric(100)
e <- rnorm(100)
y[1] <- e[1]  # the first observation has no lagged error available
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
  4. Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
sim %>%
  autoplot(y) +
  labs(title = "AR(1) model with theta_1 = 0.6")

Changing the \(\theta_1\) to -0.6:

for(i in 2:100)
  y[i] <- -0.6*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "AR(1) model with theta_1 = -0.6")

Changing the \(\theta_1\) to 0.9:

for(i in 2:100)
  y[i] <- 0.9*e[i-1] + e[i]
tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "AR(1) model with theta_1 = 0.9")

As \(\theta_1\) moves from positive to negative, consecutive values become negatively correlated, so the series crosses its mean more frequently and looks spikier. The overall magnitude of the fluctuations changes little, since the variance of an MA(1) process is \((1+\theta_1^2)\sigma^2\), and with only one lagged error term the series stays centred close to its mean in every case.

  5. Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
set.seed(123)
y <- numeric(100)
e <- rnorm(100)  # regenerate the errors so the seed takes effect

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]

arma1_1 <- tsibble(idx = seq_len(100), y = y, index = idx)
  6. Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)
set.seed(123)
y <- numeric(100)
e <- rnorm(100)  # regenerate the errors so the seed takes effect

for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

ar2 <- tsibble(idx = seq_len(100), y = y, index = idx)
  7. Graph the latter two series and compare them.
arma1_1 %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = ("ARMA(1,1) model"))

ar2 %>%
  gg_tsdisplay(y, plot_type='partial') +
  labs(title = ("AR(2) model"))

The ARMA(1,1) plot shows a stable, mean-reverting process, whereas the AR(2) plot oscillates with steadily growing amplitude: an unstable, diverging process, as the note in the question warned. In the ACF and PACF plots, the ARMA(1,1) process shows a rapid decline in autocorrelations and a sharp cut-off in the PACF after lag 1, while the AR(2) process shows an alternating, more complex ACF pattern with significant spikes at the first two lags.

Question 9.7:

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

  1. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
fit <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers)) 

report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99
ARIMA() selected an ARIMA(0,2,1) model.

fit %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1)", y = "Passengers (in millions)")

fit %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1)")

  2. Write the model in terms of the backshift operator.

Since the selected ARIMA(0,2,1) model has no constant, \(c = 0\):

\((1-B)^2y_t = (1+\theta_1B)e_t\)

Equivalently, \(y_t = 2y_{t-1} - y_{t-2} + e_t + \theta_1e_{t-1}\), with \(\theta_1 = -0.8756\).

  3. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
fit2 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

fit2 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")

fit2 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,1,0)")

  4. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
fit3 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

fit3 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")

fit3 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(2,1,2)")

#removing constant
fit4 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

report(fit4)
## Series: Passengers 
## Model: NULL model 
## NULL model

Removing the constant causes the estimation to fail (the AR part becomes non-stationary during fitting), so fable returns a NULL model.
  5. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit5 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

fit5 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant", y = "Passengers (in millions)")

fit5 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1) with constant")

Question 9.8:

For the United States GDP series (from global_economy):

  1. if necessary, find a suitable Box-Cox transformation for the data;
# plotting
global_economy %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Non-transformed United States GDP")

# calculating lambda
lambda <-global_economy %>%
  filter(Code == "USA") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
  2. fit a suitable ARIMA model to the transformed data using ARIMA();
fit6 <- global_economy %>%
  filter(Code == "USA") %>%
  model(ARIMA(box_cox(GDP, lambda))) 

report(fit6)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78
  3. try some other plausible models by experimenting with the orders chosen;
global_economy %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(box_cox(GDP,lambda), plot_type='partial') +
  labs(title = "Transformed United States GDP")

global_economy %>%
  filter(Code == "USA") %>%
  features(box_cox(GDP,lambda), unitroot_ndiffs) 
## # A tibble: 1 × 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1
usa_fit <- global_economy %>%
  filter(Code == "USA") %>%
  model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
        arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
        arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
        arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
        arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))

glance(usa_fit) %>% arrange(AICc) %>% select(.model:BIC)
## # A tibble: 5 × 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima120  6780.   -326.  656.  656.  660.
## 2 arima110  5479.   -325.  657.  657.  663.
## 3 arima111  5580.   -325.  659.  659.  667.
## 4 arima210  5580.   -325.  659.  659.  667.
## 5 arima212  5734.   -325.  662.  664.  674.

The models have similar AICc values; ARIMA(1,2,0) has the lowest, closely followed by ARIMA(1,1,0). The unit root test suggests differencing only once, and the decaying ACF with a single significant spike at lag 1 in the PACF points to an AR(1) term, i.e. ARIMA(1,1,0), the same model that ARIMA() selected automatically in part b.

  4. choose what you think is the best model and check the residual diagnostics;
usa_fit %>%
  select(arima120) %>%
  gg_tsresiduals() +
  ggtitle("Residuals for ARIMA(1,2,0)")

ARIMA(1,2,0) was chosen because it has the lowest AICc. The residual plots show no significant autocorrelation, so the residuals appear to be white noise.
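
A Ljung-Box test backs up the visual check (an extra step, not in the original answer; dof = 1 accounts for the single AR coefficient):

usa_fit %>%
  select(arima120) %>%
  augment() %>%
  features(.innov, ljung_box, lag = 10, dof = 1)

A large p-value would confirm that the residuals are indistinguishable from white noise.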

  5. produce forecasts of your fitted model. Do the forecasts look reasonable?
usa_fit %>%
  forecast(h=5) %>%
  filter(.model=='arima120') %>%
  autoplot(global_economy)

  6. compare the results with what you would obtain using ETS() (with no transformation).
fit_ets <- global_economy %>%
  filter(Code == "USA") %>%
  model(ETS(GDP))

report(fit_ets)
## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089
fit_ets %>%
  forecast(h=5) %>%
  autoplot(global_economy)

With no transformation, the ETS() model produces forecasts with much wider prediction intervals. Its AICc is also far larger, but AICc values are not comparable across model classes (and here the ARIMA model was fitted to transformed data), so the comparison should rest on the forecast plots or on out-of-sample accuracy rather than on the information criteria.
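
Since the information criteria cannot settle the question, a fairer comparison (sketched below using an assumed 10-year holdout, not part of the original answer) is out-of-sample accuracy:

usa <- global_economy %>% filter(Code == "USA")

usa %>%
  filter(Year <= 2007) %>%   # hold out the last 10 years
  model(arima = ARIMA(box_cox(GDP, lambda) ~ pdq(1,2,0)),
        ets   = ETS(GDP)) %>%
  forecast(h = 10) %>%
  accuracy(usa)

Whichever model has the lower RMSE on the holdout would be preferred.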