Homework 6 Data 624

library(fpp3)

Question 1

A.

Explain the differences among these figures. Do they all indicate that the data are white noise?

The differences between the figures starts with that they are all ACFs for a different amount of random numbers. They differ with the amount of random numbers each of the graphs represent, as well as, we can see that peaks and valleys of the graphs seem to get lower as we increase the amount of random numbers. This with the bounded area indicated by the blue lines which represents the significant threshold that can be calculated by ±\(\frac{1.96}{\sqrt{N}}\) where N is the length of the time series. As we can see from each of the graphs, the data is all within the bounded area. which signifies that the data is all white noise.

B.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Since the criticla values are calculated a ±\(\frac{1.96}{\sqrt{N}}\). Therefore the distances would be different from the mean of zero; since N represents the length of the time series. As N increases, the critical values approaches 0 since its dividing by a larger number. This also aligns with the different autocorrelation which keep on decreasing due to the increasing series sizes of random numbers.

Question 2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

data("gafa_stock")

# Time Series/ACF/PACF
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  gg_tsdisplay(Close, plot_type='partial') +
  labs(title = "Amazon Closing Stock Price")

# Unit Root Test
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  features(Close, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Symbol ndiffs
##   <chr>   <int>
## 1 AMZN        1

# KPSS 
gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  features(Close, unitroot_kpss)

## # A tibble: 1 × 3
##   Symbol kpss_stat kpss_pvalue
##   <chr>      <dbl>       <dbl>
## 1 AMZN        14.0        0.01

From a quick look at the time series graph we can indicate that the graph is non-stationary since it has an increasing trend throughout its period. For the ACF Plot, there is a large and positive \(r_{1}\) and the plot is slowly decreasing also showing a difference from a stationary time series which would rapidly drop to 0. Lastly in the PACF graph we can see that it has a first lag of 1 which shows a non-stationary trend. For the time series being differenced, we can use the KPSS test and see that the closing price would have to be differenced once in order to become stationary.

Question 3

For the following series, find an appropriate Box-Cox transformation and order of difference in order to obtain stationary data.

A.

Turkish GDP from global_economy.

data("global_economy")

# Lambda
Turkey_lambda <- global_economy %>%
  filter(Country == "Turkey") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# Unit Root Test KPSS
global_economy %>%
  filter(Country == "Turkey") %>%
  features(box_cox(GDP,Turkey_lambda), unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

# Time Series/ACF/PACF
global_economy %>% 
  filter(Country =="Turkey") %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "Turkish GDP")

# Box-Cox Transformation
global_economy %>%
  filter(Country == "Turkey") %>%
  gg_tsdisplay(difference(box_cox(GDP, Turkey_lambda)), plot_type = 'partial') +
  labs(title = paste("Differenced Turkish GDP with lambda = ", round(Turkey_lambda, 2)))

The Turkish GDP is both non-seasonal and non-stationary. Following the KPPS test, a single differencing step is suggested in order to transform the data to be stationary.

B.

Accommodation takings in the state of Tasmania from aus_accommodation.

data("aus_accommodation")

# Lambda
Tas_lambda <-aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

# Unit Root Test KPSS
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  features(box_cox(Takings,Tas_lambda), unitroot_nsdiffs)

## # A tibble: 1 × 2
##   State    nsdiffs
##   <chr>      <int>
## 1 Tasmania       1

# Time Series/ACF/PACF
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(Takings, plot_type='partial') +
  labs(title = "Tasmania Accomodation Takings")

# Box-Cox Transformation
aus_accommodation %>%
  filter(State == "Tasmania") %>%
  gg_tsdisplay(difference(box_cox(Takings,Tas_lambda), 4), plot_type='partial') +
  labs(title = paste("Differenced Tasmania Accomodation Takings with lambda = ",round(Tas_lambda,2)))

Accommodation takings in Tasmania exhibits both seasonality and non-stationarity. According to the KPSS test, a single differencing step is advised, which renders the data stationary. The autocorrelation function (ACF) shows rapid decay, and the partial autocorrelation function (PACF) plot truncates after the first lag. However, the data appears to be more centered around zero when differenced twice.

C.

Monthly sales from souvenirs.

# Lambda
Souv_lambda <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

# Time Series/ACF/PACF
souvenirs %>%
  gg_tsdisplay(Sales, plot_type='partial', lag = 24) +
  labs(title = "Monthly Souvenir Sales")

# Unit Root Test KPSS
souvenirs %>%
  features(box_cox(Sales,Souv_lambda), unitroot_nsdiffs)

## # A tibble: 1 × 1
##   nsdiffs
##     <int>
## 1       1

# Box-Cox Transformation
souvenirs %>%
  gg_tsdisplay(difference(box_cox(Sales,Souv_lambda), 12), plot_type='partial', lag = 24) +
   labs(title = paste("Differenced Monthly Souvenir Sales with lambda = ",round(Souv_lambda,2)))

Monthly souvenir sales exhibits both seasonality and non-stationarity. The KPSS test suggests a single differencing step, and a lambda value close to 0 implies a natural log transformation. The graphs, then indicate stationarity after a single differencing. The autocorrelation function (ACF) displays rapid decay, while the partial autocorrelation function (PACF) plot truncates after the first lag.

Question 5

For your retail data (from Exercise 7 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

# Exercise 7 from Section 2.10
set.seed(334)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1)) 

# Time Series/ACF/PACF
myseries %>%
  gg_tsdisplay(Turnover, plot_type='partial', lag = 36) +
  labs(title = "Retail Turnover")

# Lambda
lambda <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

# Unit Root Test KPSS
myseries %>%
  features(box_cox(Turnover, lambda), unitroot_nsdiffs)

## # A tibble: 1 × 3
##   State           Industry                                         nsdiffs
##   <chr>           <chr>                                              <int>
## 1 South Australia Hardware, building and garden supplies retailing       1

# Box-Cox Transformation
myseries %>%
  gg_tsdisplay(difference(box_cox(Turnover,lambda), 12), plot_type='partial', lag = 36) +
  labs(title = paste("Differenced Tasmania Accomodation Takings with lambda = ",round(lambda,2)))

Question 6

A.

Use the following R code to generate data from an AR(1) model with ϕ1= 0.6 and σ2= 1. The process starts with y1=0.

y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)

B.

Produce a time plot for the series. How does the plot change as you change ϕ1?

# Base Phi
sim %>%
  autoplot(y) +
  labs(title = "AR(1) model with Phi = 0.6")

# Phi_1 = 1
for(i in 2:100)
  y[i] <- 1*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "AR(1) model with Phi = 1")

# Phi_1 = -0.6
for(i in 2:100)
  y[i] <- -0.6*y[i-1] + e[i]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "AR(1) model with Phi = -0.6")

As ϕ is its base value we see it looks like random noise. When ϕ increase it shows characteristics that is similar to random walk as indicated when we set ϕ = 1. When ϕ decrease and we set it to negative -0.6 we can see larger variations with each high and low point spiking around the mean.

C.

Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2= 1.

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]

sim_2 <- tsibble(idx = seq_len(100), y = y, index = idx)

D.

Produce a time plot for the series. How does the plot change as you change θ1?

# Base Theta
sim_2 %>%
  autoplot(y) +
  labs(title = "MA(1) model with Theta = 0.6")

# Theta_1 = 1
for(i in 2:100)
  y[i] <- e[i] + 1 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "MA(1) model with Theta = 1")

# Theta_1 = -0.6
for(i in 2:100)
  y[i] <- e[i] + -0.6 * e[i-1]

tsibble(idx = seq_len(100), y = y, index = idx) %>%
  autoplot(y) +
  labs(title = "MA(1) model with Theta = -0.6")

As θ increase the amplitude of the graph as the peaks/valleys seems to increase. When θ decreases the frequency of the spikes increase which seems that the valleys are just compressed along the x axis.

E.

Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1= 0.6 and σ2 = 1.

set.seed(334)
y <- numeric(100)

for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1] + 0.6 * y[i-1]

ARMA_11 <- tsibble(idx = seq_len(100), y = y, index = idx)

F.

Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

set.seed(334)
y <- numeric(100)

for(i in 3:100)
  y[i] <- -0.8 * y[i-1] + 0.3 * y[i-2] + e[i]

AR_2 <- tsibble(idx = seq_len(100), y = y, index = idx)

G.

Graph the latter two series and compare them.

ARMA_11 %>%
  gg_tsdisplay(y, plot_type = 'partial') +
  labs(title = "ARMA(1,1) model with Phi = 0.6, Theta = 0.6, and Sigma^2 = 1")

AR_2 %>%
  gg_tsdisplay(y, plot_type = 'partial') +
  labs(title = "AR(2) model with Phi = -0.8, Thi = 0.3, and Sigma^2 = 1")

The ARMA(1,1) model seems to exhibit stationarity. It also appears random with a rapidly decreasing ACF plot and a truncated PACF plot after the first lag. The AR(2) model displays non-stationarity and oscillates around the mean with increasing variance as the index rises. It’s PACF plot reveals a negative first lag and no subsequent lags. Furthermore the ACF plot also oscillates and these discrepancies seem to come from an unsatisfied parameter constraints, ϕ2−ϕ1<1.

Question 7

A.

Use ARIMA() to find an appropriate ARIMA model. What model was selected. Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

fit <- aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers)) 

report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8756
## s.e.   0.0722
## 
## sigma^2 estimated as 4.671:  log likelihood=-87.8
## AIC=179.61   AICc=179.93   BIC=182.99

fit %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1)", y = "Passengers (in millions)")

fit %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1)")

B.

Write the model in terms of the backshift operator.

\[y_{t} = -0.8756 \epsilon_{t-1} + \epsilon_{t} \] \[(1-B)^{2} y_{t} = (1-0.87B)\epsilon_{t}\]

C.

Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit2 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(0,1,0)))

fit2 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,1,0)", y = "Passengers (in millions)")

fit2 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,1,0)")

The ARIMA model from part A is forecasted higher than actual values, where this ARIMA model’s forecasted values are below the actual values. The forecasted slopes do seems to be more gradual.

D.

Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

fit3 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))

fit3 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(2,1,2)", y = "Passengers (in millions)")

fit3 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(2,1,2)")

#removing constant
fit4 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 0 + pdq(2,1,2)))

## Warning: 1 error encountered for ARIMA(Passengers ~ 0 + pdq(2, 1, 2))
## [1] non-stationary AR part from CSS

report(fit4)

## Series: Passengers 
## Model: NULL model 
## NULL model

It closely resembles the previous part, with residuals exhibiting characteristics of white noise. When the constants are emitted we can see a null model alternatively if they are omitted the forecast within the polynomial trend of an order d-1 or 0, makes it non-stationary.

E.

Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

fit5 <-aus_airpassengers %>%
  filter(Year < 2012) %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,2,1)))

## Warning: Model specification induces a quadratic or higher order polynomial trend. 
## This is generally discouraged, consider removing the constant or reducing the number of differences.

fit5 %>% 
  forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(title = "Australian Aircraft Passengers with ARIMA(0,2,1) with constant", y = "Passengers (in millions)")

fit5 %>% 
  gg_tsresiduals() +
  labs(title = "Residuals for ARIMA(0,2,1) with constant")

When the slope increases notably, a warning indicating the model specifies a higher order polynomial trend and advising the removal of the constant.

Question 8

A.

If necessary, find a suitable Box-Cox transformation for the data

# plot
global_economy %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(GDP, plot_type='partial') +
  labs(title = "United States GDP")

The GDP is not stationary and its trend is increasing.I believe using a Box-Cox transformation will help regularize the data.

B.

Fit a suitable ARIMA model to the transformed data using ARIMA()

# calculate lambda
lambda <-global_economy %>%
  filter(Code == "USA") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

fit6 <- global_economy %>%
  filter(Code == "USA") %>%
  model(ARIMA(box_cox(GDP, lambda))) 

report(fit6)

## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## Transformation: box_cox(GDP, lambda) 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78

ARIMA(1,1,0) with a drift was suitable for the transformed data.

C.

Try some other plausible models by experimenting with the orders chosen

# transformed plot
global_economy %>%
  filter(Code == "USA") %>%
  gg_tsdisplay(box_cox(GDP,lambda), plot_type='partial') +
  labs(title = "Transformed United States GDP")

# unit root test
global_economy %>%
  filter(Code == "USA") %>%
  features(box_cox(GDP,lambda), unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1

usa_fit <- global_economy %>%
  filter(Code == "USA") %>%
  model(arima110 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,0)),
        arima120 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,2,0)),
        arima210 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,0)),
        arima212 = ARIMA(box_cox(GDP,lambda) ~ pdq(2,1,2)),
        arima111 = ARIMA(box_cox(GDP,lambda) ~ pdq(1,1,1)))

glance(usa_fit) %>% arrange(AICc) %>% select(.model:BIC)

## # A tibble: 5 × 6
##   .model   sigma2 log_lik   AIC  AICc   BIC
##   <chr>     <dbl>   <dbl> <dbl> <dbl> <dbl>
## 1 arima120  6780.   -326.  656.  656.  660.
## 2 arima110  5479.   -325.  657.  657.  663.
## 3 arima111  5580.   -325.  659.  659.  667.
## 4 arima210  5580.   -325.  659.  659.  667.
## 5 arima212  5734.   -325.  662.  664.  674.

The unit root test suggests to difference only once. There is a decreasing trend on the ACF plot and a significant first lag on the PACF plot, so an AR(1) model is suggested, or ARIMA(1,1,0). This is the same as in part b. The models have similar AICc values, with ARIMA(1,2,0) having the lowest, closely followed by ARIMA(1,1,0).

The unit root test showcases a single differencing. Both the ACF and PACF plots exhibit a declining trend, suggesting an AR(1) model or ARIMA(1,1,0). This is similar to the findings in part b. Despite having similar AICc values, ARIMA(1,2,0) has the lowest, closely trailed by ARIMA(1,1,0).

D.

Choose what you think is the best model and check the residual diagnostics

{R]} usa_fit %>% select(arima120) %>% gg_tsresiduals() + ggtitle("Residuals for ARIMA(1,2,0)")

E.

Produce forecasts of your fitted model. Do the forecasts look reasonable?

usa_fit %>%
  forecast(h=5) %>%
  filter(.model=='arima120') %>%
  autoplot(global_economy)

The forecasts seems reasonable since the trend is continuing at a very similar slope.

F.

Compare the results with what you would obtain using ETS() (with no transformation).

fit_ets <- global_economy %>%
  filter(Code == "USA") %>%
  model(ETS(GDP))

report(fit_ets)

## Series: GDP 
## Model: ETS(M,A,N) 
##   Smoothing parameters:
##     alpha = 0.9990876 
##     beta  = 0.5011949 
## 
##   Initial states:
##          l[0]        b[0]
##  448093333334 64917355687
## 
##   sigma^2:  7e-04
## 
##      AIC     AICc      BIC 
## 3190.787 3191.941 3201.089

fit_ets %>%
  forecast(h=5) %>%
  autoplot(global_economy)

Since the ETS() model has a significantly greater AICc it shows that this model is worse.