1. Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
  a. Explain the differences among these figures. Do they all indicate that the data are white noise? All three plots indicate that the data are white noise: none of the spikes exceed the critical-value bounds in any plot. The ACF spikes are taller for the smaller samples because sample autocorrelations computed from fewer observations vary more around zero. Since the data are randomly generated, the observations are independent and identically distributed, so the true autocorrelation is zero at every lag.

  b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

    The critical values are at ±1.96/√(T − d), where T is the sample size and d is the number of differences applied (here d = 0). As the sample size increases, the critical values move closer to zero, which is why the bounds narrow from the first plot to the third. The autocorrelations themselves differ because each figure shows estimates computed from a different random sample, and those estimates scatter more widely around zero when T is small.
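This can be verified directly (a quick sketch, assuming the tidyverse is loaded alongside fpp3):

# The white-noise bounds shrink at rate 1/sqrt(T):
# roughly +/-0.33, +/-0.10 and +/-0.06 for these sample sizes
tibble(T = c(36, 360, 1000)) %>%
  mutate(bound = 1.96 / sqrt(T))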

2. A classic example of a non-stationary series is stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
gafa_stock %>%
  filter(Symbol == 'AMZN') %>%
  autoplot(Close) +
  labs(title='Amazon Closing Stock Prices')

gafa_stock %>%
  filter(Symbol == 'AMZN') %>%
  gg_tsdisplay(Close, plot_type = 'partial')
## Warning: Provided data has an irregular interval, results should be treated with
## caution. Computing ACF by observation.


For stationary data, the ACF drops to zero quickly, whereas for non-stationary data it decays slowly and smoothly. In the ACF plot above, the autocorrelations decay very slowly, and the PACF shows a single dominant spike near 1 at lag 1, the signature of a random-walk-like series. The price plot itself shows persistent trends over time. All three plots therefore indicate that the series is non-stationary and should be differenced.
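As a quick check (a sketch reusing the pipeline above), taking a first difference removes most of this autocorrelation:

gafa_stock %>%
  filter(Symbol == 'AMZN') %>%
  gg_tsdisplay(difference(Close), plot_type = 'partial')  # same irregular-interval caveat applies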

3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
  a. Turkish GDP from global_economy.
gdp <- global_economy %>%
  filter(Country == 'Turkey') %>%
  select(Country, GDP)

lgdp <- gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

gdp %>%
  mutate(GDP = box_cox(GDP, lgdp)) %>%
  features(GDP, unitroot_ndiffs)
## # A tibble: 1 x 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1
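As a sanity check (a sketch; output not shown), the KPSS test on the differenced, transformed series should no longer reject stationarity:

gdp %>%
  mutate(GDP = difference(box_cox(GDP, lgdp))) %>%
  features(GDP, unitroot_kpss)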
  b. Accommodation takings in the state of Tasmania from aus_accommodation.
takings <- aus_accommodation %>%
  filter(State == 'Tasmania') %>%
  select(State, Takings)

ltakings <- takings %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

takings %>%
  mutate(Takings = box_cox(Takings, ltakings)) %>%
  features(Takings, unitroot_ndiffs)
## # A tibble: 1 x 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1
  c. Monthly sales from souvenirs.
lsales <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

souvenirs %>%
  mutate(Sales = box_cox(Sales, lsales)) %>%
  features(Sales, unitroot_ndiffs)
## # A tibble: 1 x 1
##   ndiffs
##    <int>
## 1      1
5. For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
set.seed(666)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))

lturnover <- myseries %>%
  features(Turnover, features = guerrero) %>%
  pull(lambda_guerrero)

myseries %>%
  mutate(Turnover = box_cox(Turnover, lturnover)) %>%
  features(Turnover, unitroot_ndiffs)
## # A tibble: 1 x 3
##   State           Industry                                         ndiffs
##   <chr>           <chr>                                             <int>
## 1 New South Wales Hardware, building and garden supplies retailing      1
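
Because this retail series is monthly and strongly seasonal, it is also worth checking whether a seasonal difference is needed before the ordinary difference. A sketch (output not shown):

myseries %>%
  mutate(Turnover = box_cox(Turnover, lturnover)) %>%
  features(Turnover, unitroot_nsdiffs)

myseries %>%
  mutate(Turnover = difference(box_cox(Turnover, lturnover), 12)) %>%
  features(Turnover, unitroot_ndiffs)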

6. Simulate and plot some data from simple ARIMA models.

  a. Use the following R code to generate data from an AR(1) model with \[\phi_{1} = 0.6\] and \[\sigma^2=1\]. The process starts with y1 = 0.
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
  b. Produce a time plot for the series. How does the plot change as you change ϕ1?
sim %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = y`

for(i in 2:100)
  y[i] <- 0.9*y[i-1] + e[i]
sim <- tsibble(x = seq_len(100), y = y, index = x)
sim %>% autoplot()
## Plot variable not specified, automatically selected `.vars = y`

for(i in 2:100)
  y[i] <- 1.5*y[i-1] + e[i]
sim <- tsibble(x = seq_len(100), y = y, index = x)
sim %>% autoplot()
## Plot variable not specified, automatically selected `.vars = y`

As ϕ1 increases toward 1, the series becomes smoother and wanders in longer swings, so it looks increasingly trended. With ϕ1 = 1.5 the process lies outside the stationarity region (|ϕ1| < 1), so the series explodes rather than fluctuating around a stable mean.
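To see the effect of ϕ1 side by side, the simulations can be wrapped in a helper that reuses the same shocks (a sketch; simulate_ar1 is an illustrative name, and purrr is assumed to be loaded with the tidyverse):

# Simulate an AR(1) path for a given phi, starting from y[1] = 0
simulate_ar1 <- function(phi, e) {
  y <- numeric(length(e))
  for (i in 2:length(e)) y[i] <- phi * y[i-1] + e[i]
  y
}

set.seed(1)
e <- rnorm(100)
map_dfr(c(0.3, 0.6, 0.9), function(phi)
  tibble(phi = phi, idx = seq_len(100), y = simulate_ar1(phi, e))) %>%
  as_tsibble(index = idx, key = phi) %>%
  autoplot(y) +
  facet_grid(vars(phi))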

  c. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.
y <- numeric(100)
e <- rnorm(100)
y[1] <- e[1]  # no earlier shock exists for the first observation
for(i in 2:100)
  y[i] <- 0.6*e[i-1] + e[i]

  d. Produce a time plot for the series. How does the plot change as you change θ1?

sim <- tsibble(x = seq_len(100), y = y, index = x)

sim %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = y`

for(i in 2:100)
  y[i] <- 0.15*e[i-1] + e[i]

sim <- tsibble(x = seq_len(100), y = y, index = x)

sim %>%
  autoplot()
## Plot variable not specified, automatically selected `.vars = y`

With the smaller θ1, the contribution of the previous shock is weaker, so consecutive observations are less correlated and the series looks closer to white noise.
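The theoretical lag-1 autocorrelation of an MA(1) process, θ1/(1 + θ1²), makes this concrete (a quick worked check): it is about 0.44 for θ1 = 0.6 but only about 0.15 for θ1 = 0.15, and it is zero beyond lag 1 in both cases.

tibble(theta = c(0.6, 0.15)) %>%
  mutate(rho1 = theta / (1 + theta^2))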

  e. Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1.

phi <- 0.6
theta <- 0.6
sigma2 <- 1
y1 <- numeric(100)
e <- rnorm(100, sd = sqrt(sigma2))  # sigma^2 = 1, so sd = 1

y1[1] <- e[1]
for(i in 2:100)
  y1[i] <- phi*y1[i-1] + theta*e[i-1] + e[i]

  f. Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1. (Note that these parameters will give a non-stationary series.)

phi1 <- -0.8  # the exercise specifies phi_1 = -0.8
phi2 <- 0.3
sigma2 <- 1
y2 <- numeric(100)
e <- rnorm(100, sd = sqrt(sigma2))

for(i in 3:100)
  y2[i] <- phi1*y2[i-1] + phi2*y2[i-2] + e[i]
  g. Graph the latter two series and compare them.
sim1 <- tsibble(x = seq_len(100), y = y1, index = x)
sim1 %>% autoplot()
## Plot variable not specified, automatically selected `.vars = y`

sim2 <- tsibble(x = seq_len(100), y = y2, index = x)
sim2 %>% autoplot()
## Plot variable not specified, automatically selected `.vars = y`

The graphs show that the ARMA(1,1) series fluctuates around a stable mean, while the AR(2) series oscillates between positive and negative values with exponentially growing amplitude: these AR(2) parameters give a non-stationary process.
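This can be confirmed from the AR(2) characteristic polynomial using base R:

# Roots of 1 - phi1*z - phi2*z^2 with phi1 = -0.8, phi2 = 0.3;
# stationarity requires every root to lie outside the unit circle
Mod(polyroot(c(1, 0.8, -0.3)))

One root has modulus of about 0.93 < 1, so the process is non-stationary, matching the exploding oscillations in the plot.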

7. Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

  a. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
fit <- aus_airpassengers %>%
  model(search = ARIMA(Passengers, stepwise = FALSE))

report(fit)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65
fit %>%
  gg_tsresiduals()

augment(fit) %>%
  features(.innov, ljung_box, lag = 12, dof = 3)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 search    7.79     0.555
fit %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

The search selected an ARIMA(0,2,1) model. The residual ACF shows no significant spikes and the Ljung-Box p-value of 0.555 is large, so the residuals are consistent with white noise.

  b. Write the model in terms of the backshift operator. The selected ARIMA(0,2,1) model has no constant, so \[(1-B)^2y_t=(1+\theta_1B)e_t,\] with θ1 estimated as −0.8963.

  c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

fit2 <- aus_airpassengers %>%
  model(arima1 = ARIMA(Passengers ~ pdq(0,1,0)))

report(fit2)
## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97
fit2 %>%
  gg_tsresiduals()

augment(fit2) %>%
  features(.innov, ljung_box, lag = 12, dof = 3)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 arima1    7.75     0.560
fit2 %>%
  forecast(h=12) %>%
  autoplot(aus_airpassengers)

The automatically selected ARIMA(0,2,1) model outperforms this one on every information criterion (AIC 198.04 vs 200.31, AICc 198.32 vs 200.59, BIC 201.65 vs 203.97). Both produce roughly linear forecasts, but the d = 2 model's slope adapts to the most recent trend, whereas the drift model extrapolates the average historical growth.

  d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
fit3 <- aus_airpassengers %>%
  model(arima2 = ARIMA(Passengers ~ 0 + pdq(2,1,2)))
## Warning: 1 error encountered for arima2
## [1] non-stationary AR part from CSS
report(fit3)
## Series: Passengers 
## Model: NULL model 
## NULL model
glance(fit3)
## # A tibble: 0 x 1
## # ... with 1 variable: .model <chr>

With the constant removed, estimation fails (the warning reports a non-stationary AR part from CSS), so ARIMA() returns a NULL model, which cannot produce forecasts.
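For the comparison with parts a and c that the question asks for, the ARIMA(2,1,2) model with the drift term retained does fit; a sketch (fit3_drift is an illustrative name, output not shown):

fit3_drift <- aus_airpassengers %>%
  model(arima_drift = ARIMA(Passengers ~ 1 + pdq(2,1,2)))

fit3_drift %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)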

  e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?
fit4 <- aus_airpassengers %>%
  model(arima3 = ARIMA(Passengers ~ pdq(0,2,1)))

report(fit4)
## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65
fit4 %>%
  gg_tsresiduals()

augment(fit4) %>%
  features(.innov, ljung_box, lag = 12, dof = 3)
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 arima3    7.79     0.555
fit4 %>%
  forecast(h=12) %>%
  autoplot(aus_airpassengers)

As written, this fit is not just similar but identical to the automatically selected model in part a: ARIMA() drops the constant by default when d = 2, so no constant was actually included.
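To actually include the constant with d = 2 it must be requested explicitly; a sketch (fit5 is an illustrative name, output not shown):

fit5 <- aus_airpassengers %>%
  model(arima4 = ARIMA(Passengers ~ 1 + pdq(0,2,1)))

fit5 %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)

A constant with d = 2 induces a quadratic trend in the forecasts, which is generally discouraged, and fable warns about it.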

8. For the United States GDP series (from global_economy):

  a. if necessary, find a suitable Box-Cox transformation for the data;
gdp <- global_economy %>%
  filter(Code == 'USA') %>%
  select(Country, GDP)

lam <- gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

gdp <- gdp %>%
  mutate(GDP = box_cox(GDP, lam))
  b. fit a suitable ARIMA model to the transformed data using ARIMA();
fit <- gdp %>%
  model(
    arima = ARIMA(GDP, stepwise = FALSE, approx = FALSE))

report(fit)
## Series: GDP 
## Model: ARIMA(1,1,0) w/ drift 
## 
## Coefficients:
##          ar1  constant
##       0.4586  118.1822
## s.e.  0.1198    9.5047
## 
## sigma^2 estimated as 5479:  log likelihood=-325.32
## AIC=656.65   AICc=657.1   BIC=662.78
  c. try some other plausible models by experimenting with the orders chosen;
gdp %>%
  features(GDP, unitroot_ndiffs)
## # A tibble: 1 x 2
##   Country       ndiffs
##   <fct>          <int>
## 1 United States      1
gdp %>%
  gg_tsdisplay(GDP, plot_type = 'partial')

fit2 <- gdp %>%
  model(arima2 = ARIMA(GDP ~ pdq(2,2,1)))

report(fit2)
## Series: GDP 
## Model: ARIMA(2,2,1) 
## 
## Coefficients:
##          ar1      ar2      ma1
##       0.3079  -0.0953  -0.7979
## s.e.  0.1850   0.1568   0.1437
## 
## sigma^2 estimated as 5834:  log likelihood=-321.06
## AIC=650.12   AICc=650.91   BIC=658.23

The ARIMA(2,2,1) model above reports a lower AIC than the automatically selected ARIMA(1,1,0) with drift, but the comparison is not valid: information criteria cannot be compared across models with different orders of differencing, because they are fitted to different (differenced) data. I experimented with several other orders as well.

  d. choose what you think is the best model and check the residual diagnostics;
augment(fit) %>%
  features(.innov, ljung_box, lag = 12, dof = 3)
## # A tibble: 1 x 4
##   Country       .model lb_stat lb_pvalue
##   <fct>         <chr>    <dbl>     <dbl>
## 1 United States arima     4.46     0.879
fit %>%
  gg_tsresiduals()

I retain the automatically selected ARIMA(1,1,0) with drift. The Ljung-Box test above gives a large p-value (0.879), indicating that the residuals are indistinguishable from white noise, and all of the residual ACF spikes stay within the bounds.

  e. produce forecasts of your fitted model. Do the forecasts look reasonable?
fit %>%
  forecast(h = 12) %>%
  autoplot(gdp)

The forecasts continue the smooth upward trend of the series, so they look reasonable. Note, however, that they are on the Box-Cox scale, since the transformation was applied to the data before modelling.
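Embedding the transformation in the model formula instead lets fable back-transform the forecasts to the original scale automatically; a sketch refitting from the raw series:

global_economy %>%
  filter(Code == 'USA') %>%
  model(ARIMA(box_cox(GDP, lam))) %>%
  forecast(h = 12) %>%
  autoplot(global_economy %>% filter(Code == 'USA'))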

  f. compare the results with what you would obtain using ETS() (with no transformation).
global_economy %>%
  filter(Code == 'USA') %>%
  select(Country, GDP) %>%
  model(
    ETS(GDP)
  ) %>%
  forecast(h = 12) %>%
  autoplot(global_economy)

The graph shows that the ETS() forecasts, fitted with no transformation, have much wider prediction intervals than the ARIMA forecasts above.
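Information criteria cannot be used to choose between the ARIMA and ETS models because the model classes differ, so one way to compare them is out-of-sample accuracy on a holdout set. A sketch (us_gdp is an illustrative name and the 2010 cutoff is an arbitrary choice):

us_gdp <- global_economy %>%
  filter(Code == 'USA') %>%
  select(Country, GDP)

us_gdp %>%
  filter(Year <= 2010) %>%  # train on 1960-2010
  model(
    ets = ETS(GDP),
    arima = ARIMA(box_cox(GDP, lam))
  ) %>%
  forecast(h = 7) %>%       # 2011-2017 holdout
  accuracy(us_gdp)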