HW6

Do the exercises 9.1, 9.2, 9.3, 9.5, 9.6, 9.7, 9.8 in Hyndman

9.1

a. Explain the differences among these figures. Do they all indicate that the data are white noise?
Yes, all three figures indicate white noise. In each one, every autocorrelation spike lies within the dashed blue lines, so none of the correlations is significantly different from zero. The figures differ in the width of the dashed bounds and in the size of the spikes.

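b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The series have different lengths \(T\). The critical values are approximately \(\pm 2/\sqrt{T}\) (more precisely \(\pm 1.96/\sqrt{T}\)), so as \(T\) increases, the critical values move closer to zero.

Because \(T\) increases from left to right, the band between the critical values narrows from left to right. The sample autocorrelations differ between figures because each is computed from a different finite random sample. They are never exactly zero, and their random fluctuations around zero shrink as \(T\) grows.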
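As a worked check of the formula, assume the three series in the textbook figure have lengths 36, 360, and 1,000 (these lengths are taken from the exercise statement in the book, not from the text above). Using the normal quantile 1.96, of which 2 is the rounded value, the approximate 95% bounds are:

\[
\pm\frac{1.96}{\sqrt{36}} \approx \pm 0.327, \qquad
\pm\frac{1.96}{\sqrt{360}} \approx \pm 0.103, \qquad
\pm\frac{1.96}{\sqrt{1000}} \approx \pm 0.062
\]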

9.2

A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.

#filter for amazon
amznStock <- gafa_stock %>% 
  filter(Symbol == "AMZN")
#plot the time series, acf, pacf
amznStock %>% 
  gg_tsdisplay(Close, plot_type = 'partial') +
  labs(title= "Daily Closing Prices",
       subtitle= "Ticker: AMZN")

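Time plot: the closing price trends upward over the sample with no seasonal or cyclic behavior, so the mean of the series is not constant. ACF: the autocorrelations decay very slowly instead of dropping quickly to zero, and there is no seasonal pattern. PACF: there is a large spike at lag 1. All three plots point to a non-stationary series that should be differenced.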

#plot the time series, acf, pacf
amznStock %>% 
  gg_tsdisplay(difference(Close), plot_type = 'partial') +
  labs(title= "Differenced Daily Closing Prices",
       subtitle= "Ticker: AMZN")

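ACF of difference(Close): the differenced closing prices show little remaining autocorrelation, which suggests that one difference is enough to make the series approximately stationary.

9.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.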

a. Turkish GDP from global_economy.

#filter
tk_economy <- global_economy %>% 
  filter(Country=='Turkey') %>% 
  select(Country, GDP)
#lambda value
lambda <- tk_economy %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
 
tk_economy  %>%
  mutate(GDP = box_cox(GDP, lambda)) %>%
  features(GDP, unitroot_ndiffs)

## # A tibble: 1 × 2
##   Country ndiffs
##   <fct>    <int>
## 1 Turkey       1

The Guerrero method gives a lambda of approximately 0.16, and the unit root test suggests one first difference.
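For reference, the Box-Cox family applied by box_cox() is defined in the textbook as follows, where \(y_t\) is the original series and \(w_t\) the transformed one. A lambda of 0 is a log transformation and a lambda of 1 leaves the shape of the series unchanged, so a value near 0.16 is closer to a log than to no transformation.

\[
w_t =
\begin{cases}
\log(y_t) & \text{if } \lambda = 0, \\
\left(\operatorname{sign}(y_t)\,|y_t|^{\lambda} - 1\right)/\lambda & \text{otherwise.}
\end{cases}
\]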

b. Accommodation takings in the state of Tasmania from aus_accommodation.

#filter for tasmania
tasmania_acco <- aus_accommodation %>% 
  filter(State == 'Tasmania')  %>%
  select(State, Takings)
#find lambda value
lambda_tasmania <- tasmania_acco %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)
#get ndiffs
tasmania_acco %>%
  mutate(Takings = box_cox(Takings, lambda_tasmania)) %>%
  features(Takings, unitroot_ndiffs)

## # A tibble: 1 × 2
##   State    ndiffs
##   <chr>     <int>
## 1 Tasmania      1

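The Guerrero method gives a lambda of approximately -0.05, which is close to a log transformation, and the unit root test suggests one first difference.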

c. Monthly sales from souvenirs.

# lambda
lambda_of_souvenirs <- souvenirs %>% 
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)
#get ndiffs
souvenirs %>%
  mutate(Sales = box_cox(Sales, lambda_of_souvenirs)) %>%
  features(Sales, unitroot_ndiffs)

## # A tibble: 1 × 1
##   ndiffs
##    <int>
## 1      1

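The Guerrero method gives a lambda of approximately 0.002, which is effectively a log transformation, and the unit root test suggests one first difference.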
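Because the souvenirs data are monthly, it may also be worth checking whether a seasonal difference is needed, since unitroot_ndiffs only considers first differences. The sketch below assumes the fpp3 package is installed; the column name `Sales_bc` and the result names are illustrative, and no output is shown because it was not part of the original run.

```r
library(fpp3)

# Guerrero lambda, computed the same way as in the answer above
lambda_of_souvenirs <- souvenirs %>%
  features(Sales, features = guerrero) %>%
  pull(lambda_guerrero)

# Store the transformed series in a new (illustrative) column
transformed <- souvenirs %>%
  mutate(Sales_bc = box_cox(Sales, lambda_of_souvenirs))

# Suggested number of seasonal differences
nsdiffs_result <- transformed %>%
  features(Sales_bc, unitroot_nsdiffs)

# Suggested number of first differences after a lag-12 seasonal difference
ndiffs_after_seasonal <- transformed %>%
  mutate(Sales_bc = difference(Sales_bc, 12)) %>%
  features(Sales_bc, unitroot_ndiffs)

nsdiffs_result
ndiffs_after_seasonal
```

If nsdiffs comes back as 0, the second check is unnecessary and the first-difference result above stands on its own.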

9.5

For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

set.seed(999)
myseries <- aus_retail %>%
  filter(`Series ID` == sample(aus_retail$`Series ID`,1))
myseries %>%
  gg_tsdisplay(Turnover, plot_type = 'partial', lag_max = 36) +
  labs(title = "Western Australia Turnover",
       subtitle = "Industry: Department Stores", y = NULL)

myseries %>% 
  transmute(
    `Turnover` = Turnover,
    `Log Turnover` = log(Turnover),
    `Annual Change Log Turnover` = difference(log(Turnover), 12),
    `Doubly differenced log Turnover` =
      difference(difference(log(Turnover), 12), 1)
  ) %>%
  pivot_longer(-Month, names_to="Type", values_to="Turnover") %>%
  mutate(
    Type = factor(Type, levels = c(
      "Turnover",
      "Log Turnover",
      "Annual Change Log Turnover",
      "Doubly differenced log Turnover"))
  ) %>%
  ggplot(aes(x = Month, y = Turnover)) +
  geom_line() +
  facet_grid(vars(Type), scales = "free_y") +
  labs(title= "Western Australia Turnover",
       subtitle="Industry: Department Stores", y = NULL)

A stationary series has no predictable patterns in the long term: no trend and no seasonality. The raw turnover series is clearly not stationary. It has an increasing trend and a strong seasonal pattern whose size grows with the level of the series.

A log transformation stabilizes the variance, but the increasing trend remains. Seasonal differencing (lag 12) of the logged data removes most of the seasonality, yet the result is still not completely stationary, so a further first difference was applied. The chosen order of differencing is therefore one seasonal difference plus one first difference of the log series.
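In backshift notation, where \(B y_t = y_{t-1}\) and \(y_t\) denotes Turnover in month \(t\), the doubly differenced series in the bottom panel can be written as:

\[
(1 - B)(1 - B^{12}) \log y_t
= (\log y_t - \log y_{t-12}) - (\log y_{t-1} - \log y_{t-13})
\]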

9.7

Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2016.

aus_airpassengers

## # A tsibble: 47 x 2 [1Y]
##     Year Passengers
##    <dbl>      <dbl>
##  1  1970       7.32
##  2  1971       7.33
##  3  1972       7.80
##  4  1973       9.38
##  5  1974      10.7 
##  6  1975      11.1 
##  7  1976      10.9 
##  8  1977      11.3 
##  9  1978      12.1 
## 10  1979      13.0 
## # … with 37 more rows

a. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.

#create model
fit <- aus_airpassengers %>%
  model(ARIMA(Passengers))
report(fit)

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

#forecast 10 periods
fit %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "Millions of Passengers", 
       title = "Air Passengers",
       subtitle = "10 Year Forecast")

#plot residuals
fit %>% gg_tsresiduals() +
  labs(title = "Air Passengers",
       subtitle = "Residuals from ARIMA(0,2,1)")

ARIMA() automatically selected an ARIMA(0,2,1) model for the aus_airpassengers data. In the gg_tsresiduals() output, the residuals show no significant autocorrelation, which is consistent with white noise.

b. Write the model in terms of the backshift operator.

\((1 - B)^2 y_t = (1 - 0.8963B)\epsilon_t\)
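Expanding the second difference makes the recursion explicit. For an ARIMA(0,2,1) model with the fitted coefficient \(\hat{\theta}_1 = -0.8963\) from the report above, and using \((1 - B)^2 = 1 - 2B + B^2\):

\[
(1 - 2B + B^2)\,y_t = \epsilon_t - 0.8963\,\epsilon_{t-1}
\quad\Longrightarrow\quad
y_t = 2y_{t-1} - y_{t-2} + \epsilon_t - 0.8963\,\epsilon_{t-1}
\]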

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.

#ARIMA(0,1,0) with drift; `1 +` forces the constant into the model
fit2 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0,1,0)))
#plot forecast
fit2 %>% forecast(h=10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "Millions of Passengers", 
       title = "Air Passengers",
       subtitle = "10 Year Forecast")

#plot residuals
fit2 %>% gg_tsresiduals() +
  labs(title = "Air Passengers",
       subtitle = "Residuals from ARIMA(0,1,0)")
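For the comparison with part a, it may help to write the random walk with drift explicitly. Here \(c\) is the drift constant (its estimate is not shown above), \(y_T\) is the last observation, and \(h\) is the forecast horizon. The point forecasts follow a straight line with slope \(c\):

\[
y_t = c + y_{t-1} + \epsilon_t, \qquad \hat{y}_{T+h|T} = y_T + hc
\]

The ARIMA(0,2,1) model from part a has \(d = 2\) and no constant, so its long-term forecasts also follow a straight line, but the slope is not a fixed drift term and instead adapts to recent changes in the series.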

d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.

Fitting an ARIMA(2,1,2) to aus_airpassengers with the specification below returns a NULL model, meaning the estimation failed and no usable model was produced.

#ARIMA(2,1,2) 
fit3 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ pdq(2,1,2)))
report(fit3)

## Series: Passengers 
## Model: NULL model 
## NULL model
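The specification above leaves the choice of constant to ARIMA(). To address both halves of the question, the constant can be forced in or out through the model formula. The sketch below assumes the fpp3 package is installed; the object names `fit_drift` and `fit_no_const` are illustrative, and no output is shown because whether each fit succeeds depends on the estimation.

```r
library(fpp3)

# `1 +` forces a constant, which acts as drift because d = 1
fit_drift <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(2, 1, 2)))
report(fit_drift)

# Plot 10-year forecasts from the model with drift
fit_drift %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "Millions of Passengers",
       title = "Air Passengers",
       subtitle = "ARIMA(2,1,2) with drift")

# `0 +` removes the constant
fit_no_const <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 0 + pdq(2, 1, 2)))
report(fit_no_const)
```

Fitting the two variants separately keeps a failed estimation in one from affecting the report and plot of the other.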

e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

The model selected automatically in 9.7.a is an ARIMA(0,2,1) without a constant, so this part amounts to adding a constant to that model.
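A constant can be added to the ARIMA(0,2,1) model through the formula. With \(d = 2\), a non-zero constant implies a quadratic trend in the long-term forecasts, which the textbook advises against, so the forecasts would be expected to curve rather than follow a straight line. The sketch below assumes the fpp3 package is installed; the name `fit_const` is illustrative and no output is shown.

```r
library(fpp3)

# `1 +` adds a constant to the ARIMA(0,2,1) model from part a
fit_const <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0, 2, 1)))
report(fit_const)

# Plot 10-year forecasts to compare with the forecasts from part a
fit_const %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers) +
  labs(y = "Millions of Passengers",
       title = "Air Passengers",
       subtitle = "ARIMA(0,2,1) with a constant")
```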

9.8

For the United States GDP series (from global_economy):

a. if necessary, find a suitable Box-Cox transformation for the data;

The variation doesn’t seem to increase or decrease with the level of the series, so a transformation isn’t necessary.

us_gdp <- global_economy %>% 
  filter(Country=="United States")%>%
  select(Country, GDP)
us_gdp %>% autoplot(GDP) +
  labs(title = "United States GDP")
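The judgment above is visual. A numerical cross-check, sketched below under the assumption that the fpp3 package is installed, is to estimate lambda with the Guerrero feature as in 9.3. A value near 1 would support leaving the data untransformed, while a value well below 1 would suggest a transformation is worth trying. The name `lambda_us` is illustrative and no output is shown.

```r
library(fpp3)

us_gdp <- global_economy %>%
  filter(Country == "United States") %>%
  select(Country, GDP)

# Guerrero estimate of the Box-Cox parameter
lambda_us <- us_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
lambda_us

# Plot the transformed series for comparison with the raw plot
us_gdp %>%
  autoplot(box_cox(GDP, lambda_us)) +
  labs(title = "United States GDP (Box-Cox transformed)")
```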

b. fit a suitable ARIMA model to the transformed data using ARIMA();

fit <- us_gdp %>%
  model(
    arima = ARIMA(GDP, stepwise = FALSE, approx = FALSE))
report(fit)

## Series: GDP 
## Model: ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.4206  -0.3048
## s.e.   0.1197   0.1078
## 
## sigma^2 estimated as 2.615e+22:  log likelihood=-1524.08
## AIC=3054.15   AICc=3054.61   BIC=3060.23

Using the ARIMA() function, ARIMA(0,2,2) was found to be the best fit.

2023-03-21
