---
title: "DS624_HW6_JagdishChhabria"
author: "Jagdish Chhabria"
date: "10/14/2021"
output:
  pdf_document: default
  html_document:
    toc: yes
    toc_float: yes
    toc_collapsed: yes
    toc_depth: 3
    number_sections: yes
    theme: lumen
---

9.1) Figure 9.32 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

Given that the numbers in these series are random, the ACF plots show no meaningful autocorrelation, as expected. The autocorrelation spikes display no recurring pattern or seasonality, and none of them exceed the 95% critical bounds represented by the two blue horizontal lines. So all three plots indicate that the data are white noise.

b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values sit at different distances from the mean of zero because they depend on the length of the series: for white noise, the sample autocorrelation at each lag has a standard deviation of approximately 1/√T, so the 95% bounds are approximately ±1.96/√T. As T increases, this variance shrinks and the bands get narrower. The autocorrelations themselves differ across the figures because each plot is based on a different random draw (the same seed was evidently not used), so while each series is white noise, the sample ACF values at a given lag differ across the three series.
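As a quick check on the formula, the bounds for the three sample sizes can be computed directly in base R:

```r
# 95% critical values for the ACF of white noise: +/- 1.96 / sqrt(T)
n <- c(36, 360, 1000)
round(qnorm(0.975) / sqrt(n), 3)
## [1] 0.327 0.103 0.062
```

These correspond to the half-widths of the blue bands in the three panels of Figure 9.32.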

9.2) A classic example of a non-stationary series are stock prices. Plot the daily closing prices for Amazon stock (contained in gafa_stock), along with the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
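One way to produce these plots (a sketch assuming the fpp3 packages are loaded; the `amzn` name is just for illustration, and the re-indexing by trading day makes the irregular daily series regular so the ACF/PACF can be computed):

```r
library(fpp3)

# Amazon closing prices, re-indexed by trading day
amzn <- gafa_stock %>%
  filter(Symbol == "AMZN") %>%
  mutate(day = row_number()) %>%
  update_tsibble(index = day, regular = TRUE)

# Time plot with ACF and PACF panels
amzn %>% gg_tsdisplay(Close, plot_type = "partial")
```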

The above plot shows a clear trend in the data: an upward trend from 2015 to mid-2018, followed by a downward trend from mid-2018 to 2019. A trending series is not stationary.

The ACF plot shows large, significant autocorrelations at every lag out to lag 30, decaying only very slowly. This slow decay is characteristic of a non-stationary series.

The PACF plot shows one large, significant spike at lag 1, with smaller significant spikes at lags 5, 19 and 25. Together, these patterns indicate that the series is non-stationary and should be differenced.

We calculate how many orders of differencing are needed to achieve stationarity.
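A sketch using the unitroot_ndiffs feature, reusing the `amzn` tsibble from above:

```r
# Number of first differences needed for stationarity (KPSS-based feature)
amzn %>% features(Close, unitroot_ndiffs)
```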

## # A tibble: 1 x 2
##   Symbol ndiffs
##   <chr>   <int>
## 1 AMZN        1

From the above, it seems a single first difference should suffice. We re-plot the ACF and PACF after applying first differencing.

9.3) For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

a. Turkish GDP from global_economy.

The Turkish GDP shows an upward trend that strengthens after 2001. Let's apply a Box-Cox transformation.
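The lambda can be estimated with Guerrero's method (a sketch; `lambda` is just an illustrative name):

```r
# Guerrero's method picks the Box-Cox lambda that stabilizes the variance
lambda <- global_economy %>%
  filter(Country == "Turkey") %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)
lambda
```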

## [1] 0.1572187

The optimal lambda value for the Box-Cox transformation is 0.1572. We now take first differences of the transformed series.

We run the KPSS test for stationarity on the differenced series.
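A sketch, reusing the `lambda` estimated above:

```r
# KPSS test on the first difference of the Box-Cox transformed GDP
global_economy %>%
  filter(Country == "Turkey") %>%
  features(difference(box_cox(GDP, lambda)), unitroot_kpss)
```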

## # A tibble: 1 x 3
##   Country kpss_stat kpss_pvalue
##   <fct>       <dbl>       <dbl>
## 1 Turkey     0.0889         0.1

The KPSS statistic is small and the p-value is reported at its maximum of 0.1, so we fail to reject the null hypothesis of stationarity: the Box-Cox transformed series is stationary after one order of differencing.

b. Accommodation takings in the state of Tasmania from aus_accommodation.

## # A tsibble: 6 x 5 [1Q]
## # Key:       State [1]
##      Date State                        Takings Occupancy   CPI
##     <qtr> <chr>                          <dbl>     <dbl> <dbl>
## 1 1998 Q1 Australian Capital Territory    24.3        65  67  
## 2 1998 Q2 Australian Capital Territory    22.3        59  67.4
## 3 1998 Q3 Australian Capital Territory    22.5        58  67.5
## 4 1998 Q4 Australian Capital Territory    24.4        59  67.8
## 5 1999 Q1 Australian Capital Territory    23.7        58  67.8
## 6 1999 Q2 Australian Capital Territory    25.4        61  68.1

The above plot shows an upward trend, strong seasonality, and increasing variance. We apply a Box-Cox transformation.
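A sketch of the transformation (the `tasmania` and `lambda_tas` names are just for illustration; lambda is again estimated with Guerrero's method):

```r
tasmania <- aus_accommodation %>%
  filter(State == "Tasmania")

lambda_tas <- tasmania %>%
  features(Takings, features = guerrero) %>%
  pull(lambda_guerrero)

# Plot the Box-Cox transformed takings
tasmania %>% autoplot(box_cox(Takings, lambda_tas))
```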

This seems to have stabilized the variance, but there is still an upward trend.

## [1] -0.04884781

The optimal lambda value is -0.0488, which is close to zero, so the transformation behaves much like a log transformation.

After taking first differences of the transformed data, the upward trend has been removed. We run the KPSS test for stationarity.

## # A tibble: 1 x 3
##   State    kpss_stat kpss_pvalue
##   <chr>        <dbl>       <dbl>
## 1 Tasmania     0.256         0.1

Based on the test, we can conclude that the time series is now stationary.

c. Monthly sales from souvenirs.

The souvenirs data shows an upward trend, strong seasonality and increasing variance. Let's apply a Box-Cox transformation.

This transformation seems to have stabilized the variance.

Taking first differences of the transformed data has removed the upward trend.

We run the KPSS test for stationarity.

## # A tibble: 1 x 2
##   kpss_stat kpss_pvalue
##       <dbl>       <dbl>
## 1    0.0631         0.1

Based on the test, we can conclude that the time series is now stationary.

9.5) For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

The retail data shows trend and seasonality. Let's compute the required number of seasonal differences.
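A sketch, assuming (based on the output below) that the series selected in Exercise 8 was newspaper and book retailing in Western Australia; the `myseries` name is just for illustration:

```r
myseries <- aus_retail %>%
  filter(State == "Western Australia",
         Industry == "Newspaper and book retailing")

# Number of seasonal differences required
myseries %>% features(Turnover, unitroot_nsdiffs)
```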

## # A tibble: 1 x 3
##   State             Industry                     nsdiffs
##   <chr>             <chr>                          <int>
## 1 Western Australia Newspaper and book retailing       1

The result above suggests that one seasonal difference is required. Let's check whether any further first differencing is needed after taking a seasonal difference of the log of Turnover.
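A sketch, continuing with the same `myseries`:

```r
# Further first differences needed after a lag-12 seasonal difference of log(Turnover)?
myseries %>%
  features(difference(log(Turnover), lag = 12), unitroot_ndiffs)
```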

## # A tibble: 1 x 3
##   State             Industry                     ndiffs
##   <chr>             <chr>                         <int>
## 1 Western Australia Newspaper and book retailing      0

The result above suggests that no further differencing is needed. The KPSS test below confirms that the differenced series can be treated as stationary.

## # A tibble: 1 x 4
##   State             Industry                     kpss_stat kpss_pvalue
##   <chr>             <chr>                            <dbl>       <dbl>
## 1 Western Australia Newspaper and book retailing     0.222         0.1

9.6) Simulate and plot some data from simple ARIMA models.

a. Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ^2=1. The process starts with y1=0.
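The code referenced in the exercise (as given in the textbook, assuming the fpp3/tsibble packages are loaded):

```r
# AR(1): y_t = 0.6 * y_{t-1} + e_t, with e_t ~ N(0, 1) and y_1 = 0
y <- numeric(100)
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6 * y[i - 1] + e[i]
sim <- tsibble(idx = seq_len(100), y = y, index = idx)
```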

b. Produce a time plot for the series. How does the plot change as you change ϕ1?

Let's try two other values for ϕ1: a lower value of 0.3 and a higher value of 0.9.

With the lower ϕ1, the series looks more like white noise, while with the higher ϕ1 it is smoother and more persistent, with longer runs above and below zero. This is to be expected, since a larger ϕ1 places greater weight on the lagged term.

c. Write your own code to generate data from an MA(1) model with θ1=0.6 and σ^2=1.
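One way to write this (a sketch following the same pattern as the AR(1) code; `sim_ma` is just an illustrative name):

```r
# MA(1): y_t = e_t + 0.6 * e_{t-1}, with e_t ~ N(0, 1)
y <- numeric(100)
e <- rnorm(100)
y[1] <- e[1]
for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i - 1]
sim_ma <- tsibble(idx = seq_len(100), y = y, index = idx)
```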

d. Produce a time plot for the series. How does the plot change as you change θ1?

Let's try two other values for θ1: a lower value of 0.3 and a higher value of 0.9.

The different θ1 values do not change the appearance of the time plot much. This is plausible, since an MA(1) process carries only one period of memory regardless of the value of θ1.

e. Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ^2=1.

f. Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ^2=1. (Note that these parameters will give a non-stationary series.)
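Sketches for both simulations, following the same pattern (the `y_arma`, `y_ar2` and `sims` names are illustrative):

```r
e <- rnorm(100)

# ARMA(1,1): y_t = 0.6 y_{t-1} + e_t + 0.6 e_{t-1}
y_arma <- numeric(100)
for(i in 2:100)
  y_arma[i] <- 0.6 * y_arma[i - 1] + e[i] + 0.6 * e[i - 1]

# AR(2): y_t = -0.8 y_{t-1} + 0.3 y_{t-2} + e_t (non-stationary parameters)
y_ar2 <- numeric(100)
for(i in 3:100)
  y_ar2[i] <- -0.8 * y_ar2[i - 1] + 0.3 * y_ar2[i - 2] + e[i]

sims <- tsibble(idx = seq_len(100), arma11 = y_arma, ar2 = y_ar2, index = idx)
```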

g. Graph the latter two series and compare them.

The AR(2) series oscillates with sharply increasing amplitude and no trend, as expected from the non-stationary parameters. In comparison, the ARMA(1,1) series appears stationary, fluctuating around a constant level with roughly constant variance.

9.7) Consider aus_airpassengers, the total number of passengers (in millions) from Australian air carriers for the period 1970-2011.

## # A tsibble: 6 x 2 [1Y]
##    Year Passengers
##   <dbl>      <dbl>
## 1  1970       7.32
## 2  1971       7.33
## 3  1972       7.80
## 4  1973       9.38
## 5  1974      10.7 
## 6  1975      11.1

a. Use ARIMA() to find an appropriate ARIMA model. What model was selected? Check that the residuals look like white noise. Plot forecasts for the next 10 periods.
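A sketch of the automatic fit and residual diagnostics (`fit` is just an illustrative name):

```r
fit <- aus_airpassengers %>%
  model(ARIMA(Passengers))
report(fit)

# Residual diagnostics
fit %>% gg_tsresiduals()
```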

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

The default model fitted is ARIMA(0,2,1). Next we refit using a full search of the model space (stepwise = FALSE).

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

As expected, the full search returns the same ARIMA(0,2,1) model with the same parameters.

## # A tibble: 1 x 8
##   .model sigma2 log_lik   AIC  AICc   BIC ar_roots  ma_roots 
##   <chr>   <dbl>   <dbl> <dbl> <dbl> <dbl> <list>    <list>   
## 1 search   4.31   -97.0  198.  198.  202. <cpl [0]> <cpl [1]>
## # A tibble: 1 x 3
##   .model lb_stat lb_pvalue
##   <chr>    <dbl>     <dbl>
## 1 search    6.70     0.461

The innovation residuals pass the Ljung-Box test (p-value = 0.461), so they are consistent with white noise.

The residual plots above likewise show no remaining structure. We now forecast the next 10 periods using this model.
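A sketch of the forecast, continuing from the fit above:

```r
fit %>%
  forecast(h = 10) %>%
  autoplot(aus_airpassengers)
```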

b. Write the model in terms of the backshift operator.

Since the fitted ARIMA(0,2,1) model has no constant, in backshift notation it is:

(1 − B)² yt = (1 + θ1 B) εt, with θ1 = −0.8963.

c. Plot forecasts from an ARIMA(0,1,0) model with drift and compare these to part a.
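A sketch; in fable, the drift term is requested with `1 +` in the model formula (`fit_drift` is just an illustrative name):

```r
fit_drift <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(0, 1, 0)))
report(fit_drift)

fit_drift %>% forecast(h = 10) %>% autoplot(aus_airpassengers)
```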

## Series: Passengers 
## Model: ARIMA(0,1,0) w/ drift 
## 
## Coefficients:
##       constant
##         1.4191
## s.e.    0.3014
## 
## sigma^2 estimated as 4.271:  log likelihood=-98.16
## AIC=200.31   AICc=200.59   BIC=203.97

Compared to the forecast plot in part (a), the forecasts with the drift term rise less steeply and have wider prediction intervals.

d. Plot forecasts from an ARIMA(2,1,2) model with drift and compare these to parts a and c. Remove the constant and see what happens.
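A sketch; `0 +` in the formula suppresses the constant (the `fit_212` names are illustrative):

```r
# ARIMA(2,1,2) with drift
fit_212 <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 1 + pdq(2, 1, 2)))

# Same orders, constant removed
fit_212_nc <- aus_airpassengers %>%
  model(ARIMA(Passengers ~ 0 + pdq(2, 1, 2)))
report(fit_212_nc)
```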

## Series: Passengers 
## Model: NULL model 
## NULL model

Removing the constant results in a NULL model: fable is unable to estimate an ARIMA(2,1,2) model without the constant for this series.

e. Plot forecasts from an ARIMA(0,2,1) model with a constant. What happens?

## Series: Passengers 
## Model: ARIMA(0,2,1) 
## 
## Coefficients:
##           ma1
##       -0.8963
## s.e.   0.0594
## 
## sigma^2 estimated as 4.308:  log likelihood=-97.02
## AIC=198.04   AICc=198.32   BIC=201.65

The reported model is identical to part (a), with no constant shown. fable discourages including a constant when d = 2, since it would induce a quadratic polynomial trend in the forecasts.

9.8) For the United States GDP series (from global_economy):

a. If necessary, find a suitable Box-Cox transformation for the data.

b. Fit a suitable ARIMA model to the transformed data using ARIMA().
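A sketch; judging by the model label in the Ljung-Box output further below, the GDP column appears to have been Box-Cox transformed in place before ARIMA() was applied with a full search (the `us_gdp`, `lambda_us` and `us_fit` names, and the in-place transform, are assumptions):

```r
us_gdp <- global_economy %>%
  filter(Country == "United States")

# Guerrero's method for the Box-Cox lambda
lambda_us <- us_gdp %>%
  features(GDP, features = guerrero) %>%
  pull(lambda_guerrero)

# Transform in place, then search the full ARIMA model space
us_fit <- us_gdp %>%
  mutate(GDP = box_cox(GDP, lambda_us)) %>%
  model(ARIMA(GDP, stepwise = FALSE, approx = FALSE))
report(us_fit)
```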

## Series: GDP 
## Model: ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1      ma2
##       -0.4206  -0.3048
## s.e.   0.1197   0.1078
## 
## sigma^2 estimated as 26150:  log likelihood=-363.57
## AIC=733.14   AICc=733.61   BIC=739.22

This fits an ARIMA(0,2,2) model to the transformed US GDP series, indicating that two orders of differencing are needed to make the series stationary.

c. Try some other plausible models by experimenting with the orders chosen.

Let us try to fit an ARIMA(2,2,1) model instead.

## Series: GDP 
## Model: ARIMA(2,2,1) 
## 
## Coefficients:
##          ar1      ar2      ma1
##       0.4321  -0.1606  -0.8028
## s.e.  0.1537   0.1405   0.0908
## 
## sigma^2 estimated as 26190:  log likelihood=-363.11
## AIC=734.21   AICc=734.99   BIC=742.31

This model has a higher AICc (734.99 vs. 733.61) than the automatically selected ARIMA(0,2,2), so the default model is preferred.

d. Choose what you think is the best model and check the residual diagnostics.

## # A tibble: 1 x 4
##   Country       .model                                       lb_stat lb_pvalue
##   <fct>         <chr>                                          <dbl>     <dbl>
## 1 United States ARIMA(GDP, stepwise = FALSE, approx = FALSE)    12.2    0.0946

The Ljung-Box p-value (0.0946) is above 0.05 and the residual plots show no remaining structure, so the innovation residuals are consistent with white noise.

e. Produce forecasts of your fitted model. Do the forecasts look reasonable?

The forecasts look reasonable.

f. Compare the results with what you would obtain using ETS() (with no transformation).
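A sketch of the ETS comparison (the 10-year horizon is an assumption; the original horizon is not shown):

```r
us_gdp_raw <- global_economy %>%
  filter(Country == "United States")

# ETS with no transformation; error/trend/season components chosen automatically
us_gdp_raw %>%
  model(ETS(GDP)) %>%
  forecast(h = 10) %>%
  autoplot(us_gdp_raw)
```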

With no transformation, the ETS model produces a less steep increase in the forecasts, but with much wider prediction intervals.