title: “DS624_HW6_JagdishChhabria”
author: “Jagdish Chhabria”
date: “10/14/2021”
Because the three series consist of random numbers, the ACF plots show no meaningful autocorrelation, as expected. The autocorrelation bars display no recurring pattern or seasonality, and none of them exceeds the 95% bounds marked by the two blue horizontal lines. All 3 plots therefore indicate that the data are white noise.
The critical values are at different distances from the mean of zero because they are based on the standard error of the autocorrelation coefficient at lag k, which for white noise is approximately $1/\sqrt{T}$, where T is the number of observations in the time series. The bounds $\pm 1.96/\sqrt{T}$ are therefore inversely proportional to the square root of T: as T increases, the variance decreases and the bands get narrower. The autocorrelations differ in each figure because each series is a different random sample. Sample autocorrelations of white noise fluctuate randomly around zero, so although all 3 series are white noise, the ACF value at a given lag is not the same across them.
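As a rough illustration of why the bounds narrow, the sketch below computes the approximate 95% critical values and the sample ACF of simulated white noise. The series lengths (36, 360 and 1000), the seed, and the helper names `acf` and `critical_value` are assumptions made for illustration; they are not taken from the original analysis.

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the sequence x."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]

def critical_value(n):
    """Approximate 95% bound for the ACF of white noise of length n."""
    return 1.96 / math.sqrt(n)

rng = random.Random(1)  # arbitrary seed; another seed gives different spikes
for n in (36, 360, 1000):  # assumed series lengths
    noise = [rng.gauss(0, 1) for _ in range(n)]
    r = acf(noise, 10)
    outside = sum(abs(v) > critical_value(n) for v in r)
    print(f"T={n}: bound=+/-{critical_value(n):.3f}, spikes outside={outside}")
```

Changing the seed changes the individual spikes but not the bounds, which depend only on the series length.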
The above plot shows a clear trend in the data - an upward trend from 2015 to mid-2018, and a downward trend from mid-2018 to 2019.
The ACF plot shows significant autocorrelations at every lag up to lag 30. This indicates that the data is not stationary.
The PACF plot shows a significant partial autocorrelation at lag 1, and further significant partial autocorrelations at lags 5, 19 and 25.
We calculate how many rounds of differencing will need to be applied in order to achieve stationarity.
## # A tibble: 1 x 2
## Symbol ndiffs
## <chr> <int>
## 1 AMZN 1
From the above, it seems just first-differencing should suffice. We re-plot the ACF and PACF after applying first differencing.
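To illustrate why a single difference can be enough, the sketch below builds a simulated random walk (a stand-in for a price-like series, not the AMZN data) and compares the lag-1 sample autocorrelation before and after first differencing. The data, the seed, and the helper names are assumptions.

```python
import random

def difference(y, lag=1):
    """Return y[t] - y[t - lag] for t = lag, ..., len(y) - 1."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    c1 = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    return c1 / c0

rng = random.Random(42)  # arbitrary seed
walk = [0.0]
for _ in range(999):
    walk.append(walk[-1] + rng.gauss(0, 1))  # random walk: y_t = y_{t-1} + e_t

print("lag-1 ACF of the level:     ", round(lag1_autocorr(walk), 3))
print("lag-1 ACF of the difference:", round(lag1_autocorr(difference(walk)), 3))
```

The level of a random walk is highly autocorrelated, while its first difference is just the underlying noise.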
The Turkish GDP shows an upward trend that gets stronger after 2001. Let’s apply the Box Cox transformation.
## [1] 0.1572187
The optimal lambda value for the Box Cox transformation is 0.1572. We now calculate first differences of the transformed series.
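For reference, a minimal sketch of the Box Cox transformation and its inverse for strictly positive data is given below. The function names and the example magnitudes are hypothetical; only the lambda value comes from the output above.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with parameter lam."""
    if y <= 0:
        raise ValueError("this sketch assumes strictly positive data")
    if abs(lam) < 1e-12:
        return math.log(y)  # the lambda = 0 case is the natural log
    return (y ** lam - 1.0) / lam

def inv_box_cox(w, lam):
    """Back-transform w to the original scale."""
    if abs(lam) < 1e-12:
        return math.exp(w)
    return (lam * w + 1.0) ** (1.0 / lam)

lam = 0.1572  # lambda reported above for the Turkish GDP series
values = [1e10, 1e11, 1e12]  # made-up magnitudes, not the actual GDP figures
transformed = [box_cox(v, lam) for v in values]
print(transformed)
print([inv_box_cox(w, lam) for w in transformed])  # recovers the inputs
```

A lambda between 0 and 1 compresses large values more than small ones, which is what stabilizes a variance that grows with the level of the series.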
We run the KPSS test for stationarity.
## # A tibble: 1 x 3
## Country kpss_stat kpss_pvalue
## <fct> <dbl> <dbl>
## 1 Turkey 0.0889 0.1
The KPSS p-value of 0.1 is above 0.05, so we do not reject the null hypothesis of stationarity. We can conclude that the differenced series is now stationary.
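To make the test statistic less of a black box, here is a minimal sketch of the level-stationarity KPSS statistic. The lag rule and the function name are assumptions, and the lag choice may differ from the one used by the R function that produced the output above. The 5% critical value for this case is 0.463, so the reported statistic of 0.0889 is well below it.

```python
def kpss_level_stat(y, lags=None):
    """KPSS statistic for the null hypothesis that y is level-stationary."""
    n = len(y)
    mean = sum(y) / n
    e = [v - mean for v in y]              # residuals from a constant-only fit
    if lags is None:
        lags = int(4 * (n / 100) ** 0.25)  # assumed short-lag rule of thumb
    # Partial sums of the residuals
    s, running = [], 0.0
    for v in e:
        running += v
        s.append(running)
    # Long-run variance estimate with Bartlett weights
    lrv = sum(v * v for v in e) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)
        gamma = sum(e[t] * e[t - k] for t in range(k, n)) / n
        lrv += 2.0 * weight * gamma
    return sum(v * v for v in s) / (n * n * lrv)

CRITICAL_5PCT = 0.463  # 5% critical value, level-stationary case

trend = [float(t) for t in range(1, 101)]  # clearly non-stationary
flat = [(-1.0) ** t for t in range(100)]   # oscillates around a fixed mean
print(kpss_level_stat(trend) > CRITICAL_5PCT)
print(kpss_level_stat(flat) > CRITICAL_5PCT)
```

Large values of the statistic are evidence against stationarity, which is why a small statistic corresponds to a large p-value.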
## # A tsibble: 6 x 5 [1Q]
## # Key: State [1]
## Date State Takings Occupancy CPI
## <qtr> <chr> <dbl> <dbl> <dbl>
## 1 1998 Q1 Australian Capital Territory 24.3 65 67
## 2 1998 Q2 Australian Capital Territory 22.3 59 67.4
## 3 1998 Q3 Australian Capital Territory 22.5 58 67.5
## 4 1998 Q4 Australian Capital Territory 24.4 59 67.8
## 5 1999 Q1 Australian Capital Territory 23.7 58 67.8
## 6 1999 Q2 Australian Capital Territory 25.4 61 68.1
The above plot shows an upward trend, strong seasonality as well as increasing variance. We apply the Box Cox transformation.
This seems to have stabilized the variance, but there is still an upward trend.
## [1] -0.04884781
The optimal lambda value is -0.0488, which is close to zero, so the transformation behaves much like a log transformation.
After taking first differences of the transformed data, the upward trend has been removed. We run the KPSS test for stationarity.
## # A tibble: 1 x 3
## State kpss_stat kpss_pvalue
## <chr> <dbl> <dbl>
## 1 Tasmania 0.256 0.1
The KPSS p-value of 0.1 is above 0.05, so we do not reject the null hypothesis of stationarity. We can conclude that the differenced series is now stationary.
The souvenirs data shows an upward trend, strong seasonality and an increasing variance. Let’s apply the Box Cox transformation.
This transformation seems to have stabilized the variance.
Taking first differences of the transformed data has removed the upward trend.
We run the KPSS test for stationarity.
## # A tibble: 1 x 2
## kpss_stat kpss_pvalue
## <dbl> <dbl>
## 1 0.0631 0.1
The KPSS p-value of 0.1 is above 0.05, so we do not reject the null hypothesis of stationarity. We can conclude that the differenced series is now stationary.
## 9.5) For your retail data (from Exercise 8 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
The retail data shows trend and seasonality. Let’s compute the required number of differences.
## # A tibble: 1 x 3
## State Industry nsdiffs
## <chr> <chr> <int>
## 1 Western Australia Newspaper and book retailing 1
The result above suggests that one seasonal difference is required. Next, we check whether a further first difference is needed by applying the same approach to the differenced log of the turnover.
## # A tibble: 1 x 3
## State Industry ndiffs
## <chr> <chr> <int>
## 1 Western Australia Newspaper and book retailing 0
The result above suggests that no more differencing is needed.
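The effect of a seasonal difference can be sketched on a synthetic series. The example below assumes monthly data (a seasonal lag of 12) and uses a made-up pattern, not the retail turnover itself.

```python
def difference(y, lag=1):
    """Return y[t] - y[t - lag] for t = lag, ..., len(y) - 1."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# Hypothetical monthly series: linear trend plus a repeating 12-month pattern
pattern = [5, 3, 4, 6, 8, 9, 12, 11, 7, 6, 10, 20]
series = [2 * t + pattern[t % 12] for t in range(60)]

seasonal_diff = difference(series, lag=12)  # removes the repeating pattern
print(seasonal_diff[:5])
print(difference(seasonal_diff)[:5])        # a further first difference
```

In this idealized case the seasonal difference alone already leaves a constant series, so a further first difference adds nothing, which mirrors a result of zero additional differences.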
## # A tibble: 1 x 4
## State Industry kpss_stat kpss_pvalue
## <chr> <chr> <dbl> <dbl>
## 1 Western Australia Newspaper and book retailing 0.222 0.1
Let’s try 2 other values for phi, one lower at 0.3 and one higher at 0.9.
With a lower phi value, the data looks more like white noise, while with a higher phi value, the series is smoother and more persistent. This is to be expected, since a higher phi places a greater weight on the lagged term.
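A small simulation makes the effect of phi concrete. This is a sketch under assumptions: the middle value 0.6, the series length, the seed, and the function names are chosen for illustration and are not confirmed by the text.

```python
import random

def simulate_ar1(phi, n=100, seed=0):
    """Simulate y_t = phi * y_{t-1} + e_t with standard normal e_t and y_0 = 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0, 1))
    return y

def lag1_autocorr(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    c1 = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    return c1 / c0

for phi in (0.3, 0.6, 0.9):  # 0.6 is an assumed baseline value
    y = simulate_ar1(phi, n=2000)
    print(phi, round(lag1_autocorr(y), 2))
```

For a stationary AR(1) process the lag-1 autocorrelation equals phi, so the sample values should land close to the chosen coefficients.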
Let’s try 2 other values for theta, one lower at 0.3 and one higher at 0.9.
The different theta values do not seem to have much impact on the time series plot.
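One possible explanation is that an MA(1) process has autocorrelation only at lag 1, equal to $\theta/(1+\theta^2)$, which never exceeds 0.5. The sketch below evaluates this for the values tried here; the middle value 0.6 and the function name are assumptions.

```python
def ma1_lag1_autocorr(theta):
    """Theoretical lag-1 autocorrelation of y_t = e_t + theta * e_{t-1}."""
    return theta / (1.0 + theta ** 2)

for theta in (0.3, 0.6, 0.9):  # 0.6 is an assumed baseline value
    print(theta, round(ma1_lag1_autocorr(theta), 3))
```

Moving theta from 0.3 to 0.9 changes the lag-1 autocorrelation far less than the same change in phi does for an AR(1) process, where the autocorrelation equals phi itself.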
The AR(2) series fluctuates with sharply increasing amplitude and no trend, which suggests that it is non-stationary. In comparison, the ARMA(1,1) series seems to have constant variance and no trend.
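Growing variance of this kind is what the AR(2) stationarity conditions would predict when they are violated. The sketch below checks those conditions and traces the response to a single shock. The coefficients -0.8 and 0.3 are an assumption (the values are not stated in the text), and both function names are hypothetical.

```python
def ar2_is_stationary(phi1, phi2):
    """Check the AR(2) stationarity conditions."""
    return -1 < phi2 < 1 and phi1 + phi2 < 1 and phi2 - phi1 < 1

def impulse_response(phi1, phi2, n=50):
    """Response of y_t = phi1*y_{t-1} + phi2*y_{t-2} to a unit shock at t = 0."""
    y = [1.0, phi1]
    for _ in range(n - 2):
        y.append(phi1 * y[-1] + phi2 * y[-2])
    return y

# Assumed coefficients: phi2 - phi1 = 1.1, which violates the third condition
print(ar2_is_stationary(-0.8, 0.3))
print([round(v, 2) for v in impulse_response(-0.8, 0.3)[-5:]])

# A stationary pair for comparison: the response dies out
print(ar2_is_stationary(0.5, 0.2))
print([round(v, 6) for v in impulse_response(0.5, 0.2)[-5:]])
```

When the conditions fail, the effect of each shock grows over time instead of fading, so the simulated series swings ever more widely.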
## # A tsibble: 6 x 2 [1Y]
## Year Passengers
## <dbl> <dbl>
## 1 1970 7.32
## 2 1971 7.33
## 3 1972 7.80
## 4 1973 9.38
## 5 1974 10.7
## 6 1975 11.1
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
The default model fitted is an ARIMA(0,2,1) model. Next we try to use the stepwise search feature.
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
As expected, it returns the ARIMA model with the same parameters.
## # A tibble: 1 x 8
## .model sigma2 log_lik AIC AICc BIC ar_roots ma_roots
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <list> <list>
## 1 search 4.31 -97.0 198. 198. 202. <cpl [0]> <cpl [1]>
## # A tibble: 1 x 3
## .model lb_stat lb_pvalue
## <chr> <dbl> <dbl>
## 1 search 6.70 0.461
The innovation residuals pass the Ljung-Box test: the p-value of 0.461 is well above 0.05, so there is no evidence of remaining autocorrelation.
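For reference, the Ljung-Box statistic can be computed by hand as in the sketch below. The function name, the simulated residuals, and the choice of 8 lags are assumptions. The p-value comes from comparing the statistic with a chi-squared distribution, which is not done here.

```python
import random

def ljung_box(residuals, lags):
    """Ljung-Box Q statistic using autocorrelations at lags 1..lags."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((v - mean) ** 2 for v in residuals)
    q = 0.0
    for k in range(1, lags + 1):
        ck = sum(
            (residuals[t] - mean) * (residuals[t - k] - mean) for t in range(k, n)
        )
        q += (ck / c0) ** 2 / (n - k)
    return n * (n + 2) * q

rng = random.Random(3)  # arbitrary seed
white = [rng.gauss(0, 1) for _ in range(200)]
persistent = [0.0]
for e in white[1:]:
    persistent.append(0.9 * persistent[-1] + e)  # strongly autocorrelated series

print(round(ljung_box(white, 8), 2))
print(round(ljung_box(persistent, 8), 2))
```

A small statistic, as for the white-noise series, is what well-behaved residuals should produce.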
The plot above indicates that the data is stationary. We now forecast using this model.
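In backshift notation, the ARIMA(0,2,1) model is $(1 - B)^2 y_t = c + (1 + \theta_1 B)\varepsilon_t$, where the fitted output above gives $\hat{\theta}_1 = -0.8963$ and no constant ($c = 0$).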
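Expanding $(1 - B)^2 = 1 - 2B + B^2$ in the equation above and taking $c = 0$ (the fitted output reports no constant) gives $y_t = 2y_{t-1} - y_{t-2} + \varepsilon_t + \theta_1 \varepsilon_{t-1}$. The sketch below turns this into point forecasts. The function name, the two observations, and the last residual are made up for illustration; only the ma1 estimate comes from the output above.

```python
def forecast_arima021(y, last_residual, theta, horizon):
    """Point forecasts from (1 - B)^2 y_t = (1 + theta * B) e_t, with no constant."""
    history = list(y[-2:])
    forecasts = []
    for h in range(1, horizon + 1):
        # Only the one-step forecast uses the last residual; future errors have mean 0
        ma_term = theta * last_residual if h == 1 else 0.0
        nxt = 2 * history[-1] - history[-2] + ma_term
        forecasts.append(nxt)
        history.append(nxt)
    return forecasts

theta_hat = -0.8963  # ma1 estimate from the fitted model above
y = [10.0, 12.0]     # made-up last two observations
print(forecast_arima021(y, last_residual=1.0, theta=theta_hat, horizon=3))
```

After the first step the forecasts continue along a straight line, which is why an ARIMA(0,2,1) model without a constant produces linear-trend forecasts.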
## Series: Passengers
## Model: ARIMA(0,1,0) w/ drift
##
## Coefficients:
## constant
## 1.4191
## s.e. 0.3014
##
## sigma^2 estimated as 4.271: log likelihood=-98.16
## AIC=200.31 AICc=200.59 BIC=203.97
Compared to the forecast plot in a), the forecast plot with the drift term is less steep and has wider prediction intervals.
## Series: Passengers
## Model: NULL model
## NULL model
This results in a NULL model.
## Series: Passengers
## Model: ARIMA(0,2,1)
##
## Coefficients:
## ma1
## -0.8963
## s.e. 0.0594
##
## sigma^2 estimated as 4.308: log likelihood=-97.02
## AIC=198.04 AICc=198.32 BIC=201.65
This results in the same ARIMA(0,2,1) model as in part a).
### a) if necessary, find a suitable Box-Cox transformation for the data;
### b) fit a suitable ARIMA model to the transformed data using ARIMA();
## Series: GDP
## Model: ARIMA(0,2,2)
##
## Coefficients:
## ma1 ma2
## -0.4206 -0.3048
## s.e. 0.1197 0.1078
##
## sigma^2 estimated as 26150: log likelihood=-363.57
## AIC=733.14 AICc=733.61 BIC=739.22
This fits an ARIMA(0,2,2) model to the US GDP time series. This shows that differencing twice is required to make the series stationary.
### c) try some other plausible models by experimenting with the orders chosen;
Let us try to fit an ARIMA(2,2,1) model instead.
## Series: GDP
## Model: ARIMA(2,2,1)
##
## Coefficients:
## ar1 ar2 ma1
## 0.4321 -0.1606 -0.8028
## s.e. 0.1537 0.1405 0.0908
##
## sigma^2 estimated as 26190: log likelihood=-363.11
## AIC=734.21 AICc=734.99 BIC=742.31
This model has a higher AICc than the model fitted by default (734.99 versus 733.61), so the default ARIMA(0,2,2) model is preferred.
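The information criteria in the two outputs above can be approximately reproduced from the log likelihoods, as sketched below. The function name is hypothetical, and the number of observations (56) is an assumption, inferred so that the BIC matches the reported value. Small differences in the second decimal come from the rounded log likelihoods.

```python
import math

def information_criteria(log_lik, n_params, n_obs):
    """Return (AIC, AICc, BIC); n_params counts every estimated parameter."""
    aic = -2.0 * log_lik + 2.0 * n_params
    aicc = aic + 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    return aic, aicc, bic

n_obs = 56  # assumed: observations left after differencing twice

# ARIMA(0,2,2): ma1, ma2 and sigma^2 are estimated
print([round(v, 2) for v in information_criteria(-363.57, 3, n_obs)])
# ARIMA(2,2,1): ar1, ar2, ma1 and sigma^2 are estimated
print([round(v, 2) for v in information_criteria(-363.11, 4, n_obs)])
```

The second model has a slightly higher log likelihood, but the penalty for its extra parameter outweighs the gain, so its AICc is higher.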
### d) choose what you think is the best model and check the residual diagnostics;
## # A tibble: 1 x 4
## Country .model lb_stat lb_pvalue
## <fct> <chr> <dbl> <dbl>
## 1 United States ARIMA(GDP, stepwise = FALSE, approx = FALSE) 12.2 0.0946
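The above plots indicate that the innovation residuals are white noise. The Ljung-Box p-value of 0.0946 is above 0.05, so we cannot reject the hypothesis that the residuals are uncorrelated.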
### e) produce forecasts of your fitted model. Do the forecasts look reasonable?
The forecasts look reasonable.
### f) compare the results with what you would obtain using ETS() (with no transformation).
With no transformation, the ETS model produces forecasts that increase less steeply, but with much wider prediction intervals.