1. The following figure shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.

a. Explain the differences among these figures. Do they all indicate that the data are white noise?

For a white noise series, we expect 95% of the spikes in the ACF to lie within ±2/√T, where T is the length of the time series. That is why, as T gets larger, the band between the dashed lines around the mean of zero narrows. Each diagram has a few spikes that touch or slightly exceed the 95% bounds, but in every case they amount to fewer than 5% of the spikes shown. Therefore all three series can be regarded as white noise.

In other words, if the vast majority of the spikes lie within the blue dashed lines, the series is likely white noise; this is the case with all three plots.


b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The critical values are at different distances from the mean of zero because they are placed at ±2/√T: the longer the series, the closer the critical values sit to zero, so each figure has its own bounds.

Given that the three series are composed of randomly chosen numbers, the sample autocorrelations (in both magnitude and sign) are themselves random. Therefore, we would not expect the three correlograms to be identical, even though each underlying series is white noise.


2. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.


The ACF plot shows autocorrelations that exceed the critical values (blue dashed lines) at many lags and decrease only slowly, which is characteristic of a non-stationary series. In the PACF, r1 is large and positive (close to 1), indicating that each observation is almost entirely explained by its lag-1 value. Together, the plots show a strong correlation between the IBM stock prices and their lag-1 values, confirming that the series is non-stationary.

To achieve stationarity, the IBM stock data would need to be differenced. Differencing stabilizes the mean of a time series by removing changes in its level, thereby eliminating or reducing trend and seasonality and making the series more stationary.
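A minimal sketch of the plots, assuming the fpp package (which supplies ibmclose and the forecast functions) is installed:

```r
library(fpp)

tsdisplay(ibmclose)        # level, ACF and PACF of the raw series
tsdisplay(diff(ibmclose))  # after one difference the ACF looks noise-like
```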


3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

usnetelec

The plot shows a series that increases positively in a roughly linear fashion. The lambda of 0.51 indicates a square-root transform could be used, but it does not appear to change the plot very much. The series displays no seasonality, so first differencing is appropriate.
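A sketch of the steps described above, assuming fpp is loaded:

```r
library(fpp)

(lambda <- BoxCox.lambda(usnetelec))   # about 0.51
ndiffs(BoxCox(usnetelec, lambda))      # order of first differencing required
plot(diff(BoxCox(usnetelec, lambda)))  # transformed, first-differenced series
```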



usgdp

In this case the lambda value of 0.36 and the resulting transform (between a log and a square root) did straighten out the plot, making it roughly linear; as a result, this series lends itself to a linear trend description. The results below indicate no seasonal differencing is needed, and, similar to the prior series, first differencing is appropriate to achieve stationarity.
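The same recipe applies, a sketch:

```r
(lambda <- BoxCox.lambda(usgdp))   # about 0.36
ndiffs(BoxCox(usgdp, lambda))      # order of first differencing required
plot(diff(BoxCox(usgdp, lambda)))  # transformed, first-differenced series
```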



mcopper

The lambda value for mcopper is 0.19 (near zero), so a log transform is employed. The series displays an increasing trend and may also have some outliers around the Great Recession that could be influencing it. The series appears to have monthly seasonality, which seems to be somewhat supported by the polar seasonal plot below. Finally, the plots below indicate that the Box-Cox transform and first differencing made the series (near) stationary.
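A sketch of the transform and difference:

```r
(lambda <- BoxCox.lambda(mcopper))   # about 0.19, close to a log transform
nsdiffs(mcopper)                     # is a seasonal difference required?
plot(diff(BoxCox(mcopper, lambda)))  # transform plus first difference
```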



enplanements

The enplanements series shows an upward trend and strong seasonality, and its variability appears to increase over time. The Box-Cox lambda of -0.22 and the resulting (log-like) transform improve the series, reducing the variability (smaller highs and lows). The nsdiffs function (value of 1) indicates one seasonal difference; applying ndiffs to the seasonally differenced series then suggests one first difference. The data are monthly, so the seasonal difference uses lag 12. Accordingly, the plot of the Box-Cox-transformed series after a seasonal difference followed by a first difference (BoxCox(enplanements, BoxCox.lambda(enplanements)) %>% diff(12) %>% diff(1)) appears to yield the most stationary series.
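A sketch of that sequence without the pipe:

```r
(lambda <- BoxCox.lambda(enplanements))            # about -0.22
nsdiffs(BoxCox(enplanements, lambda))              # 1: one seasonal difference
sdiff <- diff(BoxCox(enplanements, lambda), lag = 12)
ndiffs(sdiff)                                      # 1: then one first difference
plot(diff(sdiff))                                  # candidate stationary series
```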



visitors

Similar to the enplanements series, the visitors series' lambda value calls for a log-like transform. The nsdiffs and ndiffs results of 1 and 1, respectively, indicate that a first difference after a seasonal difference will result in a stationary series.
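A sketch:

```r
(lambda <- BoxCox.lambda(visitors))               # near zero: log-like transform
nsdiffs(BoxCox(visitors, lambda))                 # 1 seasonal difference
ndiffs(diff(BoxCox(visitors, lambda), lag = 12))  # then 1 first difference
plot(diff(diff(BoxCox(visitors, lambda), lag = 12)))
```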



4. For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

The lambda value of 0.19 calls for a log-like transform, which clearly stabilizes the series. The nsdiffs value of 1, as well as the plots, indicates seasonality. Therefore, a single seasonal difference combined with the Box-Cox transform should yield a stationary series.
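A sketch, assuming the retail series was read as in the Section 2.10 exercise; the file name and column ID are placeholders for your own series:

```r
library(readxl)

# placeholders: substitute your own file and series ID from Exercise 3
retaildata <- read_excel("retail.xlsx", skip = 1)
myts <- ts(retaildata[["A3349873A"]], frequency = 12, start = c(1982, 4))

(lambda <- BoxCox.lambda(myts))             # reported as 0.19 above
nsdiffs(BoxCox(myts, lambda))               # 1: one seasonal difference
plot(diff(BoxCox(myts, lambda), lag = 12))  # transformed, seasonally differenced
```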



5. Use R to simulate and plot some data from simple ARIMA models.



Write your own code to generate data from an MA(1) model with theta1 = 0.6 and sigma^2 = 1.
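A minimal sketch of the simulation, y_t = e_t + 0.6 e_{t-1} with e_t ~ N(0, 1):

```r
set.seed(1)
n <- 100
theta1 <- 0.6
e <- rnorm(n)  # sigma^2 = 1
y <- ts(numeric(n))
y[1] <- e[1]
for (i in 2:n)
  y[i] <- e[i] + theta1 * e[i - 1]
plot(y, main = "Simulated MA(1), theta1 = 0.6")
```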




Generate data from an AR(2) model with phi1 = -0.8, phi2 = 0.3 and sigma^2 = 1. (Note that these parameters will give a non-stationary series.)
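A sketch of the AR(2) simulation, y_t = -0.8 y_{t-1} + 0.3 y_{t-2} + e_t:

```r
set.seed(1)
n <- 100
phi1 <- -0.8
phi2 <- 0.3
e <- rnorm(n)  # sigma^2 = 1
y2 <- ts(numeric(n))
y2[1] <- e[1]
y2[2] <- phi1 * y2[1] + e[2]
for (i in 3:n)
  y2[i] <- phi1 * y2[i - 1] + phi2 * y2[i - 2] + e[i]
plot(y2, main = "Simulated AR(2), phi1 = -0.8, phi2 = 0.3")
```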


Graph the latter two series and compare them.

Both the ARMA(1,1) and AR(2) series are scattered around zero; however, the ARMA(1,1) series looks more like white noise, whereas the AR(2) series' variance increases with time, producing a bugle/horn-shaped plot.

ARMA(1,1)

AR(2)
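A sketch of the side-by-side comparison. The ARMA(1,1) parameters (phi1 = 0.6, theta1 = 0.6) are an assumption, since the chunk that generated that series is not shown above:

```r
set.seed(1)
n <- 100
e <- rnorm(n)

# assumed ARMA(1,1): y_t = 0.6 y_{t-1} + e_t + 0.6 e_{t-1}
arma11 <- ts(numeric(n))
arma11[1] <- e[1]
for (i in 2:n)
  arma11[i] <- 0.6 * arma11[i - 1] + e[i] + 0.6 * e[i - 1]

# AR(2) as above
ar2 <- ts(numeric(n))
ar2[1] <- e[1]
ar2[2] <- -0.8 * ar2[1] + e[2]
for (i in 3:n)
  ar2[i] <- -0.8 * ar2[i - 1] + 0.3 * ar2[i - 2] + e[i]

par(mfrow = c(1, 2))
plot(arma11, main = "ARMA(1,1)")  # noise-like, bounded variance
plot(ar2, main = "AR(2)")         # oscillations grow over time
```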


6. Consider the number of women murdered each year (per 100,000 standard population) in the United States. (Data set wmurders).

By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

Initially, the series shows a positive upward trend. It then levels off from the 1970s through the 1990s before beginning a steady decline into the 2000s. There is also an upward spike in the early 2000s that temporarily interrupts the decline.

## [1] 2
## 
##  Box-Pierce test
## 
## data:  .
## X-squared = 0.39628, df = 1, p-value = 0.529
## 
##  Box-Pierce test
## 
## data:  .
## X-squared = 24.722, df = 1, p-value = 6.623e-07
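The exact calls are an assumption (the original chunk is not shown), but output like the above would plausibly come from ndiffs plus Box-Pierce tests on the two candidate differencings:

```r
ndiffs(wmurders)                           # suggested order of differencing: 2
Box.test(diff(wmurders, differences = 2))  # p = 0.53: consistent with white noise
Box.test(diff(wmurders, differences = 1))  # p < 0.001: autocorrelation remains
```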

Alternative models

## Series: wmurders 
## ARIMA(2,1,0) 
## 
## Coefficients:
##           ar1     ar2
##       -0.0572  0.2967
## s.e.   0.1277  0.1275
## 
## sigma^2 estimated as 0.04265:  log likelihood=9.48
## AIC=-12.96   AICc=-12.48   BIC=-6.99
## Series: wmurders 
## ARIMA(0,1,2) 
## 
## Coefficients:
##           ma1     ma2
##       -0.0660  0.3712
## s.e.   0.1263  0.1640
## 
## sigma^2 estimated as 0.0422:  log likelihood=9.71
## AIC=-13.43   AICc=-12.95   BIC=-7.46
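The two candidate fits above come from Arima() in the forecast package, for example:

```r
Arima(wmurders, order = c(2, 1, 0))  # AICc = -12.48
Arima(wmurders, order = c(0, 1, 2))  # AICc = -12.95, slightly better
```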

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,2)
## Q* = 9.7748, df = 8, p-value = 0.2812
## 
## Model df: 2.   Total lags used: 10


Should you include a constant in the model? Explain.

No, I don’t believe a constant should be included in the model. The time series does not appear to have a consistent long-term trend, and in a differenced model a constant induces a drift in the forecasts, which the data do not support.


Write this model in terms of the backshift operator.

The ARIMA(0,1,2) model in terms of the backshift operator is (1 − B)y_t = (1 + θ1 B + θ2 B^2)ε_t, where θ1 = −0.0660 and θ2 = 0.3712.


Fit the model using R and examine the residuals. Is the model satisfactory?

The residuals of the ARIMA(0,1,2) model above appear to be white noise (the Ljung-Box p-value of 0.28 gives no evidence of remaining autocorrelation), which indicates a satisfactory fit.
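A sketch of the fit and residual check; checkresiduals() produces the Ljung-Box output shown above:

```r
fit <- Arima(wmurders, order = c(0, 1, 2))
checkresiduals(fit)  # time plot, ACF and Ljung-Box test of the residuals
```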


Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

Hand Calculation
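From (1 − B)y_t = (1 + θ1 B + θ2 B^2)ε_t we get y_t = y_{t−1} + ε_t + θ1 ε_{t−1} + θ2 ε_{t−2}. Setting future errors to zero and replacing past errors with the model residuals gives the three forecasts; a sketch, with fit as above:

```r
theta <- coef(fit)  # named vector: ma1, ma2
e <- residuals(fit)
T <- length(wmurders)

f1 <- wmurders[T] + theta["ma1"] * e[T] + theta["ma2"] * e[T - 1]
f2 <- f1 + theta["ma2"] * e[T]  # the theta1 term drops out at h = 2
f3 <- f2                        # no MA terms remain for h >= 3
c(f1, f2, f3)                   # matches the hand-calculated values below
```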

Compare Forecast
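For comparison, the interval forecasts below come from forecast():

```r
forecast(fit, h = 3)
```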

##      Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 2005       2.458450 2.195194 2.721707 2.055834 2.861066
## 2006       2.477101 2.116875 2.837327 1.926183 3.028018
## 2007       2.477101 1.979272 2.974929 1.715738 3.238464
##      ma1      ma1      ma1 
## 2.458450 2.477101 2.477101


Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
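A sketch of the plot:

```r
plot(forecast(fit, h = 3), main = "wmurders: three-year forecast")
```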


Does auto.arima give the same model you have chosen? If not, which model do you think is better?

No. auto.arima selects an ARIMA(0,2,3) model, which performed worse than the manually selected ARIMA(0,1,2): its AICc of −6.7 is higher than the manual model's −12.95 (though AICc values are not strictly comparable across different orders of differencing). I therefore prefer the manual model.
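The automatic fit shown below can be reproduced with:

```r
auto.arima(wmurders)
```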

## Series: wmurders 
## ARIMA(0,2,3) 
## 
## Coefficients:
##           ma1     ma2      ma3
##       -1.0154  0.4324  -0.3217
## s.e.   0.1282  0.2278   0.1737
## 
## sigma^2 estimated as 0.04475:  log likelihood=7.77
## AIC=-7.54   AICc=-6.7   BIC=0.35