Question HA 8.1
Question
Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Answer
Part A
Explain the differences among these figures. Do they all indicate that the data are white noise?
Fig 8.31. Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.
Ans: The data in all three figures resemble white noise. The difference between the figures is the width of the critical-value bounds: the smaller the dataset, the larger the critical values, and the larger the dataset, the smaller the critical values.
Part B
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
Ans: The critical values are at different distances from the mean of zero because they are computed as \[\pm 2/\sqrt{N}\] where N is the length of the time series. As N increases, the critical values move closer to zero; as N decreases, they move further away.
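For example, the bounds for the three series lengths can be computed directly:
round(2 / sqrt(c(36, 360, 1000)), 3)
## [1] 0.333 0.105 0.063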
The autocorrelations differ in each figure because each series is a different set of random numbers, and because the longer series yield sample autocorrelations that sit closer to their true value of zero.
Question HA 8.2
Question
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
Answer
First, the autoplot.
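A minimal plotting call, assuming the fpp2 package is loaded so that ibmclose is available:
autoplot(ibmclose) + ggtitle("Daily Closing IBM Stock Price")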
The time series appears to have a trend component (generally downward, sorry, IBM), which makes the data non-stationary. Taking the first (or, if needed, second) difference should remove this trend and make the data stationary.
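The ACF and PACF can be drawn with the forecast helpers:
ggAcf(ibmclose)
ggPacf(ibmclose)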
Here we see an autocorrelation plot that agrees with our initial assumption: the slowly decreasing values are a clear sign of trend, and every lag has a significant autocorrelation. These data are non-stationary.
In the PACF, nearly all of the autocorrelation is explained by the first lag; the significance of the later lags in the ACF is a product of that. In other words, each value depends most strongly on the previous value. Again, this indicates non-stationary data, which can likely be fixed by a first difference.
Question HA 8.3
Question
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
- usnetelec
- usgdp
- mcopper
- enplanements
- visitors
Answer
We have worked with these data sets previously for Box-Cox transformations only. For each series, we will first show the autoplot, then the Box-Cox transformation, then the differencing, and then weigh whether the Box-Cox transform, the differencing, or both are appropriate for the data.
Part A
Trended data, but no apparent seasonality. A first difference should be adequate.
lamb <- BoxCox.lambda(usnetelec)
bc_uselec <- BoxCox(usnetelec, lamb)
autoplot(bc_uselec) + ggtitle("Box Cox Transform on Us Net Electric")
This seems like a nice transformation: it levels out the scoop in the data from 1960-1970, reduces the scale, and irons out some of the variation in 1980-1990.
One might worry about the seemingly non-constant variance from 1950-1970, so let's check whether the first difference of the Box-Cox transformed series has constant variance.
usnetelec %>%
BoxCox(lambda=lamb) %>%
diff() %>%
autoplot() + ggtitle("First Difference Box Cox Data usnetelec")
Looks good. Box Cox worked well here with the first difference.
Part B
Again, trended data with no apparent seasonality; we tried a seasonal plot and a subseries plot to confirm that there was not a slight seasonality. This will likely end up similar to usnetelec.
lamb <- BoxCox.lambda(usgdp)
bc_usgdp <- BoxCox(usgdp, lamb)
autoplot(bc_usgdp) + ggtitle("Box Cox Transform on US GDP")
Again, a good adjustment to the data: more linear and a more reasonable scale.
After some experimenting with transformed and untransformed data, we again landed on Box-Cox and a first difference (sketched below). The untransformed first difference of this data also appeared to have non-constant variance, while the Box-Cox version has roughly constant variance in the first difference and is thereby "more" stationary.
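A sketch of that choice, mirroring the usnetelec chunk above:
usgdp %>%
  BoxCox(lambda = lamb) %>%
  diff() %>%
  autoplot() +
  ggtitle("First Difference Box Cox Data usgdp")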
Part C
This is a more complex data set. There is clear seasonality, some possible trending from the start to around the year 2000, then a massive run-up in price around 2005.
lamb <- BoxCox.lambda(mcopper)
bc_copper <- BoxCox(mcopper, lamb)
autoplot(bc_copper) + ggtitle("Box Cox Transform on Copper Price")
Much better in terms of consistent seasonality and reduced scale, and the trend appears clearer.
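The first difference of the transformed series, reusing bc_copper from above:
bc_copper %>%
  diff() %>%
  autoplot() +
  ggtitle("First Difference of Box Cox Transformed Copper Price")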
A quality result from the first difference; this is now stationary data. A second difference was not materially different, and a call to ndiffs returned a value of one.
Part D
Seasonal data with some kind of trend overall.
Given the piecewise nature of the seasonally adjusted data for this set, Box-Cox leaves us wanting more. We explored a log transformation as well as no transformation on the first and second differences of this data and decided on Box-Cox with a second difference.
# bc_planes is the Box-Cox transformed enplanements series (defined here for completeness)
bc_planes <- BoxCox(enplanements, BoxCox.lambda(enplanements))
bc_planes %>%
  diff() %>%
  diff() %>%
  autoplot() +
  ggtitle("Enplanements with Second Difference and Box Cox Transformation")
This transformation and second difference of the data seem the most stationary: the variance is more constant, and the crash in enplanements is more leveled out in the second difference. The untransformed and log-transformed versions are similar, but here the scale is more centered around zero and the crash in values is better modulated.
Part E
This should be easy for Box Cox and first difference.
visitors %>%
BoxCox(.,lambda = BoxCox.lambda(.)) %>%
autoplot() +
ggtitle("Box Cox on visitors data set")
A good fit.
visitors %>%
BoxCox(.,lambda = BoxCox.lambda(.)) %>%
diff() %>%
autoplot() +
ggtitle("First Difference and Box Cox on visitors data set")
The second difference does not confer a meaningful difference in terms of how stationary the data has become.
Question HA 8.5
Question
For your retail data (from Exercise 3 in Section 2.10), find an appropriate order of differencing to obtain stationary data.
Answer
library(httr)  # provides GET() and write_disk()
url <- "https://otexts.com/fpp2/extrafiles/retail.xlsx"
GET(url, write_disk("retail.xlsx", overwrite=TRUE))
## Response [https://otexts.com/fpp2/extrafiles/retail.xlsx]
## Date: 2021-03-25 15:35
## Status: 200
## Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
## Size: 639 kB
## <ON DISK> /Users/jeffshamp/Documents/GitHub/cuny_msds/DATA_624/retail.xlsx
retail<- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retail[, "A3349721R"], frequency = 12, start = c(1982, 1))
Given what we have learned in the previous question, this data set should be well handled by a transform and differencing.
myts %>%
BoxCox(., lambda = BoxCox.lambda(.)) %>%
autoplot() +
ggtitle("Box Cox Transform on Retail Data")
Nice. The log transform is not meaningfully different from the Box-Cox transform, so it would be a reasonable alternative.
myts %>%
BoxCox(., lambda = BoxCox.lambda(.)) %>%
diff() %>%
autoplot() +
ggtitle("Box Cox Transform and First Difference on Retail Data")
Well stabilized by the first difference. The second difference was not meaningfully better in terms of constant variance.
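As a quick confirmation that a single difference suffices, ndiffs can be run on the transformed series (a sketch; lambda_retail is just a local name chosen here):
lambda_retail <- BoxCox.lambda(myts)
ndiffs(BoxCox(myts, lambda_retail))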
Question HA 8.6
Question
Use R to simulate and plot some data from simple ARIMA models.
Part A
Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).
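A minimal version of that generating loop:
y <- ts(numeric(100))
e <- rnorm(100)
for (i in 2:100) {
  y[i] <- 0.6 * y[i - 1] + e[i]  # AR(1) with phi_1 = 0.6
}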
Part B
Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
# Simulate an AR(1) series for a range of phi values and plot each
for (i in c(-0.95, -0.8, -0.6, -0.4, -0.2, 0.2, 0.4, 0.6, 0.8, 0.95)) {
  y <- ts(numeric(100))
  e <- rnorm(100)
  for (j in 2:100) {
    y[j] <- i * y[j - 1] + e[j]
  }
  plot1 <- autoplot(y) + xlab(i)
  print(plot1)
}
The value chosen for \(\phi\) affects how volatile the series is. Negative values (close to -1) force the series to oscillate across the mean nearly every observation, whereas positive values for \(\phi\) (close to 1) show much less volatility and are smoother overall.
Part C
Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
# Simulate an MA(1) series for a range of theta values and plot each
set.seed(42)
for (theta in c(-1, -0.6, -0.3, 0, 0.3, 0.6, 1)) {
  y <- ts(numeric(100))
  e <- rnorm(100)
  for (i in 2:100) {
    y[i] <- theta * e[i - 1] + e[i]
  }
  print(autoplot(y) + xlab(theta))
}
Part D
Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
As \(\theta_1\) approaches 1, adjacent observations become more positively correlated, so the series looks smoother than white noise and appears to wander. As \(\theta_1\) approaches -1, adjacent observations are negatively correlated and the series oscillates rapidly across the mean at almost every step.
Part E
Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6, \theta_1 = 0.6\), and \(\sigma^2 = 1\).
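A sketch of the ARMA(1,1) simulation, following the same pattern as above (y_arma is just a name chosen here):
set.seed(42)
y_arma <- ts(numeric(100))
e <- rnorm(100)
for (i in 2:100) {
  y_arma[i] <- 0.6 * y_arma[i - 1] + 0.6 * e[i - 1] + e[i]
}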
Part F
Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\), and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)
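And a sketch of the AR(2) simulation with the stated parameters (y_ar2 is again just a local name):
y_ar2 <- ts(numeric(100))
e <- rnorm(100)
for (i in 3:100) {
  y_ar2[i] <- -0.8 * y_ar2[i - 1] + 0.3 * y_ar2[i - 2] + e[i]
}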
Part G
Graph the latter two series and compare them.
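Plotting the two simulated series with the names used in the sketches above:
autoplot(y_arma) + ggtitle("ARMA(1,1): phi1 = 0.6, theta1 = 0.6")
autoplot(y_ar2) + ggtitle("AR(2): phi1 = -0.8, phi2 = 0.3 (non-stationary)")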
Well, these time series are extremely different: the AR(2) series is divergent. The negative \(\phi_1\) makes the series oscillate, while the positive \(\phi_2\) reinforces and enlarges those oscillations. Because the parameters violate the AR(2) stationarity conditions (here \(\phi_2 - \phi_1 = 1.1 > 1\)), the oscillations grow geometrically, leading to a time series with ever-increasing amplitude rather than the stationary behaviour we observe in the ARMA(1,1) series.
Question HA 8.7
Question
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
- By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
- Should you include a constant in the model? Explain.
- Write this model in terms of the backshift operator.
- Fit the model using R and examine the residuals. Is the model satisfactory?
- Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
- Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
- Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
Answer
Part A
First we inspect the data. These data appear to have one large cycle, but no seasonality. The trends seem somewhat piecewise: a clear trend up from the 1950s, a relative plateau between 1970 and 1995, and a rapid decrease from there. We will not use a Box-Cox transform on these data; given these variations, we doubt much would be gained from the transform.
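A compact way to view the series together with its ACF and PACF, assuming the forecast package is loaded:
ggtsdisplay(wmurders)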
Next, we determine some possible values for p, d, and q.
Since we have a decaying ACF and a single significant spike in the PACF, it seems reasonable that p is one, a first-order AR term.
We can also determine that the MA order is likely zero or one: the ACF stays significant out to about lag nine and tapers slowly, which is more the signature of an AR process than an MA process.
Next, we called ndiffs and also viewed the first difference (just to check) for stationarity. The second difference appears to be the better choice; a differencing order of two is also what the ndiffs function returns. Thus, d is two.
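As noted, the check itself is a one-liner:
ndiffs(wmurders)
## [1] 2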
data <- wmurders  # the series used throughout this answer
data %>%
  diff() %>%
  diff() %>%
  autoplot() +
  ggtitle("Second difference of wmurders data")
Part B
Because the data require second-order differencing, no constant is included; with d = 2, a constant would imply a quadratic trend in the long-run forecasts, which is not appropriate here.
Part C
\[(1 - \phi_1 B)(1 - B)^2 y_t = (1 + \theta_1 B)\varepsilon_t\]
Part D
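A sketch of the calls behind the output below, assuming the forecast package is loaded (fit is the name reused in the later hand check):
fit <- arima(data, order = c(1, 2, 1))  # ARIMA(1,2,1) on the wmurders series
fit
checkresiduals(fit)  # residual plots plus the Ljung-Box test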
##
## Call:
## arima(x = data, order = c(1, 2, 1))
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04457: log likelihood = 6.44, aic = -6.88
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 12.419, df = 8, p-value = 0.1335
##
## Model df: 2. Total lags used: 10
This is a decent model. The ACF plot suggests the residuals behave like white noise, and the Ljung-Box test has a p-value of 0.13, which is above the 0.05 significance level, so we do not reject the hypothesis that the residuals are uncorrelated. Other indicators, such as the residual distribution being well centered at zero and nearly normal, point to the same conclusion.
Part E and Part F
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.470660 2.200091 2.741229 2.056861 2.884459
## 2006 2.363106 1.993529 2.732684 1.797886 2.928327
## 2007 2.252833 1.774677 2.730989 1.521557 2.984110
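To see where the constants 1.7566 and 0.5132 in the hand check below come from, expand the backshift form of the ARIMA(1,2,1) model:
\[y_t = (2 + \phi_1)y_{t-1} - (1 + 2\phi_1)y_{t-2} + \phi_1 y_{t-3} + \varepsilon_t + \theta_1\varepsilon_{t-1}\]
With \(\hat{\phi}_1 = -0.2434\), this gives \(2 + \hat{\phi}_1 = 1.7566\) and \(1 + 2\hat{\phi}_1 = 0.5132\).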
# Hand check of the three forecasts. fit is the ARIMA(1,2,1) model above and
# fc is assumed to be forecast(fit, h = 3), which produced the table above.
y_t <- tail(data, 3)            # last three observations
res <- tail(residuals(fit), 1)  # last residual, the estimate of the final error
ar1 <- fit$coef['ar1']
ma1 <- fit$coef['ma1']
# 1.7566 = 2 + ar1 and 0.5132 = 1 + 2 * ar1, from the expanded backshift form above
y1 <- (1.7566 * y_t[3]) - (0.5132 * y_t[2]) + (ar1 * y_t[1]) + (ma1 * res)
y2 <- 1.7566 * y1[1] - 0.5132 * y_t[3] + ar1 * y_t[2] + ma1 * 0
y3 <- 1.7566 * y2[1] - 0.5132 * y1[1] + ar1 * y_t[3] + ma1 * 0
check_df <- data.frame(cbind(fc$mean, c(y1, y2, y3)))
colnames(check_df) <- c("Model", "Manual")
check_df
## Model Manual
## 1 2.470660 2.470527
## 2 2.363106 2.362740
## 3 2.252833 2.252133
A consistent result. Using a Box-Cox transform on this data would likely have made this manual checking process infuriating; hand arithmetic with log/exponent-transformed numbers is very sensitive to rounding.
Part G
If we run auto.arima over its default (stepwise) search space, we get a result consistent with our analysis.
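The call is simply:
auto.arima(data)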
## Series: data
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
If we traverse the entire search space, we find a different model, one unlikely to be chosen by intuition.
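Disabling the stepwise search and the approximation makes auto.arima evaluate the full model space:
auto.arima(data, stepwise = FALSE, approximation = FALSE)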
## Series: data
## ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.0154 0.4324 -0.3217
## s.e. 0.1282 0.2278 0.1737
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35
A zero-order AR and third-order MA model does not seem like one that would be chosen given the information covered in the text. That said, the AIC is somewhat lower for this model (-7.54 versus -6.88). Let's check the prediction plot.
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.408817 2.137718 2.679916 1.994206 2.823428
## 2006 2.365555 1.985092 2.746018 1.783687 2.947423
## 2007 2.290976 1.753245 2.828706 1.468588 3.113363
In practice, this model from auto.arima appears to predict a slightly flatter decrease in murders. We feel the ARIMA(1,2,1) model is better in that a single AR term and a single MA term are easier to explain than three MA terms. The AIC difference is worth noting, and simpler models tend to overfit less; in any case, the three-step point forecasts of the (0,2,3) and (1,2,1) models differ only marginally. Just don't forecast the (1,2,1) too much further ahead.