CUNY DATA624 HW6
- 8.1 Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
- 8.2 A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
- 8.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
- 8.5 For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
- 8.6 Use R to simulate and plot some data from simple ARIMA models.
- a) Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\)
- b) Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
- c) Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\)
- d) Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
- e) Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\)
- f) Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\), and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)
- g) Graph the latter two series and compare them.
- 8.7 Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
- a) By studying appropriate graphs of the series in R, find an appropriate ARIMA(\(p,d,q\)) model for these data.
- b) Should you include a constant in the model? Explain.
- c) Write this model in terms of the backshift operator.
- d) Fit the model using R and examine the residuals. Is the model satisfactory?
- e) Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
- f) Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
- g) Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
8.1 Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a) Explain the differences among these figures. Do they all indicate that the data are white noise?
Figure 8.31: ACFs for 36, 360 and 1,000 random numbers.
The autocorrelation bounds differ across the three series. In each plot, all or nearly all of the autocorrelation coefficients lie within these bounds, confirming that each series is white noise.
b) Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
Each dataset has a different number of observations, which is why the critical values (the bounds mentioned above) differ: for a white-noise series of length \(T\), the bounds are approximately \(\pm 1.96/\sqrt{T}\). The larger the dataset, the smaller the critical values, so the bounds become more restrictive (narrower) in larger datasets.
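As a quick illustration (a minimal sketch, not part of the original output), the approximate bounds for the three sample sizes can be computed directly:
# Approximate 95% critical values, +/- 1.96/sqrt(T), for T = 36, 360, 1000
qnorm(0.975) / sqrt(c(36, 360, 1000))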
8.2 A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
The plot of the daily closing prices for IBM stock is below. According to the text, non-stationary data should be differenced to achieve stationarity. The series wanders with extended trends, most recently downward, rather than varying around a constant mean, which is characteristic of a non-stationary series.
The ACF plot below decreases very slowly and \(r_1\) is large (almost 1) and positive. These traits are indicative of a non-stationary series.
The PACF plot shows a single spike of almost 1 at lag 1, which also indicates a non-stationary series that should be differenced.
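For reference, a minimal sketch of the R calls that would produce the three plots above, assuming the fpp2 package is loaded:
library(fpp2)       # provides ibmclose plus autoplot/ggAcf/ggPacf
autoplot(ibmclose)  # daily closing prices
ggAcf(ibmclose)     # slowly decaying ACF
ggPacf(ibmclose)    # single large spike at lag 1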
8.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
a) usnetelec
A quick look at the data above confirms a non-stationary series. Because the trend is roughly linear with no obvious change in variance, a transformation is not necessary, and a single first difference suffices. Once applied, the plot is roughly horizontal and the \(r_1\) values in both the ACF and PACF plots are close to 0. The ACF also largely remains within the critical values, consistent with white noise, which is another characteristic of a stationary series.
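A sketch of this differencing step (assuming the forecast package's ndiffs and ggtsdisplay):
ndiffs(usnetelec)                       # estimated number of differences needed
usnetelec %>% diff() %>% ggtsdisplay()  # plot, ACF, and PACF of the differenced series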
b) usgdp
The above plots show that a transformation of the data is necessary in this case. We obtain the best lambda value and apply the Box-Cox transformation; the result below shows the appropriately transformed data.
## [1] "Appropriate lambda for BoxCox transformation: 0.366352049520934"
Now that the data is appropriately transformed, a single first difference is all that is needed to achieve a stationary series. The series is now roughly horizontal with generally constant variance. \(r_1\) is high with an immediate steep dropoff in the ACF plot, while the remaining lags mostly stay within the critical values, indicating white noise.
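A sketch of the transformation and differencing, assuming BoxCox.lambda produced the lambda printed above:
lambda <- BoxCox.lambda(usgdp)
usgdp %>% BoxCox(lambda) %>% diff() %>% ggtsdisplay()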
c) mcopper
We’ll apply the same methods as in the previous two series.
## [1] "Appropriate lambda for BoxCox transformation: 0.191904709003829"
As before, the transformed and differenced series below exhibits the same characteristics that define a stationary series.
d) enplanements
What sets this series apart from the previous ones is an obvious seasonal component, which is confirmed in the plot below.
Now that we’ve confirmed seasonality in the data, when we difference the series later, we can apply lag=12 to the difference.
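For instance, a seasonal plot makes the monthly pattern explicit (a minimal sketch, assuming forecast's ggseasonplot):
ggseasonplot(enplanements)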
## [1] "Appropriate lambda for BoxCox transformation: -0.226946111237065"
Above we notice that the data is not quite stationary even after the seasonal difference, so we’ll apply a first difference after the seasonal difference to achieve a stationary series.
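A sketch of the full sequence, a seasonal difference at lag 12 followed by a first difference, applied to the transformed series:
lambda <- BoxCox.lambda(enplanements)
enplanements %>% BoxCox(lambda) %>% diff(lag = 12) %>% diff() %>% ggtsdisplay()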
e) visitors
As with the previous series, the visitors series is also seasonal.
As before, we’ll apply a seasonal difference followed by a first order difference to achieve a stationary series.
## [1] "Appropriate lambda for BoxCox transformation: 0.277524910835111"
8.5 For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349398A"],
frequency=12, start=c(1982,4))
autoplot(myts)
The retail series above exhibits seasonal behavior, with an upward trend and increasing variance. The series could benefit from a Box-Cox transformation.
## [1] "Appropriate lambda for BoxCox transformation: 0.123156269082221"
In this case, after a Box-Cox transformation I applied a seasonal difference followed by a first-order difference. The result is closer to stationary: the series is now horizontal and the \(r_1\) values are high followed by a steep drop, but the ACF plot appears to show that the series is not white noise. The series might not be fully stationary yet, but I’m unaware of a more effective method.
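A sketch of that transformation and differencing applied to the retail series:
lambda <- BoxCox.lambda(myts)
myts %>% BoxCox(lambda) %>% diff(lag = 12) %>% diff() %>% ggtsdisplay()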
8.6 Use R to simulate and plot some data from simple ARIMA models.
Source: https://en.wikipedia.org/wiki/Autoregressive%E2%80%93moving-average_model
a) Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\)
Just to note, the formula for an AR(p) model is below:
\(y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \varepsilon_t\)
For an AR(1) model, we set \(p=1\) in the formula.
ar1 <- function(phi){
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- phi*y[i-1] + e[i]
return(y)
}
autoplot(ar1(0.6))
b) Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
When we change the value of \(\phi_1\), the time series patterns change as demonstrated by the plots below.
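For example, a small \(\phi_1\) and a \(\phi_1\) near 1 produce very different patterns (values chosen here for illustration):
set.seed(624)        # arbitrary seed so the comparison is reproducible
autoplot(ar1(0.06))  # small phi: series looks close to white noise
autoplot(ar1(0.95))  # phi near 1: long, smooth swings away from the mean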
c) Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\)
The formula for the MA(q) model is as follows:
\(y_t = c + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \dots + \theta_q \varepsilon_{t-q}\)
As this is an MA(1) model, we input \(q=1\) into the formula.
We can recreate the above with the following code:
ma1 <- function(theta){
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- theta*e[i-1] + e[i]
return(y)
}
autoplot(ma1(0.6))
d) Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
When we change the value of \(\theta_1\), the time series patterns change as demonstrated by the plots below.
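For example (values of \(\theta_1\) chosen here for illustration):
set.seed(624)        # arbitrary seed so the comparison is reproducible
autoplot(ma1(0.06))  # small theta: nearly indistinguishable from white noise
autoplot(ma1(0.95))  # large theta: stronger correlation between neighboring points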
e) Generate data from an ARMA(1,1) model with \(\phi_1 = 0.6\), \(\theta_1 = 0.6\) and \(\sigma^2 = 1\)
A combination of the autoregressive model and the moving average model, the ARMA(p,q) model is shown below:
\(y_t = c + \phi_1 y_{t-1} + \dots + \phi_p y_{t-p} + \theta_1 \varepsilon_{t-1} + \dots + \theta_q \varepsilon_{t-q} + \varepsilon_t\)
For an ARMA(1,1) model, we input \(p=1,q=1\).
# Simulate n observations from an ARMA(1,1) model with the given phi and theta
arma1_1 <- function(phi, theta, n=100){
  y <- ts(numeric(n))
  e <- rnorm(n)  # white noise with sigma^2 = 1
  for(i in 2:n)
    y[i] <- phi*y[i-1] + theta*e[i-1] + e[i]
  return(y)
}
autoplot(arma1_1(0.6, 0.6))
f) Generate data from an AR(2) model with \(\phi_1 = -0.8\), \(\phi_2 = 0.3\), and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)
For the AR(2) model, we set \(p=2\).
# Simulate n observations from an AR(2) model; the loop starts at i=3
# because two lagged values are needed
ar2 <- function(phi1, phi2, n=100){
  y <- ts(numeric(n))
  e <- rnorm(n)  # white noise with sigma^2 = 1
  for(i in 3:n)
    y[i] <- phi1*y[i-1] + phi2*y[i-2] + e[i]
  return(y)
}
autoplot(ar2(-0.8,0.3))
g) Graph the latter two series and compare them.
We have graphed the two series in the previous answers and immediately notice that, while the ARMA(1,1) series fluctuates around a constant mean, the AR(2) series oscillates with steadily increasing amplitude over time, confirming that it is non-stationary.
8.7 Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
a) By studying appropriate graphs of the series in R, find an appropriate ARIMA(\(p,d,q\)) model for these data.
From the above plots, we can see that the data first needs to be differenced.
Since we differenced just once and there appears to be no drift in the series, the \(d\) parameter in ARIMA(\(p,d,q\)) should be 1 (\(d=1\)). Because the lag-2 spikes are large in both the ACF and PACF plots, we are looking at either an AR(2) or an MA(2) component, so the candidate models are ARIMA(2,1,0) and ARIMA(0,1,2). We can determine the more appropriate one by evaluating the \(AIC_c\) value for each model.
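The AICc values printed below presumably came from fits along these lines (a sketch using forecast's Arima):
Arima(wmurders, order = c(2,1,0))$aicc  # AR(2) candidate
Arima(wmurders, order = c(0,1,2))$aicc  # MA(2) candidate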
## [1] "AR(2) AICc = -12.4787165334312"
## [1] "MA(2) AICc = -12.9452660674903"
The \(AIC_c\) value for the MA(2) model is lower, so we’ll go with ARIMA(0,1,2).
b) Should you include a constant in the model? Explain.
No, we shouldn’t. As mentioned above, there doesn’t appear to be any drift in the series, and with \(d=1\) a constant would introduce drift into the long-term forecasts.
c) Write this model in terms of the backshift operator.
Based on the text: https://otexts.com/fpp2/non-seasonal-arima.html
Since we are not including a constant (\(c=0\)), the ARIMA(0,1,2) model in backshift notation is:
\((1-B)y_t = (1 + \theta_1 B + \theta_2 B^2)\varepsilon_t\)
d) Fit the model using R and examine the residuals. Is the model satisfactory?
From the checkresiduals function, we can see that the residuals exhibit homoscedasticity. The ACF plot of the residuals indicates that they are white noise, and the Ljung-Box test below (p = 0.2812) agrees. The residuals are also approximately normally distributed, so the model appears satisfactory.
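A sketch of the fit and residual check that would produce the output below:
fit <- Arima(wmurders, order = c(0,1,2))
checkresiduals(fit)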
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2)
## Q* = 9.7748, df = 8, p-value = 0.2812
##
## Model df: 2. Total lags used: 10
e) Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
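The forecast table below would come from a call along these lines (reusing fit, the ARIMA(0,1,2) model from part d):
forecast(fit, h = 3)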
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.458450 2.195194 2.721707 2.055834 2.861066
## 2006 2.477101 2.116875 2.837327 1.926183 3.028018
## 2007 2.477101 1.979272 2.974929 1.715738 3.238464
Recall our model in terms of the backshift operator:
\((1-B)y_t = (1 + \theta_1 B + \theta_2 B^2)\varepsilon_t\)
From the model output below, \(\theta_1=-0.0660\) and \(\theta_2=0.3712\). The last observation of our series is 2.589383. We only need the residuals from the last two observations for our calculations, which are -0.34376126 and 0.05023862.
## Series: wmurders
## ARIMA(0,1,2)
##
## Coefficients:
## ma1 ma2
## -0.0660 0.3712
## s.e. 0.1263 0.1640
##
## sigma^2 estimated as 0.0422: log likelihood=9.71
## AIC=-13.43 AICc=-12.95 BIC=-7.46
## Time Series:
## Start = 2004
## End = 2004
## Frequency = 1
## [1] 2.589383
## Time Series:
## Start = 2003
## End = 2004
## Frequency = 1
## [1] -0.34376126 0.05023862
We can expand the backshift form using our known values. Recall that \(By_t = y_{t-1}\), so \((1-B)y_t = y_t - y_{t-1}\), which gives:
\(y_t = y_{t-1} + \varepsilon_t - 0.0660\varepsilon_{t-1} + 0.3712\varepsilon_{t-2}\)
Now, we plug in the last two residuals: \(\varepsilon_{t-2} = -0.34376126\) and \(\varepsilon_{t-1} = 0.05023862\).
Now we can calculate the three point forecasts by hand and confirm that they match the values from forecast().
# One step ahead: future errors have expectation zero
pf1 <- 2.589383 + 0 - 0.066*0.05023862 + 0.3712*-0.34376126
# Two steps ahead: only theta2 times the last observed residual remains
pf2 <- pf1 + 0 + 0 + 0.3712*0.05023862
# Three steps ahead: all residual terms are future errors, so no adjustment
pf3 <- pf2
by_hand <- data.frame(c(pf1,pf2,pf3))
colnames(by_hand) <- c("3 Point Forecasts by hand")
knitr::kable(by_hand)
| 3 Point Forecasts by hand |
|---|
| 2.458463 |
| 2.477112 |
| 2.477112 |
f) Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
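A sketch of the plot, reusing the fit from part d; autoplot of a forecast object shows the series, point forecasts, and prediction intervals:
autoplot(forecast(fit, h = 3))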
g) Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
Below, the auto.arima function is applied. We set d=1 for the appropriate differencing, and we set the stepwise argument to FALSE so the search covers a wider set of candidate models. The function also selects ARIMA(0,1,2), the same model we chose.
autofit_arima <- auto.arima(wmurders, d=1, stepwise = FALSE)
forecast(autofit_arima, h=3) %>%
autoplot()