Chapter 8 ARIMA Models
8.1 Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Figure 8.31: Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.
a. Explain the differences among these figures. Do they all indicate that the data are white noise?
All three plots show the autocorrelation of the observations with observations 1, 2, 3, ..., 20 steps ahead. The smaller the sample size, the greater the chance of observing a correlation stronger than the random-noise level (i.e., spikes beyond the confidence bands). That does not appear to be the case in these plots, so all three series are consistent with white noise. We also see that the random correlations are roughly evenly distributed between positive and negative values.
b. Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values (confidence bands) have the form:
\[\pm \frac{z_{1-\alpha / 2}}{\sqrt{N}}\]
so they depend on the sample size: the larger the sample size, the narrower the confidence bands, as we can see in these plots. The individual autocorrelations differ between figures simply because each figure is based on a different random sample.
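As a quick numeric check of this formula, the 95% critical values for the three sample sizes can be computed directly in R (a minimal sketch):

# 95% critical values for the ACF of white noise: +/- z_{1-alpha/2} / sqrt(N)
N <- c(36, 360, 1000)
round(qnorm(0.975) / sqrt(N), 3)
# roughly 0.327, 0.103 and 0.062, so the bands shrink as N grows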
Autocorrelation Plot *https://www.itl.nist.gov/div898/handbook/eda/section3/eda331.htm*
8.2 A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
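A minimal sketch of the plots discussed below, assuming the fpp2 package is loaded (it attaches forecast and the fma data sets, which include ibmclose):

library(fpp2)          # provides ibmclose, ggtsdisplay(), autoplot(), ...
ggtsdisplay(ibmclose)  # time plot, ACF and PACF in a single figure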
From the time plot
Stationary series have no predictable patterns in the long term. As can be seen in the time plot, this is not the case for the IBM series: there are clear trends in the stock price. There is a long cycle of increase and decrease (roughly day 0 to day 280) followed by a smaller cycle of increase and decrease (roughly day 280 to the end of the series).
From the ACF and PACF
As defined in Forecasting: Principles and Practice:
For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly. Also, for non-stationary data, the value of \(r_{1}\) is often large and positive.
In this data set we clearly see behaviour typical of non-stationary data: the ACF decreases only slowly across the lags and the value of \(r_{1}\) is large and positive.
The PACF plot indicates a strong correlation between the points at t and t-1, but no significant correlation between points t and t-2, t and t-3, and so on. Both plots point to a series that should be differenced.
Learn About Time Series ACF and PACF in SPSS With Data From the USDA Feed Grains Database (1876–2015) *http://methods.sagepub.com/base/download/DatasetStudentGuide/time-series-acf-pacf-in-us-feedgrains-1876-2015*
8.3 For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
usnetelec - Annual US net electricity generation
The most important feature of the original series is an increasing trend rather than changes in variance. As we can see below, a Box-Cox transformation with a lambda of 0.5 does not seem to improve the series.
A single order of differencing, as shown below, is enough to transform the electricity generation series into a stationary one.
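A sketch of how the lambda and the order of differencing can be checked, assuming fpp2 is loaded (BoxCox.lambda() may suggest a value different from the 0.5 tried above):

library(fpp2)
BoxCox.lambda(usnetelec)      # data-driven estimate of lambda
ndiffs(usnetelec)             # suggested order of differencing
autoplot(diff(usnetelec))     # first difference of the original series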
usgdp - Quarterly US GDP
The main feature of the quarterly US GDP series is the increasing trend. A Box-Cox transformation with a lambda parameter of 1/3 stabilizes the variance, and a single order of differencing removes the trend.
mcopper - Monthly copper prices
The main features of the monthly copper prices series are an increasing trend combined with variance that increases over time. A Box-Cox transformation with a lambda parameter of 1/3 stabilizes the variance, and a single order of differencing removes the trend.
enplanements - Monthly US domestic enplanements
The main features of the monthly US domestic enplanements series are an increasing trend combined with variance that increases over time. A Box-Cox transformation with a lambda parameter of -1/4 stabilizes the variance, and a single order of differencing removes the trend.
visitors - Monthly Australian overseas visitors
The main features of the monthly Australian overseas visitors series are an increasing trend combined with variance that increases over time. A Box-Cox transformation with a lambda parameter of 1/4 stabilizes the variance, and a single order of differencing removes the trend.
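The same checks can be applied to these series in one pass. A sketch, assuming fpp2 is loaded (the estimated lambdas will not be exactly the rounded 1/3, -1/4 and 1/4 values quoted above):

library(fpp2)
series_list <- list(usgdp = usgdp, mcopper = mcopper,
                    enplanements = enplanements, visitors = visitors)
for (nm in names(series_list)) {
  x   <- series_list[[nm]]
  lam <- BoxCox.lambda(x)                         # data-driven lambda
  cat(nm, ": lambda =", round(lam, 2),
      ", ndiffs after transform =", ndiffs(BoxCox(x, lam)), "\n")
}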
8.5 For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
The main features of the monthly Australian retail series are an increasing trend combined with variance that increases over time. We use a natural log (base e, i.e., Box-Cox with lambda = 0) transformation to stabilize the variance.
We follow this transformation first with a first difference (lag 1) and then with a seasonal difference (lag 12), as sketched below.
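A sketch of these steps, assuming `myts` holds the retail series read in Exercise 2.10:

myts_log  <- log(myts)                                 # stabilize the variance
myts_diff <- diff(diff(myts_log, lag = 1), lag = 12)   # first difference, then seasonal difference
ggtsdisplay(myts_diff)                                 # check that the result looks stationary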
The final result behaves as a stationary series.
8.6 Use R to simulate and plot some data from simple ARIMA models.
An AR(p) or an autoregressive model of order p can be written as:
\[y_{t}=c+\phi_{1} y_{t-1}+\phi_{2} y_{t-2}+\cdots+\phi_{p} y_{t-p}+\varepsilon_{t}\]
Here \(\phi\) are the parameters of the process, p is the order of the process and \(\varepsilon_{t}\) is the white noise. Where MA(q) is a weighted average over the error terms (white noise), AR(p) is a weighted average over the previous values of the series \(y_{t-p}\). Note that this process also includes a white noise term, which makes it a stochastic series.
An MA(q) process, in contrast, has the form:
\[y_{t}=c+\varepsilon_{t}+\theta_{1} \varepsilon_{t-1}+\theta_{2} \varepsilon_{t-2}+\cdots+\theta_{q} \varepsilon_{t-q}\]
Here \(\theta\) are the parameters of the process, q is the order of the process and \(\varepsilon_{t}\) is the white noise. By order we mean how many time steps q to include in the weighted average.
From Algorithm Breakdown: AR, MA and ARIMA models https://www.ritchievink.com/blog/2018/09/26/algorithm-breakdown-ar-ma-and-arima-models/
and Forecasting: Principles and Practice https://otexts.com/fpp2/arima.html
a. Use the following R code to generate data from an AR(1) model with \(\phi_{1}=0.6\) and \(\sigma^{2}=1\). The process starts with \(y_{1}=0\).
set.seed(5)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
AR1 <- y
A histogram of the generated y values from the AR(1) model shows the normal distribution of the white noise.
b. Produce a time plot for the series. How does the plot change as you change \(\phi_{1} ?\)
From the evolution below of increasing \(\phi_{1}\) values we see how the white noise appearance of the series fades as \(\phi_{1}\) grows. With \(\phi_{1}\) close to one, the series shows long, slow swings and behaves almost like a random walk.
Evolution of AR(1) Model with Different Phi Coefficients
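A sketch of how this comparison can be generated, reusing the simulation above with different \(\phi_{1}\) values (the values 0.1, 0.6 and 0.95 are an arbitrary illustration):

# simulate an AR(1) series with the same seed and errors as above
ar1_sim <- function(phi) {
  set.seed(5)
  y <- ts(numeric(100))
  e <- rnorm(100)
  for (i in 2:100) y[i] <- phi * y[i - 1] + e[i]
  y
}
autoplot(cbind(phi_0.1  = ar1_sim(0.1),
               phi_0.6  = ar1_sim(0.6),
               phi_0.95 = ar1_sim(0.95)), facets = TRUE)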
c. Write your own code to generate data from an MA(1) model with \(\theta_{1}=0.6 \text { and } \sigma^{2}=1\)
set.seed(5)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]
MA1 <- y
A histogram of the generated y values from the MA(1) model shows the normal distribution of the white noise.
d. Produce a time plot for the series. How does the plot change as you change \(\theta_{1} ?\)
As in the AR(1) case, increasing \(\theta_{1}\) reduces the white noise appearance of the series: with a larger \(\theta_{1}\) the lag-1 correlation grows and the series becomes smoother, although the effect is weaker than for AR(1) because only one past error term enters each observation.
Evolution of MA(1) Model with Different Theta Coefficients
e. Generate data from an ARMA(1,1) model with \(\phi_{1}=0.6, \theta_{1}=0.6 \text { and } \sigma^{2}=1\)
An ARMA model has the form:
\[\begin{array}{c}y_{t}=\delta+\left\{\phi_{1} y_{t-1}+\phi_{2} y_{t-2}+\cdots+\phi_{p} y_{t-p}\right\}+\left\{\theta_{1} \epsilon_{t-1}+\theta_{2} \epsilon_{t-2}+\cdots+\theta_{q} \epsilon_{t-q}\right\}+\epsilon_{t} \\ \Longrightarrow y_{t}=\delta+\sum_{i=1}^{p} \phi_{i} y_{t-i}+\sum_{j=1}^{q} \theta_{j} \epsilon_{t-j}+\epsilon_{t}\end{array}\]
From: ARIMA model for forecasting– Example in R
https://rstudio-pubs-static.s3.amazonaws.com/345790_3c1459661736433382863ed19c30ea55.html
set.seed(5)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + 0.6*y[i-1] + e[i]
ARMA11 <- y
f. Generate data from an AR(2) model with \(\phi_{1}=-0.8, \phi_{2}=0.3 \text { and } \sigma^{2}=1\)
set.seed(5)
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
AR2 <- y
g. Graph the latter two series and compare them.
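A sketch of the comparison plot, using the ARMA11 and AR2 objects generated above (assuming forecast/ggplot2 are loaded for autoplot()):

autoplot(cbind(ARMA11 = ARMA11, AR2 = AR2), facets = TRUE)  # one panel per series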
We can see how the ARMA(1,1) series mixes the AR(1) and MA(1) behaviour, and all three of those series behave as stationary. The AR(2) series is very different from the other three: because its coefficients violate the stationarity conditions (\(\phi_{2}-\phi_{1}=1.1>1\)), it oscillates with steadily increasing amplitude and is not stationary.
8.7 Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
From the plot above we notice that the series does not show seasonality. However, it does show an increasing, then stable, then decreasing trend over the range of the data set.
a. By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
The plot of the data series below, together with the ACF and PACF, shows that the series is non-stationary and needs differencing.
What should the order of differencing be? Let's compare the RMSE of orders 1 and 2 and see which one is lower, using the sketch and output below:
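A sketch of how these two fits can be produced (summary() on an Arima fit prints both the model and the training-set error measures shown below); the object names fit_d1 and fit_d2 are illustrative:

fit_d1 <- Arima(wmurders, order = c(0, 1, 0))   # first-order differencing
fit_d2 <- Arima(wmurders, order = c(0, 2, 0))   # second-order differencing
summary(fit_d1)
summary(fit_d2)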
## Series: wmurders
## ARIMA(0,1,0)
##
## sigma^2 estimated as 0.04563: log likelihood=6.73
## AIC=-11.46 AICc=-11.38 BIC=-9.47
## ME RMSE MAE MPE MAPE MASE
## Training set 0.00295268 0.2116718 0.1597016 -0.0635393 4.638393 0.9820898
## ACF1
## Training set -0.08564966
## Series: wmurders
## ARIMA(0,2,0)
##
## sigma^2 estimated as 0.1007: log likelihood=-14.38
## AIC=30.76 AICc=30.84 BIC=32.73
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0001657022 0.3115711 0.2248982 -0.02948252 6.457417 1.383019
## ACF1
## Training set -0.6830188
The first-order differencing shows the lower RMSE and is therefore the more appropriate one to use in our model.
The plot of the data after a first differencing (lag 1) shows a pattern closer to a white noise series.
We observe both a strong autocorrelation and a strong partial autocorrelation at lag 2. The spike at lag 2 in the PACF suggests an AR(2) component and the spike at lag 2 in the ACF suggests an MA(2) component, which together point to an ARIMA(2,1,2) model.
b. Should you include a constant in the model? Explain.
- If c=0 and d=0 , the long-term forecasts will go to zero.
- If c=0 and d=1 , the long-term forecasts will go to a non-zero constant.
- If c=0 and d=2 , the long-term forecasts will follow a straight line.
- If c≠0 and d=0 , the long-term forecasts will go to the mean of the data.
- If c≠0 and d=1 , the long-term forecasts will follow a straight line.
- If c≠0 and d=2 , the long-term forecasts will follow a quadratic trend.
We should not include a constant. Our data require one order of differencing to become stationary, and with c = 0 and d = 1 the long-term forecasts go to a non-zero constant, which is a reasonable long-term behaviour for this series; including a constant would instead force the long-term forecasts to follow a straight line.
c. Write this model in terms of the backshift operator.
The general form of the ARIMA(p,d,q) model is:
\[\left(1-\phi_{1} B-\cdots-\phi_{p} B^{p}\right) \quad(1-B)^{d} y_{t}=c+\left(1+\theta_{1} B+\cdots+\theta_{q} B^{q}\right) \varepsilon_{t}\] For our ARIMA(2,1,2) we will have:
\[\left(1-\phi_{1} B-\phi_{2} B^{2}\right)(1-B) y_{t}=c+\left(1+\theta_{1} B+\theta_{2} B^{2}\right) \varepsilon_{t}\]
d. Fit the model using R and examine the residuals. Is the model satisfactory?
Let's check the residuals, ACF and PACF of the model.
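A sketch of the fit and the residual checks (checkresiduals() from the forecast package also reports a Ljung-Box test):

fit <- Arima(wmurders, order = c(2, 1, 2))   # the model chosen above, no constant
checkresiduals(fit)                          # residual time plot, ACF and histogram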
All spikes are within the limits and the residuals appear to be white noise with an approximately normal distribution, so the model is satisfactory.
e. Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
Let's calculate the first 3 forecasts using R:
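A sketch of the call, using the fit object from the sketch above:

forecast(fit, h = 3)   # three-step-ahead forecasts with 80% and 95% intervals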
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.334533 2.082481 2.586586 1.949052 2.720015
## 2006 2.309009 1.959969 2.658050 1.775198 2.842821
## 2007 2.428477 1.931931 2.925022 1.669075 3.187878
Let's calculate by hand:
First, let's expand the equation:
\[\left(1-\phi_{1} B-\phi_{2} B^{2}\right)(1-B) y_{t}=c+\left(1+\theta_{1} B+\theta_{2} B^{2}\right) \varepsilon_{t}\]
into:
\[y_{t}=\left(1+\phi_{1}\right) y_{t-1}-\left(\phi_{1}-\phi_{2}\right) y_{t-2}-\phi_{2} y_{t-3}+\varepsilon_{t}+\theta_{1} \varepsilon_{t-1}+\theta_{2} \varepsilon_{t-2}\]
We will need the AR and MA coefficients and the residuals:
Coefficients:
## Series: wmurders
## ARIMA(2,1,2)
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 0.3862 -0.5075 -0.4354 0.9999
## s.e. 0.1385 0.1358 0.0546 0.0949
##
## sigma^2 estimated as 0.03743: log likelihood=11.86
## AIC=-13.71 AICc=-12.46 BIC=-3.77
Last observations (2002-2004) and residuals (2003-2004):
## Time Series:
## Start = 2002
## End = 2004
## Frequency = 1
## [1] 2.797697 2.662227 2.589383
## Time Series:
## Start = 2003
## End = 2004
## Frequency = 1
## [1] -0.28441898 0.03653369
For the first forecast:
phi_1 <- 0.3862; phi_2 <- -0.5075; theta_1 <- -0.4354; theta_2 <- 0.9999
y_minus_1 <- 2.589383; y_minus_2 <- 2.662227; y_minus_3 <- 2.797697
res_minus_1 <- 0.03653369; res_minus_2 <- -0.28441898
y_tplus1 <-(1+phi_1)*y_minus_1 - (phi_1-phi_2)*y_minus_2 - phi_2*y_minus_3 + theta_1*res_minus_1 + theta_2*res_minus_2
y_tplus1
## [1] 2.329704
For the second forecast:
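For this step the future error \(\varepsilon_{T+1}\) is set to zero, so only the \(\theta_{2}\varepsilon_{T}\) term remains, and the first forecast takes the place of \(y_{T+1}\). A sketch, reusing the variables defined above:

y_tplus2 <- (1+phi_1)*y_tplus1 - (phi_1-phi_2)*y_minus_1 - phi_2*y_minus_2 + theta_2*res_minus_1
y_tplus2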
## [1] 2.302915
For the third forecast:
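For the third step all future error terms drop out, and the two earlier forecasts feed back into the recursion. A sketch:

y_tplus3 <- (1+phi_1)*y_tplus2 - (phi_1-phi_2)*y_tplus1 - phi_2*y_minus_1
y_tplus3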
## [1] 2.424356
These 3 hand calculations match (closely) the point forecasts produced by R with forecast() on the fitted Arima model.
f. Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
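A sketch of the plot, again using the fit object from the sketch above:

autoplot(forecast(fit, h = 3))   # series plus forecasts and prediction intervals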
g. Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
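A sketch of the auto.arima() call whose output is shown below (fit_auto is an illustrative name):

fit_auto <- auto.arima(wmurders)
summary(fit_auto)   # selected model, coefficients and training-set accuracy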
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
The auto.arima() function selects a different model, ARIMA(1,2,1), instead of the ARIMA(2,1,2) we chose, and therefore estimates different coefficients.
The fitness parameters of the ARIMA(1,2,1) model are:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
## ACF1
## Training set 0.02176343
While the fitness parameters of the ARIMA(2,1,2) model are:
## ME RMSE MAE MPE MAPE MASE
## Training set -2.38612e-05 0.1844534 0.1385308 -0.09500124 4.034883 0.8518993
## ACF1
## Training set -0.06334153
The ARIMA(2,1,2) model has the better RMSE value. Also, the auto.arima() model forecasts a reduction towards a zero murder rate, while the ARIMA(2,1,2) model shows the rate reaching a plateau in the following years, which in my opinion is more likely.