
Forecasting: Principles and Practice.

Instructions

From the book Forecasting: Principles and Practice by Hyndman, R. & Athanasopoulos, G.

Please submit exercises 8.1, 8.2, 8.3, 8.5, 8.6 and 8.7 from the Hyndman online Forecasting book. Please submit both your Rpubs link and the attached .Rmd file with your code.

Exercises

8.1

The following figure shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.


Left: ACF for a white noise series of 36 numbers. Middle: ACF for a white noise series of 360 numbers. Right: ACF for a white noise series of 1,000 numbers.

a)

Explain the differences among these figures. Do they all indicate that the data are white noise?

What we notice in these three figures is that the critical bounds differ with the length of the time series: for a white noise series, we expect 95% of the spikes in the ACF to lie within \(\pm \frac{2}{\sqrt{T}}\), where \(T\) is the length of the time series.

Let’s calculate this number:
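A minimal sketch of how these bounds can be computed in R:

# 95% of the ACF spikes for white noise should lie within +/- 2/sqrt(T)
T.len <- c(36, 360, 1000)
data.frame(Length = T.len, Lower = -2 / sqrt(T.len), Upper = 2 / sqrt(T.len))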

Length      Lower     Upper
    36 -0.3333333 0.3333333
   360 -0.1054093 0.1054093
  1000 -0.0632456 0.0632456

Looking at the charts, all three seem to indicate white noise based on the first 20 lags. In the second chart a couple of correlations appear to slightly exceed the bounds, but since we expect about 5% of spikes to fall outside the bounds purely by chance, this is not statistically significant evidence to conclude otherwise.

b)

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

Auto-correlation measures the linear relationship between lagged values of a time series.

When data have a trend, the auto-correlations for small lags tend to be large and positive because observations nearby in time are also nearby in size. So the ACF of a trended time series tends to have positive values that slowly decrease as the lag increases.

When data are seasonal, the auto-correlations will be larger for the seasonal lags (at multiples of the seasonal frequency) than for other lags.

When data are both trended and seasonal, you see a combination of these effects.

A time series that shows no auto-correlation is called white noise, and the graphs above show essentially no auto-correlation.

For a white noise series, we expect each auto-correlation to be close to zero. Of course, they will not be exactly equal to zero, as there is some random variation. The critical values sit at different distances from zero because they equal \(\pm 2/\sqrt{T}\), so they shrink as the series length \(T\) grows; the individual auto-correlations differ from figure to figure because each panel is computed from a different random sample. Since essentially all of the auto-correlation coefficients lie within these limits, the data are consistent with white noise.

8.2

A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
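A sketch of the code that could produce these plots, assuming the fpp2 package (which attaches the data sets and plotting functions used in these exercises) is loaded:

library(fpp2)
# time plot, ACF and PACF of the daily closing IBM stock prices
ggtsdisplay(ibmclose)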

An ACF plot shows the auto-correlations which measure the relationship between \(y_t\) and \(y_{t-k}\) for different values of \(k\).

The partial auto-correlations, or PACF, measure the relationship between \(y_t\) and \(y_{t-k}\) after removing the effects of lags \(1,2,3,\dots,k-1\). The first partial auto-correlation is identical to the first auto-correlation, because there is nothing between them to remove. Each partial auto-correlation can be estimated as the last coefficient in an auto-regressive model.

For a stationary time series, the ACF will drop to zero relatively quickly, while the ACF of non-stationary data decreases slowly. Also, for non-stationary data, the value of \(r_1\) is often large and positive.

Looking at the ACF plot, we notice that the auto-correlations decrease only slowly, which points to non-stationary data; we also notice that \(r_1\) is large and positive, confirming the description above.

The ACF and PACF plots of the data show the following patterns:

  • the ACF decays very slowly instead of dropping quickly to zero;
  • there is a single significant spike at lag 1 in the PACF, with little of note beyond it.

From the above graphs, we can see that the IBM stock price is non-stationary in the original time plot, and that consecutive observations are strongly auto-correlated, as seen in the ACF; the PACF is dominated by the single spike at lag 1. All of this indicates that the series should be differenced.

8.3

For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

usnetelec

Let’s have a first visualization of the data set.

The suggested \(\lambda\) = 0.5; hence, my suggested transformation will be a square root transformation.
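A sketch of how the suggested \(\lambda\) and the transformed series could be obtained (myts.BC follows the naming used in the rest of this document):

lambda <- BoxCox.lambda(usnetelec)    # roughly 0.5
myts.BC <- BoxCox(usnetelec, lambda)  # Box-Cox transformed series
autoplot(myts.BC)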

Let’s compare the following charts, in order to decide.

From the above graph, it seems that this data does not have an evident seasonality.

Let’s see the following analysis as well.

Number of differences required for a seasonally stationary series

nsdiffs(myts.BC)
# Error in nsdiffs(myts.BC) : Non seasonal data

Number of differences required for a stationary series

ndiffs(myts.BC)
## [1] 2

Because nsdiffs() returns an error (the data are non-seasonal, so no seasonal difference is required), we apply the ndiffs() function directly to the transformed data. It suggests that two first differences are needed.
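A sketch of how the two suggested differences could be applied and checked, assuming myts.BC is the transformed series from above:

myts.d2 <- diff(diff(myts.BC))  # two first differences
autoplot(myts.d2)
ndiffs(myts.d2)                 # should now return 0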

usgdp

Let’s have a first visualization of the data set.

The suggested \(\lambda\) = 0.37; hence, my suggested transformation will be a cube root transformation.

Let’s compare the following charts, in order to decide.

From the above graph, it seems that this data does not have an evident seasonality.

It is interesting to note that the original (untransformed) data appear to show some seasonality, as seen in the following chart.

From my perspective, it seems that once any seasonality is removed, a single difference should be enough to leave something close to white noise.

Let’s see the following analysis as well.

Number of differences required for a seasonally stationary series

nsdiffs(myts.BC)
## [1] 0

Number of differences required for a stationary series

ndiffs(myts.BC)
## [1] 1

Because nsdiffs() returns 0 (indicating no seasonal difference is required), we apply the ndiffs() function to the data. These functions suggest we should do a first difference only.

mcopper

Let’s have a first visualization of the data set.

The suggested \(\lambda\) = 0.19; hence, my suggested transformation will be a log transformation.

Let’s compare the following charts, in order to decide.

From the above graph, it seems that this data does not have an evident seasonality.

It is interesting to note that the original (untransformed) data appear to show some seasonality, as seen in the following chart.

From my perspective, it seems that once any seasonality is removed, a single difference should be enough to leave something close to white noise.

Let’s see the following analysis as well.

Number of differences required for a seasonally stationary series

nsdiffs(myts.BC)
## [1] 0

Number of differences required for a stationary series

ndiffs(myts.BC)
## [1] 1

Because nsdiffs() returns 0 (indicating no seasonal difference is required), we apply the ndiffs() function to the data. These functions suggest we should do a first difference only.

enplanements

Let’s have a first visualization of the data set.

The suggested \(\lambda\) = -0.23; hence, my suggested transformation will be a log transformation.

Let’s compare the following charts, in order to decide.

From the above graph, this data set shows evident seasonality.

It is interesting to note that the original (untransformed) data also show seasonality, as seen in the following chart.

From my perspective, it seems that we need to remove the seasonality first and then take one further difference in order to obtain something close to white noise.

Let’s see the following analysis as well.

Number of differences required for a seasonally stationary series

nsdiffs(myts.BC)
## [1] 1

Number of differences required for a stationary series

ndiffs(myts.BC)
## [1] 1

Because nsdiffs() returns 1 (indicating that a seasonal difference is required), we apply the ndiffs() function to the data. These functions suggest we should do a first difference after the seasonal difference.

Now, to apply the seasonal difference followed by the extra first difference, we first check the frequency of the series (12 observations per year, i.e. monthly data with a yearly seasonal pattern):

frequency(myts.BC)
## [1] 12
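A sketch of how the seasonal difference (lag 12) followed by the first difference could be visualized:

# seasonal difference at lag 12, then a first difference of the result
myts.diff <- diff(diff(myts.BC, lag = 12))
ggtsdisplay(myts.diff)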

visitors

Let’s have a first visualization of the data set.

The suggested \(\lambda\) = 0.28; hence, my suggested transformation will be a log transformation.

Let’s compare the following charts, in order to decide.

From the above graph, this data set shows evident seasonality.

It is interesting to note that the original (untransformed) data also show seasonality, as seen in the following chart.

From my perspective, it seems that we need to remove the seasonality first and then take one further difference in order to obtain something close to white noise.

Let’s see the following analysis as well.

Number of differences required for a seasonally stationary series

nsdiffs(myts.BC)
## [1] 1

Number of differences required for a stationary series

ndiffs(myts.BC)
## [1] 1

Because nsdiffs() returns 1 (indicating that a seasonal difference is required), we apply the ndiffs() function to the data. These functions suggest we should do a first difference after the seasonal difference.

Now, to apply the seasonal difference followed by the extra first difference, we first check the frequency of the series (12 observations per year, i.e. monthly data with a yearly seasonal pattern):

frequency(myts.BC)
## [1] 12

8.5

For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.

Let’s recall that series:

retaildata <- readxl::read_excel("retail.xlsx", skip = 1)  # skip the first header/metadata row

My previously selected time series was A3349627V, which represents liquor retailing turnover in New South Wales.
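A sketch of how the series could be rebuilt from that column (the start date of April 1982 follows the book's retail data set and should be treated as an assumption):

myts <- ts(retaildata[["A3349627V"]], frequency = 12, start = c(1982, 4))
autoplot(myts)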

Let’s have a first visualization of the data set.

The suggested \(\lambda\) = -0.06; hence, my suggested transformation will be a log transformation.

Let’s compare the following charts, in order to decide.

From the above graph, this data set shows evident seasonality.

From my perspective, it seems that we need to remove the seasonality first and then take one further difference in order to obtain something close to white noise.

Let’s see the following analysis as well.

Number of differences required for a seasonally stationary series

nsdiffs(myts.BC)
## [1] 1

Number of differences required for a stationary series

ndiffs(myts.BC)
## [1] 1

Because nsdiffs() returns 1 (indicating that a seasonal difference is required), we apply the ndiffs() function to the data. These functions suggest we should do a first difference after the seasonal difference.

Now, to apply the seasonal difference followed by the extra first difference, we first check the frequency of the series (12 observations per year, i.e. monthly data with a yearly seasonal pattern):

frequency(myts.BC)
## [1] 12

8.6

Use R to simulate and plot some data from simple ARIMA models.

a)

Use the following R code to generate data from an AR(1) model with \(\phi_{1} = 0.6\) and \(\sigma^2=1\). The process starts with \(y_1=0\).

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
  y[i] <- 0.6*y[i-1] + e[i]

Let’s visualize the original time series as given above.
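For example, assuming the ggplot2/forecast packages are attached (e.g. via fpp2):

autoplot(y) + ggtitle("AR(1) with phi1 = 0.6")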

b)

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

\(\phi_1=0\)

Now, let’s visualize different series for different values of \(\phi_1\). Let’s visualize \(\phi_1=0\).

In this case, when \(\phi_1=0\), \(y_t\) is equivalent to white noise.

Comparing side by side

\(\phi_1=1\)

Now, let’s visualize different series for different values of \(\phi_1\). Let’s visualize \(\phi_1=1\).
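A sketch of the simulation for this case (reusing the same errors e so the series are comparable; y2 is an illustrative name):

y2 <- ts(numeric(100))
for(i in 2:100)
  y2[i] <- 1 * y2[i-1] + e[i]   # phi1 = 1 gives a random walk
autoplot(y2)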

In this case, when \(\phi_1=1\), \(y_t\) is equivalent to a random walk.

Comparing side by side

\(\phi_1 < 0\)

Now, let’s visualize different series for different values of \(\phi_1\). Let’s visualize \(\phi_1 < 0\).

y3 <- ts(numeric(100))
# reuse the same errors e generated above so the series are directly comparable
for(i in 2:100)
  y3[i] <- -0.5*y3[i-1] + e[i]

In this case, when \(\phi_1 < 0\), \(y_t\) tends to oscillate between positive and negative values.

Comparing side by side

c)

Write your own code to generate data from an MA(1) model with \(\theta_{1} = 0.6\) and \(\sigma^2=1\).

Moving average models

Preamble: Rather than using past values of the forecast variable in a regression, a moving average model uses past forecast errors in a regression-like model.

\[y_{t} = c + \varepsilon_t + \theta_{1}\varepsilon_{t-1} + \theta_{2}\varepsilon_{t-2} + \dots + \theta_{q}\varepsilon_{t-q}\]

where \(\varepsilon_t\) is white noise. We refer to this as an MA(q) model, a moving average model of order q.

y <- ts(numeric(100))
e <- rnorm(100)
# MA(1): y_t = e_t + theta1 * e_{t-1}, with theta1 = 0.6
for(i in 2:100)
  y[i] <- e[i] + 0.6 * e[i-1]

d)

Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

\(\theta_1 < 0\)

Now, let’s visualize different series for different values of \(\theta_1\). Let’s visualize \(\theta_1 < 0\).
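A sketch of the simulation for a negative coefficient (y4 and the value -0.6 are illustrative choices, reusing the errors e from above):

y4 <- ts(numeric(100))
for(i in 2:100)
  y4[i] <- e[i] - 0.6 * e[i-1]   # MA(1) with theta1 = -0.6
autoplot(y4)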

In this case, when \(\theta_1 < 0\), consecutive values are negatively correlated, so \(y_t\) tends to oscillate between positive and negative values. Changing the parameter \(\theta_1\) results in different time series patterns, while changing the variance of the error term would only change the scale of the series, not the patterns.

e)

Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_{1} = 0.6\) and \(\sigma^2=1\).

y <- ts(numeric(100))
e <- rnorm(100)

phi1 <- 0.6
theta1 <- 0.6

for(i in 2:100)
  y[i] <- phi1 * y[i-1] + theta1 * e[i-1] + e[i]   # ARMA(1,1): AR term, MA term and current error

# Need for plot later on
y_ARMA <- y

f)

Generate data from an AR(2) model with \(\phi_{1} =-0.8\), \(\phi_{2} = 0.3\) and \(\sigma^2=1\). (Note that these parameters will give a non-stationary series.)

y <- ts(numeric(100))
# reuse the same errors e so this series can be compared directly with the ARMA(1,1) series

phi1 <- -0.8
phi2 <- 0.3

for(i in 3:100)
  y[i] <- phi1 * y[i-1] + phi2 * y[i-2] + e[i] 

# Need for plot later on
y_AR <- y

g)

Graph the latter two series and compare them.
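A sketch of a side-by-side comparison, using gridExtra::grid.arrange (any other layout approach would work just as well):

library(gridExtra)
grid.arrange(autoplot(y_ARMA) + ggtitle("ARMA(1,1)"),
             autoplot(y_AR)   + ggtitle("AR(2), non-stationary"),
             nrow = 1)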

The two series behave very differently: the AR(2) series diverges, oscillating with exponentially growing amplitude (as expected, since the chosen parameters violate the stationarity conditions), while the ARMA(1,1) series remains stationary, fluctuating around a constant level.

8.7

Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

a)

By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.

Let’s have a first visualization of the data set.

The suggested \(\lambda\) = -0.1; hence, my suggested transformation will be a log transformation.

Let’s compare the following charts, in order to decide.

From the above graph, the series wanders up and down over long periods; since the data are annual, there is no seasonality to account for.

The data are clearly non-stationary, as the series wanders up and down for long periods. Consequently, we will take a first difference of the data.
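A sketch of how the differenced series could be examined (myts.BC denotes the Box-Cox transformed wmurders series used throughout this exercise):

ggtsdisplay(diff(myts.BC))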

The data may follow an ARIMA(p,d,0) model if the ACF and PACF plots of the differenced data show the following patterns:

  • the ACF is exponentially decaying or sinusoidal;
  • there is a significant spike at lag \(p\) in the PACF, but none beyond lag \(p\).

The data may follow an ARIMA(0,d,q) model if the ACF and PACF plots of the differenced data show the following patterns:

  • the PACF is exponentially decaying or sinusoidal;
  • there is a significant spike at lag \(q\) in the ACF, but none beyond lag \(q\).

As we can see, the above plots suggest that the data follow an ARIMA(p,d,0) model.

Let’s visualize the second difference.

The differenced data, shown in the above figure, look stationary, and so we will not consider further differences.

We see that there is one significant spike in the ACF, followed by what looks like white noise. In the PACF, there is a significant spike at lag 2, and no significant spikes thereafter.

The PACF shown in the above figure suggests an AR(2) model, so an initial candidate model is ARIMA(2,0,0). There are no other obvious candidate models.

b)

Should you include a constant in the model? Explain.

The constant c has an important effect on the long-term forecasts obtained from these models.

  • If c=0 and d=0, the long-term forecasts will go to zero.

  • If c=0 and d=1, the long-term forecasts will go to a non-zero constant.

  • If c=0 and d=2, the long-term forecasts will follow a straight line.

  • If \(c\ne0\) and d=0, the long-term forecasts will go to the mean of the data.

  • If \(c\ne0\) and d=1, the long-term forecasts will follow a straight line.

  • If \(c\ne0\) and d=2, the long-term forecasts will follow a quadratic trend.

From the above, there is an advantage to including a constant: with our candidate model (d = 0), a non-zero constant means the long-term forecasts go to the mean of the data rather than to zero, which seems more sensible for this series.

c)

Write this model in terms of the backshift operator.

\[(1-\phi_1B - \phi_2 B^2)y_t = c + \varepsilon_t\] where \(c = \mu(1- \phi_1 - \phi_2 )\).

d)

Fit the model using R and examine the residuals. Is the model satisfactory?

fit1 <- Arima(myts.BC, order=c(2,0,0))
checkresiduals(fit1)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(2,0,0) with non-zero mean
## Q* = 13.555, df = 7, p-value = 0.05968
## 
## Model df: 3.   Total lags used: 10

From the residual analysis, the model seems satisfactory: the residual autocorrelations are not significant, and the Ljung-Box test gives no indication that the residuals differ from white noise, since the p-value (about 0.06) is relatively large.

e)

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.

Let’s see the returned forecast table:
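The table can be produced as follows (a minimal sketch, using the fitted model from above):

forecast(fit1, h = 3)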

     Point Forecast     Lo 80    Hi 80     Lo 95    Hi 95
2005      0.9563850 0.8773800 1.035390 0.8355573 1.077213
2006      0.9588985 0.8527127 1.065084 0.7965013 1.121296
2007      0.9615038 0.8347308 1.088277 0.7676212 1.155386

From d), the fitted coefficients are:

## Series: myts.BC 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2    mean
##       0.8980  0.0701  1.0462
## s.e.  0.1334  0.1343  0.1820
## 
## sigma^2 estimated as 0.0038:  log likelihood=75.39
## AIC=-142.79   AICc=-141.99   BIC=-134.76
## 
## Training set error measures:
##                       ME       RMSE        MAE       MPE     MAPE
## Training set 0.005359604 0.05994303 0.04680461 0.2131439 3.974467
##                   MASE       ACF1
## Training set 0.9909796 0.02567834

It is easier to plot the inverse roots instead, as they should all lie within the unit circle.
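With the forecast package this can be done by calling autoplot() on the fitted model (a sketch):

autoplot(fit1)   # inverse AR roots should lie inside the unit circle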

In order to see the coefficients more clearly, let's take a look as follows:

Coefficients for \(\phi\).

## [1] 0.89802060 0.07012368

The table below lists manually calculated forecast values and compares them with the forecasts returned above. They match to about six decimal places; the small differences are most likely due to rounding.

The formula is:

\[y_{t+1} = c + \phi_1 \cdot y_t + \phi_2 \cdot y_{t-1}\]

Where \(\phi_1 = 0.89802060\), \(\phi_2 = 0.07012368\), \(\mu = 1.0462\) and \(c = \mu \cdot (1 - \phi_1 - \phi_2)\).
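A sketch of the hand calculation, using the coefficient values quoted above and the transformed series myts.BC that the model was fitted to:

phi1 <- 0.89802060
phi2 <- 0.07012368
mu   <- 1.0462
c0   <- mu * (1 - phi1 - phi2)
y    <- as.numeric(myts.BC)
n    <- length(y)
f1   <- c0 + phi1 * y[n] + phi2 * y[n - 1]   # forecast for 2005
f2   <- c0 + phi1 * f1   + phi2 * y[n]       # forecast for 2006
f3   <- c0 + phi1 * f2   + phi2 * f1         # forecast for 2007
round(c(f1, f2, f3), 7)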

Let’s compare our forecasts:

         ARIMA    MANUAL    Diff
2005 0.9563850 0.9563844 6.0e-07
2006 0.9588985 0.9588974 1.1e-06
2007 0.9615038 0.9615022 1.6e-06

f)

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
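A minimal sketch, reusing fit1 from above:

autoplot(forecast(fit1, h = 3))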

g)

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

fit2 <- auto.arima(myts.BC, seasonal=FALSE, stepwise=FALSE, approximation=FALSE)

The results below are given by the manually specified Arima() model.

## Series: myts.BC 
## ARIMA(2,0,0) with non-zero mean 
## 
## Coefficients:
##          ar1     ar2    mean
##       0.8980  0.0701  1.0462
## s.e.  0.1334  0.1343  0.1820
## 
## sigma^2 estimated as 0.0038:  log likelihood=75.39
## AIC=-142.79   AICc=-141.99   BIC=-134.76
## 
## Training set error measures:
##                       ME       RMSE        MAE       MPE     MAPE
## Training set 0.005359604 0.05994303 0.04680461 0.2131439 3.974467
##                   MASE       ACF1
## Training set 0.9909796 0.02567834

The results below are given by auto.arima().

## Series: myts.BC 
## ARIMA(1,2,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.2954  -0.7892
## s.e.   0.1533   0.1190
## 
## sigma^2 estimated as 0.003613:  log likelihood=74.09
## AIC=-142.17   AICc=-141.68   BIC=-136.26
## 
## Training set error measures:
##                        ME       RMSE       MAE        MPE     MAPE
## Training set -0.002716443 0.05788261 0.0437842 -0.1322084 3.621476
##                   MASE       ACF1
## Training set 0.9270295 0.01694217

We get a different model with a slightly higher \(AIC_c\): the manually selected ARIMA(2,0,0) has \(AIC_c\) = -141.99, while the auto.arima() model, ARIMA(1,2,1), has \(AIC_c\) = -141.68. Choosing the model with the smallest \(AIC_c\) therefore favours the manually selected ARIMA(2,0,0). Note, however, that \(AIC_c\) values are only strictly comparable between models with the same order of differencing, so this comparison should be read with some caution.

References

Hyndman, R. & Athanasopoulos, G. 2019. Forecasting: Principles and Practice. Australia: Monash University. https://otexts.com/fpp2/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.