1. Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1000 random numbers.

Explain the differences among these figures. Do they all indicate that the data are white noise?

All three are ACF plots of white noise series. In each figure the spikes fluctuate randomly around zero and (almost entirely) stay within the critical bounds, so all three are consistent with white noise; the visible difference is that the critical bounds are much wider for T = 36 than for T = 1000.

Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?

The formula for the critical values is \[\pm \frac{1.96}{\sqrt{T-d}}\] where \(T\) is the sample size and \(d\) is the lag.

With \(T = 1000\) and \(d = 1\), the bound is \(\pm 0.062\); for \(T = 360\) it is \(\pm 0.103\), and for \(T = 36\) it is \(\pm 0.331\). As the sample size increases, the value required for a significant correlation decreases, which is why the dashed critical lines sit at different distances from zero in the three figures.
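A quick computation of the three bounds (using the formula above with \(d = 1\)):

n <- c(36, 360, 1000)  # sample sizes from Figure 8.31
round(1.96 / sqrt(n - 1), 3)
## [1] 0.331 0.103 0.062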

The formula for the autocorrelation coefficient is \[r_k = \frac{\sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T}(y_t-\bar{y})^2}\] Because each figure is computed from a different random sample, the \(r_k\) values differ across the figures even though every series is white noise.
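As a sanity check (a sketch; the seed is an arbitrary choice), the lag-1 coefficient computed from this formula matches what acf() reports:

set.seed(1)  # arbitrary seed, for reproducibility
y_wn <- rnorm(36)
r1 <- sum((y_wn[-1] - mean(y_wn)) * (y_wn[-36] - mean(y_wn))) / sum((y_wn - mean(y_wn))^2)
c(manual = r1, acf = acf(y_wn, plot = FALSE)$acf[2])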

  2. A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
library(fpp2)
ggtsdisplay(ibmclose)

Per 8.1, “Stationarity and Differencing”, a stationary time series is one whose properties do not depend on the time at which the series is observed; series with trend or seasonality are therefore not stationary. Here, the time plot shows a wandering level with long upward and downward runs and no fixed mean; the ACF decays very slowly from values near 1; and the PACF has a single dominant spike near 1 at lag 1. All three plots are classic signs of non-stationarity, so the series should be differenced.
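As a quick check (a sketch), one difference is enough to remove the wandering level; the differenced series looks much more like white noise:

ggtsdisplay(diff(ibmclose))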

  3. For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.

usnetelec, usgdp, mcopper, enplanements, visitors

Solving for usnetelec - Annual US net electricity generation

#help("usnetelec")
head(usnetelec)
## Time Series:
## Start = 1949 
## End = 1954 
## Frequency = 1 
## [1] 296.1 334.1 375.3 403.8 447.0 476.3
summary(usnetelec)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   296.1   889.0  2040.9  1972.1  3002.7  3858.5
autoplot(usnetelec)

The series increases roughly linearly with no seasonality, so a first difference should be sufficient.

Box.test(diff(usnetelec), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(usnetelec)
## X-squared = 0.8508, df = 1, p-value = 0.3563

The Ljung-Box p-value (0.3563) is above 0.05, so the first-differenced series is indistinguishable from white noise.

library(tseries)
kpss.test(diff(usnetelec))
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(usnetelec)
## KPSS Level = 0.15848, Truncation lag parameter = 3, p-value = 0.1

The KPSS test (p-value > 0.05) shows that the series becomes stationary after one difference.
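ndiffs() applies the same KPSS procedure, so it should confirm that a single difference suffices:

ndiffs(usnetelec)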

Solving for usgdp - Quarterly US GDP, 1947:Q1 to 2006:Q1

#help("usgdp")
head(usgdp)
##        Qtr1   Qtr2   Qtr3   Qtr4
## 1947 1570.5 1568.7 1568.0 1590.9
## 1948 1616.1 1644.6
summary(usgdp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1568    2632    4552    5168    7130   11404
autoplot(usgdp)

The series increases roughly linearly, and at first glance a single difference looks sufficient.

Box.test(diff(usgdp), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(usgdp)
## X-squared = 39.187, df = 1, p-value = 3.85e-10

The tiny p-value means the first-differenced usgdp data cannot be regarded as a white noise series, so a second difference is tried.

library(tseries)
autoplot(diff(usgdp))

ndiffs(usgdp)
## [1] 2
autoplot(diff(diff(usgdp)))

Box.test(diff(diff(usgdp)), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(diff(usgdp))
## X-squared = 53.294, df = 1, p-value = 2.872e-13
ggAcf(diff(diff(usgdp)))

kpss.test(diff(diff(usgdp)))
## Warning in kpss.test(diff(diff(usgdp))): p-value greater than printed p-
## value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(usgdp))
## KPSS Level = 0.020988, Truncation lag parameter = 4, p-value = 0.1

The KPSS test shows that differencing twice makes the data stationary (no Box-Cox transformation was applied here). Note that the Ljung-Box test still rejects white noise: stationarity is a weaker requirement than white noise, and the remaining autocorrelation is what an ARMA model would capture.
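Since the exercise also asks for a Box-Cox transformation, an alternative (a sketch) is to transform first and then re-check the required order of differencing:

lambda_usgdp <- BoxCox.lambda(usgdp)
ndiffs(BoxCox(usgdp, lambda_usgdp))
autoplot(diff(BoxCox(usgdp, lambda_usgdp)))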

Solving for mcopper - Monthly copper prices

#help("mcopper")
head(mcopper)
##        Jan   Feb   Mar   Apr   May   Jun
## 1960 255.2 259.7 249.3 258.0 244.3 246.8
summary(mcopper)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   216.6   566.0   949.2   997.8  1262.5  4306.0
autoplot(mcopper)

lambda_mcopper <- BoxCox.lambda(mcopper)
autoplot(diff(BoxCox(mcopper, lambda_mcopper)))

Box.test(diff(BoxCox(mcopper, lambda_mcopper)), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(BoxCox(mcopper, lambda_mcopper))
## X-squared = 57.517, df = 1, p-value = 3.353e-14
ggAcf(diff(BoxCox(mcopper, lambda_mcopper)))

kpss.test(diff(BoxCox(mcopper, lambda_mcopper)))
## Warning in kpss.test(diff(BoxCox(mcopper, lambda_mcopper))): p-value
## greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(BoxCox(mcopper, lambda_mcopper))
## KPSS Level = 0.057275, Truncation lag parameter = 6, p-value = 0.1

The KPSS test shows that one difference of the Box-Cox-transformed series is stationary.

Solving for enplanements - Monthly US domestic enplanements

#help("enplanements")
head(enplanements)
##        Jan   Feb   Mar   Apr   May   Jun
## 1979 21.12 22.92 25.90 24.38 23.41 26.82
summary(enplanements)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   20.14   27.18   34.88   35.67   42.78   56.14
autoplot(enplanements)

lambda_enplanements <- BoxCox.lambda(enplanements)
ndiffs(enplanements)
## [1] 1
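ndiffs() reports the regular differences only; because the series is strongly seasonal, a lag-12 seasonal difference is applied as well (see the code below). nsdiffs() can be used to confirm that one seasonal difference is expected:

nsdiffs(enplanements)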
autoplot(diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12)))

Box.test(diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12)), type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## X-squared = 29.562, df = 1, p-value = 5.417e-08
ggAcf(diff(diff(BoxCox(enplanements, lambda_enplanements),lag = 12)))

kpss.test(diff(diff(BoxCox(enplanements, lambda_enplanements),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(enplanements, lambda_enplanements), :
## p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## KPSS Level = 0.042424, Truncation lag parameter = 5, p-value = 0.1

The KPSS test shows that the Box-Cox-transformed series becomes stationary after one seasonal difference and one first difference.

Solving for visitors - Monthly Australian overseas visitors

#help("visitors")
head(visitors)
##        May   Jun   Jul   Aug   Sep   Oct
## 1985  75.7  75.4  83.1  82.9  77.3 105.7
summary(visitors)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    75.4   189.2   303.1   288.2   378.7   593.1
autoplot(visitors)

lambda_visitors <- BoxCox.lambda(visitors)
ndiffs(visitors)
## [1] 1
autoplot(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))

Box.test(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)),type = "Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## X-squared = 21.804, df = 1, p-value = 3.02e-06
ggAcf(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))

kpss.test(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(visitors, lambda_visitors), lag =
## 12))): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## KPSS Level = 0.015833, Truncation lag parameter = 4, p-value = 0.1

The KPSS test shows that the Box-Cox-transformed series becomes stationary after one seasonal difference and one first difference.

  4. For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retaildata <- readxl::read_excel("C:\\NITEEN\\GitHub\\CUNY_DATA_624\\Dataset\\retail.xlsx", skip=1)
#A3349661X
myts <- ts(retaildata$A3349661X, start = c(1982,4),frequency = 12)
autoplot(myts) + ggtitle("Retail Data Time Series")

ndiffs(myts)
## [1] 1
nsdiffs(myts)
## [1] 1
kpss.test(diff(diff(BoxCox(myts, BoxCox.lambda(myts)),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(myts, BoxCox.lambda(myts)), lag =
## 12))): p-value greater than printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(diff(BoxCox(myts, BoxCox.lambda(myts)), lag = 12))
## KPSS Level = 0.022485, Truncation lag parameter = 5, p-value = 0.1

To make the data stationary, a Box-Cox transformation was applied, followed by one seasonal difference and one first difference; the KPSS test then fails to reject stationarity.
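A visual check of the final series (a sketch, reusing the transformation and differences chosen above):

ggtsdisplay(diff(diff(BoxCox(myts, BoxCox.lambda(myts)), lag = 12)))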

  5. Use R to simulate and plot some data from simple ARIMA models.

Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).

y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
   y[i] <- 0.6*y[i-1] + e[i]
}

Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?

AR1model <- function(phi1){
  # simulate 100 observations from an AR(1) process with parameter phi1
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- phi1*y[i-1] + e[i]
  }
  return(y)
}

# y (the phi1 = 0.6 series) comes from the book's code chunk above
autoplot(AR1model(0.3), series = "0.3") +
  geom_line(size = 1, colour = "red") +
  autolayer(y, series = "0.6", size = 1) +
  autolayer(AR1model(0.9), size = 1, series = "0.9") +
  ylab("AR(1) models") +
  guides(colour = guide_legend(title = "Phi1"))

As \(\phi_1\) increases, successive observations become more strongly correlated, so the series wanders further from its mean and the variation of \(y_t\) increases.

Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).

MA1Model <- function(theta1){
  # simulate 100 observations from an MA(1) process with parameter theta1
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- theta1*e[i-1] + e[i]
  }
  return(y)
}

This function generates data from an MA(1) model with parameter \(\theta_1\).

Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?

# compare MA(1) series for three values of theta1
autoplot(MA1Model(0.3), series = "0.3") +
  geom_line(size = 1, colour = "red") +
  autolayer(MA1Model(0.6), series = "0.6", size = 1) +
  autolayer(MA1Model(0.9), size = 1, series = "0.9") +
  ylab("MA(1) models") +
  guides(colour = guide_legend(title = "Theta1"))

As \(\theta_1\) increases, the variation of \(y_t\) increases, though far less dramatically than in the AR(1) case.
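The theoretical variances make the contrast concrete: a stationary AR(1) process has variance \(\sigma^2/(1-\phi_1^2)\), while an MA(1) process has variance \(\sigma^2(1+\theta_1^2)\):

# theoretical process variances (sigma^2 = 1) for the three parameter values
par_vals <- c(0.3, 0.6, 0.9)
round(rbind(AR1 = 1 / (1 - par_vals^2), MA1 = 1 + par_vals^2), 2)
##      [,1] [,2] [,3]
## AR1  1.10 1.56 5.26
## MA1  1.09 1.36 1.81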

Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_{1} = 0.6\) and \(\sigma^2 = 1\).

ARMA.1.1 <- ts(numeric(50))
e <- rnorm(50)
for(i in 2:50){
   ARMA.1.1[i] <- 0.6*ARMA.1.1[i-1] + 0.6*e[i-1] + e[i]
}
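As a cross-check (a sketch; the seed is an arbitrary choice), stats::arima.sim() simulates the same ARMA(1,1) model, though it uses a burn-in period rather than starting at \(y_1 = 0\):

set.seed(42)  # arbitrary seed, for reproducibility
autoplot(arima.sim(model = list(ar = 0.6, ma = 0.6), n = 50))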

Generate data from an AR(2) model with \(\phi_{1} = -0.8\), \(\phi_{2} = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)

AR2 <- ts(numeric(50))
e <- rnorm(50)
# an AR(2) recursion needs two starting values, so the loop begins at i = 3
for(i in 3:50){
   AR2[i] <- -0.8*AR2[i-1] + 0.3*AR2[i-2] + e[i]
}

Graph the latter two series and compare them.

autoplot(ARMA.1.1, series = "ARMA(1, 1)") +
  autolayer(AR2, series = "AR(2)") +
  ylab("y") +
  guides(colour = guide_legend(title = "Models"))

Because the AR(2) series oscillates with rapidly growing amplitude and dominates the combined plot, the ARMA(1, 1) series is re-plotted on its own scale:

autoplot(ARMA.1.1)

The AR(2) series oscillates with exponentially increasing amplitude (the chosen parameters violate the stationarity conditions), while the ARMA(1, 1) series fluctuates around a constant mean with roughly constant variance, i.e. it appears stationary.

  6. Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.

By studying appropriate graphs of the series in R, find an appropriate ARIMA(p, d, q) model for these data.

autoplot(wmurders)

The data are annual, so no seasonal differencing is needed, and the variation does not change much with the level, so no Box-Cox transformation is required; the trend, however, means the series must be differenced.

autoplot(diff(wmurders))

ndiffs(wmurders)
## [1] 2

ndiffs() indicates that two differences are needed to make the series stationary.

autoplot(diff(wmurders, differences = 2))

kpss.test(diff(wmurders, differences = 2))
## Warning in kpss.test(diff(wmurders, differences = 2)): p-value greater than
## printed p-value
## 
##  KPSS Test for Level Stationarity
## 
## data:  diff(wmurders, differences = 2)
## KPSS Level = 0.045793, Truncation lag parameter = 3, p-value = 0.1

We can see that using differences = 2 makes the data stationary (the KPSS test fails to reject stationarity).

diff(wmurders, differences = 2) %>% ggtsdisplay()

In the display of the twice-differenced series, the ACF cuts off quickly while the PACF tails off, suggesting an MA term on the twice-differenced data; ARIMA(0, 2, 2) is therefore a reasonable choice.

Should you include a constant in the model? Explain.

No. The model uses two differences (d = 2), and including a constant when d = 2 would introduce a quadratic trend into the long-term forecasts, which is not plausible for this series, so the constant is excluded.

Write this model in terms of the backshift operator.

\[(1 - B)^2 y_t = (1 + \theta_1 B + \theta_2 B^2)\varepsilon_t\]

Fit the model using R and examine the residuals. Is the model satisfactory?

wmurders.ARMA2.2 <- Arima(wmurders, order = c(0, 2, 2))
checkresiduals(wmurders.ARMA2.2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
## 
## Model df: 2.   Total lags used: 10

The Ljung-Box p-value (0.1621) is above 0.05, so the residuals appear to be a white noise series and the model is satisfactory.

Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
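Rearranging the backshift form above gives the recursion used in the hand calculation, with future errors set to zero and past errors replaced by residuals: \[\hat{y}_{T+h} = 2y_{T+h-1} - y_{T+h-2} + \theta_1\varepsilon_{T+h-1} + \theta_2\varepsilon_{T+h-2}\] where values of \(y\) beyond time \(T\) are replaced by their forecasts.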

FC.wmurders.ARMA2.2 <- forecast(wmurders.ARMA2.2, h = 3)
FC.wmurders.ARMA2.2$mean
## Time Series:
## Start = 2005 
## End = 2007 
## Frequency = 1 
## [1] 2.480525 2.374890 2.269256
FC.wmurders.ARMA2.2$model
## Series: wmurders 
## ARIMA(0,2,2) 
## 
## Coefficients:
##           ma1     ma2
##       -1.0181  0.1470
## s.e.   0.1220  0.1156
## 
## sigma^2 estimated as 0.04702:  log likelihood=6.03
## AIC=-6.06   AICc=-5.57   BIC=-0.15
years <- length(wmurders)
e <- FC.wmurders.ARMA2.2$residuals
# h = 1: both error terms are known residuals
fc1 <- 2*wmurders[years] - wmurders[years - 1] - 1.0181*e[years] + 0.1470*e[years - 1]
# h = 2: epsilon_{T+1} = 0, so only the theta_2 * e_T term remains
fc2 <- 2*fc1 - wmurders[years] + 0.1470*e[years]
# h = 3: all remaining error terms are zero
fc3 <- 2*fc2 - fc1
c(fc1, fc2, fc3)
## [1] 2.480523 2.374887 2.269252

The hand-calculated forecasts agree with the forecast() output to four decimal places; the small discrepancy comes from rounding the MA coefficients.

Create a plot of the series with forecasts and prediction intervals for the next three periods shown.

autoplot(FC.wmurders.ARMA2.2)

Does auto.arima() give the same model you have chosen? If not, which model do you think is better?

FC.wmurders_autoarima <- forecast(auto.arima(wmurders), h = 3)
accuracy(FC.wmurders.ARMA2.2)
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.0113461 0.2088162 0.1525773 -0.2403396 4.331729 0.9382785
##                     ACF1
## Training set -0.05094066
accuracy(FC.wmurders_autoarima)
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
##                    ACF1
## Training set 0.02176343

auto.arima() selects ARIMA(1, 2, 1) by default. The training-set accuracy measures are nearly identical (ARIMA(1, 2, 1) has slightly lower RMSE, ARIMA(0, 2, 2) slightly lower MAE and MASE), so the hand-chosen ARIMA(0, 2, 2) is retained.

FC.wmurders_autoarima2 <- forecast(auto.arima(wmurders, stepwise = FALSE, approximation = FALSE), h = 3)
accuracy(FC.wmurders_autoarima2)
##                       ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.01336585 0.2016929 0.1531053 -0.3332051 4.387024 0.9415259
##                     ACF1
## Training set -0.03193856

With stepwise = FALSE and approximation = FALSE, auto.arima() selects ARIMA(0, 2, 3); its training errors show no meaningful improvement over ARIMA(0, 2, 2).

ggtsdisplay(diff(wmurders, differences = 2))

The ACF/PACF display of the twice-differenced series supports ARIMA(0, 2, 2) rather than ARIMA(0, 2, 3), as there is little evidence for a third MA term.

checkresiduals(FC.wmurders.ARMA2.2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
## 
## Model df: 2.   Total lags used: 10
checkresiduals(FC.wmurders_autoarima2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
## 
## Model df: 3.   Total lags used: 10

Looking at the residual diagnostics, both models pass the Ljung-Box test, so the simpler ARIMA(0, 2, 2) remains the preferred model.
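Since both models pass the residual checks, their AICc values (both fitted with d = 2, hence directly comparable) provide one more tiebreaker; a quick sketch:

# compare AICc of the two candidate models
c(ARIMA_022 = wmurders.ARMA2.2$aicc,
  ARIMA_023 = FC.wmurders_autoarima2$model$aicc)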