Explain the differences among these figures. Do they all indicate that the data are white noise?
The figures are all ACF plots of white noise series of different lengths. In each one the autocorrelations fluctuate around zero and essentially all spikes lie within the dashed critical-value bounds, so all of them are consistent with white noise.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The formula for the critical values is \[\pm \frac{1.96}{\sqrt{T-d}}\] where \(T\) is the sample size and \(d\) is the lag.
For the longest series, T = 1000 and d = 1, giving critical values of about ±0.062. As the sample size increases, the threshold for a significant autocorrelation shrinks, which is why the dashed bounds sit at different distances from zero in each figure.
The formula for the sample autocorrelation is \[r_k = \frac{\sum_{t=k+1}^{T}(y_t-\bar{y})(y_{t-k}-\bar{y})}{\sum_{t=1}^{T}(y_t-\bar{y})^2}\] Because each figure is computed from a different random sample, the estimated \(r_k\) values differ across figures even though the true autocorrelations are all zero; longer series give estimates closer to zero.
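A minimal sketch of both effects, assuming the three series in the exercise have lengths 36, 360, and 1000:
set.seed(42)                    # arbitrary seed so the sketch is reproducible
1.96 / sqrt(c(36, 360, 1000))   # bounds: 0.327, 0.103, 0.062
ggAcf(ts(rnorm(1000)))          # roughly 95% of spikes fall within +-0.062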
library(fpp2)
ggtsdisplay(ibmclose)
Per 8.1, “Stationarity and Differencing”, a stationary time series is one whose properties do not depend on the time at which the series is observed. Thus, time series with trends, or with seasonality, are not stationary: the trend and seasonality will affect the value of the time series at different times. The ibmclose plots show a wandering level and an ACF that decays very slowly, both hallmarks of non-stationarity, so the series should be differenced before modelling.
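As a quick confirmation (a sketch, not part of the original answer), differencing ibmclose once should remove most of the autocorrelation:
# After one difference the series should fluctuate around a constant
# level and most ACF spikes should fall inside the bounds.
ggtsdisplay(diff(ibmclose))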
For each of the following series, find an appropriate Box-Cox transformation and order of differencing to obtain stationary data: usnetelec, usgdp, mcopper, enplanements, visitors.
Solving for usnetelec - Annual US net electricity generation
#help("usnetelec")
head(usnetelec)
## Time Series:
## Start = 1949
## End = 1954
## Frequency = 1
## [1] 296.1 334.1 375.3 403.8 447.0 476.3
summary(usnetelec)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 296.1 889.0 2040.9 1972.1 3002.7 3858.5
autoplot(usnetelec)
The data increase roughly linearly, so a single first difference should remove the trend.
Box.test(diff(usnetelec), type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(usnetelec)
## X-squared = 0.8508, df = 1, p-value = 0.3563
The Ljung-Box p-value (0.356) gives no evidence against white noise, so the first-differenced series appears stationary.
library(tseries)
kpss.test(diff(usnetelec))
##
## KPSS Test for Level Stationarity
##
## data: diff(usnetelec)
## KPSS Level = 0.15848, Truncation lag parameter = 3, p-value = 0.1
The KPSS test (p-value reported as at least 0.1) fails to reject level stationarity, confirming that usnetelec is stationary after one first difference.
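The same conclusion can be cross-checked with ndiffs(), which applies unit-root tests repeatedly to estimate the required differencing order; a minimal sketch:
ndiffs(usnetelec)   # expected to agree with the KPSS result above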
Solving for usgdp - Quarterly US GDP, 1947:Q1 - 2006:Q1
#help("usgdp")
head(usgdp)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1947 1570.5 1568.7 1568.0 1590.9
## 1948 1616.1 1644.6
summary(usgdp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1568 2632 4552 5168 7130 11404
autoplot(usgdp)
The data increase roughly linearly; at first glance a single first difference looks sufficient.
Box.test(diff(usgdp), type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(usgdp)
## X-squared = 39.187, df = 1, p-value = 3.85e-10
The first-differenced usgdp data cannot be considered white noise (p-value ≈ 3.9e-10), so more differencing may be required.
library(tseries)
autoplot(diff(usgdp))
ndiffs(usgdp)
## [1] 2
autoplot(diff(diff(usgdp)))
Box.test(diff(diff(usgdp)), type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(diff(usgdp))
## X-squared = 53.294, df = 1, p-value = 2.872e-13
ggAcf(diff(diff(usgdp)))
kpss.test(diff(diff(usgdp)))
## Warning in kpss.test(diff(diff(usgdp))): p-value greater than printed p-
## value
##
## KPSS Test for Level Stationarity
##
## data: diff(diff(usgdp))
## KPSS Level = 0.020988, Truncation lag parameter = 4, p-value = 0.1
The KPSS test shows that differencing twice (no Box-Cox transformation was applied here) makes the usgdp data stationary.
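The code above differenced the raw series; if a Box-Cox transformation were applied first, as the exercise suggests, a sketch of that variant would be:
lambda_usgdp <- BoxCox.lambda(usgdp)   # lambda estimated from the data
usgdp_bc <- BoxCox(usgdp, lambda_usgdp)
ndiffs(usgdp_bc)                       # differencing order after transforming
kpss.test(diff(usgdp_bc, differences = ndiffs(usgdp_bc)))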
Solving for mcopper - Monthly copper prices
#help("mcopper")
head(mcopper)
## Jan Feb Mar Apr May Jun
## 1960 255.2 259.7 249.3 258.0 244.3 246.8
summary(mcopper)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 216.6 566.0 949.2 997.8 1262.5 4306.0
autoplot(mcopper)
lambda_mcopper <- BoxCox.lambda(mcopper)
autoplot(diff(BoxCox(mcopper, lambda_mcopper)))
Box.test(diff(BoxCox(mcopper, lambda_mcopper)), type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(mcopper, lambda_mcopper))
## X-squared = 57.517, df = 1, p-value = 3.353e-14
ggAcf(diff(BoxCox(mcopper, lambda_mcopper)))
kpss.test(diff(BoxCox(mcopper, lambda_mcopper)))
## Warning in kpss.test(diff(BoxCox(mcopper, lambda_mcopper))): p-value
## greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(BoxCox(mcopper, lambda_mcopper))
## KPSS Level = 0.057275, Truncation lag parameter = 6, p-value = 0.1
The KPSS test shows that a first difference of the Box-Cox transformed series makes mcopper stationary.
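Since mcopper is monthly, it is worth confirming that no seasonal difference is needed; a quick check (a result of 0 would support using only a first difference):
nsdiffs(BoxCox(mcopper, lambda_mcopper))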
Solving for enplanements - Monthly US domestic enplanements
#help("enplanements")
head(enplanements)
## Jan Feb Mar Apr May Jun
## 1979 21.12 22.92 25.90 24.38 23.41 26.82
summary(enplanements)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 20.14 27.18 34.88 35.67 42.78 56.14
autoplot(enplanements)
lambda_enplanements <- BoxCox.lambda(enplanements)
ndiffs(enplanements)
## [1] 1
autoplot(diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12)))
Box.test(diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12)), type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## X-squared = 29.562, df = 1, p-value = 5.417e-08
ggAcf(diff(diff(BoxCox(enplanements, lambda_enplanements),lag = 12)))
kpss.test(diff(diff(BoxCox(enplanements, lambda_enplanements),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(enplanements, lambda_enplanements), :
## p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(diff(BoxCox(enplanements, lambda_enplanements), lag = 12))
## KPSS Level = 0.042424, Truncation lag parameter = 5, p-value = 0.1
The KPSS test shows that one seasonal difference (lag 12) followed by one first difference of the Box-Cox transformed series makes enplanements stationary.
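ndiffs() counts only first differences; the lag-12 difference applied above can be justified with nsdiffs() (a value of 1 would support it):
nsdiffs(BoxCox(enplanements, lambda_enplanements))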
Solving for visitors - Monthly Australian overseas visitors
#help("visitors")
head(visitors)
## May Jun Jul Aug Sep Oct
## 1985 75.7 75.4 83.1 82.9 77.3 105.7
summary(visitors)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 75.4 189.2 303.1 288.2 378.7 593.1
autoplot(visitors)
lambda_visitors <- BoxCox.lambda(visitors)
ndiffs(visitors)
## [1] 1
autoplot(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))
Box.test(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)),type = "Ljung-Box")
##
## Box-Ljung test
##
## data: diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## X-squared = 21.804, df = 1, p-value = 3.02e-06
ggAcf(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))
kpss.test(diff(diff(BoxCox(visitors, lambda_visitors),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(visitors, lambda_visitors), lag =
## 12))): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(diff(BoxCox(visitors, lambda_visitors), lag = 12))
## KPSS Level = 0.015833, Truncation lag parameter = 4, p-value = 0.1
The KPSS test shows that one seasonal difference (lag 12) followed by one first difference of the Box-Cox transformed series makes visitors stationary.
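The same transform-difference-test pipeline was repeated for mcopper, enplanements, and visitors; a small hypothetical helper (not part of the original analysis) makes the pattern explicit:
# Hypothetical helper: Box-Cox transform, one seasonal and one first
# difference, then a KPSS test on the result.
make_stationary_check <- function(x, seasonal_lag = 12) {
  lam <- BoxCox.lambda(x)
  dx <- diff(diff(BoxCox(x, lam), lag = seasonal_lag))
  kpss.test(dx)
}
make_stationary_check(visitors)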
retaildata <- readxl::read_excel("C:\\NITEEN\\GitHub\\CUNY_DATA_624\\Dataset\\retail.xlsx", skip=1)
#A3349661X
myts <- ts(retaildata$A3349661X, start = c(1982,4),frequency = 12)
autoplot(myts) + ggtitle("Retail Data Time Series")
ndiffs(myts)
## [1] 1
nsdiffs(myts)
## [1] 1
kpss.test(diff(diff(BoxCox(myts, BoxCox.lambda(myts)),lag = 12)))
## Warning in kpss.test(diff(diff(BoxCox(myts, BoxCox.lambda(myts)), lag =
## 12))): p-value greater than printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(diff(BoxCox(myts, BoxCox.lambda(myts)), lag = 12))
## KPSS Level = 0.022485, Truncation lag parameter = 5, p-value = 0.1
To make the retail data stationary, a Box-Cox transformation, one seasonal difference (lag 12), and one first difference were applied; the KPSS test then fails to reject level stationarity.
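As a visual check (a sketch using the same transformation as the test above), the fully differenced retail series can be plotted:
autoplot(diff(diff(BoxCox(myts, BoxCox.lambda(myts)), lag = 12))) +
  ggtitle("Retail Data after Box-Cox, Seasonal and First Differencing")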
Use the following R code to generate data from an AR(1) model with \(\phi_1 = 0.6\) and \(\sigma^2 = 1\). The process starts with \(y_1 = 0\).
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
y[i] <- 0.6*y[i-1] + e[i]
}
Produce a time plot for the series. How does the plot change as you change \(\phi_1\)?
AR1model <- function(phi1){
  # Simulate 100 observations from an AR(1): y_t = phi1*y_{t-1} + e_t,
  # with e_t ~ N(0, 1) and y_1 = 0.
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- phi1*y[i-1] + e[i]
  }
  return(y)
}
autoplot(AR1model(0.3), series = "0.3") +
geom_line(size = 1, colour = "red") +
autolayer(y, series = "0.6", size = 1) +
autolayer(AR1model(0.9), size = 1, series = "0.9") +
ylab("AR(1) models") +
guides(colour = guide_legend(title = "Phi1"))
As \(\phi_1\) increases, the variation in \(y\) increases and successive values become more strongly correlated, so the series wanders further from zero.
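The boundary cases make the effect clearer; a sketch with illustrative values of \(\phi_1\):
# phi1 = 0 reduces to white noise; phi1 near 1 behaves like a random
# walk and wanders far from zero; negative phi1 oscillates around zero.
autoplot(AR1model(0), series = "0") +
autolayer(AR1model(0.98), series = "0.98") +
autolayer(AR1model(-0.8), series = "-0.8") +
ylab("AR(1) boundary cases") +
guides(colour = guide_legend(title = "Phi1"))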
Write your own code to generate data from an MA(1) model with \(\theta_1 = 0.6\) and \(\sigma^2 = 1\).
MA1Model <- function(theta1){
  # Simulate 100 observations from an MA(1): y_t = theta1*e_{t-1} + e_t,
  # with e_t ~ N(0, 1).
  y <- ts(numeric(100))
  e <- rnorm(100)
  for(i in 2:100){
    y[i] <- theta1*e[i-1] + e[i]
  }
  return(y)
}
Produce a time plot for the series. How does the plot change as you change \(\theta_1\)?
autoplot(MA1Model(0.3), series = "0.3") +
geom_line(size = 1, colour = "red") +
autolayer(MA1Model(0.6), series = "0.6", size = 1) +
autolayer(MA1Model(0.9), size = 1, series = "0.9") +
ylab("MA(1) models") +
guides(colour = guide_legend(title = "Theta1"))
As \(\theta_1\) increases, the variation in \(y\) increases, though the effect is milder than in the AR(1) case: the variance of an MA(1) process is \((1+\theta_1^2)\sigma^2\).
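A sketch verifying the MA(1) variance empirically (seed and simulation length are arbitrary):
set.seed(123)
theta1 <- 0.6
e <- rnorm(1e5)
y <- theta1 * head(e, -1) + tail(e, -1)    # y_t = theta1*e_{t-1} + e_t
c(sample = var(y), theory = 1 + theta1^2)  # both near 1.36 when sigma^2 = 1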
Generate data from an ARMA(1,1) model with \(\phi_{1} = 0.6\), \(\theta_{1} = 0.6\) and \(\sigma^2 = 1\).
ARMA.1.1 <- ts(numeric(50))
e <- rnorm(50)
# ARMA(1, 1): y_t = 0.6*y_{t-1} + 0.6*e_{t-1} + e_t
for(i in 2:50){
  ARMA.1.1[i] <- 0.6*ARMA.1.1[i-1] + 0.6*e[i-1] + e[i]
}
Generate data from an AR(2) model with \(\phi_{1} = -0.8\), \(\phi_{2} = 0.3\) and \(\sigma^2 = 1\). (Note that these parameters will give a non-stationary series.)
AR2 <- ts(numeric(50))
e <- rnorm(50)
# AR(2): y_t = -0.8*y_{t-1} + 0.3*y_{t-2} + e_t (non-stationary parameters)
for(i in 3:50){
  AR2[i] <- -0.8*AR2[i-1] + 0.3*AR2[i-2] + e[i]
}
Graph the latter two series and compare them.
autoplot(ARMA.1.1, series = "ARMA(1, 1)") +
autolayer(AR2, series = "AR(2)") +
ylab("y") +
guides(colour = guide_legend(title = "Models"))
autoplot(ARMA.1.1)
The AR(2) series oscillates with rapidly growing amplitude, confirming that these parameters give a non-stationary process, while the ARMA(1, 1) series fluctuates around a stable mean and appears stationary (this is clearer in the separate plot, since the exploding AR(2) series dominates the shared scale).
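The non-stationarity can be verified from the AR characteristic polynomial \(1 - \phi_1 B - \phi_2 B^2\): stationarity requires all roots to lie outside the unit circle. A quick check:
# Roots of 1 + 0.8B - 0.3B^2; a modulus below 1 means a root inside
# the unit circle, i.e. a non-stationary process.
Mod(polyroot(c(1, 0.8, -0.3)))   # approximately 0.93 and 3.59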
By studying appropriate graphs of the wmurders series (annual US female murder rate) in R, find an appropriate ARIMA(p, d, q) model for these data.
autoplot(wmurders)
The data are annual, so no seasonal differencing is needed, and the variation does not grow with the level, so no Box-Cox transformation is required.
autoplot(diff(wmurders))
ndiffs(wmurders)
## [1] 2
It appears that two differences are needed for stationarity (ndiffs returns 2).
autoplot(diff(wmurders, differences = 2))
kpss.test(diff(wmurders, differences = 2))
## Warning in kpss.test(diff(wmurders, differences = 2)): p-value greater than
## printed p-value
##
## KPSS Test for Level Stationarity
##
## data: diff(wmurders, differences = 2)
## KPSS Level = 0.045793, Truncation lag parameter = 3, p-value = 0.1
With differences = 2, the KPSS test fails to reject level stationarity, so the twice-differenced data are stationary.
diff(wmurders, differences = 2) %>% ggtsdisplay()
In the display, the ACF of the twice-differenced series has significant spikes only at the first lags while the PACF decays more gradually, which points to a moving-average model; ARIMA(0, 2, 2) is a reasonable choice.
Should you include a constant in the model? Explain.
A constant should be excluded: with d = 2, including a constant would induce a quadratic trend in the long-term forecasts, which is not plausible for this series.
Write this model in terms of the backshift operator.
\[(1 - B)^2 y_t = (1 + \theta_1 B + \theta_2 B^2)\varepsilon_t\]
Fit the model using R and examine the residuals. Is the model satisfactory?
wmurders.ARMA2.2 <- Arima(wmurders, order = c(0, 2, 2))
checkresiduals(wmurders.ARMA2.2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
##
## Model df: 2. Total lags used: 10
The residuals appear to be white noise: the Ljung-Box test gives a p-value of 0.16, so there is no evidence of remaining autocorrelation and the model is satisfactory.
Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
FC.wmurders.ARMA2.2 <- forecast(wmurders.ARMA2.2, h = 3)
FC.wmurders.ARMA2.2$mean
## Time Series:
## Start = 2005
## End = 2007
## Frequency = 1
## [1] 2.480525 2.374890 2.269256
FC.wmurders.ARMA2.2$model
## Series: wmurders
## ARIMA(0,2,2)
##
## Coefficients:
## ma1 ma2
## -1.0181 0.1470
## s.e. 0.1220 0.1156
##
## sigma^2 estimated as 0.04702: log likelihood=6.03
## AIC=-6.06 AICc=-5.57 BIC=-0.15
years <- length(wmurders)
e <- FC.wmurders.ARMA2.2$residuals
# ARIMA(0, 2, 2): y_t = 2*y_{t-1} - y_{t-2} + e_t + theta1*e_{t-1} + theta2*e_{t-2},
# with fitted theta1 = -1.0181 and theta2 = 0.1470; future errors are set to zero.
fc1 <- 2*wmurders[years] - wmurders[years - 1] - 1.0181*e[years] + 0.1470*e[years - 1]
fc2 <- 2*fc1 - wmurders[years] + 0.1470*e[years]
fc3 <- 2*fc2 - fc1
c(fc1, fc2, fc3)
## [1] 2.480523 2.374887 2.269252
The manual forecasts match the output of forecast() to four decimal places; the tiny discrepancies come from rounding the coefficients.
Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(FC.wmurders.ARMA2.2)
Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
FC.wmurders_autoarima <- forecast(auto.arima(wmurders), h = 3)
accuracy(FC.wmurders.ARMA2.2)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0113461 0.2088162 0.1525773 -0.2403396 4.331729 0.9382785
## ACF1
## Training set -0.05094066
accuracy(FC.wmurders_autoarima)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
## ACF1
## Training set 0.02176343
auto.arima() selects ARIMA(1, 2, 1). On training-set accuracy the two models are nearly indistinguishable: the hand-picked ARIMA(0, 2, 2) has slightly lower MAE and MASE, while ARIMA(1, 2, 1) has slightly lower RMSE.
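Since both candidates use d = 2, their information criteria are also directly comparable; a sketch using the fitted objects above:
# Lower AICc indicates the preferred model among same-d candidates.
c(ARIMA022 = wmurders.ARMA2.2$aicc,
  autoarima = FC.wmurders_autoarima$model$aicc)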
FC.wmurders_autoarima2 <- forecast(auto.arima(wmurders, stepwise = FALSE, approximation = FALSE), h = 3)
accuracy(FC.wmurders_autoarima2)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01336585 0.2016929 0.1531053 -0.3332051 4.387024 0.9415259
## ACF1
## Training set -0.03193856
With stepwise = FALSE and approximation = FALSE, auto.arima() selects ARIMA(0, 2, 3); its training errors show no meaningful improvement over ARIMA(0, 2, 2).
ggtsdisplay(diff(wmurders, differences = 2))
The ACF/PACF display of the twice-differenced data supports ARIMA(0, 2, 2) rather than ARIMA(0, 2, 3): the plots do not clearly justify a third MA term.
checkresiduals(FC.wmurders.ARMA2.2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,2)
## Q* = 11.764, df = 8, p-value = 0.1621
##
## Model df: 2. Total lags used: 10
checkresiduals(FC.wmurders_autoarima2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
##
## Model df: 3. Total lags used: 10
Looking at the residual diagnostics, both models pass the Ljung-Box test; since ARIMA(0, 2, 2) is simpler and fits essentially as well, it is preferred.