From the Hyman text book, perform exercises: ‘8.1, 8.2, 8.3, 8.5, 8.6, 8.7’
Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Explain the differences among these figures. Do they all indicate that the data are white noise?
Why are the critical values at different distances from the mean of zero?
Why are the autocorrelations different in each figure when they each refer to white noise?
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ‘ibmclose’). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differencing.
autoplot(ibmclose)
ggtsdisplay(ibmclose)
ndiffs(ibmclose)
## [1] 1
ggtsdisplay(ibmclose %>% log() %>% diff(1))
ibmclose %>% ur.kpss()%>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 3.6236
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ibmclose %>% diff() %>% ur.kpss()%>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 5 lags.
##
## Value of test-statistic is: 0.4702
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
ndiff test indicates a single differnce should be usedFor the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
usnetelecautoplot(usnetelec)
ggtsdisplay(usnetelec)
autoplot(BoxCox(usnetelec, (BoxCox.lambda(usnetelec))))
ndiffs(usnetelec)
## [1] 1
ggtsdisplay(diff(usnetelec))
Box.test(diff(usnetelec))
##
## Box-Pierce test
##
## data: diff(usnetelec)
## X-squared = 0.80522, df = 1, p-value = 0.3695
ggtsdisplay(usnetelec %>% sqrt() %>% diff(1))
Box.test(sqrt(diff(usnetelec)))
## Warning in sqrt(diff(usnetelec)): NaNs produced
##
## Box-Pierce test
##
## data: sqrt(diff(usnetelec))
## X-squared = 3.5219, df = 1, p-value = 0.06056
usgdpautoplot(usgdp)
ggtsdisplay(usgdp)
autoplot(BoxCox(usgdp, (BoxCox.lambda(usgdp))))
ndiffs(usgdp)
## [1] 2
ggtsdisplay(diff(usgdp))
Box.test(diff(usgdp))
##
## Box-Pierce test
##
## data: diff(usgdp)
## X-squared = 38.693, df = 1, p-value = 4.959e-10
ggtsdisplay(usgdp %>% BoxCox(BoxCox.lambda(usgdp)) %>% diff(1))
usgdp %>% BoxCox(BoxCox.lambda(usgdp)) %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.2013
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
mcopperautoplot(mcopper)
ggtsdisplay(mcopper)
ndiffs(mcopper)
## [1] 1
autoplot(BoxCox(mcopper, (BoxCox.lambda(mcopper))))
bc.lambda <- BoxCox.lambda(mcopper)
mcopper_lam <- mcopper %>% BoxCox(bc.lambda)
ggtsdisplay(mcopper_lam)
ggtsdisplay(diff(mcopper))
Box.test(diff(mcopper))
##
## Box-Pierce test
##
## data: diff(mcopper)
## X-squared = 38.705, df = 1, p-value = 4.93e-10
bc.lambda <- BoxCox.lambda(mcopper)
mcopper_lam <- mcopper %>% BoxCox(bc.lambda) %>% diff(1)
ggtsdisplay(mcopper_lam)
Box.test(mcopper_lam)
##
## Box-Pierce test
##
## data: mcopper_lam
## X-squared = 57.212, df = 1, p-value = 3.908e-14
mcopper_lam %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 6 lags.
##
## Value of test-statistic is: 0.0573
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
enplanementsautoplot(enplanements)
ggtsdisplay(enplanements)
ndiffs(enplanements)
## [1] 1
autoplot(BoxCox(enplanements, (BoxCox.lambda(enplanements))))
ggtsdisplay(BoxCox(enplanements, BoxCox.lambda(enplanements)))
ggtsdisplay(diff(enplanements))
Box.test(diff(enplanements))
##
## Box-Pierce test
##
## data: diff(enplanements)
## X-squared = 12.811, df = 1, p-value = 0.0003446
lambda <- BoxCox.lambda(enplanements)
transformdata <- BoxCox(enplanements, lambda)
transformdata %>% diff(lag = frequency(enplanements)) %>% diff() %>% ggtsdisplay()
visitorsautoplot(visitors)
ggtsdisplay(visitors)
ndiffs(visitors)
## [1] 1
autoplot(BoxCox(visitors, (BoxCox.lambda(visitors))))
ggtsdisplay(BoxCox(visitors, BoxCox.lambda(visitors)))
ggtsdisplay(diff(visitors))
Box.test(diff(visitors))
##
## Box-Pierce test
##
## data: diff(visitors)
## X-squared = 28.739, df = 1, p-value = 8.283e-08
transformdata %>% diff() %>% diff(lag = frequency(visitors)) %>% ggtsdisplay()
ggtsdisplay(visitors %>% BoxCox(BoxCox.lambda(visitors)) %>% diff(12) %>% diff(1))
Box.test(visitors %>% BoxCox(BoxCox.lambda(visitors)) %>% diff(12) %>% diff(1),lag=24, fitdf=0, type="Lj")
##
## Box-Ljung test
##
## data: visitors %>% BoxCox(BoxCox.lambda(visitors)) %>% diff(12) %>% diff(1)
## X-squared = 121.61, df = 24, p-value = 4.996e-15
visitors %>% BoxCox(BoxCox.lambda(visitors)) %>% diff(12) %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0158
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retaildata <- read_excel('retail.xlsx', skip=1)
myts <- ts(retaildata[,"A3349398A"],frequency=12, start=c(1982,4))
autoplot(myts)
ggseasonplot(myts, polar = TRUE)
ggseasonplot(myts)
ggtsdisplay(myts)
autoplot(BoxCox(myts, (BoxCox.lambda(myts))))
ggtsdisplay(BoxCox(myts, BoxCox.lambda(myts)))
ggtsdisplay(usgdp %>% BoxCox(BoxCox.lambda(myts)) %>% diff(1))
usgdp %>% BoxCox(BoxCox.lambda(myts)) %>% diff(1) %>% ur.kpss() %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 4 lags.
##
## Value of test-statistic is: 0.0278
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
Use R to simulate and plot some data from simple ARIMA models. ### a. Use the following R code to generate data from an AR(1) model with \(\phi _{1} = 0.6\) and \(\sigma ^{2} = 1\) . The process starts with \(y_{1} = 0\).
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
autoplot(y)
Produce a time plot for the series. How does the plot change as you change \(\phi _{1}\)?
ar <- function(theta){
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
y[i] <- 0.6*y[i-1] + e[i]}
return(y)
}
set.seed(346)
data <- list()
i<- 0
phi_values <- runif(2, min = 0.0001, max = .9)
for (phi in phi_values)
{
i<- i+1
data[[i]] <- ar(phi)
}
data2 <- do.call(cbind,data)
colnames(data2) <- paste("phi=", phi_values, sep = '')
autoplot(data2) + ylab('Data')
autoplot(data2, series = "b values")+
autolayer(y, series = "a values")+
ylab('Data')
## Warning: Ignoring unknown parameters: series
b_data_low <- data2[,1]
b_data_high <- data2[,2]
par(mfrow=c(1,3))
acf(y, main="A data");
acf(b_data_low, main = "B low");
acf(b_data_high, main = "B high")
Write your own code to generate data from an MA(1) model with \(\Theta _{1} = 0.6\) and \(\sigma ^{2} = 1\).
ma <- function(theta){
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
y[i] <- theta*e[i-1] + e[i]}
return(y)
}
matest <- ma(0.6)
autoplot(matest)
Produce a time plot for the series. How does the plot change as you change \(\phi _{1}\)?
ma_2 <- ma(.1)
ma_3 <- ma(1)
ma_4 <- ma(4)
autoplot(matest, series = "Theta 0.6")+
autolayer(ma_2, series = "Theta 0.1")+
autolayer(ma_3, series = "Theta 1")+
autolayer(ma_4, series = "Theta 4")+
ylab('Data')
autoplot(cbind(matest, ma_2, ma_3, ma_4), facet = TRUE)+
ylab('Data')
Generate data from an ARMA(1,1) model with \(\phi _{1} = 0.6\), \(\Theta _{1} = 0.6\) and \(\sigma ^{2} = 1\).
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100){
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]}
autoplot(y)
Generate data from an AR(2) model with \(\phi _{1} = -0.8\), _{2} = 0.3 and \(\sigma ^{2} = 1\). (Note that these parameters will give a non-stationary series.)
y2 <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100){
y2[i] <- (-0.8)*y2[i-1] + 0.3*y2[i-2] + e[i]
}
autoplot(y2)
Graph the latter two series and compare them.
ggAcf(y)
ggAcf(y2)
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
By studying appropriate graphs of the series in R, find an appropriate ARIMA(p, d, q) model for these data.
wmurders %>%
ggtsdisplay()
wmurders %>% ndiffs()
## [1] 2
wmurders %>% diff() %>% ur.kpss(lags = "short") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.4697
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
wmurders %>% diff() %>% ggtsdisplay()
wmurders %>% diff() %>% diff() %>% ur.kpss(lags = "short") %>% summary()
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.0458
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
wmurders %>% diff() %>% diff() %>% ggtsdisplay()
fit <- Arima(wmurders, c(1,1,0))
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,0)
## Q* = 13.281, df = 9, p-value = 0.1503
##
## Model df: 1. Total lags used: 10
fit
## Series: wmurders
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.0841
## s.e. 0.1346
##
## sigma^2 estimated as 0.04616: log likelihood=6.92
## AIC=-9.85 AICc=-9.61 BIC=-5.87
fit2 <- Arima(wmurders, c(0,1,2))
checkresiduals(fit2)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,2)
## Q* = 9.7748, df = 8, p-value = 0.2812
##
## Model df: 2. Total lags used: 10
fit2
## Series: wmurders
## ARIMA(0,1,2)
##
## Coefficients:
## ma1 ma2
## -0.0660 0.3712
## s.e. 0.1263 0.1640
##
## sigma^2 estimated as 0.0422: log likelihood=9.71
## AIC=-13.43 AICc=-12.95 BIC=-7.46
Should you include a constant in the model? Explain.
Write this model in terms of the backshift operator.
\((1-\phi _{1} B-\phi _{2}B^{2})y_{t}=c+\varepsilon _{t}\)
Fit the model using R and examine the residuals. Is the model satisfactory?
Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
fit
## Series: wmurders
## ARIMA(1,1,0)
##
## Coefficients:
## ar1
## -0.0841
## s.e. 0.1346
##
## sigma^2 estimated as 0.04616: log likelihood=6.92
## AIC=-9.85 AICc=-9.61 BIC=-5.87
wm3 <- forecast(fit, h=3)
wm3
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 2005 2.595506 2.320177 2.870835 2.174427 3.016585
## 2006 2.594992 2.221624 2.968359 2.023976 3.166007
## 2007 2.595035 2.143387 3.046682 1.904300 3.285770
res <- resid(fit)
len <- length(res)
len_1 <- length(res)-1
et <- res[len]
et_1 <- res[len-1]
ar1 <- coef(fit)["ar1"]
f1 <- wmurders[len] + (ar1 * (wmurders[len]-wmurders[len_1]))-0.0000025
f2 <- f1 + (ar1 * (f1 - wmurders[len]))
f3 <- f1 + (ar1 * (f2 - f1))
print(paste('1st Forecast is: ', f1))
## [1] "1st Forecast is: 2.59550371529307"
print(paste('2nd Forecast is: ', f2))
## [1] "2nd Forecast is: 2.59498921227996"
print(paste('3rd Forecast is: ', f3))
## [1] "3rd Forecast is: 2.59554696405369"
Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(forecast(wm3))
Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
auto_fit <- auto.arima(wmurders, stepwise = FALSE, approximation = FALSE)
autoplot(forecast(auto_fit, 3))