Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Explain the differences among these figures. Do they all indicate that the data are white noise?
In each figure the maximum lag h is 20, while the number of observations T increases. For the data to be white noise, we expect about 95% of the spikes in the ACF to lie within ±2/√T. All three series satisfy this criterion and are therefore consistent with white noise.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values are at different distances because they are computed as ±2/√T, so they shrink as the number of observations T grows. The autocorrelations themselves differ because each figure is a different random sample; by the law of large numbers, the sample autocorrelations get closer to their expected value of zero as T increases.
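A minimal sketch of this effect (the seed and loop are illustrative, not from the original code): simulate white noise at each length and note how the ±2/√T bounds narrow.
set.seed(1)  # illustrative seed
for (n in c(36, 360, 1000)) {
  # print() is needed to render ggplot objects inside a loop
  print(ggAcf(ts(rnorm(n))) +
          ggtitle(paste0("White noise, T = ", n, ", bounds ±", round(2/sqrt(n), 3))))
}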
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
In non-stationary data the ACF plot decays slowly, which we can clearly see here. The PACF plot shows the relationship between yt and yt−k after removing the effects of the intermediate lags. The first lag is expected to be above the critical threshold since it has nothing to remove, but the remaining lags are within the 95% threshold. Since the ACF is decaying and there is a significant spike at lag p in the PACF but none beyond lag p, the data may follow an ARIMA(p,d,0) model, and the series should be differenced before modelling.
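These are the calls behind the plots and the model selection (collected in the code appendix); auto.arima() confirms that one difference is needed:
autoplot(ibmclose)    # time plot wanders with no fixed mean
ggAcf(ibmclose)       # ACF decays very slowly
ggPacf(ibmclose)      # single dominant spike at lag 1
auto.arima(ibmclose)  # selects ARIMA(0,1,0), i.e. d = 1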
## Series: ibmclose
## ARIMA(0,1,0)
##
## sigma^2 estimated as 52.62: log likelihood=-1251.37
## AIC=2504.74 AICc=2504.75 BIC=2508.64
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Transform Function
transform <- function(x) {
  # Facet plot of the raw, logged, and differenced series
  compt <- cbind("x" = x,
                 "log" = log(x),
                 "seasonal diff" = diff(log(x), 12),
                 "double diff" = diff(diff(log(x), 12), 1)) %>%
    autoplot(facets = TRUE) +
    xlab("Year") + ylab("") +
    ggtitle(deparse(substitute(x)))  # was ggtitle(x), which fails for a ts
  x %>% diff() %>% ggtsdisplay(main = "")
  diff(diff(log(x), 12), 1) %>% ggtsdisplay(main = "")
  print(auto.arima(x))  # print() needed: results are not auto-printed inside a function
  print(Box.test(x))
  checkresiduals(auto.arima(x))
  compt
}
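The function is then applied to each series in turn; the residual checks below appear in the same order (calls from the code appendix):
transform(usnetelec)
transform(usgdp)
transform(mcopper)
transform(enplanements)
transform(visitors)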
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,2) with drift
## Q* = 2.7945, df = 5, p-value = 0.7316
##
## Model df: 5. Total lags used: 10
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,2,2)
## Q* = 8.6247, df = 4, p-value = 0.0712
##
## Model df: 4. Total lags used: 8
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 30.716, df = 23, p-value = 0.1299
##
## Model df: 1. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,1)(0,1,1)[12]
## Q* = 15.625, df = 20, p-value = 0.7396
##
## Model df: 4. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(0,1,2)[12] with drift
## Q* = 19.26, df = 19, p-value = 0.4403
##
## Model df: 5. Total lags used: 24
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
Using the transform function created earlier, auto.arima() selects ARIMA(1,0,2)(0,1,1)[12] with drift as the best model:
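The retail series is loaded and passed through the same function (calls from the code appendix):
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))
transform(myts)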
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,2)(0,1,1)[12] with drift
## Q* = 12.946, df = 19, p-value = 0.8413
##
## Model df: 5. Total lags used: 24
Use R to simulate and plot some data from simple ARIMA models.
Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
autoplot(y)
auto.arima(y)
## Series: y
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.5441 0.3848
## s.e. 0.0834 0.2357
##
## sigma^2 estimated as 1.206: log likelihood=-150.41
## AIC=306.83 AICc=307.08 BIC=314.64
fit1<-Arima(y, order=c(1,0,0))
fit1
## Series: y
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.5441 0.3848
## s.e. 0.0834 0.2357
##
## sigma^2 estimated as 1.206: log likelihood=-150.41
## AIC=306.83 AICc=307.08 BIC=314.64
autoplot(fit1)
Produce a time plot for the series. How does the plot change as you change ϕ1?
Increasing ϕ1 increases the persistence of the series: with ϕ1 near 0 the plot looks like white noise, while values near 1 produce long, trend-like wanderings. Re-running the loop with ϕ1 = 1 (see the code appendix) turns the series into a random walk.
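A quick sketch of this comparison (the helper function, seed, and ϕ1 values are illustrative): plot the same innovations under three values of ϕ1.
ar1_sim <- function(phi, e) {
  # simulate an AR(1) with the supplied innovations, starting from 0
  y <- numeric(length(e))
  for (i in 2:length(e)) y[i] <- phi*y[i-1] + e[i]
  ts(y)
}
e_sim <- rnorm(100)
autoplot(cbind("phi1 = 0.1" = ar1_sim(0.1, e_sim),
               "phi1 = 0.6" = ar1_sim(0.6, e_sim),
               "phi1 = 0.95" = ar1_sim(0.95, e_sim)),
         facets = TRUE)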
Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]  # MA(1): theta1 times the previous error, not the previous y
fit2<-Arima(y, order=c(0,0,1))
fit2
## Series: y
## ARIMA(0,0,1) with non-zero mean
##
## Coefficients:
## ma1 mean
## 0.4419 0.4027
## s.e. 0.0723 0.1629
##
## sigma^2 estimated as 1.31: log likelihood=-154.48
## AIC=314.97 AICc=315.22 BIC=322.78
autoplot(fit2)
Produce a time plot for the series. How does the plot change as you change θ1?
Changing θ1 affects only the short-run correlation: larger values of θ1 make neighbouring observations more alike, but the series shows no long-term trend and still looks stationary beyond lag 1.
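A matching sketch for the MA(1) case (reusing the illustrative e_sim from the AR(1) sketch above):
ma1_sim <- function(theta, e) {
  # simulate an MA(1) from the supplied innovations
  y <- numeric(length(e))
  for (i in 2:length(e)) y[i] <- theta*e[i-1] + e[i]
  ts(y)
}
autoplot(cbind("theta1 = 0.1" = ma1_sim(0.1, e_sim),
               "theta1 = 0.9" = ma1_sim(0.9, e_sim)),
         facets = TRUE)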
Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1
for(i in 2:100)
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]  # ARMA(1,1): AR term plus MA term
fit3<-Arima(y, order=c(1,0,1), include.constant = TRUE)
fit3
## Series: y
## ARIMA(1,0,1) with non-zero mean
##
## Coefficients:
## ar1 ma1 mean
## 0.5154 0.0405 0.3867
## s.e. 0.1484 0.1679 0.2310
##
## sigma^2 estimated as 1.217: log likelihood=-150.39
## AIC=308.77 AICc=309.19 BIC=319.19
autoplot(fit3)
plot1<-autoplot(y)
Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]  # AR(2): phi1 = -0.8 on lag 1, phi2 = 0.3 on lag 2
fit4<-Arima(y, order=c(2,0,1), include.constant = TRUE)
fit4
## Series: y
## ARIMA(2,0,1) with non-zero mean
##
## Coefficients:
## ar1 ar2 ma1 mean
## 1.9995 -0.9995 0.9753 21381.77
## s.e. 0.0007 0.0007 0.0369 NaN
##
## sigma^2 estimated as 3008: log likelihood=-546.48
## AIC=1102.95 AICc=1103.59 BIC=1115.98
autoplot(fit4)
plot2<-autoplot(y)
Graph the latter two series and compare them.
The ARMA(1,1) series is stationary, with no particular trend. The AR(2) series is non-stationary: with ϕ1 = −0.8 and ϕ2 = 0.3 the stationarity conditions for an AR(2) are violated, and the series oscillates with steadily increasing amplitude.
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
The ACF plot shows the lags declining, so we should use an ARIMA(p,d,0) model. auto.arima() selects ARIMA(1,2,1) as the model:
## Series: twm
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
The plots of the series with log, single, and double differences show that one or two differences would stabilise the data. ndiffs() and nsdiffs() confirm there should be 2 differences and 0 seasonal differences respectively; the KPSS tests below are for the original and the once-differenced series.
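Commands behind the test output and difference counts (from the code appendix):
twm %>% ur.kpss() %>% summary()        # original series: test statistic 0.6331
diff(twm) %>% ur.kpss() %>% summary()  # after one difference: 0.4697
ndiffs(twm)    # returns 2
nsdiffs(twm)   # returns 0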
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.6331
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.4697
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
## [1] 2
## [1] 0
Fitting candidate models with residual checks shows that fit1, ARIMA(1,1,2), has the lowest AIC.
## Series: log(twm)
## ARIMA(1,1,2)
##
## Coefficients:
## ar1 ma1 ma2
## 0.1014 -0.1729 0.4023
## s.e. 0.2761 0.2479 0.1911
##
## sigma^2 estimated as 0.003469: log likelihood=77.67
## AIC=-147.35 AICc=-146.53 BIC=-139.39
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,2)
## Q* = 8.7524, df = 8, p-value = 0.3636
##
## Model df: 3. Total lags used: 11
## Series: log(twm)
## ARIMA(1,2,2)
##
## Coefficients:
## ar1 ma1 ma2
## -0.7427 -0.3057 -0.4482
## s.e. 0.3341 0.4080 0.3919
##
## sigma^2 estimated as 0.003599: log likelihood=74.67
## AIC=-141.34 AICc=-140.5 BIC=-133.46
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,2)
## Q* = 8.0787, df = 8, p-value = 0.4258
##
## Model df: 3. Total lags used: 11
## Series: log(twm)
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2954 -0.7892
## s.e. 0.1533 0.1190
##
## sigma^2 estimated as 0.003613: log likelihood=74.09
## AIC=-142.17 AICc=-141.68 BIC=-136.26
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 10.73, df = 9, p-value = 0.2947
##
## Model df: 2. Total lags used: 11
Should you include a constant in the model? Explain.
When refitting the selected fit1 with include.constant = TRUE, the AIC is slightly higher with the constant included (−145.35 vs −147.35). Therefore, we should NOT include a constant.
fit4<- Arima(log(twm), order= c(1,1,2), include.constant = TRUE)
fit4
## Series: log(twm)
## ARIMA(1,1,2) with drift
##
## Coefficients:
## ar1 ma1 ma2 drift
## 0.1013 -0.1728 0.4022 0.0001
## s.e. 0.2768 0.2486 0.1916 0.0106
##
## sigma^2 estimated as 0.003538: log likelihood=77.67
## AIC=-145.35 AICc=-144.1 BIC=-135.4
checkresiduals(fit4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,2) with drift
## Q* = 8.7516, df = 7, p-value = 0.271
##
## Model df: 4. Total lags used: 11
Write this model in terms of the backshift operator.
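For the selected ARIMA(1,1,2) model on the log series (note that R parameterises the MA polynomial with plus signs, so the fitted θ1 = −0.1729 and θ2 = 0.4023 enter directly), the backshift form is (1 − ϕ1B)(1 − B)yt = (1 + θ1B + θ2B²)εt, i.e. (1 − 0.1014B)(1 − B)yt = (1 − 0.1729B + 0.4023B²)εt.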
fit5<- Arima(log(twm), order= c(1,1,3), seasonal= c(0,0,1))
fit5
## Series: log(twm)
## ARIMA(1,1,3)(0,0,1)[12]
##
## Coefficients:
## ar1 ma1 ma2 ma3 sma1
## -0.1851 0.1432 0.3886 0.1396 -0.1815
## s.e. 0.8467 0.8321 0.1938 0.3991 0.1713
##
## sigma^2 estimated as 0.003494: log likelihood=78.35
## AIC=-144.69 AICc=-142.9 BIC=-132.76
autoplot(fit5)
checkresiduals(fit5)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,3)(0,0,1)[12]
## Q* = 9.2793, df = 6, p-value = 0.1585
##
## Model df: 5. Total lags used: 11
Fit the model using R and examine the residuals. Is the model satisfactory?
The residual checks for the chosen ARIMA(1,1,2) model show the residuals resemble white noise: the Ljung-Box p-value is well above 0.05, the ACF spikes are within the 95% critical values, and the residuals are approximately normally distributed. The model is satisfactory.
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,2)
## Q* = 8.7524, df = 8, p-value = 0.3636
##
## Model df: 3. Total lags used: 11
Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
ffit1<-forecast(h=3,fit1)
ffit1
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 5 0.8993161 0.8238387 0.9747934 0.7838835 1.014749
## Sep 5 0.8970516 0.7940571 1.0000461 0.7395351 1.054568
## Oct 5 0.8968220 0.7533395 1.0403045 0.6773844 1.116260
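As a hand check of the first point forecast, here is a minimal sketch under the ARIMA(1,1,2) equation for the differenced log series (no constant; fit1 and twm are the objects from the analysis above, the other names are illustrative):
# d[T+1] = phi1*d[T] + theta1*e[T] + theta2*e[T-1]; then y[T+1] = y[T] + d[T+1]
phi1 <- coef(fit1)["ar1"]; th1 <- coef(fit1)["ma1"]; th2 <- coef(fit1)["ma2"]
yl <- log(twm); res <- residuals(fit1); n <- length(yl)
d_next <- phi1*(yl[n] - yl[n-1]) + th1*res[n] + th2*res[n-1]
as.numeric(yl[n] + d_next)  # should reproduce the first point forecast above (~0.899)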
upper <- fitted(fit1) + 1.96*sqrt(fit1$sigma2)  # approximate 95% bounds around the in-sample fitted values
lower <- fitted(fit1) - 1.96*sqrt(fit1$sigma2)  # (these are not the widening out-of-sample forecast intervals)
upper
## Jan Feb Mar Apr May Jun Jul
## 1 1.0021976 1.0010827 0.9776922 0.9703593 0.9483266 0.9509012 0.9252877
## 2 1.0767625 1.0782150 1.0667314 1.1073994 1.1761637 1.2781993 1.3797627
## 3 1.6008775 1.6510270 1.6103024 1.4852780 1.4803772 1.5174857 1.5599804
## 4 1.4585937 1.5174839 1.5398297 1.5314754 1.5084241 1.5199277 1.6013790
## 5 1.2690963 1.2288638 1.2187199 1.1499233 1.2706081 1.2138904 1.0593507
## Aug Sep Oct Nov Dec
## 1 0.9655385 0.9797345 0.9957116 1.0355284 1.0757024
## 2 1.3622532 1.3401220 1.4007896 1.5194449 1.5273218
## 3 1.6018248 1.5735416 1.5238265 1.4365567 1.4241183
## 4 1.5663690 1.5605468 1.5021113 1.4474497 1.3658249
## 5
lower
## Jan Feb Mar Apr May Jun Jul
## 1 0.7713281 0.7702133 0.7468228 0.7394898 0.7174571 0.7200317 0.6944182
## 2 0.8458930 0.8473456 0.8358619 0.8765299 0.9452942 1.0473299 1.1488932
## 3 1.3700080 1.4201576 1.3794329 1.2544085 1.2495077 1.2866162 1.3291109
## 4 1.2277242 1.2866144 1.3089602 1.3006060 1.2775547 1.2890583 1.3705095
## 5 1.0382269 0.9979943 0.9878504 0.9190538 1.0397386 0.9830210 0.8284812
## Aug Sep Oct Nov Dec
## 1 0.7346690 0.7488651 0.7648421 0.8046589 0.8448329
## 2 1.1313837 1.1092526 1.1699201 1.2885754 1.2964523
## 3 1.3709554 1.3426721 1.2929570 1.2056873 1.1932488
## 4 1.3354995 1.3296773 1.2712418 1.2165802 1.1349554
## 5
Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(forecast(h=3,fit1))
Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
The auto.arima() function selected an ARIMA(1,2,1) model; however, it is not the best model here. The best model should minimise both AIC and BIC. The ARIMA(1,1,2) model is better because it has the lowest AIC and BIC values.

| Metric | ARIMA(1,1,2) | ARIMA(1,2,2) | auto ARIMA(1,2,1) | ARIMA(1,1,2) + constant | ARIMA(1,1,3)(0,0,1)[12] |
|---|---|---|---|---|---|
| AIC | -147.35 | -141.34 | -142.17 | -145.35 | -144.69 |
| BIC | -139.39 | -133.46 | -136.26 | -135.40 | -132.76 |
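A sketch of how this comparison can be tabulated from the stored criteria (m1AIC through m5BIC are defined in the code appendix):
data.frame(
  Model = c("ARIMA(1,1,2)", "ARIMA(1,2,2)", "auto ARIMA(1,2,1)",
            "ARIMA(1,1,2)+constant", "ARIMA(1,1,3)(0,0,1)[12]"),
  AIC = c(m1AIC, m2AIC, m3AIC, m4AIC, m5AIC),
  BIC = c(m1BIC, m2BIC, m3BIC, m4BIC, m5BIC))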
Code used in analysis
knitr::opts_chunk$set(
echo = FALSE,
message = FALSE,
warning = FALSE
)
require(knitr)
library(ggplot2)
library(tidyr)
library(MASS)
library(psych)
library(kableExtra)
library(dplyr)
library(faraway)
library(gridExtra)
library(reshape2)
library(leaps)
library(pROC)
library(caret)
library(naniar)
library(pander)
library(mlbench)
library(e1071)
library(fpp2)
library(urca)
data(ibmclose)
auto.arima(ibmclose)
autoplot(ibmclose)
ggAcf(ibmclose)
ggPacf(ibmclose)
transform <- function(x) {
  # Facet plot of the raw, logged, and differenced series
  compt <- cbind("x" = x,
                 "log" = log(x),
                 "seasonal diff" = diff(log(x), 12),
                 "double diff" = diff(diff(log(x), 12), 1)) %>%
    autoplot(facets = TRUE) +
    xlab("Year") + ylab("") +
    ggtitle(deparse(substitute(x)))  # was ggtitle(x), which fails for a ts
  x %>% diff() %>% ggtsdisplay(main = "")
  diff(diff(log(x), 12), 1) %>% ggtsdisplay(main = "")
  print(auto.arima(x))  # print() needed: results are not auto-printed inside a function
  print(Box.test(x))
  checkresiduals(auto.arima(x))
  compt
}
transform(usnetelec)
transform(usgdp)
transform(mcopper)
transform(enplanements)
transform(visitors)
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))
transform(myts)
checkresiduals(auto.arima(myts))
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
autoplot(y)
auto.arima(y)
fit1<-Arima(y, order=c(1,0,0))
fit1
autoplot(fit1)
autoplot(y)
for(i in 2:100)
y[i] <- 1*y[i-1] + e[i]  # AR(1) with phi1 = 1: a random walk
#checkresiduals(fit1)
autoplot(y)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]  # MA(1): theta1 times the previous error
fit2<-Arima(y, order=c(0,0,1))
fit2
autoplot(fit2)
autoplot(y)
for(i in 2:100)
y[i] <- 1*e[i-1] + e[i]  # MA(1) with theta1 = 1
#checkresiduals(fit1)
autoplot(y)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]  # ARMA(1,1): AR term plus MA term
fit3<-Arima(y, order=c(1,0,1), include.constant = TRUE)
fit3
autoplot(fit3)
plot1<-autoplot(y)
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]  # AR(2): phi1 = -0.8 on lag 1, phi2 = 0.3 on lag 2
fit4<-Arima(y, order=c(2,0,1), include.constant = TRUE)
fit4
autoplot(fit4)
plot2<-autoplot(y)
plot1
plot2
data("wmurders")
twm <- ts(wmurders, frequency = 12)  # note: re-indexes the annual series with frequency 12, hence the month labels in later output
autoplot(twm)
ggAcf(twm)
ggPacf(twm)
auto.arima(twm)
cbind("WMurders"= twm,
"Diff Wmurders" = diff(twm,lag=4),
"Log Wmurders" = log(twm),
"2 x Diff Wmurders" = diff(diff(twm,lag=4)),
"2 x logDiff Wmurders" = diff(diff(log(twm))))%>%
autoplot(facets=TRUE)+
xlab("Year") + ylab("")+
ggtitle("Wmurders")
twm %>% ur.kpss() %>% summary
diff(twm) %>% ur.kpss() %>% summary
ndiffs(twm)
nsdiffs(twm)
fit1<- Arima(log(twm), order= c(1,1,2))
fit1
checkresiduals(fit1)
fit2<- Arima(log(twm), order= c(1,2,2))
fit2
checkresiduals(fit2)
fit3<- auto.arima(log(twm))
fit3
checkresiduals(fit3)
fit4<- Arima(log(twm), order= c(1,1,2), include.constant = TRUE)
fit4
checkresiduals(fit4)
fit5<- Arima(log(twm), order= c(1,1,3), seasonal= c(0,0,1))
fit5
autoplot(fit5)
checkresiduals(fit5)
checkresiduals(fit1)
ffit1<-forecast(h=3,fit1)
ffit1
upper <- fitted(fit1) + 1.96*sqrt(fit1$sigma2)
lower <- fitted(fit1) - 1.96*sqrt(fit1$sigma2)
upper
lower
autoplot(forecast(h=3,fit1))
m1AIC <- AIC(fit1)
m1BIC <- BIC(fit1)
m2AIC <- AIC(fit2)
m2BIC <- BIC(fit2)
m3AIC <- AIC(fit3)
m3BIC <- BIC(fit3)
m4AIC <- AIC(fit4)
m4BIC <- BIC(fit4)
m5AIC <- AIC(fit5)
m5BIC <- BIC(fit5)