Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
Explain the differences among these figures. Do they all indicate that the data are white noise?
In each figure the maximum lag h is 20, while the number of observations T increases. For the data to be white noise, we expect about 95% of the spikes in the ACF to lie within ±2/√T. All three series satisfy this criterion and are therefore consistent with white noise.
Why are the critical values at different distances from the mean of zero? Why are the autocorrelations different in each figure when they each refer to white noise?
The critical values are at different distances because they are computed as ±2/√T, so they shrink as the number of observations T grows. The autocorrelations themselves differ because each figure is a different random sample; by the law of large numbers, the sample autocorrelations get closer to their expected value of zero as T increases.
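A minimal sketch of this effect (the seed and loop are illustrative, not from the original code): simulate white noise at each length and note how the ±2/√T bounds narrow.
set.seed(1)  # illustrative seed
for (n in c(36, 360, 1000)) {
  # print() is needed to render ggplot objects inside a loop
  print(ggAcf(ts(rnorm(n))) +
          ggtitle(paste0("White noise, T = ", n, ", bounds ±", round(2/sqrt(n), 3))))
}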
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
In non-stationary data the ACF plot decays slowly, which we can clearly see here. The PACF plot shows the relationship between yt and yt−k after removing the effects of the intermediate lags. The first lag is expected to be above the critical threshold since it has nothing to remove, but the remaining lags are within the 95% threshold. Since the ACF is decaying and there is a significant spike at lag p in the PACF but none beyond lag p, the data may follow an ARIMA(p,d,0) model, and the series should be differenced before modelling.
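These are the calls behind the plots and the model selection (collected in the code appendix); auto.arima() confirms that one difference is needed:
autoplot(ibmclose)    # time plot wanders with no fixed mean
ggAcf(ibmclose)       # ACF decays very slowly
ggPacf(ibmclose)      # single dominant spike at lag 1
auto.arima(ibmclose)  # selects ARIMA(0,1,0), i.e. d = 1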
## Series: ibmclose
## ARIMA(0,1,0)
##
## sigma^2 estimated as 52.62: log likelihood=-1251.37
## AIC=2504.74 AICc=2504.75 BIC=2508.64
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data.
Transform Function
transform <- function(x) {
  # Facet plot of the raw, logged, and differenced series
  compt <- cbind("x" = x,
                 "log" = log(x),
                 "seasonal diff" = diff(log(x), 12),
                 "double diff" = diff(diff(log(x), 12), 1)) %>%
    autoplot(facets = TRUE) +
    xlab("Year") + ylab("") +
    ggtitle(deparse(substitute(x)))  # was ggtitle(x), which fails for a ts
  x %>% diff() %>% ggtsdisplay(main = "")
  diff(diff(log(x), 12), 1) %>% ggtsdisplay(main = "")
  print(auto.arima(x))  # print() needed: results are not auto-printed inside a function
  print(Box.test(x))
  checkresiduals(auto.arima(x))
  compt
}
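The function is then applied to each series in turn; the residual checks below appear in the same order (calls from the code appendix):
transform(usnetelec)
transform(usgdp)
transform(mcopper)
transform(enplanements)
transform(visitors)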
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,1,2) with drift
## Q* = 2.7945, df = 5, p-value = 0.7316
##
## Model df: 5. Total lags used: 10
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,2,2)
## Q* = 8.6247, df = 4, p-value = 0.0712
##
## Model df: 4. Total lags used: 8
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,1,1)
## Q* = 30.716, df = 23, p-value = 0.1299
##
## Model df: 1. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from ARIMA(2,0,1)(0,1,1)[12]
## Q* = 15.625, df = 20, p-value = 0.7396
##
## Model df: 4. Total lags used: 24
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,1)(0,1,2)[12] with drift
## Q* = 19.26, df = 19, p-value = 0.4403
##
## Model df: 5. Total lags used: 24
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
Using the transform function created earlier, auto.arima() selects ARIMA(1,0,2)(0,1,1)[12] with drift as the best model:
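The retail series is loaded and passed through the same function (calls from the code appendix):
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"], frequency=12, start=c(1982,4))
transform(myts)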
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,0,2)(0,1,1)[12] with drift
## Q* = 12.946, df = 19, p-value = 0.8413
##
## Model df: 5. Total lags used: 24
Use R to simulate and plot some data from simple ARIMA models.
Use the following R code to generate data from an AR(1) model with ϕ1=0.6 and σ2=1. The process starts with y1=0.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
autoplot(y)
auto.arima(y)
## Series: y
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.5441 0.3848
## s.e. 0.0834 0.2357
##
## sigma^2 estimated as 1.206: log likelihood=-150.41
## AIC=306.83 AICc=307.08 BIC=314.64
fit1<-Arima(y, order=c(1,0,0))
fit1
## Series: y
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.5441 0.3848
## s.e. 0.0834 0.2357
##
## sigma^2 estimated as 1.206: log likelihood=-150.41
## AIC=306.83 AICc=307.08 BIC=314.64
autoplot(fit1)
Produce a time plot for the series. How does the plot change as you change ϕ1?
Increasing ϕ1 increases the persistence of the series: with ϕ1 near 0 the plot looks like white noise, while values near 1 produce long, trend-like wanderings. Re-running the loop with ϕ1 = 1 (see the code appendix) turns the series into a random walk.
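A quick sketch of this comparison (the helper function, seed, and ϕ1 values are illustrative): plot the same innovations under three values of ϕ1.
ar1_sim <- function(phi, e) {
  # simulate an AR(1) with the supplied innovations, starting from 0
  y <- numeric(length(e))
  for (i in 2:length(e)) y[i] <- phi*y[i-1] + e[i]
  ts(y)
}
e_sim <- rnorm(100)
autoplot(cbind("phi1 = 0.1" = ar1_sim(0.1, e_sim),
               "phi1 = 0.6" = ar1_sim(0.6, e_sim),
               "phi1 = 0.95" = ar1_sim(0.95, e_sim)),
         facets = TRUE)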
Write your own code to generate data from an MA(1) model with θ1=0.6 and σ2=1.
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]  # MA(1): theta1 times the previous error, not the previous y
fit2<-Arima(y, order=c(0,0,1))
fit2
## Series: y
## ARIMA(0,0,1) with non-zero mean
##
## Coefficients:
## ma1 mean
## 0.4419 0.4027
## s.e. 0.0723 0.1629
##
## sigma^2 estimated as 1.31: log likelihood=-154.48
## AIC=314.97 AICc=315.22 BIC=322.78
autoplot(fit2)
Produce a time plot for the series. How does the plot change as you change θ1?
Changing θ1 affects only the short-run correlation: larger values of θ1 make neighbouring observations more alike, but the series shows no long-term trend and still looks stationary beyond lag 1.
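A matching sketch for the MA(1) case (reusing the illustrative e_sim from the AR(1) sketch above):
ma1_sim <- function(theta, e) {
  # simulate an MA(1) from the supplied innovations
  y <- numeric(length(e))
  for (i in 2:length(e)) y[i] <- theta*e[i-1] + e[i]
  ts(y)
}
autoplot(cbind("theta1 = 0.1" = ma1_sim(0.1, e_sim),
               "theta1 = 0.9" = ma1_sim(0.9, e_sim)),
         facets = TRUE)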
Generate data from an ARMA(1,1) model with ϕ1=0.6, θ1=0.6 and σ2=1
for(i in 2:100)
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]  # ARMA(1,1): AR term plus MA term
fit3<-Arima(y, order=c(1,0,1), include.constant = TRUE)
fit3
## Series: y
## ARIMA(1,0,1) with non-zero mean
##
## Coefficients:
## ar1 ma1 mean
## 0.5154 0.0405 0.3867
## s.e. 0.1484 0.1679 0.2310
##
## sigma^2 estimated as 1.217: log likelihood=-150.39
## AIC=308.77 AICc=309.19 BIC=319.19
autoplot(fit3)
plot1<-autoplot(y)
Generate data from an AR(2) model with ϕ1=−0.8, ϕ2=0.3 and σ2=1
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]  # AR(2): phi1 = -0.8 on lag 1, phi2 = 0.3 on lag 2
fit4<-Arima(y, order=c(2,0,1), include.constant = TRUE)
fit4
## Series: y
## ARIMA(2,0,1) with non-zero mean
##
## Coefficients:
## ar1 ar2 ma1 mean
## 1.9995 -0.9995 0.9753 21381.77
## s.e. 0.0007 0.0007 0.0369 NaN
##
## sigma^2 estimated as 3008: log likelihood=-546.48
## AIC=1102.95 AICc=1103.59 BIC=1115.98
autoplot(fit4)
plot2<-autoplot(y)
Graph the latter two series and compare them.
The ARMA(1,1) series is stationary, with no particular trend. The AR(2) series is non-stationary: with ϕ1 = −0.8 and ϕ2 = 0.3 the stationarity conditions for an AR(2) are violated, and the series oscillates with steadily increasing amplitude.
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States.
By studying appropriate graphs of the series in R, find an appropriate ARIMA(p,d,q) model for these data.
The ACF plot shows the lags declining, so we should use an ARIMA(p,d,0) model. auto.arima() selects ARIMA(1,2,1) as the model:
## Series: twm
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
The plots of the series with log, single, and double differences show that one or two differences would stabilise the data. ndiffs() and nsdiffs() confirm there should be 2 differences and 0 seasonal differences respectively; the KPSS tests below are for the original and the once-differenced series.
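Commands behind the test output and difference counts (from the code appendix):
twm %>% ur.kpss() %>% summary()        # original series: test statistic 0.6331
diff(twm) %>% ur.kpss() %>% summary()  # after one difference: 0.4697
ndiffs(twm)    # returns 2
nsdiffs(twm)   # returns 0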
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.6331
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
##
## #######################
## # KPSS Unit Root Test #
## #######################
##
## Test is of type: mu with 3 lags.
##
## Value of test-statistic is: 0.4697
##
## Critical value for a significance level of:
## 10pct 5pct 2.5pct 1pct
## critical values 0.347 0.463 0.574 0.739
## [1] 2
## [1] 0
Fitting candidate models with residual checks shows that fit1, ARIMA(1,1,2), has the lowest AIC.
## Series: log(twm)
## ARIMA(1,1,2)
##
## Coefficients:
## ar1 ma1 ma2
## 0.1014 -0.1729 0.4023
## s.e. 0.2761 0.2479 0.1911
##
## sigma^2 estimated as 0.003469: log likelihood=77.67
## AIC=-147.35 AICc=-146.53 BIC=-139.39
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,2)
## Q* = 8.7524, df = 8, p-value = 0.3636
##
## Model df: 3. Total lags used: 11
## Series: log(twm)
## ARIMA(1,2,2)
##
## Coefficients:
## ar1 ma1 ma2
## -0.7427 -0.3057 -0.4482
## s.e. 0.3341 0.4080 0.3919
##
## sigma^2 estimated as 0.003599: log likelihood=74.67
## AIC=-141.34 AICc=-140.5 BIC=-133.46
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,2)
## Q* = 8.0787, df = 8, p-value = 0.4258
##
## Model df: 3. Total lags used: 11
## Series: log(twm)
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2954 -0.7892
## s.e. 0.1533 0.1190
##
## sigma^2 estimated as 0.003613: log likelihood=74.09
## AIC=-142.17 AICc=-141.68 BIC=-136.26
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,2,1)
## Q* = 10.73, df = 9, p-value = 0.2947
##
## Model df: 2. Total lags used: 11
Should you include a constant in the model? Explain.
When refitting the selected fit1 with include.constant = TRUE, the AIC is slightly higher with the constant included (−145.35 vs −147.35). Therefore, we should NOT include a constant.
fit4<- Arima(log(twm), order= c(1,1,2), include.constant = TRUE)
fit4
## Series: log(twm)
## ARIMA(1,1,2) with drift
##
## Coefficients:
## ar1 ma1 ma2 drift
## 0.1013 -0.1728 0.4022 0.0001
## s.e. 0.2768 0.2486 0.1916 0.0106
##
## sigma^2 estimated as 0.003538: log likelihood=77.67
## AIC=-145.35 AICc=-144.1 BIC=-135.4
checkresiduals(fit4)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,2) with drift
## Q* = 8.7516, df = 7, p-value = 0.271
##
## Model df: 4. Total lags used: 11
Write this model in terms of the backshift operator.
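For the selected ARIMA(1,1,2) model on the log series (note that R parameterises the MA polynomial with plus signs, so the fitted θ1 = −0.1729 and θ2 = 0.4023 enter directly), the backshift form is (1 − ϕ1B)(1 − B)yt = (1 + θ1B + θ2B²)εt, i.e. (1 − 0.1014B)(1 − B)yt = (1 − 0.1729B + 0.4023B²)εt.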
fit5<- Arima(log(twm), order= c(1,1,3), seasonal= c(0,0,1))
fit5
## Series: log(twm)
## ARIMA(1,1,3)(0,0,1)[12]
##
## Coefficients:
## ar1 ma1 ma2 ma3 sma1
## -0.1851 0.1432 0.3886 0.1396 -0.1815
## s.e. 0.8467 0.8321 0.1938 0.3991 0.1713
##
## sigma^2 estimated as 0.003494: log likelihood=78.35
## AIC=-144.69 AICc=-142.9 BIC=-132.76
autoplot(fit5)
checkresiduals(fit5)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,3)(0,0,1)[12]
## Q* = 9.2793, df = 6, p-value = 0.1585
##
## Model df: 5. Total lags used: 11
Fit the model using R and examine the residuals. Is the model satisfactory?
The residual checks for the chosen ARIMA(1,1,2) model show the residuals resemble white noise: the Ljung-Box p-value is well above 0.05, the ACF spikes are within the 95% critical values, and the residuals are approximately normally distributed. The model is satisfactory.
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,2)
## Q* = 8.7524, df = 8, p-value = 0.3636
##
## Model df: 3. Total lags used: 11
Forecast three times ahead. Check your forecasts by hand to make sure that you know how they have been calculated.
ffit1<-forecast(h=3,fit1)
ffit1
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Aug 5 0.8993161 0.8238387 0.9747934 0.7838835 1.014749
## Sep 5 0.8970516 0.7940571 1.0000461 0.7395351 1.054568
## Oct 5 0.8968220 0.7533395 1.0403045 0.6773844 1.116260
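As a hand check of the first point forecast, here is a minimal sketch under the ARIMA(1,1,2) equation for the differenced log series (no constant; fit1 and twm are the objects from the analysis above, the other names are illustrative):
# d[T+1] = phi1*d[T] + theta1*e[T] + theta2*e[T-1]; then y[T+1] = y[T] + d[T+1]
phi1 <- coef(fit1)["ar1"]; th1 <- coef(fit1)["ma1"]; th2 <- coef(fit1)["ma2"]
yl <- log(twm); res <- residuals(fit1); n <- length(yl)
d_next <- phi1*(yl[n] - yl[n-1]) + th1*res[n] + th2*res[n-1]
as.numeric(yl[n] + d_next)  # should reproduce the first point forecast above (~0.899)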
upper <- fitted(fit1) + 1.96*sqrt(fit1$sigma2)  # approximate 95% bounds around the in-sample fitted values
lower <- fitted(fit1) - 1.96*sqrt(fit1$sigma2)  # (these are not the widening out-of-sample forecast intervals)
upper
## Jan Feb Mar Apr May Jun Jul
## 1 1.0021976 1.0010827 0.9776922 0.9703593 0.9483266 0.9509012 0.9252877
## 2 1.0767625 1.0782150 1.0667314 1.1073994 1.1761637 1.2781993 1.3797627
## 3 1.6008775 1.6510270 1.6103024 1.4852780 1.4803772 1.5174857 1.5599804
## 4 1.4585937 1.5174839 1.5398297 1.5314754 1.5084241 1.5199277 1.6013790
## 5 1.2690963 1.2288638 1.2187199 1.1499233 1.2706081 1.2138904 1.0593507
## Aug Sep Oct Nov Dec
## 1 0.9655385 0.9797345 0.9957116 1.0355284 1.0757024
## 2 1.3622532 1.3401220 1.4007896 1.5194449 1.5273218
## 3 1.6018248 1.5735416 1.5238265 1.4365567 1.4241183
## 4 1.5663690 1.5605468 1.5021113 1.4474497 1.3658249
## 5
lower
## Jan Feb Mar Apr May Jun Jul
## 1 0.7713281 0.7702133 0.7468228 0.7394898 0.7174571 0.7200317 0.6944182
## 2 0.8458930 0.8473456 0.8358619 0.8765299 0.9452942 1.0473299 1.1488932
## 3 1.3700080 1.4201576 1.3794329 1.2544085 1.2495077 1.2866162 1.3291109
## 4 1.2277242 1.2866144 1.3089602 1.3006060 1.2775547 1.2890583 1.3705095
## 5 1.0382269 0.9979943 0.9878504 0.9190538 1.0397386 0.9830210 0.8284812
## Aug Sep Oct Nov Dec
## 1 0.7346690 0.7488651 0.7648421 0.8046589 0.8448329
## 2 1.1313837 1.1092526 1.1699201 1.2885754 1.2964523
## 3 1.3709554 1.3426721 1.2929570 1.2056873 1.1932488
## 4 1.3354995 1.3296773 1.2712418 1.2165802 1.1349554
## 5
Create a plot of the series with forecasts and prediction intervals for the next three periods shown.
autoplot(forecast(h=3,fit1))
Does auto.arima() give the same model you have chosen? If not, which model do you think is better?
The auto.arima() function selected an ARIMA(1,2,1) model; however, it is not the best model here. The best model should minimise both AIC and BIC. The ARIMA(1,1,2) model is better because it has the lowest AIC and BIC values.

| Metric | ARIMA(1,1,2) | ARIMA(1,2,2) | auto ARIMA(1,2,1) | ARIMA(1,1,2) + constant | ARIMA(1,1,3)(0,0,1)[12] |
|---|---|---|---|---|---|
| AIC | -147.35 | -141.34 | -142.17 | -145.35 | -144.69 |
| BIC | -139.39 | -133.46 | -136.26 | -135.40 | -132.76 |
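A sketch of how this comparison can be tabulated from the stored criteria (m1AIC through m5BIC are defined in the code appendix):
data.frame(
  Model = c("ARIMA(1,1,2)", "ARIMA(1,2,2)", "auto ARIMA(1,2,1)",
            "ARIMA(1,1,2)+constant", "ARIMA(1,1,3)(0,0,1)[12]"),
  AIC = c(m1AIC, m2AIC, m3AIC, m4AIC, m5AIC),
  BIC = c(m1BIC, m2BIC, m3BIC, m4BIC, m5BIC))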
Code used in analysis
knitr::opts_chunk$set(
echo = FALSE,
message = FALSE,
warning = FALSE
)
require(knitr)
library(ggplot2)
library(tidyr)
library(MASS)
library(psych)
library(kableExtra)
library(dplyr)
library(faraway)
library(gridExtra)
library(reshape2)
library(leaps)
library(pROC)
library(caret)
library(naniar)
library(pander)
library(mlbench)
library(e1071)
library(fpp2)
library(urca)
data(ibmclose)
auto.arima(ibmclose)
autoplot(ibmclose)
ggAcf(ibmclose)
ggPacf(ibmclose)
transform <- function(x) {
  # Facet plot of the raw, logged, and differenced series
  compt <- cbind("x" = x,
                 "log" = log(x),
                 "seasonal diff" = diff(log(x), 12),
                 "double diff" = diff(diff(log(x), 12), 1)) %>%
    autoplot(facets = TRUE) +
    xlab("Year") + ylab("") +
    ggtitle(deparse(substitute(x)))  # was ggtitle(x), which fails for a ts
  x %>% diff() %>% ggtsdisplay(main = "")
  diff(diff(log(x), 12), 1) %>% ggtsdisplay(main = "")
  print(auto.arima(x))  # print() needed: results are not auto-printed inside a function
  print(Box.test(x))
  checkresiduals(auto.arima(x))
  compt
}
transform(usnetelec)
transform(usgdp)
transform(mcopper)
transform(enplanements)
transform(visitors)
retaildata <- readxl::read_excel("retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))
transform(myts)
checkresiduals(auto.arima(myts))
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
autoplot(y)
auto.arima(y)
fit1<-Arima(y, order=c(1,0,0))
fit1
autoplot(fit1)
autoplot(y)
for(i in 2:100)
y[i] <- 1*y[i-1] + e[i]  # AR(1) with phi1 = 1: a random walk
#checkresiduals(fit1)
autoplot(y)
for(i in 2:100)
y[i] <- 0.6*e[i-1] + e[i]  # MA(1): theta1 times the previous error
fit2<-Arima(y, order=c(0,0,1))
fit2
autoplot(fit2)
autoplot(y)
for(i in 2:100)
y[i] <- 1*e[i-1] + e[i]  # MA(1) with theta1 = 1
#checkresiduals(fit1)
autoplot(y)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + 0.6*e[i-1] + e[i]  # ARMA(1,1): AR term plus MA term
fit3<-Arima(y, order=c(1,0,1), include.constant = TRUE)
fit3
autoplot(fit3)
plot1<-autoplot(y)
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]  # AR(2): phi1 = -0.8 on lag 1, phi2 = 0.3 on lag 2
fit4<-Arima(y, order=c(2,0,1), include.constant = TRUE)
fit4
autoplot(fit4)
plot2<-autoplot(y)
plot1
plot2
data("wmurders")
twm <- ts(wmurders, frequency = 12)  # note: re-indexes the annual series with frequency 12, hence the month labels in later output
autoplot(twm)
ggAcf(twm)
ggPacf(twm)
auto.arima(twm)
cbind("WMurders"= twm,
"Diff Wmurders" = diff(twm,lag=4),
"Log Wmurders" = log(twm),
"2 x Diff Wmurders" = diff(diff(twm,lag=4)),
"2 x logDiff Wmurders" = diff(diff(log(twm))))%>%
autoplot(facets=TRUE)+
xlab("Year") + ylab("")+
ggtitle("Wmurders")
twm %>% ur.kpss() %>% summary
diff(twm) %>% ur.kpss() %>% summary
ndiffs(twm)
nsdiffs(twm)
fit1<- Arima(log(twm), order= c(1,1,2))
fit1
checkresiduals(fit1)
fit2<- Arima(log(twm), order= c(1,2,2))
fit2
checkresiduals(fit2)
fit3<- auto.arima(log(twm))
fit3
checkresiduals(fit3)
fit4<- Arima(log(twm), order= c(1,1,2), include.constant = TRUE)
fit4
checkresiduals(fit4)
fit5<- Arima(log(twm), order= c(1,1,3), seasonal= c(0,0,1))
fit5
autoplot(fit5)
checkresiduals(fit5)
checkresiduals(fit1)
ffit1<-forecast(h=3,fit1)
ffit1
upper <- fitted(fit1) + 1.96*sqrt(fit1$sigma2)
lower <- fitted(fit1) - 1.96*sqrt(fit1$sigma2)
upper
lower
autoplot(forecast(h=3,fit1))
m1AIC <- AIC(fit1)
m1BIC <- BIC(fit1)
m2AIC <- AIC(fit2)
m2BIC <- BIC(fit2)
m3AIC <- AIC(fit3)
m3BIC <- BIC(fit3)
m4AIC <- AIC(fit4)
m4BIC <- BIC(fit4)
m5AIC <- AIC(fit5)
m5BIC <- BIC(fit5)