HA 8.1, 8.2, 8.3, 8.5, 8.6, 8.7
library(GGally)
## Loading required package: ggplot2
library(fpp2)
## Loading required package: forecast
## Loading required package: fma
##
## Attaching package: 'fma'
## The following object is masked from 'package:GGally':
##
## pigs
## Loading required package: expsmooth
library(seasonal)
library(readxl)
library(forecast)
Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a) Explain the differences among these figures. Do they all indicate that the data are white noise?
Answer
All three ACF plots indicate white noise: nearly all autocorrelations lie within the critical bounds and show no pattern across lags.
The plots differ in the magnitude of their critical values, which decrease from left to right. The bounds are approximately +/-1.96/sqrt(T), so they shrink as the series length grows from 36 to 360 to 1,000; what looks like decreasing variance across the plots is simply the larger samples estimating the (zero) autocorrelations more precisely.
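The shrinking bounds can be computed directly. A minimal sketch (assuming the usual large-sample +/-1.96/sqrt(T) bounds; not part of the original output):
# approximate 95% critical bounds for a white-noise ACF at each sample size
sapply(c(36, 360, 1000), function(T) round(1.96/sqrt(T), 3))
# roughly 0.327, 0.103 and 0.062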
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
autoplot(ibmclose) + xlab("Day") + ylab("Daily closing prices for IBM stock")
ggAcf(ibmclose,main="")
This ACF plot clearly shows that the series is non-stationary: the autocorrelations are far above the critical bounds and decay very slowly instead of dropping quickly to zero.
ggPacf(ibmclose,main="")
The PACF likewise shows a single spike at lag 1 far above the critical value, meaning that the higher-order autocorrelations are effectively explained by the lag-1 autocorrelation.
Together these plots show the time series needs to be differenced. Let's check how many differences are required:
ndiffs(ibmclose)
## [1] 1
Apply the differencing and check the plots again
autoplot(diff(ibmclose,1))
ggAcf(diff(ibmclose,1))
ggPacf(diff(ibmclose,1))
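As an extra hedged check (assuming the urca package is installed), the KPSS unit-root test that ndiffs() uses by default can be applied to the differenced series; a test statistic below the critical values indicates stationarity:
library(urca)
# KPSS test on the differenced series; a small statistic means stationary
diff(ibmclose) %>% ur.kpss() %>% summary()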
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data: usnetelec, usgdp, mcopper, enplanements, visitors.
usnetelec
(lambda <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
# number of first differences needed to make the series stationary
ndiffs(BoxCox(usnetelec,lambda))
## [1] 2
autoplot(usnetelec)+
xlab("Year") + ylab("") +
ggtitle(" usnetelec series")
autoplot(BoxCox(usnetelec,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed usnetelec series")
autoplot(diff(BoxCox(usnetelec,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform usnetelec series")
ggAcf(diff(BoxCox(usnetelec,lambda)))
ggPacf(diff(BoxCox(usnetelec,lambda)))
autoplot(diff(diff(BoxCox(usnetelec,lambda))))+
xlab("Year") + ylab("") +
ggtitle("Double\n differenced boxcox transform usnetelec series")
ggAcf(diff(diff(BoxCox(usnetelec,lambda))))
ggPacf(diff(diff(BoxCox(usnetelec,lambda))))
This series does not obviously need a transformation, but we applied a Box-Cox transform anyway and ndiffs() reported that two orders of differencing are required. Looking at the ACF and PACF plots, however, a single difference already appears sufficient to make the series stationary.
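A quick, hedged way to check for overdifferencing is to compare variances, since differencing one time too many typically inflates the variance (this sketch was not in the original output):
# variance after one and after two differences of the transformed series;
# a higher variance for the double difference would suggest overdifferencing
var(diff(BoxCox(usnetelec, lambda)))
var(diff(diff(BoxCox(usnetelec, lambda))))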
usgdp
(lambda <- BoxCox.lambda(usgdp))
## [1] 0.366352
# number of first differences needed to make the series stationary
ndiffs(BoxCox(usgdp,lambda))
## [1] 1
Box.test(BoxCox(usgdp,lambda), type="Ljung-Box")
##
## Box-Ljung test
##
## data: BoxCox(usgdp, lambda)
## X-squared = 233.76, df = 1, p-value < 2.2e-16
Box.test(diff(BoxCox(usgdp,lambda)), type="Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(usgdp, lambda))
## X-squared = 23.874, df = 1, p-value = 1.029e-06
nsdiffs(BoxCox(usgdp,lambda))
## [1] 0
autoplot(usgdp)+
xlab("Year") + ylab("") +
ggtitle(" usgdp series")
autoplot(BoxCox(usgdp,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed usgdp series")
autoplot(diff(BoxCox(usgdp,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform usgdp series")
ggAcf(diff(BoxCox(usgdp,lambda)))
ggPacf(diff(BoxCox(usgdp,lambda)))
autoplot(diff(diff(BoxCox(usgdp,lambda))))+
xlab("Year") + ylab("") +
ggtitle("Double\n differenced boxcox transform usgdp series")
ggAcf(diff(diff(BoxCox(usgdp,lambda))))
ggPacf(diff(diff(BoxCox(usgdp,lambda))))
In this case we only need one order of differencing, even though the ACF still has a few values outside the bounds. No seasonal differencing is needed (nsdiffs returns 0).
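For reference, a minimal hand-rolled sketch of what the Box-Cox transform computes (the box_cox helper below is hypothetical; forecast::BoxCox additionally handles negative values):
# classical Box-Cox: log when lambda is 0, power transform otherwise
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1)/lambda
}
# for the all-positive usgdp series this should match BoxCox(usgdp, lambda)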
mcopper
(lambda <- BoxCox.lambda(mcopper))
## [1] 0.1919047
# number of first differences needed to make the series stationary
ndiffs(BoxCox(mcopper,lambda))
## [1] 1
Box.test(BoxCox(mcopper,lambda), type="Ljung-Box")
##
## Box-Ljung test
##
## data: BoxCox(mcopper, lambda)
## X-squared = 550.42, df = 1, p-value < 2.2e-16
Box.test(diff(BoxCox(mcopper,lambda)), type="Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(mcopper, lambda))
## X-squared = 57.517, df = 1, p-value = 3.353e-14
nsdiffs(BoxCox(mcopper,lambda))
## [1] 0
autoplot(mcopper)+
xlab("Year") + ylab("") +
ggtitle(" mcopper series")
autoplot(BoxCox(mcopper,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed mcopper series")
autoplot(diff(BoxCox(mcopper,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform mcopper series")
ggAcf(diff(BoxCox(mcopper,lambda)))
ggPacf(diff(BoxCox(mcopper,lambda)))
autoplot(diff(diff(BoxCox(mcopper,lambda))))+
xlab("Year") + ylab("") +
ggtitle("Double\n differenced boxcox transform mcopper series")
ggAcf(diff(diff(BoxCox(mcopper,lambda))))
ggPacf(diff(diff(BoxCox(mcopper,lambda))))
In this case we only need one order of differencing, even though the first ACF lag of the differenced series lies above the boundary. No seasonal differencing is needed (nsdiffs returns 0).
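Since roughly 5% of lags are expected to cross the bounds by chance, a single significant lag is not alarming. The series, ACF and PACF can also be viewed in one display with forecast's ggtsdisplay (a sketch):
ggtsdisplay(diff(BoxCox(mcopper, lambda)))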
enplanements
(lambda <- BoxCox.lambda(enplanements))
## [1] -0.2269461
# number of first differences needed to make the series stationary
ndiffs(BoxCox(enplanements,lambda))
## [1] 1
Box.test(BoxCox(enplanements,lambda), type="Ljung-Box")
##
## Box-Ljung test
##
## data: BoxCox(enplanements, lambda)
## X-squared = 253.7, df = 1, p-value < 2.2e-16
Box.test(diff(BoxCox(enplanements,lambda)), type="Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(enplanements, lambda))
## X-squared = 16.55, df = 1, p-value = 4.739e-05
nsdiffs(BoxCox(enplanements,lambda))
## [1] 1
autoplot(enplanements)+
xlab("Year") + ylab("") +
ggtitle(" enplanements series")
autoplot(BoxCox(enplanements,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed enplanements series")
autoplot(diff(BoxCox(enplanements,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform enplanements series")
ggAcf(diff(BoxCox(enplanements,lambda)))
ggPacf(diff(BoxCox(enplanements,lambda)))
# Seasonal Differencing
enplanements_seasdiff <- diff(BoxCox(enplanements,lambda), lag=frequency(enplanements), differences=1)
autoplot(diff(enplanements_seasdiff))+
xlab("Year") + ylab("") +
ggtitle("1st difference after seasonal difference of\n boxcox transform enplanements series")
ggAcf(diff(enplanements_seasdiff))
ggPacf(diff(enplanements_seasdiff))
In this case we need one order of first differencing, and because the series is seasonal we also need one seasonal difference (nsdiffs returns 1). After both differences the data look much more stationary.
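The same transformation and differencing can be written as one compact pipeline (a sketch, assuming the monthly frequency of 12):
# Box-Cox, seasonal difference, first difference, then plot with ACF/PACF
enplanements %>% BoxCox(lambda) %>% diff(lag = 12) %>% diff() %>% ggtsdisplay()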
visitors
(lambda <- BoxCox.lambda(visitors))
## [1] 0.2775249
# number of first differences needed to make the series stationary
ndiffs(BoxCox(visitors,lambda))
## [1] 1
Box.test(BoxCox(visitors,lambda), type="Ljung-Box")
##
## Box-Ljung test
##
## data: BoxCox(visitors, lambda)
## X-squared = 205.74, df = 1, p-value < 2.2e-16
Box.test(diff(BoxCox(visitors,lambda)), type="Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(visitors, lambda))
## X-squared = 21.922, df = 1, p-value = 2.839e-06
nsdiffs(BoxCox(visitors,lambda))
## [1] 1
autoplot(visitors)+
xlab("Year") + ylab("") +
ggtitle(" visitors series")
autoplot(BoxCox(visitors,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed visitors series")
autoplot(diff(BoxCox(visitors,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform visitors series")
ggAcf(diff(BoxCox(visitors,lambda)))
ggPacf(diff(BoxCox(visitors,lambda)))
# Seasonal Differencing
visitors_seasdiff <- diff(BoxCox(visitors,lambda), lag=frequency(visitors), differences=1)
autoplot(diff(visitors_seasdiff))+
xlab("Year") + ylab("") +
ggtitle("1st difference after seasonal difference of\n boxcox transform visitors series")
ggAcf(diff(visitors_seasdiff))
ggPacf(diff(visitors_seasdiff))
# Second seasonal differencing
visitors_seasdiff2 <- diff(visitors_seasdiff, lag=frequency(visitors_seasdiff), differences = 1)
autoplot(diff(visitors_seasdiff2))+
xlab("Year") + ylab("") +
ggtitle("1st difference after two seasonal differences of\n boxcox transform visitors series")
ggAcf(diff(visitors_seasdiff2))
ggPacf(diff(visitors_seasdiff2))
As with enplanements, we need one first difference and one seasonal difference (nsdiffs returns 1); after those the data look much more stationary. The second seasonal difference tried above is exploratory rather than required.
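To confirm that a single seasonal difference is enough, the automatic tests can be re-run on the seasonally differenced series (a hedged check; these calls and their output are not part of the original):
nsdiffs(visitors_seasdiff) # expect 0 if one seasonal difference suffices
ndiffs(visitors_seasdiff)  # remaining first differences, if any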
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retaildata <- readxl::read_excel("C:/Users/Mezu/Documents/retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))
lambda <- BoxCox.lambda(myts)
ndiffs(BoxCox(myts,lambda))
## [1] 1
nsdiffs(BoxCox(myts,lambda))
## [1] 1
autoplot(myts)+xlab("Year") + ylab("") +
ggtitle("myts for retail data series")
autoplot(BoxCox(myts,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed myts for retail data series")
autoplot(diff(BoxCox(myts,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform myts series")
ggAcf(diff(BoxCox(myts,lambda)))
ggPacf(diff(BoxCox(myts,lambda)))
# Seasonal Differencing
myts_seasdiff <- diff(BoxCox(myts,lambda), lag=frequency(myts), differences=1)
autoplot(diff(myts_seasdiff))+
xlab("Year") + ylab("") +
ggtitle("1st difference after seasonal difference of\n boxcox transform myts series")
ggAcf(diff(myts_seasdiff))
ggPacf(diff(myts_seasdiff))
This series needs both a seasonal difference and a first difference to become stationary. After applying both, the series looks much more stationary.
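Mirroring the Ljung-Box checks used for the earlier series, one more hedged test can be run on the fully differenced retail series (output not shown here):
Box.test(diff(myts_seasdiff), type="Ljung-Box")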
Use R to simulate and plot some data from simple ARIMA models. Use the following R code to generate data from an AR(1) model with phi1 = 0.6 and sigma^2 = 1. The process starts with y1 = 0.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
Produce a time plot for the series. How does the plot change as you change phi1?
autoplot(y)+ggtitle("phi1 = 0.6")
phi1 <- 0.94
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- phi1*y[i-1] + e[i]
autoplot(y)+ggtitle("phi1 = 0.94")
phi1 <- -0.3
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- phi1*y[i-1] + e[i]
autoplot(y)+ggtitle("phi1 = -0.3")
We notice that as phi1 decreases toward -1 the series oscillates more rapidly, while as phi1 increases toward 1 the series becomes smoother and wanders in longer runs, with less oscillation.
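The same AR(1) experiments can be cross-checked with the built-in simulator stats::arima.sim, which requires |phi1| < 1 for stationarity (a sketch):
# simulate 100 observations from AR(1) with phi1 = 0.6
set.seed(123)
autoplot(arima.sim(model=list(ar=0.6), n=100)) + ggtitle("AR(1) via arima.sim, phi1 = 0.6")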
Write your own code to generate data from an MA(1) model with theta1 = 0.6 and sigma^2 = 1
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + 0.6*e[i-1]
autoplot(y)+ggtitle("theta1 = 0.6")
Produce a time plot for the series. How does the plot change as you change theta1?
theta1 <- 0.94
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + theta1*e[i-1]
autoplot(y)+ggtitle("theta1 = 0.94")
theta1 <- -0.3
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + theta1*e[i-1]
autoplot(y)+ggtitle("theta1 = -0.3")
As theta1 decreases the series oscillates more, and consecutive observations cluster more closely together.
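A matching cross-check for the MA(1) case with stats::arima.sim (a sketch):
# simulate 100 observations from MA(1) with theta1 = 0.6
set.seed(123)
autoplot(arima.sim(model=list(ma=0.6), n=100)) + ggtitle("MA(1) via arima.sim, theta1 = 0.6")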
Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6 and sigma^2 = 1:
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i] + 0.6*e[i-1]
autoplot(y)+ggtitle("ARMA(1,1): phi1 = 0.6 and theta1 = 0.6")
Generate data from an AR(2) model with phi1 = -0.8 and phi2 = 0.3
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
autoplot(y)+ggtitle("AR(2): phi1 = -0.8 and phi2 = 0.3")
Graph the latter two series and compare them.
These two series produce very different plots: the ARMA(1,1) series is stationary, while the AR(2) series oscillates with exponentially increasing amplitude and is non-stationary.
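The explosive behaviour of the AR(2) series can be confirmed from its characteristic polynomial 1 - phi1*B - phi2*B^2: stationarity requires every root to lie outside the unit circle. A hedged check with base R's polyroot:
# roots of 1 + 0.8*B - 0.3*B^2; one modulus is about 0.93 < 1, so the
# stationarity condition fails and the simulated series diverges
Mod(polyroot(c(1, 0.8, -0.3)))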
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States. a) By studying appropriate graphs of the series in R, find an appropriate ARIMA( p,d,q ) model for these data.
autoplot(wmurders)+ggtitle("Number of women murdered in US")
This looks like a non-seasonal series, so we will use non-seasonal auto.arima to find the coefficients.
ggAcf(wmurders)
ggPacf(wmurders)
Let’s difference the data
ndiffs(wmurders)
## [1] 2
This shows we need two orders of differencing
autoplot(diff(wmurders))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n women murders series")
ggAcf(diff(wmurders))
ggPacf(diff(wmurders))
Let’s try second order differencing
d_wmurders=diff(wmurders)
autoplot(diff(d_wmurders))+
xlab("Year") + ylab("") +
ggtitle("2nd difference of\n women murders series")
ggAcf(diff(d_wmurders))
ggPacf(diff(d_wmurders))
Looking at the second-differenced series, the PACF decays exponentially (roughly sinusoidally), while the ACF shows significant spikes at lags 1 and 2 but none beyond lag 2. This points to a pure moving-average model for the twice-differenced data, i.e. an ARIMA(0,2,q) with small q; we proceed with an ARIMA(0,2,3) model.
Should you include a constant in the model? Explain
No. With d = 2, including a constant c would make the long-term forecasts follow a quadratic trend, which is generally unrealistic; with c = 0 and d = 2 the forecasts follow a straight line.
Write this model in terms of the backshift operator:
(1 - B)^2 y_t = (1 - 1.0154B + 0.4324B^2 - 0.3217B^3) e_t
fit <- auto.arima(wmurders, seasonal=FALSE,stepwise=FALSE, approximation=FALSE)
summary(fit)
## Series: wmurders
## ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.0154 0.4324 -0.3217
## s.e. 0.1282 0.2278 0.1737
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01336585 0.2016929 0.1531053 -0.3332051 4.387024 0.9415259
## ACF1
## Training set -0.03193856
The full search confirms the ARIMA(0,2,3) model.
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
##
## Model df: 3. Total lags used: 10
The residuals are satisfactory: the Ljung-Box p-value of 0.152 is greater than 0.05, so we cannot reject the hypothesis that the residuals are white noise.
fit %>% forecast(h=3) %>% autoplot(include=80)
fit2 <- auto.arima(wmurders, seasonal=FALSE)
summary(fit2)
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
## ACF1
## Training set 0.02176343
The default stepwise auto.arima gives ARIMA(1,2,1), which differs from the ARIMA(0,2,3) found by the full search (stepwise=FALSE). The ARIMA(0,2,3) model is preferred because its AICc of -6.7 is lower than the -6.39 of ARIMA(1,2,1).
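The AICc values can also be pulled straight from the fitted objects for a side-by-side comparison (a sketch):
# lower AICc is better; from the summaries above this gives -6.7 vs -6.39
c(full_search = fit[["aicc"]], stepwise = fit2[["aicc"]])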