HA 8.1, 8.2, 8.3, 8.5, 8.6, 8.7
library(GGally)
## Loading required package: ggplot2
library(fpp2)
## Loading required package: forecast
## Loading required package: fma
##
## Attaching package: 'fma'
## The following object is masked from 'package:GGally':
##
## pigs
## Loading required package: expsmooth
library(seasonal)
library(readxl)
library(forecast)
Figure 8.31 shows the ACFs for 36 random numbers, 360 random numbers and 1,000 random numbers.
a) Explain the differences among these figures. Do they all indicate that the data are white noise?
Answer
All three ACF plots indicate white noise: nearly all autocorrelations lie within the critical bounds and show no pattern across lags.
The plots differ in the magnitude of their critical values, which decrease from left to right. The bounds are approximately +/-1.96/sqrt(T), so they shrink as the series length grows from 36 to 360 to 1,000; what looks like decreasing variance across the plots is simply the larger samples estimating the (zero) autocorrelations more precisely.
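The shrinking bounds can be computed directly. A minimal sketch (assuming the usual large-sample +/-1.96/sqrt(T) bounds; not part of the original output):
# approximate 95% critical bounds for a white-noise ACF at each sample size
sapply(c(36, 360, 1000), function(T) round(1.96/sqrt(T), 3))
# roughly 0.327, 0.103 and 0.062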
A classic example of a non-stationary series is the daily closing IBM stock price series (data set ibmclose). Use R to plot the daily closing prices for IBM stock and the ACF and PACF. Explain how each plot shows that the series is non-stationary and should be differenced.
autoplot(ibmclose) + xlab("Day") + ylab("Daily closing prices for IBM stock")
ggAcf(ibmclose,main="")
This ACF plot clearly shows that the series is non-stationary: the autocorrelations are far above the critical bounds and decay very slowly instead of dropping quickly to zero.
ggPacf(ibmclose,main="")
The PACF likewise shows a single spike at lag 1 far above the critical value, meaning that the higher-order autocorrelations are effectively explained by the lag-1 autocorrelation.
Together these plots show the time series needs to be differenced. Let's check how many differences are required:
ndiffs(ibmclose)
## [1] 1
Apply the differencing and check the plots again
autoplot(diff(ibmclose,1))
ggAcf(diff(ibmclose,1))
ggPacf(diff(ibmclose,1))
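As an extra hedged check (assuming the urca package is installed), the KPSS unit-root test that ndiffs() uses by default can be applied to the differenced series; a test statistic below the critical values indicates stationarity:
library(urca)
# KPSS test on the differenced series; a small statistic means stationary
diff(ibmclose) %>% ur.kpss() %>% summary()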
For the following series, find an appropriate Box-Cox transformation and order of differencing in order to obtain stationary data: usnetelec, usgdp, mcopper, enplanements, visitors.
usnetelec
(lambda <- BoxCox.lambda(usnetelec))
## [1] 0.5167714
# number of first differences needed to make the series stationary
ndiffs(BoxCox(usnetelec,lambda))
## [1] 2
autoplot(usnetelec)+
xlab("Year") + ylab("") +
ggtitle(" usnetelec series")
autoplot(BoxCox(usnetelec,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed usnetelec series")
autoplot(diff(BoxCox(usnetelec,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform usnetelec series")
ggAcf(diff(BoxCox(usnetelec,lambda)))
ggPacf(diff(BoxCox(usnetelec,lambda)))
autoplot(diff(diff(BoxCox(usnetelec,lambda))))+
xlab("Year") + ylab("") +
ggtitle("Double\n differenced boxcox transform usnetelec series")
ggAcf(diff(diff(BoxCox(usnetelec,lambda))))
ggPacf(diff(diff(BoxCox(usnetelec,lambda))))
This series does not obviously need a transformation, but we applied a Box-Cox transform anyway and ndiffs() reported that two orders of differencing are required. Looking at the ACF and PACF plots, however, a single difference already appears sufficient to make the series stationary.
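A quick, hedged way to check for overdifferencing is to compare variances, since differencing one time too many typically inflates the variance (this sketch was not in the original output):
# variance after one and after two differences of the transformed series;
# a higher variance for the double difference would suggest overdifferencing
var(diff(BoxCox(usnetelec, lambda)))
var(diff(diff(BoxCox(usnetelec, lambda))))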
usgdp
(lambda <- BoxCox.lambda(usgdp))
## [1] 0.366352
# number of first differences needed to make the series stationary
ndiffs(BoxCox(usgdp,lambda))
## [1] 1
Box.test(BoxCox(usgdp,lambda), type="Ljung-Box")
##
## Box-Ljung test
##
## data: BoxCox(usgdp, lambda)
## X-squared = 233.76, df = 1, p-value < 2.2e-16
Box.test(diff(BoxCox(usgdp,lambda)), type="Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(usgdp, lambda))
## X-squared = 23.874, df = 1, p-value = 1.029e-06
nsdiffs(BoxCox(usgdp,lambda))
## [1] 0
autoplot(usgdp)+
xlab("Year") + ylab("") +
ggtitle(" usgdp series")
autoplot(BoxCox(usgdp,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed usgdp series")
autoplot(diff(BoxCox(usgdp,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform usgdp series")
ggAcf(diff(BoxCox(usgdp,lambda)))
ggPacf(diff(BoxCox(usgdp,lambda)))
autoplot(diff(diff(BoxCox(usgdp,lambda))))+
xlab("Year") + ylab("") +
ggtitle("Double\n differenced boxcox transform usgdp series")
ggAcf(diff(diff(BoxCox(usgdp,lambda))))
ggPacf(diff(diff(BoxCox(usgdp,lambda))))
In this case we only need one order of differencing, even though the ACF still has a few values outside the bounds. No seasonal differencing is needed (nsdiffs returns 0).
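For reference, a minimal hand-rolled sketch of what the Box-Cox transform computes (the box_cox helper below is hypothetical; forecast::BoxCox additionally handles negative values):
# classical Box-Cox: log when lambda is 0, power transform otherwise
box_cox <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1)/lambda
}
# for the all-positive usgdp series this should match BoxCox(usgdp, lambda)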
mcopper
(lambda <- BoxCox.lambda(mcopper))
## [1] 0.1919047
# number of first differences needed to make the series stationary
ndiffs(BoxCox(mcopper,lambda))
## [1] 1
Box.test(BoxCox(mcopper,lambda), type="Ljung-Box")
##
## Box-Ljung test
##
## data: BoxCox(mcopper, lambda)
## X-squared = 550.42, df = 1, p-value < 2.2e-16
Box.test(diff(BoxCox(mcopper,lambda)), type="Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(mcopper, lambda))
## X-squared = 57.517, df = 1, p-value = 3.353e-14
nsdiffs(BoxCox(mcopper,lambda))
## [1] 0
autoplot(mcopper)+
xlab("Year") + ylab("") +
ggtitle(" mcopper series")
autoplot(BoxCox(mcopper,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed mcopper series")
autoplot(diff(BoxCox(mcopper,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform mcopper series")
ggAcf(diff(BoxCox(mcopper,lambda)))
ggPacf(diff(BoxCox(mcopper,lambda)))
autoplot(diff(diff(BoxCox(mcopper,lambda))))+
xlab("Year") + ylab("") +
ggtitle("Double\n differenced boxcox transform mcopper series")
ggAcf(diff(diff(BoxCox(mcopper,lambda))))
ggPacf(diff(diff(BoxCox(mcopper,lambda))))
In this case we only need one order of differencing, even though the first ACF lag of the differenced series lies above the boundary. No seasonal differencing is needed (nsdiffs returns 0).
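Since roughly 5% of lags are expected to cross the bounds by chance, a single significant lag is not alarming. The series, ACF and PACF can also be viewed in one display with forecast's ggtsdisplay (a sketch):
ggtsdisplay(diff(BoxCox(mcopper, lambda)))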
enplanements
(lambda <- BoxCox.lambda(enplanements))
## [1] -0.2269461
# number of first differences needed to make the series stationary
ndiffs(BoxCox(enplanements,lambda))
## [1] 1
Box.test(BoxCox(enplanements,lambda), type="Ljung-Box")
##
## Box-Ljung test
##
## data: BoxCox(enplanements, lambda)
## X-squared = 253.7, df = 1, p-value < 2.2e-16
Box.test(diff(BoxCox(enplanements,lambda)), type="Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(enplanements, lambda))
## X-squared = 16.55, df = 1, p-value = 4.739e-05
nsdiffs(BoxCox(enplanements,lambda))
## [1] 1
autoplot(enplanements)+
xlab("Year") + ylab("") +
ggtitle(" enplanements series")
autoplot(BoxCox(enplanements,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed enplanements series")
autoplot(diff(BoxCox(enplanements,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform enplanements series")
ggAcf(diff(BoxCox(enplanements,lambda)))
ggPacf(diff(BoxCox(enplanements,lambda)))
# Seasonal Differencing
enplanements_seasdiff <- diff(BoxCox(enplanements,lambda), lag=frequency(enplanements), differences=1)
autoplot(diff(enplanements_seasdiff))+
xlab("Year") + ylab("") +
ggtitle("1st difference after seasonal difference of\n boxcox transform enplanements series")
ggAcf(diff(enplanements_seasdiff))
ggPacf(diff(enplanements_seasdiff))
In this case we need one order of first differencing, and because the series is seasonal we also need one seasonal difference (nsdiffs returns 1). After both differences the data look much more stationary.
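The same transformation and differencing can be written as one compact pipeline (a sketch, assuming the monthly frequency of 12):
# Box-Cox, seasonal difference, first difference, then plot with ACF/PACF
enplanements %>% BoxCox(lambda) %>% diff(lag = 12) %>% diff() %>% ggtsdisplay()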
visitors
(lambda <- BoxCox.lambda(visitors))
## [1] 0.2775249
# number of first differences needed to make the series stationary
ndiffs(BoxCox(visitors,lambda))
## [1] 1
Box.test(BoxCox(visitors,lambda), type="Ljung-Box")
##
## Box-Ljung test
##
## data: BoxCox(visitors, lambda)
## X-squared = 205.74, df = 1, p-value < 2.2e-16
Box.test(diff(BoxCox(visitors,lambda)), type="Ljung-Box")
##
## Box-Ljung test
##
## data: diff(BoxCox(visitors, lambda))
## X-squared = 21.922, df = 1, p-value = 2.839e-06
nsdiffs(BoxCox(visitors,lambda))
## [1] 1
autoplot(visitors)+
xlab("Year") + ylab("") +
ggtitle(" visitors series")
autoplot(BoxCox(visitors,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed visitors series")
autoplot(diff(BoxCox(visitors,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform visitors series")
ggAcf(diff(BoxCox(visitors,lambda)))
ggPacf(diff(BoxCox(visitors,lambda)))
# Seasonal Differencing
visitors_seasdiff <- diff(BoxCox(visitors,lambda), lag=frequency(visitors), differences=1)
autoplot(diff(visitors_seasdiff))+
xlab("Year") + ylab("") +
ggtitle("1st difference after seasonal difference of\n boxcox transform visitors series")
ggAcf(diff(visitors_seasdiff))
ggPacf(diff(visitors_seasdiff))
# Second seasonal differencing
visitors_seasdiff2 <- diff(visitors_seasdiff, lag=frequency(visitors_seasdiff), differences = 1)
autoplot(diff(visitors_seasdiff2))+
xlab("Year") + ylab("") +
ggtitle("1st difference after two seasonal differences of\n boxcox transform visitors series")
ggAcf(diff(visitors_seasdiff2))
ggPacf(diff(visitors_seasdiff2))
As with enplanements, we need one first difference and one seasonal difference (nsdiffs returns 1); after those the data look much more stationary. The second seasonal difference tried above is exploratory rather than required.
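To confirm that a single seasonal difference is enough, the automatic tests can be re-run on the seasonally differenced series (a hedged check; these calls and their output are not part of the original):
nsdiffs(visitors_seasdiff) # expect 0 if one seasonal difference suffices
ndiffs(visitors_seasdiff)  # remaining first differences, if any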
For your retail data (from Exercise 3 in Section 2.10), find the appropriate order of differencing (after transformation if necessary) to obtain stationary data.
retaildata <- readxl::read_excel("C:/Users/Mezu/Documents/retail.xlsx", skip=1)
myts <- ts(retaildata[,"A3349873A"],
frequency=12, start=c(1982,4))
lambda <- BoxCox.lambda(myts)
ndiffs(BoxCox(myts,lambda))
## [1] 1
nsdiffs(BoxCox(myts,lambda))
## [1] 1
autoplot(myts)+xlab("Year") + ylab("") +
ggtitle("myts for retail data series")
autoplot(BoxCox(myts,lambda))+
xlab("Year") + ylab("") +
ggtitle("Boxcox transformed myts for retail data series")
autoplot(diff(BoxCox(myts,lambda)))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n boxcox transform myts series")
ggAcf(diff(BoxCox(myts,lambda)))
ggPacf(diff(BoxCox(myts,lambda)))
# Seasonal Differencing
myts_seasdiff <- diff(BoxCox(myts,lambda), lag=frequency(myts), differences=1)
autoplot(diff(myts_seasdiff))+
xlab("Year") + ylab("") +
ggtitle("1st difference after seasonal difference of\n boxcox transform myts series")
ggAcf(diff(myts_seasdiff))
ggPacf(diff(myts_seasdiff))
This series needs both a seasonal difference and a first difference to become stationary. After applying both, the series looks much more stationary.
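Mirroring the Ljung-Box checks used for the earlier series, one more hedged test can be run on the fully differenced retail series (output not shown here):
Box.test(diff(myts_seasdiff), type="Ljung-Box")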
Use R to simulate and plot some data from simple ARIMA models. Use the following R code to generate data from an AR(1) model with phi1 = 0.6 and sigma^2 = 1. The process starts with y1 = 0.
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i]
Produce a time plot for the series. How does the plot change as you change phi1?
autoplot(y)+ggtitle("phi1 = 0.6")
phi1 <- 0.94
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- phi1*y[i-1] + e[i]
autoplot(y)+ggtitle("phi1 = 0.94")
phi1 <- -0.3
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- phi1*y[i-1] + e[i]
autoplot(y)+ggtitle("phi1 = -0.3")
We notice that as phi1 decreases toward -1 the series oscillates more rapidly, while as phi1 increases toward 1 the series becomes smoother and wanders in longer runs, with less oscillation.
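The same AR(1) experiments can be cross-checked with the built-in simulator stats::arima.sim, which requires |phi1| < 1 for stationarity (a sketch):
# simulate 100 observations from AR(1) with phi1 = 0.6
set.seed(123)
autoplot(arima.sim(model=list(ar=0.6), n=100)) + ggtitle("AR(1) via arima.sim, phi1 = 0.6")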
Write your own code to generate data from an MA(1) model with theta1 = 0.6 and sigma^2 = 1
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + 0.6*e[i-1]
autoplot(y)+ggtitle("theta1 = 0.6")
Produce a time plot for the series. How does the plot change as you change theta1?
theta1 <- 0.94
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + theta1*e[i-1]
autoplot(y)+ggtitle("theta1 = 0.94")
theta1 <- -0.3
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- e[i] + theta1*e[i-1]
autoplot(y)+ggtitle("theta1 = -0.3")
As theta1 decreases the series oscillates more, and consecutive observations cluster more closely together.
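A matching cross-check for the MA(1) case with stats::arima.sim (a sketch):
# simulate 100 observations from MA(1) with theta1 = 0.6
set.seed(123)
autoplot(arima.sim(model=list(ma=0.6), n=100)) + ggtitle("MA(1) via arima.sim, theta1 = 0.6")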
Generate data from an ARMA(1,1) model with phi1 = 0.6, theta1 = 0.6 and sigma^2 = 1:
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 2:100)
y[i] <- 0.6*y[i-1] + e[i] + 0.6*e[i-1]
autoplot(y)+ggtitle("ARMA(1,1): phi1 = 0.6 and theta1 = 0.6")
Generate data from an AR(2) model with phi1 = -0.8 and phi2 = 0.3
y <- ts(numeric(100))
e <- rnorm(100)
for(i in 3:100)
y[i] <- -0.8*y[i-1] + 0.3*y[i-2] + e[i]
autoplot(y)+ggtitle("AR(2): phi1 = -0.8 and phi2 = 0.3")
Graph the latter two series and compare them.
These two series produce very different plots: the ARMA(1,1) series is stationary, while the AR(2) series oscillates with exponentially increasing amplitude and is non-stationary.
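The explosive behaviour of the AR(2) series can be confirmed from its characteristic polynomial 1 - phi1*B - phi2*B^2: stationarity requires every root to lie outside the unit circle. A hedged check with base R's polyroot:
# roots of 1 + 0.8*B - 0.3*B^2; one modulus is about 0.93 < 1, so the
# stationarity condition fails and the simulated series diverges
Mod(polyroot(c(1, 0.8, -0.3)))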
Consider wmurders, the number of women murdered each year (per 100,000 standard population) in the United States. a) By studying appropriate graphs of the series in R, find an appropriate ARIMA( p,d,q ) model for these data.
autoplot(wmurders)+ggtitle("Number of women murdered in US")
This looks like a non-seasonal series, so we will use non-seasonal auto.arima to find the coefficients.
ggAcf(wmurders)
ggPacf(wmurders)
Let’s difference the data
ndiffs(wmurders)
## [1] 2
This shows we need two orders of differencing
autoplot(diff(wmurders))+
xlab("Year") + ylab("") +
ggtitle("1st difference of\n women murders series")
ggAcf(diff(wmurders))
ggPacf(diff(wmurders))
Let’s try second order differencing
d_wmurders=diff(wmurders)
autoplot(diff(d_wmurders))+
xlab("Year") + ylab("") +
ggtitle("2nd difference of\n women murders series")
ggAcf(diff(d_wmurders))
ggPacf(diff(d_wmurders))
Looking at the second-differenced series, the PACF decays exponentially (roughly sinusoidally), while the ACF shows significant spikes at lags 1 and 2 but none beyond lag 2. This points to a pure moving-average model for the twice-differenced data, i.e. an ARIMA(0,2,q) with small q; we proceed with an ARIMA(0,2,3) model.
Should you include a constant in the model? Explain
No. With d = 2, including a constant c would make the long-term forecasts follow a quadratic trend, which is generally unrealistic; with c = 0 and d = 2 the forecasts follow a straight line.
Write this model in terms of the backshift operator:
(1 - B)^2 y_t = (1 - 1.0154B + 0.4324B^2 - 0.3217B^3) e_t
fit <- auto.arima(wmurders, seasonal=FALSE,stepwise=FALSE, approximation=FALSE)
summary(fit)
## Series: wmurders
## ARIMA(0,2,3)
##
## Coefficients:
## ma1 ma2 ma3
## -1.0154 0.4324 -0.3217
## s.e. 0.1282 0.2278 0.1737
##
## sigma^2 estimated as 0.04475: log likelihood=7.77
## AIC=-7.54 AICc=-6.7 BIC=0.35
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01336585 0.2016929 0.1531053 -0.3332051 4.387024 0.9415259
## ACF1
## Training set -0.03193856
The full search confirms the ARIMA(0,2,3) model.
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(0,2,3)
## Q* = 10.706, df = 7, p-value = 0.152
##
## Model df: 3. Total lags used: 10
The residuals are satisfactory: the Ljung-Box p-value of 0.152 is greater than 0.05, so we cannot reject the hypothesis that the residuals are white noise.
fit %>% forecast(h=3) %>% autoplot(include=80)
fit2 <- auto.arima(wmurders, seasonal=FALSE)
summary(fit2)
## Series: wmurders
## ARIMA(1,2,1)
##
## Coefficients:
## ar1 ma1
## -0.2434 -0.8261
## s.e. 0.1553 0.1143
##
## sigma^2 estimated as 0.04632: log likelihood=6.44
## AIC=-6.88 AICc=-6.39 BIC=-0.97
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.01065956 0.2072523 0.1528734 -0.2149476 4.335214 0.9400996
## ACF1
## Training set 0.02176343
The default stepwise auto.arima gives ARIMA(1,2,1), which differs from the ARIMA(0,2,3) found by the full search (stepwise=FALSE). The ARIMA(0,2,3) model is preferred because its AICc of -6.7 is lower than the -6.39 of ARIMA(1,2,1).
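The AICc values can also be pulled straight from the fitted objects for a side-by-side comparison (a sketch):
# lower AICc is better; from the summaries above this gives -6.7 vs -6.39
c(full_search = fit[["aicc"]], stepwise = fit2[["aicc"]])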