Note: This note includes April 27.

In summary, there are 3 major steps on time series modeling:

  1. Model Identification

    -Create plot
    -identify if there is seasonality
    -check stationarity by using ADF test
    -Difference if needed
    -Examine ACF and PACF
    -Identify possible model
  2. Model Building

    -estimate the parameter
    -we can simply used built in function like Arima() of forecast package
    -Use lmtest package and see if there are parameters that are not significant
    -Remodel if needed, we can use AIC to select best model
  3. Run Model Diagnostic Check. Goal is to have white noise residual and if possible guassian white noise.

    -check normality of residual using qqnorm() function or shapiro.test()
    -check check constant variance assumption, residual vs fitted plot
    -check residual independence, residual vs order plot
  4. (Optional) Forecast Evaluation using in-sample and out sample strategy, split your data, use in sample to estimate the model and check model performance on out sample. This is recommended if you have large data set. (How large? I don’t know haha)

April 27 Notes

Load necessary Library

library(forecast)
library(lmtest)
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(readxl)
require(fUnitRoots)
## Loading required package: fUnitRoots
## Warning in library(package, lib.loc = lib.loc, character.only = TRUE,
## logical.return = TRUE, : there is no package called 'fUnitRoots'
#set the working directory to your folder
setwd("C:/Users/DSTD-NilreyC/Desktop/Nilrey Desktop files/Stat and Data Science References/MS Applied Math Materials/Time Series/Math 215 -MSMA")

Example 6.3

Model Identification

#Load the data
Example_6_3 <- read_excel("Example 6.3.xlsx")
appdata<-ts(Example_6_3$Applications)
plot(appdata)

#if you have fUnitRoots follow the code below
##adfTest(appdata,lags=1,type="ct")

#if none, use tseries
tseries::adf.test(appdata)
## Warning in tseries::adf.test(appdata): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  appdata
## Dickey-Fuller = -4.4074, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
Acf(appdata)

Pacf(appdata)

Based on the result above, below is summary of the result:

ACF plot ACF - tails off PACF - cuts off after lag 2

Partial ACF plot -ACF - cuts off after lag 2 -PACF - tails off -ARIMA (0,0,2) -MA (2)

Model Building

model1<-Arima(appdata,order=c(0,0,2))
model2<-Arima(appdata,order=c(2,0,0))
model3<-Arima(appdata,order=c(1,0,1))
model1$aic
## [1] 689.7361
model2$aic
## [1] 682.924
model3$aic
## [1] 689.7336

We see from AIC above that model2 has the lowest AIC.

Now, let us see the coefficients and if they are significant using lmtest package.

require(lmtest)
coeftest(model2)
## 
## z test of coefficients:
## 
##            Estimate Std. Error z value  Pr(>|z|)    
## ar1        0.265893   0.088983  2.9881  0.002807 ** 
## ar2        0.412971   0.090068  4.5851 4.538e-06 ***
## intercept 66.853785   1.833404 36.4643 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model2$coef
##        ar1        ar2  intercept 
##  0.2658927  0.4129705 66.8537847
model2$coef[3]*(1-model2$coef[1]-model2$coef[2])
## intercept 
##  21.46921

Model Diagnostic Checking

res<-as.numeric(model2$residuals)
fit<-as.numeric(model2$fitted)
qqnorm(res,datax=TRUE,ylab="Residuals",pch=16)
qqline(res,datax=TRUE)

hist(res,xlab="Residuals",ylab="Frequency")

plot(fit,res,xlab="Fitted Value",ylab="Residuals",pch=16)
abline(h=0)

Based on the plots above, the residuals have a constant vairance because the plot is structureless.

Let’s make another plot.

plot(res,type="l",xlab="Order",ylab="Residuals")
abline(h=0)

Acf(res)

Pacf(res)

Box.test(res,lag=20,type="Ljung-Box",fitdf = 3)
## 
##  Box-Ljung test
## 
## data:  res
## X-squared = 14.024, df = 17, p-value = 0.6654

Based on above result:

-The residuals are uncorrelated -Seems normally distributted -The residuals are gaussian white noise -Box plot also is non-significant

Now let us forecast 5 step-ahead

forecast(model2,h=5)
##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 105       62.58571 54.53554 70.63588 50.27404 74.89738
## 106       64.12744 55.79756 72.45731 51.38798 76.86689
## 107       64.36628 55.17133 73.56123 50.30381 78.42875
## 108       65.06647 55.67336 74.45959 50.70094 79.43201
## 109       65.35129 55.72228 74.98029 50.62500 80.07757

May 4, 2019 Notes

Example 2.2

Example_2_2<-read_excel("Example 2.2.xlsx")
data1<-ts(Example_2_2$'Dow Jones',start=c(1999,6),frequency=12)
plot(data1)

#adfTest(data1,lags=0,type="nc")

#if no adfTest from fUnitRoot, use tseries
tseries::adf.test(data1)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data1
## Dickey-Fuller = -1.3254, Lag order = 4, p-value = 0.8528
## alternative hypothesis: stationary

-Result = P value is small

-Model identification:

    - A decreasing trend from 1999 to 2003
    - Increasing trend from 2003 onwards
    - nonstationaty

-ADF Test

    - We fail to reject the Ho
    - Ho: nonstationary
    - H1: stationary

So we will difference the data and plot ACF and PACF and conduct ADF test again on difference data.

Ddata1<-diff(data1,1)
plot(Ddata1)

Acf(data1)

Pacf(data1)

Based on ACF and PACF, possible model is ARIMA(0,1,0)

Model building

model<-Arima(data1,order=c(0,1,0))
model
## Series: data1 
## ARIMA(0,1,0) 
## 
## sigma^2 estimated as 165630:  log likelihood=-623.93
## AIC=1249.85   AICc=1249.9   BIC=1252.28

Model Diagnostic check

We already made analysis on the model from last discussion. But we will still employ diagnostic check.

res<-as.numeric(model$residuals[-1])
fit<-as.numeric(model$fitted[-1])
length((Ddata1))
## [1] 84
length(res)
## [1] 84
qqnorm(res,datax=TRUE,ylab="Residual",pch=16)
qqline(res,datax=TRUE)

plot(fit,res,xlab="Fitted Value",ylab="Residual",pch=16)
abline(h=0)

plot(res,xlab="Order",ylab="Residual",type="l")
abline(h=0)

-QQnorm shows that since residuals are close to the link, this suggest that it is normally distributed. -Residual vs Fitted value is structureless, no pattern. -We can clearly see in residual vs order plot that the variance is not constant, hence not white noise.

Note: Sometimes, we can’t find a perfect model we will just settle on what we can get.

Forecasting Using Difference Equation

Example 6.5 ARIMA(1,1,0)

Note: In case your forecasting equation has \[\epsilon\] , replace it with e (residual)
[Attached photo on ex.6.5 derivation]

Example 6.7, same to 6.3 where model develop is AR(2) or ARIMA(2,0,0). We can use Arima() function from forecast package. Then we will use forecast() function to forecast or predict

appdata<-ts(Example_6_3$Applications)
data2<-ts(Example_6_3$Applications)
model<-arima(data2,order=c(2,0,0))
model2<-arima(appdata,order=c(2,0,0))
appdata
## Time Series:
## Start = 1 
## End = 104 
## Frequency = 1 
##   [1] 71 57 62 64 65 67 65 82 70 74 75 81 71 75 82 74 78 75 73 76 66 69 63
##  [24] 76 65 73 62 77 76 88 71 72 66 65 73 76 81 84 68 63 66 71 67 69 63 61
##  [47] 68 75 66 81 72 77 66 71 59 57 66 51 59 56 57 55 53 74 64 70 74 69 64
##  [70] 68 64 70 73 59 68 59 66 63 63 61 73 72 65 70 54 63 62 60 67 59 74 61
##  [93] 61 52 55 61 56 61 60 65 55 61 59 63
class(model2)
## [1] "Arima"
#load forcast package
forecast(model2,h=12)
##     Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
## 105       62.58571 54.65250 70.51892 50.45291 74.71851
## 106       64.12744 55.91858 72.33629 51.57307 76.68180
## 107       64.36628 55.30492 73.42764 50.50812 78.22444
## 108       65.06647 55.80983 74.32312 50.90965 79.22330
## 109       65.35129 55.86218 74.84039 50.83895 79.86362
## 110       65.71617 56.13346 75.29889 51.06068 80.37167
## 111       65.93081 56.27109 75.59054 51.15754 80.70409
## 112       66.13857 56.43926 75.83789 51.30475 80.97240
## 113       66.28246 56.55529 76.00962 51.40605 81.15887
## 114       66.40651 56.66341 76.14961 51.50572 81.30730
## 115       66.49892 56.74534 76.25249 51.58211 81.41572
## 116       66.57472 56.81486 76.33458 51.64830 81.50114

Seasonal Process

[See slides for more details] ARIMA(p,d,q)x(P,D,Q)s

recall: MA(1) cut off after log one ACF, PACF exponential decay. For seasonal, look at the frequency, example if it is monthly data s=12, then look at the behavior of 12,24,36 . . . of ACF and PACF

Example 6.8 ARIMA(0,1,1)x(0,1,1)12

Example 6.9

Model Identification

Load the Data Example 6.9

#Load the Data Example 6.9
Example_6_9<-read_excel("Example 6.9.xlsx")
Example_6_9$Mo<-rep(1:12,11)
data6.9<-Example_6_9
data6.9<-ts(data6.9$Employ, start=c(1971,1), frequency = 12)
plot(data6.9)

seasonplot(data6.9)

boxplot(Employ~Mo, Example_6_9)

tseries::adf.test(data6.9)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data6.9
## Dickey-Fuller = -2.5744, Lag order = 5, p-value = 0.3376
## alternative hypothesis: stationary

-Based on graph, there is change in mean -seasonal -Based on adf test, it is non stationary. We will proceed on differencing

Ddata1<-diff(diff(data6.9,12),1)
plot(Ddata1)

tseries::adf.test(Ddata1)
## Warning in tseries::adf.test(Ddata1): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  Ddata1
## Dickey-Fuller = -5.9371, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

Now, the data is stationary and the adf test is significant.

Let’s create ACF and PACF plot.

Acf(Ddata1)

Pacf(Ddata1)

-Seasonal lags, ACF cuts of after lag 12 and PACF tails off. -Non seasonal lags, non significant ACF and PACF, but let us say ACF and PACF tails off. -Based on above findins we can try following model ARIMA(0,1,0)x(0,1,1) and ARIMA(1,1,1)(0,1,1)

Model Building

model1<-Arima(data6.9, order=c(0,1,0), seasonal = c(0,1,1))
model2<-Arima(data6.9, order = c(1,1,1), seasonal = c(0,1,1))
model1$aic
## [1] 1302.617
model2$aic
## [1] 1296.806
# let us see if coeffciencts are significant
library(lmtest)
coeftest(model2)
## 
## z test of coefficients:
## 
##      Estimate Std. Error z value  Pr(>|z|)    
## ar1   0.28778    0.24101  1.1940  0.232467    
## ma1  -0.57017    0.20550 -2.7745  0.005528 ** 
## sma1 -0.99987    0.12934 -7.7308 1.069e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Let us drop AR(1), hence ARIMA(0,1,1)x(0,1,1)12

Model Diagnostic Checking

model3<-Arima(data6.9, order = c(0,1,1), seasonal = c(0,1,1))
model3$aic
## [1] 1295.926

We see that AIC of model 3 is much lower, hence we select model 3. Now let us proceed on model diagnostic step. We saythat all parameters are significant.

#now let us get the residual, first observed that residual still did not consider the differencing
length(model3$residuals)
## [1] 132
#now let us rmeove first 13
res_val<-as.numeric(model3$residuals[-1:-13])
fit_val<-as.numeric(model3$fitted[-1:-13])

#test for nromality
qqnorm(res_val, datax = TRUE, ylab="Residual", pch=16)
qqline(res_val, datax = TRUE, ylab="Residual", pch=16)

shapiro.test(res_val)
## 
##  Shapiro-Wilk normality test
## 
## data:  res_val
## W = 0.98081, p-value = 0.08734

Based on residual plot, seems not normally distributed.

let’s plot more about residual

plot(fit_val, res_val,pch=16)

plot(res_val, xlab="Order", type="l")

Based on the plots above, seems that residuals when plotted against fitted value is strucrreless or no pattern which could mean that variance is constant.

In addition, pattern of residual vs order, there is no pattern. Hence, residual is independence.

We could say now that residual is white noise. Let us examine Acf and Pacf to see if there is no significant spikes to support our claim.

Acf(res_val)

Pacf(res_val)

ACF and PACF are not significant based on the plots which indicates that individually autocorrelation and partial-autocorrelation are not significant.

Let us proceed to Ljung Box, where Ho: No serial correlation among the residuals.

Box.test(res_val, lag=20, type = "Ljung-Box", fitdf=2)
## 
##  Box-Ljung test
## 
## data:  res_val
## X-squared = 16.908, df = 18, p-value = 0.5294

Based on test, p-value is greater than 0.05 which we say that there is no serail correlation among the residuals.

Model Evaluation

Say we have in-sample that we used to build the model and out-sample to get the performance.

We will use the out-sample to performa forecast evaluation

insample<-ts(data6.9[1:82], start=c(1971,1),frequency = 12)
modelA<-Arima(insample, order=c(0,1,1), seasonal = c(0,1,1))
modelB<-Arima(data6.9, model=modelA)
FE<-modelB$residuals[83:132]
acf(FE)

pacf(FE)

shapiro.test(FE)
## 
##  Shapiro-Wilk normality test
## 
## data:  FE
## W = 0.9486, p-value = 0.02989

If forecast error ACF and PACF plot shows no spikes which means independence and shaprito test shows normality assumption holds, then our model is good to go.

Note: Having out-sample model evaluation is not a hard rule (by sir Capili)

End of note