Outline
This project has several sections and will provide you with a concise introduction to time series concepts in R. We will learn the essential theory and also practice fitting the four main types of time series models, getting you up and running with all the basics in a little more than an hour!
Introduction to Rhyme Environment
Time Series Data Overview (Theory)
Why Time Series? (Theory)
Key Concepts: Autocorrelation / Autocovariance (Theory)
Key Concepts: Stationarity (Theory)
Checking for Stationarity (Practice)
Transforming for Stationarity: Differencing (Practice)
Transforming for Stationarity: Detrending (Practice)
Basic Model Types: AR(p), MA(q), ARMA(p,q), ARIMA(p,d,q), Decomposition (Theory)
Fitting AR / MA / ARMA / ARIMA models with the Box Jenkins Method (Theory)
Box Jenkins Method: Checking for Stationarity (Practice)
Box Jenkins Method: Transforming for Stationarity & Identifying Model Parameters (Practice)
Box Jenkins Method: Checking the Residuals of the Model Fit (Practice)
Making a Forecast for Each Model (Practice)
Fitting STL (Seasonal Trend Loess) Decomposition Models (Practice)
Where to go Next
#load required R packages
library(IRdisplay)
library(magrittr)
library(tidyverse)
## ── Attaching packages ──────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.3 ✓ dplyr 1.0.0
## ✓ tidyr 1.1.0 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ─────────────────────── tidyverse_conflicts() ──
## x tidyr::extract() masks magrittr::extract()
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## x purrr::set_names() masks magrittr::set_names()
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(forecast)
library(tseries)
library(ggthemes)
theme_set(theme_economist())
#load helper R functions
source("/cloud/project/compare_models_function.R")
source("/cloud/project/sim_random_walk_function.R")
source("/cloud/project/sim_stationary_example_function.R")
print("Loading is completed")
## [1] "Loading is completed"
Time Series Data Overview
(Univariate) time series data is defined as sequence data over time: $X_1, X_2, \ldots, X_T$, where $t$ is the time period and $X_t$ is the value of the time series at that point.
Examples: daily temperatures in Boston, US presidential election turnout by year, minute stock prices
Variables in time series models generally fall into three categories:
(1) endogenous
(2) random noise
(3) exogenous
All time series models involve (1) and (2), but (3) is optional.
Why Time Series?
Why not simply use a standard machine learning model for forecasting? The answer is that:
many forecasting tasks actually involve small samples, which makes machine learning less effective
time series models are more interpretable and less black box than machine learning algorithms
time series models appropriately account for forecasting uncertainty
As an example, let's look at the following data generating process, known as a random walk: $X_t = X_{t-1} + \epsilon_t$. We can compare the forecasting performance of linear regression to that of a basic time series model known as an AR(1) model.
#run function to compare linear regression to basic AR(1) time series model
compare.models(n=100)
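The sourced compare.models() helper wraps this comparison; a minimal sketch of the underlying idea (an assumed illustration, not the helper's actual code) might look like this:
#sketch: simulate a random walk and compare linear regression to an AR(1) fit
set.seed(1)
n <- 100
x <- cumsum(rnorm(n))                                    #simulated random walk
lm.fit <- lm(X ~ t, data = data.frame(t = 1:n, X = x))   #regression on time
ar1.fit <- Arima(x, order = c(1, 0, 0))                  #AR(1) model
lm.pred <- predict(lm.fit, newdata = data.frame(t = (n + 1):(n + 10)))
ar1.pred <- forecast(ar1.fit, h = 10)$mean               #10-step-ahead forecasts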
Key Concepts: Autocorrelation/Autocovariance
Autocorrelation/autocovariance refers to the correlation/covariance between two observations of the time series at different points in time.
The central idea is how related the values of the time series are over time.
For ease of interpretation we typically focus on autocorrelation, i.e. the correlation between $X_t$ and $X_{t+p}$ for some integer $p$.
A related concept is partial autocorrelation, which computes the correlation adjusting for previous lags/periods, i.e. the autocorrelation between $X_t$ and $X_{t+p}$ adjusting for the correlation of $X_t$ with $X_{t+1}, \ldots, X_{t+p-1}$.
When analyzing time series we usually view autocorrelation/partial autocorrelation in ACF/PACF plots.
Let's view this for the random walk model we analyzed above: $X_t = X_{t-1} + \epsilon_t$.
#simulate random walk
dat<-sim.random.walk()
#plot random walk
dat %>%
ggplot(aes(t,X)) + geom_line() + xlab("T") + ylab("X") + ggtitle("Time Series Plot")
#ACF plot
ggAcf(dat$X,type="correlation") + ggtitle("Autocorrelation ACF Plot")
#PACF plot
ggAcf(dat$X,type="partial") + ggtitle("Partial Autocorrelation PACF Plot")
Key Concepts: Stationarity
The second key concept in time series is stationarity.
While the concept can get quite technical, the basic idea is examining whether the distribution of the data over time is consistent.
There are two main forms of stationarity.
Strong stationarity requires that the cumulative distribution function of the data does not depend on time:
$F_X(X_1, \ldots, X_T) = F_X(X_{1+\Delta}, \ldots, X_{T+\Delta}) \quad \forall \Delta \in \mathbb{R}$
Weak stationarity requires that:
the mean of the time series is constant, $E(X_t) = E(X_{t+\Delta})$
the autocovariance/autocorrelation only depends on the time difference between points, $ACF(X_t, X_{t+\Delta-1}) = ACF(X_1, X_\Delta)$
the time series has a finite variance, $Var(X_\Delta) < \infty \ \forall \Delta \in \mathbb{R}$
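As a quick illustration (an assumed sketch, not from the project helper files), iid white noise satisfies these weak stationarity conditions:
#sketch: iid noise has a constant mean and autocorrelations that depend only on the lag
set.seed(1)
x <- rnorm(500)
c(mean(x[1:250]), mean(x[251:500]))  #means of the two halves are roughly equal
acf(x, lag.max = 5, plot = FALSE)    #autocorrelations near zero at every lag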
Checking for Stationarity
#create three time series for example
df<-sim.stationary.example(n=1000)
head(df);dim(df)
## t X1 X2 X3
## 1 1 0.3376258 1.144196 1.8088315
## 2 2 -0.9906829 1.799888 0.7675419
## 3 3 -1.9938056 1.293361 1.5449087
## 4 4 -0.4263313 5.455526 0.4429517
## 5 5 0.7133339 5.452631 0.2237386
## 6 6 -0.6529274 7.750084 -1.4701338
## [1] 1000 4
Check a plot of the time series over time and look for a constant mean and finite variance, i.e. the values appear bounded.
#plot nonstationary and stationary time series
g1<- ggplot(df,aes(x=t,y=X1)) + geom_line() + xlab("T") + ylab("X1") + ggtitle("Nonstationary")
g2<- ggplot(df,aes(x=t,y=X3)) + geom_line() + xlab("T") + ylab("X3") + ggtitle("Stationary")
grid.arrange(g1,g2)
Look at the ACF plot and see if it dies off quickly as opposed to a gradual decline.
#ACF for nonstationary and stationary time series
g1<- ggAcf(df$X1,type="correlation") + xlab("T") + ylab("X1") + ggtitle("Nonstationary")
g2<- ggAcf(df$X3,type="correlation") + xlab("T") + ylab("X3") + ggtitle("Stationary")
grid.arrange(g1,g2)
Perform unit root tests such as the Augmented Dickey–Fuller test.
#perform unit root test; the nonstationary example has a large, non-significant p-value
adf.test(df$X1)
##
## Augmented Dickey-Fuller Test
##
## data: df$X1
## Dickey-Fuller = -2.4507, Lag order = 9, p-value = 0.3875
## alternative hypothesis: stationary
adf.test(df$X3)
## Warning in adf.test(df$X3): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: df$X3
## Dickey-Fuller = -9.8636, Lag order = 9, p-value = 0.01
## alternative hypothesis: stationary
Transforming for Stationarity
Differencing
Differencing involves taking differences between successive time series values. The order of differencing is defined as $p$ for $X_t - X_{t-p}$.
Let's transform a nonstationary time series to a stationary one by differencing, using the random walk model $X_t = X_{t-1} + \epsilon_t$, where $\epsilon_t \sim N(0, \sigma^2)$ iid.
Differencing with an order of one gives $\tilde{X}_t = X_t - X_{t-1} = \epsilon_t$.
#difference time series to make stationary
X1.diff<- df$X1 - lag(df$X1,1)
#plot ACF of original and differenced time series
g1<- ggAcf(df$X1,type="correlation")
g2<- ggAcf(X1.diff,type="correlation")
grid.arrange(g1,g2)
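Base R's diff() computes the same first difference; as an optional sanity check (a small assumed sketch), the differenced random walk should now pass the ADF test:
#ADF test on the differenced series should reject the unit root null (small p-value)
adf.test(diff(df$X1))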
Detrending
Detrending involves removing a deterministic relationship with time. As an example, suppose we have the following data generating process: $X_t = \beta t + \epsilon_t$, where $\epsilon_t \sim N(0, \sigma^2)$ iid. Detrending uses the transformed time series $\tilde{X}_t = X_t - \beta t = \epsilon_t$.
#detrending time series to make stationary
detrended<- resid(lm(X2~t,data=df))
#plot ACF of original and detrended time series
g1<- ggAcf(df$X2,type="correlation")
g2<- ggAcf(detrended,type="correlation")
grid.arrange(g1,g2)
Basic Model Types: AR(p), MA(q), ARMA(p,q), ARIMA(p,d,q), Decomposition
Autoregressive AR(p) Models
AR models specify $X_t$ as a function of lagged time series values $X_{t-1}, X_{t-2}, \ldots$,
i.e. $X_t = \mu + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t$,
where $\mu$ is a mean term and $\epsilon_t \sim N(0, \sigma^2)$ iid is a random error term.
When fitting an AR model the key choice is p, the number of lags to include.
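As a small illustrative sketch (assumed, not part of the original project code), we can simulate an AR(2) process and recover its coefficients with Arima() from the forecast package:
#simulate an AR(2) process and fit an AR(2) model to it
set.seed(1)
ar.sim <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 500)
Arima(ar.sim, order = c(2, 0, 0))  #estimated ar1, ar2 should be near 0.6 and 0.2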
Moving Average MA(q) Models
MA models specify $X_t$ using random noise lags:
$X_t = \mu + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$,
where $\mu$ is a mean term and $\epsilon_t \sim N(0, \sigma^2)$ iid is a random error term.
Similar to an AR model, the key choice is q, the number of random shock lags.
Autoregressive Moving Average ARMA(p,q) Models
ARMA(p,q) models are a combination of an AR and an MA model:
$X_t = \mu + \phi_1 X_{t-1} + \ldots + \phi_p X_{t-p} + \epsilon_t + \theta_1 \epsilon_{t-1} + \ldots + \theta_q \epsilon_{t-q}$,
where $\mu$ is a mean term and $\epsilon_t \sim N(0, \sigma^2)$ iid is a random error term.
When fitting an ARMA model, we need to choose two things: p, the number of AR lags, and q, the number of MA lags.
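A similar assumed sketch for an ARMA(1,1); an MA(q) model is simply the special case with no AR lags (p = 0):
#simulate an ARMA(1,1) process and fit an ARMA(1,1) model to it
set.seed(1)
arma.sim <- arima.sim(model = list(ar = 0.5, ma = 0.4), n = 500)
Arima(arma.sim, order = c(1, 0, 1))  #estimated ar1, ma1 should be near 0.5 and 0.4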
Autoregressive Integrated Moving Average ARIMA(p,d,q) Models
ARIMA(p,d,q) is an ARMA model with differencing.
When fitting an ARIMA model we need to choose three things: p, the number of AR lags, q, the number of MA lags, and d, the number of differences to use.
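An assumed sketch: an ARIMA(1,1,0) is an AR(1) fit to the first-differenced series, and arima.sim() can generate such an integrated process directly:
#simulate an ARIMA(1,1,0) process and fit an ARIMA(1,1,0) model to it
set.seed(1)
int.sim <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 500)
Arima(int.sim, order = c(1, 1, 0))  #estimated ar1 should be near 0.5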
Decomposition Models
Decomposition models specify $X_t$ as a combination of a trend component ($T_t$), a seasonal component ($S_t$), and an error component/residual ($E_t$), i.e. $X_t = f(T_t, S_t, E_t)$.
Common decomposition forms are: $X_t = T_t + S_t + E_t$ or $X_t = T_t \times S_t \times E_t$ (where we then take logs to recover the additive form).
There are various ways to estimate the different components: exponential smoothing, state space models/Kalman filtering, STL models, etc.
In this project we will cover STL models because of their ease of use and flexibility.
Fitting AR/MA/ARMA/ARIMA Models with the Box Jenkins Method
We will now go over how to fit AR/MA/ARMA/ARIMA models on a real data set and review a generic strategy for fitting them known as the Box Jenkins method.
This process involves several steps to help identify the p, d, and q parameters that we need:
- identify whether the time series is stationary or not
- identify p, d, and q of the time series by:
  - making the time series stationary through differencing/detrending to find d
  - looking at the ACF/PACF to find p and q
  - using model fit diagnostics like AIC or BIC to select the best model to find p, d, and q
- check the model fit using the Ljung-Box test
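The auto.arima() calls below automate the order search; as a generic, assumed sketch of the manual version, candidate (p, q) orders for a stationary series y (a placeholder name) can be compared by AIC:
#hypothetical helper: grid-search ARMA orders for a stationary series y by AIC
select.order <- function(y, max.p = 3, max.q = 3) {
  grid <- expand.grid(p = 0:max.p, q = 0:max.q)
  grid$AIC <- apply(grid, 1, function(r) {
    fit <- try(Arima(y, order = c(r[["p"]], 0, r[["q"]])), silent = TRUE)
    if (inherits(fit, "try-error")) NA else AIC(fit)
  })
  grid[order(grid$AIC), ]  #lowest AIC first
}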
#load data
ur<- read.csv("Mass Monthly Unemployment Rate.csv")
head(ur); dim(ur)
## DATE MAURN
## 1 1976-01-01 11.6
## 2 1976-02-01 11.3
## 3 1976-03-01 10.9
## 4 1976-04-01 9.9
## 5 1976-05-01 9.4
## 6 1976-06-01 9.8
## [1] 529 2
#check date
class(ur$DATE)
## [1] "factor"
#change date class to date type
ur$DATE<- as.Date(ur$DATE)
class(ur$DATE)
## [1] "Date"
Checking for Stationarity
#check time series plot
ggplot(ur,aes(x=DATE,y=MAURN)) +geom_line()
#check ACF plot
ggAcf(ur$MAURN,type='correlation')
#run ADF test
adf.test(ur$MAURN)
##
## Augmented Dickey-Fuller Test
##
## data: ur$MAURN
## Dickey-Fuller = -3.0954, Lag order = 8, p-value = 0.1146
## alternative hypothesis: stationary
#fit AR model
ar.model<- auto.arima(ur$MAURN, max.d=0, max.q=0,allowdrift=T)
ar.model
## Series: ur$MAURN
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## 0.9787 5.7425
## s.e. 0.0101 0.8498
##
## sigma^2 estimated as 0.2: log likelihood=-325.44
## AIC=656.88 AICc=656.93 BIC=669.7
#fit MA model
ma.model<- auto.arima(ur$MAURN, max.d=0, max.p=0,allowdrift=T)
ma.model
## Series: ur$MAURN
## ARIMA(0,0,5) with non-zero mean
##
## Coefficients:
## ma1 ma2 ma3 ma4 ma5 mean
## 1.3646 1.7103 1.4882 1.2714 0.4804 5.4588
## s.e. 0.0368 0.0492 0.0578 0.0393 0.0350 0.1507
##
## sigma^2 estimated as 0.229: log likelihood=-361.03
## AIC=736.05 AICc=736.27 BIC=765.95
#fit ARMA model
arma.model<- auto.arima(ur$MAURN, max.d=0,allowdrift=T)
arma.model
## Series: ur$MAURN
## ARIMA(3,0,2) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 ma1 ma2 mean
## -0.2267 0.5998 0.5573 1.3361 0.8876 5.7038
## s.e. 0.0885 0.0544 0.0569 0.0544 0.0221 0.7764
##
## sigma^2 estimated as 0.1693: log likelihood=-280.15
## AIC=574.3 AICc=574.51 BIC=604.19
#fit ARIMA model
arima.model<- auto.arima(ur$MAURN, allowdrift=T)
arima.model
## Series: ur$MAURN
## ARIMA(4,1,2)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1 ma2
## 1.0029 -0.1834 -0.3982 0.4872 -1.1149 0.2512
## s.e. 0.0708 0.0750 0.0560 0.0394 0.0793 0.0711
##
## sigma^2 estimated as 0.1509: log likelihood=-247.45
## AIC=508.9 AICc=509.12 BIC=538.78
Checking the Residuals of the Model Fit
#calculate residuals of each model
ar.residual<- resid(ar.model)
ma.residual<- resid(ma.model)
arma.residual<- resid(arma.model)
arima.residual<- resid(arima.model)
#plot the PACF of each model's residuals
ggAcf(ar.residual,type='partial')
ggAcf(ma.residual,type='partial')
ggAcf(arma.residual,type='partial')
ggAcf(arima.residual,type='partial')
#run the Ljung Box test on the residuals
Box.test(ar.residual,type='Ljung-Box',lag=1)
##
## Box-Ljung test
##
## data: ar.residual
## X-squared = 0.88802, df = 1, p-value = 0.346
Box.test(ma.residual,type='Ljung-Box',lag=1)
##
## Box-Ljung test
##
## data: ma.residual
## X-squared = 0.56386, df = 1, p-value = 0.4527
Box.test(arma.residual,type='Ljung-Box',lag=1)
##
## Box-Ljung test
##
## data: arma.residual
## X-squared = 0.96749, df = 1, p-value = 0.3253
Box.test(arima.residual,type='Ljung-Box',lag=1)
##
## Box-Ljung test
##
## data: arima.residual
## X-squared = 0.0032696, df = 1, p-value = 0.9544
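The forecast package's checkresiduals() bundles a residual time plot, ACF plot, histogram, and a Ljung-Box test into a single call, e.g.:
#one-call residual diagnostics for the ARIMA fit
checkresiduals(arima.model)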
Making a Forecast for Each Model
#make forecast for each model
ar.forecast<- forecast(ar.model,h=24,level=80)
ma.forecast<- forecast(ma.model,h=24,level=80)
arma.forecast<- forecast(arma.model,h=24,level=80)
arima.forecast<- forecast(arima.model,h=24,level=80)
#plot forecast for each model
g1<- autoplot(ar.forecast)
g2<- autoplot(ma.forecast)
g3<- autoplot(arma.forecast)
g4<- autoplot(arima.forecast)
grid.arrange(g1,g2,g3,g4,nrow=2,ncol=2)
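To compare the models' predictive accuracy (an assumed sketch, not part of the original project code), we can hold out the last 24 months, refit on the remainder, and score the forecasts with accuracy() from the forecast package:
#hold out the last 24 observations, refit, and evaluate the forecasts
train <- head(ur$MAURN, length(ur$MAURN) - 24)
test <- tail(ur$MAURN, 24)
holdout.fit <- auto.arima(train, allowdrift = TRUE)
accuracy(forecast(holdout.fit, h = 24), test)  #test-set RMSE, MAE, etc.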
Fitting Seasonal Trend Loess (STL) Decomposition Models
#transform to time series object; need to specify frequency
ur.ts<- ts(ur$MAURN,frequency=12)
#fit stl model
stl.model<- stl(ur.ts,s.window='periodic')
#plot model fit
autoplot(stl.model)
#make forecast
stl.forecast<- forecast(stl.model,h=24,level=80)
autoplot(stl.forecast)
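As a shortcut (an optional sketch), stlf() from the forecast package combines the STL decomposition and the forecast step in one call:
#STL decomposition plus forecast in a single call
stlf.forecast <- stlf(ur.ts, h = 24, level = 80)
autoplot(stlf.forecast)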
Where to go Next