Martin A. Liendo
22/01/2018
Introduction: uses of Time Series Analysis
Concepts and main topics
Methodology
Application: Time Series forecast example in R
Univariate Time Series:
set of observations of a random variable spaced evenly through (discrete) time
different methodology, assumptions and forecasting model than cross-sectional
Used primarily for forecasting and decomposition of values.
example of ts variables: daily price of bitcoin, monthly demand of items in Nicequest's shop, quaterly air pollution in Barcelona.
require("forecast")
plot(JohnsonJohnson)
The pattern in a time series is normally classified or decomposed into trend, seasonality, cyclical and random components:
Trend : Observations increase or decrease (regularly) through time
Cyclical : Component fluctuate around a certain trend or value. The repeated pattern appears in a time series but beyond a frequency of one year. Commonly wavelike pattern.
Seasonality : Patterns that repeats in a regular/evenly through time. Frequency of occurence is within a year or shorter.
Random : Unpredictable component
Stationary Process:
1 . Strictly stationary : if the marginal distribution of Y at time t is the same as at any other point in time . Then
\[ p(Y_t) = p(Y_{t+k}) \]
\[ p(Y_{t-s}) = p(T_{t+k-s}) \]
This implies that mean, variance and covariance of the serie are time invariant
2 . Covariance Stationary or Weakly Stationary:
\[ E(Y_1) = E(Y_2) = ... = E(Y_t) = \mu \]
\[ Var(Y_1) = Var(Y_2) = ... = Var(Y_t) = \sigma^2 \]
\[ Cov(Y_t, Y_{t-k}) = Cov(Y_{t-2}, Y_{t-2-k})= \gamma_k \]
Why Stationary assumption?
The results of classical econometric theory are derived under the assumption that variables of concern are stationary
Standard techniques are largely invalid where data is non-stationary
Non-stationary time series regressions or analysis may result in spurious regressions
Stationary process:
If the stationary criterion are violated, then the time series must be stationarized and then apply stochastic models.
Multiple ways to stationarize a time series and depend in multiple factor. The most common ones are Detrending and Differencing
In general, a series which is stationary after being differentiated “d” times is said to be integrated of order “d”, denoted I(d)
Therefore a series which is stationary without differencing, is said to be I(0)
Testing for Stationary Time Series:
Augmented Dickey Fuller Test (ADF): tests the null hyphotesis that a unit root is present in a time series sample. The augmented Dickey-Fuller statistic used in ADF test is a negative number, and the more negative it is, the stronger the rejection of the hypothesis that there is a uni root.
\[ \Delta y_t = \alpha + \delta y_{t-1} + \sum_i \beta_i \Delta y_{t-1} + \epsilon_t \] where:
\begin{eqnarray}
& H0: \delta = 0 ;
& H1: \delta < 0\
\end{eqnarray}
Different Time Series Processes :
1. White Noise:
A series is called white noise if it is purely random in nature.
Then the series will have no pattern and hence forecasting the future values of such a series is not possible. Some properties are :
\[ E(\epsilon_t) = 0 \]
\[ V(\epsilon_t) = \sigma^2 \]
\[ E(\epsilon_t \epsilon_s) = 0 \]
Thus, if I time series follows a white noise process, we won't be able to forecast or model that random variable
Different Time Series Processes :
2.AutoRegressive Model (AR):
An AR model is one in which the random variable at moment t depends only on its own past values
\[ Y_t = f(Y_{t-1}, Y_{t-2},..., \epsilon_t) \]
A common representation of an autoregressive model where it depends on p of its past values called as AR(p) model can be represented like:
\[ Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} +...+ \beta_p Y_{t-p} + \epsilon_t \]
Different Time Series Processes :
3.Moving Average Model(MA):
A MA model is one in which the random variable at moment t depends only on the random error terms which follow a white noise process
\[ Y_t = f(\epsilon_{t-1}, \epsilon_{t-2},\epsilon_{t-3}) \]
A common representation of a moving average model where it depends on q of its past values called as MA(q) model can be represented like:
\[ Y_t = \theta_0 + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} +...+ \theta_q Y_{t-q} \]
4.AutoRegressive Moving Average model:
There are situations where the time-series may be represented as a mix of both AR and MA models referred as ARMA(p,q).
The general form of such a time-series model, which depends on p of its own past values and q past values of white noise disturbances, takes the form:
\[ Y_t = \beta_0 + \beta_1 Y_{t-1} + \beta_2 Y_{t-2} +...+ \beta_p Y_{t-p} \]
\[ + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} +...+ \theta_q Y_{t-q} \]
Box Jenkins : The estimation and forecasting of univariate time-series models is (very often) carried out using the Box-Jenkins (BJ) methodology which has the following steps:
(1) Identification
(2) Estimation
(3) Diagnostic Checking
The B-J methodology is applicable only to stationary variables
Steps in ARIMA modeling for forecasting purpose:
Visualize the time series. Detect Trend, seasonality, random effects
Ensure that the time series is stationary
Identify a reasonable model or models (p and q values). Commonly through ACF/PACF charts
Build and fit the optimal model
Make predictions and evaluate the model's fit
Modelling and Forecasting Tractors Sales for the next 2 years
sales = read.csv('http://ucanalytics.com/blogs/wp-content/uploads/2015/06/Tractor-Sales.csv')
sales <- ts(sales[,2],start = c(2003,1),frequency = 12)
sales
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2003 141 157 185 199 203 189 207 207 171 150 138 165
2004 145 168 197 208 210 209 238 238 199 168 152 196
2005 183 200 249 251 289 249 279 279 232 204 194 232
2006 215 239 270 279 307 305 322 339 263 241 229 272
2007 247 261 330 362 385 340 370 381 299 266 239 281
2008 257 250 329 350 393 370 423 410 326 289 270 321
2009 305 310 374 414 454 441 510 486 393 345 315 389
2010 358 368 444 482 534 524 578 567 447 386 360 428
2011 397 400 498 536 596 591 651 654 509 437 406 470
2012 428 423 507 536 610 609 687 707 509 452 412 472
2013 454 455 568 610 706 661 767 783 583 513 481 567
2014 525 520 587 710 793 749 871 848 640 581 519 605
Time series with trend and seasonality
plot(sales, xlab='Years', ylab = 'Tractor Sales')
Making the series mean and variance stationary. First, in order to make the series mean stationary we use 1st order differencing to remove the upward trend.
plot(diff(sales),ylab='Differenced Tractor Sales')
A common way to make a series stationary on variance is through transfoming the original series taking the log of the original set of observations.
plot(log(sales), ylab = "Log(Tractor Sales)" )
Hence, taking log of the original series and then differentiating it, it seems that now the series is stationary .
plot(diff(log10(sales)), ylab = "Differenced Log (Tractor Sales)")
ACF and PACF can be affected where there are strong seasonality effects.
ACF and PACF guidelines
d.sales <- diff(log(sales))
par(mfrow = c(1,2))
Acf(d.sales, main = "ACF d.sales")
Pacf(d.sales, main = "PACF d.sales")
We fit a model with the auto.arima function from the forecast package. This function allows us to include a seasonal component.
library(forecast)
fit <- auto.arima(log10(sales), seasonal = T, ic = "aic")
summary(fit)
Series: log10(sales)
ARIMA(0,1,1)(0,1,1)[12]
Coefficients:
ma1 sma1
-0.4047 -0.5529
s.e. 0.0885 0.0734
sigma^2 estimated as 0.0002571: log likelihood=354.4
AIC=-702.79 AICc=-702.6 BIC=-694.17
Training set error measures:
ME RMSE MAE MPE MAPE
Training set 0.0002410698 0.01517695 0.01135312 0.008335713 0.4462212
MASE ACF1
Training set 0.2158968 0.01062604
model_predict <- predict(fit, n.ahead = 24)
delog <- 10^(model_predict$pred)
upper_band <- 10^(model_predict$pred+model_predict$se)
lower_band <- 10^(model_predict$pred-model_predict$se)
par(mfrow = c(1,1))
plot(sales,type='l',xlim=c(2004,2018),ylim=c(1,1600),xlab = 'Year',ylab = 'Tractor Sales')
lines(delog,col='blue')
lines(upper_band,col='red', type = "l", lty = 3)
lines(lower_band,col='red', type = "l", lty = 3)
The last steps would be to control that the residuals are white noise and do not present autocorrelation.
par(mfrow = c(1,2))
Acf(fit$residuals, main = "ACF Residual")
Pacf(fit$residuals, main = "PACF Residual")