Financial time series modeling and forecasting with R

VijayaG (rexplore0@gmail.com)

July 2015

Background and Objective

The basic assumption in time series analysis is that the data points are realizations of random variables, with some dependence between data points close together in time but little or no dependence between data points far apart in time. These assumptions are formalized by the concepts of stationarity and ergodicity.

In a strictly stationary stochastic process, the joint distribution of data points is time invariant; in particular, the mean and variance do not change over time. The autocovariances and autocorrelations are the measures of linear temporal dependence in a covariance stationary stochastic process; viewed as a function of the lag, the autocorrelations form the autocorrelation function (ACF). The ACF reveals the interrelationships within a time series: it gives the correlation between all pairs of data points that are exactly the same number of steps apart.
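As a minimal sketch of how the ACF can be computed in R, the example below uses simulated white noise rather than the report's data. The series `x`, the seed, and the choice of 10 lags are illustrative assumptions; `acf` is the base R function from the stats package.

```r
# Simulated data only: 250 draws of Gaussian white noise (hypothetical series).
set.seed(42)
x <- rnorm(250)

# acf() returns the sample autocorrelations; plot = FALSE suppresses the chart.
x_acf <- acf(x, lag.max = 10, plot = FALSE)

# The first element is lag 0, which is always 1; lags 1 to 10 follow.
rho <- as.numeric(x_acf$acf)
names(rho) <- paste0("lag", 0:10)
print(round(rho, 3))

# Approximate 95% band under the white noise hypothesis: +/- 1.96 / sqrt(n).
band <- qnorm(0.975) / sqrt(length(x))
print(round(band, 3))
```

For white noise, most of the autocorrelations at lags 1 and above should fall inside the band; values well outside it suggest temporal dependence.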

In a strictly stationary or covariance stationary stochastic process no assumption is made about the strength of dependence between random data points in the sequence. In many contexts, however, it is reasonable to assume that the strength of dependence between random points in a stochastic process diminishes the farther apart they become. This diminishing dependence assumption is captured by the concept of ergodicity. A stochastic process is ergodic if any two collections of random points partitioned far apart in the sequence are essentially independent.

An important class of linear time series models is the family of Autoregressive Integrated Moving Average (ARIMA) models, proposed by Box and Jenkins (1976). These models assume that the current value depends only on past values of the time series itself and on past values of an error term.
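The dependence on past values and past errors can be written out explicitly. The notation below is an assumption introduced for illustration (the symbols are not from the original text): B is the backshift operator, d the number of differences, p and q the numbers of autoregressive and moving average terms, and the plus sign on the moving average terms follows the convention used by R's arima function.

```latex
\begin{aligned}
w_t &= (1 - B)^d \, y_t, \\
w_t &= c + \sum_{i=1}^{p} \phi_i \, w_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j},
\qquad \varepsilon_t \sim \text{white noise}(0, \sigma^2).
\end{aligned}
```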

Moving average models are simple covariance stationary and ergodic time series models that can capture a wide variety of autocorrelation patterns. Suppose we want a covariance stationary and ergodic stochastic process in which y_t and y_{t-1} are correlated but y_t and y_{t-j} are not correlated for j > 1, so that the time dependence in the process lasts for only one period. Such a process can be created using the first order moving average (MA(1)) model, y_t = mu + e_t + theta * e_{t-1}, where e_t is white noise. The moving average parameter theta determines the sign and magnitude of the correlation between y_t and y_{t-1}. Clearly, if theta equals 0 then y_t exhibits no time dependence.
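The one-period memory of an MA(1) process can be checked by simulation. This is a sketch under stated assumptions: the value theta = 0.6, the seed, and the sample size are arbitrary choices, and the series is simulated with `arima.sim` from the stats package. For an MA(1) process the theoretical lag-1 autocorrelation is theta / (1 + theta^2), and all higher lags are zero.

```r
# Simulate a hypothetical MA(1) process: y_t = e_t + theta * e_{t-1}.
set.seed(123)
theta <- 0.6
y <- arima.sim(model = list(ma = theta), n = 5000)

# Sample autocorrelations at lags 1, 2 and 3 (element 1 of $acf is lag 0).
rho_hat <- as.numeric(acf(y, lag.max = 3, plot = FALSE)$acf)[2:4]

# Theoretical lag-1 autocorrelation of an MA(1) process.
rho1_theory <- theta / (1 + theta^2)

print(round(rho_hat, 3))      # lag 1 should be near rho1_theory, lags 2-3 near 0
print(round(rho1_theory, 3))
```

Changing the sign of theta flips the sign of the lag-1 autocorrelation, while lags 2 and beyond stay close to zero.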

The presence of autocorrelation is one indication that an ARIMA model could be used to model the time series. From the ACF plot, one can count the number of significant autocorrelations, which is a useful estimate of the number of moving average (MA) coefficients in the model. The plot for wfc shows that only one MA coefficient will be required.

Partial Autocorrelation (PACF)

Partial autocorrelation is another tool for understanding interrelationships in a time series. It is the correlation between all data points that are exactly n steps apart, after accounting for their correlation with the data between those n steps. It helps to identify the number of autoregressive (AR) coefficients in an ARIMA model. For wfc, no significant partial autocorrelation was found.
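To illustrate how the PACF points to the AR order, the sketch below simulates an AR(1) process and computes its partial autocorrelations with `pacf` from the stats package. The coefficient phi = 0.7, the seed, and the sample size are assumptions made for the example; this is not the wfc series.

```r
# Simulate a hypothetical AR(1) process: z_t = phi * z_{t-1} + e_t.
set.seed(1)
phi <- 0.7
z <- arima.sim(model = list(ar = phi), n = 2000)

# pacf() starts at lag 1 (there is no lag-0 entry, unlike acf()).
z_pacf <- as.numeric(pacf(z, lag.max = 5, plot = FALSE)$acf)
names(z_pacf) <- paste0("lag", 1:5)
print(round(z_pacf, 3))

# Approximate 95% significance band for the partial autocorrelations.
band <- qnorm(0.975) / sqrt(length(z))
print(round(band, 3))
```

For an AR(1) process only the lag-1 partial autocorrelation should stand clearly outside the band, which suggests a single AR coefficient.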

Finding lagged correlation between two time series

The cross correlation function helps to discover lagged correlations between two time series. The correlation at lag 0 is the ordinary contemporaneous correlation between the two variables.

[1] 0.5059
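A lagged relationship can be sketched with `ccf` from the stats package. The two series below are simulated so that `y` follows `x` with a delay of two steps; the names, the delay, and the coefficients are assumptions for illustration only. In R, the lag k value of `ccf(x, y)` estimates the correlation between x[t+k] and y[t], so a series `x` that leads `y` shows its peak at a negative lag.

```r
# Simulated pair of series (hypothetical): y_t = 0.8 * x_{t-2} + noise.
set.seed(7)
n <- 500
u <- rnorm(n)
x <- u[3:n]                                    # x_t = u_{t+2}
y <- 0.8 * u[1:(n - 2)] + rnorm(n - 2, sd = 0.5)

# Cross-correlations for lags -5..5, without plotting.
cc <- ccf(x, y, lag.max = 5, plot = FALSE)
lags <- as.numeric(cc$lag)
vals <- as.numeric(cc$acf)

# Lag with the largest absolute cross-correlation.
best_lag <- lags[which.max(abs(vals))]
print(best_lag)

# The lag-0 value matches the ordinary correlation between the two series.
lag0 <- vals[lags == 0]
print(c(ccf_lag0 = lag0, cor = cor(x, y)))
```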

Fitting ARIMA Model

Building an ARIMA model consists of three steps: 1. Model identification (determining the order, that is, the number of past values and the number of past error terms to incorporate in a tentative model), 2. Model estimation (the parameters of the model are estimated, generally using either least squares or maximum likelihood), and 3. Diagnostic checking (e.g., checking that the model residuals behave as white noise). The model order is usually denoted by three integers, (p, d, q), where p = number of autoregressive (AR) coefficients; d = degree of differencing (I); q = number of moving average (MA) coefficients.

Series: returns[, "sp500"] 
ARIMA(1,1,0) with drift         

Coefficients:
        ar1  drift
      0.082  0.006
s.e.  0.063  0.003

sigma^2 estimated as 0.00188:  log likelihood=436.8
AIC=-867.7   AICc=-867.6   BIC=-857.1

Series: returns[, "aapl"]
ARIMA(0,1,0) with drift         

Coefficients:
      drift
      0.018
s.e.  0.009

sigma^2 estimated as 0.0184:  log likelihood=146.8
AIC=-289.5   AICc=-289.4   BIC=-282.4

Series: returns[, "vbltx"]
ARIMA(0,1,2) with drift         

Coefficients:
        ma1     ma2  drift
      0.065  -0.220  0.006
s.e.  0.062   0.064  0.001

sigma^2 estimated as 0.000616:  log likelihood=578.3
AIC=-1149   AICc=-1148   BIC=-1134
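The estimation step can be sketched with the base R `arima` function. The printed summaries above have the layout that the forecast package's `auto.arima` typically produces, but that is an inference from the output format and not something the text states. The example below avoids extra packages and uses a simulated series; the order (1, 1, 0), the AR coefficient, and the comparison model are assumptions for illustration.

```r
# Hypothetical integrated series: ARIMA(1,1,0) with AR coefficient 0.5.
set.seed(2015)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 300)

# Step 2 (estimation): fit the tentative order chosen in step 1.
fit <- arima(y, order = c(1, 1, 0))
print(fit)

# Estimated AR coefficient and its standard error.
ar1 <- unname(coef(fit)["ar1"])
se_ar1 <- sqrt(diag(vcov(fit)))[["ar1"]]
print(round(c(ar1 = ar1, se = se_ar1), 3))

# Compare with an alternative tentative order by AIC (lower is better).
fit_alt <- arima(y, order = c(0, 1, 1))
print(c(AIC_110 = AIC(fit), AIC_011 = AIC(fit_alt)))
```

Comparing information criteria across a handful of candidate orders is a manual version of the search that automatic order selection performs.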

Running diagnostics on an ARIMA Model

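The tsdiag function plots the standardized residuals, the autocorrelation function of the residuals, and the p-values of a portmanteau (Ljung-Box) test for a range of lags. If the model is adequate, the residuals should resemble white noise: no significant residual autocorrelations and large p-values.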
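The portmanteau test behind those p-values can also be run directly with `Box.test` from the stats package. This is a sketch on a simulated series; the model order, the seed, and the choice of 10 lags are assumptions. The `fitdf` argument subtracts the number of estimated ARMA coefficients from the degrees of freedom.

```r
# Hypothetical series and fitted model (simulated, not the report's data).
set.seed(99)
y <- arima.sim(model = list(order = c(1, 1, 0), ar = 0.5), n = 300)
fit <- arima(y, order = c(1, 1, 0))

# Residuals of the fitted model; these should behave like white noise.
res <- residuals(fit)

# Ljung-Box test on the first 10 residual autocorrelations,
# with one degree of freedom removed for the estimated AR coefficient.
lb <- Box.test(res, lag = 10, type = "Ljung-Box", fitdf = 1)
print(lb)

# tsdiag(fit) would draw the corresponding three diagnostic panels.
```

A large p-value means the test finds no evidence of remaining autocorrelation in the residuals; a small one suggests the tentative order should be revisited.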

Making forecasts from an ARIMA Model

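The predict function returns both the predicted next observation and its standard error according to the model. Its n.ahead argument extends the forecast to further steps; the first output below is a one-step forecast and the second covers ten steps.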

$pred
Time Series:
Start = 256 
End = 256 
Frequency = 1 
[1] 4.002

$se
Time Series:
Start = 256 
End = 256 
Frequency = 1 
[1] 0.0827

$pred
Time Series:
Start = 256 
End = 265 
Frequency = 1 
 [1] 4.002 4.002 4.002 4.002 4.002 4.002 4.002 4.002 4.002 4.002

$se
Time Series:
Start = 256 
End = 265 
Frequency = 1 
 [1] 0.0827 0.1169 0.1432 0.1654 0.1849 0.2026 0.2188 0.2339 0.2481
[10] 0.2615
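In the ten-step output above the point forecast stays constant and the standard errors grow in proportion to the square root of the horizon (for example 0.1169 / 0.0827 is about 1.414), which is the pattern expected from a random walk, ARIMA(0,1,0), without drift. The sketch below reproduces that pattern on a simulated random walk; the series, its length of 255 points, and the model order are assumptions and are not taken from the report's code.

```r
# Hypothetical random walk of 255 points starting near 4.
set.seed(11)
y <- 4 + cumsum(rnorm(255, sd = 0.08))

# Random walk model: first difference, no AR or MA terms.
fit <- arima(y, order = c(0, 1, 0))

# One-step and ten-step forecasts with their standard errors.
fc1 <- predict(fit, n.ahead = 1)
fc10 <- predict(fit, n.ahead = 10)
print(fc1)
print(fc10)

# Ratio of each standard error to the one-step standard error;
# for a random walk this is close to sqrt(1), sqrt(2), ..., sqrt(10).
se_ratio <- as.numeric(fc10$se) / as.numeric(fc10$se)[1]
print(round(se_ratio, 3))
```

An approximate 95% forecast interval at each horizon is the point forecast plus or minus 1.96 times the corresponding standard error.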