Time Series Regression

Time series introduce a challenge that violate the assumptions of most regression techniques, that the errors are independent. With time series, we deal with serially correlated errors. This means that the error in a predictor is related to a previous time value. In this exercise we are working with a time series of the water levels of Lake Huron and will try to model the relationship using simple regression as well as auto-regressive model using Generalized Least Squares.

Data

The data span from 1875 to 1972 and represents the water level of Lake Huron. At first sight, we see that the first half of the series is mostly governed by a downstrend. In the second half, both uptrends and downstrends are visible with a series of important spikes.

Auto-correlation

The exponentially decaying nature of the auto-correlation function, or the correlation of the series with a lag of itself, suggests an AR(1) autoregressive model.

Simple Linear Regression

We can use simple regression on the data by regressing the series at time \(t\) on \(t{-1}\). We need to drop the first and last values appropriately as these are end values without two neighbors. We see from the model output that t_1 is significant and has a positive value. A plot of the regression line over the time series validates the linear relationship.

## 
## Call:
## lm(formula = t ~ t_1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.01620 -0.46747  0.00781  0.47583  1.88638 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 94.71257   32.23790   2.938  0.00415 ** 
## t_1          0.83641    0.05568  15.022  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7209 on 95 degrees of freedom
## Multiple R-squared:  0.7037, Adjusted R-squared:  0.7006 
## F-statistic: 225.7 on 1 and 95 DF,  p-value: < 2.2e-16

Generalized Least Squares

Unlike linear regression, in Generalized Least Squares the errors are assumed to be normally distributed with mean equal to 0 and standard deviation equal to the correlated errors.

AR(1)

Interestingly, we find that the time effect is no longer significant. We suspect that there might be higher order terms that influence the model that could explain this.

##                        Value   Std.Error   t-value      p-value
## (Intercept)     616.48869320 24.36263217 25.304683 2.944221e-44
## time(LakeHuron)  -0.01943459  0.01266414 -1.534616 1.281674e-01

We compare results from the gls model to results using ARIMA (Auto Regressive Integrated Moving Average). Both the intercept and ar terms are different.

## 
## Call:
## arima(x = LakeHuron, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.8375   579.1153
## s.e.  0.0538     0.4240
## 
## sigma^2 estimated as 0.5093:  log likelihood = -106.6,  aic = 219.2

AR(2)

Investigating our suspicion of higher order terms, we find that there actually is strong evidence for a negative second order term. Comparing the AR(2) models we now see a consistency in the estimates of the coefficients: AR (1st: 1.054, 2nd: -0.267), GLS (1st: 1.02, 2nd: -0.274), ARIMA (1st: 1.04, 2nd: -0.249)

## 
## Call:
## ar(x = LakeHuron)
## 
## Coefficients:
##       1        2  
##  1.0538  -0.2668  
## 
## Order selected 2  sigma^2 estimated as  0.5075
## Generalized least squares fit by REML
##   Model: LakeHuron ~ time(LakeHuron) 
##   Data: NULL 
##       AIC      BIC   logLik
##   221.028 233.8497 -105.514
## 
## Correlation Structure: ARMA(2,0)
##  Formula: ~1 
##  Parameter estimate(s):
##       Phi1       Phi2 
##  1.0203418 -0.2741249 
## 
## Coefficients:
##                    Value Std.Error  t-value p-value
## (Intercept)     619.6442 17.491090 35.42628  0.0000
## time(LakeHuron)  -0.0211  0.009092 -2.32216  0.0223
## 
##  Correlation: 
##                 (Intr)
## time(LakeHuron) -1    
## 
## Standardized residuals:
##         Min          Q1         Med          Q3         Max 
## -2.16624672 -0.57959971  0.01070681  0.61705337  2.03975934 
## 
## Residual standard error: 1.18641 
## Degrees of freedom: 98 total; 96 residual
## 
## Call:
## arima(x = LakeHuron, order = c(2, 0, 0))
## 
## Coefficients:
##          ar1      ar2  intercept
##       1.0436  -0.2495   579.0473
## s.e.  0.0983   0.1008     0.3319
## 
## sigma^2 estimated as 0.4788:  log likelihood = -103.63,  aic = 215.27

We plot the fitted values of the ARIMA second order model alongside the original data and see that it is a close match.

This simplistic walkthrough of time series analysis shows how we can model AR(1) type models with simple regression but also through the use of libraries like nlme for gls. We compared model approaches and saw that adding a higher order terms might be necessary when simpler models are not statistically significant. We also saw how different modelling approach yield similar estimates for the coefficients. Forecasting would be the natural extension of this exercise to take it to the next level.