Lab 7: Time Series Analysis

Task 0: Please fill out your SAIS online if you haven't already!

Thank you!

Data Input

In this lab, we will be returning to Dr. Grissino-Mayer's El Malpais tree ring chronology. Apart from the first line, this code is just copied and pasted from Lab 4:

setwd("~/Dropbox/classes/Geog415_s13/geog415_s13_lab/lab7/")  # This will be different for you!
# trw: tree ring width
trw <- read.table("MLC.COL", na.strings = ".")
# Check the data
head(trw)

##     V1    V2    V3
## 1 -136 0.441    NA
## 2 -135 0.814    NA
## 3 -134 0.677    NA
## 4 -133 0.197    NA
## 5 -132 0.347 0.517
## 6 -131 0.651 0.824

names(trw) <- c("year", "stand_chron", "resid_chron")
summary(trw)

##       year       stand_chron     resid_chron    
##  Min.   :-136   Min.   :0.006   Min.   :-0.271  
##  1st Qu.: 396   1st Qu.:0.674   1st Qu.: 0.736  
##  Median : 928   Median :0.972   Median : 0.992  
##  Mean   : 928   Mean   :1.000   Mean   : 1.000  
##  3rd Qu.:1460   3rd Qu.:1.278   3rd Qu.: 1.260  
##  Max.   :1992   Max.   :3.357   Max.   : 3.035  
##                                 NA's   :4

plot(trw$year, trw$resid_chron, type = "l", xlab = "Year", ylab = "residual chronology")

plot of chunk unnamed-chunk-1

Simple plotting

Plotting of autocorrelation functions is easy:

acf(trw$stand_chron)

plot of chunk unnamed-chunk-2

pacf(trw$stand_chron)

plot of chunk unnamed-chunk-2

Interpreting is harder…

The data also include a “residual chronology” that is detrended and decorrelated. Let's plot these data and verify that.

acf(trw$resid_chron, na.action = na.pass)

plot of chunk unnamed-chunk-3

pacf(trw$resid_chron, na.action = na.pass)

plot of chunk unnamed-chunk-3

###Question 1: Imagine that this is an article or report. In a short paragraph, provide an interpretation of the above four plots to help the reader understand the nature of these two data series (the original, standardized chronology, and the residual chronology).

Modeling a single time series

In this section, we will use R's arima function to model the tree ring width series. This function works well whenever you have a single time series to analyze, like we do here. It is not immediately helpful for regression with time series data. For that, the gls function within the nlme library is better.

# Fit an AR(0) model to use as a baseline
ar0 <- arima(trw$stand_chron)
ar0

## 
## Call:
## arima(x = trw$stand_chron)
## 
## Coefficients:
##       intercept
##            1.00
## s.e.       0.01
## 
## sigma^2 estimated as 0.209:  log likelihood = -1356,  aic = 2715

# This is just going to estimate the mean.

# Fit an AR(1) model
ar1 <- arima(trw$stand_chron, order = c(1, 0, 0))
ar1

## 
## Call:
## arima(x = trw$stand_chron, order = c(1, 0, 0))
## 
## Coefficients:
##         ar1  intercept
##       0.292      1.000
## s.e.  0.021      0.013
## 
## sigma^2 estimated as 0.192:  log likelihood = -1262,  aic = 2529

acf(ar1$residuals)

plot of chunk unnamed-chunk-4

pacf(ar1$residuals)

plot of chunk unnamed-chunk-4

ar1$aic

## [1] 2529

There are at least two things worth pointing out here…

1) After we fit the AR1 and extract the residuals, we can see if the AR1 residuals are decorrelated. They are not. This shouldn't be surprising after the initial PACF plot of the original series.
2) A useful statistic is the AIC (Akaike's “An Information Criterion”). The AIC penalizes the likelihood as we add parameters, i.e. as we make the model more complicated. Thus, the AIC trades off between model fit (good) and model complexity (bad). Look for a small AIC. AIC is often used to choose the order of the AR model to use, rather than the PACF plot.

Homework Question 2:

We already fit the AR(p) model for p=0 and p=1. Continue for p=2,…,10, and create a plot that has p=0-10 on the x-axis and AIC on the y-axis. (Hint, after fitting the AR models, you may want to create a dataframe with two columns, p and AIC, or, alternatively, create two vectors p and AIC).

raw_precip <- read.table("2904.pcp")
head(raw_precip)

##     V1   V2   V3   V4   V5   V6   V7   V8   V9  V10  V11  V12  V13
## 1 1895 1.48 0.61 0.24 0.02 0.73 0.12 3.94 1.96 0.94 1.07 0.94 0.37
## 2 1896 0.21 0.27 0.17 0.26 0.00 0.56 3.31 1.95 1.43 4.01 0.19 0.55
## 3 1897 1.47 0.59 0.63 0.10 0.95 0.56 2.47 1.81 2.35 0.87 0.02 0.21
## 4 1898 1.17 0.04 0.50 0.61 0.24 1.27 3.68 2.29 0.73 0.00 0.34 1.08
## 5 1899 0.36 0.32 0.52 0.13 0.00 0.72 3.98 1.38 1.55 0.27 0.57 0.24
## 6 1900 0.35 0.27 0.46 0.54 0.40 0.44 1.51 1.56 2.22 0.92 0.44 0.22

names(raw_precip) <- c("year", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", 
    "Aug", "Sep", "Oct", "Nov", "Dec")
# Coerce that from a 'wide' format to a 'long' format
library(reshape)

## Loading required package: plyr

## Attaching package: 'reshape'

## The following object(s) are masked from 'package:plyr':
## 
## rename, round_any

precip <- melt(raw_precip, id.vars = "year", variable_name = "month")
head(precip)

##   year month value
## 1 1895   Jan  1.48
## 2 1896   Jan  0.21
## 3 1897   Jan  1.47
## 4 1898   Jan  1.17
## 5 1899   Jan  0.36
## 6 1900   Jan  0.35

tail(precip)

##      year month value
## 1291 1997   Dec  1.62
## 1292 1998   Dec  0.72
## 1293 1999   Dec  0.27
## 1294 2000   Dec  0.40
## 1295 2001   Dec  0.47
## 1296 2002   Dec  1.25

# The order is screwy.  We should sort by year, then month
precip <- precip[order(precip$year, precip$month), ]

head(precip, 13)

##      year month value
## 1    1895   Jan  1.48
## 109  1895   Feb  0.61
## 217  1895   Mar  0.24
## 325  1895   Apr  0.02
## 433  1895   May  0.73
## 541  1895   Jun  0.12
## 649  1895   Jul  3.94
## 757  1895   Aug  1.96
## 865  1895   Sep  0.94
## 973  1895   Oct  1.07
## 1081 1895   Nov  0.94
## 1189 1895   Dec  0.37
## 2    1896   Jan  0.21

library(zoo)

## Attaching package: 'zoo'

## The following object(s) are masked from 'package:base':
## 
## as.Date, as.Date.numeric

precip$ma <- rollmean(precip$value, 12, fill = NA, align = "right") * 12
# I multiplied by 12 to turn monthly average into annual total.
precip_yr <- subset(precip, month == "Jul" & !is.na(ma))
names(precip_yr)

## [1] "year"  "month" "value" "ma"

names(trw)

## [1] "year"        "stand_chron" "resid_chron"

# And finally, create the regression dataset.
reg_data <- merge(x = trw, y = precip_yr, by.x = "year", by.y = "year", all.x = TRUE)

Understanding why residuals are different

Here we will fit the model using tree ring width to explain annual precipitation. First, we will look at the ACF of precipitation, then we will fit the regression model, and then look at the ACF of the residuals.

acf(reg_data$ma, na.action = na.pass)

plot of chunk unnamed-chunk-5

pacf(reg_data$ma, na.action = na.pass)

plot of chunk unnamed-chunk-5

ols.mod <- lm(ma ~ stand_chron, data = reg_data)
acf(residuals(ols.mod), na.action = na.omit)

plot of chunk unnamed-chunk-5

Question 3

Briefly describe the time-series characteristics of the annual rainfall series and of the residuals from the regression. What causes the difference?

Fitting an Autoregressive model

We don't need to fit an autoregressive model here, but we could…

library(nlme)
gls.mod.ar0 <- gls(ma ~ stand_chron, data = reg_data, na.action = na.omit)
gls.mod.ar1 <- gls(ma ~ stand_chron, data = reg_data, correlation = corARMA(p = 1), 
    na.action = na.omit)
summary(gls.mod.ar1)

## Generalized least squares fit by REML
##   Model: ma ~ stand_chron 
##   Data: reg_data 
##     AIC   BIC logLik
##   414.7 424.9 -203.3
## 
## Correlation Structure: AR(1)
##  Formula: ~1 
##  Parameter estimate(s):
##      Phi 
## -0.01167 
## 
## Coefficients:
##             Value Std.Error t-value p-value
## (Intercept) 8.768    0.4629   18.94       0
## stand_chron 3.484    0.3479   10.02       0
## 
##  Correlation: 
##             (Intr)
## stand_chron -0.904
## 
## Standardized residuals:
##     Min      Q1     Med      Q3     Max 
## -1.7663 -0.6347 -0.1797  0.6183  3.5356 
## 
## Residual standard error: 1.972 
## Degrees of freedom: 97 total; 95 residual

A word of warning… The GLS model we are fitting is: \( Y_t = \beta_0 + \beta_1 X_t + u_t \) \( u_t = \rho_1 u_{t-1} + e_t \)

For diagnostic checking, we would want to look at the residuals \( e_t \). These are - hopefully - uncorrelated in time. Unfortunately, the residuals function returns \( u_t \), not \( e_t \) by default.

# Plot an acf of the u residuals (not the right thing to do!)
acf(residuals(gls.mod.ar1))

plot of chunk unnamed-chunk-7

# Plot an acf of the e residuals (the right thing to do!)
acf(residuals(gls.mod.ar1, type = "normalized"))

plot of chunk unnamed-chunk-7

# You should only see a small difference this time, but that is not true
# in general.

OK, technically, the residuals(… type='normalized') does not return \( e_t \), but rather, something like \( e_t/\sigma \). It shouldn't impact the ACF, much, though.

Question 4

There is a neat dataset in the cars package of crime rates from Canada that were originally published by Fox and Hartnagel. Do a simple regression study, explaining the female conviction rate using 1) total fertility rate, 2) participation rate, 3) number of post-secondary degrees, and 4) the male conviction rate as regressors. Assess autocorrelation, fix it using GLS, and report your results. I am most interested in your explanation of the process by which you assess the fit and eventually work to an acceptable regression model.

This will get you started with finding the data…

install.packages("car")
library("car")
data(Hartnagel)