Thank you!
In this lab, we will be returning to Dr. Grissino-Mayer's El Malpais tree ring chronology. Apart from the first line, this code is just copied and pasted from Lab 4:
setwd("~/Dropbox/classes/Geog415_s13/geog415_s13_lab/lab7/") # This will be different for you!
# trw: tree ring width
trw <- read.table("MLC.COL", na.strings = ".")
# Check the data
head(trw)
## V1 V2 V3
## 1 -136 0.441 NA
## 2 -135 0.814 NA
## 3 -134 0.677 NA
## 4 -133 0.197 NA
## 5 -132 0.347 0.517
## 6 -131 0.651 0.824
names(trw) <- c("year", "stand_chron", "resid_chron")
summary(trw)
## year stand_chron resid_chron
## Min. :-136 Min. :0.006 Min. :-0.271
## 1st Qu.: 396 1st Qu.:0.674 1st Qu.: 0.736
## Median : 928 Median :0.972 Median : 0.992
## Mean : 928 Mean :1.000 Mean : 1.000
## 3rd Qu.:1460 3rd Qu.:1.278 3rd Qu.: 1.260
## Max. :1992 Max. :3.357 Max. : 3.035
## NA's :4
plot(trw$year, trw$resid_chron, type = "l", xlab = "Year", ylab = "residual chronology")
Plotting of autocorrelation functions is easy:
acf(trw$stand_chron)
pacf(trw$stand_chron)
Interpreting is harder…
The data also include a “residual chronology” that is detrended and decorrelated. Let's plot these data and verify that.
acf(trw$resid_chron, na.action = na.pass)
pacf(trw$resid_chron, na.action = na.pass)
###Question 1: Imagine that this is an article or report. In a short paragraph, provide an interpretation of the above four plots to help the reader understand the nature of these two data series (the original, standardized chronology, and the residual chronology).
In this section, we will use R's arima function to model the tree ring width series. This function works well whenever you have a single time series to analyze, like we do here. It is not immediately helpful for regression with time series data. For that, the gls function within the nlme library is better.
# Fit an AR(0) model to use as a baseline
ar0 <- arima(trw$stand_chron)
ar0
##
## Call:
## arima(x = trw$stand_chron)
##
## Coefficients:
## intercept
## 1.00
## s.e. 0.01
##
## sigma^2 estimated as 0.209: log likelihood = -1356, aic = 2715
# This is just going to estimate the mean.
# Fit an AR(1) model
ar1 <- arima(trw$stand_chron, order = c(1, 0, 0))
ar1
##
## Call:
## arima(x = trw$stand_chron, order = c(1, 0, 0))
##
## Coefficients:
## ar1 intercept
## 0.292 1.000
## s.e. 0.021 0.013
##
## sigma^2 estimated as 0.192: log likelihood = -1262, aic = 2529
acf(ar1$residuals)
pacf(ar1$residuals)
ar1$aic
## [1] 2529
There are at least two things worth pointing out here…
1) After we fit the AR1 and extract the residuals, we can see if the AR1 residuals are decorrelated. They are not. This shouldn't be surprising after the initial PACF plot of the original series.
2) A useful statistic is the AIC (Akaike's “An Information Criterion”). The AIC penalizes the likelihood as we add parameters, i.e. as we make the model more complicated. Thus, the AIC trades off between model fit (good) and model complexity (bad). Look for a small AIC. AIC is often used to choose the order of the AR model to use, rather than the PACF plot.
We already fit the AR(p) model for p=0 and p=1. Continue for p=2,…,10, and create a plot that has p=0-10 on the x-axis and AIC on the y-axis. (Hint, after fitting the AR models, you may want to create a dataframe with two columns, p and AIC, or, alternatively, create two vectors p and AIC).
raw_precip <- read.table("2904.pcp")
head(raw_precip)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13
## 1 1895 1.48 0.61 0.24 0.02 0.73 0.12 3.94 1.96 0.94 1.07 0.94 0.37
## 2 1896 0.21 0.27 0.17 0.26 0.00 0.56 3.31 1.95 1.43 4.01 0.19 0.55
## 3 1897 1.47 0.59 0.63 0.10 0.95 0.56 2.47 1.81 2.35 0.87 0.02 0.21
## 4 1898 1.17 0.04 0.50 0.61 0.24 1.27 3.68 2.29 0.73 0.00 0.34 1.08
## 5 1899 0.36 0.32 0.52 0.13 0.00 0.72 3.98 1.38 1.55 0.27 0.57 0.24
## 6 1900 0.35 0.27 0.46 0.54 0.40 0.44 1.51 1.56 2.22 0.92 0.44 0.22
names(raw_precip) <- c("year", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul",
"Aug", "Sep", "Oct", "Nov", "Dec")
# Coerce that from a 'wide' format to a 'long' format
library(reshape)
## Loading required package: plyr
## Attaching package: 'reshape'
## The following object(s) are masked from 'package:plyr':
##
## rename, round_any
precip <- melt(raw_precip, id.vars = "year", variable_name = "month")
head(precip)
## year month value
## 1 1895 Jan 1.48
## 2 1896 Jan 0.21
## 3 1897 Jan 1.47
## 4 1898 Jan 1.17
## 5 1899 Jan 0.36
## 6 1900 Jan 0.35
tail(precip)
## year month value
## 1291 1997 Dec 1.62
## 1292 1998 Dec 0.72
## 1293 1999 Dec 0.27
## 1294 2000 Dec 0.40
## 1295 2001 Dec 0.47
## 1296 2002 Dec 1.25
# The order is screwy. We should sort by year, then month
precip <- precip[order(precip$year, precip$month), ]
head(precip, 13)
## year month value
## 1 1895 Jan 1.48
## 109 1895 Feb 0.61
## 217 1895 Mar 0.24
## 325 1895 Apr 0.02
## 433 1895 May 0.73
## 541 1895 Jun 0.12
## 649 1895 Jul 3.94
## 757 1895 Aug 1.96
## 865 1895 Sep 0.94
## 973 1895 Oct 1.07
## 1081 1895 Nov 0.94
## 1189 1895 Dec 0.37
## 2 1896 Jan 0.21
library(zoo)
## Attaching package: 'zoo'
## The following object(s) are masked from 'package:base':
##
## as.Date, as.Date.numeric
precip$ma <- rollmean(precip$value, 12, fill = NA, align = "right") * 12
# I multiplied by 12 to turn monthly average into annual total.
precip_yr <- subset(precip, month == "Jul" & !is.na(ma))
names(precip_yr)
## [1] "year" "month" "value" "ma"
names(trw)
## [1] "year" "stand_chron" "resid_chron"
# And finally, create the regression dataset.
reg_data <- merge(x = trw, y = precip_yr, by.x = "year", by.y = "year", all.x = TRUE)
Here we will fit the model using tree ring width to explain annual precipitation. First, we will look at the ACF of precipitation, then we will fit the regression model, and then look at the ACF of the residuals.
acf(reg_data$ma, na.action = na.pass)
pacf(reg_data$ma, na.action = na.pass)
ols.mod <- lm(ma ~ stand_chron, data = reg_data)
acf(residuals(ols.mod), na.action = na.omit)
Briefly describe the time-series characteristics of the annual rainfall series and of the residuals from the regression. What causes the difference?
We don't need to fit an autoregressive model here, but we could…
library(nlme)
gls.mod.ar0 <- gls(ma ~ stand_chron, data = reg_data, na.action = na.omit)
gls.mod.ar1 <- gls(ma ~ stand_chron, data = reg_data, correlation = corARMA(p = 1),
na.action = na.omit)
summary(gls.mod.ar1)
## Generalized least squares fit by REML
## Model: ma ~ stand_chron
## Data: reg_data
## AIC BIC logLik
## 414.7 424.9 -203.3
##
## Correlation Structure: AR(1)
## Formula: ~1
## Parameter estimate(s):
## Phi
## -0.01167
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 8.768 0.4629 18.94 0
## stand_chron 3.484 0.3479 10.02 0
##
## Correlation:
## (Intr)
## stand_chron -0.904
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.7663 -0.6347 -0.1797 0.6183 3.5356
##
## Residual standard error: 1.972
## Degrees of freedom: 97 total; 95 residual
A word of warning… The GLS model we are fitting is: \( Y_t = \beta_0 + \beta_1 X_t + u_t \) \( u_t = \rho_1 u_{t-1} + e_t \)
For diagnostic checking, we would want to look at the residuals \( e_t \). These are - hopefully - uncorrelated in time. Unfortunately, the residuals function returns \( u_t \), not \( e_t \) by default.
# Plot an acf of the u residuals (not the right thing to do!)
acf(residuals(gls.mod.ar1))
# Plot an acf of the e residuals (the right thing to do!)
acf(residuals(gls.mod.ar1, type = "normalized"))
# You should only see a small difference this time, but that is not true
# in general.
OK, technically, the residuals(… type='normalized') does not return \( e_t \), but rather, something like \( e_t/\sigma \). It shouldn't impact the ACF, much, though.
There is a neat dataset in the cars package of crime rates from Canada that were originally published by Fox and Hartnagel. Do a simple regression study, explaining the female conviction rate using 1) total fertility rate, 2) participation rate, 3) number of post-secondary degrees, and 4) the male conviction rate as regressors. Assess autocorrelation, fix it using GLS, and report your results. I am most interested in your explanation of the process by which you assess the fit and eventually work to an acceptable regression model.
This will get you started with finding the data…
install.packages("car")
library("car")
data(Hartnagel)