by Mark R Payne
DTU-Aqua, Charlottenlund, Denmark
http://www.staff.dtu.dk/mpay
Wed Jan 13 11:11:07 2016
Spatial and temporal correlations between data points are a common feature of most datasets in Marine Science. However, accounting and correcting for these features is not necessarily straight forward. In this tutorial, I demonstrate how this can be done in some of the most simple cases.
To start with we need a simple data set to work with. We could generate our own, but lets follow the analysis of Zuur et al 2009, and the Hawaiian bird example. This data set can be downloaded from http://www.highstat.com/book2.htm
dat.raw <- read.table("Hawaii.txt",header=TRUE)
dat.all <- data.frame(year=dat.raw$Year,birds=sqrt(dat.raw$Moorhen.Kauai),rain=dat.raw$Rainfall)
There are missing values in 1972, 1965 and 1957. To start with we will only look at the part of the time series that is complete, from from 1973 onwards.
dat <- subset(dat.all,year>=1973)
save(dat,file="Correlation_data.RData")
First, plotting the data shows that we have got more or less the same thing as in Zuur’s book, just truncated.
plot(dat$year,dat$birds)
Now lets fit a linear trend, ignorning any potential covariates
mdl <- lm(birds ~year,data=dat)
summary(mdl)
##
## Call:
## lm(formula = birds ~ year, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0179 -1.7426 -0.0977 1.5436 7.2647
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -476.01539 121.04255 -3.933 0.000480 ***
## year 0.24419 0.06089 4.011 0.000389 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.032 on 29 degrees of freedom
## Multiple R-squared: 0.3568, Adjusted R-squared: 0.3346
## F-statistic: 16.09 on 1 and 29 DF, p-value: 0.0003887
which suggest a highly significant slope (p<0.001). How do the model diagnostics look?
par(mfrow=c(2,2))
plot(mdl)
The Tukey-Anscombe plot (top-left) suggests that the variance of the residuals is constant, as does the scale-location plot (bottom-left). The QQ plot looks ok-ish. There don’t appear to be any bad outliers (bottom-right) How does the time series of residuals look?
par(mfrow=c(1,1))
plot(residuals(mdl))
Looks like there are some trends there. Doing the dot-to-dot makes it obvious
plot(residuals(mdl),type="b")
abline(h=0,lty=3)
So, lets make the autocorrelation plot
acf(residuals(mdl))
Looks like we have some work to do! ## Linear trend analysis with autocorrelation So, lets account for that autocorrelation. We do this using the gls() function in the nlme package.
library(nlme)
mdl.ac <- gls(birds ~year, data=dat,
correlation = corAR1(form=~year),
na.action=na.omit)
summary(mdl.ac)
## Generalized least squares fit by REML
## Model: birds ~ year
## Data: dat
## AIC BIC logLik
## 137.7397 143.2089 -64.86987
##
## Correlation Structure: AR(1)
## Formula: ~year
## Parameter estimate(s):
## Phi
## 0.9132806
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -457.9346 455.1807 -1.00605 0.3227
## year 0.2344 0.2290 1.02386 0.3144
##
## Correlation:
## (Intr)
## year -1
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.7504240 -0.0730259 0.2199919 0.5526805 1.7095733
##
## Residual standard error: 5.080289
## Degrees of freedom: 31 total; 29 residual
Which is a pretty different result - the trend is no-longer significant! As a side note, in this situation, the adding auto-correlation does not change the slope, but does change the associated p-value e.g.
coef(mdl)
## (Intercept) year
## -476.0153912 0.2441911
coef(mdl.ac)
## (Intercept) year
## -457.934600 0.234421
Lets check the diagnostics. These have to be done by hand this time. First, the Tukey Anscombe
plot(fitted(mdl.ac),residuals(mdl.ac))
abline(h=0,lty=3)
Ok. And the qqplot
qqnorm(mdl.ac)
And the acf of the residuals
acf(residuals(mdl.ac,type="p"))
How do we decide on the best model? By AIC of course!
library(MuMIn)
model.sel(mdl,mdl.ac)
## Model selection table
## (Intrc) year family class correlation na.action df logLik AICc delta weight
## mdl.ac -457.9 0.2344 gaussian gls corAR1(~year) na.omit 4 -64.87 139.3 0.00 1
## mdl -476.0 0.2442 gaussian lm 3 -77.34 161.6 22.29 0
## Models ranked by AICc(x)
The autocorrelated version of the model is an easy winner.
| This work by Mark R Payne is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. For details, see http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_US Basically, this means that you are free to “share” and “remix” for non-commerical purposes as you see fit, so long as you “attribute” me for my contribution. Derivatives can be distributed under the same or similar license. |
| This work comes with ABSOLUTELY NO WARRANTY or support. |
| *This work should also be considered as BEER-WARE. For details, see http://en.wikipedia.org/wiki/Beerware* |