Week 3: Forecasting and regression

This is sort of a work in progress…beware.

Loading tree-ring data and building chronology

require(dplR)

## Loading required package: dplR

## Warning: package 'dplR' was built under R version 3.1.3

mwa <- read.rwl('MWA_308series.rwl',long=TRUE) 
mwa.rwi <- detrend(rwl = mwa, method = "ModNegExp")
mwa.crn <- chron(mwa.rwi, prefix = "MWA")

Above, I loaded my tree-ring data from Mount Washington, Nevada, USA using dplR. I detrended the data and created a mean value chronology.

Forecasting tree-ring data

Forecasting tree-ring data is something that I don’t think people normally do, but let’s give it a try anyway. I’m going to try to predict 50 years ahead.

mwa.ar <- ar(mwa.crn$MWAstd)
mwa.forecast <- predict(mwa.ar,n.ahead=50)
plot(mwa.crn$MWAstd,xlim=c(3970,4050),type='l')
lines(mwa.forecast$pred,col='red',lwd=1.5)

upper <- mwa.forecast$pred + mwa.forecast$se
lower <- mwa.forecast$pred - mwa.forecast$se
xx <- c(time(mwa.forecast$pred),rev(time(mwa.forecast$pred)))
yy <- c(upper,rev(lower))
plot(mwa.crn$MWAstd,xlim=c(3970,4050),col='grey',lwd=1.5,type='b',pch=20,ylab='RWI',xlab='Year')
polygon(xx,yy,col='lightgreen',border=NA)
points(mwa.forecast$pred,col='darkgreen',lwd=1.5,pch=20)
lines(mwa.forecast$pred,col='darkgreen',lwd=1.5)

I used the ar function to fit an ar model with maximum order 5. My reasons for choosing a maximum order are fully explained in week 2’s markdown, but models of greater order did not seem to improve prediction in this case. I forecasted my data 50 years out, which isn’t much compared to the length of the rest of my data. After this 50 year period, my data regress to the mean. This may be because of a lack of structure in the data, limiting the ability to predict other than giving a mean value.

TreeClim regression

I’m going to do a regression of temperature vs. ring width index for high elevation ring-width data from Nevada (these are maybe my data?) First, let’s start with ordinary least squares regression, then I’ll check out the residuals using acf plots.

load(file="treeClim.Rdata")
year <- treeClim[,1]
rwi <- treeClim[,2]
temp <- treeClim[,3]
ols <- lm(rwi~temp)
acf(residuals(ols))

pacf(residuals(ols))

summary(ols)

## 
## Call:
## lm(formula = rwi ~ temp)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34041 -0.05925 -0.00374  0.07165  0.25792 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.58647    0.09838   26.29   <2e-16 ***
## temp         0.34243    0.02336   14.66   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1035 on 110 degrees of freedom
## Multiple R-squared:  0.6614, Adjusted R-squared:  0.6583 
## F-statistic: 214.8 on 1 and 110 DF,  p-value: < 2.2e-16

The residuals actually look clean, so I would use an ols regression here probably. Here’s a plot of temperature vs. RWI.

plot(temp,rwi)
abline(ols)
legend(x='topleft',bty='n', 'r-squared = 0.66')

I know that I was able to fit an AR(5) model to my tree-ring data, so let’s try and see what kind of model fits to rwi from the TreeClim data.

acf(rwi)

pacf(rwi)

rwi.ar <- ar(rwi)

Looks like an AR(2) model fits the ring-width data best. I’m going to compare OLS and GLS regression of rwi vs. year.

require(nlme)

## Loading required package: nlme

## Warning: package 'nlme' was built under R version 3.1.3

gls1 <- gls(rwi~year, correlation = corARMA(rwi.ar$ar,p=2,q=0))
ols1 <- lm(rwi~year)
summary(gls1)

## Generalized least squares fit by REML
##   Model: rwi ~ year 
##   Data: NULL 
##         AIC       BIC   logLik
##   -77.48337 -63.98097 43.74169
## 
## Correlation Structure: ARMA(2,0)
##  Formula: ~1 
##  Parameter estimate(s):
##      Phi1      Phi2 
## 0.1779456 0.2215703 
## 
## Coefficients:
##                 Value Std.Error   t-value p-value
## (Intercept) -3.732103 1.4063172 -2.653813  0.0091
## year         0.002502 0.0007205  3.472927  0.0007
## 
##  Correlation: 
##      (Intr)
## year -1    
## 
## Standardized residuals:
##        Min         Q1        Med         Q3        Max 
## -2.2642166 -0.6395065 -0.0647902  0.7038947  2.4756667 
## 
## Residual standard error: 0.1602786 
## Degrees of freedom: 112 total; 110 residual

summary(ols1)

## 
## Call:
## lm(formula = rwi ~ year)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.36365 -0.10279 -0.01106  0.11337  0.39611 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -3.6905536  0.9037781  -4.083 8.44e-05 ***
## year         0.0024813  0.0004631   5.358 4.66e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1584 on 110 degrees of freedom
## Multiple R-squared:  0.207,  Adjusted R-squared:  0.1998 
## F-statistic: 28.71 on 1 and 110 DF,  p-value: 4.663e-07

plot(year,rwi)
abline(gls1)
abline(ols1,col='red')

There seem to be no differences between the OLS and GLS regression by looking at the trendlines in the plot above. When looking at the summaries, I can tell that the two regressions are very slightly different. You can’t see two trend lines because they are overlaying on each other. I may have done something wrong…I need to look into this more.

Week 3: Forecasting and regression

Tyler Tran

Loading tree-ring data and building chronology

Forecasting tree-ring data

TreeClim regression