1 About

This is an assignment for Applied Time-Series and Spatial Analysis for Environmental Data. This course is a graduate level, R-based statistics course offered by the Huxley College of the Environment, within Western Washington University.

2 Forecasting

Here I’m following Andy’s work and am trying to subset my data, use a training set (years 1700-1963) to train and model, and then use the rest of the data (years 1964-1988) to see how well the model did.

2.1 Sunspot Modeling

I might be doing something wrong, or am just getting poor results. I’m not sure yet.

See below.

Make the training and testing subsets.

train.sun <- window(sunspot.year, end = 1963)  
test.sun <- window(sunspot.year, start = 1963)

And now make the model (using years 1700-1963) and forecast ahead 25 years (1964-1988).

train.sun.ar <- ar(train.sun)
train.sun.forecast <- predict(train.sun.ar, n.ahead=25)

Plotting the results gives this.

Zoomed in.

Repeating this analysis, but with a shorter range of years to make and model and more years to predict…

2.1.1 Updating with recent data

Here I’ll use the most up-to-date dataset and see how the forecast using all the data in sunspot.year stacks up. See data(sunspot.year) for more info, and the link to the Royal Observatory of Belgium’s sunspot data.

dat<- read.csv("ISSN_Y_tot.csv",header=F)
newspots2 <- dat[,2]
newspots <- ts(newspots2, start=1700, end=2014, frequency=1)

# shows that they are the same
c(sunspot.year[54],newspots[54])

## [1] 30.7 30.7

# shows that they are the same
c(sunspot.year[204],newspots[204])

## [1] 24.4 24.4

tail(time(newspots))

## [1] 2009 2010 2011 2012 2013 2014

So, I’ll repeat the analysis Andy did in the notes: I’ll be using train2sun (this is the same as sunspot.year, see below) to build a model of sunspots from 1700-1988. I’ll then use this model to predict the sunspot numnbers from 1988-2015. I’ll break the new data into a training set to create the model, and validation set to see how well the model did.

train2.sun <- window(newspots, end=1988)
valid.sun <- window(newspots, start=1988)
c(length(train2.sun),length(valid.sun))

## [1] 289  27

sunspot.ar <- ar(train2.sun)
sunspot.ar

## 
## Call:
## ar(x = train2.sun)
## 
## Coefficients:
##       1        2        3        4        5        6        7        8  
##  1.1313  -0.3544  -0.1721   0.1382  -0.1345   0.0964  -0.0564   0.0083  
##       9  
##  0.1939  
## 
## Order selected 9  sigma^2 estimated as  267.4

sunspot.forecast <- predict(sunspot.ar, n.ahead = 1+length(newspots)-length(sunspot.year))

How does the model do?

This looks pretty decent. Lets zoom in and see if the predicted values fall within the standard error.

I’m pretty happy with these results.for almost 10 years the model predicts the annual number of sunspots to within 1 standard error term. Not too shabby.

2.2 El Nino Modeling

Revisiting the El Nino data from tseries, lets try the same analysis as above.

I will break the data (Pacific ocean temperatures, see library(tseries);data(nino) for more info) into two groups, the first to create a model, and the second to test the model’s accuracy. I could use all the data (no subestting) and forecast the data into the future, but then I’d be left without a means of testing the accuracy of the model. Here, because I’ve reserved some time period (1994-1999) before creating the model (with 1950-1994), I can then go back and test its accuracy.

library(tseries)
data(nino)
nino <- nino[,1]

train.nino <- window(nino, end = 1994)  
test.nino <- window(nino, start = 1994)

train.nino.ar <- ar(train.nino)
train.nino.forecast <- predict(train.nino.ar, n.ahead=length(test.nino))

Zooming in

And closer still, with errors

This actually seems to do a decent job. Other than summer ’97 - summer ’98 the forecasted values are mostly within +/- one standard error term. If you know a little about ENSO and global temperatures, you’ll recognize that the ’97-’98 En Nino event was massive, and helped set make 1998 the warmest year on record to date. I’m ok with the model NOT being able to predict this anomalous event, especially since it seems to pick up the trend again in ’99.

3 Regression:

From canvas, the treeClim data.

# load("U:/2015 Spring/SpaceTime 597/Week3/treeClim.Rdata")
load('/Users/jamisbruening/Dropbox (Personal)/School work/spacetime/Week3/treeClim.Rdata')
str(treeClim)

## 'data.frame':    112 obs. of  3 variables:
##  $ yr        : int  1896 1897 1898 1899 1900 1901 1902 1903 1904 1905 ...
##  $ treeGrowth: num  0.975 0.966 1.293 0.931 1.097 ...
##  $ minTemp   : num  -4.71 -4.77 -4.3 -4.6 -4.24 ...

I’m going to look at the temperature and tree growth data over time. We see a positive growth trend in the ring width chronology over time, as well as increasing minimum temparatures. This is seen in many treeline ecotones; at the highest elevation limit of the tree species the individuals can be very ‘sensative’ to temperatre (Korner 2012). This means that temperature, more than any other environmental variable, is limiting the growth of the tree, thus the relationship between increasing temperatures and ring widths.

3.1 Growth vs Temperature and OLS Regression

Lets explore this relationship, starting with a simple linear relationship. Because we’re working with time series data, we want to make sure that if we decide to calculate a linear to explain some relationship in the data, the residuals should be independent and randomly distributed. See below.

yr <- treeClim[,1]
tG <- treeClim[,2]
Tmin <- treeClim[,3]

growth <- ts(tG, start=1896)
temp <- ts(Tmin, start=1896)

ols <- lm(growth~temp)
res <- residuals(ols)

This actually looks pretty good here. Visually the residuals don’t seem to follow a trend (that I can see anyway), and the ACF and PACF plots look pretty clean. Lag 17 in the PACF barely shows ‘significance’, but there is nothing I can think of physically in this system that would justify entertaining an AR(17) process here.

So, I feel comfortable using an OLS approach here.

3.2 Generalized Least Squares (GLS) Regression

Now, just for fun, lets look at temperature and growth individually, both as a function of time.

These variables seem to track eachother pretty well, and we know from above there is a pretty strong linear relationship betweem them.

Lets explore their own relationships in time.

3.2.1 GLS Regression on Growth

growth <- ts(tG, start=1896)
ols1 <- lm(growth~yr)
res1 <- residuals(ols1)
# summary(ols1)

There isn’t terribly strong evidence for an AR(2) process here, but lets try it out. The PACF plot shows lag 1 and 2 being barely significant, however tree growth (depending on the species, etc) is known to follow some sort of autoregressive process. Because the theory does not directly contradict when is seen in the above plots, lets try out a GLS approach here assuming an AR(2) process in growth as a function of year.

library(nlme)
ar.growth <- coef(arima(residuals(ols1), order = c(2,0,0), include.mean = FALSE))
ar.growth

##       ar1       ar2 
## 0.1557606 0.2000382

gls.growth <- gls(growth~yr, correlation = corARMA(ar.growth, growth~yr, 2,0))
# summary(gls.growth)

Here, the black dotted line is the best fit from the GLS model, and the thicker green line is the best fit from the OLS model. I’ve included both for two reasons. First, the PACF plot of the growth residuals from the OLS model shows a very weak AR(2) process. Its hard to tell is there are truely ‘significant’ autocorrelations in the residuals. Secondly, the best fit lines are VERY close, ptobably between of the weakness of the AR(2) process (if any at all).

3.2.2 GLS regression on Temperature

temp <- ts(Tmin, start=1896)
ols2 <- lm(temp~yr)
res2 <- residuals(ols2)
# summary(ols2)

Similarly to tree growth, the temperature timeseries seems to follow a very weak AR(2) process also. Again, this may not actually be significant, but we know that a GLS approach on data that is not temporally autocorrelated is the same as an OLS approach, and so we can gauge the strength of the AR(2) process here, by comparing the OLS and GLS best fit lines.

ar.temp <- coef(arima(residuals(ols2), order = c(2,0,0), include.mean = FALSE))
ar.temp

##        ar1        ar2 
## 0.06407408 0.22520545

gls.temp <- gls(temp~yr, correlation = corARMA(ar.temp, temp~yr, 2,0))
# summary(gls.temp)

4 References

Paulsen, J., & Korner, C. 2014. A climate-based model to predict potential treeline position around the globe. Alpine botany, 124(1), 1-12.

03 Forecasting and Regression

Jamis

April 17, 2015