Introduction

Here I present my work on a final project for Applied Time-Series and Spatial Analysis for Environmental Data. This course is a graduate level, R-based statistics course offered by the Huxley College of the Environment, within Western Washington University.

PRISM Temperature Analysis from Mount Washington, Nevada

This project looks at the annual temperature structure of PRISM temperatures from the highest reaches of Mount Washington, in the Snake Range in eastern Nevada. The data are the mean temperature climate normal (preceeding three decades, 1981-2010) from the peak of Mount Washington. This site is home to a climate sensative Great Basin bristlecone pine treeline, and is one of four such sites involved in larger treeline and paleoclimate studies, see (Mount Washington temperature modeling).

## 
## Attaching package: 'zoo'
## 
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Loading required package: timeDate
## This is forecast 6.1

Data Organization

Below I show how I organized the temperatures, using 1981 as an example. During the analysis, each step was done in bulk for all 30 years.

First I read in the data, which are daily values from Januaary 1st 1981 to December 31st 2010. I then made a 30-year zoo object, from which I could use window() to make annual temperature series. For easy plotting and visual exploratory plotting I then dropped December 31st from all the leap years. Rather than displaying the noisy raw values, I instead smoothed each series with a 90-day lowess smoother.

Tmean <- read.csv("MWATmean.csv", header=T)
DateDay <- as.POSIXct(Tmean[,1],format = "%m/%d/%y",tz="")
temps <- zoo(x=Tmean[,2], DateDay)
eighty1 <- window(x=temps,index=index(temps), start="1981-01-01", end="1981-12-31")
eighty1 <- ts(c(eighty1), start=1, end=365)
days <- c(1:365)
lo.81 <- lowess(x = days, y = eighty1, f = 90/length(days))

Plotting

To save time and prevent headaches whilst plotting, I wrote all the smoothing curves into a data.frame, so I could automate the plotting with a for loop. I plotted the smoothed curves in 5-year bins, and designated separate colors for each year.

dat <- data.frame(lo.81$y,lo.82$y,lo.83$y,lo.84$y,lo.85$y,lo.86$y,
                  lo.87$y,lo.88$y,lo.89$y,lo.90$y,lo.91$y,lo.92$y,
                  lo.93$y,lo.94$y,lo.95$y,lo.96$y,lo.97$y,lo.98$y,
                  lo.99$y,lo.100$y,lo.101$y,lo.102$y,lo.103$y,lo.104$y,
                  lo.105$y,lo.106$y,lo.107$y,lo.108$y,lo.109$y,lo.110$y)

cols2 <- c("firebrick","dodgerblue","gold","forestgreen","black")
color2 <- c(rep(cols2, times=6))

1981-1985

plot(lo.81, type="n", ylim=c(-9,18),las=1, tcl=.5,mgp=c(2,0.5,0),
     xlab="Julian Day", ylab="Degrees C")
for(i in 1:5){
  lines(dat[,i],col=color2[i],lwd=2)
abline(h=5);abline(h=10);abline(h=15)
legend("topleft",legend=c(1981,1982,1983,1984,1985),
       col=c("firebrick","dodgerblue","gold","forestgreen","black"), 
       lwd=2,cex=1.2)
box(lwd=1)
axis(3,las=1, tcl=.5,mgp=c(2,0.5,0))
axis(4,las=1, tcl=.5,mgp=c(2,0.5,0))
}

1986-1990

1991-1995

1996-2000

2001-2005

2006-2010

Autocorrelation Structure

Being an environmental time series, one must assume there is significant autocorrelation. Weather is usually like the day before, which is the essence of autocorrelation.

Lets take a look at the autocorrelation structure of these temperatures. Above I plotted the smoothed temperatures because they’re much prettier and we were just looking for general trends. In analyzing the autocorrelation structure, we need to work with the actual data.

This is what the raw temperatures look like, just for fun.

plot(eighty1, type="n", ylim=c(-9,18),las=1, tcl=.5,mgp=c(2,0.5,0),
     xlab="Julian Day", ylab="Degrees C")
lines(eighty1,col=color2[1],lwd=2)
lines(eighty2,col=color2[2],lwd=2)
lines(eighty3,col=color2[3],lwd=2)
abline(h=5);abline(h=10);abline(h=15)
legend("topleft",legend=c(1981,1982,1983),
       col=c("firebrick","dodgerblue","gold"), 
       lwd=2,cex=1.2)
box(lwd=1)
axis(3,las=1, tcl=.5,mgp=c(2,0.5,0))
axis(4,las=1, tcl=.5,mgp=c(2,0.5,0))

I won’t overload you with 30 pacf plots, but here are a few different ones.

par(mfrow=c(7,1),mar=c(0.6,4,0,2),tcl=.3,mgp=c(1,.4,0),
    omi=c(.6,.5,.6,.5))
pacf(eighty1, main="",ylab="PACF",xaxt="n", bty="n",ylim=c(-.2,.8))
text(20,.3, "1981",cex=1.1)
axis(3, labels=c(0,5,10,15,20,25), at=c(0,5,10,15,20,25))

par(mar=c(0,4,0,2))
pacf(eighty6, main="",ylab="PACF", xaxt="n", bty="n",ylim=c(-.2,.8))
text(20,.3, "1986",cex=1.1)

pacf(ninety1, main="",ylab="PACF", xaxt="n", bty="n",ylim=c(-.2,.8))
text(20,.3, "1991",cex=1.1)

pacf(ninety6, main="",ylab="PACF", xaxt="n", bty="n",ylim=c(-.2,.8))
text(20,.3, "1996",cex=1.1)

pacf(twothou1, main="",ylab="PACF", xaxt="n", bty="n",ylim=c(-.2,.8))
text(20,.3, "2001",cex=1.1)

pacf(twothou6, main="",ylab="PACF", xaxt="n", bty="n",ylim=c(-.2,.8))
text(20,.3, "2006",cex=1.1)

par(mar=c(.6,4,0,2))
pacf(twothou10, main="",ylab="PACF",xaxt="n", bty="n",ylim=c(-.2,.8))
text(20,.3, "2010",cex=1.1)
axis(1, labels=c(0,5,10,15,20,25), at=c(0,5,10,15,20,25))

Observations

  1. Every year has significant positive correlation at lag 1. We knew this but its nice to see that the data show it.

  2. A few years have significant negative correlation at lag 2. It’s unclear what the process is behind this, if there is one. Regardless this pattern is only seen in a few years and so I’m not going to dwell too much on it right now. This is only 30 years of data and not enough to draw any grand patterns from I’d suspect.

  3. Most years have correlation at lag 3. There seems be a processes that is operating on longer lags (several days). Ideas anyone?

Lets take use a modeling approach to look at the ARMA processes that seem to be at play here.

AMRA(p,q) model fitting

Now that we have an idea of what processes (mathematically, at least; I’ll address the physical mechanisms I think may be involved here later on) are at work, I’ll see if fit the data an ARMA(p,q) model, using the R package forecast. This has a nice function (auto.arima) that will fit the most parsimonious ARMA(p,q) model to the data. I’ve set a limit for both p and q (3 in this case, because the pacf plots do not indicate any further significant lags) to guard against overfitting.

The auto.arima function output is class ARIMA. The model’s outputs are in a list, and we can call from it usefull info about the type of model, coefficents, residuals, etc.

dat1 <- data.frame(eighty1,eighty2,eighty3,eighty4,eighty5,
                   eighty6,eighty7,eighty8,eighty9,ninety0,
                   ninety1,ninety2,ninety3,ninety4,ninety5,
                   ninety6,ninety7,ninety8,ninety9,twothou0,
                   twothou1,twothou2,twothou3,twothou4,twothou5,
                   twothou6,twothou7,twothou8,twothou9,twothou10)

arma81 <- auto.arima(dat1[1], max.p=3, max.q=3)
coef81 <- arma81$coef
## Warning in auto.arima(dat1[4], max.p = 3, max.q = 3): Unable to fit final
## model using maximum likelihood. AIC value approximated
round(coef81,3) # most common
##    ar1    ma1    ma2 
##  0.383 -0.261 -0.457
round(coef85,3) # other order models
##    ar1    ar2    ma1    ma2    ma3 
##  0.024  0.466  0.145 -0.764 -0.193
round(coef96,3) # etc
##    ar1    ar2    ma1    ma2 
##  0.515  0.172 -0.365 -0.524

One can also pull out the model’s AIC value.

arma81$aic
## [1] 1559.841

Mechanism

The interesting thing here isn’t an AIC score or what specific order ARMA(p,q) model each year fits. What we should be focusing on is the mechanism behind these somewhat consistant results.

The above analysis shows that there is some AR(p) and MA(q) consistancy in the models for each year. Its interesting to think about what mechanisms might be causing this.

Here I brainstorm some possible causes behind the AR(p) and MA(q) processes.

Temperature Autocorrelation

Weather is a continuous process; its always happening and so when two measurements are taken closer together in time (two consequative days for example) those values will usually be closer together than measurements separated by longer time periods.

Hence, temporal autocorrelation.

This makes sense when thinking about temperatures at this site. As regional high and low pressure systems move through the region, they influence temperatures, and effect the series at greater lags than 1 day. High pressure systems move slowly, and can resultant conditions can persist for multiple days at a time in a certain location (think of the weather last week).

Moving Average Processes

A moving average mechanism is usually (I think?) an external factor that is effecting the system, independently of the AR process. This effects the residuals of an autocorrelation model; they are no longer independent. If we fit a parsimonious AR model, the remaining MA process can be modeled as well, hence ARMA models, which combine the two.

These results from Mount Washington may suggest there is/are some other processes at play that are effecting local temperatures, but are not directly related to the mechanism behind the autocorrelation.

The million dollar question here is just this; is there a reasonable MA process at play at this site. By reasonable I mean, does it fit with our understanding of regional weather patterns and surface temperatures in complex mountainous terrain. Perhaps more localized weather events, occuring with somewhat consistant frequency, and traveling at different speeds than the regional high and low pressure systems?

Or something different. I’m not sure at this point.

Questions, comments?

Thanks