1 About

This is an intro assignment for the class Applied Time-Series and Spatial Analysis for Environmental Data, an R-based statistics course offered by the Huxley College of the Environment at Western Washington University.

1.1 Data Summary

The data I’m working with here are hourly temperature values from a weather station on Mount Washington, in the Snake Range of eastern Nevada (approx. 38°54′54″ N, 114°18′33″ W). The temperatures are hourly averages in degrees C from May 1 to September 30, 2014. This time period is important because it spans most (if not all) of the growing season at this site. We’re investigating the treeline dynamics and growth response of high-elevation bristlecone pine trees, so growing-season temperatures are important.

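mwa <- read.csv("/Users/jamisbruening/Desktop/spacetime/MWA/MWAhourly.csv", header = TRUE)
# Column 7, rows 5090-8761: hourly mean temperature, May 1 to Sept 30, 2014
sum.aveT <- mwa[5090:8761, 7]
# Hourly ts indexed in decimal years (2014 is not a leap year, so 365 * 24 hours per year);
# May 1 00:00 is 120 full days after Jan 1
sum.ave.temp <- ts(sum.aveT, start = 2014 + 120/365, frequency = 365 * 24)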
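Decimal-year start times are easy to get slightly wrong. For example, a divisor of 356 instead of 365, or counting May 1 as 121 rather than 120 elapsed days, shifts every timestamp. A quick check is to convert the first and last time stamps back to calendar dates. This is a minimal sketch under two assumptions: the series begins at 00:00 on May 1, and it has 153 days × 24 hourly values. A zero-filled stand-in replaces the real data so the block runs on its own, and `dec2time()` is a hypothetical helper, not part of any package.

```r
# Stand-in with the same length and time index as the real series (values are zeros)
x <- ts(numeric(153 * 24), start = 2014 + 120/365, frequency = 365 * 24)

# Hypothetical helper: decimal year -> UTC date-time, rounded to the nearest hour
dec2time <- function(t) {
  as.POSIXct("2014-01-01", tz = "UTC") + round((t - 2014) * 365 * 24) * 3600
}

format(dec2time(time(x)[1]), "%Y-%m-%d %H:%M")          # first observation
format(dec2time(time(x)[length(x)]), "%Y-%m-%d %H:%M")  # last observation
```

Running the same two lines on `sum.ave.temp` should give May 1 and September 30 if the start time is right.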

2 Plotting

Now, let’s take a look with summary(sum.ave.temp) and plot(sum.ave.temp):

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -11.000   5.600   9.100   8.726  12.600  22.900

The dashed line in the plot above represents the 6.4 °C isotherm, thought to be important to high-elevation treelines globally (Paulsen and Körner 2014). This graph shows the hourly, daily, and monthly fluctuations in the data, although the daily cycle is harder to see at this scale.
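The plotting code for this figure is not shown above. A plausible way to draw the series with the dashed isotherm is `abline(h = 6.4, lty = 2)` after `plot()`. The sketch below assumes this approach. Because the station file is not available here, it uses a synthetic hourly series (a 24-hour sine cycle plus noise) as a hypothetical stand-in for `sum.ave.temp`.

```r
# Hypothetical stand-in for sum.ave.temp: 153 days of hourly values with a daily cycle
set.seed(1)
n <- 153 * 24
temp <- ts(9 + 6 * sin(2 * pi * (0:(n - 1)) / 24) + rnorm(n),
           start = 2014 + 120/365, frequency = 365 * 24)

summary(temp)
plot(temp, xlab = "Time (decimal year)", ylab = "Temperature (°C)")
abline(h = 6.4, lty = 2)   # dashed 6.4 °C treeline isotherm
```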

Zoomed in, it’s easier to see the daily fluctuations in temperature, as well as longer fluctuations on the order of several days.
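One way to zoom in on a time series is `window()`, which subsets a `ts` by time while keeping its time index. The 10-day span below (July 1 to July 11) is an illustrative choice, not necessarily the range in the original figure. The data are again a synthetic stand-in for `sum.ave.temp`.

```r
# Hypothetical stand-in series (same time index as sum.ave.temp)
set.seed(1)
n <- 153 * 24
temp <- ts(9 + 6 * sin(2 * pi * (0:(n - 1)) / 24) + rnorm(n),
           start = 2014 + 120/365, frequency = 365 * 24)

# July 1 and July 11 are 181 and 191 days after Jan 1 in a non-leap year
zoom <- window(temp, start = 2014 + 181/365, end = 2014 + 191/365)
plot(zoom, ylab = "Temperature (°C)")
```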

Note the many outliers at the cold end of the distribution: despite the very normal-looking spread of the data (n=3671), many of the recorded temperatures were well below average. This agrees with the negative skewness of -0.57.
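Skewness can be computed in base R without extra packages. The sketch below uses the moment-based definition g1 = m3 / m2^(3/2). The report does not say which definition produced -0.57; packages such as e1071 or moments offer variants that can differ slightly, so treat this formula as an assumption. `skew()` is a hypothetical helper.

```r
# Hypothetical helper: moment-based sample skewness g1 = m3 / m2^(3/2)
skew <- function(x) {
  x <- x[!is.na(x)]            # drop missing values
  d <- x - mean(x)
  m2 <- mean(d^2)              # second central moment
  m3 <- mean(d^3)              # third central moment
  m3 / m2^(3/2)
}

skew(c(1, 2, 3))          # symmetric data: 0
skew(c(1, 2, 3, 4, 10))   # long right tail: positive (about 1.14)
```

With the real data, `skew(sum.ave.temp)` would give a negative value when the cold tail dominates.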

3 Filtering

I’ll start with moving averages of different lengths. Moving averages are useful; however, a centred window cannot be computed at the very beginning and end of the series, so those values are missing. We’ll then look at some other options that do not have this problem.

3.1 Moving Averages

Let’s apply some different moving average filters here.

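# Centred (sides = 2) moving averages of odd length 2k + 1 hours.
# stats::filter avoids being masked by dplyr::filter if dplyr is loaded.
ma1d <- stats::filter(sum.ave.temp, filter = rep(1/25, 25), sides = 2)                    # 1 day: 12 h each side
ma1w <- stats::filter(sum.ave.temp, filter = rep(1/(7*24 + 1), 7*24 + 1), sides = 2)      # 1 week: 84 h (3.5 days) each side
ma4w <- stats::filter(sum.ave.temp, filter = rep(1/(28*24 + 1), 28*24 + 1), sides = 2)    # 4 weeks: 336 h (2 weeks) each side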

Here, I’ve made three moving average filters of different lengths: ma1d spans a day, ma1w a week, and ma4w four weeks (about a month). The length of the moving average should be chosen based on the periodic trends you believe are in the data; the shape of the smoothed curve then suggests whether those periods actually exist (I think).

Now, zooming in on July and August:

Here, it’s pretty clear how the different moving averages smooth the data. The purple smoother averages the 12 hours before and after time t, the blue smoother averages all the hours within 3.5 days of time t, and the red smoother averages all the hours within 2 weeks of time t. Thus, the purple line represents a sort of ‘daily’ average (not exactly daily, because each hour has its own ‘day’ defined by the 12 hours before and after it, but it’s a good visual approximation), the blue curve a ‘weekly’ average, and the red curve a ‘monthly’ average.
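Here is a sketch of how a July to August overlay like this can be drawn. The colours follow the description above, and the layout is an assumption about the original figure. `ma()` is a hypothetical helper that builds a centred moving average with k hours on each side. The data are a synthetic stand-in for `sum.ave.temp`.

```r
# Hypothetical stand-in series (same time index as sum.ave.temp)
set.seed(1)
n <- 153 * 24
temp <- ts(9 + 6 * sin(2 * pi * (0:(n - 1)) / 24) + rnorm(n),
           start = 2014 + 120/365, frequency = 365 * 24)

# Hypothetical helper: centred moving average, window length 2k + 1 hours
ma <- function(x, k) stats::filter(x, rep(1 / (2 * k + 1), 2 * k + 1), sides = 2)
ma1d <- ma(temp, 12)    # +/- 12 hours
ma1w <- ma(temp, 84)    # +/- 3.5 days
ma4w <- ma(temp, 336)   # +/- 2 weeks

# July 1 and Sept 1 are 181 and 243 days after Jan 1 in a non-leap year
jul <- 2014 + 181/365
sep <- 2014 + 243/365
plot(window(temp, jul, sep), col = "grey", ylab = "Temperature (°C)")
lines(window(ma1d, jul, sep), col = "purple")
lines(window(ma1w, jul, sep), col = "blue")
lines(window(ma4w, jul, sep), col = "red")
```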

However, the moving averages have missing values (NA) at the beginning and end of the series:

head(cbind(sum.ave.temp,ma1d),n=15)

##       sum.ave.temp  ma1d
##  [1,]         -5.1    NA
##  [2,]         -4.9    NA
##  [3,]         -4.9    NA
##  [4,]         -5.0    NA
##  [5,]         -3.8    NA
##  [6,]         -2.3    NA
##  [7,]          0.0    NA
##  [8,]          2.0    NA
##  [9,]          4.1    NA
## [10,]          5.6    NA
## [11,]          6.7    NA
## [12,]          7.2    NA
## [13,]          8.7 2.520
## [14,]          7.7 2.816
## [15,]          8.1 3.120
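The number of missing values follows directly from the window length. A centred (`sides = 2`) filter of length 2k + 1 needs k points on each side, so `stats::filter` returns k NAs at each end. That is 2k values lost in total. The sketch below checks this for the three windows, using a series of 153 × 24 hourly values. `na.ends()` is a hypothetical helper.

```r
# Hypothetical helper: NAs a centred moving average of a given width leaves at the ends
na.ends <- function(n, width) {
  y <- stats::filter(seq_len(n), rep(1 / width, width), sides = 2)
  c(per.end = which(!is.na(y))[1] - 1, total = sum(is.na(y)))
}

n <- 153 * 24
na.ends(n, 25)            # ma1d: 12 per end, 24 in total
na.ends(n, 7 * 24 + 1)    # ma1w: 84 per end, 168 in total
na.ends(n, 28 * 24 + 1)   # ma4w: 336 per end, 672 in total (four weeks of data)
```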

This can be avoided using other smoothing techniques.
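The next section uses one such technique: lowess, which fits a local weighted regression around every point. Near the ends of a series, the nearest neighbours simply all lie on one side, so no values are lost. The minimal comparison below runs both methods on a short synthetic hourly series, a hypothetical stand-in. The moving average and the lowess fit each use about 25 points.

```r
# Hypothetical 10-day hourly series with a daily cycle
set.seed(42)
x <- ts(sin(2 * pi * (0:239) / 24) + rnorm(240, sd = 0.3), frequency = 24)

mav <- stats::filter(x, rep(1 / 25, 25), sides = 2)   # centred 25-point moving average
lo  <- lowess(x, f = 25 / length(x))                   # lowess using ~25 nearest points per fit

sum(is.na(mav))    # 24: 12 missing at each end
sum(is.na(lo$y))   # 0: a fitted value at every time point
```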

3.2 Lowess Smoother

Lowess smoothing is a locally weighted regression technique, similar to the loess function in R (see section 1.5.4 of Cowpertwait and Metcalfe 2009). They explain that it provides a more complete smoothed curve than the strict moving averages because each smoothed value is estimated from a relatively small number of points near that value; at the ends of the series, those neighbours simply all lie on one side.

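# f is the fraction of all points used in each local fit;
# 24/n means each fit uses about 24 hourly points (one day)
f1 <- 1*24/length(sum.ave.temp)
f1.lo <- lowess(sum.ave.temp, f = f1)

# about one week of hourly points per local fit
f7 <- 7*24/length(sum.ave.temp)
f7.lo <- lowess(sum.ave.temp, f = f7)

# about four weeks of hourly points per local fit
f28 <- 28*24/length(sum.ave.temp)
f28.lo <- lowess(sum.ave.temp, f = f28)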

Here are three lowess smoothers. As with the moving averages, f1.lo is daily, f7.lo is weekly, and f28.lo is monthly.

Note that the lowess curves look more like a smooth higher-order polynomial than the moving averages do, and seem to represent the underlying fluctuations slightly better (if only visually).
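One detail of lowess() may matter for the daily smoother. By default, `delta = 0.01 * diff(range(x))`: lowess fits only at points spaced more than delta apart and linearly interpolates in between. Here x is time in decimal years, so that spacing is about 37 hours, longer than the one-day window of f1.lo. The sketch below, on a synthetic stand-in series, computes that spacing and compares the default fit with `delta = 0`, which forces a local fit at every hour. Whether this changes the figures above is untested; it is an assumption worth checking on the real data.

```r
# Hypothetical stand-in series (same time index as sum.ave.temp)
set.seed(1)
n <- 153 * 24
temp <- ts(9 + 6 * sin(2 * pi * (0:(n - 1)) / 24) + rnorm(n),
           start = 2014 + 120/365, frequency = 365 * 24)

# lowess() default spacing, in years and converted to hours
default.delta <- 0.01 * diff(range(time(temp)))
default.delta * 365 * 24                        # about 36.7 hours

f1 <- 24 / length(temp)
f1.default <- lowess(temp, f = f1)              # fits about every 37 h, interpolates between
f1.exact   <- lowess(temp, f = f1, delta = 0)   # local fit at every hour
max(abs(f1.default$y - f1.exact$y))             # size of the interpolation shortcut
```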

4 Decomposition

The standard decompose() function did not work with these data. My first guess was that the data are fairly irregular, with no definite periodicity other than the daily fluctuations. More specifically, decompose() requires at least two full periods, and with frequency = 365*24 this five-month series covers well under one. So, I’m trying a decomposition method here that removes the different trends from the data using lowess smoothing.
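A possible alternative, sketched here and not what this report does, is to re-declare the same values with `frequency = 24` so that one time unit is one day. The series then spans 153 periods, and decompose() can estimate the average daily cycle. Its trend is then a centred 24-hour moving average. The data are a synthetic stand-in for `sum.ave.temp`.

```r
# Hypothetical stand-in series (same time index as sum.ave.temp)
set.seed(1)
n <- 153 * 24
temp <- ts(9 + 6 * sin(2 * pi * (0:(n - 1)) / 24) + rnorm(n),
           start = 2014 + 120/365, frequency = 365 * 24)

# With frequency = 365*24 the series is shorter than two periods, so this errors
bad <- try(decompose(temp), silent = TRUE)
inherits(bad, "try-error")

# Same values, one period = one day
temp.daily <- ts(as.numeric(temp), frequency = 24)
dec <- decompose(temp.daily)   # trend: centred 24-h moving average; seasonal: mean daily cycle
plot(dec)
```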

First, here are the raw data again, using plot(sum.ave.temp).

Now, using the different lowess smoothers (f1.lo, f7.lo, f28.lo), I will try to decompose the data, aiming to separate signals with different periods.

For example,

f28 <- 28*24/length(sum.ave.temp)
f28.lo <- lowess(sum.ave.temp, f = f28)
# Subtract the 4-week smooth; what remains are fluctuations shorter than about 4 weeks
month.smooth <- (sum.ave.temp-f28.lo$y)

removes the monthly trend (represented by the red line in the two figures above) from the data, leaving only the fluctuations shorter than the 4-week window.

This can be plotted to show the residual (non-monthly) trends.

The plot above still contains weekly and daily fluctuations, so we can pull out more internal variation using the same method. This time, I’ll fit a week-long smoother and subtract it from month.smooth, to show the daily and hourly variation.

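# Week-long lowess fit to the sub-monthly residuals
fweek <- 7*24/length(month.smooth)
fweek.lo <- lowess(month.smooth, f = fweek)

# Remove it, leaving fluctuations shorter than about a week
week.smooth <- (month.smooth-fweek.lo$y)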

plot(week.smooth) now shows only the daily (and shorter) fluctuations in temperature, without the weekly and monthly variation.

We can repeat this process to pull out the daily fluctuations.

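# Day-long lowess fit to the sub-weekly residuals
fday <- 24/length(week.smooth)
fday.lo <- lowess(week.smooth, f = fday)

# Remove it, leaving the within-day (hourly) variation
day.smooth <- (week.smooth-fday.lo$y)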
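The three subtraction steps follow one pattern, so they can be written as a loop over window lengths in hours. A useful sanity check falls out of this form: the three smooth components plus the final residual should add back up to the original series exactly. This is a minimal sketch. `lowess.cascade()` is a hypothetical helper; it uses lowess() defaults unless extra arguments such as `delta` are passed through `...`. The data are a synthetic stand-in for `sum.ave.temp`.

```r
# Hypothetical helper: repeatedly fit lowess to the current residual and remove it
lowess.cascade <- function(x, spans = c(month = 28 * 24, week = 7 * 24, day = 24), ...) {
  tt <- as.numeric(time(x))      # same x-values lowess() uses for a ts
  resid <- as.numeric(x)
  parts <- list()
  for (nm in names(spans)) {
    fit <- lowess(tt, resid, f = spans[[nm]] / length(resid), ...)
    parts[[nm]] <- fit$y         # tt is already sorted, so fit$y aligns with resid
    resid <- resid - fit$y
  }
  c(parts, list(residual = resid))
}

# Hypothetical stand-in series (same time index as sum.ave.temp)
set.seed(1)
n <- 153 * 24
temp <- ts(9 + 6 * sin(2 * pi * (0:(n - 1)) / 24) + rnorm(n),
           start = 2014 + 120/365, frequency = 365 * 24)

parts <- lowess.cascade(temp)
recon <- parts$month + parts$week + parts$day + parts$residual
max(abs(recon - as.numeric(temp)))   # ~0: the components add back to the original
```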

I’m not sure whether this is a kosher method of looking at the different trends, but it seems to make sense to me, and the results look reasonable. One could go on to test the significance of these components, or to determine the dominant frequencies in the data with a periodogram, e.g. periodogram(sum.ave.temp) from the TSA package (rather than arbitrarily choosing a day, week, and month as I did here).
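Base R's stats package also has `spec.pgram()`, which computes a raw periodogram. The frequency with the largest peak suggests a dominant period without choosing one by hand. The sketch below assumes the series is re-declared with `frequency = 24`, so frequencies come out in cycles per day. The data are a synthetic stand-in with a built-in 24-hour cycle. For the real data, the input would be `ts(as.numeric(sum.ave.temp), frequency = 24)`.

```r
# Hypothetical stand-in: 153 days of hourly values, one time unit = one day
set.seed(1)
n <- 153 * 24
temp <- ts(9 + 6 * sin(2 * pi * (0:(n - 1)) / 24) + rnorm(n), frequency = 24)

# Raw periodogram: no taper, no padding (fast = FALSE); linear trend removed by default
pg <- spec.pgram(temp, taper = 0, fast = FALSE, plot = FALSE)
peak.freq <- pg$freq[which.max(pg$spec)]   # cycles per day
1 / peak.freq                              # dominant period in days (1 for this stand-in)
```

Several peaks may stand out in the real series. Plotting `pg` (or leaving `plot = TRUE`) shows the other peaks, which could then guide the window lengths used in the lowess decomposition.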

Thoughts, Comments, Suggestions?

5 References

Cowpertwait, P. S., & Metcalfe, A. V. 2009. Introductory Time Series with R. Springer.

Paulsen, J., & Körner, C. 2014. A climate-based model to predict potential treeline position around the globe. Alpine Botany, 124(1), 1-12.

Applied Time Series and Spatial Analysis for Environmental Data

Time Series Intro

Jamis B.

April 5, 2015