yes

##Part 1 Prediction

The purpose of this first exercise was to fit an autoregressive model to the carbon dioxide data collected at Mauna Loa. This was done largely to demosntrate the limitations of the method as the prediction period is extended into the future from the end of the time-series. The data was read into R and plotted, with the data structure assessed. An AR model was fit to the data via the ‘ar’ function, and coefficients returned.

data(co2)
plot(co2)

str(co2)
##  Time-Series [1:468] from 1959 to 1998: 315 316 316 318 318 ...
co2.ar <- ar(co2)
co2.ar
## 
## Call:
## ar(x = co2)
## 
## Coefficients:
##       1        2        3        4        5        6        7        8  
##  1.0862  -0.0871  -0.0848   0.0227  -0.0041   0.0163   0.0005  -0.0425  
##       9       10       11       12       13       14  
##  0.0253   0.0821   0.1103   0.0491  -0.0724  -0.1099  
## 
## Order selected 14  sigma^2 estimated as  3.378

The autoregressive carbon dioxide model was used to predict 2 years (24 months), 12 years (144 months) and 52 years (624 months) into the future.The plots of these predictions clearly illustrate the limitations of this strict-AR approach, as the predictions become increasingly less plausible as the prediction series is extended due to the autoregressive nature of the model beginning to predict exclusively on other predicited measurements past a certain projection point. This causes the prediction to gradually become the mean of the series, with decreasing variation.

par(mfrow=c(3,1))

co2.forecast <- predict(co2.ar, n.ahead = 24)
plot(co2,xlim=c(1959, 2000),col="black",lwd=1, 
     ylab="Carbon dioxide (ppm)")
lines(co2.forecast$pred,col="red",lwd=1)

co2.forecast <- predict(co2.ar, n.ahead = 144)
plot(co2,xlim=c(1959, 2010),col="black",lwd=1, 
     ylab="Carbon dioxide (ppm)")
lines(co2.forecast$pred,col="red",lwd=1)

co2.forecast <- predict(co2.ar, n.ahead = 624)
plot(co2,xlim=c(1959, 2050),col="black",lwd=1, 
     ylab="Carbon dioxide (ppm)")
lines(co2.forecast$pred,col="red",lwd=1)

par(mfrow=c(1,1))

##Part 2 Prediction

This second exercise incorporates a more robust prediction method and allows for assessment via the exclusion of a part of the series from model development. This retained portion of teh series serves as a testing dataset for assessment of the accuracy of the predicted series. The “metrics” package was loaded for use in prediction assessment.

library("Metrics")

The “sunspot” data was read into R, with a period spanning from the start of the series to 1969 being designated as the training data and a period from 1970 until the end of the series serving as the testing data. Prior to and following series manipulations, the data structure was assessed to ensure it was still a “ts” object and that the specified index values were as anticipated.

str(sunspot.year)
##  Time-Series [1:289] from 1700 to 1988: 5 11 16 23 36 58 29 20 10 8 ...
sun.train <- window(sunspot.year, end = 1969) 
str(sun.train)
##  Time-Series [1:270] from 1700 to 1969: 5 11 16 23 36 58 29 20 10 8 ...
sun.test <- window(sunspot.year, start = 1970)
str(sun.test)
##  Time-Series [1:19] from 1970 to 1988: 104.5 66.6 68.9 38 34.5 ...
plot(sunspot.year,col="grey",ylab="Sunspots (n)")
lines(sun.train,col="red",lty="dotted",lwd=2)
lines(sun.test,col="blue",lty="dotted",lwd=2)

An autoregressive model was created from the training data, which specified the use of 9 AR coefficients. These coefficients were used to simulate predicted values for the testing period.

sun.ar <- ar(sun.train)
sun.ar
## 
## Call:
## ar(x = sun.train)
## 
## Coefficients:
##       1        2        3        4        5        6        7        8  
##  1.1816  -0.4239  -0.1655   0.1711  -0.1327   0.0746  -0.0534   0.0288  
##       9  
##  0.1767  
## 
## Order selected 9  sigma^2 estimated as  237.5
sun.forecast <- predict(sun.ar, n.ahead = 19)
str(sun.forecast)
## List of 2
##  $ pred: Time-Series [1:19] from 1970 to 1988: 86.9 65 42.8 26.1 16.5 ...
##  $ se  : Time-Series [1:19] from 1970 to 1988: 15.4 23.9 28.2 29.1 29.2 ...
plot(sunspot.year,col="grey",ylab="Sunspots (n)")
lines(sun.train,col="red",lty="dotted",lwd=1)
lines(sun.test,col="blue",lty="dotted",lwd=1)
lines(sun.forecast$pred, col="green", lty="dotted", lwd=1)

plot(sun.test,col="blue",lwd=2, ylab="Sunspots (n)", xlab="Year")
lines(sun.forecast$pred, col="purple", lwd=2)

In addition to visual assessment of the accuracy of the prediction, RMSE and R-squared were calculated. It is important to note that even with a relatively short period of prediction and 9 autoregressive coefficients, much of the actual dynamic pattern of teh actual sunspot series was lost by the prediction.

rmse(sun.test, sun.forecast$pred)
## [1] 37.63434
(cor(sun.test, sun.forecast$pred))^2
## [1] 0.5961584

##Part 3 CCF Prediction

In this portion, a time-series of population dynamics between phytoplankton and the zooplankton which graze upon them were used to make a prediction regarding another series of separate phytoplankton measurements and the predicted response of zooplankton. This was done using cross-correlation analysis to determine the most significant lag between the two systems and the level of correlation expressed.

library("Hmisc")
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:base':
## 
##     format.pval, units

The initial population series was read into R, with phytoplankton and zooplankton (producers and grazers, respectively) specified as separate “ts” objects with a common daily index.

phyto1<-readRDS("C:/Users/rouechn/Documents/RData/PhytoGrazersExp1.rds")
str(phyto1)
## 'data.frame':    200 obs. of  3 variables:
##  $ day      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ producers: num  0.246 0.268 0.331 0.276 0.258 ...
##  $ grazers  : num  0 0.000896 0.027056 0.0736 0.094798 ...
prod<-ts(phyto1$producers)
str(prod)
##  Time-Series [1:200] from 1 to 200: 0.246 0.268 0.331 0.276 0.258 ...
graz<-ts(phyto1$grazers)
str(graz)
##  Time-Series [1:200] from 1 to 200: 0 0.000896 0.027056 0.0736 0.094798 ...

The series were plotted together, with producers in green and grazers in purple. The lagging positive correlation of grazers to producers was readily apparent.

plot(prod, col="green", ylab="Relative biomass", xlab="Days")
lines(graz, col="purple")

Crossed ACF plots and a cross-correlogram were created, which further illsutrated the strong degree to which grazer populations were strongly correlated to producers. The maximum correlation coefficient between the two series was found to be +0.721 at a lag of 40 days, grazers following producers.

acf(ts.union(prod,graz))
print(acf(ts.union(prod,graz)))

## 
## Autocorrelations of series 'ts.union(prod, graz)', by lag
## 
## , , prod
## 
##  prod         graz        
##   1.000 (  0)  0.214 (  0)
##   0.920 (  1)  0.226 ( -1)
##   0.908 (  2)  0.243 ( -2)
##   0.892 (  3)  0.265 ( -3)
##   0.894 (  4)  0.266 ( -4)
##   0.879 (  5)  0.283 ( -5)
##   0.870 (  6)  0.293 ( -6)
##   0.860 (  7)  0.308 ( -7)
##   0.843 (  8)  0.323 ( -8)
##   0.832 (  9)  0.343 ( -9)
##   0.807 ( 10)  0.372 (-10)
##   0.795 ( 11)  0.383 (-11)
##   0.769 ( 12)  0.402 (-12)
##   0.752 ( 13)  0.407 (-13)
##   0.726 ( 14)  0.417 (-14)
##   0.713 ( 15)  0.440 (-15)
##   0.700 ( 16)  0.470 (-16)
##   0.673 ( 17)  0.483 (-17)
##   0.646 ( 18)  0.484 (-18)
##   0.624 ( 19)  0.515 (-19)
##   0.594 ( 20)  0.528 (-20)
## 
## , , graz
## 
##  prod         graz        
##   0.214 (  0)  1.000 (  0)
##   0.177 (  1)  0.830 (  1)
##   0.148 (  2)  0.844 (  2)
##   0.123 (  3)  0.800 (  3)
##   0.107 (  4)  0.796 (  4)
##   0.082 (  5)  0.788 (  5)
##   0.079 (  6)  0.775 (  6)
##   0.036 (  7)  0.764 (  7)
##   0.035 (  8)  0.767 (  8)
##   0.004 (  9)  0.736 (  9)
##  -0.016 ( 10)  0.728 ( 10)
##  -0.046 ( 11)  0.724 ( 11)
##  -0.063 ( 12)  0.693 ( 12)
##  -0.089 ( 13)  0.703 ( 13)
##  -0.121 ( 14)  0.649 ( 14)
##  -0.126 ( 15)  0.662 ( 15)
##  -0.148 ( 16)  0.628 ( 16)
##  -0.183 ( 17)  0.617 ( 17)
##  -0.213 ( 18)  0.581 ( 18)
##  -0.205 ( 19)  0.573 ( 19)
##  -0.246 ( 20)  0.540 ( 20)
ccf(prod,graz, lag.max=100)

The second phytoplankton series was read into R and plotted, with data structure assessed.

phyto2<-readRDS("C:/Users/rouechn/Documents/RData/PhytoExp2.rds")
str(phyto2)
## 'data.frame':    200 obs. of  2 variables:
##  $ day      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ producers: num  0.0848 0.209 0.1167 0.165 0.1101 ...
prod2<-ts(phyto2)
str(prod2)
##  Time-Series [1:200, 1:2] from 1 to 200: 1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "day" "producers"
plot(prod2[,2], ylab="Relative biomass", xlab="Days")

The index values of the second producer series was lag-adjusted forty days into the future in order to create the predicted grazer series.

prod2.lag40<-Lag(prod2[,2], shift = 40)

The correlation coefficient from the ccf plot above was used to adjust the lagged prodcuer plot in order to represent the predicted grazer series biomass values.

graz2<-(0.721*(prod2.lag40))
str(graz2)
##  Time-Series [1:200] from 1 to 200: NA NA NA NA NA NA NA NA NA NA ...

Something weird happened here with the second grazer series and I recreated the series as a “ts” object again, I’m not sure what or where it happened but I had double index values at this point.

prod2.prod<-prod2[,2]
str(prod2.prod)
##  Time-Series [1:200] from 1 to 200: 0.0848 0.209 0.1167 0.165 0.1101 ...

The predicted grazer population response was plotted in purple with the known producer series in green, as before. The relationship between the two populations appears reasonable and feasible based on the relationship between the two populations in the first series.

plot(prod2[,2], col="darkgreen", ylab="Relative biomass", xlab="Days")
lines(graz2, col="purple")