Goal

Our ultimate goal is to predict flux tower data from information we already have, e.g. previous years of flux tower data and MODIS data.



1. Data Step

Five sites have been selected.

Corresponding Flux Tower Data Information

Long      | Lat      | Site Name                                        | Time Range  | Data Link                 | Data
-106.1978 | 53.6289  | SK Old Aspen                                     | 2003 – 2008 | Old Aspen                 | CO2Flux_AbvCnpy_39m
-104.6920 | 53.9163  | SK Old Jack Pine                                 | 2003 – 2008 | Old Jack Pine             | CO2Flux_AbvCnpy_28m
-105.1178 | 53.9872  | SK Southern Old Black Spruce                     | 2003 – 2008 | Southern Old Black Spruce | CO2Flux_AbvCnpy_25m
-104.6453 | 53.8758  | SK 1975 (Young) Jack Pine                        | 2004 – 2006 | 1975 (Young) Jack Pine    | CO2Flux_AbvCnpy_16m
-74.03653 | 49.26708 | QC 2000 Harvested Black Spruce/Jack Pine (HBS00) | 2003 – 2010 | HBS00                     | CO2Flux_AbvCnpy_7m

Flux tower data were collected every 30 minutes, so 48 observations are recorded per day.

Corresponding MODIS Data Information

MODIS data were collected every 16 days, starting on the first day of each year.

MODIS data were downloaded from the following link: MODIS Land Product Subsets

  • The number of kilometers encompassing the center location (both above/below and left/right) is 0, i.e. only the center pixel is used.

  • The time range is consistent with the flux tower data for each location.

Data Manipulation

The collection frequency of the flux tower data is much higher than that of the MODIS data, so the flux data were aggregated (see the sketch below):

  • Computed the daily average of the flux tower data

  • Computed the 16-day average of the flux tower data (days 1-16 | 17-32 | 33-48 | … | 353-365/366)
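
A minimal sketch of this aggregation in R, assuming the half-hourly records sit in a data frame flux with columns Date and Flx (these names are illustrative, not from the original scripts):

  library(dplyr)
  ## daily average of the half-hourly flux records
  daily <- flux %>%
    group_by(Date) %>%
    summarise(Flx_daily = mean(Flx, na.rm = TRUE), .groups = "drop")
  ## 16-day period index: days 1-16 -> 1, 17-32 -> 2, ..., day 353 onward -> 23
  daily$period <- (as.integer(format(daily$Date, "%j")) - 1) %/% 16 + 1
  ## 16-day average within each year
  flux16 <- daily %>%
    group_by(year = format(Date, "%Y"), period) %>%
    summarise(Flx_16day = mean(Flx_daily, na.rm = TRUE), .groups = "drop")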



1.1 SK-Old Aspen

Lat: 53.6289 / Long: -106.1978

Time range: 2003 – 2008

Data: CO2Flux_AbvCnpy_39m

1.2 SK-Old Jack Pine

Lat: 53.9163 / Long: -104.6920

Time range: 2003 – 2008

Data: CO2Flux_AbvCnpy_28m

1.3 SK-Southern Old Black Spruce

Lat: 53.9872 / Long: -105.1178

Time range: 2003 – 2008

Data: CO2Flux_AbvCnpy_25m

1.4 SK-1975 (Young) Jack Pine

Lat: 53.8758 / Long: -104.6453

Time range: 2004 – 2006

Data: CO2Flux_AbvCnpy_16m

1.5 QC-2000 Harvested Black Spruce/Jack Pine (HBS00)

Lat: 49.26708 / Long: -74.03653

Time range: 2003 – 2010

Data: CO2Flux_AbvCnpy_7m

Compared with the previous four sites, the HBS00 data do not show an obvious underlying cycle. (Why?)

1.6 Corresponding NDVI Data



2. Modelling Step

2.1 Generalized Linear Model

A generalized linear model was used first (fit as an additive model with cyclic smooth terms via mgcv's gam()), assuming the underlying cycle is the same for all years.

Dependent variable:

  • Flux tower data

Independent variables:

  • Time (solar days)

  • Species of plants

  • NDVI data

  • Indicator variable (summer/winter)

library(mgcv)
## cyclic cubic splines (bs = "cc") model the seasonal cycle; by = Spe adds a species-specific deviation
Mix <- gam(Flx ~ s(Time, bs = "cc", k = 23) + s(Time, bs = "cc", k = 23, by = Spe) + Spe + NDVI + Indi,
           weights = W, data = D)
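
The residual plot referred to below can be reproduced with, for example, plot(fitted(Mix), residuals(Mix)); mgcv also provides gam.check(Mix) for standard diagnostics.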

The residual plot indicates that the variance is not the same for each location, which violates one of our model assumptions.

In short, we cannot assume the underlying cycle is the same for each year, so the generalized linear model is not appropriate here.

The problem can be visualized below:

These four plots show that spring arrived at a different time at each location in each year.

We need functional data analysis to solve this problem.

2.2 Functional Data Analysis

In the following analysis, we use the Old Aspen data only. The other sites will be handled later.

2.2.1 Smoothing

The first step in applying this method was to smooth our data.

B-spline smoothing was used.

  library(fda)
  daytime <- seq(1, 365, by = 16)          # start days of the 23 16-day periods
  OA <- matrix(OldAspen_16day, nrow = 23)  # 23 periods x 6 years (2003-2008)
  ## cubic (order-4) B-spline basis; the dense breaks in spring and fall
  ## follow the sharp seasonal transitions
  OAbasis <- create.bspline.basis(c(1, 365), nbasis = 13, norder = 4,
                                  breaks = c(1, 120, 130, 140, 170, 200, 220, 250, 260, 270, 365))
  OAbasismat <- eval.basis(daytime, OAbasis)
  ## ordinary least squares: coefficients = (B'B)^{-1} B'y, one column per year
  OAcoef <- solve(crossprod(OAbasismat, OAbasismat)) %*% crossprod(OAbasismat, OA)
  everyday <- 1:365
  yhat <- eval.basis(everyday, OAbasis) %*% OAcoef
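
Note that nbasis = 13 is not a free choice here: for a B-spline basis, nbasis = norder + the number of interior breakpoints, and with 9 interior breaks this gives 4 + 9 = 13.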

2.2.2 Registration

Registration was used to line up the peaks and valleys across years, decreasing phase variation and thereby increasing the accuracy of prediction.

# Gmat <- Gmat_3landmarks # landmarks located by locator()
Gmat0 <- read.csv("/Users/fhe7/Documents/School/Oct_15 Gammon/G3landmarks.csv", header = TRUE)
Gmat <- Gmat0[, -1]  # 6 years x 3 landmarks

## T*: target landmarks = mean landmark locations across the 6 years
TStars <- apply(Gmat, 2, mean)

## Use App() to map each time on the T* scale to the year's true time
Tmat <- matrix(NA, nrow = 6, ncol = 365)
for (i in 1:6) {
  Tmat[i, ] <- App(TStars, Gmat[i, ])
}

## Get the registered curve values at the true times calculated above
Regimat <- matrix(NA, nrow = 6, ncol = 365)
for (i in 1:6) {
  Regimat[i, ] <- Regi(Tmat[i, ], OA[, i])
}
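
App() and Regi() are helper functions defined elsewhere and not shown here. A minimal sketch consistent with how they are called (piecewise-linear time warping through the landmarks, then evaluating the year's smoothed curve at the warped times) might look like this; these bodies are assumptions, not the original implementations:

  ## hypothetical reconstruction: map registered time to a year's own time
  ## by linear interpolation through (1, 1), (T*_j, g_j), (365, 365)
  App <- function(tstars, glandmarks, tgrid = 1:365) {
    approx(x = c(1, tstars, 365),
           y = c(1, as.numeric(glandmarks), 365),
           xout = tgrid)$y
  }
  ## hypothetical reconstruction: smooth one year's 16-day values, then
  ## evaluate the smooth at the warped (true) times
  Regi <- function(truetime, y16) {
    coef_i <- solve(crossprod(OAbasismat, OAbasismat)) %*% crossprod(OAbasismat, y16)
    as.vector(eval.basis(truetime, OAbasis) %*% coef_i)
  }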

2.2.3 Finding the Curve for a Future Year

Data after registration are saved in the matrix Regimat (6 x 365). The full procedure:

  • Smooth all 6 curves

  • Register the 6 curves and save the registered curves in the matrix Regimat (6 x 365)

  • Smooth again, based on the registered curves (sampling their values at days 1, 17, …)

  • Save the coefficients of the re-smoothed curves in the matrix Regicoef (13 x 6)



Generate a future year on the registered time scale from Regicoef by drawing from a multivariate normal distribution.

  library(MASS)  # for mvrnorm()
  OAbasis <- create.bspline.basis(c(1, 365), nbasis = 13, norder = 4,
                                  breaks = c(1, 120, 130, 140, 170, 200, 220, 250, 260, 270, 365))
  Regibasismat <- eval.basis(daytime, OAbasis)
  ## least-squares coefficients of the registered curves (13 x 6)
  Regicoef <- solve(crossprod(Regibasismat, Regibasismat)) %*%
    crossprod(Regibasismat, t(Regimat[, daytime]))
  everyday <- 1:365
  yRegihat <- eval.basis(everyday, OAbasis) %*% Regicoef
  ## draw one new coefficient vector from the MVN fitted to the 6 observed years
  coef.predict <- mvrnorm(n = 1, mu = apply(Regicoef, 1, mean), Sigma = var(t(Regicoef)))
  y.predict <- eval.basis(everyday, OAbasis) %*% coef.predict
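
Drawing the entire coefficient vector from one multivariate normal, rather than simulating each coefficient independently, preserves the correlations between basis coefficients seen across the 6 observed years, so the simulated curves keep a realistic overall shape.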



Generate a future year on the real time scale from Regicoef and the landmark matrix Gmat by drawing from a multivariate normal distribution:

  • Regicoef (13 x 6)

  • Gmat (6 x 3)

  • Combine these 2 matrices and simulate data from the MVN

  • Transform back to the real time scale.

CoefGmat <- cbind(t(Regicoef), Gmat)  # 6 years x (13 coefficients + 3 landmarks)

n_simu <- 100
Toge_sim <- mvrnorm(n = n_simu, mu = apply(CoefGmat, 2, mean), Sigma = var(CoefGmat))
## Toge_sim (n_simu x (13 + 3)) will be used as the input of Together()

Together <- function(data){
  y.predict <- eval.basis(everyday, OAbasis) %*% data[1:13]  # curve on the registered scale
  backto_T  <- App(TStars, data[14:16], 1:365)               # warp back to the true time scale
  list(pred = y.predict, time = backto_T)
}
## Together() produces the predicted flux tower data on the registered time scale,
## along with the true times calculated back from the simulated landmarks

One simulated row of Toge_sim (13 basis coefficients followed by the 3 simulated landmarks) looks like:

##      bspl4.1      bspl4.2      bspl4.3      bspl4.4      bspl4.5 
##   0.23544038  -0.22243275   0.83238071   2.05177236  -1.94814802 
##      bspl4.6      bspl4.7      bspl4.8      bspl4.9     bspl4.10 
##  -6.81219906  -1.21155316  -4.31073348  -0.02592836   1.41328519 
##     bspl4.11     bspl4.12     bspl4.13           V1           V2 
##   1.53449178  -0.96343956   0.67264692 151.39240522 170.54987185 
##           V3 
## 276.69413986

So far the prediction doesn’t involve any spatial elements.

Check Zone

## example of a very sharp turn at the start of spring;
## the corresponding landmarks:
Toge_sim[5, 14:16]
##       V1       V2       V3 
## 151.3924 170.5499 276.6941
## example of a regular turn at the start of spring:
Toge_sim[6, 14:16]
##       V1       V2       V3 
## 110.0230 165.3249 273.6039
plot(1:365, (1:365)/9, type = "n", main = "Landmark Simulations Comparison", ylim = c(-6, 4))
abline(v = Toge_sim[5, 14:16], col = 2, lty = "dashed")
abline(v = Toge_sim[6, 14:16], lty = "dashed")
legend("topleft", c("bad", "good"), col = c(2, 1), pch = c(19, 19))

## Y_inter (n_simu x 365), computed elsewhere, presumably holds the simulated
## curves evaluated back on the real time scale
lines(1:365, Y_inter[5, ], col = 2)
lines(1:365, Y_inter[6, ])

## original landmark locations for each of the 6 years
plot(as.numeric(Gmat[1, ]), rep(1, 3), pch = 19,
     xlim = c(80, 300), ylim = c(1, 6),
     xlab = "Original landmarks", ylab = "years")
for (i in 2:6) {
  points(as.numeric(Gmat[i, ]), rep(i, 3), pch = 19)
}

# the minimal distance between landmark 1 and landmark 2
# from the original 6 years of data
min(Gmat[, 2] - Gmat[, 1])
## [1] 36.76926

Q: How small is small?
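
One way to make the question concrete (a sketch built on the objects above; screening draws this way is a suggestion, not part of the original analysis) is to count how many simulated years place landmarks 1 and 2 closer together than any observed year:

sim_gap <- Toge_sim[, 15] - Toge_sim[, 14]   # simulated landmark 2 minus landmark 1
sum(sim_gap < min(Gmat[, 2] - Gmat[, 1]))    # draws narrower than any observed year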