Our ultimate goal is trying to predict flux tower data based on the information we have already known, e.g.: previous years of flux tower data, MODIS data.
5 spots have been selected.
| Long | Lat | Site Name | Time Range | Data Link | Data |
|---|---|---|---|---|---|
| -106.1978 | 53.6289 | SK Old Aspen | 2003 – 2008 | Old Aspen | CO2Flux_AbvCnpy_39m |
| -104.6920 | 53.9163 | SK Old Jack Pine | 2003 – 2008 | Old Jack Pine | CO2Flux_AbvCnpy_28m |
| -105.1178 | 53.9872 | Sk Southern Old Black Spruce | 2003 – 2008 | Southern Old Black Spruce | CO2Flux_AbvCnpy_25m |
| -104.6453 | 53.8758 | Sk 1975 (Young) Jack Pine | 2004 – 2006 | 1975 (Young) Jack Pine | CO2Flux_AbvCnpy_16m |
| -74.03653 | 49.26708 | QC 2000 Harvested Black Spruce/Jack Pine (HBS00) | 2003 – 2010 | HBS00 | CO2Flux_AbvCnpy_7m |
Flux tower data has been collected per 30 min. There are 48 flux tower data have been collected per day.
MOIDS data has been collected per 16 days starting at the first day of each year.
MODIS data has been download from the following link: MODIS Land Product Subsets
the Number of Kilometers Encompassing the Center Location (both Above and Below and Left and Right) are 0.
time range is consistent with flux tower data for each location.
The collection frequency of flux tower data is much higher than MODIS data.
Computed the daily average of flux tower data
Computed the 16-day average of flux tower data (1-16 | 17-32 | 33-48 |…| 353-365/366)
Lat: 53.6289 / Long: -106.1978
Time range: 2003 – 2008
Data: CO2Flux_AbvCnpy_39m
Lat: 53.9163 / Long: -104.6920
Time range: 2003 – 2008
Data: CO2Flux_AbvCnpy_28m
Lat: 53.9872 / Long: -105.1178
Time range: 2003 – 2008
Data: CO2Flux_AbvCnpy_25m
Lat: 53.8758 / Long: -104.6453
Time range: 2004 – 2006
Data: CO2Flux_AbvCnpy_16m
Lat:49.26708 / Long:-74.03653
Time range: 2003 – 2010
Data: CO2Flux_AbvCnpy_7m
Compared to the previous 4 plots, HBS00 data doesn’t show obvious underlying cycle. (Why?)
Generalized linear model has first been used which assuming the underlying cycle is the same for all these years.
Dependent variable:
Independent variable:
Time (solar days)
Species of plants
NDVI data
Indicator variable (summer/winter)
Mix <- gam(Flx ~ s(Time,bs="cc",k=23)+s(Time,bs="cc",k=23,by=Spe)+ Spe + NDVI + Indi, weights = W,data = D)
As we can see, the residual plot indicates that the variance is not the same for each location which violates one of our model assumptions.
In a word, we can’t assume the underlying cycle is the same for each year – generalized linear model is not very proper here.
The problem can be visualized below:
These 4 plots show that the Spring came at different time for each location each year.
We need functional data analysis to solve this problem.
In the following analysis, we used Old Aspen data only. We will handle other sites later.
The first step to apply this method was smooth our data.
B-spline smoothing has been used.
daytime <- seq(1,365,by=16)
OA <- matrix(OldAspen_16day,nrow=23)
OAbasis <- create.bspline.basis(c(1,365),nbasis=13,norder=4,breaks=c(1,120,130,140,170,200,220,250,260,270,365))
OAbasismat <- eval.basis(daytime, OAbasis)
OAcoef <- solve(crossprod(OAbasismat,OAbasismat))%*%crossprod(OAbasismat,OA)
everyday <- 1:365
yhat <- eval.basis(everyday,OAbasis)%*%OAcoef
Registration was used to line up the peaks and valleys, in order to decrease variation which increase the accuracy of prediction.
#Gmat <- Gmat_3landmarks # read the landmarks located by locator()
Gmat0 <- read.csv("/Users/fhe7/Documents/School/Oct_15 Gammon/G3landmarks.csv",header=T)
Gmat <- Gmat0[,-1]
## T*
TStars <- apply(Gmat,2,mean)
## Use "Map" to find the True time corresponding to the time in T* scale
Tmat <- matrix(NA,nrow=6,ncol=365)
for(i in 1:6){
Tmat[i,] <- App(TStars,Gmat[i,])
}
## Get the registered curve value based on the True time calulated above
Regimat <- matrix(NA,nrow=6,ncol=365)
for(i in 1:6){
Regimat[i,] <- Regi(Tmat[i,],OA[,i])
}
Data after registration saved in matrix – Regimat (6x365)
Smooth all 6 curves
Register 6 curves, save the registered curves in matrix – Regimat (6x365)
Smooth again based on the registered curves (sample the value at 1,17,…)
Get the re-smoothed curves, save the coefficients in matrix – Regicoef (13x6)
OAbasis <- create.bspline.basis(c(1,365),nbasis=13,norder=4,breaks=c(1,120,130,140,170,200,220,250,260,270,365))
Regibasismat <- eval.basis(daytime, OAbasis)
Regicoef <- solve(crossprod(OAbasismat,OAbasismat))%*%crossprod(OAbasismat,t(Regimat[,daytime]))
everyday <- 1:365
yRegihat <- eval.basis(everyday,OAbasis)%*%Regicoef
coef.predict <- mvrnorm(n=1,mu=apply(Regicoef,1,mean),Sigma=var(t(Regicoef)))
y.predict <- eval.basis(everyday,OAbasis)%*%coef.predict
Regicoef (13 x 6)
Gmat (6 x 3)
Combine these 2 matrices together and simulate data from MVN
Back to real time scale.
CoefGmat <- cbind(t(Regicoef),Gmat)
n_simu <- 100
Toge_sim <- mvrnorm(n=n_simu,mu=apply(CoefGmat,2,mean),Sigma=var(CoefGmat))
## Toge_sim (n_simu x (13 + 3)) will be used in the Together() as input
Together <- function(data){
y.predict <- eval.basis(everyday,OAbasis)%*%data[1:13]
backto_T <- App(TStars,data[14:16],1:365)
list(pred=y.predict,time=backto_T)
}
## Together() will produce predicted FlxTwr data under registered time scale
## Together() will also produce the (calculated back to True) time
## bspl4.1 bspl4.2 bspl4.3 bspl4.4 bspl4.5
## 0.23544038 -0.22243275 0.83238071 2.05177236 -1.94814802
## bspl4.6 bspl4.7 bspl4.8 bspl4.9 bspl4.10
## -6.81219906 -1.21155316 -4.31073348 -0.02592836 1.41328519
## bspl4.11 bspl4.12 bspl4.13 V1 V2
## 1.53449178 -0.96343956 0.67264692 151.39240522 170.54987185
## V3
## 276.69413986
## very sharp turn at the start of spring example
## the corresponding landmarks
Toge_sim[5,][14:16]
## V1 V2 V3
## 151.3924 170.5499 276.6941
## regular turn at the start of sping example
Toge_sim[6,][14:16]
## V1 V2 V3
## 110.0230 165.3249 273.6039
plot(1:365,(1:365)/9,type="n",main="Landmark Simulations Comparison",ylim=c(-6,4))
abline(v=Toge_sim[5,][14:16],col=2,lty="dashed")
abline(v=Toge_sim[6,][14:16],lty="dashed")
legend("topleft",c("bad","good"),col=c(2,1),pch=c(19,19))
lines(1:365,Y_inter[5,],col=2)
lines(1:365,Y_inter[6,])
plot(as.numeric(Gmat[1,]),rep(1,3),pch=19,
xlim=c(80,300),ylim=c(1,6),
xlab="Original landmarks",ylab="years")
for(i in 2:6){
points(as.numeric(Gmat[i,]),rep(i,3),pch=19)
}
# the minimal distance between landmark 1 and landmark 2
# form the orignal 6 years of data
min(Gmat[,2]-Gmat[,1])
## [1] 36.76926
Q: How small is small?