Phylogenetic distance will be square root and log transformed.
#saveRDS(data.cooc, file="cooc.z.data.rds")
data.cooc<-readRDS("cooc.z.data.rds")
head(data.cooc)
## gen.spec phy.dist cooc.z
## 1 0.047371556 78.891213 -2.39264565
## 2 0.274031336 121.972198 5.85930884
## 3 0.226659781 121.972198 -0.06566243
## 4 0.000248625 121.972198 -2.71153372
## 5 0.047620180 121.972198 -0.70770244
## 6 0.274279961 9.008272 -4.68947236
phy.dist.sqrt<-sqrt(data.cooc$phy.dist)
phy.dist.log<-log(data.cooc$phy.dist)
In the following analyzes, I compare a GAM with phylogenetic distance without a square root transformation to GAMs with phylogenetic distance square root and log transformed. The residuals of the models show that there is some problem with normality. However, the default ‘K’ values, as assessed by gam.check, are acceptable for all models except the log transformed model. GAM.check indicates that k value was too small for the .log model so I used k=6 instead (selected by a trial-and-error process, seeing when p-value of gam.check greater than p<0.05).
For te smooths, the upper limit of degrees of freedom is given by product of the K values provided by each marginal smooth less than one. For the construction exact choice is not generally ciritical, but should be large enough so you are reasonable sure of having enough degrees of freedom to represent the ‘truth’ reasonable well, but small enough for comuptational efficiency.
#set up GAM formula with interaction between two predictor variables####################################################################
#I am using tensor product smoothing because predictors not of same unig
#Do same GAM model with tensor product smoothing because predictors not of same unit; the second analysis includes a square root transformation of phy.dist
cooc.gam.te<-gam(cooc.z~te(phy.dist,gen.spec), dat=data.cooc)
summary(cooc.gam.te)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## cooc.z ~ te(phy.dist, gen.spec)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1557 0.1149 -1.355 0.176
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(phy.dist,gen.spec) 5.072 5.495 8.523 3.36e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0704 Deviance explained = 7.88%
## GCV = 7.4837 Scale est. = 7.4027 n = 561
cooc.gam.sqrt.te<-gam(cooc.z~te(phy.dist.sqrt,gen.spec), dat=data.cooc)
summary(cooc.gam.sqrt.te)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## cooc.z ~ te(phy.dist.sqrt, gen.spec)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1557 0.1148 -1.356 0.176
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(phy.dist.sqrt,gen.spec) 5.353 6.084 7.505 8.19e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0717 Deviance explained = 8.06%
## GCV = 7.4771 Scale est. = 7.3924 n = 561
cooc.gam.log.te<-gam(cooc.z~te(phy.dist.log,gen.spec), dat=data.cooc)
summary(cooc.gam.log.te)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## cooc.z ~ te(phy.dist.log, gen.spec)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1557 0.1150 -1.353 0.176
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(phy.dist.log,gen.spec) 7.031 7.972 5.58 8.46e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0682 Deviance explained = 7.99%
## GCV = 7.5279 Scale est. = 7.4201 n = 561
cooc.gam.log.te.6<-gam(cooc.z~te(phy.dist.log,gen.spec, k=6), dat=data.cooc)
summary(cooc.gam.log.te.6)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## cooc.z ~ te(phy.dist.log, gen.spec, k = 6)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.1557 0.1150 -1.354 0.176
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(phy.dist.log,gen.spec) 6.946 8.047 5.539 8.72e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0684 Deviance explained = 8%
## GCV = 7.5251 Scale est. = 7.4185 n = 561
gam.check(cooc.gam.te)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 13 iterations.
## The RMS GCV score gradiant at convergence was 6.37398e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank = 25 / 25
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(phy.dist,gen.spec) 24.00 5.07 1.06 0.94
gam.check(cooc.gam.sqrt.te)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradiant at convergence was 4.960206e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank = 25 / 25
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(phy.dist.sqrt,gen.spec) 24.000 5.353 0.979 0.3
gam.check(cooc.gam.log.te)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradiant at convergence was 3.634287e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank = 25 / 25
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(phy.dist.log,gen.spec) 24.000 7.031 0.939 0.04
gam.check(cooc.gam.log.te.6)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradiant at convergence was 5.073532e-07 .
## The Hessian was positive definite.
## The estimated model rank was 36 (maximum possible: 36)
## Model rank = 36 / 36
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(phy.dist.log,gen.spec) 35.000 6.946 0.947 0.05
par(mfrow=c(1,3))
par(mar=c(5.1,5.1,5.6,4.1))
my.vis.gam(cooc.gam.te, type = "response", plot.type = "contour", color="gray", ylab="Difference in generalism and specialism",main="Predicted co-occurrence based on \ninteracting predictor variables",
xlab="Phylogenetic distance", cex.lab=1.25, cex.axis=1, cex.main=1, lwd=2, cex=2)
my.vis.gam(cooc.gam.sqrt.te, type = "response",color="gray",plot.type = "contour",
ylab="Difference in generalism and specialism",main="Predicted co-occurrence based on \ninteracting predictor variables",
xlab="Phylogenetic distance (square root)", cex.lab=1.25, cex.axis=1, cex.main=1, lwd=2, cex=2)
my.vis.gam(cooc.gam.log.te.6, type = "response",color="gray",plot.type = "contour",
ylab="Difference in generalism and specialism",main="Predicted co-occurrence based on \ninteracting predictor variables",
xlab="Phylogenetic distance (log)", cex.lab=1.25, cex.axis=1, cex.main=1, lwd=2, cex=2)
#textClick("a)", cex=1.5)
#textClick("b)", cex=1.5)
#textClick("c)", cex=1.5)
An ANOVA comparison indicates that the model with a square root transformation of phylogenetic distance explains marginally more variance in the data. Therefore, I will use the square root transformation of phylogenetic distance for the remainder of the analyzes.
#Compare 2 response variable models ANOVAs
anova(cooc.gam.te, cooc.gam.sqrt.te, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: cooc.z ~ te(phy.dist, gen.spec)
## Model 2: cooc.z ~ te(phy.dist.sqrt, gen.spec)
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 554.93 4108.0
## 2 554.65 4100.2 0.28021 7.8023 0.07893 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Again, the diagnostic plots indicated that there are problems with the normality of the data at the tails and that variance is not homogeneous either (it still looks triangular towards the higher fitted values). The default ‘K’ value for the model is acceptable.
#Therefore, choose te smoothing with sqrt of phylogenetic distance
gam.check(cooc.gam.sqrt.te, k.rep = 1000)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradiant at convergence was 4.960206e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank = 25 / 25
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(phy.dist.sqrt,gen.spec) 24.000 5.353 0.979 0.28
#Also these residual plots
par(mfrow=c(1,2))
plot(fitted(cooc.gam.sqrt.te),residuals(cooc.gam.sqrt.te))
plot(data.cooc$cooc.z,residuals(cooc.gam.sqrt.te))
This analysis would be nice to do, but because I am dealing with standardized co-occurrence scores it is not suitable for my data in their current form.
#Calculate GAM with a binomial distribution and corresponding logit link function
#The following cannot be done because I am working with co-occurrence .z scores and therefore have negative values and cannot fit to a binomial distribution
cooc.gam.sqrt.binomial.te<-gam(cooc.z~te(phy.dist.sqrt,gen.spec), dat=data.cooc, family=binomial)
summary(cooc.gam.sqrt.binomial.te)
The following are the lambda estimates for the GAM with the square root of phylogenetic distance as the x variable and the tensor smoothing parameter.
#extract the lambda estimates, and the RE/ML smoothness selection, the covariance matrix of the log lambda estimates
cooc.gam.sqrt.te$sp
## te(phy.dist.sqrt,gen.spec)1 te(phy.dist.sqrt,gen.spec)2
## 4.499236e-01 2.382014e+08
sp.vcov(cooc.gam.sqrt.te)
## NULL
#Alternativel, extract information as (root) variance components
gam.vcomp(cooc.gam.sqrt.te)
## te(phy.dist.sqrt,gen.spec)1 te(phy.dist.sqrt,gen.spec)2
## 2.188419e-01 9.703355e-06
This plot is not showing the plot labels in its current form, as the code is commented-out to do that.
library(RColorBrewer)
## Warning: package 'RColorBrewer' was built under R version 3.1.2
library(colorRamps)
## Warning: package 'colorRamps' was built under R version 3.1.1
#here, specify your color ramp, if you want something other than the usual R ones
#internet is helpful here
myColorRamp<-function(colors, values){
v <- (values - min(values))/diff(range(values))
x <- colorRamp (colors) (v)
rgb (x[,1], x[,2], x[,3], maxColorValue =255)
}
#color<-colorRampPalette(c("black", "white"))
z<-data.cooc$cooc.z
x<-sqrt(data.cooc$phy.dist)
y<-data.cooc$gen.spec
colsbw <- myColorRamp(c("grey99", "black"), z)
colfunc <- colorRampPalette(c("grey99", "black"))
#dev.new(width=5.9, height=4)
layout(matrix(1:3,ncol=3), width = c(3,3,1),height = c(1,1))
par(mar=c(5.1,5.1,4.1,1))
my.vis.gam(cooc.gam.sqrt.te,type="response", plot.type = "contour", color="gray", ylab="Difference in generalism and specialism", main="",
xlab="Phylogenetic distance(sqrt)", cex.lab=1.5, cex.axis=1.25, cex.main=0.1, lwd=2, cex=2)
par(mar=c(5.1,5.7,4.1,1.5))
plot(x=x,y=y, col = colsbw, pch=16, cex=1.25, bty="n", xlab="Phylogenetic distance(sqrt)", ylab="Difference in generalism and specialism",lwd=2, cex.axis=1.25, cex.lab=1.5 )
box(bty="l", lwd=2)
xl <- 0
yb <- 0
xr <- 1.5
yt <- 13
par(mar=c(5.1,2,4.1,2))
plot(NA,type="n",ann=FALSE,xlim=c(0,2),ylim=c(0,13),xaxt="n",yaxt="n",bty="n")
rect(
xl,
head(seq(yb,yt,(yt-yb)/20),-1),
xr,
tail(seq(yb,yt,(yt-yb)/20),-1),
col=colfunc(20)
)
#textClick("a)", cex=1.5)
#textClick("b)", cex=1.5)
#textClick("Co-occurrence",cex=1.25)
#textClick("-8", cex=1.5)
#textClick("13", cex=1.5)
Import and prepare data
#####################do for same dataset, but remove phy.dist of 40 my
data.cooc.40<-subset(data.cooc[data.cooc$phy.dist<40,])
phy.dist.40.sqrt<-sqrt(data.cooc.40$phy.dist)
gen.spec.40<-data.cooc.40$gen.spec
cooc.z.40<-data.cooc.40$cooc.z
head(data.cooc.40)
## gen.spec phy.dist cooc.z
## 6 0.27427996 9.008272 -4.689472
## 15 0.06309177 14.509728 4.691737
## 18 0.35320123 30.482091 -1.974338
## 19 0.07892127 30.482091 4.552806
## 22 0.17428106 32.031689 8.839739
## 33 0.53791685 24.481046 -1.469640
The following is the formulation for the GAM model for only those species pairs separated by less than 40 million years.
cooc.gam.40<-gam(cooc.z.40~te(phy.dist.40.sqrt,gen.spec.40), dat=data.cooc.40)
The residuals for the model for those species separated by only 40 million years are not normally distributed nor have homogenous variance.
gam.check(cooc.gam.40, k.rep = 1000)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 21 iterations.
## The RMS GCV score gradiant at convergence was 4.839791e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank = 25 / 25
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(phy.dist.40.sqrt,gen.spec.40) 24.00 3.26 1.05 0.74
#other diagnostic checks that should be done
par(mfrow=c(1,2))
plot(fitted(cooc.gam.40),residuals(cooc.gam.40))
plot(data.cooc.40$cooc.z,residuals(cooc.gam.40))
The following GAM for those species separated by only 40 million years explains more of the variance in the data than the corresponding GAM for the entire dataset. Phylogenetic distance was also square root transformed for this analysis that uses tensor product smoothing.
#Now check summary of model to see how it fits data
summary(cooc.gam.40)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## cooc.z.40 ~ te(phy.dist.40.sqrt, gen.spec.40)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5422 0.2102 -2.58 0.0107 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(phy.dist.40.sqrt,gen.spec.40) 3.261 3.484 6.815 0.000118 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.11 Deviance explained = 12.7%
## GCV = 7.8362 Scale est. = 7.6431 n = 173
GAM predicting co-occurrence of only those species pairs separated by less than 40 million years.The ‘s’ smoothing term is used here and phylogenetic distance is square root transformed.
#Analyze less than 40 million years data with smooth term 's'
cooc.gam.40.s<-gam(cooc.z.40~s(phy.dist.40.sqrt,gen.spec.40), dat=data.cooc.40)
Again, for this model with the smooth term ‘s’ there are problems with normality and homogeneity of the residuals.
gam.check(cooc.gam.40.s, k.rep = 1000)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 13 iterations.
## The RMS GCV score gradiant at convergence was 9.843071e-07 .
## The Hessian was positive definite.
## The estimated model rank was 30 (maximum possible: 30)
## Model rank = 30 / 30
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(phy.dist.40.sqrt,gen.spec.40) 29.00 2.00 1.16 0.99
#other diagnostic checks that should be done
par(mfrow=c(1,2))
plot(fitted(cooc.gam.40.s),residuals(cooc.gam.40.s))
plot(data.cooc.40$cooc.z,residuals(cooc.gam.40.s))
The following shows the model summary of the GAM model for only those species pairs separated by less than 40 million years and the ‘s’ smooth term. Phylogenetic distance has been square root transformed.
#Now check summary of model to see how it fits data
summary(cooc.gam.40.s)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## cooc.z.40 ~ s(phy.dist.40.sqrt, gen.spec.40)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.5422 0.2122 -2.555 0.0115 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(phy.dist.40.sqrt,gen.spec.40) 2 2 9.818 9.06e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.093 Deviance explained = 10.4%
## GCV = 7.9305 Scale est. = 7.793 n = 173
I am showing this plot for interest sake. However, I should stay with the te model because that is the one I chose for the analysis of the full dataset and it explains slightly more of the variation in my data.
my.vis.gam(cooc.gam.40.s,theta=-30,ticktype="detailed", color="gray", plot.type="contour")
#Compare 2 response variable models ANOVAs
anova(cooc.gam.40, cooc.gam.40.s, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: cooc.z.40 ~ te(phy.dist.40.sqrt, gen.spec.40)
## Model 2: cooc.z.40 ~ s(phy.dist.40.sqrt, gen.spec.40)
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 168.74 1289.7
## 2 170.00 1324.8 -1.2609 -35.106 0.04632 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The following are the lambda estimates for the GAM model that includes only those species pairs with less than 40 million years of separation. Phylogenetic distances have been square root transformed for these analyzes and the tensor product smoothing term has been used.
#extract the lambda estimates, and the RE/ML smoothness selection, the covariance matrix of the log lambda estimates
cooc.gam.40$sp
## te(phy.dist.40.sqrt,gen.spec.40)1 te(phy.dist.40.sqrt,gen.spec.40)2
## 4.869586e+10 9.419442e+01
sp.vcov(cooc.gam.40)
## NULL
#Alternativel, extract information as (root) variance components
gam.vcomp(cooc.gam.40)
## te(phy.dist.40.sqrt,gen.spec.40)1 te(phy.dist.40.sqrt,gen.spec.40)2
## 2.671105e-06 6.239320e-02
color<-colorRampPalette(c("black", "white"))
z<-data.cooc.40$cooc.z
x<-sqrt(data.cooc.40$phy.dist)
y<-data.cooc.40$gen.spec
colsbw <- myColorRamp(c("grey99", "black"), z)
colfunc <- colorRampPalette(c("grey99", "black"))
#dev.new(width=5.9, height=4)
layout(matrix(1:3,ncol=3), width = c(3,3,1),height = c(1,1))
par(mar=c(5.1,5.1,4.1,1))
my.vis.gam(cooc.gam.40,type="response", plot.type = "contour", color="gray", ylab="Difference in generalism and specialism", main="",
xlab="Phylogenetic distance(sqrt)", cex.lab=1.5, cex.axis=1.25, cex.main=0.1, lwd=2, cex=2)
par(mar=c(5.1,5.7,4.1,1.5))
plot(x=x, y=y, col = colsbw, pch=16, cex=1.25, bty="n", xlab="Phylogenetic distance(sqrt)", ylab="Difference in generalism and specialism",lwd=2, cex.axis=1.25, cex.lab=1.5 )
box(bty="l", lwd=2)
xl <- 0
yb <- 0
xr <- 1.5
yt <- 13
par(mar=c(5.1,2,4.1,2))
plot(NA,type="n",ann=FALSE,xlim=c(0,2),ylim=c(0,13),xaxt="n",yaxt="n",bty="n")
rect(
xl,
head(seq(yb,yt,(yt-yb)/20),-1),
xr,
tail(seq(yb,yt,(yt-yb)/20),-1),
col=colfunc(20)
)
#textClick("a)", cex=1.5)
#textClick("b)", cex=1.5)
#textClick("Co-occurrence",cex=1.25)
#textClick("-8", cex=1.5)
#textClick("13", cex=1.5)
Import and prepare data
#####################do for same dataset, but remove phy.dist of.30 my
data.cooc.30<-subset(data.cooc[data.cooc$phy.dist<30,])
phy.dist.30.sqrt<-sqrt(data.cooc.30$phy.dist)
gen.spec.30<-data.cooc.30$gen.spec
cooc.z.30<-data.cooc.30$cooc.z
head(data.cooc.30)
## gen.spec phy.dist cooc.z
## 6 0.27427996 9.008272 -4.6894724
## 15 0.06309177 14.509728 4.6917372
## 33 0.53791685 24.481046 -1.4696399
## 34 0.60100862 24.481046 -2.4803428
## 50 0.16159843 24.481046 -0.9417983
## 51 0.22469020 24.481046 0.3893929
The following is the formulation for the GAM model for only those species pairs separated by less than 30 million years.
cooc.gam.30<-gam(cooc.z.30~te(phy.dist.30.sqrt,gen.spec.30), dat=data.cooc.30)
The residuals for the model for those species separated by only 30 million years are not normally distributed nor have homogenous variance.
gam.check(cooc.gam.30, k.rep = 1000)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 16 iterations.
## The RMS GCV score gradiant at convergence was 3.426105e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank = 25 / 25
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## te(phy.dist.30.sqrt,gen.spec.30) 24.000 3.000 0.948 0.22
#other diagnostic checks that should be done
par(mfrow=c(1,2))
plot(fitted(cooc.gam.30),residuals(cooc.gam.30))
plot(data.cooc.30$cooc.z,residuals(cooc.gam.30))
The following GAM for those species separated by only.30 million years explains more of the variance in the data than the corresponding GAM for the entire dataset. Phylogenetic distance was also square root transformed for this analysis that uses tensor product smoothing.
#Now check summary of model to see how it fits data
summary(cooc.gam.30)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## cooc.z.30 ~ te(phy.dist.30.sqrt, gen.spec.30)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6775 0.2288 -2.962 0.0037 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## te(phy.dist.30.sqrt,gen.spec.30) 3 3 3.924 0.0103 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0676 Deviance explained = 9.07%
## GCV = 6.6017 Scale est. = 6.3853 n = 122
GAM predicting co-occurrence of only those species pairs separated by less than.30 million years.The ‘s’ smoothing term is used here and phylogenetic distance is square root transformed.
#Analyze less than.30 million years data with smooth term 's'
cooc.gam.30.s<-gam(cooc.z.30~s(phy.dist.30.sqrt,gen.spec.30), dat=data.cooc.30)
Again, for this model with the smooth term ‘s’ there are problems with normality and homogeneity of the residuals.
gam.check(cooc.gam.30.s, k.rep = 1000)
##
## Method: GCV Optimizer: magic
## Smoothing parameter selection converged after 14 iterations.
## The RMS GCV score gradiant at convergence was 9.156534e-07 .
## The Hessian was positive definite.
## The estimated model rank was 30 (maximum possible: 30)
## Model rank = 30 / 30
##
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
##
## k' edf k-index p-value
## s(phy.dist.30.sqrt,gen.spec.30) 29.00 2.00 1.17 0.96
#other diagnostic checks that should be done
par(mfrow=c(1,2))
plot(fitted(cooc.gam.30.s),residuals(cooc.gam.30.s))
plot(data.cooc.30$cooc.z,residuals(cooc.gam.30.s))
The following shows the model summary of the GAM model for only those species pairs separated by less than 30 million years and the ‘s’ smooth term. Phylogenetic distance has been square root transformed.
#Now check summary of model to see how it fits data
summary(cooc.gam.30.s)
##
## Family: gaussian
## Link function: identity
##
## Formula:
## cooc.z.30 ~ s(phy.dist.30.sqrt, gen.spec.30)
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.6775 0.2286 -2.963 0.00368 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(phy.dist.30.sqrt,gen.spec.30) 2 2 5.458 0.00537 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.0686 Deviance explained = 8.4%
## GCV = 6.539 Scale est. = 6.3782 n = 122
I am showing this plot for interest sake. However, I should stay with the te model because that is the one I chose for the analysis of the full dataset and it explains slightly more of the variation in my data.
my.vis.gam(cooc.gam.30.s,theta=-30,ticktype="detailed", color="gray", plot.type="contour")
#Compare 2 response variable models ANOVAs
anova(cooc.gam.30, cooc.gam.30.s, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: cooc.z.30 ~ te(phy.dist.30.sqrt, gen.spec.30)
## Model 2: cooc.z.30 ~ s(phy.dist.30.sqrt, gen.spec.30)
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 118 753.46
## 2 119 759.01 -1 -5.5475 0.3513
The following are the lambda estimates for the GAM model that includes only those species pairs with less than 30 million years of separation. Phylogenetic distances have been square root transformed for these analyzes and the tensor product smoothing term has been used.
#extract the lambda estimates, and the RE/ML smoothness selection, the covariance matrix of the log lambda estimates
cooc.gam.30$sp
## te(phy.dist.30.sqrt,gen.spec.30)1 te(phy.dist.30.sqrt,gen.spec.30)2
## 7934677980 4155014
sp.vcov(cooc.gam.30)
## NULL
#Alternativel, extract information as (root) variance components
gam.vcomp(cooc.gam.30)
## te(phy.dist.30.sqrt,gen.spec.30)1 te(phy.dist.30.sqrt,gen.spec.30)2
## 6.336767e-06 2.897326e-04
color<-colorRampPalette(c("black", "white"))
z<-data.cooc.30$cooc.z
x<-sqrt(data.cooc.30$phy.dist)
y<-data.cooc.30$gen.spec
colsbw <- myColorRamp(c("grey99", "black"), z)
colfunc <- colorRampPalette(c("grey99", "black"))
#dev.new(width=5.9, height=4)
layout(matrix(1:3,ncol=3), width = c(3,3,1),height = c(1,1))
par(mar=c(5.1,5.1,4.1,1))
my.vis.gam(cooc.gam.30,type="response", plot.type = "contour", color="gray", ylab="Difference in generalism and specialism", main="",
xlab="Phylogenetic distance(sqrt)", cex.lab=1.5, cex.axis=1.25, cex.main=0.1, lwd=2, cex=2)
par(mar=c(5.1,5.7,4.1,1.5))
plot(x=x, y=y, col = colsbw, pch=16, cex=1.25, bty="n", xlab="Phylogenetic distance(sqrt)", ylab="Difference in generalism and specialism",lwd=2, cex.axis=1.25, cex.lab=1.5 )
box(bty="l", lwd=2)
xl <- 0
yb <- 0
xr <- 1.5
yt <- 13
par(mar=c(5.1,2,4.1,2))
plot(NA,type="n",ann=FALSE,xlim=c(0,2),ylim=c(0,13),xaxt="n",yaxt="n",bty="n")
rect(
xl,
head(seq(yb,yt,(yt-yb)/20),-1),
xr,
tail(seq(yb,yt,(yt-yb)/20),-1),
col=colfunc(20)
)
#textClick("a)", cex=1.5)
#textClick("b)", cex=1.5)
#textClick("Co-occurrence",cex=1.25)
#textClick("-5", cex=1.5)
#textClick("12", cex=1.5)