GAMs with square root transformation

Import data

Phylogenetic distance will be square root and log transformed.

#saveRDS(data.cooc, file="cooc.z.data.rds")
data.cooc<-readRDS("cooc.z.data.rds")
head(data.cooc)

##      gen.spec   phy.dist      cooc.z
## 1 0.047371556  78.891213 -2.39264565
## 2 0.274031336 121.972198  5.85930884
## 3 0.226659781 121.972198 -0.06566243
## 4 0.000248625 121.972198 -2.71153372
## 5 0.047620180 121.972198 -0.70770244
## 6 0.274279961   9.008272 -4.68947236

phy.dist.sqrt<-sqrt(data.cooc$phy.dist)
phy.dist.log<-log(data.cooc$phy.dist)

GAMs with interaction between two predictors

In the following analyzes, I compare a GAM with phylogenetic distance without a square root transformation to GAMs with phylogenetic distance square root and log transformed. The residuals of the models show that there is some problem with normality. However, the default ‘K’ values, as assessed by gam.check, are acceptable for all models except the log transformed model. GAM.check indicates that k value was too small for the .log model so I used k=6 instead (selected by a trial-and-error process, seeing when p-value of gam.check greater than p<0.05).

For te smooths, the upper limit of degrees of freedom is given by product of the K values provided by each marginal smooth less than one. For the construction exact choice is not generally ciritical, but should be large enough so you are reasonable sure of having enough degrees of freedom to represent the ‘truth’ reasonable well, but small enough for comuptational efficiency.

#set up GAM formula with interaction between two predictor variables####################################################################
#I am using tensor product smoothing because predictors not of same unig
#Do same GAM model with tensor product smoothing because predictors not of same unit; the second analysis includes a square root transformation of phy.dist

cooc.gam.te<-gam(cooc.z~te(phy.dist,gen.spec), dat=data.cooc)
summary(cooc.gam.te)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## cooc.z ~ te(phy.dist, gen.spec)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.1557     0.1149  -1.355    0.176
## 
## Approximate significance of smooth terms:
##                         edf Ref.df     F  p-value    
## te(phy.dist,gen.spec) 5.072  5.495 8.523 3.36e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0704   Deviance explained = 7.88%
## GCV = 7.4837  Scale est. = 7.4027    n = 561

cooc.gam.sqrt.te<-gam(cooc.z~te(phy.dist.sqrt,gen.spec), dat=data.cooc)
summary(cooc.gam.sqrt.te)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## cooc.z ~ te(phy.dist.sqrt, gen.spec)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.1557     0.1148  -1.356    0.176
## 
## Approximate significance of smooth terms:
##                              edf Ref.df     F  p-value    
## te(phy.dist.sqrt,gen.spec) 5.353  6.084 7.505 8.19e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0717   Deviance explained = 8.06%
## GCV = 7.4771  Scale est. = 7.3924    n = 561

cooc.gam.log.te<-gam(cooc.z~te(phy.dist.log,gen.spec), dat=data.cooc)
summary(cooc.gam.log.te)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## cooc.z ~ te(phy.dist.log, gen.spec)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.1557     0.1150  -1.353    0.176
## 
## Approximate significance of smooth terms:
##                             edf Ref.df    F  p-value    
## te(phy.dist.log,gen.spec) 7.031  7.972 5.58 8.46e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0682   Deviance explained = 7.99%
## GCV = 7.5279  Scale est. = 7.4201    n = 561

cooc.gam.log.te.6<-gam(cooc.z~te(phy.dist.log,gen.spec, k=6), dat=data.cooc)
summary(cooc.gam.log.te.6)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## cooc.z ~ te(phy.dist.log, gen.spec, k = 6)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -0.1557     0.1150  -1.354    0.176
## 
## Approximate significance of smooth terms:
##                             edf Ref.df     F  p-value    
## te(phy.dist.log,gen.spec) 6.946  8.047 5.539 8.72e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0684   Deviance explained =    8%
## GCV = 7.5251  Scale est. = 7.4185    n = 561

gam.check(cooc.gam.te)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 13 iterations.
## The RMS GCV score gradiant at convergence was 6.37398e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank =  25 / 25 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                          k'   edf k-index p-value
## te(phy.dist,gen.spec) 24.00  5.07    1.06    0.94

gam.check(cooc.gam.sqrt.te)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradiant at convergence was 4.960206e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank =  25 / 25 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                k'    edf k-index p-value
## te(phy.dist.sqrt,gen.spec) 24.000  5.353   0.979     0.3

gam.check(cooc.gam.log.te)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradiant at convergence was 3.634287e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank =  25 / 25 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                               k'    edf k-index p-value
## te(phy.dist.log,gen.spec) 24.000  7.031   0.939    0.04

gam.check(cooc.gam.log.te.6)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradiant at convergence was 5.073532e-07 .
## The Hessian was positive definite.
## The estimated model rank was 36 (maximum possible: 36)
## Model rank =  36 / 36 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                               k'    edf k-index p-value
## te(phy.dist.log,gen.spec) 35.000  6.946   0.947    0.05

par(mfrow=c(1,3))
par(mar=c(5.1,5.1,5.6,4.1))
my.vis.gam(cooc.gam.te, type = "response", plot.type = "contour", color="gray",  ylab="Difference in generalism and specialism",main="Predicted co-occurrence based on \ninteracting predictor variables",
xlab="Phylogenetic distance", cex.lab=1.25, cex.axis=1, cex.main=1, lwd=2, cex=2)
my.vis.gam(cooc.gam.sqrt.te, type = "response",color="gray",plot.type = "contour",
 ylab="Difference in generalism and specialism",main="Predicted co-occurrence based on \ninteracting predictor variables",
xlab="Phylogenetic distance (square root)", cex.lab=1.25, cex.axis=1, cex.main=1, lwd=2, cex=2)
my.vis.gam(cooc.gam.log.te.6, type = "response",color="gray",plot.type = "contour",
 ylab="Difference in generalism and specialism",main="Predicted co-occurrence based on \ninteracting predictor variables",
xlab="Phylogenetic distance (log)", cex.lab=1.25, cex.axis=1, cex.main=1, lwd=2, cex=2)

#textClick("a)", cex=1.5)
#textClick("b)", cex=1.5)
#textClick("c)", cex=1.5)

ANOVA comparison with model with square root transformation of phylogenetic distance

An ANOVA comparison indicates that the model with a square root transformation of phylogenetic distance explains marginally more variance in the data. Therefore, I will use the square root transformation of phylogenetic distance for the remainder of the analyzes.

#Compare 2 response variable models ANOVAs
anova(cooc.gam.te, cooc.gam.sqrt.te, test = "Chisq")

## Analysis of Deviance Table
## 
## Model 1: cooc.z ~ te(phy.dist, gen.spec)
## Model 2: cooc.z ~ te(phy.dist.sqrt, gen.spec)
##   Resid. Df Resid. Dev      Df Deviance Pr(>Chi)  
## 1    554.93     4108.0                            
## 2    554.65     4100.2 0.28021   7.8023  0.07893 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Verify if the parameters are acceptable for the GAM with the square root transformation

Again, the diagnostic plots indicated that there are problems with the normality of the data at the tails and that variance is not homogeneous either (it still looks triangular towards the higher fitted values). The default ‘K’ value for the model is acceptable.

#Therefore, choose te smoothing with sqrt of phylogenetic distance
gam.check(cooc.gam.sqrt.te, k.rep = 1000)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 15 iterations.
## The RMS GCV score gradiant at convergence was 4.960206e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank =  25 / 25 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                k'    edf k-index p-value
## te(phy.dist.sqrt,gen.spec) 24.000  5.353   0.979    0.28

#Also these residual plots
par(mfrow=c(1,2))
plot(fitted(cooc.gam.sqrt.te),residuals(cooc.gam.sqrt.te))
plot(data.cooc$cooc.z,residuals(cooc.gam.sqrt.te))

GAM with binomial distribution

This analysis would be nice to do, but because I am dealing with standardized co-occurrence scores it is not suitable for my data in their current form.

#Calculate GAM with a binomial distribution and corresponding logit link function 


#The following cannot be done because I am working with co-occurrence .z scores and therefore have negative values and cannot fit to a binomial distribution
cooc.gam.sqrt.binomial.te<-gam(cooc.z~te(phy.dist.sqrt,gen.spec), dat=data.cooc, family=binomial)
summary(cooc.gam.sqrt.binomial.te)

Extract lambda estimates

The following are the lambda estimates for the GAM with the square root of phylogenetic distance as the x variable and the tensor smoothing parameter.

#extract the lambda estimates, and the RE/ML smoothness selection, the covariance matrix of the log lambda estimates
cooc.gam.sqrt.te$sp

## te(phy.dist.sqrt,gen.spec)1 te(phy.dist.sqrt,gen.spec)2 
##                4.499236e-01                2.382014e+08

sp.vcov(cooc.gam.sqrt.te)

## NULL

#Alternativel, extract information as (root) variance components
gam.vcomp(cooc.gam.sqrt.te)

## te(phy.dist.sqrt,gen.spec)1 te(phy.dist.sqrt,gen.spec)2 
##                2.188419e-01                9.703355e-06

Plot GAM with square root of phylogenetic distance and tensor product smoothing

This plot is not showing the plot labels in its current form, as the code is commented-out to do that.

library(RColorBrewer)

## Warning: package 'RColorBrewer' was built under R version 3.1.2

library(colorRamps)

## Warning: package 'colorRamps' was built under R version 3.1.1

#here, specify your color ramp, if you want something other than the usual R ones
#internet is helpful here
myColorRamp<-function(colors, values){
    v <- (values - min(values))/diff(range(values))
    x <- colorRamp (colors) (v)
    rgb (x[,1], x[,2], x[,3], maxColorValue =255)
}
#color<-colorRampPalette(c("black", "white"))
z<-data.cooc$cooc.z
x<-sqrt(data.cooc$phy.dist)
y<-data.cooc$gen.spec
colsbw <- myColorRamp(c("grey99", "black"), z)
colfunc <- colorRampPalette(c("grey99", "black"))

#dev.new(width=5.9, height=4)
layout(matrix(1:3,ncol=3), width = c(3,3,1),height = c(1,1))
par(mar=c(5.1,5.1,4.1,1))
my.vis.gam(cooc.gam.sqrt.te,type="response", plot.type = "contour", color="gray",  ylab="Difference in generalism and specialism", main="",
xlab="Phylogenetic distance(sqrt)", cex.lab=1.5, cex.axis=1.25, cex.main=0.1, lwd=2, cex=2)
par(mar=c(5.1,5.7,4.1,1.5))
plot(x=x,y=y, col = colsbw, pch=16, cex=1.25, bty="n", xlab="Phylogenetic distance(sqrt)", ylab="Difference in generalism and specialism",lwd=2, cex.axis=1.25, cex.lab=1.5 )
box(bty="l", lwd=2)
xl <- 0
yb <- 0
xr <- 1.5
yt <- 13

par(mar=c(5.1,2,4.1,2))
plot(NA,type="n",ann=FALSE,xlim=c(0,2),ylim=c(0,13),xaxt="n",yaxt="n",bty="n")
rect(
     xl,
     head(seq(yb,yt,(yt-yb)/20),-1),
     xr,
     tail(seq(yb,yt,(yt-yb)/20),-1),
     col=colfunc(20)
    )

#textClick("a)", cex=1.5)
#textClick("b)", cex=1.5)
#textClick("Co-occurrence",cex=1.25)
#textClick("-8", cex=1.5)
#textClick("13", cex=1.5)

GAMs with phylogenetic distances of less than 40 million years

Import and prepare data

#####################do for same dataset, but remove phy.dist of 40 my
data.cooc.40<-subset(data.cooc[data.cooc$phy.dist<40,])
phy.dist.40.sqrt<-sqrt(data.cooc.40$phy.dist)
gen.spec.40<-data.cooc.40$gen.spec
cooc.z.40<-data.cooc.40$cooc.z
head(data.cooc.40)

##      gen.spec  phy.dist    cooc.z
## 6  0.27427996  9.008272 -4.689472
## 15 0.06309177 14.509728  4.691737
## 18 0.35320123 30.482091 -1.974338
## 19 0.07892127 30.482091  4.552806
## 22 0.17428106 32.031689  8.839739
## 33 0.53791685 24.481046 -1.469640

GAM model for the species pairs separated by less than 40 million years

The following is the formulation for the GAM model for only those species pairs separated by less than 40 million years.

cooc.gam.40<-gam(cooc.z.40~te(phy.dist.40.sqrt,gen.spec.40), dat=data.cooc.40)

Residual check for GAM model for the species pairs separated by less than 40 million years

The residuals for the model for those species separated by only 40 million years are not normally distributed nor have homogenous variance.

gam.check(cooc.gam.40, k.rep = 1000)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 21 iterations.
## The RMS GCV score gradiant at convergence was 4.839791e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank =  25 / 25 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                     k'   edf k-index p-value
## te(phy.dist.40.sqrt,gen.spec.40) 24.00  3.26    1.05    0.74

#other diagnostic checks that should be done
par(mfrow=c(1,2))
plot(fitted(cooc.gam.40),residuals(cooc.gam.40))
plot(data.cooc.40$cooc.z,residuals(cooc.gam.40))

Summary of GAM model for the species pairs separated by less than 40 million years

The following GAM for those species separated by only 40 million years explains more of the variance in the data than the corresponding GAM for the entire dataset. Phylogenetic distance was also square root transformed for this analysis that uses tensor product smoothing.

#Now check summary of model to see how it fits data
summary(cooc.gam.40)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## cooc.z.40 ~ te(phy.dist.40.sqrt, gen.spec.40)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  -0.5422     0.2102   -2.58   0.0107 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                    edf Ref.df     F  p-value    
## te(phy.dist.40.sqrt,gen.spec.40) 3.261  3.484 6.815 0.000118 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.11   Deviance explained = 12.7%
## GCV = 7.8362  Scale est. = 7.6431    n = 173

Analyze less than 40 million years data with smooth term ‘s’

GAM predicting co-occurrence of only those species pairs separated by less than 40 million years.The ‘s’ smoothing term is used here and phylogenetic distance is square root transformed.

#Analyze less than 40 million years data with smooth term 's'
cooc.gam.40.s<-gam(cooc.z.40~s(phy.dist.40.sqrt,gen.spec.40), dat=data.cooc.40)

Check residuals for GAM model with species separated by less than 40 million years and smooth term ‘s’

Again, for this model with the smooth term ‘s’ there are problems with normality and homogeneity of the residuals.

gam.check(cooc.gam.40.s, k.rep = 1000)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 13 iterations.
## The RMS GCV score gradiant at convergence was 9.843071e-07 .
## The Hessian was positive definite.
## The estimated model rank was 30 (maximum possible: 30)
## Model rank =  30 / 30 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                    k'   edf k-index p-value
## s(phy.dist.40.sqrt,gen.spec.40) 29.00  2.00    1.16    0.99

#other diagnostic checks that should be done
par(mfrow=c(1,2))
plot(fitted(cooc.gam.40.s),residuals(cooc.gam.40.s))
plot(data.cooc.40$cooc.z,residuals(cooc.gam.40.s))

Summary of GAM model for the species pairs separated by less than 40 million years and smooth term ‘s’

The following shows the model summary of the GAM model for only those species pairs separated by less than 40 million years and the ‘s’ smooth term. Phylogenetic distance has been square root transformed.

#Now check summary of model to see how it fits data
summary(cooc.gam.40.s)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## cooc.z.40 ~ s(phy.dist.40.sqrt, gen.spec.40)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  -0.5422     0.2122  -2.555   0.0115 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                 edf Ref.df     F  p-value    
## s(phy.dist.40.sqrt,gen.spec.40)   2      2 9.818 9.06e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.093   Deviance explained = 10.4%
## GCV = 7.9305  Scale est. = 7.793     n = 173

Visualization of GAM model for the species pairs separated by less than 40 million years and smooth term ‘s’

I am showing this plot for interest sake. However, I should stay with the te model because that is the one I chose for the analysis of the full dataset and it explains slightly more of the variation in my data.

my.vis.gam(cooc.gam.40.s,theta=-30,ticktype="detailed", color="gray", plot.type="contour")

ANOVA comparison with model with two different smooth terms for species pairs separated by less than 40 million years

#Compare 2 response variable models ANOVAs
anova(cooc.gam.40, cooc.gam.40.s, test = "Chisq")

## Analysis of Deviance Table
## 
## Model 1: cooc.z.40 ~ te(phy.dist.40.sqrt, gen.spec.40)
## Model 2: cooc.z.40 ~ s(phy.dist.40.sqrt, gen.spec.40)
##   Resid. Df Resid. Dev      Df Deviance Pr(>Chi)  
## 1    168.74     1289.7                            
## 2    170.00     1324.8 -1.2609  -35.106  0.04632 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Extract lambda estimates for GAM model with te smoothing and squareroot of phylogenetic distance

The following are the lambda estimates for the GAM model that includes only those species pairs with less than 40 million years of separation. Phylogenetic distances have been square root transformed for these analyzes and the tensor product smoothing term has been used.

#extract the lambda estimates, and the RE/ML smoothness selection, the covariance matrix of the log lambda estimates
cooc.gam.40$sp

## te(phy.dist.40.sqrt,gen.spec.40)1 te(phy.dist.40.sqrt,gen.spec.40)2 
##                      4.869586e+10                      9.419442e+01

sp.vcov(cooc.gam.40)

## NULL

#Alternativel, extract information as (root) variance components
gam.vcomp(cooc.gam.40)

## te(phy.dist.40.sqrt,gen.spec.40)1 te(phy.dist.40.sqrt,gen.spec.40)2 
##                      2.671105e-06                      6.239320e-02

Plot GAM model with te smoothing and squareroot of phylogenetic distance for the species pairs with less than 40 million years of separation

color<-colorRampPalette(c("black", "white"))
z<-data.cooc.40$cooc.z
x<-sqrt(data.cooc.40$phy.dist)
y<-data.cooc.40$gen.spec
colsbw <- myColorRamp(c("grey99", "black"), z)
colfunc <- colorRampPalette(c("grey99", "black"))

#dev.new(width=5.9, height=4)
layout(matrix(1:3,ncol=3), width = c(3,3,1),height = c(1,1))
par(mar=c(5.1,5.1,4.1,1))
my.vis.gam(cooc.gam.40,type="response", plot.type = "contour", color="gray",  ylab="Difference in generalism and specialism", main="",
xlab="Phylogenetic distance(sqrt)", cex.lab=1.5, cex.axis=1.25, cex.main=0.1, lwd=2, cex=2)
par(mar=c(5.1,5.7,4.1,1.5))
plot(x=x, y=y, col = colsbw, pch=16, cex=1.25, bty="n", xlab="Phylogenetic distance(sqrt)", ylab="Difference in generalism and specialism",lwd=2, cex.axis=1.25, cex.lab=1.5 )
box(bty="l", lwd=2)
xl <- 0
yb <- 0
xr <- 1.5
yt <- 13

par(mar=c(5.1,2,4.1,2))
plot(NA,type="n",ann=FALSE,xlim=c(0,2),ylim=c(0,13),xaxt="n",yaxt="n",bty="n")
rect(
     xl,
     head(seq(yb,yt,(yt-yb)/20),-1),
     xr,
     tail(seq(yb,yt,(yt-yb)/20),-1),
     col=colfunc(20)
    )

#textClick("a)", cex=1.5)
#textClick("b)", cex=1.5)
#textClick("Co-occurrence",cex=1.25)
#textClick("-8", cex=1.5)
#textClick("13", cex=1.5)

GAMs with phylogenetic distances of less than 30 million years

Import and prepare data

#####################do for same dataset, but remove phy.dist of.30 my
data.cooc.30<-subset(data.cooc[data.cooc$phy.dist<30,])
phy.dist.30.sqrt<-sqrt(data.cooc.30$phy.dist)
gen.spec.30<-data.cooc.30$gen.spec
cooc.z.30<-data.cooc.30$cooc.z
head(data.cooc.30)

##      gen.spec  phy.dist     cooc.z
## 6  0.27427996  9.008272 -4.6894724
## 15 0.06309177 14.509728  4.6917372
## 33 0.53791685 24.481046 -1.4696399
## 34 0.60100862 24.481046 -2.4803428
## 50 0.16159843 24.481046 -0.9417983
## 51 0.22469020 24.481046  0.3893929

GAM model for the species pairs separated by less than 30 million years

The following is the formulation for the GAM model for only those species pairs separated by less than 30 million years.

cooc.gam.30<-gam(cooc.z.30~te(phy.dist.30.sqrt,gen.spec.30), dat=data.cooc.30)

Residual check for GAM model for the species pairs separated by less than 30 million years

The residuals for the model for those species separated by only 30 million years are not normally distributed nor have homogenous variance.

gam.check(cooc.gam.30, k.rep = 1000)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 16 iterations.
## The RMS GCV score gradiant at convergence was 3.426105e-07 .
## The Hessian was positive definite.
## The estimated model rank was 25 (maximum possible: 25)
## Model rank =  25 / 25 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                      k'    edf k-index p-value
## te(phy.dist.30.sqrt,gen.spec.30) 24.000  3.000   0.948    0.22

#other diagnostic checks that should be done
par(mfrow=c(1,2))
plot(fitted(cooc.gam.30),residuals(cooc.gam.30))
plot(data.cooc.30$cooc.z,residuals(cooc.gam.30))

Summary of GAM model for the species pairs separated by less than 30 million years

The following GAM for those species separated by only.30 million years explains more of the variance in the data than the corresponding GAM for the entire dataset. Phylogenetic distance was also square root transformed for this analysis that uses tensor product smoothing.

#Now check summary of model to see how it fits data
summary(cooc.gam.30)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## cooc.z.30 ~ te(phy.dist.30.sqrt, gen.spec.30)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -0.6775     0.2288  -2.962   0.0037 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                  edf Ref.df     F p-value  
## te(phy.dist.30.sqrt,gen.spec.30)   3      3 3.924  0.0103 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0676   Deviance explained = 9.07%
## GCV = 6.6017  Scale est. = 6.3853    n = 122

Analyze less than 30 million years data with smooth term ‘s’

GAM predicting co-occurrence of only those species pairs separated by less than.30 million years.The ‘s’ smoothing term is used here and phylogenetic distance is square root transformed.

#Analyze less than.30 million years data with smooth term 's'
cooc.gam.30.s<-gam(cooc.z.30~s(phy.dist.30.sqrt,gen.spec.30), dat=data.cooc.30)

Check residuals for GAM model with species separated by less than 30 million years and smooth term ‘s’

Again, for this model with the smooth term ‘s’ there are problems with normality and homogeneity of the residuals.

gam.check(cooc.gam.30.s, k.rep = 1000)

## 
## Method: GCV   Optimizer: magic
## Smoothing parameter selection converged after 14 iterations.
## The RMS GCV score gradiant at convergence was 9.156534e-07 .
## The Hessian was positive definite.
## The estimated model rank was 30 (maximum possible: 30)
## Model rank =  30 / 30 
## 
## Basis dimension (k) checking results. Low p-value (k-index<1) may
## indicate that k is too low, especially if edf is close to k'.
## 
##                                    k'   edf k-index p-value
## s(phy.dist.30.sqrt,gen.spec.30) 29.00  2.00    1.17    0.96

#other diagnostic checks that should be done
par(mfrow=c(1,2))
plot(fitted(cooc.gam.30.s),residuals(cooc.gam.30.s))
plot(data.cooc.30$cooc.z,residuals(cooc.gam.30.s))

Summary of GAM model for the species pairs separated by less than 30 million years and smooth term ‘s’

The following shows the model summary of the GAM model for only those species pairs separated by less than 30 million years and the ‘s’ smooth term. Phylogenetic distance has been square root transformed.

#Now check summary of model to see how it fits data
summary(cooc.gam.30.s)

## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## cooc.z.30 ~ s(phy.dist.30.sqrt, gen.spec.30)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  -0.6775     0.2286  -2.963  0.00368 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                                 edf Ref.df     F p-value   
## s(phy.dist.30.sqrt,gen.spec.30)   2      2 5.458 0.00537 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0686   Deviance explained =  8.4%
## GCV =  6.539  Scale est. = 6.3782    n = 122

Visualization of GAM model for the species pairs separated by less than 30 million years and smooth term ‘s’

my.vis.gam(cooc.gam.30.s,theta=-30,ticktype="detailed", color="gray", plot.type="contour")

ANOVA comparison with model with two different smooth terms for species pairs separated by less than 30 million years

#Compare 2 response variable models ANOVAs
anova(cooc.gam.30, cooc.gam.30.s, test = "Chisq")

## Analysis of Deviance Table
## 
## Model 1: cooc.z.30 ~ te(phy.dist.30.sqrt, gen.spec.30)
## Model 2: cooc.z.30 ~ s(phy.dist.30.sqrt, gen.spec.30)
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1       118     753.46                     
## 2       119     759.01 -1  -5.5475   0.3513

Extract lambda estimates for GAM model with te smoothing and squareroot of phylogenetic distance

The following are the lambda estimates for the GAM model that includes only those species pairs with less than 30 million years of separation. Phylogenetic distances have been square root transformed for these analyzes and the tensor product smoothing term has been used.

#extract the lambda estimates, and the RE/ML smoothness selection, the covariance matrix of the log lambda estimates
cooc.gam.30$sp

## te(phy.dist.30.sqrt,gen.spec.30)1 te(phy.dist.30.sqrt,gen.spec.30)2 
##                        7934677980                           4155014

sp.vcov(cooc.gam.30)

## NULL

#Alternativel, extract information as (root) variance components
gam.vcomp(cooc.gam.30)

## te(phy.dist.30.sqrt,gen.spec.30)1 te(phy.dist.30.sqrt,gen.spec.30)2 
##                      6.336767e-06                      2.897326e-04

Plot GAM model with te smoothing and squareroot of phylogenetic distance for the species pairs with less than 30 million years of separation

color<-colorRampPalette(c("black", "white"))
z<-data.cooc.30$cooc.z
x<-sqrt(data.cooc.30$phy.dist)
y<-data.cooc.30$gen.spec
colsbw <- myColorRamp(c("grey99", "black"), z)
colfunc <- colorRampPalette(c("grey99", "black"))

#dev.new(width=5.9, height=4)
layout(matrix(1:3,ncol=3), width = c(3,3,1),height = c(1,1))
par(mar=c(5.1,5.1,4.1,1))
my.vis.gam(cooc.gam.30,type="response", plot.type = "contour", color="gray",  ylab="Difference in generalism and specialism", main="",
xlab="Phylogenetic distance(sqrt)", cex.lab=1.5, cex.axis=1.25, cex.main=0.1, lwd=2, cex=2)
par(mar=c(5.1,5.7,4.1,1.5))
plot(x=x, y=y, col = colsbw, pch=16, cex=1.25, bty="n", xlab="Phylogenetic distance(sqrt)", ylab="Difference in generalism and specialism",lwd=2, cex.axis=1.25, cex.lab=1.5 )
box(bty="l", lwd=2)
xl <- 0
yb <- 0
xr <- 1.5
yt <- 13

par(mar=c(5.1,2,4.1,2))
plot(NA,type="n",ann=FALSE,xlim=c(0,2),ylim=c(0,13),xaxt="n",yaxt="n",bty="n")
rect(
     xl,
     head(seq(yb,yt,(yt-yb)/20),-1),
     xr,
     tail(seq(yb,yt,(yt-yb)/20),-1),
     col=colfunc(20)
    )

#textClick("a)", cex=1.5)
#textClick("b)", cex=1.5)
#textClick("Co-occurrence",cex=1.25)
#textClick("-5", cex=1.5)
#textClick("12", cex=1.5)

GAMs with square root transformation

Tammy Elliott

Monday, February 2, 2015

Import data

GAMs with interaction between two predictors

ANOVA comparison with model with square root transformation of phylogenetic distance

Verify if the parameters are acceptable for the GAM with the square root transformation

GAM with binomial distribution

Extract lambda estimates

Plot GAM with square root of phylogenetic distance and tensor product smoothing

GAMs with phylogenetic distances of less than 40 million years

GAM model for the species pairs separated by less than 40 million years

Residual check for GAM model for the species pairs separated by less than 40 million years

Summary of GAM model for the species pairs separated by less than 40 million years

Analyze less than 40 million years data with smooth term ‘s’

Check residuals for GAM model with species separated by less than 40 million years and smooth term ‘s’

Summary of GAM model for the species pairs separated by less than 40 million years and smooth term ‘s’

Visualization of GAM model for the species pairs separated by less than 40 million years and smooth term ‘s’

ANOVA comparison with model with two different smooth terms for species pairs separated by less than 40 million years

Extract lambda estimates for GAM model with te smoothing and squareroot of phylogenetic distance

Plot GAM model with te smoothing and squareroot of phylogenetic distance for the species pairs with less than 40 million years of separation

GAMs with phylogenetic distances of less than 30 million years

GAM model for the species pairs separated by less than 30 million years

Residual check for GAM model for the species pairs separated by less than 30 million years

Summary of GAM model for the species pairs separated by less than 30 million years

Analyze less than 30 million years data with smooth term ‘s’

Check residuals for GAM model with species separated by less than 30 million years and smooth term ‘s’

Summary of GAM model for the species pairs separated by less than 30 million years and smooth term ‘s’

Visualization of GAM model for the species pairs separated by less than 30 million years and smooth term ‘s’

ANOVA comparison with model with two different smooth terms for species pairs separated by less than 30 million years

Extract lambda estimates for GAM model with te smoothing and squareroot of phylogenetic distance

Plot GAM model with te smoothing and squareroot of phylogenetic distance for the species pairs with less than 30 million years of separation