This code is practice for running a PGLS for the Basics of Phylogenetics Independent Study. The code was obtained from http://www.phytools.org/Cordoba2017/ex/4/PGLS.html. This script was created on 3/10/21, and updated for class on 3/8/22.
Before you begin: Create a new project where you’ll store this practice code and the associated data.
#install.packages("ape")
#install.packages("gieger")
#install.packages("nlme")
#intsall.packages("caper")
library("ape")
library("geiger")
library("nlme")
library("phytools")
## Loading required package: maps
library("caper")
## Loading required package: MASS
## Loading required package: mvtnorm
These data are from a study that explored factors that influence song trait evolution in 33 different species of barbets.
dat<-read.csv("Barbet.dat.csv",header=TRUE,row.names=1)
dat
## wing Lnalt patch colour
## Calorhamphus_fuliginosus_fuliginosus 4.388257 5.298317 2.000000 1.666667
## Calorhamphus_fuliginosus_hayi 4.427239 5.298317 2.000000 1.666667
## M_armillaris 4.532599 7.170120 6.333333 4.000000
## M_asiatica_asiatica 4.611152 6.802395 7.333333 5.000000
## M_asiatica_davisoni 4.605170 7.003065 6.666667 3.333333
## M_australis_duvaucelli 4.282206 6.620073 9.000000 4.000000
## M_chrysopogon 4.829113 6.620073 9.000000 5.333333
## M_corvina 4.824306 7.313220 2.000000 2.000000
## M_eximia 4.359270 6.620073 7.666667 5.000000
## M_faiostricta 4.693181 5.298317 5.000000 5.000000
## M_flavifrons 4.487512 6.907755 5.000000 3.666667
## M_franklinii_auricularis 4.550714 7.313220 8.000000 5.666667
## M_franklinii_franklinii 4.604170 7.313220 7.666667 5.000000
## M_franklinii_ramsayi 4.570579 7.313220 7.333333 5.000000
## M_haemacephala_indica 4.387014 6.109248 6.666667 4.000000
## M_henricii 4.557030 6.163315 7.000000 4.666667
## M_incognita 4.548600 6.802395 9.000000 3.333333
## M_javensis 4.695925 6.214608 8.666667 4.333333
## M_lagrandieri 4.898586 7.003065 7.000000 4.333333
## M_lineata_hodgsoni 4.857484 5.991465 3.666667 2.666667
## M_monticola 4.616110 6.802395 7.000000 5.000000
## M_mystacophanos 4.553877 6.745236 10.333333 5.000000
## M_oorti_annamensis 4.558079 7.313220 9.333333 5.000000
## M_oorti_nuchalis 4.588024 6.620073 9.666667 5.666667
## M_oorti_oorti 4.514151 7.170120 9.000000 5.000000
## M_pulcherrima 4.521789 7.467371 5.666667 3.666667
## M_rafflesi 4.741448 4.605170 8.333333 5.000000
## M_rubricapillus_malabarica 4.387014 6.109248 6.333333 4.000000
## M_rubricapillus_rubricapillus 4.346399 6.109248 9.000000 5.000000
## M_virens 4.953712 7.495542 2.000000 2.000000
## M_viridis 4.587006 6.214608 5.666667 3.000000
## M_zeylanica 4.682131 5.991465 2.000000 2.333333
## Psilopogon_pyrolophus 4.773224 7.130899 10.000000 6.333333
## Frequency Length Lnote
## Calorhamphus_fuliginosus_fuliginosus 20.468894 3.1207387 0.260
## Calorhamphus_fuliginosus_hayi 22.483670 3.1371727 0.230
## M_armillaris -7.135924 -11.9001412 0.030
## M_asiatica_asiatica -10.153448 0.3671016 0.025
## M_asiatica_davisoni -9.700025 0.4492719 0.030
## M_australis_duvaucelli 1.206501 -2.5198843 0.030
## M_chrysopogon -17.961279 0.1493287 0.100
## M_corvina -16.777077 0.8625655 0.115
## M_eximia -2.308277 3.2467332 0.030
## M_faiostricta -13.550036 0.3944917 0.045
## M_flavifrons -5.997081 1.7302251 0.140
## M_franklinii_auricularis -10.305768 1.7466592 0.140
## M_franklinii_franklinii -8.885098 1.7411811 0.135
## M_franklinii_ramsayi -8.648623 1.7685712 0.115
## M_haemacephala_indica -16.782037 3.2248211 0.070
## M_henricii -9.201584 -6.4335091 0.125
## M_incognita -8.166673 0.3616236 0.040
## M_javensis -17.204273 -5.6678766 0.050
## M_lagrandieri -5.910024 3.0440464 0.400
## M_lineata_hodgsoni -12.746571 0.3287555 0.070
## M_monticola -16.093079 -7.7148614 0.060
## M_mystacophanos -15.948289 -9.0316506 0.060
## M_oorti_annamensis -9.654623 -0.1503694 0.030
## M_oorti_nuchalis -11.342388 -6.2060987 0.045
## M_oorti_oorti -11.091419 0.7530051 0.140
## M_pulcherrima -7.183282 -5.5981665 0.300
## M_rafflesi -22.419596 -16.9434720 0.050
## M_rubricapillus_malabarica -16.255444 3.2357771 0.050
## M_rubricapillus_rubricapillus -17.699099 -0.6955755 0.030
## M_virens -6.799948 2.9892662 0.500
## M_viridis -8.780492 -0.0736771 0.040
## M_zeylanica -13.327782 0.4273598 0.060
## Psilopogon_pyrolophus 5.990051 -10.1022207 0.300
Before we can run a PGLS, we first have to obtain data on the evolutionary relationships between our species of interest. The code below reads in a phylogenetic tree with 42 different species of barbets.
tree<-read.nexus("http://www.phytools.org/Cordoba2017/data/BarbetTree.nex")
tree
##
## Phylogenetic tree with 42 tips and 41 internal nodes.
##
## Tip labels:
## M_asiatica_asiatica, M_lineata_hodgsoni, M_australis_australis, M_haemacephala_intermedia, M_eximia, M_viridis, ...
##
## Rooted; includes branch lengths.
The command plot() allows us to visualize the barbet tree
plot(tree, no.margin = TRUE, font = 3, cex = .75)
You’ll notice that our phylogenetic tree has 42 species, but our data only includes 33. Before using the tree, we’ll have to make sure that all species in the tree are also in the data.
Remember that if there are any spelling differences, even small, then the tree & dataset will not coincide.
To establish that we have the same species in our data as in the tree, we can use a function in the geiger package called name.check.
obj<-name.check(tree, dat)
obj
## $tree_not_data
## [1] "M_asiatica_chersonesus" "M_australis_australis"
## [3] "M_haemacephala_celestinoi" "M_haemacephala_delica"
## [5] "M_haemacephala_intermedia" "M_haemacephala_rosea"
## [7] "M_lineata_lineata" "M_oorti_faber"
## [9] "M_oorti_sini"
##
## $data_not_tree
## character(0)
We should see that there are 9 species in the tree for which we do not have data, but all of the species that we have in our dataset are included in the tree. We have to prune the extra species from the tree to proceed with the analysis. We can do this using the function drop.tip. Then, run name.check() again to see if it worked correctly.
tree.short <-drop.tip(tree, obj$tree_not_data)
name.check(tree.short,dat)
## [1] "OK"
Now that we are sure that the tree and dataset match, we can start to run phylogenetic least squares regressions.
We will start by defining our variance-covariance matrix, which corrects for correlation in the residuals between closely related species. We will use the Brownian mode of evolution (the “random walk” mode), which predicts that species that are more closely related will have more similar trait values.
bm<-corBrownian(1, tree.short)
bm
## Uninitialized correlation structure of class corBrownian
The first model we will fit is for a simple analysis investigating the relationship between altitude at which a species is found and the length of the note in its song. This model will use the Brownian mode of evolution (we won’t yet correct for lambda).
model.1<-gls(Lnote~Lnalt, data=dat, correlation=bm)
## Warning in Initialize.corPhyl(X[[i]], ...): No covariate specified, species
## will be taken as ordered in the data frame. To avoid this message, specify a
## covariate containing the species names with the 'form' argument.
Before looking at the output, we’ll visualize the model. Does the relationship between altitude and note length look very strong?
plot(Lnote~Lnalt, data=dat)
abline(a = coef(model.1)[1], b = coef(model.1)[2])
summary(model.1)
## Generalized least squares fit by REML
## Model: Lnote ~ Lnalt
## Data: dat
## AIC BIC logLik
## -14.84531 -10.54335 10.42265
##
## Correlation Structure: corBrownian
## Formula: ~1
## Parameter estimate(s):
## numeric(0)
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 0.16821373 0.21394057 0.7862638 0.4377
## Lnalt -0.00761511 0.02034964 -0.3742136 0.7108
##
## Correlation:
## (Intr)
## Lnalt -0.643
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -0.30893712 -0.27804583 -0.19007275 0.08217655 1.31021476
##
## Residual standard error: 0.2967954
## Degrees of freedom: 33 total; 31 residual
R outputs several rows of summary information about the model we just ran. Take a minute to look at the table where the Values and p-values are listed. The Value of the (Intercept) indicates where the regression line intercepts the y axis, and the Value of Lnalt indicates the slope of the line. If you look at the p-values, you’ll see that according to this model, altitude does not have a significant influence upon note length (p-values are above 0.05).
Next, we will run this analysis while relaxing the assumption of Brownian motion. We will add in a term for λ, which allows us to scale the variance-covariance matrix for observed trait similarity between species (closely related species may not actually have the most similar trait values). This model is called the λ model of Pagel (Pagel’s λ). When λ = 0 the covariance between species is zero and this corresponds to a non-phylogenetic regression. When λ = 1, the evolution of the residual error is Brownian and will look like model.1, which we ran without including λ. R will calculate λ for us using a maximum likelihood estimate
We will run the same analysis with λ.
model.2<-gls(Lnote~Lnalt, data=dat, correlation=corPagel(1,tree.short))
## Warning in Initialize.corPhyl(X[[i]], ...): No covariate specified, species
## will be taken as ordered in the data frame. To avoid this message, specify a
## covariate containing the species names with the 'form' argument.
Visualize model 2. Note the change in the slope of the line.
plot(Lnote~Lnalt, data=dat)
abline(a = coef(model.2)[1], b = coef(model.2)[2])
Summarize model 2
summary(model.2)
What value of λ did you obtain? What does this value of λ indicate about covariance between species?
Let’s explore the relationship between the length of notes and wing length.
model.3<-gls(Lnote~wing, data=dat, correlation=corPagel(1, tree.short))
## Warning in Initialize.corPhyl(X[[i]], ...): No covariate specified, species
## will be taken as ordered in the data frame. To avoid this message, specify a
## covariate containing the species names with the 'form' argument.
plot(Lnote~wing, data=dat)
abline(a = coef(model.3)[1], b = coef(model.3)[2])
summary(model.3)
## Generalized least squares fit by REML
## Model: Lnote ~ wing
## Data: dat
## AIC BIC logLik
## -38.47619 -32.74024 23.23809
##
## Correlation Structure: corPagel
## Formula: ~1
## Parameter estimate(s):
## lambda
## -0.02886085
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) -1.1367106 0.5223785 -2.176029 0.0373
## wing 0.2735472 0.1139858 2.399835 0.0226
##
## Correlation:
## (Intr)
## wing -1
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -1.1349649 -0.7452844 -0.2395217 0.2964407 2.6191951
##
## Residual standard error: 0.1075279
## Degrees of freedom: 33 total; 31 residual
What is a value of λ? Is the relationship between note length and wing length significant?