Simulating trees

What values of lambda and mu did you get? How do these compare to your original simulation?

Lambda= 9.559118 and mu=3.611103. Both rates are smaller than the rates I originally set for the simulation.
setwd("C:/Users/Camila Medeiros/Desktop/R Lab/alfaro-lectures/lab1-trees")

library(geiger)
library(phytools)
library(diversitree)

source('rabosky_functions.R')

pars <- c(10, 5); #in order: lambda, mu
tt <- simulateTree(pars, max.taxa=100)
plot(tt,  show.tip.label = F)

tt.func <- make.bd(tt)
fitDiversitree(tt.func)
## $pars
##   lambda       mu 
## 9.144315 5.935629 
## 
## $loglik
## [1] 437.071
## 
## $AIC
## [1] -870.1419
## 
## $counts
## function gradient 
##       13       13 
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

Exercise 1

Simulate 100 trees with the same birth and death rate (use a non-zero death rate). Fit a birth-death model to each tree. Extract the estimated birth and death rates and plot a histogram from your sample and comment on how well the estimates compare to the true values. What does this imply about what we can learn from empirical studies of molecular phylogenies about diversification?

The estimated birth and death rate values from this model do not correspond exactly to the values set for the simulation. The difference is small and the model’s likelihood is high and AIC is low, showing that this is a good fit. Even though molecular phylogenies are generated using extant species only, b-d models are a good tool to estimate speciation and extinction rates.
par(mfrow=c(1,2))
REPS <- 100
lambdavector <- numeric(REPS)
muvector <- numeric(REPS)
pars <- c(1, .1) #Using the same lambda and mu rates (10 and 10, for example), the values in the histogram are concentrated around the point 0 and do not represent a normal distribution and are different from the rates you imput for the model. This happens because hen the net diversification rate is zero the clade tree is biased- doesn't change.


for (i in 1:REPS){
  tt <- simulateTree(pars, max.taxa=100)
  tt.func <- make.bd(tt)
  tmp<-fitDiversitree(tt.func)
  lambda<-tmp[[1]][[1]]
  mu<-tmp[[1]][[2]]
  lambdavector[i]<-lambda
  muvector[i]<-mu
}

tmp
hist(lambdavector, main="Birth rate")
hist(muvector, main="Death rate")

Exercise 2

How much stochasticity is associated with the outcome of a birth-death process? What does this suggest about our ability to intuitively identify clades that have undergone exceptional speciation?

The b-d is governed by stochasticity. This is the reason why this model is a powerful tool- it can be used as a null model to describe the generation of biodiversity. By fitting data from molecular phylogenies, for example, it is possible to identify clades that went through exceptional speciation - because their net diversification rates are going to deviate from the expected null model and have an almost log-like ltt curve instead of the exponential-like ltt curve from stochastic diversification models.
REPS <- 100
taxon_count <- numeric(REPS)
pars <- c(1,.5)

for (i in 1:REPS){
  trees <- simulateTree(pars, max.t=12)
  taxon_count [i]<- length(trees$tip.label)
}
taxon_count

hist(taxon_count)

mean(taxon_count)
sd(taxon_count)
quantile(taxon_count)
#Lineage accumulation through time under constant speciation rate
ltt(trees)

#Simulating a tree with extinct taxa retained
pars <- c(10, 5)
tt <- simulateTree(pars, max.taxa=100)
ttEx <- sim.bdtree(b = 10, d = 1, stop = "taxa", t = 100, n = 100 )

#Pruning the extinct taxa from the tree
livingOnly <- drop.fossil(ttEx)

#Plotting both figures together
par(mfrow = c(1,2))
plot(ttEx, show.tip.label = F, main = "extant and extinct")
plot(livingOnly, show.tip.label = F, main = "extant only")

Exercise 3

Simulate 5 trees with and without extinction that have similar net diversification rates. Can you say anything about the general shape of the trees that have been simulated with extinction?

The trees simulated with extinction had higher taxa diversification more recently- the pull of the recent. Because both with and without extinction trees have the same net diversification rate they present the same number of taxa, but the absence of extinction makes speciation slow at the beggining of clade evolution- when there are less species.The growth then scales with time, behaving like a pure-birth model.
par(mfrow=c(5,2), mar=c(1,1,1,1))

for (ii in 1:5){
  pars <- c(3, 1); #extant+extinct
  tt_Ex <- simulateTree(pars, max.taxa=100)
  plot(tt_Ex, show.tip.label = F, main="With Extinction")
  
  pars <- c(1.5, 0); #extant only
  tt_noEx <- simulateTree(pars, max.taxa=100)
  plot(tt_noEx, show.tip.label = F, main="Without Extinction")
}

Exercise 4

Does the diversity dependent tree or lineage through time plot look different than the pure birth tree?

Yes. The diversity dependent tree has a higher speciation earlier than the pure birth model. The lineage accumulation through time plot, consequently, has a different shape for each model. The DD model curve has a logaritimic-like growth shape, while the PB model has an exponential-like growth curve.
library(DDD)
## Loading required package: ade4
#Plotting pure birth
par(mfrow=c(1,2))

pars <- c(.12, 0); #in order: lambda, mu
tt <- simulateTree(pars, max.taxa=100)
ttEx <- sim.bdtree(b = .12, d = 0, stop = "taxa", t = 100, n = 100 )
plot(ttEx, show.tip.label = F, main = "pure birth")
ltt(ttEx)

#Plotting DD
par(mfrow=c(1,2))

ddTree <- dd_sim(c(.12,.02,100),100)
ddLivingOnly <- ddTree[[1]]
plot(ddLivingOnly, show.tip.label = F, main = "diversity dependent")
ltt(ddLivingOnly)

Diversification analysis

setwd("C:/Users/Camila Medeiros/Desktop/R Lab/alfaro-lectures/lab1-trees")

require(geiger)
require(laser)
## Loading required package: laser
snake_tree<-read.tree("homalops.phy")
par(mfrow=c(1,1))
plot(snake_tree)

par(mfrow=c(1,1))
plot(ladderize(snake_tree)) 

#Lineage through time plots
par(mfrow=c(1,1))
ltt.plot(snake_tree, log="y")

layout(matrix(1:2, 2, 1))
data(bird.families)
data(bird.orders)

#Plotting them with titles
par(mfrow=c(2,1))
ltt.plot(bird.families)
title("Lineages Through Time Plot of the Bird Families")
ltt.plot(bird.families, log = "y")
title(main = "Lineages Through Time Plot of the Bird Families",
      sub = "(with logarithmic transformation of the y-axis)")

par(mfrow=c(2,2))
layout(matrix(1:4, 2, 2))
plot(bird.families, show.tip.label = FALSE)
ltt.plot(bird.families, main = "Bird families")
plot(snake_tree, show.tip.label = FALSE)
ltt.plot(snake_tree, main = "Homalopsid species")

par(mfrow=c(1,1))
layout(matrix(1))
mltt.plot(bird.families, bird.orders)

#Doing the ltt plot with laser
snake.times<-getBtimes('homalops.phy')
snake_tree<-read.tree("homalops.phy") 
snake.times2<-getBtimes(string = write.tree(snake_tree))
snake.times2
##  [1] 21.783281 19.698911 19.291015 18.896251 18.190025 17.078195 16.419725
##  [8] 15.900081 14.847855 14.456585 13.782375 12.123975 11.514905 10.507591
## [15]  9.624845  9.128891  4.476151  3.462541  2.871535  1.960470
plotLtt(snake.times2)

##  [1] 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101 2.0794415
##  [8] 2.1972246 2.3025851 2.3978953 2.4849066 2.5649494 2.6390573 2.7080502
## [15] 2.7725887 2.8332133 2.8903718 2.9444390 2.9957323 3.0445224 3.0910425
par(mfrow=c(1,1))
layout(matrix(1:2, 2,1))
ltt.plot(snake_tree, log="y", main = "APE LTT")
plotLtt(snake.times)

##  [1] 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101 2.0794415
##  [8] 2.1972246 2.3025851 2.3978953 2.4849066 2.5649494 2.6390573 2.7080502
## [15] 2.7725887 2.8332133 2.8903718 2.9444390 2.9957323 3.0445224 3.0910425
#The gamma statistics
snake_gamma <- gammaStat(snake_tree)
snake_gamma
## [1] -3.241085
mccrTest(34, 13, 100, ObservedGamma=snake_gamma)->mccrResults
mccrResults
## $null.gamma
##   [1]  0.20572359 -1.75842918  0.34869075 -1.22199517 -1.79623339
##   [6] -1.09901705 -1.37071506  0.40227475 -0.60362879 -1.07220244
##  [11] -0.20806940  0.28714342 -0.11887165 -1.48100763 -2.62842847
##  [16] -1.61584883  0.37845854 -1.52616460 -0.79790494 -0.45946174
##  [21] -3.92105693  0.16707522 -1.58428260 -1.69080729  0.22282201
##  [26] -0.10401314 -1.65530400 -1.03533985 -0.40380992 -1.38313204
##  [31] -0.82598014 -0.01555071 -0.22766067 -1.33667672 -0.81264181
##  [36] -1.94716212 -0.12422766 -0.12383168 -2.01089679 -1.06316708
##  [41]  0.07397719 -0.04354865 -0.51807683  0.06034942  0.21894943
##  [46]  0.80090972 -1.13639253  0.52204941 -0.55910189  0.31756189
##  [51] -0.71818788 -2.22644115 -1.24694423 -0.03681140 -1.80018465
##  [56] -1.43304284 -1.37340486 -0.95822804 -0.88489999  0.20258597
##  [61] -0.54714678 -1.11207103  0.79674312  0.57996769 -1.19135388
##  [66] -2.50883851  0.79403632 -0.47016940 -2.82732287  1.52406782
##  [71] -1.37406054  0.20879030 -0.86571534 -0.85604438 -0.29984404
##  [76]  0.58174877 -0.63708553 -1.60527774 -0.17104369 -2.29559495
##  [81]  0.51543151 -0.45253644 -0.68912485 -2.69723001 -2.14929744
##  [86] -0.54099363  0.06100528  0.43724853 -0.82155376 -1.52220705
##  [91] -0.66700253 -0.02004704 -0.67720707 -1.41909424  0.18478017
##  [96] -0.72487160  0.63463822 -0.96631685 -1.61628511 -0.51026282
## 
## $critical.value
## [1] -2.306257
## 
## $p.value
## [1] 0.01980198
age <- 22
richness <- 34
snakebirth =  log(richness)/age
snakebirth
## [1] 0.1602891
#This simulates gamma values when trees are undersampled.
#we will grow trees with n=34 and prune them down to 13 taxa

num_simulations<-200 #number of simulations
g1_null<-numeric(num_simulations) #g1_null will hold the simulated gamma values
for(i in 1:num_simulations) {
  sim.bdtree(snakebirth, d=0, stop = "taxa", n=34)->sim_tree 
  drop.random(sim_tree, 13)->prune # prune down to the # of taxa in the phylogeny
  gammaStat(prune)->g1_null[i]
}

#Create a histogram of the null distribution
par=(mfrow=c(1,1))
hist(g1_null)

#Arrow indicates where the observed gamma falls in the null you just generated
arrows(snake_gamma, 40, snake_gamma, 0, col="red", lwd=2)

#Which of the null values are smaller (more negative) than the data?
smallerNull<-g1_null<=snake_gamma

#How many TRUEs are there?
count<-sum(smallerNull)

#What is the p-value?
mccr_pval<-(count+1)/(num_simulations+1)
mccr_pval
## [1] 0.01492537
####Fitting Diversification Models to Branching Times
snake.times<-getBtimes('homalops.phy')
pb<-pureBirth(snake.times) #pb
aic.pb<-pb$aic#store aic score
bd<-bd(snake.times) # birth-death
aic.bd<-bd$aic #store aic
ddl<-DDL(snake.times) # logistic density dependent diversification
aic.ddl<-ddl$aic
ddx<-DDX(snake.times) # exponential density dependent diversification
aic.ddx<-ddx$aic
#the MCCR test above cannot discriminate between early rapid speciation and early slow extinction.
#these models allow those to scenarios to be compared 
spvar<-fitSPVAR(snake.times, init=c(2, .2, .1))#variable speciation
aic.spvar<-spvar$aic
exvar<-fitEXVAR(snake.times, init=c(.3, .01, 1))#variable extinction
aic.exvar<-exvar$aic
both<-fitBOTHVAR(snake.times, init=c(.3, .5, .1, .5))#variable extinction and speciation
aic.both<-both$aic

#Make a table and compare them all
aic.all<-cbind(aic.pb, aic.bd, aic.ddl, aic.ddx, aic.spvar, aic.exvar, aic.both)
foo<-function(x) x-x[which(x==min(x))] #quick function to compare all models to the model with lowest aic score
daic<-t(apply(aic.all, 1, foo))#apply that function to everything in aic.all
cat("Table of delta-aic values; zero - best model\n")
## Table of delta-aic values; zero - best model
print(daic, digits=2)
##    aic.pb aic.bd aic.ddl aic.ddx aic.spvar aic.exvar aic.both
## LH     12     14       0     1.8       4.5        17      6.6

Exercise 5

What would you conclude from this table? How is this different from what you can say after the MCCR test?

We can conclude that the density dependent- logistic model is the best fit for the Homalopsids clade. The MCCR test allows us to check if our data follows a pure-birth or a diversity dependent model, but doesn’t differentiate diversity dependent models from one another.

Exercise 6

Calculate the gamma statistic for the balistoid tree and comment on whether this clade shows evidence for rates that have slowed through time. Fit the laser models to this tree and explain whether balistoid diversification is consistent with density dependent diversification. Fit laser models to balistoid tree.

setwd("C:/Users/Camila Medeiros/Desktop/R Lab/alfaro-lectures/lab1-trees")
fish_tree<-read.tree("balistoids.phy")
plot(fish_tree)

par(mfrow=c(1,1))
plot(ladderize(fish_tree)) 

#Doing the ltt plot with laser
par(mfrow=c(1,1))
fish.times<-getBtimes('balistoids.phy')
fish_tree<-read.tree("balistoids.phy") 
fish.times2<-getBtimes(string = write.tree(fish_tree))
fish.times2
##  [1] 48.1564091 36.1125562 34.7600483 30.6556250 27.7519900 27.5629351
##  [7] 23.2525386 22.3796628 21.9469046 20.6355391 19.4944695 18.7695189
## [13] 17.6689957 17.0271071 16.8184142 16.1334671 16.0785478 14.4120575
## [19] 14.3361008 14.1793372 13.7315658 13.6374863 13.4082427 12.8097083
## [25] 12.7087501 12.4649466 11.6680490 11.4665666 11.3947635 11.1946575
## [31] 10.9490729 10.8862547 10.0578026  9.9924564  9.8317479  9.7136110
## [37]  9.6447375  9.3941632  8.4943235  8.4809942  8.4157905  8.1952149
## [43]  7.2155031  6.1797769  5.8988949  5.8178101  5.7156689  5.5843149
## [49]  5.5491179  5.2408393  4.7923994  4.6671079  4.5333976  4.5022536
## [55]  4.4391749  3.9122475  3.8285386  3.7353650  3.5517725  3.4044888
## [61]  3.2574712  3.1954775  3.1508580  2.9184643  2.7409477  2.4963736
## [67]  2.4545971  2.2925717  1.8872528  1.8426827  1.6963747  1.6233703
## [73]  1.6229827  1.3841502  1.3018082  0.9982860  0.7513015  0.7392239
## [79]  0.7240316  0.5576282  0.5246271  0.4577925  0.3621222  0.3430004
## [85]  0.3237452
plotLtt(fish.times2)

##  [1] 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101 2.0794415
##  [8] 2.1972246 2.3025851 2.3978953 2.4849066 2.5649494 2.6390573 2.7080502
## [15] 2.7725887 2.8332133 2.8903718 2.9444390 2.9957323 3.0445224 3.0910425
## [22] 3.1354942 3.1780538 3.2188758 3.2580965 3.2958369 3.3322045 3.3672958
## [29] 3.4011974 3.4339872 3.4657359 3.4965076 3.5263605 3.5553481 3.5835189
## [36] 3.6109179 3.6375862 3.6635616 3.6888795 3.7135721 3.7376696 3.7612001
## [43] 3.7841896 3.8066625 3.8286414 3.8501476 3.8712010 3.8918203 3.9120230
## [50] 3.9318256 3.9512437 3.9702919 3.9889840 4.0073332 4.0253517 4.0430513
## [57] 4.0604430 4.0775374 4.0943446 4.1108739 4.1271344 4.1431347 4.1588831
## [64] 4.1743873 4.1896547 4.2046926 4.2195077 4.2341065 4.2484952 4.2626799
## [71] 4.2766661 4.2904594 4.3040651 4.3174881 4.3307333 4.3438054 4.3567088
## [78] 4.3694479 4.3820266 4.3944492 4.4067192 4.4188406 4.4308168 4.4426513
## [85] 4.4543473 4.4659081
layout(matrix(1:2, 2,1))
ltt.plot(fish_tree, log="y", main = "APE LTT")
par=(mfrow=c(1,1))
plotLtt(fish.times)

##  [1] 0.6931472 1.0986123 1.3862944 1.6094379 1.7917595 1.9459101 2.0794415
##  [8] 2.1972246 2.3025851 2.3978953 2.4849066 2.5649494 2.6390573 2.7080502
## [15] 2.7725887 2.8332133 2.8903718 2.9444390 2.9957323 3.0445224 3.0910425
## [22] 3.1354942 3.1780538 3.2188758 3.2580965 3.2958369 3.3322045 3.3672958
## [29] 3.4011974 3.4339872 3.4657359 3.4965076 3.5263605 3.5553481 3.5835189
## [36] 3.6109179 3.6375862 3.6635616 3.6888795 3.7135721 3.7376696 3.7612001
## [43] 3.7841896 3.8066625 3.8286414 3.8501476 3.8712010 3.8918203 3.9120230
## [50] 3.9318256 3.9512437 3.9702919 3.9889840 4.0073332 4.0253517 4.0430513
## [57] 4.0604430 4.0775374 4.0943446 4.1108739 4.1271344 4.1431347 4.1588831
## [64] 4.1743873 4.1896547 4.2046926 4.2195077 4.2341065 4.2484952 4.2626799
## [71] 4.2766661 4.2904594 4.3040651 4.3174881 4.3307333 4.3438054 4.3567088
## [78] 4.3694479 4.3820266 4.3944492 4.4067192 4.4188406 4.4308168 4.4426513
## [85] 4.4543473 4.4659081
#The gamma statistics
fish_gamma <- gammaStat(fish_tree)
fish_gamma
## [1] 0.3488196
#How many TRUEs are there?
count<-sum(smallerNull)

#Fitting Diversification Models to Branching Times
fish.times<-getBtimes("balistoids.phy")
pb<-pureBirth(fish.times) #pb
aic.pb<-pb$aic#store aic score
bd<-bd(fish.times) # birth-death
aic.bd<-bd$aic #store aic
ddl<-DDL(fish.times) # logistic density dependent diversification
aic.ddl<-ddl$aic
ddx<-DDX(fish.times) # exponential density dependent diversification
aic.ddx<-ddx$aic
#the MCCR test above cannot discriminate between early rapid speciation and early slow extinction.
#these models allow those to scenarios to be compared 
spvar<-fitSPVAR(fish.times, init=c(2, .2, .1))#variable speciation
aic.spvar<-spvar$aic
exvar<-fitEXVAR(fish.times, init=c(.3, .01, 1))#variable extinction
aic.exvar<-exvar$aic
both<-fitBOTHVAR(fish.times, init=c(.3, .5, .1, .5))#variable extinction and speciation
aic.both<-both$aic

#Making a table and comparing them all
aic.all<-cbind(aic.pb, aic.bd, aic.ddl, aic.ddx, aic.spvar, aic.exvar, aic.both)
foo<-function(x) x-x[which(x==min(x))] #quick function to compare all models to the model with lowest aic score
daic<-t(apply(aic.all, 1, foo))#apply that function to everything in aic.all
cat("Table of delta-aic values; zero - best model\n")
## Table of delta-aic values; zero - best model
print(daic, digits=2)
##    aic.pb aic.bd aic.ddl aic.ddx aic.spvar aic.exvar aic.both
## LH      0    1.8       2     1.8       3.9       3.8      5.9
The gamma statistics for the Balistoids clade is a positive close to 0 value. This indicates that the clade did not go through adaptive radiation- or diversity dependent evolution. The laser models fit shos that the best model to explain the clade’s evolution is the pure birth model.