We create a new function to grep species from an existing tree, based
on the given genus names, called get_species_from_genus
get_species_from_genus <- function(phylo, genus) {
# Get the tips (species) from the tree
tips <- phylo$tip.label
# Find the nodes that correspond to the given genus
nodes <- which(sapply(strsplit(tips, "_"), `[`, 1) == genus)
# Get the tips that are descendants of the nodes corresponding to the given genus
species <- character(0)
for (node in nodes) {
descendants <- phylo$tip.label[getDescendants(phylo, node)]
species <- union(species, descendants)
}
return(species)
}
Then we can use this function to get all “Cichla” species from the backbone tree, randomly select one and add new “Cichla belem” as a sister species to it.
genus_species<-get_species_from_genus(phylo,"Cichla")
#find random sister species
sister_index<-sample(genus_species,1)
# get info from the edge `file`
parent_node<-phylo$edge[which(phylo$edge[,2]==which(phylo$tip.label==sister_index)),1]
# get info about the branch length
branch_length<-phylo$edge.length[which(phylo$edge[,2]==which(phylo$tip.label==sister_index))]
#now, add "Cichla belem"
cichla_phy_plus <- bind.tip(cichla_phy, "Cichla_belem", edge.length = branch_length, where = which(phylo$tip.label==sister_index), position = branch_length/2)
A step by step plan of how to achieve the properly-working loop:
We want to take the genus names from the object with species
names 'new_sp'
For each genus name use the
’get_species_from_genus()'
After, we need to randomly choose one ‘sister species’ with
'sample()' from the list from step 2
Get the branch length for each sister one directly from the backbone tree
Finally, add species to the tree with 'bind.tip()'
function
2.1. Create an object that will store the species of interest & Make an object with the genus names of those species for further use to search the tree
new_sp <- c("Cichla_belem", "Bathygobius_toronto", "Arapaima_regina", "Mugil_kyiv", "Colomesus_lagos", "Pimelodus_paris", "Ancistrus_quito", "Gymnotus_beijing")
new_sp_genus <- sub("_.*", "", new_sp)
2.2. Open the backbone tree and duplicate it to the tree where the changes are going to be introduced
library(phytools)
backbone.tree <- read.tree(file = "updated_backbone3.tre")
fish.tree.plus <- backbone.tree
2.3. Create the loop
# Initialize empty lists to store the results
out1_list <- list()
out2_list <- list()
out3_list <- list()
# LOOP the functions for the set of species that one wants to add
# Sequence the content of new_sp_genus with 'seq_along() for the use in iterations
for (i in seq_along(new_sp_genus)) {
genus <- new_sp_genus[i]
add_sp <- new_sp[i]
# Get the excisting species from the backbone tree by the genuses our interest, and output them in 'out1'
out1 <- get_species_from_genus(fish.tree.plus, genus)
# Directly assigns the value out1 to the key genus in out1_list.
out1_list[[genus]] <- out1
# Randomly select one species of each genus from 'out1' and store in 'out2'
out2 <- sample(out1, 1)
out2_list[[genus]] <- out2
# Get the edge length for the selected species and put it in 'out3'
out3 <- fish.tree.plus$edge.length[which(fish.tree.plus$edge[,2] == which(fish.tree.plus$tip.label == out2))]
out3_list[[genus]] <- out3
# Add new species to the tree
fish.tree.plus <- bind.tip(fish.tree.plus, add_sp, edge.length = out3 / 2,
where = which(fish.tree.plus$tip.label == out2),
position = out3 / 2)
}
Explanation on how the loop works:
Primarily, each genus name in the new_sp_genus are being
put through the loop code, gradually acquiring out1, out2, out3 and
finally adding the species to the tree. When on the second iteration,
the next genus is being taken into the loop, it creates new out1, out2
and so on —> the previous out-files are being overwritten. Hence, to
keep track of all the outputs, we create the out_list(s) as
spaces to store all the outputs that we gain within the loop. This way
the out1_list , for instance, now includes all the outputs from out1 for
each genus, and same with others.
Finally, important note is that the name of the final tree should be
THE SAME as the name of tree used in the bind.tip().
Otherwise, the tree will overwrite itself, leaving the previously added
species behind.
2.4 Check the result
2.4.1 Are new added species present in the tree?
Run sapply() with grep to get the list of
species from the tree with their respective positions in the
tip.label list. As we can see, the new species are indeed
now present in the tree, and we can din them by their position.
sapply(new_sp,grep, fish.tree.plus$tip.label)
## Cichla_belem Bathygobius_toronto Arapaima_regina Mugil_kyiv
## 1613 3111 11300 2250
## Colomesus_lagos Pimelodus_paris Ancistrus_quito Gymnotus_beijing
## 6190 9453 10252 10977
2.4.2 How does it look like in the tree?
Now, we can build the tree to actually see where and how the new species were integrated into the tree. For this matter we create 2 subtrees with only the genus of interest: Bathygobius & Pimelodus.
tips1 <- c(out1_list[["Bathygobius"]], "Bathygobius_toronto")
tips2 <- c(out1_list[["Pimelodus"]], "Pimelodus_paris")
bathygobius_tree <- keep.tip(fish.tree.plus, tips1)
pimelodus_tree <- keep.tip(fish.tree.plus, tips2)
par(mfrow = c(1,2))
par(mar = c(5,1,4,0))
mycol1 <- def(bathygobius_tree$tip.label, Bathygobius_toronto = "coral", default = "black")
mycol2 <- def(pimelodus_tree$tip.label, Pimelodus_paris = "coral", default = "black")
plot(bathygobius_tree, tip.color = mycol1)
title("Bathygobius toronto")
plot(pimelodus_tree, tip.color = mycol2)
title("Pimelodus paris")