1. Adding one species to the backbone tree, based on the same genus

We create a new function to grep species from an existing tree, based on the given genus names, called get_species_from_genus

get_species_from_genus <- function(phylo, genus) {
  # Get the tips (species) from the tree
tips <- phylo$tip.label 
  # Find the nodes that correspond to the given genus
nodes <- which(sapply(strsplit(tips, "_"), `[`, 1) == genus)
  # Get the tips that are descendants of the nodes corresponding to the given genus
species <- character(0)
  for (node in nodes) {
    descendants <- phylo$tip.label[getDescendants(phylo, node)]
    species <- union(species, descendants)
  }
    return(species)
}

Then we can use this function to get all “Cichla” species from the backbone tree, randomly select one and add new “Cichla belem” as a sister species to it.

genus_species<-get_species_from_genus(phylo,"Cichla")
#find random sister species
sister_index<-sample(genus_species,1)
 # get info from the edge `file`
parent_node<-phylo$edge[which(phylo$edge[,2]==which(phylo$tip.label==sister_index)),1] 
# get info about the branch length   
branch_length<-phylo$edge.length[which(phylo$edge[,2]==which(phylo$tip.label==sister_index))]  

#now, add "Cichla belem"    
cichla_phy_plus <- bind.tip(cichla_phy, "Cichla_belem", edge.length = branch_length, where = which(phylo$tip.label==sister_index), position = branch_length/2)

2. Challenge: add many species to the tree at once

A step by step plan of how to achieve the properly-working loop:

  1. We want to take the genus names from the object with species names 'new_sp'

  2. For each genus name use the ’get_species_from_genus()'

  3. After, we need to randomly choose one ‘sister species’ with 'sample()' from the list from step 2

  4. Get the branch length for each sister one directly from the backbone tree

  5. Finally, add species to the tree with 'bind.tip()' function

2.1. Create an object that will store the species of interest & Make an object with the genus names of those species for further use to search the tree

new_sp <- c("Cichla_belem", "Bathygobius_toronto", "Arapaima_regina", "Mugil_kyiv", "Colomesus_lagos", "Pimelodus_paris", "Ancistrus_quito",  "Gymnotus_beijing")
new_sp_genus <- sub("_.*", "", new_sp)

2.2. Open the backbone tree and duplicate it to the tree where the changes are going to be introduced

library(phytools)
backbone.tree <- read.tree(file = "updated_backbone3.tre")
fish.tree.plus <- backbone.tree

2.3. Create the loop

# Initialize empty lists to store the results
out1_list <- list()
out2_list <- list()
out3_list <- list()

# LOOP the functions for the set of species that one wants to add
  
  # Sequence the content of new_sp_genus with 'seq_along() for the    use in iterations
   for (i in seq_along(new_sp_genus)) {
     genus <- new_sp_genus[i] 
     add_sp <- new_sp[i]
  
  # Get the excisting species from the backbone tree by the genuses our interest, and output them in 'out1'
    out1 <- get_species_from_genus(fish.tree.plus, genus)
  # Directly assigns the value out1 to the key genus in out1_list. 
    out1_list[[genus]] <- out1 
  
  # Randomly select one species of each genus from 'out1' and store   in 'out2'
    out2 <- sample(out1, 1)
    out2_list[[genus]] <- out2
  
  # Get the edge length for the selected species and put it in 'out3'
    out3 <- fish.tree.plus$edge.length[which(fish.tree.plus$edge[,2] == which(fish.tree.plus$tip.label == out2))]
    out3_list[[genus]] <- out3
  
  # Add new species to the tree
    fish.tree.plus <- bind.tip(fish.tree.plus, add_sp, edge.length = out3 / 2, 
    where = which(fish.tree.plus$tip.label == out2), 
    position = out3 / 2)
}

Explanation on how the loop works:

Primarily, each genus name in the new_sp_genus are being put through the loop code, gradually acquiring out1, out2, out3 and finally adding the species to the tree. When on the second iteration, the next genus is being taken into the loop, it creates new out1, out2 and so on —> the previous out-files are being overwritten. Hence, to keep track of all the outputs, we create the out_list(s) as spaces to store all the outputs that we gain within the loop. This way the out1_list , for instance, now includes all the outputs from out1 for each genus, and same with others.

Finally, important note is that the name of the final tree should be THE SAME as the name of tree used in the bind.tip(). Otherwise, the tree will overwrite itself, leaving the previously added species behind.

2.4 Check the result

2.4.1 Are new added species present in the tree?

Run sapply() with grep to get the list of species from the tree with their respective positions in the tip.label list. As we can see, the new species are indeed now present in the tree, and we can din them by their position.

sapply(new_sp,grep, fish.tree.plus$tip.label)
##        Cichla_belem Bathygobius_toronto     Arapaima_regina          Mugil_kyiv 
##                1613                3111               11300                2250 
##     Colomesus_lagos     Pimelodus_paris     Ancistrus_quito    Gymnotus_beijing 
##                6190                9453               10252               10977

2.4.2 How does it look like in the tree?

Now, we can build the tree to actually see where and how the new species were integrated into the tree. For this matter we create 2 subtrees with only the genus of interest: Bathygobius & Pimelodus.

tips1 <- c(out1_list[["Bathygobius"]], "Bathygobius_toronto")
tips2 <- c(out1_list[["Pimelodus"]], "Pimelodus_paris")
bathygobius_tree <- keep.tip(fish.tree.plus, tips1) 
pimelodus_tree <- keep.tip(fish.tree.plus, tips2)
par(mfrow = c(1,2))
par(mar = c(5,1,4,0))
mycol1 <- def(bathygobius_tree$tip.label, Bathygobius_toronto = "coral", default = "black")
mycol2 <- def(pimelodus_tree$tip.label, Pimelodus_paris = "coral", default = "black")
plot(bathygobius_tree, tip.color = mycol1)
title("Bathygobius toronto")
plot(pimelodus_tree, tip.color = mycol2)
title("Pimelodus paris")

Worked :)