1 Introduction

As one of the most diverse insect groups on Earth, butterflies serve as excellent model organisms for understanding biodiversity. Biodiversity could result from a variety of geographic processes. These processes can create barriers between populations, and this could lead to speciation and diversification of the organisms (Bergstrom and Dugatkin 2016). Diversification is noticeable in regions where mountain and island formations are abundant. A large number of islands and elevational gradients provide organisms with numerous ecological niches and resources for occupation. Over time, the organisms occupying different niches become characteristically distinct and reproductively isolated from each other. When genetic differences are large enough, these organisms might be considered separate species, and speciation has occurred (Bergstrom and Dugatkin 2016). One region where rapid species diversification occurred due to massive island and mountain formations is Southeast Asia. Therefore, species are highly diverse throughout this area.

One of the butterfly genera that also speciated and diversified in Southeast Asia is known as Delias Hübner (Lepidoptera: Pieridae) (Müller, Matos-Maraví, and Beheregaray 2013; Michael F. Braby and Pierce 2007). Delias consists of approximately 250 species (Michael F. Braby and Pierce 2007), so it is one of the most species-rich butterfly groups on Earth. The genus is also widely distributed in India, China, Australia, and many small islands near Australia, but it is the most diverse in New Guinea, which is an island located north of Australia. Toussaint et al. (2014) found that the high diversity of a water beetle is associated with the highly varying elevations of mountain ranges on the island. Therefore, it is likely that the diversity of Delias was also caused by these geographic characteristics, but this requires further investigation.

Previous studies have discovered some facts about Delias. Delias exhibit various wing patterns, and they use these patterns as effective warning signals to avoid predators (Wee and Monteiro 2017). The wing patterns of some toxic species also resemble each other to reinforce their warning colorations for this purpose (Morinaka et al. 2018). The larvae of Delias obligately feed on a parasitic plant called mistletoes (M. F. Braby and Trueman 2006), so this genus is considered the parasite of parasites. Delias butterflies were classified into 14 sub-groups: nysa, isse, pasithoe, belladonna, ladas, generaldina, aroae, eichhorni, sagessa, aganippe, hyparete, belisama, albertisi, and nigrina (Müller, Matos-Maraví, and Beheregaray 2013), and all of the groups share a common ancestor likely originated in Australia. The ancestor later spread across, speciated, and diversified to where they are currently distributed (Müller, Matos-Maraví, and Beheregaray 2013; Michael F. Braby and Pierce 2007).

2 Problem, Hypotheses, and Objective

Although scientists have spent a lot of effort in sampling and classifying Delias butterflies, the evolutionary aspects of the butterflies in terms of diversification history remained unknown. Because Delias is a large group, and the geographies of where Delias species are distributed vary extensively, this situation likely led to an increase in the diversification rate of some species but not others. Therefore, I hypothesize that the diversification rates of the butterflies also vary among different Delias species. Furthermore, because the geographical features in Southeast Asia are complex, I also expect that the rate increased dramatically over time. Here, I used a program called BAMM (Bayesian Analysis of Macroevolutionary Mixtures) and its associated R package BAMMtools (Rabosky 2014; Rabosky et al. 2014) to analyze the history of Delias diversification, this program can analyze how many rate shifts of diversification occurred in the evolutionary history and how the diversification rate changed over time. To conduct the analysis, I input two phylogenetic trees–which show the evolutionary history and relationship among species–that I created with genetic data into the BAMM program and decipher the outputs of my results with BAMMtools.

3 Methods and Results

3.1 Phylogenetic Inferences

Before the onset of the diversification rate analysis, a time-calibrated phylogenetic tree of the Delias species is needed as input for the BAMM program. For this project, I created two phylogenetic trees by using two methods: Maximum Likelihood and Bayesian Inference. The reason I used two methods is that different methods generate slightly different trees due to different algorithms. By using two methods to create the species relationship, I would get to know whether the two methods influence the later analysis of the diversification rate. I used the genetic data of 13 gene loci for making the trees. The program I used to create a Maximum Likelihood tree is IQ-Tree v2.1.2 (Minh et al. 2020), and the one I used for Bayesian Inference is BEAST (Drummond and Rambaut 2007). By using the methods, I inferred the relationship among 213 Delias species. I used multiple samples per species (406 Delias samples in total) for the Maximum Likelihood method and only one sample per species (213 Delias samples in total) for the Bayesian method since the Bayesian method takes a much longer time to finish. The difference in sample size also allows me to know if it affects the diversification rate analysis.

To plot the phylogenetic trees in R, I used R packages ape and ggtree:

library(ape)
library(ggtree)

Plotting the Bayesian tree:

Bayes_tree <- read.tree("Delias_228_3_no_outgroup.txt")
ggtree(Bayes_tree)

Plotting the Maximum Likelihood tree:

ML_tree <- read.tree("Delias_421_3_no_outgroup.txt")
ggtree(ML_tree)

3.2 Diversification Rate Analysis

3.2.1 BAMM Analysis

BAMM (Bayesian Analysis of Macroevolutionary Mixtures) is also a Bayesian method, and in order to run a Bayesian analysis, the method requires specifying prior parameters. In BAMM, four priors are needed to be specified: expectedNumberOfShifts (the expected number of diversification rate shifts), lambdaInitPrior (the speciation rate of the data under a pure birth model), muInitPrior (the same as lambdaInitPrior), lambdaShiftPrior (the standard deviation of the normal prior on the exponential change parameter) (Rabosky et al. 2014). These prior parameters could be inferred by using the time-calibrated trees with BAMMtools:

library(BAMMtools)

Generating priors for the Bayesian tree:

setBAMMpriors(Bayes_tree, outfile = 'Bayes_priors.txt')

Viewing the prior file of the Bayesian tree:

cat(readLines('Bayes_priors.txt'), sep = '\n')
## ###############################################
## 
## # Prior block chosen by BAMMtools::setBAMMpriors
## 
## expectedNumberOfShifts = 1.0
## 
## lambdaInitPrior = 0.664563836507147
## 
## lambdaShiftPrior = 0.0742223409916785
## 
## muInitPrior = 0.664563836507147
## 
## #### End Prior block
## ######################

Generating priors for the Maximum Likelihood tree:

setBAMMpriors(ML_tree, outfile = 'ML_priors.txt')

Viewing the prior file of the Maximum Likelihood tree:

#Viewing the prior file
cat(readLines('ML_priors.txt'), sep = '\n')
## ###############################################
## 
## # Prior block chosen by BAMMtools::setBAMMpriors
## 
## expectedNumberOfShifts = 1.0
## 
## lambdaInitPrior = 0.529953932805708
## 
## lambdaShiftPrior = 0.0817750685001484
## 
## muInitPrior = 0.529953932805708
## 
## #### End Prior block
## ######################

The priors were then copied and pasted into a control file, which loads all the configuration options, parameters, and input tree files into the BAMM program. One of the configuration options is the number of generations for an MCMC (Markov Chain Monte Carlo) run (Gamerman and Lopes 2006), which generate a distribution of log-likelihood of all the hypothesized numbers of rate shifts. The following shows more details of the control files for the Bayesian tree and the Maximum Likelihood tree:

#The control file for the Bayesian tree
cat(readLines('Delias_228_3_control_file.txt'), sep = '\n')
## # BAMM configuration file for speciation/extinction analysis 
## # ==========================================================
## #
## # Format
## # ------
## #
## #     - Each option is specified as: option_name = option_value
## #     - Comments start with # and go to the end of the line
## #     - True is specified with "1" and False with "0"
## 
## 
## ################################################################################
## # GENERAL SETUP AND DATA INPUT
## ################################################################################
## 
## modeltype = speciationextinction        
## # Specify "speciationextinction" or "trait" analysis
##                                   
## treefile = Delias_ssp_conc_228_3_tree_tips_removed.txt
## # File name of the phylogenetic tree to be analyzed
## 
## runInfoFilename = run_info.txt
## # File name to output general information about this run
## 
## sampleFromPriorOnly = 0                 
## # Whether to perform analysis sampling from prior only (no likelihoods computed)
## 
## runMCMC = 1                             
## # Whether to perform the MCMC simulation. If runMCMC = 0, the program will only
## # check whether the data file can be read and the initial likelihood computed
## 
## simulatePriorShifts = 1
## # Whether to simulate the prior distribution of the number of shift events,
## # given the hyperprior on the Poisson rate parameter. This is necessary to
## # compute Bayes factors
## 
## loadEventData = 0                       
## # Whether to load a previous event data file
## 
## eventDataInfile = event_data_in.txt
## # File name of the event data file to load, used only if loadEventData = 1
## 
## initializeModel = 1                     
## # Whether to initialize (but not run) the MCMC. If initializeModel = 0, the
## # program will only ensure that the data files (e.g., treefile) can be read
## 
## useGlobalSamplingProbability = 1        
## # Whether to use a "global" sampling probability. If False (0), expects a file
## # name for species-specific sampling probabilities (see sampleProbsFilename)
##                                         
## globalSamplingFraction = 1.0            
## # The sampling probability. If useGlobalSamplingFraction = 0, this is ignored
## # and BAMM looks for a file name with species-specific sampling fractions
## 
## sampleProbsFilename = sample_probs.txt
## # File name containing species-specific sampling fractions
## 
## # seed = 12345
## # Seed for the random number generator. 
## # If not specified (or is -1), a seed is obtained from the system clock
## 
## overwrite = 1
## # If True (1), the program will overwrite any output files in the current
## # directory (if present)
## 
## 
## ################################################################################
## # PRIORS
## ################################################################################
## 
## expectedNumberOfShifts = 1.0
## # prior on the number of shifts in diversification
## # Suggested values: 
## #     expectedNumberOfShifts = 1.0 for small trees (< 500 tips)
## #      expectedNumberOfShifts = 10 or even 50 for large trees (> 5000 tips) 
##  
## lambdaInitPrior = 0.664563836507147
## # Prior (rate parameter of exponential) on the initial lambda value for rate
## # regimes
## 
## lambdaShiftPrior = 0.0742223409916785
## # Prior (std dev of normal) on lambda shift parameter for rate regimes
## # You cannot adjust the mean of this distribution (fixed at zero, which is
## # equal to a constant rate diversification process)
## 
## muInitPrior = 0.664563836507147
## # Prior (rate parameter of exponential) on extinction rates  
## 
## lambdaIsTimeVariablePrior = 1
## # Prior (probability) of the time mode being time-variable (vs. time-constant)
## 
## 
## ################################################################################
## # MCMC SIMULATION SETTINGS & OUTPUT OPTIONS
## ################################################################################
## 
## numberOfGenerations = 1000000
## # Number of generations to perform MCMC simulation
## 
## mcmcOutfile = mcmc_out.txt
## # File name for the MCMC output, which only includes summary information about
## # MCMC simulation (e.g., log-likelihoods, log-prior, number of processes)
## 
## mcmcWriteFreq = 1000
## # Frequency in which to write the MCMC output to a file
## 
## eventDataOutfile = event_data.txt
## # The raw event data (these are the main results). ALL of the results are
## # contained in this file, and all branch-specific speciation rates, shift
## # positions, marginal distributions etc can be reconstructed from this output.
## # See R package BAMMtools for working with this output
## 
## eventDataWriteFreq = 1000
## # Frequency in which to write the event data to a file
## 
## printFreq = 100
## # Frequency in which to print MCMC status to the screen
## 
## acceptanceResetFreq = 1000
## # Frequency in which to reset the acceptance rate calculation
## # The acceptance rate is output to both the MCMC data file and the screen
## 
## # outName = BAMM
## # Optional name that will be prefixed on all output files (separated with "_")
## # If commented out, no prefix will be used
## 
## 
## ################################################################################
## # OPERATORS: MCMC SCALING OPERATORS
## ################################################################################
## 
## updateLambdaInitScale = 2.0
## # Scale parameter for updating the initial speciation rate for each process
## 
## updateLambdaShiftScale = 0.1
## # Scale parameter for the exponential change parameter for speciation
## 
## updateMuInitScale = 2.0
## # Scale parameter for updating initial extinction rate for each process
## 
## updateEventLocationScale = 0.05
## # Scale parameter for updating LOCAL moves of events on the tree
## # This defines the width of the sliding window proposal
##  
## updateEventRateScale = 4.0
## # Scale parameter (proportional shrinking/expanding) for updating
## # the rate parameter of the Poisson process 
## 
## 
## ################################################################################
## # OPERATORS: MCMC MOVE FREQUENCIES
## ################################################################################
## 
## updateRateEventNumber = 0.1
## # Relative frequency of MCMC moves that change the number of events
## 
## updateRateEventPosition = 1
## # Relative frequency of MCMC moves that change the location of an event on the
## # tree
## 
## updateRateEventRate = 1
## # Relative frequency of MCMC moves that change the rate at which events occur 
## 
## updateRateLambda0 = 1
## # Relative frequency of MCMC moves that change the initial speciation rate
## # associated with an event
## 
## updateRateLambdaShift = 1
## # Relative frequency of MCMC moves that change the exponential shift parameter
## # of the speciation rate associated with an event
## 
## updateRateMu0 = 1
## # Relative frequency of MCMC moves that change the extinction rate for a given
## # event
## 
## updateRateLambdaTimeMode = 0
## # Relative frequency of MCMC moves that flip the time mode
## # (time-constant <=> time-variable)
## 
## localGlobalMoveRatio = 10.0
## # Ratio of local to global moves of events 
## 
## 
## ################################################################################
## # INITIAL PARAMETER VALUES
## ################################################################################
## 
## lambdaInit0 = 0.032
## # Initial speciation rate (at the root of the tree)
## 
## lambdaShift0 = 0
## # Initial shift parameter for the root process
## 
## muInit0 = 0.005
## # Initial value of extinction (at the root)
## 
## initialNumberEvents = 0
## # Initial number of non-root processes
##  
## 
## ################################################################################
## # METROPOLIS COUPLED MCMC
## ################################################################################
## 
## numberOfChains = 4
## # Number of Markov chains to run
## 
## deltaT = 0.01
## # Temperature increment parameter. This value should be > 0
## # The temperature for the i-th chain is computed as 1 / [1 + deltaT * (i - 1)]
## 
## swapPeriod = 1000
## # Number of generations in which to propose a chain swap
## 
## chainSwapFileName = chain_swap.txt
## # File name in which to output data about each chain swap proposal.
## # The format of each line is [generation],[rank_1],[rank_2],[swap_accepted]
## # where [generation] is the generation in which the swap proposal was made,
## # [rank_1] and [rank_2] are the chains that were chosen, and [swap_accepted] is
## # whether the swap was made. The cold chain has a rank of 1.
## 
## 
## ################################################################################
## # NUMERICAL AND OTHER PARAMETERS
## ################################################################################
## 
## minCladeSizeForShift = 1
## # Allows you to constrain location of possible rate-change events to occur
## # only on branches with at least this many descendant tips. A value of 1
## # allows shifts to occur on all branches. 
## 
## segLength = 0.02
## # Controls the "grain" of the likelihood calculations. Approximates the
## # continuous-time change in diversification rates by breaking each branch into
## # a constant-rate diversification segments, with each segment given a length
## # determined by segLength. segLength is in units of the root-to-tip distance of
## # the tree. So, if the segLength parameter is 0.01, and the crown age of your
## # tree is 50, the "step size" of the constant rate approximation will be 0.5.
## # If the value is greater than the branch length (e.g., you have a branch of
## # length < 0.5 in the preceding example) BAMM will not break the branch into
## # segments but use the mean rate across the entire branch.
#The control file for the Maximum Likelihood tree
cat(readLines('Delias_421_3_control_file.txt'), sep = '\n')
## # BAMM configuration file for speciation/extinction analysis 
## # ==========================================================
## #
## # Format
## # ------
## #
## #     - Each option is specified as: option_name = option_value
## #     - Comments start with # and go to the end of the line
## #     - True is specified with "1" and False with "0"
## 
## 
## ################################################################################
## # GENERAL SETUP AND DATA INPUT
## ################################################################################
## 
## modeltype = speciationextinction        
## # Specify "speciationextinction" or "trait" analysis
##                                   
## treefile = Delias_421_3_no_outgroup.txt
## # File name of the phylogenetic tree to be analyzed
## 
## runInfoFilename = run_info.txt
## # File name to output general information about this run
## 
## sampleFromPriorOnly = 0                 
## # Whether to perform analysis sampling from prior only (no likelihoods computed)
## 
## runMCMC = 1                             
## # Whether to perform the MCMC simulation. If runMCMC = 0, the program will only
## # check whether the data file can be read and the initial likelihood computed
## 
## simulatePriorShifts = 1
## # Whether to simulate the prior distribution of the number of shift events,
## # given the hyperprior on the Poisson rate parameter. This is necessary to
## # compute Bayes factors
## 
## loadEventData = 0                       
## # Whether to load a previous event data file
## 
## eventDataInfile = event_data_in.txt
## # File name of the event data file to load, used only if loadEventData = 1
## 
## initializeModel = 1                     
## # Whether to initialize (but not run) the MCMC. If initializeModel = 0, the
## # program will only ensure that the data files (e.g., treefile) can be read
## 
## useGlobalSamplingProbability = 1        
## # Whether to use a "global" sampling probability. If False (0), expects a file
## # name for species-specific sampling probabilities (see sampleProbsFilename)
##                                         
## globalSamplingFraction = 1.0            
## # The sampling probability. If useGlobalSamplingFraction = 0, this is ignored
## # and BAMM looks for a file name with species-specific sampling fractions
## 
## sampleProbsFilename = sample_probs.txt
## # File name containing species-specific sampling fractions
## 
## # seed = 12345
## # Seed for the random number generator. 
## # If not specified (or is -1), a seed is obtained from the system clock
## 
## overwrite = 1
## # If True (1), the program will overwrite any output files in the current
## # directory (if present)
## 
## 
## ################################################################################
## # PRIORS
## ################################################################################
## 
## expectedNumberOfShifts = 1.0
## # prior on the number of shifts in diversification
## # Suggested values: 
## #     expectedNumberOfShifts = 1.0 for small trees (< 500 tips)
## #      expectedNumberOfShifts = 10 or even 50 for large trees (> 5000 tips) 
##  
## lambdaInitPrior = 0.529953932805708
## # Prior (rate parameter of exponential) on the initial lambda value for rate
## # regimes
## 
## lambdaShiftPrior = 0.0817750685001484
## # Prior (std dev of normal) on lambda shift parameter for rate regimes
## # You cannot adjust the mean of this distribution (fixed at zero, which is
## # equal to a constant rate diversification process)
## 
## muInitPrior = 0.529953932805708
## # Prior (rate parameter of exponential) on extinction rates  
## 
## lambdaIsTimeVariablePrior = 1
## # Prior (probability) of the time mode being time-variable (vs. time-constant)
## 
## 
## ################################################################################
## # MCMC SIMULATION SETTINGS & OUTPUT OPTIONS
## ################################################################################
## 
## numberOfGenerations = 1000000
## # Number of generations to perform MCMC simulation
## 
## mcmcOutfile = mcmc_out.txt
## # File name for the MCMC output, which only includes summary information about
## # MCMC simulation (e.g., log-likelihoods, log-prior, number of processes)
## 
## mcmcWriteFreq = 1000
## # Frequency in which to write the MCMC output to a file
## 
## eventDataOutfile = event_data.txt
## # The raw event data (these are the main results). ALL of the results are
## # contained in this file, and all branch-specific speciation rates, shift
## # positions, marginal distributions etc can be reconstructed from this output.
## # See R package BAMMtools for working with this output
## 
## eventDataWriteFreq = 1000
## # Frequency in which to write the event data to a file
## 
## printFreq = 100
## # Frequency in which to print MCMC status to the screen
## 
## acceptanceResetFreq = 1000
## # Frequency in which to reset the acceptance rate calculation
## # The acceptance rate is output to both the MCMC data file and the screen
## 
## # outName = BAMM
## # Optional name that will be prefixed on all output files (separated with "_")
## # If commented out, no prefix will be used
## 
## 
## ################################################################################
## # OPERATORS: MCMC SCALING OPERATORS
## ################################################################################
## 
## updateLambdaInitScale = 2.0
## # Scale parameter for updating the initial speciation rate for each process
## 
## updateLambdaShiftScale = 0.1
## # Scale parameter for the exponential change parameter for speciation
## 
## updateMuInitScale = 2.0
## # Scale parameter for updating initial extinction rate for each process
## 
## updateEventLocationScale = 0.05
## # Scale parameter for updating LOCAL moves of events on the tree
## # This defines the width of the sliding window proposal
##  
## updateEventRateScale = 4.0
## # Scale parameter (proportional shrinking/expanding) for updating
## # the rate parameter of the Poisson process 
## 
## 
## ################################################################################
## # OPERATORS: MCMC MOVE FREQUENCIES
## ################################################################################
## 
## updateRateEventNumber = 0.1
## # Relative frequency of MCMC moves that change the number of events
## 
## updateRateEventPosition = 1
## # Relative frequency of MCMC moves that change the location of an event on the
## # tree
## 
## updateRateEventRate = 1
## # Relative frequency of MCMC moves that change the rate at which events occur 
## 
## updateRateLambda0 = 1
## # Relative frequency of MCMC moves that change the initial speciation rate
## # associated with an event
## 
## updateRateLambdaShift = 1
## # Relative frequency of MCMC moves that change the exponential shift parameter
## # of the speciation rate associated with an event
## 
## updateRateMu0 = 1
## # Relative frequency of MCMC moves that change the extinction rate for a given
## # event
## 
## updateRateLambdaTimeMode = 0
## # Relative frequency of MCMC moves that flip the time mode
## # (time-constant <=> time-variable)
## 
## localGlobalMoveRatio = 10.0
## # Ratio of local to global moves of events 
## 
## 
## ################################################################################
## # INITIAL PARAMETER VALUES
## ################################################################################
## 
## lambdaInit0 = 0.032
## # Initial speciation rate (at the root of the tree)
## 
## lambdaShift0 = 0
## # Initial shift parameter for the root process
## 
## muInit0 = 0.005
## # Initial value of extinction (at the root)
## 
## initialNumberEvents = 0
## # Initial number of non-root processes
##  
## 
## ################################################################################
## # METROPOLIS COUPLED MCMC
## ################################################################################
## 
## numberOfChains = 4
## # Number of Markov chains to run
## 
## deltaT = 0.01
## # Temperature increment parameter. This value should be > 0
## # The temperature for the i-th chain is computed as 1 / [1 + deltaT * (i - 1)]
## 
## swapPeriod = 1000
## # Number of generations in which to propose a chain swap
## 
## chainSwapFileName = chain_swap.txt
## # File name in which to output data about each chain swap proposal.
## # The format of each line is [generation],[rank_1],[rank_2],[swap_accepted]
## # where [generation] is the generation in which the swap proposal was made,
## # [rank_1] and [rank_2] are the chains that were chosen, and [swap_accepted] is
## # whether the swap was made. The cold chain has a rank of 1.
## 
## 
## ################################################################################
## # NUMERICAL AND OTHER PARAMETERS
## ################################################################################
## 
## minCladeSizeForShift = 1
## # Allows you to constrain location of possible rate-change events to occur
## # only on branches with at least this many descendant tips. A value of 1
## # allows shifts to occur on all branches. 
## 
## segLength = 0.02
## # Controls the "grain" of the likelihood calculations. Approximates the
## # continuous-time change in diversification rates by breaking each branch into
## # a constant-rate diversification segments, with each segment given a length
## # determined by segLength. segLength is in units of the root-to-tip distance of
## # the tree. So, if the segLength parameter is 0.01, and the crown age of your
## # tree is 50, the "step size" of the constant rate approximation will be 0.5.
## # If the value is greater than the branch length (e.g., you have a branch of
## # length < 0.5 in the preceding example) BAMM will not break the branch into
## # segments but use the mean rate across the entire branch.

I ran the BAMM program in the command prompt of my computer. To run it, I first put the control files and my tree files in the folder containing the bamm.exe program, and then I went to the folder by using the cd command. To start the analysis, I used the command bamm -s [the control file name]. After it finished each analysis, BAMM generated four output files: chain swap, event data, MCMC, and run info. The event data file and the MCMC output file are what I need to decipher to create the results.

3.2.2 Deciphering the Outputs With BAMMtools

First, I deciphered the MCMC outputs by using R package coda:

library(coda)

I then generated the MCMC graphs for both the Bayesian tree and the Maximum Likelihood tree for checking whether the MCMC analysis was run properly:

#Creating two panels
par(mfrow=c(1,2))

#MCMC for the Bayesian tree
Bayes_mcmcout <- read.csv("BEAST_mcmc_out.txt", header=T) 
plot(Bayes_mcmcout$logLik ~ Bayes_mcmcout$generation, type = 'l' , ylim = c(-465,-445)
     , xlab = 'MCMC Generation', ylab = 'Log-Likelihood'
     , main = 'MCMC Analysis for \nthe Bayesian Tree')

#MCMC for the Maximum Likelihood tree
ML_mcmcout <- read.csv("IQTree_mcmc_out.txt", header=T) 
plot(ML_mcmcout$logLik ~ ML_mcmcout$generation, type = 'l' , ylim = c(-465,-445)
     , xlab = 'MCMC Generation', ylab = 'Log-Likelihood'
     , main = 'MCMC Analysis for the \nMaximum Liklihood Tree')

Both MCMC runs look good. However, the log-likelihoods of the first few generations of the MCMC runs are too low because they have not converged, I thus discarded the first 10% of the generations as burn-in:

#Creating two panels
par(mfrow=c(1,2))

#Bayesian tree
Bayes_burnstart <- floor(0.1 * nrow(Bayes_mcmcout))
Bayes_postburn <- Bayes_mcmcout[Bayes_burnstart:nrow(Bayes_mcmcout), ]
plot(Bayes_postburn$logLik ~ Bayes_postburn$generation, type = 'l' , ylim = c(-465,-445)
     , xlab = 'MCMC Generation', ylab = 'Log-Likelihood'
     , main = '10% burn-in MCMC Analysis \nfor the Bayesian Tree')

#Maximum Likelihood tree
ML_burnstart <- floor(0.1 * nrow(ML_mcmcout))
ML_postburn <- ML_mcmcout[ML_burnstart:nrow(ML_mcmcout), ]
plot(ML_postburn$logLik ~ ML_postburn$generation, type = 'l' , ylim = c(-465,-445)
     , xlab = 'MCMC Generation', ylab = 'Log-Likelihood'
     , main = '10% burn-in MCMC Analysis \nfor the Maximum Likelihood Tree')

Next, I checked the effective sample sizes of the log-likelihoods and the numbers of shift events. We want effective sample sizes to be greater than 200:

#Bayesian tree
#Numer of shift events
effectiveSize(Bayes_postburn$N_shifts)
##  var1 
## 791.1
#Log-likelihoods
effectiveSize(Bayes_postburn$logLik)
##     var1 
## 808.1391
#Maximum Likelihood tree
#Number of shift events
effectiveSize(ML_postburn$N_shifts)
##     var1 
## 800.9727
#Log-likelihoods
effectiveSize(ML_postburn$logLik)
##     var1 
## 480.9872

All effective sample sizes are greater than 200.

Based on the trees and the priors, BAMM automatically hypothesized all the possible numbers of rate shifts for the trees. These numbers are called shift models. The null model is always 0 shift, and the alternative models are numbers of shifts greater than 0. To find out which model is the best, I calculated the posterior probabilities for all the models. The model with the highest posterior probability is the most supported. The formula for calculating the posterior probability is:

\[PP = N_i / N\]

Where PP is the posterior probability, Ni is how many times a particular model (a particular number of shifts) is supported in all the MCMC generations, and N is the total number of MCMC generations.

#Bayesian tree
Bayes_post_probs <- (table(Bayes_postburn$N_shifts)/nrow(Bayes_postburn))
Bayes_post_probs
## 
##           0           1           2           3 
## 0.870144284 0.106548280 0.022197558 0.001109878
#Maximum Likelihood tree
ML_post_probs <- (table(ML_postburn$N_shifts)/nrow(ML_postburn))
ML_post_probs
## 
##          0          1 
## 0.97336293 0.02663707

For both trees, the null models have the greatest posterior probabilities, thus indicating that the most supported number of rate shifts is 0 in both cases.

To further verify that the null models have the greatest support, I calculated the Bayes factors for all the models by using the following equation:

\[BF = M_i/M_j\]

Where BF is the Bayes factor, Mi is the marginal likelihood of a model, and Mj is the marginal likelihood of another model. Therefore, a Bayes factor is the ratio of marginal likelihoods between two models. Function computeBayesFactors() did all the calculations for me and showed the Bayes factors in matrices:

#Bayesian tree
Bayes_postfile <- "BEAST_mcmc_out.txt"
Bayes_bfmat <- computeBayesFactors(Bayes_postfile, expectedNumberOfShifts=1, burnin=0.1)
Bayes_bfmat
##            0          1   2  3
## 0 1.00000000 4.08333333 9.8 98
## 1 0.24489796 1.00000000 2.4 24
## 2 0.10204082 0.41666667 1.0 10
## 3 0.01020408 0.04166667 0.1  1
#Maximum Likelihood tree
ML_postfile <- "IQTree_mcmc_out.txt"
ML_bfmat <- computeBayesFactors(ML_postfile, expectedNumberOfShifts=1, burnin=0.1)
ML_bfmat
##            0        1
## 0 1.00000000 18.27083
## 1 0.05473204  1.00000

In the matrices, the Bayes factors are equal to the marginal likelihoods of the models in the rows divided by the marginal likelihoods of the models in the columns. The matrices show that the null models have the greatest Bayes factors across all comparisons.

The final step to decipher the outputs is to use graphs to show how the diversification rates changed on the phylogenetic trees, how the rates changed over time, and whether the rates differ among species. To do so, I first loaded the event data:

#Bayesian tree
Bayes_edata <- getEventData(Bayes_tree, eventdata = "BEAST_event_data.txt", burnin=0.1)

#Maximum Likelihood tree
ML_edata <- getEventData(ML_tree, eventdata = "IQTree_event_data.txt", burnin=0.1)

I then graphed the rate changes on the trees:

#Creating two panels
par(mfrow=c(1,2))

#Bayesian tree
Bayes_best_shift <- getBestShiftConfiguration(Bayes_edata, expectedNumberOfShifts=1)
Bayes_best_tree <- plot(Bayes_best_shift, tau=0.001, breaksmethod='linear', lwd=2, method = 'phylogram') 
addBAMMlegend(Bayes_best_tree, direction = 'horizontal'
              , location = 'topleft') 
addBAMMshifts(Bayes_best_shift, par.reset=FALSE, cex=2) 
axisPhylo() 
title(main = 'Bayesian Tree', cex.main = 0.75)

#Maximum Likelihood tree
ML_best_shift <- getBestShiftConfiguration(ML_edata, expectedNumberOfShifts=1)
ML_best_tree <- plot(ML_best_shift, tau=0.001, breaksmethod='linear', lwd=2, method = 'phylogram') 
addBAMMlegend(ML_best_tree, direction = 'horizontal'
              , location = 'topleft') 
addBAMMshifts(ML_best_shift, par.reset=FALSE, cex=2) 
axisPhylo() 
title(main = 'Maximum Likelihood Tree', cex.main = 0.75)

The dark red portions of the trees have the highest diversification rate, and the dark blue portions of the trees have the lowest diversification rate. However, the tree diagrams still do not show us a clear picture of how the rates changed through time, so I plotted line graphs to show the trends of the rate changes for both trees:

#Creating two panels
par(mfrow=c(1,2))

#Bayesian tree
plotRateThroughTime(Bayes_edata, intervalCol="red", avgCol="red", ylim=c(0,1)
                    , cex.axis = 0.8, axis.labels = FALSE) 
title(main = 'Diversification Rate Through \nTime for the Bayesian Tree'
      , cex.main = 0.7)
title(xlab = 'Time Before Present (Ma)', ylab = 'Diversification Rate'
      , cex.lab = 0.8)

#Maximum Likelihood tree
plotRateThroughTime(ML_edata, intervalCol="red", avgCol="red", ylim=c(0,3)
                    , cex.axis = 0.8, axis.labels = FALSE) 
title(main = 'Diversification Rate Through \nTime for the Maximum Likelihood Tree'
      , cex.main = 0.7)
title(xlab = 'Time Before Present (Ma)', ylab = 'Diversification Rate'
      , cex.lab = 0.8)

Although the diversification rate of the Bayesian tree increased over the past 15 million years, and the diversification rate of the Maximum Likelihood tree decreased over the past 14 million years, the rates do not seem to change a lot in both cases. These results suggest that the diversification rate is close to constant through time. Also, the rates of both trees differ a lot; the rate of the Bayesian tree is around 0.3, and the rate of the Maximum Likelihood tree is around 1.9. This difference is probably associated with sample sizes; a large sample size causes BAMM to derive a larger rate.

Lastly, to verify whether the diversification rates are also constant across the whole trees, I built macroevolutionary cohort matrices to compare the rates among different samples of the trees:

#Bayesian tree
Bayes_cmat <- getCohortMatrix(Bayes_edata)
cohorts(Bayes_cmat, Bayes_edata, lwd=3, pal="temperature", use.plot.bammdata=TRUE)

#Maximum Likelihood tree
ML_cmat <- getCohortMatrix(ML_edata)
cohorts(ML_cmat, ML_edata, lwd=3, pal="temperature", use.plot.bammdata=TRUE)

According to the cohort matrices, the rates among the samples are very similar (most of the colors are dark red). These findings strongly suggest that the diversification rates do not vary among different species.

4 Discussion and Conclusion

My hypotheses are that the diversification rates of the butterflies vary among different Delias species, and the rate increased dramatically over time due to the complex geographical features of where the genus is currently distributed. However, all the results do not support my hypotheses for both the Bayesian tree and the Maximum Likelihood tree. The null models of rate shift (no rate shift on the phylogenetic trees) have the greatest posterior probabilities, and the Bayes factors also give the null models the greatest support. Therefore, these findings propose that no dramatic change in diversification rate occurred in the evolutionary history of Delias butterflies. The phylogenetic tree diagrams with rate changes, the rate-through-time graph, and the macroevolutionary cohort matrices further illustrate that the diversification rate of Delias is constant through millions of years, and all the samples have the same rate. Therefore, no particular species diversified at a rate that is different from the others.

5 Facets/Problems That I Solved by Using R and R Markdown

I solved some problems by using R or R Markdown for my project and my writing. By using the R packages, I converted the human-unreadable output content to human-readable figures and numbers that one can see in this R Markdown document. Moreover, I found that citing sources in R Markdown is much easier than citing sources in Microsoft Word. In Microsoft Word, without any citation program to assist me, I need to copy and paste all the citations manually to finish my in-text citations and references. However, in R Markdown, the references could be created automatically. To create my references, I just need to download the citations in BibTeX format from Google Scholar, create a bib file, and put the citations into the file. When I need to cite something, I could use the @ function to specify the citations in the bib file for my writing.

References

Bergstrom, Carl T., and Lee Alan Dugatkin. 2016. Evolution. Book. 2nd ed. W. W. Norton & Company.
Braby, M. F., and J. W. H. Trueman. 2006. “Evolution of Larval Host Plant Associations and Adaptive Radiation in Pierid Butterflies.” Journal Article. Journal of Evolutionary Biology 19 (5): 1677–90. https://doi.org/10.1111/j.1420-9101.2006.01109.x.
Braby, Michael F., and Naomi E. Pierce. 2007. “Systematics, Biogeography and Diversification of the Indo-Australian Genus Delias Hübner (Lepidoptera: Pieridae): Phylogenetic Evidence Supports an ‘Out-of-Australia’ Origin.” Journal Article. Systematic Entomology 32 (1): 2–25. https://doi.org/https://doi.org/10.1111/j.1365-3113.2006.00349.x.
Drummond, Alexei J., and Andrew Rambaut. 2007. “BEAST: Bayesian Evolutionary Analysis by Sampling Trees.” Journal Article. BMC Evolutionary Biology 7 (1): 214. https://doi.org/10.1186/1471-2148-7-214.
Gamerman, Dani, and Hedibert F Lopes. 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. CRC press.
Minh, Bui Quang, Heiko A. Schmidt, Olga Chernomor, Dominik Schrempf, Michael D. Woodhams, Arndt von Haeseler, and Robert Lanfear. 2020. “IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era.” Journal Article. Molecular Biology and Evolution 37 (5): 1530–34. https://doi.org/10.1093/molbev/msaa015.
Morinaka, Sadaharu, Erni Erniwati, Nobuhiro Minaka, and Tadashi Miyata. 2018. “Müllerian Mimetic Radiation of Delias Butterflies (Lepidoptera: Pieridae) in Bali and Timor.” Journal Article. Entomological Science 21 (4): 396–405. https://doi.org/https://doi.org/10.1111/ens.12318.
Müller, Chris J., Pável F. Matos-Maraví, and Luciano B. Beheregaray. 2013. “Delving into Delias Hübner (Lepidoptera: Pieridae): Fine-Scale Biogeography, Phylogenetics and Systematics of the World’s Largest Butterfly Genus.” Journal Article. Journal of Biogeography 40 (5): 881–93. https://doi.org/10.1111/jbi.12040.
Rabosky, Daniel L. 2014. “Automatic Detection of Key Innovations, Rate Shifts, and Diversity-Dependence on Phylogenetic Trees.” Journal Article. PLOS ONE 9 (2): e89543. https://doi.org/10.1371/journal.pone.0089543.
Rabosky, Daniel L., Michael Grundler, Carlos Anderson, Pascal Title, Jeff J. Shi, Joseph W. Brown, Huateng Huang, and Joanna G. Larson. 2014. “BAMMtools: An r Package for the Analysis of Evolutionary Dynamics on Phylogenetic Trees.” Journal Article. Methods in Ecology and Evolution 5 (7): 701–7. https://doi.org/https://doi.org/10.1111/2041-210X.12199.
Toussaint, Emmanuel F. A., Robert Hall, Michael T. Monaghan, Katayo Sagata, Sentiko Ibalim, Helena V. Shaverdo, Alfried P. Vogler, Joan Pons, and Michael Balke. 2014. “The Towering Orogeny of New Guinea as a Trigger for Arthropod Megadiversity.” Journal Article. Nature Communications 5 (1): 4001. https://doi.org/10.1038/ncomms5001.
Wee, Jocelyn Liang Qi, and Antónia Monteiro. 2017. “Yellow and the Novel Aposematic Signal, Red, Protect Delias Butterflies from Predators.” Journal Article. PLOS ONE 12 (1): e0168243. https://doi.org/10.1371/journal.pone.0168243.