Preliminaries

# install.packages(ape)
# install.packages(phangorn)

library(ape)
library(phangorn)

Example 1: 3 Sequences

Similarity matrix

This matrix is based on the proportion of bases that are identical between sequence. This is often referred to as PID for Proportion Identical or Percentage Identical.

BLAST reports PID in its main output. PID is a very simple metric of similarity; more sophisticated measures are used in pratice.

Make a similarity matrix with the matrix() command. Note that I have to declare the number of rows

# Bad matrix 1
matrix(c(1.0, 0.5, 0.3,
         0.5, 1.0, 0.4,
         0.3, 0.4, 1.0))
##       [,1]
##  [1,]  1.0
##  [2,]  0.5
##  [3,]  0.3
##  [4,]  0.5
##  [5,]  1.0
##  [6,]  0.4
##  [7,]  0.3
##  [8,]  0.4
##  [9,]  1.0
# Good matrix
matrix(c(1.0, 0.5, 0.3,
         0.5, 1.0, 0.4,
         0.3, 0.4, 1.0),
       nrow = 3)
##      [,1] [,2] [,3]
## [1,]  1.0  0.5  0.3
## [2,]  0.5  1.0  0.4
## [3,]  0.3  0.4  1.0

Store the matrix

my_sim_mat <- matrix(c(1.0, 0.5, 0.3,
                       0.5, 1.0, 0.4,
                       0.3, 0.4, 1.0),
                 nrow = 3,
                 byrow = T)

Label the matrix with row.names() and colnames()

row.names(my_sim_mat) <- c("G","T","M")
colnames(my_sim_mat) <- c("G","T","M")

Disimilarity matrix

Similarity, disimilarity, and distance are all related. Most methods use distance, not similarity.

We can do vectorized math to recalculate the matrix

my_dist_mat <- 1-my_sim_mat
my_dist_mat
##     G   T   M
## G 0.0 0.5 0.7
## T 0.5 0.0 0.6
## M 0.7 0.6 0.0

Convert to R’s distance format

my_dist_mat2 <- as.dist(my_dist_mat)
my_dist_mat2
##     G   T
## T 0.5    
## M 0.7 0.6

Build a neighbor-joining (nj) tree

Neighbor Joining is one of the most common ways to build a tree using molecular data that’s been converted to sequences; its one of the options within BLAST.

Build the tree with nj()

my_nj <- ape::nj(my_dist_mat2)

Plot the tree as an “unrooted” tree

plot(my_nj, "unrooted")

Plot the tree as an “rooted” tree

plot(my_nj)

UPGMA/WPGMA are other algorithms that work with distance matrices. They are not commonly used now but are useful for teaching becaues they can easily be done by hand on small datasets.

my_upgma <- phangorn::upgma(my_dist_mat2)

Plot the UPGMA tree

plot(my_upgma)

Compare the rooted NJ and the UPGMA

par(mfrow = c(1,2))
plot(my_nj)
plot(my_upgma)

WPGMA tree

plot(wpgma(my_dist_mat2))

Minimum evolution tree

plot(fastme.ols(my_dist_mat2))

Example 2: 5 Sequences

In preparation to building these phylogenetic trees, I chose ten polymorphic loci from five chimp alignment sequences. The loci were all dimorphic with two alleles. Then, I compared the sequences in a pairwise-similarity manner (i.e. I looked at two sequences at a time). I noted how many times the sequence pairs were identical and also the proportion to which they were. After comparing the ten different combinations of the five sequences, I was able to construct a PID-similarity matrix and a dis-similarity matrix.

Build the matrix.

Be sure to add the nrow = … statemetn.

five_sim_mat <- matrix(c(1.0,     0.0,  0.0,  0.0,  0.0,        
                         0.9,   1.0,    0.0,  0.0,  0.0,                
                         0.5,   0.6,    1.0,    0.0,  0.0,      
                         0.6,     0.5,  0.5,    1.0,    0.0,        
                         0.6,   0.5,    0.3,    0.4,    1.0),
                       nrow = 5,
                       byrow = T) ######

Name things

row.names(five_sim_mat) <- c("ME", "B", "G", "T", "MW") ######
colnames(five_sim_mat) <- c("ME", "B", "G", "T", "MW")  ######

Turn into a distnace matrix. This is 2 steps and requires the as.dist() command

five_dist_mat <- 1-five_sim_mat   ######
five_dist_mat2 <- as.dist(five_dist_mat) ######
five_dist_mat
##     ME   B   G   T MW
## ME 0.0 1.0 1.0 1.0  1
## B  0.1 0.0 1.0 1.0  1
## G  0.5 0.4 0.0 1.0  1
## T  0.4 0.5 0.5 0.0  1
## MW 0.4 0.5 0.7 0.6  0
five_dist_mat2
##     ME   B   G   T
## B  0.1            
## G  0.5 0.4        
## T  0.4 0.5 0.5    
## MW 0.4 0.5 0.7 0.6

Neighbor-Joining tree with nj()

five_nj <- nj(five_dist_mat2)
five_nj
## 
## Phylogenetic tree with 5 tips and 3 internal nodes.
## 
## Tip labels:
##   ME, B, G, T, MW
## 
## Unrooted; includes branch lengths.

Plot unrooted NJ tree

plot(five_nj, "unrooted")

Plot rooted NJ tree

plot(five_nj)

Build UPGMA tree

five_upgma <- phangorn::upgma(five_dist_mat2)
five_upgma
## 
## Phylogenetic tree with 5 tips and 4 internal nodes.
## 
## Tip labels:
##   ME, B, G, T, MW
## 
## Rooted; includes branch lengths.

Plot UPGMA tree

plot(five_upgma)

Compare rooted NJ and UPGMA plots

par(mfrow = c(1,2))
plot(five_nj)
plot(five_upgma)

Build WPGMA tree

plot(wpgma(five_dist_mat2))

Compare rooted WPGMA and UPGMA plots

par(mfrow = c(1,2))
plot(wpgma(five_dist_mat2))
plot(five_upgma)

Build Minimum evolution tree

plot(fastme.ols(five_dist_mat2))