Before starting this lab, read the “PhyloII_reading.pdf” document on D2L. Be sure to address all items on the rubric listed at the end of the document.

Install the packages needed for this lab. Many R packages are not part of the base install; the ones used here are commonly used for phylogenetic analyses. When prompted, choose to update all (a) and install from sources (y). This may take some time, so read ahead while you wait.
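As a rough sketch of the install step, the block below checks which of the packages used in this lab are still missing. The package names are inferred from the functions and warnings that appear later in this document (`ape`, `phangorn`, `msa`), and `missing_packages` is a hypothetical helper, not part of any package. The install calls are left commented out because the right installer depends on your R version; `BiocManager::install()` is assumed here for the Bioconductor package.

```r
# Return the subset of `pkgs` that is not installed in the current R library.
# (`missing_packages` is a made-up helper name for this sketch.)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

cran_pkgs <- c("ape", "phangorn")  # distributed on CRAN
bioc_pkgs <- c("msa")              # distributed through Bioconductor

to_install_cran <- missing_packages(cran_pkgs)
to_install_bioc <- missing_packages(bioc_pkgs)

# Uncomment to install whatever is missing.
# if (length(to_install_cran) > 0) install.packages(to_install_cran)
# if (length(to_install_bioc) > 0) BiocManager::install(to_install_bioc)
```

If both `to_install_cran` and `to_install_bioc` are empty, everything is already installed and the install step can be skipped.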

Make sure your fasta file is saved in the same folder as this Rmd file, and that your working directory is set to that folder.
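One way to confirm this before running anything else is sketched below. The file name `my_sequences.fasta` is a placeholder and `in_working_dir` is a hypothetical helper; replace the name with that of your own file.

```r
# Hypothetical file name; replace it with the name of your own fasta file.
fasta_file <- "my_sequences.fasta"

# TRUE if `filename` exists in the current working directory.
in_working_dir <- function(filename) {
  file.exists(file.path(getwd(), filename))
}

if (!in_working_dir(fasta_file)) {
  message("Could not find ", fasta_file, " in ", getwd(),
          "; move the file or change the directory with setwd().")
}
```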

## Warning: package 'phangorn' was built under R version 3.3.3
## Warning: package 'ape' was built under R version 3.3.3

BEAR TREE

Read the fasta file into R by editing the file name below to reflect the actual name of your file.

Align the DNA sequences using the Muscle algorithm. A DNA sequence alignment shows where sequences have different bases, as well as insertions and deletions.

alignment <- msa(seqs,method='Muscle')
print(alignment, show = "alignment")
## 
## MsaDNAMultipleAlignment with 6 rows and 1140 columns
##     aln                                               names
## [1] ATGACCAACATCCGAAAAACTCA...AAAATAACCTCTCAAAGTGAAGA T.ornatus
## [2] ATGACCAACATCCGAAAAACCCA...AAAACAACCTCTCAAAATGAAGA U.americanus
## [3] ATGACCAACATCCGAAAAACCCA...AAAACAACCTCTCAAAATGAAGA U.thibetanusussur...
## [4] ATGACCAACATCCGAAAAACCCA...AAAACAACCTCTCAAAATGAAGA U.thibetanus
## [5] ATGACCAACATCCGAAAAACCCA...AAAATAATCTCTCAAAGTGAAGA M.ursinus
## [6] ATGACCAACATCCGAAAAACCCA...AAAATAACCTCTCAAAATGAAGA H.malayanus
## Con ATGACCAACATCCGAAAAACCCA...AAAA?AACCTCTCAAAATGAAGA Consensus
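
To make concrete what an alignment shows, here is a small base-R sketch on three made-up sequences (not the bear data). It does not use `msa`; it only assumes the sequences are already aligned to the same length, with `-` marking a gap.

```r
# Toy alignment: made-up sequences of equal length, "-" marks a gap.
toy <- c(sp1 = "ATGACC-A",
         sp2 = "ATGATCGA",
         sp3 = "ATGACCGA")

# One row per sequence, one column per alignment position.
aln <- do.call(rbind, strsplit(toy, ""))

# A column is variable if it holds more than one distinct character.
variable <- apply(aln, 2, function(col) length(unique(col)) > 1)
# A column is gapped if any sequence has an insertion/deletion there.
gapped <- apply(aln, 2, function(col) any(col == "-"))

which(variable)  # positions where the sequences differ
which(gapped)    # positions that contain an insertion or deletion
```

In this toy case position 5 is a base difference and position 7 is a gap, which are the two kinds of signal the printed alignment above summarizes.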

Export the alignment to a fasta file for use in the next steps. This requires converting the alignment to a DNAStringSet first, which stores the aligned characters as nucleotide sequences that can be written to a file and used for tree building.

DNAStr <- as(alignment, "DNAStringSet")
writeXStringSet(DNAStr, file="Bears_alignment.fasta")

Read in the fasta alignment file that you just generated.

bears <- read.phyDat("Bears_alignment.fasta",
                     format="fasta",type="DNA")

#Calculate pairwise distances between sequences.
dm <- dist.ml(bears)
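
To illustrate what a pairwise distance is, the sketch below computes two simple distances by hand on made-up sequences. It is not a reimplementation of `dist.ml`: it assumes a plain proportion of differing sites (the p-distance) and the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3), which may or may not match the model used in your run.

```r
# Made-up aligned sequences (not the bear data).
toy <- c(sp1 = "ATGACCGA", sp2 = "ATGATCGA", sp3 = "TTGATCGG")
aln <- do.call(rbind, strsplit(toy, ""))

# Proportion of sites that differ between two aligned sequences.
p_distance <- function(a, b) mean(a != b)

n <- nrow(aln)
p <- matrix(0, n, n, dimnames = list(rownames(aln), rownames(aln)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    p[i, j] <- p_distance(aln[i, ], aln[j, ])
  }
}

# Jukes-Cantor correction: estimated substitutions per site given p.
jc <- -0.75 * log(1 - 4 * p / 3)

as.dist(p)   # raw proportions of differing sites
as.dist(jc)  # corrected distances, slightly larger than p
```

The corrected distances are larger than the raw proportions because the correction accounts for sites that may have changed more than once.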

Infer a UPGMA tree based on the pairwise differences between sequences. UPGMA stands for Unweighted Pair Group Method with Arithmetic Mean. This is a hierarchical clustering method, meaning that the two most similar taxa are clustered first, that cluster is then joined to the next most similar taxon or cluster, and so on.

treeUPGMA <- upgma(dm)
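
The clustering rule can be seen on a small example. UPGMA corresponds to average-linkage hierarchical clustering, so the sketch below uses base R's `hclust` with `method = "average"` on a made-up distance matrix for four hypothetical taxa A to D. It illustrates the rule only and is not a substitute for the `upgma` call above.

```r
# Made-up pairwise distances between four hypothetical taxa.
taxa <- c("A", "B", "C", "D")
d <- matrix(c( 0,  2,  6, 10,
               2,  0,  6, 10,
               6,  6,  0, 10,
              10, 10, 10,  0),
            nrow = 4, byrow = TRUE, dimnames = list(taxa, taxa))

# Average linkage on a distance matrix is the UPGMA clustering rule.
hc <- hclust(as.dist(d), method = "average")

hc$merge   # order in which taxa and clusters were joined
hc$height  # average distance at which each join happened
# plot(hc) # uncomment to draw the dendrogram
```

Here A and B join first, then C joins the (A, B) cluster, and D joins last, mirroring the "most similar first" description above.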

Root the tree by the designated outgroup. The outgroup name must exactly match that printed on your tree.

rootedtree <- root(treeUPGMA,outgroup='T.ornatus')

#Plot the rooted UPGMA tree.
plot.phylo(rootedtree, main="Bear Phylogenetic tree")

Calculate the parsimony score.

parsim <- parsimony(treeUPGMA, bears)

This tree requires a minimum of 354 DNA sequence changes.
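
To show where such a number comes from, here is a minimal sketch of Fitch's algorithm, a standard way to count the minimum number of changes a tree requires. It assumes a rooted binary tree written as a nested list and a made-up two-column alignment; the function name `fitch` and the toy data are illustrative and are not taken from `phangorn`.

```r
# Fitch's algorithm for one alignment column on a rooted binary tree.
# A tree is a nested list; a tip is the name of a taxon (a character string).
fitch <- function(node, states) {
  if (is.character(node)) {
    # Tip: the state set is the observed base, with no changes so far.
    return(list(set = states[[node]], changes = 0))
  }
  left  <- fitch(node[[1]], states)
  right <- fitch(node[[2]], states)
  shared <- intersect(left$set, right$set)
  if (length(shared) > 0) {
    # Children agree on at least one state: no extra change needed here.
    list(set = shared, changes = left$changes + right$changes)
  } else {
    # Children disagree: keep both options and count one change.
    list(set = union(left$set, right$set),
         changes = left$changes + right$changes + 1)
  }
}

# Hypothetical tree ((A,B),(C,D)) and two made-up alignment columns.
tree  <- list(list("A", "B"), list("C", "D"))
site1 <- c(A = "G", B = "G", C = "T", D = "T")
site2 <- c(A = "G", B = "T", C = "G", D = "T")

# The parsimony score of the tree is the sum over all columns.
score <- fitch(tree, site1)$changes + fitch(tree, site2)$changes
score
```

The first column fits this tree with one change and the second needs two, so the toy score is 3. The score reported for the real tree is the same kind of sum taken over every column of the alignment.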

SKULL TREE

Follow the steps above to infer a rooted phylogeny for your skull group DNA sequence data. Write your own annotations and code below. We expect your code to be well annotated (#comments). You will lose points for insufficient annotations, and/or not completing all the tasks.

#Sylvilagus floridanus: Cottontail Rabbit
#AY292724.1 

#Marmota monax: Woodchuck
#AF100719.1

#Sciurus carolinensis: Grey Squirrel
#FJ200744.1

#Cynomys ludovicianus: Prairie Dog
#AF157890.1

#Solenodon paradoxus (PARTIAL CDS ONLY: 411bp): Haitian Solenodon
#LN994573.1

#YOUR CODE HERE 
#USE ANNOTATIONS!
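#Read the unaligned skull group sequences into R.
rodentsseqs <- readDNAStringSet("sequence_cottontail.fasta")

#Align the sequences with the Muscle algorithm and print the alignment.
rodentsalignment <- msa(rodentsseqs,method='Muscle')
print(rodentsalignment, show = "alignment")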
## 
## MsaDNAMultipleAlignment with 5 rows and 1140 columns
##     aln                                               names
## [1] -----------------------...----------------------- S.paradoxus
## [2] ATGACCAACATCCGTAAAACCCA...AGAACAAAATCCTCAAATGAAGG S.floridanus
## [3] ATGACAAATACCCGCAAAACCCA...AGAATAAGCTCCTTAAATGAAGA S.carolinensis
## [4] ATGACAAACACCCGCAAAACCCA...AAAACAAACTTCTTAAATGAAGA M.monax
## [5] ATGACAAACACTCGCAAAACCCA...AAAATAAACTTCTTAAATGAAGA C.ludovicianus
## Con ATGACAAACACCCGCAAAACCCA...A?AA?AAACT?CTTAAATGAAGA Consensus
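#Convert the alignment to a DNAStringSet and export it to a fasta file.
rodentsDNAStr <- as(rodentsalignment, "DNAStringSet")
writeXStringSet(rodentsDNAStr, file="rodents_alignment.fasta")

#Read the exported alignment back in as a phyDat object.
rodents <- read.phyDat("rodents_alignment.fasta",
                     format="fasta",type="DNA")

#Calculate pairwise distances between sequences.
rodentsdm <- dist.ml(rodents)

#Infer a UPGMA tree from the pairwise distances.
rodentstreeUPGMA <- upgma(rodentsdm)

#Root the tree with Solenodon paradoxus as the outgroup.
rodentsrootedtree <- root(rodentstreeUPGMA,outgroup="S.paradoxus")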

#Plot the rooted UPGMA tree.
plot.phylo(rodentsrootedtree, main="Rodents Phylogenetic Tree")
Figure 2: Rooted UPGMA tree of the skull group (rodents and a cottontail rabbit), with Solenodon paradoxus as the outgroup. The tree shows the relationships inferred from cytochrome b gene sequences. It requires a minimum of 599 DNA sequence changes.

#Calculate the parsimony score of the UPGMA tree.
rodentsparsim <- parsimony(rodentstreeUPGMA,rodents)

Questions

  1. Was your character-based tree the same as your molecular-based tree? If they were different, how were they different? (2 pts)

I used Solenodon paradoxus as the outgroup in my molecular-based tree because using Sylvilagus floridanus as the outgroup produced a polytomy: some of the relationships could not be resolved.

  2. Why might these two hypotheses about relatedness differ? You might find it helpful to look at the literature about molecular vs. morphological phylogenies. (2 pts)

The molecular phylogeny differs from the morphological phylogeny because it is based on genotype rather than phenotype. Phenotype can vary even when the genes are the same.

  3. If you could redo your character analysis, what are 3 new characters you think could make your analysis better, and why do you think these traits would be good for inferring relatedness? For this question, you may assume that you have access to the entire animal in its natural habitat. (1 pt)

Skull width: Skull width reflects brain development. Marsupials tend to have narrower skulls than other mammals.

Facial muscle attachment: Muscle attachment sites vary with diet and other factors. They are species-specific and help differentiate between species.

Number of teeth: The number of teeth also varies with diet. Lagomorphs have an extra pair of upper incisors that rodents lack. This will further help separate the species from each other.

Rubric

\(\square\) Bear tree (3 points)

\(\square\) Figure caption for bear tree (2 points)

\(\square\) Skull tree (5 points)

\(\square\) Figure caption for skull tree (3 points)

\(\square\) Answers to three questions (5 points)

\(\square\) Well annotated and tidy code (1 point)

\(\square\) Knitted to HTML (1 point)

20 points total