A Complete Bioinformatics Workflow in R

By: Jared H Harris

“Worked example: Building a Phylogeny in R”

Introduction

Describe how phylogeneies can be used in biology (readings will be assigned)

Phylogenies are used in biology to hypothesize the evolutionary relationship between genes, species, or families of organisms. This can be useful in understanding evolution and how genes and species have changed and evolved over time.

Vocab

Make a list of at least 10 vocab terms that are important (don’t have to define) 1. Accession Number 2. Fasta Files 3. Multiple Sequence Alignment 4. Reproducible Work Flow 5. NCIB 6. Pairwise Alignment 7. Genome
8. Amino acid (single letter shortcut) 9. Genetic mutation 10. Phylogeny

Key functions

Make a list of at least 5 key functions Put in the format of package::function 1. Bioconductor :: Biostrings 2. Bioconductor :: msa 3. CRAN :: rentrez 4. CRAN :: entrez_fetch 5. ggmsa :: ggmsa

Software Preliminaires

Add the necessary calls to library() to load call packages Indicate which packages cam from Bioconductor, CRAN, and GitHub

Load packages into memory

# github packages
# devtools::install_github("brouwern/compbio4all")
# devtools::install_github("YuLab-SMU/ggmsa")
library(compbio4all)
library(ggmsa)

# CRAN packages
library(rentrez)
library(seqinr)
library(ape)

# Bioconductor packages
library(BiocManager)
library(Biostrings)
library(msa)

Downloading Macromolecular Sequences

The code chunk here is calling up the protein sequence from the human shroom file in NCBI database and saving it as hShroom3. The entrez_fetch function from the Rentrez package calls up the data using the genetic code’s accession number with the NCBI.

# Human shroom 3 (H. sapiens)
hShroom3 <- rentrez::entrez_fetch(db = "protein", 
                          id = "NP_065910", 
                          rettype = "fasta")

The cat() function is allowing the output of hShroom3 to be viewed in a way that respects the new line command (/n). The output is a continuous string of the amino acids that the shroom protein is comprised of in this organism. This format looks nicer on the screen which will allow it to be read more easily.

cat(hShroom3)

## >NP_065910.3 protein Shroom3 [Homo sapiens]
## MMRTTEDFHKPSATLNSNTATKGRYIYLEAFLEGGAPWGFTLKGGLEHGEPLIISKVEEGGKADTLSSKL
## QAGDEVVHINEVTLSSSRKEAVSLVKGSYKTLRLVVRRDVCTDPGHADTGASNFVSPEHLTSGPQHRKAA
## WSGGVKLRLKHRRSEPAGRPHSWHTTKSGEKQPDASMMQISQGMIGPPWHQSYHSSSSTSDLSNYDHAYL
## RRSPDQCSSQGSMESLEPSGAYPPCHLSPAKSTGSIDQLSHFHNKRDSAYSSFSTSSSILEYPHPGISGR
## ERSGSMDNTSARGGLLEGMRQADIRYVKTVYDTRRGVSAEYEVNSSALLLQGREARASANGQGYDKWSNI
## PRGKGVPPPSWSQQCPSSLETATDNLPPKVGAPLPPARSDSYAAFRHRERPSSWSSLDQKRLCRPQANSL
## GSLKSPFIEEQLHTVLEKSPENSPPVKPKHNYTQKAQPGQPLLPTSIYPVPSLEPHFAQVPQPSVSSNGM
## LYPALAKESGYIAPQGACNKMATIDENGNQNGSGRPGFAFCQPLEHDLLSPVEKKPEATAKYVPSKVHFC
## SVPENEEDASLKRHLTPPQGNSPHSNERKSTHSNKPSSHPHSLKCPQAQAWQAGEDKRSSRLSEPWEGDF
## QEDHNANLWRRLEREGLGQSLSGNFGKTKSAFSSLQNIPESLRRHSSLELGRGTQEGYPGGRPTCAVNTK
## AEDPGRKAAPDLGSHLDRQVSYPRPEGRTGASASFNSTDPSPEEPPAPSHPHTSSLGRRGPGPGSASALQ
## GFQYGKPHCSVLEKVSKFEQREQGSQRPSVGGSGFGHNYRPHRTVSTSSTSGNDFEETKAHIRFSESAEP
## LGNGEQHFKNGELKLEEASRQPCGQQLSGGASDSGRGPQRPDARLLRSQSTFQLSSEPEREPEWRDRPGS
## PESPLLDAPFSRAYRNSIKDAQSRVLGATSFRRRDLELGAPVASRSWRPRPSSAHVGLRSPEASASASPH
## TPRERHSVTPAEGDLARPVPPAARRGARRRLTPEQKKRSYSEPEKMNEVGIVEEAEPAPLGPQRNGMRFP
## ESSVADRRRLFERDGKACSTLSLSGPELKQFQQSALADYIQRKTGKRPTSAAGCSLQEPGPLRERAQSAY
## LQPGPAALEGSGLASASSLSSLREPSLQPRREATLLPATVAETQQAPRDRSSSFAGGRRLGERRRGDLLS
## GANGGTRGTQRGDETPREPSSWGARAGKSMSAEDLLERSDVLAGPVHVRSRSSPATADKRQDVLLGQDSG
## FGLVKDPCYLAGPGSRSLSCSERGQEEMLPLFHHLTPRWGGSGCKAIGDSSVPSECPGTLDHQRQASRTP
## CPRPPLAGTQGLVTDTRAAPLTPIGTPLPSAIPSGYCSQDGQTGRQPLPPYTPAMMHRSNGHTLTQPPGP
## RGCEGDGPEHGVEEGTRKRVSLPQWPPPSRAKWAHAAREDSLPEESSAPDFANLKHYQKQQSLPSLCSTS
## DPDTPLGAPSTPGRISLRISESVLRDSPPPHEDYEDEVFVRDPHPKATSSPTFEPLPPPPPPPPSQETPV
## YSMDDFPPPPPHTVCEAQLDSEDPEGPRPSFNKLSKVTIARERHMPGAAHVVGSQTLASRLQTSIKGSEA
## ESTPPSFMSVHAQLAGSLGGQPAPIQTQSLSHDPVSGTQGLEKKVSPDPQKSSEDIRTEALAKEIVHQDK
## SLADILDPDSRLKTTMDLMEGLFPRDVNLLKENSVKRKAIQRTVSSSGCEGKRNEDKEAVSMLVNCPAYY
## SVSAPKAELLNKIKEMPAEVNEEEEQADVNEKKAELIGSLTHKLETLQEAKGSLLTDIKLNNALGEEVEA
## LISELCKPNEFDKYRMFIGDLDKVVNLLLSLSGRLARVENVLSGLGEDASNEERSSLYEKRKILAGQHED
## ARELKENLDRRERVVLGILANYLSEEQLQDYQHFVKMKSTLLIEQRKLDDKIKLGQEQVKCLLESLPSDF
## IPKAGALALPPNLTSEPIPAGGCTFSGIFPTLTSPL

This code chuck is calling up the mouse shroom genome sequence from NCBI. The fasta files of the amino acid sequence will be stored in each of the files, mShroom3a, hShroom2, and sShroom. This will allow us to later compare the different sets of data side by side.

# Mouse shroom 3a (M. musculus)
mShroom3a <- entrez_fetch(db = "protein", 
                          id = "AAF13269", 
                          rettype = "fasta")

# Human shroom 2 (H. sapiens)
hShroom2 <- entrez_fetch(db = "protein", 
                          id = "CAA58534", 
                          rettype = "fasta")


# Sea-urchin shroom
sShroom <- entrez_fetch(db = "protein", 
                          id = "XP_783573", 
                          rettype = "fasta")

This code chunk is outputting the number of characters in each of the saved shroom files. This will output the number of amino acids in each fasta file.

nchar(hShroom3)

## [1] 2070

nchar(mShroom3a)

## [1] 2083

nchar(sShroom)

## [1] 1758

nchar(hShroom2)

## [1] 1673

Prepping macromolecular sequences

The fasta_cleaner function takes the fasta files that we uploaded and edits them into the proper format for sequences and phylogenetic trees.

library(compbio4all)
fasta_cleaner

## function (fasta_object, parse = TRUE) 
## {
##     fasta_object <- sub("^(>)(.*?)(\\n)(.*)(\\n\\n)", "\\4", 
##         fasta_object)
##     fasta_object <- gsub("\n", "", fasta_object)
##     if (parse == TRUE) {
##         fasta_object <- stringr::str_split(fasta_object, pattern = "", 
##             simplify = FALSE)
##     }
##     return(fasta_object[[1]])
## }
## <bytecode: 0x7fc00557d1a8>
## <environment: namespace:compbio4all>

To add fasta_cleaner to R if you don’t have compbio4all downloaded, you can input devtools :: install_github(“brouwern/compbio4all”) which will install the file to the current R session.

fasta_cleaner <- function(fasta_object, parse = TRUE){

  fasta_object <- sub("^(>)(.*?)(\\n)(.*)(\\n\\n)","\\4",fasta_object)
  fasta_object <- gsub("\n", "", fasta_object)

  if(parse == TRUE){
    fasta_object <- stringr::str_split(fasta_object,
                                       pattern = "",
                                       simplify = FALSE)
  }

  return(fasta_object[[1]])
}

This chunk of code is “cleaning up the shroom files so that the data will be properly formatted to make sequences and phylogentic trees. It is”scrubbing" our data.

hShroom3  <- fasta_cleaner(hShroom3,  parse = F)
mShroom3a <- fasta_cleaner(mShroom3a, parse = F)
hShroom2  <- fasta_cleaner(hShroom2,  parse = F)
sShroom   <- fasta_cleaner(sShroom,   parse = F)

hShroom3

## [1] "MMRTTEDFHKPSATLNSNTATKGRYIYLEAFLEGGAPWGFTLKGGLEHGEPLIISKVEEGGKADTLSSKLQAGDEVVHINEVTLSSSRKEAVSLVKGSYKTLRLVVRRDVCTDPGHADTGASNFVSPEHLTSGPQHRKAAWSGGVKLRLKHRRSEPAGRPHSWHTTKSGEKQPDASMMQISQGMIGPPWHQSYHSSSSTSDLSNYDHAYLRRSPDQCSSQGSMESLEPSGAYPPCHLSPAKSTGSIDQLSHFHNKRDSAYSSFSTSSSILEYPHPGISGRERSGSMDNTSARGGLLEGMRQADIRYVKTVYDTRRGVSAEYEVNSSALLLQGREARASANGQGYDKWSNIPRGKGVPPPSWSQQCPSSLETATDNLPPKVGAPLPPARSDSYAAFRHRERPSSWSSLDQKRLCRPQANSLGSLKSPFIEEQLHTVLEKSPENSPPVKPKHNYTQKAQPGQPLLPTSIYPVPSLEPHFAQVPQPSVSSNGMLYPALAKESGYIAPQGACNKMATIDENGNQNGSGRPGFAFCQPLEHDLLSPVEKKPEATAKYVPSKVHFCSVPENEEDASLKRHLTPPQGNSPHSNERKSTHSNKPSSHPHSLKCPQAQAWQAGEDKRSSRLSEPWEGDFQEDHNANLWRRLEREGLGQSLSGNFGKTKSAFSSLQNIPESLRRHSSLELGRGTQEGYPGGRPTCAVNTKAEDPGRKAAPDLGSHLDRQVSYPRPEGRTGASASFNSTDPSPEEPPAPSHPHTSSLGRRGPGPGSASALQGFQYGKPHCSVLEKVSKFEQREQGSQRPSVGGSGFGHNYRPHRTVSTSSTSGNDFEETKAHIRFSESAEPLGNGEQHFKNGELKLEEASRQPCGQQLSGGASDSGRGPQRPDARLLRSQSTFQLSSEPEREPEWRDRPGSPESPLLDAPFSRAYRNSIKDAQSRVLGATSFRRRDLELGAPVASRSWRPRPSSAHVGLRSPEASASASPHTPRERHSVTPAEGDLARPVPPAARRGARRRLTPEQKKRSYSEPEKMNEVGIVEEAEPAPLGPQRNGMRFPESSVADRRRLFERDGKACSTLSLSGPELKQFQQSALADYIQRKTGKRPTSAAGCSLQEPGPLRERAQSAYLQPGPAALEGSGLASASSLSSLREPSLQPRREATLLPATVAETQQAPRDRSSSFAGGRRLGERRRGDLLSGANGGTRGTQRGDETPREPSSWGARAGKSMSAEDLLERSDVLAGPVHVRSRSSPATADKRQDVLLGQDSGFGLVKDPCYLAGPGSRSLSCSERGQEEMLPLFHHLTPRWGGSGCKAIGDSSVPSECPGTLDHQRQASRTPCPRPPLAGTQGLVTDTRAAPLTPIGTPLPSAIPSGYCSQDGQTGRQPLPPYTPAMMHRSNGHTLTQPPGPRGCEGDGPEHGVEEGTRKRVSLPQWPPPSRAKWAHAAREDSLPEESSAPDFANLKHYQKQQSLPSLCSTSDPDTPLGAPSTPGRISLRISESVLRDSPPPHEDYEDEVFVRDPHPKATSSPTFEPLPPPPPPPPSQETPVYSMDDFPPPPPHTVCEAQLDSEDPEGPRPSFNKLSKVTIARERHMPGAAHVVGSQTLASRLQTSIKGSEAESTPPSFMSVHAQLAGSLGGQPAPIQTQSLSHDPVSGTQGLEKKVSPDPQKSSEDIRTEALAKEIVHQDKSLADILDPDSRLKTTMDLMEGLFPRDVNLLKENSVKRKAIQRTVSSSGCEGKRNEDKEAVSMLVNCPAYYSVSAPKAELLNKIKEMPAEVNEEEEQADVNEKKAELIGSLTHKLETLQEAKGSLLTDIKLNNALGEEVEALISELCKPNEFDKYRMFIGDLDKVVNLLLSLSGRLARVENVLSGLGEDASNEERSSLYEKRKILAGQHEDARELKENLDRRERVVLGILANYLSEEQLQDYQHFVKMKSTLLIEQRKLDDKIKLGQEQVKCLLESLPSDFIPKAGALALPPNLTSEPIPAGGCTFSGIFPTLTSPL"

Aligning Sequences: Pairwise Alignments

This code chunk is creating a pairwise alignment using the human and mouse shroom sequences. This will allow us to create a mulitple sequence alignment in later code chunks.

align.h3.vs.m3a <- Biostrings::pairwiseAlignment(
  hShroom3, 
  mShroom3a)

This object shows the pairwise alignment of the human shroom amino acids and mouse shroom amino acids. It also shows the score of the alignment.

align.h3.vs.m3a

## Global PairwiseAlignmentsSingleSubject (1 of 1)
## pattern: MMRTTEDFHKPSATLN-SNTATKGRYIYLEAFLE...KAGALALPPNLTSEPIPAGGCTFSGIFPTLTSPL
## subject: MK-TPENLEEPSATPNPSRTPTE-RFVYLEALLE...KAGAISLPPALTGHATPGGTSVFGGVFPTLTSPL
## score: 2189.934

The pid() function shows the percent identification of the aligned sequences. This basically means that the output is the percentage of amino acids that the two sequences have in common.

# add necessary function *No idea if this is what he wanted to be added here-check at office hours
Biostrings:: pid(align.h3.vs.m3a)

## [1] 70.56511

This code chunk is aligning the sequences of human shroom protein amino acids. It is creating a new save file for the alignment between the two different human shroom proteins that will give the output of the pairwise alignment.

align.h3.vs.h2 <- Biostrings::pairwiseAlignment(
                  hShroom3,
                  hShroom2)

This code chunk gives the score of the pairwise alignment.

score(align.h3.vs.h2)

## [1] -5673.853

The score is a mathematical summary of the alignment. The score is used to compare with other scores. PID is percent identity. It is a rough estimate of how similar two sequences are and can be reported as a decimal or a percentage.

Biostrings::pid(align.h3.vs.h2)

## [1] 33.83277

The Shroom Family of Genes

There is a whole table here in order to create a vector of a bunch of the shroom sequences from different organisms. It takes the raw data and makes a nice table. It will allow us to create a multiple sequence alignment with the data.

shroom_table <- c("CAA78718" , "X. laevis Apx" ,         "xShroom1",
            "NP_597713" , "H. sapiens APXL2" ,     "hShroom1",
            "CAA58534" , "H. sapiens APXL",        "hShroom2",
            "ABD19518" , "M. musculus Apxl" ,      "mShroom2",
            "AAF13269" , "M. musculus ShroomL" ,   "mShroom3a",
            "AAF13270" , "M. musculus ShroomS" ,   "mShroom3b",
            "NP_065910", "H. sapiens Shroom" ,     "hShroom3",
            "ABD59319" , "X. laevis Shroom-like",  "xShroom3",
            "NP_065768", "H. sapiens KIAA1202" ,   "hShroom4a",
            "AAK95579" , "H. sapiens SHAP-A" ,     "hShroom4b",
            #"DQ435686" , "M. musculus KIAA1202" ,  "mShroom4",
            "ABA81834" , "D. melanogaster Shroom", "dmShroom",
            "EAA12598" , "A. gambiae Shroom",      "agShroom",
            "XP_392427" , "A. mellifera Shroom" ,  "amShroom",
            "XP_783573" , "S. purpuratus Shroom" , "spShroom") #sea urchin

This next code chunk is converting the data into different forms. We create a matrix first, then a data frame. From there we change the names of the columns and change the scientific names in the data to simplified species names that will make the data easier to read.

# convert to Matrix
shroom_table_matrix <- matrix(shroom_table,
                                  byrow = T,
                                  nrow = 14)
# convert to Data Frame
shroom_table <- data.frame(shroom_table_matrix, 
                     stringsAsFactors = F)

# Naming Columns
names(shroom_table) <- c("accession", "name.orig","name.new")

# Create simplified species names
shroom_table$spp <- "Homo"
shroom_table$spp[grep("laevis",shroom_table$name.orig)] <- "Xenopus"
shroom_table$spp[grep("musculus",shroom_table$name.orig)] <- "Mus"
shroom_table$spp[grep("melanogaster",shroom_table$name.orig)] <- "Drosophila"
shroom_table$spp[grep("gambiae",shroom_table$name.orig)] <- "mosquito"
shroom_table$spp[grep("mellifera",shroom_table$name.orig)] <- "bee"
shroom_table$spp[grep("purpuratus",shroom_table$name.orig)] <- "sea urchin"

This code chunk is will output the table that was created in the previous chunk, displaying the matrix with the updated names from the last code chunk.

shroom_table

##    accession              name.orig  name.new        spp
## 1   CAA78718          X. laevis Apx  xShroom1    Xenopus
## 2  NP_597713       H. sapiens APXL2  hShroom1       Homo
## 3   CAA58534        H. sapiens APXL  hShroom2       Homo
## 4   ABD19518       M. musculus Apxl  mShroom2        Mus
## 5   AAF13269    M. musculus ShroomL mShroom3a        Mus
## 6   AAF13270    M. musculus ShroomS mShroom3b        Mus
## 7  NP_065910      H. sapiens Shroom  hShroom3       Homo
## 8   ABD59319  X. laevis Shroom-like  xShroom3    Xenopus
## 9  NP_065768    H. sapiens KIAA1202 hShroom4a       Homo
## 10  AAK95579      H. sapiens SHAP-A hShroom4b       Homo
## 11  ABA81834 D. melanogaster Shroom  dmShroom Drosophila
## 12  EAA12598      A. gambiae Shroom  agShroom   mosquito
## 13 XP_392427    A. mellifera Shroom  amShroom        bee
## 14 XP_783573   S. purpuratus Shroom  spShroom sea urchin

Aligning multiple sequences

The `$` allows us to call up a specific column within the table and display it in R by itself. In this case, the output is the “accession” column.

shroom_table$accession

##  [1] "CAA78718"  "NP_597713" "CAA58534"  "ABD19518"  "AAF13269"  "AAF13270" 
##  [7] "NP_065910" "ABD59319"  "NP_065768" "AAK95579"  "ABA81834"  "EAA12598" 
## [13] "XP_392427" "XP_783573"

This code chunk contains the entrez_fetch function which is taking the accession column from the saved file “shroom_table” and inputting that data into a new save file called “shrooms.”

# add necessary function *I think this is right
shrooms <-  rentrez :: entrez_fetch(db = "protein", 
                          id = shroom_table$accession, 
                          rettype = "fasta")

This code chunk uses the cat() function in order to create a better looking output of the data.

cat(shrooms)

This code chunk is using the entrez_fetch_list function to compile the shroom sequences into a single vector. It is different from the previous chunks because it pulls the entire list of shroom sequences we have used as opposed to a single one.

shrooms_list <- entrez_fetch_list(db = "protein", 
                          id = shroom_table$accession, 
                          rettype = "fasta")
is(shrooms_list)

## [1] "list"             "vector"           "list_OR_List"     "vector_OR_Vector"
## [5] "vector_OR_factor"

length(shrooms_list)

## [1] 14

nchar(shrooms_list)

##  CAA78718 NP_597713  CAA58534  ABD19518  AAF13269  AAF13270 NP_065910  ABD59319 
##      1486       915      1673      1543      2083      1895      2070      1864 
## NP_065768  AAK95579  ABA81834  EAA12598 XP_392427 XP_783573 
##      1560       778      1647       750      2230      1758

This code chunk is necessary to see the length of the list created (shrooms_list) and it allows us to see the length of the sequence.

length(shrooms_list)

## [1] 14

The for() function allows for a task to be repeated mulitple times. This used to allow the fasta_cleaner to clean the data mulitple times.

for(i in 1:length(shrooms_list)){
  shrooms_list[[i]] <- fasta_cleaner(shrooms_list[[i]], parse = F)
}

The rep() function is used to replicate values. The for() function, also known a a for loop is used to repeat a task mulitple times, and in this instance it is being used to clean to the data multiple times. The names() function is being used to add names to the newly created shrooms vector.

# Replication
shrooms_vector <- rep(NA, length(shrooms_list))

# For loop cleaning
for(i in 1:length(shrooms_vector)){
  shrooms_vector[i] <- shrooms_list[[i]]
}

#  Adding names
names(shrooms_vector) <- names(shrooms_list)

The function AAStringSet is used to clean the data. It converts our vector into a string set. It is not important to know how it works because it is not very straight forward.

# 
shrooms_vector_ss <- Biostrings:: AAStringSet       (shrooms_vector)

MSA

An MSA is a Multiple Sequence Alignment that allows us to comapre data side by side. It will output many different genetic sequnces that we can input and the function will output them in an MSA that compiles them all side by side.

Building a Multple Sequence Alignment (MSA)

This sequence is creating a mulitple sequence alignment from the data set “shrooms_align” that we created earlier.

shrooms_align <-  msa(shrooms_vector_ss,
                     method = "ClustalW")

## use default substitution matrix

Viewing an MSA

This section will allow us to see the MSA that we created and actually view the data we collected in one single dataframe.

Viewing an MSA in R

The output below is a mulitple sequence alignment of the shroom data.

shrooms_align

## CLUSTAL 2.1  
## 
## Call:
##    msa(shrooms_vector_ss, method = "ClustalW")
## 
## MsaAAMultipleAlignment with 14 rows and 2252 columns
##      aln                                                   names
##  [1] -------------------------...------------------------- NP_065768
##  [2] -------------------------...------------------------- AAK95579
##  [3] -------------------------...SVFGGVFPTLTSPL----------- AAF13269
##  [4] -------------------------...SVFGGVFPTLTSPL----------- AAF13270
##  [5] -------------------------...CTFSGIFPTLTSPL----------- NP_065910
##  [6] -------------------------...NKS--LPPPLTSSL----------- ABD59319
##  [7] -------------------------...------------------------- CAA58534
##  [8] -------------------------...------------------------- ABD19518
##  [9] -------------------------...LT----------------------- NP_597713
## [10] -------------------------...------------------------- CAA78718
## [11] -------------------------...------------------------- EAA12598
## [12] -------------------------...------------------------- ABA81834
## [13] MTELQPSPPGYRVQDEAPGPPSCPP...------------------------- XP_392427
## [14] -------------------------...AATSSSSNGIGGPEQLNSNATSSYC XP_783573
##  Con -------------------------...------------------------- Consensus

This code chunk is more data preparation. The first function is creating a class name for shrooms_align. The second function is converting the mulitple sequence alignment into a different format that will allow us to more easily print our msa.

# This line give the output "AAMultipleAlignment" when "class(shrooms_align) is the input.
class(shrooms_align) <- "AAMultipleAlignment"

# This line of code is converting the data into a form that we can use to print the msa.
shrooms_align_seqinr <- msaConvert(shrooms_align, type = "seqinr::alignment")

The output of this is an ugly version of the MSA.

print_msa(alignment = shrooms_align_seqinr, 
          chunksize = 60)

Displaying an MSA XXXXXXXX

The ggmsa function creates a colorful MSA using the data. This is far different from the output of the last code chunk which was a long, confusing string of characters.

library(ggmsa)
ggmsa :: ggmsa(shrooms_align,   # shrooms_align, NOT shrooms_align_seqinr
      start = 2000, 
      end = 2100)

Saving an MSA as PDF

This command works to edit the MSA and allow us to download it as a PDF. This did not run for me due to a problem with latex (office hours 9/27).

# msa::msaPrettyPrint(shrooms_align,             # alignment
               file = "shroom_msa.pdf",   # file name
               y=c(2000, 2100),           # range
               askForOverwrite=FALSE)

This command is the get working directory funciton. The working directory is where everything is saved in R. This will save the MSA to R.

getwd()

## [1] "/Users/jaredharris/Downloads"