“Worked example: Building a phylogeny in R”

Introduction

In biology, phylogenies can be used to study evolutionary relationships among different species of organisms. Phylogenies can be made using morphological data or molecular data (nucleic acids or proteins). Scientists use phylogenetic trees to visualize evolutionary relationships among different species, and can also show evolutionary.

Vocab

package, function, dependency, accession number, pairwise alignment, multiple sequence alignment, phylogenetic tree, reproducible workflow, YAML header, knitr header

Key functions

rentrez::entrez_fetch Biostrings::pairwiseAlignment Biostrings::pid compbio4all::fasta_cleaner compbio4all::entrez_fetch_list compbio4all::print_msa

Software Preliminaires

CRAN : install.packages(“PackageName”) function Bioconductor: first install.packages(“BiocManager”), then library(BiocManager), and then to install a package within Bioconductor use the install(“PackageName”) function Github: first install.packages(“devtools”), then library(devtools), and then install_github(“DeveloperName/PackageName”)

Load packages into memory

# github packages
library(compbio4all)

# CRAN packages
library(rentrez)
library(seqinr)
library(ape)

# Bioconductor packages
library(msa)
library(Biostrings)

Retrieving macromolecular sequences

In this chunk of code, you are using the entrez_fetch function to retrieve fasta files from NCBI databases on the internet. rentrez :: entrez_fetch

# Human shroom 3 (H. sapiens)
hShroom3 <- entrez_fetch(db = "protein", 
                          id = "NP_065910", 
                          rettype = "fasta")

The cat() function respects the newline character . Without cat(), Rstudio will not respect the newline character and the code will remain on one line.

cat(hShroom3)

## >NP_065910.3 protein Shroom3 [Homo sapiens]
## MMRTTEDFHKPSATLNSNTATKGRYIYLEAFLEGGAPWGFTLKGGLEHGEPLIISKVEEGGKADTLSSKL
## QAGDEVVHINEVTLSSSRKEAVSLVKGSYKTLRLVVRRDVCTDPGHADTGASNFVSPEHLTSGPQHRKAA
## WSGGVKLRLKHRRSEPAGRPHSWHTTKSGEKQPDASMMQISQGMIGPPWHQSYHSSSSTSDLSNYDHAYL
## RRSPDQCSSQGSMESLEPSGAYPPCHLSPAKSTGSIDQLSHFHNKRDSAYSSFSTSSSILEYPHPGISGR
## ERSGSMDNTSARGGLLEGMRQADIRYVKTVYDTRRGVSAEYEVNSSALLLQGREARASANGQGYDKWSNI
## PRGKGVPPPSWSQQCPSSLETATDNLPPKVGAPLPPARSDSYAAFRHRERPSSWSSLDQKRLCRPQANSL
## GSLKSPFIEEQLHTVLEKSPENSPPVKPKHNYTQKAQPGQPLLPTSIYPVPSLEPHFAQVPQPSVSSNGM
## LYPALAKESGYIAPQGACNKMATIDENGNQNGSGRPGFAFCQPLEHDLLSPVEKKPEATAKYVPSKVHFC
## SVPENEEDASLKRHLTPPQGNSPHSNERKSTHSNKPSSHPHSLKCPQAQAWQAGEDKRSSRLSEPWEGDF
## QEDHNANLWRRLEREGLGQSLSGNFGKTKSAFSSLQNIPESLRRHSSLELGRGTQEGYPGGRPTCAVNTK
## AEDPGRKAAPDLGSHLDRQVSYPRPEGRTGASASFNSTDPSPEEPPAPSHPHTSSLGRRGPGPGSASALQ
## GFQYGKPHCSVLEKVSKFEQREQGSQRPSVGGSGFGHNYRPHRTVSTSSTSGNDFEETKAHIRFSESAEP
## LGNGEQHFKNGELKLEEASRQPCGQQLSGGASDSGRGPQRPDARLLRSQSTFQLSSEPEREPEWRDRPGS
## PESPLLDAPFSRAYRNSIKDAQSRVLGATSFRRRDLELGAPVASRSWRPRPSSAHVGLRSPEASASASPH
## TPRERHSVTPAEGDLARPVPPAARRGARRRLTPEQKKRSYSEPEKMNEVGIVEEAEPAPLGPQRNGMRFP
## ESSVADRRRLFERDGKACSTLSLSGPELKQFQQSALADYIQRKTGKRPTSAAGCSLQEPGPLRERAQSAY
## LQPGPAALEGSGLASASSLSSLREPSLQPRREATLLPATVAETQQAPRDRSSSFAGGRRLGERRRGDLLS
## GANGGTRGTQRGDETPREPSSWGARAGKSMSAEDLLERSDVLAGPVHVRSRSSPATADKRQDVLLGQDSG
## FGLVKDPCYLAGPGSRSLSCSERGQEEMLPLFHHLTPRWGGSGCKAIGDSSVPSECPGTLDHQRQASRTP
## CPRPPLAGTQGLVTDTRAAPLTPIGTPLPSAIPSGYCSQDGQTGRQPLPPYTPAMMHRSNGHTLTQPPGP
## RGCEGDGPEHGVEEGTRKRVSLPQWPPPSRAKWAHAAREDSLPEESSAPDFANLKHYQKQQSLPSLCSTS
## DPDTPLGAPSTPGRISLRISESVLRDSPPPHEDYEDEVFVRDPHPKATSSPTFEPLPPPPPPPPSQETPV
## YSMDDFPPPPPHTVCEAQLDSEDPEGPRPSFNKLSKVTIARERHMPGAAHVVGSQTLASRLQTSIKGSEA
## ESTPPSFMSVHAQLAGSLGGQPAPIQTQSLSHDPVSGTQGLEKKVSPDPQKSSEDIRTEALAKEIVHQDK
## SLADILDPDSRLKTTMDLMEGLFPRDVNLLKENSVKRKAIQRTVSSSGCEGKRNEDKEAVSMLVNCPAYY
## SVSAPKAELLNKIKEMPAEVNEEEEQADVNEKKAELIGSLTHKLETLQEAKGSLLTDIKLNNALGEEVEA
## LISELCKPNEFDKYRMFIGDLDKVVNLLLSLSGRLARVENVLSGLGEDASNEERSSLYEKRKILAGQHED
## ARELKENLDRRERVVLGILANYLSEEQLQDYQHFVKMKSTLLIEQRKLDDKIKLGQEQVKCLLESLPSDF
## IPKAGALALPPNLTSEPIPAGGCTFSGIFPTLTSPL

In this chunk of code, R using the entrez_fetch function to retrieve the sequences of the 3 shroom proteins from the NCBI database, in FASTA format.

# Mouse shroom 3a (M. musculus)
mShroom3a <- entrez_fetch(db = "protein", 
                          id = "AAF13269", 
                          rettype = "fasta")

# Human shroom 2 (H. sapiens)
hShroom2 <- entrez_fetch(db = "protein", 
                          id = "CAA58534", 
                          rettype = "fasta")


# Sea-urchin shroom
sShroom <- entrez_fetch(db = "protein", 
                          id = "XP_783573", 
                          rettype = "fasta")

This chunk of code is telling you the number of amino acids in each protein. In general, the nchar() function tells you the number of characters in an object.

nchar(hShroom3)

## [1] 2070

nchar(mShroom3a)

## [1] 2083

nchar(sShroom)

## [1] 1758

nchar(hShroom2)

## [1] 1673

Prepping macromolecular sequences

The fasta_cleaner function converts a FASTA file stored as an object into a vector, making it friendly to do alignments in R.

fasta_cleaner

## function (fasta_object, parse = TRUE) 
## {
##     fasta_object <- sub("^(>)(.*?)(\\n)(.*)(\\n\\n)", "\\4", 
##         fasta_object)
##     fasta_object <- gsub("\n", "", fasta_object)
##     if (parse == TRUE) {
##         fasta_object <- stringr::str_split(fasta_object, pattern = "", 
##             simplify = FALSE)
##     }
##     return(fasta_object[[1]])
## }
## <bytecode: 0x7f99595b1660>
## <environment: namespace:compbio4all>

Use function() in order to define a new function that would essentially accomplish the same goal as the preexisting one contained in a different package. The function() function is is Base R so no packages are needed to use it.

fasta_cleaner <- function(fasta_object, parse = TRUE){

  fasta_object <- sub("^(>)(.*?)(\\n)(.*)(\\n\\n)","\\4",fasta_object)
  fasta_object <- gsub("\n", "", fasta_object)

  if(parse == TRUE){
    fasta_object <- stringr::str_split(fasta_object,
                                       pattern = "",
                                       simplify = FALSE)
  }

  return(fasta_object[[1]])
}

In this code chunk, R takes the fasta sequences of each shroom proteins and converts them into a character vector.

hShroom3  <- fasta_cleaner(hShroom3,  parse = F)
mShroom3a <- fasta_cleaner(mShroom3a, parse = F)
hShroom2  <- fasta_cleaner(hShroom2,  parse = F)
sShroom   <- fasta_cleaner(sShroom,   parse = F)

hShroom3

## [1] "MMRTTEDFHKPSATLNSNTATKGRYIYLEAFLEGGAPWGFTLKGGLEHGEPLIISKVEEGGKADTLSSKLQAGDEVVHINEVTLSSSRKEAVSLVKGSYKTLRLVVRRDVCTDPGHADTGASNFVSPEHLTSGPQHRKAAWSGGVKLRLKHRRSEPAGRPHSWHTTKSGEKQPDASMMQISQGMIGPPWHQSYHSSSSTSDLSNYDHAYLRRSPDQCSSQGSMESLEPSGAYPPCHLSPAKSTGSIDQLSHFHNKRDSAYSSFSTSSSILEYPHPGISGRERSGSMDNTSARGGLLEGMRQADIRYVKTVYDTRRGVSAEYEVNSSALLLQGREARASANGQGYDKWSNIPRGKGVPPPSWSQQCPSSLETATDNLPPKVGAPLPPARSDSYAAFRHRERPSSWSSLDQKRLCRPQANSLGSLKSPFIEEQLHTVLEKSPENSPPVKPKHNYTQKAQPGQPLLPTSIYPVPSLEPHFAQVPQPSVSSNGMLYPALAKESGYIAPQGACNKMATIDENGNQNGSGRPGFAFCQPLEHDLLSPVEKKPEATAKYVPSKVHFCSVPENEEDASLKRHLTPPQGNSPHSNERKSTHSNKPSSHPHSLKCPQAQAWQAGEDKRSSRLSEPWEGDFQEDHNANLWRRLEREGLGQSLSGNFGKTKSAFSSLQNIPESLRRHSSLELGRGTQEGYPGGRPTCAVNTKAEDPGRKAAPDLGSHLDRQVSYPRPEGRTGASASFNSTDPSPEEPPAPSHPHTSSLGRRGPGPGSASALQGFQYGKPHCSVLEKVSKFEQREQGSQRPSVGGSGFGHNYRPHRTVSTSSTSGNDFEETKAHIRFSESAEPLGNGEQHFKNGELKLEEASRQPCGQQLSGGASDSGRGPQRPDARLLRSQSTFQLSSEPEREPEWRDRPGSPESPLLDAPFSRAYRNSIKDAQSRVLGATSFRRRDLELGAPVASRSWRPRPSSAHVGLRSPEASASASPHTPRERHSVTPAEGDLARPVPPAARRGARRRLTPEQKKRSYSEPEKMNEVGIVEEAEPAPLGPQRNGMRFPESSVADRRRLFERDGKACSTLSLSGPELKQFQQSALADYIQRKTGKRPTSAAGCSLQEPGPLRERAQSAYLQPGPAALEGSGLASASSLSSLREPSLQPRREATLLPATVAETQQAPRDRSSSFAGGRRLGERRRGDLLSGANGGTRGTQRGDETPREPSSWGARAGKSMSAEDLLERSDVLAGPVHVRSRSSPATADKRQDVLLGQDSGFGLVKDPCYLAGPGSRSLSCSERGQEEMLPLFHHLTPRWGGSGCKAIGDSSVPSECPGTLDHQRQASRTPCPRPPLAGTQGLVTDTRAAPLTPIGTPLPSAIPSGYCSQDGQTGRQPLPPYTPAMMHRSNGHTLTQPPGPRGCEGDGPEHGVEEGTRKRVSLPQWPPPSRAKWAHAAREDSLPEESSAPDFANLKHYQKQQSLPSLCSTSDPDTPLGAPSTPGRISLRISESVLRDSPPPHEDYEDEVFVRDPHPKATSSPTFEPLPPPPPPPPSQETPVYSMDDFPPPPPHTVCEAQLDSEDPEGPRPSFNKLSKVTIARERHMPGAAHVVGSQTLASRLQTSIKGSEAESTPPSFMSVHAQLAGSLGGQPAPIQTQSLSHDPVSGTQGLEKKVSPDPQKSSEDIRTEALAKEIVHQDKSLADILDPDSRLKTTMDLMEGLFPRDVNLLKENSVKRKAIQRTVSSSGCEGKRNEDKEAVSMLVNCPAYYSVSAPKAELLNKIKEMPAEVNEEEEQADVNEKKAELIGSLTHKLETLQEAKGSLLTDIKLNNALGEEVEALISELCKPNEFDKYRMFIGDLDKVVNLLLSLSGRLARVENVLSGLGEDASNEERSSLYEKRKILAGQHEDARELKENLDRRERVVLGILANYLSEEQLQDYQHFVKMKSTLLIEQRKLDDKIKLGQEQVKCLLESLPSDFIPKAGALALPPNLTSEPIPAGGCTFSGIFPTLTSPL"

Aligning sequences

Pairwise Alignment

The code below is doing a pairwise alignment of the proteins hShroom3 and mShroom3a and storing the result in a variable called align.h3.vs.m3a.

# add necessary function
library(Biostrings)
align.h3.vs.m3a <- pairwiseAlignment(
                  hShroom3,
                  mShroom3a)

This object shows the global pairwise alignment between hShroom3 and mShroom3a, and also gives the score of the alignment. You can see all of the indels.

align.h3.vs.m3a

## Global PairwiseAlignmentsSingleSubject (1 of 1)
## pattern: MMRTTEDFHKPSATLN-SNTATKGRYIYLEAFLE...KAGALALPPNLTSEPIPAGGCTFSGIFPTLTSPL
## subject: MK-TPENLEEPSATPNPSRTPTE-RFVYLEALLE...KAGAISLPPALTGHATPGGTSVFGGVFPTLTSPL
## score: 2189.934

This code chunk just displays the score of the alignment without the alignment itself.

# add necessary function
Biostrings:: score(align.h3.vs.m3a)

## [1] 2189.934

This code chunk shows all the indels (insertions or deletions) in addition to the alignment score whereas the previous code chunk just shows the score of the alignment.

align.h3.vs.h2 <- Biostrings::pairwiseAlignment(
                  hShroom3,
                  hShroom2)

The score() function only shows the alignment score while in the previous code chunk, the output included the full pairwise alignment and you could see all of the indels.

score(align.h3.vs.h2)

## [1] -5673.853

pid() calculates the percent or proportion of sequence identity for a pairwise sequence alignment, and doesn’t include indels. The score is a quantitative, numerical value and not a proportion.

Biostrings::pid(align.h3.vs.h2)

## [1] 33.83277

The shroom family of genes

The full table is included here so that you can reference the accession numbers and names for each protein.

shroom_table <- c("CAA78718" , "X. laevis Apx" ,         "xShroom1",
            "NP_597713" , "H. sapiens APXL2" ,     "hShroom1",
            "CAA58534" , "H. sapiens APXL",        "hShroom2",
            "ABD19518" , "M. musculus Apxl" ,      "mShroom2",
            "AAF13269" , "M. musculus ShroomL" ,   "mShroom3a",
            "AAF13270" , "M. musculus ShroomS" ,   "mShroom3b",
            "NP_065910", "H. sapiens Shroom" ,     "hShroom3",
            "ABD59319" , "X. laevis Shroom-like",  "xShroom3",
            "NP_065768", "H. sapiens KIAA1202" ,   "hShroom4a",
            "AAK95579" , "H. sapiens SHAP-A" ,     "hShroom4b",
            #"DQ435686" , "M. musculus KIAA1202" ,  "mShroom4",
            "ABA81834" , "D. melanogaster Shroom", "dmShroom",
            "EAA12598" , "A. gambiae Shroom",      "agShroom",
            "XP_392427" , "A. mellifera Shroom" ,  "amShroom",
            "XP_783573" , "S. purpuratus Shroom" , "spShroom") #sea urchin

In this code chunk, the set of values contained in shroom_table are converted into a matrix, and then a data frame, which is then made easier to read by naming columns and creating more simplified species names.

# convert to matrix
shroom_table_matrix <- matrix(shroom_table,
                                  byrow = T,
                                  nrow = 14)
# taking the values contained into shroom_table and creating a matrix using them

# convert to data frame
shroom_table <- data.frame(shroom_table_matrix, 
                     stringsAsFactors = F)
# converted the matrix into a data frame

# naming columns
names(shroom_table) <- c("accession","name.orig","name.new")
# assinging names to each column in the table

# Create simplified species names
shroom_table$spp <- "Homo"
shroom_table$spp[grep("laevis",shroom_table$name.orig)] <- "Xenopus"
shroom_table$spp[grep("musculus",shroom_table$name.orig)] <- "Mus"
shroom_table$spp[grep("melanogaster",shroom_table$name.orig)] <- "Drosophila"
shroom_table$spp[grep("gambiae",shroom_table$name.orig)] <- "mosquito"
shroom_table$spp[grep("mellifera",shroom_table$name.orig)] <- "bee"
shroom_table$spp[grep("purpuratus",shroom_table$name.orig)] <- "sea urchin"
# updating the names of the species to more simplified ones in the column spp

This line of code is displaying the data frame containing shroom_table_matrix. In the table, the columns are accession, name.orig, name.new, and spp, all of which are different names/classifications of each protein.

shroom_table

##    accession              name.orig  name.new        spp
## 1   CAA78718          X. laevis Apx  xShroom1    Xenopus
## 2  NP_597713       H. sapiens APXL2  hShroom1       Homo
## 3   CAA58534        H. sapiens APXL  hShroom2       Homo
## 4   ABD19518       M. musculus Apxl  mShroom2        Mus
## 5   AAF13269    M. musculus ShroomL mShroom3a        Mus
## 6   AAF13270    M. musculus ShroomS mShroom3b        Mus
## 7  NP_065910      H. sapiens Shroom  hShroom3       Homo
## 8   ABD59319  X. laevis Shroom-like  xShroom3    Xenopus
## 9  NP_065768    H. sapiens KIAA1202 hShroom4a       Homo
## 10  AAK95579      H. sapiens SHAP-A hShroom4b       Homo
## 11  ABA81834 D. melanogaster Shroom  dmShroom Drosophila
## 12  EAA12598      A. gambiae Shroom  agShroom   mosquito
## 13 XP_392427    A. mellifera Shroom  amShroom        bee
## 14 XP_783573   S. purpuratus Shroom  spShroom sea urchin

Indexing multiple sequences

The $ allows us to specifically index certain rows or columns of data.

shroom_table$accession

##  [1] "CAA78718"  "NP_597713" "CAA58534"  "ABD19518"  "AAF13269"  "AAF13270" 
##  [7] "NP_065910" "ABD59319"  "NP_065768" "AAK95579"  "ABA81834"  "EAA12598" 
## [13] "XP_392427" "XP_783573"

This code chunk fetches the protein sequences from the shroom table by indexing the accession numbers.

# add necessary function
library(rentrez)
shrooms <- entrez_fetch(db = "protein", 
                        id = shroom_table$accession, 
                        rettype = "fasta")

This code chunk is displaying all of the protein sequences with newlines.

cat(shrooms)

This code chunk uses the entrez_fetch_list function and all of the shroom protein sequences are displayed separated by the accession number, while the previous code chunk using the entrez_fetch function just displayed all of the amino acid sequences in one large chunk, not separated.shr

shrooms_list <- entrez_fetch_list(db = "protein", 
                          id = shroom_table$accession, 
                          rettype = "fasta")
is(shrooms_list)

## [1] "list"             "vector"           "list_OR_List"     "vector_OR_Vector"
## [5] "vector_OR_factor"

length(shrooms_list)

## [1] 14

nchar(shrooms_list)

##  CAA78718 NP_597713  CAA58534  ABD19518  AAF13269  AAF13270 NP_065910  ABD59319 
##      1486       915      1673      1543      2083      1895      2070      1864 
## NP_065768  AAK95579  ABA81834  EAA12598 XP_392427 XP_783573 
##      1560       778      1647       750      2230      1758

This line of code returns the number of components in shrooms_list. We use it to double check and make sure the number of components are what we intended for it to be.

length(shrooms_list)

## [1] 14

The for loop makes the code more efficient so that you only have to write one line of code in order to clean all the fasta files in shrooms_listshr.

for(i in 1:length(shrooms_list)){
  shrooms_list[[i]] <- fasta_cleaner(shrooms_list[[i]], parse = F)
}

In summary, this code chunk is reassigning the values of shrooms_list to a new variable called shrooms_vector, since the fasta cleaner function converted the object shrooms_list into a readable vector.

# XXXXXXXXCX
shrooms_vector <- rep(NA, length(shrooms_list))
# displays each component of shrooms_list

# XXXXXXXXCX
for(i in 1:length(shrooms_vector)){
  shrooms_vector[i] <- shrooms_list[[i]]
}
# for each component of shrooms_list the code reassings it to shrooms_vector

#  XXXXXXXXCX
names(shrooms_vector) <- names(shrooms_list)
# reassigns the names of shrooms_list to shrooms_vector

TODO: explain what this is doing then add the necessary function.

# add necessary function
shrooms_vector_ss <- Biostrings::sort(shrooms_vector)

MSA

In this section, the code builds a multiple sequence alignment in order to compare multiple or more than 2 protein or nucleic acid sequences.

Building a Multiple Sequence Alignment (MSA)

TODO: briefly explain what this chunk does, then add the necessary function.

# add necessary function
library(seqinr)
shrooms_align <- msa(shrooms_vector_ss,
                     type = "protein", method = "ClustalW")

## use default substitution matrix

Viewing an MSA

TODO: briefly summarize what this section will do.

Viewing an MSA in R

TODO: Briefly summarize what output is shown below

shrooms_align

## CLUSTAL 2.1  
## 
## Call:
##    msa(shrooms_vector_ss, type = "protein", method = "ClustalW")
## 
## MsaAAMultipleAlignment with 14 rows and 2252 columns
##      aln                                                   names
##  [1] -------------------------...SVFGGVFPTLTSPL----------- AAF13269
##  [2] -------------------------...SVFGGVFPTLTSPL----------- AAF13270
##  [3] -------------------------...CTFSGIFPTLTSPL----------- NP_065910
##  [4] -------------------------...NKS--LPPPLTSSL----------- ABD59319
##  [5] -------------------------...------------------------- ABD19518
##  [6] -------------------------...------------------------- CAA58534
##  [7] -------------------------...------------------------- NP_065768
##  [8] -------------------------...------------------------- AAK95579
##  [9] -------------------------...LT----------------------- NP_597713
## [10] -------------------------...------------------------- CAA78718
## [11] -------------------------...------------------------- EAA12598
## [12] -------------------------...------------------------- ABA81834
## [13] MTELQPSPPGYRVQDEAPGPPSCPP...------------------------- XP_392427
## [14] -------------------------...AATSSSSNGIGGPEQLNSNATSSYC XP_783573
##  Con -------------------------...------------------------- Consensus

TODO: briefly explain what is being done in this chunk. This is tricky (and annoying) so do your best

# WHAT IS THE LINE BELOW DOING? (its tricky - do your best)
class(shrooms_align) <- "AAMultipleAlignment"

# WHAT IS THE LINE BELOW DOING? This is simpler
shrooms_align_seqinr <- msaConvert(shrooms_align, type = "seqinr::alignment")

TODO: what is the output this produces

print_msa(alignment = shrooms_align_seqinr, 
          chunksize = 60)

Displaying an MSA XXXXXXXX

TODO: explain this output and how its differnet from the prevoius

## add necessary function
ggmsa::ggmsa(shrooms_align,   
      start = 2000, 
      end = 2100)

## Registered S3 methods overwritten by 'ggalt':
##   method                  from   
##   grid.draw.absoluteGrob  ggplot2
##   grobHeight.absoluteGrob ggplot2
##   grobWidth.absoluteGrob  ggplot2
##   grobX.absoluteGrob      ggplot2
##   grobY.absoluteGrob      ggplot2

#### Saving an MSA as PDF

TODO: explain what this command is doing. Add the package the function is coming from using :: notation This may not work for everyone. If its not working you can comment it out.

# msa::msaPrettyPrint
# msaPrettyPrint(shrooms_align,             # alignment
               # file = "shroom_msa.pdf",   # file name
               # y=c(2000, 2100),           # range
               # askForOverwrite=FALSE)

The command getwd returns a file path showing the current R working directory.

getwd()

## [1] "/Users/dsesha02"

Portfolio 1

Deeksha Sesha

9/25/2021

A complete bioinformatics workflow in R