Assignment: Your assignment is to use your notes from class - along with help from classmates, UTAs, and me - to turn this script into a fleshed-out description of what is going on.
This is a substantial project - we’ll work on it in steps over the rest of the unit.
We are currently focused on the overall process and will cover the details over the rest of this unit.
Your first assignment is to get this script to run from top to bottom by adding all of the missing R commands. Once you have done that, you can knit it into an HTML file and upload it to RPubs. (Note - you’ll need to add the YAML header!)
Your second assignment, which will be posted later, is to answer all the TODO and other prompts to add information. You can start on this, but you don’t have to do this on your first time through the code.
Delete all the prompts like TODO() as you complete them. Use RStudio’s search function to see if you’ve missed any - there are a LOT!
Add YAML header!!! Give it a title
---
title: "MSA-walkthrough-assignment-part01"
author: "Ian Snyder/Nathan Brouwer"
date: "9/26/2021"
output: html_document
---
By: Nathan L. Brouwer
Phylogenetics is a way to analyze and understand the evolutionary relationships between groups of organisms using molecular sequence data along with morphological data matrices.
compbio4all::fasta_cleaner() compbio4all::entrez_fetch_list() rentrez::entrez_fetch() Biostrings::pairwiseAlignment() Biostrings::pid()
Add the necessary calls to library() to load all packages. Indicate which packages came from Bioconductor, CRAN, and GitHub.
# github packages
library(compbio4all)
# CRAN packages
library(rentrez)
library(seqinr)
library(ape)
# Bioconductor packages
library(msa)
library(Biostrings)
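As a quick check before running the rest of the script, the sketch below reports which of the packages loaded above are installed. It uses only base R's requireNamespace(). The object names pkgs and installed are made up for this example, and the GitHub/CRAN/Bioconductor grouping simply follows the comments in the chunk above.

```r
# Packages used in this walkthrough, grouped as in the chunk above
pkgs <- c("compbio4all",               # GitHub
          "rentrez", "seqinr", "ape",  # CRAN
          "msa", "Biostrings")         # Bioconductor

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Named logical vector: one TRUE/FALSE per package
print(installed)

# Names of any packages that still need to be installed
names(installed)[!installed]
```

If anything shows up as FALSE, library() will fail for that package until it is installed from its source.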
Here we fetch a protein sequence from NCBI's protein database by its accession ID and read it into R as an object that we can analyze.
# Human shroom 3 (H. sapiens)
hShroom3 <- entrez_fetch(db = "protein",
                         id = "NP_065910",
                         rettype = "fasta")
The cat() function improves the readability of the output by interpreting escape sequences such as \n (newline) as actual line breaks instead of printing them literally.
cat(hShroom3)
## >NP_065910.3 protein Shroom3 [Homo sapiens]
## MMRTTEDFHKPSATLNSNTATKGRYIYLEAFLEGGAPWGFTLKGGLEHGEPLIISKVEEGGKADTLSSKL
## QAGDEVVHINEVTLSSSRKEAVSLVKGSYKTLRLVVRRDVCTDPGHADTGASNFVSPEHLTSGPQHRKAA
## WSGGVKLRLKHRRSEPAGRPHSWHTTKSGEKQPDASMMQISQGMIGPPWHQSYHSSSSTSDLSNYDHAYL
## RRSPDQCSSQGSMESLEPSGAYPPCHLSPAKSTGSIDQLSHFHNKRDSAYSSFSTSSSILEYPHPGISGR
## ERSGSMDNTSARGGLLEGMRQADIRYVKTVYDTRRGVSAEYEVNSSALLLQGREARASANGQGYDKWSNI
## PRGKGVPPPSWSQQCPSSLETATDNLPPKVGAPLPPARSDSYAAFRHRERPSSWSSLDQKRLCRPQANSL
## GSLKSPFIEEQLHTVLEKSPENSPPVKPKHNYTQKAQPGQPLLPTSIYPVPSLEPHFAQVPQPSVSSNGM
## LYPALAKESGYIAPQGACNKMATIDENGNQNGSGRPGFAFCQPLEHDLLSPVEKKPEATAKYVPSKVHFC
## SVPENEEDASLKRHLTPPQGNSPHSNERKSTHSNKPSSHPHSLKCPQAQAWQAGEDKRSSRLSEPWEGDF
## QEDHNANLWRRLEREGLGQSLSGNFGKTKSAFSSLQNIPESLRRHSSLELGRGTQEGYPGGRPTCAVNTK
## AEDPGRKAAPDLGSHLDRQVSYPRPEGRTGASASFNSTDPSPEEPPAPSHPHTSSLGRRGPGPGSASALQ
## GFQYGKPHCSVLEKVSKFEQREQGSQRPSVGGSGFGHNYRPHRTVSTSSTSGNDFEETKAHIRFSESAEP
## LGNGEQHFKNGELKLEEASRQPCGQQLSGGASDSGRGPQRPDARLLRSQSTFQLSSEPEREPEWRDRPGS
## PESPLLDAPFSRAYRNSIKDAQSRVLGATSFRRRDLELGAPVASRSWRPRPSSAHVGLRSPEASASASPH
## TPRERHSVTPAEGDLARPVPPAARRGARRRLTPEQKKRSYSEPEKMNEVGIVEEAEPAPLGPQRNGMRFP
## ESSVADRRRLFERDGKACSTLSLSGPELKQFQQSALADYIQRKTGKRPTSAAGCSLQEPGPLRERAQSAY
## LQPGPAALEGSGLASASSLSSLREPSLQPRREATLLPATVAETQQAPRDRSSSFAGGRRLGERRRGDLLS
## GANGGTRGTQRGDETPREPSSWGARAGKSMSAEDLLERSDVLAGPVHVRSRSSPATADKRQDVLLGQDSG
## FGLVKDPCYLAGPGSRSLSCSERGQEEMLPLFHHLTPRWGGSGCKAIGDSSVPSECPGTLDHQRQASRTP
## CPRPPLAGTQGLVTDTRAAPLTPIGTPLPSAIPSGYCSQDGQTGRQPLPPYTPAMMHRSNGHTLTQPPGP
## RGCEGDGPEHGVEEGTRKRVSLPQWPPPSRAKWAHAAREDSLPEESSAPDFANLKHYQKQQSLPSLCSTS
## DPDTPLGAPSTPGRISLRISESVLRDSPPPHEDYEDEVFVRDPHPKATSSPTFEPLPPPPPPPPSQETPV
## YSMDDFPPPPPHTVCEAQLDSEDPEGPRPSFNKLSKVTIARERHMPGAAHVVGSQTLASRLQTSIKGSEA
## ESTPPSFMSVHAQLAGSLGGQPAPIQTQSLSHDPVSGTQGLEKKVSPDPQKSSEDIRTEALAKEIVHQDK
## SLADILDPDSRLKTTMDLMEGLFPRDVNLLKENSVKRKAIQRTVSSSGCEGKRNEDKEAVSMLVNCPAYY
## SVSAPKAELLNKIKEMPAEVNEEEEQADVNEKKAELIGSLTHKLETLQEAKGSLLTDIKLNNALGEEVEA
## LISELCKPNEFDKYRMFIGDLDKVVNLLLSLSGRLARVENVLSGLGEDASNEERSSLYEKRKILAGQHED
## ARELKENLDRRERVVLGILANYLSEEQLQDYQHFVKMKSTLLIEQRKLDDKIKLGQEQVKCLLESLPSDF
## IPKAGALALPPNLTSEPIPAGGCTFSGIFPTLTSPL
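To see what cat() is doing, here is a small sketch with a made-up FASTA string (toy_fasta is a hypothetical example, not a real NCBI record). print() shows the \n characters literally, while cat() turns them into line breaks.

```r
# Hypothetical FASTA-formatted string: header line, two sequence lines, blank line
toy_fasta <- ">XX_000001.1 toy protein [Examplus fakus]\nMKTPENLEEP\nSATPNPSRTP\n\n"

# print() shows the string in quotes with literal \n escape sequences
print(toy_fasta)

# cat() writes the raw text, so each \n becomes a real line break
cat(toy_fasta)

# capture.output() lets us count how many lines each approach produces
printed_lines <- capture.output(print(toy_fasta))
cat_lines     <- capture.output(cat(toy_fasta))
length(printed_lines)
length(cat_lines)
```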
The entrez_fetch() function is a standard interface for downloading records from NCBI databases and storing them as R objects. This chunk uses it to acquire several more shroom sequences (mouse, human, and sea urchin).
# Mouse shroom 3a (M. musculus)
mShroom3a <- entrez_fetch(db = "protein",
                          id = "AAF13269",
                          rettype = "fasta")
# Human shroom 2 (H. sapiens)
hShroom2 <- entrez_fetch(db = "protein",
                         id = "CAA58534",
                         rettype = "fasta")
# Sea-urchin shroom
sShroom <- entrez_fetch(db = "protein",
                        id = "XP_783573",
                        rettype = "fasta")
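Here we compare the number of characters in each of our shroom objects. Because the objects are still raw FASTA text, each count includes the header line and newline characters as well as the amino acids.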
nchar(hShroom3)
## [1] 2070
nchar(mShroom3a)
## [1] 2083
nchar(sShroom)
## [1] 1758
nchar(hShroom2)
## [1] 1673
Typing a function's name without parentheses prints its source code. Here we look at fasta_cleaner() as it is defined in the compbio4all package.
fasta_cleaner
## function (fasta_object, parse = TRUE)
## {
## fasta_object <- sub("^(>)(.*?)(\\n)(.*)(\\n\\n)", "\\4",
## fasta_object)
## fasta_object <- gsub("\n", "", fasta_object)
## if (parse == TRUE) {
## fasta_object <- stringr::str_split(fasta_object, pattern = "",
## simplify = FALSE)
## }
## return(fasta_object[[1]])
## }
## <bytecode: 0x7fdaee37cc88>
## <environment: namespace:compbio4all>
Using function() we can define a function ourselves. A function takes in some data object, runs a set of commands on it, and returns an output that depends on the input.
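fasta_cleaner <- function(fasta_object, parse = TRUE){
  # keep only the sequence lines: drop the header line and the trailing blank line
  fasta_object <- sub("^(>)(.*?)(\\n)(.*)(\\n\\n)","\\4",fasta_object)
  # remove the remaining newline characters
  fasta_object <- gsub("\n", "", fasta_object)
  # optionally split the sequence into individual characters
  if(parse == TRUE){
    fasta_object <- stringr::str_split(fasta_object,
                                       pattern = "",
                                       simplify = FALSE)
  }
  return(fasta_object[[1]])
}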
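The two cleaning steps inside the function can be followed on a small made-up record. In the sketch below, toy_fasta is a hypothetical string rather than a real NCBI entry, and base R's strsplit() stands in for stringr::str_split() to show what parse = TRUE would add.

```r
# Hypothetical FASTA record: header line, two sequence lines, trailing blank line
toy_fasta <- ">XX_000001.1 toy protein [Examplus fakus]\nMKTPENLEEP\nSATPNPSRTP\n\n"
nchar(toy_fasta)   # counts header and newline characters too

# Step 1: keep only capture group 4 (the sequence lines), dropping the
# header line and the two trailing newlines
no_header <- sub("^(>)(.*?)(\\n)(.*)(\\n\\n)", "\\4", toy_fasta)

# Step 2: remove the newline left between the sequence lines
clean_seq <- gsub("\n", "", no_header)
clean_seq
nchar(clean_seq)   # now counts amino acids only

# What parse = TRUE adds: one vector element per amino acid
# (base-R strsplit() used here as a stand-in for stringr::str_split())
residues <- strsplit(clean_seq, "")[[1]]
length(residues)
```

The same logic explains why nchar() on a raw FASTA object is larger than the actual sequence length.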
Here we run the fasta_cleaner() function we just defined on each of our shroom objects. It strips the FASTA header and the newline characters, leaving just the amino acid sequence as a single string (because parse = F).
hShroom3 <- fasta_cleaner(hShroom3, parse = F)
mShroom3a <- fasta_cleaner(mShroom3a, parse = F)
hShroom2 <- fasta_cleaner(hShroom2, parse = F)
sShroom <- fasta_cleaner(sShroom, parse = F)
hShroom3
## [1] "MMRTTEDFHKPSATLNSNTATKGRYIYLEAFLEGGAPWGFTLKGGLEHGEPLIISKVEEGGKADTLSSKLQAGDEVVHINEVTLSSSRKEAVSLVKGSYKTLRLVVRRDVCTDPGHADTGASNFVSPEHLTSGPQHRKAAWSGGVKLRLKHRRSEPAGRPHSWHTTKSGEKQPDASMMQISQGMIGPPWHQSYHSSSSTSDLSNYDHAYLRRSPDQCSSQGSMESLEPSGAYPPCHLSPAKSTGSIDQLSHFHNKRDSAYSSFSTSSSILEYPHPGISGRERSGSMDNTSARGGLLEGMRQADIRYVKTVYDTRRGVSAEYEVNSSALLLQGREARASANGQGYDKWSNIPRGKGVPPPSWSQQCPSSLETATDNLPPKVGAPLPPARSDSYAAFRHRERPSSWSSLDQKRLCRPQANSLGSLKSPFIEEQLHTVLEKSPENSPPVKPKHNYTQKAQPGQPLLPTSIYPVPSLEPHFAQVPQPSVSSNGMLYPALAKESGYIAPQGACNKMATIDENGNQNGSGRPGFAFCQPLEHDLLSPVEKKPEATAKYVPSKVHFCSVPENEEDASLKRHLTPPQGNSPHSNERKSTHSNKPSSHPHSLKCPQAQAWQAGEDKRSSRLSEPWEGDFQEDHNANLWRRLEREGLGQSLSGNFGKTKSAFSSLQNIPESLRRHSSLELGRGTQEGYPGGRPTCAVNTKAEDPGRKAAPDLGSHLDRQVSYPRPEGRTGASASFNSTDPSPEEPPAPSHPHTSSLGRRGPGPGSASALQGFQYGKPHCSVLEKVSKFEQREQGSQRPSVGGSGFGHNYRPHRTVSTSSTSGNDFEETKAHIRFSESAEPLGNGEQHFKNGELKLEEASRQPCGQQLSGGASDSGRGPQRPDARLLRSQSTFQLSSEPEREPEWRDRPGSPESPLLDAPFSRAYRNSIKDAQSRVLGATSFRRRDLELGAPVASRSWRPRPSSAHVGLRSPEASASASPHTPRERHSVTPAEGDLARPVPPAARRGARRRLTPEQKKRSYSEPEKMNEVGIVEEAEPAPLGPQRNGMRFPESSVADRRRLFERDGKACSTLSLSGPELKQFQQSALADYIQRKTGKRPTSAAGCSLQEPGPLRERAQSAYLQPGPAALEGSGLASASSLSSLREPSLQPRREATLLPATVAETQQAPRDRSSSFAGGRRLGERRRGDLLSGANGGTRGTQRGDETPREPSSWGARAGKSMSAEDLLERSDVLAGPVHVRSRSSPATADKRQDVLLGQDSGFGLVKDPCYLAGPGSRSLSCSERGQEEMLPLFHHLTPRWGGSGCKAIGDSSVPSECPGTLDHQRQASRTPCPRPPLAGTQGLVTDTRAAPLTPIGTPLPSAIPSGYCSQDGQTGRQPLPPYTPAMMHRSNGHTLTQPPGPRGCEGDGPEHGVEEGTRKRVSLPQWPPPSRAKWAHAAREDSLPEESSAPDFANLKHYQKQQSLPSLCSTSDPDTPLGAPSTPGRISLRISESVLRDSPPPHEDYEDEVFVRDPHPKATSSPTFEPLPPPPPPPPSQETPVYSMDDFPPPPPHTVCEAQLDSEDPEGPRPSFNKLSKVTIARERHMPGAAHVVGSQTLASRLQTSIKGSEAESTPPSFMSVHAQLAGSLGGQPAPIQTQSLSHDPVSGTQGLEKKVSPDPQKSSEDIRTEALAKEIVHQDKSLADILDPDSRLKTTMDLMEGLFPRDVNLLKENSVKRKAIQRTVSSSGCEGKRNEDKEAVSMLVNCPAYYSVSAPKAELLNKIKEMPAEVNEEEEQADVNEKKAELIGSLTHKLETLQEAKGSLLTDIKLNNALGEEVEALISELCKPNEFDKYRMFIGDLDKVVNLLLSLSGRLARVENVLSGLGEDASNEERSSLYEKRKILAGQHEDARELKENLDRRERVVLGILANYLSEEQLQDYQHFVKMKSTLLIEQRKLDDKIKLGQEQVKCLLESLPSDFIPKAGALALPPNLTSEPIPAGGCTFSGIFPTL
TSPL"
Here we use a Bioconductor function to run a pairwise alignment on two of our shroom sequences. As described in the Biostrings documentation, “This function aligns a set of pattern strings to a subject using either a fixed or quality-based substitution scoring scheme.”
# add necessary function
library(Biostrings)
align.h3.vs.m3a <- Biostrings::pairwiseAlignment(
  hShroom3,
  mShroom3a)
Printing the object shows the output of our pairwise alignment of the shroom sequences, including the alignment score.
align.h3.vs.m3a
## Global PairwiseAlignmentsSingleSubject (1 of 1)
## pattern: MMRTTEDFHKPSATLN-SNTATKGRYIYLEAFLE...KAGALALPPNLTSEPIPAGGCTFSGIFPTLTSPL
## subject: MK-TPENLEEPSATPNPSRTPTE-RFVYLEALLE...KAGAISLPPALTGHATPGGTSVFGGVFPTLTSPL
## score: 2189.934
This Biostrings function calculates the percent sequence identity for a pairwise sequence alignment.
# add necessary function
Biostrings::pid(align.h3.vs.m3a)
## [1] 70.56511
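Percent identity can be worked out by hand on a tiny alignment. The sketch below uses two made-up aligned strings (aligned_a and aligned_b are hypothetical) and counts identical columns out of all alignment columns. This is only one possible definition: Biostrings::pid() has a type argument that selects among several denominators, so this is not necessarily the exact calculation behind the number above.

```r
# Two hypothetical aligned sequences; "-" marks a gap
aligned_a <- "MKTPE-NLEE"
aligned_b <- "MKT-ENNLDE"

# Split each alignment row into single columns
a <- strsplit(aligned_a, "")[[1]]
b <- strsplit(aligned_b, "")[[1]]

# A column counts as identical when both residues match and neither is a gap
identical_cols <- a == b & a != "-"
n_identical <- sum(identical_cols)

# Percent identity over all alignment columns (gap columns included)
percent_identity <- 100 * n_identical / length(a)
percent_identity
```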
Here we run another pairwise alignment on hShroom3, this time comparing it to a different shroom protein, hShroom2.
align.h3.vs.h2 <- Biostrings::pairwiseAlignment(
  hShroom3,
  hShroom2)
The score() function returns just the score of our pairwise alignment and none of the other output stored in the object.
score(align.h3.vs.h2)
## [1] -5673.853
pid() gives the percent sequence identity, while score() gives the overall alignment score calculated from the substitution scoring scheme and the gap penalties.
Biostrings::pid(align.h3.vs.h2)
## [1] 33.83277
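An alignment score adds up a reward or penalty for every column, so a long alignment with many mismatches and gaps can end up strongly negative even when a fair share of positions are identical. The sketch below scores two made-up alignments with a hypothetical scheme (+1 match, -1 mismatch, -2 per gap position). The function score_alignment() is invented for this example. pairwiseAlignment() uses its own substitution scores and gap penalties, so these numbers are not comparable to the output above.

```r
# Hypothetical scoring function: sums a per-column reward or penalty
score_alignment <- function(aligned_a, aligned_b,
                            match = 1, mismatch = -1, gap = -2) {
  a <- strsplit(aligned_a, "")[[1]]
  b <- strsplit(aligned_b, "")[[1]]
  stopifnot(length(a) == length(b))   # aligned rows must have equal length

  is_gap      <- a == "-" | b == "-"  # column contains a gap
  is_match    <- a == b & !is_gap     # same residue in both rows
  is_mismatch <- !is_match & !is_gap  # different residues, no gap

  sum(is_match) * match + sum(is_mismatch) * mismatch + sum(is_gap) * gap
}

# Similar pair: mostly matches, so the score stays positive
close_pair <- score_alignment("MKTPE-NLEE", "MKT-ENNLDE")
close_pair

# Dissimilar pair: mismatches and gaps outweigh the matches
distant_pair <- score_alignment("MK--TPENLE", "MEGAEP--RA")
distant_pair
```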
Here we set up a table, stored for now as a single vector, of the shroom sequences with their accession numbers, original names, and new names. This table will be useful to reference later.
shroom_table <- c("CAA78718" , "X. laevis Apx" , "xShroom1",
"NP_597713" , "H. sapiens APXL2" , "hShroom1",
"CAA58534" , "H. sapiens APXL", "hShroom2",
"ABD19518" , "M. musculus Apxl" , "mShroom2",
"AAF13269" , "M. musculus ShroomL" , "mShroom3a",
"AAF13270" , "M. musculus ShroomS" , "mShroom3b",
"NP_065910", "H. sapiens Shroom" , "hShroom3",
"ABD59319" , "X. laevis Shroom-like", "xShroom3",
"NP_065768", "H. sapiens KIAA1202" , "hShroom4a",
"AAK95579" , "H. sapiens SHAP-A" , "hShroom4b",
#"DQ435686" , "M. musculus KIAA1202" , "mShroom4",
"ABA81834" , "D. melanogaster Shroom", "dmShroom",
"EAA12598" , "A. gambiae Shroom", "agShroom",
"XP_392427" , "A. mellifera Shroom" , "amShroom",
"XP_783573" , "S. purpuratus Shroom" , "spShroom") #sea urchin
We did not go over this in class, but essentially it converts the vector into a shroom table (a data frame) with one row per sequence and adds a column of simplified species names.
# convert to matrix
shroom_table_matrix <- matrix(shroom_table,
                              byrow = T,
                              nrow = 14)
# convert to data frame
shroom_table <- data.frame(shroom_table_matrix,
                           stringsAsFactors = F)
# name columns
names(shroom_table) <- c("accession", "name.orig","name.new")
# Create simplified species names
shroom_table$spp <- "Homo"
shroom_table$spp[grep("laevis",shroom_table$name.orig)] <- "Xenopus"
shroom_table$spp[grep("musculus",shroom_table$name.orig)] <- "Mus"
shroom_table$spp[grep("melanogaster",shroom_table$name.orig)] <- "Drosophila"
shroom_table$spp[grep("gambiae",shroom_table$name.orig)] <- "mosquito"
shroom_table$spp[grep("mellifera",shroom_table$name.orig)] <- "bee"
shroom_table$spp[grep("purpuratus",shroom_table$name.orig)] <- "sea urchin"
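The reshaping step can be checked on a smaller piece of the vector. The sketch below uses three entries taken from the vector above, with object names invented for the example (flat, by_row, by_col, mini_table). It shows why byrow = TRUE matters: the vector lists accession, original name, and new name in turn, so it has to be filled into the matrix one row at a time.

```r
# Three entries from the shroom vector: accession, original name, new name
flat <- c("CAA78718",  "X. laevis Apx",       "xShroom1",
          "AAF13269",  "M. musculus ShroomL", "mShroom3a",
          "NP_065910", "H. sapiens Shroom",   "hShroom3")

# Filling by row keeps each sequence's three values together
by_row <- matrix(flat, byrow = TRUE, nrow = 3)
by_row[1, ]

# Filling by column (the default) scrambles them
by_col <- matrix(flat, byrow = FALSE, nrow = 3)
by_col[1, ]

# Same steps as the chunk above: data frame, column names, species column
mini_table <- data.frame(by_row, stringsAsFactors = FALSE)
names(mini_table) <- c("accession", "name.orig", "name.new")
mini_table$spp <- "Homo"
mini_table$spp[grep("laevis",   mini_table$name.orig)] <- "Xenopus"
mini_table$spp[grep("musculus", mini_table$name.orig)] <- "Mus"
mini_table
```

Setting every row to "Homo" first and then overwriting the rows that grep() finds is the same trick the full table uses.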
Printing out our shroom table generated in the previous chunk.
shroom_table
## accession name.orig name.new spp
## 1 CAA78718 X. laevis Apx xShroom1 Xenopus
## 2 NP_597713 H. sapiens APXL2 hShroom1 Homo
## 3 CAA58534 H. sapiens APXL hShroom2 Homo
## 4 ABD19518 M. musculus Apxl mShroom2 Mus
## 5 AAF13269 M. musculus ShroomL mShroom3a Mus
## 6 AAF13270 M. musculus ShroomS mShroom3b Mus
## 7 NP_065910 H. sapiens Shroom hShroom3 Homo
## 8 ABD59319 X. laevis Shroom-like xShroom3 Xenopus
## 9 NP_065768 H. sapiens KIAA1202 hShroom4a Homo
## 10 AAK95579 H. sapiens SHAP-A hShroom4b Homo
## 11 ABA81834 D. melanogaster Shroom dmShroom Drosophila
## 12 EAA12598 A. gambiae Shroom agShroom mosquito
## 13 XP_392427 A. mellifera Shroom amShroom bee
## 14 XP_783573 S. purpuratus Shroom spShroom sea urchin
The $ operator lets us extract the accession column from our shroom table.
shroom_table$accession
## [1] "CAA78718" "NP_597713" "CAA58534" "ABD19518" "AAF13269" "AAF13270"
## [7] "NP_065910" "ABD59319" "NP_065768" "AAK95579" "ABA81834" "EAA12598"
## [13] "XP_392427" "XP_783573"
We now run the entrez_fetch() function explained previously on the whole vector of accession numbers. This generates one very large, hard-to-read object.
# add necessary function
shrooms <- rentrez::entrez_fetch(db = "protein",
                                 id = shroom_table$accession,
                                 rettype = "fasta")
is(shrooms)
## [1] "character" "vector"
## [3] "data.frameRowLabels" "SuperClassMethod"
## [5] "character_OR_connection" "character_OR_NULL"
## [7] "atomic" "EnumerationValue"
## [9] "vector_OR_Vector" "vector_OR_factor"
length(shrooms)
## [1] 1
nchar(shrooms)
## [1] 22252
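The length of 1 means all of the records came back glued together in a single string. The base-R sketch below shows how such a string could be pulled apart again, assuming records are separated by a blank line. multi_fasta is a made-up two-record example and the other object names are invented for illustration. It is not the approach used by the course packages.

```r
# Hypothetical string holding two FASTA records, each ending in a blank line
multi_fasta <- paste0(">A1 toy one\nMKT\nPEN\n\n",
                      ">B2 toy two\nMEG\nAEP\n\n")
length(multi_fasta)   # one element, even though it holds two records
nchar(multi_fasta)    # total characters across both records

# Split on the blank line between records
records <- strsplit(multi_fasta, "\n\n", fixed = TRUE)[[1]]
length(records)

# Within each record, the first line is the header and the rest is sequence
record_lines <- strsplit(records, "\n", fixed = TRUE)
headers <- vapply(record_lines, function(x) x[1], character(1))
seqs    <- vapply(record_lines,
                  function(x) paste(x[-1], collapse = ""),
                  character(1))
names(seqs) <- headers
seqs
```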
Using cat(), we print the shrooms object in a more readable form.
cat(shrooms)
## >CAA78718.1 apical protein [Xenopus laevis]
## MSAFGNTIERWNIKSTGVIAGLGHSERISPVRSMTTLVDSAYSSFSGSSYVPEYQNSFQHDGCHYNDEQL
## SYMDSEYVRAIYNPSLLDKDGVYNDIVSEHGSSKVALSGRSSSSLCSDNTTSVHRTSPAKLDNYVTNLDS
## EKNIYGDPINMKHKQNRPNHKAYGLQRNSPTGINSLQEKENQLYNPSNFMEIKDNYFGRSLDVLQADGDI
## MTQDSYTQNALYFPQNQPDQYRNTQYPGANRMSKEQFKVNDVQKSNEENTERDGPYLTKDGQFVQGQYAS
## DVRTSFKNIRRSLKKSASGKIVAHDSQGSCWIMKPGKDTPSFNSEGTITDMDYDNREQWDIRKSRLSTRA
## SQSLYYESNEDVSGPPLKAMNSKNEVDQTLSFQKDATVKSIPLLSQQLQQEKCKSHPLSDLNCEKITKAS
## TPMLYHLAGGRHSAFIAPVHNTNPAQQEKLKLESKTLERMNNISVLQLSEPRPDNHKLPKNKSLTQLADL
## HDSVEGGNSGNLNSSAEESLMNDYIEKLKVAQKKVLRETSFKRKDLQMSLPCRFKLNPPKRPTIDHFRSY
## SSSSANEESAYLQTKNSADSSYKKDDTEKVAVTRIGGRKRITKEQKKLCYSEPEKLDHLGIQKSNFAWKE
## EPTFANRREMSDSDISANRIKYLESKERTNSSSNLSKTELKQIQHNALVQYMERKTNQRPNSNPQVQMER
## TSLGLPNYNEWSIYSSETSSSDASQKYLRRRSAGASSSYDATVTWNDRFGKTSPLGRSAAEKTAGVQRKT
## FSDQRTLDGSQEHLEGSSPSLSQKTSKSTHNEQVSYVNMEFLPSSHSKNHMYNDRLTVPGDGTSAESGRM
## FVSKSRGKSMEEIGTTDIVKLAELSHSSDQLYHIKGPVISSRLENTRTTAASHQDRLLASTQIETGNLPR
## QTHQESVVGPCRSDLANLGQEAHSWPLRASDVSPGTDNPCSSSPSAEVQPGAPEPLHCLQTEDEVFTPAS
## TARNEEPNSTAFSYLLSTGKPVSQGEATALSFTFLPEQDRLEHPIVSETTPSSESDENVSDAAAEKETTT
## TQLPETSNVNKPLGFTVDNQEVEGDGEPMQPEFIDSSKQLELSSLPSSQVNIMQTAEPYLGDKNIGNEQK
## TEDLEQKSKNPEEDDLPKVKLKSPEDEILEELVKEIVAKDKSLLNCLQPVSVRESAMDLMKSLFPMDVTA
## AEKSRTRGLLGKDKGETLKKNNSDLESSSKLPSKITGMLQKRPDGESLDDITLKKMELLSKIGSKLEDLC
## EQREFLLSDISKNTTNGNNMQTMVKELCKPNEFERYMMFIGDLEKVVSLLFSLSTRLTRVENSLSKVDEN
## TDAEEMQSLKERHNLLSSQREDAKDLKANLDRREQVVTGILVKYLNEEQLQDYKHFVRLKTSLLIEQKNL
## EEKIKVYEEQFESIHNSLPP
##
## >NP_597713.2 protein Shroom1 isoform 2 [Homo sapiens]
## MEALGPGGDRASPASSTSSLDLWHLSMRADSAYSSFSAASGGPEPRTQSPGTDLLPYLDWDYVRVVWGGP
## GPAPPDAALCTSPRPRPAVAARSGPQPTEVPGTPGPLNRQATPLLYALAAEAEAAAQAAEPPSPPASRAA
## YRQRLQGAQRRVLRETSFQRKELRMSLPARLRPTVPARPPATHPRSASLSHPGGEGEPARSRAPAPGTAG
## RGPLANQQRKWCFSEPGKLDRVGRGGGPARECLGEACSSSGLPGPEPLEFQHPALAKFEDHEVGWLPETQ
## PQGSMNLDSGSLKLGDAFRPASRSRSASGEVLGSWGGSGGTIPIVQAVPQGAETPRPLFQTKLSRFLPQK
## EAAVMYPAELPQSSPADSEQRVSETCIVPAWLPSLPDEVFLEEAPLVRMRSPPDPHASQGPPASVHASDQ
## PYGTGLGQRTGQVTVPTEYPLHECPGTAGADDCWQGVNGSVGISRPTSHTPTGTANDNIPTIDPTGLTTN
## PPTAAESDLLKPVPADALGLSGNDTPGPSHNTALARGTGQPGSRPTWPSQCLEELVQELARLDPSLCDPL
## ASQPSPEPPLGLLDGLIPLAEVRAAMRPACGEAGEEAASTFEPGSYQFSFTQLLPAPREETRLENPATHP
## VLDQPCGQGLPAPNNSIQGKKVELAARLQKMLQDLHTEQERLQGEAQAWARRQAALEAAVRQACAPQELE
## RFSRFMADLERVLGLLLLLGSRLARVRRALARAASDSDPDEQRLRLLQRQEEDAKELKEHVARRERAVRE
## VLVRALPVEELRVYCALLAGKAAVLAQQRNLDERIRLLQDQLDAIRDDLGHHAPSPSPARPPGTCPPVQP
## PFPLLLT
##
## >CAA58534.1 APXL [Homo sapiens]
## MEGAEPRARPERLAEAETRAADGGRLVEVQLSGGAPWGFTLKGGREHGEPLVITKIEEGSKAAAVDKLLA
## GDEIVGINDIGLSGFRQEAICLVKGSHKTLKLVVKRRSELGWRPHSWHATKFSDSHPELAASPFTSTSGC
## PSWSGRHHASSSSHDLSSSWEQTNLQRTLDHFSSLGSVDSLDHPSSRLSVAKSNSSIDHLGSHSKRDSAY
## GSFSTSSSTPDHTLSKADTSSAENILYTVGLWEAPRQGGRQAQAAGDPQGSEEKLSCFPPRVPGDSGKGP
## RPEYNAEPKLAAPGRSNFGPVWYVPDKKKAPSSPPPPPPPLRSDSFAATKSHEKAQGPVFSEAAAAQHFT
## ALAQAQPRGDRRPELTDRPWRSAHPGSLGKGSGGPGCPQEAHADGSWPPSKDGASSRLQASLSSSDVRFP
## QSPHSGRHPPLYSDHSPLCADSLGQEPGAASFQNDSPPQVRGLSSCDQKLGSGWQGPRPCVQGDLQAAQL
## WAGCWPSDTALGALESLPPPTVGQSPRHHLPQPEGPPDARETGRCYPLDKGAEGCSAGAQEPPRASRAEK
## ASQRLAASITWADGESSRICPQETPLLHSLTQEGKRRPESSPEDSATRPPPFDAHVGKPTRRSDRFATTL
## RNEIQMHRAKLQKSRSTVALTAAGEAEDGTGRWRAGLGGGTQEGPLAGTYKDHLKEAQARVLRATSFKRR
## DLDPNPGDLYPESLEHRMGDPDTVPHFWEAGLAQPPSSTSGGPHPPRIGGRRRFTAEQKLKSYSEPEKMN
## EVGLTRGYSPHQHPRTSEDTVGTFADRWKFFEETSKPVPQRPAQKQALHGIPRDKPERPRTAGRTCEGTE
## PWSRTTSLGDSLNAHSAAEKAGTSDLPRRLGTFAEYQASWKEQRKPLEARSSGRCHSADDILDVSLDPQE
## RPQHVHGRSRSSPSTDHYKQEASVELRRQAGDPGEPREELPSAVRAEEGQSTPRQADAQCREGSPGSQQH
## PPSQKAPNPPTFSELSHCRGAPELPREGRGRAGTLPRDYRYSEESTPADLGPRAQSPGSPLHARGQDSWP
## VSSALLSKRPAPQRPPPPKREPRRYRATDGAPADAPVGVLGRPFPTPSPASLDVYVARLSLSHSPSVFSS
## AQPQDTPKATVCERGSQHVSGDASRPLPEALLPPKQQHLRLQTATMETSRSPSPQFAPQKLTDKPPLLIQ
## DEDSTRIERVMDNNTTVKMVPIKIVHSESQPEKESRQSLACPAEPPALPHGLEKDQIKTLSTSEQFYSRF
## CLYTRQGAEPEAPHRAQPAEPQPLGTQVPPEKDRCTSPPGLSYMKAKEKTVEDLKSEELAREIVGKDKSL
## ADILDPSVKIKTTMDLMEGIFPKDEHLLEEAQQRRKLLPKIPSPRSTEERKEEPSVPAAVSLATNSTYYS
## TSAPKAELLIKMKDLQEQQEHEEDSGSDLDHDLSVKKQELIESISRKLQVLREARESLLEDVQANTVLGA
## EVEAIVKGVCKPSEFDKFRMFIGDLDKVVNLLLSLSGRLARVENALNNLDDGASPGDRQSLLEKQRVLIQ
## QHEDAKELKENLDRRERIVFDILANYLSEESLADYEHFVKMKSALIIEQRELEDKIHLGEEQLKCLLDSL
## QPERGK
##
## >ABD19518.1 Apxl protein [Mus musculus]
## MEGAEPRARPERLAEAEAPATDGVRLVEVQLSGGAPWGFTLKGGREHGEPLVITKIEEGSKAAAVDKLLA
## GDEIVAINDVSLSGFRQEAICLVKGSHKTLKLVVKRKSDPSWRPHSWHATKYFDVHPEPAASLFLNTSGS
## PSWKSQHQASSSSHDLSGSWEHTSLQRTSDHFSSMGSIDSLDHSSQLYPSGHLSSAKSNSSIDHLGGHSK
## RDSAYGSFSTCSSTPDHTLPKADASSTENILYKVGLWEASRPGSSRQSQSTGDPQGLQDRPSCSIPRVPG
## NSSKSPRPEDNVEPKIATHGRSNFGPVWYVPDKKKAPSPPPLGLPLRSDSFSVAARGHEKARGPPFSDLA
## SMQHFITLPHVQPRGDHRMETTDRQWKLTHLSSGKEIGNVGYQSEGHLDCRWLCSDDRAGRPSGPPGRLQ
## FSDVHFLKSYHGSQHQQQCSDESPRAPSSPRELLHITSGGGLQEPPEPSQDDNPTQVRWPGSAHQKLDDR
## GRSHYFPGSLRQPVQGSAQVVIPRGDYWHSDTTPVDLEYPLLRPVGQRTYLQQHEETPASHEKEGYHQLN
## AGIEGCCSGIQEPPRASRTVRTGLQCPSNDFKLVDGESGRISCQRTPMLHSLTQDGTWRPGNSKDCGNDK
## PPLFDAQVGKPTRRSDRFATTLRNEIQMRRAKLQKSKSTVTLAGDSEAEDCAGDWRADVGAVPEGSFPST
## YKEHLKEAQTRVLKATSFQRRDLDPTPADQYSGPSEHRTFDHSASSSLSSFPGEPDSAPRFCETGLAKAP
## SSGVGVPHVLRIGGRKRFTAEQKLKSYSEPEKINEVGLSGDHRPHPTVRTPEDTVGTFADRWKFFEETSK
## SLLQKAGHRQVHCGLPXEKAERPQTGHHECESTEPWFQKRSLATSCGEILSDRKVEKASEKLNPPRRLGT
## FAEYQASWKEQKKPLEARSSGRYHSADDILDAGLDQQQRPQYIHERSRSSPSTDHYSQEVPVEPNRQAED
## SGDHKEAILCTLQAEEGCSAPSAQPQDSQHVNEDTTFPQPETQLSSKCQHLQTSAMETSRSPSPQFAPQK
## LTDKPPLLIHEDNSARIERVMDNNTTVKMVPIKIVHSESQPEKESRQSLSCPAELPPLPSGLERDQIKTL
## STSEQCYSRFCVYTRQEVEAPHRARPPEPRPPXTPAPPVRDSCSSPPSLNYGKAKEKTMDDLKSEELARE
## IVGKDKSLADILDPSVKIKTTMDLMEGIFPKDEYLLKEAQQRRKLLPKSPYPEHRGQETGPRYARGCVLG
## HLSTYYSTSAPKAELLIKMKDLQEPEEYSAGDLDHDLSVKKQELIDSISRKLQVLREARESLLEDIQANN
## ALGDEVEAIVKDVCKPNEFDKFRMFIGDLDKVVNLLLSLSGRLARVENALNNLDDNPSPGDRQSLLEKQR
## VLTQQHEDAKELKENLDRRERIVFDILATYLSEENLADYEHFVKMKSALIIEQRELEDKIHLGEEQLKCL
## FDSLQPERSK
##
## >AAF13269.1 PDZ domain actin binding protein Shroom [Mus musculus]
## MKTPENLEEPSATPNPSRTPTERFVYLEALLEGGAPWGFTLKGGLERGEPLIISKIEEGGKADSVSSGLQ
## AGDEVIHINEVALSSPRREAVSLVKGSYKTLRLVVRRDVCAAPGHADPGTSKSLSSELLTCSPQHRKATW
## SGGVKLRLKQRCSEPATRPHSWHTTKFGETQPDVSMMQISQGTMGPPWHQSYHSSSSTSDLSNYDHAYLR
## RSPDQCSSQGSMESLEPSGGYPPCHLLSPAKSTSSIDQLGHLHNKRDSAYSSFSTSSSIFEYPPPGGSAR
## ERSGSMDVISARGGLLEGMRQADIRYVKTVYDTRRGVSSEYEVNPSALLLQGRDAHASADSQGCAKWHSI
## PRGKGTPSPSWSQQCSGSLETATDNLPQKAGAPLPPTRSDSYAAFRHRERPSSWSSLDHKRFCRPQTNSS
## GSQKTPFAEDQLHTVPERSPENSPPVKSKHNYTQKAQPGQPLLPTGIYPVPSPEPHFAQVPQPSVSSNGT
## VYPALVKESGYTAAQGTCNKMATLDENGNQNEASRPGFAFCQPLEHNSVTPVEKRPEPTAKYIYKVHFSS
## VPENEDSSLKRHITPPHGHSPYPSERKNIHGGSRACSNHHSLSSPQAQALHVGDDRKPSRLSQPWEGDFQ
## EDHNANLRQKVEREGQGQGLSGNSGRTRSAFSSLQNIPESLRRQSNVELGEAQEVHPGGRSKVEDPGRKA
## GASDIRGYLDRSVSYPRPEGKMNAVDSVHSADSRYEESPAPALPQTSGASQRRLSSSSSAAPQYRKPHCS
## VLEKVSRIEEREQGRHRPLSVGSSAYGPGNRPGRTGPTPSTSSSDLDDPKAGSVHFSESTEHLRNGEQNP
## PNGEAKQEEASRPQCSHLIRRAPADGRGPPARGGEPSRPEARLLRSQSTFQLYSEAEREASWSEDRPGTP
## ESPLLDAPFSRAYRNSIKDAQSRVLGATSFRRRDLEPGTPATSRPWRPRPASAHVGMRSPEAAVPSSSPH
## TPRERHSVTPAAPQAARRGPRRRLTVEQKKRSYSEPEKMNEVGVSEEAEPTPCGPPRPAQPRFSESTVAD
## RRRIFERDGKACSTLSLSGPELKQFQQNALADYIQRKTGKRPTGAASHTGGRAARARTERLPPGRPRGAR
## WPRLASACSLSSLREPEALPRKEHTHPSAADGPQAPRDRSSSFASGRLVGERRRWDPQVPRQLLSGANCE
## PRGVQRMDGAPGGPPSWGMVAGKAGKSKSAEDLLERSDTLAVPVHVRSKSSPTSDKKGQDVLLREGSNFG
## FVKDPCCLAGPGPRSLSCSDKGQNELALPLHHHTPCWNGSGCKATVASSAPPESSGAADHLKQRRAPGPR
## PLSAGMHGHFPDARAASLSSPLPSPVPSASPVPSSYRSQLAMDQQTGQQPPSSPASAVTQPTSPRSLELS
## SPAYGLGEGMWKRTSLPQRPPPPWVKWAHAVREDGLAEDTLAPEFANLKHYRNQPSRPSSCSTSDPDTPG
## RISLRISESALQPSPPPRGDYDDEVFMKDLHPKVTSSPTFEALPPPPPPSPPSEEPLVNGTDDFPPPPPP
## QALCEVLLDGEASTEAGSGPCRIPRVMVTREGHVPGAAHSEGSQIMTATPPQTSAKGSEAESNTPSSASA
## QPQLNGSPGKQLCPSQTRNLTYEPVERTQDLGKKTHAEPQKTSEDIRTEALAKEIVHQDKSLADILDPDS
## RMKTTMDLMEGLFPGDASVLMDSGAKRKALDITARRAGCEAKASDHKEAVSVLVNCPAYYSVSAAKAELL
## NKIKDMPEELQEEEGQEDVYEKKAELIGSLTHKLESLQEAKGSLLTDIKLNNALGEEVEALISELCKPNE
## FDKYKMFIGDLDKVVNLLLSLSGRLARVENVLRGLGEDASKEERSSLNEKRKVLAGQHEDARELKENLDR
## RERVVLDILANYLSAEQLQDYQHFVKMKSTLLIEQRKLDDKIKLGQEQVRCLLESLPSDFRPKAGAISLP
## PALTGHATPGGTSVFGGVFPTLTSPL
##
## >AAF13270.1 actin binding protein ShroomS [Mus musculus]
## MMQISQGTMGPPWHQSYHSSSSTSDLSNYDHAYLRRSPDQCSSQGSMESLEPSGGYPPCHLLSPAKSTSS
## IDQLGHLHNKRDSAYSSFSTSSSIFEYPPPGGSARERSGSMDVISARGGLLEGMRQADIRYVKTVYDTRR
## GVSSEYEVNPSALLLQGRDAHASADSQGCAKWHSIPRGKGTPSPSWSQQCSGSLETATDNLPQKAGAPLP
## PTRSDSYAAFRHRERPSSWSSLDHKRFCRPQTNSSGSQKTPFAEDQLHTVPERSPENSPPVKSKHNYTQK
## AQPGQPLLPTGIYPVPSPEPHFAQVPQPSVSSNGTVYPALVKESGYTAAQGTCNKMATLDENGNQNEASR
## PGFAFCQPLEHNSVTPVEKRPEPTAKYIYKVHFSSVPENEDSSLKRHITPPHGHSPYPSERKNIHGGSRA
## CSNHHSLSSPQAQALHVGDDRKPSRLSQPWEGDFQEDHNANLRQKVEREGQGQGLSGNSGRTRSAFSSLQ
## NIPESLRRQSNVELGEAQEVHPGGRSKVEDPGRKAGASDIRGYLDRSVSYPRPEGKMNAVDSVHSADSRY
## EESPAPALPQTSGASQRRLSSSSSAAPQYRKPHCSVLEKVSRIEEREQGRHRPLSVGSSAYGPGNRPGRT
## GPTPSTSSSDLDDPKAGSVHFSESTEHLRNGEQNPPNGEAKQEEASRPQCSHLIRRAPADGRGPPARGGE
## PSRPEARLLRSQSTFQLYSEAEREASWSEDRPGTPESPLLDAPFSRAYRNSIKDAQSRVLGATSFRRRDL
## EPGTPATSRPWRPRPASAHVGMRSPEAAVPSSSPHTPRERHSVTPAAPQAARRGPRRRLTVEQKKRSYSE
## PEKMNEVGVSEEAEPTPCGPPRPAQPRFSESTVADRRRIFERDGKACSTLSLSGPELKQFQQNALADYIQ
## RKTGKRPTGAASHTGGRAARARTERLPPGRPRGARWPRLASACSLSSLREPEALPRKEHTHPSAADGPQA
## PRDRSSSFASGRLVGERRRWDPQVPRQLLSGANCEPRGVQRMDGAPGGPPSWGMVAGKAGKSKSAEDLLE
## RSDTLAVPVHVRSKSSPTSDKKGQDVLLREGSNFGFVKDPCCLAGPGPRSLSCSDKGQNELALPLHHHTP
## CWNGSGCKATVASSAPPESSGAADHLKQRRAPGPRPLSAGMHGHFPDARAASLSSPLPSPVPSASPVPSS
## YRSQLAMDQQTGQQPPSSPASAVTQPTSPRSLELSSPAYGLGEGMWKRTSLPQRPPPPWVKWAHAVREDG
## LAEDTLAPEFANLKHYRNQPSRPSSCSTSDPDTPGRISLRISESALQPSPPPRGDYDDEVFMKDLHPKVT
## SSPTFEALPPPPPPSPPSEEPLVNGTDDFPPPPPPQALCEVLLDGEASTEAGSGPCRIPRVMVTREGHVP
## GAAHSEGSQIMTATPPQTSAKGSEAESNTPSSASAQPQLNGSPGKQLCPSQTRNLTYEPVERTQDLGKKT
## HAEPQKTSEDIRTEALAKEIVHQDKSLADILDPDSRMKTTMDLMEGLFPGDASVLMDSGAKRKALDITAR
## RAGCEAKASDHKEAVSVLVNCPAYYSVSAAKAELLNKIKDMPEELQEEEGQEDVYEKKAELIGSLTHKLE
## SLQEAKGSLLTDIKLNNALGEEVEALISELCKPNEFDKYKMFIGDLDKVVNLLLSLSGRLARVENVLRGL
## GEDASKEERSSLNEKRKVLAGQHEDARELKENLDRRERVVLDILANYLSAEQLQDYQHFVKMKSTLLIEQ
## RKLDDKIKLGQEQVRCLLESLPSDFRPKAGAISLPPALTGHATPGGTSVFGGVFPTLTSPL
##
## >NP_065910.3 protein Shroom3 [Homo sapiens]
## MMRTTEDFHKPSATLNSNTATKGRYIYLEAFLEGGAPWGFTLKGGLEHGEPLIISKVEEGGKADTLSSKL
## QAGDEVVHINEVTLSSSRKEAVSLVKGSYKTLRLVVRRDVCTDPGHADTGASNFVSPEHLTSGPQHRKAA
## WSGGVKLRLKHRRSEPAGRPHSWHTTKSGEKQPDASMMQISQGMIGPPWHQSYHSSSSTSDLSNYDHAYL
## RRSPDQCSSQGSMESLEPSGAYPPCHLSPAKSTGSIDQLSHFHNKRDSAYSSFSTSSSILEYPHPGISGR
## ERSGSMDNTSARGGLLEGMRQADIRYVKTVYDTRRGVSAEYEVNSSALLLQGREARASANGQGYDKWSNI
## PRGKGVPPPSWSQQCPSSLETATDNLPPKVGAPLPPARSDSYAAFRHRERPSSWSSLDQKRLCRPQANSL
## GSLKSPFIEEQLHTVLEKSPENSPPVKPKHNYTQKAQPGQPLLPTSIYPVPSLEPHFAQVPQPSVSSNGM
## LYPALAKESGYIAPQGACNKMATIDENGNQNGSGRPGFAFCQPLEHDLLSPVEKKPEATAKYVPSKVHFC
## SVPENEEDASLKRHLTPPQGNSPHSNERKSTHSNKPSSHPHSLKCPQAQAWQAGEDKRSSRLSEPWEGDF
## QEDHNANLWRRLEREGLGQSLSGNFGKTKSAFSSLQNIPESLRRHSSLELGRGTQEGYPGGRPTCAVNTK
## AEDPGRKAAPDLGSHLDRQVSYPRPEGRTGASASFNSTDPSPEEPPAPSHPHTSSLGRRGPGPGSASALQ
## GFQYGKPHCSVLEKVSKFEQREQGSQRPSVGGSGFGHNYRPHRTVSTSSTSGNDFEETKAHIRFSESAEP
## LGNGEQHFKNGELKLEEASRQPCGQQLSGGASDSGRGPQRPDARLLRSQSTFQLSSEPEREPEWRDRPGS
## PESPLLDAPFSRAYRNSIKDAQSRVLGATSFRRRDLELGAPVASRSWRPRPSSAHVGLRSPEASASASPH
## TPRERHSVTPAEGDLARPVPPAARRGARRRLTPEQKKRSYSEPEKMNEVGIVEEAEPAPLGPQRNGMRFP
## ESSVADRRRLFERDGKACSTLSLSGPELKQFQQSALADYIQRKTGKRPTSAAGCSLQEPGPLRERAQSAY
## LQPGPAALEGSGLASASSLSSLREPSLQPRREATLLPATVAETQQAPRDRSSSFAGGRRLGERRRGDLLS
## GANGGTRGTQRGDETPREPSSWGARAGKSMSAEDLLERSDVLAGPVHVRSRSSPATADKRQDVLLGQDSG
## FGLVKDPCYLAGPGSRSLSCSERGQEEMLPLFHHLTPRWGGSGCKAIGDSSVPSECPGTLDHQRQASRTP
## CPRPPLAGTQGLVTDTRAAPLTPIGTPLPSAIPSGYCSQDGQTGRQPLPPYTPAMMHRSNGHTLTQPPGP
## RGCEGDGPEHGVEEGTRKRVSLPQWPPPSRAKWAHAAREDSLPEESSAPDFANLKHYQKQQSLPSLCSTS
## DPDTPLGAPSTPGRISLRISESVLRDSPPPHEDYEDEVFVRDPHPKATSSPTFEPLPPPPPPPPSQETPV
## YSMDDFPPPPPHTVCEAQLDSEDPEGPRPSFNKLSKVTIARERHMPGAAHVVGSQTLASRLQTSIKGSEA
## ESTPPSFMSVHAQLAGSLGGQPAPIQTQSLSHDPVSGTQGLEKKVSPDPQKSSEDIRTEALAKEIVHQDK
## SLADILDPDSRLKTTMDLMEGLFPRDVNLLKENSVKRKAIQRTVSSSGCEGKRNEDKEAVSMLVNCPAYY
## SVSAPKAELLNKIKEMPAEVNEEEEQADVNEKKAELIGSLTHKLETLQEAKGSLLTDIKLNNALGEEVEA
## LISELCKPNEFDKYRMFIGDLDKVVNLLLSLSGRLARVENVLSGLGEDASNEERSSLYEKRKILAGQHED
## ARELKENLDRRERVVLGILANYLSEEQLQDYQHFVKMKSTLLIEQRKLDDKIKLGQEQVKCLLESLPSDF
## IPKAGALALPPNLTSEPIPAGGCTFSGIFPTLTSPL
##
## >ABD59319.1 shroom-like protein [Xenopus laevis]
## MMQVSQGTIGSPWHQAYHSSSSTSDLSGYNHEFLRRSPDQYSSRGSMESLDQASAAYHHHLPPAKSTNCI
## DQLVHLHNKRDSAYSSFSTNASIPEYRSSPFSKERSYSMESMHSRNSSGQEGIKHADIKYIKTVYDVQRG
## ISEEYEVNSSSVKNRNYSRQPAYNRHSIGPHGRLEQSRFFSESGGFERAAPMPPTRSDSYALTRHHERPN
## SWSSLDQNRNFRTPKAAGLHSTNTSSNAAQQPKHVHGDGHLHPVLERSPESSPLIKPKQVYSETPQPGQP
## MLPTGIYPVPAPEPHFAHAPQPPKNNNGRLYPALAKEGSYGAKSSEKVLPFSEPNKNEKDTQNLRSKSVG
## QYPMNHSVKEREKKQEGPTGFAHYKLHFTAGPDISTSSLTNDRNDQQPLRLDNIDINEQQKNGTKVAEEF
## SVYAHPAFQNEWSDSKTKQDIASSDIIGLHRNSLSSDAHGEHEYHNHFNIASSSHNKMDERSNRQADHRK
## KLESLSFTVHADEADGPSSNPLKPDESPSPSQKKSYDFTRRRLSSSSSQSSKTDGNKLSSVFDKVCKIEQ
## REHENHRSQFLCGNINQSGLSTRGQNNKGSFTMVEEIRNKFISQDQTPNPNEWRRLSSSHSNEKVTGMHQ
## LTRQGIVYGLQTGDAQKQMPEKQAEKMHSYNQEQNILQAVPDDDNRSFNSQTMPNKEDDWQCAAQDTLGF
## NRAYRNSVKDAQCKVLEATSYRRKDLEISPPHYKKPEKNVRPASAPFRKKSSSLSPHAPKERHSVTPTDN
## CASIQESQGVFFPSRIGAKRRITAEQKKRSYSEPEKMNEVGASESESAPLTVSKMEPVASFSENSVADRR
## RIFEREGKACSTINLSKPQLKQLQQNALADYIERKTGRRPSSQETRLLKERSQSTYFSGSIMDNQSMTST
## SSMNSLNEHNLSYRHREPLSKTGRVSSTLPPGLTGFFDLSSFENNPEYPENRSRSSSFAHQLRSERLLDH
## RSKVEFGKGRETNKPKEVSLQSDDDVIITSSRRHGKSASAEDLLDRLPQPPALHVRSRSSPASDMKSREY
## MSRQEVGNKTSYASASNKEIRSIKSNHFEQMSFTPSFKNHIDTGEDPVPENSSTIQRSAQLENQRNTKTQ
## SISGIYSPHPETKQEPLALPIHSVPAKVTQTSLAHATFDYITAEEYLYSGKRGKESASPTDNKEISDQEW
## CLPENSSSEDLNDPERFAKYTSAQRPQSFETKSGNSINETVQQNKSSGPTAGPKFSTSWKSNGMWSSGSS
## EAETTFNHGKISLHISESCLQPQSPMTGQEDEGDDEVFVKEQDTESFSGTFVPPSPPPFPPPSLEDALLK
## QRIEKFPLVPNTLDEIWENTEEASTQVKVKSNERYLQCASEYTASTESSGSYLLNSGITKRDTDGPLLRL
## SSIVPAPEPLASPVDPTKPIEEQETQPHGADTSILQSSEGNFNPSDSQSTLPHVRSELMSSEDAKSQELA
## KEIVTKDKSLANILDPDSRMKTTMDLMEGLFTKSSSALKEKNQKRKAKKQIDNIIAPESEXKEEKRETLD
## NASNYSAYYSTSAPKAELLRKMKTIHSQIGGKEEQFDVNEKKAELISSLTCKLEVLKDAKESLIDDIKLN
## NSLGEEVETQIETLCKPNEFDKYKMFIGDLDKVVNLLLSLSGRLARVENALSSLGEDASAEERKTWNEKK
## KQLCGQHEDARELKENLDRREKLVMDFLGNYLTGEEFAHYQHFVKMKSALLIEQRELDDKIKLGQEQLRC
## LTESLPSDYLISMKVSLPEERRSSLGNKSLPPPLTSSL
##
## >NP_065768.2 protein Shroom4 [Homo sapiens]
## MENRPGSFQYVPVQLQGGAPWGFTLKGGLEHCEPLTVSKIEDGGKAALSQKMRTGDELVNINGTPLYGSR
## QEALILIKGSFRILKLIVRRRNAPVSRPHSWHVAKLLEGCPEAATTMHFPSEAFSLSWHSGCNTSDVCVQ
## WCPLSRHCSTEKSSSIGSMESLEQPGQATYESHLLPIDQNMYPNQRDSAYSSFSASSNASDCALSLRPEE
## PASTDCIMQGPGPTKAPSGRPNVAETSGGSRRTNGGHLTPSSQMSSRPQEGYQSGPAKAVRGPPQPPVRR
## DSLQASRAQLLNGEQRRASEPVVPLPQKEKLSLEPVLPARNPNRFCCLSGHDQVTSEGHQNCEFSQPPES
## SQQGSEHLLMQASTKAVGSPKACDRASSVDSNPLNEASAELAKASFGRPPHLIGPTGHRHSAPEQLLASH
## LQHVHLDTRGSKGMELPPVQDGHQWTLSPLHSSHKGKKSPCPPTGGTHDQSSKERKTRQVDDRSLVLGHQ
## SQSSPPHGEADGHPSEKGFLDPNRTSRAASELANQQPSASGSLVQQATDCSSTTKAASGTEAGEEGDSEP
## KECSRMGGRRSGGTRGRSIQNRRKSERFATNLRNEIQRRKAQLQKSKGPLSQLCDTKEPVEETQEPPESP
## PLTASNTSLLSSCKKPPSPRDKLFNKSMMLRARSSECLSQAPESHESRTGLEGRISPGQRPGQSSLGLNT
## WWKAPDPSSSDPEKAHAHCGVRGGHWRWSPEHNSQPLVAAAMEGPSNPGDNKELKASTAQAGEDAILLPF
## ADRRKFFEESSKSLSTSHLPGLTTHSNKTFTQRPKPIDQNFQPMSSSCRELRRHPMDQSYHSADQPYHAT
## DQSYHSMSPLQSETPTYSECFASKGLENSMCCKPLHCGDFDYHRTCSYSCSVQGALVHDPCIYCSGEICP
## ALLKRNMMPNCYNCRCHHHQCIRCSVCYHNPQHSALEDSSLAPGNTWKPRKLTVQEFPGDKWNPITGNRK
## TSQSGREMAHSKTSFSWATPFHPCLENPALDLSSYRAISSLDLLGDFKHALKKSEETSVYEEGSSLASMP
## HPLRSRAFSESHISLAPQSTRAWGQHRRELFSKGDETQSDLLGARKKAFPPPRPPPPNWEKYRLFRAAQQ
## QKQQQQQQKQQEEEEEEEEEEEEEEEEEEEEAEEEEEELPPQYFSSETSGSCALNPEEVLEQPQPLSFGH
## LEGSRQGSQSVPAEQESFALHSSDFLPPIRGHLGSQPEQAQPPCYYGIGGLWRTSGQEATESAKQEFQHF
## SPPSGAPGIPTSYSAYYNISVAKAELLNKLKDQPEMAEIGLGEEEVDHELAQKKIQLIESISRKLSVLRE
## AQRGLLEDINANSALGEEVEANLKAVCKSNEFEKYHLFVGDLDKVVNLLLSLSGRLARVENALNSIDSEA
## NQEKLVLIEKKQQLTGQLADAKELKEHVDRREKLVFGMVSRYLPQDQLQDYQHFVKMKSALIIEQRELEE
## KIKLGEEQLKCLRESLLLGPSNF
##
## >AAK95579.1 SHAP-A, partial [Homo sapiens]
## MHFPSEAFSLSWHSGCNTSDVCVQWCPLSRHCSTEKSSSIGSMESLEQPGQATYESHLLPIDQNMYPNQR
## DSAYSSFSASSNASDCALSLRPEEPASTDCIMQGPGPTKAPSGRPNVAETSGGSRRTNGGHLTPSSQMSS
## RPQEGYQSGPAKAVRGPPQPPVRRDSLQASRAQLLNGEQRRASEPVVPLPQKEKLSLEPVLPARNPNRFC
## CLSGHDQVTSEGHQNCEFSQPPESSQQGSEHLLMQASTKAVGSPKACDRASSVDSNPLNEASAELAKASF
## GRPPHLIGPTGHRHSAPEQLLASHLQHVHLDTRGSKGMELPPVQDGHQWTLSPLHSSHKGKKSPCPPTGG
## THDQSSKERKTRQVDDRSLVLGHQSQSSPPHGEADGHPSEKGFLDPNRTSRAASELANQQPSASGSLVQQ
## ATDCSSTTKAASGTEAGEEGDSEPKECSRMGGRRSGGTRGRSIQNRRKSERFATNLRNEIQRRKAQLQKS
## KGPLSQLCDTKEPVEETQEPPESPPLTASNTSLLSSCKKPPSPRDKLFNKSMMLRARSSECLSQAPESHE
## SRTGLEGRISPGQRPGQSSLGLNTWWKAPDPSSSDPEKAHAHCGVRGGHWRWSPEHNSQPLVAAAMEGPS
## NPGDNKELKASTAQAGEDAILLPFADRRKFFEESSKSLSTSHLPGLTTHSNKTFTQRPKPIDQNFQPMSS
## SCRELRRHPMDQSYHSADQPYHA
##
## >ABA81834.1 LP13775p [Drosophila melanogaster]
## MKMRNHKENGNGSEMGESTKSLAKMEPENNNKISVVSVSKLLLKDSNGANSRSSNSNASFSSASVAGSVQ
## DDLPHHNSSSSQLGQQHGSSLDQCGLTQAGLEEYNNRSSSYYDQTAFHHQKQPSYAQSEGYHSYVSSSDS
## TSATPFLDKLRQESDLLSRQSHHWSENDLSSVCSNSVAPSPIPLLARQSHSHSHSHAHSHSNSHGHSHGH
## AHSASSSSSSNNNSNGSATNNNNNNSSESTSSTETLKWLGSMSDISEASHATGYSAISESVSSSQRIVHS
## SRVPTPKRHHSESVLYLHNNEEQGDSSPTASNSSQMMISEEANGEESPPSVQPLRIQHRHSPSYPPVHTS
## MVLHHFQQQQQQQQDYQHPSRHHTNQSTLSTQSSLLELASPTEKPRSLMGQSHSMGDLQQKNPHQNPMLG
## RSAGQQHKSSISVTISSSEAVVTIAPQPPAGKPSKLQLSLGKSEALSCSTPNMGEQSPTNSIDSYRSNHR
## LFPVSTYTEPVHSNTSQYVQHPKPQFSSGLHKSAKLPVITPAGATVQPTWHSVAERINDFERSQLGEPPK
## FAYLEPTKTHRLSNPALKALQKNAVQSYVERQQQQQKEEQQLLRPHSQSYQACHVERKSLPNNLSPIMVG
## LPTGSNSASTRDCSSPTPPPPPRRSGSLLPNLLRRSSSASDYAEFRELHQAQGQVKGPSIRNISNAEKIS
## FNDCGMPPPPPPPRGRLAVPTRRTSSATEYAPMRDKLLLQQAAALAHQQHHPQQHRHAQPPHVPPERPPK
## HPNLRVPSPELPPPPQSELDISYTFDEPLPPPPPPEVLQPRPPPSPNRRNCFAGASTRRTTYEAPPPTAI
## VAAKVPPLVPKKPTSLQHKHLANGGGGSRKRPHHATPQPILENVASPVAPPPPLLPRARSTAHDNVIASN
## LESNQQKRSNSKASYLPRQSLEKLNNTDPDHGIYKLTLTSNEDLVAHTKPSYGVTGKLPNNLPDVLPLGV
## KLHQQPKLQPGSPNGDANVTLRYGSNNNLTGNSPTVAPPPYYGGGQRYSTPVLGQGYGKSSKPVTPQQYT
## RSQSYDVKHTSAVTMPTMSQSHVDLKQAAHDLETTLEEVLPTATPTPTPTPTPTPPRLSPASSHSDCSLS
## TSSLECTINPIATPIPKPEAHIFRAEVISTTLNTNPLTTPPKPAMNRQESLRENIEKITQLQSVLMSAHL
## CDASLLGGYTTPLITSPTASFANEPLMTPPLPPSPPPPLEPEEEEEQEENDVHDKQPEIEELQLMQRSEL
## VLMVNPKPSTTDMACQTDELEDRDTDLEAAREEHQTRTTLQPRQRQPIELDYEQMSRELVKLLPPGDKIA
## DILTPKICKPTSQYVSNLYNPDVPLRLAKRDVGTSTLMRMKSITSSAEIRVVSVELQLAEPSEEPTNLIK
## QKMDELIKHLNQKIVSLKREQQTISEECSANDRLGQDLFAKLAEKVRPSEASKFRTHVDAVGNITSLLLS
## LSERLAQTESSLETRQQERGALESKRDLLYEQMEEAQRLKSDIERRGVSIAGLLAKNLSADMCADYDYFI
## NMKAKLIADARDLAVRIKGSEEQLSSLSDALVQSDC
##
## >EAA12598.4 AGAP008245-PA, partial [Anopheles gambiae str. PEST]
## IPFSSSPKNRSNSKASYLPRQPRDKLHSDPDHGSYKLTLTSNEDCINHNTGEIITASPKCNLPDVLPPGV
## KYSLYSTNNNNNNNNNSVSNNNSINNHHNGIKPKPHSAPIISTANSLKSLFNFSTSSSTTSTSSSDAAKD
## RDGPQTPATGPPPALVGNFEQQQRQHQHDATVLPPPTGGSTVAGAERAPNEPALDSEASSASTSTRDDDA
## LSSNDAAPATVPVVVVAKEQEGESCPSTAEPLVNGVGVGGVSEHTISGSPAALERVTKEINLSPVVGDAV
## ACEPPSPLPLQRTEIVLRVQAPTSEAASQTDSDDAGLARGFAELTIDCGRRAKDQQDATTVSQQCSNGAS
## TSVATSTTSPIGSPPGTPPSGKEQQGQKFFAPLSSSSSPPPPPPSTPRKLHPEEIDCDKLSHDLVSQLSP
## SDKLHTILAPKTFKSSSDYVSDLFNIQIAPRPLKKDASTATPTETTVANGRRSLSITASQRQQLVSKCKG
## EEAVKKNQEELVQRLGKKLLVLTNEQTNIAEESNANDLLGNDVALKVTQKVRPADASKFRSYVDDVGYIT
## MLLLSLSGRLARTDNALHMIDANHPDKKILEAKRERLLEQLDEAKQLKDDIDQRGATIARILEQSLTIEE
## YADYDYFINMKAKLIVDSREIADKIKLGEEQLAALKDTLVQSEC
##
## >XP_392427.4 PREDICTED: hypothetical protein LOC408897 [Apis mellifera]
## MTELQPSPPGYRVQDEAPGPPSCPPASYKYASHGHGSEAANFKSTSSSYPQEGYGGLKQSPSRTVPPNEY
## YRRRNDGRRSTENEHEAGNATKKATIPGYNESHKKNTTSYKNDSGYSESLGFDSYTLPLNERDEASPPPT
## PPVRDASSLKGVCYGPGHEKYPSWPSAPERHPDEDVHGSGHSGSHRSKSWTDHTNYPKEKPAQYTRPHTK
## RPNPAFTQQLKTVMERCEKIPAETFESRNRGNVTEEEPRLWPRVDREGKALGDAEYVVPSPPEREQPQTA
## QTLSHADLEAYVRSYQVDPQVSQVDIYRESTLTQAGLEEYTRVQHSQQASYAQSEGYHSYVSSVDSTTNT
## PFLDRLRRDSEAVAQRPTSTWEDSTSREGRDSVVTTSSGSASSSETLKWHGSMSDVSVSSGLPARQDRTS
## DRWHHGSLSDVSSVNGGVLSQKAGSNGCRDKWQGSMSDVSTSCGLSPTAKGRHGGEKWPDKWQVVSMNDR
## NKQSQGRSSLPAVINHCNASSNAADVSSAMRESKWQESSIDEESLVDKSQVSSSGATASMQWDNSMRIEG
## DKYGSLAQPMPQSPIRQQIGSTTPQSPENWNHPIHGSMSDVSQVNGLSCSKQLIAHSARVQTPQRHHSES
## VLYLDRERNQRKLYPVATTQPQLDSAQTSQRMPPALPSQQISVAERINELEKQQQQQQQQQQQQQQQQQQ
## QMRYTYLDPEKRHRVSDPTLKAIQKKALLSFYERHHQASWRSEPQLAQGSQTIAAPQSPPPQPPPRPRPP
## SSRRASSASDYASGAWRENGNRNQNQGNVGELSSPKHQHSNSCGSLSTDLLGPVIVGPAISIDDWVPERP
## PKKPHLRNVYNDRVPSPDLPPPSPPTVTENEVHDCDDPLPPPPPELSDDCFNDATTTATTATATAATTAH
## HHHQSSEESKIRDRSCDRHKLERHSIRRSKHGSKRDYEKLSSGKSSPNSAAKTSMSHQQAEQEQQHQFVR
## GGIALKHVESEMVLENGVSAGFATHRMVATGRSSLRYPSAQKLMMNGRVTPARRISEERGFSARPPAQLP
## DLISQRYTDSGNQRPAPQVPVEPRIIRQESMRVDGTRIDVSGTLLRNDSAQRLESSSGQRPQPQVAKNAD
## KSGSNNQTTRPNYLPVPENSKCASKYLETGNHGFTIGSPVKTGYEANSKMYPSESSPQKYHEPPKYVANN
## HHQSQRHSGDGTQRGNYYALPPKYIDAPKQKPQPQCPTDRYGGSSNASPSSPPPPPLAPRQNTASRKSLP
## PPPRPAPPHALQGSQSKASYLAYRRERGAPDTEGSYKRTMSPTSRLEDWPPPRDHDDPVLLRVTPHHPLQ
## HHHHHHHNNHHHQQQPELSKSHSVDALHHRLEERSAQQQQQQQQQQQQQQQQQQQQQQQQQQNSAEMIGK
## LSHDLNKKLQLNDARSRTSENNNNDNLEDRHQHHHYHHHHHHHYHQERRHDYEQRKERQSPQSIEVLNDR
## NRQLERERRKLAASCEPLSQNREKQNRSMEIPSSVVEENLFRERSATIEMTSQNIELLNRRNEKRTNVTS
## TTTSTITTCITTSSTTTAMTITTDSCWPRSLEVQSSIEASPVSEKPTLPTSSPPRSPDQVEGDSSRGAVP
## RRSGSCSSSSSSSSSSSSSSSSSSSSSSSSSSGKSSDLHISPRNLSNSFSKNENSFLNNRKVSSPTQTDN
## TGSSNRSSSPVRSNQTKEIELRMDRSSLSSKSRSGGRSSTSSVSSCVSKASSSSSSRNSPVEEDISFSPM
## SPCVSPQPGIEGLTLLQRTEVVLRVNTATSDVASQTDIPETTEIESSTVKIREILLCRKKLPEEIECEEL
## GRDLASQLNPNDKLVPLLVPAPEHKKPTDYVTGLFRVEATLHPRPKRRSSLEEPTTPCSDNGDEEKKHES
## IPSTPLSADSTSPLSPTSAYFTTSEGKARFLTRYSRDVTVEGSTRQEDVPPIIPTNSLDLRQKKEELMMS
## LDKKLVVLRAEQEAVREEGEVNEALGARVATRISAVARPAEASKYRLHVEEVGKITSLLLGLSGRLARAE
## NALYGMPAEHAERKILESKRDKLMDQLEEAKILKSNIDKRSVNVSTILSKYLNEEEFADYQHFINMKAKL
## IVDGREIQDKVKLGEEQLAALREAID
##
## >XP_783573.4 protein Shroom3 isoform X5 [Strongylocentrotus purpuratus]
## MMKDAMYPTTTSTTSSSVNPLPKEVAEQKPVNTKRVRKRESQPGSPRPKSWHTDVRTLSQPDLSRMPQHS
## RQRHGEQTQPRYRNPPPTQYNKFHSSSDSSFMMSSYEEKTGYHQHGRSTGNINNNSAEDTIEPLPGHVQK
## KREAFERTIMSQSTDKINTEDQYGDVYSKRYSKGKEAITQGVNPKLRNIRHDLEPAETYPKVVVATHIHS
## VQSKAVLGRVPDSSDQTGQKYGGAQDVNIYAVQMPERQIASSADSSVTRNYAQAHSNLSGNPQTSYVQST
## FGSNPHSSSFATHHGEIRKVPPAIPKREDSKTKTQAYSDHVKSSSWPVSTISSETTCTLTVCPTILPSDL
## PPVKLTKTEKLQKSPTHSVTQSNPNQNSNSTDQHIVQAKRIWIDDASEHEFNFEDSKMLSSDNTLNTNTR
## PPSPPVRDNENGNYKPKQSKTTRSSDDRFNTSDHLILDYRSFLEKTEQQQENLKSIQPVNSVPESKNKRE
## LFTRTHDFSKATQQEESSPMAQTQPARESSTRNSWYQEKKKQRKRSSLSSEDSLNFSEFDLNKKNLGQNP
## ARTWRNPSESRESTTSDLHPQLTGHPQQPQQQQPVEPRISSHSRQSSDLDNPNAPRKRTPISPSLMEETF
## RMEQEPEQKTFTEKVERTVNRQESRDSRKSGIFDQNDDCQLEKLQPDGNLSENSILRRLEREGSFKNNVN
## LDPQRSEGNETSARKTMPDTKRDLRNLGLSEDAFKQDHRPKSNHYKSGSFTHDSRGNAGDPRLMRTAPVP
## SSHQRTHSIDTYNRNPPRRHESFERRGNPVSRESSFSEHKKSKSDSDQHPKADQQKKKMSDPINKPQNVR
## KTSDPENRQQIWDALKGFVHNRRSPPGTSPASSRPPSMSGSEQSLYRSDMYHSTSSLASGYSSSRHYPQD
## SLSSIGSSFSHPLHQPQDSGFGSNSDISQVRGPHSPSQTGVVSPKDVRIAASIAHSSSMSSSGPQYQQTT
## NERRRSHQVQRPPHQTKHLTSRMSLDSINLPNSRNQQEKMRPNRSPQDKFEFSPTRQISPSPSYNVHVAE
## RVSISSLKEEESNKEGTVFYESLQSERTETEVNHRVFRYPPRSDQTSSSGGQRTSPKSSTVHPPVKSLSM
## DPSYQELELSPPPPPTPLSPLDGRGNMEFPPPPPELAPASNTKRSSPKQEPSEQTVRQATQGGSMPLTSP
## ERITPSFAEQLQQAPSLIHVQVDQVPSAKGEETTPSISPNSIISGRSSPDNHDVEPATSPQQVPRLQSVQ
## ENIAFTDRKDSPVLIRPATLLDSPTRCSQAEPEPLDVAEEEAFDACDGGSNICDSGSNIFTREQEKTDKE
## LREVSNPVLKWVLQALTPSDTVLSDLFPLPRSKTSRSDTMTMDVTTPTKSESEMVMEQSSPSECVNLVLS
## SSRYLRISPAKAIILQRAQTMNKSDDLGNNNTELRKTQEELVDRIGKKVEDIKDLQKEVAEEMSNLEDMG
## RQVMDSVKATCKASEYNKCNMYIADIERVTKLLLSLSRRLNKVESVLGSIENSEEEEKVNLEKLKVTVNS
## KYQDAKMLKESITGRHSTISSMLLNKISNDQHDNFTYYIQMLPRHLIMGQELEDKVKLGEEQLEALGESL
## KQMSLSSDSGSSRDTNGNVSHGFKEEAATSSSSNGIGGPEQLNSNATSSYC
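Here we download the FASTA records for ALL of the accessions in shroom_table from the NCBI protein database with a single function call. The result is a list with one element per accession, not a table.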
shrooms_list <- compbio4all::entrez_fetch_list(db = "protein",
                                               id = shroom_table$accession,
                                               rettype = "fasta")
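Fetching many records at once can be thought of as a loop over the accessions that stores each result in a named list. The sketch below illustrates that idea only. It is not the actual source of compbio4all::entrez_fetch_list(), and the names fetch_list_sketch(), fake_fetch(), and toy_ids are hypothetical. The fake fetcher stands in for rentrez::entrez_fetch() so the example runs without an internet connection.

```r
# Hypothetical sketch: fetch one record per accession and return a named list.
# `fetch_fun` stands in for rentrez::entrez_fetch(); this is an assumption about
# the general approach, not the real compbio4all code.
fetch_list_sketch <- function(db, id, rettype, fetch_fun) {
  out <- vector("list", length(id))   # empty list, one slot per accession
  names(out) <- id                    # label each slot with its accession
  for (i in seq_along(id)) {
    out[[i]] <- fetch_fun(db = db, id = id[i], rettype = rettype)
  }
  out
}

# Offline stand-in that returns a tiny FASTA-like string (made-up data)
fake_fetch <- function(db, id, rettype) {
  paste0(">", id, " toy record\nMKT\n")
}

toy_ids <- c("ACC_1", "ACC_2", "ACC_3")  # hypothetical accession-like labels
toy_list <- fetch_list_sketch(db = "protein",
                              id = toy_ids,
                              rettype = "fasta",
                              fetch_fun = fake_fetch)
names(toy_list)
```

Passing the fetching function in as an argument is a choice made for this sketch so it can be tested offline; with a working connection, rentrez::entrez_fetch could be supplied as fetch_fun instead.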
is(shrooms_list)
## [1] "list" "vector" "list_OR_List" "vector_OR_Vector"
## [5] "vector_OR_factor"
length(shrooms_list)
## [1] 14
nchar(shrooms_list)
## CAA78718 NP_597713 CAA58534 ABD19518 AAF13269 AAF13270 NP_065910 ABD59319
## 1486 915 1673 1543 2083 1895 2070 1864
## NP_065768 AAK95579 ABA81834 EAA12598 XP_392427 XP_783573
## 1560 778 1647 750 2230 1758
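The checks above can be tried on a small made-up list to see what each one reports. This is a minimal sketch; toy_seqs and its contents are invented for illustration and are not real Shroom sequences.

```r
# Made-up list of FASTA-like strings; the names play the role of accessions
toy_seqs <- list(seqA = ">seqA toy\nMKTAY\n",
                 seqB = ">seqB toy\nMK\n")

is.list(toy_seqs)             # confirms the object is a list
length(toy_seqs)              # number of elements, one per sequence
seq_chars <- nchar(toy_seqs)  # characters per element, header and newlines included
seq_chars
```

Because each raw record still contains its header line and newline characters, nchar() counts more characters than there are amino acids. The counts only become true sequence lengths after the records have been cleaned, for example with compbio4all::fasta_cleaner().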