INSTALLING PACKAGES IN R

Introduction

R packages are found in three large repositories (a repository is a site where packages are stored):

  • CRAN: the official R repository.
  • GitHub: hosts packages, functions, and other projects for R and other programming languages (its use and popularity are growing quickly because of its versatility and because it is often cited in research articles).
  • Bioconductor: a repository dedicated to biological analysis.

In R, a package is a set of routines designed to handle one specific type of information from start to finish. This typically covers:

  • Importing data in different formats into an R “object”.
  • Functions to filter, clean, and present the data.
  • Functions to run statistical analyses specific to that kind of data.
  • Functions to plot the results.

Most of these packages are developed by independent research groups, and support is usually provided by the scientific community, which is one of the greatest strengths of the R ecosystem.

For this class we will use the following packages:

  • seqinr

Installing packages

The function used to install a package depends on the repository it comes from. For a CRAN package such as seqinr:

install.packages("seqinr")

Although this call works on its own, we recommend setting the dependencies parameter. The reason is simple: a package in R usually relies on objects and functions defined in other packages, so those packages must also be installed for it to run properly. By default, install.packages() already installs the required dependencies (Depends, Imports, LinkingTo); setting dependencies = TRUE additionally installs the suggested packages (Suggests), so R automatically installs the full set of packages you may need.

install.packages("seqinr", dependencies = TRUE)
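
A minimal sketch of how one might avoid reinstalling packages that are already present. The helper names missing_packages and install_if_missing are hypothetical (they are not part of R or seqinr); the sketch relies only on the base function requireNamespace(), which also loads the namespace of any package it finds.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical wrapper: install only what is missing, with dependencies.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, dependencies = TRUE)
  }
  invisible(to_install)
}

# "stats" ships with R, so only the made-up name is reported as missing.
missing_packages(c("stats", "notARealPackage12345"))
```

Calling install_if_missing("seqinr") would then install seqinr only when it is absent.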

To install a package from the Bioconductor repository, we can use the following commands:
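
The commands that produced the output below are not shown in the source. Judging from that output and from the description that follows it, they were probably close to this sketch. Wrapping the two steps in a function (the name install_from_bioconductor is hypothetical) is only a convenience so that nothing is installed until the function is called.

```r
# Hypothetical wrapper around the two steps described in the text.
install_from_bioconductor <- function(pkg) {
  # Step 1: install the Bioconductor installer from CRAN if it is missing.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  # Step 2: use it to install the requested Bioconductor package.
  BiocManager::install(pkg)
}

# Example call (commented out so that nothing is installed by accident):
# install_from_bioconductor("msa")
```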

## Bioconductor version 3.16 (BiocManager 1.30.20), R 4.2.3 (2023-03-15)
## Warning: package(s) not installed when version(s) same as or greater than current; use
##   `force = TRUE` to re-install: 'msa'
## Old packages: 'fontawesome', 'processx', 'ps', 'segmented', 'TH.data',
##   'tinytex', 'zip', 'zoo'

As this shows, we first need to install Bioconductor's “package installer”, BiocManager, with the standard R function install.packages(). Once it is installed, we can use BiocManager::install() to install the package we want (here, msa).

Loading the package into the working environment

The functions above only download and install the package from the repository; the package is not yet loaded into our working environment. To load it, we call the package with the library() function.

library(seqinr)
library(msa)
## Loading required package: Biostrings
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
##     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
##     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
##     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
##     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
##     table, tapply, union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
## Loading required package: stats4
## 
## Attaching package: 'S4Vectors'
## The following objects are masked from 'package:base':
## 
##     expand.grid, I, unname
## Loading required package: IRanges
## Loading required package: XVector
## Loading required package: GenomeInfoDb
## 
## Attaching package: 'Biostrings'
## The following object is masked from 'package:seqinr':
## 
##     translate
## The following object is masked from 'package:base':
## 
##     strsplit
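
To confirm that a package is attached after calling library(), one option is to inspect the search path. This sketch uses stats4, a package distributed with R, instead of seqinr or msa so that it runs without installing anything; the variable name is_attached is illustrative.

```r
# stats4 ships with every R installation, so it is safe to use as an example.
library(stats4)

# Attached packages appear on the search path as "package:<name>".
is_attached <- "package:stats4" %in% search()
print(is_attached)
```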

LOADING SEQUENCES

There are two ways to load sequences:

Loading a sequence from a local file

For this we can use the read.fasta() function from the seqinr package.

NOTE: Functions from different packages often share the same name (in the output above, for example, Biostrings masks translate from seqinr). In that case R uses the version from the most recently attached package, which may not be the one you intended. To avoid the ambiguity, write the package name followed by the function name, separated by “::”.
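
The effect of “::” can be sketched with base R alone. Here a user-defined function named sd masks stats::sd; the replacement function and the variable names are made up for illustration.

```r
# Define our own function called sd; it masks stats::sd on the search path.
sd <- function(x) "custom sd"

masked_result   <- sd(c(1, 2, 3))         # our version is found first
explicit_result <- stats::sd(c(1, 2, 3))  # the stats version, by namespace

print(masked_result)
print(explicit_result)

# Remove our definition so that sd() refers to stats::sd again.
rm(sd)
```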

We will work with the actin sequence of Trypanosoma cruzi, which has some interesting molecular particularities. Download the sequence to your local disk in FASTA format.

actin <- seqinr::read.fasta(file = "/Users/alfredocardenasrivera/Downloads/actin_Tc.fasta") # change the path to match the folder on your local disk
actin
## $XP_809496.1
##   [1] "m" "e" "a" "t" "l" "w" "d" "e" "e" "p" "a" "v" "v" "l" "d" "n" "g" "s"
##  [19] "g" "n" "i" "k" "c" "g" "f" "a" "g" "e" "e" "i" "p" "r" "c" "v" "f" "p"
##  [37] "s" "v" "t" "g" "v" "s" "m" "n" "a" "r" "s" "s" "g" "s" "s" "s" "s" "q"
##  [55] "r" "v" "y" "v" "g" "d" "e" "a" "l" "q" "e" "k" "g" "l" "r" "y" "f" "y"
##  [73] "p" "m" "e" "h" "g" "i" "v" "f" "d" "w" "d" "q" "m" "e" "r" "v" "w" "r"
##  [91] "h" "a" "y" "e" "q" "l" "r" "v" "p" "p" "e" "r" "q" "a" "v" "l" "l" "t"
## [109] "e" "a" "p" "l" "n" "p" "i" "s" "n" "r" "e" "k" "m" "a" "e" "t" "l" "f"
## [127] "e" "s" "f" "g" "v" "p" "a" "l" "h" "v" "q" "i" "q" "a" "v" "l" "t" "l"
## [145] "y" "s" "s" "g" "r" "t" "d" "g" "l" "v" "l" "d" "s" "g" "d" "g" "v" "t"
## [163] "h" "l" "v" "p" "v" "f" "e" "g" "q" "t" "m" "p" "q" "s" "v" "r" "r" "l"
## [181] "e" "l" "a" "g" "r" "d" "l" "t" "e" "w" "m" "m" "e" "l" "l" "s" "d" "e"
## [199] "l" "d" "r" "p" "f" "t" "t" "s" "a" "d" "r" "e" "i" "a" "r" "r" "v" "k"
## [217] "e" "s" "l" "c" "y" "i" "p" "l" "f" "f" "e" "e" "e" "l" "q" "a" "a" "e"
## [235] "e" "d" "g" "i" "n" "e" "d" "v" "k" "g" "k" "e" "p" "f" "t" "l" "p" "d"
## [253] "g" "e" "v" "i" "h" "v" "g" "r" "a" "r" "f" "c" "c" "p" "e" "i" "l" "f"
## [271] "n" "p" "a" "l" "a" "e" "k" "p" "y" "d" "g" "i" "q" "h" "a" "v" "i" "n"
## [289] "c" "v" "n" "s" "c" "p" "i" "d" "l" "r" "r" "q" "l" "l" "g" "s" "i" "v"
## [307] "l" "s" "g" "g" "n" "t" "m" "f" "k" "g" "m" "q" "q" "r" "l" "q" "s" "e"
## [325] "l" "a" "a" "l" "a" "n" "k" "r" "a" "a" "e" "d" "v" "r" "v" "v" "a" "a"
## [343] "s" "e" "r" "k" "f" "s" "v" "w" "i" "g" "a" "a" "i" "l" "a" "s" "l" "t"
## [361] "s" "f" "a" "s" "e" "w" "i" "t" "r" "t" "e" "y" "a" "e" "q" "g" "a" "a"
## [379] "v" "l" "h" "k" "r" "c" "d" "s" "l" "s" "f" "v" "s" "k"
## attr(,"name")
## [1] "XP_809496.1"
## attr(,"Annot")
## [1] ">XP_809496.1 actin 2, putative [Trypanosoma cruzi]"
## attr(,"class")
## [1] "SeqFastadna"

You should see the following printed in the console: the sequence as a vector of single characters, the sequence name, the annotation line (accession code, description, and organism), and the “class” assigned by this function.

NOTE: If you consult the documentation for this function, you will see that it defines two sequence types, selected with the seqtype parameter:

  • DNA: the default, for nucleotide sequences.
  • AA: for amino acid sequences.

NOTE: To open the documentation for a function, you can use the following line of code: ?read.fasta

As you will have noticed, the sequence was read as type DNA (class SeqFastadna). This is wrong, because it is an amino acid sequence. To correct it, we set the seqtype parameter to "AA".

actin <- seqinr::read.fasta(file = "/Users/alfredocardenasrivera/Downloads/actin_Tc.fasta",
                            seqtype = "AA")
actin
## $XP_809496.1
##   [1] "M" "E" "A" "T" "L" "W" "D" "E" "E" "P" "A" "V" "V" "L" "D" "N" "G" "S"
##  [19] "G" "N" "I" "K" "C" "G" "F" "A" "G" "E" "E" "I" "P" "R" "C" "V" "F" "P"
##  [37] "S" "V" "T" "G" "V" "S" "M" "N" "A" "R" "S" "S" "G" "S" "S" "S" "S" "Q"
##  [55] "R" "V" "Y" "V" "G" "D" "E" "A" "L" "Q" "E" "K" "G" "L" "R" "Y" "F" "Y"
##  [73] "P" "M" "E" "H" "G" "I" "V" "F" "D" "W" "D" "Q" "M" "E" "R" "V" "W" "R"
##  [91] "H" "A" "Y" "E" "Q" "L" "R" "V" "P" "P" "E" "R" "Q" "A" "V" "L" "L" "T"
## [109] "E" "A" "P" "L" "N" "P" "I" "S" "N" "R" "E" "K" "M" "A" "E" "T" "L" "F"
## [127] "E" "S" "F" "G" "V" "P" "A" "L" "H" "V" "Q" "I" "Q" "A" "V" "L" "T" "L"
## [145] "Y" "S" "S" "G" "R" "T" "D" "G" "L" "V" "L" "D" "S" "G" "D" "G" "V" "T"
## [163] "H" "L" "V" "P" "V" "F" "E" "G" "Q" "T" "M" "P" "Q" "S" "V" "R" "R" "L"
## [181] "E" "L" "A" "G" "R" "D" "L" "T" "E" "W" "M" "M" "E" "L" "L" "S" "D" "E"
## [199] "L" "D" "R" "P" "F" "T" "T" "S" "A" "D" "R" "E" "I" "A" "R" "R" "V" "K"
## [217] "E" "S" "L" "C" "Y" "I" "P" "L" "F" "F" "E" "E" "E" "L" "Q" "A" "A" "E"
## [235] "E" "D" "G" "I" "N" "E" "D" "V" "K" "G" "K" "E" "P" "F" "T" "L" "P" "D"
## [253] "G" "E" "V" "I" "H" "V" "G" "R" "A" "R" "F" "C" "C" "P" "E" "I" "L" "F"
## [271] "N" "P" "A" "L" "A" "E" "K" "P" "Y" "D" "G" "I" "Q" "H" "A" "V" "I" "N"
## [289] "C" "V" "N" "S" "C" "P" "I" "D" "L" "R" "R" "Q" "L" "L" "G" "S" "I" "V"
## [307] "L" "S" "G" "G" "N" "T" "M" "F" "K" "G" "M" "Q" "Q" "R" "L" "Q" "S" "E"
## [325] "L" "A" "A" "L" "A" "N" "K" "R" "A" "A" "E" "D" "V" "R" "V" "V" "A" "A"
## [343] "S" "E" "R" "K" "F" "S" "V" "W" "I" "G" "A" "A" "I" "L" "A" "S" "L" "T"
## [361] "S" "F" "A" "S" "E" "W" "I" "T" "R" "T" "E" "Y" "A" "E" "Q" "G" "A" "A"
## [379] "V" "L" "H" "K" "R" "C" "D" "S" "L" "S" "F" "V" "S" "K"
## attr(,"name")
## [1] "XP_809496.1"
## attr(,"Annot")
## [1] ">XP_809496.1 actin 2, putative [Trypanosoma cruzi]"
## attr(,"class")
## [1] "SeqFastaAA"

We can also load multiple sequences stored in a single file:

multi.actin <- seqinr::read.fasta(file = "/Users/alfredocardenasrivera/Downloads/multi_actin.txt",
                                  seqtype = "AA")
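
A self-contained way to try this without downloading anything is to write a small FASTA file to a temporary location and read it back. The two sequences below are made-up fragments, and the names toy_file and toy are illustrative; the reading step runs only if seqinr is installed.

```r
# Two made-up protein fragments in FASTA format (hypothetical data).
fasta_lines <- c(">seq1 toy example", "MEATLWDEEP",
                 ">seq2 toy example", "MFVFLVLLPLVSSQ")
toy_file <- tempfile(fileext = ".fasta")
writeLines(fasta_lines, toy_file)

# Read the file only if seqinr is installed.
if (requireNamespace("seqinr", quietly = TRUE)) {
  toy <- seqinr::read.fasta(file = toy_file, seqtype = "AA")
  print(names(toy))        # one list element per sequence
  print(lengths(toy))      # number of residues in each sequence
  print(class(toy[[1]]))   # class set by seqtype = "AA"
}
```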

Loading sequences remotely

Several packages let us import sequences directly from remote repositories. To see which databases seqinr gives access to, we can use seqinr::choosebank(). We recommend setting infobank = TRUE to see the status of each database, since some are temporarily offline.

seqinr::choosebank(infobank = TRUE)
##               bank status
## 1          genbank     on
## 2             embl     on
## 3          emblwgs     on
## 4    genbankseqinr     on
## 5        swissprot     on
## 6          ensembl     on
## 7      hogenom7dna     on
## 8         hogenom7     on
## 9          hogenom     on
## 10      hogenomdna     on
## 11     hovergendna     on
## 12        hovergen     on
## 13        hogenom5     on
## 14     hogenom5dna     on
## 15        hogenom4     on
## 16     hogenom4dna     on
## 17        homolens     on
## 18     homolensdna     on
## 19       hobacnucl     on
## 20       hobacprot     on
## 21         phever2     on
## 22      phever2dna     on
## 23          refseq     on
## 24       refseq16s     on
## 25        greviews     on
## 26       bacterial    off
## 27        archaeal     on
## 28       protozoan     on
## 29     ensprotists     on
## 30        ensfungi     on
## 31      ensmetazoa     on
## 32       ensplants     on
## 33 ensemblbacteria     on
## 34            mito     on
## 35     polymorphix     on
## 36          emglib     on
## 37   refseqViruses     on
## 38          ribodb     on
## 39          taxodb     on
##                                                                                info
## 1                  GenBank Release 246 (15 October 2021) Last Updated: Nov 19, 2021
## 2       EMBL Nucleotide Archive Release 143 (March 2020) Last Updated: Nov 21, 2021
## 3                                   EMBL Whole Genome Shotgun sequences (July 2018)
## 4                    GenBank Release 231 (15 April 2019) Last Updated: Jun  8, 2019
## 5  UniProt Knowledgebase Release 2021_03 of 09-Jun-2021, Last Updated: Aug  6, 2021
## 6                                Ensembl 85 - (10/03/16) Last Updated: Oct  3, 2016
## 7       HOGENOM - genomic data - Release 07 (Nov 3,2015) Last Updated: Apr 19, 2017
## 8       HOGENOM - protein data - Release 07 (Nov 3,2015) Last Updated: Jun  3, 2019
## 9      HOGENOM - protein data - Release 06 (Oct 30,2011) Last Updated: May 10, 2012
## 10     HOGENOM - genomic data - Release 06 (Oct 30,2011) Last Updated: Nov 14, 2011
## 11    HOVERGEN - genomic data - Release 49 (Dec 22 2009) Last Updated: Dec 22, 2009
## 12    HOVERGEN - protein data - Release 49 (Dec 22 2009) Last Updated: Dec 22, 2009
## 13                                                                    HOGENOM5 (AA)
## 14                                                                   HOGENOM5 (DNA)
## 15                                                                    HOGENOM4 (AA)
## 16                                                                   HOGENOM4 (DNA)
## 17      HOMOLENS 5 - Homologous genes from Ensembl(60)\t Last Updated: Feb 17, 2011
## 18      HOMOLENS 5 - Homologous genes from Ensembl(60)\t Last Updated: Feb 17, 2011
## 19                          HOBACGEN - genomic data - Release 10 (February 12 2002)
## 20                          HOBACGEN - protein data - Release 10 (February 12 2002)
## 21      PhEVER - protein data - Release 2 (June 1  2010) Last Updated: Jul 22, 2010
## 22      PhEVER - genomic data - Release 2 (June 1  2010) Last Updated: Jul 22, 2010
## 23                                 Refseq RNA Sequences. Last Updated: Jan 27, 2018
## 24                         Refseq RNA 16S 23S Sequences. Last Updated: Jan 29, 2018
## 25                                Genome Review from EBI Last Updated: Jan 24, 2013
## 26                                                           NCBI Bacterial Genomes
## 27                            Archaeal Genomes from NCBI Last Updated: Aug 25, 2015
## 28                           Protozoan Genomes from NCBI Last Updated: Feb 23, 2011
## 29                      Ensembl protists 86 - (10/05/16) Last Updated: Oct  5, 2016
## 30                         Ensembl fungi 86 - (10/09/16) Last Updated: Oct  9, 2016
## 31                      Ensembl protists 86 - (10/05/16) Last Updated: Oct  5, 2016
## 32                      Ensembl protists 86 - (10/16/16) Last Updated: Oct 16, 2016
## 33             Ensembl Bacterial Genomes 21 - (02/23/14) Last Updated: Feb 27, 2014
## 34   Mitochondrial sequences - Release 41 (May 19, 2010) Last Updated: Jul  9, 2010
## 35                                            POLYBASE - Release 1  (June 20, 2003)
## 36                                              EMGLib Release 5 (December 9, 2003)
## 37                    RNA sequences - numrel1 (daterel1) Last Updated: May 10, 2012
## 38                                                                           RiboDB
## 39                                                               taxonomic database

To open a database with seqinr::choosebank(), we specify it with the bank parameter. For this example we will use the UniProt database ("swissprot"). Running this command opens a temporary connection to the server.

seqinr::choosebank(bank = "swissprot")

CAUTION: If the server detects no activity on the connection for some time, the connection is dropped and you will have to run seqinr::choosebank(bank = "swissprot") again to re-establish it.
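
Because the connection is temporary, one option is to open it immediately before a request and close it as soon as the sequences have been retrieved. The sketch below assumes seqinr is installed and the server is reachable. The helper name fetch_sequences and the list name "tmp_list" are hypothetical; closebank() is the seqinr function that closes the connection, which this tutorial does not otherwise use.

```r
# Hypothetical helper: open a bank, fetch sequences as strings, close the bank.
fetch_sequences <- function(bank, query_string) {
  seqinr::choosebank(bank = bank)
  on.exit(seqinr::closebank(), add = TRUE)  # runs even if the query fails
  res <- seqinr::query("tmp_list", query_string)
  # Retrieve the sequences while the connection is still open.
  sapply(res$req, seqinr::getSequence, as.string = TRUE)
}

# Example call (requires seqinr and an internet connection):
# spike_seq <- fetch_sequences("swissprot", "AC=P0DTC2")
```

The sequences are extracted inside the function because the query result depends on the open connection.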

Now we can use seqinr::query() to retrieve sequences from that database. We recommend reading the function's documentation first (you can open it from the console with ?query) to learn which parameters you can use to select or filter sequences. This function can retrieve a single sequence or a group of sequences. First we will retrieve the Spike protein of SARS-CoV-2. This protein is identified by the accession code P0DTC2, which we pass with the AC= criterion.

NOTE: The whole query must be passed as a quoted string, and there must be NO SPACE between the criterion, the “=” sign, and its value.

spike <- seqinr::query("spike", "AC=P0DTC2")
spike
## 1 SQ for AC=P0DTC2

NOTE: As you may have noticed, this function has an unusual structure: the name of the object we create is also passed as the first argument to the function.

The resulting object has class “qaw”, which is defined in this package. Like a list, it has several elements, such as:

  • $call: the command that was run to retrieve the sequence or sequences from the remote server.
  • $name: the name given to the list of sequences ("spike" in this example).
  • $nelem: the number of sequences in the qaw object.
  • $typelist: the type of list, in this case a list of sequences (“SQ”).
  • $req: a list, nested inside the qaw object (which is itself a list), that holds the individual sequences.
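
Since these elements are accessed like those of any R list, the idea can be sketched with a hand-built stand-in. The object mock_result below is NOT a real seqinr object; it only mimics some of the element names listed above.

```r
# Hand-built stand-in for a query result (NOT a real seqinr object).
mock_result <- list(
  name     = "spike",
  nelem    = 1L,
  typelist = "SQ",
  req      = list("SPIKE_SARS2")
)

mock_result$nelem        # access an element by name with $
mock_result[["name"]]    # the same idea with [[ ]]
mock_result$req[[1]]     # first entry of the nested list
names(mock_result)       # list all element names
```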

We can explore this object with basic R functions such as str(), summary(), length(), structure(), and View(), as well as with functions specific to the seqinr package, such as:

  • seqinr::getSequence(): returns the sequence.
  • seqinr::getName(): returns the names of the sequences.
  • seqinr::getLength(): returns the lengths of the sequences.
  • seqinr::getKeyword(): returns the keywords or identifiers associated with the sequence.

str(spike)
## List of 6
##  $ call    : language seqinr::query(listname = "spike", query = "AC=P0DTC2")
##  $ name    : chr "spike"
##  $ nelem   : int 1
##  $ typelist: chr "SQ"
##  $ req     :List of 1
##   ..$ : 'SeqAcnucWeb' chr "SPIKE_SARS2"
##   .. ..- attr(*, "length")= num 1273
##   .. ..- attr(*, "frame")= num 0
##   .. ..- attr(*, "ncbigc")= num 1
##  $ socket  : 'sockconn' int 4
##   ..- attr(*, "conn_id")=<externalptr> 
##  - attr(*, "class")= chr "qaw"
structure(spike)
## 1 SQ for AC=P0DTC2
summary(spike)
##          Length Class    Mode     
## call     3      -none-   call     
## name     1      -none-   character
## nelem    1      -none-   numeric  
## typelist 1      -none-   character
## req      1      -none-   list     
## socket   1      sockconn numeric
table(spike$req) # NOTE: here we must point explicitly at the list that holds the sequences
## 
## SPIKE_SARS2 
##           1
length(spike$req) # NOTE: here we must point explicitly at the list that holds the sequences
## [1] 1
seqinr::getSequence(spike, as.string = TRUE)
## [[1]]
##    [1] "M" "F" "V" "F" "L" "V" "L" "L" "P" "L" "V" "S" "S" "Q" "C" "V" "N" "L"
##   [19] "T" "T" "R" "T" "Q" "L" "P" "P" "A" "Y" "T" "N" "S" "F" "T" "R" "G" "V"
##   [37] "Y" "Y" "P" "D" "K" "V" "F" "R" "S" "S" "V" "L" "H" "S" "T" "Q" "D" "L"
##   [55] "F" "L" "P" "F" "F" "S" "N" "V" "T" "W" "F" "H" "A" "I" "H" "V" "S" "G"
##   [73] "T" "N" "G" "T" "K" "R" "F" "D" "N" "P" "V" "L" "P" "F" "N" "D" "G" "V"
##   [91] "Y" "F" "A" "S" "T" "E" "K" "S" "N" "I" "I" "R" "G" "W" "I" "F" "G" "T"
##  [109] "T" "L" "D" "S" "K" "T" "Q" "S" "L" "L" "I" "V" "N" "N" "A" "T" "N" "V"
##  [127] "V" "I" "K" "V" "C" "E" "F" "Q" "F" "C" "N" "D" "P" "F" "L" "G" "V" "Y"
##  [145] "Y" "H" "K" "N" "N" "K" "S" "W" "M" "E" "S" "E" "F" "R" "V" "Y" "S" "S"
##  [163] "A" "N" "N" "C" "T" "F" "E" "Y" "V" "S" "Q" "P" "F" "L" "M" "D" "L" "E"
##  [181] "G" "K" "Q" "G" "N" "F" "K" "N" "L" "R" "E" "F" "V" "F" "K" "N" "I" "D"
##  [199] "G" "Y" "F" "K" "I" "Y" "S" "K" "H" "T" "P" "I" "N" "L" "V" "R" "D" "L"
##  [217] "P" "Q" "G" "F" "S" "A" "L" "E" "P" "L" "V" "D" "L" "P" "I" "G" "I" "N"
##  [235] "I" "T" "R" "F" "Q" "T" "L" "L" "A" "L" "H" "R" "S" "Y" "L" "T" "P" "G"
##  [253] "D" "S" "S" "S" "G" "W" "T" "A" "G" "A" "A" "A" "Y" "Y" "V" "G" "Y" "L"
##  [271] "Q" "P" "R" "T" "F" "L" "L" "K" "Y" "N" "E" "N" "G" "T" "I" "T" "D" "A"
##  [289] "V" "D" "C" "A" "L" "D" "P" "L" "S" "E" "T" "K" "C" "T" "L" "K" "S" "F"
##  [307] "T" "V" "E" "K" "G" "I" "Y" "Q" "T" "S" "N" "F" "R" "V" "Q" "P" "T" "E"
##  [325] "S" "I" "V" "R" "F" "P" "N" "I" "T" "N" "L" "C" "P" "F" "G" "E" "V" "F"
##  [343] "N" "A" "T" "R" "F" "A" "S" "V" "Y" "A" "W" "N" "R" "K" "R" "I" "S" "N"
##  [361] "C" "V" "A" "D" "Y" "S" "V" "L" "Y" "N" "S" "A" "S" "F" "S" "T" "F" "K"
##  [379] "C" "Y" "G" "V" "S" "P" "T" "K" "L" "N" "D" "L" "C" "F" "T" "N" "V" "Y"
##  [397] "A" "D" "S" "F" "V" "I" "R" "G" "D" "E" "V" "R" "Q" "I" "A" "P" "G" "Q"
##  [415] "T" "G" "K" "I" "A" "D" "Y" "N" "Y" "K" "L" "P" "D" "D" "F" "T" "G" "C"
##  [433] "V" "I" "A" "W" "N" "S" "N" "N" "L" "D" "S" "K" "V" "G" "G" "N" "Y" "N"
##  [451] "Y" "L" "Y" "R" "L" "F" "R" "K" "S" "N" "L" "K" "P" "F" "E" "R" "D" "I"
##  [469] "S" "T" "E" "I" "Y" "Q" "A" "G" "S" "T" "P" "C" "N" "G" "V" "E" "G" "F"
##  [487] "N" "C" "Y" "F" "P" "L" "Q" "S" "Y" "G" "F" "Q" "P" "T" "N" "G" "V" "G"
##  [505] "Y" "Q" "P" "Y" "R" "V" "V" "V" "L" "S" "F" "E" "L" "L" "H" "A" "P" "A"
##  [523] "T" "V" "C" "G" "P" "K" "K" "S" "T" "N" "L" "V" "K" "N" "K" "C" "V" "N"
##  [541] "F" "N" "F" "N" "G" "L" "T" "G" "T" "G" "V" "L" "T" "E" "S" "N" "K" "K"
##  [559] "F" "L" "P" "F" "Q" "Q" "F" "G" "R" "D" "I" "A" "D" "T" "T" "D" "A" "V"
##  [577] "R" "D" "P" "Q" "T" "L" "E" "I" "L" "D" "I" "T" "P" "C" "S" "F" "G" "G"
##  [595] "V" "S" "V" "I" "T" "P" "G" "T" "N" "T" "S" "N" "Q" "V" "A" "V" "L" "Y"
##  [613] "Q" "D" "V" "N" "C" "T" "E" "V" "P" "V" "A" "I" "H" "A" "D" "Q" "L" "T"
##  [631] "P" "T" "W" "R" "V" "Y" "S" "T" "G" "S" "N" "V" "F" "Q" "T" "R" "A" "G"
##  [649] "C" "L" "I" "G" "A" "E" "H" "V" "N" "N" "S" "Y" "E" "C" "D" "I" "P" "I"
##  [667] "G" "A" "G" "I" "C" "A" "S" "Y" "Q" "T" "Q" "T" "N" "S" "P" "R" "R" "A"
##  [685] "R" "S" "V" "A" "S" "Q" "S" "I" "I" "A" "Y" "T" "M" "S" "L" "G" "A" "E"
##  [703] "N" "S" "V" "A" "Y" "S" "N" "N" "S" "I" "A" "I" "P" "T" "N" "F" "T" "I"
##  [721] "S" "V" "T" "T" "E" "I" "L" "P" "V" "S" "M" "T" "K" "T" "S" "V" "D" "C"
##  [739] "T" "M" "Y" "I" "C" "G" "D" "S" "T" "E" "C" "S" "N" "L" "L" "L" "Q" "Y"
##  [757] "G" "S" "F" "C" "T" "Q" "L" "N" "R" "A" "L" "T" "G" "I" "A" "V" "E" "Q"
##  [775] "D" "K" "N" "T" "Q" "E" "V" "F" "A" "Q" "V" "K" "Q" "I" "Y" "K" "T" "P"
##  [793] "P" "I" "K" "D" "F" "G" "G" "F" "N" "F" "S" "Q" "I" "L" "P" "D" "P" "S"
##  [811] "K" "P" "S" "K" "R" "S" "F" "I" "E" "D" "L" "L" "F" "N" "K" "V" "T" "L"
##  [829] "A" "D" "A" "G" "F" "I" "K" "Q" "Y" "G" "D" "C" "L" "G" "D" "I" "A" "A"
##  [847] "R" "D" "L" "I" "C" "A" "Q" "K" "F" "N" "G" "L" "T" "V" "L" "P" "P" "L"
##  [865] "L" "T" "D" "E" "M" "I" "A" "Q" "Y" "T" "S" "A" "L" "L" "A" "G" "T" "I"
##  [883] "T" "S" "G" "W" "T" "F" "G" "A" "G" "A" "A" "L" "Q" "I" "P" "F" "A" "M"
##  [901] "Q" "M" "A" "Y" "R" "F" "N" "G" "I" "G" "V" "T" "Q" "N" "V" "L" "Y" "E"
##  [919] "N" "Q" "K" "L" "I" "A" "N" "Q" "F" "N" "S" "A" "I" "G" "K" "I" "Q" "D"
##  [937] "S" "L" "S" "S" "T" "A" "S" "A" "L" "G" "K" "L" "Q" "D" "V" "V" "N" "Q"
##  [955] "N" "A" "Q" "A" "L" "N" "T" "L" "V" "K" "Q" "L" "S" "S" "N" "F" "G" "A"
##  [973] "I" "S" "S" "V" "L" "N" "D" "I" "L" "S" "R" "L" "D" "K" "V" "E" "A" "E"
##  [991] "V" "Q" "I" "D" "R" "L" "I" "T" "G" "R" "L" "Q" "S" "L" "Q" "T" "Y" "V"
## [1009] "T" "Q" "Q" "L" "I" "R" "A" "A" "E" "I" "R" "A" "S" "A" "N" "L" "A" "A"
## [1027] "T" "K" "M" "S" "E" "C" "V" "L" "G" "Q" "S" "K" "R" "V" "D" "F" "C" "G"
## [1045] "K" "G" "Y" "H" "L" "M" "S" "F" "P" "Q" "S" "A" "P" "H" "G" "V" "V" "F"
## [1063] "L" "H" "V" "T" "Y" "V" "P" "A" "Q" "E" "K" "N" "F" "T" "T" "A" "P" "A"
## [1081] "I" "C" "H" "D" "G" "K" "A" "H" "F" "P" "R" "E" "G" "V" "F" "V" "S" "N"
## [1099] "G" "T" "H" "W" "F" "V" "T" "Q" "R" "N" "F" "Y" "E" "P" "Q" "I" "I" "T"
## [1117] "T" "D" "N" "T" "F" "V" "S" "G" "N" "C" "D" "V" "V" "I" "G" "I" "V" "N"
## [1135] "N" "T" "V" "Y" "D" "P" "L" "Q" "P" "E" "L" "D" "S" "F" "K" "E" "E" "L"
## [1153] "D" "K" "Y" "F" "K" "N" "H" "T" "S" "P" "D" "V" "D" "L" "G" "D" "I" "S"
## [1171] "G" "I" "N" "A" "S" "V" "V" "N" "I" "Q" "K" "E" "I" "D" "R" "L" "N" "E"
## [1189] "V" "A" "K" "N" "L" "N" "E" "S" "L" "I" "D" "L" "Q" "E" "L" "G" "K" "Y"
## [1207] "E" "Q" "Y" "I" "K" "W" "P" "W" "Y" "I" "W" "L" "G" "F" "I" "A" "G" "L"
## [1225] "I" "A" "I" "V" "M" "V" "T" "I" "M" "L" "C" "C" "M" "T" "S" "C" "C" "S"
## [1243] "C" "L" "K" "G" "C" "C" "S" "C" "G" "S" "C" "C" "K" "F" "D" "E" "D" "D"
## [1261] "S" "E" "P" "V" "L" "K" "G" "V" "K" "L" "H" "Y" "T"
seqinr::getName(spike)
## [1] "SPIKE_SARS2"
seqinr::getLength(spike)
## [1] 1273
seqinr::getKeyword(spike)
## [[1]]
##  [1] "SPIKE GLYCOPROTEIN"                      
##  [2] "S GLYCOPROTEIN"                          
##  [3] "E2"                                      
##  [4] "PEPLOMER PROTEIN"                        
##  [5] "FULL"                                    
##  [6] "SPIKE PROTEIN S1"                        
##  [7] "SPIKE PROTEIN S2"                        
##  [8] "SPIKE PROTEIN S2'"                       
##  [9] "PRECURSOR"                               
## [10] "S"                                       
## [11] "2"                                       
## [12] "VIRION MEMBRANE"                         
## [13] "SINGLE-PASS TYPE I MEMBRANE PROTEIN"     
## [14] "HOST ENDOPLASMIC RETICULUM-GOLGI INTERME"
## [15] "3D-STRUCTURE"                            
## [16] "COILED COIL"                             
## [17] "DISULFIDE BOND"                          
## [18] "FUSION OF VIRUS MEMBRANE WITH HOST ENDOS"
## [19] "FUSION OF VIRUS MEMBRANE WITH HOST MEMBR"
## [20] "GLYCOPROTEIN"                            
## [21] "HOST CELL MEMBRANE"                      
## [22] "HOST MEMBRANE"                           
## [23] "HOST-VIRUS INTERACTION"                  
## [24] "INHIBITION OF HOST INNATE IMMUNE RESPONS"
## [25] "INHIBITION OF HOST INTERFERON SIGNALING" 
## [26] "INHIBITION OF HOST TETHERIN BY VIRUS"    
## [27] "LIPOPROTEIN"                             
## [28] "MEMBRANE"                                
## [29] "PALMITATE"                               
## [30] "REFERENCE PROTEOME"                      
## [31] "SIGNAL"                                  
## [32] "TRANSMEMBRANE"                           
## [33] "TRANSMEMBRANE HELIX"                     
## [34] "VIRAL ATTACHMENT TO HOST CELL"           
## [35] "VIRAL ENVELOPE PROTEIN"                  
## [36] "VIRAL IMMUNOEVASION"                     
## [37] "VIRAL PENETRATION INTO HOST CYTOPLASM"   
## [38] "VIRION"                                  
## [39] "VIRULENCE"                               
## [40] "VIRUS ENTRY INTO HOST CELL"              
## [41] "CHAIN"                                   
## [42] "TOPO_DOM"                                
## [43] "TRANSMEM"                                
## [44] "DOMAIN"                                  
## [45] "REGION"                                  
## [46] "COILED"                                  
## [47] "MOTIF"                                   
## [48] "SITE"                                    
## [49] "CARBOHYD"                                
## [50] "DISULFID"

We can also retrieve multiple sequences with the same seqinr::query() function. To do so, we use the criterion that specifies the organism (SP=) and retrieve all sequences recorded for SARS-CoV-2 with the following commands:

seqinr::choosebank(bank = "swissprot")
cov2 <- seqinr::query("cov2", "SP=Severe acute respiratory syndrome coronavirus 2")
length(cov2$req)
## [1] 75730

The result is more than 75 thousand sequences, too many to process comfortably on an average personal computer. Moreover, not all of them are curated (some may be incomplete, duplicated, or in silico predictions, among other issues). To restrict the result to curated (“reviewed”) sequences, we can add the ST criterion and set it to “reviewed”. We use the logical operator “AND” to combine both conditions:

seqinr::choosebank(bank = "swissprot")
cov2 <- seqinr::query("cov2", "SP=Severe acute respiratory syndrome coronavirus 2 AND ST=reviewed")
length(cov2$req)
## [1] 16

We have now reduced the set to just 16 sequences, all of them curated, which greatly improves both processing speed and the quality of the results.
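
Query strings like the ones used above can also be assembled programmatically, which helps keep the “no space around =” rule. This is a base-R sketch; the helper name criterion is hypothetical and is not a seqinr function.

```r
# Hypothetical helper: join a criterion and its value with "=" and no spaces.
criterion <- function(key, value) paste0(key, "=", value)

# Combine several criteria with the AND operator.
organism <- criterion("SP", "Severe acute respiratory syndrome coronavirus 2")
status   <- criterion("ST", "reviewed")
query_string <- paste(organism, status, sep = " AND ")

print(query_string)
# The string could then be passed on, for example:
# cov2 <- seqinr::query("cov2", query_string)
```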

To extract all the sequences into a single object, we can use sapply(), a function from the apply family.
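
As a toy illustration of how sapply() applies a function to each element of a list and passes extra arguments along, consider this base-R sketch. The data and variable names are made up, and paste() merely stands in for a sequence-extracting function.

```r
# Toy list of sequences stored as vectors of single letters (made-up data).
toy_seqs <- list(c("M", "E", "A"), c("M", "F", "V", "F"))

# Extra arguments after the function (here collapse = "") are passed on to it.
as_strings <- sapply(toy_seqs, paste, collapse = "")
print(as_strings)

# sapply() simplified the result to a character vector; lapply() keeps a list.
as_list <- lapply(toy_seqs, paste, collapse = "")
print(class(as_list))
```

The same pattern is applied to the downloaded sequences: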

sapply(cov2$req, seqinr::getSequence, as.string = TRUE)
##  [1] "MDLFMRIFTIGTVTLKQGEIKDATPSDFVRATATIPIQASLPFGWLIVGVALLAVFQSASKIITLKKRWQLALSKGVHFVCNLLLLFVTVYSHLLLVAAGLEAPFLYLYALVYFLQSINFVRIIMRLWLCWKCRSKNPLLYDANYFLCWHTNCYDYCIPYNSVTSSIVITSGDGTTSPISEHDYQIGGYTEKWESGVKDCVVLHSYFTSDYYQLYSTQLSTDTGVEHVTFFIYNKIVDEPEEHVQIHTIDGSSGVVNPVMEPIYDEPTTTTSVPL"
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
##  [2] "MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGKEDLKFPRGQGVPINTNSSPDDQIGYYRRATRRIRGGDGKMKDLSPRWYFYYLGTGPEAGLPYGANKDGIIWVATEGALNTPKDHIGTRNPANNAAIVLQLPQGTTLPKGFYAEGSRGGSQASSRSSSRSRNSSRNSTPGSSRGTSPARMAGNGGDAALALLLLDRLNQLESKMSGKGQQQQGQTVTKKSAAEASKKPRQKRTATKAYNVTQAFGRRGPEQTQGNFGDQELIRQGTDYKHWPQIAQFAPSASAFFGMSRIGMEVTPSGTWLTYTGAIKLDDKDPNFKDQVILLNKHIDAYKTFPPTEPKKDKKKKADETQALPQRQKKQQTVTLLPAADLDDFSKQLQQSMSSADSTQA"
##  [3] "MFHLVDFQVTIAEILLIIMRTFKVSIWNLDYIINLIIKNLSKSLTENKYSQLDEEQPMEID"
##  [4] "MKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFSTQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKTE"
##  [5] "MIELSLIDFYLCFLAFLLFLVLIMLIIFWFSLELQDHNETCHA"
##  [6] "MKFLVFLGIITTVAAFHQECSLQSCTQHQPYVVDDPCPIHFYSKWYIRVGARKSAPLIELCVDEAGSKSPIQYIDIGNYTVSCLPFTINCQEPKLGSLVVRCSFYEDFLEYHDVRVVLDFI"
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
##  [7] "MMPTIFFAGILIVTTIVYLTIV"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
##  [8] "MLLLQILFALLQRYRYKPHSLSDGLLLALHFLLFFRALPKS"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
##  [9] "MAYCWRCTSCCFSERFQNHNPQKEMATSTLQGCSLCLQLAVVVCNSLLTPFARCCWP"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
## [10] "MDPKISEMHPALRLVDPQIQLAVTRMENAVGRDQNNVGPKVYPIILRLGSPLSLNMARKTLNSLEDKAFQLTPIAVQMTKLATTEELPDEFVVVTVK"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                
## [11] "MLQSCYNFLKEQHCQKASTQKGAEAAVKPLLVPHHVVATVQEIQLQAAVGELLLLEWLAMAVMLLLLCCCLTD"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
## [12] "MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGVLPQLEQPYVFIKRSDARTAPHGHVMVELVAELEGIQYGRSGETLGVLVPHVGEIPVAYRKVLLRKNGNKGAGGHSYGADLKSFDLGDELGTDPYEDFQENWNTKHSSGVTRELMRELNGGAYTRYVDNNFCGPDGYPLECIKDLLARAGKASCTLSEQLDFIDTKRGVYCCREHEHEIAWYTERSEKSYELQTPFEIKLAKKFDTFNGECPNFVFPLNSIIKTIQPRVEKKKLDGFMGRIRSVYPVASPNECNQMCLSTLMKCDHCGETSWQTGDFVKATCEFCGTENLTKEGATTCGYLPQNAVVKIYCPACHNSEVGPEHSLAEYHNESGLKTILRKGGRTIAFGGCVFSYVGCHNKCAYWVPRASANIGCNHTGVVGEGSEGLNDNLLEILQKEKVNINIVGDFKLNEEIAIILASFSASTSAFVETVKGLDYKAFKQIVESCGNFKVTKGKAKKGAWNIGEQKSILSPLYAFASEAARVVRSIFSRTLETAQNSVRVLQKAAITILDGISQYSLRLIDAMMFTSDLATNNLVVMAYITGGVVQLTSQWLTNIFGTVYEKLKPVLDWLEEKFKEGVEFLRDGWEIVKFISTCACEIVGGQIVTCAKEIKESVQTFFKLVNKFLALCADSIIIGGAKLKALNLGETFVTHSKGLYRKCVKSREETGLLMPLKAPKEIIFLEGETLPTEVLTEEVVLKTGDLQPLEQPTSEAVEAPLVGTPVCINGLMLLEIKDTEKYCALAPNMMVTNNTFTLKGGAPTKVTFGDDTVIEVQGYKSVNITFELDERIDKVLNEKCSAYTVELGTEVNEFACVVADAVIKTLQPVSELLTPLGIDLDEWSMATYYLFDESGEFKLASHMYCSFYPPDEDEEEGDCEEEEFEPSTQYEYGTEDDYQGKPLEFGATSAALQPEEEQEEDWLDDDSQQTVGQQDGSEDNQTTTIQTIVEVQPQLEMELTPVVQTIEVNSFSGYLKLTDNVYIKNADIVEEAKKVKPTVVVNAANVYLKHGGGVAGALNKATNNAMQVESDDYIATNGPLKVGGSCVLSGHNLAKHCLHVVGPNVNKGEDIQLLKSAYENFNQHEVLLAPLLSAGIFGADPIHSLRVCVDTVRTNVYLAVFDKNLYDKLVSSFLEMKSEKQVEQKIAEIPKEEVKPFITESKPSVEQRKQDDKKIKACVEEVTTTLEETKFLTENLLLYIDINGNLHPDSATLVSDIDITFLKKDAPYIVGDVVQEGVLTAVVIPTKKAGGTTEMLAKALRKVPTDNYITTYPGQGLNGYTVEEAKTVLKKCKSAFYILPSIISNEKQEILGTVSWNLREMLAHAEETRKLMPVCVETKAIVSTIQRKYKGIKIQEGVVDYGARFYFYTSKTTVASLINTLNDLNETLVTMPLGYVTHGLNLEEAARYMRSLKVPATVSVSSPDAVTAYNGYLTSSSKTPEEHFIETISLAGSYKDWSYSGQSTQLGIEFLKRGDKSVYYTSNPTTFHLDGEVITFDNLKTLLSLREVRTIKVFTTVDNINLHTQVVDMSMTYGQQFGPTYLDGADVTKIKPHNSHEGKTFYVLPNDDTLRVEAFEYYHTTDPSFLGRYMSALNHTKKWKYPQVNGLTSIKWADNNCYLATALLTLQQIELKFNPPALQDAYYRARAGEAANFCALILAYCNKTVGELGDVRETMSYLFQHANLDSCKRVLNVVCKTCGQQQTTLKGVEAVMYMGTLSYEQFKKGVQIPCTCGKQATKYLVQQESPFVMMSAPPAQYELKHGTFTCASEYTGNYQCGHYKHITSKETLYCIDGALLTKSSEYKGPITDVFYKENSYTTTIKPVTYKLDGVVCTEIDPKLDNYYKKDNSYFTEQPIDLVPNQPYPNASFDNFKFVCDNIKFADDLNQLTGYKKPASRELKVTFFPDLNGDVVAIDYKHYTPSFKKGAKLLHKPIV
WHVNNATNKATYKPNTWCIRCLWSTKPVETSNSFDVLKSEDAQGMDNLACEDLKPVSEEVVENPTIQKDVLECNVKTTEVVGDIILKPANNSLKITEEVGHTDLMAAYVDNSSLTIKKPNELSRVLGLKTLATHGLAAVNSVPWDTIANYAKPFLNKVVSTTTNIVTRCLNRVCTNYMPYFFTLLLQLCTFTRSTNSRIKASMPTTIAKNTVKSVGKFCLEASFNYLKSPNFSKLINIIIWFLLLSVCLGSLIYSTAALGVLMSNLGMPSYCTGYREGYLNSTNVTIATYCTGSIPCSVCLSGLDSLDTYPSLETIQITISSFKWDLTAFGLVAEWFLAYILFTRFFYVLGLAAIMQLFFSYFAVHFISNSWLMWLIINLVQMAPISAMVRMYIFFASFYYVWKSYVHVVDGCNSSTCMMCYKRNRATRVECTTIVNGVRRSFYVYANGGKGFCKLHNWNCVNCDTFCAGSTFISDEVARDLSLQFKRPINPTDQSSYIVDSVTVKNGSIHLYFDKAGQKTYERHSLSHFVNLDNLRANNTKGSLPINVIVFDGKSKCEESSAKSASVYYSQLMCQPILLLDQALVSDVGDSAEVAVKMFDAYVNTFSSTFNVPMEKLKTLVATAEAELAKNVSLDNVLSTFISAARQGFVDSDVETKDVVECLKLSHQSDIEVTGDSCNNYMLTYNKVENMTPRDLGACIDCSARHINAQVAKSHNIALIWNVKDFMSLSEQLRKQIRSAAKKNNLPFKLTCATTRQVVNVVTTKIALKGGKIVNNWLKQLIKVTLVFLFVAAIFYLITPVHVMSKHTDFSSEIIGYKAIDGGVTRDIASTDTCFANKHADFDTWFSQRGGSYTNDKACPLIAAVITREVGFVVPGLPGTILRTTNGDFLHFLPRVFSAVGNICYTPSKLIEYTDFATSACVLAAECTIFKDASGKPVPYCYDTNVLEGSVAYESLRPDTRYVLMDGSIIQFPNTYLEGSVRVVTTFDSEYCRHGTCERSEAGVCVSTSGRWVLNNDYYRSLPGVFCGVDAVNLLTNMFTPLIQPIGALDISASIVAGGIVAIVVTCLAYYFMRFRRAFGEYSHVVAFNTLLFLMSFTVLCLTPVYSFLPGVYSVIYLYLTFYLTNDVSFLAHIQWMVMFTPLVPFWITIAYIICISTKHFYWFFSNYLKRRVVFNGVSFSTFEEAALCTFLLNKEMYLKLRSDVLLPLTQYNRYLALYNKYKYFSGAMDTTSYREAACCHLAKALNDFSNSGSDVLYQPPQTSITSAVLQSGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQSAVKRTIKGTHHWLLLTILTSLLVLVQSTQWSLFFFLYENAFLPFAMGIIAMSAFAMMFVKHKHAFLCLFLLPSLATVAYFNMVYMPASWVMRIMTWLDMVDTSLSGFKLKDCVMYASAVVLLILMTARTVYDDGARRVWTLMNVLTLVYKVYYGNALDQAISMWALIISVTSNYSGVVTTVMFLARGIVFMCVEYCPIFFITGNTLQCIMLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGGKPCIKVATVQSKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQAIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEF
DRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQNNELSPVALRQMSCAAGTTQTACTDDNALAYYNTTKGGRFVLALLSDLQDLKWARFPKSDGTGTIYTELEPPCRFVTDTPKGPKVKYLYFIKGLNNLNRGMVLGSLAATVRLQAGNATEVPANSTVLSFCAFAVDAAKAYKDYLASGGQPITNCVKMLCTHTGTGQAITVTPEANMDQESFGGASCCLYCRCHIDHPNPKGFCDLKGKYVQIPTTCANDPVGFTLKNTVCTVCGMWKGYGCSCDQLREPMLQSADAQSFLNRVCGVSAARLTPCGTGTSTDVVYRAFDIYNDKVAGFAKFLKTNCCRFQEKDEDDNLIDSYFVVKRHTFSNYQHEETIYNLLKDCPAVAKHDFFKFRIDGDMVPHISRQRLTKYTMADLVYALRHFDEGNCDTLKEILVTYNCCDDDYFNKKDWYDFVENPDILRVYANLGERVRQALLKTVQFCDAMRNAGIVGVLTLDNQDLNGNWYDFGDFIQTTPGSGVPVVDSYYSLLMPILTLTRALTAESHVDTDLTKPYIKWDLLKYDFTEERLKLFDRYFKYWDQTYHPNCVNCLDDRCILHCANFNVLFSTVFPPTSFGPLVRKIFVDGVPFVVSTGYHFRELGVVHNQDVNLHSSRLSFKELLVYAADPAMHAASGNLLLDKRTTCFSVAALTNNVAFQTVKPGNFNKDFYDFAVSKGFFKEGSSVELKHFFFAQDGNAAISDYDYYRYNLPTMCDIRQLLFVVEVVDKYFDCYDGGCINANQVIVNNLDKSAGFPFNKWGKARLYYDSMSYEDQDALFAYTKRNVIPTITQMNLKYAISAKNRARTVAGVSICSTMTNRQFHQKLLKSIAATRGATVVIGTSKFYGGWHNMLKTVYSDVENPHLMGWDYPKCDRAMPNMLRIMASLVLARKHTTCCSLSHRFYRLANECAQVLSEMVMCGGSLYVKPGGTSSGDATTAYANSVFNICQAVTANVNALLSTDGNKIADKYVRNLQHRLYECLYRNRDVDTDFVNEFYAYLRKHFSMMILSDDAVVCFNSTYASQGLVASIKNFKSVLYYQNNVFMSEAKCWTETDLTKGPHEFCSQHTMLVKQGDDYVYLPYPDPSRILGAGCFVDDIVKTDGTLMIERFVSLAIDAYPLTKHPNQEYADVFHLYLQYIRKLHDELTGHMLDMYSVMLTNDNTSRYWEPEFYEAMYTPHTVLQAVGACVLCNSQTSLRCGACIRRPFLCCKCCYDHVISTSHKLVLSVNPYVCNAPGCDVTDVTQLYLGGMSYYCKSHKPPISFPLCANGQVFGLYKNTCVGSDNVTDFNAIATCDWTNAGDYILANTCTERLKLFAAETLKATEETFKLSYGIATVREVLSDRELHLSWEVGKPRPPLNRNYVFTGYRVTKNSKVQIGEYTFEKGDYGDAVVYRGTTTYKLNVGDYFVLTSHTVMPLSAPTLVPQEHYVRITGLYPTLNISDEFSSNVANYQKVGMQKYSTLQGPPGTGKSHFAIGLALYYPSARIVYTACSHAAVDALCEKALKYLPIDKCSRIIPARARVECFDKFKVNSTLEQYVFCTVNALPETTADIVVFDEISMATNYDLSVVNARLRAKHYVYIGDPAQLPAPRTLLTKGTLEPEYFNSVCRLMKTIGPDMFLGTCRRCPAEIVDTVSALVYDNKLKAHKDKSAQCFKMFYKGVITHDVSSAINRPQIGVVREFLTRNPAWRKAVFISPYNSQNAVASKILGLPTQTVDSSQGSEYDYVIFTQTTETAHSCNVNRFNVAITRAKVGILCIMSDRDLYDKLQFTSLEIPRRNVATLQAENVTGLFKDCSKVITGLHPTQAPTHLSVDTKFKTEGLCVDIPGIPKDMTYRRLISMMGFKMNYQV
NGYPNMFITREEAIRHVRAWIGFDVEGCHATREAVGTNLPLQLGFSTGVNLVAVPTGYVDTPNNTDFSRVSAKPPPGDQFKHLIPLMYKGLPWNVVRIKIVQMLSDTLKNLSDRVVFVLWAHGFELTSMKYFVKIGPERTCCLCDRRATCFSTASDTYACWHHSIGFDYVYNPFMIDVQQWGFTGNLQSNHDLYCQVHGNAHVASCDAIMTRCLAVHECFVKRVDWTIEYPIIGDELKINAACRKVQHMVVKAALLADKFPVLHDIGNPKAIKCVPQADVEWKFYDAQPCSDKAYKIEELFYSYATHSDKFTDGVCLFWNCNVDRYPANSIVCRFDTRVLSNLNLPGCDGGSLYVNKHAFHTPAFDKSAFVNLKQLPFFYYSDSPCESHGKQVVSDIDYVPLKSATCITRCNLGGAVCRHHANEYRLYLDAYNMMISAGFSLWVYKQFDTYNLWNTFTRLQSLENVAFNVVNKGHFDGQQGEVPVSIINNTVYTKVDGVDVELFENKTTLPVNVAFELWAKRNIKPVPEVKILNNLGVDIAANTVIWDYKRDAPAHISTIGVCSMTDIAKKPTETICAPLTVFFDGRVDGQVDLFRNARNGVLITEGSVKGLQPSVGPKQASLNGVTLIGEAVKTQFNYYKKVDGVVQQLPETYFTQSRNLQEFKPRSQMEIDFLELAMDEFIERYKLEGYAFEHIVYGDFSHSQLGGLHLLIGLAKRFKESPFELEDFIPMDSTVKNYFITDAQTGSSKCVCSVIDLLLDDFVEIIKSQDLSVVSKVVKVTIDYTEISFMLWCKDGHVETFYPKLQSSQAWQPGVAMPNLYKMQRMLLEKCDLQNYGDSATLPKGIMMNVAKYTQLCQYLNTLTLAVPYNMRVIHFGAGSDKGVAPGTAVLRQWLPTGTLLVDSDLNDFVSDADSTLIGDCATVHTANKWDLIISDMYDPKTKNVTKENDSKEGFFTYICGFIQQKLALGGSVAIKITEHSWNADLYKLMGHFAWWTAFVTNVNASSSEAFLIGCNYLGKPREQIDGYVMHANYIFWRNTNPIQLSSYSLFDMSKFPLKLRGTAVMSLKEGQINDMILSLLSKGRLIIRENNRVVISSDVLVNN"
## [13] "MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGVLPQLEQPYVFIKRSDARTAPHGHVMVELVAELEGIQYGRSGETLGVLVPHVGEIPVAYRKVLLRKNGNKGAGGHSYGADLKSFDLGDELGTDPYEDFQENWNTKHSSGVTRELMRELNGGAYTRYVDNNFCGPDGYPLECIKDLLARAGKASCTLSEQLDFIDTKRGVYCCREHEHEIAWYTERSEKSYELQTPFEIKLAKKFDTFNGECPNFVFPLNSIIKTIQPRVEKKKLDGFMGRIRSVYPVASPNECNQMCLSTLMKCDHCGETSWQTGDFVKATCEFCGTENLTKEGATTCGYLPQNAVVKIYCPACHNSEVGPEHSLAEYHNESGLKTILRKGGRTIAFGGCVFSYVGCHNKCAYWVPRASANIGCNHTGVVGEGSEGLNDNLLEILQKEKVNINIVGDFKLNEEIAIILASFSASTSAFVETVKGLDYKAFKQIVESCGNFKVTKGKAKKGAWNIGEQKSILSPLYAFASEAARVVRSIFSRTLETAQNSVRVLQKAAITILDGISQYSLRLIDAMMFTSDLATNNLVVMAYITGGVVQLTSQWLTNIFGTVYEKLKPVLDWLEEKFKEGVEFLRDGWEIVKFISTCACEIVGGQIVTCAKEIKESVQTFFKLVNKFLALCADSIIIGGAKLKALNLGETFVTHSKGLYRKCVKSREETGLLMPLKAPKEIIFLEGETLPTEVLTEEVVLKTGDLQPLEQPTSEAVEAPLVGTPVCINGLMLLEIKDTEKYCALAPNMMVTNNTFTLKGGAPTKVTFGDDTVIEVQGYKSVNITFELDERIDKVLNEKCSAYTVELGTEVNEFACVVADAVIKTLQPVSELLTPLGIDLDEWSMATYYLFDESGEFKLASHMYCSFYPPDEDEEEGDCEEEEFEPSTQYEYGTEDDYQGKPLEFGATSAALQPEEEQEEDWLDDDSQQTVGQQDGSEDNQTTTIQTIVEVQPQLEMELTPVVQTIEVNSFSGYLKLTDNVYIKNADIVEEAKKVKPTVVVNAANVYLKHGGGVAGALNKATNNAMQVESDDYIATNGPLKVGGSCVLSGHNLAKHCLHVVGPNVNKGEDIQLLKSAYENFNQHEVLLAPLLSAGIFGADPIHSLRVCVDTVRTNVYLAVFDKNLYDKLVSSFLEMKSEKQVEQKIAEIPKEEVKPFITESKPSVEQRKQDDKKIKACVEEVTTTLEETKFLTENLLLYIDINGNLHPDSATLVSDIDITFLKKDAPYIVGDVVQEGVLTAVVIPTKKAGGTTEMLAKALRKVPTDNYITTYPGQGLNGYTVEEAKTVLKKCKSAFYILPSIISNEKQEILGTVSWNLREMLAHAEETRKLMPVCVETKAIVSTIQRKYKGIKIQEGVVDYGARFYFYTSKTTVASLINTLNDLNETLVTMPLGYVTHGLNLEEAARYMRSLKVPATVSVSSPDAVTAYNGYLTSSSKTPEEHFIETISLAGSYKDWSYSGQSTQLGIEFLKRGDKSVYYTSNPTTFHLDGEVITFDNLKTLLSLREVRTIKVFTTVDNINLHTQVVDMSMTYGQQFGPTYLDGADVTKIKPHNSHEGKTFYVLPNDDTLRVEAFEYYHTTDPSFLGRYMSALNHTKKWKYPQVNGLTSIKWADNNCYLATALLTLQQIELKFNPPALQDAYYRARAGEAANFCALILAYCNKTVGELGDVRETMSYLFQHANLDSCKRVLNVVCKTCGQQQTTLKGVEAVMYMGTLSYEQFKKGVQIPCTCGKQATKYLVQQESPFVMMSAPPAQYELKHGTFTCASEYTGNYQCGHYKHITSKETLYCIDGALLTKSSEYKGPITDVFYKENSYTTTIKPVTYKLDGVVCTEIDPKLDNYYKKDNSYFTEQPIDLVPNQPYPNASFDNFKFVCDNIKFADDLNQLTGYKKPASRELKVTFFPDLNGDVVAIDYKHYTPSFKKGAKLLHKPIV
WHVNNATNKATYKPNTWCIRCLWSTKPVETSNSFDVLKSEDAQGMDNLACEDLKPVSEEVVENPTIQKDVLECNVKTTEVVGDIILKPANNSLKITEEVGHTDLMAAYVDNSSLTIKKPNELSRVLGLKTLATHGLAAVNSVPWDTIANYAKPFLNKVVSTTTNIVTRCLNRVCTNYMPYFFTLLLQLCTFTRSTNSRIKASMPTTIAKNTVKSVGKFCLEASFNYLKSPNFSKLINIIIWFLLLSVCLGSLIYSTAALGVLMSNLGMPSYCTGYREGYLNSTNVTIATYCTGSIPCSVCLSGLDSLDTYPSLETIQITISSFKWDLTAFGLVAEWFLAYILFTRFFYVLGLAAIMQLFFSYFAVHFISNSWLMWLIINLVQMAPISAMVRMYIFFASFYYVWKSYVHVVDGCNSSTCMMCYKRNRATRVECTTIVNGVRRSFYVYANGGKGFCKLHNWNCVNCDTFCAGSTFISDEVARDLSLQFKRPINPTDQSSYIVDSVTVKNGSIHLYFDKAGQKTYERHSLSHFVNLDNLRANNTKGSLPINVIVFDGKSKCEESSAKSASVYYSQLMCQPILLLDQALVSDVGDSAEVAVKMFDAYVNTFSSTFNVPMEKLKTLVATAEAELAKNVSLDNVLSTFISAARQGFVDSDVETKDVVECLKLSHQSDIEVTGDSCNNYMLTYNKVENMTPRDLGACIDCSARHINAQVAKSHNIALIWNVKDFMSLSEQLRKQIRSAAKKNNLPFKLTCATTRQVVNVVTTKIALKGGKIVNNWLKQLIKVTLVFLFVAAIFYLITPVHVMSKHTDFSSEIIGYKAIDGGVTRDIASTDTCFANKHADFDTWFSQRGGSYTNDKACPLIAAVITREVGFVVPGLPGTILRTTNGDFLHFLPRVFSAVGNICYTPSKLIEYTDFATSACVLAAECTIFKDASGKPVPYCYDTNVLEGSVAYESLRPDTRYVLMDGSIIQFPNTYLEGSVRVVTTFDSEYCRHGTCERSEAGVCVSTSGRWVLNNDYYRSLPGVFCGVDAVNLLTNMFTPLIQPIGALDISASIVAGGIVAIVVTCLAYYFMRFRRAFGEYSHVVAFNTLLFLMSFTVLCLTPVYSFLPGVYSVIYLYLTFYLTNDVSFLAHIQWMVMFTPLVPFWITIAYIICISTKHFYWFFSNYLKRRVVFNGVSFSTFEEAALCTFLLNKEMYLKLRSDVLLPLTQYNRYLALYNKYKYFSGAMDTTSYREAACCHLAKALNDFSNSGSDVLYQPPQTSITSAVLQSGFRKMAFPSGKVEGCMVQVTCGTTTLNGLWLDDVVYCPRHVICTSEDMLNPNYEDLLIRKSNHNFLVQAGNVQLRVIGHSMQNCVLKLKVDTANPKTPKYKFVRIQPGQTFSVLACYNGSPSGVYQCAMRPNFTIKGSFLNGSCGSVGFNIDYDCVSFCYMHHMELPTGVHAGTDLEGNFYGPFVDRQTAQAAGTDTTITVNVLAWLYAAVINGDRWFLNRFTTTLNDFNLVAMKYNYEPLTQDHVDILGPLSAQTGIAVLDMCASLKELLQNGMNGRTILGSALLEDEFTPFDVVRQCSGVTFQSAVKRTIKGTHHWLLLTILTSLLVLVQSTQWSLFFFLYENAFLPFAMGIIAMSAFAMMFVKHKHAFLCLFLLPSLATVAYFNMVYMPASWVMRIMTWLDMVDTSLSGFKLKDCVMYASAVVLLILMTARTVYDDGARRVWTLMNVLTLVYKVYYGNALDQAISMWALIISVTSNYSGVVTTVMFLARGIVFMCVEYCPIFFITGNTLQCIMLVYCFLGYFCTCYFGLFCLLNRYFRLTLGVYDYLVSTQEFRYMNSQGLLPPKNSIDAFKLNIKLLGVGGKPCIKVATVQSKMSDVKCTSVVLLSVLQQLRVESSSKLWAQCVQLHNDILLAKDTTEAFEKMVSLLSVLLSMQGAVDINKLCEEMLDNRATLQAIASEFSSLPSYAAFATAQEAYEQAVANGDSEVVLKKLKKSLNVAKSEF
DRDAAMQRKLEKMADQAMTQMYKQARSEDKRAKVTSAMQTMLFTMLRKLDNDALNNIINNARDGCVPLNIIPLTTAAKLMVVIPDYNTYKNTCDGTTFTYASALWEIQQVVDADSKIVQLSEISMDNSPNLAWPLIVTALRANSAVKLQNNELSPVALRQMSCAAGTTQTACTDDNALAYYNTTKGGRFVLALLSDLQDLKWARFPKSDGTGTIYTELEPPCRFVTDTPKGPKVKYLYFIKGLNNLNRGMVLGSLAATVRLQAGNATEVPANSTVLSFCAFAVDAAKAYKDYLASGGQPITNCVKMLCTHTGTGQAITVTPEANMDQESFGGASCCLYCRCHIDHPNPKGFCDLKGKYVQIPTTCANDPVGFTLKNTVCTVCGMWKGYGCSCDQLREPMLQSADAQSFLNGFAV"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 
## [14] "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASVYAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLGAENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
## [15] "MYSFVSEETGTLIVNSVLLFLAFVVFLLVTLAILTALRLCAYCCNIVNVSLVKPSFYVYSRVKNLNSSRVPDLLV"                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
## [16] "MADSNGTITVEELKKLLEQWNLVIGFLFLTWICLLQFAYANRNRFLYIIKLIFLWLLWPVTLACFVLAAVYRINWITGGIAIAMACLVGLMWLSYFIASFRLFARTRSMWSFNPETNILLNVPLHGTILTRPLLESELVIGAVILRGHLRIAGHHLGRCDIKDLPKEITVATSRTLSYYKLGASQRVAGDSGFAAYSRYRIGNYKLNTDHSSSSDNIALLVQ"

Sometimes it is necessary to import a specific group

BASIC SEQUENCE ANALYSIS

We will carry out a “descriptive” analysis of the sequences.

str(actin)
## List of 1
##  $ XP_809496.1: 'SeqFastaAA' chr [1:392] "M" "E" "A" "T" ...
##   ..- attr(*, "name")= chr "XP_809496.1"
##   ..- attr(*, "Annot")= chr ">XP_809496.1 actin 2, putative [Trypanosoma cruzi]"
table(actin)
## XP_809496.1
##  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y 
## 34  8 19 38 17 27  7 17 12 38 10 11 19 16 27 30 16 32  6  8
summary(actin)
##             Length Class      Mode     
## XP_809496.1 392    SeqFastaAA character

We can also use some functions specific to the seqinr package to analyze the sequence, such as:

seqinr::getLength(actin)
## [1] 392
seqinr::getAnnot(actin)
## [[1]]
## [1] ">XP_809496.1 actin 2, putative [Trypanosoma cruzi]"
seqinr::getSequence(actin)
## [[1]]
##   [1] "M" "E" "A" "T" "L" "W" "D" "E" "E" "P" "A" "V" "V" "L" "D" "N" "G" "S"
##  [19] "G" "N" "I" "K" "C" "G" "F" "A" "G" "E" "E" "I" "P" "R" "C" "V" "F" "P"
##  [37] "S" "V" "T" "G" "V" "S" "M" "N" "A" "R" "S" "S" "G" "S" "S" "S" "S" "Q"
##  [55] "R" "V" "Y" "V" "G" "D" "E" "A" "L" "Q" "E" "K" "G" "L" "R" "Y" "F" "Y"
##  [73] "P" "M" "E" "H" "G" "I" "V" "F" "D" "W" "D" "Q" "M" "E" "R" "V" "W" "R"
##  [91] "H" "A" "Y" "E" "Q" "L" "R" "V" "P" "P" "E" "R" "Q" "A" "V" "L" "L" "T"
## [109] "E" "A" "P" "L" "N" "P" "I" "S" "N" "R" "E" "K" "M" "A" "E" "T" "L" "F"
## [127] "E" "S" "F" "G" "V" "P" "A" "L" "H" "V" "Q" "I" "Q" "A" "V" "L" "T" "L"
## [145] "Y" "S" "S" "G" "R" "T" "D" "G" "L" "V" "L" "D" "S" "G" "D" "G" "V" "T"
## [163] "H" "L" "V" "P" "V" "F" "E" "G" "Q" "T" "M" "P" "Q" "S" "V" "R" "R" "L"
## [181] "E" "L" "A" "G" "R" "D" "L" "T" "E" "W" "M" "M" "E" "L" "L" "S" "D" "E"
## [199] "L" "D" "R" "P" "F" "T" "T" "S" "A" "D" "R" "E" "I" "A" "R" "R" "V" "K"
## [217] "E" "S" "L" "C" "Y" "I" "P" "L" "F" "F" "E" "E" "E" "L" "Q" "A" "A" "E"
## [235] "E" "D" "G" "I" "N" "E" "D" "V" "K" "G" "K" "E" "P" "F" "T" "L" "P" "D"
## [253] "G" "E" "V" "I" "H" "V" "G" "R" "A" "R" "F" "C" "C" "P" "E" "I" "L" "F"
## [271] "N" "P" "A" "L" "A" "E" "K" "P" "Y" "D" "G" "I" "Q" "H" "A" "V" "I" "N"
## [289] "C" "V" "N" "S" "C" "P" "I" "D" "L" "R" "R" "Q" "L" "L" "G" "S" "I" "V"
## [307] "L" "S" "G" "G" "N" "T" "M" "F" "K" "G" "M" "Q" "Q" "R" "L" "Q" "S" "E"
## [325] "L" "A" "A" "L" "A" "N" "K" "R" "A" "A" "E" "D" "V" "R" "V" "V" "A" "A"
## [343] "S" "E" "R" "K" "F" "S" "V" "W" "I" "G" "A" "A" "I" "L" "A" "S" "L" "T"
## [361] "S" "F" "A" "S" "E" "W" "I" "T" "R" "T" "E" "Y" "A" "E" "Q" "G" "A" "A"
## [379] "V" "L" "H" "K" "R" "C" "D" "S" "L" "S" "F" "V" "S" "K"
seqinr::getName(actin)
## [1] "XP_809496.1"
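
As the str() output suggests, most of what these accessors return is already stored on the object, either as the vector itself or as its attributes. The following base-R sketch uses a hypothetical hand-built object `toy` (not read from a FASTA file) to show roughly where each value comes from. The correspondence with the seqinr functions is an approximation, not a description of their internals.

```r
# Toy object mimicking the structure printed by str(actin); built by hand
# for illustration, not produced by seqinr.
toy <- structure(
  c("M", "E", "A", "T"),
  name  = "toy_seq",
  Annot = ">toy_seq hypothetical protein",
  class = "SeqFastaAA"
)

length(toy)                          # number of residues, comparable to getLength()
attr(toy, "Annot")                   # FASTA header line, comparable to getAnnot()
attr(toy, "name")                    # sequence identifier, comparable to getName()
paste(unclass(toy), collapse = "")   # residues joined into a single string
```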

Now let us try the same functions on the multiple-sequence file multi.actin.txt (only the first 5 sequences are shown as an example).

seqinr::getLength(multi.actin)
##   [1] 392 392 392 392 392 392 391 384 388 388 384 388 376 376 376 376 376 376
##  [19] 376 383 376 404 376 376 376 376 376 376 376 376 376 376 376 376 415 376
##  [37] 376 376 376 367 376 358 341 321 294 294 285 288 393 416 390 416 390 390
##  [55] 416 470 390 390 416 394 416 192 416 417 416 416 416 416 416 416 393 393
##  [73] 416 403 399 403 396 401 166 494 401 403 398 401 330 403 403 401 400 400
##  [91] 305 398 400 394 394 394 392 394 383 163
seqinr::getAnnot(multi.actin)[1:5]
## [[1]]
## [1] ">XP_809496.1 actin 2, putative [Trypanosoma cruzi]"
## 
## [[2]]
## [1] ">EKG04871.1 actin 2, putative [Trypanosoma cruzi]"
## 
## [[3]]
## [1] ">ESS65030.1 actin 2 [Trypanosoma cruzi Dm28c]"
## 
## [[4]]
## [1] ">XP_806044.1 actin 2, putative [Trypanosoma cruzi]"
## 
## [[5]]
## [1] ">RNF12397.1 putative actin 2 [Trypanosoma cruzi]"
seqinr::getSequence(multi.actin)[1:5]
## [[1]]
##   [1] "M" "E" "A" "T" "L" "W" "D" "E" "E" "P" "A" "V" "V" "L" "D" "N" "G" "S"
##  [19] "G" "N" "I" "K" "C" "G" "F" "A" "G" "E" "E" "I" "P" "R" "C" "V" "F" "P"
##  [37] "S" "V" "T" "G" "V" "S" "M" "N" "A" "R" "S" "S" "G" "S" "S" "S" "S" "Q"
##  [55] "R" "V" "Y" "V" "G" "D" "E" "A" "L" "Q" "E" "K" "G" "L" "R" "Y" "F" "Y"
##  [73] "P" "M" "E" "H" "G" "I" "V" "F" "D" "W" "D" "Q" "M" "E" "R" "V" "W" "R"
##  [91] "H" "A" "Y" "E" "Q" "L" "R" "V" "P" "P" "E" "R" "Q" "A" "V" "L" "L" "T"
## [109] "E" "A" "P" "L" "N" "P" "I" "S" "N" "R" "E" "K" "M" "A" "E" "T" "L" "F"
## [127] "E" "S" "F" "G" "V" "P" "A" "L" "H" "V" "Q" "I" "Q" "A" "V" "L" "T" "L"
## [145] "Y" "S" "S" "G" "R" "T" "D" "G" "L" "V" "L" "D" "S" "G" "D" "G" "V" "T"
## [163] "H" "L" "V" "P" "V" "F" "E" "G" "Q" "T" "M" "P" "Q" "S" "V" "R" "R" "L"
## [181] "E" "L" "A" "G" "R" "D" "L" "T" "E" "W" "M" "M" "E" "L" "L" "S" "D" "E"
## [199] "L" "D" "R" "P" "F" "T" "T" "S" "A" "D" "R" "E" "I" "A" "R" "R" "V" "K"
## [217] "E" "S" "L" "C" "Y" "I" "P" "L" "F" "F" "E" "E" "E" "L" "Q" "A" "A" "E"
## [235] "E" "D" "G" "I" "N" "E" "D" "V" "K" "G" "K" "E" "P" "F" "T" "L" "P" "D"
## [253] "G" "E" "V" "I" "H" "V" "G" "R" "A" "R" "F" "C" "C" "P" "E" "I" "L" "F"
## [271] "N" "P" "A" "L" "A" "E" "K" "P" "Y" "D" "G" "I" "Q" "H" "A" "V" "I" "N"
## [289] "C" "V" "N" "S" "C" "P" "I" "D" "L" "R" "R" "Q" "L" "L" "G" "S" "I" "V"
## [307] "L" "S" "G" "G" "N" "T" "M" "F" "K" "G" "M" "Q" "Q" "R" "L" "Q" "S" "E"
## [325] "L" "A" "A" "L" "A" "N" "K" "R" "A" "A" "E" "D" "V" "R" "V" "V" "A" "A"
## [343] "S" "E" "R" "K" "F" "S" "V" "W" "I" "G" "A" "A" "I" "L" "A" "S" "L" "T"
## [361] "S" "F" "A" "S" "E" "W" "I" "T" "R" "T" "E" "Y" "A" "E" "Q" "G" "A" "A"
## [379] "V" "L" "H" "K" "R" "C" "D" "S" "L" "S" "F" "V" "S" "K"
## 
## [[2]]
##   [1] "M" "E" "A" "T" "L" "W" "D" "E" "E" "P" "A" "V" "V" "L" "D" "N" "G" "S"
##  [19] "G" "N" "I" "K" "C" "G" "F" "A" "G" "E" "E" "I" "P" "R" "C" "V" "F" "P"
##  [37] "S" "V" "T" "G" "V" "S" "M" "N" "A" "R" "S" "S" "G" "S" "S" "S" "S" "Q"
##  [55] "R" "V" "Y" "V" "G" "D" "E" "A" "L" "Q" "E" "K" "G" "L" "R" "Y" "F" "Y"
##  [73] "P" "M" "E" "H" "G" "I" "V" "Y" "D" "W" "D" "Q" "M" "E" "R" "V" "W" "R"
##  [91] "H" "A" "Y" "E" "Q" "L" "R" "V" "P" "P" "E" "R" "Q" "A" "V" "L" "L" "T"
## [109] "E" "A" "P" "M" "N" "P" "I" "S" "N" "R" "E" "K" "M" "A" "E" "T" "L" "F"
## [127] "E" "S" "F" "G" "V" "P" "A" "L" "H" "V" "Q" "I" "Q" "A" "V" "L" "T" "L"
## [145] "Y" "S" "S" "G" "R" "T" "D" "G" "L" "V" "L" "D" "S" "G" "D" "G" "V" "T"
## [163] "H" "L" "V" "P" "V" "F" "E" "G" "Q" "T" "M" "P" "Q" "T" "V" "R" "R" "L"
## [181] "E" "L" "A" "G" "R" "D" "L" "T" "E" "W" "M" "M" "E" "L" "L" "S" "D" "E"
## [199] "L" "D" "R" "P" "F" "T" "T" "S" "A" "D" "R" "E" "V" "A" "R" "R" "V" "K"
## [217] "E" "S" "L" "C" "Y" "I" "P" "L" "F" "F" "E" "E" "E" "L" "Q" "A" "A" "E"
## [235] "E" "D" "G" "I" "N" "E" "D" "A" "K" "G" "K" "E" "P" "F" "T" "L" "P" "D"
## [253] "G" "E" "V" "I" "H" "V" "G" "R" "A" "R" "F" "C" "C" "P" "E" "I" "L" "F"
## [271] "N" "P" "A" "L" "A" "E" "K" "P" "Y" "D" "G" "I" "Q" "H" "A" "V" "I" "N"
## [289] "C" "V" "N" "S" "C" "P" "I" "D" "L" "R" "R" "Q" "L" "L" "G" "S" "I" "V"
## [307] "L" "S" "G" "G" "N" "T" "M" "F" "K" "G" "M" "Q" "Q" "R" "L" "Q" "S" "E"
## [325] "L" "A" "A" "L" "A" "N" "K" "R" "A" "A" "E" "D" "V" "R" "V" "V" "A" "A"
## [343] "S" "E" "R" "K" "F" "S" "V" "W" "I" "G" "A" "A" "I" "L" "A" "S" "L" "T"
## [361] "S" "F" "A" "S" "E" "W" "I" "T" "R" "T" "E" "Y" "A" "E" "Q" "G" "A" "A"
## [379] "V" "L" "H" "K" "R" "C" "D" "S" "L" "S" "F" "V" "S" "K"
## 
## [[3]]
##   [1] "M" "E" "A" "T" "L" "W" "D" "E" "E" "P" "A" "V" "V" "L" "D" "N" "G" "S"
##  [19] "G" "N" "I" "K" "C" "G" "F" "A" "G" "E" "E" "I" "P" "R" "C" "V" "F" "P"
##  [37] "S" "V" "T" "G" "V" "S" "M" "N" "A" "R" "S" "S" "G" "S" "S" "S" "S" "Q"
##  [55] "R" "V" "Y" "V" "G" "D" "E" "A" "L" "Q" "E" "K" "G" "L" "R" "Y" "F" "Y"
##  [73] "P" "M" "E" "H" "G" "I" "V" "Y" "D" "W" "D" "Q" "M" "E" "R" "V" "W" "R"
##  [91] "H" "A" "Y" "E" "Q" "L" "R" "V" "P" "P" "E" "R" "Q" "A" "V" "L" "L" "T"
## [109] "E" "A" "P" "M" "N" "P" "I" "S" "N" "R" "E" "K" "M" "A" "E" "T" "L" "F"
## [127] "E" "S" "F" "G" "V" "P" "A" "L" "H" "V" "Q" "I" "Q" "A" "V" "L" "T" "L"
## [145] "Y" "S" "S" "G" "R" "T" "D" "G" "L" "V" "L" "D" "S" "G" "D" "G" "V" "T"
## [163] "H" "L" "V" "P" "V" "F" "E" "G" "Q" "T" "M" "P" "Q" "T" "V" "R" "R" "L"
## [181] "E" "L" "A" "G" "R" "D" "L" "T" "E" "W" "M" "M" "E" "L" "L" "S" "D" "E"
## [199] "L" "D" "R" "P" "F" "T" "T" "S" "A" "D" "R" "E" "V" "A" "R" "R" "V" "K"
## [217] "E" "S" "L" "C" "Y" "I" "P" "L" "F" "F" "E" "E" "E" "L" "Q" "A" "A" "E"
## [235] "E" "D" "G" "I" "N" "E" "D" "A" "K" "G" "K" "E" "P" "F" "T" "L" "P" "D"
## [253] "G" "E" "V" "I" "H" "V" "G" "R" "A" "R" "F" "C" "C" "P" "E" "I" "L" "F"
## [271] "N" "P" "A" "L" "A" "E" "K" "P" "Y" "D" "G" "I" "Q" "H" "A" "V" "I" "N"
## [289] "C" "V" "N" "S" "C" "P" "I" "D" "L" "R" "R" "Q" "L" "L" "G" "S" "I" "V"
## [307] "L" "S" "G" "G" "N" "T" "M" "F" "K" "G" "M" "Q" "K" "R" "L" "Q" "S" "E"
## [325] "L" "A" "A" "L" "A" "N" "K" "R" "A" "A" "E" "D" "V" "R" "V" "V" "A" "A"
## [343] "S" "E" "R" "K" "F" "S" "V" "W" "I" "G" "A" "A" "I" "L" "A" "S" "L" "T"
## [361] "S" "F" "A" "S" "E" "W" "I" "T" "R" "T" "E" "Y" "A" "E" "Q" "G" "A" "A"
## [379] "V" "L" "H" "K" "R" "C" "D" "S" "L" "S" "F" "V" "S" "K"
## 
## [[4]]
##   [1] "M" "E" "A" "T" "L" "W" "D" "E" "E" "P" "A" "V" "V" "L" "D" "N" "G" "S"
##  [19] "G" "N" "I" "K" "C" "G" "F" "A" "G" "E" "E" "I" "P" "R" "C" "V" "F" "P"
##  [37] "S" "V" "T" "G" "V" "S" "M" "N" "T" "R" "S" "S" "G" "S" "S" "S" "S" "Q"
##  [55] "R" "V" "Y" "V" "G" "D" "E" "A" "L" "Q" "E" "K" "G" "L" "R" "Y" "F" "Y"
##  [73] "P" "M" "E" "H" "G" "I" "V" "S" "D" "W" "D" "Q" "M" "E" "R" "V" "W" "R"
##  [91] "H" "A" "Y" "E" "Q" "L" "R" "V" "P" "P" "E" "R" "Q" "A" "V" "L" "L" "T"
## [109] "E" "A" "P" "L" "N" "P" "I" "S" "N" "R" "E" "K" "M" "A" "E" "T" "L" "F"
## [127] "E" "S" "F" "G" "V" "P" "A" "L" "H" "V" "Q" "I" "Q" "A" "V" "L" "T" "L"
## [145] "Y" "S" "S" "G" "R" "T" "D" "G" "L" "V" "L" "D" "S" "G" "D" "G" "V" "T"
## [163] "H" "L" "V" "P" "V" "F" "E" "G" "Q" "T" "M" "P" "Q" "S" "V" "R" "R" "L"
## [181] "E" "L" "A" "G" "R" "D" "L" "T" "E" "W" "M" "M" "E" "L" "L" "S" "D" "E"
## [199] "L" "D" "R" "P" "F" "T" "T" "S" "A" "D" "R" "E" "V" "A" "R" "R" "V" "K"
## [217] "E" "S" "L" "C" "Y" "I" "P" "L" "F" "F" "E" "E" "E" "L" "Q" "A" "A" "E"
## [235] "E" "D" "G" "I" "N" "E" "D" "A" "K" "G" "K" "E" "P" "F" "T" "L" "P" "D"
## [253] "G" "E" "V" "I" "H" "V" "G" "R" "A" "R" "F" "C" "C" "P" "E" "I" "L" "F"
## [271] "N" "P" "A" "L" "A" "E" "K" "P" "Y" "D" "G" "I" "Q" "H" "A" "V" "I" "N"
## [289] "C" "V" "N" "S" "C" "P" "I" "D" "L" "R" "R" "Q" "L" "L" "G" "S" "I" "V"
## [307] "L" "S" "G" "G" "N" "T" "M" "F" "K" "G" "M" "Q" "Q" "R" "L" "Q" "S" "E"
## [325] "L" "A" "A" "L" "A" "N" "K" "R" "A" "A" "E" "D" "V" "R" "V" "V" "A" "A"
## [343] "S" "E" "R" "K" "F" "S" "V" "W" "I" "G" "A" "A" "I" "L" "A" "S" "L" "T"
## [361] "S" "F" "A" "S" "E" "W" "I" "T" "R" "T" "E" "Y" "A" "E" "Q" "G" "A" "A"
## [379] "V" "L" "H" "K" "R" "C" "D" "S" "L" "S" "F" "V" "S" "K"
## 
## [[5]]
##   [1] "M" "E" "A" "T" "L" "W" "D" "E" "E" "P" "A" "V" "V" "L" "D" "N" "G" "S"
##  [19] "G" "N" "I" "K" "C" "G" "F" "A" "G" "E" "E" "I" "P" "R" "C" "V" "F" "P"
##  [37] "S" "V" "T" "G" "V" "S" "M" "N" "A" "R" "S" "S" "G" "S" "S" "S" "S" "Q"
##  [55] "R" "V" "Y" "V" "G" "D" "E" "A" "L" "Q" "E" "K" "G" "L" "R" "Y" "F" "Y"
##  [73] "P" "M" "E" "H" "G" "I" "V" "Y" "D" "W" "D" "Q" "M" "E" "R" "V" "W" "Q"
##  [91] "H" "A" "Y" "E" "Q" "L" "R" "V" "P" "P" "E" "R" "Q" "A" "V" "L" "L" "T"
## [109] "E" "A" "P" "M" "N" "P" "I" "S" "N" "R" "E" "K" "M" "A" "E" "T" "L" "F"
## [127] "E" "S" "F" "G" "V" "P" "A" "L" "H" "V" "Q" "I" "Q" "A" "V" "L" "T" "L"
## [145] "Y" "S" "S" "G" "R" "T" "D" "G" "L" "V" "L" "D" "S" "G" "D" "G" "V" "T"
## [163] "H" "L" "V" "P" "V" "F" "E" "G" "Q" "T" "M" "P" "Q" "T" "V" "R" "R" "L"
## [181] "E" "L" "A" "G" "R" "D" "L" "T" "E" "W" "M" "M" "E" "L" "L" "S" "D" "E"
## [199] "L" "D" "R" "P" "F" "T" "T" "S" "A" "D" "R" "E" "V" "A" "R" "R" "V" "K"
## [217] "E" "S" "L" "C" "Y" "I" "P" "L" "F" "F" "E" "E" "E" "L" "Q" "A" "A" "E"
## [235] "E" "D" "G" "I" "N" "E" "D" "A" "K" "G" "K" "E" "P" "F" "T" "L" "P" "D"
## [253] "G" "E" "V" "I" "H" "V" "G" "R" "A" "R" "F" "C" "C" "P" "E" "I" "L" "F"
## [271] "N" "P" "A" "L" "A" "E" "K" "P" "Y" "D" "G" "I" "Q" "H" "A" "V" "I" "N"
## [289] "C" "V" "N" "S" "C" "P" "I" "D" "L" "R" "R" "Q" "L" "L" "G" "S" "I" "V"
## [307] "L" "S" "G" "G" "N" "T" "M" "F" "K" "G" "M" "Q" "K" "R" "L" "Q" "S" "E"
## [325] "L" "A" "A" "L" "A" "N" "K" "R" "A" "A" "E" "D" "V" "R" "V" "V" "A" "A"
## [343] "S" "E" "R" "K" "F" "S" "V" "W" "I" "G" "A" "A" "I" "L" "A" "S" "L" "T"
## [361] "S" "F" "A" "S" "E" "W" "I" "T" "R" "T" "E" "Y" "A" "E" "Q" "G" "A" "A"
## [379] "V" "L" "H" "K" "R" "C" "D" "S" "L" "S" "F" "V" "S" "K"
seqinr::getName(multi.actin)[1:5]
## [1] "XP_809496.1" "EKG04871.1"  "ESS65030.1"  "XP_806044.1" "RNF12397.1"

We can also analyze each individual sequence inside the multi.actin object by selecting it with the $ operator.

table(multi.actin$XP_809496.1)
## 
##  A  C  D  E  F  G  H  I  K  L  M  N  P  Q  R  S  T  V  W  Y 
## 34  8 19 38 17 27  7 17 12 38 10 11 19 16 27 30 16 32  6  8
round(100*table(multi.actin$XP_809496.1)/seqinr::getLength(multi.actin$XP_809496.1),2)
## 
##    A    C    D    E    F    G    H    I    K    L    M    N    P    Q    R    S 
## 8.67 2.04 4.85 9.69 4.34 6.89 1.79 4.34 3.06 9.69 2.55 2.81 4.85 4.08 6.89 7.65 
##    T    V    W    Y 
## 4.08 8.16 1.53 2.04
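
The percentage above is each count divided by the sequence length. Base R's prop.table() does the same normalization by dividing a table by its total. A small sketch with a made-up sequence `aa` (hypothetical, not taken from the actin file) compares the two:

```r
aa <- c("M", "A", "A", "L", "L", "L", "K", "E")   # toy sequence, 8 residues

counts <- table(aa)
pct_manual <- round(100 * counts / length(aa), 2)   # same idea as in the text
pct_prop   <- round(100 * prop.table(counts), 2)    # divides by sum(counts)

pct_prop
all.equal(as.numeric(pct_manual), as.numeric(pct_prop))
```

The two results agree whenever every residue is counted, since sum(counts) then equals the sequence length.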

What if we wanted to know the relative frequency of each amino acid in every one of the sequences, with the results conveniently organized in a table? For this we can use one of the most powerful tools in R (because of its simplicity): the apply family of functions. Since we want a table here, we will use sapply(). One of the biggest advantages of the apply family is that it lets us run a custom function on each element of a list. It works like a for loop but is more compact to write; it is not necessarily faster than a well-written loop.

freRel <- sapply(X = multi.actin, FUN = function(x) round(100*table(x)/seqinr::getLength(x),2))
freRel[1:5,1:10]
##   XP_809496.1 EKG04871.1 ESS65030.1 XP_806044.1 RNF12397.1 KAF5219922.1
## A        8.67       8.93       8.93        8.67       8.93         8.67
## C        2.04       2.04       2.04        2.04       2.04         2.04
## D        4.85       4.85       4.85        4.85       4.85         4.85
## E        9.69       9.69       9.69        9.69       9.69         9.69
## F        4.34       4.08       4.08        4.08       4.08         4.08
##   EKF33322.1 XP_029232524.1 XP_029234507.1 ESL09674.1
## A       8.44           8.33           9.79       8.76
## C       2.05           2.34           2.84       2.32
## D       4.60           4.69           4.12       4.64
## E       9.97          10.16          10.05       9.54
## F       3.84           4.17           3.87       4.12
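
sapply() only simplifies its result to a matrix when every call returns a vector of the same length. A sequence that lacks some amino acid gives a shorter table(), and sapply() then returns a list instead. One possible guard, sketched here, is to fix the 20 standard residues as factor levels. The names `aa_levels`, `rel_freq` and `toy_seqs` are illustrative and not part of the original script.

```r
# The 20 standard amino acids (one-letter codes) as fixed factor levels.
aa_levels <- c("A", "C", "D", "E", "F", "G", "H", "I", "K", "L",
               "M", "N", "P", "Q", "R", "S", "T", "V", "W", "Y")

# Percent composition of one sequence; absent residues are reported as 0.
# Symbols outside aa_levels (e.g. "X") are not counted but still add to length(x).
rel_freq <- function(x) {
  counts <- table(factor(x, levels = aa_levels))
  round(100 * counts / length(x), 2)
}

# Toy list standing in for multi.actin (hypothetical sequences).
toy_seqs <- list(
  seq1 = c("M", "A", "L", "L"),
  seq2 = c("M", "K", "K", "E", "W")
)

freq_mat <- sapply(toy_seqs, rel_freq)
dim(freq_mat)             # 20 rows (amino acids) x 2 columns (sequences)
freq_mat[c("K", "L"), ]
```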

If we want to draw a boxplot() to explore the spread of the data for each amino acid, we need to transpose the data matrix (that is, swap rows and columns), because boxplot() applied to a matrix draws one box per column and the amino acids are currently in the rows. We will use the t() function for this.

freRel <- t(freRel)

NOTE: Notice that we are overwriting the object. What R does is: 1. It evaluates the right-hand side (here, t(freRel)), using new memory for the result. 2. If the call completes without error, the name freRel is rebound to the new result and the original object is discarded. 3. If the call fails, the assignment never happens and the previous object is kept. (CAUTION: when a script is run line by line or pasted into the console, the lines after a failed call still run, now with the old object, so we can reach wrong results while believing that the script ran completely.)
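
The behaviour described in the note can be checked with a small sketch. The failure is simulated with stop(), and the names `x` and `result` are illustrative only.

```r
x <- c(1, 2, 3)

# The right-hand side raises an error, so the assignment to x never happens.
result <- tryCatch(
  {
    x <- stop("simulated failure")
    "assigned"
  },
  error = function(e) "failed"
)

result   # "failed"
x        # still 1 2 3: the previous object was kept
```

One defensive habit is to add a check such as stopifnot() right after a critical step, so that a step that failed without being noticed stops the analysis instead of letting it continue with stale data.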

Now we can draw the box-and-whisker plot.

boxplot(freRel)

Although this is a purely demonstrative exercise that does not attempt to examine any evolutionary background, at least not at this point, we can still draw some interesting conclusions: the distribution of frequencies differs for each amino acid, and the amino acid most frequently present in the sequences is leucine (L), followed by glutamic acid (E), glycine (G), serine (S) and valine (V).
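
A ranking like the one above can also be read from numbers instead of from the plot, for example by sorting the per-column medians. The sketch below uses a toy matrix with made-up percentages (rows are sequences, columns are amino acids); `toy` and `med` are illustrative names.

```r
# Toy matrix: rows = sequences, columns = amino acids (made-up percentages).
toy <- rbind(
  seq1 = c(L = 9.7, E = 9.6, G = 6.9, W = 1.5),
  seq2 = c(L = 9.9, E = 9.4, G = 7.1, W = 1.6),
  seq3 = c(L = 9.5, E = 9.8, G = 6.8, W = 1.4)
)

# Median of each column, sorted from most to least frequent.
med <- sort(apply(toy, 2, median), decreasing = TRUE)
med
names(med)[1]   # amino acid with the highest median frequency
```

Assuming the transposed freRel has the same layout, the same call would be sort(apply(freRel, 2, median), decreasing = TRUE).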

QUESTION FOR REFLECTION: We now know the “expected” frequency of each amino acid. What if, in a sequence of 25 aa, the “observed frequency” of some amino acids turns out to be increased? What might we suspect? What biological implications would this have?

If we want to use the ggplot2 package for the boxplot, we first need to make some changes. We have to convert the data matrix into a long-format data frame in which one column holds the amino acid name (as a factor) and another holds the frequency (the numeric value). To do this we first use the stack() function and then convert the result to a data.frame with as.data.frame(). Finally, we use geom_boxplot() from ggplot2 to generate the plot.

dat <- stack(freRel)       # convert the matrix to a three-column long format
dat <- as.data.frame(dat)  # convert the S4 object returned by stack() to a data.frame
colnames(dat)              # check the column names
## [1] "row"   "col"   "value"
library(ggplot2)           # load the ggplot2 package
ggplot(data = dat, aes(x = col, y = value, fill = col)) + 
  geom_boxplot() +
  labs(title = "RELATIVE FREQUENCY OF AA", 
       subtitle = "Actin family of trypanosomatids",
       caption = Sys.Date()) +
  ylab("Relative frequency") +
  xlab("Amino acid")
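
The stack() call above returns an S4 object with columns row, col and value. This suggests that the matrix method in use comes from a Bioconductor package already loaded in the session and not from base R; this is an assumption, since the source does not say. If that method is not available, as.data.frame(as.table(...)) is a base-R route to the same long format. The sketch below uses a toy matrix. Var1, Var2 and Freq are R's default column names and are renamed here to match the ones used in the text.

```r
toy <- matrix(c(8.7, 8.9, 2.0, 2.1), nrow = 2,
              dimnames = list(c("seq1", "seq2"), c("A", "C")))

long <- as.data.frame(as.table(toy))          # one row per matrix cell
colnames(long) <- c("row", "col", "value")    # mimic the names used in the text

long
str(long)   # row and col are factors, value is numeric
```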

SEQUENCE ALIGNMENT

library(msa)   # Bioconductor package that provides msa() and msaPrettyPrint()
align <- msa(unlist(seqinr::getSequence(multi.actin, as.string = TRUE)), type = "protein")
## use default substitution matrix
print(align, show = "complete")
## 
## MsaAAMultipleAlignment with 100 rows and 557 columns
##       aln (1..73)
##   [1] -------------------------------------------------------------------------
##   [2] -------------------------------------------------------------------------
##   [3] -------------------------------------------------------------------------
##   [4] ---------------------------------------MACFISPSSCDDWSLVFFCFFWSCRVPPVKTKWP
##   [5] -------------------------------------------------------------------------
##   [6] -------------------------------------------------------------------------
##   [7] -------------------------------------------------------------------------
##   [8] -------------------------------------------------------------------------
##   [9] ------------------------------------------------------------------------- 
##   ... ...
##  [93] -------------------------------------------------------------------------
##  [94] -------------------------------------------------------------------------
##  [95] -------------------------------------------------------------------------
##  [96] -------------------------------------------------------------------------
##  [97] -------------------------------------------------------------------------
##  [98] -------------------------------------------------------------------------
##  [99] -------------------------------------------------------------------------
## [100] -------------------------------------------------------------------------
##   Con ------------------------------------------------------------------------- 
## 
##       aln (74..146)
##   [1] --------------------MAYPVVVIDNGTGYTKMGYAGNEEPTYIIPTAYADNEASRRR-----------
##   [2] --------------------MAYPVVVIDNGTGYTKMGYAGNEEPTYIIPTAYADNEASRRR-----------
##   [3] --------------------MTHPVVVIDNGTGYTKMGYAGNEEPTFTIPTVYADNEVARRR-----------
##   [4] SQHRPHARTPTLRQRRRPGPMSYPVVVIDNGTGYTKLGYAGNEEPTYVIPSLYADNAAAWRR-----------
##   [5] --------------------MSYPVVVIDNGTGYTKMGYAGNEEPTYIIPSLYADNATARRR-----------
##   [6] --------------------MSYPVVVIDNGTGYTKMGYAGNEEPTYIFPSLYADGAAGRRR-----------
##   [7] --------------------MSYPVVVIDNGTGYTKMGYAGNEEPTYIIPSLYADNETVRRR-----------
##   [8] --------------------MSYPVVVIDNGTGYTKMGYAGNEEPTYIIPSLYADNETVRRR-----------
##   [9] --------------------MSYPVVVIDNGTGYTKMGYAGNEEPTYIIPSLYADNETVRRR----------- 
##   ... ...
##  [93] -------------------MSEKLPVVLDNGSGFLKCGFAGSNFPEVFFRTAVGRPVLRQTKMSEGRSSKRK-
##  [94] --------------------MVSSPIVLDNGSGFLKCGYAGANFPEVCFQTAVGRPVLRSTKSGSN-SGKGV-
##  [95] ------------------MVQQQPVVVFDMGSNKTRVGFAGEEAPRVISSTVVGVPRQRGLVG----------
##  [96] ------------------MVQQQPVVVFDMGSNKTRVGFAGEEAPRVISSTVVGVPRQRGLVG----------
##  [97] ------------------MVQQQPVVVFDMGSNKTRVGFAGEEAPRVISSTVVGVPRQRGLVG----------
##  [98] ------------------MVQQQPVVVFDMGSNKTRVGFAGEEAPRVISSTVVGVPRQRGLVG----------
##  [99] ---------------MQQQQQRQSVAVFDVGSCSTRIGFAGEEAPRVVSPTVVGVPRHRGVLG----------
## [100] -----------------MLHQHHPIAVVDVGSGTTRLGFGGEEAPRVVQPTVVGTPQCQGMLG----------
##   Con ------------------???E????V?DNGSG??K?GFAG???PR?VFPS?VG?P?????------------ 
## 
##       aln (147..219)
##   [1] --SHDVFSDLDFYVGDEALAH---SSSCNLYHPIKHGIVEDWDKMERIWQHCVYKYLRVDPEEHGFILTEPPA
##   [2] --SHDVFSDLDFYVGDEALAH---SSSCNLYHPIKHGIVEDWDKMERIWQHCVYKYLRVDPEEHGFILTEPPA
##   [3] --SNDIFSDLDFNIGDEAIAR---AGPCNLSHPIRHGIVEDWDKMERMWLHCIYKYLRVDPGEHGFILTEPLA
##   [4] --SNDVFEDLDFYIGEEAAAR---AGSCTVSYPIQHGIVKDWDKMERIWQHCIYKYLHVEPEEHGFILTEPPA
##   [5] --SNDVFEDLDFYIGEEAAAR---AGSCTVSYPIKHGIVEDWDKMERIWQHCIYKYLHVEPEEHGFILTEPPA
##   [6] --SSDVFEDLDFCIGDEAAAC---AGSCNLSYPIKHGIVEDWDKMERIWQHCIYKYLRVEPEEHGFILTEPPA
##   [7] --SNDVFDDLDFYIGEEAAAR---ASSCTLSYPIKHGIVEDWDKMERIWQHCIYKYLRVEPEEHGFILTEPPA
##   [8] --SNDVFDDLDFYIGEEAAAR---ASSCTLSYPIKHGIVEDWDKMERIWQHCIYKYLRVEPEEHGFILTEPPA
##   [9] --SNDVFDDLDFYIGEEAAAR---ASSCTLSYPIKHGIVEDWDKMERIWQHCIYKYLRVEPEEHGFILTEPPA 
##   ... ...
##  [93] -DTQVDPLTKDLVLGDECNGA---HHLLDMTFPIHNGVIQNMDDMRYLWKHAFHNLLSVEPEDHSLLISEAPL
##  [94] -STHSDPLLKDLVLGDECTSI---RHLLDMSFPINNGIIKNMDDMCHLWNYTFNDLLHIKPEEHSLLLSEAPL
##  [95] ---SLLQHYSDDYAGDAACAQ---EGMLNLSYPVRNRCITSMPEVEHFLQDVFYSRLPLVPSNTMMLWVESVR
##  [96] ---SLLQHYSDDYAGDAACAQ---EGMLNLSYPVRNRCITSMPEVEHFLQDVFYSRLPLVPSNTMMLWVESVR
##  [97] ---SLMQHYSDDYAGDAACAQ---EGMLTLSYPVRNRRITSMPEVEHFLQDVFYSRLPLVPSNTMMLWVESVR
##  [98] ---SLMQHYSDDYAGDAACAQ---EGMLNLSYPVRNRRITSMPEVEHFLQDVFYSRLPLVPSNTMMLWVESVR
##  [99] ---SLLQHHSDDYAGDDALER---EGILKLSRPVQDRRVVSFEGLEHILHDALYTWLPVIPSETPLMWVEATG
## [100] ---SLLQHHGDTFAGDAAWER---RGLLTLSYPVQGRRVVSYKGLEHILHDALYAWLPFVPDETPLLWVEPAC
##   Con ---??????????VGDEA?A?---???L?L?YPI?HGIV??WD?ME??W?HTFY??LRVNPE?H?VLLTEAP? 
## 
##       aln (220..292)
##   [1] NPPENREHTAEVMFETFGVKQLHIAVQGALALRASWTSGKAQQLGLVGENTGVVVDSGDGVTHIVPIVDGFVM
##   [2] NPPENREHTAEVMFETFGVKQLHIAVQGALALRASWTSGKAQQLGLVGENTGVVVDSGDGVTHIVPIVDGFVM
##   [3] NPPENREHTAEVMFETFGVKQLHIAVQGVLALRASWTSGMAQQLGLAGENTGVVVDSGDGVTHVVPIVDGFVM
##   [4] NPPENREYTAEVMFETFGVKQLHIAVQGALALRASWTSGKAQELGVAGKDTGLVIDSGAGVTHIMPIVDGFVL
##   [5] NPPENREYAAEVMFETFGVKQLHIAVQGALALRASWTSGKAQELGVSGKDTGLVIDSGAGVTHIMPIVDGFVL
##   [6] NPPENREYTAEVMFETFGVKQLHIAVQGTLALRASWTSGKAQELGVAGKDTGLVIDSGAGVTHVIPIVDGFVL
##   [7] NPPENREYTAEVMFETFGVKQLHIAVQGALALRASWTSGKAKELGVAGKDTGLVIDSGAGVTHVIPIVDGFVL
##   [8] NPPENREYTAEVMFETFGVKQLHIAVQGALALRASWTSGKAKELGVAGKDTGLVIDSGAGVTHVIPIVDGFVL
##   [9] NPPENREYTAEVMFETFGVKQLHIAVQGALALRASWTSGKAKELGVAGKDTGLVIDSGAGVTHVIPIVDGFVL 
##   ... ...
##  [93] FSHKDRVKLYEVMFEEFKFPFVQSTPQGVLSLFS------------NGLQTGVAVECGECVSHCTPIFEGYTI
##  [94] FSHNDRVKLYEVMFEEYKFPFIQSVPQGVLSLFS------------NGLQTGVALECGECMSHCTPIFEGYAI
##  [95] TSREDRERLCEMMFESFGLPQLGLVAASATTVFS------------TGRTTGLVVDSGEGCTNFNAVWEGYNL
##  [96] TSREDRERLCEMMFESFGLPQLGLVAASATTVFS------------TGRTTGLVVDSGEGCTNFNAVWEGYNL
##  [97] TSREDRERLCEMMFESFGLPQLGLVAASATTVFS------------TGRTTGLVVDSGEGCTNFNAVWEGYNL
##  [98] TSREDRERLCEMMFESFGLPQLGLVAASATTVFS------------TGRTTGLVVDSGEGCTNFNAVWEGYNL
##  [99] TPRKDRERLCEVLFESFDIPLLGITSAAAATVYS------------TGRTTGLVLDSGEGCTTINAVWEGYIL
## [100] APREDRERMCELLFESFDLPLLAMTSAAAATVYS------------TGRTSGLVLDSGEDCTTVNAVWEGYNL
##   Con NP??NREKM?E?MFETFGVPA??V?IQAVLSLYS------------SGRTTG?VLDSGDGVTH?VPI?EGY?L 
## 
##       aln (293..365)
##   [1] HNAIQHIPLAGRDITNFVLEWLRERGEPVPA-DDALYLAQHIKEKYCYIARNIAREFETYDSDLPNHITKHHA
##   [2] HNAIQHIPLAGRDITNFVLEWLRERGEPVPA-DDALYLAQHIKEKYCYIARNIAREFETYDSDLPNHITKHHA
##   [3] HNAMCHIPLAGRDITNFVLEWLRERGEAVPA-DDALYLAQRIKEQHCYIARDIAHEFDKYDNNLPANITKHHD
##   [4] NQAIHHIPLAGRDITNFVLERLRERGEPVPP-DDALLLAQRIKEEYCYIARDIASEFDTYDRDLPKYVTKHRD
##   [5] NQAIHHIPLAGRDITNFVLERLRERGEPVPP-DDALLLAQRIKEQYCYIARDIAREFDTYDRDLPKYVTKHRD
##   [6] NQAIHHIPLAGRDITNFVLERLRERGEPVPP-DDALLLAQRIKEAHCYIARDIAREFDTYDRDLPKYITTHRD
##   [7] NQAIQHIPLAGRDITNFILERLRERGEPVPP-DDALLLAQRIKEQYCYIARDIAREFDTYDRNLPDHITKHCD
##   [8] NQAIQHIPLAGRDITNFILERLRERGEPVPP-DDALLLAQRIKEQYCYIARDIAREFDTYDRNLPDHITKHCD
##   [9] NQAIQHIPLAGRDITNFVLERLRERGEPVPP-DDALLLAQRIKEQYCYIARDIAREFDTYDRNLPDHITKHCD 
##   ... ...
##  [93] PKANRRVDLGGRNITEFLVRLMQRRG-YSFNQSSDFETVRCIKERFCYAAVDPKLEQRLALET------TVLE
##  [94] PKANKRVDLGGRHITEFLIRLMQRRG-YNFNQSSDFETVRRIKERFCYAAVDSKLEQRLAFET------TVLE
##  [95] QYATHTSDVAGRVLTDRLLAFLRAKG-YPLSTPNDRRIVEDVKHTLCYVAADVQEEVKKMHKK-------LQK
##  [96] QYATHTSDVAGRVLTDRLLAFLRAKG-YPLSTPNDRRIVEDVKHTLCYVAADVQEEVKKMHKK-------LQK
##  [97] QYATHTSDVAGRVLTDRLLAFLRAKG-YPLSTPNDRRIVEDVKHTLCYVAADVQEEVKKMHKK-------LQK
##  [98] QYATHTSDVAGRVLTDRLLAFLRAKG-YPLSTPNDRRVVEDVKHTLCYVAADVQEEVKKMHKK-------LQK
##  [99] QHALHVSHGAGRVLTDRLLAFLRGKG-YALSTPRDRDIVESMKRSLCYVAADAAQEVVKLQKK-------KEL
## [100] QHAFHSSPIAGRTLTDRLLEYLRGKG-YTLSTAEDRCLVEKIKRSRCYVAVDAEAEMVDMGRK-------AHL
##   Con P?AIRR?DLAGRDLTE?L??LL?E?G-??FTTS???E?VR??KE?LCYVA?D??EE???????------???E 
## 
##       aln (366..438)
##   [1] VNRKTGESYTVDVGYEKFLGPEMFFSPDIFSREWTL-----------------PLPDVIDKAIWSCPIDCRRP
##   [2] VNRKTGESYTVDVGYEKFLGPEMFFSPDIFSREWTL-----------------PLPDVIDKAIWSCPIDCRRP
##   [3] VNRKTGNPYTVDVGYEKFLGPELFFHPEIFSSEWSL-----------------PLPDVIDKAVWSCPIDCRRP
##   [4] VNSKTGQPYTVDVGYEKFLGPEVFFHPEIFSNEWTT-----------------PLPEVVDKAVWSCPIDCRRP
##   [5] VNSKTGQPYTVDVGYEKFLGPEMFFHPEIFSSEWTT-----------------PLPEVVDKAVWSCPIDCRRP
##   [6] VNSKTGQPYTVDVGYEKFLGPELFFHPEIFSSEWTT-----------------PLPEVVDKAVWSCPIDCRRP
##   [7] VNSKTGQPYTVDVGYEKFLGPEVFFHPEIFSGEWTM-----------------PLPEVVDRAVWSCPIDCRRP
##   [8] VNSKTGQPYAVDVGYEKFLGPEVFFHPEIFSGEWTM-----------------PLPEVVDRAVWSCPIDCRRP
##   [9] VNSKTGQPYTVDVGYEKFLGPEVFFHPEIFSGEWTM-----------------PLPEVVDRAVWSCPIDCRRP 
##   ... ...
##  [93] KTFLLPDGSSCSIGQERFEATEALFQPRLIDVECE------------------GISSQLWNCIQATDIDVRSA
##  [94] KNFLLPDGSSCSIGQERFEAPEALFQPRLIDMECE------------------GISVQLWSCIQAADIDVRAS
##  [95] EYYGLPDEQRIYVEESQFMVPELLFNPSAEGDIGCCGRNNAEVNVDASGVGAGGWTDAIAKVVESAPHFTRPH
##  [96] EYYGLPDEQRIYVEESQFMVPELLFNPSAEGDIGCCGRNNAEVNVDASGVGAGGWTDAIAKVVESAPHFTRPH
##  [97] EYYGLPDEQRIYVEESQFMVPELLFNPSAEGDIGCCGRNNAEVNVDASGVGAGGWTDAIAKVVESAPHFTRPH
##  [98] EYYGLPDEQRIYVEESQFMVPELLFNPSAEGDIGCCGRNNAEVNVDASGVGAGGWTDAIAKVVESAPHFTRPH
##  [99] VCYVLPDEQRIYLHESQFMIPELLFTP--SGDETDNDYNNSNINSSRC---GGGWAEAVTQVVESAPAFTQSH
## [100] DSYELPDEQHIYLHESQFMVPEALFAP--PRDEGSDGGASGEV----------GWAEAVTHVVRKAPPFTQSH
##   Con E?F?LPDG????VG??RF??PEALF?P?LI??????-----------------G??E?????I??C?IDVRR? 
## 
##       aln (439..511)
##   [1] LYRNVVLSGGTTMFPKFDKRLQKDLRALVSRRAKKFTKALGDPSKQITYDVNVVAHERQRYAVWYGGSMLGMS
##   [2] LYRNVVLSGGTTMFPKFDKRLQKDLRALVSRRGKKFTKALGDPSKQITYDVNVVAHERQRYAVWYGGSMLGMS
##   [3] LYRNVVLSGGTTMFPKFDKRLQKDLRELVHRRAEKFTKAFADPKRQITYDVNVVAHERQRYAVWYGGSMLGIS
##   [4] LYRNIVLSGGNTMFPKFDKRLQKDLRAIVDRRAKKNMAAFKDPTRHITYDVNVVSHERQRYAVWYGGSMLGSS
##   [5] LYRNIVLSGGNTMFPKFDKRLQKDLRVIVDRRAKKNMAAFRDPTRHITYDVNVVSHERQRYAVWYGGSMLGSS
##   [6] LYRNIVLSGGNTMFPKFDKRLQKDLRVIVDRRARKNMAASRDPNCHITYDVNVVSHERQRYAVWYGGSMLGSS
##   [7] LYRNIVLSGGTTMFPKFDKRLQKDLRVIVDRRAKKNMEASKDPNRQITYDVNVVSHDRQRYAVWYGGSMLGSS
##   [8] LYRNIVLSGGTTMFPKFDKRLQKDLRVIVDRRAKKNMEASKDPNRQITYDVNVVSHDRQRYAVWYGGSMLGSS
##   [9] LYRNIVLSGGTTMFPKFDKRLQKDLRVIVDRRAKKNMEASKDPNRQITYDVNVVSHDRQRYAVWYGGSMLGSS 
##   ... ...
##  [93] LYSHVVLSGGSTMFPGFPSRIERDMRAAYSERI-----VKGDPERLSRFPLCVEDPPRRRWMSFLGGAALAAV
##  [94] LYAHVVLSGGSTMFPGFPSRIERDMRAFYSERV-----VRGDPERLARFPLCVEDPPRRKWMSFLGGAALASA
##  [95] LLKSIVLGGGNTMFPGIEQRLRREVSALPAS---------------AECEANCVAFRDRDLAAWIGGSVVASM
##  [96] LLKSIVLGGGNTMFPGIEQRLRREVSALPAS---------------VECEANCVAFRDRDLAAWIGGSVVASM
##  [97] LLKSIVLGGGNTMFPGIEQRLRREVSALPAS---------------AECEANCVAFRDRDLAAWIGGSVVASM
##  [98] LLKSIVLGGGNTMFPGIEQRLRREVSALPAS---------------AECEANCVAFRDRDLAAWIGGSVVASM
##  [99] LYANIVLGGGNTMFPGIEERLQHDVAALNAG---------------SRRSVNCIAFPDRDTAAWIGGSVAASM
## [100] LLENIVLGGGNTLFPGLEQRLQHDVSALNTS---------------GEQEVNCVAFPDREMAAWIGASVVASM
##   Con LY?NIVLSGG?TMF??LP?RL?KE???L???--------------???????VVAPP?RKYSVWIGGS?L?SL 
## 
##       aln (512..557)
##   [1] PDFAA-VAKTKQEYDEHGPYVCRRNNMFHSVFE-------------
##   [2] PDFAA-VAKTKQEYDEHGPYVCRRNNMFHSVFE-------------
##   [3] PEFAS-VAKTKQEYEEYGPYICRRNSMYHCVFE-------------
##   [4] PEFAT-LAKTRKEYEEYGPYICRQNNMFHSVFE-------------
##   [5] PEFAT-LAKTKEEYEEYGPYICRQNNMFHSVFE-------------
##   [6] PEFAA-LAKTKAQYEEYGPYICRQNNMFHSVFD-------------
##   [7] PEFST-LAKTKEQYEEYGPYICRQNNMFHSVFD-------------
##   [8] PEFST-LAKTKEQYEEYGPYICRQNNMFHSVFD-------------
##   [9] PEFST-LAKTKEQYEEYGPYICRQNNMFHSVFD------------- 
##   ... ...
##  [93] TAGNHDMWLSKKEWDEGGASAIQARFGV------------------
##  [94] TADSTEMWFSKEEWLEGGPSALRARFGA------------------
##  [95] PTFPH-MCLSRKDYLEKGATVVHERI--------------------
##  [96] PTFPH-MCLSRKDYLEKGATVVHERI--------------------
##  [97] PTFPH-MCLSRKDYLEKGATVVHERI--------------------
##  [98] PTFPH-MCLSRKDYLEKGATVVHERI--------------------
##  [99] PTFLS-TCLARKDYLEKGAALMHAKV--------------------
## [100] PTFSQ-LCLARKDYYEKGVAAMHLRV--------------------
##   Con T?F??-MW?TK?EY?E?GPSIVH?????------------------
msaPrettyPrint(align, y=c(164, 213), output="asis",
               showNames="none", showLogo="none", askForOverwrite=FALSE)
## \begin{texshade}{/var/folders/70/4klcrc8j0nn3br1qq16nc_8w0000gn/T//RtmpmdKuXX/seqffc50318c97.fasta}
## \seqtype{P}
## \setends{consensus}{164..213}
## \shadingmode{identical}
## \threshold{50}
## \showconsensus[ColdHot]{bottom}
## \shadingcolors{blues}
## \hidelogoscale
## \hidenames
## \shownumbering{right}
## \showlegend
## \end{texshade}

DESCRIPTIVE STATISTICS

For this section we will use the SARS-CoV-2 genome. We will use the query() function from the seqinr package to download the complete genome sequence directly from the GenBank server (remember that you first have to select the server with the choosebank() function).

We can quickly explore the object using some of the basic functions seen earlier
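The output printed further down (1 SQ for AC=MN908947) suggests a query by accession number. A minimal sketch of such a query follows, assuming a recent seqinr version in which query() returns its result; the helper name fetch_by_accession is hypothetical and the call needs an internet connection.

```r
# Hypothetical helper (not part of seqinr): open a bank and query one accession.
fetch_by_accession <- function(accession, bank = "genbank") {
  seqinr::choosebank(bank)                          # open the temporary connection
  seqinr::query("hits", paste0("AC=", accession))   # e.g. "AC=MN908947"
}

# Usage (requires network access):
# sars2 <- fetch_by_accession("MN908947")
```

The bank is left open on purpose, because later calls such as getSequence() still need the connection.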

sars2
## 1 SQ for AC=MN908947
summary(sars2)
##          Length Class    Mode     
## call     3      -none-   call     
## name     1      -none-   character
## nelem    1      -none-   numeric  
## typelist 1      -none-   character
## req      1      -none-   list     
## socket   1      sockconn numeric
seqinr::getName(sars2)
## [1] "MN908947"
seqinr::getKeyword(sars2)
## [[1]]
## [1] "DIVISION VRL" "SOURCE"       "5'UTR"        "GENE"         "CDS"         
## [6] "3'UTR"        "RELEASE 237"

Since we know that choosebank() gives us a temporary connection that expires after a period of inactivity, we will use the getSequence() function to download the sequence into our working environment so that we can work with it.

covid19 <- seqinr::getSequence(sars2)

But since the sequence is inside a list, we must first take it out of the list with the unlist() function before we can run the analyses.

covid19 <- unlist(covid19)
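To see what unlist() does here, the following sketch uses a short made-up sequence as a stand-in for the downloaded genome (the toy data are an assumption, not the real sequence).

```r
# Toy stand-in: a list holding one character vector, like the list described above
seq_list <- list(c("a", "t", "t", "a", "a", "a", "g", "g"))
seq_vec  <- unlist(seq_list)   # flatten to a plain character vector

is.list(seq_list)              # the original object is a list
is.character(seq_vec)          # the result is a character vector
length(seq_vec)                # number of nucleotides
table(seq_vec)                 # counts per nucleotide
```

The same checks (length() and table()) can be applied to covid19 once it has been unlisted.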

Rho

This statistic helps us find di-, tri-, or longer nucleotide words whose frequencies deviate from what is expected. But how do we know what is expected? For that we need to recall how to calculate the probability of an event. Suppose your class has 4 boys and 6 girls: what is the probability of picking one student at random and that student being a boy? The calculation is very simple:

\[ \begin{aligned} P(x) = \frac{4}{4+6} \\ P(x) = \frac{4}{10} \\ P(x) = 0.4 \end{aligned} \]

This means that the probability of picking a boy at random is 0.4. Now, what if we want to know the probability of picking first a boy and then a girl? In this case, treating the two picks as independent (that is, with replacement), we apply the “multiplication rule”

\[ \begin{aligned} P(A \cap B) = P(A) \cdot P(B) \\ P(A \cap B) = (0.4) \cdot (0.6) \\ P(A \cap B) = 0.24 \end{aligned} \]
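The same arithmetic can be checked in R. This is a small sketch with illustrative variable names, and it assumes the two picks are independent (drawing with replacement), which is what the value 0.24 implies.

```r
boys  <- 4
girls <- 6

p_boy  <- boys  / (boys + girls)   # probability of picking a boy
p_girl <- girls / (boys + girls)   # probability of picking a girl

# Multiplication rule for independent events
p_boy_then_girl <- p_boy * p_girl

p_boy
p_boy_then_girl
```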

This is the frequency with which we expect that event to occur, and we can call it the “expected frequency”. In practice the observed frequency rarely matches it exactly and may be somewhat higher or lower. But how do we know when a frequency is too low or too high? We will address this in the next chapter, when we discuss Fisher's exact test. For now we will use the Rho statistic, which is the observed frequency of the events occurring together divided by the product of the individual frequencies of each event (the expected frequency). As you can see, a value close to one means that our “observed frequency” matches the “expected frequency”.
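To make the ratio concrete, here is a minimal base R sketch that computes rho for dinucleotides by hand, as the observed dinucleotide frequency divided by the product of the two single-nucleotide frequencies. The helper rho_by_hand and the toy sequence are assumptions for illustration, not seqinr's implementation. It only reports dinucleotides that actually occur in the input.

```r
# Hypothetical helper, not part of seqinr: dinucleotide rho computed by hand.
rho_by_hand <- function(s) {
  n  <- length(s)
  f1 <- table(s) / n                      # observed single-nucleotide frequencies
  di <- paste0(s[-n], s[-1])              # overlapping dinucleotides
  f2 <- table(di) / length(di)            # observed dinucleotide frequencies
  first  <- substr(names(f2), 1, 1)       # first letter of each dinucleotide
  second <- substr(names(f2), 2, 2)       # second letter of each dinucleotide
  expected <- as.numeric(f1[first]) * as.numeric(f1[second])
  setNames(as.numeric(f2) / expected, names(f2))
}

toy <- c("a", "c", "a", "c")              # made-up sequence
rho_toy <- rho_by_hand(toy)
rho_toy
```

In the toy sequence "ac" makes up 2 of the 3 dinucleotides, while its expected frequency is 0.5 * 0.5 = 0.25, so its rho is well above one.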

seqinr::rho(covid19)
## 
##        aa        ac        ag        at        ca        cc        cg        ct 
## 1.0742060 1.2302052 0.9922942 0.8034304 1.2672998 0.8804024 0.4077025 1.1810577 
##        ga        gc        gg        gt        ta        tc        tg        tt 
## 0.9182424 1.0847301 0.9508448 1.0579442 0.8274498 0.8019388 1.3763907 1.0445057

These values are a little hard to interpret as they stand. But if we subtract 1 from all of them and plot them in a barplot, it becomes easier to see whether any dinucleotide deviates from what is expected.

library(ggplot2)              # load the ggplot2 library to make the plots
rho1 <- seqinr::rho(covid19)  # store the result of rho in an object
rho1 <- rho1 - 1              # subtract 1, so 0 means "as expected"
rho1 <- as.data.frame(rho1)   # convert to a data frame with two columns (Var1, Freq)

ggplot(data = rho1, aes(fill=Var1, y=Freq, x=Var1)) + 
  geom_bar(position="stack", stat="identity") +
  labs(title = "RELATIVE FREQUENCY OF DINUCLEOTIDES", 
       subtitle = "SARS-CoV-2 genome", 
       x = "Dinucleotide", 
       y = "Relative frequency (rho - 1)")
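As a non-graphical complement, the dinucleotides can be ranked by how far they are from the expected value. This sketch copies the rho values printed above into a named vector, so it runs without a connection to the server; the variable names are illustrative.

```r
# rho values copied from the seqinr::rho(covid19) output shown above
rho_values <- c(
  aa = 1.0742060, ac = 1.2302052, ag = 0.9922942, at = 0.8034304,
  ca = 1.2672998, cc = 0.8804024, cg = 0.4077025, ct = 1.1810577,
  ga = 0.9182424, gc = 1.0847301, gg = 0.9508448, gt = 1.0579442,
  ta = 0.8274498, tc = 0.8019388, tg = 1.3763907, tt = 1.0445057
)

deviation <- rho_values - 1                              # 0 means "as expected"
ranked <- deviation[order(abs(deviation), decreasing = TRUE)]
head(ranked, 3)                                          # largest departures first
```

With these values, cg is the most under-represented dinucleotide and tg the most over-represented one.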