QUESTION: How to get GenBank Accession Numbers?

Data

We'll use the rentrez package (https://cran.r-project.org/web/packages/rentrez/index.html) to address this question. If you have not installed it before, install it with install.packages("rentrez"). Then load it with library("rentrez").

###Load Packages

library(rentrez)
library(seqinr)
library(compbio4all) 

The NCBI sequence database

The US National Center for Biotechnology Information (NCBI) maintains the NCBI Sequence Database, a huge collection of DNA and protein sequence data. Similar databases exist in Europe (the European Molecular Biology Laboratory (EMBL) Sequence Database) and in Japan (the DNA Data Bank of Japan, DDBJ). These three databases exchange data every night, so at any point in time they contain almost identical data.

Each sequence in the NCBI Sequence Database is stored in a separate record, and is assigned a unique identifier that can be used to refer to that record. The identifier is known as an accession, and consists of a mixture of numbers and letters.
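As a small illustration of what such an identifier looks like in practice: the record used later in this walkthrough appears in its FASTA header as NC_001477.1, that is, the accession followed by a dot and a version number. The sketch below splits such an identifier with base R only. The variable names are illustrative and not part of rentrez.

```r
# Identifier as it appears in a FASTA header: accession, a dot, then a version.
versioned_id <- "NC_001477.1"

# Split on the literal dot (fixed = TRUE avoids treating "." as a regex).
parts <- strsplit(versioned_id, ".", fixed = TRUE)[[1]]

accession <- parts[1]            # the part used to look up the record
version <- as.integer(parts[2])  # the revision of that record

print(accession)
print(version)
```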

Retrieving genome sequence data using rentrez

You can retrieve sequence data from NCBI directly from R using the rentrez package. To retrieve a sequence with a particular NCBI accession, use the function entrez_fetch(). To make explicit which package a function comes from, I write it as package::function().

For this example I will use the DEN-1 Dengue virus genome sequence. Its NCBI RefSeq accession is NC_001477.

# Fetch the record from the NCBI nucleotide database as FASTA-formatted text.
dengueseq_fasta <- rentrez::entrez_fetch(db = "nucleotide",
                                         id = "NC_001477",
                                         rettype = "fasta")

###View Sequence

## [1] ">NC_001477.1 Dengue virus 1, complete genome\nAGTTGTTAGTCTACGTGGACCGACAAGAACAGTTTCGAATCGGAAGCTTGCTTAACGTAGTTCTAACAGT\nTTTTTATTAGAGAGCAGATCTCTGATGAACAACCAACGGAAAAAGACGGGTCGACCGTCTTTCAATATGC\nTGAAACGCGCGAGAAACCGCGTGTCAACTGTTTCACAGTTGGCGAAGAGATTCTCAAAAGGATTGCTTTC\nAGGCCAAGGACCCATGAAATTGGTGATGGCTTTTATAGCATTCCTAAGATTTCTAGCCATACCTCCAACA\nGCAGGAATTTTGGCTAGATGGGGCTCATTCAAGAAGAATGGAGCGATCAAAGTGTTACGGGGTTTCAAGA\nAAGAAATCTCAAACATGTTGAACATAATGAACAGGAGGAAAAGATCTGTGACCATGCTCCTCATGCTGCT\nGCCCACAGCCCTGGCGTTCCATCTGACCACCCGAGGGGGAGAGCCGCACATGATAGTTAGCAAGCAGGAA\nAGAGGAAAATCACTTTTGTTTAAGACCTCTGCAGGTGTCAACATGTGCACCCTTATTGCAATGGATTTGG\nGAGAGTTATGTGAGGACACAATGACCTACAAATGCCCCCGGATCACTGAGACGGAACCAGATGACGTTGA\nCTGTTGGTGCAATGCCACGGAGACATGGGTGACCTATGGAACATGTTCTCAAACTGGTGAACACCGACGA\nGACAAACGTTCCGTCGCACTGGCACCACACGTAGGGCTTGGTCTAGAAACAAGAACCGAAACGTGGATGT\nCCTCTGAAGGCGCTTGGAAACAAATACAAAAAGTGGAGACCTGGGCTCTGAGACACCCAGGATTCACGGT\nGATAGCCCTTTTTCTAGCACATGCCATAGGAACATCCATCACCCAGAAAGGGATCATTTTTATTTTGCTG\nATGCTGGTAACTCCATCCATGGCCATGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGT\nCAGGAGCTACGTGGGTGGATGTGGTACTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACC\nAACACTGGACATTGAACTCTTGAAGACGGAGGTCACAAACCCTGCCGTCCTGCGCAAACTGTGCATTGAA\nGCTAAAATATCAAACACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAAC\nAGGACACGAACTTTGTGTGTCGACGAACGTTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTCGG\nAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATAT\nGAAAACTTAAAATATTCAGTGATAGTCACCGTACACACTGGAGACCAGCACCAAGTTGGAAATGAGACCA\nCAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCCACGTCGGAAATACAGCTGACAGACTACGG\nAGCTCTAACATTGGATTGTTCACCTAGAACAGGGCTAGACTTTAATGAGATGGTGTTGTTGACAATGAAA\nAAAAAATCATGGCTCGTCCACAAACAATGGTTTCTAGACTTACCACTGCCTTGGACCTCGGGGGCTTCAA\nCATCCCAAGAGACTTGGAATAGACAAGACTTGCTGGTCACATTTAAGACAGCTCATGCAAAAAAGCAGGA\nAGTAGTCGTACTAGGATCACAAGAAGGAGCAATGCACACTGCGTTGACTGGAGCGACAGAAATCCAAACG\nTCTGGAACGACAACAATTTTTGCAGGACACCTGAAATGCAGATTAAAAATGGATAAACTGATTTTAAAAG\nGGATGTCATATGTAATGTGCACAGGGTCATTCAAGTTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAAC\nTG
TTCTAGTGCAGGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAG\nAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCA\nACATTGAAGCGGAGCCACCTTTTGGTGAGAGCTACATTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACT\nAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGTGGAGCACGAAGGATG\nGCCATCCTGGGAGACACTGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACGTCTGTGGGAAAACTGA\nTACACCAGATTTTTGGGACTGCGTATGGAGTTTTGTTCAGCGGTGTTTCTTGGACCATGAAGATAGGAAT\nAGGGATTCTGCTGACATGGCTAGGATTAAACTCAAGGAGCACGTCCCTTTCAATGACGTGTATCGCAGTT\nGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCGGACTCGGGATGTGTAATCAACTGGAAAGGCA\nGAGAACTCAAATGTGGAAGCGGCATTTTTGTCACCAATGAAGTCCACACCTGGACAGAGCAATATAAATT\nCCAGGCCGACTCCCCTAAGAGACTATCAGCGGCCATTGGGAAGGCATGGGAGGAGGGTGTGTGTGGAATT\nCGATCAGCCACTCGTCTCGAGAACATCATGTGGAAGCAAATATCAAATGAATTAAACCACATCTTACTTG\nAAAATGACATGAAATTTACAGTGGTCGTAGGAGACGTTAGTGGAATCTTGGCCCAAGGAAAGAAAATGAT\nTAGGCCACAACCCATGGAACACAAATACTCGTGGAAAAGCTGGGGAAAAGCCAAAATCATAGGAGCAGAT\nGTACAGAATACCACCTTCATCATCGACGGCCCAAACACCCCAGAATGCCCTGATAACCAAAGAGCATGGA\nACATTTGGGAAGTTGAAGACTATGGATTTGGAATTTTCACGACAAACATATGGTTGAAATTGCGTGACTC\nCTACACTCAAGTGTGTGACCACCGGCTAATGTCAGCTGCCATCAAGGATAGCAAAGCAGTCCATGCTGAC\nATGGGGTACTGGATAGAAAGTGAAAAGAACGAGACTTGGAAGTTGGCAAGAGCCTCCTTCATAGAAGTTA\nAGACATGCATCTGGCCAAAATCCCACACTCTATGGAGCAATGGAGTCCTGGAAAGTGAGATGATAATCCC\nAAAGATATATGGAGGACCAATATCTCAGCACAACTACAGACCAGGATATTTCACACAAACAGCAGGGCCG\nTGGCACTTGGGCAAGTTAGAACTAGATTTTGATTTATGTGAAGGTACCACTGTTGTTGTGGATGAACATT\nGTGGAAATCGAGGACCATCTCTTAGAACCACAACAGTCACAGGAAAGACAATCCATGAATGGTGCTGTAG\nATCTTGCACGTTACCCCCCCTACGTTTCAAAGGAGAAGACGGGTGCTGGTACGGCATGGAAATCAGACCA\nGTCAAGGAGAAGGAAGAGAACCTAGTTAAGTCAATGGTCTCTGCAGGGTCAGGAGAAGTGGACAGTTTTT\nCACTAGGACTGCTATGCATATCAATAATGATCGAAGAGGTAATGAGATCCAGATGGAGCAGAAAAATGCT\nGATGACTGGAACATTGGCTGTGTTCCTCCTTCTCACAATGGGACAATTGACATGGAATGATCTGATCAGG\nCTATGTATCATGGTTGGAGCCAACGCTTCAGACAAGATGGGGATGGGAACAACGTACCTAGCTTTGATGG\nCCACTTTCAGAATGAGACCAATGTTCGCAGTCGGGCTACTGTTTCGCAGATTAACATCTAGAGAAGTTCT\nTCTTCTTACAGTTGGATTGAGTCTGGTGGCATCTGTAGAACTACCAAATTCCTTAGAG
GAGCTAGGGGAT\nGGACTTGCAATGGGCATCATGATGTTGAAATTACTGACTGATTTTCAGTCACATCAGCTATGGGCTACCT\nTGCTGTCTTTAACATTTGTCAAAACAACTTTTTCATTGCACTATGCATGGAAGACAATGGCTATGATACT\nGTCAATTGTATCTCTCTTCCCTTTATGCCTGTCCACGACTTCTCAAAAAACAACATGGCTTCCGGTGTTG\nCTGGGATCTCTTGGATGCAAACCACTAACCATGTTTCTTATAACAGAAAACAAAATCTGGGGAAGGAAAA\nGCTGGCCTCTCAATGAAGGAATTATGGCTGTTGGAATAGTTAGCATTCTTCTAAGTTCACTTCTCAAGAA\nTGATGTGCCACTAGCTGGCCCACTAATAGCTGGAGGCATGCTAATAGCATGTTATGTCATATCTGGAAGC\nTCGGCCGATTTATCACTGGAGAAAGCGGCTGAGGTCTCCTGGGAAGAAGAAGCAGAACACTCTGGTGCCT\nCACACAACATACTAGTGGAGGTCCAAGATGATGGAACCATGAAGATAAAGGATGAAGAGAGAGATGACAC\nACTCACCATTCTCCTCAAAGCAACTCTGCTAGCAATCTCAGGGGTATACCCAATGTCAATACCGGCGACC\nCTCTTTGTGTGGTATTTTTGGCAGAAAAAGAAACAGAGATCAGGAGTGCTATGGGACACACCCAGCCCTC\nCAGAAGTGGAAAGAGCAGTCCTTGATGATGGCATTTATAGAATTCTCCAAAGAGGATTGTTGGGCAGGTC\nTCAAGTAGGAGTAGGAGTTTTTCAAGAAGGCGTGTTCCACACAATGTGGCACGTCACCAGGGGAGCTGTC\nCTCATGTACCAAGGGAAGAGACTGGAACCAAGTTGGGCCAGTGTCAAAAAAGACTTGATCTCATATGGAG\nGAGGTTGGAGGTTTCAAGGATCCTGGAACGCGGGAGAAGAAGTGCAGGTGATTGCTGTTGAACCGGGGAA\nGAACCCCAAAAATGTACAGACAGCGCCGGGTACCTTCAAGACCCCTGAAGGCGAAGTTGGAGCCATAGCT\nCTAGACTTTAAACCCGGCACATCTGGATCTCCTATCGTGAACAGAGAGGGAAAAATAGTAGGTCTTTATG\nGAAATGGAGTGGTGACAACAAGTGGTACCTACGTCAGTGCCATAGCTCAAGCTAAAGCATCACAAGAAGG\nGCCTCTACCAGAGATTGAGGACGAGGTGTTTAGGAAAAGAAACTTAACAATAATGGACCTACATCCAGGA\nTCGGGAAAAACAAGAAGATACCTTCCAGCCATAGTCCGTGAGGCCATAAAAAGAAAGCTGCGCACGCTAG\nTCTTAGCTCCCACAAGAGTTGTCGCTTCTGAAATGGCAGAGGCGCTCAAGGGAATGCCAATAAGGTATCA\nGACAACAGCAGTGAAGAGTGAACACACGGGAAAGGAGATAGTTGACCTTATGTGTCACGCCACTTTCACT\nATGCGTCTCCTGTCTCCTGTGAGAGTTCCCAATTATAATATGATTATCATGGATGAAGCACATTTTACCG\nATCCAGCCAGCATAGCAGCCAGAGGGTATATCTCAACCCGAGTGGGTATGGGTGAAGCAGCTGCGATTTT\nCATGACAGCCACTCCCCCCGGATCGGTGGAGGCCTTTCCACAGAGCAATGCAGTTATCCAAGATGAGGAA\nAGAGACATTCCTGAAAGATCATGGAACTCAGGCTATGACTGGATCACTGATTTCCCAGGTAAAACAGTCT\nGGTTTGTTCCAAGCATCAAATCAGGAAATGACATTGCCAACTGTTTAAGAAAGAATGGGAAACGGGTGGT\nCCAATTGAGCAGAAAAACTTTTGACACTGAGTACCAGAAAACAAAAAATAACGACTGGGACTATGTTGTC\nACAACAGACATATCCGAAATGGGAGCAAACTTCCGAGCCGAC
AGGGTAATAGACCCGAGGCGGTGCCTGA\nAACCGGTAATACTAAAAGATGGCCCAGAGCGTGTCATTCTAGCCGGACCGATGCCAGTGACTGTGGCTAG\nCGCCGCCCAGAGGAGAGGAAGAATTGGAAGGAACCAAAATAAGGAAGGCGATCAGTATATTTACATGGGA\nCAGCCTCTAAACAATGATGAGGACCACGCCCATTGGACAGAAGCAAAAATGCTCCTTGACAACATAAACA\nCACCAGAAGGGATTATCCCAGCCCTCTTTGAGCCGGAGAGAGAAAAGAGTGCAGCAATAGACGGGGAATA\nCAGACTACGGGGTGAAGCGAGGAAAACGTTCGTGGAGCTCATGAGAAGAGGAGATCTACCTGTCTGGCTA\nTCCTACAAAGTTGCCTCAGAAGGCTTCCAGTACTCCGACAGAAGGTGGTGCTTTGATGGGGAAAGGAACA\nACCAGGTGTTGGAGGAGAACATGGACGTGGAGATCTGGACAAAAGAAGGAGAAAGAAAGAAACTACGACC\nCCGCTGGCTGGATGCCAGAACATACTCTGACCCACTGGCTCTGCGCGAATTCAAAGAGTTCGCAGCAGGA\nAGAAGAAGCGTCTCAGGTGACCTAATATTAGAAATAGGGAAACTTCCACAACATTTAACGCAAAGGGCCC\nAGAACGCCTTGGACAATCTGGTTATGTTGCACAACTCTGAACAAGGAGGAAAAGCCTATAGACACGCCAT\nGGAAGAACTACCAGACACCATAGAAACGTTAATGCTCCTAGCTTTGATAGCTGTGCTGACTGGTGGAGTG\nACGTTGTTCTTCCTATCAGGAAGGGGTCTAGGAAAAACATCCATTGGCCTACTCTGCGTGATTGCCTCAA\nGTGCACTGTTATGGATGGCCAGTGTGGAACCCCATTGGATAGCGGCCTCTATCATACTGGAGTTCTTTCT\nGATGGTGTTGCTTATTCCAGAGCCGGACAGACAGCGCACTCCACAAGACAACCAGCTAGCATACGTGGTG\nATAGGTCTGTTATTCATGATATTGACAGTGGCAGCCAATGAGATGGGATTACTGGAAACCACAAAGAAGG\nACCTGGGGATTGGTCATGCAGCTGCTGAAAACCACCATCATGCTGCAATGCTGGACGTAGACCTACATCC\nAGCTTCAGCCTGGACTCTCTATGCAGTGGCCACAACAATTATCACTCCCATGATGAGACACACAATTGAA\nAACACAACGGCAAATATTTCCCTGACAGCTATTGCAAACCAGGCAGCTATATTGATGGGACTTGACAAGG\nGATGGCCAATATCAAAGATGGACATAGGAGTTCCACTTCTCGCCTTGGGGTGCTATTCTCAGGTGAACCC\nGCTGACGCTGACAGCGGCGGTATTGATGCTAGTGGCTCATTATGCCATAATTGGACCCGGACTGCAAGCA\nAAAGCTACTAGAGAAGCTCAAAAAAGGACAGCAGCCGGAATAATGAAAAACCCAACTGTCGACGGGATCG\nTTGCAATAGATTTGGACCCTGTGGTTTACGATGCAAAATTTGAAAAACAGCTAGGCCAAATAATGTTGTT\nGATACTTTGCACATCACAGATCCTCCTGATGCGGACCACATGGGCCTTGTGTGAATCCATCACACTAGCC\nACTGGACCTCTGACTACGCTTTGGGAGGGATCTCCAGGAAAATTCTGGAACACCACGATAGCGGTGTCCA\nTGGCAAACATTTTTAGGGGAAGTTATCTAGCAGGAGCAGGTCTGGCCTTTTCATTAATGAAATCTCTAGG\nAGGAGGTAGGAGAGGCACGGGAGCCCAAGGGGAAACACTGGGAGAAAAATGGAAAAGACAGCTAAACCAA\nTTGAGCAAGTCAGAATTCAACACTTACAAAAGGAGTGGGATTATAGAGGTGGATAGATCTGAAGCCAAAG\nAGGGGTTAAAAAGAGGAGAAACGACT
AAACACGCAGTGTCGAGAGGAACGGCCAAACTGAGGTGGTTTGT\nGGAGAGGAACCTTGTGAAACCAGAAGGGAAAGTCATAGACCTCGGTTGTGGAAGAGGTGGCTGGTCATAT\nTATTGCGCTGGGCTGAAGAAAGTCACAGAAGTGAAAGGATACACGAAAGGAGGACCTGGACATGAGGAAC\nCAATCCCAATGGCAACCTATGGATGGAACCTAGTAAAGCTATACTCCGGGAAAGATGTATTCTTTACACC\nACCTGAGAAATGTGACACCCTCTTGTGTGATATTGGTGAGTCCTCTCCGAACCCAACTATAGAAGAAGGA\nAGAACGTTACGTGTTCTAAAGATGGTGGAACCATGGCTCAGAGGAAACCAATTTTGCATAAAAATTCTAA\nATCCCTATATGCCGAGTGTGGTAGAAACTTTGGAGCAAATGCAAAGAAAACATGGAGGAATGCTAGTGCG\nAAATCCACTCTCAAGAAACTCCACTCATGAAATGTACTGGGTTTCATGTGGAACAGGAAACATTGTGTCA\nGCAGTAAACATGACATCTAGAATGCTGCTAAATCGATTCACAATGGCTCACAGGAAGCCAACATATGAAA\nGAGACGTGGACTTAGGCGCTGGAACAAGACATGTGGCAGTAGAACCAGAGGTGGCCAACCTAGATATCAT\nTGGCCAGAGGATAGAGAATATAAAAAATGAACACAAATCAACATGGCATTATGATGAGGACAATCCATAC\nAAAACATGGGCCTATCATGGATCATATGAGGTCAAGCCATCAGGATCAGCCTCATCCATGGTCAATGGTG\nTGGTGAGACTGCTAACCAAACCATGGGATGTCATTCCCATGGTCACACAAATAGCCATGACTGACACCAC\nACCCTTTGGACAACAGAGGGTGTTTAAAGAGAAAGTTGACACGCGTACACCAAAAGCGAAACGAGGCACA\nGCACAAATTATGGAGGTGACAGCCAGGTGGTTATGGGGTTTTCTCTCTAGAAACAAAAAACCCAGAATCT\nGCACAAGAGAGGAGTTCACAAGAAAAGTCAGGTCAAACGCAGCTATTGGAGCAGTGTTCGTTGATGAAAA\nTCAATGGAACTCAGCAAAAGAGGCAGTGGAAGATGAACGGTTCTGGGACCTTGTGCACAGAGAGAGGGAG\nCTTCATAAACAAGGAAAATGTGCCACGTGTGTCTACAACATGATGGGAAAGAGAGAGAAAAAATTAGGAG\nAGTTCGGAAAGGCAAAAGGAAGTCGCGCAATATGGTACATGTGGTTGGGAGCGCGCTTTTTAGAGTTTGA\nAGCCCTTGGTTTCATGAATGAAGATCACTGGTTCAGCAGAGAGAATTCACTCAGTGGAGTGGAAGGAGAA\nGGACTCCACAAACTTGGATACATACTCAGAGACATATCAAAGATTCCAGGGGGAAATATGTATGCAGATG\nACACAGCCGGATGGGACACAAGAATAACAGAGGATGATCTTCAGAATGAGGCCAAAATCACTGACATCAT\nGGAACCTGAACATGCCCTATTGGCCACGTCAATCTTTAAGCTAACCTACCAAAACAAGGTAGTAAGGGTG\nCAGAGACCAGCGAAAAATGGAACCGTGATGGATGTCATATCCAGACGTGACCAGAGAGGAAGTGGACAGG\nTTGGAACCTATGGCTTAAACACCTTCACCAACATGGAGGCCCAACTAATAAGACAAATGGAGTCTGAGGG\nAATCTTTTCACCCAGCGAATTGGAAACCCCAAATCTAGCCGAAAGAGTCCTCGACTGGTTGAAAAAACAT\nGGCACCGAGAGGCTGAAAAGAATGGCAATCAGTGGAGATGACTGTGTGGTGAAACCAATCGATGACAGAT\nTTGCAACAGCCTTAACAGCTTTGAATGACATGGGAAAGGTAAGAAAAGACATACCGCAATGGGAACCTTC\nAAAAGGATGG
AATGATTGGCAACAAGTGCCTTTCTGTTCACACCATTTCCACCAGCTGATTATGAAGGAT\nGGGAGGGAGATAGTGGTGCCATGCCGCAACCAAGATGAACTTGTAGGTAGGGCCAGAGTATCACAAGGCG\nCCGGATGGAGCTTGAGAGAAACTGCATGCCTAGGCAAGTCATATGCACAAATGTGGCAGCTGATGTACTT\nCCACAGGAGAGACTTGAGATTAGCGGCTAATGCTATCTGTTCAGCCGTTCCAGTTGATTGGGTCCCAACC\nAGCCGCACCACCTGGTCGATCCATGCCCACCATCAATGGATGACAACAGAAGACATGTTGTCAGTGTGGA\nATAGGGTTTGGATAGAGGAAAACCCATGGATGGAGGACAAGACTCATGTGTCCAGTTGGGAAGACGTTCC\nATACCTAGGAAAAAGGGAAGATCAATGGTGTGGTTCCCTAATAGGCTTAACAGCACGAGCCACCTGGGCC\nACCAACATACAAGTGGCCATAAACCAAGTGAGAAGGCTCATTGGGAATGAGAATTATCTAGACTTCATGA\nCATCAATGAAGAGATTCAAAAACGAGAGTGATCCCGAAGGGGCACTCTGGTAAGCCAACTCATTCACAAA\nATAAAGGAAAATAAAAAATCAAACAAGGCAAGAAGTCAGGCCGGATTAAGCCATAGCACGGTAAGAGCTA\nTGCTGCCTGTGAGCCCCGTCCAAGGACGTAAAATGAAGTCAGGCCGAAAGCCACGGTTCGAGCAAGCCGT\nGCTGCCTGTAGCTCCATCGTGGGGATGTAAAAACCCGGGAGGCTGCAAACCATGGAAGCTGTACGCATGG\nGGTAGCAGACTAGTGGTTAGAGGAGACCCCTCCCAAGACACAACGCAGCAGCGGGGCCCAACACCAGGGG\nAAGCTGTACCCTGGTGGTAAGGACTAGAGGTTAGAGGAGACCCCCCGCACAACAACAAACAGCATATTGA\nCGCTGGGAGAGACCAGAGATCCTGCTGTCTCTACAGCATCATTCCAGGCACAGAACGCCAAAAAATGGAA\nTGGTGCTGTTGAATCAACAGGTTCT\n\n"
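The output above is one long character string: a header line starting with ">" followed by the sequence, with "\n" marking each line break. A minimal sketch of separating the header from the sequence with base R follows. To run offline it uses a short stand-in string (fasta_string) built from the header and first two sequence lines shown above rather than the full fetched object. The names fasta_string, fasta_lines, header and sequence are illustrative.

```r
# Stand-in for the object returned by entrez_fetch(): only the header and the
# first two sequence lines of the printed output, so the sketch runs offline.
fasta_string <- paste0(
  ">NC_001477.1 Dengue virus 1, complete genome\n",
  "AGTTGTTAGTCTACGTGGACCGACAAGAACAGTTTCGAATCGGAAGCTTGCTTAACGTAGTTCTAACAGT\n",
  "TTTTTATTAGAGAGCAGATCTCTGATGAACAACCAACGGAAAAAGACGGGTCGACCGTCTTTCAATATGC\n",
  "\n"
)

# Split the single string into separate lines at each newline character.
fasta_lines <- strsplit(fasta_string, "\n", fixed = TRUE)[[1]]

# Drop the empty entries produced by the trailing newlines.
fasta_lines <- fasta_lines[nchar(fasta_lines) > 0]

# The first line is the header; the remaining lines are the sequence.
header <- fasta_lines[1]
sequence <- paste(fasta_lines[-1], collapse = "")

# cat() prints the text with real line breaks instead of "\n" escapes.
cat(fasta_string)

print(header)
print(nchar(sequence))
```

With the real data, the same steps should apply by using dengueseq_fasta in place of fasta_string.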