We’ll use the rentrez package (https://cran.r-project.org/web/packages/rentrez/index.html) to address this question. If you have not installed it before, install it with install.packages("rentrez"), then load it with library(rentrez).
### Load Packages
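```r
library(rentrez)     # query NCBI's Entrez databases from R
library(seqinr)      # sequence analysis utilities
library(compbio4all) # companion package for these lessons
```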
The US National Center for Biotechnology Information (NCBI) maintains the NCBI Sequence Database, a very large public database of DNA and protein sequence data. Europe has a similar database, the European Molecular Biology Laboratory (EMBL) Sequence Database, and so does Japan, with the DNA Data Bank of Japan (DDBJ). These three databases exchange data every night, so at any given time they contain almost identical data.
Each sequence in the NCBI Sequence Database is stored in a separate record and is assigned a unique identifier, known as an accession, that can be used to refer to that record. Accessions consist of a mixture of letters and numbers.
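Accessions are often written with a version number appended after a dot; the FASTA header printed later in this section, for example, reads NC_001477.1. The number after the dot is the version, which NCBI increments when the sequence of a record changes. A minimal base-R sketch for separating the two parts, assuming the accession.version layout; split_accession() is a hypothetical helper written for illustration, not a function from rentrez or seqinr.

```r
# split_accession() is a hypothetical helper (not part of rentrez or seqinr).
# It assumes inputs look like "ACCESSION" or "ACCESSION.VERSION".
split_accession <- function(x) {
  # TRUE where the string ends in ".<digits>", i.e. a version is present
  has_version <- grepl("\\.[0-9]+$", x)
  # Everything before the trailing ".<digits>" is the accession itself
  accession <- sub("\\.[0-9]+$", "", x)
  # Keep only the digits after the last dot; NA when no version is given
  version <- ifelse(has_version, sub("^.*\\.([0-9]+)$", "\\1", x), NA_character_)
  data.frame(accession = accession, version = version, stringsAsFactors = FALSE)
}

split_accession(c("NC_001477.1", "NC_001477"))
```

Both inputs map to the same accession, NC_001477; only the first carries a version ("1").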
You can retrieve sequence data from NCBI directly from R using the rentrez package. To retrieve the sequence with a particular NCBI accession, use the function entrez_fetch(). To make it clear which package a function comes from, I write it as package::function(), for example rentrez::entrez_fetch().
For this example I will use the genome sequence of Dengue virus 1 (DEN-1), which has the NCBI RefSeq accession NC_001477.
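```r
# Download the DEN-1 genome (RefSeq accession NC_001477) from the NCBI
# nucleotide database; rettype = "fasta" returns the record as FASTA text
dengueseq_fasta <- rentrez::entrez_fetch(db = "nucleotide",
                                         id = "NC_001477",
                                         rettype = "fasta")
```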
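With rettype = "fasta", the result is a single character string: a header line beginning with ">" followed by the sequence, wrapped over several lines separated by "\n". A minimal base-R sketch for splitting such a string into its header and a single unbroken sequence; parse_fasta_text() is a hypothetical helper, and it assumes the string holds exactly one FASTA record, as it does when one accession is fetched. The toy input is made up and only mimics the layout of the NCBI output.

```r
# parse_fasta_text() is a hypothetical helper (not part of rentrez or seqinr).
# Assumes `fasta_text` holds exactly one FASTA record.
parse_fasta_text <- function(fasta_text) {
  # Split the string into lines and drop empty ones (e.g. from a trailing "\n\n")
  lines <- strsplit(fasta_text, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(lines)]
  # The first line is the header; strip its leading ">"
  header <- sub("^>", "", lines[1])
  # The remaining lines are the wrapped sequence; join them into one string
  sequence <- paste(lines[-1], collapse = "")
  list(header = header, sequence = sequence)
}

# Made-up stand-in with the same layout as the NCBI output
toy <- ">example_1 toy record\nACGTACGT\nGGCC\n\n"
rec <- parse_fasta_text(toy)
rec$header           # "example_1 toy record"
nchar(rec$sequence)  # 12
```

Applied to the real download, parse_fasta_text(dengueseq_fasta) should return the header "NC_001477.1 Dengue virus 1, complete genome" together with the genome as one string.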
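### View Sequence

Printing dengueseq_fasta shows the raw text returned by entrez_fetch(): one character string containing the FASTA header line (beginning with ">") followed by the sequence, with "\n" marking each line break.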
## [1] ">NC_001477.1 Dengue virus 1, complete genome\nAGTTGTTAGTCTACGTGGACCGACAAGAACAGTTTCGAATCGGAAGCTTGCTTAACGTAGTTCTAACAGT\nTTTTTATTAGAGAGCAGATCTCTGATGAACAACCAACGGAAAAAGACGGGTCGACCGTCTTTCAATATGC\nTGAAACGCGCGAGAAACCGCGTGTCAACTGTTTCACAGTTGGCGAAGAGATTCTCAAAAGGATTGCTTTC\nAGGCCAAGGACCCATGAAATTGGTGATGGCTTTTATAGCATTCCTAAGATTTCTAGCCATACCTCCAACA\nGCAGGAATTTTGGCTAGATGGGGCTCATTCAAGAAGAATGGAGCGATCAAAGTGTTACGGGGTTTCAAGA\nAAGAAATCTCAAACATGTTGAACATAATGAACAGGAGGAAAAGATCTGTGACCATGCTCCTCATGCTGCT\nGCCCACAGCCCTGGCGTTCCATCTGACCACCCGAGGGGGAGAGCCGCACATGATAGTTAGCAAGCAGGAA\nAGAGGAAAATCACTTTTGTTTAAGACCTCTGCAGGTGTCAACATGTGCACCCTTATTGCAATGGATTTGG\nGAGAGTTATGTGAGGACACAATGACCTACAAATGCCCCCGGATCACTGAGACGGAACCAGATGACGTTGA\nCTGTTGGTGCAATGCCACGGAGACATGGGTGACCTATGGAACATGTTCTCAAACTGGTGAACACCGACGA\nGACAAACGTTCCGTCGCACTGGCACCACACGTAGGGCTTGGTCTAGAAACAAGAACCGAAACGTGGATGT\nCCTCTGAAGGCGCTTGGAAACAAATACAAAAAGTGGAGACCTGGGCTCTGAGACACCCAGGATTCACGGT\nGATAGCCCTTTTTCTAGCACATGCCATAGGAACATCCATCACCCAGAAAGGGATCATTTTTATTTTGCTG\nATGCTGGTAACTCCATCCATGGCCATGCGGTGCGTGGGAATAGGCAACAGAGACTTCGTGGAAGGACTGT\nCAGGAGCTACGTGGGTGGATGTGGTACTGGAGCATGGAAGTTGCGTCACTACCATGGCAAAAGACAAACC\nAACACTGGACATTGAACTCTTGAAGACGGAGGTCACAAACCCTGCCGTCCTGCGCAAACTGTGCATTGAA\nGCTAAAATATCAAACACCACCACCGATTCGAGATGTCCAACACAAGGAGAAGCCACGCTGGTGGAAGAAC\nAGGACACGAACTTTGTGTGTCGACGAACGTTCGTGGACAGAGGCTGGGGCAATGGTTGTGGGCTATTCGG\nAAAAGGTAGCTTAATAACGTGTGCTAAGTTTAAGTGTGTGACAAAACTGGAAGGAAAGATAGTCCAATAT\nGAAAACTTAAAATATTCAGTGATAGTCACCGTACACACTGGAGACCAGCACCAAGTTGGAAATGAGACCA\nCAGAACATGGAACAACTGCAACCATAACACCTCAAGCTCCCACGTCGGAAATACAGCTGACAGACTACGG\nAGCTCTAACATTGGATTGTTCACCTAGAACAGGGCTAGACTTTAATGAGATGGTGTTGTTGACAATGAAA\nAAAAAATCATGGCTCGTCCACAAACAATGGTTTCTAGACTTACCACTGCCTTGGACCTCGGGGGCTTCAA\nCATCCCAAGAGACTTGGAATAGACAAGACTTGCTGGTCACATTTAAGACAGCTCATGCAAAAAAGCAGGA\nAGTAGTCGTACTAGGATCACAAGAAGGAGCAATGCACACTGCGTTGACTGGAGCGACAGAAATCCAAACG\nTCTGGAACGACAACAATTTTTGCAGGACACCTGAAATGCAGATTAAAAATGGATAAACTGATTTTAAAAG\nGGATGTCATATGTAATGTGCACAGGGTCATTCAAGTTAGAGAAGGAAGTGGCTGAGACCCAGCATGGAAC\nTG
TTCTAGTGCAGGTTAAATACGAAGGAACAGATGCACCATGCAAGATCCCCTTCTCGTCCCAAGATGAG\nAAGGGAGTAACCCAGAATGGGAGATTGATAACAGCCAACCCCATAGTCACTGACAAAGAAAAACCAGTCA\nACATTGAAGCGGAGCCACCTTTTGGTGAGAGCTACATTGTGGTAGGAGCAGGTGAAAAAGCTTTGAAACT\nAAGCTGGTTCAAGAAGGGAAGCAGTATAGGGAAAATGTTTGAAGCAACTGCCCGTGGAGCACGAAGGATG\nGCCATCCTGGGAGACACTGCATGGGACTTCGGTTCTATAGGAGGGGTGTTCACGTCTGTGGGAAAACTGA\nTACACCAGATTTTTGGGACTGCGTATGGAGTTTTGTTCAGCGGTGTTTCTTGGACCATGAAGATAGGAAT\nAGGGATTCTGCTGACATGGCTAGGATTAAACTCAAGGAGCACGTCCCTTTCAATGACGTGTATCGCAGTT\nGGCATGGTCACACTGTACCTAGGAGTCATGGTTCAGGCGGACTCGGGATGTGTAATCAACTGGAAAGGCA\nGAGAACTCAAATGTGGAAGCGGCATTTTTGTCACCAATGAAGTCCACACCTGGACAGAGCAATATAAATT\nCCAGGCCGACTCCCCTAAGAGACTATCAGCGGCCATTGGGAAGGCATGGGAGGAGGGTGTGTGTGGAATT\nCGATCAGCCACTCGTCTCGAGAACATCATGTGGAAGCAAATATCAAATGAATTAAACCACATCTTACTTG\nAAAATGACATGAAATTTACAGTGGTCGTAGGAGACGTTAGTGGAATCTTGGCCCAAGGAAAGAAAATGAT\nTAGGCCACAACCCATGGAACACAAATACTCGTGGAAAAGCTGGGGAAAAGCCAAAATCATAGGAGCAGAT\nGTACAGAATACCACCTTCATCATCGACGGCCCAAACACCCCAGAATGCCCTGATAACCAAAGAGCATGGA\nACATTTGGGAAGTTGAAGACTATGGATTTGGAATTTTCACGACAAACATATGGTTGAAATTGCGTGACTC\nCTACACTCAAGTGTGTGACCACCGGCTAATGTCAGCTGCCATCAAGGATAGCAAAGCAGTCCATGCTGAC\nATGGGGTACTGGATAGAAAGTGAAAAGAACGAGACTTGGAAGTTGGCAAGAGCCTCCTTCATAGAAGTTA\nAGACATGCATCTGGCCAAAATCCCACACTCTATGGAGCAATGGAGTCCTGGAAAGTGAGATGATAATCCC\nAAAGATATATGGAGGACCAATATCTCAGCACAACTACAGACCAGGATATTTCACACAAACAGCAGGGCCG\nTGGCACTTGGGCAAGTTAGAACTAGATTTTGATTTATGTGAAGGTACCACTGTTGTTGTGGATGAACATT\nGTGGAAATCGAGGACCATCTCTTAGAACCACAACAGTCACAGGAAAGACAATCCATGAATGGTGCTGTAG\nATCTTGCACGTTACCCCCCCTACGTTTCAAAGGAGAAGACGGGTGCTGGTACGGCATGGAAATCAGACCA\nGTCAAGGAGAAGGAAGAGAACCTAGTTAAGTCAATGGTCTCTGCAGGGTCAGGAGAAGTGGACAGTTTTT\nCACTAGGACTGCTATGCATATCAATAATGATCGAAGAGGTAATGAGATCCAGATGGAGCAGAAAAATGCT\nGATGACTGGAACATTGGCTGTGTTCCTCCTTCTCACAATGGGACAATTGACATGGAATGATCTGATCAGG\nCTATGTATCATGGTTGGAGCCAACGCTTCAGACAAGATGGGGATGGGAACAACGTACCTAGCTTTGATGG\nCCACTTTCAGAATGAGACCAATGTTCGCAGTCGGGCTACTGTTTCGCAGATTAACATCTAGAGAAGTTCT\nTCTTCTTACAGTTGGATTGAGTCTGGTGGCATCTGTAGAACTACCAAATTCCTTAGAG
GAGCTAGGGGAT\nGGACTTGCAATGGGCATCATGATGTTGAAATTACTGACTGATTTTCAGTCACATCAGCTATGGGCTACCT\nTGCTGTCTTTAACATTTGTCAAAACAACTTTTTCATTGCACTATGCATGGAAGACAATGGCTATGATACT\nGTCAATTGTATCTCTCTTCCCTTTATGCCTGTCCACGACTTCTCAAAAAACAACATGGCTTCCGGTGTTG\nCTGGGATCTCTTGGATGCAAACCACTAACCATGTTTCTTATAACAGAAAACAAAATCTGGGGAAGGAAAA\nGCTGGCCTCTCAATGAAGGAATTATGGCTGTTGGAATAGTTAGCATTCTTCTAAGTTCACTTCTCAAGAA\nTGATGTGCCACTAGCTGGCCCACTAATAGCTGGAGGCATGCTAATAGCATGTTATGTCATATCTGGAAGC\nTCGGCCGATTTATCACTGGAGAAAGCGGCTGAGGTCTCCTGGGAAGAAGAAGCAGAACACTCTGGTGCCT\nCACACAACATACTAGTGGAGGTCCAAGATGATGGAACCATGAAGATAAAGGATGAAGAGAGAGATGACAC\nACTCACCATTCTCCTCAAAGCAACTCTGCTAGCAATCTCAGGGGTATACCCAATGTCAATACCGGCGACC\nCTCTTTGTGTGGTATTTTTGGCAGAAAAAGAAACAGAGATCAGGAGTGCTATGGGACACACCCAGCCCTC\nCAGAAGTGGAAAGAGCAGTCCTTGATGATGGCATTTATAGAATTCTCCAAAGAGGATTGTTGGGCAGGTC\nTCAAGTAGGAGTAGGAGTTTTTCAAGAAGGCGTGTTCCACACAATGTGGCACGTCACCAGGGGAGCTGTC\nCTCATGTACCAAGGGAAGAGACTGGAACCAAGTTGGGCCAGTGTCAAAAAAGACTTGATCTCATATGGAG\nGAGGTTGGAGGTTTCAAGGATCCTGGAACGCGGGAGAAGAAGTGCAGGTGATTGCTGTTGAACCGGGGAA\nGAACCCCAAAAATGTACAGACAGCGCCGGGTACCTTCAAGACCCCTGAAGGCGAAGTTGGAGCCATAGCT\nCTAGACTTTAAACCCGGCACATCTGGATCTCCTATCGTGAACAGAGAGGGAAAAATAGTAGGTCTTTATG\nGAAATGGAGTGGTGACAACAAGTGGTACCTACGTCAGTGCCATAGCTCAAGCTAAAGCATCACAAGAAGG\nGCCTCTACCAGAGATTGAGGACGAGGTGTTTAGGAAAAGAAACTTAACAATAATGGACCTACATCCAGGA\nTCGGGAAAAACAAGAAGATACCTTCCAGCCATAGTCCGTGAGGCCATAAAAAGAAAGCTGCGCACGCTAG\nTCTTAGCTCCCACAAGAGTTGTCGCTTCTGAAATGGCAGAGGCGCTCAAGGGAATGCCAATAAGGTATCA\nGACAACAGCAGTGAAGAGTGAACACACGGGAAAGGAGATAGTTGACCTTATGTGTCACGCCACTTTCACT\nATGCGTCTCCTGTCTCCTGTGAGAGTTCCCAATTATAATATGATTATCATGGATGAAGCACATTTTACCG\nATCCAGCCAGCATAGCAGCCAGAGGGTATATCTCAACCCGAGTGGGTATGGGTGAAGCAGCTGCGATTTT\nCATGACAGCCACTCCCCCCGGATCGGTGGAGGCCTTTCCACAGAGCAATGCAGTTATCCAAGATGAGGAA\nAGAGACATTCCTGAAAGATCATGGAACTCAGGCTATGACTGGATCACTGATTTCCCAGGTAAAACAGTCT\nGGTTTGTTCCAAGCATCAAATCAGGAAATGACATTGCCAACTGTTTAAGAAAGAATGGGAAACGGGTGGT\nCCAATTGAGCAGAAAAACTTTTGACACTGAGTACCAGAAAACAAAAAATAACGACTGGGACTATGTTGTC\nACAACAGACATATCCGAAATGGGAGCAAACTTCCGAGCCGAC
AGGGTAATAGACCCGAGGCGGTGCCTGA\nAACCGGTAATACTAAAAGATGGCCCAGAGCGTGTCATTCTAGCCGGACCGATGCCAGTGACTGTGGCTAG\nCGCCGCCCAGAGGAGAGGAAGAATTGGAAGGAACCAAAATAAGGAAGGCGATCAGTATATTTACATGGGA\nCAGCCTCTAAACAATGATGAGGACCACGCCCATTGGACAGAAGCAAAAATGCTCCTTGACAACATAAACA\nCACCAGAAGGGATTATCCCAGCCCTCTTTGAGCCGGAGAGAGAAAAGAGTGCAGCAATAGACGGGGAATA\nCAGACTACGGGGTGAAGCGAGGAAAACGTTCGTGGAGCTCATGAGAAGAGGAGATCTACCTGTCTGGCTA\nTCCTACAAAGTTGCCTCAGAAGGCTTCCAGTACTCCGACAGAAGGTGGTGCTTTGATGGGGAAAGGAACA\nACCAGGTGTTGGAGGAGAACATGGACGTGGAGATCTGGACAAAAGAAGGAGAAAGAAAGAAACTACGACC\nCCGCTGGCTGGATGCCAGAACATACTCTGACCCACTGGCTCTGCGCGAATTCAAAGAGTTCGCAGCAGGA\nAGAAGAAGCGTCTCAGGTGACCTAATATTAGAAATAGGGAAACTTCCACAACATTTAACGCAAAGGGCCC\nAGAACGCCTTGGACAATCTGGTTATGTTGCACAACTCTGAACAAGGAGGAAAAGCCTATAGACACGCCAT\nGGAAGAACTACCAGACACCATAGAAACGTTAATGCTCCTAGCTTTGATAGCTGTGCTGACTGGTGGAGTG\nACGTTGTTCTTCCTATCAGGAAGGGGTCTAGGAAAAACATCCATTGGCCTACTCTGCGTGATTGCCTCAA\nGTGCACTGTTATGGATGGCCAGTGTGGAACCCCATTGGATAGCGGCCTCTATCATACTGGAGTTCTTTCT\nGATGGTGTTGCTTATTCCAGAGCCGGACAGACAGCGCACTCCACAAGACAACCAGCTAGCATACGTGGTG\nATAGGTCTGTTATTCATGATATTGACAGTGGCAGCCAATGAGATGGGATTACTGGAAACCACAAAGAAGG\nACCTGGGGATTGGTCATGCAGCTGCTGAAAACCACCATCATGCTGCAATGCTGGACGTAGACCTACATCC\nAGCTTCAGCCTGGACTCTCTATGCAGTGGCCACAACAATTATCACTCCCATGATGAGACACACAATTGAA\nAACACAACGGCAAATATTTCCCTGACAGCTATTGCAAACCAGGCAGCTATATTGATGGGACTTGACAAGG\nGATGGCCAATATCAAAGATGGACATAGGAGTTCCACTTCTCGCCTTGGGGTGCTATTCTCAGGTGAACCC\nGCTGACGCTGACAGCGGCGGTATTGATGCTAGTGGCTCATTATGCCATAATTGGACCCGGACTGCAAGCA\nAAAGCTACTAGAGAAGCTCAAAAAAGGACAGCAGCCGGAATAATGAAAAACCCAACTGTCGACGGGATCG\nTTGCAATAGATTTGGACCCTGTGGTTTACGATGCAAAATTTGAAAAACAGCTAGGCCAAATAATGTTGTT\nGATACTTTGCACATCACAGATCCTCCTGATGCGGACCACATGGGCCTTGTGTGAATCCATCACACTAGCC\nACTGGACCTCTGACTACGCTTTGGGAGGGATCTCCAGGAAAATTCTGGAACACCACGATAGCGGTGTCCA\nTGGCAAACATTTTTAGGGGAAGTTATCTAGCAGGAGCAGGTCTGGCCTTTTCATTAATGAAATCTCTAGG\nAGGAGGTAGGAGAGGCACGGGAGCCCAAGGGGAAACACTGGGAGAAAAATGGAAAAGACAGCTAAACCAA\nTTGAGCAAGTCAGAATTCAACACTTACAAAAGGAGTGGGATTATAGAGGTGGATAGATCTGAAGCCAAAG\nAGGGGTTAAAAAGAGGAGAAACGACT
AAACACGCAGTGTCGAGAGGAACGGCCAAACTGAGGTGGTTTGT\nGGAGAGGAACCTTGTGAAACCAGAAGGGAAAGTCATAGACCTCGGTTGTGGAAGAGGTGGCTGGTCATAT\nTATTGCGCTGGGCTGAAGAAAGTCACAGAAGTGAAAGGATACACGAAAGGAGGACCTGGACATGAGGAAC\nCAATCCCAATGGCAACCTATGGATGGAACCTAGTAAAGCTATACTCCGGGAAAGATGTATTCTTTACACC\nACCTGAGAAATGTGACACCCTCTTGTGTGATATTGGTGAGTCCTCTCCGAACCCAACTATAGAAGAAGGA\nAGAACGTTACGTGTTCTAAAGATGGTGGAACCATGGCTCAGAGGAAACCAATTTTGCATAAAAATTCTAA\nATCCCTATATGCCGAGTGTGGTAGAAACTTTGGAGCAAATGCAAAGAAAACATGGAGGAATGCTAGTGCG\nAAATCCACTCTCAAGAAACTCCACTCATGAAATGTACTGGGTTTCATGTGGAACAGGAAACATTGTGTCA\nGCAGTAAACATGACATCTAGAATGCTGCTAAATCGATTCACAATGGCTCACAGGAAGCCAACATATGAAA\nGAGACGTGGACTTAGGCGCTGGAACAAGACATGTGGCAGTAGAACCAGAGGTGGCCAACCTAGATATCAT\nTGGCCAGAGGATAGAGAATATAAAAAATGAACACAAATCAACATGGCATTATGATGAGGACAATCCATAC\nAAAACATGGGCCTATCATGGATCATATGAGGTCAAGCCATCAGGATCAGCCTCATCCATGGTCAATGGTG\nTGGTGAGACTGCTAACCAAACCATGGGATGTCATTCCCATGGTCACACAAATAGCCATGACTGACACCAC\nACCCTTTGGACAACAGAGGGTGTTTAAAGAGAAAGTTGACACGCGTACACCAAAAGCGAAACGAGGCACA\nGCACAAATTATGGAGGTGACAGCCAGGTGGTTATGGGGTTTTCTCTCTAGAAACAAAAAACCCAGAATCT\nGCACAAGAGAGGAGTTCACAAGAAAAGTCAGGTCAAACGCAGCTATTGGAGCAGTGTTCGTTGATGAAAA\nTCAATGGAACTCAGCAAAAGAGGCAGTGGAAGATGAACGGTTCTGGGACCTTGTGCACAGAGAGAGGGAG\nCTTCATAAACAAGGAAAATGTGCCACGTGTGTCTACAACATGATGGGAAAGAGAGAGAAAAAATTAGGAG\nAGTTCGGAAAGGCAAAAGGAAGTCGCGCAATATGGTACATGTGGTTGGGAGCGCGCTTTTTAGAGTTTGA\nAGCCCTTGGTTTCATGAATGAAGATCACTGGTTCAGCAGAGAGAATTCACTCAGTGGAGTGGAAGGAGAA\nGGACTCCACAAACTTGGATACATACTCAGAGACATATCAAAGATTCCAGGGGGAAATATGTATGCAGATG\nACACAGCCGGATGGGACACAAGAATAACAGAGGATGATCTTCAGAATGAGGCCAAAATCACTGACATCAT\nGGAACCTGAACATGCCCTATTGGCCACGTCAATCTTTAAGCTAACCTACCAAAACAAGGTAGTAAGGGTG\nCAGAGACCAGCGAAAAATGGAACCGTGATGGATGTCATATCCAGACGTGACCAGAGAGGAAGTGGACAGG\nTTGGAACCTATGGCTTAAACACCTTCACCAACATGGAGGCCCAACTAATAAGACAAATGGAGTCTGAGGG\nAATCTTTTCACCCAGCGAATTGGAAACCCCAAATCTAGCCGAAAGAGTCCTCGACTGGTTGAAAAAACAT\nGGCACCGAGAGGCTGAAAAGAATGGCAATCAGTGGAGATGACTGTGTGGTGAAACCAATCGATGACAGAT\nTTGCAACAGCCTTAACAGCTTTGAATGACATGGGAAAGGTAAGAAAAGACATACCGCAATGGGAACCTTC\nAAAAGGATGG
AATGATTGGCAACAAGTGCCTTTCTGTTCACACCATTTCCACCAGCTGATTATGAAGGAT\nGGGAGGGAGATAGTGGTGCCATGCCGCAACCAAGATGAACTTGTAGGTAGGGCCAGAGTATCACAAGGCG\nCCGGATGGAGCTTGAGAGAAACTGCATGCCTAGGCAAGTCATATGCACAAATGTGGCAGCTGATGTACTT\nCCACAGGAGAGACTTGAGATTAGCGGCTAATGCTATCTGTTCAGCCGTTCCAGTTGATTGGGTCCCAACC\nAGCCGCACCACCTGGTCGATCCATGCCCACCATCAATGGATGACAACAGAAGACATGTTGTCAGTGTGGA\nATAGGGTTTGGATAGAGGAAAACCCATGGATGGAGGACAAGACTCATGTGTCCAGTTGGGAAGACGTTCC\nATACCTAGGAAAAAGGGAAGATCAATGGTGTGGTTCCCTAATAGGCTTAACAGCACGAGCCACCTGGGCC\nACCAACATACAAGTGGCCATAAACCAAGTGAGAAGGCTCATTGGGAATGAGAATTATCTAGACTTCATGA\nCATCAATGAAGAGATTCAAAAACGAGAGTGATCCCGAAGGGGCACTCTGGTAAGCCAACTCATTCACAAA\nATAAAGGAAAATAAAAAATCAAACAAGGCAAGAAGTCAGGCCGGATTAAGCCATAGCACGGTAAGAGCTA\nTGCTGCCTGTGAGCCCCGTCCAAGGACGTAAAATGAAGTCAGGCCGAAAGCCACGGTTCGAGCAAGCCGT\nGCTGCCTGTAGCTCCATCGTGGGGATGTAAAAACCCGGGAGGCTGCAAACCATGGAAGCTGTACGCATGG\nGGTAGCAGACTAGTGGTTAGAGGAGACCCCTCCCAAGACACAACGCAGCAGCGGGGCCCAACACCAGGGG\nAAGCTGTACCCTGGTGGTAAGGACTAGAGGTTAGAGGAGACCCCCCGCACAACAACAAACAGCATATTGA\nCGCTGGGAGAGACCAGAGATCCTGCTGTCTCTACAGCATCATTCCAGGCACAGAACGCCAAAAAATGGAA\nTGGTGCTGTTGAATCAACAGGTTCT\n\n"
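The raw string is hard to read because the "\n" escapes are shown literally. As a sketch of two common next steps, cat() prints the text with real line breaks, and removing the header and newlines leaves a sequence whose length and base composition can be counted. To keep the example small and offline, it uses a made-up stand-in holding only the first 20 bases of the record above; with the real object you would pass dengueseq_fasta instead of fasta_text.

```r
# Stand-in record: the first 20 bases of the Dengue sequence above, under a made-up header
fasta_text <- ">toy_record first 20 bases\nAGTTGTTAGT\nCTACGTGGAC\n\n"

# cat() interprets the "\n" escapes, so the record prints one line per row
cat(fasta_text)

# Remove the header line (from ">" up to the first newline), then all remaining newlines
sequence <- gsub("\n", "", sub("^>[^\n]*\n", "", fasta_text), fixed = TRUE)

# Sequence length and per-base counts
nchar(sequence)                    # 20
table(strsplit(sequence, "")[[1]]) # A 4, C 3, G 6, T 7
```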