Directly pasted from https://rpubs.com/profbiot/readGenBank (until line break)

Use library command to make ape functions accessible by this script

library(ape)

## Warning: package 'ape' was built under R version 4.3.3

Use the paste() function to create a character vector of accession numbers for Gasterosteus sequences

These sequences all belong to one genus of sticklebacks

To adapt the tutorial, swap in the sequences that previous Endicott Bioinformatics students uploaded: MT103163-MT103183

# paste() works element-wise: the result is a character vector with one accession per element
seq1 <- paste("JQ", seq(983161, 983255), sep = "")
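
The same pattern covers the student-uploaded range MT103163-MT103183. A minimal sketch, assuming the accessions are consecutive and share a fixed prefix; `make_accessions` is a hypothetical helper, not part of ape:

```r
# Hypothetical helper: build consecutive accession IDs from a prefix and a numeric range.
make_accessions <- function(prefix, first, last) {
  # paste0() is paste() with sep = ""; it returns a character vector,
  # one element per number in the sequence.
  paste0(prefix, seq(first, last))
}

student_ids <- make_accessions("MT", 103163, 103183)
length(student_ids)   # number of accessions in the range
head(student_ids, 3)  # first few IDs
```

This only works when the numeric part has no leading zeros; accessions that do would need zero-padding, for example with sprintf().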

Download all sequential sequences from GenBank

This would be really hard to do by hand

Note that the downloaded sequences are stored together in a single variable, which is an R list

sequences <- read.GenBank(seq1,
                          seq.names = seq1,
                          species.names = TRUE,
                          as.character = TRUE)
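
Because the result is an ordinary R list, base functions are enough to inspect it. The sketch below uses a small made-up list in place of a real download (the IDs, bases, and species labels are placeholders). It assumes the shape read.GenBank() returns with as.character = TRUE: one character vector of single bases per accession, with species names in a "species" attribute when species.names = TRUE.

```r
# Mock stand-in for a read.GenBank(..., as.character = TRUE) result.
# The IDs, bases, and species labels are invented for illustration only.
mock_seqs <- list(
  ID0001 = c("a", "t", "g", "c", "a"),
  ID0002 = c("a", "t", "g", "g"),
  ID0003 = c("t", "t", "g", "c", "a", "a")
)
attr(mock_seqs, "species") <- c("Genus_one", "Genus_one", "Genus_two")

class(mock_seqs)                              # a plain list
seq_lengths <- sapply(mock_seqs, length)      # number of bases per sequence
seq_lengths
table(attr(mock_seqs, "species"))             # sequences per species label
paste(mock_seqs[["ID0002"]], collapse = "")   # one sequence as a single string
```

The same calls should work on the real `sequences` object once the download has finished.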

Write the sequences to a fasta file

write.dna(sequences, "fish.fasta", format = "fasta")
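
A quick way to confirm an export is to read the file back and count header lines. A minimal sketch using only base R, assuming the standard FASTA layout in which each record starts with a ">" line; it writes a tiny made-up file to a temporary path rather than touching fish.fasta:

```r
# Write a small, invented FASTA file to a temporary location.
fasta_path <- tempfile(fileext = ".fasta")
writeLines(c(">ID0001", "atgca", ">ID0002", "atgg"), fasta_path)

# Header lines start with ">"; every other line is sequence data.
fasta_lines <- readLines(fasta_path)
headers <- fasta_lines[startsWith(fasta_lines, ">")]

n_records <- length(headers)           # one header per record
record_ids <- sub("^>", "", headers)   # strip the leading ">"
n_records
record_ids
```

Pointing `fasta_path` at fish.fasta should report one header per downloaded sequence.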

Pan paniscus (Bonobo) Mitochondrial CO1 Gene Sequence Analysis

This script automates the download of mitochondrial CO1 gene sequences for Pan paniscus (bonobo) from GenBank using the ape package in R. The taxonomic ID for Pan paniscus is txid:9597.

Read in search result file containing accession numbers

accessions = read.table("bonobo.seq",
                        stringsAsFactors = FALSE)$V1
str(accessions)

##  chr [1:21] "GU189677.1" "GU189676.1" "GU189675.1" "GU189674.1" ...
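
The call above assumes bonobo.seq is a plain-text file with one accession per line, so read.table() places them in its first column, V1. A sketch of that layout, with a temporary file standing in for bonobo.seq and only the four accessions visible in the str() output:

```r
# Temporary stand-in for bonobo.seq: one accession per line.
acc_file <- tempfile(fileext = ".seq")
writeLines(c("GU189677.1", "GU189676.1", "GU189675.1", "GU189674.1"), acc_file)

# With no header row, read.table() names the single column V1.
acc <- read.table(acc_file, stringsAsFactors = FALSE)$V1
str(acc)

# Optional sanity checks before downloading: no blanks, no duplicates.
any(acc == "")
anyDuplicated(acc)
```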

Download all sequences from GenBank

bonoboSeqs = read.GenBank(accessions,
                          seq.names = accessions,
                          species.names = TRUE,
                          as.character = TRUE)

Display information about downloaded sequences

cat("Successfully downloaded",
    length(bonoboSeqs),
    "sequences\n")

cat("Sequence names:\n")

## Sequence names:

cat(paste(names(bonoboSeqs)),
    "\n")

Export sequences to FASTA file

write.dna(bonoboSeqs,
          "bonobo_CO1.fasta",
          format = "fasta")

Scrape Sequences

Collin McNeil

2025-09-26
