Objectives

Description

Write R code to do each of the following tasks:

Question a:

Search for DNA sequences from the organism “Chlamydia trachomatis” in the ACNUC “genbank” database.

library(seqinr)
choosebank("genbank")
Q <- query("Q", "SP=Chlamydia trachomatis")

Question b:

How many sequences were retrieved?

Q$name
## [1] "Q"
Q$nelem
## [1] 43496

Question c:

How many bases are there in the longest sequence among them?

# vector = c()
# 
# for (val in 1:length(Q$req)){
#   # vector <- c(vector, length(getSequence(Q$req[[val]])))
# }
# 
# which.max( vector[] )

max(sapply(Q$req, getLength))
## [1] 1083893
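
The same idea can be tried without a database connection. The sketch below uses a small invented list of sequences (the names `toy_seqs`, `seq_lengths`, `longest_length`, and `longest_name` are hypothetical and not part of the query result) and shows how `which.max` identifies which sequence is the longest, in addition to reporting its length.

```r
# Toy stand-in for retrieved sequences: each element is a vector of bases.
# These sequences are invented for illustration only.
toy_seqs <- list(
  seqA = c("a", "t", "g"),
  seqB = c("a", "t", "g", "c", "c", "a"),
  seqC = c("g", "g", "t", "a")
)

# vapply guarantees an integer vector with one length per sequence
seq_lengths <- vapply(toy_seqs, length, integer(1))

# Number of bases in the longest sequence
longest_length <- max(seq_lengths)

# Name of the sequence that has that length
longest_name <- names(which.max(seq_lengths))

print(longest_length)
print(longest_name)
```

Using `vapply` instead of `sapply` is a design choice here: it fails loudly if any element does not return a single integer, which can help catch unexpected entries in a long result list.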

Question d:

For the first three sequences, print out the accession numbers.

getName(Q$req[1])
## [1] "A01434"
getName(Q$req[2])
## [1] "A27838"
getName(Q$req[3])
## [1] "A27849"

Question e:

For the 1000th sequence, print out the nucleotide bases in the range 50 to 75.

s1000 = getSequence(Q$req[[1000]])

# Print bases 50 to 75 of the sequence
s1000[50:75]
##  [1] "t" "a" "g" "c" "t" "a" "a" "g" "t" "c" "g" "t" "a" "t" "t" "c" "t" "t" "t"
## [20] "g" "g" "g" "t" "g" "a" "a"
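
Because the sequence is returned as a vector of single characters, the range can be selected with ordinary R indexing. The sketch below illustrates this on an invented random sequence (the names `toy_seq`, `window`, and `window_string` are hypothetical) and also collapses the selected bases into one string, which is often easier to read than the vector printout.

```r
# Invented sequence of 100 bases, generated reproducibly for illustration
set.seed(1)
toy_seq <- sample(c("a", "c", "g", "t"), size = 100, replace = TRUE)

# R indexing is 1-based and inclusive at both ends,
# so 50:75 selects 75 - 50 + 1 = 26 bases
window <- toy_seq[50:75]
print(length(window))

# Collapse the vector of single characters into one string
window_string <- paste(window, collapse = "")
print(window_string)
```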

Question f:

What is the length of the 250th sequence?

Q[["req"]][[250]]
##       name     length      frame     ncbicg 
## "AF087303"      "175"        "2"       "11"

The length of the 250th sequence is 175 bases, as shown in the `length` field above.

Question g:

Export the 150th, 151st, and 152nd sequences into a FASTA file.

write.fasta(getSequence(Q$req[150]), getName(Q$req[150]), file.out = "Seq_150.fasta")
write.fasta(getSequence(Q$req[151]), getName(Q$req[151]), file.out = "Seq_151.fasta")
write.fasta(getSequence(Q$req[152]), getName(Q$req[152]), file.out = "Seq_152.fasta")
closebank()
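
The calls above produce three separate files. If a single FASTA file holding all three records is wanted, `write.fasta` also accepts a list of sequences together with a vector of names. The sketch below shows this on invented sequences so that it runs without a database connection; `toy_seqs`, `toy_names`, and the output file name are hypothetical.

```r
library(seqinr)

# Invented sequences and names standing in for entries 150 to 152
toy_seqs <- list(
  c("a", "t", "g", "c"),
  c("g", "g", "c", "a", "t"),
  c("t", "t", "a")
)
toy_names <- c("toy150", "toy151", "toy152")

# One call writes all three records into a single FASTA file
out_file <- file.path(tempdir(), "toy_150_to_152.fasta")
write.fasta(sequences = toy_seqs, names = toy_names, file.out = out_file)

# Each record starts with a ">" header line followed by its bases
fasta_lines <- readLines(out_file)
print(fasta_lines)
```

With the real query result, the same call might take `getSequence(Q$req[150:152])` and the matching names, issued before `closebank()`; that is an assumption about how the list methods behave rather than something shown in the output above.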

Notes:

• Handwritten answers are not allowed!
• Use R Markdown (https://rmarkdown.rstudio.com/) and provide a neatly formatted PDF file showing both code and output.
• Include your name as a comment at the beginning of the script file.