Pipeline for targeted functional metagenomic DNA capture probe design in R
Read in the database of sequences (in FASTA format) from dbCAN (http://csbl.bmb.uga.edu/dbCAN/download/CAZyDB.07202017.fa)
library(DECIPHER)
Loading required package: RSQLite
fas <- "http://csbl.bmb.uga.edu/dbCAN/download/CAZyDB.07202017.fa"
store_db <- "C:/Users/faysmith/OneDrive - University of Arkansas/R Code/New DB for probe/new"
seqs <- readAAStringSet(fas)
trying URL 'http://csbl.bmb.uga.edu/dbCAN/download/CAZyDB.07202017.fa'
Content type 'text/plain; charset=UTF-8' length 463121677 bytes (441.7 MB)
downloaded 441.7 MB
Seqs2DB(fas, "FASTA", store_db, "CAZymes")  # import the same FASTA file (read from the URL again) into table Seqs of store_db
Reading FASTA file chunk 1
...
Reading FASTA file chunk 47
Added 921174 new sequences to table Seqs.
1842348 total sequences in table Seqs.
Time difference of 113.09 secs
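The reported total (1842348) is exactly twice the number just added (921174), so table Seqs now holds every CAZyme sequence twice. Seqs2DB appends by default, so this most likely means `store_db` already contained an earlier import. The sketch below is one way to avoid that, assuming DECIPHER is installed: it rebuilds the table from the `seqs` object already returned by `readAAStringSet()` (so the 441.7 MB file is not downloaded a second time) and sets `replaceTbl = TRUE`. The helper names and the file name `CAZymes.sqlite` are hypothetical.

```r
# Hypothetical helper (requires DECIPHER): rebuild table Seqs from the AAStringSet
# already in memory. replaceTbl = TRUE replaces an existing Seqs table instead of
# appending a second copy of every sequence to it.
rebuild_cazy_db <- function(seqs, dbFile = "CAZymes.sqlite") {
  DECIPHER::Seqs2DB(seqs, type = "XStringSet", dbFile = dbFile,
                    identifier = "CAZymes", replaceTbl = TRUE)
}

# How many copies of the file does the table hold? (counts taken from the Seqs2DB log)
n_imports <- function(added, total) total / added
n_imports(921174, 1842348)
```

Giving the database file an explicit `.sqlite` extension also makes it easier to tell apart from FASTA or other files in the same folder.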
Pull out the GenBank accession numbers, then extract the family names as strings
#fam_names <- stri_extract(fam_names, regex='[.[A-Z][0-9]|?]')  # not used: this pattern is a single character set, so it extracts only one character per string
fam_names
[1] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[9] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[17] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[25] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[33] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[41] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[49] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[57] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[65] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[73] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[81] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[89] "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0" "|AA0"
[97] "|AA0" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10"
[105] "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10"
[113] "|AA10|3.2.1.78" "|AA10" "|AA10" "|AA10|1.-.-.-" "|AA10" "|AA10" "|AA10" "|AA10"
[121] "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10"
[129] "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10|1.-.-.-" "|AA10" "|AA10"
[137] "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10"
[145] "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10"
[153] "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10"
[161] "|AA10" "|AA10" "|AA10|1.-.-.-" "|AA10" "|AA10" "|AA10|1.-.-.-" "|AA10" "|AA10"
[169] "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10|1.-.-.-"
[177] "|AA10|1.-.-.-" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10" "|AA10|1.-.-.-" "|AA10"
[ reached getOption("max.print") -- omitted 918938 entries ]
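The `fam_names` printout shows that each header carries the GenBank accession followed by `|`-separated fields: the CAZy family (for example `AA10`) and, for some entries, an EC number (for example `3.2.1.78` or `1.-.-.-`). The base-R sketch below splits headers in that layout into accession, family and EC columns. It assumes `names(seqs)` follow this pattern; the example headers are made up, and headers listing several families are returned with the families joined by `;`.

```r
# Made-up headers in the "accession|family|EC" layout seen in the fam_names output
hdrs <- c("ACC0001.1|AA0", "ACC0002.1|AA10|3.2.1.78", "ACC0003.1|AA10|1.-.-.-")

parse_cazy_headers <- function(hdrs) {
  parts <- strsplit(hdrs, "|", fixed = TRUE)      # fixed = TRUE: "|" is not a regex operator
  is_ec <- function(x) grepl("^[0-9]+\\.", x)      # EC numbers start with digits and a dot
  rest  <- lapply(parts, `[`, -1)                   # every field after the accession
  data.frame(
    accession = vapply(parts, `[`, character(1), 1),
    family    = vapply(rest, function(f) paste(f[!is_ec(f)], collapse = ";"), character(1)),
    ec        = vapply(rest, function(f) paste(f[is_ec(f)], collapse = ";"), character(1)),
    stringsAsFactors = FALSE
  )
}

cazy <- parse_cazy_headers(hdrs)
```

On the real data, `cazy <- parse_cazy_headers(names(seqs))` followed by `table(cazy$family)` would give the list of families and their sizes.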
Separating sequences by subfamily
AA <- SearchDB(AA0_db, type = "AAStringSet", nameBy = "row_names")
Search Expression:
select row_names, sequence from _Seqs where row_names in (select row_names from Seqs)
AAStringSet of length: 100
Time difference of 0.21 secs
**Note:** Still need to write code that extracts the subfamily names from the initial file and then uses that list to automate this process.
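One possible way to automate the per-family split, sketched under assumptions: `families` is a character vector of family labels in the same order as `seqs` (for example the `family` column produced by parsing the headers). The base-R part groups sequence positions by family. The hypothetical `import_by_family()` helper then stores every family in a single SQLite file with the family label as the DECIPHER identifier, so a family could be retrieved with `SearchDB(dbFile, identifier = "AA10")` instead of keeping a separate database such as `AA0_db` for each one.

```r
# Base R: map each family label to the positions of its sequences
family_index <- function(families) split(seq_along(families), families)

# Hypothetical helper (requires DECIPHER): append each family to the same SQLite file,
# using the family label as the identifier. Point dbFile at a new file so the
# families are not mixed with the full "CAZymes" import.
import_by_family <- function(seqs, families, dbFile) {
  idx <- family_index(families)
  for (fam in names(idx)) {
    DECIPHER::Seqs2DB(seqs[idx[[fam]]], type = "XStringSet", dbFile = dbFile,
                      identifier = fam, verbose = FALSE)
  }
  invisible(names(idx))
}

idx <- family_index(c("AA0", "AA10", "AA10", "GH5"))
```

Because IdConsensus groups by the `identifier` column unless `colName` says otherwise, a database built this way could also yield one consensus per family.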
Add2DB(clus, AA0_db)
Expression:
update Seqs set cluster = :cluster where row_names = :row_names
Added to table Seqs: "cluster".
Time difference of 0.21 secs
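The Add2DB log shows how `clus` is matched to the database: `update Seqs set cluster = :cluster where row_names = :row_names`. So `clus` must be a data.frame whose row names are the database `row_names` (which is why `SearchDB(..., nameBy = "row_names")` was used) and whose column is named `cluster`. The sketch below builds such a data.frame from a toy distance matrix. It swaps in base R's `hclust()` with average linkage (UPGMA) so that it runs without a database; DECIPHER releases of that period provide `IdClusters` for the same purpose. The distances and the 0.3 cutoff are illustrative only.

```r
# Toy pairwise distances between four sequences, labelled by their database row_names
row_ids <- c("1", "2", "3", "4")
d <- matrix(c(0.00, 0.05, 0.60, 0.70,
              0.05, 0.00, 0.65, 0.60,
              0.60, 0.65, 0.00, 0.10,
              0.70, 0.60, 0.10, 0.00),
            nrow = 4, dimnames = list(row_ids, row_ids))

# UPGMA is average-linkage hierarchical clustering; cut the tree at distance 0.3
tree <- hclust(as.dist(d), method = "average")
cluster_ids <- cutree(tree, h = 0.3)

# Add2DB matches on row.names, so key the data.frame by the database row_names
clus <- data.frame(cluster = cluster_ids, row.names = names(cluster_ids))
```

Here sequences 1 and 2 fall into cluster 1 and sequences 3 and 4 into cluster 2, and `Add2DB(clus, AA0_db)` would write those values into the `cluster` column.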
conSeqs <- IdConsensus(path_cazy_nuc, type = "AAStringSet", colName = "cluster", verbose = TRUE)
Couldn't set synchronous mode: file is not a database
Use `synchronous` = NULL to turn off this warning.
Error in result_create(conn@ptr, statement) : file is not a database
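"file is not a database" comes from SQLite and usually means the `dbFile` path points at an existing file that is not an SQLite database (for example a FASTA file) rather than at a file created by Seqs2DB. The contents of `path_cazy_nuc` are not shown, so this is only the likely cause. A quick base-R check relies on the SQLite file format, whose first 16 bytes are the string `SQLite format 3` followed by a zero byte. The helper name is hypothetical. Note also that `type = "AAStringSet"` would still return protein consensus sequences even once the path is fixed.

```r
# TRUE if `path` is an existing regular file that starts with the SQLite header
is_sqlite_file <- function(path) {
  if (!file.exists(path) || dir.exists(path)) return(FALSE)
  con <- file(path, "rb")
  on.exit(close(con))
  hdr <- readBin(con, "raw", n = 16)
  identical(hdr, c(charToRaw("SQLite format 3"), as.raw(0)))
}
```

Running `is_sqlite_file(path_cazy_nuc)` before `IdConsensus()` would flag the problem. A path that does not exist yet is fine for Seqs2DB, which creates a new database there.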
Now for the array design and validation steps
probes <- DesignArray(conSeqs, maxPermutations = 2, numProbes = 20, verbose = TRUE)
Error in DesignArray(conSeqs, maxPermutations = 2, numProbes = 20, verbose = TRUE) :
myDNAStringSet must be a DNAStringSet.
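DesignArray designs DNA probes, so it only accepts a DNAStringSet. The CAZy sequences imported so far are proteins (AAStringSet), so even a working `IdConsensus(..., type = "AAStringSet")` call would not produce valid input. Probes therefore have to be designed from nucleotide sequences. The sketch below shows that call order under stated assumptions: `nuc_db` is a hypothetical SQLite database of nucleotide sequences that carries the same `cluster` column. It also includes a rough, hypothetical heuristic (not part of DECIPHER) for spotting protein sequences before they reach DesignArray.

```r
# Heuristic (assumption): a sequence using only IUPAC nucleotide codes and gaps is
# treated as DNA/RNA; any other letter (E, F, I, L, P, Q, ...) suggests protein.
is_nucleotide <- function(x) !grepl("[^ACGTURYSWKMBDHVN.-]", toupper(x))

# Hypothetical wrapper (requires DECIPHER): one consensus per cluster from a
# nucleotide database, then probe design on that DNAStringSet.
design_probes <- function(nuc_db, colName = "cluster", numProbes = 20) {
  conSeqs <- DECIPHER::IdConsensus(nuc_db, type = "DNAStringSet",
                                   colName = colName, verbose = FALSE)
  DECIPHER::DesignArray(conSeqs, maxPermutations = 2,
                        numProbes = numProbes, verbose = TRUE)
}
```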
Try using the GenBank accession numbers to retrieve the corresponding nucleotide sequences from NCBI:
Seqs2DB(cazy_nuc, "DNAStringSet", "C:/Users/faysmith/OneDrive - University of Arkansas/R Code/New DB for probe/nuc_cazy", "Nuc_Cazymes")
Couldn't set synchronous mode: file is not a database
Use `synchronous` = NULL to turn off this warning.
Error in result_create(conn@ptr, statement) : file is not a database
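This is the same SQLite error as above: the `nuc_cazy` path appears to be an existing file that is not a database, so pointing Seqs2DB at a new path (for example one ending in `.sqlite`) should avoid it. Getting the nucleotide sequences is a separate step. Assuming the accessions in the dbCAN headers are GenBank protein accessions, one route through NCBI E-utilities is: fetch the GenPept record (`db=protein`, `rettype=gp`), read the `/coded_by` qualifier of its CDS feature (nucleotide accession plus coordinates), then fetch that region as FASTA from `db=nuccore` using `seq_start`, `seq_stop` and `strand` (1 = plus, 2 = minus). The offline sketch below only builds the URLs and parses simple single-line `/coded_by` values; the function names and example accessions are hypothetical, and joined (multi-exon) locations are skipped.

```r
eutils <- "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Step 1: URL for GenPept records of a batch of protein accessions
genpept_url <- function(ids) {
  paste0(eutils, "?db=protein&rettype=gp&retmode=text&id=", paste(ids, collapse = ","))
}

# Step 2: parse single-line qualifiers such as
#   /coded_by="AB000001.1:101..400"  or  /coded_by="complement(AB000002.1:5..304)"
parse_coded_by <- function(gp_lines) {
  hits <- grep("/coded_by=\"", gp_lines, value = TRUE, fixed = TRUE)
  loc  <- sub('.*/coded_by="([^"]*)".*', "\\1", hits)  # text between the quotes
  pat  <- "^(complement\\()?([A-Za-z0-9_.]+):<?([0-9]+)\\.\\.>?([0-9]+)\\)?$"
  m    <- regmatches(loc, regexec(pat, loc))
  ok   <- lengths(m) > 0                                # drop join(...) or wrapped values
  data.frame(
    accession = vapply(m[ok], `[`, character(1), 3),
    start     = as.integer(vapply(m[ok], `[`, character(1), 4)),
    stop      = as.integer(vapply(m[ok], `[`, character(1), 5)),
    strand    = ifelse(startsWith(loc[ok], "complement("), 2L, 1L),
    stringsAsFactors = FALSE
  )
}

# Step 3: URL for the nucleotide FASTA of one coding region
cds_fasta_url <- function(accession, start, stop, strand) {
  paste0(eutils, "?db=nuccore&rettype=fasta&retmode=text&id=", accession,
         "&seq_start=", start, "&seq_stop=", stop, "&strand=", strand)
}
```

In practice the GenPept text would come from `readLines(genpept_url(batch))`, keeping to NCBI's limit of three requests per second without an API key. The resulting FASTA could be read with `readDNAStringSet()` and imported with Seqs2DB into a new `.sqlite` file before running IdConsensus and DesignArray.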