Feature set includes:
- Melting temperature
- GC content
- MFE (ViennaRNA)
- Distance to PAM
- Location in target gene
- Positional encoding (sgRNA ind1, ind2, dep1, dep2, dep3, dep4)
- PAM nucleotide encoding
- Haar DWTs (20bp sliding windows of the genome: GATC motif, gene density, GC content, PAM, IPD, MFE, melting temp)
- Quantum chemical tensors (monomer, basepair, dimer, trimer, tetramer)
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/
# generate fasta file of sgRNA sequences and blast to reference
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
sed '1d' DataS1.txt | awk '{print ">"$1"\n"$2}' > ecoli.gRNA.fasta
## blast
# conda install blast
# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes
# wget https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
# tar zxvpf ncbi-blast-2.11.0+-x64-linux.tar.gz
# export PATH=$PATH:$HOME/ncbi-blast-2.11.0+/bin
# echo $PATH
# mkdir $HOME/blastdb
# export BLASTDB=$HOME/blastdb
# set BLASTDB=$HOME/blastdb
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/makeblastdb -in genome/GCF_000005845.2_ASM584v2_genomic.fna -dbtype nucl
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.blast.tab -outfmt 6 -evalue 0.0001 -task blastn -num_threads 10
## install bwa (git clone https://github.com/bwa-mem2/bwa-mem2)
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' ecoli.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' ecoli.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > ecoli.gRNA.blast.bed
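# The awk conversion above keeps the 1-based BLAST start and drops the strand.
# A minimal Python sketch of the same conversion, assuming the standard 12-column
# -outfmt 6 layout (sstart = column 9, send = column 10). It shifts the start to the
# 0-based BED convention and keeps the strand; blast6_to_bed is a hypothetical helper
# name, and whether the downstream steps expect 0-based starts should be checked first.

```python
def blast6_to_bed(line):
    """Convert one BLAST -outfmt 6 line to (chrom, start, end, name, strand)."""
    fields = line.rstrip("\n").split("\t")
    query, subject = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    # Minus-strand hits are reported with sstart > send.
    strand = "+" if sstart <= send else "-"
    low, high = min(sstart, send), max(sstart, send)
    # BLAST coordinates are 1-based inclusive; BED is 0-based half-open.
    return (subject, low - 1, high, query, strand)


# Made-up example line, not taken from ecoli.gRNA.blast.tab.
example = "sg1\tNC_000913.3\t100.0\t20\t0\t0\t1\t20\t500\t481\t1e-04\t40.1"
print("\t".join(str(x) for x in blast6_to_bed(example)))
```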
# tr -d '\n' < mexicanus.fasta | cut -b210-220
###### run with complement sequence
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate emboss
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
revseq ecoli.gRNA.fasta -noreverse -complement -outseq ecoli.gRNA.comp.fasta
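# Note: -noreverse -complement yields the complement without reversing it, which is not
# the sequence of the opposite strand read 5'->3'. This may be one reason the complement
# queries return so few hits at strict e-values (blastn already searches both strands).
# A small sketch of the difference, using only the Python standard library:

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq):
    """Base-by-base complement, same orientation as the input."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5'->3'."""
    return complement(seq)[::-1]


print(complement("ATGC"))          # TACG
print(reverse_complement("ATGC"))  # GCAT
```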
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -evalue 0.0001 -task blastn -num_threads 10
#### only getting two outputs??
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10
#### only 13 outputs
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -task blastn-short -num_threads 10
#### too many outputs [1185091]
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -task blastn -num_threads 10
#### fewer but still too many [809935] <-- input 55671 sequences
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -evalue 0.001 -task blastn -num_threads 10
#### only 37 outputs...
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -evalue 0.01 -task blastn-short -num_threads 10
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' ecoli.gRNA.complement.blast.tab > tmp1.comp.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' ecoli.gRNA.complement.blast.tab > tmp2.comp.bed
cat tmp1.comp.bed tmp2.comp.bed > ecoli.gRNA.complement.blast.bed
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
d1 <- read.delim("DataS1.txt", header=T, sep="\t")
d4 <- read.delim("DataS4.txt", header=T, sep="\t")
d6 <- read.delim("DataS6.txt", header=T, sep="\t")
coord <- read.delim("ecoli.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
d1$sgRNA <- d1$sgRNAID
d4$sgRNA <- d4$sgRNAID
library(dplyr)
df <- left_join(coord, d1, by="sgRNA")
df2 <- left_join(df, d4, by="sgRNA")
df3 <- left_join(df2, d6, by="sgRNA")
write.table(df3, "sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
# complement
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
d1 <- read.delim("DataS1.txt", header=T, sep="\t")
d4 <- read.delim("DataS4.txt", header=T, sep="\t")
d6 <- read.delim("DataS6.txt", header=T, sep="\t")
coord <- read.delim("ecoli.gRNA.complement.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
d1$sgRNA <- d1$sgRNAID
d4$sgRNA <- d4$sgRNAID
library(dplyr)
df <- left_join(coord, d1, by="sgRNA")
df2 <- left_join(df, d4, by="sgRNA")
df3 <- left_join(df2, d6, by="sgRNA")
write.table(df3, "sgRNA.complement.coord.txt", quote=F, row.names=F, sep="\t")
## include all cas9 types
### dataset --> DataS4... save each sheet as a dataframe, add column declaring Cas9 type, intersect with DataS1 for sequence, create new sgRNAID using both the ID and Cas9 type, merge files
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli")
seq <- read.delim("DataS1.txt", header=T, sep="\t")
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/DataS4.tables")
Cas9 <- read.delim("DataS4.Cas9.txt", header=T, sep="\t")
eSpCas9 <- read.delim("DataS4.eSpCas9.txt", header=T, sep="\t")
recAcas9 <- read.delim("DataS4.recACas9.txt", header=T, sep="\t")
# > nrow(seq)
# [1] 55671
# > nrow(Cas9)
# [1] 44163
# > nrow(eSpCas9)
# [1] 45071
# > nrow(recAcas9)
# [1] 48112
library(dplyr)
library(tidyr)
Cas9.seq <- left_join(Cas9, seq, by="sgRNAID")
eSpCas9.seq <- left_join(eSpCas9, seq, by="sgRNAID")
recAcas9.seq <- left_join(recAcas9, seq, by="sgRNAID")
Cas9.seq.id <- Cas9.seq %>% unite(sgRNAID, c(sgRNAID, type), sep="_")
eSpCas9.seq.id <- eSpCas9.seq %>% unite(sgRNAID, c(sgRNAID, type), sep="_")
recAcas9.seq.id <- recAcas9.seq %>% unite(sgRNAID, c(sgRNAID, type), sep="_")
df <- rbind(Cas9.seq.id, eSpCas9.seq.id)
df2 <- rbind(df, recAcas9.seq.id)
# 137346
df.na <- na.omit(df2)
# 126182
write.table(df.na, "Ecoli.allCas9.txt", quote=F, row.names=F, sep="\t")
sed '1d' Ecoli.allCas9.txt | awk '{print ">"$1"\n"$3}' > Ecoli.allCas9.fasta
# cd /Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/DataS4.tables
# scp Ecoli.allCas9.txt noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/.
# scp Ecoli.allCas9.fasta noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/.
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
bedtools makewindows -g ecoli.sizes.genome -w 20 -s 1 > ecoli.20bp.sliding.bed
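# For reference, a sketch of the window layout requested above (-w 20 -s 1): one window
# per start position, 0-based half-open. It assumes windows near the chromosome end are
# clipped to the chromosome size; sliding_windows and "chrToy" are illustrative names.

```python
def sliding_windows(chrom, chrom_size, width=20, step=1):
    """Yield (chrom, start, end) windows, clipping the end at chrom_size."""
    for start in range(0, chrom_size, step):
        yield (chrom, start, min(start + width, chrom_size))


windows = list(sliding_windows("chrToy", 23))
print(windows[0], windows[-1], len(windows))
```

# With a step of 1 the number of windows equals the chromosome length, so every
# per-window table below has roughly one row per genome position.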
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
## genes
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b genome/GCF_000005845.2_ASM584v2_genomic.gene.gff > ecoli.gene.20sliding.bed
## GC content
bedtools nuc -fi genome/GCF_000005845.2_ASM584v2_genomic.fna -bed ecoli.20bp.sliding.bed | sed '1d' > ecoli.GC.20sliding.bed
# Biopython melting temperature docs: https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# signature from the docs: Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)
# https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n
# summit: # conda install -c conda-forge biopython
### sgRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
python3
input_file = open('ecoli.gRNA.fasta', 'r')
output_file = open('nucleotide_counts_sgRNA.tsv','w')
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta"):
    gene_name = cur_record.name
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    # GC content as a fraction (0-1), despite the 'CG%' column name
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
        (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)

output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
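# The formula above as a small Python function, for spot-checking the R column computed
# below. melting_temp is a hypothetical helper name; the constants are exactly those in
# the comment above, and no salt correction is applied.

```python
def melting_temp(seq):
    """Tm (deg C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)."""
    seq = seq.upper()
    n_a, n_c, n_g, n_t = (seq.count(base) for base in "ACGT")
    total = n_a + n_c + n_g + n_t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 64.9 + 41 * (n_g + n_c - 16.4) / total


# 20-mer with 10 G/C bases: 64.9 + 41 * (10 - 16.4) / 20 = 51.78
print(round(melting_temp("ACGT" * 5), 2))
```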
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
### 20bp sliding windows
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
bedtools getfasta -fi genome/GCF_000005845.2_ASM584v2_genomic.fna -bed ecoli.20bp.sliding.bed -fo ecoli.20sliding.fa
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
python3
input_file = open('ecoli.20sliding.fa', 'r')
output_file = open('nucleotide_counts_20sliding.tsv','w')
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta"):
    gene_name = cur_record.name
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    # GC content as a fraction (0-1), despite the 'CG%' column name
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
        (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)

output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("nucleotide_counts_20sliding.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "nucleotide_counts_20sliding_temp.txt", quote=F, row.names=F, sep="\t")
q()
# minimum free energy (MFE) structure with ViennaRNA: https://www.tbi.univie.ac.at/RNA/tutorial/
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
RNAfold < ecoli.gRNA.fasta > ecoli.gRNA.ViennaRNA.output.txt
grep '(' ecoli.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > ecoli.gRNA.ViennaRNA.output.value.txt
grep '>' ecoli.gRNA.ViennaRNA.output.txt | sed 's/>//g' > ecoli.gRNA.names.txt
paste ecoli.gRNA.names.txt ecoli.gRNA.ViennaRNA.output.value.txt > ecoli.gRNA.ViennaRNA.output.value.id.txt
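# The grep/paste steps above rely on the name file and the value file staying in the same
# order and having the same number of lines. A hedged alternative sketch that parses the
# RNAfold text output record by record, assuming each record is a '>' header, the sequence,
# and a structure line ending in '(energy)'. parse_rnafold is a hypothetical helper and the
# sample energies are made up.

```python
import re

# Energy in parentheses at the end of the structure line, e.g. "( -1.20)".
ENERGY_RE = re.compile(r"\(\s*([+-]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(text):
    """Return a list of (name, mfe) pairs from RNAfold output for FASTA input."""
    results, name = [], None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            continue
        match = ENERGY_RE.search(line)
        if match and name is not None:
            results.append((name, float(match.group(1))))
            name = None  # one energy per record
    return results


sample = (
    ">sg1\nGGGAAAUCCC\n(((....))) ( -1.20)\n"
    ">sg2\nAAAAAAAAAA\n.......... (  0.00)\n"
)
for name, mfe in parse_rnafold(sample):
    print(name, mfe, sep="\t")
```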
# 20bp sliding fasta
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
RNAfold < ecoli.20sliding.fa > ecoli.20sliding.ViennaRNA.output.txt
grep '(' ecoli.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > ecoli.20sliding.ViennaRNA.output.value.txt
grep '>' ecoli.20sliding.ViennaRNA.output.txt | sed 's/>//g' > ecoli.20sliding.names.txt
paste ecoli.20sliding.names.txt ecoli.20sliding.ViennaRNA.output.value.txt > ecoli.20sliding.ViennaRNA.output.value.id.txt
### onehot encoding
# import os, sys
# import numpy as np
#
# onehot_dict = {
#     'A': '1000',
#     'C': '0100',
#     'T': '0010',
#     'G': '0001',
#     'AA': '1000000000000000',
#     'AC': '0100000000000000',
#     'AT': '0010000000000000',
#     'AG': '0001000000000000',
#     'CA': '0000100000000000',
#     'CC': '0000010000000000',
#     'CT': '0000001000000000',
#     'CG': '0000000100000000',
#     'TA': '0000000010000000',
#     'TC': '0000000001000000',
#     'TT': '0000000000100000',
#     'TG': '0000000000010000',
#     'GA': '0000000000001000',
#     'GC': '0000000000000100',
#     'GT': '0000000000000010',
#     'GG': '0000000000000001',
# }
#
# # open input and output files
# input_path = sys.argv[1]
# input_file = open(input_path, 'r')
# dep1_file = open(input_path[:-4]+'_dependent1.txt', 'w')
# dep2_file = open(input_path[:-4]+'_dependent2.txt', 'w')
# indep1_file = open(input_path[:-4]+'_independent1.txt', 'w')
# indep2_file = open(input_path[:-4]+'_independent2.txt', 'w')
#
# # loop over nucleotide sequences
# for idx, line in enumerate(input_file):
#
#     # if first iteration, write title line
#     if idx == 0:
#
#         dep1_file.writelines(line+': first-order position-dependent features'+ '\n')
#         dep2_file.writelines(line+': second-order position-dependent features'+ '\n')
#         indep1_file.writelines(line+': first-order position-independent features'+ '\n')
#         indep2_file.writelines(line+': second-order position-independent features'+ '\n')
#
#     # otherwise encode sequence
#     else:
#
#         # split line by tab
#         line = line.split('\t')
#
#         # extract sequence (also remove \n)
#         seq = line[-1][:-1]
#
#         # compute position-dependent features as one-hot vectors
#         pos_dep1 = ''.join([onehot_dict[seq[i]] for i in range(len(seq))])
#         pos_dep2 = ''.join([onehot_dict[seq[i:i+2]] for i in range(len(seq)-1)])
#
#         # compute position-independent features as sum over position-dependent features
#         pos_indep1 = list(np.array([int(o) for o in pos_dep1]).reshape([-1, 4]).sum(axis=0))
#         pos_indep2 = list(np.array([int(o) for o in pos_dep2]).reshape([-1, 16]).sum(axis=0))
#         pos_indep1 = ''.join([str(p) for p in pos_indep1])
#         pos_indep2 = ''.join([str(p) for p in pos_indep2])
#
#         # write features to file
#         dep1_file.writelines(line[0] + '\t' + pos_dep1 + '\n')
#         dep2_file.writelines(line[0] + '\t' + pos_dep2 + '\n')
#         indep1_file.writelines(line[0] + '\t' + pos_indep1 + '\n')
#         indep2_file.writelines(line[0] + '\t' + pos_indep2 + '\n')
#
#     if idx % 10000 == 0:
#         print('{0:,}'.format(idx)+' lines processed...')
#
# print('Done!')
#
# input_file.close()
# dep1_file.close()
# dep2_file.close()
# indep1_file.close()
# indep2_file.close()
# usage: python path/to/encode_sequences.py path/to/data.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
cut -f 1,3 Ecoli.allCas9.txt > Ecoli.allCas9.noscore.txt
python encode_sequences.py Ecoli.allCas9.noscore.txt
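# A compact sketch of the four encodings produced by encode_sequences.py, using the same
# A, C, T, G column order as onehot_dict. It returns lists of integers rather than digit
# strings: as far as I can tell, a count of 10 or more would be written as two characters
# by the string-joined version and then split across two columns by the per-character awk
# step below. Function names here are illustrative, not part of the original script.

```python
BASES = "ACTG"  # column order used by onehot_dict: A, C, T, G
DINUCS = [a + b for a in BASES for b in BASES]  # AA, AC, AT, AG, CA, ...


def _alphabet(k):
    # Only mono- (k=1) and dinucleotide (k=2) features are sketched here.
    if k not in (1, 2):
        raise ValueError("k must be 1 or 2")
    return list(BASES) if k == 1 else DINUCS


def position_dependent(seq, k=1):
    """One 0/1 flag per (position, k-mer) pair, flattened position by position."""
    alphabet = _alphabet(k)
    flags = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        flags.extend(int(kmer == a) for a in alphabet)
    return flags


def position_independent(seq, k=1):
    """k-mer counts over the whole sequence, kept as integers."""
    alphabet = _alphabet(k)
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return [kmers.count(a) for a in alphabet]


print(position_dependent("AC"))         # [1, 0, 0, 0, 0, 1, 0, 0]
print(position_independent("A" * 12))   # [12, 0, 0, 0]
```

# A 20-nt sequence gives 80 first-order and 19 * 16 = 304 second-order position-dependent flags.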
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
sed '1d' Ecoli.allCas9.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Ecoli.allCas9_ind1.txt
sed '1d' Ecoli.allCas9.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Ecoli.allCas9_ind2.txt
sed '1d' Ecoli.allCas9.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.A p1.C p1.T p1.G p2.A p2.C p2.T p2.G p3.A p3.C p3.T p3.G p4.A p4.C p4.T p4.G p5.A p5.C p5.T p5.G p6.A p6.C p6.T p6.G p7.A p7.C p7.T p7.G p8.A p8.C p8.T p8.G p9.A p9.C p9.T p9.G p10.A p10.C p10.T p10.G p11.A p11.C p11.T p11.G p12.A p12.C p12.T p12.G p13.A p13.C p13.T p13.G p14.A p14.C p14.T p14.G p15.A p15.C p15.T p15.G p16.A p16.C p16.T p16.G p17.A p17.C p17.T p17.G p18.A p18.C p18.T p18.G p19.A p19.C p19.T p19.G p20.A p20.C p20.T p20.G' | cut -d ' ' -f 1-81 > Ecoli.allCas9_dep1.txt
# build the header (sgRNAID p1.AA p1.AC ... p20.GG) in a loop rather than typing out all 320 names
dep2_header="sgRNAID$(for p in $(seq 1 20); do for d in AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG; do printf ' p%s.%s' "$p" "$d"; done; done)"
sed '1d' Ecoli.allCas9.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed "1i $dep2_header" | cut -d ' ' -f 1-321 > Ecoli.allCas9_dep2.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
sed '1d' Ecoli.allCas9.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Ecoli.allCas9.sequence.txt
# salloc -A SYB105 -N 2 -t 4:00:00
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.allCas9.tensors.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensors.melt.txt", quote=F, row.names=F, sep="\t")
library(tidyr)
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/genome")
# sed '1d' GCF_000005845.2_ASM584v2_genomic.gff | sed '1d' | sed '1d' | sed '1d' | sed '1d' | sed '1d' | sed '1d' > GCF_000005845.2_ASM584v2_genomic.txt
annotation <- read.delim("GCF_000005845.2_ASM584v2_genomic.txt", header=F, sep="\t")
gene <- subset(annotation, annotation$V3 == "gene")
gene.id <- separate(gene, V9, c("id1", "id2"), sep="EcoGene:")
gene.id$gene_id <- substr(gene.id$id2, 1, 7)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
rna <- read.delim("GSM2267479_Sample-1.genes.results.txt", header=T, sep="\t")
rna.id <- left_join(rna, gene.id, by="gene_id")
rna.id.idf <- na.omit(rna.id[,c(8,11,12,1,3:7)])
write.table(rna.id.idf, "GSM2267479.fpkm.coord.txt", quote=F, row.names=F, sep="\t")
# calculate density
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b GSM2267479.fpkm.coord.bed > ecoli.rnaseq.20sliding.bed
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
window <- read.delim("ecoli.rnaseq.20sliding.bed", header=F, sep="\t")
window.df <- window %>% group_by(V1, V2, V3) %>% mutate(avg.fpkm = mean(V12))
window.uniq <- unique(window.df[,c(1:3,14)])
write.table(window.uniq, "ecoli.rnaseq.average.20sliding.bed", quote=F, row.names=F, sep="\t")
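# The group_by/mutate/unique steps above reduce to one mean value per window. A minimal
# Python sketch of that aggregation, assuming rows of (chrom, start, end, value) taken from
# the bedtools intersect output; window_means is a hypothetical helper and the rows are toy data.

```python
from collections import defaultdict


def window_means(rows):
    """Average the value column over all rows sharing the same (chrom, start, end)."""
    sums = defaultdict(lambda: [0.0, 0])  # window -> [running total, row count]
    for chrom, start, end, value in rows:
        acc = sums[(chrom, start, end)]
        acc[0] += value
        acc[1] += 1
    return {window: total / n for window, (total, n) in sums.items()}


toy_rows = [
    ("chrToy", 0, 20, 2.0),
    ("chrToy", 0, 20, 4.0),  # second gene overlapping the same window
    ("chrToy", 1, 21, 5.0),
]
print(window_means(toy_rows))
```

# Windows that overlap no feature never appear in the intersect output, so they are
# absent here as well and need an explicit fill value before any genome-wide transform.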
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
## GATC motif
## fastaregex
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'GATC' > ecoli.gatc.bed
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b ecoli.gatc.coord.bed > ecoli.gatc.20sliding.bed
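# For a quick cross-check of the motif coordinates, a standard-library sketch that reports
# motif hits as 0-based half-open BED intervals. GATC is its own reverse complement, so
# scanning one strand finds every site. motif_to_bed is a hypothetical helper and is not a
# reimplementation of fastaRegexFinder.py.

```python
import re


def motif_to_bed(chrom, seq, motif="GATC"):
    """Return (chrom, start, end, match) for every occurrence, overlaps included."""
    # The lookahead lets overlapping occurrences be reported for other motifs.
    pattern = re.compile("(?=(%s))" % re.escape(motif), re.IGNORECASE)
    return [
        (chrom, m.start(1), m.end(1), m.group(1).upper())
        for m in pattern.finditer(seq)
    ]


for hit in motif_to_bed("chrToy", "AAGATCGGATC"):
    print("\t".join(str(x) for x in hit))
```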
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("GSM3264688_Ecoli.gff", header=F, sep="\t")
df2 <- df[5:nrow(df),]
library(dplyr)
library(tidyr)
df.sep <- df2 %>% separate(V9, c("coverage", "context", "IPD"), sep=";")
df.ipd <- df.sep %>% separate(IPD, c("IPD", "IPD.value"), sep="=")
df.ipd$chr <- "NC_000913.3"
df.coord <- df.ipd[,c(13,4,5,12)]
write.table(df.coord, "GSM3264688_Ecoli.coord.bed", quote=F, row.names=F, col.names=F, sep="\t")
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b GSM3264688_Ecoli.coord.bed > ecoli.ipd.20sliding.bed
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
window <- read.delim("ecoli.ipd.20sliding.bed", header=F, sep="\t")
window.df <- window %>% group_by(V1, V2, V3) %>% mutate(avg.fpkm = mean(V7))
write.table(window.df, "ecoli.ipd.average.20sliding.bed", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
window.df <- read.delim("ecoli.ipd.average.20sliding.bed", header=T, sep="\t")
window.uniq <- unique(window.df[,c(1:3,9)])
write.table(window.uniq, "ecoli.ipd.average.20sliding.bed", quote=F, row.names=F, sep="\t")
https://www.synthego.com/guide/how-to-use-crispr/pam-sequence
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# generate fasta file of NGG sequences and blast to reference
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
# vim NGG.PAM.fasta
## fastaRegexFinder
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'AGG' > AGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'TGG' > TGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'CGG' > CGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'GGG' > GGG.PAM.txt
cat AGG.PAM.txt TGG.PAM.txt CGG.PAM.txt GGG.PAM.txt > NGG.PAM.txt
sort -k 1,1 -k 2,2n NGG.PAM.txt > NGG.PAM.sorted.bed
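# The four searches above can be sketched as a single pass: NGG on the forward strand plus
# CCN on the forward sequence, which corresponds to NGG on the reverse strand. This assumes
# reverse-strand PAMs should be reported by their forward-strand text (CCT, CCG, CCA, CCC),
# which is how the PAM codes are interpreted later in this notebook. find_pams is a
# hypothetical helper.

```python
import re

FWD_PAM = re.compile(r"(?=([ACGT]GG))")  # lookahead so overlapping PAMs are kept
REV_PAM = re.compile(r"(?=(CC[ACGT]))")


def find_pams(chrom, seq):
    """Return sorted (chrom, start, end, code, strand) tuples, 0-based half-open."""
    seq = seq.upper()
    hits = [(chrom, m.start(1), m.end(1), m.group(1), "+") for m in FWD_PAM.finditer(seq)]
    hits += [(chrom, m.start(1), m.end(1), m.group(1), "-") for m in REV_PAM.finditer(seq)]
    return sorted(hits, key=lambda h: (h[0], h[1], h[2]))


for hit in find_pams("chrToy", "ACGGGTCCA"):
    print("\t".join(str(x) for x in hit))
```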
# intersect with sliding windows in the genome to get density for DWT
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b NGG.PAM.sorted.bed > NGG.PAM.20bp.sliding.windows.bed
# closest with gRNAs to identify distance (downstream, strand)
cut -f 1-4 sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > sgRNA.coord.bed
awk '{print $0"\t""+"}' sgRNA.coord.bed > sgRNA.coord.strand.txt
bedtools closest -a sgRNA.coord.strand.txt -b NGG.PAM.sorted.bed -io -iu -D a > ecoli.sgRNA.closestPAM.bed
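# A sketch of the "nearest downstream, non-overlapping PAM" lookup for plus-strand
# features, using a sorted list of PAM start coordinates and bisect. The gap is counted
# here as pam_start - feature_end on 0-based half-open coordinates; bedtools may count the
# distance differently, so compare a few rows before substituting one for the other.
# closest_downstream is a hypothetical helper.

```python
import bisect


def closest_downstream(feature_end, pam_starts):
    """pam_starts must be sorted ascending. Return (pam_start, gap) or None."""
    # First PAM starting at or after the feature end, i.e. not overlapping it.
    i = bisect.bisect_left(pam_starts, feature_end)
    if i == len(pam_starts):
        return None
    return pam_starts[i], pam_starts[i] - feature_end


toy_pam_starts = [10, 100, 120, 150]
print(closest_downstream(120, toy_pam_starts))  # PAM directly adjacent to the feature
print(closest_downstream(121, toy_pam_starts))
print(closest_downstream(200, toy_pam_starts))  # nothing downstream
```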
# determine if N = A,C,T, or G
## feature: PAM.A.raw, PAM.C.raw, PAM.T.raw, PAM.G.raw <-- binary
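# A minimal sketch of the binary PAM encoding described above: the N of NGG becomes four
# 0/1 features. Reverse-strand codes (CCN on the forward sequence) are reverse-complemented
# first, so CCT maps to AGG, CCG to CGG, CCA to TGG and CCC to GGG. pam_onehot is a
# hypothetical helper name.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pam_onehot(pam_code):
    """Map a 3-nt PAM code (NGG or CCN) to {'PAM.A': 0/1, 'PAM.C': ..., ...}."""
    code = pam_code.upper()
    if code.startswith("CC"):
        # Reverse-strand PAM written on the forward strand: take the reverse complement.
        code = code.translate(COMPLEMENT)[::-1]
    if len(code) != 3 or not code.endswith("GG") or code[0] not in "ACGT":
        raise ValueError("not an NGG/CCN PAM code: %r" % pam_code)
    return {"PAM." + base: int(code[0] == base) for base in "ACTG"}


print(pam_onehot("AGG"))
print(pam_onehot("CCA"))
```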
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
bedtools closest -a sgRNA.coord.bed -b genome/GCF_000005845.2_ASM584v2_genomic.gene.gff -D b > sgRNA.gene.closest.bed
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
structure <- read.delim("Ecoli.allCas9.structure.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Ecoli.allCas9.nuc.count.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
onehot.ind1 <- read.delim("Ecoli.allCas9_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Ecoli.allCas9_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305]
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "df.id.test.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.allCas9.tensors.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df.id <- read.delim("df.id.test.txt", header=T, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
write.table(tensor.df, "Ecoli.allCas9.raw.onehot.tensor.txt", quote=F, row.names=F, sep="\t")
df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Ecoli.allCas9.raw.onehot.tensor.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast)
#
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensor.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
#
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.pam <- read.table("ecoli.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.df$id <- "Cas9"
sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468
write.table(df.dcast.na, "ecoli.sgRNA.pam.dcast.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df.dcast <- read.delim("ecoli.sgRNA.pam.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
# 40468
write.table(df.location, "Ecoli.allCas9.raw.onehot.tensor.pam.dcast.na.txt", quote=F, row.names=F, sep="\t")
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.genes <- read.table("sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.df$id <- "Cas9"
sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468
write.table(df.dcast.na, "ecoli.sgRNA.location.dcast.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df.dcast <- read.delim("ecoli.sgRNA.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.pam.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
#
write.table(df.location, "Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# regional features: Haar MODWT of the 20bp sliding-window tracks
salloc -A SYB105 -N 2 -p gpu -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
library(wmtsa)
library(data.table)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
gatc <- read.table("ecoli.gatc.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
ipd <- read.table("ecoli.ipd.average.20sliding.bed", header=T, sep="\t", stringsAsFactors = F)
gene <- read.table("ecoli.gene.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
structure <- read.table("ecoli.20sliding.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.table("nucleotide_counts_20sliding_temp.txt", header=T, sep="\t", stringsAsFactors = F)
rnaseq <- read.table("ecoli.rnaseq.average.20sliding.bed", header=T, sep="\t", stringsAsFactors = F)
pam <- read.table("NGG.PAM.20bp.sliding.windows.bed", header=F, sep="\t", stringsAsFactors = F)
window <- read.table("ecoli.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "id", "seq", "id2", "cut.score", "gid", "change.val", "quality")
score.df <- score[,c(1:4,8)]
gatc.bin <- gatc %>% group_by(V1, V2, V3) %>% mutate(gatc.count = n())
gatc.count <- unique(gatc.bin[,c(1:3,8)])
gene.bin <- gene %>% group_by(V1, V2, V3) %>% mutate(gene.count = n())
gene.count <- unique(gene.bin[,c(1:3,14)])
pam.bin <- pam %>% group_by(V1, V2, V3) %>% mutate(pam.count = n())
pam.count <- unique(pam.bin[,c(1:3,12)])
window.v <- window[,1:3]
colnames(window.v) <- c("V1", "V2", "V3")
gatc.win <- left_join(window.v, gatc.count, by=c("V1", "V2", "V3"))
gatc.win[is.na(gatc.win)] <- 0
gene.win <- left_join(window.v, gene.count, by=c("V1", "V2", "V3"))
gene.win[is.na(gene.win)] <- 0
ipd.win <- left_join(window.v, ipd, by=c("V1", "V2", "V3"))
ipd.win[is.na(ipd.win)] <- 0
rnaseq.win <- left_join(window.v, rnaseq, by=c("V1", "V2", "V3"))
rnaseq.win[is.na(rnaseq.win)] <- 0
pam.win <- left_join(window.v, pam.count, by=c("V1", "V2", "V3"))
pam.win[is.na(pam.win)] <- 0
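# The count-then-zero-fill pattern above (group_by + n(), left_join onto every window, NA -> 0)
# can be sketched in Python as below. This is an illustration only: count_per_window is a
# hypothetical name, and it assumes each hit row starts with (chr, start, end) of the window
# it overlaps, as in the bedtools-style tables read in above.

```python
from collections import Counter

def count_per_window(hits, windows):
    """Count hit rows per window; windows with no hit get 0.

    hits    : iterable of rows whose first three fields are (chr, start, end)
    windows : list of (chr, start, end) tuples, in genome order
    """
    counts = Counter(tuple(row[:3]) for row in hits)
    # keep the window order so the result can be fed to a wavelet transform as a signal
    return [counts.get(tuple(win), 0) for win in windows]
```

# Example: two GATC hits in the first window and one in the third give [2, 0, 1].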
gene.df <- gene.win$gene.count
gatc.df <- gatc.win$gatc.count
pam.df <- pam.win$pam.count
ipd.df <- ipd.win[,4]
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
rna.df <- rnaseq.win[,4]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/modwt")
temp.modwt <- wavMODWT(temp.df, wavelet="haar")
temp.modwt.df <- as.matrix(temp.modwt)
temp.modwt.label <- data.frame(label = row.names(temp.modwt.df), temp.modwt.df)
temp.modwt.dt <- as.data.table(temp.modwt.label)
temp.modwt.name <- temp.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(temp.modwt.name) <- c("label", "temp.dwt", "scale", "window")
write.table(temp.modwt.name, "temp.modwt.haar.txt", quote=F, row.names=F, sep="\t")
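# For reference, a minimal pure-Python sketch of a Haar MODWT with circular boundaries
# (filters (1/2, -1/2) and (1/2, 1/2), lag 2^(j-1) at level j). haar_modwt is a hypothetical
# name. This is NOT the wmtsa implementation: the sign convention, coefficient alignment,
# boundary handling, and default number of levels in wavMODWT may differ, so use it only to
# see what the d1, d2, ... (detail) and s (smooth) rows represent.

```python
def haar_modwt(x, n_levels):
    """Return ([W_1, ..., W_J], V_J) for a circular Haar MODWT of the sequence x."""
    n = len(x)
    v = [float(val) for val in x]
    details = []
    for level in range(1, n_levels + 1):
        lag = 2 ** (level - 1)
        # (t - lag) % n wraps around the start of the signal (circular boundary)
        w_next = [(v[t] - v[(t - lag) % n]) / 2.0 for t in range(n)]
        v_next = [(v[t] + v[(t - lag) % n]) / 2.0 for t in range(n)]
        details.append(w_next)  # detail (wavelet) coefficients at this scale
        v = v_next              # smooth (scaling) coefficients passed to the next level
    return details, v
```

# Unlike a decimated DWT, every level keeps one coefficient per window, which is why
# each scale can be joined back to sgRNAs by window index.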
gc.modwt <- wavMODWT(gc.df, wavelet="haar")
gc.modwt.df <- as.matrix(gc.modwt)
gc.modwt.label <- data.frame(label = row.names(gc.modwt.df), gc.modwt.df)
gc.modwt.dt <- as.data.table(gc.modwt.label)
gc.modwt.name <- gc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gc.modwt.name) <- c("label", "gc.dwt", "scale", "window")
write.table(gc.modwt.name, "gc.modwt.haar.txt", quote=F, row.names=F, sep="\t")
structure.modwt <- wavMODWT(structure.df, wavelet="haar")
structure.modwt.df <- as.matrix(structure.modwt)
structure.modwt.label <- data.frame(label = row.names(structure.modwt.df), structure.modwt.df)
structure.modwt.dt <- as.data.table(structure.modwt.label)
structure.modwt.name <- structure.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(structure.modwt.name) <- c("label", "structure.dwt", "scale", "window")
write.table(structure.modwt.name, "structure.modwt.haar.txt", quote=F, row.names=F, sep="\t")
rna.modwt <- wavMODWT(rna.df, wavelet="haar")
rna.modwt.df <- as.matrix(rna.modwt)
rna.modwt.label <- data.frame(label = row.names(rna.modwt.df), rna.modwt.df)
rna.modwt.dt <- as.data.table(rna.modwt.label)
rna.modwt.name <- rna.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(rna.modwt.name) <- c("label", "rna.dwt", "scale", "window")
write.table(rna.modwt.name, "rnaseq.modwt.haar.txt", quote=F, row.names=F, sep="\t")
ipd.modwt <- wavMODWT(ipd.df, wavelet="haar")
ipd.modwt.df <- as.matrix(ipd.modwt)
ipd.modwt.label <- data.frame(label = row.names(ipd.modwt.df), ipd.modwt.df)
ipd.modwt.dt <- as.data.table(ipd.modwt.label)
ipd.modwt.name <- ipd.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(ipd.modwt.name) <- c("label", "ipd.dwt", "scale", "window")
write.table(ipd.modwt.name, "ipd.modwt.haar.txt", quote=F, row.names=F, sep="\t")
gene.modwt <- wavMODWT(gene.df, wavelet="haar")
gene.modwt.df <- as.matrix(gene.modwt)
gene.modwt.label <- data.frame(label = row.names(gene.modwt.df), gene.modwt.df)
gene.modwt.dt <- as.data.table(gene.modwt.label)
gene.modwt.name <- gene.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gene.modwt.name) <- c("label", "gene.dwt", "scale", "window")
write.table(gene.modwt.name, "gene.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
gatc.modwt <- wavMODWT(gatc.df, wavelet="haar")
gatc.modwt.df <- as.matrix(gatc.modwt)
gatc.modwt.label <- data.frame(label = row.names(gatc.modwt.df), gatc.modwt.df)
gatc.modwt.dt <- as.data.table(gatc.modwt.label)
gatc.modwt.name <- gatc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gatc.modwt.name) <- c("label", "gatc.dwt", "scale", "window")
write.table(gatc.modwt.name, "gatc.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
pam.modwt <- wavMODWT(pam.df, wavelet="haar")
pam.modwt.df <- as.matrix(pam.modwt)
pam.modwt.label <- data.frame(label = row.names(pam.modwt.df), pam.modwt.df)
pam.modwt.dt <- as.data.table(pam.modwt.label)
pam.modwt.name <- pam.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(pam.modwt.name) <- c("label", "pam.dwt", "scale", "window")
write.table(pam.modwt.name, "pam.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/modwt")
temp.modwt.name <- read.delim("temp.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gc.modwt.name <- read.delim("gc.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
structure.modwt.name <- read.delim("structure.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
rna.modwt.name <- read.delim("rnaseq.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gene.modwt.name <- read.delim("gene.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gatc.modwt.name <- read.delim("gatc.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
ipd.modwt.name <- read.delim("ipd.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
pam.modwt.name <- read.delim("pam.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
window <- read.table("ecoli.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "id", "seq", "id2", "cut.score", "gid", "change.val", "quality")
score.df <- score[,c(1:4,8)]
colnames(window) <- c("chr", "start", "end")
window$window <- seq.int(nrow(window))
window$window <- as.character(window$window-1)
window$start <- as.numeric(window$start)
window$end <- as.numeric(window$end - 1)
window.score.df <- left_join(score.df, window, by=c("chr", "start", "end"))
window.score.df$window <- as.integer(window.score.df$window)
window.score.temp <- left_join(window.score.df, temp.modwt.name[,c(3,4,2)], by="window")
window.temp.gc <- left_join(window.score.temp, gc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure <- left_join(window.temp.gc, structure.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna <- left_join(window.temp.gc.structure, rna.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna.gene <- left_join(window.temp.gc.structure.rna, gene.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna.gene.gatc <- left_join(window.temp.gc.structure.rna.gene, gatc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna.gene.gatc.ipd <- left_join(window.temp.gc.structure.rna.gene.gatc, ipd.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna.gene.gatc.ipd.pam <- left_join(window.temp.gc.structure.rna.gene.gatc.ipd, pam.modwt.name[,c(3,4,2)], by=c("window", "scale"))
nrow(window.temp.gc.structure.rna.gene.gatc.ipd.pam)
#
window.temp.gc.structure.rna.gene.gatc.ipd.pam.sgRNA <- subset(window.temp.gc.structure.rna.gene.gatc.ipd.pam, !is.na(cut.score))
nrow(window.temp.gc.structure.rna.gene.gatc.ipd.pam.sgRNA)
#
write.table(window.temp.gc.structure.rna.gene.gatc.ipd.pam.sgRNA, "ecoli.20sliding.exact.DWT.haar.txt", quote=F, row.names=F, sep="\t")
df.melt <- melt(window.temp.gc.structure.rna.gene.gatc.ipd.pam.sgRNA[,c(4,5,7:15)], id=c("cut.score", "scale", "sgRNA"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNA", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNA + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
#
write.table(df.dcast.na, "ecoli.20sliding.exact.DWT.haar.dcast.txt", quote=F, row.names=F, sep="\t")
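# The melt -> unite -> dcast -> na.omit sequence above turns long (sgRNA, scale, feature, value)
# records into one row per sgRNA with one column per feature+scale. A rough Python sketch,
# assuming missing values arrive as None; cast_wide is a hypothetical name and the
# cut.score column is left out for brevity.

```python
from collections import defaultdict

def cast_wide(records):
    """records: iterable of (sgRNA, scale, feature, value); value may be None."""
    cells = defaultdict(lambda: defaultdict(list))
    for sgrna, scale, feature, value in records:
        if value is None:
            continue  # like na.omit() on the melted table
        # like unite(feature.scale, c(variable, scale), sep = "")
        cells[sgrna][feature + str(scale)].append(value)
    # like dcast(..., fun.aggregate = mean): average duplicate cells
    wide = {sg: {col: sum(vals) / len(vals) for col, vals in cols.items()}
            for sg, cols in cells.items()}
    # like the final na.omit(): keep only sgRNAs that have every column
    all_cols = set(col for cols in wide.values() for col in cols)
    return {sg: cols for sg, cols in wide.items() if set(cols) == all_cols}
```

# sgRNAs missing any feature/scale combination are dropped, which is where the row count can shrink.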
# combine regional DWT with other features
library(tidyr)
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df.dcast.na <- read.delim("ecoli.20sliding.exact.DWT.haar.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df.dcast.sep <- df.dcast.na %>% separate(sgRNA, c("sgRNA", "ID"), sep="_")
df.dcast.dwt <- df.dcast.sep[,c(4:ncol(df.dcast.sep))]
colnames(df.dcast.dwt) <- paste0('sgRNA_', colnames(df.dcast.dwt))
df.dcast <- cbind(df.dcast.sep[,1:3], df.dcast.dwt)
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- df %>% separate(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
nrow(df.sep)
# 40468
df.cas9 <- subset(df.sep, df.sep$type == "Cas9")
# 40468
df.sep.region <- inner_join(df.cas9[,c(1:3,1658,5:1651,1653:1657,1659)], df.dcast[,c(1,2,4:ncol(df.dcast))], by=c("sgRNA", "ID"))
df.sep.region.id <- df.sep.region %>% unite(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
nrow(df.sep.region.id)
# 40468
write.table(df.sep.region.id, "ecoli.20sliding.raw.onehot.tensor.dwt.dcast.txt", quote=F, row.names=F, sep="\t")
### kmer positional encoding
import os, sys
import numpy as np
onehot_dict={
'A':'1000',
'C':'0100',
'T':'0010',
'G':'0001'
}
# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4]+'_dependent1.txt', 'w')
# loop over nucleotide sequences
for idx, line in enumerate(input_file):
    # if first iteration, write title line
    if idx == 0:
        dep_file.writelines(line+': first-order position-dependent features'+ '\n')
    # otherwise encode sequence
    else:
        # split line by tab
        line = line.split('\t')
        # extract sequence (strip the trailing newline, if any)
        seq = line[-1].rstrip('\n')
        # compute position-dependent features as one-hot vectors
        pos_dep = ''.join([onehot_dict[seq[i]] for i in range(len(seq))])
        # write features to file
        dep_file.writelines(line[0] + '\t' + pos_dep + '\n')
    if idx % 10000 == 0:
        print('{0:,}'.format(idx)+' lines processed...')
print('Done!')
input_file.close()
dep_file.close()
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py
#python file.py data.txt
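# The hand-written one-hot dictionaries in these kmer scripts can also be generated for any k.
# A minimal sketch, assuming the same A, C, T, G base order as the dictionaries here;
# build_onehot_dict and encode_positional are hypothetical names, not part of the
# kmer*_positional_encode.py scripts.

```python
from itertools import product

ALPHABET = "ACTG"  # same base order as the hand-written dictionaries

def build_onehot_dict(k):
    """Map every k-mer over ACTG to a one-hot string of length 4**k."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    size = len(kmers)
    return {kmer: "0" * i + "1" + "0" * (size - i - 1)
            for i, kmer in enumerate(kmers)}

def encode_positional(seq, k):
    """Concatenate the one-hot codes of all overlapping k-mers in seq."""
    table = build_onehot_dict(k)
    # a sequence of length L has L - k + 1 overlapping k-mers
    return "".join(table[seq[i:i + k]] for i in range(len(seq) - k + 1))
```

# For a 20 nt guide this gives 20*4, 19*16, 18*64 and 17*256 characters for k = 1..4.
# Like the scripts, it raises KeyError on any base outside ACTG (e.g. N or lowercase).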
import os, sys
import numpy as np
onehot_dict = {
'AA':'1000000000000000',
'AC':'0100000000000000',
'AT':'0010000000000000',
'AG':'0001000000000000',
'CA':'0000100000000000',
'CC':'0000010000000000',
'CT':'0000001000000000',
'CG':'0000000100000000',
'TA':'0000000010000000',
'TC':'0000000001000000',
'TT':'0000000000100000',
'TG':'0000000000010000',
'GA':'0000000000001000',
'GC':'0000000000000100',
'GT':'0000000000000010',
'GG':'0000000000000001'
}
# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4]+'_dependent2.txt', 'w')
# loop over nucleotide sequences
for idx, line in enumerate(input_file):
    # if first iteration, write title line
    if idx == 0:
        dep_file.writelines(line+': second-order position-dependent features'+ '\n')
    # otherwise encode sequence
    else:
        # split line by tab
        line = line.split('\t')
        # extract sequence (strip the trailing newline, if any)
        seq = line[-1].rstrip('\n')
        # compute position-dependent features as one-hot vectors
        pos_dep = ''.join([onehot_dict[seq[i:i+2]] for i in range(len(seq)-1)])
        # write features to file
        dep_file.writelines(line[0] + '\t' + pos_dep + '\n')
    if idx % 10000 == 0:
        print('{0:,}'.format(idx)+' lines processed...')
print('Done!')
input_file.close()
dep_file.close()
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py
#python file.py data.txt
import os, sys
import numpy as np
onehot_dict = {
'AAA':'1000000000000000000000000000000000000000000000000000000000000000',
'AAC':'0100000000000000000000000000000000000000000000000000000000000000',
'AAT':'0010000000000000000000000000000000000000000000000000000000000000',
'AAG':'0001000000000000000000000000000000000000000000000000000000000000',
'ACA':'0000100000000000000000000000000000000000000000000000000000000000',
'ACC':'0000010000000000000000000000000000000000000000000000000000000000',
'ACT':'0000001000000000000000000000000000000000000000000000000000000000',
'ACG':'0000000100000000000000000000000000000000000000000000000000000000',
'ATA':'0000000010000000000000000000000000000000000000000000000000000000',
'ATC':'0000000001000000000000000000000000000000000000000000000000000000',
'ATT':'0000000000100000000000000000000000000000000000000000000000000000',
'ATG':'0000000000010000000000000000000000000000000000000000000000000000',
'AGA':'0000000000001000000000000000000000000000000000000000000000000000',
'AGC':'0000000000000100000000000000000000000000000000000000000000000000',
'AGT':'0000000000000010000000000000000000000000000000000000000000000000',
'AGG':'0000000000000001000000000000000000000000000000000000000000000000',
'CAA':'0000000000000000100000000000000000000000000000000000000000000000',
'CAC':'0000000000000000010000000000000000000000000000000000000000000000',
'CAT':'0000000000000000001000000000000000000000000000000000000000000000',
'CAG':'0000000000000000000100000000000000000000000000000000000000000000',
'CCA':'0000000000000000000010000000000000000000000000000000000000000000',
'CCC':'0000000000000000000001000000000000000000000000000000000000000000',
'CCT':'0000000000000000000000100000000000000000000000000000000000000000',
'CCG':'0000000000000000000000010000000000000000000000000000000000000000',
'CTA':'0000000000000000000000001000000000000000000000000000000000000000',
'CTC':'0000000000000000000000000100000000000000000000000000000000000000',
'CTT':'0000000000000000000000000010000000000000000000000000000000000000',
'CTG':'0000000000000000000000000001000000000000000000000000000000000000',
'CGA':'0000000000000000000000000000100000000000000000000000000000000000',
'CGC':'0000000000000000000000000000010000000000000000000000000000000000',
'CGT':'0000000000000000000000000000001000000000000000000000000000000000',
'CGG':'0000000000000000000000000000000100000000000000000000000000000000',
'TAA':'0000000000000000000000000000000010000000000000000000000000000000',
'TAC':'0000000000000000000000000000000001000000000000000000000000000000',
'TAT':'0000000000000000000000000000000000100000000000000000000000000000',
'TAG':'0000000000000000000000000000000000010000000000000000000000000000',
'TCA':'0000000000000000000000000000000000001000000000000000000000000000',
'TCC':'0000000000000000000000000000000000000100000000000000000000000000',
'TCT':'0000000000000000000000000000000000000010000000000000000000000000',
'TCG':'0000000000000000000000000000000000000001000000000000000000000000',
'TTA':'0000000000000000000000000000000000000000100000000000000000000000',
'TTC':'0000000000000000000000000000000000000000010000000000000000000000',
'TTT':'0000000000000000000000000000000000000000001000000000000000000000',
'TTG':'0000000000000000000000000000000000000000000100000000000000000000',
'TGA':'0000000000000000000000000000000000000000000010000000000000000000',
'TGC':'0000000000000000000000000000000000000000000001000000000000000000',
'TGT':'0000000000000000000000000000000000000000000000100000000000000000',
'TGG':'0000000000000000000000000000000000000000000000010000000000000000',
'GAA':'0000000000000000000000000000000000000000000000001000000000000000',
'GAC':'0000000000000000000000000000000000000000000000000100000000000000',
'GAT':'0000000000000000000000000000000000000000000000000010000000000000',
'GAG':'0000000000000000000000000000000000000000000000000001000000000000',
'GCA':'0000000000000000000000000000000000000000000000000000100000000000',
'GCC':'0000000000000000000000000000000000000000000000000000010000000000',
'GCT':'0000000000000000000000000000000000000000000000000000001000000000',
'GCG':'0000000000000000000000000000000000000000000000000000000100000000',
'GTA':'0000000000000000000000000000000000000000000000000000000010000000',
'GTC':'0000000000000000000000000000000000000000000000000000000001000000',
'GTT':'0000000000000000000000000000000000000000000000000000000000100000',
'GTG':'0000000000000000000000000000000000000000000000000000000000010000',
'GGA':'0000000000000000000000000000000000000000000000000000000000001000',
'GGC':'0000000000000000000000000000000000000000000000000000000000000100',
'GGT':'0000000000000000000000000000000000000000000000000000000000000010',
'GGG':'0000000000000000000000000000000000000000000000000000000000000001'
}
# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4]+'_dependent3.txt', 'w')
# loop over nucleotide sequences
for idx, line in enumerate(input_file):
    # if first iteration, write title line
    if idx == 0:
        dep_file.writelines(line+': third-order position-dependent features'+ '\n')
    # otherwise encode sequence
    else:
        # split line by tab
        line = line.split('\t')
        # extract sequence (strip the trailing newline, if any)
        seq = line[-1].rstrip('\n')
        # compute position-dependent features as one-hot vectors
        pos_dep = ''.join([onehot_dict[seq[i:i+3]] for i in range(len(seq)-2)])
        # write features to file
        dep_file.writelines(line[0] + '\t' + pos_dep + '\n')
    if idx % 10000 == 0:
        print('{0:,}'.format(idx)+' lines processed...')
print('Done!')
input_file.close()
dep_file.close()
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py
#python file.py data.txt
import os, sys
import numpy as np
onehot_dict = {
'AAAA':'1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAC':'0100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAT':'0010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAG':'0001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACA':'0000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACC':'0000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACT':'0000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACG':'0000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATA':'0000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATC':'0000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATT':'0000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATG':'0000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGA':'0000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGC':'0000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGT':'0000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGG':'0000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAA':'0000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAC':'0000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAT':'0000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAG':'0000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCA':'0000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCC':'0000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCT':'0000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCG':'0000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTA':'0000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTC':'0000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTT':'0000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTG':'0000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGA':'0000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGC':'0000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGT':'0000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGG':'0000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAA':'0000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAC':'0000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAT':'0000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAG':'0000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCA':'0000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCC':'0000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCT':'0000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCG':'0000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTA':'0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTC':'0000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTT':'0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTG':'0000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGA':'0000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGC':'0000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGT':'0000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGG':'0000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAA':'0000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAC':'0000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAT':'0000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAG':'0000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCA':'0000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCC':'0000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCT':'0000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCG':'0000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTA':'0000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTC':'0000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTT':'0000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTG':'0000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGA':'0000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGC':'0000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGT':'0000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGG':'0000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
# Keys beginning with C. Bases are ordered A, C, T, G, and each successive key sets the next position of the one-hot string.
'CAAA':'0000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAC':'0000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAT':'0000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAG':'0000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACA':'0000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACC':'0000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACT':'0000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACG':'0000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATA':'0000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATC':'0000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATT':'0000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATG':'0000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
# Keys beginning with T. Same A, C, T, G ordering, with the set position continuing on from the C block.
'TAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000',
'TGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000',
'TGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000',
'TGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000',
'TGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000',
'TGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000',
'TGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000',
'TGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000',
'GAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000',
'GAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000',
'GAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000',
'GAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000',
'GACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000',
'GACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000',
'GACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000',
'GACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000',
'GATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000',
'GATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000',
'GATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000',
'GATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000',
'GAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000',
'GAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000',
'GAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000',
'GAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000',
'GCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000',
'GCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000',
'GCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000',
'GCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000',
'GCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000',
'GCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000',
'GCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000',
'GCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000',
'GCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000',
'GCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000',
'GCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000',
'GCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000',
'GCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000',
'GCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000',
'GCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000',
'GCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000',
'GTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000',
'GTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000',
'GTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000',
'GTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000',
'GTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000',
'GTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000',
'GTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000',
'GTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000',
'GTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000',
'GTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000',
'GTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000',
'GTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000',
'GTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000',
'GTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000',
'GTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000',
'GTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000',
'GGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000',
'GGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000',
'GGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000',
'GGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000',
'GGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000',
'GGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000',
'GGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000',
'GGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000',
'GGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000',
'GGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000',
'GGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000',
'GGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000',
'GGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000',
'GGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100',
'GGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010',
'GGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001'
}
# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4]+'_dependent4.txt', 'w')
# loop over nucleotide sequences
for idx, line in enumerate(input_file):
    # if first iteration, write title line
    if idx == 0:
        # strip the trailing newline so the suffix stays on the title line
        dep_file.writelines(line.rstrip('\n')+': fourth-order position-dependent features'+ '\n')
    # otherwise encode sequence
    else:
        # split line by tab
        line = line.split('\t')
        # extract sequence (rstrip is safe even if the last line has no \n)
        seq = line[-1].rstrip('\n')
        # compute position-dependent features as one-hot vectors
        pos_dep = ''.join([onehot_dict[seq[i:i+4]] for i in range(len(seq)-3)])
        # write features to file
        dep_file.writelines(line[0] + '\t' + pos_dep + '\n')
    if idx % 10000 == 0:
        print('{0:,}'.format(idx)+' lines processed...')
print('Done!')
input_file.close()
dep_file.close()
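# Sketch (not part of the original scripts): the hard-coded one-hot tables can also be
# generated programmatically. This assumes the nucleotide order A, C, T, G, which is what
# the visible keys follow (e.g. 'GGGG' has its 1 in the last of 256 positions). The names
# build_onehot_dict and encode_positions are hypothetical helpers introduced for illustration.

```python
from itertools import product

# Nucleotide order inferred from the hard-coded keys (A, C, T, G); an assumption.
ALPHABET = "ACTG"


def build_onehot_dict(k, alphabet=ALPHABET):
    """Map every k-mer to a one-hot string of length len(alphabet)**k."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    n = len(kmers)
    return {kmer: "0" * i + "1" + "0" * (n - i - 1) for i, kmer in enumerate(kmers)}


def encode_positions(seq, table, k):
    """Concatenate the one-hot strings of every overlapping k-mer in seq."""
    return "".join(table[seq[i:i + k]] for i in range(len(seq) - k + 1))


onehot4 = build_onehot_dict(4)
onehot5 = build_onehot_dict(5)
# A 20-nt guide has 17 overlapping 4-mers, so the feature string has 17 * 256 characters.
features = encode_positions("ACGT" * 5, onehot4, 4)
print(len(onehot4), len(onehot5), len(features))
```

# Swapping in such a generated table for the literal onehot_dict would leave the rest of the
# encoding loop unchanged; sequences containing characters outside A/C/T/G (e.g. N) would
# still raise a KeyError, as with the hard-coded table.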
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py
#python file.py data.txt
import os, sys
import numpy as np
onehot_dict = {
'AAAAA':'1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAAC':'0100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAAT':'0010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAAG':'0001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAACA':'0000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAACC':'0000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAACT':'0000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAACG':'0000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAATA':'0000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAATC':'0000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAATT':'0000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAATG':'0000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAGA':'0000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAGC':'0000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAGT':'0000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAGG':'0000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACAA':'0000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACAC':'0000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACAT':'0000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACAG':'0000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACCA':'0000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACCC':'0000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACCT':'0000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACCG':'0000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACTA':'0000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACTC':'0000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACTT':'0000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACTG':'0000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACGA':'0000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACGC':'0000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACGT':'0000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACGG':'0000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATAA':'0000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATAC':'0000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATAT':'0000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATAG':'0000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATCA':'0000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATCC':'0000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATCT':'0000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATCG':'0000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATTA':'0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATTC':'0000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATTT':'0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATTG':'0000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATGA':'0000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATGC':'0000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATGT':'0000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATGG':'0000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGAA':'0000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGAC':'0000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGAT':'0000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGAG':'0000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGCA':'0000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGCC':'0000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGCT':'0000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGCG':'0000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGTA':'0000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGTC':'0000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGTT':'0000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGTG':'0000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGGA':'0000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGGC':'0000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGGT':'0000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGGG':'0000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAAA':'0000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAAC':'0000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAAT':'0000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAAG':'0000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACACA':'0000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACACC':'0000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACACT':'0000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACACG':'0000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACATA':'0000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACATC':'0000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACATT':'0000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACATG':'0000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000',
'AGCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000',
'GTGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000',
'GTGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000',
'GTGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000',
'GTGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000',
'GTGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000',
'GTGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000',
'GTGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000',
'GGAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000',
'GGAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000',
'GGAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000',
'GGAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000',
'GGACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000',
'GGACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000',
'GGACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000',
'GGACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000',
'GGATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000',
'GGATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000',
'GGATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000',
'GGATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000',
'GGAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000',
'GGAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000',
'GGAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000',
'GGAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000',
'GGCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000',
'GGCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000',
'GGCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000',
'GGCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000',
'GGCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000',
'GGCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000',
'GGCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000',
'GGCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000',
'GGCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000',
'GGCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000',
'GGCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000',
'GGCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000',
'GGCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000',
'GGCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000',
'GGCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000',
'GGCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000',
'GGTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000',
'GGTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000',
'GGTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000',
'GGTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000',
'GGTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000',
'GGTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000',
'GGTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000',
'GGTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000',
'GGTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000',
'GGTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000',
'GGTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000',
'GGTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000',
'GGTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000',
'GGTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000',
'GGTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000',
'GGTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000',
'GGGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000',
'GGGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000',
'GGGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000',
'GGGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000',
'GGGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000',
'GGGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000',
'GGGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000',
'GGGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000',
'GGGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000',
'GGGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000',
'GGGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000',
'GGGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000',
'GGGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000',
'GGGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100',
'GGGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010',
'GGGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001'
}
# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4]+'_dependent5.txt', 'w')
# loop over nucleotide sequences
for idx, line in enumerate(input_file):
    # if first iteration, write title line
    # (line keeps its trailing \n, so the title spans two lines in the output file)
    if idx == 0:
        dep_file.writelines(line+': fifth-order position-dependent features'+ '\n')
    # otherwise encode sequence
    else:
        # split line by tab
        line = line.split('\t')
        # extract sequence (also remove \n)
        seq = line[-1][:-1]
        # compute position-dependent features: one one-hot vector per overlapping 5-mer
        # (onehot_dict is keyed by 5-mers, so look up seq[i:i+5], not single bases)
        pos_dep = ''.join([onehot_dict[seq[i:i+5]] for i in range(len(seq)-4)])
        # write features to file
        dep_file.writelines(line[0] + '\t' + pos_dep + '\n')
    if idx % 10000 == 0:
        print('{0:,}'.format(idx)+' lines processed...')
print('Done!')
input_file.close()
dep_file.close()
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer5_positional_encode.py
#python file.py data.txt
def kmer2onehot(kmer, letters='ACTG'):
    idx = 0
    onehot = '0' * len(letters)**len(kmer)
    # read the k-mer as a base-len(letters) number, rightmost base = lowest digit
    for position, mer in enumerate(kmer[::-1]):
        idx += letters.index(mer) * len(letters)**(position)
    # set the single '1' at that index
    onehot = onehot[:idx] + '1' + onehot[idx+1:]
    return onehot
--> testing methods in python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# conda install -c conda-forge more-itertools
python
from itertools import product
alphabet = 'ACTG'
# all 4**5 = 1024 possible 5-mers
keywords = [''.join(i) for i in product(alphabet, repeat = 5)]
print(keywords)
def kmer2onehot(kmer, letters='ACTG'):
    idx = 0
    onehot = '0' * len(letters)**len(kmer)
    for position, mer in enumerate(kmer[::-1]):
        idx += letters.index(mer) * len(letters)**(position)
    onehot = onehot[:idx] + '1' + onehot[idx+1:]
    return onehot
# first attempt, kept for reference: it rebinds the name kmer2onehot to a string,
# so the call fails on the second iteration
# onehot = []
# for i in keywords:
#     kmer2onehot = kmer2onehot(i, letters='ACTG')
#     #print("'" + i + "'" + ":" + "'" + onehot + "',")
#     onehot.append("'" + i + "'" + ":" + "'" + kmer2onehot + "',")
# build the 5-mer -> one-hot lookup directly as a dict
onehot_dict = {}
for kmer in keywords:
    onehot_dict[kmer] = kmer2onehot(kmer, letters='ACTG')
### kmer positional encoding
import sys
# onehot_dict={
# THIS IS WHERE I NEED THE OUTPUT FROM THE PREVIOUS FUNCTION...
# }
# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4]+'_dependent5.txt', 'w')
# loop over nucleotide sequences
for idx, line in enumerate(input_file):
    # if first iteration, write title line
    # (line keeps its trailing \n, so the title spans two lines in the output file)
    if idx == 0:
        dep_file.writelines(line+': fifth-order position-dependent features'+ '\n')
    # otherwise encode sequence
    else:
        # split line by tab
        line = line.split('\t')
        # extract sequence (also remove \n)
        seq = line[-1][:-1]
        # compute position-dependent features: one one-hot vector per overlapping 5-mer
        pos_dep = ''.join([onehot_dict[seq[i:i+5]] for i in range(len(seq)-4)])
        # write features to file
        dep_file.writelines(line[0] + '\t' + pos_dep + '\n')
    if idx % 10000 == 0:
        print('{0:,}'.format(idx)+' lines processed...')
print('Done!')
input_file.close()
dep_file.close()
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer5_positional_encode.py
#python file.py data.txt
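--> possible generalization (sketch only): the kmer1 to kmer8 scripts differ only in k, so one parameterized encoder could replace the hard-coded dictionaries. The helper names build_onehot_dict and encode_sequence below are hypothetical and not part of the existing scripts; kmer2onehot is the same function as above.

```python
from itertools import product


def kmer2onehot(kmer, letters='ACTG'):
    """Return a one-hot string of length len(letters)**len(kmer) with a single '1'."""
    idx = 0
    # read the k-mer as a base-len(letters) number, rightmost base = lowest digit
    for position, mer in enumerate(kmer[::-1]):
        idx += letters.index(mer) * len(letters) ** position
    onehot = '0' * len(letters) ** len(kmer)
    return onehot[:idx] + '1' + onehot[idx + 1:]


def build_onehot_dict(k, letters='ACTG'):
    """Hypothetical helper: map every possible k-mer to its one-hot string."""
    return {''.join(p): kmer2onehot(''.join(p), letters)
            for p in product(letters, repeat=k)}


def encode_sequence(seq, k, onehot_dict):
    """Hypothetical helper: concatenate one-hot strings of all overlapping k-mers."""
    return ''.join(onehot_dict[seq[i:i + k]] for i in range(len(seq) - k + 1))
```

For example, encode_sequence('ACG', 2, build_onehot_dict(2)) concatenates the 16-bit vectors for 'AC' and 'CG'. For large k the dictionary grows as 4**k entries of 4**k characters each, so calling kmer2onehot on the fly may be the more practical choice there.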
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
python ../kmer1_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer2_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer3_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer4_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer5_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer6_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer7_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer8_positional_encode.py Ecoli.allCas9.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
sed '1d' Ecoli.allCas9.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep1.txt
sed '1d' Ecoli.allCas9.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep2.txt
sed '1d' Ecoli.allCas9.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep3.txt
sed '1d' Ecoli.allCas9.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep4.txt
sed '1d' Ecoli.allCas9.noscore_dependent5.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep5.txt
sed '1d' Ecoli.allCas9.noscore_dependent6.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep6.txt
sed '1d' Ecoli.allCas9.noscore_dependent7.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep7.txt
sed '1d' Ecoli.allCas9.noscore_dependent8.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep8.txt
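--> note on the awk step (sketch only): gsub(/./,"& ",$2) puts a space after every character of the bit string, including the last one, so each output line ends in a trailing space. A Python equivalent that avoids the trailing space might look like the following; split_bits and to_space_delimited are hypothetical names, and the input is assumed to be one 'ID<TAB>bitstring' pair per line as written by the encoding scripts.

```python
def split_bits(line):
    """Turn 'ID<TAB>0100...' into ['ID', '0', '1', '0', '0', ...]."""
    sgrna_id, bits = line.rstrip('\n').split('\t')
    return [sgrna_id] + list(bits)


def to_space_delimited(line):
    """Join the ID and the individual bits with single spaces, no trailing space."""
    return ' '.join(split_bits(line))
```

If this form were used instead of the awk command, the downstream R code would no longer need to drop the last column of each table.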
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J kmer.onehot.matrix
#SBATCH -N 2
#SBATCH -t 24:00:00
#SBATCH --mem-per-cpu=0
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH onehot.kmer1to8.score.matrix.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/onehot.kmer1to8.score.matrix.sh
# salloc -A SYB105 -N 2 -p gpu -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")
onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Ecoli.allCas9_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Ecoli.allCas9_dep4.txt", header=F, sep=" ")
onehot.dep5 <- read.delim("Ecoli.allCas9_dep5.txt", header=F, sep=" ")
onehot.dep6 <- read.delim("Ecoli.allCas9_dep6.txt", header=F, sep=" ")
onehot.dep7 <- read.delim("Ecoli.allCas9_dep7.txt", header=F, sep=" ")
onehot.dep8 <- read.delim("Ecoli.allCas9_dep8.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
colnames(onehot.dep5)[1] <- "sgRNAID"
colnames(onehot.dep6)[1] <- "sgRNAID"
colnames(onehot.dep7)[1] <- "sgRNAID"
colnames(onehot.dep8)[1] <- "sgRNAID"
# drop the last column of each table before joining (presumably the empty column left by the trailing space from the awk step)
# note: 1:ncol(x)-1 parses as (1:ncol(x))-1, which only works because index 0 is ignored; parenthesized here
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep1234 <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot.dep12345 <- full_join(onehot.dep1234, onehot.dep5[,1:(ncol(onehot.dep5)-1)], by="sgRNAID")
onehot.dep123456 <- full_join(onehot.dep12345, onehot.dep6[,1:(ncol(onehot.dep6)-1)], by="sgRNAID")
onehot.dep1234567 <- full_join(onehot.dep123456, onehot.dep7[,1:(ncol(onehot.dep7)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep1234567, onehot.dep8[,1:(ncol(onehot.dep8)-1)], by="sgRNAID")
onehot.score <- full_join(score.df, onehot.dep, by="sgRNAID")
df.melt <- melt(onehot.score, id=c("cut.score", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "sgRNAID", "variable", "value")
df$value <- as.numeric(df$value)
df.id <- df[!(is.na(df$value) | df$value==""), ]
colnames(df.id) <- c("cut.score", "sgRNAID", "feature", "value")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Ecoli.allCas9.kmer1to8.encoding.txt", quote=F, row.names=F, sep="\t")
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J iRF.onehot.kmer
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH iRF.onehot.kmer.R
R CMD BATCH iRF.onehot.kmer.control.R
R CMD BATCH iRF.onehot.kmer1to8.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.onehot.kmer.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
tmp <- cbind(xmat, Y = y)
wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal sample weighting per SNP
rfs <- list()
for(i in 1:iter)
{
cat("\niRF iteration ",i,"\n")
cat("=================\n")
mtry = 0.5*sum(wt>0)
rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
split.select.weights = wt, classification = classification,
mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
always.split.variables = alwayssplits)
wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale importance to range(0,1)
wt[wt<0] <- 0 # set negative weights to zero
cat("mtry: ", mtry, "\n")
cat("prediction error: ",rf$prediction.error,"\n")
if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
if(classification==TRUE) print(rf$confusion.matrix)
cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
cat("SNPs with importance > 0:",sum(wt>0),"\n")
if(saveall) rfs[[i]] <- rf
if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
{
if(!saveall) rfs <- rf
break
}
}
return(rfs)
}
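--> the reweighting logic inside the iRF loop, written out on its own (Python sketch only; the actual forest fitting is done by ranger in R and is not reproduced here). The function names are hypothetical; the arithmetic follows the R function above: scale importances by the sum of absolute values, zero out negatives, set mtry to half the number of features with positive weight, and stop early when too few features remain.

```python
def update_weights(importance):
    """Scale importances by the sum of their absolute values; clip negatives to 0."""
    total = sum(abs(v) for v in importance)
    if total == 0:
        # assumption: with no signal at all, return all-zero weights
        return [0.0] * len(importance)
    return [max(v / total, 0.0) for v in importance]


def next_mtry(weights):
    """mtry for the next iteration: half the number of features with weight > 0."""
    return 0.5 * sum(w > 0 for w in weights)


def should_stop(weights, n_features, min_frac=0.01, min_count=10):
    """Early-stop test: fewer surviving features than max(1% of (p - 1), 10)."""
    surviving = sum(w > 0 for w in weights)
    return surviving < max(min_frac * (n_features - 1), min_count)
```

This also shows why the logged mtry values can be fractional (e.g. 31.5): they are half of an odd count of surviving features.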
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")
set.seed(2458)
df.sample <- df.cas9.id[sample(nrow(df.cas9.id), 10000), ]
# kmer = 1
df.1 <- df.sample[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)
# iRF iteration 2
# =================
# mtry: 31.5
# prediction error: 89.54715
# r^2: 0.194584
# cor(y,yhat): 0.4412569
# SNPs with importance > 0: 45
# kmer = 2
df.2 <- df.sample[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)
# iRF iteration 2
# =================
# mtry: 89
# prediction error: 88.42499
# r^2: 0.2046771
# cor(y,yhat): 0.4546679
# SNPs with importance > 0: 114
# kmer = 3
df.3 <- df.sample[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)
# iRF iteration 3
# =================
# mtry: 196
# prediction error: 88.91244
# r^2: 0.2002929
# cor(y,yhat): 0.470527
# SNPs with importance > 0: 282
# kmer = 4
df.4 <- df.sample[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)
# iRF iteration 4
# =================
# mtry: 599
# prediction error: 89.92779
# r^2: 0.1911605
# cor(y,yhat): 0.4695909
# SNPs with importance > 0: 931
# kmer = 5
df.5 <- df.sample[,c(2,5891:ncol(df.sample))]
iRF(df.5[,2:ncol(df.5)], df.5$cut.score)
# kmer = 1 + 2
df.1.2 <- df.sample[,c(2:386)]
iRF(df.1.2[,2:ncol(df.1.2)], df.1.2$cut.score)
# iRF iteration 2
# =================
# mtry: 111.5
# prediction error: 86.3626
# r^2: 0.2232269
# cor(y,yhat): 0.472461
# SNPs with importance > 0: 136
# kmer = 1 + 2 + 3
df.1.2.3 <- df.sample[,c(2:1538)]
iRF(df.1.2.3[,2:ncol(df.1.2.3)], df.1.2.3$cut.score)
# iRF iteration 5
# =================
# mtry: 127.5
# prediction error: 84.82643
# r^2: 0.2370438
# cor(y,yhat): 0.4899957
# SNPs with importance > 0: 214
# kmer = 1 + 2 + 3 + 4
df.1.2.3.4 <- df.sample[,c(2:5890)]
iRF(df.1.2.3.4[,2:ncol(df.1.2.3.4)], df.1.2.3.4$cut.score)
# iRF iteration 5
# =================
# mtry: 460
# prediction error: 81.89764
# r^2: 0.2633862
# cor(y,yhat): 0.5152009
# SNPs with importance > 0: 738
# kmer = 1 + 2 + 3 + 4 + 5
df.1.2.3.4.5 <- df.sample[,c(2:ncol(df.sample))]
iRF(df.1.2.3.4.5[,2:ncol(df.1.2.3.4.5)], df.1.2.3.4.5$cut.score)
######################## NEED TO FIGURE OUT HOW TO NON-MANUALLY CODE THE KMERS ########################
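--> one way to avoid hand-coding the column slices (sketch only): the ranges used above (3:82, 83:386, 387:1538, 1539:5890) match blocks of (L - k + 1) * 4**k columns for L = 20, laid out in order of k starting at column 3. Assuming that layout holds for every k and that no columns were dropped during melt/dcast, the ranges can be computed. kmer_column_ranges is a hypothetical helper, and the values it gives for k >= 5 have not been checked against the actual encoding file.

```python
def kmer_column_ranges(seq_len=20, max_k=8, first_col=3, alphabet_size=4):
    """Return {k: (start, end)}, 1-based inclusive column ranges per k-mer block.

    Assumes blocks are stored in order k = 1, 2, ... and that each block has
    (seq_len - k + 1) positions times alphabet_size**k one-hot columns.
    """
    ranges = {}
    start = first_col
    for k in range(1, max_k + 1):
        width = (seq_len - k + 1) * alphabet_size ** k
        ranges[k] = (start, start + width - 1)
        start += width
    return ranges
```

The (start, end) pairs can then be pasted into the R slices, e.g. c(2, start:end) for a single k, or c(2:end) for all k-mers up to that k.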
--> control... test using 2-mer twice instead of using 4-mer to see what that does?
df.1.2.3 <- df.sample[,c(2:1538)]
df.2 <- df.sample[,c(2,83:386)]
df.1.2.3.2 <- cbind(df.1.2.3, df.2[,2:ncol(df.2)], df.2[,2:ncol(df.2)])
iRF(df.1.2.3.2[,2:ncol(df.1.2.3.2)], df.1.2.3.2$cut.score)
# iteration 3
# mtry: 343
# prediction error: 84.09708
# r^2: 0.2436038
# cor(y,yhat): 0.4937429 <--- adding more and more kmers is really just accentuating the same information???
# SNPs with importance > 0: 482
### expanding kmers to 8
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.kmer1to8.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")
set.seed(2458)
df.sample <- df.cas9.id[sample(nrow(df.cas9.id), 10000), ]
# kmer = 1
df.1 <- df.sample[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)
# kmer = 2
df.2 <- df.sample[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)
# kmer = 3
df.3 <- df.sample[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)
# kmer = 4
df.4 <- df.sample[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)
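# kmer = 5
# TODO: end of the 5-mer column range still to be filled in; commented out because it does not parse as written
# df.5 <- df.sample[,c(2,5891:)]
# iRF(df.5[,2:ncol(df.5)], df.5$cut.score)
# kmer = 6
# TODO: 6-mer column range still to be filled in
# df.6 <- df.sample[,c(2,)]
# iRF(df.6[,2:ncol(df.6)], df.6$cut.score)
# kmer = 7
# TODO: 7-mer column range still to be filled in
# df.7 <- df.sample[,c(2,)]
# iRF(df.7[,2:ncol(df.7)], df.7$cut.score)
# kmer = 8
# TODO: 8-mer column range still to be filled in
# df.8 <- df.sample[,c(2,)]
# iRF(df.8[,2:ncol(df.8)], df.8$cut.score)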
# kmer = 1 - 2
df.1.2 <- df.sample[,c(2:386)]
iRF(df.1.2[,2:ncol(df.1.2)], df.1.2$cut.score)
# kmer = 1 - 3
df.1.2.3 <- df.sample[,c(2:1538)]
iRF(df.1.2.3[,2:ncol(df.1.2.3)], df.1.2.3$cut.score)
# kmer = 1 - 4
df.1.2.3.4 <- df.sample[,c(2:5890)]
iRF(df.1.2.3.4[,2:ncol(df.1.2.3.4)], df.1.2.3.4$cut.score)
# kmer = 1 - 5
# TODO: end column still to be filled in; commented out because it does not parse as written
# df.1to5 <- df.sample[,c(2:)]
# iRF(df.1to5[,2:ncol(df.1to5)], df.1to5$cut.score)
# kmer = 1 - 6
# TODO: end column still to be filled in
# df.1to6 <- df.sample[,c(2:)]
# iRF(df.1to6[,2:ncol(df.1to6)], df.1to6$cut.score)
# kmer = 1 - 7
# TODO: end column still to be filled in
# df.1to7 <- df.sample[,c(2:)]
# iRF(df.1to7[,2:ncol(df.1to7)], df.1to7$cut.score)
# kmer = 1 - 8
# TODO: end column still to be filled in
# df.1to8 <- df.sample[,c(2:)]
# iRF(df.1to8[,2:ncol(df.1to8)], df.1to8$cut.score)
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J e.coli.full
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0
#SBATCH -o e.coli.full-%j.o
#SBATCH -e e.coli.full-%j.e
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH e.coli.full.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.full.sh
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
full <- read.delim("ecoli.20sliding.raw.onehot.tensor.dwt.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Ecoli.allCas9.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")
library(dplyr)
df.full <- left_join(full, df.cas9.id[,c(1,387:ncol(df.cas9.id))], by="sgRNAID")
# 7343
write.table(df.full, "ecoli.20sliding.raw.onehot.kmer1to4.tensor.dwt.dcast.txt", quote=F, row.names=F, sep="\t")
set.seed(2458)
df.sample <- df.full[sample(nrow(df.full), 10000), ]
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
tmp <- cbind(xmat, Y = y)
wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal sample weighting per SNP
rfs <- list()
for(i in 1:iter)
{
cat("\niRF iteration ",i,"\n")
cat("=================\n")
mtry = 0.5*sum(wt>0)
rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
split.select.weights = wt, classification = classification,
mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
always.split.variables = alwayssplits)
wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale importance to range(0,1)
wt[wt<0] <- 0 # set negative weights to zero
cat("mtry: ", mtry, "\n")
cat("prediction error: ",rf$prediction.error,"\n")
if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
if(classification==TRUE) print(rf$confusion.matrix)
cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
cat("SNPs with importance > 0:",sum(wt>0),"\n")
if(saveall) rfs[[i]] <- rf
if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
{
if(!saveall) rfs <- rf
break
}
}
return(rfs)
}
df.kmer <- df.sample[,c(2:ncol(df.sample))]
iRF(df.kmer[,2:ncol(df.kmer)], df.kmer$cut.score)
# iRF iteration 5
# =================
# mtry: 481
# prediction error: 85.28233
# r^2: 0.2209295
# cor(y,yhat): 0.4711767
# SNPs with importance > 0: 718
df.raw <- df.sample[,c(2,1642:1644,1650,1653)]
iRF(df.raw[,2:ncol(df.raw)], df.raw$cut.score)
# iRF iteration 1
# =================
# mtry: 2.5
# prediction error: 107.6557
# r^2: 0.01654484
# cor(y,yhat): 0.1625113
# SNPs with importance > 0: 2
df.dwt <- df.sample[,c(2,1656:1839)]
iRF(df.dwt[,2:ncol(df.dwt)], df.dwt$cut.score)
# iRF iteration 2
# =================
# mtry: 61.5
# prediction error: 101.7383
# r^2: 0.07060119
# cor(y,yhat): 0.2656731
# SNPs with importance > 0: 91
df.onehot <- df.sample[,c(2,3:17,1645:1649,1651:1652,1654:1655,18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579,1840:ncol(df.sample))]
iRF(df.onehot[,2:ncol(df.onehot)], df.onehot$cut.score)
# iRF iteration 5
# =================
# mtry: 441
# prediction error: 80.37818
# r^2: 0.2657299
# cor(y,yhat): 0.5159296
# SNPs with importance > 0: 707
df.quantum <- df.sample[,c(2,58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
iRF(df.quantum[,2:ncol(df.quantum)], df.quantum$cut.score)
# iRF iteration 5
# =================
# mtry: 125
# prediction error: 88.3275
# r^2: 0.1931113
# cor(y,yhat): 0.442198
# SNPs with importance > 0: 175
df.raw.dwt <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)])
iRF(df.raw.dwt[,2:ncol(df.raw.dwt)], df.raw.dwt$cut.score)
# iRF iteration 5
# =================
# mtry: 31.5
# prediction error: 101.5211
# r^2: 0.07258498
# cor(y,yhat): 0.2709458
# SNPs with importance > 0: 53
df.raw.onehot <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)])
iRF(df.raw.onehot[,2:ncol(df.raw.onehot)], df.raw.onehot$cut.score)
# iRF iteration 5
# =================
# mtry: 453
# prediction error: 80.5656
# r^2: 0.2640177
# cor(y,yhat): 0.5147411
# SNPs with importance > 0: 725
df.raw.quantum <- cbind(df.raw, df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.quantum[,2:ncol(df.raw.quantum)], df.raw.quantum$cut.score)
# iRF iteration 3
# =================
# mtry: 234
# prediction error: 87.29735
# r^2: 0.2025218
# cor(y,yhat): 0.451906
# SNPs with importance > 0: 319
df.onehot.dwt <- cbind(df.onehot, df.dwt[,2:ncol(df.dwt)])
iRF(df.onehot.dwt[,2:ncol(df.onehot.dwt)], df.onehot.dwt$cut.score)
# iRF iteration 3
# =================
# mtry: 680.5
# prediction error: 84.89164
# r^2: 0.2244985
# cor(y,yhat): 0.4753814
# SNPs with importance > 0: 812
df.onehot.quantum <- cbind(df.onehot, df.quantum[,2:ncol(df.quantum)])
iRF(df.onehot.quantum[,2:ncol(df.onehot.quantum)], df.onehot.quantum$cut.score)
# iRF iteration 4
# =================
# mtry: 708
# prediction error: 81.49942
# r^2: 0.255487
# cor(y,yhat): 0.5069903
# SNPs with importance > 0: 1038
df.quantum.dwt <- cbind(df.quantum, df.dwt[,2:ncol(df.dwt)])
iRF(df.quantum.dwt[,2:ncol(df.quantum.dwt)], df.quantum.dwt$cut.score)
# iRF iteration 5
# =================
# mtry: 194.5
# prediction error: 87.64181
# r^2: 0.1993751
# cor(y,yhat): 0.4479156
# SNPs with importance > 0: 312
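# NOTE: despite the name, this binds df.onehot.quantum, so the quantum features are included in this run as well
df.raw.dwt.onehot <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)], df.onehot.quantum[,2:ncol(df.onehot.quantum)])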
iRF(df.raw.dwt.onehot[,2:ncol(df.raw.dwt.onehot)], df.raw.dwt.onehot$cut.score)
# iteration 5
# mtry: 412
# prediction error: 84.02855
# r^2: 0.232383
# cor(y,yhat): 0.4842411
# SNPs with importance > 0: 612
df.raw.dwt.quantum <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.dwt.quantum[,2:ncol(df.raw.dwt.quantum)], df.raw.dwt.quantum$cut.score)
# iRF iteration 5
# =================
# mtry: 186.5
# prediction error: 87.38436
# r^2: 0.201727
# cor(y,yhat): 0.4502237
# SNPs with importance > 0: 322
df.raw.onehot.quantum <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.onehot.quantum[,2:ncol(df.raw.onehot.quantum)], df.raw.onehot.quantum$cut.score)
# iteration 5
# mtry: 531
# prediction error: 81.57787
# r^2: 0.2547704
# cor(y,yhat): 0.5064083
# SNPs with importance > 0: 833
df.dwt.onehot.quantum <- cbind(df.dwt, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.dwt.onehot.quantum[,2:ncol(df.dwt.onehot.quantum)], df.dwt.onehot.quantum$cut.score)
df.all <- cbind(df.dwt, df.onehot[,2:ncol(df.onehot)], df.raw[,2:ncol(df.raw)], df.quantum[,2:ncol(df.quantum)])
iRF(df.all[,2:ncol(df.all)], df.all$cut.score)
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("ecoli.20sliding.raw.onehot.kmer1to4.tensor.dwt.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(df)
# 7343
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all[,c(1,3:ncol(df.all))], "e.coli.cas9.raw.onehot.tensor.pam.location.dwt.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "e.coli.cas9.raw.onehot.tensor.pam.location.dwt.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "e.coli.cas9.raw.onehot.tensor.pam.location.dwt.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "e.coli.cas9.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "e.coli.cas9.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "e.coli.cas9.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- df %>% separate(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
df.cas9 <- subset(df.sep, df.sep$type == "Cas9")
df <- df.cas9[,c(1:3,1658,5:1651,1653:1657,1659)]
df.cas9.id <- unite(df, "sgRNAID", c(sgRNA, ID, type), sep="_")
write.table(df.cas9.id, "Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.txt", quote=F, row.names=F, sep="\t")
ncol(df)
# 1657
df.num <- mutate_all(df.cas9.id[,2:ncol(df.cas9.id)], function(x) as.numeric(as.character(x)))
# use the re-united sgRNAID column; df[,1] only holds the first piece of the split ID
df.all <- cbind(df.cas9.id[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all[,c(1,3:ncol(df.all))], "e.coli.cas9.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "e.coli.cas9.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "e.coli.cas9.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.cas9.DWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.raw.onehot.tensor.pam.location.dwt.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT/Submits/submit_full_e.coli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT/Submits/submit_full_e.coli.cas9.DWT_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT/Submits/submit_train_e.coli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT/Submits/submit_train_e.coli.cas9.DWT_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT/Submits/submit_test_e.coli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT/Submits/submit_test_e.coli.cas9.DWT_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.noDWT
#
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20homo_lumo_energygapraw cut.score 0.07800873556611687
# p15.CCsgRNA.raw cut.score 0.03354121734879912
# GGsgRNA.raw cut.score 0.032218576015115866
# p19.GGsgRNA.raw cut.score 0.03201591141170038
# pam.distance0 cut.score 0.03177960466450477
# CCsgRNA.raw cut.score 0.03044758321566565
# sgRNA.gcsgRNA.raw cut.score 0.028365152554358
# sgRNA.tempsgRNA.raw cut.score 0.026106844733878046
# TsgRNA.raw cut.score 0.024264328556364425
# p20xz_quadrupoleraw cut.score 0.021899457216109038
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.noDWT_cut.score.importance4 | head
# p20homo_lumo_energygapraw: 22341
# p15.CCsgRNA.raw: 13634
# sgRNA.gcsgRNA.raw: 11941.9
# sgRNA.tempsgRNA.raw: 11300.7
# p19.GGsgRNA.raw: 10844.7
# GGsgRNA.raw: 10575.2
# CCsgRNA.raw: 10572
# pam.distance0: 9257.53
# TsgRNA.raw: 8865.36
# GsgRNA.raw: 8007.28
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.09442431 <-- SOMETHING ISN'T RIGHT HERE....
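# A minimal sketch of the same Pearson check in plain Python, with a guard for
# one possible cause of a low value: predictions and observed scores that are
# not in the same row order or not the same length. This is an assumption to
# rule out, not a diagnosis of the run above. The toy numbers are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError(f"length mismatch: {len(x)} observed vs {len(y)} predicted")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy values, not taken from the run above.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
shuffled = [3.2, 1.1, 3.8, 1.9]  # same predictions, wrong row order

print(round(pearson_r(observed, predicted), 4))
print(round(pearson_r(observed, shuffled), 4))
```

# If the aligned and shuffled values differ this much on real data, joining
# predictions to scores by sgRNAID before calling cor() would be worth trying.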
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.cas9.DWT
#
sort -k3rg topVarEdges/cut.score_top95.txt | head
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.cas9.DWT_cut.score.importance4 | head
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.cas9.DWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
#
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)  # show=False keeps the figure open so savefig writes the plot
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.16dec.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)  # show=False keeps the figure open so savefig writes the plot
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.16dec.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.16dec.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
** Need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw **
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
# /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/runRIT.sh
## cp /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/runRIT.sh /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh
# runRIT.sh feature name ### Note: name is name of the run and feature is the name of the y-value
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/Ecoli.allCas9/all.features/dwt20bp.noncor2.cas9/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score dwt20bp.noncor2.cas9
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/Ecoli.allCas9/all.features/dwt20bp.noncor2.cas9/cut.score/RIT.run
Email from Stephan: Attached please find our recent DNA/RNA monomer base and base pair data for now. This data was created with the help of Bredesen Center student Tyler Walker, cc’d to this message. Listed are total energies (please ignore), HOMO energy in eV, LUMO energy in eV, HOMO-LUMO gap (HL gap) in eV, number of valence electrons in the molecule (this determines to some extent the HOMO energy), as well as interaction energy E in kcal/mol. We have checked the literature and our HL gap data for monomers is consistent with previously reported results.
Somewhat surprisingly, when base pairs are formed through hydrogen bonding, the HL gaps are significantly reduced, and I have to research this a bit more to see if others have reported this behavior as well. Another surprise, perhaps more significant, is that the GC base pair is significantly more strongly H-bonded than the AT and AU pairs. If one had assumed additivity rules, one would have expected:
-5 kcal/mol (1 H bond) (water dimer)
-10 kcal/mol (2 H bonds) (AT, AU)
-15 kcal/mol (3 H bonds) (GC)
Instead, we see -11 kcal/mol for AT and AU (consistent with the additivity model) but -25 kcal/mol for GC, which overshoots the additivity model by 10 kcal/mol, indicating a synergistic strengthening due to the 3rd H bond in this base pair. Naively speaking, one would therefore expect that unpairing GC pairs requires more energy than unpairing AT/AU pairs. This, however, should be done by the RNA polymerase, I assume. Not sure about CRISPR-Cas9.
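# The additivity argument in the email can be checked with a few lines of
# arithmetic. This is only a sketch: the -5 kcal/mol per H bond reference and
# the observed energies are the numbers quoted in the email, and the function
# names are made up for illustration.

```python
# Values quoted in the email above (kcal/mol).
PER_HBOND = -5.0  # one H bond, water dimer reference
H_BONDS = {"AT": 2, "AU": 2, "GC": 3}
OBSERVED = {"AT": -11.0, "AU": -11.0, "GC": -25.0}

def additive_estimate(pair):
    """Interaction energy expected if each H bond contributed independently."""
    return PER_HBOND * H_BONDS[pair]

def excess(pair):
    """Observed minus additive estimate; negative means stronger than additive."""
    return OBSERVED[pair] - additive_estimate(pair)

for pair in ("AT", "AU", "GC"):
    print(pair, additive_estimate(pair), OBSERVED[pair], excess(pair))
```

# AT and AU land within 1 kcal/mol of the additive estimate, while GC is
# 10 kcal/mol stronger, which is the overshoot described in the email.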
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.allCas9.tensorsDNARNA.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensorsDNARNA.melt.txt", quote=F, row.names=F, sep="\t")
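# The melt / left_join / dcast steps above amount to a per-position lookup:
# each base of the sgRNA is replaced by the per-base value of every tensor
# feature, giving one wide row per sgRNA. A minimal sketch of that idea in
# Python, assuming positions are labelled p1, p2, ... and using two made-up
# features (feat_a, feat_b) with hypothetical values in place of the real table.

```python
# Hypothetical per-base values for two features; the real table is read from
# protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt.
TENSOR = {
    "feat_a": {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0},
    "feat_b": {"A": 0.5, "C": 0.25, "G": 0.75, "T": 1.5},
}

def encode(sequence, tensor=TENSOR):
    """Return {"p<position>_<feature>": value} for every base in the sequence."""
    row = {}
    for pos, base in enumerate(sequence, start=1):
        for feature, per_base in tensor.items():
            row[f"p{pos}_{feature}"] = per_base[base]
    return row

wide = encode("ACGT")
print(len(wide))          # positions x features
print(wide["p3_feat_a"])  # value of feat_a for the G at position 3
```

# A 20 nt guide with F features yields 20 * F columns, which is why the
# feature table grows quickly as tensor features are added.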
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
structure <- read.delim("Ecoli.allCas9.structure.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Ecoli.allCas9.nuc.count.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
onehot.ind1 <- read.delim("Ecoli.allCas9_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Ecoli.allCas9_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305]
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.allCas9.tensorsDNARNA.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensorDNARNA.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 126182
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.pam <- read.table("ecoli.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.df$id <- "Cas9"
sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468
df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNA.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 40468
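# The PAM one-hot step above pairs each NGG code with its reverse complement
# (AGG/CCT, CGG/CCG, TGG/CCA, GGG/CCC). A minimal sketch of the same mapping
# in Python, assuming minus-strand PAMs are reported as CCN; the function
# names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def pam_onehot(pam_code):
    """One-hot encode the N of an NGG PAM; minus-strand hits appear as CCN."""
    pam = pam_code.upper()
    if pam.startswith("CC") and not pam.endswith("GG"):
        pam = reverse_complement(pam)  # CCN -> NGG on the other strand
    n = pam[0] if pam.endswith("GG") else None
    return {f"PAM.{base}": int(n == base) for base in "ACGT"}

for code in ("AGG", "CCT", "CCG", "TGG", "CCC"):
    print(code, pam_onehot(code))
```

# Codes that are neither NGG nor CCN get all zeros, matching the ifelse()
# chain above.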
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.genes <- read.table("sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.df$id <- "Cas9"
sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468
df <- df.location
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 40468
write.table(df.location, "Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# library(tidyr)
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
# df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
# df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
# df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
# df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")
# df <- df.cas9.id[,c(1,1816,3:1809,1811:1815)]
# write.table(df, "Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.txt", quote=F, row.names=F, sep="\t")
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
set.seed(2458)
df.sample <- df[sample(nrow(df), 10000), ]
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal sample weighting per SNP
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale importance to range(0,1)
    wt[wt<0] <- 0 # set negative weights to zero
    cat("mtry: ", mtry, "\n")
    cat("prediction error: ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
    {
      if(!saveall) rfs <- rf
      break
    }
  }
  return(rfs)
}
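# The part of iRF() that makes it "iterative" is the weight update between
# forests: importances are scaled by the sum of their absolute values,
# negative weights are zeroed, and the loop stops once too few features keep
# a positive weight. A minimal sketch of just that bookkeeping in Python
# (no forest is trained here; update_weights and should_stop are illustrative
# names, and the importances are hypothetical).

```python
def update_weights(importance):
    """Scale importances by the sum of absolute values, then zero out negatives."""
    total = sum(abs(v) for v in importance)
    if total == 0:
        return [0.0 for _ in importance]
    return [max(v / total, 0.0) for v in importance]

def should_stop(weights, n_features):
    """Stop once fewer than max(1% of (features - 1), 10) keep a positive weight."""
    return sum(w > 0 for w in weights) < max(0.01 * (n_features - 1), 10)

# Hypothetical importances for five features.
wt = update_weights([4.0, -1.0, 3.0, 0.0, 2.0])
print(wt)
print(should_stop(wt, n_features=5))
```

# Because of the floor of 10 in the stopping rule, a run on a small feature
# set (such as the 20 bp_HOMO_eV columns) can stop after its first iteration.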
# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1805:1811,1813:1814)]
# one-hot dependent: [,c(18:57,128:147,218:237,308:327,398:418,488:507,578:597,668:687,758:777,848:867,1008:1031,1172:1191,1262:1281,1352:1371,1442:1461,1532:1551,1622:1642,1712:1731)]
# chemical tensors: [,c(58:127,148:217,238:307,328:397,419:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1643:1711,1732:1801)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1802:1804,1812)]
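# Hard-coded column ranges like the ones above are easy to get wrong by one.
# A sketch of an alternative, grouping columns by name instead: the regular
# expressions below are assumptions inferred from feature names that appear
# in this log (e.g. p15.CCsgRNA.raw, GGsgRNA.raw, p20homo_energyraw,
# pam.distance0) and would need checking against the real header.

```python
import re

# Assumed naming patterns, inferred from feature names printed in this log.
DEPENDENT = re.compile(r"^p\d+\.[ACGT]+sgRNA\.raw$")   # e.g. p15.CCsgRNA.raw
INDEPENDENT = re.compile(r"^[ACGT]+sgRNA\.raw$")       # e.g. GGsgRNA.raw
POSITIONAL = re.compile(r"^p\d+")                      # e.g. p20homo_energyraw

def feature_group(name):
    """Assign a column name to one of the four feature groups."""
    if DEPENDENT.match(name):
        return "onehot_dependent"
    if INDEPENDENT.match(name):
        return "onehot_independent"
    if POSITIONAL.match(name):
        return "chemical_tensor"
    return "raw"

columns = ["p15.CCsgRNA.raw", "GGsgRNA.raw", "p20homo_energyraw",
           "p19HL.gap_eVraw", "sgRNA.gcsgRNA.raw", "pam.distance0"]
for col in columns:
    print(col, feature_group(col))
```

# Under these assumed patterns anything that is not positional or k-mer based
# falls into "raw", so PAM and location columns would need their own rule if
# they should be grouped separately.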
iRF(df.sample[,3:ncol(df.sample)], df.sample$cut.score)
# iRF iteration 4
# =================
# mtry: 222.5
# prediction error: 84.22153
# r^2: 0.2306201
# cor(y,yhat): 0.4812496
# SNPs with importance > 0: 320
df.quantum.new <- df.sample[,c(58:127,148:217,238:307,328:397,419:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1643:1711,1732:1801)]
iRF(df.quantum.new, df.sample$cut.score)
# iRF iteration 5
# =================
# mtry: 139
# prediction error: 88.10261
# r^2: 0.1951657
# cor(y,yhat): 0.4452454 #### slightly better than previous but not significantly
# SNPs with importance > 0: 204
#### previously quantum data resulted in r^2:0.1931113 and cor(y,yhat): 0.442198
## feature importance
xmat = df.sample[,3:ncol(df.sample)]
xmat.score = df.sample$cut.score
library(gbm)
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,
interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,
n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE,
verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500
# Equivalent Number of Parameters: 39.85
# Residual Standard Error: 0.04408
# [1] 174
# var rel.inf
# p19.GGsgRNA.raw p19.GGsgRNA.raw 4.044676
# sgRNA.gcsgRNA.raw sgRNA.gcsgRNA.raw 3.639182 #### GC content
# p20bp_HOMO_evraw p20bp_HOMO_evraw 3.336434 #### updated quantum chemical tensor
# p20homo_energyraw p20homo_energyraw 2.909753
# CCsgRNA.raw CCsgRNA.raw 2.770156
# GGsgRNA.raw GGsgRNA.raw 2.472840
#### try with just the updated QCTs
df.quantum.newONLY <- df.sample[,c(58:61,63,65,69,73,148:151,153,155,159,163,238:241,243,245,249,253,328:331,333,335,339,343,418:421,423,425,429,453,508:511,513,515,519,523,598:601,603,605,609,613,688:691,693,695,699,703,778:781,783,785,789,793,868:871,873,875,879,883,1032:1035,1037,1039,1043,1047,1192:1195,1197,1199,1203,1207,1282:1285,1287,1289,1293,1297,1372:1375,1377,1379,1383,1387,1462:1465,1467,1469,1473,1477,1552:1555,1557,1559,1563,1567,1642:1645,1647,1649,1653,1657,1732:1735,1737,1739,1743,1747)]
df.quantum.new <- df.sample[,c(58:127,148:217,238:307,328:397,418:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1642:1711,1732:1801)]
df.quantum.old <- df.sample[,c(62,64,66:68,70:72,74:127,152,154,156:158,160:162,164:217,242,244,246:248,250:252,254:307,332,334,336:338,340:342,344:397,422,424,426:428,450:452,454:487,512,514,516:518,520:522,524:577,602,604,606:608,610:612,614:667,692,694,696:698,700:702,704:757,782,784,786:788,790:792,794:847,872,874,876:878,880:882,884:1007,1036,1038,1040:1042,1044:1046,1048:1171,1196,1198,1200:1202,1204:1206,1208:1261,1286,1288,1290:1292,1294:1296,1298:1351,1376,1378,1380:1382,1384:1386,1388:1441,1466,1468,1470:1472,1474:1476,1478:1531,1556,1558,1560:1562,1564:1566,1568:1621,1646,1648,1650:1652,1654:1656,1658:1711,1736,1738,1780:1782,1784:1786,1788:1801)]
iRF(df.quantum.newONLY, df.sample$cut.score)
# iRF iteration 2
# =================
# mtry: 59
# prediction error: 87.97316
# r^2: 0.1963482
# cor(y,yhat): 0.4477064
# SNPs with importance > 0: 73
iRF(df.quantum.old, df.sample$cut.score)
# iRF iteration 5
# =================
# mtry: 117
# prediction error: 88.02909
# r^2: 0.1958372
# cor(y,yhat): 0.4442053
# SNPs with importance > 0: 168
iRF(df.quantum.new, df.sample$cut.score)
# iRF iteration 5
# =================
# mtry: 124.5
# prediction error: 87.73299
# r^2: 0.1985422
# cor(y,yhat): 0.4478398
# SNPs with importance > 0: 195
# bp_HOMO_eV ONLY
df.bp_HOMO_eV <- df.sample[,c(60,150,240,330,420,510,600,690,780,870,940,1034,1104,1194,1284,1374,1464,1554,1644,1734)]
iRF(df.bp_HOMO_eV, df.sample$cut.score)
# iRF iteration 1
# =================
# mtry: 10
# prediction error: 99.29029
# r^2: 0.09296408
# cor(y,yhat): 0.306499
# SNPs with importance > 0: 6
## feature importance
library(gbm)
xmat = df.quantum.newONLY
xmat.score = df.sample$cut.score
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,
interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,
n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE,
verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500
# Equivalent Number of Parameters: 39.85
# Residual Standard Error: 0.05074
# [1] 327
# var rel.inf
# p19HL.gap_eVraw p19HL.gap_eVraw 6.288143
# p20bp_bondraw p20bp_bondraw 5.540806
# p20HOMO_eVraw p20HOMO_eVraw 4.060408
# p18HOMO_eVraw p18HOMO_eVraw 3.850749
# p19HOMO_eVraw p19HOMO_eVraw 3.540256
# p17HL.gap_eVraw p17HL.gap_eVraw 3.530563
library(gbm)
xmat = df.quantum.new
xmat.score = df.sample$cut.score
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,
interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,
n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE,
verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500
# Equivalent Number of Parameters: 39.85
# Residual Standard Error: 0.05003
# [1] 359
# var rel.inf
# p20bp_HOMO_evraw p20bp_HOMO_evraw 2.487339
# p20homo_energyraw p20homo_energyraw 2.239112
# p18HOMO_eVraw p18HOMO_eVraw 2.114659
# p19rot_constants_yraw p19rot_constants_yraw 1.663645
# p20bp_bondraw p20bp_bondraw 1.478998
# p19tot_dipoleraw p19tot_dipoleraw 1.410728
library(gbm)
xmat = df.quantum.old
xmat.score = df.sample$cut.score
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,
interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,
n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE,
verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500
# Equivalent Number of Parameters: 39.85
# Residual Standard Error: 0.05249
# [1] 499
# var rel.inf
# p20homo_lumo_energygapraw p20homo_lumo_energygapraw 2.678070
# p20homo_energyraw p20homo_energyraw 2.121547
# p19molecular_volumeraw p19molecular_volumeraw 1.407671
# p18homo_energyraw p18homo_energyraw 1.356138
# p18xz_quadrupoleraw p18xz_quadrupoleraw 1.313431
# p19rot_constants_yraw p19rot_constants_yraw 1.200738
library(gbm)
xmat = df.sample[,3:ncol(df.sample)]
xmat.score = df.sample$cut.score
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,
interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,
n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE,
verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500
# Equivalent Number of Parameters: 39.85
# Residual Standard Error: 0.04919
# [1] 228
# var rel.inf
# p20bp_bondraw p20bp_bondraw 5.011324
# p19.GGsgRNA.raw p19.GGsgRNA.raw 3.197801
# sgRNA.gcsgRNA.raw sgRNA.gcsgRNA.raw 3.046770
# GGsgRNA.raw GGsgRNA.raw 2.345230
# pam.distance0.x pam.distance0.x 2.115209
# p15.CCsgRNA.raw p15.CCsgRNA.raw 1.621739
library(gtools)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
rownames(tensor) <- tensor[,1]
input <- tensor[67:70,2:5]
nucleotides <- c("A", "C", "G", "T")
# input <- data.frame(matrix(ncol=4, nrow=2,
# dimnames=list(c("Feat1", "Feat2"), nucleotides)))
# input["Feat1",] <- c(2, 5, 10, 15)
# input["Feat2",] <- c(12, 15, 8, 20)
numnucleotides <- 3
# Get all permutations
# n = the number of possibilities
# r = the number of draws
# v = the nucleotides
permlist <- permutations(n=length(nucleotides), r=numnucleotides,
v=nucleotides, repeats.allowed=TRUE)
# Merge each row of permlist into a single string
# to create a vector with one entry per possible sequence
sequence <- rep(NA, nrow(permlist))
for (ii in 1:nrow(permlist)) {
  sequence[ii] <- paste(permlist[ii,], sep="", collapse="")
}
# Create diagonal matrix that corresponds to all possible sequences
diagmatrix <- diag(1, nrow(permlist))
rownames(diagmatrix) <- sequence
colnames(diagmatrix) <- sequence
# Loop through each sequence in the permutation
for (ii in 1:nrow(permlist)) {
  # Create an empty vector in order to substitute values
  nucleotidevector <- rep(NA, numnucleotides)
  # Loop through each nucleotide of the sequence
  for (jj in 1:numnucleotides) {
    # Get nucleotide
    nucleotidevalue <- permlist[ii,jj]
    # Get value that corresponds to the nucleotide (first feature row of input only)
    # Place in vector of values
    nucleotidevector[jj] <- input[1,eval(nucleotidevalue)]
  }
  # Substitute mean of nucleotide values into the corresponding location
  # in the diagonal matrix
  diagmatrix[ii,ii] <- mean(nucleotidevector)
}
write.table(diagmatrix, "tensor.kmer.3.txt", quote=F, row.names=T, col.names=F, sep=" ")
### kmer positional encoding
import os, sys
import numpy as np
#kmer 1
onehot_dict={
'A':'-5.367 0.000 0.000 0.000 ',
'C':'0.000 -4.951 0.000 0.000 ',
'G':'0.000 0.000 -4.951 0.000 ',
'T':'0.000 0.000 0.000 -5.367 '
}
#kmer 2
onehot_dict={
'AA':'-5.367 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ',
'AC':'0.000 -5.159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ',
'AG':'0.000 0.000 -5.159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ',
'AT':'0.000 0.000 0.000 -5.367 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ',
'CA':'0.000 0.000 0.000 0.000 -5.159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ',
'CC':'0.000 0.000 0.000 0.000 0.000 -4.951 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ',
'CG':'0.000 0.000 0.000 0.000 0.000 0.000 -4.951 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ',
'CT':'0.000 0.000 0.000 0.000 0.000 0.000 0.000 -5.159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ',
'GA':'0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -5.159 0.000 0.000 0.000 0.000 0.000 0.000 0.000 ',
'GC':'0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -4.951 0.000 0.000 0.000 0.000 0.000 0.000 ',
'GG':'0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -4.951 0.000 0.000 0.000 0.000 0.000 ',
'GT':'0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -5.159 0.000 0.000 0.000 0.000 ',
'TA':'0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -5.367 0.000 0.000 0.000 ',
'TC':'0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -5.159 0.000 0.000 ',
'TG':'0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -5.159 0.000 ',
'TT':'0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 -5.367 '
}
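# The k-mer dictionaries can also be generated rather than typed out. A
# minimal sketch, assuming the per-base values implied by the k-mer 2 table
# above (A and T share one value, C and G the other) and the same "%.3f plus
# trailing space" string format; kmer_encoding is an illustrative name, not
# part of the original scripts.

```python
from itertools import product

# Per-base values implied by the k-mer 2 table above.
BASE_VALUE = {"A": -5.367, "C": -4.951, "G": -4.951, "T": -5.367}

def kmer_encoding(k, base_value=BASE_VALUE):
    """Map each k-mer to a row with the mean base value on the diagonal."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]  # AA, AC, AG, AT, ...
    table = {}
    for i, kmer in enumerate(kmers):
        mean = sum(base_value[b] for b in kmer) / k
        row = [0.0] * len(kmers)
        row[i] = mean
        table[kmer] = " ".join(f"{v:.3f}" for v in row) + " "
    return table

kmer2 = kmer_encoding(2)
print(len(kmer2))
print(kmer2["AC"])
```

# Generating the table this way keeps the column order and the trailing space
# consistent across k = 1 to 4, so encoded strings can be concatenated safely.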
#kmer 3
onehot_dict={
'AAA':'-5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAC':'0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAG':'0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAT':'0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACA':'0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACC':'0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACG':'0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACT':'0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGA':'0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGC':'0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGG':'0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGT':'0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATA':'0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 ',
'TCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 ',
'TCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 ',
'TCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 ',
'TGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 ',
'TGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 ',
'TGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 ',
'TGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 ',
'TTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 ',
'TTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 ',
'TTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 ',
'TTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 '
}
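# The hand-written tables follow a regular pattern, so they can be regenerated
# instead of typed out. A minimal sketch, assuming the per-base values
# A/T = -5.367 and C/G = -4.951 (inferred from the rows above, not stated in the
# notes) and lexicographic ACGT ordering of the columns. BASE_VALUE and
# build_weighted_onehot are hypothetical names; rows are returned as lists of
# floats rather than space-separated strings.

```python
from itertools import product

# Assumed per-base values, inferred from the table rows.
BASE_VALUE = {"A": -5.367, "C": -4.951, "G": -4.951, "T": -5.367}


def build_weighted_onehot(k):
    """Return {kmer: row} where row has 4**k floats, zero except at the k-mer's index."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    table = {}
    for idx, kmer in enumerate(kmers):
        row = [0.0] * len(kmers)
        # The nonzero entry is the mean of the per-base values in the k-mer.
        row[idx] = sum(BASE_VALUE[b] for b in kmer) / k
        table[kmer] = row
    return table


table3 = build_weighted_onehot(3)
table4 = build_weighted_onehot(4)
```

# To get the string form used in these notes, join a row with spaces,
# for example " ".join(str(v) for v in table3["AAC"]).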
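# kmer 4: 256 columns, one per 4-mer in lexicographic ACGT order.
# Same scheme as the smaller tables: a single nonzero entry per row, equal to
# the mean of the per-base values (A/T = -5.367, C/G = -4.951).
# Note: this reassigns onehot_dict, replacing any table defined earlier.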
onehot_dict={
'AAAA':'-5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAAC':'0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAAG':'0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAAT':'0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AACA':'0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AACC':'0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AACG':'0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AACT':'0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAGA':'0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAGC':'0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAGG':'0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAGT':'0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AATA':'0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CACA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CACC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CACG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CACT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CATA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GACA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GACC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GACG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GACT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GATA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TACA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TACC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TACG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TACT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TATA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 ',
'TTCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 ',
'TTCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 ',
'TTCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 ',
'TTGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 ',
'TTGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 ',
'TTGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 ',
'TTGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 ',
'TTTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 ',
'TTTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 ',
'TTTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 ',
'TTTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 '
}
# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4] + '_dependent1.txt', 'w')
# loop over nucleotide sequences
for idx, line in enumerate(input_file):
    # if first iteration, write title line
    # (line keeps its trailing newline, so the title spans two lines in the output;
    #  the later sed step drops both of them)
    if idx == 0:
        dep_file.write(line + ': third-order position-dependent features' + '\n')
    # otherwise encode sequence
    else:
        # split line by tab
        line = line.split('\t')
        # extract sequence (also remove \n)
        seq = line[-1][:-1]
        # compute position-dependent features as one-hot vectors
        pos_dep = ''.join([onehot_dict[seq[i]] for i in range(len(seq))])
        # write features to file
        dep_file.write(line[0] + '\t' + pos_dep + '\n')
    if idx % 10000 == 0:
        print('{0:,}'.format(idx) + ' lines processed...')
print('Done!')
input_file.close()
dep_file.close()
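# A minimal sketch of the position-dependent encoding idea used by the kmerN scripts,
# assuming each script slides a window of length k along the sequence and concatenates
# the vector stored for every k-mer. encode_position_dependent and toy_table are
# hypothetical names, and the toy values are placeholders, not the quantum values above.

```python
def encode_position_dependent(seq, k, table):
    """Concatenate the feature vector of every overlapping k-mer in seq."""
    vectors = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # table maps a k-mer string to a list of floats
        vectors.extend(table[kmer])
    return vectors


# Hypothetical toy table (placeholder values only)
toy_table = {
    "AC": [-1.0, 0.0],
    "CG": [0.0, -2.0],
}

features = encode_position_dependent("ACG", 2, toy_table)
print(features)
```

# With k = 1 this reduces to the per-nucleotide loop in the script above.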
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_quantum_positional_encode.py
#python file.py data.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
python ../kmer1_quantum_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer2_quantum_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer3_quantum_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer4_quantum_positional_encode.py Ecoli.allCas9.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
# the title written by the encode scripts spans two lines, so drop lines 1-2, then convert tabs to spaces
sed '1,2d' Ecoli.allCas9.noscore_dependent1.txt | sed 's/\t/ /g' > Ecoli.allCas9.quantum.tensor_dep1.txt
sed '1,2d' Ecoli.allCas9.noscore_dependent2.txt | sed 's/\t/ /g' > Ecoli.allCas9.quantum.tensor_dep2.txt
sed '1,2d' Ecoli.allCas9.noscore_dependent3.txt | sed 's/\t/ /g' > Ecoli.allCas9.quantum.tensor_dep3.txt
sed '1,2d' Ecoli.allCas9.noscore_dependent4.txt | sed 's/\t/ /g' > Ecoli.allCas9.quantum.tensor_dep4.txt
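# For reference, a rough Python equivalent of the sed step above: skip the two title
# lines and turn the tab between the ID and the vector into a space, so every value
# becomes its own space-separated column. strip_header_and_tabs and the example rows
# are hypothetical, made up for illustration.

```python
def strip_header_and_tabs(lines, header_lines=2):
    """Drop the leading title lines and replace tabs with single spaces."""
    return [line.replace("\t", " ") for line in lines[header_lines:]]


# Made-up example rows in the layout written by the encode script
example = [
    "sgRNAID\tseq\n",
    ": third-order position-dependent features\n",
    "sgRNA_1_Cas9\t0 0 -5.263 \n",
]

cleaned = strip_header_and_tabs(example)
print(cleaned)
```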
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J kmer.matrix
#SBATCH -N 4
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH quantum.kmer.score.matrix.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/quantum.kmer.score.matrix.sh
# salloc -A SYB105 -N 2 -p gpu -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")
onehot.dep1 <- read.delim("Ecoli.allCas9.quantum.tensor_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Ecoli.allCas9.quantum.tensor_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Ecoli.allCas9.quantum.tensor_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Ecoli.allCas9.quantum.tensor_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
# drop the last column of each table (presumably empty, from the trailing space in each vector string) before joining
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot.score <- full_join(score.df, onehot.dep, by="sgRNAID")
onehot.score[is.na(onehot.score)] <- 0
df.melt <- melt(onehot.score, id=c("cut.score", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "sgRNAID", "variable", "value")
df$value <- as.numeric(df$value)
df.id <- df[!(is.na(df$value) | df$value==""), ]
colnames(df.id) <- c("cut.score", "sgRNAID", "feature", "value")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Ecoli.allCas9.quantum.tensor.kmer.encoding.txt", quote=F, row.names=F, sep="\t")
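# The join logic above (full join of the per-k tables on sgRNAID, then NA set to 0) can be
# sketched in plain Python as follows. This is an illustration under the assumption that
# each table has a fixed number of feature columns; full_join_zero_fill and the toy
# tables are hypothetical.

```python
def full_join_zero_fill(tables):
    """Outer-join tables (dict: id -> list of floats) on id, padding gaps with 0.0."""
    # width of each table, taken from its first row
    widths = [len(next(iter(t.values()))) if t else 0 for t in tables]
    all_ids = sorted(set().union(*[t.keys() for t in tables]))
    joined = {}
    for row_id in all_ids:
        row = []
        for table, width in zip(tables, widths):
            # an ID missing from a table contributes a block of zeros
            row.extend(table.get(row_id, [0.0] * width))
        joined[row_id] = row
    return joined


# Toy tables with made-up IDs and values
dep1 = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
dep2 = {"b": [5.0], "c": [6.0]}

joined = full_join_zero_fill([dep1, dep2])
print(joined)
```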
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J iRF.quantum.kmer
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH iRF.quantum.kmer.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.quantum.kmer.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight per feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale importance by the sum of absolute importances
    wt[wt<0] <- 0 # set negative weights to zero
    cat("mtry: ", mtry, "\n")
    cat("prediction error: ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf
    # stop early once too few features keep a positive weight
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
    {
      if(!saveall) rfs <- rf
      break
    }
  }
  return(rfs)
}
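# The reweighting step of the iRF loop above, isolated as a small Python sketch. It only
# mirrors the weight update, the mtry rule and the early-stop test; no forest is fitted.
# The function names are hypothetical, and the zero-total guard is an addition that the
# R code does not have.

```python
def update_weights(importance):
    """Scale importances by the sum of absolute values, then zero out negatives."""
    total = sum(abs(v) for v in importance)
    if total == 0:
        # guard added for this sketch only
        return [0.0] * len(importance)
    return [max(v / total, 0.0) for v in importance]


def next_mtry(weights):
    """Half the number of features that still carry a positive weight."""
    return 0.5 * sum(1 for w in weights if w > 0)


def should_stop(weights, n_features):
    """Stop when fewer than max(1% of (features - 1), 10) features remain."""
    n_kept = sum(1 for w in weights if w > 0)
    return n_kept < max(0.01 * (n_features - 1), 10)


# Made-up importances for five features
weights = update_weights([4.0, -1.0, 3.0, 0.0, 2.0])
print(weights, next_mtry(weights), should_stop(weights, 5))
```

# Because mtry is half a count, it can be fractional, as in the logged values further down.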
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.quantum.tensor.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")
set.seed(2458)
df.sample <- df.cas9.id[sample(nrow(df.cas9.id), 10000), ]
# kmer = 1
df.1 <- df.sample[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)
# kmer = 2
df.2 <- df.sample[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)
# kmer = 3
df.3 <- df.sample[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)
# kmer = 4
df.4 <- df.sample[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)
# kmer = 1 + 2
df.1.2 <- df.sample[,c(2:386)]
iRF(df.1.2[,2:ncol(df.1.2)], df.1.2$cut.score)
# kmer = 1 + 2 + 3
df.1.2.3 <- df.sample[,c(2:1538)]
iRF(df.1.2.3[,2:ncol(df.1.2.3)], df.1.2.3$cut.score)
# kmer = 1 + 2 + 3 + 4
df.1.2.3.4 <- df.sample[,c(2:5890)]
iRF(df.1.2.3.4[,2:ncol(df.1.2.3.4)], df.1.2.3.4$cut.score)
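# The column ranges used above are consistent with (20 - k + 1) * 4^k features per k,
# starting at column 3 (column 1 = sgRNAID, column 2 = cut.score). A short sketch to
# recompute them, assuming 20-nt sequences and 4^k values per k-mer position;
# kmer_column_ranges is a hypothetical helper.

```python
def kmer_column_ranges(seq_len=20, ks=(1, 2, 3, 4), first_col=3):
    """Return {k: (first_column, last_column)} with 1-based, inclusive columns."""
    ranges = {}
    start = first_col
    for k in ks:
        n_features = (seq_len - k + 1) * 4 ** k
        ranges[k] = (start, start + n_features - 1)
        start += n_features
    return ranges


ranges = kmer_column_ranges()
for k, (first, last) in ranges.items():
    print(k, first, last)
```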
# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
#tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>%
  unite("p1", p1:p2, remove=F, sep= "") %>%
  unite("p2", p2:p3, remove=F, sep= "") %>%
  unite("p3", p3:p4, remove=F, sep= "") %>%
  unite("p4", p4:p5, remove=F, sep= "") %>%
  unite("p5", p5:p6, remove=F, sep= "") %>%
  unite("p6", p6:p7, remove=F, sep= "") %>%
  unite("p7", p7:p8, remove=F, sep= "") %>%
  unite("p8", p8:p9, remove=F, sep= "") %>%
  unite("p9", p9:p10, remove=F, sep= "") %>%
  unite("p10", p10:p11, remove=F, sep= "") %>%
  unite("p11", p11:p12, remove=F, sep= "") %>%
  unite("p12", p12:p13, remove=F, sep= "") %>%
  unite("p13", p13:p14, remove=F, sep= "") %>%
  unite("p14", p14:p15, remove=F, sep= "") %>%
  unite("p15", p15:p16, remove=F, sep= "") %>%
  unite("p16", p16:p17, remove=F, sep= "") %>%
  unite("p17", p17:p18, remove=F, sep= "") %>%
  unite("p18", p18:p19, remove=F, sep= "") %>%
  unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])
rownames(seq) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.allCas9.tensorsDNARNAdimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensorsDNARNAdimers.melt.txt", quote=F, row.names=F, sep="\t")
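# A minimal Python sketch of the dimer step above: build the 19 overlapping dimers
# (p1..p19) of a 20-nt sequence and look each one up in a dimer tensor table, naming
# the result position_feature as dcast does. The function names and the toy tensor are
# hypothetical, and unlike the left_join above, dimers missing from the table are
# skipped rather than kept as NA.

```python
def overlapping_dimers(seq):
    """Map p1, p2, ... to the dimer starting at each position."""
    return {f"p{i + 1}": seq[i:i + 2] for i in range(len(seq) - 1)}


def dimer_features(seq, tensor):
    """Flatten per-dimer tensor values into {position_feature: value}."""
    features = {}
    for position, dimer in overlapping_dimers(seq).items():
        for name, value in tensor.get(dimer, {}).items():
            features[f"{position}_{name}"] = value
    return features


# Toy tensor with a made-up feature name and value
toy_tensor = {"AC": {"bond": 1.5}}

print(overlapping_dimers("ACGT"))
print(dimer_features("ACG", toy_tensor))
```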
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
tensor <- read.delim("Ecoli.allCas9.tensorsDNARNAdimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 40468
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
set.seed(2458)
df.sample <- df[sample(nrow(df), 10000), ]
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight per feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale importance by the sum of absolute importances
    wt[wt<0] <- 0 # set negative weights to zero
    cat("mtry: ", mtry, "\n")
    cat("prediction error: ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf
    # stop early once too few features keep a positive weight
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
    {
      if(!saveall) rfs <- rf
      break
    }
  }
  return(rfs)
}
# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1805:1811,1813:1814)]
# one-hot dependent: [,c(18:57,128:147,218:237,308:327,398:418,488:507,578:597,668:687,758:777,848:867,1008:1031,1172:1191,1262:1281,1352:1371,1442:1461,1532:1551,1622:1642,1712:1731)]
# chemical tensors: [,c(58:127,148:217,238:307,328:397,419:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1643:1711,1732:1801)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1802:1804,1812)]
# chemical tensor dimers: [,c(1816:1910)]
df.tensor.dimer <- df.sample[,c(1816:1910)]
iRF(df.tensor.dimer, df.sample$cut.score.x)
# iRF iteration 2
# =================
# mtry: 26.5
# prediction error: 94.9823
# r^2: 0.1323184
# cor(y,yhat): 0.3668597
# SNPs with importance > 0: 27
df.tensor <- df.sample[,c(58:127,148:217,238:307,328:397,419:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1643:1711,1732:1801)]
iRF(df.tensor, df.sample$cut.score.x)
# iRF iteration 5
# =================
# mtry: 128
# prediction error: 87.8625
# r^2: 0.1973591
# cor(y,yhat): 0.4461191
# SNPs with importance > 0: 190
iRF(df.sample[,c(3:1814,1816:1910)], df.sample$cut.score.x)
# iRF iteration 5
# =================
# mtry: 165.5
# prediction error: 84.29982
# r^2: 0.2299048
# cor(y,yhat): 0.4806311
# SNPs with importance > 0: 257
# NOTE: in the selection below, 453 breaks the start+15 pattern of the other blocks (433 would match it),
# and columns 418 and 1642 fall in the one-hot dependent ranges of the index map above; left as run.
df.tensor.bond.dimer <- df.sample[,c(58:61,63,65,69,73,148:151,153,155,159,163,238:241,243,245,249,253,328:331,333,335,339,343,418:421,423,425,429,453,508:511,513,515,519,523,598:601,603,605,609,613,688:691,693,695,699,703,778:781,783,785,789,793,868:871,873,875,879,883,1032:1035,1037,1039,1043,1047,1192:1195,1197,1199,1203,1207,1282:1285,1287,1289,1293,1297,1372:1375,1377,1379,1383,1387,1462:1465,1467,1469,1473,1477,1552:1555,1557,1559,1563,1567,1642:1645,1647,1649,1653,1657,1732:1735,1737,1739,1743,1747,1816:1910)]
iRF(df.tensor.bond.dimer, df.sample$cut.score.x)
# iRF iteration 2
# =================
# mtry: 83.5
# prediction error: 87.03005
# r^2: 0.2049637
# cor(y,yhat): 0.4545251
# SNPs with importance > 0: 107
## feature importance
library(gbm)
xmat = df.sample[,c(3:1814,1816:1910)]
xmat.score = df.sample$cut.score.x
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,
interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,
n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE,
verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500
# Equivalent Number of Parameters: 39.85
# Residual Standard Error: 0.04349
# [1] 162
# var rel.inf
# sgRNA.gcsgRNA.raw sgRNA.gcsgRNA.raw 3.433471
# p20bp_bondraw p20bp_bondraw 2.452850
# p19H_bondraw p19H_bondraw 2.275021
# GGsgRNA.raw GGsgRNA.raw 2.218373
# p20homo_energyraw p20homo_energyraw 2.157199
# p20bp_HOMO_evraw p20bp_HOMO_evraw 1.582125
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:1654,1656)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.tensor.dimers.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT/Submits/submit_full_e.coli.tensor.dimers.noDWT_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT/Submits/submit_train_e.coli.tensor.dimers.noDWT_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT/Submits/submit_test_e.coli.tensor.dimers.noDWT_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# file contents: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.tensor.dimers.noDWT
# 0.2527974463591811
sort -k3rg topVarEdges/cut.score_top95.txt | head
# GGsgRNA.raw cut.score 0.041778468190407723
# CCsgRNA.raw cut.score 0.039018995323038465
# p15.CCsgRNA.raw cut.score 0.03434113872955213
# GsgRNA.raw cut.score 0.031975618837973466
# p19.GGsgRNA.raw cut.score 0.03134661557038756
# CsgRNA.raw cut.score 0.030342742435033754
# AsgRNA.raw cut.score 0.02678672308048948
# p20bp_LUMO_evraw cut.score 0.025944109138838368
# p20homo_lumo_energygapraw cut.score 0.025743528577272926
# GCsgRNA.raw cut.score 0.019657819154867556
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.tensor.dimers.noDWT_cut.score.importance4 | head
# CCsgRNA.raw: 71761.7
# GGsgRNA.raw: 71694.9
# p20bp_HOMO_evraw: 64499.4
# p19.GGsgRNA.raw: 57819.5
# p15.CCsgRNA.raw: 57003.9
# GsgRNA.raw: 55730
# CsgRNA.raw: 48776.1
# AsgRNA.raw: 45918.2
# p20xz_quadrupoleraw: 37561.6
# GCsgRNA.raw: 33928.6
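# A minimal Python sketch of the same ranking done by `sort -k2rg ... | head`, assuming each
# line of the importance file has the form "feature: value" as in the output above.
# The function name top_features is hypothetical and not part of the iRF scripts.

```python
def top_features(lines, n=10):
    """Parse 'feature: importance' lines and return the n largest by importance."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last colon so feature names containing ':' stay intact.
        name, _, value = line.rpartition(":")
        pairs.append((name.strip(), float(value)))
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]


# Three lines taken from the output listed above.
example = ["GsgRNA.raw: 55730", "CCsgRNA.raw: 71761.7", "GGsgRNA.raw: 71694.9"]
ranked = top_features(example, n=2)
print(ranked)
```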
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.tensor.dimers.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4902065
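# For reference, a small Python sketch of the Pearson correlation that cor() computes above.
# The input values are hypothetical toy numbers, not predictions from this run.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observed and predicted cut scores.
observed = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.3, 0.5, 0.7]
r = pearson_r(observed, predicted)
print(r)
```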
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.corrected.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False keeps the figure open so it can be saved after plotting
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.dimer.14feb.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.dimer.14feb.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.dimer.14feb.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Ecoli.allCas9.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Ecoli.allCas9.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Ecoli.allCas9.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Ecoli.allCas9.noscore.txt
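# The kmer*_positional_encode.py scripts are not reproduced in these notes. The sketch below
# only illustrates the general idea of position-independent (ind) k-mer counts and
# position-dependent (dep) one-hot k-mer features. Function names, the feature naming
# scheme, and the example guide are assumptions; the real scripts may lay out columns differently.

```python
from itertools import product

BASES = "ACGT"


def kmer_counts(seq, k):
    """Position-independent encoding: count of each k-mer in seq."""
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with non-ACGT characters
            counts[kmer] += 1
    return counts


def kmer_positional_onehot(seq, k):
    """Position-dependent encoding: one 0/1 feature per (position, k-mer)."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    features = {}
    for i in range(len(seq) - k + 1):
        observed = seq[i:i + k]
        for kmer in kmers:
            features[f"p{i + 1}.{kmer}"] = 1 if observed == kmer else 0
    return features


seq = "ACGTACGTACGTACGTACGG"  # hypothetical 20-nt guide
ind2 = kmer_counts(seq, 2)
dep2 = kmer_positional_onehot(seq, 2)
print(len(ind2), len(dep2), dep2["p19.GG"])
```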
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
sed '1d' Ecoli.allCas9.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep1.txt
sed '1d' Ecoli.allCas9.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep2.txt
sed '1d' Ecoli.allCas9.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep3.txt
sed '1d' Ecoli.allCas9.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep4.txt
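# The awk gsub(/./,"& ",$2) step above puts every character of field 2 into its own column.
# A rough Python equivalent for one line, assuming whitespace-separated "ID STRING" input
# (the ID and encoded string here are made up for illustration):

```python
def split_sequence_columns(line):
    """Turn 'ID STRING' into [ID, char1, char2, ...], one character per column."""
    sgrna_id, encoded = line.split()[:2]
    return [sgrna_id] + list(encoded)


row = split_sequence_columns("sg1 0100")  # hypothetical ID and encoded string
print(" ".join(row))
```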
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df[63:70,]))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.allCas9.tensors.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensors.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J jan18.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH jan18.matrix.R
R CMD BATCH jan18.matrix.2.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/jan18.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
structure <- read.delim("Ecoli.allCas9.structure.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Ecoli.allCas9.nuc.count.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")
# structure, gc, temp (wrapped in data frames so the scale and sgRNAID columns can be added)
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
onehot.ind1 <- read.delim("Ecoli.allCas9_ind1.txt", header=T, sep=" ")
# 5 columns (-1 for sgRNAID)
onehot.ind2 <- read.delim("Ecoli.allCas9_ind2.txt", header=T, sep=" ")
# 17
onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=F, sep=" ")
# 81
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=F, sep=" ")
# 321
onehot.dep3 <- read.delim("Ecoli.allCas9_dep3.txt", header=F, sep=" ")
# 1154 <-- have 1218 for the labels??
onehot.dep4 <- read.delim("Ecoli.allCas9_dep4.txt", header=F, sep=" ")
# 4354 <-- have 5121 for the labels??
# 5926 total features...
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last column of each dep table; the parentheses make the intended column range explicit
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
### getting the labels for the onehot matrix
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
# setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/onehot")
# onehot.ind1 <- read.delim("ind1.head.txt", header=T, sep=" ")
# onehot.ind2 <- read.delim("ind2.head.txt", header=T, sep=" ")
# onehot.dep1 <- read.delim("dep1.txt", header=F, sep=" ")
# onehot.dep2 <- read.delim("dep2.txt", header=F, sep=" ")
# onehot.dep3 <- read.delim("dep3.txt", header=F, sep=" ")
# onehot.dep3 <- onehot.dep3[,1:1154]
# onehot.dep4 <- read.delim("dep4.txt", header=F, sep=" ")
# onehot.dep4 <- onehot.dep4[,1:4354]
# colnames(onehot.dep1)[1] <- "sgRNAID"
# colnames(onehot.dep2)[1] <- "sgRNAID"
# colnames(onehot.dep3)[1] <- "sgRNAID"
# colnames(onehot.dep4)[1] <- "sgRNAID"
#
# onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# onehot.dep12 <- full_join(onehot.dep1[,1:ncol(onehot.dep1)], onehot.dep2[,1:ncol(onehot.dep2)], by="sgRNAID")
# onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:ncol(onehot.dep3)], by="sgRNAID")
# onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:ncol(onehot.dep4)], by="sgRNAID")
# onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
# write.table(onehot, "onehot.labels.txt", quote=F, row.names=F, sep="\t")
# onehot.t <- data.frame(t(onehot))
# 6754 columns <-- corrected to match matrix used = 5926 total features
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "e.coli.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# 5910
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.allCas9.tensors.single.bp.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.id <- read.delim("e.coli.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 126182
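# A small Python sketch of the long-to-wide step above (dcast with fun.aggregate=mean and
# na.rm=TRUE, followed by na.omit). The helper names and toy rows are hypothetical; this is
# an illustration of the reshaping logic, not the code used to build the table.

```python
from collections import defaultdict


def long_to_wide_mean(rows):
    """rows: iterable of (sample_id, feature, value). Returns {sample_id: {feature: mean}}."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for sample_id, feature, value in rows:
        if value is None:  # similar to na.rm=TRUE: missing values are skipped
            continue
        sums[sample_id][feature] += value
        counts[sample_id][feature] += 1
    return {
        s: {f: sums[s][f] / counts[s][f] for f in sums[s]}
        for s in sums
    }


def drop_incomplete(wide, features):
    """Keep only samples that have a value for every feature (similar to na.omit)."""
    return {s: v for s, v in wide.items() if all(f in v for f in features)}


# Hypothetical long-format rows: (sgRNAID, feature.scale, value)
rows = [("sg1", "gc", 0.5), ("sg1", "gc", 0.7), ("sg1", "temp", 60.0), ("sg2", "gc", 0.4)]
wide = long_to_wide_mean(rows)
complete = drop_incomplete(wide, ["gc", "temp"])
print(complete)
```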
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.pam <- read.table("ecoli.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(
  PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0),
  PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0),
  PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0),
  PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.df$id <- "Cas9"
sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 40468
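# The PAM one-hot step in this block treats each NGG code and its reverse complement (CCN)
# as the same PAM, e.g. AGG and CCT both set PAM.A. A Python sketch of that rule follows;
# the function names are hypothetical, and codes that are neither NGG nor CCN get all zeros,
# as in the ifelse() version.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_onehot(pam_code):
    """One-hot encode the N of an NGG PAM; CCN codes are read on the opposite strand."""
    pam = pam_code.upper()
    if pam.endswith("GG"):
        n_base = pam[0]
    elif pam.startswith("CC"):
        n_base = reverse_complement(pam)[0]
    else:
        n_base = None  # not a recognised PAM code
    return {f"PAM.{base}": int(base == n_base) for base in "ACGT"}


print(pam_onehot("CCT"))
```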
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.genes <- read.table("sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.df$id <- "Cas9"
sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468
df <- df.location
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 40468
write.table(df.location, "Ecoli.allCas9.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>%
  unite("p1", p1:p2, remove=F, sep= "") %>%
  unite("p2", p2:p3, remove=F, sep= "") %>%
  unite("p3", p3:p4, remove=F, sep= "") %>%
  unite("p4", p4:p5, remove=F, sep= "") %>%
  unite("p5", p5:p6, remove=F, sep= "") %>%
  unite("p6", p6:p7, remove=F, sep= "") %>%
  unite("p7", p7:p8, remove=F, sep= "") %>%
  unite("p8", p8:p9, remove=F, sep= "") %>%
  unite("p9", p9:p10, remove=F, sep= "") %>%
  unite("p10", p10:p11, remove=F, sep= "") %>%
  unite("p11", p11:p12, remove=F, sep= "") %>%
  unite("p12", p12:p13, remove=F, sep= "") %>%
  unite("p13", p13:p14, remove=F, sep= "") %>%
  unite("p14", p14:p15, remove=F, sep= "") %>%
  unite("p15", p15:p16, remove=F, sep= "") %>%
  unite("p16", p16:p17, remove=F, sep= "") %>%
  unite("p17", p17:p18, remove=F, sep= "") %>%
  unite("p18", p18:p19, remove=F, sep= "") %>%
  unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])
rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.allCas9.tensors.dimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensors.dimers.melt.txt", quote=F, row.names=F, sep="\t")
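# The dimer step above turns a 20-nt guide into 19 overlapping dinucleotides (p1..p19) and
# joins each one to its row of quantum chemical properties. A Python sketch of that lookup,
# with hypothetical helper names and made-up property values (not taken from quantum_dimers_20dec.txt):

```python
def overlapping_dimers(seq):
    """Return the len(seq)-1 overlapping dinucleotides, keyed p1..p(n-1)."""
    return {f"p{i + 1}": seq[i:i + 2] for i in range(len(seq) - 1)}


def dimer_features(seq, table):
    """Look up each positional dimer in table: {dimer: {property: value}}."""
    features = {}
    for position, dimer in overlapping_dimers(seq).items():
        for prop, value in table.get(dimer, {}).items():
            # Same naming pattern as dcast(sgRNAID ~ position + variable): "p1_bond"
            features[f"{position}_{prop}"] = value
    return features


# Hypothetical property values for illustration only.
table = {"AC": {"bond": 1.0}, "CG": {"bond": 2.0}, "GT": {"bond": 3.0}}
feats = dimer_features("ACGT", table)
print(feats)
```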
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.allCas9.tensors.dimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 40468
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6072,6074:6078,6080,6082:6176)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.tensor.single.bp.dimers --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/Submits/submit_full_e.coli.tensor.single.bp.dimers_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/Submits/submit_train_e.coli.tensor.single.bp.dimers_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/Submits/submit_test_e.coli.tensor.single.bp.dimers_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# file contents: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.tensor.single.bp.dimers
# 0.25925023667824065
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_HOMO_evraw cut.score 0.04132934588303254
# p20bp_HL.gab_evraw cut.score 0.027377143866354425
# V231.xsgRNA.raw cut.score 0.02703345485327785
# V303.xsgRNA.raw cut.score 0.0235055107545665
# sgRNA.tempsgRNA.raw cut.score 0.021431169257080367
# sgRNA.gcsgRNA.raw cut.score 0.02122949600576284
# p20bp_LUMO_evraw cut.score 0.021225482607726668
# pam.distance0 cut.score 0.020960305693732667
# p18HOMO_eVraw cut.score 0.020957253838975995
# p20bp_bondraw cut.score 0.020619877566560477
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.tensor.single.bp.dimers_cut.score.importance4 | head
# p20bp_HOMO_evraw: 124305
# V303.xsgRNA.raw: 52691.6 <-- dependent 2 (p19.GC)
# V231.xsgRNA.raw: 51941.8 <-- dependent 2 (p15.CC)
# CCsgRNA.raw: 43885
# sgRNA.gcsgRNA.raw: 39482.6
# GGsgRNA.raw: 39031.4
# sgRNA.tempsgRNA.raw: 38915.7
# pam.distance0: 38026.6
# p18LUMO_eVraw: 37618.4
# p18HOMO_eVraw: 36632.3
# V231.x = p15.CC
# V303.x = p19.GC
# V1110.x = p19.CCA
# V257.x = p16.GG
# V74.x = p5.TA
# V305.x = p19.GG
# V215.x = p14.CC
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5010897
# NOTE: need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw before running RIT
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score e.coli.tensor.single.bp.dimers
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/RIT.run
#### looking at the top features (weight and direction)
# on local computer
#scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/e.coli.tensor.single.bp.dimers.importance4.effect_sorted .
library(dplyr)
library(tidyr)
library(reshape2)
library(ggplot2)
library(RColorBrewer)
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
imp <- read.delim("e.coli.tensor.single.bp.dimers.importance4.effect_sorted", header=F, sep="\t")
nrow(imp)
# 2020
# use the full value; truncating to 4 characters loses precision and fails on scientific notation
imp$weight <- as.numeric(imp$V3)
imp.dir <- imp %>% mutate(direction = ifelse(V4 < 0, "neg", ifelse(V4 > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(V1, -weight), y=weight, fill=direction), stat="identity") + theme_classic() + xlab("Top Feature") + ylab("Feature Importance Weight") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
imp <- read.delim("e.coli.tensor.single.bp.dimers.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
#imp$Normalized.Importance <- as.numeric(substr(imp$NormEdge, 0, 4))
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
ggplot(imp.dir.top20) + geom_bar(aes(x=Feature, y=Feature.Effect, fill=Effect.Direction), stat="identity") + coord_flip() + theme_classic() + xlab("Top Features") + ylab("Feature Effect") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + coord_flip() + xlab("") + ylab("Normalized Importance") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position="bottom") + scale_fill_brewer(palette="Set1")
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + geom_point(aes(y=abs(Feature.Effect))) + coord_flip() + xlab("") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1") + scale_y_continuous("Normalized Importance (bars)", sec.axis = sec_axis(~. * 100, name="% Feature Effect (points)"))
# 32374 is presumably the training-set size (about 80% of the 40468 sgRNAs)
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + geom_point(aes(y=abs(Sample.Prop))) + coord_flip() + xlab("") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1") + scale_y_continuous("Normalized Importance (bars)", sec.axis = sec_axis(~. , name="Avg Proportion of Samples that Features Influence"))
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
#### looking at the interaction of features
library(dplyr)
library(tidyr)
library(reshape2)
library(ggplot2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
rit <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.rit", header=F, sep="\t")
key <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.paths.key.out", header=F, sep=",")
colnames(key) <- c("feature", "feature.key")
colnames(rit) <- c("rit.value", "rit.features")
rit.id <- separate(rit, "rit.features", c("feature1.key", "feature2.key"))
rit.id$feature1.key <- as.numeric(rit.id$feature1.key)
rit.id$feature2.key <- as.numeric(rit.id$feature2.key)
key.1 <- key
colnames(key.1) <- c("feature1", "feature1.key")
key.2 <- key
colnames(key.2) <- c("feature2", "feature2.key")
rit.feature1.key <- left_join(rit.id, key.1, by=c("feature1.key"))
rit.key <- inner_join(rit.feature1.key, key.2, by=c("feature2.key"))
write.table(rit.key, "e.coli.tensor.single.bp.dimers_cut.score.rit_IDdefined.txt", quote=F, row.names=F, sep="\t")
# check to see if any of the features in this file have a 0 importance score
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
# the IDdefined file was written above with sep="\t", so read it back tab-delimited
rit <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.rit_IDdefined.txt", header=T, sep="\t")
imp <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4", header=F, sep=":")
imp.feature1 <- subset(imp, imp$V1 %in% rit$feature1)
imp.feature2 <- subset(imp, imp$V1 %in% rit$feature2)
## look at full RIT set
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
rit <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.full_rit_sort", header=F, sep="\t")
key <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.paths.key.out", header=F, sep=",")
colnames(key) <- c("feature", "feature.key")
colnames(rit) <- c("rit.value", "rit.features")
rit.id <- separate(rit, "rit.features", c("feature1.key", "feature2.key"))
rit.id$feature1.key <- as.numeric(rit.id$feature1.key)
rit.id$feature2.key <- as.numeric(rit.id$feature2.key)
key.1 <- key
colnames(key.1) <- c("feature1", "feature1.key")
key.2 <- key
colnames(key.2) <- c("feature2", "feature2.key")
rit.feature1.key <- left_join(rit.id, key.1, by=c("feature1.key"))
rit.key <- inner_join(rit.feature1.key, key.2, by=c("feature2.key"))
write.table(rit.key, "e.coli.tensor.single.bp.dimers_cut.score.full_rit_sort_IDdefined.txt", quote=F, row.names=F, sep="\t")
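# python sketch of the key lookup done above, on made-up toy data.
# The column layouts (feature,key and value<TAB>"key1 key2") are assumptions based on
# the read.delim calls; the real files may differ. Note that the R code uses
# left_join for feature1 but inner_join for feature2, so a pair whose second key is
# missing from the key file is dropped silently. The sketch keeps such rows and
# reports them instead.

```python
import csv
import io

# Hypothetical toy inputs standing in for the .paths.key.out and .rit files;
# the real files are only assumed to share this column layout.
KEY_TEXT = "GC.content,0\nTm,1\np18HOMO,2\n"
RIT_TEXT = "0.42\t0 1\n0.17\t2 1\n0.05\t2 9\n"


def load_key(text):
    """Return {integer key: feature name} from 'feature,key' rows."""
    return {int(k): name for name, k in csv.reader(io.StringIO(text))}


def label_pairs(rit_text, key):
    """Yield (rit.value, feature1, feature2); unknown keys become None."""
    for value, pair in csv.reader(io.StringIO(rit_text), delimiter="\t"):
        k1, k2 = (int(k) for k in pair.split())
        yield float(value), key.get(k1), key.get(k2)


key = load_key(KEY_TEXT)
labelled = list(label_pairs(RIT_TEXT, key))
# Rows with an unmatched key are reported rather than silently dropped.
unmatched = [row for row in labelled if None in row[1:]]
for row in labelled:
    print(*row, sep="\t")
print("unmatched rows:", len(unmatched))
```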
# on local computer
#scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/e.coli.tensor.single.bp.dimers_cut.score.rit_IDdefined.txt .
# https://methods.sagepub.com/dataset/howtoguide/network-diagram-in-unhcr-2016
# install.packages(c("igraph","graphlayouts","ggraph","ggplot2"))
library(igraph)
library(ggraph)
library(graphlayouts)
library(dplyr) # needed for select(), distinct(), rename() below
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
df <- read.csv("e.coli.tensor.single.bp.dimers_cut.score.rit_IDdefined.txt", header=T, sep="\t")
df$rit.val.dec <- as.numeric(substr(df$rit.value, 0, 4))
df.network <- df[,c(4,5,6)]
nodes1 <- df.network %>% select("feature1") %>% distinct() %>% rename("feature" = "feature1")
nodes2 <- df.network %>% select("feature2") %>% distinct() %>% rename("feature" = "feature2")
nodes <- union(nodes1,nodes2)
nodes$ID <- seq.int(nrow(nodes))
net <- graph_from_data_frame(d=df.network, directed=TRUE)
l <- layout_with_lgl(net, maxiter=93)
edgesp18HOMO_eVraw <- incident(net, V(net)[name=="p18HOMO_eVraw"], mode="out")
edgesp18LUMO_eVraw <- incident(net, V(net)[name=="p18LUMO_eVraw"], mode="out")
ecol <- rep("gray", ecount(net))
ecol[edgesp18HOMO_eVraw] <- "orange"
ecol[edgesp18LUMO_eVraw] <- "gold"
vcol <- rep("gray", vcount(net))
vcol[V(net)$name=="p18HOMO_eVraw"] <- "orange"
vcol[V(net)$name=="p18LUMO_eVraw"] <- "gold"
#plot(net, main="E.coli RIT", layout=l, edge.curved=.25, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label=E(net)$rit.val.dec, edge.label.color="black", edge.label.cex=.7, vertex.label.color="black", vertex.label.cex=log(strength(net))/12)
plot(net, main="E.coli RIT", layout=l, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label=E(net)$rit.val.dec, edge.label.color="black", edge.label.cex=.5, vertex.label.color="black", vertex.label.cex=.5, edge.color=ecol, vertex.color=vcol)
library(igraph)
library(ggraph)
library(graphlayouts)
library(dplyr)
library(tidyr)
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
df <- read.csv("e.coli.tensor.single.bp.dimers_cut.score.full_rit_sort_IDdefined.txt", header=T, sep="\t")
df$rit.val.dec <- as.numeric(substr(df$rit.value, 0, 4))
df.network <- df[,c(4,5,6)]
nodes1 <- df.network %>% select("feature1") %>% distinct() %>% rename("feature" = "feature1")
nodes2 <- df.network %>% select("feature2") %>% distinct() %>% rename("feature" = "feature2")
nodes <- union(nodes1,nodes2)
nodes$ID <- seq.int(nrow(nodes))
net <- graph_from_data_frame(d=df.network, directed=TRUE)
l <- layout_with_lgl(net, maxiter=93)
edgesp20bp_HOMO_evraw <- incident(net, V(net)[name=="p20bp_HOMO_evraw"], mode="out")
edgesp20bp_HL.gab_evraw <- incident(net, V(net)[name=="p20bp_HL.gab_evraw"], mode="out")
edgesp18HOMO_eVraw <- incident(net, V(net)[name=="p18HOMO_eVraw"], mode="out")
edgesp18LUMO_eVraw <- incident(net, V(net)[name=="p18LUMO_eVraw"], mode="out")
ecol <- rep("gray", ecount(net))
ecol[edgesp20bp_HOMO_evraw] <- "orange"
ecol[edgesp20bp_HL.gab_evraw] <- "gold"
ecol[edgesp18HOMO_eVraw] <- "light green"
ecol[edgesp18LUMO_eVraw] <- "light blue"
vcol <- rep("gray", vcount(net))
vcol[V(net)$name=="p20bp_HOMO_evraw"] <- "orange"
vcol[V(net)$name=="p20bp_HL.gab_evraw"] <- "gold"
vcol[V(net)$name=="p18HOMO_eVraw"] <- "light green"
vcol[V(net)$name=="p18LUMO_eVraw"] <- "light blue"
plot(net, main="E.coli RIT", layout=l, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label=E(net)$rit.val.dec, edge.label.color="black", edge.label.cex=.5, vertex.label.color="black", vertex.label.cex=.5, edge.color=ecol, vertex.color=vcol)
pdf("ecoli.rit.network.pdf")
plot(net, main="E.coli RIT", layout=l, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label=E(net)$rit.val.dec, edge.label.color="black", edge.label.cex=.5, vertex.label.color="black", vertex.label.cex=.5, edge.color=ecol, vertex.color=vcol)
dev.off()
library(igraph)
library(ggraph)
library(graphlayouts)
library(dplyr)
library(tidyr)
library(reshape2)
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/onehot")
onehot <- read.delim("onehot.labels.txt", header=F, sep="\t")
onehot.t <- data.frame(t(onehot))
colnames(onehot.t) <- c("matrix.label", "onehot.label")
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
df <- read.csv("e.coli.tensor.single.bp.dimers_cut.score.full_rit_sort_IDdefined.txt", header=T, sep="\t")
df$rit.val.dec <- as.numeric(substr(df$rit.value, 0, 4))
df.network <- df[,c(4,5,6)]
df.network$feature1.label <-
c("p19.GC","p16.GG","p16.GG","p18.HOMOeV","p18.HOMOeV","p19.CCA","p15.CC","GC.content","p15.CC","p15.CC",
"p20bp.HOMOeV","p18.LUMOeV","p18.LUMOeV","p16.GG","p18.LUMOeV","p19.GC","p18.HOMOeV","GC.content","p19.CCA","p18.HOMOeV",
"p19.CCA","p19.GC","p18.HOMOeV","p16.GG","p15.CC","p19.GC","CC","p18.HOMOeV","p19.GC","CC",
"p19.GC","p16.GG","CC","p19.CCA","p19.GC","p15.CC","p16.GG","p19.GC","CC","p19.CCA",
"p18.LUMOeV","p18.LUMOeV","GG","p15.dimer.Hbond","CC","p19.GC","p18.LUMOeV","GC.content","p1.TTTT","GG",
"GG","CC","p18.LUMOeV","p15.CC","p18.HOMOeV","p16.GG","p15.CC","p1.TTTT","p19.LUMOeV","PAM.distance",
"CC","p16.GG","PAM.distance","p20bp.HOMOeV","p19.CCA","p20bp.HLgap","p15.CC","p1.TTTT","p18.HOMOeV","p19.CCA",
"p16.ATCA","p13.dimer.Hbond","p18.LUMOeV","p19.GC","CC","p16.GG")
df.network$feature2.label <-
c("p15.CC","GC.content","p19.CCA","p15.CC","GC.content","GC.content","p20bp.HOMOeV","p20bp.HOMOeV","Tm","GC.content",
"Tm","GC.content","p15.CC","Tm","Tm","p19.HOMOeV","Tm","p15.dimer.Hbond","Tm","p19.CCA",
"p15.dimer.Hbond","p20bp.HOMOeV","p20bp.HOMOeV","p1.TTTT","p19.CCA","p18.HOMOeV","p15.CC","p16.GG","p19.LUMOeV","p18.HOMOeV",
"CC","p15.dimer.Hbond","p20bp.HOMOeV","p1.TTTT","Tm","GG","p20bp.HOMOeV","GC.content","GG","p20bp.HOMOeV",
"p19.CCA","p20bp.HOMOeV","GC.content","Tm","GC.content","p18.LUMOeV","p16.GG","Tm","GC.content","p20bp.HOMOeV",
"Tm","Tm","CC","p19.HOMOeV","p15.dimer.Hbond","p14.CC","p20bp.HLgap","Tm","p15.CC","Tm",
"p16.GG","p19.CCA","GC.content","p15.dimer.Hbond","p14.CC","GC.content","T","p15.dimer.Hbond","GG","GG",
"p19.CCA","Tm","p15.dimer.Hbond","GG","p19.CCA","p15.CC")
df.network.label <- df.network[,c(4,5,3)]
colnames(df.network.label) <- c("feature1","feature2","rit.val.dec")
write.table(df.network.label, "e.coli.network.rit.txt", quote=F, row.names=F, sep="\t")
nodes1 <- df.network.label %>% select("feature1") %>% distinct() %>% rename("feature" = "feature1")
nodes2 <- df.network.label %>% select("feature2") %>% distinct() %>% rename("feature" = "feature2")
nodes <- union(nodes1,nodes2)
nodes$ID <- seq.int(nrow(nodes))
net <- graph_from_data_frame(d=df.network.label, directed=TRUE)
l <- layout_with_lgl(net, maxiter=93)
edges.pam <- incident(net, V(net)[name=="PAM.distance"], mode="out")
edges.GCcontent <- incident(net, V(net)[name=="GC.content"], mode="out")
edges.Tm <- incident(net, V(net)[name=="Tm"], mode="out")
edges.GG <- incident(net, V(net)[name=="GG"], mode="out")
edges.CC <- incident(net, V(net)[name=="CC"], mode="out")
edges.T <- incident(net, V(net)[name=="T"], mode="out")
edges.p16.GG <- incident(net, V(net)[name=="p16.GG"], mode="out")
edges.p1.TTTT <- incident(net, V(net)[name=="p1.TTTT"], mode="out")
edges.p15.CC <- incident(net, V(net)[name=="p15.CC"], mode="out")
edges.p19.GC <- incident(net, V(net)[name=="p19.GC"], mode="out")
edges.p19.CCA <- incident(net, V(net)[name=="p19.CCA"], mode="out")
edges.p14.CC <- incident(net, V(net)[name=="p14.CC"], mode="out")
edges.p16.ATCA <- incident(net, V(net)[name=="p16.ATCA"], mode="out")
edges.p20bp.HOMOeV <- incident(net, V(net)[name=="p20bp.HOMOeV"], mode="out")
edges.p20bp.HLgap <- incident(net, V(net)[name=="p20bp.HLgap"], mode="out")
edges.p13.dimer.Hbond <- incident(net, V(net)[name=="p13.dimer.Hbond"], mode="out")
edges.p15.dimer.Hbond <- incident(net, V(net)[name=="p15.dimer.Hbond"], mode="out")
edges.p18.LUMOeV <- incident(net, V(net)[name=="p18.LUMOeV"], mode="out")
edges.p19.LUMOeV <- incident(net, V(net)[name=="p19.LUMOeV"], mode="out")
edges.p19.HOMOeV <- incident(net, V(net)[name=="p19.HOMOeV"], mode="out")
ecol <- rep("gray", ecount(net))
ecol[edges.pam] <- "orange"
ecol[edges.GCcontent] <- "orange"
ecol[edges.Tm] <- "orange"
ecol[edges.GG] <- "orange"
ecol[edges.CC] <- "orange"
ecol[edges.T] <- "orange"
ecol[edges.p1.TTTT] <- "yellow"
ecol[edges.p16.GG] <- "yellow"
ecol[edges.p15.CC] <- "yellow"
ecol[edges.p19.GC] <- "yellow"
ecol[edges.p19.CCA] <- "yellow"
ecol[edges.p14.CC] <- "yellow"
ecol[edges.p16.ATCA] <- "yellow"
# "light purple" is not a named R color; "plum" is used as a light purple instead
ecol[edges.p20bp.HLgap] <- "plum"
ecol[edges.p20bp.HOMOeV] <- "plum"
ecol[edges.p13.dimer.Hbond] <- "light green"
ecol[edges.p15.dimer.Hbond] <- "light green"
ecol[edges.p19.LUMOeV] <- "light blue"
ecol[edges.p18.LUMOeV] <- "light blue"
ecol[edges.p19.HOMOeV] <- "light blue"
vcol <- rep("gray", vcount(net))
vcol[V(net)$name=="PAM.distance"] <- "orange"
vcol[V(net)$name=="GC.content"] <- "orange"
vcol[V(net)$name=="Tm"] <- "orange"
vcol[V(net)$name=="GG"] <- "orange"
vcol[V(net)$name=="CC"] <- "orange"
vcol[V(net)$name=="T"] <- "orange"
vcol[V(net)$name=="p1.TTTT"] <- "yellow"
vcol[V(net)$name=="p16.GG"] <- "yellow"
vcol[V(net)$name=="p15.CC"] <- "yellow"
vcol[V(net)$name=="p19.GC"] <- "yellow"
vcol[V(net)$name=="p19.CCA"] <- "yellow"
vcol[V(net)$name=="p14.CC"] <- "yellow"
vcol[V(net)$name=="p16.ATCA"] <- "yellow"
vcol[V(net)$name=="p20bp.HOMOeV"] <- "plum"
vcol[V(net)$name=="p20bp.HLgap"] <- "plum"
vcol[V(net)$name=="p13.dimer.Hbond"] <- "light green"
vcol[V(net)$name=="p15.dimer.Hbond"] <- "light green"
vcol[V(net)$name=="p18.LUMOeV"] <- "light blue"
vcol[V(net)$name=="p19.LUMOeV"] <- "light blue"
vcol[V(net)$name=="p19.HOMOeV"] <- "light blue"
plot(net, main="E.coli RIT", layout=l, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label="", edge.label.color="black", edge.label.cex=.5, vertex.label.color="black", vertex.label.cex=.5, edge.color=ecol, vertex.color=vcol)
df <- read.delim("e.coli.network.rit.txt", header=T, sep="\t")
df$edge.weight <- df$rit.val.dec * 100
write.table(df, "e.coli.network.rit.edge.txt", quote=F, row.names=F, sep="\t")
df <- read.delim("e.coli.network.rit.txt", header=T, sep="\t")
df$feature1.group <-
c("positional.encoding","positional.encoding","positional.encoding","quantum.single","quantum.single","positional.encoding","positional.encoding","raw.calculation","positional.encoding","positional.encoding","quantum.basepair","quantum.single","quantum.single","positional.encoding","quantum.single","positional.encoding","quantum.single","raw.calculation","positional.encoding","quantum.single","positional.encoding","positional.encoding","quantum.single","positional.encoding","positional.encoding","positional.encoding","raw.calculation","quantum.single","positional.encoding","raw.calculation","positional.encoding","positional.encoding","raw.calculation","positional.encoding","positional.encoding","positional.encoding","positional.encoding","positional.encoding","raw.calculation","positional.encoding","quantum.single","quantum.single","raw.calculation","quantum.dimer","raw.calculation","positional.encoding","quantum.single","raw.calculation","positional.encoding","raw.calculation","raw.calculation","raw.calculation","quantum.single","positional.encoding","quantum.single","positional.encoding","positional.encoding","positional.encoding","quantum.single","raw.calculation","raw.calculation","positional.encoding","raw.calculation","quantum.basepair","positional.encoding","quantum.basepair","positional.encoding","positional.encoding","quantum.single","positional.encoding","positional.encoding","quantum.dimer","quantum.single","positional.encoding","raw.calculation","positional.encoding")
df$feature2.group <- c("positional.encoding","raw.calculation","positional.encoding","positional.encoding","raw.calculation","raw.calculation","quantum.basepair","quantum.basepair","raw.calculation","raw.calculation","raw.calculation","raw.calculation","positional.encoding","raw.calculation","raw.calculation","quantum.single","raw.calculation","quantum.dimer","raw.calculation","positional.encoding","quantum.dimer","quantum.basepair","quantum.basepair","positional.encoding","positional.encoding","quantum.single","positional.encoding","positional.encoding","quantum.single","quantum.single","raw.calculation","quantum.dimer","quantum.basepair","positional.encoding","raw.calculation","raw.calculation","quantum.basepair","raw.calculation","raw.calculation","quantum.basepair","positional.encoding","quantum.basepair","raw.calculation","raw.calculation","raw.calculation","quantum.single","positional.encoding","raw.calculation","raw.calculation","quantum.basepair","raw.calculation","raw.calculation","raw.calculation","quantum.single","quantum.dimer","positional.encoding","quantum.basepair","raw.calculation","positional.encoding","raw.calculation","positional.encoding","positional.encoding","raw.calculation","quantum.dimer","positional.encoding","raw.calculation","raw.calculation","quantum.dimer","raw.calculation","raw.calculation","positional.encoding","raw.calculation","quantum.dimer","raw.calculation","positional.encoding","positional.encoding")
df$edge.weight <- df$rit.val.dec * 10
write.table(df, "e.coli.network.rit.group.txt", quote=F, row.names=F, sep="\t")
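# python sketch: derive the feature group from the short label instead of typing
# 76-element vectors by hand, which is easy to misspell.
# The rules below are inferred from the hand-typed group vectors above and are an
# assumption, not a documented naming scheme. GROUP_RULES and feature_group are
# hypothetical names introduced for illustration only.

```python
import re

# Rules inferred from the hand-typed vectors above (an assumption, not a
# documented scheme); order matters because the first match wins.
GROUP_RULES = [
    (re.compile(r"^p\d+bp\."), "quantum.basepair"),
    (re.compile(r"dimer"), "quantum.dimer"),
    (re.compile(r"(HOMO|LUMO)"), "quantum.single"),
    (re.compile(r"^p\d+\.[ACGT]+$"), "positional.encoding"),
]


def feature_group(label):
    """Map a short feature label to its group, defaulting to raw.calculation."""
    for pattern, group in GROUP_RULES:
        if pattern.search(label):
            return group
    return "raw.calculation"


labels = ["p19.GC", "p18.HOMOeV", "GC.content", "p20bp.HOMOeV",
          "p15.dimer.Hbond", "p20bp.HLgap", "p1.TTTT", "Tm", "CC"]
groups = [feature_group(label) for label in labels]
for label, group in zip(labels, groups):
    print(label, group, sep="\t")
```

# Any label that matches no rule falls through to raw.calculation, so a new label
# type should be checked by eye before trusting the grouping.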
# understanding the output of /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/ritEval.py ????
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
rit.adj <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.rit.adj", header=T, sep="\t")
rit.edge <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.rit.edge", header=T, sep="\t")
effect <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.effect", header=T, sep="\t")
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
f = plt.figure()
# show=False keeps the figure open so savefig() writes the drawn plot, not an empty canvas
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
plt.close(f)
f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
plt.close(f)
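# The bar summary plot ranks features by their mean absolute SHAP value across samples.
# A minimal sketch of that ranking on a made-up 3x3 matrix, in case the numbers are
# wanted as a table rather than a figure. The feature names and values here are
# illustrative only and are not results from the model above.

```python
# Toy SHAP matrix: one row per sample, one column per feature (made-up numbers).
feature_names = ["GC.content", "Tm", "PAM.distance"]
shap_rows = [
    [0.20, -0.05, 0.01],
    [-0.30, 0.10, 0.00],
    [0.10, -0.15, 0.02],
]


def mean_abs_shap(rows, names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least influential."""
    n = len(rows)
    totals = [sum(abs(row[j]) for row in rows) / n for j in range(len(names))]
    return sorted(zip(names, totals), key=lambda item: item[1], reverse=True)


ranking = mean_abs_shap(shap_rows, feature_names)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```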
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.tensor.single.bp.dimers_cut.score.importance4 > cut.score/foldRuns/fold9/Runs/Set4/e.coli.tensor.single.bp.dimers_cut.score.importance4.sorted
# R
library(dplyr) # needed for %>%, select() and matches() below
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4/")
imp <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.sorted", header=F, sep=":")
imp5 <- imp[1:5,]
imp10 <- imp[1:10,]
imp20 <- imp[1:20,]
imp50 <- imp[1:50,]
imp100 <- imp[1:100,]
imp200 <- imp[1:200,]
imp500 <- imp[1:500,]
imp1k <- imp[1:1000,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", header=T, sep="\t")
top5 <- df %>% select(matches(imp5$V1))
df.top5 <- cbind(df[,1], top5)
write.table(df.top5, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top5.txt", quote=F, row.names=F, sep="\t")
write.table(df.top5, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top5_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.top5[,2:ncol(df.top5)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top5_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
top10 <- df %>% select(matches(imp10$V1))
df.top10 <- cbind(df[,1], top10)
write.table(df.top10, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top10.txt", quote=F, row.names=F, sep="\t")
write.table(df.top10, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top10_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.top10[,2:ncol(df.top10)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top10_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
top20 <- df %>% select(matches(imp20$V1))
df.top20 <- cbind(df[,1], top20)
write.table(df.top20, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top20.txt", quote=F, row.names=F, sep="\t")
write.table(df.top20, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top20_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.top20[,2:ncol(df.top20)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top20_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
top50 <- df %>% select(matches(imp50$V1))
df.top50 <- cbind(df[,1], top50)
write.table(df.top50, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top50.txt", quote=F, row.names=F, sep="\t")
write.table(df.top50, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top50_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.top50[,2:ncol(df.top50)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top50_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
top100 <- df %>% select(matches(imp100$V1))
df.top100 <- cbind(df[,1], top100)
write.table(df.top100, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top100.txt", quote=F, row.names=F, sep="\t")
write.table(df.top100, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top100_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.top100[,2:ncol(df.top100)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top100_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
top200 <- df %>% select(matches(imp200$V1))
df.top200 <- cbind(df[,1], top200)
write.table(df.top200, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top200.txt", quote=F, row.names=F, sep="\t")
write.table(df.top200, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top200_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.top200[,2:ncol(df.top200)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top200_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
top500 <- df %>% select(matches(imp500$V1))
df.top500 <- cbind(df[,1], top500)
write.table(df.top500, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top500.txt", quote=F, row.names=F, sep="\t")
write.table(df.top500, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top500_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.top500[,2:ncol(df.top500)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top500_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
top1k <- df %>% select(matches(imp1k$V1))
df.top1k <- cbind(df[,1], top1k)
write.table(df.top1k, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top1k.txt", quote=F, row.names=F, sep="\t")
write.table(df.top1k, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top1k_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.top1k[,2:ncol(df.top1k)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top1k_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
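# python sketch of the top-N subsetting above, written as one loop with exact name matching.
# select(matches(...)) treats each name as a regular expression, so a short name such as
# "GG" can also pull in other columns that contain it. The sketch below selects columns
# by exact name only. The toy importance and feature tables are made up; the
# "name:score" layout is an assumption based on the read.delim(sep=":") call.

```python
import csv
import io

# Hypothetical stand-ins for the sorted importance file ("name:score") and the
# tab-separated feature table; the values are made up.
IMPORTANCE_TEXT = "GG:0.50\nTm:0.30\np16.GG:0.15\nGC.content:0.05\n"
FEATURES_TEXT = (
    "sgRNAID\tGG\tp16.GG\tTm\tGC.content\n"
    "s1\t3\t0\t55.2\t0.5\n"
    "s2\t1\t1\t61.0\t0.6\n"
)


def top_features(importance_text, n):
    """Return the n highest-scoring feature names (input need not be sorted)."""
    scored = []
    for line in importance_text.splitlines():
        name, score = line.rsplit(":", 1)
        scored.append((float(score), name.strip()))
    scored.sort(reverse=True)
    return [name for _, name in scored[:n]]


def subset_table(features_text, keep, id_column=0):
    """Keep the ID column plus exactly the named columns, in ranked order."""
    rows = list(csv.reader(io.StringIO(features_text), delimiter="\t"))
    header = rows[0]
    index = [id_column] + [header.index(name) for name in keep]
    return [[row[i] for i in index] for row in rows]


for n in (1, 2):
    subset = subset_table(FEATURES_TEXT, top_features(IMPORTANCE_TEXT, n))
    print(n, subset[0])
```

# header.index() raises ValueError when an importance name is absent from the table,
# which surfaces naming mismatches instead of silently skipping the feature.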
# top 5 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top5 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top5.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5/Submits/submit_full_top5_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5/Submits/submit_train_top5_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5/Submits/submit_test_top5_0.sh
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top5
# 0.11240745945016933
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_HOMO_evraw cut.score 0.5560700206650031
# sgRNA.gcsgRNA.raw cut.score 0.18401685074949461
# V303.xsgRNA.raw cut.score 0.13877919464539287
# CCsgRNA.raw cut.score 0.07303826527934598
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top5_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3436711
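# R's cor() defaults to the Pearson correlation, so the value above is Pearson r between
# observed and predicted cut.score. A minimal sketch of the same calculation, using
# made-up numbers rather than the real prediction files:

```python
import math

# Made-up observed and predicted cut scores, used only to exercise the function.
observed = [0.10, 0.40, 0.35, 0.80, 0.65]
predicted = [0.20, 0.30, 0.45, 0.70, 0.60]


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(observed, predicted)
print(f"Pearson r = {r:.4f}")
```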
# top 10 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top10 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top10.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10/Submits/submit_full_top10_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10/Submits/submit_train_top10_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10/Submits/submit_test_top10_0.sh
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top10
# 0.15779734147083332
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_HOMO_evraw cut.score 0.2779751149892543
# CCsgRNA.raw cut.score 0.10392795171106281
# GGsgRNA.raw cut.score 0.08649169902899904
# pam.distance0 cut.score 0.08305282825098441
# V303.xsgRNA.raw cut.score 0.0808250888344932
# sgRNA.gcsgRNA.raw cut.score 0.08065523571159726
# p18HOMO_eVraw cut.score 0.07858676088784716
# p18LUMO_eVraw cut.score 0.07790865831751967
# sgRNA.tempsgRNA.raw cut.score 0.07505008265603128
# V231.xsgRNA.raw cut.score 0.05552657961221091
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top10_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4019815
# top 20 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top20 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top20.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20/Submits/submit_full_top20_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20/Submits/submit_train_top20_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20/Submits/submit_test_top20_0.sh
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top20
# 0.20172360134328232
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p15dimer_H_bondraw cut.score 0.07323354697021574
# pam.distance0 cut.score 0.06606994398300291
# sgRNA.gcsgRNA.raw cut.score 0.05703286182756317
# CCsgRNA.raw cut.score 0.056726691069441004
# sgRNA.tempsgRNA.raw cut.score 0.05473447914546529
# GGsgRNA.raw cut.score 0.05202158281633737
# TsgRNA.raw cut.score 0.05141008209856573
# p18HOMO_eVraw cut.score 0.05070351109634776
# p18LUMO_eVraw cut.score 0.04933637277977404
# p19LUMO_eVraw cut.score 0.04712981730229931
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top20_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4458406
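The same test-set correlation can be spot-checked from the shell without starting R. This is a minimal sketch, assuming the Y file and the prediction file each hold one numeric column under a single header line with rows in the same order (as the `read.delim(..., header=T)` calls above suggest). `pearson_cols` is a hypothetical helper name introduced here, not part of iRF.

```shell
# pearson_cols FILE_A FILE_B
# Prints the Pearson correlation between column 1 of FILE_A and column 1 of
# FILE_B, skipping one header line in each file (hypothetical helper).
pearson_cols() {
    awk '
        FNR == 1 { next }                    # skip the header line of each file
        NR == FNR { a[FNR] = $1; next }      # first file: remember values by row
        {                                    # second file: accumulate the sums
            x = a[FNR]; y = $1; n++
            sx += x; sy += y; sxy += x * y; sxx += x * x; syy += y * y
        }
        END {
            d = (n * sxx - sx * sx) * (n * syy - sy * sy)
            if (n < 2 || d <= 0) { print "NA"; exit }
            printf "%.4f\n", (n * sxy - sx * sy) / sqrt(d)
        }' "$1" "$2"
}

# Example call (paths as in the R snippet above):
# pearson_cols set4_Y_test_noSampleIDs.txt top20_Set4_test.prediction
```

The value is rounded to four decimals, so it should agree with `cor()` only to that precision.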
# top 50 features
module load python/3.7-anaconda3
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top50 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top50.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50/Submits/submit_full_top50_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50/Submits/submit_train_top50_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50/Submits/submit_test_top50_0.sh
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top50
# 0.24529071033783692
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p15dimer_H_bondraw cut.score 0.05288961312310658
# p18dimer_H_bondraw cut.score 0.048287272079531054
# sgRNA.gcsgRNA.raw cut.score 0.048042513201620105
# sgRNA.tempsgRNA.raw cut.score 0.046670325843486814
# p20bp_HL.gab_evraw cut.score 0.03680256204578182
# p19dimer_HOMO_eVraw cut.score 0.036082345356197476
# p19LUMO_eVraw cut.score 0.035649593966638055
# p20bp_bondraw cut.score 0.03532168658007362
# p18LUMO_eVraw cut.score 0.035193304018897434
# CCsgRNA.raw cut.score 0.03424359955178815
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top50_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4903894
# top 100 features
module load python/3.7-anaconda3
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top100 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top100.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100/Submits/submit_full_top100_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100/Submits/submit_train_top100_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100/Submits/submit_test_top100_0.sh
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top100
# 0.2511902743156288
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.tempsgRNA.raw cut.score 0.042941379113082226
# sgRNA.gcsgRNA.raw cut.score 0.040372033685997025
# p15dimer_H_bondraw cut.score 0.03820204540598607
# p20bp_HL.gab_evraw cut.score 0.03748607181550359
# p20bp_LUMO_evraw cut.score 0.032674508114237846
# p18HOMO_eVraw cut.score 0.030854362074953134
# p18dimer_H_bondraw cut.score 0.029996170116640183
# p18LUMO_eVraw cut.score 0.029986605869136346
# p20bp_bondraw cut.score 0.02958156995010671
# CCsgRNA.raw cut.score 0.028279537131068865
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top100_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4967809
# top 200 features
module load python/3.7-anaconda3
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top200 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top200.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200/Submits/submit_full_top200_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200/Submits/submit_train_top200_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200/Submits/submit_test_top200_0.sh
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top200
# 0.2541662419500061
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_HL.gab_evraw cut.score 0.040370718407463375
# sgRNA.gcsgRNA.raw cut.score 0.03770659114517902
# sgRNA.tempsgRNA.raw cut.score 0.03717516292014969
# p15dimer_H_bondraw cut.score 0.03390945697992189
# p20bp_LUMO_evraw cut.score 0.030311114551425513
# p20bp_bondraw cut.score 0.02994484824438354
# p18dimer_H_bondraw cut.score 0.0298580274228511
# CCsgRNA.raw cut.score 0.02874886974665889
# p19dimer_H_bondraw cut.score 0.025755526794290634
# p18HOMO_eVraw cut.score 0.025255463915996947
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top200_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4984185
# top 500 features
module load python/3.7-anaconda3
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top500 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top500.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500/Submits/submit_full_top500_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500/Submits/submit_train_top500_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500/Submits/submit_test_top500_0.sh
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top500
# 0.2564356719903999
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_bondraw cut.score 0.033351661481709816
# p20bp_HL.gab_evraw cut.score 0.03302323848544733
# sgRNA.gcsgRNA.raw cut.score 0.03235167607143662
# sgRNA.tempsgRNA.raw cut.score 0.030300746213757945
# p15dimer_H_bondraw cut.score 0.028587105767771046
# p20bp_HOMO_evraw cut.score 0.027145399197064105
# CCsgRNA.raw cut.score 0.026952897435269286
# p20bp_LUMO_evraw cut.score 0.026059446119664594
# p18dimer_H_bondraw cut.score 0.026041073444393575
# p18LUMO_eVraw cut.score 0.025907985806609503
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top500_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.500109
# top 1000 features
module load python/3.7-anaconda3
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top1k --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top1k.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k/Submits/submit_full_top1k_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k/Submits/submit_train_top1k_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k/Submits/submit_test_top1k_0.sh
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top1k
# 0.25773091055490766
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_LUMO_evraw cut.score 0.03437642180047371
# sgRNA.gcsgRNA.raw cut.score 0.027582072433968742
# p20bp_HOMO_evraw cut.score 0.027573956881477995
# p20bp_HL.gab_evraw cut.score 0.02745782508958597
# sgRNA.tempsgRNA.raw cut.score 0.026721515126104947
# p20bp_bondraw cut.score 0.026638154288020657
# CCsgRNA.raw cut.score 0.024201018590038908
# p15dimer_H_bondraw cut.score 0.02355067173419084
# pam.distance0 cut.score 0.023324230177193622
# p18LUMO_eVraw cut.score 0.02261733909149155
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top1k_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5000614
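The top-N blocks above differ only in the run name. Below is a hedged sketch of a dry-run loop that prints the set-up, submission, and post-processing commands for each run so they can be reviewed before anything is executed. `BASE`, `IRF`, and `print_topn_cmds` are names introduced here for illustration; the flags and paths are copied from the commands above.

```shell
BASE=/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
IRF=/gpfs/alpine/syb105/proj-shared/Projects/iRF

# print_topn_cmds RUN
# Prints (does not execute) the commands for one top-N run, e.g. RUN=top50.
print_topn_cmds() {
    run=$1
    dir=$BASE/iRF.run/e.coli.tensor.single.bp.dimers/top.features/$run
    prefix=$BASE/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location
    echo "mkdir -p $dir && cd $dir"
    echo "python $IRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName $run --bypass --targetNodeSize 50 --Prediction $prefix.features.$run.txt $prefix.score.txt"
    for stage in full train test; do
        echo "bsub $dir/Submits/submit_${stage}_${run}_0.sh"
    done
    echo "python $IRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 $BASE/iRF.run/YNames.txt $run"
}

for run in top20 top50 top100 top200 top500 top1k; do
    print_topn_cmds "$run"
done
```

Because the loop only echoes, nothing is submitted. The post-processing command should be run only after the three submitted jobs for that run have finished.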
library(data.table)
### violin plots of R2 across different models
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults")
#create a list of the files from your target directory
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults")
#read each fold-result file once and bind the results column-wise (one column per file, in list.files() order)
dataset <- as.data.frame(do.call(cbind, lapply(file_list, data.table::fread)))
colnames(dataset) <- c("onehot", "onehot.QCT", "raw.onehot", "raw.onehot.QCT", "raw", "raw.QCT", "QCT", "QCT.dimers", "QCT.single.bp.dimers.noncorrelated", "QCT.single.bp.dimers", "top10", "top100", "top1k", "top20", "top200", "top5", "top50", "top500")
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
dataset.order <- dataset[,c(5,1,7,3,6,2,4,8,10,9,16,11,14,17,12,15,18,13)]
dataset.order.melt <- melt(dataset.order)
pdf("R2.order.violin.pdf")
ggplot(dataset.order.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()
dataset.subsets <- dataset[,c(5,1,7,3,6,2,4,8,10,9)]
dataset.subsets.melt <- melt(dataset.subsets)
pdf("R2.subsets.violin.pdf")
ggplot(dataset.subsets.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()
dataset.subsets2 <- dataset[,c(5,1,7,3,6,2,4,10)]
dataset.subsets2.melt <- melt(dataset.subsets2)
pdf("R2.subsets2.violin.pdf")
ggplot(dataset.subsets2.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()
dataset.top <- dataset[,c(16,11,14,17,12,15,18,13)]
dataset.top.melt <- melt(dataset.top)
pdf("R2.top.violin.pdf")
ggplot(dataset.top.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()
## scatterplot of feature importance vs the samples affected by that feature
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
effect <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
# 2020
effect.sort <- effect[order(-effect$NormEdge),]
effect.sort.100 <- effect.sort[1:100,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.Effect.scatter.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect)) + scale_color_brewer(palette="Dark2") + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + geom_text(hjust=0, vjust=0, size=2)
dev.off()
library(ggrepel)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.Effect.scatter.label.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect)) + scale_color_brewer(palette="Dark2") + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2)
dev.off()
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
summary(effect.sort.100$NormEdge)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.002762 0.003401 0.004474 0.007956 0.010092 0.041329
summary(effect.sort.100$Samples)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1199 1722 3033 5030 6255 19242
# pdf("Imp.Effect.scatter.label.quartile.pdf")
# ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(color = dplyr::case_when(effect.sort.100$Samples > 6255 ~ "#1b9e77", effect.sort.100$Samples < 1722 ~ "#d95f02", TRUE ~ "#7570b3"), size = effect.sort.100$FeatureEffect, alpha = 0.8) +
# geom_text_repel(data = subset(effect.sort.100, Samples > 6255),
# nudge_y = 32 - subset(effect.sort.100, Samples > 6255)$Samples,
# size = 2,
# box.padding = 1.5,
# point.padding = 0.5,
# force = 100,
# segment.size = 0.2,
# segment.color = "grey50",
# direction = "x") +
# scale_color_brewer(palette="Dark2") + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal()
# dev.off()
pdf("Imp.Effect.scatter.label.quartile.color.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect), color = dplyr::case_when(effect.sort.100$Samples > 6255 ~ "#1b9e77", effect.sort.100$Samples < 1722 ~ "#d95f02", TRUE ~ "#7570b3"), alpha = 0.8) + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal()
dev.off()
pdf("Imp.Effect.scatter.label.quartile.colorEffect.pdf")
# color points by the first and third quartiles of FeatureEffect
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect), color = dplyr::case_when(effect.sort.100$FeatureEffect > quantile(effect.sort.100$FeatureEffect, 0.75) ~ "#1b9e77", effect.sort.100$FeatureEffect < quantile(effect.sort.100$FeatureEffect, 0.25) ~ "#d95f02", TRUE ~ "#7570b3"), alpha = 0.8) + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal()
dev.off()
# ## heatmap of direction, size effect and importance of top features
# library(dplyr)
# library(ggplot2)
# library(reshape2)
# library(hrbrthemes)
# library(viridis)
#
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
# effect <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
# # 2020
# effect.sort <- effect[order(-effect$NormEdge),]
# effect.sort.20 <- effect.sort[1:20,c(1,3:4)]
# effect.sort.20.dir <- effect.sort.20 %>% mutate(Direction = ifelse(FeatureEffect > 0, "Predict High Efficiency", "Predict Low Efficiency"))
# colnames(effect.sort.20.dir) <- c("Feature", "Normalized Importance", "Effect Size", "Direction of Effect")
# effect.sort.20.melt <- melt(effect.sort.20.dir, id="Feature")
#
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
# pdf("Dir.Imp.Effect.heatmap.pdf")
# #ggplot(effect.sort.20.melt, aes(variable, Feature, fill= value)) + geom_tile() + scale_fill_viridis(discrete=FALSE) + theme_ipsum() + facet_wrap(. ~ variable, scales="free") + labs(title="Direction, Importance, and Effect (Top 20 Features)") + theme_minimal()
# ggplot(effect.sort.20.melt, aes(variable, Feature, fill= value)) + geom_tile() + facet_wrap(. ~ variable, scales="free") + labs(title="Direction, Importance, and Effect (Top 20 Features)") + theme_minimal() + theme(legend.position="bottom")
# dev.off()
## for a single feature... normalized importance across models
## normalized importance across features (for a single model)
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
effect <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
# 2020
effect.sort <- effect[order(-effect$NormEdge),]
effect.sort.50 <- effect.sort[1:50,]
effect.sort.50$category <- c("QCT.bp", "QCT.bp", "dep.kmer2", "dep.kmer2", "raw", "raw", "QCT.bp", "raw", "QCT.nucleotide", "QCT.bp", "ind.kmer2", "ind.kmer2", "dep.kmer3", "QCT.nucleotide", "ind.kmer1", "QCT.nucleotide", "QCT.dimer", "QCT.dimer", "QCT.nucleotide", "ind.kmer1", "ind.kmer1", "QCT.dimer", "QCT.dimer", "dep.kmer2", "QCT.dimer", "QCT.dimer", "raw", "QCT.dimer", "QCT.dimer", "QCT.dimer", "ind.kmer2", "QCT.dimer", "ind.kmer2", "QCT.dimer", "QCT.dimer", "QCT.nucleotide", "dep.kmer2", "QCT.nucleotide", "ind.kmer2", "ind.kmer2", "QCT.nucleotide", "QCT.dimer", "ind.kmer1", "QCT.nucleotide", "QCT.bp", "QCT.dimer", "QCT.dimer", "dep.kmer2", "QCT.bp", "dep.kmer2")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.bar.pdf")
ggplot(effect.sort.50, aes(x=reorder(Feature, -NormEdge), y=NormEdge, color=category)) + geom_bar(stat="identity") + labs(title="Feature Importance (Top 50 Features)", x="Feature", y="Normalized Importance") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Paired")
dev.off()
## calculate average R2 in each set
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults
for f in *.R2_foldResults.txt; do printf '%s = ' "$f"; sed '1d' "$f" | awk '{ total += $1; count++ } END { if (count) print total/count; else print "NA" }'; done
# e.coli.tensor.single.bp.dimers.noncorrelated.R2_foldResults.txt = 0.259283
# e.coli.tensor.single.bp.dimers.R2_foldResults.txt = 0.25925
# e.coli.tensor.dimers.noDWT.R2_foldResults.txt = 0.252797
# cas9only.raw.onehot.tensor.R2_foldResults.txt = 0.250661
# cas9only.onehot.tensor.R2_foldResults.txt = 0.25007
# cas9only.raw.tensor.R2_foldResults.txt = 0.242835
# cas9only.raw.onehot.R2_foldResults.txt = 0.192075
# cas9only.tensor.R2_foldResults.txt = 0.238398
# cas9only.onehot.R2_foldResults.txt = 0.191217
# cas9only.raw.R2_foldResults.txt = 0.0406861
# raw = 5
# onehot = 5885 + 4 PAM = 5889
# QCT = 80
# QCT bp = 80
# QCT dimer = 94
# all = 6148
# noncorrelated = 6091
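The totals above follow from adding the per-block feature counts. A small arithmetic sketch (variable names are illustrative) that reproduces the full-set total and shows how many features the non-correlated filter dropped:

```shell
raw=5
onehot=$((5885 + 4))          # positional one-hot features plus 4 PAM features
qct=80
qct_bp=80
qct_dimer=94
noncorrelated=6091

all=$((raw + onehot + qct + qct_bp + qct_dimer))
echo "onehot = $onehot"                                   # onehot = 5889
echo "all = $all"                                         # all = 6148
echo "dropped as correlated = $((all - noncorrelated))"   # dropped as correlated = 57
```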
df <- data.frame(feature.set = c("raw", "onehot", "QCT", "raw + onehot", "raw + QCT", "onehot + QCT", "raw + onehot + QCT", "raw + onehot + QCT + dimers", "raw + onehot + QCT + dimers + bp", "non-correlated"), R2 = c(0.0406861, 0.191217, 0.238398, 0.192075, 0.242835, 0.25007, 0.250661, 0.252797, 0.25925, 0.259283), feature.count = c(5, 5889, 80, 5+5889, 5+80, 5889+80, 5+5889+80, 5+5889+80+94, 6148, 6091))
library(ggplot2)
library(RColorBrewer)
scale.factor <- max(df$feature.count) / max(df$R2) # rescale feature counts onto the R2 axis so both series are visible
ggplot(df) + geom_bar(aes(x=feature.set, y=R2, fill=feature.set), stat="identity") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Paired") + geom_line(aes(x=feature.set, y=feature.count / scale.factor, group=1), inherit.aes = FALSE, color="blue", size=2) + scale_y_continuous(name = "R2", sec.axis=sec_axis(~ . * scale.factor, name="Feature Count")) + labs(title = "Size and Prediction Accuracy of Feature Subsets", x = "Feature Set")
library(dplyr)
library(tidyr)
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
#setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(doench)
# 6174
nrow(doench)
# 673
var(doench$cut.score)
# 0.03162405
sd(doench$cut.score)
# 0.1778315
mean(doench$cut.score)
# 0.1678287
doench.num <- mutate_all(doench[,2:ncol(doench)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
doench.num <- cbind(data.frame("sgRNAID" = doench$sgRNAID), doench.num)
var(doench.num$cut.score)
# 0.03670343
sd(doench.num$cut.score)
# 0.1915814
mean(doench.num$cut.score)
# 0.1797033
ggplot(doench, aes(x=cut.score)) + geom_density() + theme_classic() + labs(title = "H.sapiens (Doench et al., 2014)", x = "Experimental Cutting Efficiency Score", y = "Density")
#setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
# rename the score column first so the summaries below do not rely on partial matching of $cut.score
names(lipolytica)[names(lipolytica) == 'cut.score.x'] <- 'cut.score'
ncol(lipolytica)
# 6174
nrow(lipolytica)
# 45271
var(lipolytica$cut.score)
# 7.647933
sd(lipolytica$cut.score)
# 2.76549
mean(lipolytica$cut.score)
# -3.50674
lipolytica.num <- mutate_all(lipolytica[,2:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
lipolytica.num <- cbind(data.frame("sgRNAID" = lipolytica$sgRNAID), lipolytica.num)
var(lipolytica.num$cut.score)
# 0.02594887
sd(lipolytica.num$cut.score)
# 0.1610865
mean(lipolytica.num$cut.score)
# 0.3389167
ggplot(lipolytica, aes(x=cut.score)) + geom_density() + theme_classic() + labs(title = "Y.lipolytica (Baisya et al., 2021)", x = "Experimental Cutting Efficiency Score", y = "Density")
#setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(ecoli)
# 6174
nrow(ecoli)
# 40468
var(ecoli$cut.score)
# 110.5085
sd(ecoli$cut.score)
# 10.5123
mean(ecoli$cut.score)
# 24.56023
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
var(ecoli.num$cut.score)
# 0.04721325
sd(ecoli.num$cut.score)
# 0.2172861
mean(ecoli.num$cut.score)
# 0.5076525
ggplot(ecoli, aes(x=cut.score)) + geom_density() + theme_classic() + labs(title = "E.coli (Guo et al., 2018)", x = "Experimental Cutting Efficiency Score", y = "Density")
df <- data.frame("E.coli" = c(40468, 0.5076525, 0.04721325, 0.2172861), "Y.lipolytica" = c(45271, 0.3389167, 0.02594887, 0.1610865), "H.sapiens" = c(673, 0.1797033, 0.03670343, 0.1915814))
df$label <- c("Sample Size", "Mean", "Variance", "Standard Deviation")
df.melt <- melt(df)
ggplot(df.melt) + geom_bar(aes(x=variable, y=value, fill=variable), stat="identity") + facet_wrap(. ~ label, scales="free") + theme_classic() + theme(legend.position="bottom") + labs(x = "", y = "") + scale_fill_brewer(palette="Set1")
ecoli.num.df <- ecoli.num[,1:2]
ecoli.num.df$dataset <- "E.coli"
doench.num.df <- doench.num[,1:2]
doench.num.df$dataset <- "H.sapiens"
lipolytica.num.df <- lipolytica.num[,1:2]
lipolytica.num.df$dataset <- "Y.lipolytica"
ecoli.doench.lip <- rbind(ecoli.num.df, doench.num.df, lipolytica.num.df)
ggplot(ecoli.doench.lip, aes(x=cut.score, color=dataset)) + geom_density() + theme_classic() + theme(legend.position="bottom") + labs(x = "Normalized Cutting Efficiency Score", y = "Density") + scale_color_brewer(palette="Set2")
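Each cut.score column above is rescaled with min-max normalization, (x - min) / (max - min), so that the three datasets share a 0 to 1 range before their densities are compared. A minimal shell sketch of the same transform, assuming a single-column numeric file with one header line; `minmax_col` is a hypothetical helper, and the file layout is an assumption rather than the layout of the feature tables.

```shell
# minmax_col FILE
# Prints column 1 of FILE rescaled to [0, 1], skipping the header line.
# The file is read twice: pass 1 finds min and max, pass 2 rescales.
minmax_col() {
    awk '
        FNR == 1 { next }                    # skip the header in both passes
        NR == FNR {                          # pass 1: track min and max
            v = $1 + 0
            if (!seen++ || v < min) min = v
            if (seen == 1 || v > max) max = v
            next
        }
        {                                    # pass 2: rescale each value
            if (max == min) print "NA"
            else printf "%.4f\n", ($1 - min) / (max - min)
        }
    ' "$1" "$1"
}
```

A constant column has no range to scale by, so the sketch prints `NA` for it instead of dividing by zero.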
#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J yeast.test_0
#BSUB -o yeast.test_0.o%J
#BSUB -e yeast.test_0.e%J
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test
/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/e.coli.tensor.single.bp.dimers_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix ecoli.model.yeast.test --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test/ecoli.model.yeast.test.o
# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/ecoli.model.yeast.test.sh
#### test the output
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
score <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test/")
predict <- read.delim("ecoli.model.yeast.test.prediction", header=T, sep="\t")
score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# -0.02424251 (Pearson r; essentially no correlation between observed and predicted scores)
#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J doench.test_0
#BSUB -o doench.test_0.o%J
#BSUB -e doench.test_0.e%J
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test
/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/e.coli.tensor.single.bp.dimers_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix ecoli.model.doench.test --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test/ecoli.model.doench.test.o
# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/ecoli.model.doench.test.sh
#### test the output
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
score <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test/")
predict <- read.delim("ecoli.model.doench.test.prediction", header=T, sep="\t")
score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.06557198 (Pearson r)
## R2 = r^2 ~ 0.004299685
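# Sketch (not part of the original run): how the Pearson r and the squared
# correlation reported above relate. The function name pearson and the toy
# observed/predicted pairs are made up for illustration; the real values come
# from the cor() calls in R.

```shell
# Pearson correlation between two whitespace-separated columns read from stdin.
pearson() {
  awk '
    { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
    END {
      num = n * sxy - sx * sy
      den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
      r = num / den
      # r2 is the squared correlation, the quantity noted as "R2" above
      printf "r=%.4f r2=%.4f\n", r, r * r
    }'
}

# Toy observed (column 1) and predicted (column 2) scores.
printf '1 1\n2 3\n3 2\n4 4\n' | pearson
# prints: r=0.8000 r2=0.6400
```

# Because r2 is the square of r, a correlation as small as 0.066 leaves well
# under 1% of the variance explained.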
# essential gene reference (Gerdes 2003): https://www.genome.wisc.edu/Gerdes2003/
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
essential.genes <- read.delim("essential.genes.header.txt", header=T, sep="\t", stringsAsFactors = F)
sgRNA <- read.delim("sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
essential.genes.df <- as.data.frame(essential.genes)
sgRNA.df <- as.data.frame(sgRNA)
sgRNA.essential <- subset(sgRNA.df, sgRNA.df$gene..promoter. %in% essential.genes.df$gene)
sgRNA.nonessential <- subset(sgRNA.df, !(sgRNA.df$gene..promoter. %in% sgRNA.essential$gene..promoter.))
length(unique(essential.genes$gene))
# 4162
length(unique(sgRNA.df$gene..promoter.))
# 4135
length(unique(sgRNA.essential$gene..promoter.))
# 3287
nrow(sgRNA)
# 56251
nrow(sgRNA.essential)
# 42944
summary(sgRNA$score)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
#    0.00   17.23   27.18   24.56   32.69   48.38   15757
summary(sgRNA.nonessential$score)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
#    0.00   17.06   27.00   24.44   32.68   46.12    5812
summary(sgRNA.essential$score)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
#    0.00   17.27   27.21   24.59   32.69   48.38    9945
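# Sketch (not part of the original run): the %in% split of sgRNAs by gene list,
# redone with the awk two-file idiom for a quick check outside R. The function
# name, the hit/miss tags and the toy gene and sgRNA names are made up; the
# real inputs are essential.genes.header.txt and sgRNA.coord.txt, whose column
# layout is not assumed here.

```shell
# tag_by_gene_list GENES SGRNAS
# GENES: one gene name per line. SGRNAS: sgRNA ID and gene, tab-separated.
# Prints each sgRNA row with a third column: hit if its gene is listed, else miss.
tag_by_gene_list() {
  awk 'NR == FNR { listed[$1] = 1; next }
       { print $1 "\t" $2 "\t" (($2 in listed) ? "hit" : "miss") }' "$1" "$2"
}

workdir=$(mktemp -d)
printf 'dnaA\nftsZ\n' > "$workdir/genes.txt"
printf 'sg1\tdnaA\nsg2\tlacZ\nsg3\tftsZ\n' > "$workdir/sgrna.txt"
tag_by_gene_list "$workdir/genes.txt" "$workdir/sgrna.txt"
rm -r "$workdir"
```

# Every row is tagged exactly once, so the hit and miss counts add up to the
# total number of sgRNA rows.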
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J mar15.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH mar15.matrix.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/mar15.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
structure <- read.delim("Ecoli.allCas9.structure.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Ecoli.allCas9.nuc.count.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")
# structure, gc, temp (one single-column data frame each)
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
onehot.ind1 <- read.delim("Ecoli.allCas9_ind1.txt", header=T, sep=" ")
# 5 columns (-1 for sgRNAID)
onehot.ind2 <- read.delim("Ecoli.allCas9_ind2.txt", header=T, sep=" ")
# 17
onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=F, sep=" ")
# 81
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=F, sep=" ")
# 321
onehot.dep3 <- read.delim("Ecoli.allCas9_dep3.txt", header=F, sep=" ")
# 1154 <-- have 1218 for the labels??
onehot.dep4 <- read.delim("Ecoli.allCas9_dep4.txt", header=F, sep=" ")
# 4354 <-- have 5121 for the labels??
# 5926 total features...
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
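# drop the last column of each dep table; the parentheses matter: 1:ncol(x)-1 parses as (1:ncol(x))-1, i.e. 0:(ncol(x)-1), and only works because R ignores index 0
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")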
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
### getting the labels for the onehot matrix
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
# setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/onehot")
# onehot.ind1 <- read.delim("ind1.head.txt", header=T, sep=" ")
# onehot.ind2 <- read.delim("ind2.head.txt", header=T, sep=" ")
# onehot.dep1 <- read.delim("dep1.txt", header=F, sep=" ")
# onehot.dep2 <- read.delim("dep2.txt", header=F, sep=" ")
# onehot.dep3 <- read.delim("dep3.txt", header=F, sep=" ")
# onehot.dep3 <- onehot.dep3[,1:1154]
# onehot.dep4 <- read.delim("dep4.txt", header=F, sep=" ")
# onehot.dep4 <- onehot.dep4[,1:4354]
# colnames(onehot.dep1)[1] <- "sgRNAID"
# colnames(onehot.dep2)[1] <- "sgRNAID"
# colnames(onehot.dep3)[1] <- "sgRNAID"
# colnames(onehot.dep4)[1] <- "sgRNAID"
#
# onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# onehot.dep12 <- full_join(onehot.dep1[,1:ncol(onehot.dep1)], onehot.dep2[,1:ncol(onehot.dep2)], by="sgRNAID")
# onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:ncol(onehot.dep3)], by="sgRNAID")
# onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:ncol(onehot.dep4)], by="sgRNAID")
# onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
# write.table(onehot, "onehot.labels.txt", quote=F, row.names=F, sep="\t")
# onehot.t <- data.frame(t(onehot))
# 6754 columns <-- corrected to match matrix used = 5926 total features
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "e.coli.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
#
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.pam <- read.table("ecoli.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
# one-hot encode the N of the NGG PAM; CCN codes are the reverse complements (PAM on the minus strand)
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(
  PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0),
  PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0),
  PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0),
  PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.df$id <- "Cas9"
sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df <- read.delim("e.coli.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))
df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
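# Sketch (not part of the original run): the PAM nucleotide one-hot encoding
# from the mutate() step, as a lookup table in awk. It uses the same eight PAM
# codes as the R code. The function name pam_onehot and the all-zero row for
# an unrecognised code (TTT here) are choices made for this illustration.

```shell
# Read one PAM code per line and print PAM.A / PAM.C / PAM.T / PAM.G indicators.
pam_onehot() {
  awk '
    BEGIN {
      # N of NGG on the plus strand; CCN is the reverse complement on the minus strand
      base["AGG"] = "A"; base["CCT"] = "A"
      base["CGG"] = "C"; base["CCG"] = "C"
      base["TGG"] = "T"; base["CCA"] = "T"
      base["GGG"] = "G"; base["CCC"] = "G"
      print "pam.code\tPAM.A\tPAM.C\tPAM.T\tPAM.G"
    }
    {
      b = ($1 in base) ? base[$1] : ""
      printf "%s\t%d\t%d\t%d\t%d\n", $1, (b == "A"), (b == "C"), (b == "T"), (b == "G")
    }'
}

printf 'AGG\nCCG\nCCA\nGGG\nTTT\n' | pam_onehot
```

# CCA is the reverse complement of TGG, so it sets PAM.T, matching the R code.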
#
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.genes <- read.table("sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.df$id <- "Cas9"
sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)
df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
#
write.table(df.pam.location, "Ecoli.allCas9.raw.matrix.txt", quote=F, row.names=F, sep="\t")
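# Sketch (not part of the original run): what the long-to-wide dcast calls do,
# i.e. sgRNAID ~ feature.scale with fun.aggregate=mean, written in awk for a
# toy table. The function name pivot_wide and the toy values are made up.
# Unlike dcast, columns come out in order of first appearance, not sorted.

```shell
# Input: tab-separated long table with a header and columns sgRNAID, feature, value.
# Output: one row per sgRNAID, one column per feature, mean of duplicates, NA if absent.
pivot_wide() {
  awk -F'\t' '
    NR == 1 { next }                       # skip the header line
    {
      id = $1; feat = $2
      if (!(id in seen_id))   { seen_id[id] = 1;   ids[++n_id] = id }
      if (!(feat in seen_ft)) { seen_ft[feat] = 1; feats[++n_ft] = feat }
      sum[id, feat] += $3; cnt[id, feat]++
    }
    END {
      printf "sgRNAID"
      for (j = 1; j <= n_ft; j++) printf "\t%s", feats[j]
      printf "\n"
      for (i = 1; i <= n_id; i++) {
        printf "%s", ids[i]
        for (j = 1; j <= n_ft; j++) {
          key = ids[i] SUBSEP feats[j]
          if (key in cnt) printf "\t%g", sum[key] / cnt[key]
          else printf "\tNA"
        }
        printf "\n"
      }
    }'
}

printf 'sgRNAID\tfeature.scale\tvalue\nsg1\tgc\t0.5\nsg1\ttemp\t60\nsg2\tgc\t0.4\nsg1\tgc\t0.7\n' | pivot_wide
```

# sg1 has two gc values, which are averaged to 0.6. sg2 has no temp value and
# gets NA, which is what the later na.omit() calls would drop.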
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")
# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")
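# Sketch (not part of the original run): the overlapping windows that the
# chained unite() calls build, here for any k. Position pN holds the k-mer
# starting at base N, so a 20-nt guide gives 19 dimers, 18 trimers and 17
# tetramers. The function name kmers and the toy sequence are made up.

```shell
# kmers SEQUENCE K: print position label and k-mer for every window of width K.
kmers() {
  awk -v s="$1" -v k="$2" 'BEGIN {
    n = length(s) - k + 1
    for (i = 1; i <= n; i++) printf "p%d\t%s\n", i, substr(s, i, k)
  }'
}

guide=ACGTACGTACGTACGTACGT   # toy 20-nt sequence
kmers "$guide" 2 | head -n 3
for k in 2 3 4; do
  echo "k=$k windows=$(kmers "$guide" "$k" | wc -l | tr -d ' ')"
done
```

# The window counts explain why the dimer, trimer and tetramer tables keep the
# first 20, 19 and 18 columns (sgRNAID plus 19, 18 and 17 positions).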
# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")
# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Ecoli.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
monomer <- read.delim("Ecoli.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Ecoli.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Ecoli.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Ecoli.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Ecoli.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer, basepair, dimer, trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Ecoli.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 40468
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Ecoli.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df %>% select(-grep("cut.score.y.y", names(df)), -grep("cut.score.y", names(df)), -grep("cut.score.x.x", names(df)))
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "Ecoli.finalquantum.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Ecoli.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Ecoli.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/Submits/submit_full_e.coli.finalquantum_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/Submits/submit_train_e.coli.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/Submits/submit_test_e.coli.finalquantum_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.finalquantum
# 0.2491263918468429
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.finalquantum_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 129834
# p19dimer.Hbond.stackingraw: 83816.2
# p20basepair.Hlgap.eVEraw: 72112.9
# p1tetramer.Hbond.energyraw: 47947.1
# p18trimer.Hbond.stackingraw: 45361.6
# p11tetramer.Hbond.energyraw: 44435.1
# V231.xsgRNA.raw: 40206.9 <-- p15.CC
# p18dimer.Hbond.energyraw: 39997
# p18dimer.Hbond.stackingraw: 37078.4
# sgRNA.tempsgRNA.raw: 29678.1
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5026295
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score e.coli.finalquantum
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/RIT.run
# p20basepair.Hlgap.eVEraw cut.score 0.06696141562856729 0.03657642590538788 27636.877 22.7097104529399
# p19dimer.Hbond.stackingraw cut.score 0.03464100776701448 -0.02177720806464378 23222.226 23.879833753099728
# p20basepair.Hbond.energyraw cut.score 0.030869948756904336 -0.03726636075285378 12671.673 26.854588230689323
# p11tetramer.Hbond.energyraw cut.score 0.02157359665811225 -0.017156906797274052 15825.268 26.12220936768207
# p1tetramer.Hbond.energyraw cut.score 0.021430069435486497 0.009695123833459367 19851.039 27.05879242091512
# V231.xsgRNA.raw cut.score 0.02114294017802617 -0.030188561170599513 16749.994 24.112707842750073
# p18trimer.Hbond.stackingraw cut.score 0.018561357882087588 0.02252639747755138 13935.941 23.14311630354901
# p18dimer.Hbond.stackingraw cut.score 0.015586899271518235 0.01981957306973018 11737.613 23.47773385851222
# p18trimer.Hbond.energyraw cut.score 0.015115507543728361 -0.007948753966886423 10510.788 23.662072722110985
# p18dimer.Hbond.energyraw cut.score 0.014481062606810093 -0.015054629467698068 8062.311 22.76173208112195
#### sorted by feature effect (not importance)
sort -k4rg e.coli.finalquantum_cut.score.importance4.effect | head
# p11basepair.Hlgap.eVEraw cut.score 6.66202719563372e-07 41.99212346291712 0.421 -104.56922208388855
# V246sgRNA.raw cut.score 3.915671009763828e-07 18.985163076923072 0.094 12.388724153846157
# V4055sgRNA.raw cut.score 7.190051765502139e-07 18.389079629629627 0.114 12.940920370370371
# V161.xsgRNA.raw cut.score 2.985695404334478e-07 16.7936 0.053 -21.827900800000002
# V137.ysgRNA.raw cut.score 1.9847267533431217e-07 16.6169 0.051 -25.763079500000003
# V86.xsgRNA.raw cut.score 6.236383133538739e-07 12.254980263157897 0.089 -241.95720486973687
# V132sgRNA.raw cut.score 2.7053179487203707e-07 11.168820930232553 0.178 3.776982591860481
# V209.xsgRNA.raw cut.score 2.5923627352298865e-07 9.804176712328768 0.081 0.7271831130136945
# V42.xsgRNA.raw cut.score 7.193642751525615e-07 7.330155415809548 0.259 -201.256495464747
# p16basepair.Hbond.energyraw cut.score 2.54235077363212e-07 6.7097000000000016 0.062 -196.83178666000006
# p11basepair.Hlgap.eVE, p1.GGCA, p16.GCCC, p10.GG, p3.ACG, p6.CA, p1.TAAT, p13.GG, p11.A, p16basepair.Hbond.energy
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
# 6234
monomer <- df %>% select(grep("monomer", names(df)))
# 40
bp <- df %>% select(grep("basepair", names(df)))
# 60
dimer <- df %>% select(grep("dimer", names(df)))
# 76
trimer <- df %>% select(grep("trimer", names(df)))
# 72
tetramer <- df %>% select(grep("tetramer", names(df)))
# 68
monomer.bp <- cbind(monomer, bp)
# 100
monomer.bp.dimer <- cbind(monomer, bp, dimer)
# 176
monomer.bp.dimer.trimer <- cbind(monomer, bp, dimer, trimer)
# 248
monomer.bp.dimer.trimer.tetramer <- cbind(monomer, bp, dimer, trimer, tetramer)
# 316
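# Sketch (not part of the original run): a consistency check on the column
# counts noted above. Each count is (number of windows in a 20-nt guide) times
# (properties per window). The properties-per-window values (2, 3, 4, 4, 4) are
# inferred here from the noted counts, not read from the HL.Bond.* tables.

```shell
len=20      # guide length in nt
total=0
# name:window_width:properties_per_window (the last field is an inferred assumption)
for spec in monomer:1:2 basepair:1:3 dimer:2:4 trimer:3:4 tetramer:4:4; do
  name=${spec%%:*}; rest=${spec#*:}
  k=${rest%%:*}; props=${rest#*:}
  windows=$((len - k + 1))
  count=$((windows * props))
  total=$((total + count))
  echo "$name windows=$windows props=$props features=$count"
done
echo "total=$total"
# prints features=40, 60, 76, 72, 68 and total=316
```

# The per-set counts and the total match the 40 / 60 / 76 / 72 / 68 and 316
# recorded in the comments above.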
df.monomer <- cbind(df[,1:2], monomer)
df.bp <- cbind(df[,1:2], bp)
df.dimer <- cbind(df[,1:2], dimer)
df.trimer <- cbind(df[,1:2], trimer)
df.tetramer <- cbind(df[,1:2], tetramer)
df.monomer.bp <- cbind(df.monomer, bp)
df.monomer.bp.dimer <- cbind(df.monomer, bp, dimer)
df.monomer.bp.dimer.trimer <- cbind(df.monomer, bp, dimer, trimer)
df.monomer.bp.dimer.trimer.tetramer <- cbind(df.monomer, bp, dimer, trimer, tetramer)
# each subset is written three times: with sgRNAID (.features, .features_overlap) and without (.features_overlap_noSampleIDs); cut.score (column 2) is dropped
write.features <- function(d, prefix) {
  write.table(d[,c(1,3:ncol(d))], paste0(prefix, ".features.txt"), quote=F, row.names=F, sep="\t")
  write.table(d[,c(1,3:ncol(d))], paste0(prefix, ".features_overlap.txt"), quote=F, row.names=F, sep="\t")
  write.table(d[,3:ncol(d)], paste0(prefix, ".features_overlap_noSampleIDs.txt"), quote=F, row.names=F, sep="\t")
}
feature.sets <- list("Ecoli.monomer"=df.monomer, "Ecoli.bp"=df.bp, "Ecoli.dimer"=df.dimer, "Ecoli.trimer"=df.trimer, "Ecoli.tetramer"=df.tetramer, "Ecoli.monomer.bp"=df.monomer.bp, "Ecoli.monomer.bp.dimer"=df.monomer.bp.dimer, "Ecoli.monomer.bp.dimer.trimer"=df.monomer.bp.dimer.trimer, "Ecoli.monomer.bp.dimer.trimer.tetramer"=df.monomer.bp.dimer.trimer.tetramer)
for (prefix in names(feature.sets)) write.features(feature.sets[[prefix]], prefix)
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.bp --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.bp.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.dimer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.dimer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.trimer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.trimer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.tetramer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.tetramer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer.bp --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.bp.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer.bp.dimer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.bp.dimer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer.bp.dimer.trimer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.bp.dimer.trimer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer.bp.dimer.trimer.tetramer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.bp.dimer.trimer.tetramer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
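# The nine mkdir/cd/python triples above differ only in the feature-set name, so they
# could be generated by a loop. A minimal dry-run sketch (it only prints the commands):
# builder_cmd, base and builder are hypothetical names introduced here, and mkdir -p is
# used in place of plain mkdir so the sketch does not fail on existing directories.

```shell
base=/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
builder=/gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py

# Print (do not run) the builder command for one feature set.
builder_cmd() {
  set_name=$1
  printf 'python %s --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.%s --bypass --targetNodeSize 50 --Prediction %s/Ecoli.%s.features.txt %s/Ecoli.finalquantum.score.txt\n' \
    "$builder" "$set_name" "$base" "$set_name" "$base"
}

# Same nine feature sets as above; pipe the output to a file and review it before running.
for s in monomer bp dimer trimer tetramer monomer.bp monomer.bp.dimer \
         monomer.bp.dimer.trimer monomer.bp.dimer.trimer.tetramer; do
  echo "mkdir -p $base/iRF.run/e.coli.finalquantum/$s"
  echo "cd $base/iRF.run/e.coli.finalquantum/$s"
  builder_cmd "$s"
done
```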
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer/Submits/submit_full_quantum.monomer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp/Submits/submit_full_quantum.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer/Submits/submit_full_quantum.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer/Submits/submit_full_quantum.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer/Submits/submit_full_quantum.tetramer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp/Submits/submit_full_quantum.monomer.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer/Submits/submit_full_quantum.monomer.bp.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer/Submits/submit_full_quantum.monomer.bp.dimer.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer/Submits/submit_full_quantum.monomer.bp.dimer.trimer.tetramer_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer/Submits/submit_train_quantum.monomer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp/Submits/submit_train_quantum.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer/Submits/submit_train_quantum.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer/Submits/submit_train_quantum.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer/Submits/submit_train_quantum.tetramer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp/Submits/submit_train_quantum.monomer.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer/Submits/submit_train_quantum.monomer.bp.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer/Submits/submit_train_quantum.monomer.bp.dimer.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer/Submits/submit_train_quantum.monomer.bp.dimer.trimer.tetramer_0.sh
# Once the train submissions have finished, run the test submissions.
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer/Submits/submit_test_quantum.monomer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp/Submits/submit_test_quantum.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer/Submits/submit_test_quantum.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer/Submits/submit_test_quantum.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer/Submits/submit_test_quantum.tetramer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp/Submits/submit_test_quantum.monomer.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer/Submits/submit_test_quantum.monomer.bp.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer/Submits/submit_test_quantum.monomer.bp.dimer.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer/Submits/submit_test_quantum.monomer.bp.dimer.trimer.tetramer_0.sh
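# The test submissions have to wait for the train submissions. Instead of waiting by hand,
# LSF can hold a job with a dependency condition (bsub -w "done(<jobid>)"). A sketch only:
# job_id_from_bsub and dependent_test_cmd are hypothetical helpers, the job ID 12345 is a
# made-up example, and this assumes each train submit script maps to a single LSF job,
# which has not been checked for the scripts the builder generates.

```shell
run_dir=/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum

# Extract the numeric job ID from the line bsub prints on submission,
# e.g. "Job <12345> is submitted to queue <batch>."
job_id_from_bsub() {
  sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Print (do not run) a test submission that starts only after the train job is done.
dependent_test_cmd() {
  set_name=$1
  train_id=$2
  echo "bsub -w \"done($train_id)\" $run_dir/$set_name/Submits/submit_test_quantum.${set_name}_0.sh"
}

# Dry-run example with a made-up bsub message:
id=$(echo 'Job <12345> is submitted to queue <batch>.' | job_id_from_bsub)
dependent_test_cmd monomer "$id"
```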
# Andes: post-process each run, then list the top 10 features by importance (fold9, Set4)
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer
# 0.23912958274324625
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer_cut.score.importance4 | head
# p20monomer.No.electronsraw: 234402
# p18monomer.No.electronsraw: 165851
# p19monomer.HLgap.eVraw: 116336
# p19monomer.No.electronsraw: 106237
# p17monomer.No.electronsraw: 79784.1
# p16monomer.No.electronsraw: 79011.8
# p15monomer.No.electronsraw: 49554.4
# p15monomer.HLgap.eVraw: 44367.4
# p17monomer.HLgap.eVraw: 36062.9
# p14monomer.HLgap.eVraw: 35593.3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.bp
# 0.10865513159923842
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.bp_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 132049
# p20basepair.Hlgap.eVEraw: 76184.5
# p18basepair.Hlgap.eVEraw: 45633
# p18basepair.Hbond.energyraw: 44610.9
# p14basepair.Hbond.energyraw: 14119.6
# p11basepair.Hbond.energyraw: 13603.3
# p14basepair.Hlgap.eVEraw: 13600.5
# p16basepair.Hbond.energyraw: 12756.2
# p16basepair.Hlgap.eVEraw: 12573.6
# p11basepair.Hlgap.eVEraw: 12563.8
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.dimer
# 0.24662278990401446
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.dimer_cut.score.importance4 | head
# p19dimer.Hbond.energyraw: 282852
# p18dimer.Hbond.energyraw: 94006.1
# p16dimer.Hbond.stackingraw: 82983.1
# p19dimer.HLgap.eVEraw: 72357.1
# p15dimer.Hbond.energyraw: 70803.2
# p18dimer.Hbond.stackingraw: 67454.3
# p15dimer.Hbond.stackingraw: 65940.3
# p13dimer.Hbond.energyraw: 63473.3
# p19dimer.Hbond.stackingraw: 55555.9
# p18dimer.HLgap.eVEraw: 51251.5
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.trimer
# 0.2343633085914134
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.trimer_cut.score.importance4 | head
# p18trimer.Hbond.energyraw: 379823
# p18trimer.Hlgap.eVEraw: 125230
# p17trimer.Hbond.stackingraw: 123012
# p15trimer.Hbond.stackingraw: 76675.7
# p14trimer.Hbond.energyraw: 62165.7
# p15trimer.Hbond.energyraw: 45814.7
# p1trimer.Hbond.energyraw: 43700.5
# p11trimer.Hbond.energyraw: 43260.1
# p12trimer.Hbond.energyraw: 42536.3
# p13trimer.Hbond.energyraw: 41079.6
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.tetramer
# 0.22809195591591117
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.tetramer_cut.score.importance4 | head
# p17tetramer.Hbond.energyraw: 313042
# p16tetramer.Hbond.stackingraw: 176979
# p17tetramer.Hlgap.eVEraw: 112160
# p16tetramer.Hbond.energyraw: 74317.7
# p15tetramer.Hbond.stackingraw: 74174.2
# p11tetramer.Hbond.energyraw: 71547.9
# p17tetramer.Hbond.stackingraw: 68452.7
# p1tetramer.Hbond.energyraw: 67379.8
# p13tetramer.Hbond.energyraw: 67226.3
# p15tetramer.Hbond.energyraw: 63265.2
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer.bp
# 0.24123214119568534
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer.bp_cut.score.importance4 | head
# p19monomer.HLgap.eVraw: 132757
# p20basepair.Hbond.energyraw: 110940
# p18monomer.No.electronsraw: 97866.1
# p20basepair.Hlgap.eVEraw: 81235.4
# p16monomer.No.electronsraw: 80132.5
# p17monomer.No.electronsraw: 74586.9
# p20monomer.No.electronsraw: 73039.8
# p19monomer.No.electronsraw: 49834.4
# p15monomer.No.electronsraw: 49326
# p15monomer.HLgap.eVraw: 47552.6
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer.bp.dimer
# 0.2503873805742469
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer.bp.dimer_cut.score.importance4 | head
# p19dimer.Hbond.energyraw: 107949
# p20basepair.Hbond.energyraw: 87396.1
# p16dimer.Hbond.stackingraw: 86288.4
# p20basepair.Hlgap.eVEraw: 76726.1
# p19dimer.Hbond.stackingraw: 72936.8
# p18dimer.Hbond.energyraw: 71045.7
# p18dimer.Hbond.stackingraw: 66421.5
# p15dimer.Hbond.energyraw: 66143.4
# p13dimer.Hbond.energyraw: 63958.4
# p15dimer.Hbond.stackingraw: 62114
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer.bp.dimer.trimer
# 0.24631083915041616
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer.bp.dimer.trimer_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 90824.1
# p19dimer.Hbond.energyraw: 76022.6
# p20basepair.Hlgap.eVEraw: 71778.3
# p18trimer.Hbond.energyraw: 71042.3
# p19dimer.Hbond.stackingraw: 67615.9
# p15trimer.Hbond.stackingraw: 58769.7
# p11trimer.Hbond.energyraw: 44117.4
# p1trimer.Hbond.energyraw: 43938.7
# p17trimer.Hbond.stackingraw: 43434.2
# p14trimer.Hbond.energyraw: 42841.4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer.bp.dimer.trimer.tetramer
# 0.24179354649146106
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer.bp.dimer.trimer.tetramer_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 102256
# p20basepair.Hlgap.eVEraw: 70983
# p19dimer.Hbond.stackingraw: 68035.5
# p19dimer.Hbond.energyraw: 65745.6
# p11tetramer.Hbond.energyraw: 65184.4
# p1tetramer.Hbond.energyraw: 61460.2
# p18trimer.Hbond.energyraw: 57405.8
# p13tetramer.Hbond.energyraw: 49726
# p15tetramer.Hbond.stackingraw: 44775.1
# p18dimer.Hbond.stackingraw: 42723.5
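# The sort | head step above is repeated once per feature set; it could be wrapped in a loop.
# A minimal sketch: top_features is a hypothetical helper, and it assumes the importance
# value is the second whitespace-separated field, as in the sort -k2rg calls above.

```shell
run_dir=/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum

# Print the N highest-importance lines of FILE (default N = 10),
# sorting on field 2 as a general numeric value, largest first.
top_features() {
  sort -k2,2gr "$1" | head -n "${2:-10}"
}

# Files that do not exist (e.g. when run off the cluster) are skipped.
for s in monomer bp dimer trimer tetramer monomer.bp monomer.bp.dimer \
         monomer.bp.dimer.trimer monomer.bp.dimer.trimer.tetramer; do
  f=$run_dir/$s/cut.score/foldRuns/fold9/Runs/Set4/quantum.${s}_cut.score.importance4
  if [ -f "$f" ]; then
    echo "== $s"
    top_features "$f" 10
  fi
done
```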
# Pearson correlation between observed cut.score and test-set predictions (R; fold9, Set4)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4775135
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.bp_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3430826
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.dimer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4919964
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.trimer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.481761
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.tetramer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4792102
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer.bp_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4789989
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer.bp.dimer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4953076
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer.bp.dimer.trimer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4961969
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer.bp.dimer.trimer.tetramer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4920071
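# The cor() values above can be cross-checked from the shell with the sample Pearson formula
# r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).
# A sketch only: pearson is a hypothetical helper that reads two tab-separated numeric
# columns (observed, predicted) with no header line; extracting those two columns from
# the Y and .prediction files is left out because their exact layout is assumed, not checked.

```shell
# Read "x<TAB>y" pairs on stdin and print Pearson's r to four decimals.
pearson() {
  awk -F'\t' '
    { n++; sx += $1; sy += $2; sxx += $1*$1; syy += $2*$2; sxy += $1*$2 }
    END {
      num = n*sxy - sx*sy
      den = sqrt((n*sxx - sx*sx) * (n*syy - sy*sy))
      if (den == 0) { print "NA"; exit }   # undefined when either column is constant
      printf "%.4f\n", num/den
    }'
}

# Toy input (not project data): y = 2x is perfectly correlated.
printf '1\t2\n2\t4\n3\t6\n' | pearson
```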
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
# 6234 columns
raw <- df[,c(1:2,3,19:21,5916)]
onehot <- df[,c(1:2,4:18,22:5915,5917:5918)]
qct <- df[,c(1:2,5919:6234)]
raw_onehot <- df[,c(1:2,3,19:21,5916,4:18,22:5915,5917:5918)]
raw_qct <- df[,c(1:2,3,19:21,5916,5919:6234)]
onehot_qct <- df[,c(1:2,4:18,22:5915,5917:5918,5919:6234)]
raw_onehot_qct <- df[,c(1:2,3,19:21,5916,4:18,22:5915,5917:5918,5919:6234)]
write.table(raw[,c(1,3:ncol(raw))], "Ecoli.raw.features.txt", quote=F, row.names=F, sep="\t")
write.table(raw[,c(1,3:ncol(raw))], "Ecoli.raw.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(raw[,3:ncol(raw)], "Ecoli.raw.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(onehot[,c(1,3:ncol(onehot))], "Ecoli.onehot.features.txt", quote=F, row.names=F, sep="\t")
write.table(onehot[,c(1,3:ncol(onehot))], "Ecoli.onehot.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(onehot[,3:ncol(onehot)], "Ecoli.onehot.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(qct[,c(1,3:ncol(qct))], "Ecoli.qct.features.txt", quote=F, row.names=F, sep="\t")
write.table(qct[,c(1,3:ncol(qct))], "Ecoli.qct.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(qct[,3:ncol(qct)], "Ecoli.qct.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot[,c(1,3:ncol(raw_onehot))], "Ecoli.raw_onehot.features.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot[,c(1,3:ncol(raw_onehot))], "Ecoli.raw_onehot.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot[,3:ncol(raw_onehot)], "Ecoli.raw_onehot.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(raw_qct[,c(1,3:ncol(raw_qct))], "Ecoli.raw_qct.features.txt", quote=F, row.names=F, sep="\t")
write.table(raw_qct[,c(1,3:ncol(raw_qct))], "Ecoli.raw_qct.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(raw_qct[,3:ncol(raw_qct)], "Ecoli.raw_qct.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(onehot_qct[,c(1,3:ncol(onehot_qct))], "Ecoli.onehot_qct.features.txt", quote=F, row.names=F, sep="\t")
write.table(onehot_qct[,c(1,3:ncol(onehot_qct))], "Ecoli.onehot_qct.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(onehot_qct[,3:ncol(onehot_qct)], "Ecoli.onehot_qct.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot_qct[,c(1,3:ncol(raw_onehot_qct))], "Ecoli.raw_onehot_qct.features.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot_qct[,c(1,3:ncol(raw_onehot_qct))], "Ecoli.raw_onehot_qct.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot_qct[,3:ncol(raw_onehot_qct)], "Ecoli.raw_onehot_qct.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
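# Sanity check on the column ranges used for raw, onehot and qct above: together with the
# two leading columns (1:2) they should cover all 6234 columns exactly once.
# A small arithmetic sketch; count_cols is a hypothetical helper that counts the columns
# selected by an R-style index list such as "3,19:21,5916".

```shell
# Sum the widths of comma-separated entries, each either "k" or "lo:hi" (inclusive).
count_cols() {
  total=0
  old_ifs=$IFS
  IFS=','
  for part in $1; do
    lo=${part%%:*}
    hi=${part##*:}
    total=$(( total + hi - lo + 1 ))
  done
  IFS=$old_ifs
  echo "$total"
}

raw=$(count_cols "3,19:21,5916")
onehot=$(count_cols "4:18,22:5915,5917:5918")
qct=$(count_cols "5919:6234")
# The leading 2 accounts for columns 1:2, which every subset keeps.
echo "raw=$raw onehot=$onehot qct=$qct total=$(( 2 + raw + onehot + qct ))"
# prints: raw=5 onehot=5911 qct=316 total=6234
```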
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName raw --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.raw.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName onehot --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.onehot.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName qct --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.qct.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName raw_onehot --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.raw_onehot.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName raw_qct --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.raw_qct.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName onehot_qct --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.onehot_qct.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName raw_onehot_qct --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.raw_onehot_qct.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw/Submits/submit_full_raw_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot/Submits/submit_full_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct/Submits/submit_full_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot/Submits/submit_full_raw_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct/Submits/submit_full_raw_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct/Submits/submit_full_onehot_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct/Submits/submit_full_raw_onehot_qct_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw/Submits/submit_train_raw_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot/Submits/submit_train_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct/Submits/submit_train_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot/Submits/submit_train_raw_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct/Submits/submit_train_raw_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct/Submits/submit_train_onehot_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct/Submits/submit_train_raw_onehot_qct_0.sh
# once the train submissions are done, run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw/Submits/submit_test_raw_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot/Submits/submit_test_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct/Submits/submit_test_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot/Submits/submit_test_raw_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct/Submits/submit_test_raw_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct/Submits/submit_test_onehot_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct/Submits/submit_test_raw_onehot_qct_0.sh
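# The full/train/test submissions above all follow one path pattern, so they can be generated by a loop.
# This is a minimal sketch, not part of iRF: submit_stage, RUN_DIR, FEATURE_SETS and DRY_RUN are hypothetical names.
# It assumes the Submits/ scripts already exist for every feature set; by default it only prints the bsub commands.

```shell
# Root of the iRF runs (same directory as in the notes above; override by exporting RUN_DIR).
RUN_DIR="${RUN_DIR:-/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum}"
# The seven feature-set combinations used in this analysis.
FEATURE_SETS="raw onehot qct raw_onehot raw_qct onehot_qct raw_onehot_qct"

submit_stage() {
    # $1 = stage: full, train, or test.
    # DRY_RUN=1 (the default) prints each command; DRY_RUN=0 actually calls bsub.
    local stage="$1" fs script
    for fs in $FEATURE_SETS; do
        script="$RUN_DIR/$fs/Submits/submit_${stage}_${fs}_0.sh"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "bsub $script"
        else
            bsub "$script"
        fi
    done
}
```

# Usage: `submit_stage full; submit_stage train` to preview, then `DRY_RUN=0 submit_stage test` once the train jobs have finished.
# The sketch does not wait for jobs itself; LSF job dependencies (bsub -w) would be one way to automate that.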
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt raw
#
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/raw_cut.score.importance4 | head
# sgRNA.tempsgRNA.raw: 77242.8
# sgRNA.gcsgRNA.raw: 76936.2
# sgRNA.structuresgRNA.raw: 4904.05
# pam.distance0: 1557.88
# gene.distance0: 0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt onehot
# 0.2600428516356858
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/onehot_cut.score.importance4 | head
# V80.xsgRNA.raw: 125603
# V78.xsgRNA.raw: 97003.8
# CCsgRNA.raw: 65756.1
# V231.xsgRNA.raw: 62699.6
# GGsgRNA.raw: 61549.4
# V76.xsgRNA.raw: 55640.9
# V303.xsgRNA.raw: 55623.9
# V73.xsgRNA.raw: 55053.2
# V72.xsgRNA.raw: 54051.9
# TsgRNA.raw: 50452
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt qct
# 0.24183122435585644
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/qct_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 94414.7
# p19dimer.Hbond.energyraw: 77736.9
# p20basepair.Hlgap.eVEraw: 72590.9
# p11tetramer.Hbond.energyraw: 64640.2
# p19dimer.Hbond.stackingraw: 63317.4
# p1tetramer.Hbond.energyraw: 60242.7
# p18trimer.Hbond.energyraw: 56528.4
# p13tetramer.Hbond.energyraw: 52013.4
# p15tetramer.Hbond.stackingraw: 44911
# p18dimer.Hbond.stackingraw: 42437
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt raw_onehot
# 0.2602828644651521
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/raw_onehot_cut.score.importance4 | head
# V80.xsgRNA.raw: 124786
# V78.xsgRNA.raw: 96487.2
# V231.xsgRNA.raw: 62187.8
# pam.distance0: 57859.9
# CCsgRNA.raw: 56013.6
# V76.xsgRNA.raw: 55856.1
# V303.xsgRNA.raw: 55425.8
# V72.xsgRNA.raw: 54023.2
# V73.xsgRNA.raw: 53518.8
# GGsgRNA.raw: 51386.6
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt raw_qct
# 0.24177446035820813
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/raw_qct_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 99208.4
# p20basepair.Hlgap.eVEraw: 69960.5
# p19dimer.Hbond.stackingraw: 67007.8
# p19dimer.Hbond.energyraw: 65094.9
# p18trimer.Hbond.energyraw: 56776.3
# p1tetramer.Hbond.energyraw: 53759
# p11tetramer.Hbond.energyraw: 48134.6
# p15tetramer.Hbond.stackingraw: 41530.7
# p18dimer.Hbond.stackingraw: 39964.4
# p19dimer.HLgap.eVEraw: 39812.1
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt onehot_qct
# 0.24905182664101577
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/onehot_qct_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 127860
# p19dimer.Hbond.stackingraw: 83010.5
# p20basepair.Hlgap.eVEraw: 73663.5
# p11tetramer.Hbond.energyraw: 54304.4
# p1tetramer.Hbond.energyraw: 49929.4
# p18trimer.Hbond.stackingraw: 44303.8
# p18dimer.Hbond.energyraw: 40919
# V231.xsgRNA.raw: 39596.2
# p13tetramer.Hbond.energyraw: 36465.7
# p18dimer.Hbond.stackingraw: 33128.6
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt raw_onehot_qct
# 0.24906667479923555
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/raw_onehot_qct_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 125357
# p19dimer.Hbond.stackingraw: 80519.8
# p20basepair.Hlgap.eVEraw: 74407.4
# p1tetramer.Hbond.energyraw: 47519
# p18trimer.Hbond.stackingraw: 46368.4
# p11tetramer.Hbond.energyraw: 43490.4
# V231.xsgRNA.raw: 40232.6
# p18dimer.Hbond.energyraw: 37379.9
# p18dimer.Hbond.stackingraw: 34258.7
# p13tetramer.Hbond.energyraw: 29700.7
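# The repeated `sort -k2rg ... | head` calls above can be wrapped in one helper.
# A minimal sketch, assuming each importance4 file is whitespace-separated with the feature name in column 1 and the score in column 2.
# top_features is a hypothetical name, not an iRF command.

```shell
top_features() {
    # $1 = importance file; $2 = number of rows to show (default 10).
    # -k2,2 restricts the sort key to column 2; g = general numeric, r = descending.
    sort -k2,2gr "$1" | head -n "${2:-10}"
}
```

# Usage, from the e.coli.finalquantum directory:
# for fs in raw onehot qct raw_onehot raw_qct onehot_qct raw_onehot_qct; do echo "== $fs"; top_features "$fs/cut.score/foldRuns/fold9/Runs/Set4/${fs}_cut.score.importance4"; done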
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("raw_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.2007612
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("onehot_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4914184
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("qct_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4918057
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("raw_onehot_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4931724
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("raw_qct_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4939777
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("onehot_qct_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.500817
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("raw_onehot_qct_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5019173
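# The cor() calls above compute r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2)).
# As a cross-check outside R, the same formula can be sketched in awk.
# Assumptions (not confirmed by the notes): each input file has one header line followed by a single numeric column, and both files list the samples in the same order.
# pearson is a hypothetical helper name.

```shell
pearson() {
    # $1 = observed values file, $2 = predicted values file.
    # paste joins the two files line by line with a tab; NR > 1 skips the header row.
    paste "$1" "$2" | awk -F'\t' 'NR > 1 {
        n++; sx += $1; sy += $2
        sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2
    }
    END {
        num = n * sxy - sx * sy
        den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
        # A zero denominator means no rows or a constant column: r is undefined.
        if (den == 0) { print "NA" } else { printf "%.6f\n", num / den }
    }'
}
```

# Usage: `pearson observed.txt predicted.txt` (file names are placeholders); the result should agree with cor() to the printed precision.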
library(data.table)
### violin plots of R2 across different models
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults")
#create a list of the files from your target directory
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults")
#read every file once with fread (data.table) and bind the results column-wise: one column per file, in the alphabetical order returned by list.files()
dataset <- do.call(cbind, sapply(file_list, data.table::fread, simplify = FALSE))
colnames(dataset) <- c("onehot", "onehot.QCT", "raw.onehot", "raw.onehot.QCT", "raw", "raw.QCT", "QCT", "QCT.dimers", "QCT.single.bp.dimers.noncorrelated", "QCT.single.bp.dimers", "top10", "top100", "top1k", "top20", "top200", "top5", "top50", "top500")
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
# Figure 2A
dataset.subsets2 <- dataset[,c(5,1,3,7,6,2,4,10)]
colnames(dataset.subsets2) <- c("Raw", "One-hot", "Raw + One-hot", "Quantum", "Raw + Quantum", "One-hot + Quantum", "Raw + One-hot + Quantum", "Raw + One-hot + Quantum + Kmers")
dataset.subsets2.melt <- melt(dataset.subsets2)
#pdf("R2.subsets2.violin.pdf")
#ggplot(dataset.subsets2.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="feature run", y="R2") + theme_minimal() + theme(legend.position = "none")
pdf("R2.subsets2.violin.nocolor.pdf")
ggplot(dataset.subsets2.melt, aes(x=value, y=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + labs(title="R2 across iRF models", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()
# Figure 2D
dataset.top <- dataset[,c(16,11,14,17,12,15,18,13)]
dataset.top.melt <- melt(dataset.top)
#pdf("R2.top.violin.pdf")
#ggplot(dataset.top.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="feature run", y="R2") + theme_minimal() + theme(legend.position = "none")
pdf("R2.top.violin.nocolor.pdf")
ggplot(dataset.top.melt, aes(x=value, y=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()
# Figure 2B
### updated output (21 March 2022)
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
effect <- read.delim("e.coli.finalquantum_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
effect.sort <- effect[order(-effect$NormEdge),]
effect.sort.50 <- effect.sort[1:50,]
effect.sort.50$category <- c("QCT.bp", "QCT.dimer", "QCT.bp", "QCT.tetramer", "QCT.tetramer", "dep.kmer2", "QCT.trimer", "QCT.dimer", "QCT.trimer", "QCT.dimer", "QCT.dimer", "QCT.tetramer", "raw", "raw", "ind.kmer2", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.trimer", "QCT.tetramer", "QCT.tetramer", "QCT.trimer", "QCT.monomer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.trimer", "QCT.dimer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.trimer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.bp", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.bar.21march.pdf")
ggplot(effect.sort.50, aes(x=reorder(Feature, -NormEdge), y=NormEdge, fill=category)) + geom_bar(stat="identity") + labs(title="Feature Importance (Top 50 Features)", x="Feature", y="Normalized Importance") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Paired")
dev.off()
effect.sort.50$Category <- c("Quantum Basepair", "Quantum Dimer", "Quantum Basepair", "Quantum Tetramer", "Quantum Tetramer", "Onehot Dimer", "Quantum Trimer", "Quantum Dimer", "Quantum Trimer", "Quantum Dimer", "Quantum Dimer", "Quantum Tetramer", "Raw Calculation", "Raw Calculation", "Onehot Dimer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Trimer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Trimer", "Quantum Monomer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Trimer", "Quantum Dimer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Trimer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Basepair", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer")
effect.sort.50$Feature.Label <- c("Basepair HL-gap pos20", "Dimer H-stacking pos19", "Basepair H-bond pos20", "Tetramer H-bond pos11", "Tetramer H-bond pos1", "CC pos15", "Trimer H-stacking pos18", "Dimer H-stacking pos18", "Trimer H-bond pos18", "Dimer H-bond pos18", "Dimer HL-gap pos19", "Tetramer H-bond pos13", "GC content", "Melting Temperature", "CCA pos19", "Tetramer H-stacking pos17", "Tetramer H-bond pos2", "Tetramer H-stacking pos14", "Tetramer H-stacking pos15", "Trimer HL-gap pos18", "Tetramer H-stacking pos7", "Tetramer H-stacking pos16", "Trimer H-stacking pos15", "Monomer # of Electrons pos18", "Tetramer HL-gap pos1", "Tetramer H-bond pos14", "Tetramer HL-gap pos5", "Trimer H-bond pos1", "Dimer H-stacking pos16", "Tetramer H-bond pos8", "Tetramer HL-gap pos7", "Tetramer HL-gap pos12", "Tetramer H-bond pos15", "Tetramer HL-gap pos6", "Tetramer H-bond pos17", "Trimer H-stacking pos17", "Tetramer HL-gap pos4", "Tetramer HL-gap pos10", "Tetramer H-bond pos10", "Tetramer H-stacking pos6", "Basepair H-bond pos18", "Tetramer HL-gap pos9", "Tetramer HL-gap pos14", "Tetramer HL-gap pos11", "Tetramer H-stacking pos2", "Tetramer HL-gap pos3", "Tetramer H-stacking pos5", "Tetramer HL-gap pos2", "Tetramer H-bond pos3", "Tetramer H-stacking pos13")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.bar.19May.pdf")
ggplot(effect.sort.50, aes(x=reorder(Feature.Label, -NormEdge), y=NormEdge, fill=Category)) + geom_bar(colour="black", stat="identity") + labs(title="Feature Importance (Top 50 Features)", x="Feature", y="Normalized Importance") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=8)) + scale_fill_brewer(palette="Paired")
dev.off()
effect.sort.50$Category <- c("HL-gap", "H-stacking", "H-bond", "H-bond", "H-bond", "One-hot Dimer", "H-stacking", "H-stacking", "H-bond", "H-bond", "HL-gap", "H-bond", "Raw Calculation", "Raw Calculation", "One-hot Dimer", "H-stacking", "H-bond", "H-stacking", "H-stacking", "HL-gap", "H-stacking", "H-stacking", "H-stacking", "# of Electrons", "HL-gap", "H-bond", "HL-gap", "H-bond", "H-stacking", "H-bond", "HL-gap", "HL-gap", "H-bond", "HL-gap", "H-bond", "H-stacking", "HL-gap", "HL-gap", "H-bond", "H-stacking", "H-bond", "HL-gap", "HL-gap", "HL-gap", "H-stacking", "HL-gap", "H-stacking", "HL-gap", "H-bond", "H-stacking")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.bar.feature.19May.pdf")
ggplot(effect.sort.50, aes(x=reorder(Feature.Label, -NormEdge), y=NormEdge, fill=Category)) + geom_bar(colour="black", stat="identity") + labs(title="Feature Importance (Top 50 Features)", x="Feature", y="Normalized Importance") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=8)) + scale_fill_brewer(palette="Paired")
dev.off()
# Figure 2C
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
effect <- read.delim("e.coli.finalquantum_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
effect.sort <- effect[order(-effect$NormEdge),]
effect.sort.100 <- effect.sort[1:100,]
library(ggrepel)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
summary(effect.sort.100$Samples)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 1585 2776 4066 5730 6792 27637
pdf("Imp.Effect.scatter.label.quartile.color.21March.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect), color = dplyr::case_when(effect.sort.100$Samples > 6792 ~ "#1b9e77", effect.sort.100$Samples < 2776 ~ "#d95f02", TRUE ~ "#7570b3"), alpha = 0.8) + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal()
dev.off()
pdf("Imp.Sample.scatter.label.Effect.color.18May.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(color = FeatureEffect, size=NormEdge), alpha = 0.8) + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + scale_colour_gradient2()
dev.off()
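effect.sort.100$Feature <- gsub('.raw', '', effect.sort.100$Feature, fixed=TRUE) # fixed=TRUE matches a literal ".raw"; as a regex, "." would also consume the character before "raw" (e.g. the "y" in "energyraw")
effect.sort.100$Feature <- gsub('raw', '', effect.sort.100$Feature, fixed=TRUE)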
pdf("Imp.Sample.scatter.LargeLabel.Effect.color.18May.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(color = FeatureEffect, size=NormEdge), alpha = 1) + geom_text_repel(aes(label = Feature), box.padding = 1, point.padding = 0.5, segment.color = 'grey50', size=4) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + scale_colour_gradient2()
dev.off()
pdf("Imp.Sample.scatter.NoLabel.Effect.color.18May.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(color = FeatureEffect, size=NormEdge), alpha = 1) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + scale_colour_gradient2()
dev.off()
# Figure 3A
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
imp <- read.delim("e.coli.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
library(dplyr) # provides %>% and mutate()
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.Dir.Top20.21March.pdf")
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
dev.off()
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + coord_flip() + xlab("") + ylab("Normalized Importance") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position="bottom") + scale_fill_brewer(palette="Set1")
# Figure 3B
pdf("Imp.Dir.Top20.Effect.21March.pdf")
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
pdf("Imp.Dir.Top20.Effect.30March.pdf")
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
#### Figure S3: Focus on effect size
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
imp <- read.delim("e.coli.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
library(dplyr) # provides %>% and mutate()
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir$absEffect <- abs(imp.dir$Feature.Effect)
imp.dir.effectsorted <- imp.dir[order(imp.dir$absEffect, decreasing = TRUE),]
imp.dir.effectsorted.top20 <- imp.dir.effectsorted[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.Dir.Top20Effect.Effect.30March.pdf")
imp.dir.effectsorted.top20$Feature.Label <- c("CTG pos19", "Basepair HL-gap pos11", "CACC pos12", "GCTA pos6", "CAC pos12", "TC pos12", "CTG pos6", "GCCG pos8", "ACA pos15", "TACT pos3", "CAGC pos6", "TGC pos13", "AAC pos13", "GTG pos3", "CAGT pos4", "TCCT pos5", "AACA pos4", "AGCA pos4", "GTGG pos7", "GAGA pos1")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=reorder(Feature.Label, -absEffect), y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
pdf("Imp.Dir.Top20Effect.Effect.31May.pdf")
imp.dir.effectsorted.top20$Feature.Label <- c("CTG pos19", "Basepair HL-gap pos11", "CACC pos12", "GCTA pos6", "CAC pos12", "TC pos12", "CTG pos6", "GCCG pos8", "ACA pos15", "TACT pos3", "CAGC pos6", "TGC pos13", "AAC pos13", "GTG pos3", "CAGT pos4", "TCCT pos5", "AACA pos4", "AGCA pos4", "GTGG pos7", "GAGA pos1")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=reorder(Feature.Label, absEffect), y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
## Main E.coli feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
imp <- read.delim("e.coli.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
library(dplyr) # provides %>% and mutate()
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
imp.dir.top20.df$Feature.Label <- c("Basepair HL-gap pos20", "Dimer H-stacking pos19", "Basepair H-bond pos20", "Tetramer H-bond pos11", "Tetramer H-bond pos1", "CC pos15", "Trimer H-stacking pos18", "Dimer H-stacking pos18", "Trimer H-bond pos18", "Dimer H-bond pos18", "Dimer HL-gap pos19", "Tetramer H-bond pos13", "GC content", "Temperature of Melting", "CCA pos19", "Tetramer H-stacking pos17", "Tetramer H-bond pos2", "Tetramer H-stacking pos14", "Tetramer H-stacking pos15", "Trimer HL-gap pos18")
library(ggplot2)
pdf("Ecoli.FeatureEngineering.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Ecoli Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()
library(ggplot2)
pdf("Ecoli.FeatureEngineering.nocolor.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir)) + geom_point(size=3, color="black") + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir), color="black") + labs(title="Ecoli Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + theme_classic() + coord_flip()
dev.off()
library(ggplot2)
pdf("Ecoli.FeatureEngineering.31May.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Ecoli Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()
# violin plot of iRF output from the same matrix generation across species (e.coli, y.lipolytica, h.sapiens, p.putida)
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/foldRuns/results/R2_foldResults.txt e.coli.R2_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/foldRuns/results/R2_foldResults.txt y.lipolytica.R2_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/foldRuns/results/R2_foldResults.txt Doench2014.R2_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/foldRuns/results/R2_foldResults.txt putida.R2_foldResults.txt
library(data.table)
### violin plots of R2 across different models
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species")
#create a list of the files from your target directory
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species")
#read every file once with fread (data.table) and bind the results column-wise: one column per file, in the alphabetical order returned by list.files()
dataset <- do.call(cbind, sapply(file_list, data.table::fread, simplify = FALSE))
colnames(dataset) <- c("H.sapien", "E.coli", "P.putida", "Y.lipolytica")
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species")
dataset.order <- dataset[,c(2,3,4,1)]
dataset.order.melt <- melt(dataset.order)
pdf("R2.cross.species.violin.pdf")
ggplot(dataset.order.melt, aes(x=variable, y=value, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="species", y="R2") + theme_minimal() + theme(legend.position = "none")
dev.off()
#### run same plot with MSE instead of R2
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/foldRuns/results/MSE_foldResults.txt e.coli.MSE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/foldRuns/results/MSE_foldResults.txt y.lipolytica.MSE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/foldRuns/results/MSE_foldResults.txt Doench2014.MSE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/foldRuns/results/MSE_foldResults.txt putida.MSE_foldResults.txt
require(data.table)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE")
#list the fold-result files in the target directory (alphabetical order: Doench2014, e.coli, putida, y.lipolytica)
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE", pattern="foldResults\\.txt$")
#read each file once with fread from the data.table package and bind the results column-wise, one column per file
dataset <- do.call(cbind, sapply(file_list, data.table::fread, simplify = FALSE))
colnames(dataset) <- c("H.sapien", "E.coli", "P.putida", "Y.lipolytica")
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE")
dataset.order <- dataset[,c(2,3,4,1)]
dataset.order.melt <- melt(dataset.order)
pdf("MSE.cross.species.violin.pdf")
ggplot(dataset.order.melt, aes(x=variable, y=value, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="MSE across iRF runs", x="species", y="MSE") + theme_minimal() + theme(legend.position = "none")
dev.off()
# RMSE
df <- dataset[,c(2,3,4,1)] # E.coli, P.putida, Y.lipolytica, H.sapien (assumes the four MSE columns named above)
colnames(df) <- c("E.coli", "P.putida", "Y.lipolytica", "H.sapien")
df.melt <- melt(df)
df.melt$rmse <- sqrt(df.melt$value)
pdf("RMSE.cross.species.violin.pdf")
ggplot(df.melt, aes(x=variable, y=rmse, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="RMSE across iRF runs", x="species", y="RMSE") + theme_minimal() + theme(legend.position = "none")
dev.off()
#### run same plot with MAE
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/foldRuns/results/MAE_foldResults.txt e.coli.MAE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/foldRuns/results/MAE_foldResults.txt y.lipolytica.MAE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/foldRuns/results/MAE_foldResults.txt Doench2014.MAE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/foldRuns/results/MAE_foldResults.txt putida.MAE_foldResults.txt
require(data.table)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE")
#list the fold-result files in the target directory (alphabetical order: Doench2014, e.coli, putida, y.lipolytica)
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE", pattern="foldResults\\.txt$")
#read each file once with fread from the data.table package and bind the results column-wise, one block of columns (MAE, MEAN, MAE/MEAN) per file
dataset <- do.call(cbind, sapply(file_list, data.table::fread, simplify = FALSE))
colnames(dataset) <- c("H.sapien.MAE", "H.sapien.MEAN", "H.sapien.MAE.MEAN", "E.coli.MAE", "E.coli.MEAN", "E.coli.MAE.MEAN", "P.putida.MAE", "P.putida.MEAN", "P.putida.MAE.MEAN", "Y.lipolytica.MAE", "Y.lipolytica.MEAN", "Y.lipolytica.MAE.MEAN")
library(ggplot2)
library(reshape2)
library(RColorBrewer)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE")
dataset.order <- dataset[,c(6,9,12,3)]
dataset.order.melt <- melt(dataset.order)
pdf("MAE.cross.species.violin.pdf")
ggplot(dataset.order.melt, aes(x=variable, y=value, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="MAE/MEAN across iRF runs", x="species", y="MAE/MEAN") + theme_minimal() + theme(legend.position = "none")
dev.off()
#### summary figure of R2 and feature count: Figure S3A?
### number of features / R2 / correlation
# raw = 5 / 0.0406861 / 0.2007612
# onehot = 5911 / 0.2600428516356858 / 0.4914184
# qct = 316 / 0.24183122435585644 / 0.4918057
# raw+onehot = 5916 / 0.2602828644651521 / 0.4931724
# raw+qct = 312 / 0.24177446035820813 / 0.4939777
# onehot+qct = 6227 / 0.24905182664101577 / 0.500817
# raw+onehot+qct = 6232 / 0.24906667479923555 / 0.5019173
library(ggplot2)
library(reshape2)
library(RColorBrewer)
df <- data.frame(feature.set = c("raw", "onehot", "QCT", "raw+onehot", "raw+QCT", "onehot+QCT", "raw+onehot+QCT"), R2 = c(0.0406861, 0.2600428516356858, 0.24183122435585644, 0.2602828644651521, 0.24177446035820813, 0.24905182664101577, 0.24906667479923555), Correlation = c(0.2007612, 0.4914184, 0.4918057, 0.4931724, 0.4939777, 0.500817, 0.5019173), feature.count = c(5, 5911, 316, 5916, 312, 6227, 6232))
# feature counts are rescaled onto the correlation axis (largest count maps to 0.55) and read off the secondary axis
count.scale <- max(df$feature.count) / 0.55
ggplot(df) + geom_bar(aes(x=feature.set, y=Correlation, fill=feature.set), stat="identity") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Paired") + geom_line(aes(x=feature.set, y=feature.count / count.scale, group=1), inherit.aes = FALSE, color="blue", size=2) + scale_y_continuous(name = "Correlation", sec.axis = sec_axis(~ . * count.scale, name="Feature Count"), limits=c(0, 0.6)) + labs(title = "Size and Prediction Accuracy of Feature Subsets", x = "Feature Set")
# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
features <- read.delim("Ecoli.finalquantum.features.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.finalquantum.normalize.score.txt", header=T, sep="\t", stringsAsFactors = F)
summary(score$cut.score)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000 0.3563 0.5618 0.5077 0.6757 1.0000
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification")
score.q1 <- score %>% mutate(cut.score = ifelse(cut.score < 0.25, 0, 1))
score.q2 <- score %>% mutate(cut.score = ifelse(cut.score < 0.50, 0, 1))
score.q3 <- score %>% mutate(cut.score = ifelse(cut.score < 0.75, 0, 1))
feature.score.q1 <- left_join(score.q1, features, by="sgRNAID")
write.table(feature.score.q1[,2:ncol(feature.score.q1)], "Ecoli.finalquantum.classify.q1.iRFmatrix.tsv", quote=F, row.names=F, sep=",")
feature.score.q2 <- left_join(score.q2, features, by="sgRNAID")
write.table(feature.score.q2[,2:ncol(feature.score.q2)], "Ecoli.finalquantum.classify.q2.iRFmatrix.tsv", quote=F, row.names=F, sep=",")
feature.score.q3 <- left_join(score.q3, features, by="sgRNAID")
write.table(feature.score.q3[,2:ncol(feature.score.q3)], "Ecoli.finalquantum.classify.q3.iRFmatrix.tsv", quote=F, row.names=F, sep=",")
write.table(feature.score.q1[,1:2], "Ecoli.finalquantum.classify.q1.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q1[,1:2], "Ecoli.finalquantum.classify.q1.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q1[,2]), "Ecoli.finalquantum.classify.q1.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q2[,1:2], "Ecoli.finalquantum.classify.q2.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q2[,1:2], "Ecoli.finalquantum.classify.q2.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q2[,2]), "Ecoli.finalquantum.classify.q2.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q3[,1:2], "Ecoli.finalquantum.classify.q3.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q3[,1:2], "Ecoli.finalquantum.classify.q3.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q3[,2]), "Ecoli.finalquantum.classify.q3.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(features, "Ecoli.finalquantum.classify.features.txt", quote=F, row.names=F, sep="\t")
write.table(features, "Ecoli.finalquantum.classify.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(features[,2:ncol(features)], "Ecoli.finalquantum.classify.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
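# Sketch (Python, not part of the run): the binarization above in self-contained form.
# The cutoffs 0.25 / 0.50 / 0.75 are the fixed values used for q1-q3, which differ from the
# empirical quartiles reported by summary() (0.3563 / 0.5618 / 0.6757). The function name
# and the example scores are made up for illustration.

```python
# Binarize normalized cut scores at a fixed cutoff: below the cutoff -> 0, otherwise -> 1.
def binarize(scores, cutoff):
    return [0 if s < cutoff else 1 for s in scores]

# Hypothetical scores spanning the 0-1 range of the normalized cut.score
example = [0.00, 0.20, 0.3563, 0.5618, 0.6757, 1.00]
cutoffs = [("q1", 0.25), ("q2", 0.50), ("q3", 0.75)]
labels = {name: binarize(example, c) for name, c in cutoffs}

for name, lab in labels.items():
    # the fraction of positives shows how the class balance shifts with the cutoff
    print(name, lab, sum(lab) / len(lab))
```

# Checking the positive fraction per cutoff before launching iRF is a cheap way to spot a
# heavily imbalanced class split.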
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q1 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.q1.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q2 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.q2.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q3 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.q3.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF/Submits/submit_full_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF/Submits/submit_full_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF/Submits/submit_full_classify.q3_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF/Submits/submit_train_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF/Submits/submit_train_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF/Submits/submit_train_classify.q3_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF/Submits/submit_test_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF/Submits/submit_test_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF/Submits/submit_test_classify.q3_0.sh
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt classify.q1
# 0.12529151834520436
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q1_cut.score.importance4 | head
# p19dimer.Hbond.stackingraw: 110.528
# p18dimer.Hbond.energyraw: 52.9624
# p1tetramer.Hbond.energyraw: 51.0775
# V231.xsgRNA.raw: 47.3827
# p15tetramer.Hbond.stackingraw: 42.5926
# p11tetramer.Hbond.energyraw: 39.3022
# p18trimer.Hlgap.eVEraw: 32.6164
# p6tetramer.Hlgap.eVEraw: 32.1497
# p6tetramer.Hbond.stackingraw: 31.8695
# p2tetramer.Hbond.energyraw: 31.798
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q1_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3588828
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt classify.q2
# 0.18607930324108463
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q2_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 183.638
# p20basepair.Hlgap.eVEraw: 163.451
# p19dimer.Hbond.stackingraw: 144.959
# p1tetramer.Hbond.energyraw: 82.4669
# p11tetramer.Hbond.energyraw: 80.3186
# p18trimer.Hbond.stackingraw: 79.974
# p18dimer.Hbond.energyraw: 72.5719
# p15tetramer.Hbond.stackingraw: 60.8338
# p13tetramer.Hbond.energyraw: 60.0197
# p16tetramer.Hbond.stackingraw: 54.315
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q2_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4318153
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt classify.q3
# 0.01631988476578034
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q3_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 27.6664
# p20basepair.Hlgap.eVEraw: 25.6897
# p12tetramer.Hlgap.eVEraw: 22.6305
# p4tetramer.Hbond.stackingraw: 22.2947
# p7tetramer.Hlgap.eVEraw: 22.028
# p1tetramer.Hlgap.eVEraw: 21.0615
# p9tetramer.Hlgap.eVEraw: 20.7667
# p12tetramer.Hbond.stackingraw: 20.737
# p11tetramer.Hbond.stackingraw: 20.2828
# p10tetramer.Hbond.stackingraw: 20.0696
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q3_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.1601039
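# Sketch (Python, not part of the run): the cor() calls above compute a Pearson correlation
# between the 0/1 labels and the predicted values. The function below spells that out;
# the labels and predictions are made-up numbers, not results from these runs.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up binary labels and predicted scores
y_true = [0, 0, 1, 1]
y_pred = [0.10, 0.40, 0.35, 0.80]
r = pearson(y_true, y_pred)
print(round(r, 3))
```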
# /Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/Y.Lipolytica.SupTable1.txt
# /Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/GSM552919_Ylip.fsa.txt
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
### dataset --> Data S4... save each sheet as a dataframe, add column declaring Cas9 type, intersect with Data S1 for sequence, create new sgRNAID using both the ID and Cas9 type, merge files
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration")
df <- read.delim("Y.Lipolytica.SupTable1.txt", header=T, sep="\t")
library(dplyr)
library(tidyr)
df2 <- unite(df, sgRNAID,c("Number", "Gene.target"), sep="_", remove=TRUE)
df3 <- df2[,c(1,3,2)]
colnames(df3) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
df.na <- na.omit(df3)
# 46711
write.table(df.na, "Y.Lipolytica.txt", quote=F, row.names=F, sep="\t")
sed '1d' Y.Lipolytica.txt | awk '{print ">"$1"\n"$3}' > Y.Lipolytica.fasta
# cd /Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/
# scp Y.Lipolytica.txt noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/.
# scp Y.Lipolytica.fasta noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/.
# supplementary material: https://www.biorxiv.org/content/10.1101/2021.09.29.461753v1.supplementary-material
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
# scp /Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/y.lipolytica/baisya2021.tableS3.txt noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/.
# scp /Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/GCF_000002525.2_ASM252v1_genomic.fna.gz noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/.
# scp /Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/GCF_000002525.2_ASM252v1_genomic.gff.gz noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/.
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/")
df <- read.delim("baisya2021.tableS3.txt", header=T, sep="\t")
df2 <- df[,c(1,6,3)]
colnames(df2) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
df.na <- na.omit(df2)
write.table(df.na, "baisya2021.txt", quote=F, row.names=F, sep="\t")
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/
# baisya2021.txt columns: sgRNAID, cut.score, nucleotide.sequence, so the sequence is field 3
sed '1d' baisya2021.txt | awk '{print ">"$1"\n"$3}' > baisya2021.fasta
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
## blast
# conda install blast
# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes
# wget https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
# tar zxvpf ncbi-blast-2.11.0+-x64-linux.tar.gz
# export PATH=$PATH:$HOME/ncbi-blast-2.10.1+/bin
# echo $PATH
# mkdir $HOME/blastdb
# export BLASTDB=$HOME/blastdb
# set BLASTDB=$HOME/blastdb
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/makeblastdb -in GCF_000002525.2_ASM252v1_genomic.fna -dbtype nucl
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query baisya2021.fasta -db GCF_000002525.2_ASM252v1_genomic.fna -out y.lipolytica.gRNA.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' y.lipolytica.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' y.lipolytica.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > y.lipolytica.gRNA.blast.bed
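# Sketch (Python, not part of the run): the two awk passes above, written as one function.
# It reads BLAST tabular output (-outfmt 6; subject start/end are fields 9 and 10) and orders
# the coordinates so start <= end regardless of strand. Two differences from the awk version,
# both assumptions of this sketch: hits keep their input order, and a hit with equal start and
# end is kept rather than dropped. Coordinates are left as the 1-based BLAST positions, as in
# the awk version. The example hits are made up.

```python
def blast6_to_bed(lines):
    """Return (subject, start, end, query) tuples from BLAST -outfmt 6 lines."""
    rows = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip blank or malformed lines
        query, subject = f[0], f[1]
        sstart, send = int(f[8]), int(f[9])
        # minus-strand hits are reported with sstart > send
        rows.append((subject, min(sstart, send), max(sstart, send), query))
    return rows

# Hypothetical hits: one on the plus strand, one on the minus strand
hits = [
    "g1\tchrA\t100.0\t20\t0\t0\t1\t20\t501\t520\t1e-5\t40.1",
    "g2\tchrA\t100.0\t20\t0\t0\t1\t20\t940\t921\t1e-5\t40.1",
]
bed = blast6_to_bed(hits)
for row in bed:
    print("\t".join(str(v) for v in row))
```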
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/")
df <- read.delim("baisya2021.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
coord <- read.delim("y.lipolytica.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID
library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "y.lipolytica.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
faidx GCF_000002525.2_ASM252v1_genomic.fna -i chromsizes > y.lipolytica.sizes.genome
bedtools makewindows -g y.lipolytica.sizes.genome -w 20 -s 1 > y.lipolytica.20bp.sliding.bed
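# Sketch (Python, not part of the run): what a 20 bp window with a 1 bp step looks like in
# BED coordinates (0-based start, exclusive end). This sketch only emits full-length windows;
# the bedtools output may additionally contain shorter windows at the end of each sequence.
# The sequence name and length are made up.

```python
def sliding_windows(chrom, size, width=20, step=1):
    """Yield (chrom, start, end) for every full-length window on a sequence of length `size`."""
    for start in range(0, size - width + 1, step):
        yield (chrom, start, start + width)

# A hypothetical 25 bp sequence gives 6 full windows: 0-20, 1-21, ..., 5-25
wins = list(sliding_windows("chrA", 25))
for w in wins:
    print("%s\t%d\t%d" % w)
```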
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
## genes
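# keep only rows whose feature type (column 3) is "gene"; a plain grep also matches attribute text and comment lines
awk -F'\t' '$3 == "gene"' GCF_000002525.2_ASM252v1_genomic.gff | sort -k 1,1 -k 4,4n > GCF_000002525.2_ASM252v1_genomic.gene.sort.gff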
bedtools intersect -wo -a y.lipolytica.20bp.sliding.bed -b GCF_000002525.2_ASM252v1_genomic.gene.sort.gff > y.lipolytica.gene.20sliding.bed
## GC content
bedtools nuc -fi GCF_000002525.2_ASM252v1_genomic.fna -bed y.lipolytica.20bp.sliding.bed | sed '1d' > y.lipolytica.GC.20sliding.bed
# melting temperature reference: https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# Biopython signature, for reference: Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)
# see also: https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n
# summit: # conda install -c conda-forge biopython
### sgRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
python3
from Bio import SeqIO
# count A/C/G/T per sgRNA and write one row per record; the CG% column holds a fraction (0-1)
input_file = open('baisya2021.fasta', 'r')
output_file = open('nucleotide_counts_sgRNA.tsv','w')
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
for cur_record in SeqIO.parse(input_file, "fasta") :
    gene_name = cur_record.name
    seq = cur_record.seq.upper() # count case-insensitively
    A_count = seq.count('A')
    C_count = seq.count('C')
    G_count = seq.count('G')
    T_count = seq.count('T')
    length = len(seq)
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
        (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)

output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "y.lipolytica.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
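# Sketch (Python, not part of the run): the melting temperature formula used in the mutate()
# call above, Tm = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC), as a standalone function.
# The function name and the example sequence are made up; this is the same simple base-count
# formula, not the nearest-neighbor Tm_NN method from Biopython.

```python
def melting_temp(seq):
    """Base-count melting temperature estimate in degrees C."""
    s = seq.upper()
    a, c, g, t = (s.count(b) for b in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T")
    return 64.9 + 41 * (g + c - 16.4) / total

# Hypothetical 20-mer with 10 G/C bases: 64.9 + 41 * (10 - 16.4) / 20
tm = melting_temp("ACGTACGTACGTACGTACGT")
print(round(tm, 2))
```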
### 20bp sliding windows
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
bedtools getfasta -fi GCF_000002525.2_ASM252v1_genomic.fna -bed y.lipolytica.20bp.sliding.bed -fo y.lipolytica.20sliding.fa
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
python3
from Bio import SeqIO
# count A/C/G/T per 20bp window and write one row per record; the CG% column holds a fraction (0-1)
input_file = open('y.lipolytica.20sliding.fa', 'r')
output_file = open('nucleotide_counts_20sliding.tsv','w')
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
for cur_record in SeqIO.parse(input_file, "fasta") :
    gene_name = cur_record.name
    seq = cur_record.seq.upper() # genome FASTA may contain lowercase (soft-masked) bases
    A_count = seq.count('A')
    C_count = seq.count('C')
    G_count = seq.count('G')
    T_count = seq.count('T')
    length = len(seq)
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
        (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)

output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
df <- read.delim("nucleotide_counts_20sliding.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "y.lipolytica.nucleotide_counts_20sliding_temp.txt", quote=F, row.names=F, sep="\t")
q()
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/
cut -f 1,3 baisya2021.txt > y.lipolytica.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py y.lipolytica.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/
sed '1d' y.lipolytica.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > y.lipolytica_ind1.txt
sed '1d' y.lipolytica.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > y.lipolytica_ind2.txt
sed '1d' y.lipolytica.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.A p1.C p1.T p1.G p2.A p2.C p2.T p2.G p3.A p3.C p3.T p3.G p4.A p4.C p4.T p4.G p5.A p5.C p5.T p5.G p6.A p6.C p6.T p6.G p7.A p7.C p7.T p7.G p8.A p8.C p8.T p8.G p9.A p9.C p9.T p9.G p10.A p10.C p10.T p10.G p11.A p11.C p11.T p11.G p12.A p12.C p12.T p12.G p13.A p13.C p13.T p13.G p14.A p14.C p14.T p14.G p15.A p15.C p15.T p15.G p16.A p16.C p16.T p16.G p17.A p17.C p17.T p17.G p18.A p18.C p18.T p18.G p19.A p19.C p19.T p19.G p20.A p20.C p20.T p20.G' | cut -d ' ' -f 1-81 > y.lipolytica_dep1.txt
# build the 320-column dinucleotide header (p1.AA ... p20.GG) in a loop instead of typing it out
header=$(for p in $(seq 1 20); do for d in AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG; do printf ' p%s.%s' "$p" "$d"; done; done)
sed '1d' y.lipolytica.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed "1i sgRNAID${header}" | cut -d ' ' -f 1-321 > y.lipolytica_dep2.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/
sed '1d' y.lipolytica.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > y.lipolytica.sequence.txt
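# Sketch (Python, not part of the run): what a position-dependent first-order encoding with the
# dep1 header layout (p1.A p1.C p1.T p1.G ... p20.G) could look like. encode_sequences.py is not
# shown in these notes, so this is an assumption about the layout of its output, not its
# implementation. Function names and the example sequence are made up.

```python
BASES = "ACTG"  # column order taken from the dep1 header above

def onehot_positions(seq):
    """One 0/1 value per (position, base) pair, positions outermost."""
    return [1 if base == b else 0 for base in seq.upper() for b in BASES]

def onehot_header(n):
    """Column names p1.A, p1.C, p1.T, p1.G, p2.A, ... for a sequence of length n."""
    return ["p%d.%s" % (i, b) for i in range(1, n + 1) for b in BASES]

# Hypothetical 4 nt sequence, for brevity; a 20 nt sgRNA gives 80 columns
vec = onehot_positions("ACTG")
cols = onehot_header(4)
print(dict(zip(cols, vec)))
```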
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "y.lipolytica.tensors.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.tensors.melt.txt", quote=F, row.names=F, sep="\t")
# minimum free energy (MFE) structure with ViennaRNA RNAfold; tutorial: https://www.tbi.univie.ac.at/RNA/tutorial/
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/vienna
RNAfold < ../baisya2021.fasta > y.lipolytica.gRNA.ViennaRNA.output.txt
grep '(' y.lipolytica.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > y.lipolytica.gRNA.ViennaRNA.output.value.txt
grep '>' y.lipolytica.gRNA.ViennaRNA.output.txt | sed 's/>//g' > y.lipolytica.gRNA.names.txt
paste y.lipolytica.gRNA.names.txt y.lipolytica.gRNA.ViennaRNA.output.value.txt > y.lipolytica.gRNA.ViennaRNA.output.value.id.txt
cp y.lipolytica.gRNA.ViennaRNA.output.value.id.txt ../.
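# The grep/paste steps above pull the MFE value and the sequence name out of the RNAfold output
# in two separate passes. A minimal Python sketch of the same extraction that keeps each ID
# paired with its energy in one pass is shown below. It assumes RNAfold's FASTA-style output
# (header, sequence, then "structure (energy)"); parse_rnafold and the example records with
# their energies are hypothetical illustration values, not real results.

```python
import re

# Matches the trailing "(energy)" on an RNAfold structure line, e.g. "((..)) ( -1.20)".
ENERGY = re.compile(r"\(\s*([+-]?\d+(?:\.\d+)?)\)\s*$")

def parse_rnafold(text):
    """Return (id, mfe) pairs from RNAfold output produced from FASTA input."""
    records, current_id = [], None
    for line in text.splitlines():
        if line.startswith(">"):
            current_id = line[1:].strip()
            continue
        match = ENERGY.search(line)
        if match and current_id is not None:
            records.append((current_id, float(match.group(1))))
            current_id = None
    return records

# Hypothetical two-record example (made-up energies).
example = """>guide_0001
GGGAAACCCGGGAAACCCAA
(((...)))(((...))).. ( -1.20)
>guide_0002
AAAAAAAAAAAAAAAAAAAA
.................... (  0.00)
"""

for name, mfe in parse_rnafold(example):
    print(f"{name}\t{mfe}")
```

# Pairing inside one loop avoids a shifted paste if a header line ever contains "(" or a record lacks an energy line.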
# 20bp sliding fasta
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/vienna
RNAfold < ../y.lipolytica.20sliding.fa > y.lipolytica.20sliding.ViennaRNA.output.txt
grep '(' y.lipolytica.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > y.lipolytica.20sliding.ViennaRNA.output.value.txt
grep '>' y.lipolytica.20sliding.ViennaRNA.output.txt | sed 's/>//g' > y.lipolytica.20sliding.names.txt
paste y.lipolytica.20sliding.names.txt y.lipolytica.20sliding.ViennaRNA.output.value.txt > y.lipolytica.20sliding.ViennaRNA.output.value.id.txt
cp y.lipolytica.20sliding.ViennaRNA.output.value.id.txt ../.
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J ViennaRNA.ylipolytica
#SBATCH -N 2
#SBATCH -t 48:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/vienna
RNAfold < ../y.lipolytica.20sliding.fa > y.lipolytica.20sliding.ViennaRNA.output.txt
grep '(' y.lipolytica.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > y.lipolytica.20sliding.ViennaRNA.output.value.txt
grep '>' y.lipolytica.20sliding.ViennaRNA.output.txt | sed 's/>//g' > y.lipolytica.20sliding.names.txt
paste y.lipolytica.20sliding.names.txt y.lipolytica.20sliding.ViennaRNA.output.value.txt > y.lipolytica.20sliding.ViennaRNA.output.value.id.txt
cp y.lipolytica.20sliding.ViennaRNA.output.value.id.txt ../.
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/ViennaRNA.ylipolytica.sh
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
## GATC motif
## fastaregex
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'GATC' > y.lipolytica.gatc.bed
bedtools intersect -wo -a y.lipolytica.20bp.sliding.bed -b y.lipolytica.gatc.bed > y.lipolytica.gatc.20sliding.bed
# PAM sequence background: https://www.synthego.com/guide/how-to-use-crispr/pam-sequence
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# locate NGG PAM sites (AGG, TGG, CGG, GGG) in the reference genome with fastaRegexFinder
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
# vim NGG.PAM.fasta
## fastaRegexFinder
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'AGG' > y.lipolytica.AGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'TGG' > y.lipolytica.TGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'CGG' > y.lipolytica.CGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'GGG' > y.lipolytica.GGG.PAM.txt
cat y.lipolytica.AGG.PAM.txt y.lipolytica.TGG.PAM.txt y.lipolytica.CGG.PAM.txt y.lipolytica.GGG.PAM.txt > y.lipolytica.NGG.PAM.txt
sort -k 1,1 -k 2,2n y.lipolytica.NGG.PAM.txt > y.lipolytica.NGG.PAM.sorted.bed
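# A hedged sketch of what the four regex searches plus cat/sort above produce: a position-sorted
# list of NGG sites. This is not fastaRegexFinder's implementation; find_ngg and the toy sequence
# are hypothetical. It assumes 0-based, half-open BED-style coordinates and reports reverse-strand
# sites as CCN on the forward sequence, matching the CCT/CCG/CCA/CCC codes handled later in R.

```python
import re

def find_ngg(chrom, seq):
    """Return sorted tuples (chrom, start, end, strand, site) for NGG PAMs.

    Lookahead patterns are used so overlapping sites (e.g. TGG then GGG) are all reported.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        hits.append((chrom, m.start(), m.start() + 3, "+", m.group(1)))
    for m in re.finditer(r"(?=(CC[ACGT]))", seq):
        hits.append((chrom, m.start(), m.start() + 3, "-", m.group(1)))
    # Same ordering as: sort -k 1,1 -k 2,2n
    return sorted(hits, key=lambda h: (h[0], h[1]))

sites = find_ngg("chrDemo", "TTAGGCCATGGG")  # toy sequence, not from the genome
for site in sites:
    print("\t".join(str(field) for field in site))
```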
# intersect with sliding windows in the genome to get density for DWT
bedtools intersect -wo -a y.lipolytica.20bp.sliding.bed -b y.lipolytica.NGG.PAM.sorted.bed > y.lipolytica.NGG.PAM.20bp.sliding.windows.bed
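# closest with gRNAs to identify distance (downstream, strand); y.lipolytica.sgRNA.coord.bed is generated from y.lipolytica.sgRNA.coord.txt in the next block and must exist first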
awk '{print $0"\t""+"}' y.lipolytica.sgRNA.coord.bed > y.lipolytica.sgRNA.coord.strand.txt
bedtools closest -a y.lipolytica.sgRNA.coord.strand.txt -b y.lipolytica.NGG.PAM.sorted.bed -io -iu -D a > y.lipolytica.sgRNA.closestPAM.bed
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
cut -f 1-4 y.lipolytica.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > y.lipolytica.sgRNA.coord.bed
grep 'gene' GCF_000002525.2_ASM252v1_genomic.gff | sort -k 1,1 -k 4,4n > GCF_000002525.2_ASM252v1_genomic.gene.sort.gff
bedtools closest -a y.lipolytica.sgRNA.coord.bed -b GCF_000002525.2_ASM252v1_genomic.gene.sort.gff -D b > y.lipolytica.sgRNA.gene.closest.bed
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
structure <- read.delim("y.lipolytica.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("y.lipolytica.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
onehot.ind1 <- read.delim("y.lipolytica_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("y.lipolytica_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("y.lipolytica_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("y.lipolytica_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305]
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "df.id.test.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
tensor <- read.delim("y.lipolytica.tensors.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df.id <- read.delim("df.id.test.txt", header=T, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
write.table(tensor.df, "y.lipolytica.raw.onehot.tensor.txt", quote=F, row.names=F, sep="\t")
df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "y.lipolytica.raw.onehot.tensor.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast)
# 45271
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "y.lipolytica.raw.onehot.tensor.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 45271
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.pam <- read.table("y.lipolytica.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
score.location <- left_join(score.df, sgRNA.pam.df, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 45271
write.table(df.dcast.na, "y.lipolytica.sgRNA.pam.dcast.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df.dcast <- read.delim("y.lipolytica.sgRNA.pam.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("y.lipolytica.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- left_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
# 45271
write.table(df.location, "y.lipolytica.raw.onehot.tensor.pam.dcast.na.txt", quote=F, row.names=F, sep="\t")
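# The PAM nucleotide encoding in the mutate() call above maps each closest PAM (forward NGG or
# its reverse-strand CCN form) to a one-hot vector over the variable base N. A minimal Python
# sketch of that mapping follows; PAM_TO_N and pam_onehot are hypothetical names, and unknown
# codes are assumed to encode as all zeros, as the ifelse() chain would.

```python
# Map each PAM (forward NGG or its reverse-complement CCN) to its variable base N.
PAM_TO_N = {
    "AGG": "A", "CCT": "A",
    "CGG": "C", "CCG": "C",
    "TGG": "T", "CCA": "T",
    "GGG": "G", "CCC": "G",
}

def pam_onehot(pam_code):
    """Return PAM.A / PAM.C / PAM.T / PAM.G indicator columns for one PAM code."""
    n = PAM_TO_N.get(pam_code.upper())
    return {f"PAM.{base}": int(base == n) for base in "ACTG"}

print(pam_onehot("TGG"))
print(pam_onehot("CCA"))
```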
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.genes <- read.table("y.lipolytica.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- unique(sgRNA.genes[,c(4,14)])
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 45271
write.table(df.dcast.na, "y.lipolytica.sgRNA.location.dcast.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df.dcast <- read.delim("y.lipolytica.sgRNA.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("y.lipolytica.raw.onehot.tensor.pam.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
# 45271
write.table(df.location, "y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J haar.matrix
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
R CMD BATCH haar.matrix.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/haar.matrix.sh
salloc -A SYB105 -N 2 -p gpu -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/modwt
R
library(dplyr)
library(reshape2)
library(tidyr)
library(wmtsa)
library(data.table)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
gatc <- read.table("y.lipolytica.gatc.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
#gene <- read.table("y.lipolytica.gene.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
gene <- read.table("y.lipolytica.gene.20sliding.coord.bed", header=F, sep="\t", stringsAsFactors = F)
# no header row: this file is written by paste (window ID, MFE), so read it with header=F
structure <- read.table("y.lipolytica.20sliding.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.table("y.lipolytica.nucleotide_counts_20sliding_temp.txt", header=T, sep="\t", stringsAsFactors = F)
pam <- read.table("y.lipolytica.NGG.PAM.20bp.sliding.windows.bed", header=F, sep="\t", stringsAsFactors = F)
window <- read.table("y.lipolytica.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "sgRNAid", "cut.score", "seq")
score.df <- score[,c(1:3,5,6)]
gatc.bin <- gatc %>% group_by(V1, V2, V3) %>% mutate(gatc.count = n())
gatc.count <- unique(gatc.bin[,c(1:3,12)])
gene.bin <- gene %>% group_by(V1, V2, V3) %>% mutate(gene.count = n())
#gene.count <- unique(gene.bin[,c(1:3,14)])
gene.count <- unique(gene.bin)
pam.bin <- pam %>% group_by(V1, V2, V3) %>% mutate(pam.count = n())
pam.count <- unique(pam.bin[,c(1:3,12)])
window.v <- window[,1:3]
colnames(window.v) <- c("V1", "V2", "V3")
gatc.win <- left_join(window.v, gatc.count, by=c("V1", "V2", "V3"))
gatc.win[is.na(gatc.win)] <- 0
gene.win <- left_join(window.v, gene.count, by=c("V1", "V2", "V3"))
gene.win[is.na(gene.win)] <- 0
pam.win <- left_join(window.v, pam.count, by=c("V1", "V2", "V3"))
pam.win[is.na(pam.win)] <- 0
gene.df <- gene.win$gene.count
gatc.df <- gatc.win$gatc.count
pam.df <- pam.win$pam.count
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/modwt")
temp.modwt <- wavMODWT(temp.df, wavelet="haar")
temp.modwt.df <- as.matrix(temp.modwt)
temp.modwt.label <- data.frame(label = row.names(temp.modwt.df), temp.modwt.df)
temp.modwt.dt <- as.data.table(temp.modwt.label)
temp.modwt.name <- temp.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(temp.modwt.name) <- c("label", "temp.dwt", "scale", "window")
write.table(temp.modwt.name, "temp.modwt.haar.txt", quote=F, row.names=F, sep="\t")
gc.modwt <- wavMODWT(gc.df, wavelet="haar")
gc.modwt.df <- as.matrix(gc.modwt)
gc.modwt.label <- data.frame(label = row.names(gc.modwt.df), gc.modwt.df)
gc.modwt.dt <- as.data.table(gc.modwt.label)
gc.modwt.name <- gc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gc.modwt.name) <- c("label", "gc.dwt", "scale", "window")
write.table(gc.modwt.name, "gc.modwt.haar.txt", quote=F, row.names=F, sep="\t")
structure.modwt <- wavMODWT(structure.df, wavelet="haar")
structure.modwt.df <- as.matrix(structure.modwt)
structure.modwt.label <- data.frame(label = row.names(structure.modwt.df), structure.modwt.df)
structure.modwt.dt <- as.data.table(structure.modwt.label)
structure.modwt.name <- structure.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(structure.modwt.name) <- c("label", "structure.dwt", "scale", "window")
write.table(structure.modwt.name, "structure.modwt.haar.txt", quote=F, row.names=F, sep="\t")
gene.modwt <- wavMODWT(gene.df, wavelet="haar")
gene.modwt.df <- as.matrix(gene.modwt)
gene.modwt.label <- data.frame(label = row.names(gene.modwt.df), gene.modwt.df)
gene.modwt.dt <- as.data.table(gene.modwt.label)
gene.modwt.name <- gene.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gene.modwt.name) <- c("label", "gene.dwt", "scale", "window")
write.table(gene.modwt.name, "gene.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
gatc.modwt <- wavMODWT(gatc.df, wavelet="haar")
gatc.modwt.df <- as.matrix(gatc.modwt)
gatc.modwt.label <- data.frame(label = row.names(gatc.modwt.df), gatc.modwt.df)
gatc.modwt.dt <- as.data.table(gatc.modwt.label)
gatc.modwt.name <- gatc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gatc.modwt.name) <- c("label", "gatc.dwt", "scale", "window")
write.table(gatc.modwt.name, "gatc.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
pam.modwt <- wavMODWT(pam.df, wavelet="haar")
pam.modwt.df <- as.matrix(pam.modwt)
pam.modwt.label <- data.frame(label = row.names(pam.modwt.df), pam.modwt.df)
pam.modwt.dt <- as.data.table(pam.modwt.label)
pam.modwt.name <- pam.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(pam.modwt.name) <- c("label", "pam.dwt", "scale", "window")
write.table(pam.modwt.name, "pam.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/modwt")
temp.modwt.name <- read.delim("temp.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gc.modwt.name <- read.delim("gc.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
structure.modwt.name <- read.delim("structure.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gene.modwt.name <- read.delim("gene.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gatc.modwt.name <- read.delim("gatc.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
pam.modwt.name <- read.delim("pam.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
window <- read.table("y.lipolytica.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "sgRNAid", "cut.score", "seq")
score.df <- score[,c(1:3,5,6)]
colnames(window) <- c("chr", "start", "end")
window$window <- seq.int(nrow(window))
window$window <- as.character(window$window-1)
window$start <- as.numeric(window$start)
window$end <- as.numeric(window$end - 1)
window.score.df <- left_join(score.df, window, by=c("chr", "start", "end"))
window.score.df$window <- as.integer(window.score.df$window)
window.score.temp <- left_join(window.score.df, temp.modwt.name[,c(3,4,2)], by="window")
window.temp.gc <- left_join(window.score.temp, gc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure <- left_join(window.temp.gc, structure.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene <- left_join(window.temp.gc.structure, gene.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene.gatc <- left_join(window.temp.gc.structure.gene, gatc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene.gatc.pam <- left_join(window.temp.gc.structure.gene.gatc, pam.modwt.name[,c(3,4,2)], by=c("window", "scale"))
nrow(window.temp.gc.structure.gene.gatc.pam)
#
# keep only windows that carry an sgRNA cut score, then count the filtered rows
window.temp.gc.structure.gene.gatc.pam.sgRNA <- subset(window.temp.gc.structure.gene.gatc.pam, !is.na(cut.score))
nrow(window.temp.gc.structure.gene.gatc.pam.sgRNA)
#
write.table(window.temp.gc.structure.gene.gatc.pam.sgRNA, "y.lipolytica.20sliding.exact.DWT.haar.txt", quote=F, row.names=F, sep="\t")
df.melt <- melt(window.temp.gc.structure.gene.gatc.pam.sgRNA[,c(4,5,7:ncol(window.temp.gc.structure.gene.gatc.pam.sgRNA))], id=c("cut.score", "scale", "sgRNAid"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAid", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAid + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast[is.na(df.dcast)] <- 0
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 45271
write.table(df.dcast.na, "y.lipolytica.20sliding.exact.DWT.haar.dcast.txt", quote=F, row.names=F, sep="\t")
# combine regional DWT with other features
library(tidyr)
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df.dcast.na <- read.delim("y.lipolytica.20sliding.exact.DWT.haar.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
names(df.dcast.na)[names(df.dcast.na) == 'sgRNAid'] <- 'sgRNAID'
df <- read.delim("y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- df[,c(1,1656,3:1649,1651:1655,1657)]
nrow(df)
# 45271
df.region <- inner_join(df, df.dcast.na[,c(1,3:ncol(df.dcast.na))], by=c("sgRNAID"))
nrow(df.region)
# 45271
write.table(df.region, "y.lipolytica.20sliding.raw.onehot.tensor.dwt.dcast.txt", quote=F, row.names=F, sep="\t")
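# For reference, a hedged pure-Python sketch of what a Haar MODWT computes on a per-window
# signal such as gc.df or pam.df. It follows the standard pyramid recursion with circular
# boundary handling and Haar filters (1/2, -1/2) and (1/2, 1/2). It is not the wmtsa
# implementation: wavMODWT's default number of levels, sign convention, and boundary
# treatment may differ, and haar_modwt is a hypothetical name.

```python
def haar_modwt(x, levels):
    """Return (list of wavelet coefficient lists per level, final scaling coefficients).

    Pyramid recursion with periodic boundaries:
      W_j[t] = (V_{j-1}[t] - V_{j-1}[t - 2**(j-1)]) / 2
      V_j[t] = (V_{j-1}[t] + V_{j-1}[t - 2**(j-1)]) / 2
    """
    n = len(x)
    v = [float(value) for value in x]
    details = []
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)
        w = [(v[t] - v[(t - lag) % n]) / 2 for t in range(n)]
        v = [(v[t] + v[(t - lag) % n]) / 2 for t in range(n)]
        details.append(w)
    return details, v

signal = [2, 4, 6, 8]  # toy per-window values
details, smooth = haar_modwt(signal, levels=2)
print(details)
print(smooth)
```

# Every level keeps one coefficient per window, which is why the joins above can match coefficients to sgRNAs by window index and scale.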
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J y.lipolytica.iRF
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0
#SBATCH -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save
R CMD BATCH iRF.test.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.test.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  # iterative random forest: refit ranger, feeding each round's importances back in as split weights
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal selection weight per feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0) # half of the features that still have a positive weight
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale importances by the sum of their absolute values
    wt[wt<0] <- 0 # set negative weights to zero
    cat("mtry: ", mtry, "\n")
    cat("prediction error: ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n") # "SNPs" here means features
    if(saveall) rfs[[i]] <- rf
    # stop early once too few features keep a positive weight
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
    {
      if(!saveall) rfs <- rf
      break
    }
  }
  return(rfs)
}
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.20sliding.raw.onehot.tensor.dwt.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
set.seed(2458)
df.sample <- df[sample(nrow(df), 10000), ]
# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1645:1649,1651:1652,1654:1655)]
# one-hot dependent: [,c(18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
# chemical tensors: [,c(58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1642:1644,1650,1653)]
# DWT : [,c(1656:1797)]
df.raw <- df.sample[,c(2,1642:1644,1650,1653)]
iRF(df.raw[,2:ncol(df.raw)], df.raw$cut.score)
# iRF iteration 1
# =================
# mtry: 2.5
# prediction error: 7.765089
# r^2: -0.01215432
# cor(y,yhat): 0.0608681
# SNPs with importance > 0: 2
df.dwt <- df.sample[,c(2,1656:1797)]
iRF(df.dwt[,2:ncol(df.dwt)], df.dwt$cut.score)
# iRF iteration 5
# =================
# mtry: 24
# prediction error: 7.457404
# r^2: 0.02795152
# cor(y,yhat): 0.1798774
# SNPs with importance > 0: 40
df.onehot <- df.sample[,c(2,3:17,1645:1649,1651:1652,1654:1655,18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
iRF(df.onehot[,2:ncol(df.onehot)], df.onehot$cut.score)
# iRF iteration 2
# =================
# mtry: 106.5
# prediction error: 7.143456
# r^2: 0.06887356
# cor(y,yhat): 0.2652687
# SNPs with importance > 0: 149
df.quantum <- df.sample[,c(2,58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
iRF(df.quantum[,2:ncol(df.quantum)], df.quantum$cut.score)
# iRF iteration 3
# =================
# mtry: 164.5
# prediction error: 7.420933
# r^2: 0.03270536
# cor(y,yhat): 0.1926085
# SNPs with importance > 0: 189
df.raw.dwt <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)])
iRF(df.raw.dwt[,2:ncol(df.raw.dwt)], df.raw.dwt$cut.score)
# iRF iteration 5
# =================
# mtry: 22
# prediction error: 7.444199
# r^2: 0.02967267
# cor(y,yhat): 0.1882058
# SNPs with importance > 0: 35
df.raw.onehot <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)])
iRF(df.raw.onehot[,2:ncol(df.raw.onehot)], df.raw.onehot$cut.score)
# iRF iteration 4
# =================
# mtry: 61
# prediction error: 7.112543
# r^2: 0.07290298
# cor(y,yhat): 0.2733014
# SNPs with importance > 0: 108
df.raw.quantum <- cbind(df.raw, df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.quantum[,2:ncol(df.raw.quantum)], df.raw.quantum$cut.score)
# iRF iteration 4
# =================
# mtry: 105
# prediction error: 7.344396
# r^2: 0.04268167
# cor(y,yhat): 0.2127093
# SNPs with importance > 0: 164
df.onehot.dwt <- cbind(df.onehot, df.dwt[,2:ncol(df.dwt)])
iRF(df.onehot.dwt[,2:ncol(df.onehot.dwt)], df.onehot.dwt$cut.score)
# iRF iteration 3
# =================
# mtry: 118
# prediction error: 7.091331
# r^2: 0.07566788
# cor(y,yhat): 0.2752033
# SNPs with importance > 0: 165
df.onehot.quantum <- cbind(df.onehot, df.quantum[,2:ncol(df.quantum)])
iRF(df.onehot.quantum[,2:ncol(df.onehot.quantum)], df.onehot.quantum$cut.score)
# iRF iteration 4
# =================
# mtry: 126
# prediction error: 7.119273
# r^2: 0.07202576
# cor(y,yhat): 0.2690378
# SNPs with importance > 0: 174
df.quantum.dwt <- cbind(df.quantum, df.dwt[,2:ncol(df.dwt)])
iRF(df.quantum.dwt[,2:ncol(df.quantum.dwt)], df.quantum.dwt$cut.score)
# iRF iteration 4
# =================
# mtry: 199
# prediction error: 7.372495
# r^2: 0.03901906
# cor(y,yhat): 0.2007382
# SNPs with importance > 0: 307
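# NOTE: df.onehot.quantum is bound here, so the run below includes the quantum tensor features in addition to raw + DWT + one-hot; bind df.onehot[,2:ncol(df.onehot)] instead for the feature set the variable name implies
df.raw.dwt.onehot <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)], df.onehot.quantum[,2:ncol(df.onehot.quantum)])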
iRF(df.raw.dwt.onehot[,2:ncol(df.raw.dwt.onehot)], df.raw.dwt.onehot$cut.score)
# iRF iteration 5
# =================
# mtry: 140
# prediction error: 7.054261
# r^2: 0.08049979
# cor(y,yhat): 0.2840999
# SNPs with importance > 0: 221
df.raw.dwt.quantum <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.dwt.quantum[,2:ncol(df.raw.dwt.quantum)], df.raw.dwt.quantum$cut.score)
# iRF iteration 4
# =================
# mtry: 203.5
# prediction error: 7.31899
# r^2: 0.0459933
# cor(y,yhat): 0.2160629
# SNPs with importance > 0: 309
df.raw.onehot.quantum <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.onehot.quantum[,2:ncol(df.raw.onehot.quantum)], df.raw.onehot.quantum$cut.score)
# iRF iteration 5
# =================
# mtry: 106.5
# prediction error: 7.068021
# r^2: 0.07870629
# cor(y,yhat): 0.2818915
# SNPs with importance > 0: 156
df.dwt.onehot.quantum <- cbind(df.dwt, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.dwt.onehot.quantum[,2:ncol(df.dwt.onehot.quantum)], df.dwt.onehot.quantum$cut.score)
# iRF iteration 4
# =================
# mtry: 213.5
# prediction error: 7.062192
# r^2: 0.07946603
# cor(y,yhat): 0.282356
# SNPs with importance > 0: 307
df.all <- cbind(df.dwt, df.onehot[,2:ncol(df.onehot)], df.raw[,2:ncol(df.raw)], df.quantum[,2:ncol(df.quantum)])
iRF(df.all[,2:ncol(df.all)], df.all$cut.score)
# iRF iteration 5
# =================
# mtry: 154
# prediction error: 7.026276
# r^2: 0.08414762
# cor(y,yhat): 0.2901653
# SNPs with importance > 0: 227
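# The re-weighting step inside iRF() is what drives the shrinking "importance > 0" counts in the
# logs above: importances are scaled by the sum of their absolute values, negatives are zeroed,
# and the next mtry is half the number of surviving features. A minimal Python sketch of just
# that arithmetic follows. update_weights, next_mtry, and the importance values are hypothetical;
# no forest is fitted here.

```python
def update_weights(importances):
    """Scale importances by the sum of absolute values, then zero the negatives."""
    total = sum(abs(v) for v in importances)
    if total == 0:
        return [0.0 for _ in importances]
    return [max(v / total, 0.0) for v in importances]

def next_mtry(weights):
    """Half the number of features that still carry a positive weight."""
    return 0.5 * sum(1 for w in weights if w > 0)

weights = update_weights([4.0, -1.0, 3.0, 0.0, 2.0])  # made-up importances
print(weights)
print(next_mtry(weights))
```

# Features with zero weight are never offered as split candidates in the next round, so the candidate set can only shrink from one iteration to the next.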
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/
cut -f 1,3 baisya2021.txt > y.lipolytica.noscore.txt
python ../kmer1_positional_encode.py y.lipolytica.noscore.txt
python ../kmer2_positional_encode.py y.lipolytica.noscore.txt
python ../kmer3_positional_encode.py y.lipolytica.noscore.txt
python ../kmer4_positional_encode.py y.lipolytica.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/
sed '1d' y.lipolytica.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep1.txt
sed '1d' y.lipolytica.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep2.txt
sed '1d' y.lipolytica.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep3.txt
sed '1d' y.lipolytica.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep4.txt
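# The kmer*_positional_encode.py scripts are not shown in these notes. As an illustration only,
# the sketch below builds a position-dependent dinucleotide one-hot encoding with column names in
# the p1.AA, p1.AC, p1.AT, p1.AG ... style used for the dep2 header earlier. dinuc_onehot is a
# hypothetical function, and the real scripts may order or name features differently.

```python
from itertools import product

BASES = "ACTG"  # same base order as the p1.AA, p1.AC, p1.AT, p1.AG ... header
DINUCS = ["".join(pair) for pair in product(BASES, repeat=2)]

def dinuc_onehot(seq):
    """One-hot encode each overlapping dinucleotide of seq by position."""
    seq = seq.upper()
    features = {}
    for pos in range(len(seq) - 1):
        pair = seq[pos:pos + 2]
        for dinuc in DINUCS:
            features[f"p{pos + 1}.{dinuc}"] = int(pair == dinuc)
    return features

encoded = dinuc_onehot("ACGT")  # toy 4-mer; a 20-mer gives 19 positions x 16 columns
print([name for name, value in encoded.items() if value == 1])
```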
# salloc -A SYB105 -N 2 -p gpu -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
# column 5 is the sgRNA ID and column 6 is cut.score (column 7 is the sequence)
score.df <- unique(score[,c(5,6)])
colnames(score.df) <- c("sgRNAID", "cut.score")
onehot.dep1 <- read.delim("y.lipolytica_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("y.lipolytica_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("y.lipolytica_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("y.lipolytica_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
# drop the empty last column left by the trailing space; parentheses make the 1:(n-1) range explicit
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot.score <- full_join(score.df, onehot.dep, by="sgRNAID")
df.melt <- melt(onehot.score, id=c("cut.score", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "sgRNAID", "variable", "value")
df$value <- as.numeric(df$value)
df.id <- df[!(is.na(df$value) | df$value==""), ]
colnames(df.id) <- c("cut.score", "sgRNAID", "feature", "value")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "y.lipolytica.kmer.encoding.txt", quote=F, row.names=F, sep="\t")
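# The melt/dcast round trip above turns a long table (one row per sgRNA and feature) back into a wide one, averaging duplicate entries. A minimal standard-library sketch of that reshaping, assuming rows of (sgRNAID, cut.score, feature, value); dcast_mean and the example rows are hypothetical, and this is not a replacement for reshape2.

```python
from collections import defaultdict


def dcast_mean(rows):
    """Pivot long rows into {(sgRNAID, cut_score): {feature: mean value}}."""
    cells = defaultdict(lambda: defaultdict(list))
    for sgrna, score, feature, value in rows:
        if value is None:  # skip missing values, like na.omit / na.rm=TRUE
            continue
        cells[(sgrna, score)][feature].append(value)
    return {
        key: {feat: sum(vals) / len(vals) for feat, vals in feats.items()}
        for key, feats in cells.items()
    }


long_rows = [
    ("sg1", 0.5, "V2", 1),
    ("sg1", 0.5, "V3", 0),
    ("sg1", 0.5, "V3", 1),    # duplicate cell, averaged with the row above
    ("sg2", 0.1, "V2", None), # dropped
    ("sg2", 0.1, "V3", 1),
]
print(dcast_mean(long_rows))
```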
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J iRF.onehot.kmer
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
R CMD BATCH iRF.onehot.kmer.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.onehot.kmer.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight for every feature (column of xmat)
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0) # half of the features that still have a positive weight
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # normalize importances by the sum of their absolute values
    wt[wt<0] <- 0 # set negative weights to zero
    cat("mtry: ", mtry, "\n")
    cat("prediction error: ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf else rfs <- rf # saveall=F keeps only the most recent forest
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)) break # stop once too few features keep a positive weight
  }
  return(rfs)
}
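# The re-weighting step inside iRF(), isolated as a small Python sketch. It only reproduces the arithmetic on the importance vector (normalize by the sum of absolute values, zero out negatives, derive mtry, check the stopping rule); it does not fit a forest. The function names and example numbers are hypothetical.

```python
def update_weights(importance):
    """Normalize importances by the sum of absolute values, then clip negatives to 0."""
    total = sum(abs(v) for v in importance)
    if total == 0:
        raise ValueError("all importances are zero; weights are undefined")
    return [max(v / total, 0.0) for v in importance]


def next_mtry(weights):
    """Half of the features that still carry a positive weight (may be fractional)."""
    return 0.5 * sum(w > 0 for w in weights)


def should_stop(weights, n_features):
    """Mirror of: sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)."""
    return sum(w > 0 for w in weights) < max(0.01 * (n_features - 1), 10)


wt = update_weights([2.0, -1.0, 1.0])  # made-up importances
print(wt, next_mtry(wt), should_stop(wt, n_features=3))
```

# With fewer than 10 positively weighted features the rule always stops, so very small feature sets end after the first iteration.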
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
set.seed(2458)
df <- df[sample(nrow(df), 10000), ]
# kmer = 1
df.1 <- df[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)
# kmer = 2
df.2 <- df[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)
# kmer = 3
df.3 <- df[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)
# kmer = 4
df.4 <- df[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)
# kmer = 1 + 2
df.1.2 <- df[,c(2:386)]
iRF(df.1.2[,2:ncol(df.1.2)], df.1.2$cut.score)
# kmer = 1 + 2 + 3
df.1.2.3 <- df[,c(2:1538)]
iRF(df.1.2.3[,2:ncol(df.1.2.3)], df.1.2.3$cut.score)
# kmer = 1 + 2 + 3 + 4
df.1.2.3.4 <- df[,c(2:5890)]
iRF(df.1.2.3.4[,2:ncol(df.1.2.3.4)], df.1.2.3.4$cut.score)
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
lipolytica <- lipolytica[,c(1,1656,3:1649,1651:1655,1657)]
ncol(lipolytica)
# 1655
nrow(lipolytica)
# 45271
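# convert only the numeric columns; sgRNAID (column 1) is kept as-is so column positions do not change
lipolytica.num <- cbind(sgRNAID = lipolytica[,1], mutate_all(lipolytica[,2:ncol(lipolytica)], function(x) as.numeric(as.character(x))))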
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
df <- lipolytica.num
set.seed(2458)
df.sample <- df[sample(nrow(df), 10000), ]
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight for every feature (column of xmat)
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0) # half of the features that still have a positive weight
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # normalize importances by the sum of their absolute values
    wt[wt<0] <- 0 # set negative weights to zero
    cat("mtry: ", mtry, "\n")
    cat("prediction error: ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf else rfs <- rf # saveall=F keeps only the most recent forest
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)) break # stop once too few features keep a positive weight
  }
  return(rfs)
}
# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1645:1649,1651:1652,1654:1655)]
# one-hot dependent: [,c(18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
# chemical tensors: [,c(58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1642:1644,1650,1653)]
df.raw <- df.sample[,c(2,1642:1644,1650,1653)]
iRF(df.raw[,2:ncol(df.raw)], df.raw$cut.score)
# iRF iteration 2
# =================
# mtry: 286
# prediction error: 0.02513118
# r^2: 0.03452986
# cor(y,yhat): 0.1976886
# SNPs with importance > 0: 314
df.onehot <- df.sample[,c(2,3:17,1645:1649,1651:1652,1654:1655,18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
iRF(df.onehot[,2:ncol(df.onehot)], df.onehot$cut.score)
# iRF iteration 4
# =================
# mtry: 57.5
# prediction error: 0.024432
# r^2: 0.06139041
# cor(y,yhat): 0.2574278
# SNPs with importance > 0: 92
df.quantum <- df.sample[,c(2,58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
iRF(df.quantum[,2:ncol(df.quantum)], df.quantum$cut.score)
# iRF iteration 4
# =================
# mtry: 95
# prediction error: 0.02534101
# r^2: 0.02646891
# cor(y,yhat): 0.1749871
# SNPs with importance > 0: 102
df.raw.onehot <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)])
iRF(df.raw.onehot[,2:ncol(df.raw.onehot)], df.raw.onehot$cut.score)
df.raw.quantum <- cbind(df.raw, df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.quantum[,2:ncol(df.raw.quantum)], df.raw.quantum$cut.score)
df.onehot.quantum <- cbind(df.onehot, df.quantum[,2:ncol(df.quantum)])
iRF(df.onehot.quantum[,2:ncol(df.onehot.quantum)], df.onehot.quantum$cut.score)
df.raw.onehot.quantum <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.onehot.quantum[,2:ncol(df.raw.onehot.quantum)], df.raw.onehot.quantum$cut.score)
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- df[,c(1,1656,3:1649,1651:1655,1657)]
write.table(df, "y.lipolytica.raw.onehot.tensor.pam.location.dcast.txt", quote=F, row.names=F, sep="\t")
ncol(df)
# 1655
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "y.lipolytica.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "y.lipolytica.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.20sliding.raw.onehot.tensor.dwt.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(df)
#
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.pam.location.dwt.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.pam.location.dwt.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "y.lipolytica.raw.onehot.tensor.pam.location.dwt.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run
#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName y.lipolytica.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.score.txt
#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName y.lipolytica.DWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.pam.location.dwt.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT/Submits/submit_full_y.lipolytica.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT/Submits/submit_full_y.lipolytica.DWT_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT/Submits/submit_train_y.lipolytica.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT/Submits/submit_train_y.lipolytica.DWT_0.sh
# once the train submissions are done, run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT/Submits/submit_test_y.lipolytica.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT/Submits/submit_test_y.lipolytica.DWT_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.noDWT
# 0.09210298017671817
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw cut.score 0.07789451487981215
# TTsgRNA.raw cut.score 0.07174965288226201
# gene.distance0 cut.score 0.04313731815364106
# pam.distance0 cut.score 0.038925709394903266
# CGsgRNA.raw cut.score 0.03330233677130654
# AAsgRNA.raw cut.score 0.029370519036537104
# TsgRNA.raw cut.score 0.024903402499592636
# GsgRNA.raw cut.score 0.02376442487627081
# AsgRNA.raw cut.score 0.021933041187672975
# GCsgRNA.raw cut.score 0.021531833934896223
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.noDWT_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 9447.4
# TTsgRNA.raw: 8342.87
# gene.distance0: 4978.74
# pam.distance0: 4663.46
# CGsgRNA.raw: 3942.05
# AAsgRNA.raw: 3512.91
# TsgRNA.raw: 2977.86
# GCsgRNA.raw: 2885.99
# GsgRNA.raw: 2808.87
# AsgRNA.raw: 2691.89
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3180251
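# For reference, the Pearson correlation that cor() reports above, written out with the standard library only. pearson is a hypothetical helper and the vectors are toy values, not the Set4 predictions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # co-deviation of the two vectors around their means
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


observed = [0.1, 0.4, 0.35, 0.8]   # toy cut scores
predicted = [0.2, 0.3, 0.5, 0.7]   # toy predictions
print(pearson(observed, predicted))
```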
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.DWT
#
sort -k3rg topVarEdges/cut.score_top95.txt | head
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.DWT_cut.score.importance4 | head
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.DWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
#
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.pam.location.dcast.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False stops summary_plot from calling plt.show() before the figure is saved
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/baisya.noDWT.16dec.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/baisya.noDWT.16dec.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/baisya.noDWT.16dec.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
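# The bar version of the SHAP summary plot ranks features by their mean absolute SHAP value. A small standard-library sketch of that ranking, assuming shap_values is a samples x features matrix; mean_abs_shap, the numbers and the feature names are hypothetical and only illustrate the calculation.

```python
def mean_abs_shap(shap_values, feature_names):
    """Rank features by mean |SHAP value| across samples, largest first."""
    n_samples = len(shap_values)
    means = [
        sum(abs(row[j]) for row in shap_values) / n_samples
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)


toy_values = [[0.1, -0.5], [-0.3, 0.5]]        # 2 samples x 2 features, made up
toy_names = ["sgRNA.gc", "sgRNA.structure"]    # illustrative labels only
for name, score in mean_abs_shap(toy_values, toy_names):
    print(name, score)
```

# Sign is discarded on purpose: a feature that pushes predictions strongly in both directions still ranks high.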
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py y.lipolytica.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py y.lipolytica.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py y.lipolytica.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py y.lipolytica.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/
sed '1d' y.lipolytica.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep1.txt
sed '1d' y.lipolytica.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep2.txt
sed '1d' y.lipolytica.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep3.txt
sed '1d' y.lipolytica.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep4.txt
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df[63:70,]))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "y.lipolytica.tensors.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.tensors.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
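# The single-base tensor step above is a per-position lookup: each base of the guide is replaced by that base's feature vector, and the wide table gets one column per position and feature (named position_feature by dcast). A minimal sketch, assuming positions are labelled p1, p2, ...; expand_sequence, the feature names and the numbers in the table are hypothetical placeholders, not values from the tensor file.

```python
def expand_sequence(seq, table):
    """Map each base to its feature values, keyed as '<position>_<feature>'."""
    wide = {}
    for pos, base in enumerate(seq, start=1):
        for feature, value in table[base].items():
            wide[f"p{pos}_{feature}"] = value
    return wide


# placeholder per-base features; the real ones come from rows 63:70 of the tensor table
toy_table = {
    "A": {"feat1": 1.0, "feat2": 0.1},
    "C": {"feat1": 2.0, "feat2": 0.2},
    "G": {"feat1": 3.0, "feat2": 0.3},
    "T": {"feat1": 4.0, "feat2": 0.4},
}
print(expand_sequence("ACG", toy_table))
```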
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J jan18.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save
R CMD BATCH jan18.matrix.R
R CMD BATCH jan18.matrix.2.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/jan18.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
structure <- read.delim("y.lipolytica.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("y.lipolytica.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
onehot.ind1 <- read.delim("y.lipolytica_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("y.lipolytica_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("y.lipolytica_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("y.lipolytica_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("y.lipolytica_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("y.lipolytica_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# the awk split leaves a trailing space, which read.delim reads as an empty last column; drop it before joining
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "y.lipolytica.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
tensor <- read.delim("y.lipolytica.tensors.single.bp.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.id <- read.delim("y.lipolytica.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")
df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "y.lipolytica.raw.onehot.tensor.single.bp.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 105531
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.pam <- read.table("y.lipolytica.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
# one-hot encode the variable PAM base; each NGG code is paired with its reverse complement (CCN)
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(
  PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0),
  PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0),
  PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0),
  PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
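# The PAM one-hot rule above, sketched in Python to make the strand logic explicit: each NGG code shares a class with its reverse complement. pam_onehot and revcomp are hypothetical helper names; the code pairs are the ones used in the mutate() call.

```python
PAM_CLASSES = {
    "PAM.A": {"AGG", "CCT"},
    "PAM.C": {"CGG", "CCG"},
    "PAM.T": {"TGG", "CCA"},
    "PAM.G": {"GGG", "CCC"},
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def pam_onehot(code):
    """Return a 0/1 flag per PAM class; unknown codes give all zeros."""
    return {name: int(code in codes) for name, codes in PAM_CLASSES.items()}


print(pam_onehot("TGG"))
print(pam_onehot("CCA"))  # reverse complement of TGG, same class
print(all(revcomp(sorted(c)[0]) in c for c in PAM_CLASSES.values()))
```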
#sgRNA.pam.df$id <- "Cas9"
#sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.df, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
df <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 45271
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.genes <- read.table("y.lipolytica.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
#sgRNA.genes.df$id <- "Cas9"
#sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
df <- df.location
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 45271
write.table(df.location, "y.lipolytica.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
# build overlapping dimers: position i becomes the bases at i and i+1 (19 dimers from 20 positions)
seq.dimer <- seq %>%
  unite("p1", p1:p2, remove=F, sep= "") %>%
  unite("p2", p2:p3, remove=F, sep= "") %>%
  unite("p3", p3:p4, remove=F, sep= "") %>%
  unite("p4", p4:p5, remove=F, sep= "") %>%
  unite("p5", p5:p6, remove=F, sep= "") %>%
  unite("p6", p6:p7, remove=F, sep= "") %>%
  unite("p7", p7:p8, remove=F, sep= "") %>%
  unite("p8", p8:p9, remove=F, sep= "") %>%
  unite("p9", p9:p10, remove=F, sep= "") %>%
  unite("p10", p10:p11, remove=F, sep= "") %>%
  unite("p11", p11:p12, remove=F, sep= "") %>%
  unite("p12", p12:p13, remove=F, sep= "") %>%
  unite("p13", p13:p14, remove=F, sep= "") %>%
  unite("p14", p14:p15, remove=F, sep= "") %>%
  unite("p15", p15:p16, remove=F, sep= "") %>%
  unite("p16", p16:p17, remove=F, sep= "") %>%
  unite("p17", p17:p18, remove=F, sep= "") %>%
  unite("p18", p18:p19, remove=F, sep= "") %>%
  unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])
rownames(seq) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "y.lipolytica.tensors.dimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.tensors.dimers.melt.txt", quote=F, row.names=F, sep="\t")
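# The dimer table above is built from overlapping two-base windows of each guide. A minimal sketch of that windowing, assuming positions are labelled p1..p19 for a 20-nt guide; overlapping_dimers and the example guide are hypothetical.

```python
def overlapping_dimers(seq):
    """Return {'p1': seq[0:2], 'p2': seq[1:3], ...} for every adjacent base pair."""
    return {f"p{i + 1}": seq[i:i + 2] for i in range(len(seq) - 1)}


guide = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt guide
dimers = overlapping_dimers(guide)
print(len(dimers))                   # one dimer per adjacent pair of positions
print(dimers["p1"], dimers["p19"])
```

# Each dimer can then be looked up in the 16-column dimer tensor table in the same way single bases are looked up in the single-base table.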
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
tensor <- read.delim("y.lipolytica.tensors.dimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 45271
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6073,6075:6079,6081,6083:6177)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
################### cut.score values should be multiplied by -1; the three score files written above are overwritten below ###################
df.score <- data.frame(sgRNAID = df.all[,1], cut.score = df.all[,2]*-1)
write.table(df.score, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.score, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.score[,2]), "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName y.lipolytica.tensor.single.bp.dimers --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/Submits/submit_full_y.lipolytica.tensor.single.bp.dimers_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/Submits/submit_train_y.lipolytica.tensor.single.bp.dimers_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/Submits/submit_test_y.lipolytica.tensor.single.bp.dimers_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt
# YNames.txt content: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.tensor.single.bp.dimers
# 0.09359164840323947
sort -k3rg topVarEdges/cut.score_top95.txt | head
# TTsgRNA.raw cut.score 0.06175000685655969
# sgRNA.structuresgRNA.raw cut.score 0.056176192213938325
# gene.distance0 cut.score 0.029570807617378663
# CGsgRNA.raw cut.score 0.02770352475569353
# pam.distance0 cut.score 0.025529473995302983
# AAsgRNA.raw cut.score 0.020337760753082006
# TsgRNA.raw cut.score 0.01737628965366339
# GsgRNA.raw cut.score 0.016720949937689043
# AsgRNA.raw cut.score 0.015102514626684835
# GCsgRNA.raw cut.score 0.014576723834117079
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.tensor.single.bp.dimers_cut.score.importance4 | head
# TTsgRNA.raw: 7681.08
# sgRNA.structuresgRNA.raw: 7402.52
# gene.distance0: 3731.39
# CGsgRNA.raw: 3524.15
# pam.distance0: 3328.5
# AAsgRNA.raw: 2595.21
# TsgRNA.raw: 2259.45
# GCsgRNA.raw: 2173.4
# GsgRNA.raw: 2138.05
# AsgRNA.raw: 1998
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3202335
### test different folds
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold1/Runs/Set4")
pred1 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y1 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold3/Runs/Set4")
pred3 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y3 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold6/Runs/Set4")
pred6 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y6 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred9 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y9 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y1$cut.score, pred1$Predictions.)
# 0.320635
cor(y3$cut.score, pred3$Predictions.)
# 0.3195123
cor(y6$cut.score, pred6$Predictions.)
# 0.3182949
cor(y9$cut.score, pred9$Predictions.)
# 0.3202335
### test different runs
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set1")
pred1 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set1_test.prediction", header=T, sep="\t")
y1 <- read.delim("set1_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set2")
pred2 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set2_test.prediction", header=T, sep="\t")
y2 <- read.delim("set2_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set3")
pred3 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set3_test.prediction", header=T, sep="\t")
y3 <- read.delim("set3_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred4 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y4 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y1$cut.score, pred1$Predictions.)
# 0.2831959
cor(y2$cut.score, pred2$Predictions.)
# 0.320763
cor(y3$cut.score, pred3$Predictions.)
# 0.3148909
cor(y4$cut.score, pred4$Predictions.)
# 0.3202335
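# The fold and run checks above repeat one read-then-cor pattern. A minimal Python sketch of the same Pearson check follows.
# pearson, read_column, and fold_correlation are hypothetical helper names. The file layout (one header line, value in the
# first tab-separated column) is an assumption and is not confirmed by the iRF output format.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def read_column(path):
    """Read the first tab-separated column, skipping one header line (assumed layout)."""
    with open(path) as handle:
        next(handle, None)  # skip the header line
        return [float(line.split("\t")[0]) for line in handle if line.strip()]


def fold_correlation(run_dir, run_name, set_id):
    """Correlate observed and predicted scores for one fold/set directory."""
    pred = read_column(f"{run_dir}/{run_name}_Set{set_id}_test.prediction")
    obs = read_column(f"{run_dir}/set{set_id}_Y_test_noSampleIDs.txt")
    return pearson(obs, pred)
```

# Example call (paths as in the R code above): fold_correlation(".../cut.score/foldRuns/fold9/Runs/Set4", "y.lipolytica.tensor.single.bp.dimers", 4)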
########## Output once cut.score values were multiplied by -1 ... 3 February 2022
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt
# YNames.txt content: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.tensor.single.bp.dimers
# 0.09299333027101521
sort -k3rg topVarEdges/cut.score_top95.txt | head
# TTsgRNA.raw cut.score 0.06175630698127264
# sgRNA.structuresgRNA.raw cut.score 0.056315801993953016
# gene.distance0 cut.score 0.029906586186827133
# CGsgRNA.raw cut.score 0.02788713025998322
# pam.distance0 cut.score 0.025801982379830196
# AAsgRNA.raw cut.score 0.020488288199839153
# TsgRNA.raw cut.score 0.017637269785951606
# GsgRNA.raw cut.score 0.016757504360449742
# AsgRNA.raw cut.score 0.01509203103807862
# GCsgRNA.raw cut.score 0.01453627587711077
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.tensor.single.bp.dimers_cut.score.importance4 | head
# TTsgRNA.raw: 7689.67
# sgRNA.structuresgRNA.raw: 7426.01
# gene.distance0: 3751.52
# CGsgRNA.raw: 3552.67
# pam.distance0: 3269.85
# AAsgRNA.raw: 2567.57
# TsgRNA.raw: 2270.38
# GCsgRNA.raw: 2179.87
# GsgRNA.raw: 2157.72
# AsgRNA.raw: 2007.38
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3206221
# NOTE: need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score y.lipolytica.tensor.single.bp.dimers
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/RIT.run
# sort -k3rg y.lipolytica.tensor.single.bp.dimers_cut.score.importance4.effect > y.lipolytica.tensor.single.bp.dimers_cut.score.importance4.effect_sorted
library(dplyr)
library(tidyr)
library(reshape2)
library(ggplot2)
library(RColorBrewer)
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
imp <- read.delim("y.lipolytica.tensor.single.bp.dimers_cut.score.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
#imp$Normalized.Importance <- as.numeric(substr(imp$NormEdge, 0, 4))
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + coord_flip() + xlab("") + ylab("Normalized Importance") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position="bottom") + scale_fill_brewer(palette="Set1")
# wc -l set0_Y_train_noSampleIDs.txt <-- 36217
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/36217
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# summary_plot calls plt.show() by default, which can leave savefig with a blank figure; show=False keeps the figure open
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
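# The bar-type summary plot ranks features by their mean absolute SHAP value. A minimal sketch of that ranking, in plain
# Python so it runs without shap installed; mean_abs_shap is a hypothetical helper and the toy values are not model output.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value across samples.

    shap_rows: one list of per-feature SHAP values per sample.
    feature_names: column names in the same order as each row.
    """
    if not shap_rows:
        raise ValueError("need at least one sample")
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        if len(row) != len(feature_names):
            raise ValueError("row length does not match feature_names")
        for j, value in enumerate(row):
            totals[j] += abs(value)
    ranked = [(name, total / len(shap_rows)) for name, total in zip(feature_names, totals)]
    ranked.sort(key=lambda pair: pair[1], reverse=True)  # largest mean |SHAP| first
    return ranked


# Toy values for illustration only (two samples, two features)
toy_rows = [[0.5, -0.1], [-0.3, 0.2]]
ranking = mean_abs_shap(toy_rows, ["feature_a", "feature_b"])
```

# With the objects above, the inputs would presumably be shap_values.tolist() and list(X_train.columns).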
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J y.lip.matrix
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save
R CMD BATCH mar15.matrix.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/mar15.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
structure <- read.delim("y.lipolytica.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("y.lipolytica.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:7)]
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
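# score.df contributes three columns (sgRNAID, cut.score, nucleotide.sequence), so the joined table has seven columns
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "nucleotide.sequence", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")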
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
onehot.ind1 <- read.delim("y.lipolytica_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("y.lipolytica_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("y.lipolytica_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("y.lipolytica_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("y.lipolytica_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("y.lipolytica_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last column of each dep table; parentheses are needed because 1:ncol(x)-1 evaluates as (1:ncol(x))-1
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "y.lipolytica.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
#
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.pam <- read.table("y.lipolytica.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df
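# The PAM one-hot step above pairs each NGG code with its reverse complement CCN. A minimal Python sketch of the same
# mapping; reverse_complement, PAM_TO_BASE, and pam_onehot are hypothetical names, and codes outside the eight listed
# ones are assumed to encode as all zeros.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of a DNA string made of A, C, G, T."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


# Map both strands of each NGG PAM to its variable base N
PAM_TO_BASE = {}
for n in "ACGT":
    pam = n + "GG"
    PAM_TO_BASE[pam] = n
    PAM_TO_BASE[reverse_complement(pam)] = n


def pam_onehot(pam_code):
    """One-hot encode the variable PAM base as PAM.A, PAM.C, PAM.T, PAM.G."""
    base = PAM_TO_BASE.get(pam_code)
    return {f"PAM.{b}": int(b == base) for b in ("A", "C", "T", "G")}
```

# Example: pam_onehot("CCT") sets PAM.A to 1, matching the AGG | CCT rule in the mutate() call above.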
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df <- read.delim("y.lipolytica.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))
df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
#
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.genes <- read.table("y.lipolytica.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
#sgRNA.genes.df$id <- "Cas9"
#sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")
sgRNA.genes.id <- sgRNA.genes.df
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)
df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 45271
write.table(df.pam.location, "y.lipolytica.raw.matrix.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "y.lipolytica.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "y.lipolytica.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")
# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "y.lipolytica.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")
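# The chained unite() calls build overlapping k-mers (p1 = positions 1..k, p2 = positions 2..k+1, and so on). A minimal
# Python sketch of the same sliding window; sliding_kmers and kmer_columns are hypothetical names, and the p1, p2, ...
# labels simply mirror the column names used in the sequence table.

```python
def sliding_kmers(sequence, k):
    """Return all overlapping k-mers of a sequence, left to right."""
    if k < 1 or k > len(sequence):
        raise ValueError("k must be between 1 and the sequence length")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_columns(sequence, k):
    """Label each k-mer by its 1-based start position, e.g. p1, p2, ..."""
    return {f"p{i + 1}": kmer for i, kmer in enumerate(sliding_kmers(sequence, k))}


# Toy sequence for illustration only
dimers = sliding_kmers("ACGTAC", 2)
```

# A 20-nt guide yields 19 dimers, 18 trimers, and 17 tetramers, which matches the 1:20, 1:19, and 1:18 column subsets (sgRNAID plus k-mer columns).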
# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "y.lipolytica.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")
# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "y.lipolytica.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
monomer <- read.delim("y.lipolytica.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("y.lipolytica.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("y.lipolytica.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("y.lipolytica.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("y.lipolytica.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "y.lipolytica.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
tensor <- read.delim("y.lipolytica.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 45271
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "y.lipolytica.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
# drop the duplicated score columns left over from earlier joins (anchored regex instead of three overlapping greps)
df.cut <- df %>% select(-matches("^cut\\.score\\.(y|y\\.y|x\\.x)$"))
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "y.lipolytica.finalquantum.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "y.lipolytica.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "y.lipolytica.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName y.lipolytica.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/Submits/submit_full_y.lipolytica.finalquantum_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/Submits/submit_train_y.lipolytica.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/Submits/submit_test_y.lipolytica.finalquantum_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt
# YNames.txt contents: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.finalquantum
# 0.08545804711311601
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.finalquantum_cut.score.importance4 | head
# TTsgRNA.raw: 6478.7
# sgRNA.structuresgRNA.raw: 4883.34
# CGsgRNA.raw: 2532.61
# p11tetramer.Hbond.stackingraw: 2431.19
# p5tetramer.Hlgap.eVEraw: 2182.59
# p2tetramer.Hbond.stackingraw: 2180.77
# p5tetramer.Hbond.stackingraw: 2092.17
# p4tetramer.Hlgap.eVEraw: 2010.87
# p6tetramer.Hbond.stackingraw: 2001.02
# p6tetramer.Hlgap.eVEraw: 1979.47
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3024251
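# For reference, a minimal Python sketch of the Pearson correlation that cor() computes above.
# The function name and the toy values are illustrative only; they are not data from this run.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # covariance and variances, left unnormalised because the factors cancel
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# toy observed vs. predicted scores (made up for illustration)
observed = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.3, 0.5, 0.7]
print(round(pearson(observed, predicted), 4))
```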
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score y.lipolytica.finalquantum
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/RIT.run
# TTsgRNA.raw cut.score 0.04625773976249322 0.0003649053206181409 66194.581 -3.3820271415852257
# sgRNA.structuresgRNA.raw cut.score 0.032304910327757425 -5.1113396088919486e-05 56086.877 -3.688107542876454
# CGsgRNA.raw cut.score 0.018116510727833473 0.0002235208147464496 46003.732 -3.5040572264423506
# p11tetramer.Hbond.stackingraw cut.score 0.015149117580774115 -2.207199826384307e-05 12336.336 -3.8326470012202316
# p5tetramer.Hbond.stackingraw cut.score 0.01514820907514612 -2.6014164934804636e-05 15496.092 -3.8566216507153634
# p5tetramer.Hlgap.eVEraw cut.score 0.014912532026942904 -2.648470659462676e-05 10171.831 -3.90096106338001
# p6tetramer.Hbond.stackingraw cut.score 0.014000071727391495 -1.5348780718090023e-05 9198.55 -3.9048327611853013
# p2tetramer.Hbond.stackingraw cut.score 0.013797742179886422 -3.2383220351173965e-05 10161.696 -3.923555913351904
# p6tetramer.Hlgap.eVEraw cut.score 0.013487888319232852 -2.774670944787072e-05 9729.251 -3.9506660077623654
# p2tetramer.Hlgap.eVEraw cut.score 0.013485483451394041 -4.0047331023409636e-05 9609.594 -3.948292258091099
library(dplyr)  # needed for %>% and mutate() below
library(ggplot2)
library(reshape2)
library(RColorBrewer)
# Figure 5A
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score")
imp <- read.delim("y.lipolytica.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("y.lipolytica.Imp.Dir.Top20.21March.pdf")
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("Y.lipolytica Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
dev.off()
pdf("y.lipolytica.Imp.Dir.Top20.Effect.21March.pdf")
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("y.lipolytica") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
#### Figure 5B: Focus on effect size
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score")
imp <- read.delim("y.lipolytica.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir$absEffect <- abs(imp.dir$Feature.Effect)
imp.dir.effectsorted <- imp.dir[order(imp.dir$absEffect, decreasing = TRUE),]
imp.dir.effectsorted.top20 <- imp.dir.effectsorted[1:20,]
pdf("y.lipolytica.Imp.Dir.Top20Effect.Effect.pdf")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=Feature, y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
# Doench et al. 2014 dataset: https://www.nature.com/articles/nbt.3026?report=reader#Sec15
# local paths contain spaces and parentheses, so they must be quoted
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/Sprint.Opioid.ATAC/Genome/GCF_000001405.39_GRCh38.p13_genomic.fna" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/Sprint.Opioid.ATAC/Genome/GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/.
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/human/")
df <- read.delim("doench.2014.TableS7.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
df2 <- df[,c(1,3,2)]
df.na <- na.omit(df2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
write.table(df.na, "Doench2014.txt", quote=F, row.names=F, sep="\t")
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
sed '1d' Doench2014.txt | awk '{print ">"$1"\n"$3}' > Doench2014.fasta
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
## blast
# conda install blast
# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes
# wget https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
# tar zxvpf ncbi-blast-2.11.0+-x64-linux.tar.gz
# export PATH=$PATH:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin
# echo $PATH
# mkdir $HOME/blastdb
# export BLASTDB=$HOME/blastdb
# set BLASTDB=$HOME/blastdb
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/makeblastdb -in GCF_000001405.39_GRCh38.p13_genomic.fna -dbtype nucl
# /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10
#
# # can't find sequences... do i need the complement??
# source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
# conda create --name emboss python=3.8
# conda activate emboss
# conda install -c conda-forge -c bioconda emboss
# ## revseq test.fasta -noreverse -complement -outseq test.comp.fasta
# revseq Doench2014.fasta -noreverse -complement -outseq Doench2014.comp.fasta
#
# /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.comp.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10
#### correction... just needed to adjust settings in the blast command... used forward strand (originally provided from table S7)
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.blast.tab -outfmt 6 -task blastn-short -num_threads 10
# 105959 (1841 sgRNAs)
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.blast.tab -outfmt 6 -evalue 0.01 -task blastn-short -num_threads 10
# 1733
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' Doench2014.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' Doench2014.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > Doench2014.gRNA.blast.bed
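# A small Python sketch of the BLAST-to-BED conversion done by the two awk commands above.
# It reads outfmt 6 columns 1 (qseqid), 2 (sseqid), 9 (sstart), 10 (send) and orders the coordinates.
# Assumptions: the function name is hypothetical; coordinates are kept as BLAST reports them (1-based),
# as the awk does; unlike the awk, hits with sstart == send are kept and the strand is also returned.

```python
def blast6_to_bed(line):
    """Convert one BLAST outfmt-6 line to (chrom, low, high, query_id, strand)."""
    fields = line.rstrip("\n").split("\t")
    query_id, subject_id = fields[0], fields[1]
    s_start, s_end = int(fields[8]), int(fields[9])
    # minus-strand hits are reported with sstart > send, so sort the pair
    low, high = min(s_start, s_end), max(s_start, s_end)
    strand = "+" if s_start <= s_end else "-"
    return subject_id, low, high, query_id, strand


# made-up example hit on the minus strand
example = "sg1\tNC_000001.11\t100.0\t20\t0\t0\t1\t20\t5020\t5001\t1e-3\t40.1"
print("\t".join(str(x) for x in blast6_to_bed(example)))
```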
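#### also run complement (needs Doench2014.comp.fasta from the commented-out revseq step above; note that blastn already searches both strands by default)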
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.comp.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.complement.blast.tab -outfmt 6 -task blastn-short -num_threads 10
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' Doench2014.gRNA.complement.blast.tab > tmp1.comp.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' Doench2014.gRNA.complement.blast.tab > tmp2.comp.bed
cat tmp1.comp.bed tmp2.comp.bed > Doench2014.gRNA.complement.blast.bed
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/human/")
df <- read.delim("doench.2014.TableS7.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
coord <- read.delim("Doench2014.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID
library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "Doench2014.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
faidx GCF_000001405.39_GRCh38.p13_genomic.fna -i chromsizes > Doench2014.sizes.genome
grep 'NC_' Doench2014.sizes.genome > Doench2014.sizes.genome.chr
bedtools makewindows -g Doench2014.sizes.genome.chr -w 20 -s 1 > Doench2014.20bp.sliding.bed
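# A Python sketch of what the 20 bp sliding windows (width 20, step 1) look like as BED intervals.
# Assumptions: the function name is hypothetical, intervals are 0-based half-open as in BED,
# and trailing windows are truncated at the chromosome end (check against the bedtools output).

```python
def sliding_windows(chrom, chrom_length, width=20, step=1):
    """Yield (chrom, start, end) windows, 0-based half-open, clipped to chrom_length."""
    for start in range(0, chrom_length, step):
        end = min(start + width, chrom_length)
        yield chrom, start, end


# toy chromosome of 25 bp: print the first three windows
for window in list(sliding_windows("NC_toy", 25))[:3]:
    print("\t".join(str(x) for x in window))
```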
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
## genes
bedtools intersect -wo -a Doench2014.20bp.sliding.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf > Doench2014.gene.20sliding.bed
## GC content
bedtools nuc -fi GCF_000001405.39_GRCh38.p13_genomic.fna -bed Doench2014.20bp.sliding.bed | sed '1d' > Doench2014.GC.20sliding.bed
# Biopython melting temperature docs: https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# signature, for reference only (not run here):
# Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)
# https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n
# summit: # conda install -c conda-forge biopython
### sgRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
python3
from Bio import SeqIO
# tally A/C/G/T per sgRNA and write one row per sequence
with open('Doench2014.fasta', 'r') as input_file, open('nucleotide_counts_sgRNA.tsv', 'w') as output_file:
    # note: the CG% column holds a fraction (0-1), not a percentage
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        gene_name = cur_record.name
        A_count = cur_record.seq.count('A')
        C_count = cur_record.seq.count('C')
        G_count = cur_record.seq.count('G')
        T_count = cur_record.seq.count('T')
        length = len(cur_record.seq)
        cg_percentage = float(C_count + G_count) / length
        output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
            (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
        output_file.write(output_line)

exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "Doench2014.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
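# The melting temperature formula above, written as a small Python function for a quick sanity check.
# This is only a sketch of Tm = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC); the function name is hypothetical
# and the example sequence is made up.

```python
def melting_temp(seq):
    """Melting temperature (deg C) from base counts, using the formula in the note above."""
    seq = seq.upper()
    n_a, n_c, n_g, n_t = (seq.count(base) for base in "ACGT")
    total = n_a + n_c + n_g + n_t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 64.9 + 41 * (n_g + n_c - 16.4) / total


# made-up 20-mer with 10 G/C bases
print(round(melting_temp("GCGCGCGCGCATATATATAT"), 2))
```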
### 20bp sliding windows
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
bedtools getfasta -fi GCF_000001405.39_GRCh38.p13_genomic.fna -bed Doench2014.20bp.sliding.bed -fo Doench2014.20sliding.fa
# count nucleotides
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
python3
from Bio import SeqIO
# tally A/C/G/T per 20 bp window and write one row per window
with open('Doench2014.20sliding.fa', 'r') as input_file, open('nucleotide_counts_20sliding.tsv', 'w') as output_file:
    # note: the CG% column holds a fraction (0-1), not a percentage
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        gene_name = cur_record.name
        A_count = cur_record.seq.count('A')
        C_count = cur_record.seq.count('C')
        G_count = cur_record.seq.count('G')
        T_count = cur_record.seq.count('T')
        length = len(cur_record.seq)
        cg_percentage = float(C_count + G_count) / length
        output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
            (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
        output_file.write(output_line)

exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("nucleotide_counts_20sliding.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "Doench2014.nucleotide_counts_20sliding_temp.txt", quote=F, row.names=F, sep="\t")
q()
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J temp.melt.sliding
#SBATCH -N 1
#SBATCH -t 48:00:00
#SBATCH -o temp.melt.sliding-%j.o
#SBATCH -e temp.melt.sliding-%j.e
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
python3 temp.melt.sliding.py
R CMD BATCH temp.melt.sliding.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/temp.melt.sliding.sh
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
cut -f 1,3 Doench2014.txt > Doench2014.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Doench2014.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
sed '1d' Doench2014.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Doench2014_ind1.txt
sed '1d' Doench2014.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Doench2014_ind2.txt
sed '1d' Doench2014.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.A p1.C p1.T p1.G p2.A p2.C p2.T p2.G p3.A p3.C p3.T p3.G p4.A p4.C p4.T p4.G p5.A p5.C p5.T p5.G p6.A p6.C p6.T p6.G p7.A p7.C p7.T p7.G p8.A p8.C p8.T p8.G p9.A p9.C p9.T p9.G p10.A p10.C p10.T p10.G p11.A p11.C p11.T p11.G p12.A p12.C p12.T p12.G p13.A p13.C p13.T p13.G p14.A p14.C p14.T p14.G p15.A p15.C p15.T p15.G p16.A p16.C p16.T p16.G p17.A p17.C p17.T p17.G p18.A p18.C p18.T p18.G p19.A p19.C p19.T p19.G p20.A p20.C p20.T p20.G' | cut -d ' ' -f 1-81 > Doench2014_dep1.txt
# build the "sgRNAID p1.AA ... p20.GG" header in a loop instead of spelling out all 320 names
dep2_header="sgRNAID$(for p in $(seq 1 20); do for d in AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG; do printf ' p%s.%s' "$p" "$d"; done; done)"
sed '1d' Doench2014.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed "1i $dep2_header" | cut -d ' ' -f 1-321 > Doench2014_dep2.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
sed '1d' Doench2014.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Doench2014.sequence.txt
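# A Python sketch of a position-dependent one-hot encoding with the same column names as the
# Doench2014_dep1.txt header (p1.A p1.C p1.T p1.G ...). encode_sequences.py is not shown here,
# so this is an assumption about what the "dependent1" encoding holds; the function name is hypothetical.

```python
BASES = "ACTG"  # same base order as the Doench2014_dep1.txt header


def onehot_by_position(seq):
    """Return {"p<i>.<base>": 0 or 1} for every position i (1-based) and base."""
    seq = seq.upper()
    return {
        f"p{i}.{base}": int(nt == base)
        for i, nt in enumerate(seq, start=1)
        for base in BASES
    }


# made-up 4-mer; a real sgRNA would give 20 x 4 = 80 columns
encoded = onehot_by_position("ACGT")
print(" ".join(str(v) for v in encoded.values()))
```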
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.tensors.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.tensors.melt.txt", quote=F, row.names=F, sep="\t")
# ViennaRNA tutorial, minimum free energy (MFE) structure: https://www.tbi.univie.ac.at/RNA/tutorial/
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/vienna
RNAfold < ../Doench2014.fasta > Doench2014.gRNA.ViennaRNA.output.txt
grep '(' Doench2014.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014.gRNA.ViennaRNA.output.value.txt
grep '>' Doench2014.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Doench2014.gRNA.names.txt
paste Doench2014.gRNA.names.txt Doench2014.gRNA.ViennaRNA.output.value.txt > Doench2014.gRNA.ViennaRNA.output.value.id.txt
cp Doench2014.gRNA.ViennaRNA.output.value.id.txt ../.
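# A Python sketch of the same extraction as the grep/paste steps above: pair each FASTA ID in the
# RNAfold output with the MFE value in parentheses at the end of its structure line.
# Assumptions: the function name is hypothetical, and the three-line record layout
# (>id, sequence, structure + energy) is what RNAfold writes for FASTA input; the sample records are made up.

```python
import re

# energy in parentheses at the end of a structure line, e.g. "(((...))) ( -1.20)"
ENERGY_RE = re.compile(r"\(\s*([+-]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(lines):
    """Return [(sequence_id, mfe)] from RNAfold output lines."""
    results = []
    current_id = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            current_id = line[1:].strip()
            continue
        match = ENERGY_RE.search(line)
        if match and current_id is not None:
            results.append((current_id, float(match.group(1))))
            current_id = None  # wait for the next header
    return results


sample = [">sg1", "GGGAAACCC", "(((...))) ( -1.20)", ">sg2", "AAAAAAAA", "........ (  0.00)"]
for seq_id, mfe in parse_rnafold(sample):
    print(f"{seq_id}\t{mfe}")
```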
# 20bp sliding fasta
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/vienna
RNAfold < ../Doench2014.20sliding.fa > Doench2014.20sliding.ViennaRNA.output.txt
grep '(' Doench2014.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014.20sliding.ViennaRNA.output.value.txt
grep '>' Doench2014.20sliding.ViennaRNA.output.txt | sed 's/>//g' > Doench2014.20sliding.names.txt
paste Doench2014.20sliding.names.txt Doench2014.20sliding.ViennaRNA.output.value.txt > Doench2014.20sliding.ViennaRNA.output.value.id.txt
cp Doench2014.20sliding.ViennaRNA.output.value.id.txt ../.
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J ViennaRNA.doench2014
#SBATCH -N 2
#SBATCH -t 48:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/vienna
RNAfold < ../Doench2014.20sliding.fa > Doench2014.20sliding.ViennaRNA.output.txt
grep '(' Doench2014.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014.20sliding.ViennaRNA.output.value.txt
grep '>' Doench2014.20sliding.ViennaRNA.output.txt | sed 's/>//g' > Doench2014.20sliding.names.txt
paste Doench2014.20sliding.names.txt Doench2014.20sliding.ViennaRNA.output.value.txt > Doench2014.20sliding.ViennaRNA.output.value.id.txt
cp Doench2014.20sliding.ViennaRNA.output.value.id.txt ../.
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/ViennaRNA.doench2014.sh
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
## GATC motif
## fastaregex
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'GATC' > Doench2014.gatc.bed
bedtools intersect -wo -a Doench2014.20bp.sliding.bed -b Doench2014.gatc.bed > Doench2014.gatc.20sliding.bed
# PAM sequence reference: https://www.synthego.com/guide/how-to-use-crispr/pam-sequence
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J bedtools
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
# build the sorted sgRNA BED first; the closest steps below read it
cut -f 1-4 Doench2014.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Doench2014.sgRNA.coord.bed
awk '{print $0"\t""+"}' Doench2014.sgRNA.coord.bed > Doench2014.sgRNA.coord.strand.txt
bedtools closest -a Doench2014.sgRNA.coord.strand.txt -b Doench2014.NGG.PAM.sorted.bed -io -iu -D a > Doench2014.sgRNA.closestPAM.bed
bedtools intersect -wo -a Doench2014.20bp.sliding.bed -b Doench2014.NGG.PAM.sorted.bed > Doench2014.NGG.PAM.20bp.sliding.windows.bed
bedtools closest -a Doench2014.sgRNA.coord.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Doench2014.sgRNA.gene.closest.bed
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/bedtools.sh
salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# locate NGG PAM sites in the reference genome with fastaRegexFinder
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
# vim NGG.PAM.fasta
## fastaRegexFinder
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'AGG' > Doench2014.AGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'TGG' > Doench2014.TGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'CGG' > Doench2014.CGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'GGG' > Doench2014.GGG.PAM.txt
cat Doench2014.AGG.PAM.txt Doench2014.TGG.PAM.txt Doench2014.CGG.PAM.txt Doench2014.GGG.PAM.txt > Doench2014.NGG.PAM.txt
sort -k 1,1 -k 2,2n Doench2014.NGG.PAM.txt > Doench2014.NGG.PAM.sorted.bed
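# A Python sketch of the PAM search idea: scan a sequence for NGG with a regular expression and
# emit BED-style intervals. Assumptions: the function name is hypothetical, only the forward strand
# is scanned, overlapping matches are reported, and coordinates are 0-based half-open; whether
# fastaRegexFinder.py behaves the same way on these points should be checked against its output.

```python
import re


def find_motif(chrom, sequence, pattern="[ACGT]GG"):
    """Return (chrom, start, end, match) for every occurrence, including overlaps."""
    # a lookahead lets the regex engine report overlapping matches
    regex = re.compile(f"(?=({pattern}))", re.IGNORECASE)
    return [
        (chrom, m.start(1), m.end(1), m.group(1).upper())
        for m in regex.finditer(sequence)
    ]


# made-up sequence; one pattern covers AGG, TGG, CGG and GGG in a single pass
for hit in find_motif("NC_toy", "AAGGGTCGG"):
    print("\t".join(str(x) for x in hit))
```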
# intersect with sliding windows in the genome to get density for DWT
bedtools intersect -wo -a Doench2014.20bp.sliding.bed -b Doench2014.NGG.PAM.sorted.bed > Doench2014.NGG.PAM.20bp.sliding.windows.bed
# build the sorted sgRNA BED first; both closest steps below read it
cut -f 1-4 Doench2014.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Doench2014.sgRNA.coord.bed
# closest with gRNAs to identify distance (downstream, strand)
# note: BED expects strand in column 6; "+" is appended here as column 5, so check that bedtools -D a reads it as intended
awk '{print $0"\t""+"}' Doench2014.sgRNA.coord.bed > Doench2014.sgRNA.coord.strand.txt
bedtools closest -a Doench2014.sgRNA.coord.strand.txt -b Doench2014.NGG.PAM.sorted.bed -io -iu -D a > Doench2014.sgRNA.closestPAM.bed
# closest gene to each sgRNA
bedtools closest -a Doench2014.sgRNA.coord.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Doench2014.sgRNA.gene.closest.bed
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
# the paste output has no header row, so read with header=F (header=T would drop the first sgRNA)
structure <- read.delim("Doench2014.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Doench2014.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7)])
colnames(score.df) <- c("sgRNAID", "cut.score")
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
onehot.ind1 <- read.delim("Doench2014_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("Doench2014_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305]
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "df.id.test.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
tensor <- read.delim("Doench2014.tensors.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df.id <- read.delim("df.id.test.txt", header=T, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
write.table(tensor.df, "Doench2014.raw.onehot.tensor.txt", quote=F, row.names=F, sep="\t")
df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Doench2014.raw.onehot.tensor.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Doench2014.raw.onehot.tensor.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.pam <- read.table("Doench2014.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
score.location <- left_join(score.df, sgRNA.pam.df, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Doench2014.sgRNA.pam.dcast.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df.dcast <- read.delim("Doench2014.sgRNA.pam.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df.dcast.sep <- df.dcast[,c(1,3:7)]
df <- read.delim("Doench2014.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast.sep, by=c("sgRNAID"))
nrow(df.location)
# 1825
write.table(df.location, "Doench2014.raw.onehot.tensor.pam.dcast.na.txt", quote=F, row.names=F, sep="\t")
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.genes <- read.table("Doench2014.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
#
write.table(df.dcast.na, "Doench2014.sgRNA.location.dcast.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df.dcast <- read.delim("Doench2014.sgRNA.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Doench2014.raw.onehot.tensor.pam.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
# 1825
write.table(df.location, "Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.y'] <- 'cut.score.x'
df <- df[,c(1:1654,1656)]
write.table(df, "Doench2014.raw.onehot.tensor.pam.location.dcast.corrected.txt", quote=F, row.names=F, sep="\t")
–> MAJOR CHALLENGE WITH WAVELETS FOR HUMAN DATA: the genome is too large, so the MODWT step needs more memory than R has available and the modwt files cannot be generated
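# One possible way to reduce the memory footprint, sketched here as an untested assumption, is to split
# a per-window table by chromosome so that each piece can be loaded and transformed on its own.
# Note that a per-chromosome transform is not identical to a whole-genome MODWT near chromosome
# boundaries. split_by_chrom and the toy file are hypothetical; the input is assumed to be sorted by
# chromosome, as produced by sort -k 1,1 -k 2,2n.

```shell
# split_by_chrom: hypothetical helper.
# $1 = tab-separated file with the chromosome in column 1 (sorted by chromosome)
# $2 = output directory; one <chromosome>.bed file is written per chromosome
split_by_chrom() {
  mkdir -p "$2"
  awk -F'\t' -v dir="$2" '
    $1 != prev { if (out != "") close(out); out = dir "/" $1 ".bed"; prev = $1 }
    { print > out }' "$1"
}

# Toy window table: two windows on chr1, one on chr2.
demo_in=$(mktemp)
demo_dir=$(mktemp -d)
printf 'chr1\t0\t20\nchr1\t1\t21\nchr2\t0\t20\n' > "$demo_in"

split_by_chrom "$demo_in" "$demo_dir"
chr1_lines=$(wc -l < "$demo_dir/chr1.bed")
chr2_lines=$(wc -l < "$demo_dir/chr2.bed")
echo "chr1 windows: $chr1_lines, chr2 windows: $chr2_lines"
rm -rf "$demo_in" "$demo_dir"
```

# Closing each file when the chromosome changes keeps the number of open files small, which matters
# for assemblies with many scaffolds.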
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J haar.matrix
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
# R CMD BATCH haar.matrix.R
R CMD BATCH haar.matrix.gatc.R
R CMD BATCH haar.matrix.gene.R
R CMD BATCH haar.matrix.structure.R
R CMD BATCH haar.matrix.nuc.R
R CMD BATCH haar.matrix.pam.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/haar.matrix.sh
salloc -A SYB105 -N 2 -p gpu -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/modwt
R
library(dplyr)
library(reshape2)
library(tidyr)
library(wmtsa)
library(data.table)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
gatc <- read.table("Doench2014.gatc.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
gene <- read.table("Doench2014.gene.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
structure <- read.table("Doench2014.20sliding.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.table("Doench2014.nucleotide_counts_20sliding_temp.txt", header=T, sep="\t", stringsAsFactors = F)
pam <- read.table("Doench2014.NGG.PAM.20bp.sliding.windows.bed", header=F, sep="\t", stringsAsFactors = F)
window <- read.table("Doench2014.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "id", "seq", "id2", "cut.score", "gid", "change.val", "quality")
score.df <- score[,c(1:4,8)]
gatc.bin <- gatc %>% group_by(V1, V2, V3) %>% mutate(gatc.count = n())
gatc.count <- unique(gatc.bin[,c(1:3,8)])
gene.bin <- gene %>% group_by(V1, V2, V3) %>% mutate(gene.count = n())
gene.count <- unique(gene.bin[,c(1:3,14)])
pam.bin <- pam %>% group_by(V1, V2, V3) %>% mutate(pam.count = n())
pam.count <- unique(pam.bin[,c(1:3,12)])
window.v <- window[,1:3]
colnames(window.v) <- c("V1", "V2", "V3")
gatc.win <- left_join(window.v, gatc.count, by=c("V1", "V2", "V3"))
gatc.win[is.na(gatc.win)] <- 0
gene.win <- left_join(window.v, gene.count, by=c("V1", "V2", "V3"))
gene.win[is.na(gene.win)] <- 0
pam.win <- left_join(window.v, pam.count, by=c("V1", "V2", "V3"))
pam.win[is.na(pam.win)] <- 0
gene.df <- gene.win$gene.count
gatc.df <- gatc.win$gatc.count
pam.df <- pam.win$pam.count
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/modwt")
temp.modwt <- wavMODWT(temp.df, wavelet="haar")
temp.modwt.df <- as.matrix(temp.modwt)
temp.modwt.label <- data.frame(label = row.names(temp.modwt.df), temp.modwt.df)
temp.modwt.dt <- as.data.table(temp.modwt.label)
temp.modwt.name <- temp.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(temp.modwt.name) <- c("label", "temp.dwt", "scale", "window")
write.table(temp.modwt.name, "temp.modwt.haar.txt", quote=F, row.names=F, sep="\t")
gc.modwt <- wavMODWT(gc.df, wavelet="haar")
gc.modwt.df <- as.matrix(gc.modwt)
gc.modwt.label <- data.frame(label = row.names(gc.modwt.df), gc.modwt.df)
gc.modwt.dt <- as.data.table(gc.modwt.label)
gc.modwt.name <- gc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gc.modwt.name) <- c("label", "gc.dwt", "scale", "window")
write.table(gc.modwt.name, "gc.modwt.haar.txt", quote=F, row.names=F, sep="\t")
structure.modwt <- wavMODWT(structure.df, wavelet="haar")
structure.modwt.df <- as.matrix(structure.modwt)
structure.modwt.label <- data.frame(label = row.names(structure.modwt.df), structure.modwt.df)
structure.modwt.dt <- as.data.table(structure.modwt.label)
structure.modwt.name <- structure.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(structure.modwt.name) <- c("label", "structure.dwt", "scale", "window")
write.table(structure.modwt.name, "structure.modwt.haar.txt", quote=F, row.names=F, sep="\t")
# IPD track skipped: ipd.df is never defined in this session (no IPD input is loaded above), so these lines would fail
# ipd.modwt <- wavMODWT(ipd.df, wavelet="haar")
# ipd.modwt.df <- as.matrix(ipd.modwt)
# ipd.modwt.label <- data.frame(label = row.names(ipd.modwt.df), ipd.modwt.df)
# ipd.modwt.dt <- as.data.table(ipd.modwt.label)
# ipd.modwt.name <- ipd.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
# colnames(ipd.modwt.name) <- c("label", "ipd.dwt", "scale", "window")
# write.table(ipd.modwt.name, "ipd.modwt.haar.txt", quote=F, row.names=F, sep="\t")
gene.modwt <- wavMODWT(gene.df, wavelet="haar")
gene.modwt.df <- as.matrix(gene.modwt)
gene.modwt.label <- data.frame(label = row.names(gene.modwt.df), gene.modwt.df)
gene.modwt.dt <- as.data.table(gene.modwt.label)
gene.modwt.name <- gene.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gene.modwt.name) <- c("label", "gene.dwt", "scale", "window")
write.table(gene.modwt.name, "gene.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
gatc.modwt <- wavMODWT(gatc.df, wavelet="haar")
gatc.modwt.df <- as.matrix(gatc.modwt)
gatc.modwt.label <- data.frame(label = row.names(gatc.modwt.df), gatc.modwt.df)
gatc.modwt.dt <- as.data.table(gatc.modwt.label)
gatc.modwt.name <- gatc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gatc.modwt.name) <- c("label", "gatc.dwt", "scale", "window")
write.table(gatc.modwt.name, "gatc.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
pam.modwt <- wavMODWT(pam.df, wavelet="haar")
pam.modwt.df <- as.matrix(pam.modwt)
pam.modwt.label <- data.frame(label = row.names(pam.modwt.df), pam.modwt.df)
pam.modwt.dt <- as.data.table(pam.modwt.label)
pam.modwt.name <- pam.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(pam.modwt.name) <- c("label", "pam.dwt", "scale", "window")
write.table(pam.modwt.name, "pam.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/modwt")
temp.modwt.name <- read.delim("temp.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gc.modwt.name <- read.delim("gc.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
structure.modwt.name <- read.delim("structure.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gene.modwt.name <- read.delim("gene.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gatc.modwt.name <- read.delim("gatc.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
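# ipd.modwt.name <- read.delim("ipd.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F) # skipped: ipd.df is never defined above, so no IPD MODWT file exists; IPD is not used in the joins below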
pam.modwt.name <- read.delim("pam.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
window <- read.table("Doench2014.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "id", "seq", "id2", "cut.score", "gid", "change.val", "quality")
score.df <- score[,c(1:4,8)]
colnames(window) <- c("chr", "start", "end")
window$window <- seq.int(nrow(window))
window$window <- as.character(window$window-1)
window$start <- as.numeric(window$start)
window$end <- as.numeric(window$end - 1)
window.score.df <- left_join(score.df, window, by=c("chr", "start", "end"))
window.score.df$window <- as.integer(window.score.df$window)
window.score.temp <- left_join(window.score.df, temp.modwt.name[,c(3,4,2)], by="window")
window.temp.gc <- left_join(window.score.temp, gc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure <- left_join(window.temp.gc, structure.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene <- left_join(window.temp.gc.structure, gene.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene.gatc <- left_join(window.temp.gc.structure.gene, gatc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene.gatc.pam <- left_join(window.temp.gc.structure.gene.gatc, pam.modwt.name[,c(3,4,2)], by=c("window", "scale"))
nrow(window.temp.gc.structure.gene.gatc.pam)
#
# keep only rows with a cut score (is.na() instead of comparing against the string "NA")
window.temp.gc.structure.gene.gatc.pam.sgRNA <- subset(window.temp.gc.structure.gene.gatc.pam, !is.na(cut.score))
nrow(window.temp.gc.structure.gene.gatc.pam.sgRNA)
#
write.table(window.temp.gc.structure.gene.gatc.pam.sgRNA, "Doench2014.20sliding.exact.DWT.haar.txt", quote=F, row.names=F, sep="\t")
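# columns: 4 = sgRNA, 5 = cut.score, 7 = scale, 8 onward = DWT features (select to the last column rather than a fixed index)
df.melt <- melt(window.temp.gc.structure.gene.gatc.pam.sgRNA[,c(4,5,7:ncol(window.temp.gc.structure.gene.gatc.pam.sgRNA))], id=c("cut.score", "scale", "sgRNA"))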
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNA", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNA + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
#
write.table(df.dcast.na, "Doench2014.20sliding.exact.DWT.haar.dcast.txt", quote=F, row.names=F, sep="\t")
# combine regional DWT with other features
library(tidyr)
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df.dcast.na <- read.delim("Doench2014.20sliding.exact.DWT.haar.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df.dcast.sep <- df.dcast.na %>% separate(sgRNA, c("sgRNA", "ID"), sep="_")
df.dcast.dwt <- df.dcast.sep[,c(4:ncol(df.dcast.sep))]
colnames(df.dcast.dwt) <- paste0('sgRNA_', colnames(df.dcast.dwt))
df.dcast <- cbind(df.dcast.sep[,1:3], df.dcast.dwt)
df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- df %>% separate(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
nrow(df.sep)
#
df.sep.region <- inner_join(df.sep, df.dcast[,c(1,2,4:ncol(df.dcast.sep))], by=c("sgRNA", "ID"))
df.sep.region.id <- df.sep.region %>% unite(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
nrow(df.sep.region.id)
#
write.table(df.sep.region.id, "Doench2014.20sliding.raw.onehot.tensor.dwt.dcast.txt", quote=F, row.names=F, sep="\t")
–> run iRF without wavelets (due to computational limitations)
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J doench.iRF
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
R CMD BATCH iRF.test.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.test.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight per feature (features are printed as "SNPs" below)
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0) # half of the features that still carry weight
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # normalize importance so absolute values sum to 1
    wt[wt<0] <- 0 # set negative weights to zero
    cat("mtry: ", mtry, "\n")
    cat("prediction error: ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf
    # stop early once too few features keep a positive weight
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
    {
      if(!saveall) rfs <- rf
      break
    }
  }
  return(rfs)
}
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df$cut.score <- df$cut.score.x
df.cut <- df[,c(1,1657, 3:1654, 1656)]
# sgRNAID: [,1]
# cut.score: [,2]
iRF(df.cut[,3:ncol(df.cut)], df.cut$cut.score)
# iRF iteration 5
# =================
# mtry: 213
# prediction error: 0.01827123
# r^2: 0.5248384
# cor(y,yhat): 0.734981
# SNPs with importance > 0: 355
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:1654,1656)]
write.table(df, "Doench2014.raw.onehot.tensor.pam.location.dcast.txt", quote=F, row.names=F, sep="\t")
# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1645:1649,1651:1652,1654:1655)]
# one-hot dependent: [,c(18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
# chemical tensors: [,c(58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1642:1644,1650,1653)]
df.raw <- df[,c(2,1642:1644,1650,1653)]
iRF(df.raw[,2:ncol(df.raw)], df.raw$cut.score)
# iRF iteration 1
# =================
# mtry: 2.5
# prediction error: 0.03886899
# r^2: -0.01082707
# cor(y,yhat): 0.1496061
# SNPs with importance > 0: 1
df.onehot <- df[,c(2,3:17,1645:1649,1651:1652,1654:1655,18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
iRF(df.onehot[,2:ncol(df.onehot)], df.onehot$cut.score)
# iRF iteration 5
# =================
# mtry: 58.5
# prediction error: 0.01801721
# r^2: 0.5314444
# cor(y,yhat): 0.7364577
# SNPs with importance > 0: 94
df.quantum <- df[,c(2,58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
iRF(df.quantum[,2:ncol(df.quantum)], df.quantum$cut.score)
# iRF iteration 4
# =================
# mtry: 216
# prediction error: 0.02016961
# r^2: 0.4754692
# cor(y,yhat): 0.6990759
# SNPs with importance > 0: 366
df.raw.onehot <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)])
iRF(df.raw.onehot[,2:ncol(df.raw.onehot)], df.raw.onehot$cut.score)
# iRF iteration 5
# =================
# mtry: 56
# prediction error: 0.01830529
# r^2: 0.5239526
# cor(y,yhat): 0.7311999
# SNPs with importance > 0: 94
df.raw.quantum <- cbind(df.raw, df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.quantum[,2:ncol(df.raw.quantum)], df.raw.quantum$cut.score)
# iRF iteration 5
# =================
# mtry: 176.5
# prediction error: 0.02064207
# r^2: 0.4631822
# cor(y,yhat): 0.6891123
# SNPs with importance > 0: 300
df.onehot.quantum <- cbind(df.onehot, df.quantum[,2:ncol(df.quantum)])
iRF(df.onehot.quantum[,2:ncol(df.onehot.quantum)], df.onehot.quantum$cut.score)
# iRF iteration 5
# =================
# mtry: 208
# prediction error: 0.01845882
# r^2: 0.51996
# cor(y,yhat): 0.7299223
# SNPs with importance > 0: 356
df.raw.onehot.quantum <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.onehot.quantum[,2:ncol(df.raw.onehot.quantum)], df.raw.onehot.quantum$cut.score)
# iRF iteration 5
# =================
# mtry: 196
# prediction error: 0.01821351
# r^2: 0.5263394
# cor(y,yhat): 0.7332925
# SNPs with importance > 0: 329
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df$cut.score <- df$cut.score.x
df.cut <- df[,c(1,1657, 3:1654, 1656)]
ncol(df.cut)
# 1655
df.id <- separate(df.cut, sgRNAID, c("data", "sgRNAID"))
df.num <- mutate_all(df.id[,2:ncol(df.id)], function(x) as.numeric(as.character(x)))
write.table(df.num[,c(1,3:ncol(df.num))], "Doench2014.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.num[,c(1,3:ncol(df.num))], "Doench2014.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.num[,3:ncol(df.num)], "Doench2014.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.num[,1:2], "Doench2014.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.num[,1:2], "Doench2014.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.num[,2]), "Doench2014.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
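# Before building the run, it may be worth confirming that the feature file and the score file list
# the same sgRNA IDs in the same order, on the assumption that the builder script pairs rows by
# position (this is not confirmed by the notes above). same_ids and the toy files are hypothetical.

```shell
# same_ids: hypothetical helper; succeeds only if column 1 of both
# tab-separated files is identical line by line (header included).
same_ids() {
  ids_a=$(cut -f1 "$1")
  ids_b=$(cut -f1 "$2")
  [ "$ids_a" = "$ids_b" ]
}

# Toy feature and score files with matching IDs.
demo_x=$(mktemp)
demo_y=$(mktemp)
printf 'sgRNAID\tf1\n1\t0.2\n2\t0.4\n' > "$demo_x"
printf 'sgRNAID\tcut.score\n1\t0.9\n2\t0.1\n' > "$demo_y"
if same_ids "$demo_x" "$demo_y"; then ids_ok=yes; else ids_ok=no; fi
echo "IDs aligned: $ids_ok"

# Same IDs in a different order should be reported as not aligned.
printf 'sgRNAID\tcut.score\n2\t0.1\n1\t0.9\n' > "$demo_y"
if same_ids "$demo_x" "$demo_y"; then shuffled_ok=yes; else shuffled_ok=no; fi
echo "IDs aligned after shuffling: $shuffled_ok"
rm -f "$demo_x" "$demo_y"
```

# In this notebook the two files would be Doench2014.raw.onehot.tensor.pam.location.features.txt and
# Doench2014.score.txt, both written from the same data frame with sgRNAID in column 1.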
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Submits/submit_full_Doench2014.noDWT_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Submits/submit_train_Doench2014.noDWT_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Submits/submit_test_Doench2014.noDWT_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.noDWT
# 0.44364214097908705
sort -k3rg topVarEdges/cut.score_top95.txt | head
# GGsgRNA.raw cut.score 0.05970314755902555
# p20relativenum_Hatomsraw cut.score 0.04573441936072455
# p16.CCsgRNA.raw cut.score 0.0434682191939938
# pam.distance0 cut.score 0.03126013639591283
# p20num_ringsraw cut.score 0.03026777683362737
# p19.CGsgRNA.raw cut.score 0.020910013660982794
# p20num_doublebondsraw cut.score 0.02076397222225037
# p8.TAsgRNA.raw cut.score 0.01734592200141828
# p2.TAsgRNA.raw cut.score 0.016856617209545403
# p18xy_quadrupoleraw cut.score 0.015472105481533837
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.noDWT_cut.score.importance4 | head
# GGsgRNA.raw: 1.98761
# pam.distance0: 1.53691
# p19.CGsgRNA.raw: 1.52857
# p20num_ringsraw: 1.29585
# p20num_doublebondsraw: 1.10066
# p16.CCsgRNA.raw: 1.0482
# p20relativenum_Hatomsraw: 1.0156
# p8.TAsgRNA.raw: 0.799865
# p16num_singlebondsraw: 0.639571
# gene.distance0: 0.581855
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.7221353
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
cut -f 1,3 Doench2014.txt > Doench2014.noscore.txt
python ../kmer1_positional_encode.py Doench2014.noscore.txt
python ../kmer2_positional_encode.py Doench2014.noscore.txt
python ../kmer3_positional_encode.py Doench2014.noscore.txt
python ../kmer4_positional_encode.py Doench2014.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
sed '1d' Doench2014.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep1.txt
sed '1d' Doench2014.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep2.txt
sed '1d' Doench2014.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep3.txt
sed '1d' Doench2014.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep4.txt
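# The awk step above puts a space after every character of the second field, so each position of the
# encoding becomes its own space-separated column. A toy run is sketched below; the input line is
# made up and the real encoder output is assumed to be whitespace-separated with the encoding in field 2.

```shell
# split_digits: same awk program as in the commands above, wrapped for a toy demo.
split_digits() {
  awk '{gsub(/./,"& ",$2); print $0}'
}

demo_encoded=$(printf 'sgRNA_1 0110\n' | split_digits)
# Brackets make the trailing space visible.
printf '[%s]\n' "$demo_encoded"
```

# Every character, including the last one, is followed by a space, so each row ends with a trailing
# space. When the file is read with sep=" ", that trailing space produces an extra empty last column,
# which is why the last column is dropped before the joins in the R code.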
# salloc -A SYB105 -N 2 -p gpu -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7)])
colnames(score.df) <- c("sgRNAID", "cut.score")
onehot.dep1 <- read.delim("Doench2014_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
# drop the last (empty) column of each table; parentheses make the intended range 1:(ncol-1) explicit
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot.score <- full_join(score.df, onehot.dep, by="sgRNAID")
df.melt <- melt(onehot.score, id=c("cut.score", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "sgRNAID", "variable", "value")
df$value <- as.numeric(df$value)
df.id <- df[!(is.na(df$value) | df$value==""), ]
colnames(df.id) <- c("cut.score", "sgRNAID", "feature", "value")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Doench2014.kmer.encoding.txt", quote=F, row.names=F, sep="\t")
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J iRF.onehot.kmer
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
R CMD BATCH iRF.onehot.kmer.R
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.onehot.kmer.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
tmp <- cbind(xmat, Y = y)
wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal sample weighting per SNP
rfs <- list()
for(i in 1:iter)
{
cat("\niRF iteration ",i,"\n")
cat("=================\n")
mtry = 0.5*sum(wt>0)
rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
split.select.weights = wt, classification = classification,
mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
always.split.variables = alwayssplits)
wt <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale importance to range(0,1)
wt[wt<0] <- 0 # set negative weights to zero
cat("mtry: ", mtry, "\n")
cat("prediction error: ",rf$prediction.error,"\n")
if(classification==FALSE) cat("r^2: ",rf$r.squared,"\n")
if(classification==TRUE) print(rf$confusion.matrix)
cat("cor(y,yhat): ",cor(rf$predictions,y),"\n")
cat("SNPs with importance > 0:",sum(wt>0),"\n")
if(saveall) rfs[[i]] <- rf else rfs <- rf # keep every iteration, or only the most recent forest
if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)) break # stop once too few features keep a positive weight
}
return(rfs)
}
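# The reweighting and stopping rule inside iRF() above can be sketched on their own.
# This is an illustrative Python sketch, not the ranger implementation: update_weights
# and should_stop are hypothetical helper names, and the importances in the usage
# lines are made up.

```python
def update_weights(importance):
    """Scale importances by the sum of their absolute values, then zero out negatives."""
    total = sum(abs(v) for v in importance)
    if total == 0:
        raise ValueError("all importances are zero; weights are undefined")
    return [max(v / total, 0.0) for v in importance]


def should_stop(weights, n_features, floor=10):
    """Mirror the stopping rule in iRF(): stop when fewer than
    max(1% of (n_features - 1), floor) features keep a positive weight."""
    n_positive = sum(1 for w in weights if w > 0)
    return n_positive < max(0.01 * (n_features - 1), floor)


if __name__ == "__main__":
    # Made-up importances for four features (one of them negative).
    w = update_weights([2.0, 1.0, -1.0, 0.0])
    print(w)
    print(should_stop(w, n_features=4))
```

# Features whose corrected impurity importance is negative get weight 0, so they cannot
# be selected for splitting in the next iteration.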
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
# kmer = 1
df.1 <- df[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)
# iRF iteration 2
# =================
# mtry: 32.5
# prediction error: 0.0196492
# r^2: 0.488806
# cor(y,yhat): 0.7078255
# SNPs with importance > 0: 55
# kmer = 2
df.2 <- df[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)
# iRF iteration 2
# =================
# mtry: 94
# prediction error: 0.01810885
# r^2: 0.5288797
# cor(y,yhat): 0.7320499
# SNPs with importance > 0: 138
# kmer = 3
df.3 <- df[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)
# iRF iteration 4
# =================
# mtry: 176.5
# prediction error: 0.02355923
# r^2: 0.3870824
# cor(y,yhat): 0.6366728
# SNPs with importance > 0: 300
# kmer = 4
df.4 <- df[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)
# iRF iteration 5
# =================
# mtry: 426
# prediction error: 0.02256984
# r^2: 0.4128225
# cor(y,yhat): 0.6467192
# SNPs with importance > 0: 706
# kmer = 1 + 2
df.1.2 <- df[,c(2:386)]
iRF(df.1.2[,2:ncol(df.1.2)], df.1.2$cut.score)
# iRF iteration 3
# =================
# mtry: 88
# prediction error: 0.01712434
# r^2: 0.5544926
# cor(y,yhat): 0.7554629
# SNPs with importance > 0: 136
# kmer = 1 + 2 + 3
df.1.2.3 <- df[,c(2:1538)]
iRF(df.1.2.3[,2:ncol(df.1.2.3)], df.1.2.3$cut.score)
# iRF iteration 5
# =================
# mtry: 157
# prediction error: 0.01667377
# r^2: 0.5662148
# cor(y,yhat): 0.7629953
# SNPs with importance > 0: 254
# kmer = 1 + 2 + 3 + 4
df.1.2.3.4 <- df[,c(2:5890)]
iRF(df.1.2.3.4[,2:ncol(df.1.2.3.4)], df.1.2.3.4$cut.score)
# iRF iteration 5
# =================
# mtry: 286
# prediction error: 0.01691536
# r^2: 0.5599295
# cor(y,yhat): 0.7625464
# SNPs with importance > 0: 444
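# The column slices used above (3:82, 83:386, 387:1538, 1539:5890 once sgRNAID and
# cut.score are set aside) are consistent with position-dependent one-hot k-mer blocks
# of a 20 nt guide. A small sketch of that arithmetic, assuming a 20 nt guide, a
# 4-letter alphabet, and blocks stored in order k = 1..4; the function names are
# hypothetical.

```python
def n_columns(k, guide_len=20, alphabet=4):
    """Number of one-hot columns for position-dependent k-mers:
    (guide_len - k + 1) start positions times alphabet**k possible k-mers."""
    return (guide_len - k + 1) * alphabet ** k


def column_ranges(max_k=4, first_feature_col=3):
    """1-based inclusive column ranges, assuming the k-mer blocks are stored
    in order k = 1..max_k starting at column first_feature_col."""
    ranges = {}
    start = first_feature_col
    for k in range(1, max_k + 1):
        end = start + n_columns(k) - 1
        ranges[k] = (start, end)
        start = end + 1
    return ranges


if __name__ == "__main__":
    for k, (lo, hi) in column_ranges().items():
        print(k, n_columns(k), lo, hi)
```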
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.pam.location.dcast.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False) # show=False leaves the figure open so savefig writes the plot
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.noDWT.16dec.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False) # show=False leaves the figure open so savefig writes the plot
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.noDWT.16dec.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
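# The bar-type summary plot ranks features by their mean absolute SHAP value. The same
# ranking can be sketched in plain Python to get a table instead of a figure.
# mean_abs_shap is a hypothetical helper; it expects one row of SHAP values per sample,
# in the same column order as the feature names.

```python
def mean_abs_shap(shap_matrix, feature_names):
    """Return (feature, mean |SHAP|) pairs sorted from most to least important.

    shap_matrix is a list of rows (one per sample), each row holding one
    SHAP value per feature, in the same order as feature_names.
    """
    n_samples = len(shap_matrix)
    totals = [0.0] * len(feature_names)
    for row in shap_matrix:
        for j, value in enumerate(row):
            totals[j] += abs(value)
    scores = [(name, total / n_samples) for name, total in zip(feature_names, totals)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up SHAP values for two samples and two features.
    print(mean_abs_shap([[1.0, -2.0], [-1.0, 4.0]], ["featA", "featB"]))
```

# With the session above, the call would presumably be
# mean_abs_shap(shap_values.tolist(), list(X_train.columns)).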
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.noDWT.16dec.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
** Need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw **
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
# /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/runRIT.sh
## cp /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/runRIT.sh /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh
# runRIT.sh feature name ### Note: name is name of the run and feature is the name of the y-value
# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run
# python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.score.txt
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.noDWT
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/cut.score/RIT.run
sort -k3rg Doench2014.noDWT_cut.score.importance4.effect | head
# GGsgRNA.raw cut.score 0.05970314755902555 -2.4922830521170305e-06 1068.364 0.154811002130185
# p20relativenum_Hatomsraw cut.score 0.04573441936072455 -4.842518672415318e-06 332.522 0.23191751459995405
# p16.CCsgRNA.raw cut.score 0.0434682191939938 6.666536737294725e-06 626.932 0.20406466107233728
# pam.distance0 cut.score 0.03126013639591283 8.518014576197689e-07 557.416 0.1616350915002599
# p20num_ringsraw cut.score 0.03026777683362737 5.3520511137451035e-06 204.4 0.1238110408020338
# p19.CGsgRNA.raw cut.score 0.020910013660982794 8.373493754214167e-06 201.136 0.2964272539549334
# p20num_doublebondsraw cut.score 0.02076397222225037 0.0009766008330982072 146.0 0.1467377696311431
# p8.TAsgRNA.raw cut.score 0.01734592200141828 8.389209122949194e-06 445.251 0.13091950466369892
# p2.TAsgRNA.raw cut.score 0.016856617209545403 7.935778676155901e-06 228.942 0.21242520749190558
# p18xy_quadrupoleraw cut.score 0.015472105481533837 1.8887582466112958e-06 209.254 0.2185088382511396
### TODO: take the SHAP output and correlate the SHAP values with FeatureEffect (column 4), or with a signed EffectSize (column 3 magnitude carrying the sign of column 4)?
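# One way the signed effect size mentioned above could be built: take column 3 and give
# it the sign of column 4. This is a sketch under the assumption that the .effect file
# is whitespace-separated with the feature name in column 1, as in the sort output
# above; signed_effect and load_signed_effects are hypothetical helper names.

```python
import math


def signed_effect(line):
    """Parse one whitespace-separated line of the .effect file and return
    (feature, column 3 carrying the sign of column 4)."""
    fields = line.split()
    feature = fields[0]
    size = float(fields[2])       # column 3 (1-based)
    direction = float(fields[3])  # column 4 (1-based)
    return feature, math.copysign(size, direction)


def load_signed_effects(lines):
    """Build {feature: signed effect size} from an iterable of lines."""
    return dict(signed_effect(line) for line in lines if line.strip())


if __name__ == "__main__":
    example = ("GGsgRNA.raw cut.score 0.05970314755902555 "
               "-2.4922830521170305e-06 1068.364 0.154811002130185")
    print(signed_effect(example))
```

# The resulting dictionary can then be matched by feature name against per-feature SHAP
# summaries before computing a correlation.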
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Doench2014.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Doench2014.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Doench2014.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Doench2014.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
sed '1d' Doench2014.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep1.txt
sed '1d' Doench2014.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep2.txt
sed '1d' Doench2014.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep3.txt
sed '1d' Doench2014.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep4.txt
sed '1d' Doench2014.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Doench2014_ind1.txt
sed '1d' Doench2014.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Doench2014_ind2.txt
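# The kmer*_positional_encode.py scripts are not shown here, so the following is only a
# sketch of what position-dependent and position-independent k-mer encodings could look
# like. The letter order A C T G follows the header written to Doench2014_ind1.txt.
# Treating "independent" as presence anywhere in the guide is an assumption.

```python
from itertools import product

ALPHABET = "ACTG"  # same letter order as the Doench2014_ind1.txt header


def kmers(k):
    """All k-mers over ALPHABET, e.g. AA AC AT AG CA ... for k = 2."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def encode_dependent(seq, k):
    """One flag per (start position, k-mer): 1 where that k-mer starts there."""
    vocab = kmers(k)
    flags = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        flags.extend(1 if window == kmer else 0 for kmer in vocab)
    return flags


def encode_independent(seq, k):
    """One flag per k-mer: 1 if it occurs anywhere in the sequence (assumed meaning)."""
    return [1 if kmer in seq else 0 for kmer in kmers(k)]


if __name__ == "__main__":
    print(encode_dependent("AC", 1))
    print(encode_independent("ACCA", 1))
```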
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
#tensor.t <- as.data.frame(t(tensor.df[63:70,]))
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
#write.table(seq.tensor.dcast, "Doench2014.tensors.single.bp.txt", quote=F, row.names=F, sep="\t")
#write.table(seq.tensor.melt, "Doench2014.tensors.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.dcast, "Doench2014.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
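# The melt / left_join / dcast sequence above attaches a per-base property vector to
# every guide position. A minimal Python sketch of that lookup follows. The property
# names and values are placeholders, NOT values from the tensor file, and
# expand_properties is a hypothetical helper. Column names follow the
# position_property pattern that dcast produces.

```python
def expand_properties(seq, table):
    """Map each base of seq to its property vector and flatten to
    {"p<position>_<property>": value}, with positions counted from 1."""
    row = {}
    for i, base in enumerate(seq, start=1):
        for prop, value in table[base].items():
            row[f"p{i}_{prop}"] = value
    return row


if __name__ == "__main__":
    # Placeholder property table, one entry per base.
    toy_table = {
        "A": {"prop1": 1.0, "prop2": 0.5},
        "C": {"prop1": 2.0, "prop2": 0.1},
        "G": {"prop1": 3.0, "prop2": 0.7},
        "T": {"prop1": 4.0, "prop2": 0.3},
    }
    print(expand_properties("ACG", toy_table))
```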
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J jan18.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
R CMD BATCH jan18.matrix.R
R CMD BATCH jan18.matrix.2.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/jan18.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
structure <- read.delim("Doench2014.gRNA.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Doench2014.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7)])
colnames(score.df) <- c("sgRNAID", "cut.score")
# structure, gc, temp (built directly as data frames)
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
onehot.ind1 <- read.delim("Doench2014_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID") # drop the last column of each table
# dep 1 = V2.x - V81.x
# dep 2 = V2.y - V81.y & V82 - V305
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
# dep 1 = V2.x - V81.x
# dep 2 = V2.y - V81.y & V82.x - V305.x
# dep 3 = V2 - V81 & V82.y - V305.y & V306 - V1153
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
# dep 1 = V2.x - V81.x
# dep 2 = V2.y - V81.y & V82.x - V305.x
# dep 3 = V2.x.x - V81.x.x & V82.y - V305.y & V306.x - V1153.x
# dep 4 = V2.y.y - V81.y.y & V82 - V305 & V306.y - V1153.y & V1154 - V4353
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Doench2014.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
#tensor <- read.delim("Doench2014.tensors.single.bp.melt.txt", header=T, sep="\t")
tensor <- read.delim("Doench2014.tensorsAll.single.bp.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
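#write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
# NOTE: the next line replaces tensor.id with a copy reloaded from disk, so the write.table above must have been run at least once
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")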
df.id <- read.delim("Doench2014.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7)])
colnames(score.df) <- c("sgRNAID", "cut.score")
df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast[is.na(df.dcast)] <- 0
df.dcast.na <- na.omit(df.dcast)
#write.table(df.dcast.na, "Doench2014.raw.onehot.tensor.single.bp.dcast.na.txt", quote=F, row.names=F, sep="\t")
write.table(df.dcast.na, "Doench2014.raw.onehot.tensorAll.single.bp.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 929
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.pam <- read.table("Doench2014.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
#sgRNA.pam.df$id <- "Cas9"
#sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.pam.df, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
#df <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Doench2014.raw.onehot.tensorAll.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 673
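# The PAM one-hot encoding above pairs each NGG code with its reverse complement CCN, so
# both strands map to the same variable base. A sketch of that mapping in Python follows;
# pam_variable_base and pam_one_hot are hypothetical names. Unlike the ifelse chain in R,
# which leaves all four flags at 0 for an unrecognized code, this sketch raises an error.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pam_variable_base(pam_code):
    """Return the N of an NGG PAM. A code of the form CCN is read as the
    reverse complement of NGG on the opposite strand."""
    if len(pam_code) != 3:
        raise ValueError(f"expected a 3-letter PAM code, got {pam_code!r}")
    if pam_code.endswith("GG"):
        return pam_code[0]
    if pam_code.startswith("CC"):
        return COMPLEMENT[pam_code[2]]
    raise ValueError(f"not an NGG or CCN code: {pam_code!r}")


def pam_one_hot(pam_code):
    """One-hot flags named like the R columns PAM.A, PAM.C, PAM.T, PAM.G."""
    base = pam_variable_base(pam_code)
    return {f"PAM.{b}": int(b == base) for b in "ACTG"}


if __name__ == "__main__":
    for code in ("AGG", "CCT", "TGG", "CCA"):
        print(code, pam_one_hot(code))
```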
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.genes <- read.table("Doench2014.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
#sgRNA.genes.df$id <- "Cas9"
#sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 930
df <- df.location
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 673
#write.table(df.location, "Doench2014.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
write.table(df.location, "Doench2014.raw.onehot.tensorAll.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>%
  unite("p1", p1:p2, remove=F, sep= "") %>%
  unite("p2", p2:p3, remove=F, sep= "") %>%
  unite("p3", p3:p4, remove=F, sep= "") %>%
  unite("p4", p4:p5, remove=F, sep= "") %>%
  unite("p5", p5:p6, remove=F, sep= "") %>%
  unite("p6", p6:p7, remove=F, sep= "") %>%
  unite("p7", p7:p8, remove=F, sep= "") %>%
  unite("p8", p8:p9, remove=F, sep= "") %>%
  unite("p9", p9:p10, remove=F, sep= "") %>%
  unite("p10", p10:p11, remove=F, sep= "") %>%
  unite("p11", p11:p12, remove=F, sep= "") %>%
  unite("p12", p12:p13, remove=F, sep= "") %>%
  unite("p13", p13:p14, remove=F, sep= "") %>%
  unite("p14", p14:p15, remove=F, sep= "") %>%
  unite("p15", p15:p16, remove=F, sep= "") %>%
  unite("p16", p16:p17, remove=F, sep= "") %>%
  unite("p17", p17:p18, remove=F, sep= "") %>%
  unite("p18", p18:p19, remove=F, sep= "") %>%
  unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])
rownames(seq) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.tensors.dimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.tensors.dimers.melt.txt", quote=F, row.names=F, sep="\t")
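# The unite() chain above is intended to turn the 20 single-base columns p1..p20 into 19
# overlapping dimers p1..p19. The same sliding window can be sketched in a few lines of
# Python; overlapping_kmers is a hypothetical helper, and k=2 gives the dimer case.

```python
def overlapping_kmers(seq, k=2):
    """Return {"p<i>": k-mer starting at position i} for i = 1..len(seq)-k+1."""
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}


if __name__ == "__main__":
    print(overlapping_kmers("ACGT"))
```

# The same helper with k=3 or k=4 would give the trimer and tetramer windows without
# writing out a longer chain.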
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
#df <- read.delim("Doench2014.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Doench2014.raw.onehot.tensorAll.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
tensor <- read.delim("Doench2014.tensors.dimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
#write.table(df.dcast.na, "Doench2014.raw.onehot.tensor.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
write.table(df.dcast.na, "Doench2014.raw.onehot.tensorAll.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 673
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
#write.table(df.location, "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
write.table(df.location, "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6073,6075:6079,6081,6083:6177)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.tensor.single.bp.dimers --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/Submits/submit_full_Doench2014.tensor.single.bp.dimers_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/Submits/submit_train_Doench2014.tensor.single.bp.dimers_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/Submits/submit_test_Doench2014.tensor.single.bp.dimers_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.tensor.single.bp.dimers
#
sort -k3rg topVarEdges/cut.score_top95.txt | head
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.tensor.single.bp.dimers_cut.score.importance4 | head
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
#
### Why is this prediction accuracy lower? Possibly because the other quantum chemical properties were dropped. What happens if they are added back?
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:7313,7315,7317,7319:7413)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.tensorAll.single.bp.dimers --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/Submits/submit_full_Doench2014.tensorAll.single.bp.dimers_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/Submits/submit_train_Doench2014.tensorAll.single.bp.dimers_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/Submits/submit_test_Doench2014.tensorAll.single.bp.dimers_0.sh
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.tensorAll.single.bp.dimers
# 0.396284289347527
sort -k3rg topVarEdges/cut.score_top95.txt | head
# GGsgRNA.raw cut.score 0.08286483432754792
# p14dimer_H_bondraw cut.score 0.060711831264174475
# V297.xsgRNA.raw cut.score 0.05803372127628595
# AsgRNA.raw cut.score 0.040184665740256094
# p20relativenum_Hatomsraw cut.score 0.03882577467341015
# V247.xsgRNA.raw cut.score 0.0305350646682211
# V3927sgRNA.raw cut.score 0.02846495673084694
# sgRNA.structuresgRNA.raw cut.score 0.01961441799504762
# p20num_ringsraw cut.score 0.019048393493482585
# p20relativenum_singlebondsraw cut.score 0.016330563330558483
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.tensorAll.single.bp.dimers_cut.score.importance4 | head
# GGsgRNA.raw: 1.41962
# V297.xsgRNA.raw: 0.808977 <-- dependent 2 (p19.CG)
# V985.xsgRNA.raw: 0.767533 <-- dependent 3 (p17.TCG)
# p14dimer_H_bondraw: 0.568758
# V247.xsgRNA.raw: 0.561394 <-- dependent 2 (p16.CC)
# AsgRNA.raw: 0.465157
# sgRNA.structuresgRNA.raw: 0.454485
# p20num_doublebondsraw: 0.398546
# p20relativenum_Hatomsraw: 0.384287
# V259.xsgRNA.raw: 0.374241 <-- dependent 2 (p17.AC)
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.tensorAll.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.7325874
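# The fold-level Pearson r above (0.73) and the R2 reported by post-processing (0.396) measure different things.
# A minimal sketch of both metrics, assuming the reported R2 is the usual coefficient of determination
# (the iRF post-processing definition is not shown in these notes). The toy vectors are hypothetical.

```python
import math

def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data: predictions track the trend but are compressed toward the mean.
y_obs = [0.1, 0.4, 0.5, 0.9]
y_hat = [0.3, 0.4, 0.5, 0.6]
print(pearson_r(y_obs, y_hat), r_squared(y_obs, y_hat))
```

# Pearson r ignores scale and offset, so compressed predictions can correlate well and still score a lower R2.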
# NOTE: need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw before running RIT
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
#cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/cut.score
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.tensor.single.bp.dimers
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.tensorAll.single.bp.dimers
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.top50.sh cut.score Doench2014.tensorAll.single.bp.dimers
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/cut.score/RIT.run
# sort -k3rg Doench2014.tensorAll.single.bp.dimers_cut.score.importance4.effect > Doench2014.tensorAll.single.bp.dimers_cut.score.importance4.effect_sorted
library(dplyr)
library(tidyr)
library(reshape2)
library(ggplot2)
library(RColorBrewer)
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
imp <- read.delim("Doench2014.tensorAll.single.bp.dimers_cut.score.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
#imp$Normalized.Importance <- as.numeric(substr(imp$NormEdge, 0, 4))
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + coord_flip() + xlab("") + ylab("Normalized Importance") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position="bottom") + scale_fill_brewer(palette="Set1")
# wc -l set0_Y_train_noSampleIDs.txt <-- 744
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/744
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False stops summary_plot from calling plt.show(), so the figure is still populated when savefig runs
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
plt.close(f)
f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
plt.close(f)
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J doench.matrix
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
R CMD BATCH mar15.matrix.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/mar15.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
structure <- read.delim("Doench2014.gRNA.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Doench2014.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
onehot.ind1 <- read.delim("Doench2014_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last column of each dep table before joining; parentheses make the 1:(n-1) range explicit
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Doench2014.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
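# The ind/dep tables read above are produced elsewhere. A minimal sketch of the two encodings, assuming
# "ind" means position-independent k-mer counts (e.g. the GG and A count features) and "dep" means
# position-dependent k-mer indicators (e.g. p19.CG). Function names are hypothetical; sequences are assumed to contain only A/C/G/T.

```python
from itertools import product

BASES = "ACGT"

def independent_counts(seq, k):
    """Count each k-mer over all overlapping windows, ignoring position."""
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return counts

def dependent_onehot(seq, k):
    """0/1 indicator for every (1-based position, k-mer) pair."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    features = {}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for kmer in kmers:
            features[f"p{i + 1}.{kmer}"] = int(window == kmer)
    return features

# Hypothetical 5-nt example; real sgRNAs here are 20 nt.
example = "ACGGT"
ind1 = independent_counts(example, 1)
ind2 = independent_counts(example, 2)
dep2 = dependent_onehot(example, 2)
print(ind1["G"], ind2["GG"], dep2["p2.CG"])
```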
#
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.pam <- read.table("Doench2014.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df <- read.delim("Doench2014.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))
df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
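# The PAM one-hot above pairs each NGG code with its reverse complement (CCN on the opposite strand).
# A minimal sketch of the same mapping; pam_onehot and revcomp are hypothetical helper names.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def pam_onehot(pam_code):
    """One-hot encode the N of an NGG PAM, accepting CCN for the reverse strand."""
    if pam_code.endswith("GG"):
        n = pam_code[0]
    elif pam_code.startswith("CC"):
        n = revcomp(pam_code)[0]
    else:
        n = None  # not an NGG/CCN PAM: all indicators stay 0
    return {f"PAM.{b}": int(n == b) for b in "ACTG"}

print(pam_onehot("AGG"))
print(pam_onehot("CCA"))
```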
#
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.genes <- read.table("Doench2014.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)
df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
#
write.table(df.pam.location, "Doench2014.raw.matrix.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")
# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")
# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")
# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
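# The dimer/trimer/tetramer unite() chains above build overlapping k-mer windows p1..p(20-k+1) and join
# each window to a property table. A minimal sketch of the same idea; the property table and its values
# are hypothetical placeholders, not the contents of the HL.Bond.*.txt files.

```python
def kmer_windows(seq, k):
    """Map 1-based start position (p1, p2, ...) to the k-mer starting there."""
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}

def tensor_features(seq, k, table):
    """Look up every window in `table` (k-mer -> {property: value}).

    Feature names follow the position_property pattern produced by
    dcast(sgRNAID ~ position + variable). Windows missing from the table are skipped.
    """
    features = {}
    for pos, kmer in kmer_windows(seq, k).items():
        for prop, value in table.get(kmer, {}).items():
            features[f"{pos}_{prop}"] = value
    return features

# Hypothetical property table with made-up values, for illustration only.
toy_table = {"AC": {"Hbond": 1.5}, "CG": {"Hbond": 2.0}}
print(kmer_windows("ACGT", 2))
print(tensor_features("ACG", 2, toy_table))
```

# A 20-nt sgRNA yields 19 dimer, 18 trimer and 17 tetramer windows, matching the column ranges kept above.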
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
monomer <- read.delim("Doench2014.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Doench2014.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Doench2014.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Doench2014.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Doench2014.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Doench2014.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
tensor <- read.delim("Doench2014.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 673
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Doench2014.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df %>% select(-grep("cut.score.y.y", names(df)), -grep("cut.score.y", names(df)), -grep("cut.score.x.x", names(df)))
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "Doench2014.finalquantum.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/Submits/submit_full_Doench2014.finalquantum_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/Submits/submit_train_Doench2014.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/Submits/submit_test_Doench2014.finalquantum_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.finalquantum
# 0.32761105236921945
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.finalquantum_cut.score.importance4 | head
# p13tetramer.Hbond.stackingraw: 1.68426
# p20monomer.HLgap.eVraw: 0.92112
# p14tetramer.Hbond.energyraw: 0.817467
# p20monomer.No.electronsraw: 0.773501
# p15dimer.Hbond.stackingraw: 0.672684
# V111.xsgRNA.raw: 0.564113
# p17tetramer.Hbond.stackingraw: 0.543763
# p13tetramer.Hbond.energyraw: 0.423488
# p12trimer.Hbond.energyraw: 0.257399
# AsgRNA.raw: 0.246818
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3024251
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.finalquantum
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/RIT.run
# p13tetramer.Hbond.stackingraw cut.score 0.11125470557540898 -0.0007895928967698461 459.388 0.19550837155006962
# p20monomer.HLgap.eVraw cut.score 0.08848169905650763 0.0009442158277223906 311.728 0.10390178588720794
# p20monomer.No.electronsraw cut.score 0.08696537886884295 0.0009343689766738636 302.87 0.1055127545554004
# p17tetramer.Hbond.stackingraw cut.score 0.07192122631089481 0.0003078482264406661 272.63 0.14631521294929967
# p15dimer.Hbond.stackingraw cut.score 0.06680307047040686 -0.0006860065623903445 299.334 0.186762337828679
# p19dimer.Hbond.stackingraw cut.score 0.054833195861223226 0.0008171232034317624 191.97 0.1406645538533321
# p14trimer.Hbond.energyraw cut.score 0.05269802703309119 -0.00044701900945406616 204.47 0.1810428619996191
# p13tetramer.Hbond.energyraw cut.score 0.04698573790321235 -0.0006259350766183389 170.246 0.1997767598362409
# p14tetramer.Hbond.energyraw cut.score 0.04173947787065836 6.88679903870876e-05 206.602 0.12690147469165897
# p11tetramer.Hbond.energyraw cut.score 0.03248012764980142 -0.0002955808579939091 166.789 0.1572346687082362
library(dplyr)  # needed for %>% and mutate below
library(ggplot2)
library(reshape2)
library(RColorBrewer)
# Figure 5A
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score")
imp <- read.delim("Doench2014.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Doench2014.Imp.Dir.Top20.21March.pdf")
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("Doench2014 Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
dev.off()
pdf("Doench2014.Imp.Dir.Top20.Effect.21March.pdf")
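# denominator should be the number of training samples (wc -l set0_Y_train_noSampleIDs.txt); 32374 looks carried over from the e.coli run, confirm for Doench2014
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374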
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("Doench2014") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
#### Figure 5B: Focus on effect size
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score")
imp <- read.delim("Doench2014.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir$absEffect <- abs(imp.dir$Feature.Effect)
imp.dir.effectsorted <- imp.dir[order(imp.dir$absEffect, decreasing = TRUE),]
imp.dir.effectsorted.top20 <- imp.dir.effectsorted[1:20,]
pdf("Doench2014.Imp.Dir.Top20Effect.Effect.pdf")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=Feature, y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
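# The Figure 5B steps (label effect direction, sort by absolute effect, keep the top N) can be sketched as below.
# The three rows are copied from the RIT output listed earlier in these notes; the function names are hypothetical.

```python
# (feature, normalized importance, effect) taken from the RIT output above
rows = [
    ("p13tetramer.Hbond.stackingraw", 0.11125470557540898, -0.0007895928967698461),
    ("p20monomer.HLgap.eVraw", 0.08848169905650763, 0.0009442158277223906),
    ("p14tetramer.Hbond.energyraw", 0.04173947787065836, 6.88679903870876e-05),
]

def direction(effect):
    """Same three-way label as the ifelse() chain in the R code."""
    if effect < 0:
        return "neg"
    if effect > 0:
        return "pos"
    return "zero"

def rank_by_abs_effect(rows, top=20):
    """Annotate each feature and return the `top` largest by |effect|."""
    annotated = [
        {"feature": f, "importance": imp, "abs_effect": abs(e), "direction": direction(e)}
        for f, imp, e in rows
    ]
    return sorted(annotated, key=lambda r: r["abs_effect"], reverse=True)[:top]

ranked = rank_by_abs_effect(rows)
for r in ranked:
    print(r["feature"], r["direction"], r["abs_effect"])
```

# Ranking by |effect| reorders features relative to the importance ranking: here p20monomer.HLgap.eVraw moves ahead of p13tetramer.Hbond.stackingraw.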
## Main H. sapiens feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score")
imp <- read.delim("Doench2014.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
imp.dir.top20.df$Feature.Label <- c("Tetramer H-stacking pos13", "Monomer HL-gap pos20", "Monomer No.Electrons pos20", "Tetramer H-stacking pos17", "Dimer H-stacking pos15", "Dimer H-stacking pos19", "Trimer H-bond pos14", "Tetramer H-bond pos13", "Tetramer H-bond pos14", "Tetramer H-bond pos11", "Tetramer H-stacking pos11", "Tetramer H-stacking pos1", "Tetramer H-bond pos12", "Trimer H-bond pos12", "Tetramer H-stacking pos10", "Tetramer H-stacking pos14", "Dimer HL-gap pos8", "Adenines count", "Trimer H-stacking pos15", "Trimer H-stacking pos14")
library(ggplot2)
pdf("Doench2014.FeatureEngineering.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Doench2014 Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + scale_fill_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()
pdf("Doench2014.FeatureEngineering.nocolor.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir), color="black") + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Doench2014 Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + theme_classic() + coord_flip()
dev.off()
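The signed importance plotted above (importance negated when the effect is negative) can be sketched in Python as follows. This is a minimal illustration only; the function names and the example rows are hypothetical and are not taken from the iRF output.

```python
def effect_direction(effect):
    """Label an effect as 'neg', 'pos' or 'zero' (same rule as the nested ifelse)."""
    if effect < 0:
        return "neg"
    if effect > 0:
        return "pos"
    return "zero"


def signed_importance(importance, effect):
    """Negate the importance when the effect direction is negative."""
    return -importance if effect_direction(effect) == "neg" else importance


# Hypothetical (feature, normalized importance, effect) rows, not real results.
rows = [("featA", 0.30, -1.2), ("featB", 0.20, 0.4), ("featC", 0.10, 0.0)]
for name, imp, eff in rows:
    print(name, effect_direction(eff), signed_importance(imp, eff))
```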
“Within each gene, passing sgRNAs were first ranked, with the best sgRNA receiving the rank of 1. This number was then divided by the total number of sgRNAs, which was then subtracted from 1 to determine a percent-rank. This results in the worst sgRNA for a gene receiving a percent-rank of 0, while the best sgRNA will have a percent-rank approaching 1. Percent-rank values were averaged for genes that were assayed in more than one cell line.”
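A minimal Python sketch of the percent-rank calculation described in the quote. It assumes that a higher input score means a better sgRNA and that ties are broken by input order; neither detail is stated in the quote. The function name and record layout are hypothetical.

```python
from collections import defaultdict


def percent_rank_by_gene(records):
    """records: iterable of (gene, sgrna_id, score); higher score = better (assumption).

    Returns {sgrna_id: 1 - rank / n}, where rank 1 is the best sgRNA in its gene
    and n is the number of sgRNAs for that gene.
    """
    by_gene = defaultdict(list)
    for gene, sgrna_id, score in records:
        by_gene[gene].append((sgrna_id, score))

    result = {}
    for items in by_gene.values():
        ranked = sorted(items, key=lambda item: item[1], reverse=True)
        n = len(ranked)
        for rank, (sgrna_id, _) in enumerate(ranked, start=1):
            result[sgrna_id] = 1 - rank / n
    return result


# Hypothetical scores for one gene with four sgRNAs.
example = [("geneX", "a", 3.0), ("geneX", "b", 2.0), ("geneX", "c", 1.0), ("geneX", "d", 0.5)]
print(percent_rank_by_gene(example))
```

With four sgRNAs the worst one gets 1 - 4/4 = 0 and the best gets 1 - 1/4, which approaches 1 as the number of sgRNAs per gene grows.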
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/human/Doench.et.al.2014.supp7.txt" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/Doench2014.genepercentrank.score.txt
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank
R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
df <- read.delim("Doench2014.genepercentrank.score.txt", header=T, sep="\t")
library(dplyr)
df2 <- df %>% mutate(id = row_number())
df3 <- df2[,c(10,2,8)]
colnames(df3) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
df3$nucleotide.sequence <- substr(df3$nucleotide.sequence, 5, 27)
df.na <- na.omit(df3)
write.table(df.na, "Doench2014.genepercentrank.ngg.txt", quote=F, row.names=F, sep="\t")
df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Doench2014.genepercentrank.txt", quote=F, row.names=F, sep="\t")
# 1841
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/
sed '1d' Doench2014.genepercentrank.txt | awk '{print ">"$1"\n"$2}' > Doench2014.genepercentrank.fasta
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.genepercentrank.fasta -db ../Doench2014/GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.genepercentrank.gRNA.blast.tab -outfmt 6 -task blastn-short -num_threads 10
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' Doench2014.genepercentrank.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' Doench2014.genepercentrank.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > Doench2014.genepercentrank.gRNA.blast.bed
# 105959
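The two awk filters above can be sketched in Python as one pass over the BLAST tabular output. This is only an illustration; the function name is hypothetical. As in the awk version, coordinates are copied unchanged (no conversion to 0-based BED) and hits with equal start and end are skipped.

```python
def blast_tab_to_bed(lines):
    """Convert BLAST -outfmt 6 lines to (subject, start, end, query) tuples.

    Columns used: 1 = query ID, 2 = subject ID, 9 = subject start, 10 = subject end.
    Minus-strand hits (start > end) are swapped so that start < end.
    """
    bed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # not a complete outfmt 6 record
        query, subject = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        if sstart == send:
            continue  # neither awk filter keeps this case
        bed.append((subject, min(sstart, send), max(sstart, send), query))
    return bed


# Hypothetical hits: one on the minus strand, one on the plus strand.
hits = [
    "sg1\tchr1\t100\t20\t0\t0\t1\t20\t500\t481\t1e-5\t40",
    "sg2\tchr2\t100\t20\t0\t0\t1\t20\t100\t119\t1e-5\t40",
]
for row in blast_tab_to_bed(hits):
    print("\t".join(str(x) for x in row))
```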
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
df <- read.delim("Doench2014.genepercentrank.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
coord <- read.delim("Doench2014.genepercentrank.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID
library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "Doench2014.genepercentrank.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
length(unique(df.coord$sgRNAID))
# 1826
https://www.tbi.univie.ac.at/RNA/tutorial/ minimum free energy (MFE) structure
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/vienna
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/vienna
RNAfold < ../Doench2014.genepercentrank.fasta > Doench2014.genepercentrank.gRNA.ViennaRNA.output.txt
grep '(' Doench2014.genepercentrank.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.txt
grep '>' Doench2014.genepercentrank.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Doench2014.genepercentrank.gRNA.names.txt
paste Doench2014.genepercentrank.gRNA.names.txt Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.txt > Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.id.txt
cp Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.id.txt ../.
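The grep/paste steps above pair each FASTA name with the energy printed at the end of the structure line. A minimal Python sketch of the same parsing is below. It assumes the usual three-line RNAfold record (name, sequence, structure plus energy in parentheses); the function name and the sample text are hypothetical, not real RNAfold results.

```python
import re

# Energy is the number inside the final pair of parentheses on the structure line.
ENERGY_RE = re.compile(r"\(\s*([+-]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(lines):
    """Return {sequence name: MFE} from RNAfold output lines."""
    mfe = {}
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = line[1:].strip()
        elif name is not None:
            match = ENERGY_RE.search(line)
            if match:  # sequence lines have no parentheses and are skipped
                mfe[name] = float(match.group(1))
                name = None
    return mfe


# Illustrative text in RNAfold layout (values are made up).
sample = [
    ">sg1", "GGGAAAUCC", "(((...))) ( -1.20)",
    ">sg2", "AAAAAAAAA", "......... (  0.00)",
]
print(parse_rnafold(sample))
```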
https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)
https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank
python3
from Bio import SeqIO

input_file = open('Doench2014.genepercentrank.fasta', 'r')
output_file = open('nucleotide_counts_sgRNA.tsv', 'w')
# 'Window' holds the sgRNA ID; 'CG%' holds the GC fraction (0-1), not a percentage
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
for cur_record in SeqIO.parse(input_file, "fasta"):
    gene_name = cur_record.name
    # count each base in the sgRNA sequence
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
        (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)

output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "Doench2014.genepercentrank.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
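The melting-temperature formula used in the R step above, Tm = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC), can be checked with a short Python sketch. The function name is hypothetical; only A, C, G and T are counted, as in the nucleotide count table.

```python
def melting_temp(seq):
    """Tm (deg C) = 64.9 + 41 * (G + C - 16.4) / (A + T + G + C)."""
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 64.9 + 41 * (g + c - 16.4) / total


# A 20-mer with 10 G/C bases: 64.9 + 41 * (10 - 16.4) / 20
print(round(melting_temp("ACGT" * 5), 2))
```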
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/
cut -f 1-2 Doench2014.genepercentrank.txt > Doench2014.genepercentrank.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Doench2014.genepercentrank.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Doench2014.genepercentrank.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Doench2014.genepercentrank.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Doench2014.genepercentrank.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/
sed '1d' Doench2014.genepercentrank.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014.genepercentrank_dep1.txt
sed '1d' Doench2014.genepercentrank.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014.genepercentrank_dep2.txt
sed '1d' Doench2014.genepercentrank.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014.genepercentrank_dep3.txt
sed '1d' Doench2014.genepercentrank.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014.genepercentrank_dep4.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Doench2014.genepercentrank.noscore.txt
sed '1d' Doench2014.genepercentrank.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Doench2014.genepercentrank_ind1.txt
sed '1d' Doench2014.genepercentrank.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Doench2014.genepercentrank_ind2.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/
sed '1d' Doench2014.genepercentrank.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Doench2014.genepercentrank.sequence.txt
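A Python sketch of the two simplest encodings prepared above: one column per sequence position (p1..p20), and k-mer columns in the same order as the ind1/ind2 headers. The kmer*_positional_encode.py scripts are not shown here, so treating the "independent" features as overlapping k-mer counts is an assumption, and the function names are hypothetical.

```python
from itertools import product

BASES = "ACTG"  # same base order as the ind1 header (A C T G)


def position_columns(seq):
    """Map a sequence to {'p1': base, 'p2': base, ...}, one feature per position."""
    return {f"p{i}": base for i, base in enumerate(seq, start=1)}


def kmer_counts(seq, k):
    """Count overlapping k-mers; keys follow the AA AC AT AG CA ... header order."""
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing non-ACTG characters
            counts[kmer] += 1
    return counts


print(position_columns("ACGT"))
print(kmer_counts("AACGT", 2))
```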
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.genepercentrank.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
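The melt / left_join / dcast sequence above looks up a vector of properties for the base at each position and spreads them into one wide row per sgRNA. A minimal Python sketch of that idea follows. The property names and values are placeholders, not the contents of the tensor file, and the function name is hypothetical.

```python
def per_position_features(seq, base_properties):
    """base_properties: {base: {property_name: value}}.

    Returns {'p1_<property>': value, ...}, matching the 'position_variable'
    column names that dcast builds from 'position + variable'.
    Bases missing from the table contribute no columns in this sketch
    (the left_join in R would give NA instead).
    """
    features = {}
    for i, base in enumerate(seq, start=1):
        for prop, value in base_properties.get(base, {}).items():
            features[f"p{i}_{prop}"] = value
    return features


# Placeholder property table (values are made up).
props = {"A": {"propX": 1.0}, "C": {"propX": 2.0}, "G": {"propX": 3.0}, "T": {"propX": 4.0}}
print(per_position_features("ACGT", props))
```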
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
df <- read.delim("Doench2014.genepercentrank.ngg.txt")
df$ngg <- substr(df$nucleotide.sequence, 21, 23)
df$nucleotide.sequence <- substr(df$nucleotide.sequence, 1, 20)
df$pam.distance <- 1
write.table(df, "Doench2014.genepercentrank.sgRNA.closestPAM.bed", quote=F, row.names=F, sep='\t')
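The substr calls above split each 23-nt sequence into a 20-nt protospacer and a 3-nt PAM. A Python sketch of that split is below, together with the PAM one-hot grouping that mirrors the mutate call used later in this pipeline (each NGG PAM grouped with its reverse complement). Function and constant names are hypothetical.

```python
def split_protospacer_pam(seq23):
    """Split a 23-nt target into (20-nt protospacer, 3-nt PAM)."""
    if len(seq23) != 23:
        raise ValueError("expected a 23-nt sequence (20-nt protospacer + NGG)")
    return seq23[:20], seq23[20:23]


# Each column groups a PAM with its reverse complement, as in the R mutate call.
PAM_GROUPS = {
    "PAM.A": {"AGG", "CCT"},
    "PAM.C": {"CGG", "CCG"},
    "PAM.T": {"TGG", "CCA"},
    "PAM.G": {"GGG", "CCC"},
}


def pam_one_hot(pam):
    """Return 0/1 indicator columns for the first PAM nucleotide."""
    return {name: int(pam in codes) for name, codes in PAM_GROUPS.items()}


protospacer, pam = split_protospacer_pam("ACGT" * 5 + "TGG")
print(protospacer, pam, pam_one_hot(pam))
```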
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank
cut -f 1-4 Doench2014.genepercentrank.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Doench2014.genepercentrank.sgRNA.coord.bed
bedtools closest -a Doench2014.genepercentrank.sgRNA.coord.bed -b ../Doench2014/GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Doench2014.genepercentrank.sgRNA.gene.closest.bed
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
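# The ID/value file written by paste has no header row, so header=T uses the first record as column names
structure <- read.delim("Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)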
nuc <- read.delim("Doench2014.genepercentrank.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014.genepercentrank.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
onehot.ind1 <- read.delim("Doench2014.genepercentrank_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014.genepercentrank_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014.genepercentrank_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014.genepercentrank_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014.genepercentrank_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014.genepercentrank_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Doench2014.genepercentrank.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
length(unique(df.id$sgRNAID))
# 1825 sgRNAIDs
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
sgRNA.pam <- read.table("Doench2014.genepercentrank.sgRNA.closestPAM.bed", header=T, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(1,4,5)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df
score <- read.delim("Doench2014.genepercentrank.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5,7)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df <- read.delim("Doench2014.genepercentrank.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))
df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 1825
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
sgRNA.genes <- read.table("Doench2014.genepercentrank.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)
df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 1825
df.final <- df.pam.location[,c(1:3,5:5915,5917:5921)]
ncol(df.final)
# 5919
write.table(df.final, "Doench2014.genepercentrank.raw.matrix.txt", quote=F, row.names=F, sep="\t")
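The repeated dcast(..., fun.aggregate=mean, na.rm=TRUE) calls above turn long-format rows (one value per sgRNA and feature) into a wide matrix, averaging duplicates. A plain-Python sketch of that reshaping is below. It is an illustration only; the function name and the example rows are hypothetical.

```python
from collections import defaultdict


def long_to_wide_mean(rows):
    """rows: iterable of (sgrna_id, feature, value).

    Returns {sgrna_id: {feature: mean of its values}}.
    None values are skipped, similar in spirit to na.rm=TRUE.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for sgrna_id, feature, value in rows:
        if value is None:
            continue
        sums[sgrna_id][feature] += value
        counts[sgrna_id][feature] += 1
    return {
        sgrna_id: {f: sums[sgrna_id][f] / counts[sgrna_id][f] for f in feats}
        for sgrna_id, feats in sums.items()
    }


# Made-up long-format rows; 's1' has a duplicated feature that gets averaged.
rows = [("s1", "gc", 0.5), ("s1", "gc", 0.7), ("s1", "tm", 50.0), ("s2", "gc", 0.4)]
print(long_to_wide_mean(rows))
```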
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")
# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")
# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")
# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
monomer <- read.delim("Doench2014.genepercentrank.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Doench2014.genepercentrank.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Doench2014.genepercentrank.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Doench2014.genepercentrank.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Doench2014.genepercentrank.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Doench2014.genepercentrank.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
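The long unite chains above build overlapping k-mer windows: p1 holds bases 1..k, p2 holds bases 2..k+1, and so on. A 20-nt sgRNA therefore yields 19 dimer, 18 trimer and 17 tetramer positions. A compact Python sketch of the same windowing is below (the function name is hypothetical).

```python
def kmer_windows(seq, k):
    """Return {'p1': seq[0:k], 'p2': seq[1:k+1], ...} for every full-length window."""
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}


guide = "ACGT" * 5  # placeholder 20-nt sequence
for k, label in [(2, "dimer"), (3, "trimer"), (4, "tetramer")]:
    windows = kmer_windows(guide, k)
    print(label, len(windows), windows["p1"])
```

Each window string can then be joined against the corresponding property table by k-mer, in the same way the monomer table is joined by base.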
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
df <- read.delim("Doench2014.genepercentrank.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
tensor <- read.delim("Doench2014.genepercentrank.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 1825
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Doench2014.genepercentrank.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
df <- read.delim("Doench2014.genepercentrank.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "Doench2014.genepercentrank.finalquantum.df.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.genepercentrank.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.genepercentrank.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014.genepercentrank.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.genepercentrank.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.genepercentrank.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014.genepercentrank.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.genepercentrank.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/Doench2014.genepercentrank.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/Doench2014.genepercentrank.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/Submits/submit_full_Doench2014.genepercentrank.finalquantum_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/Submits/submit_train_Doench2014.genepercentrank.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/Submits/submit_test_Doench2014.genepercentrank.finalquantum_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# YNames.txt contents: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.genepercentrank.finalquantum
# 0.2755723456206027
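# For reference, textbook definitions of three of the metrics named in --PredAccuracy (MAE, MSE, R2), sketched in Python.
# This assumes iRF_postProcessing.py uses the standard definitions; MAEA is left out because its definition is not given in these notes.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical toy values, not pipeline output.
y_true = [0.0, 0.5, 1.0]
y_pred = [0.0, 0.5, 0.5]
```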
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.genepercentrank.finalquantum_cut.score.importance4 | head
# p20monomer.HLgap.eVraw: 6.55918
# p20monomer.No.electronsraw: 5.46649
# GGsgRNA.raw: 4.50132
# AsgRNA.raw: 3.85251
# sgRNA.structuresgRNA.raw: 3.82253
# p18trimer.Hbond.energyraw: 3.6557
# p17tetramer.Hlgap.eVEraw: 2.57827
# p17tetramer.Hbond.energyraw: 1.76065
# p15dimer.Hbond.stackingraw: 1.63949
# p16tetramer.Hbond.energyraw: 1.40244
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.genepercentrank.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.566357
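# The same Pearson correlation as R's cor(), written out in plain Python as a sketch.
# The function name pearson is an assumption; inputs would be the observed cut.score and the predictions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```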
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.genepercentrank.finalquantum
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/cut.score/RIT.run
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/human/Doench.et.al.2014.supp10.txt" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.score.txt
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
df <- read.delim("Doench2014CORRECTED.score.txt", header=T, sep="\t")
library(dplyr)
df2 <- df %>% mutate(id = row_number())
df3 <- df2[,c(3,2,13)]
colnames(df3) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
df3$nucleotide.sequence <- substr(df3$nucleotide.sequence, 5, 27)
df.na <- na.omit(df3)
write.table(df.na, "Doench2014CORRECTED.ngg.txt", quote=F, row.names=F, sep="\t")
df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Doench2014CORRECTED.txt", quote=F, row.names=F, sep="\t")
# 1278
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/
sed '1d' Doench2014CORRECTED.txt | awk '{print ">"$1"\n"$2}' > Doench2014CORRECTED.fasta
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014CORRECTED.fasta -db ../Doench2014/GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014CORRECTED.gRNA.blast.tab -outfmt 6 -task blastn-short -num_threads 10
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' Doench2014CORRECTED.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' Doench2014CORRECTED.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > Doench2014CORRECTED.gRNA.blast.bed
# 42037
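# A Python sketch of the two awk commands above: take the subject start/end from a BLAST outfmt 6 line and write them in ascending order as a BED-like row.
# The function name blast6_to_bed is an assumption. Unlike the awk pair, this sketch also keeps hits where start equals end, and like the awk it does not shift to 0-based coordinates.

```python
def blast6_to_bed(line):
    """Convert one BLAST outfmt 6 line to 'subject, start, end, query' (tab-separated)."""
    fields = line.rstrip("\n").split("\t")
    query, subject = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    # Minus-strand hits report sstart > send, so order the pair explicitly.
    start, end = min(sstart, send), max(sstart, send)
    return "\t".join([subject, str(start), str(end), query])


# Hypothetical hit on the minus strand.
example = "sg1\tNC_000001.11\t100.0\t20\t0\t0\t1\t20\t500\t481\t1e-3\t40.1"
bed_row = blast6_to_bed(example)
```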
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
df <- read.delim("Doench2014CORRECTED.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
coord <- read.delim("Doench2014CORRECTED.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID
library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "Doench2014CORRECTED.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
length(unique(df.coord$sgRNAID))
#1278
# minimum free energy (MFE) structure: https://www.tbi.univie.ac.at/RNA/tutorial/
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/vienna
RNAfold < ../Doench2014CORRECTED.fasta > Doench2014CORRECTED.gRNA.ViennaRNA.output.txt
grep '(' Doench2014CORRECTED.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014CORRECTED.gRNA.ViennaRNA.output.value.txt
grep '>' Doench2014CORRECTED.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Doench2014CORRECTED.gRNA.names.txt
paste Doench2014CORRECTED.gRNA.names.txt Doench2014CORRECTED.gRNA.ViennaRNA.output.value.txt > Doench2014CORRECTED.gRNA.ViennaRNA.output.value.id.txt
cp Doench2014CORRECTED.gRNA.ViennaRNA.output.value.id.txt ../.
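# A Python sketch of the grep/paste extraction above: pair each FASTA header in the RNAfold output with the MFE value at the end of its structure line.
# The function name parse_rnafold and the sample text are assumptions; the sample follows the usual RNAfold layout (header, sequence, structure plus energy in parentheses).

```python
import re

# Energy appears at the end of the structure line, e.g. "(((...))) ( -1.20)".
ENERGY = re.compile(r"\(\s*([+-]?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_rnafold(text):
    """Return a list of (name, mfe) tuples from RNAfold output text."""
    results = []
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            continue
        match = ENERGY.search(line)
        if match and name is not None:
            results.append((name, float(match.group(1))))
            name = None
    return results


# Hypothetical two-record output.
sample = ">sg1\nGGGAAACCC\n(((...))) ( -1.20)\n>sg2\nAAAAAAAAA\n......... (  0.00)\n"
mfe_table = parse_rnafold(sample)
```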
# melting temperature reference: https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
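# Biopython signature, for reference only (the melting temperature below is computed with the simpler GC-count formula):
# Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)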
# reference: https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED
python3
from Bio import SeqIO
input_file = open('Doench2014CORRECTED.fasta', 'r')
output_file = open('nucleotide_counts_sgRNA.tsv','w')
# note: the CG% column holds the GC fraction (0-1), not a percentage
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
for cur_record in SeqIO.parse(input_file, "fasta"):
    gene_name = cur_record.name
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
        (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)

output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "Doench2014CORRECTED.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
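# The melting temperature formula used above, as a small Python sketch working directly on a sequence string.
# The function name melting_temp is an assumption; this is the GC-count formula from the note, not Biopython's Tm_NN.

```python
def melting_temp(seq):
    """Tm (deg C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)."""
    seq = seq.upper()
    n_a, n_c, n_g, n_t = (seq.count(base) for base in "ACGT")
    return 64.9 + 41.0 * (n_g + n_c - 16.4) / (n_a + n_t + n_g + n_c)


# Hypothetical 20-mer with 10 G/C bases: 64.9 + 41 * (10 - 16.4) / 20 = 51.78
tm_example = melting_temp("ACGT" * 5)
```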
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/
cut -f 1-2 Doench2014CORRECTED.txt > Doench2014CORRECTED.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Doench2014CORRECTED.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Doench2014CORRECTED.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Doench2014CORRECTED.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Doench2014CORRECTED.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/
sed '1d' Doench2014CORRECTED.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014CORRECTED_dep1.txt
sed '1d' Doench2014CORRECTED.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014CORRECTED_dep2.txt
sed '1d' Doench2014CORRECTED.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014CORRECTED_dep3.txt
sed '1d' Doench2014CORRECTED.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014CORRECTED_dep4.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Doench2014CORRECTED.noscore.txt
sed '1d' Doench2014CORRECTED.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Doench2014CORRECTED_ind1.txt
sed '1d' Doench2014CORRECTED.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Doench2014CORRECTED_ind2.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/
sed '1d' Doench2014CORRECTED.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Doench2014CORRECTED.sequence.txt
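# A Python sketch of the per-position split done by the sed/awk line above: one column per nucleotide, named p1..p20.
# The function name split_positions is an assumption for illustration.

```python
def split_positions(sgrna_id, sequence, length=20):
    """Return {'sgRNAID': id, 'p1': base1, ..., 'p20': base20} for one guide."""
    if len(sequence) != length:
        raise ValueError("expected a %d-nt sequence, got %d" % (length, len(sequence)))
    row = {"sgRNAID": sgrna_id}
    for position, base in enumerate(sequence, start=1):
        row["p%d" % position] = base
    return row


# Hypothetical guide sequence.
row = split_positions("1", "ACGTACGTACGTACGTACGT")
```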
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014CORRECTED.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
df <- read.delim("Doench2014CORRECTED.ngg.txt")
df$ngg <- substr(df$nucleotide.sequence, 21, 23)
df$nucleotide.sequence <- substr(df$nucleotide.sequence, 1, 20)
df$pam.distance <- 1
write.table(df, "Doench2014CORRECTED.sgRNA.closestPAM.bed", quote=F, row.names=F, sep='\t')
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED
cut -f 1-4 Doench2014CORRECTED.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Doench2014CORRECTED.sgRNA.coord.bed
bedtools closest -a Doench2014CORRECTED.sgRNA.coord.bed -b ../Doench2014/GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Doench2014CORRECTED.sgRNA.gene.closest.bed
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
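# note: this file is written by paste without a header row, so header=T likely consumes the first sgRNA as column names (use header=F to keep it)
structure <- read.delim("Doench2014CORRECTED.gRNA.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)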
nuc <- read.delim("Doench2014CORRECTED.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014CORRECTED.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
onehot.ind1 <- read.delim("Doench2014CORRECTED_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014CORRECTED_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014CORRECTED_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014CORRECTED_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014CORRECTED_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014CORRECTED_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the trailing empty column left by the space-separated split
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Doench2014CORRECTED.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# 1277 sgRNAIDs
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
sgRNA.pam <- read.table("Doench2014CORRECTED.sgRNA.closestPAM.bed", header=T, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(1,4,5)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df
score <- read.delim("Doench2014CORRECTED.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5,7)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df <- read.delim("Doench2014CORRECTED.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))
df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 1277
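# The PAM one-hot encoding from the mutate() call above, sketched in Python.
# Each indicator covers the NGG code and its reverse complement (e.g. AGG and CCT); the names PAM_GROUPS and pam_onehot are assumptions.

```python
# Indicator name -> PAM codes that set it to 1 (forward NGG and reverse-complement CCN).
PAM_GROUPS = {
    "PAM.A": {"AGG", "CCT"},
    "PAM.C": {"CGG", "CCG"},
    "PAM.T": {"TGG", "CCA"},
    "PAM.G": {"GGG", "CCC"},
}


def pam_onehot(pam_code):
    """Return the four PAM indicator values for a 3-nt PAM code."""
    return {name: int(pam_code in codes) for name, codes in PAM_GROUPS.items()}


tgg = pam_onehot("TGG")
```

# A PAM code outside these eight (for example a non-NGG site) gets all zeros, as in the R ifelse() version.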
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
sgRNA.genes <- read.table("Doench2014CORRECTED.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)
df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 1277
df.final <- df.pam.location[,c(1:3,5:5915,5917:5921)]
ncol(df.final)
# 5919
write.table(df.final, "Doench2014CORRECTED.raw.matrix.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
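# A Python sketch of the monomer lookup above: each base at each position is replaced by its per-base property values, giving one feature per position and property.
# The property names and numbers in TOY_TABLE are placeholders, not values from HL.Bond.Monomer.txt; monomer_features is a hypothetical name.

```python
# Placeholder per-base properties (NOT real quantum chemical values).
TOY_TABLE = {
    "A": {"monomer.HLgap.eV": 1.0, "monomer.No.electrons": 10.0},
    "C": {"monomer.HLgap.eV": 2.0, "monomer.No.electrons": 20.0},
    "G": {"monomer.HLgap.eV": 3.0, "monomer.No.electrons": 30.0},
    "T": {"monomer.HLgap.eV": 4.0, "monomer.No.electrons": 40.0},
}


def monomer_features(sequence, table):
    """Return {'p<position>_<property>': value} for every base in the sequence."""
    features = {}
    for position, base in enumerate(sequence, start=1):
        for prop, value in table[base].items():
            features["p%d_%s" % (position, prop)] = value
    return features


feats = monomer_features("AC", TOY_TABLE)
```

# The key layout mirrors the column names dcast builds from position + variable (joined with an underscore).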
# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")
# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")
# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")
# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
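# The dimer, trimer, and tetramer unite() chains above all build overlapping k-mer windows; a compact Python sketch of the same idea.
# The function name sliding_kmers is an assumption. A 20-nt guide yields 19 dimers, 18 trimers, and 17 tetramers, matching the column ranges kept in the R code.

```python
def sliding_kmers(sequence, k):
    """Return {'p1': kmer starting at base 1, 'p2': kmer starting at base 2, ...}."""
    return {
        "p%d" % (start + 1): sequence[start:start + k]
        for start in range(len(sequence) - k + 1)
    }


# Hypothetical guide sequence.
guide = "ACGTACGTACGTACGTACGT"
dimers = sliding_kmers(guide, 2)
trimers = sliding_kmers(guide, 3)
tetramers = sliding_kmers(guide, 4)
```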
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
monomer <- read.delim("Doench2014CORRECTED.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Doench2014CORRECTED.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Doench2014CORRECTED.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Doench2014CORRECTED.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Doench2014CORRECTED.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Doench2014CORRECTED.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
df <- read.delim("Doench2014CORRECTED.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
tensor <- read.delim("Doench2014CORRECTED.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 16748
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Doench2014CORRECTED.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
df <- read.delim("Doench2014CORRECTED.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "Doench2014CORRECTED.finalquantum.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014CORRECTED.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014CORRECTED.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014CORRECTED.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014CORRECTED.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014CORRECTED.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014CORRECTED.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014CORRECTED.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/Submits/submit_full_Doench2014CORRECTED.finalquantum_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/Submits/submit_train_Doench2014CORRECTED.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/Submits/submit_test_Doench2014CORRECTED.finalquantum_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014CORRECTED.finalquantum
# 0.38912071429062073
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014CORRECTED.finalquantum_cut.score.importance4 | head
# p19dimer.Hbond.stackingraw: 3.98127
# V247.xsgRNA.raw: 2.42589
# PAM.C0: 1.86213
# p15dimer.Hbond.stackingraw: 1.20198
# p14dimer.Hbond.energyraw: 1.09491
# p13tetramer.Hbond.stackingraw: 0.796051
# p3tetramer.Hbond.energyraw: 0.746174
# p20monomer.HLgap.eVraw: 0.700829
# p20monomer.No.electronsraw: 0.529749
# p14trimer.Hbond.energyraw: 0.500073
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014CORRECTED.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.6525512
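# The cor() call above reports Pearson's r between observed and predicted cut scores.
# A minimal Python sketch of the same statistic, for checking a value outside R.
# The function name pearson_r and the toy vectors are illustrative assumptions, not part of the pipeline.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # covariance and variances, all left unnormalised (the 1/n factors cancel)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# toy data (hypothetical): predicted is exactly 2 * observed, so r should be ~1
observed = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.8, 0.7, 1.6]
print(pearson_r(observed, predicted))
```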
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014CORRECTED.finalquantum
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/cut.score/RIT.run
# p19dimer.Hbond.stackingraw cut.score 0.16562004904649372 1.800345917904429e-05 1213.151 0.17853491514892303
# V247.xsgRNA.raw cut.score 0.09835326754653839 0.0010176665362924215 924.223 0.1464096649214905
# PAM.C0 cut.score 0.09179251740061076 7.345814511089212e-06 961.303 0.2927093722046682
# p3tetramer.Hbond.energyraw cut.score 0.042723720919565535 -2.9640260062614164e-06 348.364 0.23253379511409317
# p14dimer.Hbond.energyraw cut.score 0.04044322736269804 -8.124362014625185e-06 378.372 0.23879281375413255
# p15dimer.Hbond.stackingraw cut.score 0.038112174026197654 -0.0006000076543151262 485.299 0.19449052993622917
# p20monomer.No.electronsraw cut.score 0.03045501914851094 0.0007592500338267883 386.478 0.12710706117366816
# p17tetramer.Hbond.stackingraw cut.score 0.028125521497537727 -1.4651686023597412e-06 250.067 0.17998194182849084
# p2dimer.HLgap.eVEraw cut.score 0.027402481292298345 0.0003752894963077648 425.63 0.13126023383927354
# p14dimer.HLgap.eVEraw cut.score 0.026716706826732422 0.0007290664463055949 169.792 0.16361717817517185
library(dplyr)  # needed for %>% and mutate() below
library(ggplot2)
library(reshape2)
library(RColorBrewer)
## Main H. sapiens feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/cut.score")
imp <- read.delim("Doench2014CORRECTED.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
# Feature.Label defaults to the raw feature name; overwrite with hand-written labels (one per row, 20 total) if desired
imp.dir.top20.df$Feature.Label <- imp.dir.top20.df$Feature
pdf("Doench2014CORRECTED.FeatureEngineering.pdf")
# theme_classic() comes before theme() so it does not reset the axis-text settings; points are mapped to color, so the palette is set with scale_color_brewer
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Doench2014CORRECTED Top Features") + ylab("Normalized Importance") + xlab("") + theme_classic() + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + scale_color_brewer(palette="Set1") + coord_flip()
dev.off()
pdf("Doench2014CORRECTED.FeatureEngineering.nocolor.pdf")
# a fixed color is set on the geoms; passing color= to ggplot() itself has no effect
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir)) + geom_point(size=3, color="black") + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir), color="black") + labs(title="Doench2014CORRECTED Top Features") + ylab("Normalized Importance") + xlab("") + theme_classic() + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + coord_flip()
dev.off()
https://github.com/Peppags/CNN-SVR/blob/master/data/training_example.csv
https://www.frontiersin.org/articles/10.3389/fgene.2019.01303/full
"In order to evaluate the performance of our method, we used four public experimental validated gRNA on-target cleavage efficacy independent human datasets, which were integrated and processed by Chuai et al (Chuai et al., 2018). These experimented-based datasets were originally collected from public datasets (Wang et al., 2014; Hart et al., 2015; Doench et al., 2016). They covered gRNAs targeting 1071 genes from four different cell lines, including HCT116 (4239 samples) (Hart et al., 2015), HEK293T (2333 samples) (Doench et al., 2016), HELA (8101 samples) (Hart et al., 2015), and HL60 (2076 samples) (Wang et al., 2014) with redundancy removed. The gRNA on-target activity was strictly restricted to experimental assay, where the cleavage efficiency was defined as the log-fold change in the measured knockout efficacy. Readouts of cleavage efficacies without in vivo (in vitro) experimental validation were excluded."
# local paths are quoted because "Dropbox (ORNL)" contains a space and parentheses
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/Sprint.Opioid.ATAC/Genome/GCF_000001405.39_GRCh38.p13_genomic.fna" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/Sprint.Opioid.ATAC/Genome/GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/human/Chuai.et.al.2018/Chuai.2018.score.txt" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/.
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
df <- read.delim("Chuai.2018.score.txt", header=T, sep="\t")
library(dplyr)
df2 <- df %>% mutate(id = row_number())
df3 <- df2[,c(7,5,6)]
colnames(df3) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
df.na <- na.omit(df3)
write.table(df.na, "Chuai2018.ngg.txt", quote=F, row.names=F, sep="\t")
df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Chuai2018.txt", quote=F, row.names=F, sep="\t")
# 16750
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/
sed '1d' Chuai2018.txt | awk '{print ">"$1"\n"$2}' > Chuai2018.fasta
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
sed 's/chr10/NC_000010.11/g' Chuai.2018.score.txt | sed 's/chr11/NC_000011.10/g' | sed 's/chr12/NC_000012.12/g' | sed 's/chr13/NC_000013.11/g' | sed 's/chr14/NC_000014.9/g' | sed 's/chr15/NC_000015.10/g' | sed 's/chr16/NC_000016.10/g' | sed 's/chr17/NC_000017.11/g' | sed 's/chr18/NC_000018.10/g' | sed 's/chr19/NC_000019.10/g' | sed 's/chr20/NC_000020.11/g' | sed 's/chr21/NC_000021.9/g' | sed 's/chr22/NC_000022.11/g' | sed 's/chr1/NC_000001.11/g' | sed 's/chr2/NC_000002.12/g' | sed 's/chr3/NC_000003.12/g' | sed 's/chr4/NC_000004.12/g' | sed 's/chr5/NC_000005.10/g' | sed 's/chr6/NC_000006.12/g' | sed 's/chr7/NC_000007.14/g' | sed 's/chr8/NC_000008.11/g' | sed 's/chr9/NC_000009.12/g' | sed 's/chrX/NC_000023.11/g' | sed 's/chrY/NC_000024.10/g' > Chuai.2018.score.chr.txt
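# The sed chain above only works because chr10-chr22 are rewritten before chr1 and chr2.
# A minimal Python sketch of the same renaming with a lookup table and one regex, so the order no longer matters.
# The accessions are copied from the sed chain; the names CHR_TO_REFSEQ and rename_chromosomes are illustrative assumptions.
# Unlike sed, this only rewrites whole tokens, so a name such as chr1_random (hypothetical) would be left untouched.

```python
import re

# UCSC-style chromosome name -> RefSeq accession, as used in the sed chain
CHR_TO_REFSEQ = {
    "chr1": "NC_000001.11", "chr2": "NC_000002.12", "chr3": "NC_000003.12",
    "chr4": "NC_000004.12", "chr5": "NC_000005.10", "chr6": "NC_000006.12",
    "chr7": "NC_000007.14", "chr8": "NC_000008.11", "chr9": "NC_000009.12",
    "chr10": "NC_000010.11", "chr11": "NC_000011.10", "chr12": "NC_000012.12",
    "chr13": "NC_000013.11", "chr14": "NC_000014.9", "chr15": "NC_000015.10",
    "chr16": "NC_000016.10", "chr17": "NC_000017.11", "chr18": "NC_000018.10",
    "chr19": "NC_000019.10", "chr20": "NC_000020.11", "chr21": "NC_000021.9",
    "chr22": "NC_000022.11", "chrX": "NC_000023.11", "chrY": "NC_000024.10",
}

# the trailing \b stops "chr1" from matching inside "chr10"
CHR_PATTERN = re.compile(r"\bchr(?:[0-9]{1,2}|[XY])\b")


def rename_chromosomes(line):
    """Replace every known chrN token in a line; unknown tokens are kept."""
    return CHR_PATTERN.sub(
        lambda m: CHR_TO_REFSEQ.get(m.group(0), m.group(0)), line
    )


print(rename_chromosomes("chr10\t100\t123"))
print(rename_chromosomes("chr1\t100\t123"))
```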
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
df <- read.delim("Chuai.2018.score.chr.txt", header=T, sep="\t")
df.id <- df %>% mutate(id = row_number())
df.2 <- cbind(df, df.id)
df.coord <- df.2[,c(1:3,13,13,5,6)]
colnames(df.coord) <- c("chr", "start", "end", "sgRNA", "sgRNAID", "nucleotide.sequence", "cut.score")
write.table(df.coord, "Chuai2018.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
https://www.tbi.univie.ac.at/RNA/tutorial/ minimum free energy (MFE) structure
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/vienna
RNAfold < ../Chuai2018.fasta > Chuai2018.gRNA.ViennaRNA.output.txt
grep '(' Chuai2018.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Chuai2018.gRNA.ViennaRNA.output.value.txt
grep '>' Chuai2018.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Chuai2018.gRNA.names.txt
paste Chuai2018.gRNA.names.txt Chuai2018.gRNA.ViennaRNA.output.value.txt > Chuai2018.gRNA.ViennaRNA.output.value.id.txt
cp Chuai2018.gRNA.ViennaRNA.output.value.id.txt ../.
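# The grep/paste steps above pull the IDs and the MFE values into two files and rely on them staying in the same order.
# A minimal Python sketch that reads the RNAfold text output record by record and keeps each ID with its energy.
# It assumes the usual three-line layout (header, sequence, structure plus energy in parentheses).
# The function name parse_rnafold and the example records are made up for illustration.

```python
import re

# structure line: dot-bracket string, whitespace, then the energy in parentheses
STRUCT_RE = re.compile(r"^([().]+)\s+\(\s*([+-]?\d+(?:\.\d+)?)\s*\)\s*$")


def parse_rnafold(lines):
    """Return [(record_id, mfe), ...] from RNAfold text output lines."""
    records = []
    current_id = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            current_id = line[1:].strip()
            continue
        match = STRUCT_RE.match(line)
        if match and current_id is not None:
            records.append((current_id, float(match.group(2))))
            current_id = None  # wait for the next header
    return records


# toy input (hypothetical sequences and energies)
example = [
    ">1",
    "GGGAAAUCCC",
    "(((....))) ( -1.20)",
    ">2",
    "AAAAAAAAAA",
    ".......... (  0.00)",
]
for record_id, mfe in parse_rnafold(example):
    print(f"{record_id}\t{mfe}")
```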
https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)
https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
python3
from Bio import SeqIO
input_file = open('Chuai2018.fasta', 'r')
output_file = open('nucleotide_counts_sgRNA.tsv','w')
# the CG% column holds the GC fraction (0-1), not a percentage
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
for cur_record in SeqIO.parse(input_file, "fasta"):
    gene_name = cur_record.name
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)

output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "Chuai2018.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
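# The melting temperature formula above, Tm = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC), can be checked by hand on one sequence.
# A minimal Python sketch; the function name gc_fraction_and_tm and the example sequence are illustrative assumptions.

```python
def gc_fraction_and_tm(seq):
    """Return (GC fraction, melting temperature in deg C) for a DNA sequence.

    Uses Tm = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC),
    counting only A, C, G and T.
    """
    seq = seq.upper()
    n_a, n_c, n_g, n_t = (seq.count(base) for base in "ACGT")
    total = n_a + n_c + n_g + n_t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc_fraction = (n_g + n_c) / total
    tm = 64.9 + 41 * (n_g + n_c - 16.4) / total
    return gc_fraction, tm


# hypothetical 20-nt guide with 10 G/C and 10 A/T:
# Tm = 64.9 + 41 * (10 - 16.4) / 20 = 64.9 - 13.12 = 51.78
gc, tm = gc_fraction_and_tm("GCGCGCGCGCATATATATAT")
print(gc, round(tm, 2))
```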
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/
cut -f 1-2 Chuai2018.txt > Chuai2018.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Chuai2018.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Chuai2018.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Chuai2018.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Chuai2018.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/
sed '1d' Chuai2018.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018_dep1.txt
sed '1d' Chuai2018.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018_dep2.txt
sed '1d' Chuai2018.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018_dep3.txt
sed '1d' Chuai2018.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018_dep4.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Chuai2018.noscore.txt
sed '1d' Chuai2018.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Chuai2018_ind1.txt
sed '1d' Chuai2018.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Chuai2018_ind2.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/
sed '1d' Chuai2018.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Chuai2018.sequence.txt
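# The awk gsub step above puts a space after every base so that each of the 20 positions becomes its own column (p1 ... p20).
# A minimal Python sketch of the same split; the function name split_positions and the example guide are illustrative assumptions.

```python
def split_positions(sgrna_id, seq, n_positions=20):
    """Return [sgrna_id, base_1, ..., base_n], one list element per position."""
    if len(seq) < n_positions:
        raise ValueError("sequence shorter than the requested number of positions")
    return [sgrna_id] + list(seq[:n_positions])


# header matching the space-delimited table: sgRNAID p1 ... p20
header = ["sgRNAID"] + [f"p{i}" for i in range(1, 21)]

# hypothetical guide
row = split_positions("7", "ACGTACGTACGTACGTACGT")
print(" ".join(header))
print(" ".join(row))
```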
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
https://www.synthego.com/guide/how-to-use-crispr/pam-sequence
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J bedtools
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
# build the sorted sgRNA BED first; the strand and closest-feature steps below read it
cut -f 1-4 Chuai2018.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Chuai2018.sgRNA.coord.bed
awk '{print $0"\t""+"}' Chuai2018.sgRNA.coord.bed > Chuai2018.sgRNA.coord.strand.txt
bedtools closest -a Chuai2018.sgRNA.coord.strand.txt -b Chuai2018.NGG.PAM.sorted.bed -io -iu -D a > Chuai2018.sgRNA.closestPAM.bed
bedtools intersect -wo -a Chuai2018.20bp.sliding.bed -b Chuai2018.NGG.PAM.sorted.bed > Chuai2018.NGG.PAM.20bp.sliding.windows.bed
bedtools closest -a Chuai2018.sgRNA.coord.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Chuai2018.sgRNA.gene.closest.bed
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/bedtools.sh
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("Chuai2018.ngg.txt")
df$ngg <- substr(df$nucleotide.sequence, 21, 23)
df$nucleotide.sequence <- substr(df$nucleotide.sequence, 1, 20)
df$pam.distance <- 1
write.table(df, "Chuai2018.sgRNA.closestPAM.bed", quote=F, row.names=F, sep='\t')
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
cut -f 1-4 Chuai2018.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Chuai2018.sgRNA.coord.bed
bedtools closest -a Chuai2018.sgRNA.coord.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Chuai2018.sgRNA.gene.closest.bed
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J chuari.matrix
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
R CMD BATCH chuari.matrix.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/chuari.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
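# the paste output has no header row, so read with header=F (header=T would consume the first sgRNA as column names)
structure <- read.delim("Chuai2018.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)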
nuc <- read.delim("Chuai2018.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Chuai2018.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
# structure, gc, temp: one-column data frames, tagged with scale and sgRNAID below
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
onehot.ind1 <- read.delim("Chuai2018_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Chuai2018_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Chuai2018_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Chuai2018_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Chuai2018_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Chuai2018_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
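# drop the last column of each dep table (empty, from the trailing space added by awk); the parentheses make the range explicit, since 1:ncol(x)-1 parses as (1:ncol(x))-1
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")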
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Chuai2018.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
sgRNA.pam <- read.table("Chuai2018.sgRNA.closestPAM.bed", header=T, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(1,4,5)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df
score <- read.delim("Chuai2018.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5,7)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df <- read.delim("Chuai2018.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))
df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 16748
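# The PAM one-hot step above encodes the first base of NGG and treats the reverse-strand form (CCN) as the same PAM.
# A minimal Python sketch of that mapping, handy for checking single codes by hand.
# The names pam_one_hot and reverse_complement are illustrative assumptions; codes other than NGG/CCN give all zeros, as in the ifelse() chain.

```python
# forward-strand PAM -> the N of NGG
FORWARD_PAMS = {"AGG": "A", "CGG": "C", "TGG": "T", "GGG": "G"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def pam_one_hot(pam_code):
    """Return {'PAM.A': 0/1, 'PAM.C': 0/1, 'PAM.T': 0/1, 'PAM.G': 0/1}."""
    pam_code = pam_code.upper()
    # try the code as written, then as its reverse complement (CCT -> AGG, ...)
    base = FORWARD_PAMS.get(pam_code) or FORWARD_PAMS.get(reverse_complement(pam_code))
    return {f"PAM.{b}": int(base == b) for b in "ACTG"}


print(pam_one_hot("CGG"))
print(pam_one_hot("CCA"))  # reverse complement of TGG
```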
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
sgRNA.genes <- read.table("Chuai2018.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)
df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 16748
df.final <- df.pam.location[,c(1:3,5:5915,5917:5921)]
ncol(df.final)
# 5919
write.table(df.final, "Chuai2018.raw.matrix.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")
# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")
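# The chained unite() calls build overlapping k-mers keyed by start position: 19 dimers, 18 trimers and 17 tetramers for a 20-nt guide.
# A minimal Python sketch of the same windows for any k; the function name positional_kmers and the example guide are illustrative assumptions.

```python
def positional_kmers(seq, k):
    """Return {'p1': kmer starting at position 1, 'p2': ..., ...}.

    A sequence of length L yields L - k + 1 overlapping k-mers.
    """
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and the sequence length")
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}


guide = "ACGTACGTACGTACGTACGT"  # hypothetical 20-nt guide
for k, label in ((2, "dimer"), (3, "trimer"), (4, "tetramer")):
    kmers = positional_kmers(guide, k)
    print(label, len(kmers), kmers["p1"])
```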
# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")
# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
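The dimer, trimer, and tetramer blocks above all slide a window of width k along the 20-nt guide and look up each k-mer in a feature table. Below is a minimal Python sketch of that idea, useful for checking window counts and column naming outside R. The names `positional_kmers`, `attach_features`, `toy_table`, and `toy_energy` are made up for illustration, and the values do not come from the HL.Bond.*.txt files.

```python
def positional_kmers(guide, k):
    """Return {"p1": kmer, "p2": kmer, ...} for every window of width k."""
    if k < 1 or k > len(guide):
        raise ValueError("k must be between 1 and the guide length")
    return {f"p{i + 1}": guide[i:i + k] for i in range(len(guide) - k + 1)}


def attach_features(kmers, table, feature_names):
    """Look up each positional k-mer in a {kmer: {feature: value}} table.

    Columns are named "<position>_<feature>". A k-mer that is absent from
    the table yields None, similar to the NA a left join would produce.
    """
    wide = {}
    for position, kmer in kmers.items():
        values = table.get(kmer, {})
        for name in feature_names:
            wide[f"{position}_{name}"] = values.get(name)
    return wide


# Hypothetical guide and feature table, for illustration only.
guide = "ACGTACGTACGTACGTACGT"
toy_table = {"AC": {"toy_energy": -1.0}, "CG": {"toy_energy": -2.5}}

dimers = positional_kmers(guide, 2)
row = attach_features(dimers, toy_table, ["toy_energy"])
print(len(dimers), row["p1_toy_energy"], row["p3_toy_energy"])
```

A 20-nt guide gives 19 dimers, 18 trimers, and 17 tetramers, which matches the `seq.df` column ranges (1:20, 1:19, 1:18) once the sgRNAID column is counted.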
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
monomer <- read.delim("Chuai2018.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Chuai2018.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Chuai2018.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Chuai2018.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Chuai2018.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Chuai2018.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("Chuai2018.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
tensor <- read.delim("Chuai2018.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 16748
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Chuai2018.finalquantum.txt", quote=F, row.names=F, sep="\t")
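The `dcast(..., fun.aggregate=mean, na.rm=TRUE)` call above turns the long table (one row per sgRNA and feature) into one row per sgRNA. Below is a rough Python sketch of the same reshaping with plain dictionaries. The function name `pivot_mean` and the sample records are hypothetical. Unlike `dcast`, this version leaves out a cell whose values are all missing instead of filling it with NaN.

```python
from collections import defaultdict


def pivot_mean(records):
    """Pivot (row_id, column, value) records into {row_id: {column: mean}}.

    Values that are None are skipped, in the spirit of na.rm=TRUE.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row_id, column, value in records:
        if value is None:
            continue
        sums[row_id][column] += value
        counts[row_id][column] += 1
    return {
        row_id: {col: sums[row_id][col] / counts[row_id][col] for col in cols}
        for row_id, cols in sums.items()
    }


# Hypothetical long-format records: (sgRNAID, feature.scale, value).
records = [
    ("g1", "p1raw", 1.0),
    ("g1", "p1raw", 3.0),   # duplicate cell, averaged with the row above
    ("g1", "p2raw", None),  # missing value, ignored
    ("g2", "p1raw", 5.0),
]
wide = pivot_mean(records)
print(wide)
```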
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("Chuai2018.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "w", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Chuai2018.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Chuai2018.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Chuai2018.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Chuai2018.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Chuai2018.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Chuai2018.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/Chuai2018.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/Chuai2018.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/Submits/submit_full_Chuai2018.finalquantum_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/Submits/submit_train_Chuai2018.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/Submits/submit_test_Chuai2018.finalquantum_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.finalquantum
# 0.23170011706359436
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.finalquantum_cut.score.importance4 | head
# p18monomer.No.electronsraw: 5.83976
# p18monomer.HLgap.eVraw: 4.99657
# p17tetramer.Hbond.energyraw: 3.53595
# p13tetramer.Hbond.stackingraw: 3.50499
# p5tetramer.Hbond.stackingraw: 3.40497
# p17tetramer.Hlgap.eVEraw: 3.37027
# p3tetramer.Hlgap.eVEraw: 3.25658
# p1tetramer.Hbond.stackingraw: 3.16035
# p11tetramer.Hlgap.eVEraw: 3.15086
# p8tetramer.Hbond.stackingraw: 3.06078
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.503659
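The `cor()` call above reports the Pearson correlation between observed and predicted cut scores. For cross-checking a prediction file outside R, a small Python sketch of the same statistic follows. The function name `pearson` and the sample vectors are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical observed and predicted values.
observed = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.3, 0.5, 0.7]
print(round(pearson(observed, predicted), 4))
```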
# Remove the trimer/tetramer features and rerun iRF
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("Chuai2018.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
df.nokmer <- df %>% select(-grep("trimer", names(df)), -grep("tetramer", names(df)))
write.table(df.nokmer[,c(1,3:ncol(df.nokmer))], "Chuai2018.finalquantum.nokmer.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.nokmer[,c(1,3:ncol(df.nokmer))], "Chuai2018.finalquantum.nokmer.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.nokmer[,3:ncol(df.nokmer)], "Chuai2018.finalquantum.nokmer.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/Chuai2018.finalquantum.nokmer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/Chuai2018.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer/Submits/submit_full_Chuai2018.finalquantum_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer/Submits/submit_train_Chuai2018.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer/Submits/submit_test_Chuai2018.finalquantum_0.sh
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.finalquantum
# 0.22948997895639026
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.finalquantum_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 6.25575
# p18monomer.HLgap.eVraw: 6.14648
# gene.distance0: 5.43944
# p18monomer.No.electronsraw: 4.85526
# p13dimer.HLgap.eVEraw: 4.40427
# p14dimer.HLgap.eVEraw: 4.19683
# p14dimer.Hbond.energyraw: 3.81849
# TsgRNA.raw: 3.57828
# p9dimer.HLgap.eVEraw: 3.49469
# p18dimer.Hbond.energyraw: 3.1656
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.486193
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Chuai2018.finalquantum
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/cut.score/RIT.run
# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
hct <- read.delim("HCT116.csv", header=T, sep=",")
hek <- read.delim("HEK293T.csv", header=T, sep=",")
hel <- read.delim("HELA.csv", header=T, sep=",")
hl <- read.delim("HL60.csv", header=T, sep=",")
hct$cell.line <- "HCT116"
hek$cell.line <- "HEK293T"
hel$cell.line <- "HELA"
hl$cell.line <- "HL60"
all <- rbind(hct, hek, hel, hl)
write.table(all, "Chuai2018.cell.lines.dataset.txt", quote=F, row.names=F, sep="\t")
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.dataset.txt", header=T, sep="\t")
library(dplyr)
library(tidyr)
df2 <- df %>% group_by(cell.line) %>% mutate(id = row_number())
df2.id <- unite(df2, "sgRNAID", c(cell.line, id), sep="_")
df3 <- df2.id[,c(12,5,10)]
colnames(df3) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
df.na <- na.omit(df3)
write.table(df.na, "Chuai2018.cell.lines.ngg.txt", quote=F, row.names=F, sep="\t")
df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Chuai2018.cell.lines.txt", quote=F, row.names=F, sep="\t")
# 16749
df3 <- df2.id[,c(12,5,11)]
colnames(df3) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
df.na <- na.omit(df3)
df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Chuai2018.cell.lines.classification.txt", quote=F, row.names=F, sep="\t")
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
sed '1d' Chuai2018.cell.lines.txt | awk '{print ">"$1"\n"$2}' > Chuai2018.cell.lines.fasta
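The `sed`/`awk` one-liner above drops the header row and writes columns 1 and 2 as FASTA records. Below is an equivalent sketch in Python, assuming whitespace-separated columns with the ID first and the sequence second. `table_to_fasta` and the sample rows are hypothetical.

```python
def table_to_fasta(lines):
    """Skip the header line, then emit '>id' and sequence lines."""
    fasta = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed rows
        fasta.append(f">{fields[0]}")
        fasta.append(fields[1])
    return fasta


# Hypothetical rows in the layout of Chuai2018.cell.lines.txt.
rows = [
    "sgRNAID\tnucleotide.sequence\tcut.score",
    "HCT116_1\tACGTACGTACGTACGTACGT\t0.5",
    "HCT116_2\tTTTTACGTACGTACGTAAAA\t0.1",
]
print("\n".join(table_to_fasta(rows)))
```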
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
sed 's/chr10/NC_000010.11/g' Chuai2018.cell.lines.dataset.txt | sed 's/chr11/NC_000011.10/g' | sed 's/chr12/NC_000012.12/g' | sed 's/chr13/NC_000013.11/g' | sed 's/chr14/NC_000014.9/g' | sed 's/chr15/NC_000015.10/g' | sed 's/chr16/NC_000016.10/g' | sed 's/chr17/NC_000017.11/g' | sed 's/chr18/NC_000018.10/g' | sed 's/chr19/NC_000019.10/g' | sed 's/chr20/NC_000020.11/g' | sed 's/chr21/NC_000021.9/g' | sed 's/chr22/NC_000022.11/g' | sed 's/chr1/NC_000001.11/g' | sed 's/chr2/NC_000002.12/g' | sed 's/chr3/NC_000003.12/g' | sed 's/chr4/NC_000004.12/g' | sed 's/chr5/NC_000005.10/g' | sed 's/chr6/NC_000006.12/g' | sed 's/chr7/NC_000007.14/g' | sed 's/chr8/NC_000008.11/g' | sed 's/chr9/NC_000009.12/g' | sed 's/chrX/NC_000023.11/g' | sed 's/chrY/NC_000024.10/g' > Chuai2018.cell.lines.score.chr.txt
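The `sed` chain above depends on replacing chr10 through chr22 before chr1 and chr2, so that "chr1" never matches inside "chr10". A dictionary lookup on the whole chromosome token avoids that ordering constraint. The sketch below reuses the same accession mapping as the `sed` command. The function name `rename_chromosomes` is hypothetical.

```python
import re

# Same chrN -> RefSeq accession mapping as the sed chain.
REFSEQ = {
    "chr1": "NC_000001.11", "chr2": "NC_000002.12", "chr3": "NC_000003.12",
    "chr4": "NC_000004.12", "chr5": "NC_000005.10", "chr6": "NC_000006.12",
    "chr7": "NC_000007.14", "chr8": "NC_000008.11", "chr9": "NC_000009.12",
    "chr10": "NC_000010.11", "chr11": "NC_000011.10", "chr12": "NC_000012.12",
    "chr13": "NC_000013.11", "chr14": "NC_000014.9", "chr15": "NC_000015.10",
    "chr16": "NC_000016.10", "chr17": "NC_000017.11", "chr18": "NC_000018.10",
    "chr19": "NC_000019.10", "chr20": "NC_000020.11", "chr21": "NC_000021.9",
    "chr22": "NC_000022.11", "chrX": "NC_000023.11", "chrY": "NC_000024.10",
}

# Match the full token (chr + 1-2 digits, or X/Y) so chr1 cannot hit chr10.
CHROM = re.compile(r"\bchr(?:[0-9]{1,2}|[XY])\b")


def rename_chromosomes(line):
    """Replace every known chrN token in a line; unknown tokens are kept."""
    return CHROM.sub(lambda m: REFSEQ.get(m.group(0), m.group(0)), line)


print(rename_chromosomes("chr10\t100\t120"))
print(rename_chromosomes("chr1\t100\t120"))
```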
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.score.chr.txt", header=T, sep="\t")
df2 <- df %>% group_by(cell.line) %>% mutate(id = row_number())
df.id <- unite(df2, "sgRNAID", c(cell.line, id), sep="_")
df.coord <- df.id[,c(1:3,12,12,5,10)]
colnames(df.coord) <- c("chr", "start", "end", "sgRNA", "sgRNAID", "nucleotide.sequence", "cut.score")
write.table(df.coord, "Chuai2018.cell.lines.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
# minimum free energy (MFE) structure with ViennaRNA RNAfold: https://www.tbi.univie.ac.at/RNA/tutorial/
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/vienna
RNAfold < ../Chuai2018.cell.lines.fasta > Chuai2018.cell.lines.gRNA.ViennaRNA.output.txt
grep '(' Chuai2018.cell.lines.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.txt
grep '>' Chuai2018.cell.lines.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Chuai2018.cell.lines.gRNA.names.txt
paste Chuai2018.cell.lines.gRNA.names.txt Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.txt > Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.id.txt
cp Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.id.txt ../.
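The `grep`/`paste` steps above pair each FASTA name with the energy printed at the end of the structure line. Below is a Python sketch that does the pairing in one pass, so that names and energies cannot drift out of register. It assumes the usual three-line RNAfold record (header, sequence, structure followed by the energy in parentheses). The sample text and the name `parse_rnafold` are made up for illustration.

```python
import re

# Energy in parentheses at the end of the structure line, e.g. "(-3.40)" or "(  0.00)".
ENERGY = re.compile(r"\(\s*([+-]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(text):
    """Return [(name, mfe), ...] from RNAfold-style output text."""
    results = []
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            continue
        match = ENERGY.search(line)
        if match and name is not None:
            results.append((name, float(match.group(1))))
            name = None  # wait for the next header
    return results


# Hypothetical output for two guides.
sample = """>g1
GGGGAAAACCCCAAAAAAAA
((((....))))........ (-3.40)
>g2
AAAAAAAAAAAAAAAAAAAA
.................... (  0.00)
"""
print(parse_rnafold(sample))
```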
https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# reference signature only (not run; the melting temperature below uses the simpler GC-count formula):
# Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)
https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
python3
input_file = open('Chuai2018.cell.lines.fasta', 'r')
output_file = open('nucleotide_counts_sgRNA.tsv','w')
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta") :
    gene_name = cur_record.name
    # per-base counts for this sgRNA
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    # GC content as a fraction (0-1), despite the 'CG%' column header
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
        (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)
output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "Chuai2018.cell.lines.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
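The melting temperature above is computed from base counts as 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC). Below is a short Python sketch of the same formula applied directly to a sequence string, for spot-checking individual guides. The function names and the example guide are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in the sequence (0 to 1)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def melting_temp(seq):
    """Tm in degrees C: 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)."""
    seq = seq.upper()
    n_a, n_c = seq.count("A"), seq.count("C")
    n_g, n_t = seq.count("G"), seq.count("T")
    total = n_a + n_c + n_g + n_t
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 64.9 + 41 * (n_g + n_c - 16.4) / total


# Hypothetical 20-nt guide with 10 G/C bases.
guide = "ACGT" * 5
print(gc_fraction(guide), round(melting_temp(guide), 2))
```

For this guide the arithmetic is 64.9 + 41 * (10 - 16.4) / 20 = 64.9 - 13.12 = 51.78.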
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
cut -f 1-2 Chuai2018.cell.lines.txt > Chuai2018.cell.lines.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Chuai2018.cell.lines.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Chuai2018.cell.lines.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Chuai2018.cell.lines.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Chuai2018.cell.lines.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
sed '1d' Chuai2018.cell.lines.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018.cell.lines_dep1.txt
sed '1d' Chuai2018.cell.lines.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018.cell.lines_dep2.txt
sed '1d' Chuai2018.cell.lines.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018.cell.lines_dep3.txt
sed '1d' Chuai2018.cell.lines.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018.cell.lines_dep4.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Chuai2018.cell.lines.noscore.txt
sed '1d' Chuai2018.cell.lines.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Chuai2018.cell.lines_ind1.txt
sed '1d' Chuai2018.cell.lines.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Chuai2018.cell.lines_ind2.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
sed '1d' Chuai2018.cell.lines.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Chuai2018.cell.lines.sequence.txt
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.cell.lines.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
https://www.synthego.com/guide/how-to-use-crispr/pam-sequence
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J bedtools
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
# build the sorted BED file first; the strand file is derived from it
cut -f 1-4 Chuai2018.cell.lines.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Chuai2018.cell.lines.sgRNA.coord.bed
awk '{print $0"\t""+"}' Chuai2018.cell.lines.sgRNA.coord.bed > Chuai2018.cell.lines.sgRNA.coord.strand.txt
bedtools closest -a Chuai2018.cell.lines.sgRNA.coord.bed -b ../GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Chuai2018.cell.lines.sgRNA.gene.closest.bed
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/bedtools.sh
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.ngg.txt")
df$ngg <- substr(df$nucleotide.sequence, 21, 23)
df$nucleotide.sequence <- substr(df$nucleotide.sequence, 1, 20)
df$pam.distance <- 1
write.table(df, "Chuai2018.cell.lines.sgRNA.closestPAM.bed", quote=F, row.names=F, sep='\t')
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
cut -f 1-4 Chuai2018.cell.lines.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Chuai2018.cell.lines.sgRNA.coord.bed
bedtools closest -a Chuai2018.cell.lines.sgRNA.coord.bed -b ../GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Chuai2018.cell.lines.sgRNA.gene.closest.bed
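`bedtools closest` reports, for each sgRNA interval, the nearest gene and a distance (0 when they overlap). To show what that feature represents, here is a simplified Python sketch that computes an unsigned gap between intervals on the same chromosome. It is an assumption-laden stand-in, not a reimplementation: it ignores the strand-aware sign that `-D b` adds, it treats all coordinates as half-open, and its distance convention may differ from bedtools by a base. All names and coordinates are hypothetical.

```python
def gap(a_start, a_end, b_start, b_end):
    """Bases separating two half-open intervals; 0 if they overlap or touch."""
    return max(0, b_start - a_end, a_start - b_end)


def closest_gene(guide, genes):
    """Return (gene_name, distance) for the nearest gene on the same chromosome.

    guide: (chrom, start, end); genes: iterable of (chrom, start, end, name).
    Returns None when no gene lies on the guide's chromosome.
    """
    chrom, start, end = guide
    best = None
    for g_chrom, g_start, g_end, g_name in genes:
        if g_chrom != chrom:
            continue
        distance = gap(start, end, g_start, g_end)
        if best is None or distance < best[1]:
            best = (g_name, distance)
    return best


# Hypothetical gene annotations and guide position.
genes = [
    ("chr1", 200, 300, "geneA"),
    ("chr1", 50, 90, "geneB"),
    ("chr2", 100, 120, "geneC"),
]
print(closest_gene(("chr1", 100, 120), genes))
```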
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
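# the paste output written in the ViennaRNA step has no header line, so read it with header=F (header=T would consume the first sgRNA as column names)
structure <- read.delim("Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)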
nuc <- read.delim("Chuai2018.cell.lines.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Chuai2018.cell.lines.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
onehot.ind1 <- read.delim("Chuai2018.cell.lines_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Chuai2018.cell.lines_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Chuai2018.cell.lines_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Chuai2018.cell.lines_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Chuai2018.cell.lines_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Chuai2018.cell.lines_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last column of each dependent-encoding table before joining
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Chuai2018.cell.lines.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
sgRNA.pam <- read.table("Chuai2018.cell.lines.sgRNA.closestPAM.bed", header=T, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(1,4,5)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df
score <- read.delim("Chuai2018.cell.lines.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5,7)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df <- read.delim("Chuai2018.cell.lines.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))
df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 16748
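The PAM features above come from two steps: splitting the 23-nt sequence into the 20-nt guide and its 3-nt PAM, then one-hot encoding the PAM by its variable base (AGG/CCT, CGG/CCG, TGG/CCA, GGG/CCC). Below is a compact Python sketch of both steps using the same four classes as the `ifelse` calls. The function names and the example sequence are hypothetical.

```python
# Same PAM classes as the R ifelse() calls: forward NGG and its reverse complement.
PAM_CLASSES = {
    "PAM.A": {"AGG", "CCT"},
    "PAM.C": {"CGG", "CCG"},
    "PAM.T": {"TGG", "CCA"},
    "PAM.G": {"GGG", "CCC"},
}


def split_guide_and_pam(seq23):
    """Split a 23-nt target into (20-nt guide, 3-nt PAM)."""
    if len(seq23) < 23:
        raise ValueError("expected at least 23 nucleotides")
    return seq23[:20], seq23[20:23]


def pam_one_hot(pam):
    """Return {"PAM.A": 0/1, ...}; all zeros if the PAM matches no class."""
    return {name: int(pam in codes) for name, codes in PAM_CLASSES.items()}


# Hypothetical 23-nt target ending in a TGG PAM.
guide, pam = split_guide_and_pam("ACGTACGTACGTACGTACGT" + "TGG")
print(guide, pam, pam_one_hot(pam))
```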
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
sgRNA.genes <- read.table("Chuai2018.cell.lines.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)
df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 16748
df.final <- df.pam.location[,c(1:3,5:5915,5917:5921)]
ncol(df.final)
# 5919
write.table(df.final, "Chuai2018.cell.lines.raw.matrix.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")
# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>%
  unite("p1", p1:p2, remove=F, sep= "") %>%
  unite("p2", p2:p3, remove=F, sep= "") %>%
  unite("p3", p3:p4, remove=F, sep= "") %>%
  unite("p4", p4:p5, remove=F, sep= "") %>%
  unite("p5", p5:p6, remove=F, sep= "") %>%
  unite("p6", p6:p7, remove=F, sep= "") %>%
  unite("p7", p7:p8, remove=F, sep= "") %>%
  unite("p8", p8:p9, remove=F, sep= "") %>%
  unite("p9", p9:p10, remove=F, sep= "") %>%
  unite("p10", p10:p11, remove=F, sep= "") %>%
  unite("p11", p11:p12, remove=F, sep= "") %>%
  unite("p12", p12:p13, remove=F, sep= "") %>%
  unite("p13", p13:p14, remove=F, sep= "") %>%
  unite("p14", p14:p15, remove=F, sep= "") %>%
  unite("p15", p15:p16, remove=F, sep= "") %>%
  unite("p16", p16:p17, remove=F, sep= "") %>%
  unite("p17", p17:p18, remove=F, sep= "") %>%
  unite("p18", p18:p19, remove=F, sep= "") %>%
  unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")
# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>%
  unite("p1", p1:p3, remove=F, sep= "") %>%
  unite("p2", p2:p4, remove=F, sep= "") %>%
  unite("p3", p3:p5, remove=F, sep= "") %>%
  unite("p4", p4:p6, remove=F, sep= "") %>%
  unite("p5", p5:p7, remove=F, sep= "") %>%
  unite("p6", p6:p8, remove=F, sep= "") %>%
  unite("p7", p7:p9, remove=F, sep= "") %>%
  unite("p8", p8:p10, remove=F, sep= "") %>%
  unite("p9", p9:p11, remove=F, sep= "") %>%
  unite("p10", p10:p12, remove=F, sep= "") %>%
  unite("p11", p11:p13, remove=F, sep= "") %>%
  unite("p12", p12:p14, remove=F, sep= "") %>%
  unite("p13", p13:p15, remove=F, sep= "") %>%
  unite("p14", p14:p16, remove=F, sep= "") %>%
  unite("p15", p15:p17, remove=F, sep= "") %>%
  unite("p16", p16:p18, remove=F, sep= "") %>%
  unite("p17", p17:p19, remove=F, sep= "") %>%
  unite("p18", p18:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")
# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>%
  unite("p1", p1:p4, remove=F, sep= "") %>%
  unite("p2", p2:p5, remove=F, sep= "") %>%
  unite("p3", p3:p6, remove=F, sep= "") %>%
  unite("p4", p4:p7, remove=F, sep= "") %>%
  unite("p5", p5:p8, remove=F, sep= "") %>%
  unite("p6", p6:p9, remove=F, sep= "") %>%
  unite("p7", p7:p10, remove=F, sep= "") %>%
  unite("p8", p8:p11, remove=F, sep= "") %>%
  unite("p9", p9:p12, remove=F, sep= "") %>%
  unite("p10", p10:p13, remove=F, sep= "") %>%
  unite("p11", p11:p14, remove=F, sep= "") %>%
  unite("p12", p12:p15, remove=F, sep= "") %>%
  unite("p13", p13:p16, remove=F, sep= "") %>%
  unite("p14", p14:p17, remove=F, sep= "") %>%
  unite("p15", p15:p18, remove=F, sep= "") %>%
  unite("p16", p16:p19, remove=F, sep= "") %>%
  unite("p17", p17:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
monomer <- read.delim("Chuai2018.cell.lines.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Chuai2018.cell.lines.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Chuai2018.cell.lines.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Chuai2018.cell.lines.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Chuai2018.cell.lines.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Chuai2018.cell.lines.quantum.melt.txt", quote=F, row.names=F, sep="\t")
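# Sanity check for the dimer/trimer/tetramer blocks above: a 20-nt guide should give
# 19 dimers, 18 trimers and 17 tetramers, which matches the 1:20, 1:19 and 1:18 column
# subsets (sgRNAID plus the k-mer columns). The sketch below is a stand-alone shell
# helper and not part of the original pipeline; the function name kmers and the
# example sequence are made up for illustration.

```shell
# kmers SEQ K: print every overlapping window of length K in SEQ, one per line,
# labelled p1, p2, ... in the same way as the united position columns.
kmers() {
  printf '%s\n' "$1" |
    awk -v k="$2" '{
      for (i = 1; i + k - 1 <= length($0); i++)
        printf "p%d\t%s\n", i, substr($0, i, k)
    }'
}

# Count the windows for a made-up 20-nt sequence at k = 2, 3 and 4.
for k in 2 3 4; do
  printf 'k=%s windows=%s\n' "$k" "$(kmers ACGTACGTACGTACGTACGT "$k" | wc -l | tr -d ' ')"
done
```

# The counts can be compared with the number of pN columns kept in seq.df for each k.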
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
tensor <- read.delim("Chuai2018.cell.lines.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 16748
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
ncol(df.location)
# 6236
write.table(df.location, "Chuai2018.cell.lines.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all[,c(1,3:ncol(df.all))], "Chuai2018.cell.lines.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Chuai2018.cell.lines.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Chuai2018.cell.lines.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Chuai2018.cell.lines.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Chuai2018.cell.lines.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Chuai2018.cell.lines.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
library(stringr)
df.1 <- df.all %>% filter(str_detect(sgRNAID, "HCT116"))
write.table(df.1[,c(1,3:ncol(df.1))], "Chuai2018.HCT116.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,c(1,3:ncol(df.1))], "Chuai2018.HCT116.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,3:ncol(df.1)], "Chuai2018.HCT116.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,1:2], "Chuai2018.HCT116.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,1:2], "Chuai2018.HCT116.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.1[,2]), "Chuai2018.HCT116.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
df.2 <- df.all %>% filter(str_detect(sgRNAID, "HEK293T"))
write.table(df.2[,c(1,3:ncol(df.2))], "Chuai2018.HEK293T.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,c(1,3:ncol(df.2))], "Chuai2018.HEK293T.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,3:ncol(df.2)], "Chuai2018.HEK293T.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,1:2], "Chuai2018.HEK293T.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,1:2], "Chuai2018.HEK293T.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.2[,2]), "Chuai2018.HEK293T.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
df.3 <- df.all %>% filter(str_detect(sgRNAID, "HELA"))
write.table(df.3[,c(1,3:ncol(df.3))], "Chuai2018.HELA.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,c(1,3:ncol(df.3))], "Chuai2018.HELA.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,3:ncol(df.3)], "Chuai2018.HELA.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,1:2], "Chuai2018.HELA.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,1:2], "Chuai2018.HELA.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.3[,2]), "Chuai2018.HELA.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
df.4 <- df.all %>% filter(str_detect(sgRNAID, "HL60"))
write.table(df.4[,c(1,3:ncol(df.4))], "Chuai2018.HL60.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,c(1,3:ncol(df.4))], "Chuai2018.HL60.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,3:ncol(df.4)], "Chuai2018.HL60.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,1:2], "Chuai2018.HL60.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,1:2], "Chuai2018.HL60.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.4[,2]), "Chuai2018.HL60.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
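# The hard-coded column ranges above (for example c(1:5919,5921:6236), which skips
# column 5920) only stay correct while the header order is unchanged. A small shell
# check like the one below can confirm where a named column sits before indexing by
# number. This is a sketch, not part of the original workflow: col_index is a
# hypothetical helper, and the demo table is made up.

```shell
# col_index FILE NAME: print the 1-based position of the column called NAME
# in the header line of a tab-separated FILE (prints nothing if it is absent).
col_index() {
  head -n 1 "$1" | tr '\t' '\n' | grep -n -x -F -- "$2" | cut -d: -f1
}

# Demo on a tiny made-up table with a duplicated score column.
tmp=$(mktemp)
printf 'sgRNAID\tcut.score.x\tGC\tcut.score.y\n' > "$tmp"
col_index "$tmp" cut.score.y
rm -f "$tmp"
```

# On the real table the call would look like:
#   col_index Chuai2018.cell.lines.finalquantum.txt cut.score.y
# (assuming the second score column is named cut.score.y after the join).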
# Build the iRF run directories and submit scripts on Andes (python builder below),
# then submit the generated jobs on Summit with bsub
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.cell.lines --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.cell.lines.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.cell.lines.finalquantum.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HCT116 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HCT116.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HCT116.finalquantum.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HEK293T --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HEK293T.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HEK293T.finalquantum.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HELA --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HELA.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HELA.finalquantum.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HL60 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HL60.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HL60.finalquantum.score.txt
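# The five builder calls above differ only in the run name and the matching
# features/score file names, so they can be generated from one template. The sketch
# below is a dry run that only prints the commands (it does not create directories or
# call the builder); print_builder_cmd is a hypothetical helper, and the flags are
# copied from the calls above.

```shell
base=/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
builder=/gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py

# print_builder_cmd NAME: echo the builder command for one run name
# (cell.lines for the pooled run, or a single cell line such as HCT116).
print_builder_cmd() {
  echo "python $builder --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90" \
       "--Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.$1" \
       "--bypass --targetNodeSize 50 --Prediction" \
       "$base/Chuai2018.$1.finalquantum.features.txt" \
       "$base/Chuai2018.$1.finalquantum.score.txt"
}

# Dry run: print one command per run name.
for name in cell.lines HCT116 HEK293T HELA HL60; do
  print_builder_cmd "$name"
done
```

# Each printed command still has to be run from its own run directory
# (iRF.run for the pooled run, iRF.run/<cell line> otherwise), as in the calls above.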
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/Submits/submit_full_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116/Submits/submit_full_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T/Submits/submit_full_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA/Submits/submit_full_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60/Submits/submit_full_Chuai2018.HL60_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/Submits/submit_train_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116/Submits/submit_train_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T/Submits/submit_train_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA/Submits/submit_train_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60/Submits/submit_train_Chuai2018.HL60_0.sh
# Once the train jobs above have finished, submit the test jobs
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/Submits/submit_test_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116/Submits/submit_test_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T/Submits/submit_test_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA/Submits/submit_test_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60/Submits/submit_test_Chuai2018.HL60_0.sh
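# Instead of waiting by hand for the train jobs, LSF can hold a job until another one
# completes via a dependency expression, bsub -w "done(<job ID>)". The sketch below
# only shows how the job ID could be captured from bsub's confirmation line and how
# the dependent submission would look; it is a dry run with a made-up job ID. It
# assumes each stage is a single LSF job and that bsub prints its usual
# "Job <ID> is submitted to queue <NAME>." line; job_id_from_bsub is a hypothetical
# helper name.

```shell
# job_id_from_bsub: read bsub's confirmation line on stdin and print the numeric job ID.
job_id_from_bsub() {
  sed -n 's/^Job <\([0-9][0-9]*\)>.*/\1/p'
}

# Dry run on a sample confirmation line (the ID is made up).
sample='Job <123456> is submitted to queue <batch>.'
train_id=$(printf '%s\n' "$sample" | job_id_from_bsub)

# Print the dependent test submission rather than running it.
echo "bsub -w \"done($train_id)\" Submits/submit_test_Chuai2018.cell.lines_0.sh"
```

# In a real session the sample line would be replaced by the output of the train
# submission, e.g. train_id=$(bsub <train submit script> | job_id_from_bsub).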
# Andes
#module load python/3.7-anaconda3
module load python/3.7.0-anaconda3-5.3.0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.cell.lines
# 0.2215839985177171
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.cell.lines_cut.score.importance4 | head
# p18monomer.HLgap.eVraw: 5.59946
# p18monomer.No.electronsraw: 4.61317
# p17tetramer.Hbond.energyraw: 4.45884
# p3tetramer.Hlgap.eVEraw: 3.59588
# p8tetramer.Hbond.stackingraw: 3.52185
# p5tetramer.Hbond.stackingraw: 3.39563
# p17tetramer.Hbond.stackingraw: 3.22668
# p13tetramer.Hbond.stackingraw: 3.03115
# p11tetramer.Hlgap.eVEraw: 2.97624
# p17tetramer.Hlgap.eVEraw: 2.95762
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.cell.lines_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4950757
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HCT116
# 0.09946901052076344
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HCT116_cut.score.importance4 | head
# p17tetramer.Hbond.stackingraw: 2.50685
# V71.xsgRNA.raw: 1.81098
# p18monomer.No.electronsraw: 1.51649
# p18monomer.HLgap.eVraw: 1.36099
# p14tetramer.Hbond.energyraw: 1.33923
# p14trimer.Hlgap.eVEraw: 1.07283
# p18trimer.Hbond.stackingraw: 1.05095
# p11tetramer.Hbond.stackingraw: 0.965189
# p12trimer.Hlgap.eVEraw: 0.890748
# p14trimer.Hbond.energyraw: 0.890312
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HCT116_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3326642
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HEK293T
# 0.07072574111118635
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HEK293T_cut.score.importance4 | head
# p17tetramer.Hbond.energyraw: 1.07532
# GCsgRNA.raw: 0.500497
# p18trimer.Hbond.stackingraw: 0.486274
# gene.distance0: 0.453329
# p17tetramer.Hlgap.eVEraw: 0.418314
# p16tetramer.Hbond.energyraw: 0.415888
# p2tetramer.Hbond.energyraw: 0.296519
# AsgRNA.raw: 0.283171
# p16trimer.Hlgap.eVEraw: 0.263133
# CTsgRNA.raw: 0.255572
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HEK293T_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.2178619
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HELA
# 0.1073818896135052
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HELA_cut.score.importance4 | head
# V71.xsgRNA.raw: 6.58317
# p17tetramer.Hbond.energyraw: 5.05808
# p5tetramer.Hbond.stackingraw: 2.18043
# p3tetramer.Hlgap.eVEraw: 2.09323
# p16tetramer.Hbond.stackingraw: 1.83042
# p9tetramer.Hbond.stackingraw: 1.74105
# p10tetramer.Hlgap.eVEraw: 1.60667
# p6tetramer.Hlgap.eVEraw: 1.57217
# p16monomer.No.electronsraw: 1.528
# p11tetramer.Hlgap.eVEraw: 1.52691
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HELA_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3185771
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HL60
# 0.12180358813347845
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HL60_cut.score.importance4 | head
# p18trimer.Hbond.energyraw: 0.825499
# p18trimer.Hbond.stackingraw: 0.543373
# p20monomer.No.electronsraw: 0.449361
# p7tetramer.Hbond.stackingraw: 0.321361
# p17tetramer.Hbond.energyraw: 0.29536
# p17tetramer.Hlgap.eVEraw: 0.247908
# p14trimer.Hbond.energyraw: 0.230709
# p10tetramer.Hlgap.eVEraw: 0.211915
# p3tetramer.Hlgap.eVEraw: 0.203093
# sgRNA.structuresgRNA.raw: 0.202799
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HL60_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4012635
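# The cor() checks above are repeated by hand for every run. As a rough cross-check
# outside R, Pearson's r can also be computed from two tab-separated columns with awk.
# This is a sketch under assumptions: pearson is a hypothetical helper, the input must
# already be header-free with observed values in column 1 and predictions in column 2,
# and the demo values are made up.

```shell
# pearson: read two tab-separated numeric columns on stdin and print Pearson's r.
pearson() {
  awk -F'\t' '
    { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
    END { print (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy)) }
  '
}

# Demo on made-up, perfectly correlated values.
printf '1\t2\n2\t4\n3\t6\n' | pearson
```

# If both files were single-column with one header line, a call could look like
#   paste set4_Y_test_noSampleIDs.txt <predictions file> | tail -n +2 | pearson
# but the layout of the .prediction files is not shown here, so check it first.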
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.na <- na.omit(df.cut)  # note: df.na is not used below; df.num is built from df.cut
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
score <- read.delim("Chuai2018.cell.lines.classification.txt", header=T, sep="\t")
df.score <- inner_join(score[,c(1,3)], df.all[,c(1,3:ncol(df.all))], by=c("sgRNAID"))
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification")
write.table(df.score[,c(1,3:ncol(df.score))], "Chuai2018.cell.lines.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.score[,c(1,3:ncol(df.score))], "Chuai2018.cell.lines.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.score[,3:ncol(df.score)], "Chuai2018.cell.lines.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.score[,1:2], "Chuai2018.cell.lines.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.score[,1:2], "Chuai2018.cell.lines.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.score[,2]), "Chuai2018.cell.lines.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
library(stringr)
df.1 <- df.score %>% filter(str_detect(sgRNAID, "HCT116"))
write.table(df.1[,c(1,3:ncol(df.1))], "Chuai2018.HCT116.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,c(1,3:ncol(df.1))], "Chuai2018.HCT116.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,3:ncol(df.1)], "Chuai2018.HCT116.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,1:2], "Chuai2018.HCT116.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,1:2], "Chuai2018.HCT116.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.1[,2]), "Chuai2018.HCT116.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
df.2 <- df.score %>% filter(str_detect(sgRNAID, "HEK293T"))
write.table(df.2[,c(1,3:ncol(df.2))], "Chuai2018.HEK293T.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,c(1,3:ncol(df.2))], "Chuai2018.HEK293T.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,3:ncol(df.2)], "Chuai2018.HEK293T.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,1:2], "Chuai2018.HEK293T.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,1:2], "Chuai2018.HEK293T.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.2[,2]), "Chuai2018.HEK293T.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
df.3 <- df.score %>% filter(str_detect(sgRNAID, "HELA"))
write.table(df.3[,c(1,3:ncol(df.3))], "Chuai2018.HELA.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,c(1,3:ncol(df.3))], "Chuai2018.HELA.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,3:ncol(df.3)], "Chuai2018.HELA.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,1:2], "Chuai2018.HELA.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,1:2], "Chuai2018.HELA.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.3[,2]), "Chuai2018.HELA.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
df.4 <- df.score %>% filter(str_detect(sgRNAID, "HL60"))
write.table(df.4[,c(1,3:ncol(df.4))], "Chuai2018.HL60.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,c(1,3:ncol(df.4))], "Chuai2018.HL60.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,3:ncol(df.4)], "Chuai2018.HL60.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,1:2], "Chuai2018.HL60.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,1:2], "Chuai2018.HL60.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.4[,2]), "Chuai2018.HL60.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# Build the iRF run directories and submit scripts on Andes (python builder below),
# then submit the generated jobs on Summit with bsub
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.cell.lines --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.cell.lines.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.cell.lines.finalquantum.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HCT116 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HCT116.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HCT116.finalquantum.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HEK293T --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HEK293T.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HEK293T.finalquantum.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HELA --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HELA.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HELA.finalquantum.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HL60 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HL60.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HL60.finalquantum.score.txt
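# The four per-cell-line setup calls above differ only in the cell-line name, so they could be
# generated in a loop. A minimal sketch, assuming the same directory layout and builder flags
# as above; setup_cmds, BASE and BUILDER are names made up for this note. It only prints the
# commands (dry run) and uses mkdir -p so that re-running is harmless.

```shell
# Paths copied from the commands above.
BASE=/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification
BUILDER=/gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py

# Hypothetical helper: print the three setup commands for one cell line.
setup_cmds() {
  cell="$1"
  run_dir="$BASE/iRF.run/${cell}"
  echo "mkdir -p $run_dir"
  echo "cd $run_dir"
  echo "python $BUILDER --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.${cell} --bypass --targetNodeSize 50 --Prediction $BASE/Chuai2018.${cell}.finalquantum.features.txt $BASE/Chuai2018.${cell}.finalquantum.score.txt"
}

# Dry run: print the commands for every cell line.
for cell in HCT116 HEK293T HELA HL60; do
  setup_cmds "$cell"
done
```

# To execute instead of print, pipe the loop output to bash. The pooled Chuai2018.cell.lines
# run is not covered here because it lives in iRF.run itself rather than in a subdirectory.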
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/Submits/submit_full_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116/Submits/submit_full_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T/Submits/submit_full_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA/Submits/submit_full_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60/Submits/submit_full_Chuai2018.HL60_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/Submits/submit_train_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116/Submits/submit_train_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T/Submits/submit_train_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA/Submits/submit_train_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60/Submits/submit_train_Chuai2018.HL60_0.sh
# once the train submissions are done, run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/Submits/submit_test_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116/Submits/submit_test_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T/Submits/submit_test_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA/Submits/submit_test_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60/Submits/submit_test_Chuai2018.HL60_0.sh
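# Instead of waiting by hand for the train jobs to finish, the test submission could be gated
# with an LSF job dependency (bsub -w "done(<jobid>)"). A minimal sketch, assuming each train
# script corresponds to a single LSF job and that bsub reports the usual
# "Job <ID> is submitted to ..." line; lsf_job_id and chained_test_cmd are names made up for
# this note, and the command is only printed, not submitted.

```shell
# Hypothetical helper: read bsub's submission message on stdin, print the numeric job ID.
lsf_job_id() {
  sed -n 's/^Job <\([0-9][0-9]*\)> is submitted.*/\1/p'
}

# Hypothetical helper: print the test submission that waits for the train job to finish.
# Usage: chained_test_cmd <train_job_id> <test_submit_script>
chained_test_cmd() {
  echo "bsub -w \"done($1)\" $2"
}

# Example with a made-up job ID; on Summit the ID would come from something like
#   train_id=$(bsub "$SUBMITS/submit_train_Chuai2018.HELA_0.sh" | lsf_job_id)
# where SUBMITS is the Submits directory of that run.
chained_test_cmd 12345 submit_test_Chuai2018.HELA_0.sh
```

# done() only releases the test job if the train job exits successfully; if the generated
# submit scripts launch several jobs, each ID would need its own done() term joined with &&.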
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.cell.lines
#
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.cell.lines_cut.score.importance4 | head
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.cell.lines_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
#
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HCT116
#
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HCT116_cut.score.importance4 | head
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HCT116_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
#
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HEK293T
#
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HEK293T_cut.score.importance4 | head
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HEK293T_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
#
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HELA
#
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HELA_cut.score.importance4 | head
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HELA_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
#
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HL60
#
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HL60_cut.score.importance4 | head
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HL60_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
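# The same Pearson correlation can be checked from the shell without opening R for every
# cell line. A minimal sketch, assuming both files have one header line and a single numeric
# column (as the cbind of pred and y further below suggests); pearson is a name made up for
# this note.

```shell
# Hypothetical helper: Pearson r between the single columns of two headered files.
# Usage: pearson <y_file> <prediction_file>
pearson() {
  paste "$1" "$2" | awk -F'\t' '
    NR > 1 {                      # skip the header row
      n++
      sx += $1; sy += $2
      sxx += $1 * $1; syy += $2 * $2
      sxy += $1 * $2
    }
    END {
      if (n < 2) exit 1           # not enough rows
      num = n * sxy - sx * sy
      den = sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
      if (den == 0) exit 1        # a constant column has no defined correlation
      printf "%.6f\n", num / den
    }'
}

# Example (run inside a .../cut.score/foldRuns/fold9/Runs/Set4 directory):
#   pearson set4_Y_test_noSampleIDs.txt Chuai2018.HL60_Set4_test.prediction
```

# This uses the one-pass sum formula, which is fine for scores in the 0-1 range; R's cor()
# remains the reference if the two ever disagree beyond rounding.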
#
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
chuai <- read.delim("Chuai2018.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
doench <- read.delim("Doench2014CORRECTED.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
doench.chuai <- rbind(doench, chuai)
nrow(doench)
# 1277
nrow(chuai)
# 16748
ncol(doench.chuai)
# 6235
nrow(doench.chuai)
# 17421
write.table(doench.chuai, "Doench2014CORRECTED.Chuai2018.finalquantum.txt", quote=F, row.names=F, sep="\t")
write.table(doench.chuai[,c(1,3:ncol(doench.chuai))], "Doench2014CORRECTED.Chuai2018.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(doench.chuai[,c(1,3:ncol(doench.chuai))], "Doench2014CORRECTED.Chuai2018.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(doench.chuai[,3:ncol(doench.chuai)], "Doench2014CORRECTED.Chuai2018.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(doench.chuai[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(doench.chuai[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = doench.chuai[,2]), "Doench2014CORRECTED.Chuai2018.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014CORRECTED.Chuai2018 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.Chuai2018.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.Chuai2018.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/Submits/submit_full_Doench2014CORRECTED.Chuai2018_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/Submits/submit_train_Doench2014CORRECTED.Chuai2018_0.sh
# once the train submissions are done, run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/Submits/submit_test_Doench2014CORRECTED.Chuai2018_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt
# file contents: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Doench2014CORRECTED.Chuai2018
# 0.2116713321128397
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014CORRECTED.Chuai2018_cut.score.importance4 | head
# p18monomer.HLgap.eVraw: 5.89143
# p18monomer.No.electronsraw: 5.39423
# p17tetramer.Hbond.energyraw: 4.44324
# gene.distance0: 3.76072
# p20monomer.No.electronsraw: 3.65848
# p1tetramer.Hbond.stackingraw: 3.53852
# p5tetramer.Hbond.stackingraw: 3.51513
# p13tetramer.Hbond.stackingraw: 3.49967
# p11tetramer.Hlgap.eVEraw: 3.42702
# p14tetramer.Hbond.energyraw: 3.33662
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014CORRECTED.Chuai2018_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4964907
# scatter plots
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/human")
pred <- read.delim("Doench2014CORRECTED.Chuai2018_Set4_test.prediction", header=T, sep="\t", stringsAsFactors = F)
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t", stringsAsFactors = F)
pred.y <- cbind(pred, y)
pred.y$row_num <- seq.int(nrow(pred.y))
colnames(pred.y) <- c("pred", "yvec", "id")
library(ggplot2)
ggplot(pred.y, aes(x=yvec, y=pred)) + geom_point(stat="identity") + geom_smooth(method='lm') + theme_classic()
cor(pred.y$yvec, pred.y$pred)
# 0.4964907
library(dplyr)
pred.y.rank <- pred.y %>% mutate(yvec.rank=dense_rank(yvec), pred.rank=dense_rank(pred))  # desc(-x) is just x, so these are ascending ranks
ggplot(pred.y.rank, aes(x=yvec.rank, y=pred.rank)) + geom_point(stat="identity") + geom_smooth(method='lm') + theme_classic()
cor(pred.y.rank$yvec.rank, pred.y.rank$pred.rank)
# 0.4823223
### Is the model better at predicting high or low scores? Does that depend on the input data?
## Look at the distribution of scores and split them into high and low cutting efficiency.
ggplot(pred.y, aes(x=yvec)) + geom_density() + theme_classic()
pred.y.low <- subset(pred.y, pred.y$yvec < 0.25)
cor(pred.y.low$yvec, pred.y.low$pred)
# 0.2370957
pred.y.high <- subset(pred.y, pred.y$yvec > 0.25)
cor(pred.y.high$yvec, pred.y.high$pred)
# 0.3081182
### Not really. What about classifying scores as high or low, i.e. treating the score as a binary label?
pred.y.binary <- pred.y.rank %>% mutate(yvec.binary = ifelse(yvec < 0.25, 0, 1), yvec.label = ifelse(yvec < 0.25, "low", "high"))
cor(pred.y.binary$yvec.binary, pred.y.binary$pred)
# 0.4339365
ggplot(pred.y.binary, aes(x=yvec.label, y=pred, fill=yvec.label)) + geom_boxplot() + theme_classic()
pred.y.binary <- pred.y.rank %>% mutate(yvec.binary = ifelse(yvec < 0.2, 0, ifelse(yvec > 0.4, 1, 0.5)), yvec.label = ifelse(yvec < 0.2, "low (< 0.2)", ifelse(yvec > 0.4, "high (> 0.4)", "mid")))
ggplot(pred.y.binary, aes(x=yvec.label, y=pred, fill=yvec.label)) + geom_boxplot() + theme_classic()
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
# batch script saved as RIT.run (submitted with sbatch below):
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014CORRECTED.Chuai2018
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score/RIT.run
# top 10 rows of the output; columns: Feature, YVec, NormEdge, Effect, Samples, Linearity
# p18monomer.HLgap.eVraw cut.score 0.024405835175336094 -6.973813582017135e-08 8187.692 0.2760956586167866
# p17tetramer.Hbond.energyraw cut.score 0.02169399861124001 1.5233381891088458e-08 9650.78 0.22856400907187682
# p18monomer.No.electronsraw cut.score 0.021145261282387834 -6.358427541271254e-08 7186.54 0.2823827475128112
# p5tetramer.Hbond.stackingraw cut.score 0.016134459563739226 6.340779483919579e-09 3446.598 0.24080745334174244
# p3tetramer.Hlgap.eVEraw cut.score 0.015529620296296136 2.7170416771668338e-08 3525.94 0.23787769424407473
# gene.distance0 cut.score 0.015285783299130412 2.7145958692695434e-09 4248.845 0.2148993919163693
# p11tetramer.Hlgap.eVEraw cut.score 0.01502393955729636 8.05942291919635e-11 3311.552 0.22581979882306788
# p8tetramer.Hbond.stackingraw cut.score 0.014360361525453497 5.2422111591566174e-09 2841.643 0.2374983709116446
# p20monomer.No.electronsraw cut.score 0.014346761055492067 4.712984943453509e-08 7186.305 0.21207099138346416
# p13tetramer.Hbond.stackingraw cut.score 0.01425665794199756 -6.537779000631687e-09 3322.548 0.2333450315725802
library(ggplot2)
library(reshape2)
library(RColorBrewer)
library(dplyr)  # needed for %>% and mutate() below
## Main H. sapiens feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score")
imp <- read.delim("Doench2014CORRECTED.Chuai2018.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
imp.dir.top20.df$Feature.Label <- c("Monomer HL-gap pos18", "Tetramer H-bond pos17", "Monomer # of Electrons pos18", "Tetramer H-stacking pos5", "Tetramer HL-gap pos3", "Distance to Gene", "Tetramer HL-gap pos11", "Tetramer H-stacking pos8", "Monomer # of Electrons pos20", "Tetramer H-stacking pos13", "Tetramer H-stacking pos1", "Tetramer HL-gap pos17", "Tetramer H-stacking pos9", "Tetramer H-stacking pos11", "Tetramer H-bond pos14", "Tetramer HL-gap pos1", "Tetramer HL-gap pos13", "Tetramer HL-gap pos8", "Tetramer HL-gap pos7", "Tetramer H-stacking pos17")
library(ggplot2)
pdf("DoenchChuai.FeatureEngineering.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="H. sapiens Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()
library(ggplot2)
pdf("DoenchChuai.FeatureEngineering.31May.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="H. sapiens Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()
#### Figure S3: Focus on effect size
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score")
imp <- read.delim("Doench2014CORRECTED.Chuai2018.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir$absEffect <- abs(imp.dir$Feature.Effect)
imp.dir.effectsorted <- imp.dir[order(imp.dir$absEffect, decreasing = TRUE),]
imp.dir.effectsorted.top20 <- imp.dir.effectsorted[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("DoenchChuai.Top20Effect.Effect.30March.pdf")
imp.dir.effectsorted.top20$Feature.Label <- c("CTG pos2", "TCC pos15", "CC pos19", "AC pos1", "CACC pos12", "TGCA pos3", "AGAG pos10", "GATC pos1", "CACC pos7", "GCA pos1", "TCAG pos7", "GAC pos7", "ATGT pos2", "CAC pos5", "CAAT pos12", "CCTA pos9", "CACC pos2", "CTCC pos11", "GATG pos1", "GTAC pos13")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=reorder(Feature.Label, -absEffect), y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
features <- read.delim("Doench2014CORRECTED.Chuai2018.finalquantum.features.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014CORRECTED.Chuai2018.finalquantum.score.txt", header=T, sep="\t", stringsAsFactors = F)
summary(score$cut.score)
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification")
score.q1 <- score %>% mutate(cut.score = ifelse(cut.score < 0.25, 0, 1))
score.q2 <- score %>% mutate(cut.score = ifelse(cut.score < 0.50, 0, 1))
score.q3 <- score %>% mutate(cut.score = ifelse(cut.score < 0.75, 0, 1))
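# note: the *.iRFmatrix.tsv files below are written comma-separated (sep=",") despite the .tsv extension
feature.score.q1 <- left_join(score.q1, features, by="sgRNAID")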
write.table(feature.score.q1[,2:ncol(feature.score.q1)], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.iRFmatrix.tsv", quote=F, row.names=F, sep=",")
feature.score.q2 <- left_join(score.q2, features, by="sgRNAID")
write.table(feature.score.q2[,2:ncol(feature.score.q2)], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.iRFmatrix.tsv", quote=F, row.names=F, sep=",")
feature.score.q3 <- left_join(score.q3, features, by="sgRNAID")
write.table(feature.score.q3[,2:ncol(feature.score.q3)], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.iRFmatrix.tsv", quote=F, row.names=F, sep=",")
write.table(feature.score.q1[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q1[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q1[,2]), "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q2[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q2[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q2[,2]), "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q3[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q3[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q3[,2]), "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(features, "Doench2014CORRECTED.Chuai2018.finalquantum.classify.features.txt", quote=F, row.names=F, sep="\t")
write.table(features, "Doench2014CORRECTED.Chuai2018.finalquantum.classify.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(features[,2:ncol(features)], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q1 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q2 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q3 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF/Submits/submit_full_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF/Submits/submit_full_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF/Submits/submit_full_classify.q3_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF/Submits/submit_train_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF/Submits/submit_train_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF/Submits/submit_train_classify.q3_0.sh
# once the train submissions are done, run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF/Submits/submit_test_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF/Submits/submit_test_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF/Submits/submit_test_classify.q3_0.sh
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/YNames.txt classify.q1
# 0.1941056538650136
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q1_cut.score.importance4 | head
# p16tetramer.Hbond.stackingraw: 31.572
# p10tetramer.Hlgap.eVEraw: 29.6904
# p1tetramer.Hlgap.eVEraw: 29.2804
# p14tetramer.Hbond.energyraw: 28.0641
# p1tetramer.Hbond.stackingraw: 27.6199
# p18trimer.Hbond.stackingraw: 27.5189
# p5tetramer.Hbond.stackingraw: 27.0375
# p7tetramer.Hlgap.eVEraw: 26.6611
# p11tetramer.Hlgap.eVEraw: 26.4983
# p6tetramer.Hlgap.eVEraw: 25.6329
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q1_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4893618
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/YNames.txt classify.q2
# 0.1337298835000708
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q2_cut.score.importance4 | head
# gene.distance0: 15.7001
# p8tetramer.Hbond.stackingraw: 12.7385
# p5tetramer.Hbond.stackingraw: 11.9665
# p1tetramer.Hbond.stackingraw: 11.7926
# p11tetramer.Hbond.stackingraw: 11.3131
# p13tetramer.Hbond.stackingraw: 9.38647
# p9tetramer.Hbond.stackingraw: 9.28884
# p1tetramer.Hlgap.eVEraw: 9.09503
# p8tetramer.Hlgap.eVEraw: 8.49876
# p11tetramer.Hlgap.eVEraw: 8.14378
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q2_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3747989
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/YNames.txt classify.q3
# -0.011371164612896233
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q3_cut.score.importance4 | head
# gene.distance0: 3.33147
# p8tetramer.Hbond.stackingraw: 1.53449
# p3tetramer.Hbond.stackingraw: 1.2385
# p1tetramer.Hbond.stackingraw: 1.18453
# p8tetramer.Hlgap.eVEraw: 1.16985
# p13tetramer.Hbond.stackingraw: 0.786996
# p9tetramer.Hbond.stackingraw: 0.637593
# p1tetramer.Hlgap.eVEraw: 0.622084
# p17tetramer.Hbond.stackingraw: 0.619632
# p10tetramer.Hbond.stackingraw: 0.617121
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q3_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.1548719
--> Train the model on the human, Y. lipolytica, and E. coli datasets combined, then test the output on each dataset.
# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species
######################## need to normalize cut score across datasets... ########################
# z = (xi - min(x)) / (max(x) - min(x))
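The min-max formula above rescales each dataset's cut scores to the range [0, 1]. A minimal Python sketch of the same arithmetic follows. The input values are made up for illustration and `min_max_normalize` is a hypothetical helper, not part of the pipeline. Note that each dataset is scaled by its own minimum and maximum, so a single extreme score in one dataset changes the scale of all its other scores.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to [0, 1] via z = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column (max == min)")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw cut scores from two datasets measured on different scales.
doench_raw = [0.2, 0.5, 1.4]
ecoli_raw = [-3.0, -1.0, 1.0]

doench_scaled = min_max_normalize(doench_raw)
ecoli_scaled = min_max_normalize(ecoli_raw)
print(doench_scaled)
print(ecoli_scaled)
```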
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
doench$cut.score <- doench$cut.score.x
doench.cut <- doench[,c(1,1657, 3:1654, 1656)]
ncol(doench.cut)
# 1655
nrow(doench.cut)
# 1825
doench.id <- separate(doench.cut, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
summary(doench.num$cut.score)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.00000 0.04388 0.11479 0.19639 0.28086 1.00000
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
lipolytica <- lipolytica[,c(1,1656,3:1649,1651:1655,1657)]
ncol(lipolytica)
# 1655
nrow(lipolytica)
# 45271
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
summary(lipolytica.num$cut.score)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000 0.2167 0.2877 0.3389 0.4460 1.0000
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.sep <- ecoli %>% separate(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
ecoli.cas9 <- subset(ecoli.sep, ecoli.sep$type == "Cas9")
ecoli <- ecoli.cas9[,c(1:3,1658,5:1651,1653:1657,1659)]
ecoli <- ecoli %>% unite(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
ncol(ecoli)
# 1655
nrow(ecoli)
# 40468
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000 0.3563 0.5618 0.5077 0.6757 1.0000
ecoli.num.sample <- ecoli.num[sample(nrow(ecoli.num), 1000), ]
all <- rbind(doench.num, lipolytica.num, ecoli.num)
ncol(all)
# 1655
nrow(all)
# 87564
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species")
write.table(all, "doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.id.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.noDWT.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.noDWT.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.baisya.ecoli.noDWT.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
doench.num.sample <- doench.num[sample(nrow(doench.num), 1000), ]
doench.num.sample$sgRNAID <- paste0("doench_", doench.num.sample$sgRNAID)
lipolytica.num.sample <- lipolytica.num[sample(nrow(lipolytica.num), 1000), ]
lipolytica.num.sample$sgRNAID <- paste0("lipolytica_", lipolytica.num.sample$sgRNAID)
ecoli.num.sample <- ecoli.num[sample(nrow(ecoli.num), 1000), ]
ecoli.num.sample$sgRNAID <- paste0("ecoli_", ecoli.num.sample$sgRNAID)
all <- rbind(doench.num.sample, lipolytica.num.sample, ecoli.num.sample)
ncol(all)
# 1655
nrow(all)
# 3000
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species")
write.table(all, "sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.id.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "sample.doench.baisya.ecoli.noDWT.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "sample.doench.baisya.ecoli.noDWT.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "sample.doench.baisya.ecoli.noDWT.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
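The sampled run above draws the same number of rows from each species and prefixes every sgRNA ID with the species name. The sketch below shows that idea in Python with a fixed seed. The R `sample()` calls above are unseeded, so the exact subset cannot be reproduced unless `set.seed()` is called first. The row layout and the name `balanced_sample` are assumptions for illustration only.

```python
import random


def balanced_sample(datasets, n, seed=0):
    """Draw n rows without replacement from each dataset; prefix IDs with the dataset name."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    combined = []
    for name, rows in datasets.items():
        if n > len(rows):
            raise ValueError(f"{name} has only {len(rows)} rows, cannot sample {n}")
        for row in rng.sample(rows, n):
            # Copy the row so the input tables are left unchanged.
            combined.append({**row, "sgRNAID": f"{name}_{row['sgRNAID']}"})
    return combined


# Hypothetical miniature tables; the real ones have 1655 columns.
datasets = {
    "doench": [{"sgRNAID": str(i), "cut.score": i / 10} for i in range(5)],
    "ecoli": [{"sgRNAID": str(i), "cut.score": i / 20} for i in range(8)],
}
sampled = balanced_sample(datasets, n=3)
print(len(sampled), sampled[0]["sgRNAID"])
```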
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.noDWT.score.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName sample.doench.baisya.ecoli.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/sample.doench.baisya.ecoli.noDWT.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_full_doench.baisya.ecoli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample/Submits/submit_full_sample.doench.baisya.ecoli.noDWT_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_train_doench.baisya.ecoli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample/Submits/submit_train_sample.doench.baisya.ecoli.noDWT_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_test_doench.baisya.ecoli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample/Submits/submit_test_sample.doench.baisya.ecoli.noDWT_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.baisya.ecoli.noDWT
# 0.3350460737657166
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw cut.score 0.33219874783701125
# TTsgRNA.raw cut.score 0.02721536407114881
# pam.distance0 cut.score 0.025521266163737327
# p20homo_lumo_energygapraw cut.score 0.023461680283060747
# GGsgRNA.raw cut.score 0.019885541473588488
# CCsgRNA.raw cut.score 0.018555923802205367
# gene.distance0 cut.score 0.017976016828707926
# sgRNA.gcsgRNA.raw cut.score 0.016818264162424525
# sgRNA.tempsgRNA.raw cut.score 0.016453280057777794
# p20xz_quadrupoleraw cut.score 0.01605316256855977
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.noDWT_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 572.797
# TTsgRNA.raw: 47.8516
# pam.distance0: 45.4937
# GGsgRNA.raw: 35.4175
# p20yz_quadrupoleraw: 35.102
# p20xz_quadrupoleraw: 35.0073
# CCsgRNA.raw: 31.7809
# gene.distance0: 31.2655
# sgRNA.tempsgRNA.raw: 30.1631
# sgRNA.gcsgRNA.raw: 28.0852
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5848896
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.5848896
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5577428
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("sgRNA", "ID", "group"), "_")
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$group == "Cas9")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.4994605
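The Pearson and Spearman values reported above differ because Spearman is the Pearson correlation computed on ranks, with tied values sharing their average rank (the default in R's `cor(..., method="spearman")`). A minimal stdlib sketch of that relationship follows, using made-up numbers. The helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical observed cut scores and predictions.
observed = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.3, 0.5, 0.6]
print(pearson(observed, predicted), spearman(observed, predicted))
```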
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt sample.doench.baisya.ecoli.noDWT
# 0.4030488131364838
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw cut.score 0.3805553226209416
# PAM.A0 cut.score 0.07049413297499693
# GGsgRNA.raw cut.score 0.04075023565524509
# gene.distance0 cut.score 0.021694200998125943
# pam.distance0 cut.score 0.021492929092492872
# PAM.T0 cut.score 0.021038666169178367
# CGsgRNA.raw cut.score 0.017749487963395524
# PAM.G0 cut.score 0.015037086014790793
# sgRNA.tempsgRNA.raw cut.score 0.014170446749402476
# sgRNA.gcsgRNA.raw cut.score 0.011947409859095641
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/sample.doench.baisya.ecoli.noDWT_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 32.0997
# PAM.A0: 6.05361
# GGsgRNA.raw: 3.06696
# PAM.G0: 1.77555
# PAM.T0: 1.71547
# pam.distance0: 1.69503
# gene.distance0: 1.6383
# CGsgRNA.raw: 1.35543
# GsgRNA.raw: 1.25625
# TsgRNA.raw: 0.960653
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("sample.doench.baisya.ecoli.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6125933
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.6135849
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("group", "ID"), "_")
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$group == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.3820894
pred.doench <- subset(id.pred.y.group, id.pred.y.group$group == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.5427935
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$group == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.09821948
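The per-species correlations above (0.38, 0.54, 0.10) are all lower than the pooled value (0.61). One possible reason is that the species differ in mean cut score, so part of the pooled correlation reflects separating species rather than ranking guides within a species. The sketch below splits predictions by the species prefix of the ID and computes Pearson per group. It assumes IDs of the form `species_id` as built in the sampled run. The data and function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def correlation_by_group(ids, observed, predicted):
    """Group rows by the text before the first underscore and correlate within each group."""
    groups = defaultdict(lambda: ([], []))
    for sample_id, obs, pred in zip(ids, observed, predicted):
        species = sample_id.split("_", 1)[0]
        groups[species][0].append(obs)
        groups[species][1].append(pred)
    return {sp: pearson(o, p) for sp, (o, p) in groups.items()}


# Hypothetical test-set rows: IDs, observed cut scores, model predictions.
ids = ["ecoli_1", "ecoli_2", "ecoli_3", "doench_7", "doench_8", "doench_9"]
observed = [0.1, 0.5, 0.9, 0.2, 0.4, 0.6]
predicted = [0.2, 0.4, 0.6, 0.9, 0.5, 0.1]

by_species = correlation_by_group(ids, observed, predicted)
print(by_species)
print(pearson(observed, predicted))
```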
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score doench.baisya.ecoli.noDWT
# python /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/ritEval.py doench.baisya.ecoli.noDWT_cut.score.importance4 cut.score
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/RIT.run
sort -k3rg doench.baisya.ecoli.noDWT_cut.score.importance4.effect | head
# Feature YVec NormEdge FeatureEffect Samples Linearity
# sgRNA.structuresgRNA.raw cut.score 0.33219874783701114 1.5015673448633625e-06 171364.954 0.3852079008244159
# TTsgRNA.raw cut.score 0.027215364071148804 2.7005144780116256e-06 91778.339 0.3869038711036754
# pam.distance0 cut.score 0.02552126616373732 -1.7832494531292952e-06 39195.414 0.3945925333338561
# p20homo_lumo_energygapraw cut.score 0.02346168028306074 2.4595695339546074e-06 20252.915 0.4824571403940234
# GGsgRNA.raw cut.score 0.01988554147358848 -1.0526693263682948e-06 33003.564 0.45044545253435986
# CCsgRNA.raw cut.score 0.018555923802205363 -1.8442117417059407e-06 37633.404 0.4250683975531926
# gene.distance0 cut.score 0.017976016828707923 -1.7360295260056e-07 83554.548 0.321049884310981
# sgRNA.gcsgRNA.raw cut.score 0.016818264162424518 -1.3087550076601961e-06 35681.872 0.4454515249372095
# sgRNA.tempsgRNA.raw cut.score 0.01645328005777779 -1.5365531662233492e-06 35027.959 0.4443609858962564
# p20xz_quadrupoleraw cut.score 0.016053162568559768 -2.058420202656411e-06 16640.759 0.4722778763484967
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.id.score.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar")
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
f = plt.figure()
shap.summary_plot(shap_values, X_train)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
### Summit
#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J multi.ecoli_0
#BSUB -o multi.ecoli_0.o%J
#BSUB -e multi.ecoli_0.e%J
#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9
#/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold0/Runs/Set0/doench.baisya.ecoli.noDWT_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix multi.ecoli --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/ > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/multi.ecoli_test.o
/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.noDWT_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix multi.ecoli --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/ > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/multi.ecoli_test4.o
# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/multi.ecoli_test_submit.sh
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
score <- read.delim("e.coli.cas9.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9")
predict <- read.delim("multi.ecoli.prediction", header=T, sep="\t")
score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.7084526
pdf("multi.ecoli.prediction.scatter.pdf")
library(ggplot2)
ggplot(score.predict, aes(x=cut.score, y=Predictions.)) + geom_point() + theme_classic()
dev.off()
### Summit
#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J multi.lipolytica_0
#BSUB -o multi.lipolytica_0.o%J
#BSUB -e multi.lipolytica_0.e%J
#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica/
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica
/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold0/Runs/Set0/doench.baisya.ecoli.noDWT_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix multi.lipolytica --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica/ > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica/multi.lipolytica_test.o
# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/multi.lipolytica_test_submit.sh
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
score <- read.delim("y.lipolytica.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica")
predict <- read.delim("multi.lipolytica.prediction", header=T, sep="\t")
score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.7169777
pdf("multi.lipolytica.prediction.scatter.pdf")
library(ggplot2)
ggplot(score.predict, aes(x=cut.score, y=Predictions.)) + geom_point() + theme_classic()
dev.off()
### Summit
#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J multi.doench_0
#BSUB -o multi.doench_0.o%J
#BSUB -e multi.doench_0.e%J
#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench/
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench
/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold0/Runs/Set0/doench.baisya.ecoli.noDWT_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix multi.doench --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench/ > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench/multi.doench_test.o
# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/multi.doench_test_submit.sh
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
score <- read.delim("Doench2014.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench")
predict <- read.delim("multi.doench.prediction", header=T, sep="\t")
score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.7219359
pdf("multi.doench.prediction.scatter.pdf")
library(ggplot2)
ggplot(score.predict, aes(x=cut.score, y=Predictions.)) + geom_point() + theme_classic()
dev.off()
# salloc -A SYB105 -N 2 -t 4:00:00
# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species
######################## need to normalize cut score across datasets... ########################
# z = (xi - min(x)) / (max(x) - min(x))
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(doench)
# 6173
nrow(doench)
# 673
doench.num <- mutate_all(doench[,2:ncol(doench)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
doench.num <- cbind(data.frame("sgRNAID" = doench$sgRNAID), doench.num)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(lipolytica)
# 6173
nrow(lipolytica)
# 45271
names(lipolytica)[names(lipolytica) == 'cut.score.x'] <- 'cut.score'
lipolytica.num <- mutate_all(lipolytica[,2:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
lipolytica.num <- cbind(data.frame("sgRNAID" = lipolytica$sgRNAID), lipolytica.num)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(ecoli)
# 6173
nrow(ecoli)
# 40468
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
###### need to adjust sgRNA IDs to include species name...
###### need to subset data so we are taking equal number of samples from each species (limited by Doench dataset)
doench.num.sample <- doench.num[sample(nrow(doench.num), 673), ]
doench.num.sample$sgRNAID <- paste0("doench_", doench.num.sample$sgRNAID)
lipolytica.num.sample <- lipolytica.num[sample(nrow(lipolytica.num), 673), ]
lipolytica.num.sample$sgRNAID <- paste0("lipolytica_", lipolytica.num.sample$sgRNAID)
ecoli.num.sample <- ecoli.num[sample(nrow(ecoli.num), 673), ]
ecoli.num.sample$sgRNAID <- paste0("ecoli_", ecoli.num.sample$sgRNAID)
#doench.lipolytica <- dplyr::bind_rows(doench.num.sample, lipolytica.num.sample)
#all <- dplyr::bind_rows(doench.lipolytica, ecoli.num.sample)
d.names <- names(doench.num.sample)
l.names <- names(lipolytica.num.sample)
e.names <- names(ecoli.num.sample)
setdiff(d.names, l.names)
setdiff(l.names, e.names)
setdiff(e.names, l.names)
# rename V306..V320 in the E. coli table by adding the ".x" suffix
for (i in 306:320) {
  names(ecoli.num.sample)[names(ecoli.num.sample) == paste0("V", i, "sgRNA.raw")] <- paste0("V", i, ".xsgRNA.raw")
}
names(ecoli.num.sample)[names(ecoli.num.sample) == 'V81sgRNA.raw'] <- 'V81.x.xsgRNA.raw'
ecoli.num.sample$V81.y.ysgRNA.raw <- 0
e.names <- names(ecoli.num.sample)
setdiff(l.names, e.names)
setdiff(e.names, l.names)
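The `setdiff` checks and renames above reconcile column names so that the three tables can be stacked. The same pattern can be sketched in Python: rename known mismatches, report what is still missing or extra, and fill missing columns with a default. Filling with 0 mirrors the `V81.y.ysgRNA.raw <- 0` step above and assumes zero is a sensible value for that feature. The function `reconcile_columns` and the tiny example table are hypothetical.

```python
def reconcile_columns(reference_cols, rows, rename=None, fill=0):
    """Align rows (list of dicts) to reference_cols.

    Returns (aligned_rows, missing, extra), where missing are reference
    columns absent after renaming and extra are columns not in the reference.
    """
    rename = rename or {}
    renamed = [{rename.get(k, k): v for k, v in row.items()} for row in rows]
    present = set().union(*(r.keys() for r in renamed)) if renamed else set()
    missing = [c for c in reference_cols if c not in present]
    extra = sorted(present - set(reference_cols))
    # Keep only reference columns, in reference order, filling gaps.
    aligned = [{c: r.get(c, fill) for c in reference_cols} for r in renamed]
    return aligned, missing, extra


# Column names taken from the notes above; values are made up.
reference = ["sgRNAID", "V306.xsgRNA.raw", "V81.y.ysgRNA.raw"]
ecoli_rows = [{"sgRNAID": "ecoli_1", "V306sgRNA.raw": 0.5}]
rename = {"V306sgRNA.raw": "V306.xsgRNA.raw"}

aligned, missing, extra = reconcile_columns(reference, ecoli_rows, rename)
print(aligned, missing, extra)
```

Reporting `missing` and `extra` before stacking makes a silent column mismatch visible, which is the role the repeated `setdiff` calls play above.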
all <- rbind(doench.num.sample, lipolytica.num.sample, ecoli.num.sample)
ncol(all)
# 6174
nrow(all)
# 2019
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species")
write.table(all, "doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.id.score.txt", quote=F, row.names=F, sep="\t")
# var --> doench=0.03670343, ecoli=0.04583629, lipolytica=0.02404219
# sd --> doench=0.1915814, ecoli=0.2140941, lipolytica=0.1550555
# mean --> doench=0.1797033, ecoli=0.5210303, lipolytica=0.3284551
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.18jan.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.18jan.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.baisya.ecoli.18jan.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
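The files written above follow one pattern: column 1 is the sample ID, column 2 is `cut.score`, and the remaining columns are features. A minimal Python sketch of that split on a made-up three-row table; the `split_table` helper and its dictionary keys are illustrative names, not part of the iRF tooling.

```python
def split_table(header, rows):
    """Split a table whose first column is the sample ID and second is the score."""
    def pick(cols):
        # Return the header plus every row restricted to the chosen column indices.
        return [[header[c] for c in cols]] + [[row[c] for c in cols] for row in rows]

    n = len(header)
    return {
        "features": pick([0] + list(range(2, n))),        # R: all[, c(1, 3:ncol(all))]
        "features_noSampleIDs": pick(list(range(2, n))),  # R: all[, 3:ncol(all)]
        "score": pick([0, 1]),                            # R: all[, 1:2]
        "score_noSampleIDs": pick([1]),                   # R: all[, 2]
    }

# Toy data (values invented for illustration only).
header = ["sgRNAID", "cut.score", "GGsgRNA.raw", "pam.distance0"]
rows = [
    ["doench_1", 0.18, 1, 4],
    ["ecoli_7", 0.52, 0, 9],
    ["lipolytica_3", 0.33, 1, 2],
]
tables = split_table(header, rows)
print(tables["features"][0])
```

Keeping the row order identical across all four outputs matters, because the `noSampleIDs` files can only be matched back to their IDs by position.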
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.18jan --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_full_doench.baisya.ecoli.18jan_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_train_doench.baisya.ecoli.18jan_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_test_doench.baisya.ecoli.18jan_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt
# file contents (single line): cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.baisya.ecoli.18jan
# 0.4228729850812137
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw cut.score 0.462981027783319
# pam.distance0 cut.score 0.04477021223536862
# GGsgRNA.raw cut.score 0.03985650177869358
# p20No_electronsraw cut.score 0.02293728111265287
# sgRNA.tempsgRNA.raw cut.score 0.018801638583876092
# sgRNA.gcsgRNA.raw cut.score 0.018569414804966544
# CCsgRNA.raw cut.score 0.018239097698237207
# p20HL.gap_eVraw cut.score 0.018139195557280316
# p20LUMO_eVraw cut.score 0.014351123429771385
# p20HOMO_eVraw cut.score 0.013971272670979858
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.18jan_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 25.9978
# pam.distance0: 2.82849
# GGsgRNA.raw: 2.47276
# p20No_electronsraw: 1.47081
# CCsgRNA.raw: 1.24198
# p20HOMO_eVraw: 1.21726
# sgRNA.gcsgRNA.raw: 1.03332
# sgRNA.tempsgRNA.raw: 0.957661
# p20HL.gap_eVraw: 0.881469
# p20LUMO_eVraw: 0.778138
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.18jan_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.6810235
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6810235
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.6804178
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.5247512
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# -0.0006011889
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.4311619
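The per-species correlations above can be gathered in one pass. A minimal Python sketch, assuming sample IDs carry a `species_` prefix as in the sampled dataset; the `pearson` and `by_species` helpers and the toy values are illustrative and do not come from the iRF output.

```python
import math
from collections import defaultdict

def pearson(x, y):
    """Pearson r; returns None when it is undefined (under 2 points or zero variance)."""
    n = len(x)
    if n < 2:
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

def by_species(ids, observed, predicted):
    """Group rows by the prefix before the first underscore and correlate each group."""
    groups = defaultdict(lambda: ([], []))
    for sid, obs, pred in zip(ids, observed, predicted):
        species = sid.split("_", 1)[0]
        groups[species][0].append(obs)
        groups[species][1].append(pred)
    return {sp: pearson(o, p) for sp, (o, p) in groups.items()}

# Toy values, invented for illustration.
ids = ["ecoli_1", "ecoli_2", "ecoli_3", "doench_1", "doench_2", "lipolytica_1"]
observed = [0.1, 0.5, 0.9, 0.2, 0.4, 0.3]
predicted = [0.2, 0.4, 0.8, 0.5, 0.3, 0.3]
result = by_species(ids, observed, predicted)
for species, r in result.items():
    print(species, r)
```

Returning `None` for groups that are too small keeps a single-row species from raising an error or being mistaken for a real correlation.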
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score doench.baisya.ecoli.18jan
# python /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/ritEval.py doench.baisya.ecoli.18jan_cut.score.importance4 cut.score
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/RIT.run.18jan
sort -k3rg doench.baisya.ecoli.18jan_cut.score.importance4.effect | head
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.id.score.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
f = plt.figure()
# show=False keeps the figure open so savefig writes the plot instead of an empty canvas
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
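The bar summary plot orders features by their mean absolute SHAP value across samples. A minimal standard-library sketch of that ranking, useful for getting the order as text instead of a PNG; the `rank_by_mean_abs` helper, the feature names, and the numbers are invented for illustration, and real input would be the `shap_values` array.

```python
def rank_by_mean_abs(shap_rows, feature_names):
    """Rank features by mean |SHAP| over samples (rows = samples, columns = features)."""
    n = len(shap_rows)
    means = [
        sum(abs(row[j]) for row in shap_rows) / n
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)

# Toy SHAP matrix: 3 samples x 3 features (values are made up).
rows = [
    [0.30, -0.02, 0.01],
    [-0.10, 0.04, 0.00],
    [0.20, -0.06, -0.02],
]
names = ["sgRNA.structure", "pam.distance", "GG"]
ranking = rank_by_mean_abs(rows, names)
for name, value in ranking:
    print(name, round(value, 4))
```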
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
# split each Cas9 group into two groups...
# add --group tag
# add --sampleSize tag
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/")
df <- read.delim("doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.txt")
df.id <- data.frame(df$sgRNAID)
library(tidyr)
df.sep <- separate(df.id, df.sgRNAID, c("species", "sgRNAID"), sep="_")
df.ecoli <- subset(df.sep, df.sep$species == "ecoli")
# 673 / 2 = 336.5
df.lipolytica <- subset(df.sep, df.sep$species == "lipolytica")
df.doench <- subset(df.sep, df.sep$species == "doench")
df.ecoli.1 <- df.ecoli[1:336,]
df.ecoli.2 <- df.ecoli[337:673,]
df.lipolytica.1 <- df.lipolytica[1:336,]
df.lipolytica.2 <- df.lipolytica[337:673,]
df.doench.1 <- df.doench[1:336,]
df.doench.2 <- df.doench[337:673,]
df.ecoli.1$group <- "ecoli.group1"
df.ecoli.2$group <- "ecoli.group2"
df.lipolytica.1$group <- "lipolytica.group1"
df.lipolytica.2$group <- "lipolytica.group2"
df.doench.1$group <- "doench.group1"
df.doench.2$group <- "doench.group2"
df1 <- rbind(df.ecoli.1, df.ecoli.2)
df2 <- rbind(df1, df.lipolytica.1)
df3 <- rbind(df2, df.lipolytica.2)
df4 <- rbind(df3, df.doench.1)
df5 <- rbind(df4, df.doench.2)
library(dplyr)
df.order <- left_join(df.sep, df5, by=c("sgRNAID", "species"))
df.group <- data.frame(df.order$group)
colnames(df.group) <- "groupID"
write.table(df.group, "doench.baisya.ecoli.18jan.groupfile.txt", quote=F, row.names=F, sep="\t")
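The group file above gives each species two labels, first half and second half, while keeping the original row order. A minimal Python sketch of the same assignment, assuming `species_` prefixed IDs and 673 rows per species as in this dataset; `assign_groups` is a hypothetical helper name and the IDs are synthetic.

```python
from collections import Counter, defaultdict

def assign_groups(sample_ids):
    """Label each row '<species>.group1' or '<species>.group2', preserving row order."""
    rows_by_species = defaultdict(list)
    for idx, sid in enumerate(sample_ids):
        rows_by_species[sid.split("_", 1)[0]].append(idx)

    groups = [None] * len(sample_ids)
    for species, rows in rows_by_species.items():
        half = len(rows) // 2  # 673 rows -> 336 in group1, 337 in group2
        for rank, idx in enumerate(rows):
            groups[idx] = f"{species}.group{1 if rank < half else 2}"
    return groups

# Synthetic IDs standing in for the real sgRNAID column.
ids = [f"{sp}_{i}" for sp in ("doench", "lipolytica", "ecoli") for i in range(673)]
groups = assign_groups(ids)
counts = Counter(groups)
print(counts)
```

Because labels are written back by original row index, the output stays aligned with the feature file even if the species are interleaved.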
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/
mkdir group.features
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 120 --Account SYB105 --NumTrees 1000 --NumIterations 6 --RunName doench.baisya.ecoli.group --bypass --Prediction --sampleSize 20000 --groupFile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.groupfile.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/Submits/submit_full_doench.baisya.ecoli.group_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/Submits/submit_train_doench.baisya.ecoli.group_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/Submits/submit_test_doench.baisya.ecoli.group_0.sh
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 6 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.baisya.ecoli.group
# 0.09236658363398188
# correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.group_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.4468816
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5049132
# correlation - by Cas9 type
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 336
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.4468816
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# NA
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# NA
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.4468816
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set3")
pred <- read.delim("doench.baisya.ecoli.group_Set3_test.prediction", header=T, sep="\t")
y <- read.delim("set3_Y_test_noSampleIDs.txt", header=T, sep="\t")
id <- read.delim("set3_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 632
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.4252533
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.3815656
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# -0.103618
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# NA
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set2")
pred <- read.delim("doench.baisya.ecoli.group_Set2_test.prediction", header=T, sep="\t")
y <- read.delim("set2_Y_test_noSampleIDs.txt", header=T, sep="\t")
id <- read.delim("set2_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 337
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.5566255
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# NA
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# NA
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.5566255
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set1")
pred <- read.delim("doench.baisya.ecoli.group_Set1_test.prediction", header=T, sep="\t")
y <- read.delim("set1_Y_test_noSampleIDs.txt", header=T, sep="\t")
id <- read.delim("set1_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 337
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.0346217
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# NA
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.0346217
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# NA
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set0")
pred <- read.delim("doench.baisya.ecoli.group_Set0_test.prediction", header=T, sep="\t")
y <- read.delim("set0_Y_test_noSampleIDs.txt", header=T, sep="\t")
id <- read.delim("set0_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 377
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.2406913
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.2406913
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# NA
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# NA
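In the outputs above, whenever one species matches the whole-set correlation (for example doench in Set4), the other two are NA. That suggests each grouped test set contains only some of the species; this is an inference from the numbers, not something the iRF scripts report. A quick count of species per test set would confirm it. The sketch below uses invented IDs and a hypothetical `species_counts` helper.

```python
from collections import Counter

def species_counts(sample_ids):
    """Count rows per species, taking the species as the prefix before the first underscore."""
    return Counter(sid.split("_", 1)[0] for sid in sample_ids)

# Invented IDs; real input would be the setN_test_SampleIDs.txt files.
test_sets = {
    "Set3": ["ecoli_1", "ecoli_2", "lipolytica_5"],
    "Set4": ["doench_1", "doench_2", "doench_3"],
}
summary = {}
for name, ids in test_sets.items():
    counts = species_counts(ids)
    missing = [sp for sp in ("doench", "ecoli", "lipolytica") if counts[sp] == 0]
    summary[name] = missing
    print(name, dict(counts), "no rows for:", missing)
```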
--> use human, y.lipolytica, and e.coli to train the model --> then test the output on each dataset
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated
######################## need to normalize cut score across datasets... ########################
# z = (xi - min(x)) / (max(x) - min(x))
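A minimal Python sketch of the min-max formula above, applied separately to each dataset so that every score column ends up on [0, 1]. The `min_max` helper and the toy scores are illustrative; the guard for a constant column is an added assumption, since the formula divides by zero in that case.

```python
def min_max(values):
    """Scale values to [0, 1] with z = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant column cannot be min-max scaled")
    return [(v - lo) / (hi - lo) for v in values]

# Toy scores on different raw scales (invented values).
scores = {"doench": [2.0, 4.0, 10.0], "ecoli": [-3.0, 0.0, 1.0]}
scaled = {name: min_max(vals) for name, vals in scores.items()}
print(scaled)
```

Scaling within each dataset removes differences in range but not in distribution shape, so the per-dataset means can still differ after normalization.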
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(doench)
# 6161
nrow(doench)
# 673
doench.id <- separate(doench, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
summary(doench.num$cut.score)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.00000 0.03967 0.10641 0.17970 0.26133 1.00000
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(lipolytica)
# 6161
nrow(lipolytica)
# 45271
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
summary(lipolytica.num$cut.score)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000 0.2167 0.2877 0.3389 0.4460 1.0000
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(ecoli)
# 6160
nrow(ecoli)
# 40468
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000 0.3563 0.5618 0.5077 0.6757 1.0000
ecoli.num.sample <- ecoli.num[sample(nrow(ecoli.num), 1000), ]
#### columns don't match... find which columns and remove?
ecoli.names <- names(ecoli.num)
lipolytica.names <- names(lipolytica.num)
doench.names <- names(doench.num)
setdiff(lipolytica.names, doench.names)
# character(0)
setdiff(lipolytica.names, ecoli.names)
# [1] "V306.xsgRNA.raw" "V307.xsgRNA.raw" "V308.xsgRNA.raw" "V309.xsgRNA.raw"
# [5] "V310.xsgRNA.raw" "V311.xsgRNA.raw" "V312.xsgRNA.raw" "V313.xsgRNA.raw"
# [9] "V314.xsgRNA.raw" "V315.xsgRNA.raw" "V316.xsgRNA.raw" "V317.xsgRNA.raw"
# [13] "V318.xsgRNA.raw" "V319.xsgRNA.raw" "V320.xsgRNA.raw" "V81.x.xsgRNA.raw"
# [17] "V81.y.ysgRNA.raw"
setdiff(ecoli.names, lipolytica.names)
# [1] "V306sgRNA.raw" "V307sgRNA.raw" "V308sgRNA.raw" "V309sgRNA.raw"
# [5] "V310sgRNA.raw" "V311sgRNA.raw" "V312sgRNA.raw" "V313sgRNA.raw"
# [9] "V314sgRNA.raw" "V315sgRNA.raw" "V316sgRNA.raw" "V317sgRNA.raw"
# [13] "V318sgRNA.raw" "V319sgRNA.raw" "V320sgRNA.raw" "V81sgRNA.raw"
# drop the columns that are not shared across datasets (the setdiff results above)
ecoli.only <- setdiff(ecoli.names, lipolytica.names)
lipolytica.only <- setdiff(lipolytica.names, ecoli.names)
ecoli.num.df <- ecoli.num[, !(names(ecoli.num) %in% ecoli.only)]
lipolytica.num.df <- lipolytica.num[, !(names(lipolytica.num) %in% lipolytica.only)]
doench.num.df <- doench.num[, !(names(doench.num) %in% lipolytica.only)]
all <- rbind(doench.num.df, lipolytica.num.df, ecoli.num.df)
ncol(all)
# 6144
nrow(all)
# 86412
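The column reconciliation above amounts to keeping the intersection of the column names and dropping the rest from each table. A minimal Python sketch on shortened, made-up column lists; `reconcile` is a hypothetical helper and only the naming pattern (`V306.xsgRNA.raw` versus `V306sgRNA.raw`) is taken from the output above.

```python
def reconcile(tables):
    """tables maps dataset name -> list of column names.

    Returns the shared columns (in the first table's order) and, per dataset,
    the columns that have to be dropped.
    """
    names = list(tables)
    shared = set(tables[names[0]])
    for name in names[1:]:
        shared &= set(tables[name])
    keep = [col for col in tables[names[0]] if col in shared]
    dropped = {name: [c for c in tables[name] if c not in shared] for name in names}
    return keep, dropped

# Shortened column lists for illustration.
columns = {
    "doench": ["sgRNAID", "cut.score", "V306.xsgRNA.raw", "GGsgRNA.raw"],
    "lipolytica": ["sgRNAID", "cut.score", "V306.xsgRNA.raw", "GGsgRNA.raw"],
    "ecoli": ["sgRNAID", "cut.score", "V306sgRNA.raw", "GGsgRNA.raw"],
}
keep, dropped = reconcile(columns)
print(keep)
print(dropped)
```

Dropping mismatched columns is the conservative choice; renaming them to a common name, as done for the January run, keeps the features but relies on the columns truly being the same quantity.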
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated")
write.table(all, "doench.baisya.ecoli.finalquantum.noncorrelated.features.id.score.txt", quote=F, row.names=F, sep="\t")
####### START HERE #######
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated")
all <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated.features.id.score.txt", header=T, sep="\t", stringsAsFactors = F)
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.finalquantum.noncorrelated.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.finalquantum.noncorrelated.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.finalquantum.noncorrelated.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.baisya.ecoli.finalquantum.noncorrelated.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
doench.num.sample <- doench.num.df[sample(nrow(doench.num.df), 600), ]
doench.num.sample$sgRNAID <- paste0("doench_", doench.num.sample$sgRNAID)
lipolytica.num.sample <- lipolytica.num.df[sample(nrow(lipolytica.num.df), 600), ]
lipolytica.num.sample$sgRNAID <- paste0("lipolytica_", lipolytica.num.sample$sgRNAID)
ecoli.num.sample <- ecoli.num.df[sample(nrow(ecoli.num.df), 600), ]
ecoli.num.sample$sgRNAID <- paste0("ecoli_", ecoli.num.sample$sgRNAID)
all <- rbind(doench.num.sample, lipolytica.num.sample, ecoli.num.sample)
ncol(all)
# 6144
nrow(all)
# 1800
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated")
write.table(all, "sample.doench.baisya.ecoli.finalquantum.noncorrelated.features.id.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "sample.doench.baisya.ecoli.finalquantum.noncorrelated.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.finalquantum.noncorrelated --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.baisya.ecoli.finalquantum.noncorrelated.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.baisya.ecoli.finalquantum.noncorrelated.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName sample.doench.baisya.ecoli.finalquantum.noncorrelated --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/sample.doench.baisya.ecoli.finalquantum.noncorrelated.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/sample.doench.baisya.ecoli.finalquantum.noncorrelated.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/Submits/submit_full_doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample/Submits/submit_full_sample.doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/Submits/submit_train_doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample/Submits/submit_train_sample.doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/Submits/submit_test_doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample/Submits/submit_test_sample.doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt
# file contents (single line): cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt doench.baisya.ecoli.finalquantum.noncorrelated
# 0.31954645322248604
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.finalquantum.noncorrelated_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 528.312
# p20basepair.Hbond.energyraw: 84.5808
# p19dimer.Hbond.stackingraw: 43.9609
# p18trimer.Hbond.stackingraw: 37.1693
# TTsgRNA.raw: 29.1179
# p18dimer.Hbond.stackingraw: 28.198
# p15tetramer.Hbond.stackingraw: 26.2502
# p11tetramer.Hbond.stackingraw: 24.5269
# p13tetramer.Hbond.stackingraw: 24.1983
# p1tetramer.Hbond.stackingraw: 23.2881
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5687324
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.5687324
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5455017
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("sgRNA", "ID", "group"), "_")
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$group == "Cas9")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.4936689
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt sample.doench.baisya.ecoli.finalquantum.noncorrelated
# 0.4209199735102055
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/sample.doench.baisya.ecoli.finalquantum.noncorrelated_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 23.207
# pam.distance0: 2.61693
# p15dimer.Hbond.stackingraw: 2.50501
# GGsgRNA.raw: 1.92695
# p13tetramer.Hbond.stackingraw: 1.67273
# p20monomer.No.electronsraw: 1.51487
# p20monomer.HLgap.eVraw: 1.40704
# p19dimer.Hbond.stackingraw: 1.31469
# p17tetramer.Hbond.stackingraw: 1.21605
# sgRNA.tempsgRNA.raw: 0.876837
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("sample.doench.baisya.ecoli.finalquantum.noncorrelated_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6068327
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5982605
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("group", "ID"), "_")
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$group == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.2953007
pred.doench <- subset(id.pred.y.group, id.pred.y.group$group == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.6683965
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$group == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.02491944
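# The per-group correlations above are computed one subset at a time. As a rough
# sketch, the same Pearson r can be computed for every group in one pass with awk.
# This assumes a headerless, tab-delimited table with columns group, observed,
# predicted, which would have to be assembled from the sample ID, Y and prediction
# files first. pearson_by_group is a hypothetical helper.

```shell
# pearson_by_group [FILE]: print group<TAB>n<TAB>r for each group (reads stdin if no file).
pearson_by_group() {
    awk -F '\t' '
        {
            g = $1; x = $2; y = $3
            # Accumulate the sums needed for the Pearson formula, per group.
            n[g]++; sx[g] += x; sy[g] += y
            sxx[g] += x * x; syy[g] += y * y; sxy[g] += x * y
        }
        END {
            for (g in n) {
                num = n[g] * sxy[g] - sx[g] * sy[g]
                den = sqrt((n[g] * sxx[g] - sx[g] * sx[g]) * (n[g] * syy[g] - sy[g] * sy[g]))
                # r is undefined when either variable is constant within the group.
                if (den > 0) printf "%s\t%d\t%.4f\n", g, n[g], num / den
                else printf "%s\t%d\tNA\n", g, n[g]
            }
        }' "$@" | sort
}
```

# usage: pearson_by_group group_y_pred.tsv   (group_y_pred.tsv is a placeholder file name)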
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score doench.baisya.ecoli.finalquantum.noncorrelated
# python /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/ritEval.py doench.baisya.ecoli.finalquantum.noncorrelated_cut.score.importance4 cut.score
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/cut.score/RIT.run
sort -k3rg doench.baisya.ecoli.finalquantum.noncorrelated_cut.score.importance4.effect | head
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
doench.id <- separate(doench, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
# drop V306-V320 and V81 by exact column name (the lipolytica and doench tables use .x/.y suffixed names)
drop.ecoli <- c(paste0("V", 306:320, "sgRNA.raw"), "V81sgRNA.raw")
drop.suffixed <- c(paste0("V", 306:320, ".xsgRNA.raw"), "V81.x.xsgRNA.raw", "V81.y.ysgRNA.raw")
ecoli.num.df <- ecoli.num[, !(names(ecoli.num) %in% drop.ecoli)]
lipolytica.num.df <- lipolytica.num[, !(names(lipolytica.num) %in% drop.suffixed)]
doench.num.df <- doench.num[, !(names(doench.num) %in% drop.suffixed)]
ecoli.num.df$species <- 1
lipolytica.num.df$species <- 2
doench.num.df$species <- 3
all <- rbind(doench.num.df, lipolytica.num.df, ecoli.num.df)
all$cut.score <- as.numeric(all$cut.score)
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.species.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.species.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.finalquantum.noncorrelated.species.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.finalquantum.noncorrelated.species.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.finalquantum.noncorrelated.species.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.baisya.ecoli.finalquantum.noncorrelated.species.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
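# Each data set's cut.score is min-max scaled before the tables are combined, so the
# written score file should span 0 to 1. A small sketch for checking that from the
# shell, assuming a tab-delimited file with a header row. col_range is a hypothetical
# helper, and non-numeric entries are read as 0 by awk.

```shell
# col_range FILE COLUMN_NAME: print min<TAB>max of the named column.
col_range() {
    local file="$1" col="$2"
    awk -F '\t' -v col="$col" '
        NR == 1 {
            # Locate the requested column in the header line.
            for (i = 1; i <= NF; i++) if ($i == col) c = i
            if (!c) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        {
            v = $c + 0
            if (!seen || v < min) min = v
            if (!seen || v > max) max = v
            seen = 1
        }
        END { if (seen) print min "\t" max }' "$file"
}
```

# usage: col_range doench.baisya.ecoli.finalquantum.noncorrelated.species.score.txt cut.score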
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.finalquantum.noncorrelated.species --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.baisya.ecoli.finalquantum.noncorrelated.species.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.baisya.ecoli.finalquantum.noncorrelated.species.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species/Submits/submit_full_doench.baisya.ecoli.finalquantum.noncorrelated.species_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species/Submits/submit_train_doench.baisya.ecoli.finalquantum.noncorrelated.species_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species/Submits/submit_test_doench.baisya.ecoli.finalquantum.noncorrelated.species_0.sh
#### need to make species feature numeric??
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.baisya.ecoli.finalquantum.noncorrelated.species
# 0.32658425662799506
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.finalquantum.noncorrelated.species_cut.score.importance4 | head
# species: 518.485
# p20basepair.Hbond.energyraw: 86.6616
# p19dimer.Hbond.stackingraw: 44.8908
# p18trimer.Hbond.stackingraw: 36.5359
# p18dimer.Hbond.stackingraw: 29.2181
# p13tetramer.Hbond.stackingraw: 28.2456
# TTsgRNA.raw: 26.6635
# p15tetramer.Hbond.stackingraw: 26.4083
# p11tetramer.Hbond.stackingraw: 24.6197
# p1tetramer.Hbond.stackingraw: 23.3195
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated.species_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.573719
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5507857
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated.species.features.txt", header=T, sep="\t")
# take sgRNAID (column 1) and species (column 6144) from the features table, not from id
species.df <- species[,c(1,6144)]
library(dplyr)
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")
pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
# 8095
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.4936689
pred.lipolytica <- subset(id.pred.y.species, id.pred.y.species$species == 2)
# 9108
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.3320421
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 3)
# 191
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.5422075
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
doench.id <- separate(doench, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
# drop V306-V320 and V81 by exact column name (the lipolytica and doench tables use .x/.y suffixed names)
drop.ecoli <- c(paste0("V", 306:320, "sgRNA.raw"), "V81sgRNA.raw")
drop.suffixed <- c(paste0("V", 306:320, ".xsgRNA.raw"), "V81.x.xsgRNA.raw", "V81.y.ysgRNA.raw")
ecoli.num.df <- ecoli.num[, !(names(ecoli.num) %in% drop.ecoli)]
lipolytica.num.df <- lipolytica.num[, !(names(lipolytica.num) %in% drop.suffixed)]
doench.num.df <- doench.num[, !(names(doench.num) %in% drop.suffixed)]
ecoli.num.df$species <- 1
lipolytica.num.df$species <- 2
doench.num.df$species <- 3
doench.num.sample <- doench.num.df[sample(nrow(doench.num.df), 500), ]
lipolytica.num.sample <- lipolytica.num.df[sample(nrow(lipolytica.num.df), 500), ]
ecoli.num.sample <- ecoli.num.df[sample(nrow(ecoli.num.df), 500), ]
all <- rbind(doench.num.sample, lipolytica.num.sample, ecoli.num.sample)
all$cut.score <- as.numeric(all$cut.score)
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
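# The sample() calls above draw 500 rows per data set without a fixed seed, so the
# draw cannot be repeated exactly. A sketch of an equal-per-group draw with an
# explicit seed, working on an already written tab-delimited table with a header.
# balanced_sample and its arguments are hypothetical, and awk's random generator is
# used, so the rows chosen will not match any draw made in R.

```shell
# balanced_sample FILE GROUP_COL N [SEED]: header plus up to N random rows per group.
balanced_sample() {
    local file="$1" col="$2" n="$3" seed="${4:-1}"
    local tab
    tab="$(printf '\t')"
    head -n 1 "$file"
    # 1) prefix each data row with its group value and a random key,
    # 2) sort by group, then by the random key,
    # 3) keep the first N rows of each group,
    # 4) strip the two helper columns again.
    tail -n +2 "$file" |
        awk -F '\t' -v c="$col" -v seed="$seed" \
            'BEGIN { srand(seed) } { print $c "\t" rand() "\t" $0 }' |
        sort -t "$tab" -k1,1 -k2,2g |
        awk -F '\t' -v n="$n" '++count[$1] <= n' |
        cut -f 3-
}
```

# usage: balanced_sample features.txt 6144 500 42   (file name, column index and seed are placeholders)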
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.finalquantum.noncorrelated.species.equal --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.baisya.ecoli.finalquantum.noncorrelated.species.score.txt
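# The features and score tables are written separately, so it can help to confirm
# that their sample IDs line up before submitting. A minimal sketch, assuming both
# files are tab-delimited with a header line and the sample ID in the first column.
# id_overlap is a hypothetical helper, not part of the iRF scripts.

```shell
# id_overlap FILE_A FILE_B: print rows_in_A<TAB>rows_in_B<TAB>rows_in_B_with_an_ID_from_A.
id_overlap() {
    local a="$1" b="$2"
    awk -F '\t' '
        FNR == 1 { next }                        # skip the header of each file
        NR == FNR { ina[$1] = 1; na++; next }    # first file: remember its IDs
        { nb++; if ($1 in ina) shared++ }        # second file: count matching IDs
        END { printf "%d\t%d\t%d\n", na, nb, shared }' "$a" "$b"
}
```

# usage: id_overlap <features file> <score file>; the three numbers are equal when both files hold the same set of unique IDs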
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal/Submits/submit_full_doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal/Submits/submit_train_doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal/Submits/submit_test_doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_0.sh
#### need to make species feature numeric??
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.baisya.ecoli.finalquantum.noncorrelated.species.equal
#
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 1.03285
# p12tetramer.Hbond.stackingraw: 0.979958
# p10tetramer.Hbond.stackingraw: 0.741546
# V2358sgRNA.raw: 0.713869
# p5tetramer.Hbond.stackingraw: 0.598242
# V1103.xsgRNA.raw: 0.580338
# p3tetramer.Hlgap.eVEraw: 0.577636
# p6tetramer.Hlgap.eVEraw: 0.482052
# p15tetramer.Hlgap.eVEraw: 0.428174
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.2670445
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.3238192
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features.txt", header=T, sep="\t")
species.df <- species[,c(1,6144)]
library(dplyr)
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")
pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
# 98
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# -0.02541041
# species codes in this run: 1 = ecoli, 2 = lipolytica, 3 = doench
pred.lipolytica <- subset(id.pred.y.species, id.pred.y.species$species == 2)
# 97
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.1163934
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 3)
# 105
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# -0.05232319
# --> try with only H. sapiens and E. coli
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
doench.id <- separate(doench, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
# drop V306-V320 and V81 by exact column name (the doench table uses .x/.y suffixed names)
drop.ecoli <- c(paste0("V", 306:320, "sgRNA.raw"), "V81sgRNA.raw")
drop.suffixed <- c(paste0("V", 306:320, ".xsgRNA.raw"), "V81.x.xsgRNA.raw", "V81.y.ysgRNA.raw")
ecoli.num.df <- ecoli.num[, !(names(ecoli.num) %in% drop.ecoli)]
doench.num.df <- doench.num[, !(names(doench.num) %in% drop.suffixed)]
ecoli.num.df$species <- 1
doench.num.df$species <- 2
doench.num.sample <- doench.num.df[sample(nrow(doench.num.df), 500), ]
ecoli.num.sample <- ecoli.num.df[sample(nrow(ecoli.num.df), 500), ]
all <- rbind(doench.num.sample, ecoli.num.sample)
all$cut.score <- as.numeric(all$cut.score)
write.table(all[,c(1,3:ncol(all))], "doench.ecoli.finalquantum.noncorrelated.species.equal.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.ecoli.finalquantum.noncorrelated.species.equal.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.ecoli.finalquantum.noncorrelated.species.equal.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.ecoli.finalquantum.noncorrelated.species.equal.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.ecoli.finalquantum.noncorrelated.species.equal.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.ecoli.finalquantum.noncorrelated.species.equal.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.ecoli.equal --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.ecoli.finalquantum.noncorrelated.species.equal.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.ecoli.finalquantum.noncorrelated.species.equal.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal/Submits/submit_full_doench.ecoli.equal_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal/Submits/submit_train_doench.ecoli.equal_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal/Submits/submit_test_doench.ecoli.equal_0.sh
#### need to make species feature numeric??
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.ecoli.equal
# 0.46586381290648016
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.ecoli.equal_cut.score.importance4 | head
# species: 14.6898
# sgRNA.structuresgRNA.raw: 6.48601
# p13tetramer.Hbond.stackingraw: 1.97119
# p19dimer.Hbond.stackingraw: 1.31008
# p20monomer.HLgap.eVraw: 0.975087
# p15tetramer.Hbond.stackingraw: 0.80502
# p20monomer.No.electronsraw: 0.737543
# p17tetramer.Hbond.stackingraw: 0.603806
# p6trimer.Hlgap.eVEraw: 0.449374
# p14trimer.Hbond.stackingraw: 0.442922
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.ecoli.equal_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.687478
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.7021931
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("doench.ecoli.finalquantum.noncorrelated.species.equal.features.txt", header=T, sep="\t")
species.df <- species[,c(1,6144)]
library(dplyr)
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")
pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
# 99
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.3457917
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 2)
# 101
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.6503844
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
doench <- read.delim("Doench2014CORRECTED.Chuai2018.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
doench.num <- mutate_all(doench[,2:ncol(doench)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
doench.num <- cbind(data.frame("sgRNAID" = doench$sgRNAID), doench.num)
summary(doench.num$cut.score)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000 0.1238 0.2058 0.2471 0.3410 1.0000
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000 0.3563 0.5618 0.5077 0.6757 1.0000
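# Both cut.score columns above are rescaled to [0, 1] with (x - min) / (max - min), separately for each dataset. A small Python sketch of the same min-max scaling, with a hypothetical function name and toy values:

```python
def min_max_scale(values):
    """Rescale numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]

# Toy scores (made up): the minimum maps to 0 and the maximum to 1
scaled = min_max_scale([-2.0, 0.0, 6.0])
```

# Because each dataset is scaled on its own range, the same scaled value does not correspond to the same raw score in both datasets (compare the medians above: 0.2058 for Doench vs. 0.5618 for E. coli).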
setdiff(names(ecoli.num), names(doench.num))
setdiff(names(doench.num), names(ecoli.num))
ecoli.num.df <- ecoli.num %>% select(-grep("V306sgRNA.raw", names(ecoli.num)), -grep("V307sgRNA.raw", names(ecoli.num)), -grep("V308sgRNA.raw", names(ecoli.num)), -grep("V309sgRNA.raw", names(ecoli.num)), -grep("V310sgRNA.raw", names(ecoli.num)), -grep("V311sgRNA.raw", names(ecoli.num)), -grep("V312sgRNA.raw", names(ecoli.num)), -grep("V313sgRNA.raw", names(ecoli.num)), -grep("V314sgRNA.raw", names(ecoli.num)), -grep("V315sgRNA.raw", names(ecoli.num)), -grep("V316sgRNA.raw", names(ecoli.num)), -grep("V317sgRNA.raw", names(ecoli.num)), -grep("V318sgRNA.raw", names(ecoli.num)), -grep("V319sgRNA.raw", names(ecoli.num)), -grep("V320sgRNA.raw", names(ecoli.num)), -grep("V81sgRNA.raw", names(ecoli.num)))
doench.num.df <- doench.num %>% select(-grep("V306.xsgRNA.raw", names(doench.num)), -grep("V307.xsgRNA.raw", names(doench.num)), -grep("V308.xsgRNA.raw", names(doench.num)), -grep("V309.xsgRNA.raw", names(doench.num)), -grep("V310.xsgRNA.raw", names(doench.num)), -grep("V311.xsgRNA.raw", names(doench.num)), -grep("V312.xsgRNA.raw", names(doench.num)), -grep("V313.xsgRNA.raw", names(doench.num)), -grep("V314.xsgRNA.raw", names(doench.num)), -grep("V315.xsgRNA.raw", names(doench.num)), -grep("V316.xsgRNA.raw", names(doench.num)), -grep("V317.xsgRNA.raw", names(doench.num)), -grep("V318.xsgRNA.raw", names(doench.num)), -grep("V319.xsgRNA.raw", names(doench.num)), -grep("V320.xsgRNA.raw", names(doench.num)), -grep("V81.x.xsgRNA.raw", names(doench.num)), -grep("V81.y.ysgRNA.raw", names(doench.num)))
ecoli.num.df$species <- 1
doench.num.df$species <- 2
doench.num.sample <- doench.num.df[sample(nrow(doench.num.df), 15000), ]
ecoli.num.sample <- ecoli.num.df[sample(nrow(ecoli.num.df), 15000), ]
all <- rbind(doench.num.sample, ecoli.num.sample)
all$cut.score <- as.numeric(all$cut.score)
write.table(all[,c(1,3:ncol(all))], "hsapien.ecoli.finalquantum.noncorrelated.species.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "hsapien.ecoli.finalquantum.noncorrelated.species.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "hsapien.ecoli.finalquantum.noncorrelated.species.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "hsapien.ecoli.finalquantum.noncorrelated.species.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "hsapien.ecoli.finalquantum.noncorrelated.species.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "hsapien.ecoli.finalquantum.noncorrelated.species.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
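# The sampling step above draws 15000 rows per species and stacks them, with species coded as 1 (E. coli) and 2 (Doench). No set.seed call is visible in this section, so the draw may not be reproducible. A hedged Python sketch of seeded, balanced sampling follows; the function name, the seed argument, and the toy rows are assumptions for illustration.

```python
import random

def balanced_sample(datasets, n, seed=0):
    """Draw n rows per label without replacement and tag each row with its label.

    datasets: {label: list of dict rows}
    """
    rng = random.Random(seed)  # fixed seed so the same rows are drawn each time
    combined = []
    for label, rows in datasets.items():
        if len(rows) < n:
            raise ValueError(f"dataset {label!r} has fewer than {n} rows")
        for row in rng.sample(rows, n):
            combined.append({**row, "species": label})
    return combined

# Toy data (made up): 10 E. coli guides (label 1) and 8 Doench guides (label 2)
toy = {
    1: [{"sgRNAID": f"e{i}"} for i in range(10)],
    2: [{"sgRNAID": f"d{i}"} for i in range(8)],
}
sampled = balanced_sample(toy, 5, seed=1)
```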
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName hsapien.ecoli --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/hsapien.ecoli.finalquantum.noncorrelated.species.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/hsapien.ecoli.finalquantum.noncorrelated.species.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/Submits/submit_full_hsapien.ecoli_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/Submits/submit_train_hsapien.ecoli_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/Submits/submit_test_hsapien.ecoli_0.sh
#### need to make species feature numeric??
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt hsapien.ecoli
#
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/hsapien.ecoli_cut.score.importance4 | head
# species: 403.04
# p20basepair.Hbond.energyraw: 15.5363
# p20basepair.Hlgap.eVEraw: 14.0459
# p18trimer.Hbond.energyraw: 13.7647
# p18monomer.No.electronsraw: 13.7079
# p19dimer.HLgap.eVEraw: 11.9187
# p1tetramer.Hbond.energyraw: 11.1035
# p11tetramer.Hbond.energyraw: 10.1444
# p17tetramer.Hbond.stackingraw: 8.17368
# p19dimer.Hbond.stackingraw: 8.04079
# Pearson and Spearman correlation (observed vs. predicted cut.score)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("hsapien.ecoli_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6972761
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.6865943
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("hsapien.ecoli.finalquantum.noncorrelated.species.features.txt", header=T, sep="\t")
species.df <- species[,c(1,6218)]
library(dplyr)
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")
pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
# 2999
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.5042479
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 2)
# 3001
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.4909198
library(ggplot2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("hsapien.ecoli.finalquantum.species.correlation.pdf")
ggplot(id.pred.y.species, aes(x=cut.score, y=Predictions., color=species)) + geom_point() + theme_classic() + geom_smooth(method='lm')
dev.off()
pdf("hsapien.ecoli.finalquantum.species1.correlation.pdf")
ggplot(pred.ecoli, aes(x=cut.score, y=Predictions., color=species)) + geom_point() + theme_classic() + geom_smooth(method='lm')
dev.off()
pdf("hsapien.ecoli.finalquantum.species2.correlation.pdf")
ggplot(pred.doench, aes(x=cut.score, y=Predictions., color=species)) + geom_point() + theme_classic() + geom_smooth(method='lm')
dev.off()
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score hsapien.ecoli
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/cut.score/RIT.run
# species cut.score 0.4406124933876367 -2.8183428048041573e-05 30000.0 0.5000142472067156
# p20basepair.Hbond.energyraw cut.score 0.019823692363659887 -4.240483571948726e-06 8765.894 0.5327902858841508
# p20basepair.Hlgap.eVEraw cut.score 0.01445105897913007 2.21313949456048e-06 6433.343 0.4819049333913702
# p18monomer.No.electronsraw cut.score 0.013388172085470917 -7.504843419688948e-07 16044.203 0.4364978373568982
# p18trimer.Hbond.energyraw cut.score 0.012540958441570604 -1.0643703405694754e-07 12592.634 0.3618404018991294
# p19dimer.Hbond.stackingraw cut.score 0.011375769441069144 -8.87869358943278e-08 10569.825 0.369172303679948
# p19dimer.HLgap.eVEraw cut.score 0.011323883334471187 2.125886575572766e-08 9245.464 0.4125654176032728
# p1tetramer.Hbond.energyraw cut.score 0.010794039709261714 1.114493561333129e-07 8173.915 0.4253986571396801
# p11tetramer.Hbond.energyraw cut.score 0.010290398567884208 -1.164100814647948e-07 7445.994 0.4031929448565761
# p17tetramer.Hbond.energyraw cut.score 0.008051417943653588 -3.3000720861262885e-08 9328.267 0.32286973343101755
library(ggplot2)
library(reshape2)
library(RColorBrewer)
library(dplyr)
## Main H.sapien feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/cut.score")
imp <- read.delim("hsapien.ecoli.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
imp.dir.top20.df$Feature.Label <- c("Species", "Basepair H-bond pos20", "Basepair HL-gap pos20", "Monomer # of Electrons pos18", "Trimer H-bond pos18", "Dimer H-stacking pos19", "Dimer HL-gap pos19", "Tetramer H-bond pos1", "Tetramer H-bond pos11", "Tetramer H-bond pos17", "Tetramer H-bond pos14", "Tetramer H-bond pos13", "Tetramer H-stacking pos17", "Tetramer H-bond pos15", "Tetramer H-bond pos10", "Tetramer H-stacking pos16", "Tetramer H-stacking pos15", "Trimer H-bond pos15", "Tetramer HL-gap pos7", "Tetramer HL-gap pos3")
library(ggplot2)
pdf("hsapien.ecoli.FeatureEngineering.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Multi-species model Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + scale_fill_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()
imp.dir.top20.nospecies <- imp.dir.top20.df[2:20,]
imp.dir.top20.nospecies$Feature.Label <- c("Basepair H-bond pos20", "Basepair HL-gap pos20", "Monomer # of Electrons pos18", "Trimer H-bond pos18", "Dimer H-stacking pos19", "Dimer HL-gap pos19", "Tetramer H-bond pos1", "Tetramer H-bond pos11", "Tetramer H-bond pos17", "Tetramer H-bond pos14", "Tetramer H-bond pos13", "Tetramer H-stacking pos17", "Tetramer H-bond pos15", "Tetramer H-bond pos10", "Tetramer H-stacking pos16", "Tetramer H-stacking pos15", "Trimer H-bond pos15", "Tetramer HL-gap pos7", "Tetramer HL-gap pos3")
pdf("hsapien.ecoli.FeatureEngineering.minusspecies.pdf")
ggplot(imp.dir.top20.nospecies, aes(x=reorder(Feature.Label, Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Multi-species model Top Features [Minus Species]") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + scale_fill_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()
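# The plotting code above signs each normalized importance by the direction of the feature effect: negative effects are plotted as -importance, positive and zero effects as +importance. A minimal Python sketch of that rule, using two rows from the RIT output listed earlier; the function name is hypothetical.

```python
def signed_importance(norm_importance, effect):
    """Return (direction, signed value) following the ifelse logic in the R code."""
    if effect < 0:
        return "neg", -norm_importance
    if effect > 0:
        return "pos", norm_importance
    return "zero", norm_importance

# (feature, normalized importance, effect) copied from the RIT output above
rit_rows = [
    ("species", 0.4406124933876367, -2.8183428048041573e-05),
    ("p20basepair.Hlgap.eVEraw", 0.01445105897913007, 2.21313949456048e-06),
]
signed = {name: signed_importance(imp, eff) for name, imp, eff in rit_rows}
```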
# -> remove species from the feature list and re-run
#R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df <- read.delim("hsapien.ecoli.finalquantum.noncorrelated.species.features.txt", header=T, sep="\t", stringsAsFactors = F)
df.nospecies <- df[,c(1,2:6217)]
write.table(df.nospecies, "hsapien.ecoli.finalquantum.noncorrelated.nospecies.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.nospecies, "hsapien.ecoli.finalquantum.noncorrelated.nospecies.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.nospecies[,2:ncol(df.nospecies)], "hsapien.ecoli.finalquantum.noncorrelated.nospecies.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
score <- read.delim("hsapien.ecoli.finalquantum.noncorrelated.species.score.txt", header=T, sep="\t", stringsAsFactors = F)
library(ggplot2)
library(dplyr)
score.species <- left_join(score, df[,c(1,6218)], by="sgRNAID")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
score.species.df <- score.species %>% mutate(species.id = ifelse(species == 1, "E.coli", ifelse(species == 2, "H.sapien", "Unknown")))
pdf("ecoli.hsapien.multispecies.score.violin.pdf")
ggplot(score.species.df) + geom_violin(aes(x=species.id, y=cut.score)) + theme_classic()
dev.off()
# p20basepair.Hbond.energyraw
# p20basepair.Hlgap.eVEraw
# p18monomer.No.electronsraw
# p18trimer.Hbond.energyraw
# p19dimer.Hbond.stackingraw
df.species <- df %>% select(grep("sgRNAID", names(df)), grep("p20basepair.Hbond.energyraw", names(df)), grep("p20basepair.Hlgap.eVEraw", names(df)), grep("p18monomer.No.electronsraw", names(df)), grep("p18trimer.Hbond.energyraw", names(df)), grep("p19dimer.Hbond.stackingraw", names(df)), grep("species", names(df)))
df.species.id <- df.species %>% mutate(species.id = ifelse(species == 1, "E.coli", ifelse(species == 2, "H.sapien", "Unknown")))
df.species.melt <- melt(df.species.id[,c(1:6,8)])
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("ecoli.hsapien.multispecies.featuredistribution.violin.pdf")
ggplot(df.species.melt) + geom_violin(aes(x=species.id, y=value, fill=species.id)) + theme_classic() + facet_grid(. ~ variable)
dev.off()
pdf("ecoli.hsapien.multispecies.featuredistribution.boxplot.pdf")
ggplot(df.species.melt) + geom_boxplot(aes(x=species.id, y=value, fill=species.id)) + theme_classic() + facet_grid(. ~ variable)
dev.off()
df.species.score <- left_join(score, df.species.id, by="sgRNAID")
df.species.score.melt <- melt(df.species.score[,c(1:7,9)], id=c("sgRNAID", "species.id", "cut.score"))
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("ecoli.hsapien.multispecies.topfeature.score.scatter.pdf")
ggplot(df.species.score.melt) + geom_point(aes(x=value, y=cut.score, color=species.id)) + theme_classic() + facet_grid(. ~ variable)
dev.off()
pdf("ecoli.hsapien.multispecies.p20basepair.Hbond.score.scatter.pdf")
ggplot(df.species.score) + geom_point(aes(x=p20basepair.Hbond.energyraw, y=cut.score, color=species.id)) + theme_classic()
dev.off()
pdf("ecoli.hsapien.multispecies.p20basepair.Hbond.violinr.pdf")
df.species.score.melt <- melt(df.species.score[,c(1:3,9)], id=c("sgRNAID", "species.id"))
ggplot(df.species.score.melt) + geom_violin(aes(x=species.id, y=value, color=species.id)) + theme_classic() + facet_grid(. ~ variable)
dev.off()
pdf("ecoli.hsapien.multispecies.p20basepair.Hbond.density.pdf")
df.species.score.factor <- df.species.score %>% mutate(p20bp.Hbond = ifelse(p20basepair.Hbond.energyraw == 27.11950899, "high", ifelse(p20basepair.Hbond.energyraw == 8.563800343, "low", "NA")))
ggplot(df.species.score.factor) + geom_density(aes(x=cut.score, color=p20bp.Hbond)) + theme_classic() + facet_grid(. ~ species.id)
dev.off()
pdf("ecoli.hsapien.multispecies.p20basepair.Hlgap.density.pdf")
df.species.score.factor <- df.species.score %>% mutate(p20bp.Hlgap = ifelse(p20basepair.Hlgap.eVEraw == 3.284, "high", ifelse(p20basepair.Hlgap.eVEraw == 3.161, "low", "NA")))
ggplot(df.species.score.factor) + geom_density(aes(x=cut.score, color=p20bp.Hlgap)) + theme_classic() + facet_grid(. ~ species.id)
dev.off()
pdf("ecoli.hsapien.multispecies.p18monomer.electrons.density.pdf")
df.species.score.factor <- df.species.score %>% mutate(p18monomer = ifelse(p18monomer.No.electronsraw > 49, "high", ifelse(p18monomer.No.electronsraw < 49, "low", "NA")))
ggplot(df.species.score.factor) + geom_density(aes(x=cut.score, color=p18monomer)) + theme_classic() + facet_grid(. ~ species.id)
dev.off()
pdf("ecoli.hsapien.multispecies.p18trimer.Hbond.density.pdf")
df.species.score.factor <- df.species.score %>% mutate(p18trimer = ifelse(p18trimer.Hbond.energyraw <= 63, "first.quarter", ifelse(p18trimer.Hbond.energyraw <= 77, "second.quarter", ifelse(p18trimer.Hbond.energyraw <= 84, "third.quarter", "fourth.quarter"))))
ggplot(df.species.score.factor) + geom_density(aes(x=cut.score, color=p18trimer)) + theme_classic() + facet_grid(. ~ species.id)
dev.off()
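# The p18trimer binning above uses hand-entered thresholds (63, 77, 84) that appear to be quartile boundaries. As a sketch only, the cut points could instead be derived from the data; the function names below are hypothetical, and the computed cut points may differ slightly from R's depending on the quantile method.

```python
import bisect
import statistics

LABELS = ["first.quarter", "second.quarter", "third.quarter", "fourth.quarter"]

def quartile_cuts(values):
    """Three cut points splitting values into four roughly equal groups."""
    return statistics.quantiles(values, n=4)

def quartile_label(value, cuts):
    """Label a value; a value equal to a cut point goes to the lower bin (<=)."""
    return LABELS[bisect.bisect_left(cuts, value)]

# Using the thresholds from the R code above
hand_cuts = [63, 77, 84]
labels = [quartile_label(v, hand_cuts) for v in (63, 70, 84, 90)]
```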
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName hsapien.ecoli.nospecies --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/hsapien.ecoli.finalquantum.noncorrelated.nospecies.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/hsapien.ecoli.finalquantum.noncorrelated.species.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies/Submits/submit_full_hsapien.ecoli.nospecies_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies/Submits/submit_train_hsapien.ecoli.nospecies_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies/Submits/submit_test_hsapien.ecoli.nospecies_0.sh
#### need to make species feature numeric??
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt hsapien.ecoli.nospecies
# 0.4531928800756321
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/hsapien.ecoli.nospecies_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 410.319
# p20basepair.Hlgap.eVEraw: 15.0505
# p18trimer.Hbond.energyraw: 14.6217
# p20basepair.Hbond.energyraw: 14.1491
# p18monomer.No.electronsraw: 12.331
# p19dimer.HLgap.eVEraw: 12.2639
# p1tetramer.Hbond.energyraw: 10.8594
# p11tetramer.Hbond.energyraw: 9.91986
# p17tetramer.Hbond.stackingraw: 8.06495
# p13tetramer.Hbond.energyraw: 7.77037
# Pearson and Spearman correlation (observed vs. predicted cut.score)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("hsapien.ecoli.nospecies_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6954827
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.684502
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("hsapien.ecoli.finalquantum.noncorrelated.species.features.txt", header=T, sep="\t")
species.df <- species[,c(1,6218)]
library(dplyr)
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")
pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.4987557
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 2)
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.4875551
Datasets received from Andrew on March 8, 2022, including 2 biological replicates for Cas9 (6 hr and 17 hr), delta-Cas9 (17 hr), and the libraries.
Files provided: raw counts, read frequency, and DESeq2 output.
150,000 total guides (1,500 non-targeting).
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/putida")
id <- read.delim("Lib1_Cas9_library_database.csv", header=T, sep=",", stringsAsFactors = F)
data <- read.delim("deseq2_lib_vs_cas_tf.csv", header=T, sep=",", stringsAsFactors = F)
library(tidyverse)
library(dplyr)
id$sgRNAID <- str_extract_all(id$gRNA, "[A-Z]+")
colnames(id) <- c("gRNA", "seq", "nucleotide.sequence")
data.id <- left_join(data, id[,c(1,3)], by="gRNA")
df <- data.id[,c(1,3,8)]
colnames(df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
df.mat <- as.matrix(df)
df.na <- na.omit(df.mat)
write.table(df.na, "putida.txt", quote=F, row.names=F, sep="\t")
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/putida/putida.txt" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/putida/GCF_000412675.1_ASM41267v1_genomic.fna" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/putida/GCF_000412675.1_ASM41267v1_genomic.gff" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/.
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
sed '1d' putida.txt | awk '{print ">"$1"\n"$3}' > putida.fasta
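# The sed/awk one-liner above drops the header row and writes each guide as a FASTA record (ID from column 1, sequence from column 3). An equivalent Python sketch follows; the function name and defaults are hypothetical. Unlike awk, it skips rows that are missing the sequence column.

```python
def table_to_fasta(lines, id_col=0, seq_col=2, skip_header=True):
    """Convert whitespace-delimited rows into FASTA text."""
    records = []
    for i, line in enumerate(lines):
        if skip_header and i == 0:
            continue  # same effect as sed '1d'
        fields = line.split()  # awk also splits on runs of whitespace by default
        if len(fields) <= max(id_col, seq_col):
            continue  # row lacks the ID or sequence column
        records.append(f">{fields[id_col]}\n{fields[seq_col]}")
    return "\n".join(records) + ("\n" if records else "")

# Toy input (made up) with the putida.txt column order: ID, score, sequence
toy_lines = [
    "sgRNAID\tcut.score\tnucleotide.sequence",
    "g1\t-1.2\tACGT",
    "g2\t0.3\tTTGA",
]
fasta_text = table_to_fasta(toy_lines)
```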
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
## blast
# conda install blast
# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes
# wget https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
# tar zxvpf ncbi-blast-2.11.0+-x64-linux.tar.gz
# export PATH=$PATH:$HOME/ncbi-blast-2.11.0+/bin
# echo $PATH
# mkdir $HOME/blastdb
# export BLASTDB=$HOME/blastdb
# set BLASTDB=$HOME/blastdb
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/makeblastdb -in GCF_000412675.1_ASM41267v1_genomic.fna -dbtype nucl
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query putida.fasta -db GCF_000412675.1_ASM41267v1_genomic.fna -out putida.gRNA.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' putida.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' putida.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > putida.gRNA.blast.bed
## Not all guides are captured with -evalue 0.0005 (only 28971). Why? Re-running below without the -evalue cutoff.
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query putida.fasta -db GCF_000412675.1_ASM41267v1_genomic.fna -out putida.gRNA.blast2.tab -outfmt 6 -evalue 0.001 -task blastn -num_threads 10
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query putida.fasta -db GCF_000412675.1_ASM41267v1_genomic.fna -out putida.gRNA.blast.tab -outfmt 6 -task blastn -num_threads 10
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' putida.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' putida.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > putida.gRNA.blast.bed
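# The two awk commands above turn BLAST tabular output (-outfmt 6) into BED-like rows, swapping subject start and end for minus-strand hits so that start < end. A hedged Python sketch of the same conversion in one pass follows; the function name is hypothetical. It keeps BLAST's 1-based coordinates as the awk version does, and it also keeps hits where start equals end, which the awk pair drops.

```python
def blast6_to_bed(lines):
    """Yield (subject, start, end, query) with start <= end from outfmt 6 lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        # outfmt 6 columns 9 and 10 are sstart and send
        sstart, send = int(fields[8]), int(fields[9])
        yield subject, min(sstart, send), max(sstart, send), query

# Toy hits (made up): one plus-strand and one minus-strand alignment
toy_hits = [
    "g1\tchr\t100.0\t20\t0\t0\t1\t20\t500\t519\t1e-05\t40.1",
    "g2\tchr\t100.0\t20\t0\t0\t1\t20\t819\t800\t1e-05\t40.1",
]
bed_rows = list(blast6_to_bed(toy_hits))
```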
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
df <- read.delim("putida.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
coord <- read.delim("putida.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID
library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "putida.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
faidx GCF_000412675.1_ASM41267v1_genomic.fna -i chromsizes > putida.sizes.genome
bedtools makewindows -g putida.sizes.genome -w 20 -s 1 > putida.20bp.sliding.bed
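# The bedtools makewindows call above tiles the genome with 20 bp windows at a 1 bp step. A minimal Python sketch of that tiling for a single sequence follows; the function name is hypothetical. It emits only full-length windows in 0-based, half-open coordinates; bedtools may additionally emit shorter windows at the end of a sequence, so check its output before comparing counts.

```python
def sliding_windows(chrom, length, width=20, step=1):
    """Yield (chrom, start, end) for every full-length window."""
    for start in range(0, length - width + 1, step):
        yield chrom, start, start + width

# Toy example: a 25 bp sequence yields 6 full 20 bp windows
windows = list(sliding_windows("chr", 25))
```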
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
## genes
grep 'gene' GCF_000412675.1_ASM41267v1_genomic.gff | sort -k 1,1 -k 4,4n > GCF_000412675.1_ASM41267v1_genomic.sort.gff
bedtools intersect -wo -a putida.20bp.sliding.bed -b GCF_000412675.1_ASM41267v1_genomic.sort.gff > putida.gene.20sliding.bed
## GC content
bedtools nuc -fi GCF_000412675.1_ASM41267v1_genomic.fna -bed putida.20bp.sliding.bed | sed '1d' > putida.GC.20sliding.bed
# Melting temperature reference: https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# Signature from the Biopython docs: Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)
# Nucleotide-counting script adapted from: https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n
# summit: # conda install -c conda-forge biopython
### sgRNA
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
python3
from Bio import SeqIO
input_file = open('putida.fasta', 'r')
output_file = open('nucleotide_counts_sgRNA.tsv','w')
# note: the CG% column holds the GC fraction (0-1), not a percentage
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
for cur_record in SeqIO.parse(input_file, "fasta"):
    gene_name = cur_record.name
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
        (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)

output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "putida.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
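# The melting temperature above uses the simple GC-count formula Tm = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC). A small Python sketch of the same arithmetic follows, with a hypothetical function name; this is not the nearest-neighbour Tm_NN method referenced earlier, which would generally give different values.

```python
def melting_temp(seq):
    """Tm in degrees C from base counts: 64.9 + 41 * (G + C - 16.4) / (A + T + G + C)."""
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return 64.9 + 41 * (g + c - 16.4) / total

# Toy 20-mer with 10 G/C bases: 64.9 + 41 * (10 - 16.4) / 20 = 51.78
tm = melting_temp("GCGCGCGCGCATATATATAT")
```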
### 20bp sliding windows
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
bedtools getfasta -fi GCF_000412675.1_ASM41267v1_genomic.fna -bed putida.20bp.sliding.bed -fo putida.20sliding.fa
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
python3
from Bio import SeqIO
input_file = open('putida.20sliding.fa', 'r')
output_file = open('nucleotide_counts_20sliding.tsv','w')
# note: the CG% column holds the GC fraction (0-1), not a percentage
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
for cur_record in SeqIO.parse(input_file, "fasta"):
    gene_name = cur_record.name
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
        (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)

output_file.close()
input_file.close()
exit()
# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("nucleotide_counts_20sliding.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))
write.table(df.melt, "putida.nucleotide_counts_20sliding_temp.txt", quote=F, row.names=F, sep="\t")
q()
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
cut -f 1,3 putida.txt > putida.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py putida.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
sed '1d' putida.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > putida_ind1.txt
sed '1d' putida.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > putida_ind2.txt
sed '1d' putida.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.A p1.C p1.T p1.G p2.A p2.C p2.T p2.G p3.A p3.C p3.T p3.G p4.A p4.C p4.T p4.G p5.A p5.C p5.T p5.G p6.A p6.C p6.T p6.G p7.A p7.C p7.T p7.G p8.A p8.C p8.T p8.G p9.A p9.C p9.T p9.G p10.A p10.C p10.T p10.G p11.A p11.C p11.T p11.G p12.A p12.C p12.T p12.G p13.A p13.C p13.T p13.G p14.A p14.C p14.T p14.G p15.A p15.C p15.T p15.G p16.A p16.C p16.T p16.G p17.A p17.C p17.T p17.G p18.A p18.C p18.T p18.G p19.A p19.C p19.T p19.G p20.A p20.C p20.T p20.G' | cut -d ' ' -f 1-81 > putida_dep1.txt
# build the 320-column header (p1.AA ... p20.GG, base order A C T G) rather than typing it out
dep2_header="sgRNAID$(for p in $(seq 1 20); do for a in A C T G; do for b in A C T G; do printf ' p%s.%s%s' "$p" "$a" "$b"; done; done; done)"
sed '1d' putida.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed "1i $dep2_header" | cut -d ' ' -f 1-321 > putida_dep2.txt
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
sed '1d' putida.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > putida.sequence.txt
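# What the awk/cut step above does to each line, sketched in Python. This assumes a
# two-field input line (ID, sequence); split_positions is a hypothetical helper name.

```python
def split_positions(line, n_positions=20):
    """Mimic: awk '{gsub(/./,"& ",$2);print $0}' | cut -d ' ' -f 1-(n_positions+1)."""
    fields = line.split()
    # gsub(/./, "& ") appends a space after every character of field 2
    fields[1] = "".join(ch + " " for ch in fields[1])
    # awk rebuilds $0 with a single-space output separator
    rebuilt = " ".join(fields)
    # cut keeps the ID plus the first n_positions columns
    return " ".join(rebuilt.split(" ")[: n_positions + 1])


if __name__ == "__main__":
    print(split_positions("sg1\tACGT", n_positions=4))
```

# The gsub leaves a trailing space after the last base. cut removes it only when the
# sequence has at least as many bases as columns requested; otherwise the trailing empty
# field survives and shows up as an extra empty column when the file is read with sep=" ".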
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
write.table(seq.tensor.melt, "putida.tensors.melt.txt", quote=F, row.names=F, sep="\t")
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "putida.tensors.txt", quote=F, row.names=F, sep="\t")
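# The melt / left_join / dcast sequence above attaches a per-base feature vector to every
# position of every guide. A rough Python sketch of the same reshaping, using plain dicts;
# the feature name "charge" in the usage example is made up for illustration and does not
# come from the tensor file.

```python
def expand_features(guides, base_features):
    """guides: {sgRNA id: sequence}; base_features: {base: {feature name: value}}.

    Returns {sgRNA id: {"p<position>_<feature>": value}}, i.e. one column per
    position/feature pair, as in dcast(sgRNAID ~ position + variable).
    """
    feature_names = sorted({name for feats in base_features.values() for name in feats})
    wide = {}
    for guide_id, seq in guides.items():
        row = {}
        for pos, base in enumerate(seq, start=1):
            feats = base_features.get(base)
            for name in feature_names:
                # bases missing from the table (e.g. N) yield None, like NA after left_join
                row[f"p{pos}_{name}"] = feats.get(name) if feats else None
        wide[guide_id] = row
    return wide


if __name__ == "__main__":
    table = {"A": {"charge": 0.1}, "C": {"charge": -0.2}}  # hypothetical values
    print(expand_features({"sg1": "AC"}, table))
```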
# minimum free energy (MFE) structure with ViennaRNA RNAfold (tutorial: https://www.tbi.univie.ac.at/RNA/tutorial/)
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/vienna
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/vienna
RNAfold < ../putida.fasta > putida.gRNA.ViennaRNA.output.txt
grep '(' putida.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > putida.gRNA.ViennaRNA.output.value.txt
grep '>' putida.gRNA.ViennaRNA.output.txt | sed 's/>//g' > putida.gRNA.names.txt
paste putida.gRNA.names.txt putida.gRNA.ViennaRNA.output.value.txt > putida.gRNA.ViennaRNA.output.value.id.txt
cp putida.gRNA.ViennaRNA.output.value.id.txt ../.
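# The grep/paste steps above pull the MFE value from each RNAfold structure line and pair
# it with the sequence name. A Python sketch of the same parsing, assuming the usual
# three-line RNAfold record for FASTA input (">name", sequence, "structure (energy)").
# parse_rnafold is a hypothetical helper, not part of ViennaRNA.

```python
import re

# the energy is the parenthesised number at the end of the structure line,
# e.g. "(((...))) ( -1.20)"
ENERGY_RE = re.compile(r"\(\s*([+-]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(text):
    """Return [(name, mfe)] from RNAfold text output."""
    results = []
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            name = line[1:].strip()
            continue
        match = ENERGY_RE.search(line)
        if match and name is not None:
            results.append((name, float(match.group(1))))
            name = None  # one energy per record
    return results


if __name__ == "__main__":
    sample = ">sg1\nGGGAAAUCC\n(((...))) ( -1.20)\n"
    print(parse_rnafold(sample))
```

# Pairing the name and energy record by record avoids relying on two separately grepped
# files staying in the same order and length before paste.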
# 20bp sliding fasta
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/vienna
RNAfold < ../putida.20sliding.fa > putida.20sliding.ViennaRNA.output.txt
grep '(' putida.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > putida.20sliding.ViennaRNA.output.value.txt
grep '>' putida.20sliding.ViennaRNA.output.txt | sed 's/>//g' > putida.20sliding.names.txt
paste putida.20sliding.names.txt putida.20sliding.ViennaRNA.output.value.txt > putida.20sliding.ViennaRNA.output.value.id.txt
cp putida.20sliding.ViennaRNA.output.value.id.txt ../.
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J ViennaRNA.putida
#SBATCH -N 2
#SBATCH -t 48:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/vienna
RNAfold < ../putida.20sliding.fa > putida.20sliding.ViennaRNA.output.txt
grep '(' putida.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > putida.20sliding.ViennaRNA.output.value.txt
grep '>' putida.20sliding.ViennaRNA.output.txt | sed 's/>//g' > putida.20sliding.names.txt
paste putida.20sliding.names.txt putida.20sliding.ViennaRNA.output.value.txt > putida.20sliding.ViennaRNA.output.value.id.txt
cp putida.20sliding.ViennaRNA.output.value.id.txt ../.
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/ViennaRNA.putida.sh
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
## GATC motif
## fastaregex
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'GATC' > putida.gatc.bed
bedtools intersect -wo -a putida.20bp.sliding.bed -b putida.gatc.bed > putida.gatc.20sliding.bed
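# A small Python sketch of the two steps above: locate GATC motifs as BED-style intervals,
# then count how many overlap each 20 bp window. The 1 bp window step is an assumption
# (the layout of putida.20bp.sliding.bed is not shown here), and both function names are
# illustrative rather than taken from fastaRegexFinder or bedtools.

```python
import re


def find_motif(seq, motif="GATC"):
    """Return 0-based, half-open (start, end) intervals of motif matches."""
    return [(m.start(), m.end()) for m in re.finditer(re.escape(motif), seq.upper())]


def count_per_window(hits, seq_len, window=20):
    """Count motif hits overlapping each window [start, start + window), step 1 bp."""
    counts = []
    for start in range(0, seq_len - window + 1):
        end = start + window
        counts.append(sum(1 for h_start, h_end in hits if h_start < end and h_end > start))
    return counts


if __name__ == "__main__":
    seq = "AAGATCGGATC"
    hits = find_motif(seq)
    print(hits)
    print(count_per_window(hits, len(seq), window=5))
```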
# PAM sequence background: https://www.synthego.com/guide/how-to-use-crispr/pam-sequence
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
# convert sgRNA coordinates to a sorted BED file, then locate NGG PAM sites in the reference with fastaRegexFinder
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
cut -f 1-4 putida.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > putida.sgRNA.coord.bed
# vim NGG.PAM.fasta
## fastaRegexFinder
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'AGG' > putida.AGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'TGG' > putida.TGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'CGG' > putida.CGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'GGG' > putida.GGG.PAM.txt
cat putida.AGG.PAM.txt putida.TGG.PAM.txt putida.CGG.PAM.txt putida.GGG.PAM.txt > putida.NGG.PAM.txt
sort -k 1,1 -k 2,2n putida.NGG.PAM.txt > putida.NGG.PAM.sorted.bed
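# The four AGG/TGG/CGG/GGG searches above amount to one NGG search. A Python sketch of
# that search with a lookahead so overlapping sites (e.g. TGGG) are all reported. Scanning
# for CCN to capture minus-strand PAMs is an assumption about what is wanted here;
# find_pams is a hypothetical helper name.

```python
import re

FORWARD_PAM = re.compile(r"(?=([ACGT]GG))")  # NGG on the + strand
REVERSE_PAM = re.compile(r"(?=(CC[ACGT]))")  # NGG on the - strand reads as CCN on +


def find_pams(seq):
    """Return sorted (start, end, strand, triplet) tuples, 0-based half-open."""
    seq = seq.upper()
    hits = []
    for pattern, strand in ((FORWARD_PAM, "+"), (REVERSE_PAM, "-")):
        for m in pattern.finditer(seq):
            hits.append((m.start(), m.start() + 3, strand, m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    for hit in find_pams("ACGGTGGG"):
        print(*hit, sep="\t")
```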
# intersect with sliding windows in the genome to get density for DWT
bedtools intersect -wo -a putida.20bp.sliding.bed -b putida.NGG.PAM.sorted.bed > putida.NGG.PAM.20bp.sliding.windows.bed
# nearest downstream PAM for each sgRNA (-io: ignore overlapping features, -iu: ignore upstream features, -D a: signed distance relative to the sgRNA strand)
awk '{print $0"\t""+"}' putida.sgRNA.coord.bed > putida.sgRNA.coord.strand.txt
bedtools closest -a putida.sgRNA.coord.strand.txt -b putida.NGG.PAM.sorted.bed -io -iu -D a > putida.sgRNA.closestPAM.bed
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
awk '{if ($3 == "gene") print $0}' GCF_000412675.1_ASM41267v1_genomic.gff | sort -k 1,1 -k 4,4n > GCF_000412675.1_ASM41267v1_genomic.gene.gff
bedtools closest -a putida.sgRNA.coord.bed -b GCF_000412675.1_ASM41267v1_genomic.gene.gff -D b > putida.sgRNA.gene.closest.bed
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
structure <- read.delim("putida.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("putida.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5:6)])
colnames(score.df) <- c("sgRNAID", "cut.score")
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
onehot.ind1 <- read.delim("putida_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("putida_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("putida_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("putida_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305]
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "df.id.test.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor <- read.delim("putida.tensors.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df.id <- read.delim("df.id.test.txt", header=T, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
write.table(tensor.df, "putida.raw.onehot.tensor.txt", quote=F, row.names=F, sep="\t")
df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "putida.raw.onehot.tensor.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast)
# 149437
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "putida.raw.onehot.tensor.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 149437
# pam (distance and nucleotide)
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
# sgRNA.pam <- read.table("putida.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
# sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
# colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
# sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
# sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
#
# score.location <- left_join(score.df, sgRNA.pam.df, by=c("sgRNAID"))
# score.location$scale <- 0
#
# df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
# df <- na.omit(df.melt)
# colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
#
# df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
# df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
# df.dcast.na <- na.omit(df.dcast)
# # 27345
# write.table(df.dcast.na, "putida.sgRNA.pam.dcast.txt", quote=F, row.names=F, sep="\t")
#
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
# df.dcast <- read.delim("putida.sgRNA.pam.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
# df <- read.delim("putida.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
#
# df.location <- left_join(df, df.dcast, by=c("sgRNAID"))
# nrow(df.location)
# # 27363
#
# write.table(df.location, "putida.raw.onehot.tensor.pam.dcast.na.txt", quote=F, row.names=F, sep="\t")
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
sgRNA.genes <- read.table("putida.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- unique(sgRNA.genes[,c(4,14)])
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 148591
write.table(df.dcast.na, "putida.sgRNA.location.dcast.txt", quote=F, row.names=F, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df.dcast.na <- read.delim("putida.sgRNA.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
#df <- read.delim("putida.raw.onehot.tensor.pam.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("putida.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 148591
write.table(df.location, "putida.raw.onehot.tensor.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
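# The R session above repeatedly goes from a long table (one row per sgRNA/feature) to a
# wide one with dcast(..., fun.aggregate=mean, na.rm=TRUE) and then inner-joins tables on
# sgRNAID. A dictionary-based Python sketch of those two operations; pivot_mean and
# inner_join are illustrative names and the feature names in the usage example are made up.

```python
from collections import defaultdict


def pivot_mean(rows):
    """rows: iterable of (sgRNAID, feature, value); None values are skipped.

    Returns {sgRNAID: {feature: mean of its values}}.
    """
    values = defaultdict(lambda: defaultdict(list))
    for guide_id, feature, value in rows:
        if value is not None:
            values[guide_id][feature].append(value)
    return {g: {f: sum(v) / len(v) for f, v in feats.items()} for g, feats in values.items()}


def inner_join(left, right):
    """Keep only IDs present in both tables and merge their feature dicts."""
    return {g: {**left[g], **right[g]} for g in left if g in right}


if __name__ == "__main__":
    long_rows = [("sg1", "gc", 0.5), ("sg1", "gc", 0.7), ("sg2", "gc", None)]
    print(pivot_mean(long_rows))
```

# Averaging silently merges duplicate sgRNAID/feature rows, so comparing row counts before
# and after (as the nrow() calls above do) is a useful check that nothing was collapsed
# or dropped unexpectedly.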
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py putida.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py putida.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py putida.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py putida.noscore.txt
# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
sed '1d' putida.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > putida_dep1.txt
sed '1d' putida.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > putida_dep2.txt
sed '1d' putida.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > putida_dep3.txt
sed '1d' putida.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > putida_dep4.txt
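# The kmerN_positional_encode.py scripts are not shown in these notes. The sketch below is
# one plausible position-dependent k-mer one-hot encoding, assuming the A C T G base order
# used in the header lines earlier; it is not the scripts' actual code.

```python
from itertools import product

ALPHABET = "ACTG"  # assumed base order, taken from the header lines in these notes


def positional_kmer_onehot(seq, k=1):
    """One-hot encode each overlapping k-mer of seq, position by position."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    bits = []
    for pos in range(len(seq) - k + 1):
        block = ["0"] * len(kmers)
        hit = index.get(seq[pos:pos + k].upper())
        if hit is not None:  # k-mers containing other symbols (e.g. N) stay all-zero
            block[hit] = "1"
        bits.append("".join(block))
    return "".join(bits)


if __name__ == "__main__":
    print(positional_kmer_onehot("AC", k=1))
    print(len(positional_kmer_onehot("A" * 20, k=2)))
```

# For a 20-mer this gives (20 - k + 1) * 4^k bits per guide, e.g. 19 * 16 = 304 for k=2.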
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df[63:70,]))
tensor.t$base <- c("A", "C", "G", "T")
rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "putida.tensors.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.tensors.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J jan18.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
R CMD BATCH mar8.matrix.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/mar8.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
structure <- read.delim("putida.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("putida.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5:6)])
colnames(score.df) <- c("sgRNAID", "cut.score")
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
onehot.ind1 <- read.delim("putida_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("putida_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("putida_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("putida_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("putida_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("putida_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
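# drop the last column of each dep table (presumably the empty field left by the trailing space from the awk split);
# the parentheses make the range explicit, since 1:ncol(x)-1 parses as (1:ncol(x))-1
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")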
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "putida.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor <- read.delim("putida.tensors.single.bp.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.id <- read.delim("putida.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5:6)])
colnames(score.df) <- c("sgRNAID", "cut.score")
#df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, score.df, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
#head(df.id)
#head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast, "putida.raw.onehot.tensor.single.bp.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 96002
nrow(df.dcast)
# 149625
# pam (distance and nucleotide)
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
# sgRNA.pam <- read.table("putida.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
# sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
# colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
# sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
# sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
# #sgRNA.pam.df$id <- "Cas9"
# #sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
#
# score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
# score.df <- score[,c(5:6)]
# colnames(score.df) <- c("sgRNAID", "cut.score")
#
# score.location <- left_join(score.df, sgRNA.pam.df, by="sgRNAID")
# score.location$scale <- 0
#
# df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
# df <- na.omit(df.melt)
# colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
#
# df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
# df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
# df.dcast.na <- na.omit(df.dcast)
#
# df <- read.delim("putida.raw.onehot.tensor.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
#
# df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
# nrow(df.location)
#
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
sgRNA.genes <- read.table("putida.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
#sgRNA.genes.df$id <- "Cas9"
#sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")
score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
#df <- df.location
df <- read.delim("putida.raw.onehot.tensor.single.bp.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 148591
write.table(df.location, "putida.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(tidyr)
library(reshape2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])
rownames(seq) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "putida.tensors.dimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.tensors.dimers.melt.txt", quote=F, row.names=F, sep="\t")
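# The chained unite() calls above turn the 20 single-base columns into 19 overlapping
# dimers (p1+p2, p2+p3, ..., p19+p20). The same construction as a short Python sketch;
# the function names are illustrative.

```python
def overlapping_kmers(seq, k=2):
    """Return the overlapping k-mers of seq, starting at positions 1, 2, ..."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def dimer_columns(seq):
    """Name each dimer by its starting position: p1..p19 for a 20-mer."""
    return {f"p{i}": kmer for i, kmer in enumerate(overlapping_kmers(seq, 2), start=1)}


if __name__ == "__main__":
    print(dimer_columns("ACGT"))
```

# Each dimer string is then looked up in the dimer tensor table by name, in the same way
# single bases are looked up in the monomer table.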
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor <- read.delim("putida.tensors.dimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "putida.raw.onehot.tensor.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 27345
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6073,6075:6079,6081,6083:6177)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "putida.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6072,6074:6078,6080,6082:6176)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
df.abs <- df.all %>% select(grep("bondraw", names(df.all))) %>% abs()
df.all.sub <- df.all %>% select(-grep("bondraw", names(df.all)))
df.abs.all <- cbind(df.all.sub, df.abs)
# drop leftover join-suffixed score columns (cut.score.x, cut.score.y, cut.score.y.y); matches() is safe when no column matches
df.abs.all2 <- df.abs.all %>% select(-matches("^cut\\.score\\.[xy]"))
df.abs.all3 <- df.abs.all2
write.table(df.abs.all3, "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.txt", quote=F, row.names=F, sep="\t")
df.minusHL <- df.abs.all3 %>% select(-grep("HL", names(df.abs.all3)), -grep("HOMO", names(df.abs.all3)), -grep("LUMO", names(df.abs.all3)))
# 5991 features
write.table(df.minusHL, "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,c(1,3:ncol(df.minusHL))], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,c(1,3:ncol(df.minusHL))], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,3:ncol(df.minusHL)], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,c(1:2)], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,c(1:2)], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame(cut.score = df.minusHL[,2]), "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
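# Sketch (Python, stdlib only) of the two column operations above: take the absolute value of the
# "bondraw" columns, then drop every column whose name contains HL, HOMO or LUMO.
# Assumptions: the toy table and the column name p1dimer_HL_gapraw are made up for illustration;
# unlike the R code, this keeps the original column order instead of moving bondraw columns to the end.

```python
import re


def abs_and_drop(header, rows, abs_pattern="bondraw", drop_pattern="HL|HOMO|LUMO"):
    """Return (header, rows) with |x| applied to abs_pattern columns and drop_pattern columns removed."""
    abs_cols = {i for i, name in enumerate(header) if re.search(abs_pattern, name)}
    keep = [i for i, name in enumerate(header) if not re.search(drop_pattern, name)]
    new_header = [header[i] for i in keep]
    new_rows = [[abs(row[i]) if i in abs_cols else row[i] for i in keep] for row in rows]
    return new_header, new_rows


# Hypothetical two-guide table (values invented).
header = ["sgRNAID", "cut.score", "p1dimer_H_bondraw", "p1dimer_HL_gapraw"]
rows = [["g1", 0.8, -12.5, 3.1], ["g2", -0.2, 9.0, 2.7]]

new_header, new_rows = abs_and_drop(header, rows)
print(new_header)
print(new_rows)
```

# Selecting by name pattern rather than by column index keeps the step valid if the column count changes.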
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName noHL --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/Submits/submit_full_noHL_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/Submits/submit_train_noHL_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/Submits/submit_test_noHL_0.sh
# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/YNames.txt noHL
# 0.2564941121557385
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw cut.score 0.2676690008756795
# sgRNA.gcsgRNA.raw cut.score 0.04810165968914591
# sgRNA.tempsgRNA.raw cut.score 0.045682202832182765
# V4087sgRNA.raw cut.score 0.028748615100844917 <-- p16.GGCC
# GGsgRNA.raw cut.score 0.028323663567439376
# pam.distance0 cut.score 0.018764516297564895
# V4343sgRNA.raw cut.score 0.017372547699812658 <-- p17.GGCC
# V4312sgRNA.raw cut.score 0.014410503258427519 <-- p17.GCCT
# p13dimer_H_bondraw cut.score 0.014158699462834634
# p11dimer_H_bondraw cut.score 0.011419832884509533
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("noHL_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5316399
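# Sketch (Python, stdlib only) of the Pearson correlation that cor() computes above, for checking a
# prediction file outside R. The example vectors are invented; they are not values from this run.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Invented observed/predicted scores.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 4.0, 6.0, 8.0]
r = pearson(observed, predicted)
print(r)
```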
# NOTE: need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score noHL
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/cut.score/RIT.run
# sgRNA.structuresgRNA.raw cut.score 0.2676690008756795 -0.017026901627220408 70802.78 -0.4160079400908415
# sgRNA.gcsgRNA.raw cut.score 0.04810165968914591 0.009099651611727953 30993.431 -1.2138644207966751
# sgRNA.tempsgRNA.raw cut.score 0.045682202832182765 0.009947162040008827 29374.496 -1.201684899390678
# V4087sgRNA.raw cut.score 0.028748615100844917 0.018105634883326425 24193.116 -1.020549171992924
# GGsgRNA.raw cut.score 0.028323663567439376 0.007262151543367551 20337.142 -1.1698805330346533
# pam.distance0 cut.score 0.018764516297564895 0.0022152333872347044 12920.775 -1.3359447252109964
# V4343sgRNA.raw cut.score 0.017372547699812658 0.015554623512120214 16979.811 -0.9183917130469017
# V4312sgRNA.raw cut.score 0.014410503258427519 0.011493453405739885 16654.431 -1.2281994074729736
# p13dimer_H_bondraw cut.score 0.014158699462834634 0.005819836565006202 6787.281 -0.9978164560081595
# p11dimer_H_bondraw cut.score 0.011419832884509533 0.005690488639647928 4136.188 -1.1907651715285015
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])
# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)
import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False stops shap from calling plt.show(), so the figure is still populated when savefig() runs
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
plt.close(f)
f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
plt.close(f)
# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J putida.test_0
#BSUB -o putida.test_0.o%J
#BSUB -e putida.test_0.e%J
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test
/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/cut.score/noHL_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix ecoli.model.putida.test --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test/ecoli.model.putida.test.o
# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.putida.test.sh
#### test the output
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
# observed scores come from the score file; the features file has no cut.score column
score <- read.delim("putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test/")
# named pred (not predict) to avoid masking stats::predict()
pred <- read.delim("ecoli.model.putida.test.prediction", header=T, sep="\t")
# assumes the prediction rows are in the same order as the score file
score.predict <- cbind(score, pred)
cor(score.predict$cut.score, score.predict$Predictions.)
#-
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J putida.matrix
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
R CMD BATCH mar15.matrix.R
#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/mar15.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
structure <- read.delim("putida.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("putida.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:7)]
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
# structure, gc, temp as one-column data frames
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])
structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"
structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]
structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")
## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
onehot.ind1 <- read.delim("putida_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("putida_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("putida_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("putida_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("putida_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("putida_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last column of each dep table; parentheses make the 1:(n-1) range explicit instead of relying on index 0 being ignored
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"
data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))
df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "putida.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
#
# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
sgRNA.pam <- read.table("putida.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
#sgRNA.pam.df$id <- "Cas9"
#sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
sgRNA.pam.id <- sgRNA.pam.df
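# Sketch (Python, stdlib only) of the PAM one-hot encoding above. Each NGG PAM is matched on either
# strand, so the reverse complement (e.g. CCT for AGG) sets the same flag. The function names are
# illustrative, not part of the pipeline.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_onehot(pam_code):
    """Return {PAM.A, PAM.C, PAM.T, PAM.G} flags for an NGG PAM given on either strand."""
    flags = {}
    for base in "ACTG":
        forward = base + "GG"
        flags["PAM." + base] = int(pam_code in (forward, reverse_complement(forward)))
    return flags


print(pam_onehot("AGG"))
print(pam_onehot("CCA"))
```

# A code that is not an NGG PAM on either strand gets all four flags set to 0, as in the ifelse() chain.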
score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")
score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df <- read.delim("putida.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))
df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
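# Sketch (Python, stdlib only) of the long-to-wide step above: dcast(..., fun.aggregate=mean, na.rm=TRUE)
# followed by na.omit(). Assumptions: records are (sgRNAID, feature.scale, value) tuples with None for
# missing values; the guide IDs and numbers are invented.

```python
from collections import defaultdict


def long_to_wide(records):
    """Pivot (id, feature, value) records to {id: {feature: mean value}}, ignoring None values."""
    cells = defaultdict(lambda: defaultdict(list))
    for guide_id, feature, value in records:
        if value is None:
            continue
        cells[guide_id][feature].append(value)
    return {
        guide_id: {feature: sum(vals) / len(vals) for feature, vals in features.items()}
        for guide_id, features in cells.items()
    }


def drop_incomplete(wide):
    """Keep only ids that have a value for every feature seen in the table (like na.omit)."""
    all_features = set()
    for features in wide.values():
        all_features.update(features)
    return {gid: feats for gid, feats in wide.items() if set(feats) == all_features}


records = [
    ("g1", "sgRNA.gcsgRNA.raw", 0.25),
    ("g1", "sgRNA.gcsgRNA.raw", 0.75),  # duplicate cell, averaged
    ("g1", "PAM.A0", 1.0),
    ("g2", "PAM.A0", 0.0),              # g2 lacks the GC feature
]
wide = long_to_wide(records)
complete = drop_incomplete(wide)
print(complete)
```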
#
# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
sgRNA.genes <- read.table("putida.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df
score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0
df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)
df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
#
write.table(df.pam.location, "putida.raw.matrix.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "putida.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
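# Sketch (Python, stdlib only) of the monomer lookup above: each position's base is mapped to its row
# of tensor features, and the result is flattened to one wide record per guide, named position_feature
# as dcast does. Assumptions: the feature name "H_bond" and all numeric values are placeholders, not
# values from HL.Bond.Monomer.txt.

```python
def positional_features(seq, table):
    """Map each base of seq to its features; return {"p<i>_<feature>": value}."""
    wide = {}
    for position, base in enumerate(seq, start=1):
        # Bases missing from the table (e.g. N) contribute no columns here;
        # the R left_join would instead leave NA for them.
        for feature, value in table.get(base, {}).items():
            wide[f"p{position}_{feature}"] = value
    return wide


# Placeholder lookup table: base -> {feature: value}.
table = {
    "A": {"H_bond": -1.0},
    "C": {"H_bond": -2.0},
    "G": {"H_bond": -3.0},
    "T": {"H_bond": -4.0},
}

features = positional_features("ACGT", table)
print(features)
```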
# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "putida.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")
# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "putida.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")
# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "putida.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")
# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "")
tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])
rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")
seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "putida.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
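# Sketch (Python, stdlib only) of the sliding-window step that the chained unite() calls implement in
# the dimer, trimer and tetramer blocks: position p<i> holds the k-mer starting at base i, so a 20-nt
# guide gives 19 dimers, 18 trimers and 17 tetramers. The example sequence is invented.

```python
def sliding_kmers(seq, k):
    """Return {"p1": kmer starting at base 1, ...} for all overlapping k-mers of seq."""
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}


guide = "ACGTACGTACGTACGTACGT"  # invented 20-nt sequence
dimers = sliding_kmers(guide, 2)
trimers = sliding_kmers(guide, 3)
tetramers = sliding_kmers(guide, 4)
print(len(dimers), len(trimers), len(tetramers))
```

# The k-mer at each position can then be joined to the matching tensor table exactly as in the monomer case.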
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
monomer <- read.delim("putida.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("putida.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("putida.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("putida.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("putida.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "putida.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
library(dplyr)
library(reshape2)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)
# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor <- read.delim("putida.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0
tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")
df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 27345
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "putida.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df %>% select(-grep("cut.score.y.y", names(df)), -grep("cut.score.y", names(df)), -grep("cut.score.x.x", names(df)))
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"
write.table(df.all, "putida.finalquantum.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "putida.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "putida.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "putida.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "putida.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "putida.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "putida.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
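# The *_noSampleIDs files carry no sgRNAID column, so features and scores are matched by row order only.
# Sketch (Python, stdlib only) of a check that the ID-bearing features and score tables list the same
# guides in the same order. Assumptions: the tables are tab-separated with an sgRNAID column; the
# in-memory examples stand in for the real files.

```python
import csv
import io


def read_ids(handle, id_column="sgRNAID"):
    """Return the ID column of a tab-separated table, in file order."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_column] for row in reader]


def ids_aligned(features_handle, score_handle):
    """True if both tables list the same IDs in the same order."""
    return read_ids(features_handle) == read_ids(score_handle)


# Invented stand-ins for the features and score files.
features_txt = "sgRNAID\tf1\ng1\t0.1\ng2\t0.2\n"
score_txt = "sgRNAID\tcut.score\ng1\t0.8\ng2\t-0.2\n"
shuffled_txt = "sgRNAID\tcut.score\ng2\t-0.2\ng1\t0.8\n"

aligned = ids_aligned(io.StringIO(features_txt), io.StringIO(score_txt))
misaligned = ids_aligned(io.StringIO(features_txt), io.StringIO(shuffled_txt))
print(aligned, misaligned)
```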
# run python scripts on Andes
# run job submissions on Summit
# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
# Andes
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName putida.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.finalquantum.score.txt
# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum
module load python/3.7.0-anaconda3-5.3.0
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/Submits/submit_full_putida.finalquantum_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/Submits/submit_train_putida.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/Submits/submit_test_putida.finalquantum_0.sh
# Andes
module load python/3.7-anaconda3
vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/YNames.txt putida.finalquantum
# 0.2497100288038237
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/putida.finalquantum_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 22369.2
# sgRNA.tempsgRNA.raw: 4369.7
# sgRNA.gcsgRNA.raw: 3938.45
# V4087sgRNA.raw: 2280.42
# GGsgRNA.raw: 1981.13
# p11tetramer.Hbond.energyraw: 1816.39
# V4343sgRNA.raw: 1530.88
# p13tetramer.Hbond.energyraw: 1508.66
# p17tetramer.Hlgap.eVEraw: 1142.82
# p7tetramer.Hbond.energyraw: 1131.84
# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("putida.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5215119
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
#SBATCH -p gpu
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score putida.finalquantum
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/RIT.run
# sgRNA.structuresgRNA.raw cut.score 0.23350755818800148 -0.003092209374979174 62056.212 -0.7346448352143302
# sgRNA.gcsgRNA.raw cut.score 0.04289250555917792 0.0036930635871796277 28767.83 -1.2115198992908018
# sgRNA.tempsgRNA.raw cut.score 0.03905222514980143 0.0038157350050903737 26623.295 -0.9897214396709659
# V4087sgRNA.raw cut.score 0.025792547076915376 0.010823889309003597 23344.273 -1.06989025798672
# p11tetramer.Hbond.energyraw cut.score 0.019625679832632137 0.001709177788478303 8774.493 -0.9065756264264376
# GGsgRNA.raw cut.score 0.01851639846123979 0.002251535744183914 14006.253 -0.821462737942859
# V4343sgRNA.raw cut.score 0.015418165378914536 0.007039166873850421 14269.541 -0.9449242024857153
# p7tetramer.Hbond.energyraw cut.score 0.013642788627073781 -0.0007493166151327098 7707.85 -1.1241960466240646
# p13tetramer.Hbond.energyraw cut.score 0.013434957932850747 0.002999438556682073 6821.335 -1.2045936711515992
# p17tetramer.Hlgap.eVEraw cut.score 0.01259884826856768 -3.894005154055647e-05 6126.836 -1.0959472886330697
library(ggplot2)
library(reshape2)
library(RColorBrewer)
library(dplyr)  # provides %>% and mutate() used below
# Figure 3A
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score")
imp <- read.delim("putida.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("putida.Imp.Dir.Top20.21March.pdf")
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("putida Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
dev.off()
pdf("putida.Imp.Dir.Top20.Effect.21March.pdf")
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("putida") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
Putida & E.coli
- look into top features in both models
- distribution of feature values
- generate multi-species model
- distribution of cutting efficiency scores
** Work in Jupyter notebook with r4environment connection [/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/bacterial.sgRNA.iRF.ipynb]
- https://academic.oup.com/nar/article/46/14/7052/5047272#120184448
- https://www.biorxiv.org/content/10.1101/2021.09.14.460134v1.full.pdf
- https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0227994
- https://www.science.org/lookup/doi/10.1126/science.aad5227
- https://github.com/bm2-lab/iGWOS
- https://github.com/maximilianh/crisporWebsite
- http://www.ams.sunysb.edu/~pfkuan/predictSGRNA/demopredictSGRNA_1.0.1.pdf
- https://www.chemistryworld.com/news/machine-learning-accurately-predicts-rna-structures-using-tiny-dataset/4014347.article
- https://www.biorxiv.org/content/10.1101/605790v1.full
- https://science.sciencemag.org/content/373/6558/964.full
- https://www.nature.com/articles/s41467-021-23576-0
- https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3395-z
- https://www.future-science.com/doi/full/10.2144/btn-2018-0187
- https://onlinelibrary.wiley.com/doi/epdf/10.1002/advs.201902312
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5795621/
- https://www.nature.com/articles/nbt.3026
- https://www.nature.com/articles/s42003-020-01452-9#MOESM4
- https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1697-6
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6921152/
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4338555/
- https://www.embopress.org/doi/full/10.15252/embj.201899466 - paper that shows that tandem PAMs affect Cas9 binding to target
- https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3151-4#Sec2
- https://www.sciencedirect.com/science/article/pii/S2001037021000738#s0010 - CNN model with only sgRNA sequence input
- https://www.sciencedirect.com/science/article/pii/S2001037019303551 - Summary of current methods
- https://pubmed.ncbi.nlm.nih.gov/30988204/ - integrating the energetics of R-loop formation under Cas9 binding, the effect of the protospacer adjacent motif sequence, and the folding stability of the whole single guide RNA, we devised a unified, physical model that can apply to any cleavage-activity dataset.
- https://www.biorxiv.org/content/10.1101/269910v1.full.pdf - “inefficient RNAs have a significantly higher average melting temperature than efficient ones”
Temperature of melting (Tm) is defined as the temperature at which 50% of double-stranded DNA has denatured to single-stranded DNA. The higher the guanine-cytosine (GC) content of the DNA, the higher the melting temperature. Formula (Wallace rule, intended for short oligonucleotides): Tm = 2 °C × (A + T) + 4 °C × (G + C), where A, T, G, and C are the counts of each base in the sequence.
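The Tm formula above (often called the Wallace rule) is simple enough to compute directly from a guide sequence. A minimal sketch, assuming the input is a plain nucleotide string; the function name is hypothetical, and counting U as T so that RNA guide sequences can be passed in is an assumption made here, not something stated in the notes:

```python
def wallace_tm(seq):
    """Melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    s = seq.upper().replace("U", "T")  # assumption: treat RNA U as DNA T
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2 * at + 4 * gc

# A 20-nt example with 12 G/C and 8 A/T bases.
guide = "GGGGCCCCAAAATTTTGGCC"
tm = wallace_tm(guide)  # 2*8 + 4*12
```

Because every G or C contributes twice as much as an A or T, this Tm is a linear function of GC content for a fixed guide length, so the two features are expected to be strongly correlated in the model input.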
- https://www.cambridge.org/core/journals/quarterly-reviews-of-biophysics/article/key-role-of-the-rec-lobe-during-crisprcas9-activation-by-sensing-regulating-and-locking-the-catalytic-hnh-domain/DD8DCCAC11DC69C73C9B2AEB15E4B656
- https://pubs.acs.org/doi/10.1021/jacs.7b13047
https://pubs.acs.org/doi/10.1021/jacs.7b13047 “Collectively, the current understanding of RNA-guided DNA targeting and cleavage by Cas9 involves (1) sgRNA binding to elicit an active Cas9 conformation, (9, 10) (2) PAM recognition, (8) (3) local DNA duplex unwinding and RNA strand invasion, (4) complete directional unwinding of the DNA to form the RNA-DNA heteroduplex, (11) and (5) coupled conformational changes within Cas9 necessary for subsequent DNA cleavage. (3, 12)” “Our results directly show that HNH cleaves faster than RuvC (Figure 7A,B) and therefore expand this hypothesis to suggest that conformational activation of the HNH domain is a prerequisite for cleavage of tDNA and ntDNA, thereby controlling when (i.e., timing) DNA cleavage occurs. However, it is the slow rate (0.37 s⁻¹, k5b, Figure 2A) for RuvC isomerization which limits the overall rate of double-stranded DNA cleavage from the pre-formed ternary complex.” “This result corroborates the findings from the multiple-turnover kinetic assays (Figure 8) indicating that DNA product release is the slowest mechanistic step (k7, Figure 2A) and also demonstrates that Cas9 can remain tightly bound to even large DNA products for a substantial amount of time following double-stranded DNA cleavage.”
https://www.nature.com/articles/nature13579 “Our structural observations suggest that the interaction between the target DNA strand and the phosphate lock loop might stabilize target DNA immediately upstream of the PAM in an unwound conformation, thereby linking PAM recognition with local strand separation.” “Together, these structures reveal that even in the absence of compensatory base pairing to the guide RNA, target DNA binding by Cas9–RNA results in local strand separation immediately upstream of the PAM. Importantly, the interaction of the +1 phosphate with the phosphate lock loop is maintained in both structures, supporting the hypothesis that the loop contributes to stabilizing the target DNA strand in the unwound state.”
https://www.annualreviews.org/doi/10.1146/annurev-biophys-062215-010822 “Mismatches in this seed region severely impair or completely abrogate target DNA binding and cleavage, whereas close homology in the seed region often leads to off-target binding events even with many mismatches elsewhere (78).” “The most prominent conformational change takes place in the REC lobe, in particular Hel-III, which moves ∼65 Å toward the HNH domain upon sgRNA binding. In contrast, Cas9 exhibits much smaller conformational changes upon binding to target DNA and PAM sequence (Figure 5), which indicates that the majority of the extensive structural rearrangements occur prior to target DNA binding, reinforcing the notion that guide RNA loading is a key regulator of Cas9 enzyme function” “Once Cas9 has found a target site with the appropriate PAM, it triggers local DNA melting at the PAM-adjacent nucleation site, followed by RNA strand invasion to form an RNA–DNA hybrid and a displaced DNA strand (termed R-loop) from PAM-proximal to PAM-distal ends (94, 96). Perfect complementarity between the seed region of sgRNA and target DNA is necessary for Cas9-mediated DNA targeting and cleavage, whereas imperfect base pairing at the nonseed region is much more tolerated for target binding specificity” “ In the PAM duplex–bound structure (Figure 5d), a sharp kink turn is observed in the target strand immediately upstream of the PAM, with the connecting phosphodiester group (referred to as +1 phosphate) stabilized by a phosphate lock loop (K1107–S1109) located in the PAM-interacting CTD domain (3). Such a kink-turn configuration is necessary for driving the target stand DNA to transition from pairing with the nontarget strand to pairing with the guide RNA” “As observed in the PAM duplex–bound structure (3), the unwound target DNA strand kinks at the +1 phosphodiester linkage and then pairs with the spacer region to form a pseudo-A-form RNA–DNA hybrid. 
In contrast to the target strand, which runs the length of the central channel formed between the two Cas9 lobes, the displaced nontarget DNA strand threads into a tight side tunnel located within the NUC lobe”
https://www.science.org/doi/10.1126/sciadv.abe5496 “Across all sgRNAs, most RNA:DNA mismatches or bulges had small effects on final fraction bound (Fig. 2C and table S4). Single RNA:DNA mismatches had particularly modest impact, generally only visible in first seven positions of the seed. Curiously, the presence of multiple distal mismatches slightly increased the final fraction bound for many sgRNAs (Fig. 2A). Recent single-molecule studies suggest that distal mismatches decrease the fraction of RNP:target complexes in an unwound state even while stably bound (10, 28), which could correspond to differences in complex stability or adherence to nitrocellulose. We also, we observed that the sensitivity of a target to perturbation (as ordered in Fig. 2A) inversely correlated with the number of internal PAMs contained within the target sequence (Spearman R = −0.31, P = 0.01).” “Previous work has shown that cleavage is much more sensitive to imperfect matches than is binding (23) due to a conformational change required for target DNA cleavage (7, 31, 32). Our data are consistent with these findings. Across all sgRNAs, more than 85% of targets with 17 bp of complementarity exhibited detectable cleavage (Fig. 3F). Additional mismatches substantially decreased the fraction of targets cleaved: 38% of targets with 16 bp of complementarity exhibited cleavage below the threshold of detection, as did 62% of targets with 15 bp of complementarity” “The fit parameters indicate that the presence of a G at the nearest 3′ position (NGGG-extended PAM) slows association, in this case by 27% (table S9). However, as suggested by an analysis of CRISPRi/a data (22), an extended PAM consisting of a 3′ CC (NGGCC) slowed the association rate even more. 
When combined with an additional 3′ C (NGGCCC), the model predicted over a twofold drop in association rate, more than double the reduction predicted for an NGGG-extended PAM.” “Across context variants of all guides, association rates were typically the slowest to targets containing a G at the nearest 3′ base, consistent with an NGGH-extended PAM motif for achieving the most rapid association” “Our maximal productive binding measurement instead appears to align with the conventional understanding of Cas9 targets, which have an 8- to 10-bp seed region that is sensitive to disruption, an 8- to 11-bp PAM-distal region that is largely resilient, and an intermediate zone sensitive to large perturbations”
https://www.frontiersin.org/articles/10.3389/fmolb.2021.653262/full https://www.sciencedirect.com/science/article/pii/S0092867414001561?via%3Dihub#fig2 “Here, we report the crystal structure of Streptococcus pyogenes Cas9 in complex with sgRNA and its target DNA at 2.5 Å resolution. The structure revealed a bilobed architecture composed of target recognition and nuclease lobes, accommodating the sgRNA:DNA heteroduplex in a positively charged groove at their interface. Whereas the recognition lobe is essential for binding sgRNA and DNA, the nuclease lobe contains the HNH and RuvC nuclease domains, which are properly positioned for cleavage of the complementary and noncomplementary strands of the target DNA, respectively. The nuclease lobe also contains a carboxyl-terminal domain responsible for the interaction with the protospacer adjacent motif (PAM).” “These observations suggested that the 3′-NCC-5′ sequence complementary to the 5′-NGG-3′ PAM is not recognized by Cas9 and are consistent with previous biochemical data showing that Cas9-catalyzed DNA cleavage requires the 5′-NGG-3′ PAM on the noncomplementary strand, but not the 3′-NCC-5′ sequence on the complementary strand (Jinek et al., 2012).” “The backbone phosphate groups of the guide region (nucleotides 2, 4–6, and 13–20) interact with the REC1 domain (Arg165, Gly166, Arg403, Asn407, Lys510, Tyr515, and Arg661) and the bridge helix (Arg63, Arg66, Arg70, Arg71, Arg74, and Arg78)” “The sgRNA guide region is recognized by Cas9 in a sequence-independent manner, except for the U16-Arg447 and G18-Arg71 interactions (Figures 5 and 6A). This base-specific G18-Arg71 interaction may partly explain the observed preference of Cas9 for sgRNAs with guanines in the four PAM-proximal guide regions (Wang et al., 2014).”
Discrete Wavelet Transformations: A discrete wavelet transformation (21 scales) was applied to several features calculated for every 20bp sliding window of the genome. The features were GATC motif density, gene density, GC content, PAM site density, IPD ratio, MFE (ViennaRNA), and temperature of melting (Tm), obtained through a combination of counts, calculations, and motif searches. A fasta file was generated from the genome assembly using the bedtools makewindows command with -w 20 and -s 1 (a 20bp window sliding one base pair at a time). This file was then used to calculate the feature values for each window. Each feature was processed individually: its per-window values formed one vector, which was transformed with a Haar wavelet using the wavMODWT function of the R package wmtsa. The resulting Haar wavelet coefficient at each scale was extracted for every 20bp sliding window of the genome. These transformed values were compiled across all features for each sgRNA, resulting in an additional 184 features.
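To make the two steps in the paragraph above concrete (per-window feature vector, then a Haar maximal-overlap transform), here is a minimal pure-Python sketch using GC content as the feature. It is an illustration only and not the wmtsa implementation: the function names are hypothetical, the periodic (circular) boundary handling is an assumption because the notes do not say how wavMODWT treated the sequence ends, and sign conventions for the wavelet filter can differ between packages.

```python
def sliding_gc(genome, width=20, step=1):
    """GC fraction for each window of `width` bp, advancing `step` bp at a time."""
    genome = genome.upper()
    values = []
    for start in range(0, len(genome) - width + 1, step):
        window = genome[start:start + width]
        values.append((window.count("G") + window.count("C")) / width)
    return values

def haar_modwt(x, n_levels):
    """Haar MODWT via the pyramid algorithm with circular boundaries (assumption).

    Returns (details, smooth): `details[j-1]` holds the wavelet coefficients at
    level j, and `smooth` holds the scaling coefficients at the last level.
    Every output vector has the same length as the input, so each window keeps
    one coefficient per scale.
    """
    n = len(x)
    v = list(x)
    details = []
    for level in range(1, n_levels + 1):
        shift = 2 ** (level - 1)
        # Haar MODWT filters: wavelet (1/2, -1/2), scaling (1/2, 1/2).
        w = [0.5 * (v[t] - v[(t - shift) % n]) for t in range(n)]
        v = [0.5 * (v[t] + v[(t - shift) % n]) for t in range(n)]
        details.append(w)
    return details, v

# Toy "genome": 4 bp windows and 3 levels instead of 20 bp and 21 scales.
toy = "GGGGAAAACCCCTTTTGGCCAATT"
gc = sliding_gc(toy, width=4, step=1)
details, smooth = haar_modwt(gc, n_levels=3)
# Per-window features: one coefficient per level, e.g. for window index 5.
window_features = [level_coeffs[5] for level_coeffs in details]
```

The transform preserves the total energy of the input (the sum of squares of all detail vectors plus the final smooth equals the sum of squares of the feature vector), which gives a quick sanity check when porting the calculation between R and Python.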
R package: wmtsa
- 21 scales per feature per guide
- 20bp sliding windows
Features
- GC content
- Temperature of melting
- RNA structure (ViennaRNA MFE)
- Gene density
- Gene expression (RNA-seq GEO: GSM2267479) ??
- GATC motif density ??
- IPD ratio (GEO: GSM3264688) ??