Feature Matrix Pipeline (19 November 2021)

Feature set includes:
  • Melting temperature
  • GC content
  • MFE (ViennaRNA)
  • Distance to PAM
  • Location in target gene
  • Positional encoding (sgRNA ind1, ind2, dep1, dep2, dep3, dep4)
  • PAM nucleotide encoding
  • Haar DWTs (20bp sliding windows of the genome: GATC motif, gene density, GC content, PAM, IPD, MFE, melting temp)
  • Quantum chemical tensors (monomer, basepair, dimer, trimer, tetramer)
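The Haar DWT features above are not computed in this section. As an illustration only, here is a minimal sketch of a multi-level orthonormal Haar transform applied to a per-window feature track (for example, the GC fraction of consecutive 20 bp windows). It assumes the track length can be halved at every level; `haar_step` and `haar_dwt` are hypothetical names, and how the pipeline summarises the coefficients per sgRNA is not specified here.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar transform.

    Returns (approximation, detail) coefficients, each half the input length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_dwt(signal, levels):
    """Multi-level Haar DWT: repeatedly transform the approximation coefficients.

    Returns the final approximation and the detail coefficients, finest level first.
    """
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


# toy feature track, e.g. GC fraction of eight consecutive windows
track = [0.50, 0.50, 0.60, 0.40, 0.45, 0.55, 0.70, 0.30]
approx, details = haar_dwt(track, levels=3)
print(approx, [len(d) for d in details])
```

Because the 1/√2 scaling makes the transform orthonormal, the signal's energy (sum of squares) is preserved across levels, which keeps coefficients from different levels on a comparable scale.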

E. coli

blast

  • do a search for the sgRNA sequence in the genome
    • input fasta file of sequences, output coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
# generate FASTA file of sgRNA sequences and blast to reference
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
sed '1d' DataS1.txt | awk '{print ">"$1"\n"$2}' > ecoli.gRNA.fasta

## blast
# conda install blast
# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes
# wget https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
# tar zxvpf ncbi-blast-2.11.0+-x64-linux.tar.gz
# export PATH=$PATH:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin
# echo $PATH
# mkdir $HOME/blastdb
# export BLASTDB=$HOME/blastdb
# (csh/tcsh equivalent: setenv BLASTDB $HOME/blastdb)

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli

/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/makeblastdb -in genome/GCF_000005845.2_ASM584v2_genomic.fna -dbtype nucl
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.blast.tab -outfmt 6 -evalue 0.0001 -task blastn -num_threads 10

## install bwa (git clone https://github.com/bwa-mem2/bwa-mem2)
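# BLAST -outfmt 6: $9 = sstart, $10 = send (1-based, inclusive; sstart > send for minus-strand hits). BED starts are 0-based, so subtract 1.
awk 'BEGIN{OFS="\t"} {if ($9 > $10) print $2, $10-1, $9, $1}' ecoli.gRNA.blast.tab > tmp1.bed
awk 'BEGIN{OFS="\t"} {if ($10 > $9) print $2, $9-1, $10, $1}' ecoli.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > ecoli.gRNA.blast.bed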
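A minimal sketch of the BLAST-to-BED conversion as a Python function, assuming the standard `-outfmt 6` columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Unlike the awk lines, it keeps the hit strand in BED column 6; `blast_hit_to_bed` is a hypothetical name and the example hit is made up.

```python
def blast_hit_to_bed(line):
    """Convert one BLAST -outfmt 6 line into a BED6 tuple.

    Columns 9 and 10 (sstart, send) are 1-based and inclusive; send < sstart
    marks a minus-strand hit. BED starts are 0-based, so the smaller
    coordinate is shifted down by one.
    """
    f = line.rstrip("\n").split("\t")
    qseqid, sseqid = f[0], f[1]
    sstart, send = int(f[8]), int(f[9])
    strand = "+" if sstart <= send else "-"
    start0 = min(sstart, send) - 1
    end = max(sstart, send)
    return (sseqid, start0, end, qseqid, 0, strand)  # score column left at 0


hit = "sg1\tNC_000913.3\t100.000\t20\t0\t0\t1\t20\t5020\t5001\t1e-05\t40.1"
print("\t".join(map(str, blast_hit_to_bed(hit))))
```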


# tr -d '\n' < mexicanus.fasta | cut -b210-220


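###### run with complement sequence
# note: blastn already searches both strands by default (-strand both), so reverse-complement matches are included in ecoli.gRNA.blast.tab; a complemented but not reversed query (revseq -noreverse -complement) corresponds to neither genomic strand read 5'->3', which is consistent with the very few hits at strict e-values below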
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate emboss

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
revseq ecoli.gRNA.fasta -noreverse -complement -outseq ecoli.gRNA.comp.fasta

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -evalue 0.0001 -task blastn -num_threads 10
#### only getting two outputs??
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10
#### only 13 outputs
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -task blastn-short -num_threads 10
#### too many outputs [1185091]
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -task blastn -num_threads 10
#### fewer but still too many [809935] <-- input 55671 sequences
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -evalue 0.001 -task blastn -num_threads 10
#### only 37 outputs... 
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query ecoli.gRNA.comp.fasta -db genome/GCF_000005845.2_ASM584v2_genomic.fna -out ecoli.gRNA.complement.blast.tab -outfmt 6 -evalue 0.01 -task blastn-short -num_threads 10


# same conversion as above: BLAST coordinates are 1-based, BED starts are 0-based
awk 'BEGIN{OFS="\t"} {if ($9 > $10) print $2, $10-1, $9, $1}' ecoli.gRNA.complement.blast.tab > tmp1.comp.bed
awk 'BEGIN{OFS="\t"} {if ($10 > $9) print $2, $9-1, $10, $1}' ecoli.gRNA.complement.blast.tab > tmp2.comp.bed
cat tmp1.comp.bed tmp2.comp.bed > ecoli.gRNA.complement.blast.bed

sgRNA file

  • generate a tab-delimited file for the sgRNA data
    • chr, start, end, sgRNA, nucleotide sequence, cut score
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
d1 <- read.delim("DataS1.txt", header=T, sep="\t")
d4 <- read.delim("DataS4.txt", header=T, sep="\t")
d6 <- read.delim("DataS6.txt", header=T, sep="\t")
coord <- read.delim("ecoli.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
d1$sgRNA <- d1$sgRNAID
d4$sgRNA <- d4$sgRNAID

library(dplyr)
df <- left_join(coord, d1, by="sgRNA")
df2 <- left_join(df, d4, by="sgRNA")
df3 <- left_join(df2, d6, by="sgRNA")
write.table(df3, "sgRNA.coord.txt", quote=F, row.names=F, sep="\t")


# complement
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
d1 <- read.delim("DataS1.txt", header=T, sep="\t")
d4 <- read.delim("DataS4.txt", header=T, sep="\t")
d6 <- read.delim("DataS6.txt", header=T, sep="\t")
coord <- read.delim("ecoli.gRNA.complement.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
d1$sgRNA <- d1$sgRNAID
d4$sgRNA <- d4$sgRNAID

library(dplyr)
df <- left_join(coord, d1, by="sgRNA")
df2 <- left_join(df, d4, by="sgRNA")
df3 <- left_join(df2, d6, by="sgRNA")
write.table(df3, "sgRNA.complement.coord.txt", quote=F, row.names=F, sep="\t")



## include all cas9 types
### dataset --> DataS4... save each sheet as a dataframe, add column declaring Cas9 type, intersect with DataS1 for sequence, create new sgRNAID using both the ID and Cas9 type, merge files
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli")
seq <- read.delim("DataS1.txt", header=T, sep="\t")
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/DataS4.tables")
Cas9 <- read.delim("DataS4.Cas9.txt", header=T, sep="\t")
eSpCas9 <- read.delim("DataS4.eSpCas9.txt", header=T, sep="\t")
recAcas9 <- read.delim("DataS4.recACas9.txt", header=T, sep="\t")
# > nrow(seq)
# [1] 55671
# > nrow(Cas9)
# [1] 44163
# > nrow(eSpCas9)
# [1] 45071
# > nrow(recAcas9)
# [1] 48112
library(dplyr)
library(tidyr)
Cas9.seq <- left_join(Cas9, seq, by="sgRNAID")
eSpCas9.seq <- left_join(eSpCas9, seq, by="sgRNAID")
recAcas9.seq <- left_join(recAcas9, seq, by="sgRNAID")

Cas9.seq.id <- Cas9.seq %>% unite(sgRNAID, c(sgRNAID, type), sep="_")
eSpCas9.seq.id <- eSpCas9.seq %>% unite(sgRNAID, c(sgRNAID, type), sep="_")
recAcas9.seq.id <- recAcas9.seq %>% unite(sgRNAID, c(sgRNAID, type), sep="_")
df <- rbind(Cas9.seq.id, eSpCas9.seq.id)
df2 <- rbind(df, recAcas9.seq.id)
# 137346
df.na <- na.omit(df2)
# 126182
write.table(df.na, "Ecoli.allCas9.txt", quote=F, row.names=F, sep="\t")


sed '1d' Ecoli.allCas9.txt | awk '{print ">"$1"\n"$3}' > Ecoli.allCas9.fasta

# cd "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/DataS4.tables"
# scp Ecoli.allCas9.txt noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/.
# scp Ecoli.allCas9.fasta noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/.

sliding windows

  • make 20bp sliding windows (every 1bp)
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli

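# ecoli.sizes.genome: tab-separated chromosome name and length, e.g. from: samtools faidx genome/GCF_000005845.2_ASM584v2_genomic.fna && cut -f 1,2 genome/GCF_000005845.2_ASM584v2_genomic.fna.fai > ecoli.sizes.genome
bedtools makewindows -g ecoli.sizes.genome -w 20 -s 1 > ecoli.20bp.sliding.bed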
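A sketch of what the 20 bp / 1 bp-step windows look like, assuming a bedtools-style genome file (chromosome name and length, tab-separated). It produces only full-length windows; whether `bedtools makewindows` also emits shorter windows at the chromosome end should be checked against its output. Function names are hypothetical.

```python
def read_genome_file(path):
    """Read a bedtools-style genome file (chrom<TAB>length per line) into a dict."""
    sizes = {}
    with open(path) as fh:
        for line in fh:
            if line.strip():
                chrom, length = line.split("\t")[:2]
                sizes[chrom] = int(length)
    return sizes


def sliding_windows(chrom_sizes, width=20, step=1):
    """Yield (chrom, start, end) BED windows of `width` bp every `step` bp.

    Only full-length windows are produced.
    """
    for chrom, size in chrom_sizes.items():
        for start in range(0, size - width + 1, step):
            yield chrom, start, start + width


windows = list(sliding_windows({"toy": 25}))
print(len(windows), windows[0], windows[-1])
```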

Features

Gene density & GC content

module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli

## genes
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b genome/GCF_000005845.2_ASM584v2_genomic.gene.gff > ecoli.gene.20sliding.bed

## GC content
bedtools nuc -fi genome/GCF_000005845.2_ASM584v2_genomic.fna -bed ecoli.20bp.sliding.bed | sed '1d' > ecoli.GC.20sliding.bed
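For millions of overlapping windows, a prefix sum gives each window's GC count in constant time. A sketch, assuming a plain sequence string; the denominator is the full window width (N bases count against GC), which may differ from how `bedtools nuc` handles ambiguous bases. `window_gc` is a hypothetical name.

```python
from itertools import accumulate


def window_gc(seq, width=20, step=1):
    """GC fraction of every full `width`-bp window of `seq`.

    A prefix sum over a 0/1 GC indicator makes each window O(1), which matters
    for millions of 1 bp-step windows.
    """
    is_gc = [1 if base in "GCgc" else 0 for base in seq]
    prefix = [0, *accumulate(is_gc)]
    return [
        (start, (prefix[start + width] - prefix[start]) / width)
        for start in range(0, len(seq) - width + 1, step)
    ]


print(window_gc("GG" + "A" * 19))
```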

Melting temperature (Tm)

https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html

module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)

https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n

# summit: # conda install -c conda-forge biopython 

### sgRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
python3

from Bio import SeqIO

# per-sequence base counts; the "CG%" column holds the GC fraction (0-1), not a percentage
with open('ecoli.gRNA.fasta') as input_file, open('nucleotide_counts_sgRNA.tsv', 'w') as output_file:
    _ = output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        seq = cur_record.seq.upper()  # count soft-masked (lowercase) bases too
        a, c, g, t = (seq.count(base) for base in 'ACGT')
        length = len(seq)
        gc_fraction = (c + g) / length if length else 0.0
        # assigning to _ stops the interactive session from echoing each write() return value
        _ = output_file.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (cur_record.name, a, c, g, t, length, gc_fraction))

exit()

# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
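The R step applies the basic GC-content formula shown in the comment. The same calculation as a small Python function, for cross-checking individual rows; `basic_tm` is a hypothetical name. This formula depends only on the G+C count and length, so all 20-mers with the same GC count get the same Tm; the nearest-neighbour `Tm_NN` from Biopython referenced above also depends on base order, but it is not used here.

```python
def basic_tm(seq):
    """Melting temperature (°C) from the basic formula used in the R step:
    Tm = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)
    """
    s = seq.upper()
    n_a, n_c, n_g, n_t = (s.count(base) for base in "ACGT")
    total = n_a + n_c + n_g + n_t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 64.9 + 41 * (n_g + n_c - 16.4) / total


print(round(basic_tm("GCGCGCGCGCATATATATAT"), 2))
```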


### 20bp sliding windows
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
bedtools getfasta -fi genome/GCF_000005845.2_ASM584v2_genomic.fna -bed ecoli.20bp.sliding.bed -fo ecoli.20sliding.fa

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
python3

from Bio import SeqIO

# per-window base counts; the "CG%" column holds the GC fraction (0-1), not a percentage
with open('ecoli.20sliding.fa') as input_file, open('nucleotide_counts_20sliding.tsv', 'w') as output_file:
    _ = output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        seq = cur_record.seq.upper()  # count soft-masked (lowercase) bases too
        a, c, g, t = (seq.count(base) for base in 'ACGT')
        length = len(seq)
        gc_fraction = (c + g) / length if length else 0.0
        # assigning to _ stops the interactive session from echoing each write() return value
        _ = output_file.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (cur_record.name, a, c, g, t, length, gc_fraction))

exit()

# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("nucleotide_counts_20sliding.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "nucleotide_counts_20sliding_temp.txt", quote=F, row.names=F, sep="\t")
q()

RNA structure (ViennaRNA)

https://www.tbi.univie.ac.at/RNA/tutorial/ minimum free energy (MFE) structure

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
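# --noPS: skip the per-sequence PostScript structure drawings (only the MFE values are used)
RNAfold --noPS < ecoli.gRNA.fasta > ecoli.gRNA.ViennaRNA.output.txt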

grep '(' ecoli.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > ecoli.gRNA.ViennaRNA.output.value.txt
grep '>' ecoli.gRNA.ViennaRNA.output.txt | sed 's/>//g' > ecoli.gRNA.names.txt
paste ecoli.gRNA.names.txt ecoli.gRNA.ViennaRNA.output.value.txt > ecoli.gRNA.ViennaRNA.output.value.id.txt

# 20bp sliding fasta
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
# --noPS avoids writing one PostScript file per 20bp window
RNAfold --noPS < ecoli.20sliding.fa > ecoli.20sliding.ViennaRNA.output.txt

grep '(' ecoli.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > ecoli.20sliding.ViennaRNA.output.value.txt
grep '>' ecoli.20sliding.ViennaRNA.output.txt | sed 's/>//g' > ecoli.20sliding.names.txt
paste ecoli.20sliding.names.txt ecoli.20sliding.ViennaRNA.output.value.txt > ecoli.20sliding.ViennaRNA.output.value.id.txt
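The grep/paste approach relies on the header and energy files staying in the same order and length. A sketch that pairs each MFE with its own header instead, assuming RNAfold's usual three-line records (header, sequence, structure followed by the energy in parentheses); `parse_rnafold` is a hypothetical name and the example records are made up.

```python
import re

# trailing "(energy)" on the structure line, e.g. "((((...)))) ( -1.20)"
ENERGY_RE = re.compile(r"\(\s*([-+]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(lines):
    """Return {sequence name: MFE in kcal/mol} from RNAfold output.

    Each energy is paired with the most recent '>' header, so names and
    values cannot drift out of alignment.
    """
    mfe = {}
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = line[1:].split()[0]
            continue
        match = ENERGY_RE.search(line)
        if match and name is not None:
            mfe[name] = float(match.group(1))
    return mfe


example = """>sg1
GGGAAAUCCC
(((....))) ( -1.20)
>sg2
AAAAAAAAAA
.......... (  0.00)""".splitlines()
print(parse_rnafold(example))
```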

one-hot encoding

### onehot encoding
# import os, sys
# import numpy as np
# 
# onehot_dict = {
#     'A': '1000',
#     'C': '0100',
#     'T': '0010',
#     'G': '0001',
#     'AA': '1000000000000000',
#     'AC': '0100000000000000',
#     'AT': '0010000000000000',
#     'AG': '0001000000000000',
#     'CA': '0000100000000000',
#     'CC': '0000010000000000',
#     'CT': '0000001000000000',
#     'CG': '0000000100000000',
#     'TA': '0000000010000000',
#     'TC': '0000000001000000',
#     'TT': '0000000000100000',
#     'TG': '0000000000010000',
#     'GA': '0000000000001000',
#     'GC': '0000000000000100',
#     'GT': '0000000000000010',
#     'GG': '0000000000000001',
# }
# 
# # open input and output files
# input_path = sys.argv[1]
# input_file = open(input_path, 'r')
# dep1_file = open(input_path[:-4]+'_dependent1.txt', 'w')
# dep2_file = open(input_path[:-4]+'_dependent2.txt', 'w')
# indep1_file = open(input_path[:-4]+'_independent1.txt', 'w')
# indep2_file = open(input_path[:-4]+'_independent2.txt', 'w')
# 
# # loop over nucleotide sequences
# for idx, line in enumerate(input_file):
# 
#     # if first iteration, write title line
#     if idx == 0:
#         
#         dep1_file.writelines(line+': first-order position-dependent features'+ '\n')
#         dep2_file.writelines(line+': second-order position-dependent features'+ '\n')
#         indep1_file.writelines(line+': first-order position-independent features'+ '\n')
#         indep2_file.writelines(line+': second-order position-independent features'+ '\n')
#         
#     # otherwise encode sequence
#     else:
# 
#         # split line by tab
#         line = line.split('\t')
# 
#         # extract sequence (also remove \n)
#         seq = line[-1].rstrip('\n')
# 
#         # compute position-dependent features as one-hot vectors
#         pos_dep1 = ''.join([onehot_dict[seq[i]] for i in range(len(seq))])
#         pos_dep2 = ''.join([onehot_dict[seq[i:i+2]] for i in range(len(seq)-1)])
# 
#         # compute position-independent features as sum over position-dependent features
#         pos_indep1 = list(np.array([int(o) for o in pos_dep1]).reshape([-1, 4]).sum(axis=0))
#         pos_indep2 = list(np.array([int(o) for o in pos_dep2]).reshape([-1, 16]).sum(axis=0))
#         pos_indep1 = ''.join([str(p) for p in pos_indep1])
#         pos_indep2 = ''.join([str(p) for p in pos_indep2])
#         
#         # write features to file
#         dep1_file.writelines(line[0] + '\t' + pos_dep1 + '\n')
#         dep2_file.writelines(line[0] + '\t' + pos_dep2 + '\n')
#         indep1_file.writelines(line[0] + '\t' + pos_indep1 + '\n')
#         indep2_file.writelines(line[0] + '\t' + pos_indep2 + '\n')
#         
#     if idx % 10000 == 0:
#         print('{0:,}'.format(idx)+' lines processed...')
#         
# print('Done!')
#             
# input_file.close()
# dep1_file.close()
# dep2_file.close()
# indep1_file.close()
# indep2_file.close()

#python path/to/encode_sequences.py path/to/data.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
cut -f 1,3 Ecoli.allCas9.txt > Ecoli.allCas9.noscore.txt
python encode_sequences.py Ecoli.allCas9.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/

# encode_sequences.py writes a two-line title (the header line keeps its trailing newline), so two sed '1d' calls are needed to reach the data rows
# caveat: the independent1/independent2 counts are concatenated without a delimiter, so any count of 10 or more is split into separate digits by the gsub below
sed '1d' Ecoli.allCas9.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Ecoli.allCas9_ind1.txt
sed '1d' Ecoli.allCas9.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Ecoli.allCas9_ind2.txt
# position-specific headers (p1.A ... p20.G and p1.AA ... p20.GG), built in the same base/dinucleotide order as onehot_dict
dep1_header="sgRNAID"; for p in $(seq 1 20); do for b in A C T G; do dep1_header="$dep1_header p$p.$b"; done; done
dep2_header="sgRNAID"; for p in $(seq 1 20); do for d in AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG; do dep2_header="$dep2_header p$p.$d"; done; done
sed '1d' Ecoli.allCas9.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed "1i $dep1_header" | cut -d ' ' -f 1-81 > Ecoli.allCas9_dep1.txt
# a 20-mer has only 19 dinucleotide positions, so the p20.* columns stay empty (they are dropped later with onehot.dep2[,1:305])
sed '1d' Ecoli.allCas9.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed "1i $dep2_header" | cut -d ' ' -f 1-321 > Ecoli.allCas9_dep2.txt
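A sketch of the same four encodings in Python, written straight to space-delimited rows so counts of 10 or more stay in one column. It follows the base order A, C, T, G of `onehot_dict`; the function names are hypothetical and this is not the `encode_sequences.py` script itself.

```python
from itertools import product

BASES = "ACTG"  # same order as onehot_dict in encode_sequences.py
DINUCS = ["".join(pair) for pair in product(BASES, repeat=2)]  # AA, AC, AT, AG, CA, ...


def encode(seq):
    """Return (ind1, ind2, dep1, dep2) integer features for one sequence.

    dep1/dep2 are position-specific one-hot bits; ind1/ind2 are base and
    dinucleotide counts (the column sums of dep1/dep2).
    """
    seq = seq.upper()
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    dep1 = [int(base == b) for base in seq for b in BASES]
    dep2 = [int(pair == d) for pair in pairs for d in DINUCS]
    ind1 = [seq.count(b) for b in BASES]
    ind2 = [pairs.count(d) for d in DINUCS]
    return ind1, ind2, dep1, dep2


def to_row(sgrna_id, values):
    """Space-delimited row, so multi-digit counts stay in one column."""
    return " ".join([sgrna_id, *map(str, values)])


ind1, ind2, dep1, dep2 = encode("A" * 12 + "CCGGTTAC")
print(to_row("sg1_Cas9", ind1))
```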

chemical tensors

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
sed '1d' Ecoli.allCas9.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Ecoli.allCas9.sequence.txt


# salloc -A SYB105 -N 2 -t 4:00:00
module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R
library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.allCas9.tensors.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensors.melt.txt", quote=F, row.names=F, sep="\t")
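The melt/left_join/dcast steps above amount to looking up each base's property vector and flattening it by position. A sketch of that mapping, with column names built like dcast's `position_variable`; the two-property table is a hypothetical placeholder, not values from `protein_rna_dna-vector_lee_nucleotide_dna_data.txt`, and `tensor_features` is a hypothetical name.

```python
def tensor_features(seq, table):
    """Flatten per-base property values into {"p<position>_<property>": value}.

    `table` maps each base to a {property: value} dict, like one row of
    tensor.t in the R code. Bases missing from the table (e.g. N) get 0.0
    for every property, mirroring the later tensor[is.na(tensor)] <- 0 step.
    """
    properties = list(next(iter(table.values())))
    row = {}
    for pos, base in enumerate(seq.upper(), start=1):
        values = table.get(base, {})
        for prop in properties:
            row[f"p{pos}_{prop}"] = values.get(prop, 0.0)
    return row


# hypothetical two-property table, for illustration only
toy_table = {
    "A": {"prop1": 1.0, "prop2": 0.1},
    "C": {"prop1": 2.0, "prop2": 0.2},
    "G": {"prop1": 3.0, "prop2": 0.3},
    "T": {"prop1": 4.0, "prop2": 0.4},
}
print(tensor_features("ACN", toy_table))
```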

RNA-seq

library(tidyr)
library(dplyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/genome")
# sed '1,7d' GCF_000005845.2_ASM584v2_genomic.gff > GCF_000005845.2_ASM584v2_genomic.txt   (drops the 7 GFF header lines)
annotation <- read.delim("GCF_000005845.2_ASM584v2_genomic.txt", header=F, sep="\t")
gene <- subset(annotation, annotation$V3 == "gene")
gene.id <- separate(gene, V9, c("id1", "id2"), sep="EcoGene:")
gene.id$gene_id <- substr(gene.id$id2, 1, 7)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
rna <- read.delim("GSM2267479_Sample-1.genes.results.txt", header=T, sep="\t")

rna.id <- left_join(rna, gene.id, by="gene_id")
rna.id.idf <- na.omit(rna.id[,c(8,11,12,1,3:7)])
write.table(rna.id.idf, "GSM2267479.fpkm.coord.txt", quote=F, row.names=F, sep="\t")

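# calculate density
# GSM2267479.fpkm.coord.bed is not written above; assuming it is GSM2267479.fpkm.coord.txt without its header line:
sed '1d' GSM2267479.fpkm.coord.txt > GSM2267479.fpkm.coord.bed
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b GSM2267479.fpkm.coord.bed > ecoli.rnaseq.20sliding.bed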

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
window <- read.delim("ecoli.rnaseq.20sliding.bed", header=F, sep="\t")

window.df <- window %>% group_by(V1, V2, V3) %>% mutate(avg.fpkm = mean(V12))
window.uniq <- unique(window.df[,c(1:3,14)])
write.table(window.uniq, "ecoli.rnaseq.average.20sliding.bed", quote=F, row.names=F, sep="\t")
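The per-window averaging in R groups the `bedtools intersect -wo` rows by the window columns. An equivalent sketch in Python, assuming 13 columns per row (3 window columns, the 9 columns of GSM2267479.fpkm.coord.bed, and the overlap length) so that FPKM is R's V12; the rows below are made up and `window_means` is a hypothetical name. Windows that overlap no gene are absent from `-wo` output; `-wao` would keep them with an empty B feature.

```python
from collections import defaultdict


def window_means(rows, value_col=11):
    """Mean of one column per window (first three columns) over the rows of
    `bedtools intersect -wo`. value_col is 0-based: R's V12 is index 11."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fields in rows:
        key = (fields[0], int(fields[1]), int(fields[2]))
        sums[key] += float(fields[value_col])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


# toy rows: 3 window columns + 9 gene columns + overlap length (values made up)
rows = [
    ["NC_000913.3", "0", "20", "NC_000913.3", "0", "100", "geneA", "100", "90", "5", "1.0", "2.0", "20"],
    ["NC_000913.3", "0", "20", "NC_000913.3", "10", "200", "geneB", "190", "180", "9", "1.5", "4.0", "10"],
    ["NC_000913.3", "1", "21", "NC_000913.3", "0", "100", "geneA", "100", "90", "5", "1.0", "2.0", "20"],
]
print(window_means(rows))
```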

GATC motif

  • proxy for putative methylation
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

## GATC motif
## fastaregex
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'GATC' > ecoli.gatc.bed

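# note: the search above writes ecoli.gatc.bed, while this step reads ecoli.gatc.coord.bed, which is not created in these notes. GATC is its own reverse complement, so a search over both strands reports each site twice; if ecoli.gatc.coord.bed is meant to hold unique site coordinates, one way to build it is: cut -f 1-3 ecoli.gatc.bed | sort -k1,1 -k2,2n -u > ecoli.gatc.coord.bed
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b ecoli.gatc.coord.bed > ecoli.gatc.20sliding.bed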
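A sketch of a motif scan with a zero-width lookahead, which also reports overlapping matches. Since GATC equals its own reverse complement, a forward-strand scan already finds every site exactly once. Function names are hypothetical, and this is not how fastaRegexFinder.py is implemented.

```python
import re


def find_motif(seq, motif):
    """0-based, half-open (start, end) of every forward-strand occurrence of
    `motif`, including overlapping ones (zero-width lookahead)."""
    pattern = re.compile(f"(?=({re.escape(motif)}))")
    return [(m.start(), m.start() + len(motif)) for m in pattern.finditer(seq.upper())]


def reverse_complement(seq):
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# GATC is palindromic, so the forward strand already covers both strands
assert reverse_complement("GATC") == "GATC"
print(find_motif("AAGATCAAGATCGATC", "GATC"))
```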

IPD ratios

  • associated with GATC and methylation in E.coli
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("GSM3264688_Ecoli.gff", header=F, sep="\t")
df2 <- df[5:nrow(df),]

library(dplyr)
library(tidyr)
df.sep <- df2 %>% separate(V9, c("coverage", "context", "IPD"), sep=";")
df.ipd <- df.sep %>% separate(IPD, c("IPD", "IPD.value"), sep="=")
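df.ipd$chr <- "NC_000913.3"
# GFF is 1-based and inclusive, BED is 0-based and half-open: shift the start by one
df.ipd$start <- as.numeric(as.character(df.ipd$V4)) - 1

df.coord <- df.ipd[,c("chr", "start", "V5", "IPD.value")]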

write.table(df.coord, "GSM3264688_Ecoli.coord.bed", quote=F, row.names=F, col.names=F, sep="\t")

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b GSM3264688_Ecoli.coord.bed > ecoli.ipd.20sliding.bed


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
window <- read.delim("ecoli.ipd.20sliding.bed", header=F, sep="\t")

# mean IPD ratio per 20 bp window (V7 = IPD value from GSM3264688_Ecoli.coord.bed), one row per window
window.uniq <- window %>% group_by(V1, V2, V3) %>% summarise(avg.ipd = mean(V7)) %>% ungroup()
write.table(window.uniq, "ecoli.ipd.average.20sliding.bed", quote=F, row.names=F, sep="\t")
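A sketch of the GFF-to-BED step that looks attributes up by key instead of splitting column 9 by position, and applies the 1-based to 0-based shift. The attribute key `IPDRatio` is an assumption about the PacBio modification GFF and should be checked against GSM3264688_Ecoli.gff; the record below is made up and `gff_to_bed` is a hypothetical name.

```python
def gff_to_bed(line, value_key="IPDRatio", chrom=None):
    """Convert one GFF3 feature line into (chrom, start0, end, value).

    GFF coordinates are 1-based and inclusive, so the BED start is start - 1.
    The attribute is looked up by key rather than by its position in column 9;
    `chrom` optionally overrides the sequence name (the R code hard-codes it).
    """
    f = line.rstrip("\n").split("\t")
    attrs = dict(item.split("=", 1) for item in f[8].split(";") if "=" in item)
    return (chrom or f[0], int(f[3]) - 1, int(f[4]), float(attrs[value_key]))


# made-up record for illustration
record = "chr\tkinModCall\tm6A\t100\t100\t35\t+\t.\tcoverage=50;context=AAAGATCAAAT;IPDRatio=4.21"
print(gff_to_bed(record, chrom="NC_000913.3"))
```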

PAM

https://www.synthego.com/guide/how-to-use-crispr/pam-sequence

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# find NGG PAM sites in the reference with fastaRegexFinder (one search per N)

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
# vim NGG.PAM.fasta

## fastaRegexFinder
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'AGG' > AGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'TGG' > TGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'CGG' > CGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f genome/GCF_000005845.2_ASM584v2_genomic.fna -r 'GGG' > GGG.PAM.txt

cat AGG.PAM.txt TGG.PAM.txt CGG.PAM.txt GGG.PAM.txt > NGG.PAM.txt
sort -k 1,1 -k 2,2n NGG.PAM.txt > NGG.PAM.sorted.bed

# intersect with sliding windows in the genome to get density for DWT
bedtools intersect -wo -a ecoli.20bp.sliding.bed -b NGG.PAM.sorted.bed > NGG.PAM.20bp.sliding.windows.bed

# closest with gRNAs to identify distance (downstream, strand)
cut -f 1-4 sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > sgRNA.coord.bed
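# note: BED keeps strand in column 6, so this 5-column file puts "+" in the score column and bedtools sees no strand for A; sgRNAs that BLAST placed on the minus strand are not distinguished here
awk '{print $0"\t""+"}' sgRNA.coord.bed > sgRNA.coord.strand.txt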
bedtools closest -a sgRNA.coord.strand.txt -b NGG.PAM.sorted.bed -io -iu -D a > ecoli.sgRNA.closestPAM.bed

# determine if N = A,C,T, or G 
## feature: PAM.A.raw, PAM.C.raw, PAM.T.raw, PAM.G.raw <-- binary
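A sketch of NGG PAM discovery on both strands together with the N-position one-hot used for PAM.A/C/T/G, assuming SpCas9's NGG PAM. On the forward sequence a minus-strand PAM appears as CCN, which is why the R code maps CCT to PAM.A, CCG to PAM.C, and so on. Lookaheads keep overlapping PAMs (there are two in GGGG), which a plain non-overlapping regex scan would miss. Function names are hypothetical, and picking the closest downstream PAM per sgRNA is left to `bedtools closest`.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def find_ngg_pams(seq):
    """Return (start, end, strand, pam, n_base) for every NGG PAM.

    Plus-strand PAMs read NGG on the forward sequence; minus-strand PAMs
    appear on the forward sequence as CCN.
    """
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"(?=([ACGT]GG))", seq):
        pam = m.group(1)
        hits.append((m.start(), m.start() + 3, "+", pam, pam[0]))
    for m in re.finditer(r"(?=(CC[ACGT]))", seq):
        pam = m.group(1).translate(COMPLEMENT)[::-1]  # PAM read 5'->3' on the minus strand
        hits.append((m.start(), m.start() + 3, "-", pam, pam[0]))
    return sorted(hits)


def pam_onehot(n_base):
    """Binary PAM.A / PAM.C / PAM.T / PAM.G indicators for the N position."""
    return {f"PAM.{b}": int(n_base == b) for b in "ACTG"}


for hit in find_ngg_pams("AGGGCCT"):
    print(hit, pam_onehot(hit[4]))
```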

location relative to gene

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
bedtools closest -a sgRNA.coord.bed -b genome/GCF_000005845.2_ASM584v2_genomic.gene.gff -D b > sgRNA.gene.closest.bed
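`bedtools closest -D b` reports distances relative to the strand of B (the gene), with negative values when A lies upstream of B. A simplified sketch of that sign convention for one sgRNA/gene pair, assuming 0-based half-open coordinates; the exact magnitude bedtools reports for adjacent features may differ by one, and `signed_gene_distance` is a hypothetical name.

```python
def signed_gene_distance(a_start, a_end, g_start, g_end, g_strand):
    """Signed distance from interval A to gene G, both 0-based half-open.

    0 if they overlap; otherwise the number of bases between them, negative
    when A lies upstream of G with respect to G's strand.
    """
    if a_start < g_end and g_start < a_end:
        return 0
    a_is_left = a_end <= g_start
    gap = g_start - a_end if a_is_left else a_start - g_end
    upstream = a_is_left if g_strand == "+" else not a_is_left
    return -gap if upstream else gap


print(signed_gene_distance(50, 70, 100, 200, "+"))
```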

Raw features matrix

# salloc -A SYB105 -N 2 -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
structure <- read.delim("Ecoli.allCas9.structure.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Ecoli.allCas9.nuc.count.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
onehot.ind1 <- read.delim("Ecoli.allCas9_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Ecoli.allCas9_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305]

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "df.id.test.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.allCas9.tensors.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")



# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df.id <- read.delim("df.id.test.txt", header=T, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")

df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
write.table(tensor.df, "Ecoli.allCas9.raw.onehot.tensor.txt", quote=F, row.names=F, sep="\t")

df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Ecoli.allCas9.raw.onehot.tensor.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast)
# 
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensor.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 
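The `dcast(..., fun.aggregate = mean)` followed by `na.omit` turns the long feature table into one complete row per sgRNA. A sketch of the same pivot in Python; feature names and values are made up, `long_to_wide` and `complete_rows` are hypothetical names, and missing cells are `None` here where R produces NaN.

```python
from collections import defaultdict


def long_to_wide(records):
    """Pivot (sgRNAID, cut_score, feature, value) records to one row per
    (sgRNAID, cut_score), averaging repeated features as
    dcast(..., fun.aggregate = mean) does. Missing features are None."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    features = set()
    for sg, score, feature, value in records:
        sums[(sg, score, feature)] += value
        counts[(sg, score, feature)] += 1
        features.add(feature)
    columns = sorted(features)
    rows = {}
    for (sg, score, feature), total in sums.items():
        row = rows.setdefault((sg, score), dict.fromkeys(columns))
        row[feature] = total / counts[(sg, score, feature)]
    return columns, rows


def complete_rows(rows):
    """Drop rows with any missing feature, like na.omit() on the dcast output."""
    return {key: row for key, row in rows.items() if None not in row.values()}


records = [
    ("sg1_Cas9", 0.5, "gc", 0.4),
    ("sg1_Cas9", 0.5, "gc", 0.6),
    ("sg1_Cas9", 0.5, "mfe", -1.0),
    ("sg2_Cas9", -0.2, "gc", 0.5),
]
columns, rows = long_to_wide(records)
print(columns, complete_rows(rows))
```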


# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.pam <- read.table("ecoli.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.df$id <- "Cas9"
sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468
write.table(df.dcast.na, "ecoli.sgRNA.pam.dcast.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df.dcast <- read.delim("ecoli.sgRNA.pam.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

df.location <- inner_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
# 40468

write.table(df.location, "Ecoli.allCas9.raw.onehot.tensor.pam.dcast.na.txt", quote=F, row.names=F, sep="\t")
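The `mutate()` call in this section maps each NGG PAM and its reverse complement onto a single indicator for the variable N base. The reverse complement is CCN, which is the same PAM read from the opposite strand. The mapping is:

- AGG/CCT to PAM.A
- CGG/CCG to PAM.C
- TGG/CCA to PAM.T
- GGG/CCC to PAM.G

The sketch below reproduces that lookup in Python by reverse-complementing CCN codes first. The function names are illustrative. Unrecognised codes get all-zero indicators, which is what the `ifelse()` chain gives them.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def pam_onehot(pam_code):
    """Return (PAM.A, PAM.C, PAM.T, PAM.G) for an NGG PAM or its CCN reverse complement."""
    pam = pam_code.upper()
    if pam.startswith("CC"):
        pam = revcomp(pam)  # e.g. CCT -> AGG
    if len(pam) != 3 or not pam.endswith("GG"):
        return (0, 0, 0, 0)  # anything else: every ifelse() falls through to 0
    return tuple(int(pam[0] == base) for base in "ACTG")


for code in ["AGG", "CCT", "CGG", "CCG", "TGG", "CCA", "GGG", "CCC"]:
    print(code, pam_onehot(code))
```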



# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.genes <- read.table("sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.df$id <- "Cas9"
sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468
write.table(df.dcast.na, "ecoli.sgRNA.location.dcast.txt", quote=F, row.names=F, sep="\t")


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df.dcast <- read.delim("ecoli.sgRNA.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.pam.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
# 

write.table(df.location, "Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
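The `gene.distance` feature is read from column 14 of `sgRNA.gene.closest.bed`. That file was produced upstream, presumably by a closest-feature query such as `bedtools closest -d`; the command is not shown in this section.

To illustrate what such a distance means, the minimal Python sketch below does two things:

- it computes the gap between half-open BED intervals;
- it takes the minimum gap over genes on the same chromosome.

The helper names are hypothetical. Off-by-one conventions for book-ended intervals differ between tools, so the sketch should not be expected to reproduce the file exactly.

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Gap in bp between half-open intervals [a_start, a_end) and [b_start, b_end).

    Returns 0 when the intervals overlap; book-ended intervals also give 0 here.
    """
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def nearest_gene_distance(sg, genes):
    """sg: (chr, start, end); genes: list of (chr, start, end).

    Returns the smallest gap to a gene on the same chromosome, or None if there is none.
    """
    gaps = [interval_gap(sg[1], sg[2], g[1], g[2]) for g in genes if g[0] == sg[0]]
    return min(gaps) if gaps else None


genes = [("chr", 0, 50), ("chr", 130, 200), ("chr2", 100, 120)]
print(nearest_gene_distance(("chr", 100, 120), genes))  # 10
```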

Haar wavelets (MODWT)

salloc -A SYB105 -N 2 -p gpu -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)
library(wmtsa)
library(data.table)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
gatc <- read.table("ecoli.gatc.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
ipd <- read.table("ecoli.ipd.average.20sliding.bed", header=T, sep="\t", stringsAsFactors = F)
gene <- read.table("ecoli.gene.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
structure <- read.table("ecoli.20sliding.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.table("nucleotide_counts_20sliding_temp.txt", header=T, sep="\t", stringsAsFactors = F)
rnaseq <- read.table("ecoli.rnaseq.average.20sliding.bed", header=T, sep="\t", stringsAsFactors = F)
pam <- read.table("NGG.PAM.20bp.sliding.windows.bed", header=F, sep="\t", stringsAsFactors = F)
window <- read.table("ecoli.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "id", "seq", "id2", "cut.score", "gid", "change.val", "quality")
score.df <- score[,c(1:4,8)]

gatc.bin <- gatc %>% group_by(V1, V2, V3) %>% mutate(gatc.count = n())
gatc.count <- unique(gatc.bin[,c(1:3,8)])

gene.bin <- gene %>% group_by(V1, V2, V3) %>% mutate(gene.count = n())
gene.count <- unique(gene.bin[,c(1:3,14)])

pam.bin <- pam %>% group_by(V1, V2, V3) %>% mutate(pam.count = n())
pam.count <- unique(pam.bin[,c(1:3,12)])

window.v <- window[,1:3]
colnames(window.v) <- c("V1", "V2", "V3")
gatc.win <- left_join(window.v, gatc.count, by=c("V1", "V2", "V3"))
gatc.win[is.na(gatc.win)] <- 0
gene.win <- left_join(window.v, gene.count, by=c("V1", "V2", "V3"))
gene.win[is.na(gene.win)] <- 0
ipd.win <- left_join(window.v, ipd, by=c("V1", "V2", "V3"))
ipd.win[is.na(ipd.win)] <- 0
rnaseq.win <- left_join(window.v, rnaseq, by=c("V1", "V2", "V3"))
rnaseq.win[is.na(rnaseq.win)] <- 0
pam.win <- left_join(window.v, pam.count, by=c("V1", "V2", "V3"))
pam.win[is.na(pam.win)] <- 0

gene.df <- gene.win$gene.count
gatc.df <- gatc.win$gatc.count
pam.df <- pam.win$pam.count

ipd.df <- ipd.win[,4]
structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]
rna.df <- rnaseq.win[,4]
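The GATC, gene and PAM tracks above follow one pattern:

1. Count how many features hit each 20-bp window (`group_by(V1, V2, V3) %>% mutate(n())`, applied to what appear to be one-row-per-overlap inputs).
2. Left-join those counts onto the full window list.
3. Replace missing counts with 0, so every window has a value before the wavelet step.

A minimal Python sketch of that pattern is shown below. It assumes one input row per feature-window overlap; the function name and toy coordinates are illustrative.

Note that `structure.df`, `gc.df` and `temp.df` are taken positionally, without a join. Those input files are therefore implicitly assumed to list windows in the same order as `ecoli.20bp.sliding.bed`.

```python
from collections import Counter


def count_per_window(windows, hits):
    """Count hits per window, returning one count per window in `windows` order.

    windows: (chr, start, end) tuples for every sliding window, in genome order.
    hits: (chr, start, end) window keys, one entry per overlapping feature.
    Windows with no hits get 0, like left_join(...) followed by [is.na()] <- 0.
    """
    counts = Counter(hits)
    return [counts[w] for w in windows]


windows = [("chr", 0, 20), ("chr", 1, 21), ("chr", 2, 22)]
hits = [("chr", 0, 20), ("chr", 0, 20), ("chr", 2, 22)]
print(count_per_window(windows, hits))  # [2, 0, 1]
```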

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/modwt")

temp.modwt <- wavMODWT(temp.df, wavelet="haar")
temp.modwt.df <- as.matrix(temp.modwt)
temp.modwt.label <- data.frame(label = row.names(temp.modwt.df), temp.modwt.df)
temp.modwt.dt <- as.data.table(temp.modwt.label)
temp.modwt.name <- temp.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(temp.modwt.name) <- c("label", "temp.dwt", "scale", "window")
write.table(temp.modwt.name, "temp.modwt.haar.txt", quote=F, row.names=F, sep="\t")

gc.modwt <- wavMODWT(gc.df, wavelet="haar")
gc.modwt.df <- as.matrix(gc.modwt)
gc.modwt.label <- data.frame(label = row.names(gc.modwt.df), gc.modwt.df)
gc.modwt.dt <- as.data.table(gc.modwt.label)
gc.modwt.name <- gc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gc.modwt.name) <- c("label", "gc.dwt", "scale", "window")
write.table(gc.modwt.name, "gc.modwt.haar.txt", quote=F, row.names=F, sep="\t")

structure.modwt <- wavMODWT(structure.df, wavelet="haar")
structure.modwt.df <- as.matrix(structure.modwt)
structure.modwt.label <- data.frame(label = row.names(structure.modwt.df), structure.modwt.df)
structure.modwt.dt <- as.data.table(structure.modwt.label)
structure.modwt.name <- structure.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(structure.modwt.name) <- c("label", "structure.dwt", "scale", "window")
write.table(structure.modwt.name, "structure.modwt.haar.txt", quote=F, row.names=F, sep="\t")

rna.modwt <- wavMODWT(rna.df, wavelet="haar")
rna.modwt.df <- as.matrix(rna.modwt)
rna.modwt.label <- data.frame(label = row.names(rna.modwt.df), rna.modwt.df)
rna.modwt.dt <- as.data.table(rna.modwt.label)
rna.modwt.name <- rna.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(rna.modwt.name) <- c("label", "rna.dwt", "scale", "window")
write.table(rna.modwt.name, "rnaseq.modwt.haar.txt", quote=F, row.names=F, sep="\t")

ipd.modwt <- wavMODWT(ipd.df, wavelet="haar")
ipd.modwt.df <- as.matrix(ipd.modwt)
ipd.modwt.label <- data.frame(label = row.names(ipd.modwt.df), ipd.modwt.df)
ipd.modwt.dt <- as.data.table(ipd.modwt.label)
ipd.modwt.name <- ipd.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(ipd.modwt.name) <- c("label", "ipd.dwt", "scale", "window")
write.table(ipd.modwt.name, "ipd.modwt.haar.txt", quote=F, row.names=F, sep="\t")

gene.modwt <- wavMODWT(gene.df, wavelet="haar")
gene.modwt.df <- as.matrix(gene.modwt)
gene.modwt.label <- data.frame(label = row.names(gene.modwt.df), gene.modwt.df)
gene.modwt.dt <- as.data.table(gene.modwt.label)
gene.modwt.name <- gene.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gene.modwt.name) <- c("label", "gene.dwt", "scale", "window")
write.table(gene.modwt.name, "gene.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")

gatc.modwt <- wavMODWT(gatc.df, wavelet="haar")
gatc.modwt.df <- as.matrix(gatc.modwt)
gatc.modwt.label <- data.frame(label = row.names(gatc.modwt.df), gatc.modwt.df)
gatc.modwt.dt <- as.data.table(gatc.modwt.label)
gatc.modwt.name <- gatc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gatc.modwt.name) <- c("label", "gatc.dwt", "scale", "window")
write.table(gatc.modwt.name, "gatc.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")

pam.modwt <- wavMODWT(pam.df, wavelet="haar")
pam.modwt.df <- as.matrix(pam.modwt)
pam.modwt.label <- data.frame(label = row.names(pam.modwt.df), pam.modwt.df)
pam.modwt.dt <- as.data.table(pam.modwt.label)
pam.modwt.name <- pam.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(pam.modwt.name) <- c("label", "pam.dwt", "scale", "window")
write.table(pam.modwt.name, "pam.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")
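Each track is decomposed with `wavMODWT(..., wavelet="haar")`, the maximal overlap discrete wavelet transform. Unlike the decimated DWT, it keeps one coefficient per window at every scale. That is what allows each sgRNA's window to be joined to every scale later on.

The pure-Python sketch of the Haar MODWT pyramid algorithm below is for intuition only, under these assumptions:

- boundaries are circular (periodic);
- the filter convention is W_j[t] = (V_{j-1}[t] - V_{j-1}[t - 2^(j-1)]) / 2.

It may differ from wmtsa in sign convention, default number of levels and boundary treatment. The function name is hypothetical.

```python
def haar_modwt(x, n_levels):
    """Haar MODWT via the pyramid algorithm with circular boundaries.

    Returns (details, smooth): details[j - 1] is the level-j wavelet
    coefficient vector W_j and smooth is the final scaling vector V_J.
    Every vector has the same length as x (no downsampling).
    """
    v = [float(val) for val in x]
    n = len(v)
    details = []
    for j in range(1, n_levels + 1):
        lag = 2 ** (j - 1)
        # circular lag: v_lag[t] = v[(t - lag) mod n]
        v_lag = [v[(t - lag) % n] for t in range(n)]
        details.append([(a - b) / 2.0 for a, b in zip(v, v_lag)])  # Haar wavelet filter (1/2, -1/2)
        v = [(a + b) / 2.0 for a, b in zip(v, v_lag)]  # Haar scaling filter (1/2, 1/2)
    return details, v


x = [1, 2, 3, 4, 5, 6, 7, 8]
details, smooth = haar_modwt(x, 3)
energy_in = sum(val * val for val in x)
energy_out = sum(w * w for d in details for w in d) + sum(s * s for s in smooth)
print(details[0])                          # [-3.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
print(abs(energy_in - energy_out) < 1e-9)  # True: the MODWT preserves energy
```

The energy check is a convenient sanity test: with circular filtering, the squared coefficients across all scales sum to the squared input.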

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/modwt")
temp.modwt.name <- read.delim("temp.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gc.modwt.name <- read.delim("gc.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
structure.modwt.name <- read.delim("structure.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
rna.modwt.name <- read.delim("rnaseq.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gene.modwt.name <- read.delim("gene.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gatc.modwt.name <- read.delim("gatc.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
ipd.modwt.name <- read.delim("ipd.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
pam.modwt.name <- read.delim("pam.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
window <- read.table("ecoli.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "id", "seq", "id2", "cut.score", "gid", "change.val", "quality")
score.df <- score[,c(1:4,8)]

colnames(window) <- c("chr", "start", "end")
window$window <- seq.int(nrow(window))
window$window <- as.character(window$window-1)
window$start <- as.numeric(window$start)
window$end <- as.numeric(window$end - 1)

window.score.df <- left_join(score.df, window, by=c("chr", "start", "end"))
window.score.df$window <- as.integer(window.score.df$window)
window.score.temp <- left_join(window.score.df, temp.modwt.name[,c(3,4,2)], by="window")
window.temp.gc <- left_join(window.score.temp, gc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure <- left_join(window.temp.gc, structure.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna <- left_join(window.temp.gc.structure, rna.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna.gene <- left_join(window.temp.gc.structure.rna, gene.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna.gene.gatc <- left_join(window.temp.gc.structure.rna.gene, gatc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna.gene.gatc.ipd <- left_join(window.temp.gc.structure.rna.gene.gatc, ipd.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.rna.gene.gatc.ipd.pam <- left_join(window.temp.gc.structure.rna.gene.gatc.ipd, pam.modwt.name[,c(3,4,2)], by=c("window", "scale"))
nrow(window.temp.gc.structure.rna.gene.gatc.ipd.pam)
# 
window.temp.gc.structure.rna.gene.gatc.ipd.pam.sgRNA <- subset(window.temp.gc.structure.rna.gene.gatc.ipd.pam, !is.na(cut.score))
nrow(window.temp.gc.structure.rna.gene.gatc.ipd.pam.sgRNA)
# 
write.table(window.temp.gc.structure.rna.gene.gatc.ipd.pam.sgRNA, "ecoli.20sliding.exact.DWT.haar.txt", quote=F, row.names=F, sep="\t")

df.melt <- melt(window.temp.gc.structure.rna.gene.gatc.ipd.pam.sgRNA[,c(4,5,7:15)], id=c("cut.score", "scale", "sgRNA"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNA", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNA + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 
write.table(df.dcast.na, "ecoli.20sliding.exact.DWT.haar.dcast.txt", quote=F, row.names=F, sep="\t")
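The DWT features are attached to guides in two joins:

1. Each sgRNA's coordinates are matched exactly to a 20-bp window. Before this match, the R code shifts window ends by -1 and numbers windows from 0.
2. That window index is then joined to every scale of every coefficient table.

A minimal Python sketch of the lookup follows. It assumes a dictionary of coefficients keyed by (window index, scale) that uses the same 0-based window numbering. The function name, arguments and toy values are illustrative.

```python
def attach_window_coefficients(sgrnas, windows, coefs):
    """Look up per-scale coefficients for each guide via its window index.

    sgrnas:  (name, chr, start, end) tuples for guides.
    windows: (chr, start, end) tuples in file order; the index is 0-based.
    coefs:   {(window_index, scale): coefficient}.
    Returns {name: {scale: coefficient}}. Guides without an exact window
    match are skipped (the R code carries them as NA until the na.omit steps).
    """
    # key on end - 1, mirroring window$end <- window$end - 1 in the R code
    index = {(c, s, e - 1): i for i, (c, s, e) in enumerate(windows)}
    by_window = {}
    for (wi, scale), value in coefs.items():
        by_window.setdefault(wi, {})[scale] = value
    out = {}
    for name, c, s, e in sgrnas:
        w = index.get((c, s, e))
        if w is None:
            continue
        out[name] = dict(by_window.get(w, {}))
    return out


windows = [("chr", 0, 20), ("chr", 1, 21)]
sgrnas = [("g1", "chr", 1, 20), ("g2", "chr", 5, 24)]
coefs = {(0, "d1"): -1.0, (1, "d1"): 0.5, (1, "s1"): 2.0}
print(attach_window_coefficients(sgrnas, windows, coefs))  # {'g1': {'d1': 0.5, 's1': 2.0}}
```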

Raw + DWT matrix

# combine regional DWT with other features 
library(tidyr)
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df.dcast.na <- read.delim("ecoli.20sliding.exact.DWT.haar.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df.dcast.sep <- df.dcast.na %>% separate(sgRNA, c("sgRNA", "ID"), sep="_")
df.dcast.dwt <- df.dcast.sep[,c(4:ncol(df.dcast.sep))]
colnames(df.dcast.dwt) <- paste0('sgRNA_', colnames(df.dcast.dwt))
df.dcast <- cbind(df.dcast.sep[,1:3], df.dcast.dwt)

df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- df %>% separate(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
nrow(df.sep)
# 40468
df.cas9 <- subset(df.sep, df.sep$type == "Cas9")
# 40468

df.sep.region <- inner_join(df.cas9[,c(1:3,1658,5:1651,1653:1657,1659)], df.dcast[,c(1,2,4:ncol(df.dcast))], by=c("sgRNA", "ID"))
df.sep.region.id <- df.sep.region %>% unite(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
nrow(df.sep.region.id)
# 40468

write.table(df.sep.region.id, "ecoli.20sliding.raw.onehot.tensor.dwt.dcast.txt", quote=F, row.names=F, sep="\t")

K-mer positional encoding

  • expand the current position-dependent one-hot encoding matrix from single nucleotides to k-mers of order k = 1, 2, 3, 4, 5, etc.
  • assess how much each k-mer order contributes to the correlation output
  • find the k at which noise starts to overwhelm information
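The hard-coded `onehot_dict` tables in the scripts list k-mers in A, C, T, G order. That is exactly the order `itertools.product('ACTG', repeat=k)` produces, so a dictionary for any order k (including the 5-mers mentioned above) could be generated rather than typed out.

Here is a minimal sketch with hypothetical function names. It also performs the same overlapping-window encoding as the scripts.

```python
from itertools import product

BASES = "ACTG"  # base order used by the onehot_dict tables in the scripts


def kmer_onehot_dict(k):
    """Map every k-mer to a one-hot string of length 4**k, in product() order."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    size = len(kmers)
    return {kmer: "0" * i + "1" + "0" * (size - i - 1) for i, kmer in enumerate(kmers)}


def positional_kmer_encode(seq, k, table=None):
    """Concatenate the one-hot codes of the len(seq) - k + 1 overlapping k-mers."""
    if table is None:
        table = kmer_onehot_dict(k)
    seq = seq.upper()
    return "".join(table[seq[i:i + k]] for i in range(len(seq) - k + 1))


print(kmer_onehot_dict(1))                          # {'A': '1000', 'C': '0100', 'T': '0010', 'G': '0001'}
print(kmer_onehot_dict(2)["CG"])                    # 0000000100000000
print(len(positional_kmer_encode("ACGT" * 5, 3)))   # 1152 = 18 positions x 64
```

Building the table once and passing it as `table` avoids regenerating it for every sequence. This matters for k = 5, where the table has 1,024 entries.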

Python encoding

### kmer positional encoding
import os, sys
import numpy as np

onehot_dict={
  'A':'1000',
  'C':'0100',
  'T':'0010',
  'G':'0001'
}

# open input and output files; the output name swaps the input's extension
# for '_dependent1.txt' (e.g. data.txt -> data_dependent1.txt)
input_path = sys.argv[1]
dep_path = os.path.splitext(input_path)[0] + '_dependent1.txt'

with open(input_path, 'r') as input_file, open(dep_path, 'w') as dep_file:

    # loop over nucleotide sequences
    for idx, line in enumerate(input_file):

        # drop the line ending once (handles \r\n and a missing final newline)
        line = line.rstrip('\r\n')

        # if first iteration, write title line (kept on a single line)
        if idx == 0:
            dep_file.write(line + ': first-order position-dependent features\n')

        # skip blank lines, e.g. a trailing empty line
        elif not line:
            continue

        # otherwise encode sequence
        else:

            # split line by tab: ID in the first field, sequence in the last
            fields = line.split('\t')
            seq = fields[-1].upper()

            # compute position-dependent features as one-hot vectors
            pos_dep = ''.join([onehot_dict[seq[i]] for i in range(len(seq))])

            # write features to file
            dep_file.write(fields[0] + '\t' + pos_dep + '\n')

        if idx % 10000 == 0:
            print('{0:,}'.format(idx) + ' lines processed...')

print('Done!')

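# script: /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py
# usage: python kmer1_positional_encode.py data.txt
#   (data.txt: tab-delimited, header row, sequence ID in the first column, sequence in the last)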




import os, sys
import numpy as np

onehot_dict = {
  'AA':'1000000000000000',
  'AC':'0100000000000000',
  'AT':'0010000000000000',
  'AG':'0001000000000000',
  'CA':'0000100000000000',
  'CC':'0000010000000000',
  'CT':'0000001000000000',
  'CG':'0000000100000000',
  'TA':'0000000010000000',
  'TC':'0000000001000000',
  'TT':'0000000000100000',
  'TG':'0000000000010000',
  'GA':'0000000000001000',
  'GC':'0000000000000100',
  'GT':'0000000000000010',
  'GG':'0000000000000001'
}

# open input and output files; the output name swaps the input's extension
# for '_dependent2.txt' (e.g. data.txt -> data_dependent2.txt)
input_path = sys.argv[1]
dep_path = os.path.splitext(input_path)[0] + '_dependent2.txt'

with open(input_path, 'r') as input_file, open(dep_path, 'w') as dep_file:

    # loop over nucleotide sequences
    for idx, line in enumerate(input_file):

        # drop the line ending once (handles \r\n and a missing final newline)
        line = line.rstrip('\r\n')

        # if first iteration, write title line (kept on a single line)
        if idx == 0:
            dep_file.write(line + ': second-order position-dependent features\n')

        # skip blank lines, e.g. a trailing empty line
        elif not line:
            continue

        # otherwise encode sequence
        else:

            # split line by tab: ID in the first field, sequence in the last
            fields = line.split('\t')
            seq = fields[-1].upper()

            # compute position-dependent features as one-hot vectors (overlapping dinucleotides)
            pos_dep = ''.join([onehot_dict[seq[i:i+2]] for i in range(len(seq)-1)])

            # write features to file
            dep_file.write(fields[0] + '\t' + pos_dep + '\n')

        if idx % 10000 == 0:
            print('{0:,}'.format(idx) + ' lines processed...')

print('Done!')

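# script: /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py
# usage: python kmer2_positional_encode.py data.txt
#   (data.txt: tab-delimited, header row, sequence ID in the first column, sequence in the last)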



import os, sys
import numpy as np

onehot_dict = {
'AAA':'1000000000000000000000000000000000000000000000000000000000000000',
'AAC':'0100000000000000000000000000000000000000000000000000000000000000',
'AAT':'0010000000000000000000000000000000000000000000000000000000000000',
'AAG':'0001000000000000000000000000000000000000000000000000000000000000',
'ACA':'0000100000000000000000000000000000000000000000000000000000000000',
'ACC':'0000010000000000000000000000000000000000000000000000000000000000',
'ACT':'0000001000000000000000000000000000000000000000000000000000000000',
'ACG':'0000000100000000000000000000000000000000000000000000000000000000',
'ATA':'0000000010000000000000000000000000000000000000000000000000000000',
'ATC':'0000000001000000000000000000000000000000000000000000000000000000',
'ATT':'0000000000100000000000000000000000000000000000000000000000000000',
'ATG':'0000000000010000000000000000000000000000000000000000000000000000',
'AGA':'0000000000001000000000000000000000000000000000000000000000000000',
'AGC':'0000000000000100000000000000000000000000000000000000000000000000',
'AGT':'0000000000000010000000000000000000000000000000000000000000000000',
'AGG':'0000000000000001000000000000000000000000000000000000000000000000',
'CAA':'0000000000000000100000000000000000000000000000000000000000000000',
'CAC':'0000000000000000010000000000000000000000000000000000000000000000',
'CAT':'0000000000000000001000000000000000000000000000000000000000000000',
'CAG':'0000000000000000000100000000000000000000000000000000000000000000',
'CCA':'0000000000000000000010000000000000000000000000000000000000000000',
'CCC':'0000000000000000000001000000000000000000000000000000000000000000',
'CCT':'0000000000000000000000100000000000000000000000000000000000000000',
'CCG':'0000000000000000000000010000000000000000000000000000000000000000',
'CTA':'0000000000000000000000001000000000000000000000000000000000000000',
'CTC':'0000000000000000000000000100000000000000000000000000000000000000',
'CTT':'0000000000000000000000000010000000000000000000000000000000000000',
'CTG':'0000000000000000000000000001000000000000000000000000000000000000',
'CGA':'0000000000000000000000000000100000000000000000000000000000000000',
'CGC':'0000000000000000000000000000010000000000000000000000000000000000',
'CGT':'0000000000000000000000000000001000000000000000000000000000000000',
'CGG':'0000000000000000000000000000000100000000000000000000000000000000',
'TAA':'0000000000000000000000000000000010000000000000000000000000000000',
'TAC':'0000000000000000000000000000000001000000000000000000000000000000',
'TAT':'0000000000000000000000000000000000100000000000000000000000000000',
'TAG':'0000000000000000000000000000000000010000000000000000000000000000',
'TCA':'0000000000000000000000000000000000001000000000000000000000000000',
'TCC':'0000000000000000000000000000000000000100000000000000000000000000',
'TCT':'0000000000000000000000000000000000000010000000000000000000000000',
'TCG':'0000000000000000000000000000000000000001000000000000000000000000',
'TTA':'0000000000000000000000000000000000000000100000000000000000000000',
'TTC':'0000000000000000000000000000000000000000010000000000000000000000',
'TTT':'0000000000000000000000000000000000000000001000000000000000000000',
'TTG':'0000000000000000000000000000000000000000000100000000000000000000',
'TGA':'0000000000000000000000000000000000000000000010000000000000000000',
'TGC':'0000000000000000000000000000000000000000000001000000000000000000',
'TGT':'0000000000000000000000000000000000000000000000100000000000000000',
'TGG':'0000000000000000000000000000000000000000000000010000000000000000',
'GAA':'0000000000000000000000000000000000000000000000001000000000000000',
'GAC':'0000000000000000000000000000000000000000000000000100000000000000',
'GAT':'0000000000000000000000000000000000000000000000000010000000000000',
'GAG':'0000000000000000000000000000000000000000000000000001000000000000',
'GCA':'0000000000000000000000000000000000000000000000000000100000000000',
'GCC':'0000000000000000000000000000000000000000000000000000010000000000',
'GCT':'0000000000000000000000000000000000000000000000000000001000000000',
'GCG':'0000000000000000000000000000000000000000000000000000000100000000',
'GTA':'0000000000000000000000000000000000000000000000000000000010000000',
'GTC':'0000000000000000000000000000000000000000000000000000000001000000',
'GTT':'0000000000000000000000000000000000000000000000000000000000100000',
'GTG':'0000000000000000000000000000000000000000000000000000000000010000',
'GGA':'0000000000000000000000000000000000000000000000000000000000001000',
'GGC':'0000000000000000000000000000000000000000000000000000000000000100',
'GGT':'0000000000000000000000000000000000000000000000000000000000000010',
'GGG':'0000000000000000000000000000000000000000000000000000000000000001'
}

# open input and output files; the output name swaps the input's extension
# for '_dependent3.txt' (e.g. data.txt -> data_dependent3.txt)
input_path = sys.argv[1]
dep_path = os.path.splitext(input_path)[0] + '_dependent3.txt'

with open(input_path, 'r') as input_file, open(dep_path, 'w') as dep_file:

    # loop over nucleotide sequences
    for idx, line in enumerate(input_file):

        # drop the line ending once (handles \r\n and a missing final newline)
        line = line.rstrip('\r\n')

        # if first iteration, write title line (kept on a single line)
        if idx == 0:
            dep_file.write(line + ': third-order position-dependent features\n')

        # skip blank lines, e.g. a trailing empty line
        elif not line:
            continue

        # otherwise encode sequence
        else:

            # split line by tab: ID in the first field, sequence in the last
            fields = line.split('\t')
            seq = fields[-1].upper()

            # compute position-dependent features as one-hot vectors (overlapping trinucleotides)
            pos_dep = ''.join([onehot_dict[seq[i:i+3]] for i in range(len(seq)-2)])

            # write features to file
            dep_file.write(fields[0] + '\t' + pos_dep + '\n')

        if idx % 10000 == 0:
            print('{0:,}'.format(idx) + ' lines processed...')

print('Done!')

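# script: /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py
# usage: python kmer3_positional_encode.py data.txt
#   (data.txt: tab-delimited, header row, sequence ID in the first column, sequence in the last)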
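Each step up in k multiplies the width of this encoding by roughly four, which bears on the question of where noise overwhelms information. For a guide of length L, the order-k encoding has L - k + 1 overlapping k-mers, each occupying 4^k binary positions. Taking L = 20 purely as an example:

$$
D_k = (L - k + 1)\,4^k, \qquad
D_1 = 80,\quad D_2 = 304,\quad D_3 = 1152,\quad D_4 = 4352,\quad D_5 = 16384 \quad (L = 20)
$$

Only one position is set per k-mer, so the matrix also becomes sparser as k grows: the fraction of ones is $1/4^k$.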


import os, sys
import numpy as np

onehot_dict = {
'AAAA':'1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAC':'0100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAT':'0010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAG':'0001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACA':'0000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACC':'0000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACT':'0000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACG':'0000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATA':'0000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATC':'0000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATT':'0000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATG':'0000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGA':'0000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGC':'0000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGT':'0000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGG':'0000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAA':'0000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAC':'0000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAT':'0000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAG':'0000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCA':'0000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCC':'0000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCT':'0000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCG':'0000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTA':'0000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTC':'0000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTT':'0000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTG':'0000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGA':'0000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGC':'0000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGT':'0000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGG':'0000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAA':'0000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAC':'0000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAT':'0000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAG':'0000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCA':'0000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCC':'0000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCT':'0000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCG':'0000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTA':'0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTC':'0000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTT':'0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTG':'0000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGA':'0000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGC':'0000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGT':'0000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGG':'0000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAA':'0000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAC':'0000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAT':'0000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAG':'0000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCA':'0000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCC':'0000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCT':'0000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCG':'0000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTA':'0000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTC':'0000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTT':'0000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTG':'0000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGA':'0000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGC':'0000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGT':'0000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGG':'0000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAA':'0000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAC':'0000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAT':'0000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAG':'0000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACA':'0000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACC':'0000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACT':'0000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACG':'0000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATA':'0000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATC':'0000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATT':'0000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATG':'0000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000',
'TGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000',
'TGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000',
'TGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000',
'TGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000',
'TGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000',
'TGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000',
'TGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000',
'GAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000',
'GAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000',
'GAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000',
'GAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000',
'GACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000',
'GACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000',
'GACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000',
'GACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000',
'GATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000',
'GATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000',
'GATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000',
'GATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000',
'GAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000',
'GAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000',
'GAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000',
'GAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000',
'GCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000',
'GCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000',
'GCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000',
'GCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000',
'GCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000',
'GCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000',
'GCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000',
'GCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000',
'GCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000',
'GCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000',
'GCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000',
'GCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000',
'GCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000',
'GCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000',
'GCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000',
'GCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000',
'GTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000',
'GTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000',
'GTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000',
'GTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000',
'GTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000',
'GTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000',
'GTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000',
'GTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000',
'GTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000',
'GTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000',
'GTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000',
'GTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000',
'GTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000',
'GTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000',
'GTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000',
'GTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000',
'GGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000',
'GGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000',
'GGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000',
'GGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000',
'GGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000',
'GGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000',
'GGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000',
'GGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000',
'GGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000',
'GGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000',
'GGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000',
'GGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000',
'GGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000',
'GGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100',
'GGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010',
'GGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001'
}

# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4]+'_dependent4.txt', 'w')

# loop over nucleotide sequences
for idx, line in enumerate(input_file):

    # first line is the header: strip its newline before appending the feature description
    if idx == 0:
        dep_file.write(line.rstrip('\n') + ': fourth-order position-dependent features\n')

    # otherwise encode sequence
    else:

        # split line by tab
        line = line.split('\t')

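        # extract the sequence from the last column, stripping the newline (and any '\r');
        # slicing off the last character would drop a base if the final line has no newline.
        # Upper-case it so lookups match the dictionary keys.
        seq = line[-1].strip().upper()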

        # concatenate one 256-character one-hot block per overlapping 4-mer
        # (len(seq) - 3 windows), in sequence order
        pos_dep = ''.join([onehot_dict[seq[i:i+4]] for i in range(len(seq)-3)])

        # write features to file
        dep_file.write(line[0] + '\t' + pos_dep + '\n')

    if idx % 10000 == 0:
        print('{0:,}'.format(idx)+' lines processed...')

print('Done!')

input_file.close()
dep_file.close()

#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py
#python file.py data.txt
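The hard-coded 4-mer and 5-mer lookup tables follow a fixed pattern: the keys enumerate every k-mer with nucleotides ordered A, C, T, G (AAAA, AAAC, AAAT, AAAG, ...), and the k-mer at position i maps to a 4^k-character string with a single '1' at index i. A minimal sketch, assuming that ordering holds for every table, can regenerate the dictionary for any k and build the same concatenated features as a NumPy 0/1 array. The helper names `build_onehot_dict` and `position_dependent_features` are hypothetical and not part of the pipeline. As a worked example, a 20-nt guide yields 17 × 256 = 4,352 features at k = 4 and 16 × 1,024 = 16,384 at k = 5.

```python
from itertools import product

import numpy as np

# Assumption: nucleotide order A, C, T, G, matching the key order of the
# hard-coded dictionaries (AAAA, AAAC, AAAT, AAAG, ACAA, ...).
NUCLEOTIDES = 'ACTG'
BASE_INDEX = {n: i for i, n in enumerate(NUCLEOTIDES)}


def build_onehot_dict(k):
    """Hypothetical helper: map every k-mer to a 4**k-character one-hot string."""
    size = 4 ** k
    return {
        ''.join(kmer): '0' * i + '1' + '0' * (size - i - 1)
        for i, kmer in enumerate(product(NUCLEOTIDES, repeat=k))
    }


def position_dependent_features(seq, k):
    """Hypothetical helper: one one-hot block per overlapping k-mer, as a 0/1 array.

    The result has length (len(seq) - k + 1) * 4**k, the same layout as the
    string written by the encoding scripts.
    """
    size = 4 ** k
    n_windows = len(seq) - k + 1
    features = np.zeros(n_windows * size, dtype=np.uint8)
    for w in range(n_windows):
        # the base-4 value of the k-mer is its index within the one-hot block,
        # which matches the enumeration order of itertools.product
        code = 0
        for base in seq[w:w + k]:
            code = code * 4 + BASE_INDEX[base]
        features[w * size + code] = 1
    return features


if __name__ == '__main__':
    onehot4 = build_onehot_dict(4)
    print(len(onehot4), len(onehot4['GGGG']))
    print(position_dependent_features('ACGTACGTACGTACGTACGT', 4).shape)
```

Comparing `''.join(str(int(x)) for x in position_dependent_features(seq, 4))` against the string produced with the hard-coded table is a quick check that the ordering assumption matches; the array form can be stacked directly into a feature matrix without re-parsing text files.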


import os, sys
import numpy as np

onehot_dict = {
'AAAAA':'1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAAC':'0100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAAT':'0010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAAG':'0001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAACA':'0000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAACC':'0000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAACT':'0000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAACG':'0000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAATA':'0000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAATC':'0000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAATT':'0000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAATG':'0000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAGA':'0000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAGC':'0000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAGT':'0000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAAGG':'0000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACAA':'0000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACAC':'0000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACAT':'0000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACAG':'0000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACCA':'0000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACCC':'0000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACCT':'0000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACCG':'0000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACTA':'0000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACTC':'0000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACTT':'0000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACTG':'0000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACGA':'0000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACGC':'0000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACGT':'0000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AACGG':'0000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATAA':'0000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATAC':'0000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATAT':'0000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATAG':'0000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATCA':'0000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATCC':'0000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATCT':'0000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATCG':'0000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATTA':'0000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATTC':'0000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATTT':'0000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATTG':'0000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATGA':'0000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATGC':'0000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATGT':'0000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AATGG':'0000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGAA':'0000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGAC':'0000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGAT':'0000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGAG':'0000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGCA':'0000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGCC':'0000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGCT':'0000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGCG':'0000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGTA':'0000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGTC':'0000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGTT':'0000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGTG':'0000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGGA':'0000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGGC':'0000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGGT':'0000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AAGGG':'0000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAAA':'0000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAAC':'0000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAAT':'0000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAAG':'0000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACACA':'0000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACACC':'0000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACACT':'0000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACACG':'0000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACATA':'0000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACATC':'0000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACATT':'0000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACATG':'0000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ACGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'ATGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000',
'AGCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'AGGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CACGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CATGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CAGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CCGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CTGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'CGGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TACGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TATGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TAGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TCGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TTGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGACA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGACC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGACT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGACG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGATA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGATC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGATT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGATG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGAGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGCGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGTGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGAA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGAC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGAT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGAG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGCA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGCC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGCT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGCG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGTA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGTC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGTT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGTG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGGA':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGGC':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGGT':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'TGGGG':'000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GACGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GATGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GAGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GCGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000000',
'GTGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000000',
'GTGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000000',
'GTGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000000',
'GTGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000000',
'GTGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000000',
'GTGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000000',
'GTGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000000',
'GTGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000000',
'GGAAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000000',
'GGAAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000000',
'GGAAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000000',
'GGAAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000000',
'GGACA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000000',
'GGACC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000',
'GGACT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000000',
'GGACG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000000',
'GGATA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000',
'GGATC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000000',
'GGATT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000000',
'GGATG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000',
'GGAGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000000',
'GGAGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000000',
'GGAGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000',
'GGAGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000000',
'GGCAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000000',
'GGCAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000',
'GGCAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000000',
'GGCAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000',
'GGCCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000',
'GGCCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000000',
'GGCCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000',
'GGCCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000',
'GGCTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000000',
'GGCTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000',
'GGCTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000',
'GGCTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000000',
'GGCGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000',
'GGCGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000',
'GGCGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000000',
'GGCGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000',
'GGTAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000',
'GGTAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000000',
'GGTAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000',
'GGTAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000',
'GGTCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000000',
'GGTCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000',
'GGTCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000',
'GGTCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000000',
'GGTTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000',
'GGTTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000',
'GGTTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000000',
'GGTTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000',
'GGTGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000',
'GGTGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000000',
'GGTGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000',
'GGTGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000000',
'GGGAA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000000',
'GGGAC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000',
'GGGAT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000000',
'GGGAG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000000',
'GGGCA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000',
'GGGCC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000000',
'GGGCT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000000',
'GGGCG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000',
'GGGTA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000',
'GGGTC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000',
'GGGTT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000',
'GGGTG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000',
'GGGGA':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000',
'GGGGC':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100',
'GGGGT':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010',
'GGGGG':'0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001'
}

# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4]+'_dependent5.txt', 'w')

# loop over nucleotide sequences
for idx, line in enumerate(input_file):

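    # if first iteration, write title line; `line` still ends in '\n', so the
    # title text lands on a second output line (downstream steps drop both lines)
    if idx == 0:
        dep_file.write(line + ': fifth-order position-dependent features' + '\n')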

    # otherwise encode sequence
    else:

        # split line by tab
        line = line.split('\t')

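        # extract sequence (rstrip removes the trailing newline without
        # chopping a base off a final line that has no newline)
        seq = line[-1].rstrip('\n')

        # compute position-dependent features: one-hot encode every overlapping
        # 5-nt window, since onehot_dict is keyed by 5-mers, not single bases
        pos_dep = ''.join([onehot_dict[seq[i:i+5]] for i in range(len(seq)-4)])

        # write features to file
        dep_file.write(line[0] + '\t' + pos_dep + '\n')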

    if idx % 10000 == 0:
        print('{0:,}'.format(idx)+' lines processed...')

print('Done!')

input_file.close()
dep_file.close()

#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer5_positional_encode.py
#python file.py data.txt
automated
def kmer2onehot(kmer, letters='ACTG'):
    idx = 0
    onehot = '0' * len(letters)**len(kmer)
    for position, mer in enumerate(kmer[::-1]):
        idx += letters.index(mer) * len(letters)**(position)
    onehot = onehot[:idx] + '1' + onehot[idx+1:]
    return onehot
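When the k-mer list is generated with itertools.product over 'ACTG', the k-mers come out in exactly the order of their one-hot positions. This works because kmer2onehot treats the last letter as the least-significant base-4 digit, which is also the letter product() varies fastest. A quick sanity check follows. It copies kmer2onehot so the snippet runs on its own, and `check_order` is a hypothetical helper name:

```python
from itertools import product


def kmer2onehot(kmer, letters='ACTG'):
    # copy of the notebook function: set the bit at the k-mer's base-4 index
    idx = 0
    onehot = '0' * len(letters)**len(kmer)
    for position, mer in enumerate(kmer[::-1]):
        idx += letters.index(mer) * len(letters)**(position)
    onehot = onehot[:idx] + '1' + onehot[idx+1:]
    return onehot


def check_order(k, letters='ACTG'):
    """True if each k-mer's rank in product() equals the index of its '1' bit."""
    kmers = [''.join(p) for p in product(letters, repeat=k)]
    return all(kmer2onehot(kmer, letters).index('1') == rank
               for rank, kmer in enumerate(kmers))


print(check_order(3))  # True
```

For example, 'GGGAG' has base-4 value 3*256 + 3*64 + 3*16 + 0*4 + 3 = 1011, which is where the hard-coded 5-mer dictionary above places its '1'.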

--> testing methods in Python

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# conda install -c conda-forge more-itertools

python

from itertools import product

# all 4^5 = 1,024 5-mers, listed in the same order as kmer2onehot's base-4 index
alphabet = 'ACTG'
keywords = [''.join(i) for i in product(alphabet, repeat = 5)]
print(keywords)

def kmer2onehot(kmer, letters='ACTG'):
    idx = 0
    onehot = '0' * len(letters)**len(kmer)
    for position, mer in enumerate(kmer[::-1]):
        idx += letters.index(mer) * len(letters)**(position)
    onehot = onehot[:idx] + '1' + onehot[idx+1:]
    return onehot

# onehot = []
# for i in keywords:
#     kmer2onehot = kmer2onehot(i, letters='ACTG')
#     #print("'" + i + "'" + ":" + "'" + onehot + "',")
#     onehot.append("'" + i + "'" + ":" + "'" + kmer2onehot + "',")
onehot_dict = {}
for kmer in keywords:
    onehot_dict[kmer] = kmer2onehot(kmer, letters='ACTG')

### kmer positional encoding
import sys

# onehot_dict is built above with kmer2onehot(), so no hard-coded
# 1,024-entry dictionary literal is needed here

# open input and output files
input_path = sys.argv[1]
input_file = open(input_path, 'r')
dep_file = open(input_path[:-4]+'_dependent5.txt', 'w')

# loop over nucleotide sequences
for idx, line in enumerate(input_file):

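    # if first iteration, write title line; `line` still ends in '\n', so the
    # title text lands on a second output line (downstream steps drop both lines)
    if idx == 0:
        dep_file.write(line + ': fifth-order position-dependent features' + '\n')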

    # otherwise encode sequence
    else:

        # split line by tab
        line = line.split('\t')

        # extract sequence (rstrip removes the trailing newline without
        # chopping a base off a final line that has no newline)
        seq = line[-1].rstrip('\n')

        # compute position-dependent features: one one-hot vector per overlapping 5-nt window
        pos_dep = ''.join([onehot_dict[seq[i:i+5]] for i in range(len(seq)-4)])

        # write features to file
        dep_file.write(line[0] + '\t' + pos_dep + '\n')

    if idx % 10000 == 0:
        print('{0:,}'.format(idx)+' lines processed...')

print('Done!')

input_file.close()
dep_file.close()

#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer5_positional_encode.py
#python file.py data.txt

file generation

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
python ../kmer1_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer2_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer3_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer4_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer5_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer6_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer7_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer8_positional_encode.py Ecoli.allCas9.noscore.txt
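The eight kmerK_positional_encode.py scripts presumably differ mainly in k. The sketch below shows a single script parameterised by k. It assumes tab-separated input lines with the sgRNA ID first and the sequence last, as in the kmer5 script, and a file name ending in a 4-character extension such as .txt. The script name kmer_positional_encode.py and its k argument are hypothetical. Instead of building a 4^k-entry dictionary, it computes each window's one-hot position from its base-4 index, using the same 'ACTG' order as kmer2onehot. For k = 8, a dictionary would hold 65,536 strings of 65,536 characters each, roughly 4 GB. Unlike the per-k scripts, this sketch writes a one-line title, so the sed step that follows would need to drop only one line.

```python
import sys


def kmer_index(kmer, letters="ACTG"):
    """Base-4 rank of a k-mer (last letter least significant), the ordering kmer2onehot uses."""
    idx = 0
    for base in kmer:
        idx = idx * len(letters) + letters.index(base)
    return idx


def encode(seq, k, letters="ACTG"):
    """Concatenate a one-hot block for every overlapping k-nt window of seq."""
    width = len(letters) ** k
    blocks = []
    for start in range(len(seq) - k + 1):
        idx = kmer_index(seq[start:start + k], letters)
        blocks.append("0" * idx + "1" + "0" * (width - idx - 1))
    return "".join(blocks)


def encode_rows(lines, k):
    """Yield 'ID<TAB>features' for each data line (title line already consumed).

    Assumes tab-separated lines with the sgRNA ID first and the sequence last.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield fields[0] + "\t" + encode(fields[-1], k)


def main(input_path, k):
    # output name mirrors the per-k scripts: data.txt -> data_dependent<k>.txt
    output_path = input_path[:-4] + "_dependent" + str(k) + ".txt"
    with open(input_path) as src, open(output_path, "w") as dst:
        title = src.readline().rstrip("\n")
        dst.write(title + ": order-" + str(k) + " position-dependent features\n")
        for n, row in enumerate(encode_rows(src, k), start=1):
            dst.write(row + "\n")
            if n % 10000 == 0:
                print("{0:,} lines processed...".format(n))
    print("Done!")


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage (hypothetical script name): python kmer_positional_encode.py Ecoli.allCas9.noscore.txt 5
    main(sys.argv[1], int(sys.argv[2]))
```

With this script, the eight calls above would become a shell loop over k.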


# split each one-hot feature string into individual space-separated columns so each bit counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/

# the encoder scripts write a two-line title (the header line keeps its newline before the
# title text is appended), so drop lines 1-2; the awk step also leaves a trailing space,
# i.e. one empty last column, which is dropped later in R
for k in 1 2 3 4 5 6 7 8; do
  sed '1,2d' Ecoli.allCas9.noscore_dependent${k}.txt | awk '{gsub(/./,"& ",$2); print $0}' > Ecoli.allCas9_dep${k}.txt
done
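The sed/awk step can also be done in stdlib Python, with a width check added. This is a minimal sketch that assumes the two-line title written by the encoder scripts and 20-nt guides. Under those assumptions, each row of a k-mer file should carry (20 - k + 1) * 4^k bits. `split_features` is a hypothetical helper name:

```python
def split_features(lines, k, guide_len=20, header_lines=2):
    """Turn 'ID<TAB>0101...' rows into (ID, [0, 1, 0, 1, ...]) pairs.

    Skips the title lines and checks each row has (guide_len - k + 1) * 4**k bits;
    guide_len=20 is an assumption about the sgRNA length.
    """
    expected = (guide_len - k + 1) * 4 ** k
    rows = []
    for line in lines[header_lines:]:
        sg_id, bits = line.rstrip("\n").split("\t")
        if len(bits) != expected:
            raise ValueError(f"{sg_id}: {len(bits)} bits, expected {expected}")
        rows.append((sg_id, [int(b) for b in bits]))
    return rows


# usage: with open("Ecoli.allCas9.noscore_dependent1.txt") as fh: rows = split_features(fh.readlines(), 1)
```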

score matrix

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J kmer.onehot.matrix
#SBATCH -N 2
#SBATCH -t 24:00:00
#SBATCH --mem-per-cpu=0

module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH onehot.kmer1to8.score.matrix.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/onehot.kmer1to8.score.matrix.sh
# salloc -A SYB105 -N 2 -p gpu -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")

onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Ecoli.allCas9_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Ecoli.allCas9_dep4.txt", header=F, sep=" ")
onehot.dep5 <- read.delim("Ecoli.allCas9_dep5.txt", header=F, sep=" ")
onehot.dep6 <- read.delim("Ecoli.allCas9_dep6.txt", header=F, sep=" ")
onehot.dep7 <- read.delim("Ecoli.allCas9_dep7.txt", header=F, sep=" ")
onehot.dep8 <- read.delim("Ecoli.allCas9_dep8.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"
colnames(onehot.dep5)[1] <- "sgRNAID"
colnames(onehot.dep6)[1] <- "sgRNAID"
colnames(onehot.dep7)[1] <- "sgRNAID"
colnames(onehot.dep8)[1] <- "sgRNAID"

# drop the trailing empty column left by the awk step; the parentheses matter, since
# 1:ncol(x)-1 parses as (1:ncol(x))-1 and only worked because R ignores index 0
onehot.dep12 <- full_join(onehot.dep1[, 1:(ncol(onehot.dep1)-1)], onehot.dep2[, 1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[, 1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep1234 <- full_join(onehot.dep123, onehot.dep4[, 1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot.dep12345 <- full_join(onehot.dep1234, onehot.dep5[, 1:(ncol(onehot.dep5)-1)], by="sgRNAID")
onehot.dep123456 <- full_join(onehot.dep12345, onehot.dep6[, 1:(ncol(onehot.dep6)-1)], by="sgRNAID")
onehot.dep1234567 <- full_join(onehot.dep123456, onehot.dep7[, 1:(ncol(onehot.dep7)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep1234567, onehot.dep8[, 1:(ncol(onehot.dep8)-1)], by="sgRNAID")
onehot.score <- full_join(score.df, onehot.dep, by="sgRNAID")

df.melt <- melt(onehot.score, id=c("cut.score", "sgRNAID"))
df <- na.omit(df.melt)

colnames(df) <- c("cut.score", "sgRNAID", "feature", "value")

# convert the 0/1 values to numeric and drop anything that failed to convert
df$value <- as.numeric(df$value)
df.id <- df[!is.na(df$value), ]

df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Ecoli.allCas9.kmer1to8.encoding.txt", quote=F, row.names=F, sep="\t")
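The joins plus the melt/na.omit/dcast round trip end up widening the table to one row per sgRNAID, holding the cut score and the concatenated k-mer blocks. A stdlib sketch of the same idea follows, written as an inner join. It drops guides missing from any table rather than padding them with missing values as the R version does, and it assumes each table lists a guide at most once. `join_on_id` is a hypothetical helper name:

```python
def join_on_id(scores, *blocks):
    """Inner-join a score table with feature blocks on sgRNAID.

    scores: dict sgRNAID -> cut score
    blocks: dicts sgRNAID -> list of features (e.g. one per k-mer size)
    Returns rows [sgRNAID, cut score, features...] for guides present in every
    table, in the order of `scores`.
    """
    rows = []
    for sg_id, score in scores.items():
        if all(sg_id in block for block in blocks):
            features = [x for block in blocks for x in block[sg_id]]
            rows.append([sg_id, score, *features])
    return rows
```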

iRF

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J iRF.onehot.kmer
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH iRF.onehot.kmer.R
R CMD BATCH iRF.onehot.kmer.control.R
R CMD BATCH iRF.onehot.kmer1to8.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.onehot.kmer.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R

library(ranger)

iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight per feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0) # half of the features that still have positive weight
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # divide by total absolute importance
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    # correlation of out-of-bag predictions with y (regression only; classification predictions are labels)
    if(classification==FALSE) cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n") # i.e. features with positive weight
    if(saveall) rfs[[i]] <- rf
    # stop early once fewer than max(1% of features, 10) keep a positive weight
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)) break
  }
  if(!saveall) rfs <- rf # keep only the last forest, whether or not the loop stopped early
  return(rfs)
}
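The reweighting arithmetic inside iRF() can be traced without fitting any forests. The sketch below reproduces only the weight update, the next mtry, and the early-stopping rule. It uses made-up importance values and hypothetical function names, and ranger itself is not involved:

```python
def reweight(importance):
    """One iRF weight update: divide by total absolute importance, then zero negatives.

    Assumes at least one importance is non-zero.
    """
    total = sum(abs(v) for v in importance)
    return [max(v / total, 0.0) for v in importance]


def next_mtry(weights):
    # mtry for the next forest: half the number of features with positive weight
    return 0.5 * sum(w > 0 for w in weights)


def should_stop(weights, n_features):
    # same rule as the R function: fewer than max(1% of (ncol - 1), 10) features survive
    return sum(w > 0 for w in weights) < max(0.01 * (n_features - 1), 10)


w = reweight([4.0, -1.0, 3.0, 2.0])
print(w)  # [0.4, 0.0, 0.3, 0.2]
```

After zeroing, the weights no longer sum to 1. They only act as relative split-selection probabilities for the next forest.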


library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")
set.seed(2458)
df.sample <- df.cas9.id[sample(nrow(df.cas9.id), 10000), ]

# kmer = 1
df.1 <- df.sample[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)
# iRF iteration  2 
# =================
# mtry:   31.5 
# prediction error:   89.54715 
# r^2:    0.194584 
# cor(y,yhat):    0.4412569 
# SNPs with importance > 0: 45 

# kmer = 2
df.2 <- df.sample[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)
# iRF iteration  2 
# =================
# mtry:   89 
# prediction error:   88.42499 
# r^2:    0.2046771 
# cor(y,yhat):    0.4546679 
# SNPs with importance > 0: 114 

# kmer = 3
df.3 <- df.sample[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)
# iRF iteration  3 
# =================
# mtry:   196 
# prediction error:   88.91244 
# r^2:    0.2002929 
# cor(y,yhat):    0.470527 
# SNPs with importance > 0: 282 

# kmer = 4
df.4 <- df.sample[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)
# iRF iteration  4
# =================
# mtry:   599 
# prediction error:   89.92779 
# r^2:    0.1911605 
# cor(y,yhat):    0.4695909 
# SNPs with importance > 0: 931 

# kmer = 5
df.5 <- df.sample[,c(2,5891:ncol(df.sample))]
iRF(df.5[,2:ncol(df.5)], df.5$cut.score)

# kmer = 1 + 2
df.1.2 <- df.sample[,c(2:386)]
iRF(df.1.2[,2:ncol(df.1.2)], df.1.2$cut.score)
# iRF iteration  2 
# =================
# mtry:   111.5 
# prediction error:   86.3626 
# r^2:    0.2232269 
# cor(y,yhat):    0.472461 
# SNPs with importance > 0: 136 

# kmer = 1 + 2 + 3
df.1.2.3 <- df.sample[,c(2:1538)]
iRF(df.1.2.3[,2:ncol(df.1.2.3)], df.1.2.3$cut.score)
# iRF iteration  5 
# =================
# mtry:   127.5 
# prediction error:   84.82643 
# r^2:    0.2370438 
# cor(y,yhat):    0.4899957 
# SNPs with importance > 0: 214 

# kmer = 1 + 2 + 3 + 4
df.1.2.3.4 <- df.sample[,c(2:5890)]
iRF(df.1.2.3.4[,2:ncol(df.1.2.3.4)], df.1.2.3.4$cut.score)
# iRF iteration  5
# =================
# mtry:   460 
# prediction error:   81.89764 
# r^2:    0.2633862 
# cor(y,yhat):    0.5152009 
# SNPs with importance > 0: 738 

# kmer = 1 + 2 + 3 + 4 + 5
df.1.2.3.4.5 <- df.sample[,c(2:ncol(df.sample))]
iRF(df.1.2.3.4.5[,2:ncol(df.1.2.3.4.5)], df.1.2.3.4.5$cut.score)

######################## NEED TO FIGURE OUT HOW TO NON-MANUALLY CODE THE KMERS ######################## 
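One way to avoid typing the k-mer column ranges by hand is to compute them. This sketch assumes 20-nt guides and blocks stored in order k = 1, 2, ... after sgRNAID (column 1) and cut.score (column 2). Under those assumptions, each k-mer block holds (20 - k + 1) * 4^k one-hot columns. The result reproduces the ranges used above for k = 1..4 (3:82, 83:386, 387:1538, 1539:5890). `kmer_column_ranges` is a hypothetical helper, and the same arithmetic carries over directly to R with cumsum():

```python
def kmer_column_ranges(max_k, guide_len=20, first_col=3, letters=4):
    """1-based inclusive (start, end) column range of each k-mer block.

    Assumes blocks are stored in order k = 1..max_k starting at `first_col`
    and that each block has (guide_len - k + 1) * letters**k columns.
    """
    ranges = {}
    start = first_col
    for k in range(1, max_k + 1):
        width = (guide_len - k + 1) * letters ** k
        ranges[k] = (start, start + width - 1)
        start += width
    return ranges


# print R-style column selections; the first line is "kmer = 1: df.sample[,c(2,3:82)]"
for k, (start, end) in kmer_column_ranges(8).items():
    print(f"kmer = {k}: df.sample[,c(2,{start}:{end})]")
```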


--> control: test adding the 2-mer block twice instead of the 4-mer block, to see what that does
df.1.2.3 <- df.sample[,c(2:1538)]
df.2 <- df.sample[,c(2,83:386)]
df.1.2.3.2 <- cbind(df.1.2.3, df.2[,2:ncol(df.2)], df.2[,2:ncol(df.2)])
iRF(df.1.2.3.2[,2:ncol(df.1.2.3.2)], df.1.2.3.2$cut.score)
# iteration 3
# mtry:   343 
# prediction error:   84.09708 
# r^2:    0.2436038 
# cor(y,yhat):    0.4937429 <--- adding more and more kmers is really just accentuating the same information???

# SNPs with importance > 0: 482 
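On the question in the control note: each position-dependent encoding is a deterministic function of the sequence, so a higher-order block already contains the lower-order ones. For example, the 1-mer block can be rebuilt exactly from the 2-mer block. Window i gives nucleotide i, and the last window also gives the final nucleotide. The small check below uses hypothetical helper names and the 'ACTG' order of kmer2onehot. It only shows that the encodings overlap in information, not how iRF weighs them:

```python
from itertools import product

LETTERS = "ACTG"  # same letter order as kmer2onehot


def onehot_windows(seq, k):
    """Position-dependent one-hot: one 4**k-long block per overlapping k-nt window."""
    blocks = []
    for start in range(len(seq) - k + 1):
        idx = 0
        for base in seq[start:start + k]:
            idx = idx * 4 + LETTERS.index(base)
        block = [0] * 4 ** k
        block[idx] = 1
        blocks.append(block)
    return blocks


def onemer_from_twomer(two_blocks):
    """Rebuild the 1-mer blocks by decoding each window's 2-mer (index = 4*first + second)."""
    seq = []
    for i, block in enumerate(two_blocks):
        first, second = divmod(block.index(1), 4)
        seq.append(LETTERS[first])
        if i == len(two_blocks) - 1:
            seq.append(LETTERS[second])
    return onehot_windows("".join(seq), 1)


def check_all(length=6):
    # every sequence of the given length (>= 2) round-trips from 2-mer to 1-mer encoding
    return all(onemer_from_twomer(onehot_windows("".join(p), 2)) == onehot_windows("".join(p), 1)
               for p in product(LETTERS, repeat=length))
```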

  



### expanding kmers to 8

library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.kmer1to8.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")
set.seed(2458)
df.sample <- df.cas9.id[sample(nrow(df.cas9.id), 10000), ]

# kmer = 1
df.1 <- df.sample[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)

# kmer = 2
df.2 <- df.sample[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)

# kmer = 3
df.3 <- df.sample[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)

# kmer = 4
df.4 <- df.sample[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)

# kmer = 5 to 8 (single blocks) and kmer = 1 - 2 to 1 - 8 (cumulative blocks)
# Column ranges are computed instead of typed by hand. Assumption: 20-nt guides, so the
# k-mer block has (20 - k + 1) * 4^k one-hot columns, and blocks follow sgRNAID (column 1)
# and cut.score (column 2) in the order k = 1..8. This reproduces the ranges used above
# (2:82, 83:386, 387:1538, 1539:5890).
kmer.size  <- (20 - 1:8 + 1) * 4^(1:8)
kmer.end   <- 2 + cumsum(kmer.size)
kmer.start <- kmer.end - kmer.size + 1

irf.single <- list()
for (k in 5:8) {
  cat("\n### kmer =", k, "\n")
  df.k <- df.sample[, c(2, kmer.start[k]:kmer.end[k])]
  irf.single[[k]] <- iRF(df.k[, 2:ncol(df.k)], df.k$cut.score)
}

irf.cumulative <- list()
for (k in 2:8) {
  cat("\n### kmer = 1 -", k, "\n")
  df.k <- df.sample[, 2:kmer.end[k]]
  irf.cumulative[[k]] <- iRF(df.k[, 2:ncol(df.k)], df.k$cut.score)
}
full matrix
  • combine additional one-hot kmers with full matrix and run iRF
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J e.coli.full
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0
#SBATCH -o e.coli.full-%j.o
#SBATCH -e e.coli.full-%j.e

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH e.coli.full.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.full.sh
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
full <- read.delim("ecoli.20sliding.raw.onehot.tensor.dwt.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Ecoli.allCas9.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")

library(dplyr)
df.full <- left_join(full, df.cas9.id[,c(1,387:ncol(df.cas9.id))], by="sgRNAID")
# 7343
write.table(df.full, "ecoli.20sliding.raw.onehot.kmer1to4.tensor.dwt.dcast.txt", quote=F, row.names=F, sep="\t")

set.seed(2458)
df.sample <- df.full[sample(nrow(df.full), 10000), ]

library(ranger)

iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight per feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0) # half of the features that still have positive weight
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # divide by total absolute importance
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    # correlation of out-of-bag predictions with y (regression only; classification predictions are labels)
    if(classification==FALSE) cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n") # i.e. features with positive weight
    if(saveall) rfs[[i]] <- rf
    # stop early once fewer than max(1% of features, 10) keep a positive weight
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)) break
  }
  if(!saveall) rfs <- rf # keep only the last forest, whether or not the loop stopped early
  return(rfs)
}

df.kmer <- df.sample[,c(2:ncol(df.sample))]
iRF(df.kmer[,2:ncol(df.kmer)], df.kmer$cut.score)
# iRF iteration  5 
# =================
# mtry:   481 
# prediction error:   85.28233 
# r^2:    0.2209295 
# cor(y,yhat):    0.4711767 
# SNPs with importance > 0: 718 


df.raw <- df.sample[,c(2,1642:1644,1650,1653)]
iRF(df.raw[,2:ncol(df.raw)], df.raw$cut.score)
# iRF iteration  1 
# =================
# mtry:   2.5 
# prediction error:   107.6557 
# r^2:    0.01654484 
# cor(y,yhat):    0.1625113 
# SNPs with importance > 0: 2 

df.dwt <- df.sample[,c(2,1656:1839)]
iRF(df.dwt[,2:ncol(df.dwt)], df.dwt$cut.score)
# iRF iteration  2 
# =================
# mtry:   61.5 
# prediction error:   101.7383 
# r^2:    0.07060119 
# cor(y,yhat):    0.2656731 
# SNPs with importance > 0: 91 

df.onehot <- df.sample[,c(2,3:17,1645:1649,1651:1652,1654:1655,18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579,1840:ncol(df.sample))]
iRF(df.onehot[,2:ncol(df.onehot)], df.onehot$cut.score)
# iRF iteration  5 
# =================
# mtry:   441 
# prediction error:   80.37818 
# r^2:    0.2657299 
# cor(y,yhat):    0.5159296 
# SNPs with importance > 0: 707 

df.quantum <- df.sample[,c(2,58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
iRF(df.quantum[,2:ncol(df.quantum)], df.quantum$cut.score)
# iRF iteration  5 
# =================
# mtry:   125 
# prediction error:   88.3275 
# r^2:    0.1931113 
# cor(y,yhat):    0.442198 
# SNPs with importance > 0: 175 

df.raw.dwt <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)])
iRF(df.raw.dwt[,2:ncol(df.raw.dwt)], df.raw.dwt$cut.score)
# iRF iteration  5 
# =================
# mtry:   31.5 
# prediction error:   101.5211 
# r^2:    0.07258498 
# cor(y,yhat):    0.2709458 
# SNPs with importance > 0: 53 

df.raw.onehot <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)])
iRF(df.raw.onehot[,2:ncol(df.raw.onehot)], df.raw.onehot$cut.score)
# iRF iteration  5 
# =================
# mtry:   453 
# prediction error:   80.5656 
# r^2:    0.2640177 
# cor(y,yhat):    0.5147411 
# SNPs with importance > 0: 725 

df.raw.quantum <- cbind(df.raw, df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.quantum[,2:ncol(df.raw.quantum)], df.raw.quantum$cut.score)
# iRF iteration  3 
# =================
# mtry:   234 
# prediction error:   87.29735 
# r^2:    0.2025218 
# cor(y,yhat):    0.451906 
# SNPs with importance > 0: 319 

df.onehot.dwt <- cbind(df.onehot, df.dwt[,2:ncol(df.dwt)])
iRF(df.onehot.dwt[,2:ncol(df.onehot.dwt)], df.onehot.dwt$cut.score)
# iRF iteration  3
# =================
# mtry:   680.5 
# prediction error:   84.89164 
# r^2:    0.2244985 
# cor(y,yhat):    0.4753814 
# SNPs with importance > 0: 812 

df.onehot.quantum <- cbind(df.onehot, df.quantum[,2:ncol(df.quantum)])
iRF(df.onehot.quantum[,2:ncol(df.onehot.quantum)], df.onehot.quantum$cut.score)
# iRF iteration  4
# =================
# mtry:   708 
# prediction error:   81.49942 
# r^2:    0.255487 
# cor(y,yhat):    0.5069903 
# SNPs with importance > 0: 1038 

df.quantum.dwt <- cbind(df.quantum, df.dwt[,2:ncol(df.dwt)])
iRF(df.quantum.dwt[,2:ncol(df.quantum.dwt)], df.quantum.dwt$cut.score)
# iRF iteration  5 
# =================
# mtry:   194.5 
# prediction error:   87.64181 
# r^2:    0.1993751 
# cor(y,yhat):    0.4479156 
# SNPs with importance > 0: 312 

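# note: the last block is df.onehot.quantum, so this set also includes the quantum tensors despite its name
df.raw.dwt.onehot <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)], df.onehot.quantum[,2:ncol(df.onehot.quantum)])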
iRF(df.raw.dwt.onehot[,2:ncol(df.raw.dwt.onehot)], df.raw.dwt.onehot$cut.score)
# iteration 5
# mtry:   412 
# prediction error:   84.02855 
# r^2:    0.232383 
# cor(y,yhat):    0.4842411 
# SNPs with importance > 0: 612 

df.raw.dwt.quantum <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.dwt.quantum[,2:ncol(df.raw.dwt.quantum)], df.raw.dwt.quantum$cut.score)
# iRF iteration  5 
# =================
# mtry:   186.5 
# prediction error:   87.38436 
# r^2:    0.201727 
# cor(y,yhat):    0.4502237 
# SNPs with importance > 0: 322 

df.raw.onehot.quantum <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.onehot.quantum[,2:ncol(df.raw.onehot.quantum)], df.raw.onehot.quantum$cut.score)
# iteration 5
# mtry:   531 
# prediction error:   81.57787 
# r^2:    0.2547704 
# cor(y,yhat):    0.5064083 
# SNPs with importance > 0: 833 

df.dwt.onehot.quantum <- cbind(df.dwt, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.dwt.onehot.quantum[,2:ncol(df.dwt.onehot.quantum)], df.dwt.onehot.quantum$cut.score)

df.all <- cbind(df.dwt, df.onehot[,2:ncol(df.onehot)], df.raw[,2:ncol(df.raw)], df.quantum[,2:ncol(df.quantum)])
iRF(df.all[,2:ncol(df.all)], df.all$cut.score)

RUNNING test iRF - summit

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("ecoli.20sliding.raw.onehot.kmer1to4.tensor.dwt.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(df)
# 7343
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all[,c(1,3:ncol(df.all))], "e.coli.cas9.raw.onehot.tensor.pam.location.dwt.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "e.coli.cas9.raw.onehot.tensor.pam.location.dwt.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "e.coli.cas9.raw.onehot.tensor.pam.location.dwt.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "e.coli.cas9.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "e.coli.cas9.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "e.coli.cas9.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- df %>% separate(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
df.cas9 <- subset(df.sep, df.sep$type == "Cas9")
df <- df.cas9[,c(1:3,1658,5:1651,1653:1657,1659)]
df.cas9.id <- unite(df, "sgRNAID", c(sgRNA, ID, type), sep="_")
write.table(df.cas9.id, "Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.txt", quote=F, row.names=F, sep="\t")
ncol(df)
# 1657
df.num <- mutate_all(df.cas9.id[,2:ncol(df.cas9.id)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cas9.id[,1], df.num) # full sgRNA_ID_type identifier (df[,1] holds only the first ID field)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all[,c(1,3:ncol(df.all))], "e.coli.cas9.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "e.coli.cas9.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "e.coli.cas9.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.cas9.DWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.raw.onehot.tensor.pam.location.dwt.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT/Submits/submit_full_e.coli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT/Submits/submit_full_e.coli.cas9.DWT_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT/Submits/submit_train_e.coli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT/Submits/submit_train_e.coli.cas9.DWT_0.sh
# once the train submissions have finished, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT/Submits/submit_test_e.coli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT/Submits/submit_test_e.coli.cas9.DWT_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# YNames.txt lists the response variable(s), one per line; here: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.noDWT
# 
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20homo_lumo_energygapraw cut.score   0.07800873556611687
# p15.CCsgRNA.raw   cut.score   0.03354121734879912
# GGsgRNA.raw   cut.score   0.032218576015115866
# p19.GGsgRNA.raw   cut.score   0.03201591141170038
# pam.distance0 cut.score   0.03177960466450477
# CCsgRNA.raw   cut.score   0.03044758321566565
# sgRNA.gcsgRNA.raw cut.score   0.028365152554358
# sgRNA.tempsgRNA.raw   cut.score   0.026106844733878046
# TsgRNA.raw    cut.score   0.024264328556364425
# p20xz_quadrupoleraw   cut.score   0.021899457216109038
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.noDWT_cut.score.importance4 | head
# p20homo_lumo_energygapraw: 22341
# p15.CCsgRNA.raw: 13634
# sgRNA.gcsgRNA.raw: 11941.9
# sgRNA.tempsgRNA.raw: 11300.7
# p19.GGsgRNA.raw: 10844.7
# GGsgRNA.raw: 10575.2
# CCsgRNA.raw: 10572
# pam.distance0: 9257.53
# TsgRNA.raw: 8865.36
# GsgRNA.raw: 8007.28

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.noDWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
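# 0.09442431 <-- SOMETHING ISN'T RIGHT HERE: far below the cor(y,yhat) of ~0.44-0.52 from the R iRF runs on the one-hot and quantum feature sets above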


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.cas9.DWT
# 
sort -k3rg topVarEdges/cut.score_top95.txt | head
 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.cas9.DWT_cut.score.importance4 | head


# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.cas9.DWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.cas9.DWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli

# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])

# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)

import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False stops summary_plot from calling plt.show(), which can leave an empty figure to save
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.16dec.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
plt.close(f)

f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.16dec.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
plt.close(f)

# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.16dec.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
RIT

** Need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw **

  • run RIT on Cas9 model with all features
  • need to run arva-rit and then runRIT.sh (3 scripts)
  • two outputs: effect size and directionality
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

# /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/runRIT.sh
## cp /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/runRIT.sh /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh
# runRIT.sh feature name            ### Note: name is name of the run and feature is the name of the y-value

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/Ecoli.allCas9/all.features/dwt20bp.noncor2.cas9/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score dwt20bp.noncor2.cas9

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/Ecoli.allCas9/all.features/dwt20bp.noncor2.cas9/cut.score/RIT.run

DNA/RNA quantum tensor additions (15 December 2021)

Email from Stephan: Attached please find our recent DNA/RNA monomer base and base pair data for now. This data was created with the help of Bredesen Center student Tyler Walker, cc’d to this message. Listed are total energies (please ignore), HOMO energy in eV, LUMO energy in eV, HOMO-LUMO gap (HL gap) in eV, the number of valence electrons in the molecule (this determines to some extent the HOMO energy), as well as the interaction energy E in kcal/mol. We have checked the literature, and our HL gap data for monomers is consistent with previously reported results.

Somewhat surprisingly, when base pairs are formed through hydrogen bonding, the HL gaps are significantly reduced, and I have to research this a bit more to see if others have reported this behavior as well. Another surprise, perhaps more significant, is that the GC base pair is significantly more strongly H-bonded than the AT and AU pairs. If one had assumed additivity rules, one would have expected:

  • -5 kcal/mol for 1 H bond (water dimer)
  • -10 kcal/mol for 2 H bonds (AT, AU)
  • -15 kcal/mol for 3 H bonds (GC)

Instead, we see -11 kcal/mol for AT and AU (consistent with the additivity model) but -25 kcal/mol for GC, which overshoots the additivity model by 10 kcal/mol, indicating a synergistic strengthening due to the third H bond in this base pair. Naively speaking, one would therefore expect that unpairing GC pairs requires more energy than unpairing AT/AU pairs. This, however, should be done by the RNA polymerase, I assume. Not sure about CRISPR-Cas9.
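The additivity argument in the email is plain arithmetic: the expected energy is the number of H bonds times a per-bond value (-5 kcal/mol, the water-dimer figure), and the "overshoot" is the computed energy minus that expectation. A minimal Python sketch using only the values quoted in the email; `E_PER_HBOND`, `COMPUTED` and `additivity_deviation` are illustrative names, not part of any existing script.

```python
# Additivity model: every H bond contributes the same energy. The -5 kcal/mol
# per bond is the water-dimer value quoted in the email (an approximation).
E_PER_HBOND = -5.0

# Base pair -> (number of H bonds, computed interaction energy in kcal/mol),
# using the values quoted in the email.
COMPUTED = {
    "AT": (2, -11.0),
    "AU": (2, -11.0),
    "GC": (3, -25.0),
}


def additivity_deviation(n_hbonds, energy, per_bond=E_PER_HBOND):
    """Return (expected, deviation), with deviation = computed - expected.

    A negative deviation means the pair binds more strongly than the
    additivity model predicts.
    """
    expected = n_hbonds * per_bond
    return expected, energy - expected


for pair, (n, e) in COMPUTED.items():
    expected, dev = additivity_deviation(n, e)
    print(f"{pair}: expected {expected:+.0f}, computed {e:+.0f}, "
          f"deviation {dev:+.0f} kcal/mol")
```

AT and AU land 1 kcal/mol below the additive value, while GC lands 10 kcal/mol below it, which is the synergy the email points to.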

# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R
library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.allCas9.tensorsDNARNA.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensorsDNARNA.melt.txt", quote=F, row.names=F, sep="\t")
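The melt / left_join / dcast steps above turn each sgRNA sequence into one column per (position, quantum property) by looking up the per-base value at every position. A dependency-free Python sketch of the same lookup, assuming positions are labelled `p1`...`p20` (inferred from feature names such as `p20HOMO_eVraw`) and that the property table maps base to value as `tensor.t` does; `per_position_features` is a hypothetical helper and the toy values are made up.

```python
def per_position_features(sequences, base_features):
    """Expand sgRNA sequences into one value per (position, property).

    sequences: dict of sgRNA ID -> sequence string.
    base_features: dict of property name -> {base: value}, i.e. the
        transposed tensor table with one entry per nucleotide.
    Returns dict of sgRNA ID -> {"p<i>_<property>": value}, positions
    counted from 1. A base missing from the table (e.g. N) gives None,
    where the R left_join gives NA.
    """
    wide = {}
    for sg_id, seq in sequences.items():
        row = {}
        for i, base in enumerate(seq.upper(), start=1):
            for prop, by_base in base_features.items():
                row[f"p{i}_{prop}"] = by_base.get(base)
        wide[sg_id] = row
    return wide


# Toy property table with made-up values, for illustration only.
toy_table = {"HOMO_eV": {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}}
print(per_position_features({"sg1": "ACGN"}, toy_table))
```

The `position_property` naming follows dcast's default `_` separator for `sgRNAID ~ position + variable`.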
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
structure <- read.delim("Ecoli.allCas9.structure.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Ecoli.allCas9.nuc.count.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")

structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
onehot.ind1 <- read.delim("Ecoli.allCas9_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Ecoli.allCas9_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305]

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.allCas9.tensorsDNARNA.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")

df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)

df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensorDNARNA.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 126182
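The `unite(..., sep = "")` plus `dcast(..., fun.aggregate = mean, na.rm = TRUE)` pattern used throughout this section is what produces feature names such as `pam.distance0` and `GGsgRNA.raw` in the importance lists. A Python sketch of that pivot, under the assumption that each long record is `(sgRNAID, cut_score, variable, scale, value)`; `cast_mean` is a hypothetical helper and the example IDs are made up. One difference: a cell with no values is simply absent here, whereas dcast fills it with `mean(numeric(0))`, i.e. NaN, and the following `na.omit` drops that sgRNA's whole row.

```python
from collections import defaultdict


def cast_mean(records):
    """Pivot long records into one row per (sgRNAID, cut.score).

    records: iterable of (sgRNAID, cut_score, variable, scale, value).
    The column name is variable + scale with no separator, as with
    unite(..., sep = ""). Repeated cells are averaged (fun.aggregate = mean);
    None values are skipped (na.rm = TRUE).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for sg_id, score, variable, scale, value in records:
        if value is None:
            continue
        key = (sg_id, score, f"{variable}{scale}")
        sums[key] += value
        counts[key] += 1
    wide = defaultdict(dict)
    for (sg_id, score, name), total in sums.items():
        wide[(sg_id, score)][name] = total / counts[(sg_id, score, name)]
    return dict(wide)


records = [
    ("sg1_1_Cas9", 5.0, "pam.distance", "0", 3.0),
    ("sg1_1_Cas9", 5.0, "pam.distance", "0", 5.0),
    ("sg1_1_Cas9", 5.0, "GG", "sgRNA.raw", 2.0),
]
print(cast_mean(records))
```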


# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.pam <- read.table("ecoli.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.df$id <- "Cas9"
sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 40468

df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNA.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 40468
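The four `ifelse` calls in the PAM step above encode the variable N of the NGG PAM. A PAM found on the minus strand is reported as CCN', where N' is the complement of N, which is why AGG and CCT both set `PAM.A`. The same mapping can be derived with a reverse complement instead of listing the pairs. A minimal Python sketch (helper names are hypothetical); for 3-mers it accepts exactly the eight codes the R code lists, and unlike the R code it also accepts lowercase input.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters pass through)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def pam_n_onehot(pam_code):
    """One-hot encode the N of an NGG PAM reported on either strand.

    "NGG" is a plus-strand PAM; "CCN'" is the same PAM read on the minus
    strand, so it is reverse-complemented first. Anything else gives all zeros.
    """
    code = pam_code.upper()
    if code.endswith("GG"):
        n = code[0]
    elif code.startswith("CC"):
        n = reverse_complement(code)[0]
    else:
        n = None
    return {f"PAM.{b}": int(n == b) for b in "ACTG"}


for code in ["AGG", "CCT", "CGG", "CCG", "TGG", "CCA", "GGG", "CCC"]:
    print(code, pam_n_onehot(code))
```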


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.genes <- read.table("sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.df$id <- "Cas9"
sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 40468

df <- df.location
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 40468

write.table(df.location, "Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")

iRF

# library(tidyr)
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
# df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
# df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
# df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
# df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")
# df <- df.cas9.id[,c(1,1816,3:1809,1811:1815)]
# write.table(df, "Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.txt", quote=F, row.names=F, sep="\t")

library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)

set.seed(2458)
df.sample <- df[sample(nrow(df), 10000), ]

library(ranger)

iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight for every feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # divide by total absolute importance; negatives are zeroed on the next line, leaving weights in [0,1]
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
    {
      if(!saveall) rfs <- rf
      break
    }
  }
  return(rfs)
}
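Between iterations the function above does three things: it rescales `variable.importance` by its total absolute value and zeroes negative entries (`impurity_corrected` importance can be negative), sets `mtry` to half the number of features still carrying weight, and stops once fewer than `max(0.01*(ncol(xmat)-1), 10)` features remain. A Python sketch of just that bookkeeping, with the forest itself left out; the function names are illustrative.

```python
def update_weights(importance):
    """Turn one forest's variable importances into next-round split weights.

    Mirrors the R code: divide by the total absolute importance, then set
    negative values to zero. Unlike the R code (which would give NaN),
    all-zero importance returns all-zero weights.
    """
    total = sum(abs(v) for v in importance)
    if total == 0:
        return [0.0] * len(importance)
    return [max(v / total, 0.0) for v in importance]


def next_mtry(weights):
    """mtry for the next forest: half the number of features with weight > 0."""
    return 0.5 * sum(1 for w in weights if w > 0)


def should_stop(weights, n_features):
    """True once fewer than max(0.01 * (n_features - 1), 10) features keep weight.

    n_features plays the role of ncol(xmat) in the R function.
    """
    n_positive = sum(1 for w in weights if w > 0)
    return n_positive < max(0.01 * (n_features - 1), 10)


# Example: importances from a hypothetical 4-feature forest.
w = update_weights([2.0, -1.0, 1.0, 0.0])
print(w, next_mtry(w), should_stop(w, 4))
```

This also explains the first `df.raw` run at the top of this section: `mtry` 2.5 means five input features, and with only two features above zero after iteration 1 (below the floor of 10) the loop stopped there.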

# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1805:1811,1813:1814)]
# one-hot dependent: [,c(18:57,128:147,218:237,308:327,398:417,488:507,578:597,668:687,758:777,848:867,1008:1031,1172:1191,1262:1281,1352:1371,1442:1461,1532:1551,1622:1641,1712:1731)]
# chemical tensors: [,c(58:127,148:217,238:307,328:397,418:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1642:1711,1732:1801)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1802:1804,1812)]

iRF(df.sample[,3:ncol(df.sample)], df.sample$cut.score)
# iRF iteration  4 
# =================
# mtry:   222.5 
# prediction error:   84.22153 
# r^2:    0.2306201 
# cor(y,yhat):    0.4812496 
# SNPs with importance > 0: 320 

df.quantum.new <- df.sample[,c(58:127,148:217,238:307,328:397,419:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1643:1711,1732:1801)]
iRF(df.quantum.new, df.sample$cut.score)
# iRF iteration  5 
# =================
# mtry:   139 
# prediction error:   88.10261 
# r^2:    0.1951657 
# cor(y,yhat):    0.4452454     #### slightly better than previous but not significantly
# SNPs with importance > 0: 204 

#### previously quantum data resulted in r^2:0.1931113 and cor(y,yhat): 0.442198 


## feature importance
xmat = df.sample[,3:ncol(df.sample)]
xmat.score = df.sample$cut.score

library(gbm)
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,             
            interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,  
            n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE, 
            verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500 
# Equivalent Number of Parameters: 39.85 
# Residual Standard Error: 0.04408 
# [1] 174
#                                 var  rel.inf
# p19.GGsgRNA.raw     p19.GGsgRNA.raw 4.044676
# sgRNA.gcsgRNA.raw sgRNA.gcsgRNA.raw 3.639182  #### GC content
# p20bp_HOMO_evraw   p20bp_HOMO_evraw 3.336434  #### updated quantum chemical tensor
# p20homo_energyraw p20homo_energyraw 2.909753
# CCsgRNA.raw             CCsgRNA.raw 2.770156
# GGsgRNA.raw             GGsgRNA.raw 2.472840


#### try with just the updated QCTs
df.quantum.newONLY <- df.sample[,c(58:61,63,65,69,73,148:151,153,155,159,163,238:241,243,245,249,253,328:331,333,335,339,343,418:421,423,425,429,433,508:511,513,515,519,523,598:601,603,605,609,613,688:691,693,695,699,703,778:781,783,785,789,793,868:871,873,875,879,883,1032:1035,1037,1039,1043,1047,1192:1195,1197,1199,1203,1207,1282:1285,1287,1289,1293,1297,1372:1375,1377,1379,1383,1387,1462:1465,1467,1469,1473,1477,1552:1555,1557,1559,1563,1567,1642:1645,1647,1649,1653,1657,1732:1735,1737,1739,1743,1747)]
df.quantum.new <- df.sample[,c(58:127,148:217,238:307,328:397,418:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1642:1711,1732:1801)]
df.quantum.old <- df.sample[,c(62,64,66:68,70:72,74:127,152,154,156:158,160:162,164:217,242,244,246:248,250:252,254:307,332,334,336:338,340:342,344:397,422,424,426:428,430:432,434:487,512,514,516:518,520:522,524:577,602,604,606:608,610:612,614:667,692,694,696:698,700:702,704:757,782,784,786:788,790:792,794:847,872,874,876:878,880:882,884:1007,1036,1038,1040:1042,1044:1046,1048:1171,1196,1198,1200:1202,1204:1206,1208:1261,1286,1288,1290:1292,1294:1296,1298:1351,1376,1378,1380:1382,1384:1386,1388:1441,1466,1468,1470:1472,1474:1476,1478:1531,1556,1558,1560:1562,1564:1566,1568:1621,1646,1648,1650:1652,1654:1656,1658:1711,1736,1738,1740:1742,1744:1746,1748:1801)]
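These column lists are typed by hand, and a one-digit slip silently moves a feature between sets. One hedged alternative is to generate them from the block layout. This Python sketch assumes (a) 70-column QCT blocks starting at the positions used for `df.quantum.new` just above, with the 140-column spans 868:1007 and 1032:1171 split into two blocks each (starts 938 and 1102, inferred from columns 940 and 1104 in the later `bp_HOMO_eV` selection), and (b) that the updated 15-Dec features sit at within-block offsets 0-3, 5, 7, 11 and 15, as in the first block of `df.quantum.newONLY` (58:61, 63, 65, 69, 73). `qct_columns` and `to_r_vector` are hypothetical helpers; the printed `c(...)` strings can be pasted into R.

```python
# Start column of each 70-column QCT block (assumed layout, see lead-in).
BLOCK_STARTS = [58, 148, 238, 328, 418, 508, 598, 688, 778, 868, 938,
                1032, 1102, 1192, 1282, 1372, 1462, 1552, 1642, 1732]
BLOCK_WIDTH = 70
# Within-block offsets of the updated (15-Dec) features, inferred from 58:61,63,65,69,73.
NEW_OFFSETS = frozenset({0, 1, 2, 3, 5, 7, 11, 15})


def qct_columns(starts=BLOCK_STARTS, width=BLOCK_WIDTH, new_offsets=NEW_OFFSETS):
    """Split every block's 1-based column indices into (new, old) lists."""
    new_cols, old_cols = [], []
    for start in starts:
        for off in range(width):
            (new_cols if off in new_offsets else old_cols).append(start + off)
    return new_cols, old_cols


def to_r_vector(cols):
    """Format sorted 1-based indices as an R c(...) call, collapsing runs to a:b."""
    parts = []
    i = 0
    while i < len(cols):
        j = i
        while j + 1 < len(cols) and cols[j + 1] == cols[j] + 1:
            j += 1
        parts.append(str(cols[i]) if i == j else f"{cols[i]}:{cols[j]}")
        i = j + 1
    return "c(" + ",".join(parts) + ")"


new_cols, old_cols = qct_columns()
print(to_r_vector(new_cols))
print(to_r_vector(old_cols))
```

Unlike the hand-typed `df.quantum.newONLY`, this also picks up the second position inside the 868:1007 and 1032:1171 spans.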

iRF(df.quantum.newONLY, df.sample$cut.score)
# iRF iteration  2 
# =================
# mtry:   59 
# prediction error:   87.97316 
# r^2:    0.1963482 
# cor(y,yhat):    0.4477064 
# SNPs with importance > 0: 73 

iRF(df.quantum.old, df.sample$cut.score)
# iRF iteration  5 
# =================
# mtry:   117 
# prediction error:   88.02909 
# r^2:    0.1958372 
# cor(y,yhat):    0.4442053 
# SNPs with importance > 0: 168 

iRF(df.quantum.new, df.sample$cut.score)
# iRF iteration  5 
# =================
# mtry:   124.5 
# prediction error:   87.73299 
# r^2:    0.1985422 
# cor(y,yhat):    0.4478398 
# SNPs with importance > 0: 195 


# bp_HOMO_eV ONLY
df.bp_HOMO_eV <- df.sample[,c(60,150,240,330,420,510,600,690,780,870,940,1034,1104,1194,1284,1374,1464,1554,1644,1734)]
iRF(df.bp_HOMO_eV, df.sample$cut.score)
# iRF iteration  1 
# =================
# mtry:   10 
# prediction error:   99.29029 
# r^2:    0.09296408 
# cor(y,yhat):    0.306499 
# SNPs with importance > 0: 6 



## feature importance
library(gbm)
xmat = df.quantum.newONLY
xmat.score = df.sample$cut.score
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,             
            interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,  
            n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE, 
            verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500 
# Equivalent Number of Parameters: 39.85 
# Residual Standard Error: 0.05074 
# [1] 327
#                             var  rel.inf
# p19HL.gap_eVraw p19HL.gap_eVraw 6.288143
# p20bp_bondraw     p20bp_bondraw 5.540806
# p20HOMO_eVraw     p20HOMO_eVraw 4.060408
# p18HOMO_eVraw     p18HOMO_eVraw 3.850749
# p19HOMO_eVraw     p19HOMO_eVraw 3.540256
# p17HL.gap_eVraw p17HL.gap_eVraw 3.530563

library(gbm)
xmat = df.quantum.new
xmat.score = df.sample$cut.score
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,             
            interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,  
            n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE, 
            verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500 
# Equivalent Number of Parameters: 39.85 
# Residual Standard Error: 0.05003 
# [1] 359
#                                         var  rel.inf
# p20bp_HOMO_evraw           p20bp_HOMO_evraw 2.487339
# p20homo_energyraw         p20homo_energyraw 2.239112
# p18HOMO_eVraw                 p18HOMO_eVraw 2.114659
# p19rot_constants_yraw p19rot_constants_yraw 1.663645
# p20bp_bondraw                 p20bp_bondraw 1.478998
# p19tot_dipoleraw           p19tot_dipoleraw 1.410728

library(gbm)
xmat = df.quantum.old
xmat.score = df.sample$cut.score
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,             
            interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,  
            n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE, 
            verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500 
# Equivalent Number of Parameters: 39.85 
# Residual Standard Error: 0.05249 
# [1] 499
#                                                 var  rel.inf
# p20homo_lumo_energygapraw p20homo_lumo_energygapraw 2.678070
# p20homo_energyraw                 p20homo_energyraw 2.121547
# p19molecular_volumeraw       p19molecular_volumeraw 1.407671
# p18homo_energyraw                 p18homo_energyraw 1.356138
# p18xz_quadrupoleraw             p18xz_quadrupoleraw 1.313431
# p19rot_constants_yraw         p19rot_constants_yraw 1.200738

library(gbm)
# Model 3: all features in df.sample (columns 3 onward), same settings as model 1
gbm.input <- cbind(cut.score = df.sample$cut.score, df.sample[, 3:ncol(df.sample)])
gbm.df <- gbm(cut.score ~ ., data = gbm.input, distribution = "gaussian",
              n.trees = 500, shrinkage = 0.1, interaction.depth = 3,
              bag.fraction = 0.2, train.fraction = 0.8, n.minobsinnode = 10,
              cv.folds = 5, keep.data = TRUE, verbose = FALSE, n.cores = 1)
best.iter.oob <- gbm.perf(gbm.df, method = "OOB")
print(best.iter.oob)
best.iter <- gbm.perf(gbm.df, method = "cv")  # the CV estimate is used for the summary
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500
# Equivalent Number of Parameters: 39.85
# Residual Standard Error: 0.04919
# [1] 228
#                                 var  rel.inf
# p20bp_bondraw         p20bp_bondraw 5.011324
# p19.GGsgRNA.raw     p19.GGsgRNA.raw 3.197801
# sgRNA.gcsgRNA.raw sgRNA.gcsgRNA.raw 3.046770
# GGsgRNA.raw             GGsgRNA.raw 2.345230
# pam.distance0.x     pam.distance0.x 2.115209
# p15.CCsgRNA.raw     p15.CCsgRNA.raw 1.621739
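The three fits above compare feature sets by the best number of trees and the relative-influence table. A minimal pure-Python sketch of the same idea, under simplifying assumptions: depth-1 trees (interaction.depth = 1 rather than 3), no n.minobsinnode limit, and best-iteration selection on the held-out tail of the data (closer to gbm.perf(method = "test") than to the OOB or CV estimates used above). It is not gbm's implementation; `fit_stump` and `boost` are hypothetical helpers, and the toy data are synthetic.

```python
import random


def fit_stump(xs, residuals, rows):
    """Best depth-1 regression tree on `rows`.

    Returns (feature, threshold, left_value, right_value, sse_reduction)."""
    vals = [residuals[i] for i in rows]
    mean = sum(vals) / len(vals)
    total_sse = sum((v - mean) ** 2 for v in vals)
    best_sse = total_sse
    best = (0, float("inf"), mean, mean, 0.0)  # fallback: no split improves the fit
    for f in range(len(xs[0])):
        levels = sorted({xs[i][f] for i in rows})
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [residuals[i] for i in rows if xs[i][f] <= thr]
            right = [residuals[i] for i in rows if xs[i][f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if sse < best_sse:
                best_sse = sse
                best = (f, thr, lm, rm, total_sse - sse)
    return best


def boost(xs, y, n_trees=100, shrinkage=0.1, bag_fraction=0.5,
          train_fraction=0.8, seed=0):
    """Squared-error boosting with stumps.

    Returns (best_iter, rel_inf, valid_mse)."""
    rng = random.Random(seed)
    n = len(y)
    n_train = int(n * train_fraction)          # first rows train, the rest validate
    train = range(n_train)
    valid = range(n_train, n) or train         # fall back to training rows if no hold-out
    init = sum(y[i] for i in train) / n_train  # start from the training mean
    pred = [init] * n
    influence = [0.0] * len(xs[0])
    valid_mse = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of (half) squared-error loss
        residuals = [y[i] - pred[i] for i in range(n)]
        # Fit each tree on a random subsample of the training rows (bag.fraction)
        bag = rng.sample(list(train), max(2, int(bag_fraction * n_train)))
        f, thr, lv, rv, gain = fit_stump(xs, residuals, bag)
        influence[f] += gain
        # Add the shrunken tree prediction to every row
        for i in range(n):
            pred[i] += shrinkage * (lv if xs[i][f] <= thr else rv)
        valid_mse.append(sum((y[i] - pred[i]) ** 2 for i in valid) / len(valid))
    best_iter = valid_mse.index(min(valid_mse)) + 1  # 1-based, like gbm.perf
    total = sum(influence) or 1.0
    rel_inf = [100 * g / total for g in influence]   # normalized to sum to 100
    return best_iter, rel_inf, valid_mse


# Toy data: only feature 0 drives the response; feature 1 is noise
rng = random.Random(1)
xs = [[rng.random(), rng.random()] for _ in range(100)]
y = [1.0 if x[0] > 0.5 else 0.0 for x in xs]
best_iter, rel_inf, valid_mse = boost(xs, y)
print(best_iter, [round(r, 1) for r in rel_inf])
```

Relative influence here is the summed squared-error reduction of each feature's splits, scaled to 100, which is the quantity the rel.inf columns above report in spirit.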

kmers of quantum tensors

library(gtools)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
rownames(tensor) <- tensor[,1]
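# Rows 67-70, columns 2-5 (one column per nucleotide, named A, C, G, T);
# only the first of these rows (input[1, ]) is encoded below
input <- tensor[67:70,2:5]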

nucleotides <- c("A", "C", "G", "T")

# input <- data.frame(matrix(ncol=4, nrow=2,
#                     dimnames=list(c("Feat1", "Feat2"), nucleotides)))
# input["Feat1",] <- c(2, 5, 10, 15)
# input["Feat2",] <- c(12, 15, 8, 20)

numnucleotides <- 3  # k-mer length

# Get all 4^k k-mers (permutations with repetition, in lexicographic order)
# n = the number of possibilities
# r = the number of draws
# v = the nucleotides
permlist <- permutations(n=length(nucleotides), r=numnucleotides,
                         v=nucleotides, repeats.allowed=TRUE)

# Collapse each row of permlist into one string,
# giving a vector with one entry per possible k-mer
sequence <- apply(permlist, 1, paste, collapse="")

# Create a diagonal matrix with one row and column per possible k-mer
diagmatrix <- diag(1, nrow(permlist))
rownames(diagmatrix) <- sequence
colnames(diagmatrix) <- sequence


# Loop through each k-mer in the permutation list
for (ii in 1:nrow(permlist)) {
  # Vector to hold the value of each nucleotide in this k-mer
  nucleotidevector <- rep(NA, numnucleotides)
  # Loop through each nucleotide of the k-mer
  for (jj in 1:numnucleotides) {
    # Get the nucleotide and look up its value in the first input row
    nucleotidevalue <- permlist[ii,jj]
    nucleotidevector[jj] <- input[1, nucleotidevalue]
  }
  # Place the mean of the nucleotide values on this k-mer's diagonal entry
  diagmatrix[ii,ii] <- mean(nucleotidevector)
}

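# Output file name follows the k-mer length (tensor.kmer.3.txt for k = 3)
write.table(diagmatrix, paste0("tensor.kmer.", numnucleotides, ".txt"), quote=F, row.names=T, col.names=F, sep=" ")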
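The R loop above fills each diagonal entry with the mean of the k-mer's per-nucleotide values. A minimal Python sketch of the same construction, assuming per-nucleotide values A = T = -5.367 and C = G = -4.951 (the values implied by the kmer 2-4 tables below); it also renders rows in the space-separated format of the `onehot_dict` entries. `kmer_means`, `onehot_dict_for` and `positional_encoding` are hypothetical helpers, and concatenating overlapping k-mer vectors in `positional_encoding` is an assumed way of consuming the table, not something shown in this notebook.

```python
from itertools import product
from statistics import mean


def kmer_means(values, k):
    """Map every k-mer to the mean of its per-nucleotide values.

    k-mers come out in lexicographic order over the sorted alphabet, the same
    row order as gtools::permutations(..., repeats.allowed = TRUE)."""
    alphabet = sorted(values)
    return {"".join(p): mean(values[n] for n in p)
            for p in product(alphabet, repeat=k)}


def onehot_dict_for(values, k):
    """Render each k-mer as its row of the diagonal matrix (mean at the k-mer's
    own index, 0 elsewhere), space-separated with 15 significant digits."""
    means = kmer_means(values, k)
    table = {}
    for idx, (kmer, v) in enumerate(means.items()):
        row = ["0"] * len(means)
        row[idx] = format(v, ".15g")
        table[kmer] = " ".join(row)
    return table


def positional_encoding(seq, table, k):
    """Hypothetical use of the table: concatenate the vectors of every
    overlapping k-mer along the sequence."""
    vec = []
    for i in range(len(seq) - k + 1):
        vec.extend(float(x) for x in table[seq[i:i + k]].split())
    return vec


# Per-nucleotide values as implied by the kmer 2-4 tables
values = {"A": -5.367, "C": -4.951, "G": -4.951, "T": -5.367}
kmer3 = onehot_dict_for(values, 3)
print(kmer3["AAC"].split()[:3])
print(len(positional_encoding("ACGTACGT", onehot_dict_for(values, 2), 2)))
```

Generating the tables this way keeps the key order and the nucleotide-to-value mapping identical for every k, so a 20-nt guide encoded with k = 2 gives 19 overlapping vectors of length 16.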
### kmer positional encoding
import os, sys
import numpy as np

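#kmer 1
# Each onehot_dict assignment below replaces the previous one; keep the one for the k in use.
# Keys in A, C, G, T order to match the column order of the kmer 2-4 tables
# (A = T = -5.367, C = G = -4.951)
onehot_dict={
  'A':'-5.367  0.000  0.000  0.000 ',
  'C':'0.000 -4.951  0.000  0.000 ',
  'G':'0.000  0.000 -4.951  0.000 ',
  'T':'0.000  0.000  0.000 -5.367 '
}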

#kmer 2
onehot_dict={
'AA':'-5.367  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 0.000  0.000  0.000  0.000  0.000 ',
'AC':'0.000 -5.159  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 0.000  0.000  0.000  0.000  0.000 ',
'AG':'0.000  0.000 -5.159  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 0.000  0.000  0.000  0.000  0.000 ',
'AT':'0.000  0.000  0.000 -5.367  0.000  0.000  0.000  0.000  0.000  0.000  0.000 0.000  0.000  0.000  0.000  0.000 ',
'CA':'0.000  0.000  0.000  0.000 -5.159  0.000  0.000  0.000  0.000  0.000  0.000 0.000  0.000  0.000  0.000  0.000 ',
'CC':'0.000  0.000  0.000  0.000  0.000 -4.951  0.000  0.000  0.000  0.000  0.000 0.000  0.000  0.000  0.000  0.000 ',
'CG':'0.000  0.000  0.000  0.000  0.000  0.000 -4.951  0.000  0.000  0.000  0.000 0.000  0.000  0.000  0.000  0.000 ',
'CT':'0.000  0.000  0.000  0.000  0.000  0.000  0.000 -5.159  0.000  0.000  0.000 0.000  0.000  0.000  0.000  0.000 ',
'GA':'0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 -5.159  0.000  0.000 0.000  0.000  0.000  0.000  0.000 ',
'GC':'0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 -4.951  0.000 0.000  0.000  0.000  0.000  0.000 ',
'GG':'0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 -4.951 0.000  0.000  0.000  0.000  0.000 ',
'GT':'0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 -5.159  0.000  0.000  0.000  0.000 ',
'TA':'0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 0.000 -5.367  0.000  0.000  0.000 ',
'TC':'0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 0.000  0.000 -5.159  0.000  0.000 ',
'TG':'0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 0.000  0.000  0.000 -5.159  0.000 ',
'TT':'0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000 0.000  0.000  0.000  0.000 -5.367'
}

#kmer 3
onehot_dict={
'AAA':'-5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAC':'0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAG':'0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAT':'0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACA':'0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACC':'0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACG':'0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACT':'0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGA':'0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGC':'0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGG':'0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGT':'0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATA':'0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 0 0 0 ',
'TCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 0 ',
'TCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 0 0 0 ',
'TCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 0 ',
'TGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 0 0 0 ',
'TGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 0 ',
'TGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.08966666666667 0 0 0 0 0 ',
'TGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 0 0 ',
'TTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 ',
'TTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 0 ',
'TTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.22833333333333 0 ',
'TTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 '
}

#kmer 4
onehot_dict={
'AAAA':'-5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAAC':'0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAAG':'0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAAT':'0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AACA':'0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AACC':'0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AACG':'0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AACT':'0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAGA':'0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAGC':'0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAGG':'0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AAGT':'0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AATA':'0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ACTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'AGTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'ATTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CACA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CACC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CACG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CACT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CAGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CATA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CCTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CGTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'CTTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GACA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GACC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GACG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GACT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GAGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GATA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GCTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -4.951 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GGTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'GTTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TACA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TACC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TACG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TACT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TAGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TATA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TATC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TATG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TATT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TCTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.055 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TGTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTAA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTAC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTAG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTAT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 0 0 0 0 0 0 0 0 0 ',
'TTCA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 0 0 0 ',
'TTCC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 0 ',
'TTCG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 0 0 0 ',
'TTCT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 0 ',
'TTGA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 0 0 0 ',
'TTGC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 0 ',
'TTGG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.159 0 0 0 0 0 ',
'TTGT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 0 0 ',
'TTTA':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 0 0 0 ',
'TTTC':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 0 ',
'TTTG':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.263 0 ',
'TTTT':'0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -5.367 '
}


# open input and output files
input_path = sys.argv[1]
k = 4  # k-mer length of the keys in onehot_dict above

with open(input_path, 'r') as input_file, \
     open(input_path[:-4] + '_dependent4.txt', 'w') as dep_file:

    # loop over nucleotide sequences
    for idx, line in enumerate(input_file):

        # if first iteration, write title line
        # (line still ends in '\n', so the title spans two lines;
        #  the later `sed '1d' | sed '1d'` step removes both)
        if idx == 0:
            dep_file.write(line + ': third-order position-dependent features' + '\n')

        # otherwise encode sequence
        else:

            # split line by tab
            line = line.split('\t')

            # extract sequence (strip the trailing newline, if any)
            seq = line[-1].strip()

            # concatenate one encoded vector per sliding k-mer window
            # (a 20-nt spacer gives 20 - k + 1 windows)
            pos_dep = ''.join([onehot_dict[seq[i:i + k]] for i in range(len(seq) - k + 1)])

            # write features to file
            dep_file.write(line[0] + '\t' + pos_dep + '\n')

        if idx % 10000 == 0:
            print('{0:,}'.format(idx) + ' lines processed...')

print('Done!')

#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_quantum_positional_encode.py
#python file.py data.txt
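The encoder walks a window of k bases along each spacer and concatenates one 4^k-long block per window. Below is a minimal, self-contained sketch of that layout. It assumes the k-mer slots follow lexicographic ACGT order, which is consistent with TTTT holding the last slot in the dict above. `kmer_index`, `encode_position_dependent` and `value_of` are illustrative names, not part of the pipeline.

```python
from itertools import product


def kmer_index(k):
    """Map every k-mer over ACGT to a slot index in lexicographic order (TTTT -> 255 for k=4)."""
    return {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}


def encode_position_dependent(seq, k, value_of):
    """Concatenate one 4**k-long block per sliding k-mer window of seq.

    Each block is zero except the slot of the k-mer in that window, which holds
    value_of(kmer). value_of is a hypothetical stand-in for the per-k-mer value
    stored in onehot_dict.
    """
    index = kmer_index(k)
    features = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        block = [0.0] * len(index)
        block[index[kmer]] = value_of(kmer)
        features.extend(block)
    return features


# a 20-nt spacer yields 4**k * (20 - k + 1) features per k
spacer = "ACGTACGTACGTACGTACGT"
for k in range(1, 5):
    print(k, len(encode_position_dependent(spacer, k, lambda kmer: 1.0)))
```

With a 20-nt spacer the block sizes are 80, 304, 1152 and 4352 for k = 1 to 4, which is consistent with the column ranges used in the iRF runs further down.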
file generation
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
python ../kmer1_quantum_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer2_quantum_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer3_quantum_positional_encode.py Ecoli.allCas9.noscore.txt
python ../kmer4_quantum_positional_encode.py Ecoli.allCas9.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/

# drop the two title lines the encoder writes and turn the tab after the ID into a space
for k in 1 2 3 4; do
    sed '1,2d; s/\t/ /g' Ecoli.allCas9.noscore_dependent${k}.txt > Ecoli.allCas9.quantum.tensor_dep${k}.txt
done
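Taken together, the sed step and the later removal of the trailing empty column in R parse each row as `ID<TAB>space-separated values`. A stdlib-only Python sketch of that parse may help when checking a single file; the function name and example rows are made up.

```python
import io


def read_dependent(handle, skip=2):
    """Parse one *_dependentK.txt file into {sgRNAID: [float, ...]}.

    Skips the two title lines, splits the ID from the values at the tab, and
    lets str.split() discard the trailing space on each row.
    """
    rows = {}
    for n, line in enumerate(handle):
        if n < skip or not line.strip():
            continue
        sg_id, _, values = line.rstrip("\n").partition("\t")
        rows[sg_id] = [float(v) for v in values.split()]
    return rows


example = io.StringIO("sgRNAID\tseq\n: title\ng1_1_Cas9\t0 -5.1 0 \ng2_7_Cas9\t-4.9 0 0 \n")
print(read_dependent(example))
```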
RUNNING score matrix
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J kmer.matrix
#SBATCH -N 4
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH quantum.kmer.score.matrix.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/quantum.kmer.score.matrix.sh
# salloc -A SYB105 -N 2 -p gpu -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")

onehot.dep1 <- read.delim("Ecoli.allCas9.quantum.tensor_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Ecoli.allCas9.quantum.tensor_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Ecoli.allCas9.quantum.tensor_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Ecoli.allCas9.quantum.tensor_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

# drop the last column of each table: it is empty because every encoded row ends with a space
onehot.dep12 <- full_join(onehot.dep1[, -ncol(onehot.dep1)], onehot.dep2[, -ncol(onehot.dep2)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[, -ncol(onehot.dep3)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[, -ncol(onehot.dep4)], by="sgRNAID")
onehot.score <- full_join(score.df, onehot.dep, by="sgRNAID")
onehot.score[is.na(onehot.score)] <- 0

df.melt <- melt(onehot.score, id=c("cut.score", "sgRNAID"))
df <- na.omit(df.melt)

colnames(df) <- c("cut.score", "sgRNAID", "variable", "value")

df$value <- as.numeric(df$value)
df.id <- df[!(is.na(df$value) | df$value==""), ]
colnames(df.id) <- c("cut.score", "sgRNAID", "feature", "value")

df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Ecoli.allCas9.quantum.tensor.kmer.encoding.txt", quote=F, row.names=F, sep="\t")
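The melt/dcast round trip above is a long-to-wide pivot that averages duplicate (sgRNAID, feature) cells and ignores missing values. A rough Python equivalent follows, with hypothetical names and toy rows. Unlike dcast, a feature with no non-missing values is left out here rather than filled with NaN.

```python
from collections import defaultdict
from statistics import mean


def pivot_mean(rows):
    """Long-to-wide reshape with mean aggregation.

    rows: iterable of (sgRNAID, cut_score, feature, value) tuples.
    Returns {(sgRNAID, cut_score): {feature: mean of its values}}.
    Missing values (None or NaN) are skipped, mirroring na.rm=TRUE.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for sg_id, score, feature, value in rows:
        if value is None or value != value:  # NaN is the only float not equal to itself
            continue
        cells[(sg_id, score)][feature].append(value)
    return {key: {f: mean(vals) for f, vals in feats.items()}
            for key, feats in cells.items()}


long_rows = [
    ("g1", 0.8, "V2.x", 1.0),
    ("g1", 0.8, "V2.x", 3.0),           # duplicate cell: averaged
    ("g1", 0.8, "V3.x", float("nan")),  # missing value: skipped
    ("g2", 0.1, "V2.x", -5.263),
]
print(pivot_mean(long_rows))
```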
iRF
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J iRF.quantum.kmer
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH iRF.quantum.kmer.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.quantum.kmer.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R

library(ranger)

iRF <- function(xmat, y, ntree=200, iter=5, classification=FALSE, threads=1, alwayssplits=NULL, saveall=TRUE)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight per feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = TRUE,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale so absolute importances sum to 1
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("features with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf
    # stop early once only a handful of features keep a positive weight
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)) break
  }
  # saveall=FALSE returns only the last forest, whether or not the loop stopped early
  if(!saveall) rfs <- rf
  return(rfs)
}
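Each iRF pass reuses the previous forest's importances as split-selection weights. The bookkeeping between passes, without the forest itself, can be sketched in Python as below. The helper names are illustrative, and the all-zero case is handled by assumption (the R code would divide by zero there).

```python
def update_weights(importance):
    """Turn one forest's importances into split-selection weights for the next.

    Mirrors the R loop: divide by the sum of absolute importances, then set
    negative weights to zero.
    """
    total = sum(abs(x) for x in importance)
    if total == 0:
        # all-zero importances: R's division would give NaN; return zeros instead
        return [0.0] * len(importance)
    return [max(x / total, 0.0) for x in importance]


def next_mtry(weights):
    """Half the number of features that still have a positive weight."""
    return 0.5 * sum(1 for w in weights if w > 0)


def should_stop(weights, n_features):
    """Stop once fewer than max(1% of (n_features - 1), 10) features remain."""
    return sum(1 for w in weights if w > 0) < max(0.01 * (n_features - 1), 10)


# toy importances from a hypothetical first forest
wt = update_weights([2.0, -1.0, 1.0, 0.0])
print(wt, next_mtry(wt), should_stop(wt, len(wt)))
```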


library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.quantum.tensor.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- separate(df, sgRNAID, c("sgRNA", "ID", "cas"), sep="_")
df.cas9 <- subset(df.sep, df.sep$cas == "Cas9")
df.cas9.id <- unite(df.cas9, "sgRNAID", c(sgRNA, ID, cas), sep="_")
set.seed(2458)
df.sample <- df.cas9.id[sample(nrow(df.cas9.id), 10000), ]

# kmer = 1
df.1 <- df.sample[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)

# kmer = 2
df.2 <- df.sample[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)

# kmer = 3
df.3 <- df.sample[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)

# kmer = 4
df.4 <- df.sample[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)

# kmer = 1 + 2
df.1.2 <- df.sample[,c(2:386)]
iRF(df.1.2[,2:ncol(df.1.2)], df.1.2$cut.score)

# kmer = 1 + 2 + 3
df.1.2.3 <- df.sample[,c(2:1538)]
iRF(df.1.2.3[,2:ncol(df.1.2.3)], df.1.2.3$cut.score)

# kmer = 1 + 2 + 3 + 4
df.1.2.3.4 <- df.sample[,c(2:5890)]
iRF(df.1.2.3.4[,2:ncol(df.1.2.3.4)], df.1.2.3.4$cut.score)
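The hard-coded column ranges follow from the block widths 4^k * (20 - k + 1), assuming a 20-nt spacer, `sgRNAID` in column 1, `cut.score` in column 2, and the k = 1 to 4 blocks in that order. A small Python check with a hypothetical helper reproduces them:

```python
def kmer_column_blocks(seq_len=20, max_k=4, first_col=3):
    """1-based inclusive column range of each k-mer feature block.

    Assumes sgRNAID (col 1), cut.score (col 2), then the k = 1..max_k blocks
    in order, each holding 4**k * (seq_len - k + 1) columns.
    """
    blocks, start = {}, first_col
    for k in range(1, max_k + 1):
        width = 4 ** k * (seq_len - k + 1)
        blocks[k] = (start, start + width - 1)
        start += width
    return blocks


print(kmer_column_blocks())
```

Recomputing the ranges this way is a cheap guard if the spacer length or the set of k values ever changes.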

dimers

# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
#tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
# build overlapping dimers: p<i> becomes base i + base i+1 (p1..p19); p20 is dropped
seq.dimer <- seq
for (i in 1:19) {
  seq.dimer[[paste0("p", i)]] <- paste0(seq[[paste0("p", i)]], seq[[paste0("p", i + 1)]])
}
seq.dimer$p20 <- NULL


tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])

rownames(seq) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.allCas9.tensorsDNARNAdimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensorsDNARNAdimers.melt.txt", quote=F, row.names=F, sep="\t")
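The dimer step pairs each base with its right neighbour, looks the dimer up in the descriptor table, and spreads the descriptors into one column per position and descriptor. Below is a hedged Python sketch; `toy_tensor` is a made-up stand-in for the transposed `quantum_dimers_20dec.txt` table, and the `p<i>_<descriptor>` names only approximate the columns dcast builds from `position + variable`.

```python
def dimer_features(seq, tensor):
    """Wide dimer descriptors: {'p<i>_<descriptor>': value} per overlapping dinucleotide.

    seq: spacer sequence; a 20-nt spacer gives dimer positions p1..p19.
    tensor: hypothetical {dimer: {descriptor: value}} lookup.
    Dimers missing from the lookup contribute nothing.
    """
    features = {}
    for i in range(len(seq) - 1):
        dimer = seq[i:i + 2]
        for name, value in tensor.get(dimer, {}).items():
            features[f"p{i + 1}_{name}"] = value
    return features


toy_tensor = {"AC": {"E": 1.0}, "CG": {"E": 2.0}, "GT": {"E": 3.0}}  # made-up values
print(dimer_features("ACGT", toy_tensor))
```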
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNA.pam.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.allCas9.tensorsDNARNAdimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 40468

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
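The dcast(..., fun.aggregate=mean, na.rm=TRUE) followed by na.omit() pattern used in this merge (and in the other merge steps) pivots the long sgRNAID / cut.score / feature.scale / value table into one row per sgRNA, averages duplicate cells, and then drops every sgRNA that lacks at least one feature. A stdlib Python sketch of those semantics, for illustration only; the tuple layout of the input rows is an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Iterable, List, Optional, Tuple

# One long-format record: (sgRNAID, cut.score, feature.scale, value)
LongRow = Tuple[str, float, str, Optional[float]]


def long_to_wide(rows: Iterable[LongRow]) -> Tuple[List[str], List[list]]:
    """Mimic dcast(sgRNAID + cut.score ~ feature.scale, fun.aggregate=mean, na.rm=TRUE) + na.omit()."""
    cells = defaultdict(lambda: defaultdict(list))
    features = set()
    for sgrna_id, score, feature, value in rows:
        features.add(feature)
        if value is not None:  # na.rm=TRUE: missing values are ignored inside a cell
            cells[(sgrna_id, score)][feature].append(value)

    feature_order = sorted(features)  # plain sort; R orders columns by factor level
    wide = []
    for key in sorted(cells):
        by_feature = cells[key]
        values = [by_feature.get(feature) for feature in feature_order]
        if any(v is None for v in values):
            continue  # na.omit: drop any sgRNA with an empty (NaN) cell
        wide.append(list(key) + [mean(v) for v in values])
    return ["sgRNAID", "cut.score"] + feature_order, wide


header, wide = long_to_wide([
    ("sg1", 10.0, "f1", 1.0),
    ("sg1", 10.0, "f1", 3.0),  # duplicate cell, averaged
    ("sg1", 10.0, "f2", 5.0),
    ("sg2", 20.0, "f1", 4.0),  # sg2 has no f2, so the row is dropped
])
print(header)
print(wide)
```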
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

set.seed(2458)
df.sample <- df[sample(nrow(df), 10000), ]

library(ranger)

iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight per feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0) # half the features that still carry nonzero weight
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # normalize by total absolute importance
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n") # "SNPs" here means features
    if(saveall) rfs[[i]] <- rf
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)) break # stop once too few informative features remain
  }
  if(!saveall) rfs <- rf # without saveall, return only the last forest
  return(rfs)
}
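Each pass of the iRF() function above turns the previous forest's corrected-impurity importances into split-selection weights for the next forest, sets mtry to half the number of features that still carry weight, and stops early when too few features survive. The bookkeeping is sketched below in Python, mirroring the R arithmetic; no forest is fit here and the importance values are hypothetical inputs.

```python
from typing import Dict


def next_weights(importance: Dict[str, float]) -> Dict[str, float]:
    """wt <- imp / sum(abs(imp)); wt[wt < 0] <- 0"""
    total = sum(abs(v) for v in importance.values())
    if total == 0:
        # guard for an all-zero importance vector (the R version would produce NaN)
        return {name: 0.0 for name in importance}
    return {name: max(v / total, 0.0) for name, v in importance.items()}


def next_mtry(weights: Dict[str, float]) -> float:
    """mtry = 0.5 * sum(wt > 0)"""
    return 0.5 * sum(1 for w in weights.values() if w > 0)


def should_stop(weights: Dict[str, float], n_features: int) -> bool:
    """sum(wt > 0) < max(0.01 * (ncol(xmat) - 1), 10)"""
    n_kept = sum(1 for w in weights.values() if w > 0)
    return n_kept < max(0.01 * (n_features - 1), 10)


# Hypothetical importances from one forest
wt = next_weights({"GGsgRNA.raw": 2.0, "CCsgRNA.raw": 1.0, "AsgRNA.raw": -1.0})
print(wt, next_mtry(wt), should_stop(wt, n_features=3))
```

Zeroing negative weights means a feature whose corrected importance falls below zero can never be drawn as a split candidate again in later iterations.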

# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1805:1811,1813:1814)]
# one-hot dependent: [,c(18:57,128:147,218:237,308:327,398:418,488:507,578:597,668:687,758:777,848:867,1008:1031,1172:1191,1262:1281,1352:1371,1442:1461,1532:1551,1622:1642,1712:1731)]
# chemical tensors: [,c(58:127,148:217,238:307,328:397,419:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1643:1711,1732:1801)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1802:1804,1812)]
# chemical tensor dimers: [,c(1816:1910)]

df.tensor.dimer <- df.sample[,c(1816:1910)]
iRF(df.tensor.dimer, df.sample$cut.score.x)
# iRF iteration  2 
# =================
# mtry:   26.5 
# prediction error:   94.9823 
# r^2:    0.1323184 
# cor(y,yhat):    0.3668597 
# SNPs with importance > 0: 27 

df.tensor <- df.sample[,c(58:127,148:217,238:307,328:397,419:487,508:577,598:667,688:757,778:847,868:1007,1032:1171,1192:1261,1282:1351,1372:1441,1462:1531,1552:1621,1643:1711,1732:1801)]
iRF(df.tensor, df.sample$cut.score.x)
# iRF iteration  5 
# =================
# mtry:   128 
# prediction error:   87.8625 
# r^2:    0.1973591 
# cor(y,yhat):    0.4461191 
# SNPs with importance > 0: 190 

iRF(df.sample[,c(3:1814,1816:1910)], df.sample$cut.score.x)
# iRF iteration  5 
# =================
# mtry:   165.5 
# prediction error:   84.29982 
# r^2:    0.2299048 
# cor(y,yhat):    0.4806311 
# SNPs with importance > 0: 257 

df.tensor.bond.dimer <- df.sample[,c(58:61,63,65,69,73,148:151,153,155,159,163,238:241,243,245,249,253,328:331,333,335,339,343,418:421,423,425,429,453,508:511,513,515,519,523,598:601,603,605,609,613,688:691,693,695,699,703,778:781,783,785,789,793,868:871,873,875,879,883,1032:1035,1037,1039,1043,1047,1192:1195,1197,1199,1203,1207,1282:1285,1287,1289,1293,1297,1372:1375,1377,1379,1383,1387,1462:1465,1467,1469,1473,1477,1552:1555,1557,1559,1563,1567,1642:1645,1647,1649,1653,1657,1732:1735,1737,1739,1743,1747,1816:1910)]
iRF(df.tensor.bond.dimer, df.sample$cut.score.x)
# iRF iteration  2 
# =================
# mtry:   83.5 
# prediction error:   87.03005 
# r^2:    0.2049637 
# cor(y,yhat):    0.4545251 
# SNPs with importance > 0: 107 


## feature importance
library(gbm)
xmat = df.sample[,c(3:1814,1816:1910)]
xmat.score = df.sample$cut.score.x
gbm.df <- gbm(formula=xmat.score ~ ., data=xmat, distribution = "gaussian", n.trees = 500, shrinkage = 0.1,             
            interaction.depth = 3, bag.fraction = 0.2, train.fraction = 0.8,  
            n.minobsinnode = 10, cv.folds = 5, keep.data = TRUE, 
            verbose = FALSE, n.cores = 1)
best.iter <- gbm.perf(gbm.df, method = "OOB")
print(best.iter)
best.iter <- gbm.perf(gbm.df, method = "cv")
print(best.iter)
head(summary(gbm.df, n.trees = best.iter))
# Number of Observations: 500 
# Equivalent Number of Parameters: 39.85 
# Residual Standard Error: 0.04349 
# [1] 162
#                                 var  rel.inf
# sgRNA.gcsgRNA.raw sgRNA.gcsgRNA.raw 3.433471
# p20bp_bondraw         p20bp_bondraw 2.452850
# p19H_bondraw           p19H_bondraw 2.275021
# GGsgRNA.raw             GGsgRNA.raw 2.218373
# p20homo_energyraw p20homo_energyraw 2.157199
# p20bp_HOMO_evraw   p20bp_HOMO_evraw 1.582125
RUNNING done summit iRF
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:1654,1656)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
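The write.table calls above slice the corrected matrix into the inputs for iRF_LOOP_SetUp_CrossLayer.py: IDs plus features, features without IDs, IDs plus cut.score, and cut.score alone. A Python sketch of the same column slicing, assuming (as in df.all) that column 1 is sgRNAID and column 2 is cut.score; split_matrix and to_tsv are hypothetical helper names.

```python
import csv
import io
from typing import Dict, List, Sequence


def split_matrix(header: Sequence[str], rows: Sequence[Sequence]) -> Dict[str, List[list]]:
    table = [list(header)] + [list(r) for r in rows]
    return {
        "features": [r[:1] + r[2:] for r in table],      # df.all[,c(1,3:ncol(df.all))]
        "features_noSampleIDs": [r[2:] for r in table],  # df.all[,3:ncol(df.all)]
        "score": [r[:2] for r in table],                 # df.all[,1:2]
        "score_noSampleIDs": [r[1:2] for r in table],    # df.all[,2]
    }


def to_tsv(table: List[list]) -> str:
    """Tab-separated, unquoted, like write.table(quote=F, row.names=F, sep="\t")."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(table)
    return buf.getvalue()


parts = split_matrix(["sgRNAID", "cut.score", "f1", "f2"], [["sg1_Cas9", 12.5, 0, 1.3]])
print(to_tsv(parts["features"]), end="")
```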


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.tensor.dimers.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT/Submits/submit_full_e.coli.tensor.dimers.noDWT_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT/Submits/submit_train_e.coli.tensor.dimers.noDWT_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT/Submits/submit_test_e.coli.tensor.dimers.noDWT_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# file contents (one line): cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.tensor.dimers.noDWT
# 0.2527974463591811
sort -k3rg topVarEdges/cut.score_top95.txt | head
# GGsgRNA.raw   cut.score   0.041778468190407723
# CCsgRNA.raw   cut.score   0.039018995323038465
# p15.CCsgRNA.raw   cut.score   0.03434113872955213
# GsgRNA.raw    cut.score   0.031975618837973466
# p19.GGsgRNA.raw   cut.score   0.03134661557038756
# CsgRNA.raw    cut.score   0.030342742435033754
# AsgRNA.raw    cut.score   0.02678672308048948
# p20bp_LUMO_evraw  cut.score   0.025944109138838368
# p20homo_lumo_energygapraw cut.score   0.025743528577272926
# GCsgRNA.raw   cut.score   0.019657819154867556
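sort -k3rg ... | head ranks the topVarEdges file by its third whitespace-separated field (the edge weight), largest first, using general-numeric comparison. The same ranking in Python, assuming each line holds feature, target and weight as in the output shown above:

```python
from typing import Iterable, List, Tuple


def top_edges(lines: Iterable[str], n: int = 10) -> List[Tuple[str, str, float]]:
    edges = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        try:
            weight = float(fields[2])
        except ValueError:
            continue  # e.g. a header line
        edges.append((fields[0], fields[1], weight))
    edges.sort(key=lambda edge: edge[2], reverse=True)
    return edges[:n]


example = [
    "CsgRNA.raw\tcut.score\t0.030342742435033754",
    "GGsgRNA.raw\tcut.score\t0.041778468190407723",
    "CCsgRNA.raw\tcut.score\t0.039018995323038465",
]
for feature, target, weight in top_edges(example, n=2):
    print(feature, target, weight)
```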

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.tensor.dimers.noDWT_cut.score.importance4 | head
# CCsgRNA.raw: 71761.7
# GGsgRNA.raw: 71694.9
# p20bp_HOMO_evraw: 64499.4
# p19.GGsgRNA.raw: 57819.5
# p15.CCsgRNA.raw: 57003.9
# GsgRNA.raw: 55730
# CsgRNA.raw: 48776.1
# AsgRNA.raw: 45918.2
# p20xz_quadrupoleraw: 37561.6
# GCsgRNA.raw: 33928.6

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.dimers.noDWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.tensor.dimers.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4902065
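cor() in R computes the Pearson coefficient by default. For reference, a stdlib Python version of the same statistic, which could be used to check a held-out fold (for example the Predictions column against set4_Y_test_noSampleIDs.txt) without starting R:

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```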
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli

# python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import shap
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

np.random.seed(0)  # train_test_split below draws from NumPy's global RNG

# Load the data: sgRNAID, cut.score, then one column per feature
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensorDNARNAdimers.pam.location.dcast.na.corrected.txt')
# The target variable is 'cut.score'; every other column except sgRNAID is a feature
# (the feature names can also be listed in R with dput(colnames(df))).
Y = df['cut.score']
X = df.drop(columns=['sgRNAID', 'cut.score'])

# Split the data into training (80%) and test (20%) sets:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6, random_state=0, n_estimators=10)
model.fit(X_train, Y_train)

shap_values = shap.TreeExplainer(model).shap_values(X_train)

# show=False leaves each plot on the open figure so savefig() writes it out
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.dimer.14feb.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)

f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.dimer.14feb.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)

# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.noDWT.dimer.14feb.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."

18 January

  • matrix including raw values, positional encoding kmers, quantum tensors (singleton, basepair, dimer)
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Ecoli.allCas9.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Ecoli.allCas9.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Ecoli.allCas9.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Ecoli.allCas9.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/

sed '1,2d' Ecoli.allCas9.noscore_dependent1.txt | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep1.txt
sed '1,2d' Ecoli.allCas9.noscore_dependent2.txt | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep2.txt
sed '1,2d' Ecoli.allCas9.noscore_dependent3.txt | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep3.txt
sed '1,2d' Ecoli.allCas9.noscore_dependent4.txt | awk '{gsub(/./,"& ",$2);print $0}' > Ecoli.allCas9_dep4.txt
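Each awk call above rewrites field 2 (the positional encoding string) so that every character is followed by a space, which lets read.delim(..., sep=" ") load one column per position. When field 2 is the last field, the rebuilt line ends with a trailing space, so R sees one extra empty column at the end; that would likely explain why the later full_join calls drop the last column of each dependent file. A Python equivalent of the awk expression, for illustration:

```python
def split_field2(line: str) -> str:
    """Python equivalent of: awk '{gsub(/./,"& ",$2); print $0}'"""
    fields = line.split()  # awk's default FS: runs of spaces/tabs, ends trimmed
    if len(fields) >= 2:
        fields[1] = "".join(char + " " for char in fields[1])  # gsub(/./, "& ", $2)
    return " ".join(fields)  # assigning to $2 rebuilds $0 with OFS = " "


print(repr(split_field2("sg1_Cas9 0110")))
```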
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R
library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df[63:70,]))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.allCas9.tensors.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensors.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J jan18.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00

module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH jan18.matrix.R
R CMD BATCH jan18.matrix.2.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/jan18.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
structure <- read.delim("Ecoli.allCas9.structure.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Ecoli.allCas9.nuc.count.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
onehot.ind1 <- read.delim("Ecoli.allCas9_ind1.txt", header=T, sep=" ")
# 5 columns (-1 for sgRNAID)
onehot.ind2 <- read.delim("Ecoli.allCas9_ind2.txt", header=T, sep=" ")
# 17
onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=F, sep=" ")
# 81
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=F, sep=" ")
# 321
onehot.dep3 <- read.delim("Ecoli.allCas9_dep3.txt", header=F, sep=" ")
# 1154 <-- have 1218 for the labels??
onehot.dep4 <- read.delim("Ecoli.allCas9_dep4.txt", header=F, sep=" ")
# 4354 <-- have 5121 for the labels??
# 5926 total features...
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last column of each dependent file; 1:(ncol(x)-1) states this explicitly
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

### getting the labels for the onehot matrix
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
# setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/onehot")
# onehot.ind1 <- read.delim("ind1.head.txt", header=T, sep=" ")
# onehot.ind2 <- read.delim("ind2.head.txt", header=T, sep=" ")
# onehot.dep1 <- read.delim("dep1.txt", header=F, sep=" ")
# onehot.dep2 <- read.delim("dep2.txt", header=F, sep=" ")
# onehot.dep3 <- read.delim("dep3.txt", header=F, sep=" ")
# onehot.dep3 <- onehot.dep3[,1:1154]
# onehot.dep4 <- read.delim("dep4.txt", header=F, sep=" ")
# onehot.dep4 <- onehot.dep4[,1:4354]
# colnames(onehot.dep1)[1] <- "sgRNAID"
# colnames(onehot.dep2)[1] <- "sgRNAID"
# colnames(onehot.dep3)[1] <- "sgRNAID"
# colnames(onehot.dep4)[1] <- "sgRNAID"
# 
# onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# onehot.dep12 <- full_join(onehot.dep1[,1:ncol(onehot.dep1)], onehot.dep2[,1:ncol(onehot.dep2)], by="sgRNAID")
# onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:ncol(onehot.dep3)], by="sgRNAID")
# onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:ncol(onehot.dep4)], by="sgRNAID")
# onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
# write.table(onehot, "onehot.labels.txt", quote=F, row.names=F, sep="\t")
# onehot.t <- data.frame(t(onehot))
# 6754 columns <-- corrected to match matrix used = 5926 total features

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "e.coli.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# 5910

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.allCas9.tensors.single.bp.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")

tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.id <- read.delim("e.coli.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")

df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

head(df.id)
head(tensor.score.order)

tensor.df <- rbind(df.id, tensor.score.order)

df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 126182


# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.pam <- read.table("ecoli.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
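# one-hot encode the variable first base (N) of the NGG PAM; CCN codes are the same PAM read on the opposite strand (e.g. CCT is the reverse complement of AGG)
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(
  PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0),
  PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0),
  PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0),
  PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))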
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.df$id <- "Cas9"
sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")

score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468

df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 40468
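The PAM encoding above flags the variable first base of an NGG PAM; codes reported as CCN are the same site read from the opposite strand, so they map through their reverse complement (CCT to AGG, CCG to CGG, CCA to TGG, CCC to GGG). A Python sketch of that rule, which reproduces the eight ifelse cases; any other code gets all zeros, as with ifelse.

```python
from typing import Dict

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pam_onehot(pam_code: str) -> Dict[str, int]:
    code = pam_code
    if code.startswith("CC"):
        code = reverse_complement(code)  # CCN on one strand is NGG on the other
    n_base = code[0] if len(code) == 3 and code.endswith("GG") else None
    # column order follows the R code: PAM.A, PAM.C, PAM.T, PAM.G
    return {f"PAM.{base}": int(base == n_base) for base in "ACTG"}


print(pam_onehot("CCT"))
```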


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.genes <- read.table("sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.df$id <- "Cas9"
sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 40468

df <- df.location
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 40468

write.table(df.location, "Ecoli.allCas9.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>%
  unite("p1", p1:p2, remove=F, sep="") %>%
  unite("p2", p2:p3, remove=F, sep="") %>%
  unite("p3", p3:p4, remove=F, sep="") %>%
  unite("p4", p4:p5, remove=F, sep="") %>%
  unite("p5", p5:p6, remove=F, sep="") %>%
  unite("p6", p6:p7, remove=F, sep="") %>%
  unite("p7", p7:p8, remove=F, sep="") %>%
  unite("p8", p8:p9, remove=F, sep="") %>%
  unite("p9", p9:p10, remove=F, sep="") %>%
  unite("p10", p10:p11, remove=F, sep="") %>%
  unite("p11", p11:p12, remove=F, sep="") %>%
  unite("p12", p12:p13, remove=F, sep="") %>%
  unite("p13", p13:p14, remove=F, sep="") %>%
  unite("p14", p14:p15, remove=F, sep="") %>%
  unite("p15", p15:p16, remove=F, sep="") %>%
  unite("p16", p16:p17, remove=F, sep="") %>%
  unite("p17", p17:p18, remove=F, sep="") %>%
  unite("p18", p18:p19, remove=F, sep="") %>%
  unite("p19", p19:p20, remove=T, sep="")


tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])

rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.allCas9.tensors.dimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.allCas9.tensors.dimers.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.allCas9.tensors.dimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 40468

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6072,6074:6078,6080,6082:6176)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.tensor.single.bp.dimers --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/Submits/submit_full_e.coli.tensor.single.bp.dimers_0.sh
# train
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/Submits/submit_train_e.coli.tensor.single.bp.dimers_0.sh
# test (submit only after the train jobs have finished)
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/Submits/submit_test_e.coli.tensor.single.bp.dimers_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# contents of YNames.txt: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.tensor.single.bp.dimers
# 0.25925023667824065
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_HOMO_evraw  cut.score   0.04132934588303254
# p20bp_HL.gab_evraw    cut.score   0.027377143866354425
# V231.xsgRNA.raw   cut.score   0.02703345485327785
# V303.xsgRNA.raw   cut.score   0.0235055107545665
# sgRNA.tempsgRNA.raw   cut.score   0.021431169257080367
# sgRNA.gcsgRNA.raw cut.score   0.02122949600576284
# p20bp_LUMO_evraw  cut.score   0.021225482607726668
# pam.distance0 cut.score   0.020960305693732667
# p18HOMO_eVraw cut.score   0.020957253838975995
# p20bp_bondraw cut.score   0.020619877566560477

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.tensor.single.bp.dimers_cut.score.importance4 | head
# p20bp_HOMO_evraw: 124305
# V303.xsgRNA.raw: 52691.6        <-- dependent 2 (p19.GC)
# V231.xsgRNA.raw: 51941.8        <-- dependent 2 (p15.CC)
# CCsgRNA.raw: 43885
# sgRNA.gcsgRNA.raw: 39482.6
# GGsgRNA.raw: 39031.4
# sgRNA.tempsgRNA.raw: 38915.7
# pam.distance0: 38026.6
# p18LUMO_eVraw: 37618.4
# p18HOMO_eVraw: 36632.3
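The importance4 files list one `feature: score` pair per line, and `sort -k2rg` ranks them by score. A minimal Python sketch of the same ranking, assuming exactly that format (the function names are hypothetical, not part of the iRF scripts):

```python
def read_importance(lines):
    """Return (feature, score) pairs from an iRF *.importance4 file, highest first.

    Assumes one 'feature: score' pair per line, as in the output above; this is
    a Python counterpart of `sort -k2rg file.importance4`.
    """
    scores = []
    for line in lines:
        if ":" not in line:
            continue  # skip blank or unexpected lines
        name, value = line.rsplit(":", 1)
        # float() accepts plain and scientific notation, like sort -g
        scores.append((name.strip(), float(value)))
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores


def zero_importance(scores):
    """Names of features whose importance score is exactly 0."""
    return [name for name, score in scores if score == 0]
```

Usage: `with open(path) as fh: top10 = read_importance(fh)[:10]`. Ties may come out in a different order than `sort` prints them.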

# V231.x = p15.CC
# V303.x = p19.GC
# V1110.x = p19.CCA
# V257.x = p16.GG
# V74.x = p5.TA
# V305.x = p19.GG
# V215.x = p14.CC


# Pearson correlation between observed and predicted cut scores (fold 9, Set 4 test set)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5010897
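R's `cor()` defaults to the Pearson coefficient. If the observed and predicted values are loaded in Python instead (for example with pandas, alongside the SHAP work), the same number can be computed with a small helper; a minimal sketch, with the function name hypothetical:

```python
import numpy as np


def pearson_r(observed, predicted):
    """Pearson correlation coefficient, the default method of R's cor()."""
    x = np.asarray(observed, dtype=float)
    y = np.asarray(predicted, dtype=float)
    if x.shape != y.shape:
        raise ValueError("observed and predicted must have the same length")
    # centre both vectors, then divide the covariance by the product of the spreads
    x = x - x.mean()
    y = y - y.mean()
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))
```

Pass it the observed `cut.score` column and the prediction column after reading both files (e.g. with `pd.read_table`); `np.corrcoef(x, y)[0, 1]` returns the same value.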
RIT

** Need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw **

  • run RIT on the Cas9 model with all features
  • run arva-rit and then runRIT.sh (3 scripts)
  • two outputs: effect size and direction
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score e.coli.tensor.single.bp.dimers

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/RIT.run
#### looking at the top features (weight and direction)
# on local computer
#scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/e.coli.tensor.single.bp.dimers.importance4.effect_sorted .

library(dplyr)
library(tidyr)
library(reshape2)
library(ggplot2)
library(RColorBrewer)

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
imp <- read.delim("e.coli.tensor.single.bp.dimers.importance4.effect_sorted", header=F, sep="\t")
nrow(imp)
# 2020
imp$weight <- as.numeric(as.character(imp$V3))  # substr(x, 0, 4) misreads scientific notation, e.g. "5e-05" becomes 5
imp.dir <- imp %>% mutate(direction = ifelse(V4 < 0, "neg", ifelse(V4 > 0, "pos", "zero")))

imp.dir.top20 <- imp.dir[1:20,]
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(V1, -weight), y=weight, fill=direction), stat="identity") + theme_classic() + xlab("Top Feature") + ylab("Feature Importance Weight") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")



setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
imp <- read.delim("e.coli.tensor.single.bp.dimers.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
#imp$Normalized.Importance <- as.numeric(substr(imp$NormEdge, 0, 4))
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")

ggplot(imp.dir.top20) + geom_bar(aes(x=Feature, y=Feature.Effect, fill=Effect.Direction), stat="identity") + coord_flip() + theme_classic() + xlab("Top Features") + ylab("Feature Effect") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")

ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + coord_flip() + xlab("") + ylab("Normalized Importance") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position="bottom") + scale_fill_brewer(palette="Set1")

ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + geom_point(aes(y=abs(Feature.Effect))) + coord_flip() + xlab("") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1") + scale_y_continuous("Normalized Importance (bars)", sec.axis = sec_axis(~. * 100, name="% Feature Effect (points)"))

imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + geom_point(aes(y=abs(Sample.Prop))) + coord_flip() + xlab("") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1") + scale_y_continuous("Normalized Importance (bars)", sec.axis = sec_axis(~. , name="Avg Proportion of Samples that Features Influence"))

ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")

ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
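The R blocks above read the `importance4.effect_sorted` table, label each feature's effect as negative, positive or zero, keep the first 20 rows and divide the sample count by 32374. A minimal standard-library Python sketch of the same bookkeeping, assuming the six tab-separated, header-less columns named in the `colnames()` call above (function names hypothetical):

```python
import csv

# Column order of the header-less effect_sorted table, as named in the R code
COLUMNS = ["Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity"]


def effect_direction(effect):
    """'neg', 'pos' or 'zero', matching the nested ifelse() in R."""
    if effect < 0:
        return "neg"
    if effect > 0:
        return "pos"
    return "zero"


def top_effects(rows, n=20, total_samples=None):
    """Annotate the first n rows (the file is already sorted) with numeric
    importance, effect, effect direction and, optionally, the sample proportion."""
    out = []
    for i, fields in enumerate(rows):
        if i >= n:
            break
        rec = dict(zip(COLUMNS, fields))
        rec["Normalized.Importance"] = float(rec["NormEdge"])
        rec["Feature.Effect"] = float(rec["Effect"])
        rec["Effect.Direction"] = effect_direction(rec["Feature.Effect"])
        if total_samples:
            rec["Sample.Prop"] = float(rec["Samples"]) / total_samples
        out.append(rec)
    return out


def read_effect_table(path, n=20, total_samples=None):
    """Read a tab-separated effect_sorted file and return its annotated top rows."""
    with open(path, newline="") as fh:
        return top_effects(csv.reader(fh, delimiter="\t"), n, total_samples)
```

For example, `read_effect_table("e.coli.tensor.single.bp.dimers.importance4.effect_sorted", total_samples=32374)` mirrors `imp.dir.top20` with its `Sample.Prop` column.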


#### looking at the interaction of features

library(dplyr)
library(tidyr)
library(reshape2)
library(ggplot2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
rit <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.rit", header=F, sep="\t")
key <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.paths.key.out", header=F, sep=",")

colnames(key) <- c("feature", "feature.key")
colnames(rit) <- c("rit.value", "rit.features")
rit.id <- separate(rit, "rit.features", c("feature1.key", "feature2.key"))
rit.id$feature1.key <- as.numeric(rit.id$feature1.key)
rit.id$feature2.key <- as.numeric(rit.id$feature2.key)

key.1 <- key
colnames(key.1) <- c("feature1", "feature1.key")
key.2 <- key
colnames(key.2) <- c("feature2", "feature2.key")
rit.feature1.key <- left_join(rit.id, key.1, by=c("feature1.key"))
rit.key <- inner_join(rit.feature1.key, key.2, by=c("feature2.key"))
write.table(rit.key, "e.coli.tensor.single.bp.dimers_cut.score.rit_IDdefined.txt", quote=F, row.names=F, sep="\t")
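The `separate()` call and the two joins above swap the numeric keys in the `.rit` output for feature names. A standard-library sketch of the same lookup, assuming each RIT row is a value plus two integer keys separated by non-alphanumeric characters (`separate()`'s default separator) and that the key file holds `feature,key` pairs (function name hypothetical):

```python
import re

# tidyr::separate() splits on runs of non-alphanumeric characters by default
SEPARATOR = re.compile(r"[^A-Za-z0-9]+")


def decode_rit(rit_rows, key_rows):
    """Attach feature names to RIT interactions.

    rit_rows: (rit_value, features) pairs, features holding two numeric keys.
    key_rows: (feature_name, key) pairs from the *.paths.key.out file.
    Returns (rit_value, key1, key2, feature1, feature2) tuples. As with the
    left_join/inner_join in R, an unknown first key gives feature1 = None and
    rows whose second key is unknown are dropped.
    """
    names = {int(key): name for name, key in key_rows}
    decoded = []
    for rit_value, features in rit_rows:
        parts = [p for p in SEPARATOR.split(features) if p]
        key1, key2 = int(parts[0]), int(parts[1])
        if key2 not in names:
            continue  # inner_join on feature2.key
        decoded.append((rit_value, key1, key2, names.get(key1), names[key2]))
    return decoded
```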

# check to see if any of the features in this file have a 0 importance score
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
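rit <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.rit_IDdefined.txt", header=T, sep="\t")
imp <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4", header=F, sep=":")

imp.feature1 <- subset(imp, imp$V1 %in% rit$feature1)
imp.feature2 <- subset(imp, imp$V1 %in% rit$feature2)
# RIT features whose importance score is 0
subset(imp.feature1, as.numeric(as.character(V2)) == 0)
subset(imp.feature2, as.numeric(as.character(V2)) == 0)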



## look at full RIT set
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
rit <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.full_rit_sort", header=F, sep="\t")
key <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.paths.key.out", header=F, sep=",")

colnames(key) <- c("feature", "feature.key")
colnames(rit) <- c("rit.value", "rit.features")
rit.id <- separate(rit, "rit.features", c("feature1.key", "feature2.key"))
rit.id$feature1.key <- as.numeric(rit.id$feature1.key)
rit.id$feature2.key <- as.numeric(rit.id$feature2.key)

key.1 <- key
colnames(key.1) <- c("feature1", "feature1.key")
key.2 <- key
colnames(key.2) <- c("feature2", "feature2.key")
rit.feature1.key <- left_join(rit.id, key.1, by=c("feature1.key"))
rit.key <- inner_join(rit.feature1.key, key.2, by=c("feature2.key"))
write.table(rit.key, "e.coli.tensor.single.bp.dimers_cut.score.full_rit_sort_IDdefined.txt", quote=F, row.names=F, sep="\t")




# on local computer
#scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/e.coli.tensor.single.bp.dimers_cut.score.rit_IDdefined.txt .

# https://methods.sagepub.com/dataset/howtoguide/network-diagram-in-unhcr-2016
# install.packages(c("igraph","graphlayouts","ggraph","ggplot2"))

library(igraph)
library(ggraph)
library(graphlayouts)

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
df <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.rit_IDdefined.txt", header=T, sep="\t")
df$rit.val.dec <- as.numeric(substr(df$rit.value, 0, 4))
df.network <- df[,c(4,5,6)]

nodes1 <- df.network %>% select("feature1") %>% distinct() %>% rename("feature" = "feature1")
nodes2 <- df.network %>% select("feature2") %>% distinct() %>% rename("feature" = "feature2")
nodes <- union(nodes1,nodes2) 
nodes$ID <- seq.int(nrow(nodes))

net <- graph_from_data_frame(d=df.network, directed=TRUE)
l <- layout_with_lgl(net, maxiter=93)
edgesp18HOMO_eVraw <- incident(net, V(net)[name=="p18HOMO_eVraw"], mode="out")
edgesp18LUMO_eVraw <- incident(net, V(net)[name=="p18LUMO_eVraw"], mode="out")
ecol <- rep("gray", ecount(net))
ecol[edgesp18HOMO_eVraw] <- "orange"
ecol[edgesp18LUMO_eVraw] <- "gold"
vcol <- rep("gray", vcount(net))
vcol[V(net)$name=="p18HOMO_eVraw"] <- "orange"
vcol[V(net)$name=="p18LUMO_eVraw"] <- "gold"

#plot(net, main="E.coli RIT", layout=l, edge.curved=.25, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label=E(net)$rit.val.dec, edge.label.color="black", edge.label.cex=.7, vertex.label.color="black", vertex.label.cex=log(strength(net))/12)

plot(net, main="E.coli RIT", layout=l, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label=E(net)$rit.val.dec, edge.label.color="black", edge.label.cex=.5, vertex.label.color="black", vertex.label.cex=.5, edge.color=ecol, vertex.color=vcol)
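The `incident(..., mode="out")` calls pick the edges leaving each highlighted vertex, and `ecol`/`vcol` are then filled in one vertex at a time. Because every directed edge has exactly one source vertex, the same coloring can be written as a single lookup keyed on the source. A minimal pure-Python sketch (function names hypothetical; the edge list is assumed to be in the graph's edge order):

```python
def edge_colors(edges, highlight, default="gray"):
    """Color each directed edge by its source vertex.

    edges: (source, target) pairs in the graph's edge order.
    highlight: maps a vertex name to the color for its outgoing edges;
    edges leaving any other vertex get `default`.
    """
    return [highlight.get(source, default) for source, _target in edges]


def vertex_colors(vertices, highlight, default="gray"):
    """Color the highlighted vertices and leave the rest as `default`."""
    return [highlight.get(name, default) for name in vertices]
```

With `highlight = {"p18HOMO_eVraw": "orange", "p18LUMO_eVraw": "gold"}` this reproduces the two-color scheme above, and adding a vertex to the highlight set is one dictionary entry rather than three new lines.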




library(igraph)
library(ggraph)
library(graphlayouts)
library(dplyr)
library(tidyr)

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
df <- read.csv("e.coli.tensor.single.bp.dimers_cut.score.full_rit_sort_IDdefined.txt", header=T, sep="\t")
df$rit.val.dec <- as.numeric(substr(df$rit.value, 0, 4))
df.network <- df[,c(4,5,6)]

nodes1 <- df.network %>% select("feature1") %>% distinct() %>% rename("feature" = "feature1")
nodes2 <- df.network %>% select("feature2") %>% distinct() %>% rename("feature" = "feature2")
nodes <- union(nodes1,nodes2) 
nodes$ID <- seq.int(nrow(nodes))

net <- graph_from_data_frame(d=df.network, directed=TRUE)
l <- layout_with_lgl(net, maxiter=93)
edgesp20bp_HOMO_evraw <- incident(net, V(net)[name=="p20bp_HOMO_evraw"], mode="out")
edgesp20bp_HL.gab_evraw <- incident(net, V(net)[name=="p20bp_HL.gab_evraw"], mode="out")
edgesp18HOMO_eVraw <- incident(net, V(net)[name=="p18HOMO_eVraw"], mode="out")
edgesp18LUMO_eVraw <- incident(net, V(net)[name=="p18LUMO_eVraw"], mode="out")
ecol <- rep("gray", ecount(net))
ecol[edgesp20bp_HOMO_evraw] <- "orange"
ecol[edgesp20bp_HL.gab_evraw] <- "gold"
ecol[edgesp18HOMO_eVraw] <- "light green"
ecol[edgesp18LUMO_eVraw] <- "light blue"
vcol <- rep("gray", vcount(net))
vcol[V(net)$name=="p20bp_HOMO_evraw"] <- "orange"
vcol[V(net)$name=="p20bp_HL.gab_evraw"] <- "gold"
vcol[V(net)$name=="p18HOMO_eVraw"] <- "light green"
vcol[V(net)$name=="p18LUMO_eVraw"] <- "light blue"

plot(net, main="E.coli RIT", layout=l, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label=E(net)$rit.val.dec, edge.label.color="black", edge.label.cex=.5, vertex.label.color="black", vertex.label.cex=.5, edge.color=ecol, vertex.color=vcol)

pdf("ecoli.rit.network.pdf")
plot(net, main="E.coli RIT", layout=l, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label=E(net)$rit.val.dec, edge.label.color="black", edge.label.cex=.5, vertex.label.color="black", vertex.label.cex=.5, edge.color=ecol, vertex.color=vcol)
dev.off()





library(igraph)
library(ggraph)
library(graphlayouts)
library(dplyr)
library(tidyr)
library(reshape2)

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/onehot")
onehot <- read.delim("onehot.labels.txt", header=F, sep="\t")
onehot.t <- data.frame(t(onehot))
colnames(onehot.t) <- c("matrix.label", "onehot.label")

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
df <- read.csv("e.coli.tensor.single.bp.dimers_cut.score.full_rit_sort_IDdefined.txt", header=T, sep="\t")
df$rit.val.dec <- as.numeric(substr(df$rit.value, 0, 4))
df.network <- df[,c(4,5,6)]
df.network$feature1.label <- 
  c("p19.GC","p16.GG","p16.GG","p18.HOMOeV","p18.HOMOeV","p19.CCA","p15.CC","GC.content","p15.CC","p15.CC",
    "p20bp.HOMOeV","p18.LUMOeV","p18.LUMOeV","p16.GG","p18.LUMOeV","p19.GC","p18.HOMOeV","GC.content","p19.CCA","p18.HOMOeV",
    "p19.CCA","p19.GC","p18.HOMOeV","p16.GG","p15.CC","p19.GC","CC","p18.HOMOeV","p19.GC","CC",
    "p19.GC","p16.GG","CC","p19.CCA","p19.GC","p15.CC","p16.GG","p19.GC","CC","p19.CCA",
    "p18.LUMOeV","p18.LUMOeV","GG","p15.dimer.Hbond","CC","p19.GC","p18.LUMOeV","GC.content","p1.TTTT","GG",
    "GG","CC","p18.LUMOeV","p15.CC","p18.HOMOeV","p16.GG","p15.CC","p1.TTTT","p19.LUMOeV","PAM.distance",
    "CC","p16.GG","PAM.distance","p20bp.HOMOeV","p19.CCA","p20bp.HLgap","p15.CC","p1.TTTT","p18.HOMOeV","p19.CCA",
    "p16.ATCA","p13.dimer.Hbond","p18.LUMOeV","p19.GC","CC","p16.GG")
df.network$feature2.label <- 
  c("p15.CC","GC.content","p19.CCA","p15.CC","GC.content","GC.content","p20bp.HOMOeV","p20bp.HOMOeV","Tm","GC.content",
    "Tm","GC.content","p15.CC","Tm","Tm","p19.HOMOeV","Tm","p15.dimer.Hbond","Tm","p19.CCA",
    "p15.dimer.Hbond","p20bp.HOMOeV","p20bp.HOMOeV","p1.TTTT","p19.CCA","p18.HOMOeV","p15.CC","p16.GG","p19.LUMOeV","p18.HOMOeV",
    "CC","p15.dimer.Hbond","p20bp.HOMOeV","p1.TTTT","Tm","GG","p20bp.HOMOeV","GC.content","GG","p20bp.HOMOeV",
    "p19.CCA","p20bp.HOMOeV","GC.content","Tm","GC.content","p18.LUMOeV","p16.GG","Tm","GC.content","p20bp.HOMOeV",
    "Tm","Tm","CC","p19.HOMOeV","p15.dimer.Hbond","p14.CC","p20bp.HLgap","Tm","p15.CC","Tm",
    "p16.GG","p19.CCA","GC.content","p15.dimer.Hbond","p14.CC","GC.content","T","p15.dimer.Hbond","GG","GG",
    "p19.CCA","Tm","p15.dimer.Hbond","GG","p19.CCA","p15.CC")

df.network.label <- df.network[,c(4,5,3)]
colnames(df.network.label) <- c("feature1","feature2","rit.val.dec")
write.table(df.network.label, "e.coli.network.rit.txt", quote=F, row.names=F, sep="\t")

nodes1 <- df.network.label %>% select("feature1") %>% distinct() %>% rename("feature" = "feature1")
nodes2 <- df.network.label %>% select("feature2") %>% distinct() %>% rename("feature" = "feature2")
nodes <- union(nodes1,nodes2) 
nodes$ID <- seq.int(nrow(nodes))

net <- graph_from_data_frame(d=df.network.label, directed=TRUE)
l <- layout_with_lgl(net, maxiter=93)
edges.pam <- incident(net, V(net)[name=="PAM.distance"], mode="out")
edges.GCcontent <- incident(net, V(net)[name=="GC.content"], mode="out")
edges.Tm <- incident(net, V(net)[name=="Tm"], mode="out")
edges.GG <- incident(net, V(net)[name=="GG"], mode="out")
edges.CC <- incident(net, V(net)[name=="CC"], mode="out")
edges.T <- incident(net, V(net)[name=="T"], mode="out")
edges.p16.GG <- incident(net, V(net)[name=="p16.GG"], mode="out")
edges.p1.TTTT <- incident(net, V(net)[name=="p1.TTTT"], mode="out")
edges.p15.CC <- incident(net, V(net)[name=="p15.CC"], mode="out")
edges.p19.GC <- incident(net, V(net)[name=="p19.GC"], mode="out")
edges.p19.CCA <- incident(net, V(net)[name=="p19.CCA"], mode="out")
edges.p14.CC <- incident(net, V(net)[name=="p14.CC"], mode="out")
edges.p16.ATCA <- incident(net, V(net)[name=="p16.ATCA"], mode="out")
edges.p20bp.HOMOeV <- incident(net, V(net)[name=="p20bp.HOMOeV"], mode="out")
edges.p20bp.HLgap <- incident(net, V(net)[name=="p20bp.HLgap"], mode="out")
edges.p13.dimer.Hbond <- incident(net, V(net)[name=="p13.dimer.Hbond"], mode="out")
edges.p15.dimer.Hbond <- incident(net, V(net)[name=="p15.dimer.Hbond"], mode="out")
edges.p18.LUMOeV <- incident(net, V(net)[name=="p18.LUMOeV"], mode="out")
edges.p19.LUMOeV <- incident(net, V(net)[name=="p19.LUMOeV"], mode="out")
edges.p19.HOMOeV <- incident(net, V(net)[name=="p19.HOMOeV"], mode="out")

ecol <- rep("gray", ecount(net))
ecol[edges.pam] <- "orange"
ecol[edges.GCcontent] <- "orange"
ecol[edges.Tm] <- "orange"
ecol[edges.GG] <- "orange"
ecol[edges.CC] <- "orange"
ecol[edges.T] <- "orange"
ecol[edges.p1.TTTT] <- "yellow"
ecol[edges.p16.GG] <- "yellow"
ecol[edges.p15.CC] <- "yellow"
ecol[edges.p19.GC] <- "yellow"
ecol[edges.p19.CCA] <- "yellow"
ecol[edges.p14.CC] <- "yellow"
ecol[edges.p16.ATCA] <- "yellow"
ecol[edges.p20bp.HLgap] <- "plum"   # light purple
ecol[edges.p20bp.HOMOeV] <- "plum"
ecol[edges.p13.dimer.Hbond] <- "light green"
ecol[edges.p15.dimer.Hbond] <- "light green"
ecol[edges.p19.LUMOeV] <- "light blue"
ecol[edges.p18.LUMOeV] <- "light blue"
ecol[edges.p19.HOMOeV] <- "light blue"


vcol <- rep("gray", vcount(net))
vcol[V(net)$name=="PAM.distance"] <- "orange"
vcol[V(net)$name=="GC.content"] <- "orange"
vcol[V(net)$name=="Tm"] <- "orange"
vcol[V(net)$name=="GG"] <- "orange"
vcol[V(net)$name=="CC"] <- "orange"
vcol[V(net)$name=="T"] <- "orange"
vcol[V(net)$name=="p1.TTTT"] <- "yellow"
vcol[V(net)$name=="p16.GG"] <- "yellow"
vcol[V(net)$name=="p15.CC"] <- "yellow"
vcol[V(net)$name=="p19.GC"] <- "yellow"
vcol[V(net)$name=="p19.CCA"] <- "yellow"
vcol[V(net)$name=="p14.CC"] <- "yellow"
vcol[V(net)$name=="p16.ATCA"] <- "yellow"
vcol[V(net)$name=="p20bp.HOMOeV"] <- "plum"   # light purple
vcol[V(net)$name=="p20bp.HLgap"] <- "plum"
vcol[V(net)$name=="p13.dimer.Hbond"] <- "light green"
vcol[V(net)$name=="p15.dimer.Hbond"] <- "light green"
vcol[V(net)$name=="p18.LUMOeV"] <- "light blue"
vcol[V(net)$name=="p19.LUMOeV"] <- "light blue"
vcol[V(net)$name=="p19.HOMOeV"] <- "light blue"


plot(net, main="E.coli RIT", layout=l, edge.arrow.size=log(E(net)$rit.val.dec)/6, edge.label="", edge.label.color="black", edge.label.cex=.5, vertex.label.color="black", vertex.label.cex=.5, edge.color=ecol, vertex.color=vcol)



df <- read.delim("e.coli.network.rit.txt", header=T, sep="\t")
df$edge.weight <- df$rit.val.dec * 100
write.table(df, "e.coli.network.rit.edge.txt", quote=F, row.names=F, sep="\t")

df <- read.delim("e.coli.network.rit.txt", header=T, sep="\t")
df$feature1.group <- 
  c("positional.encoding","positional.encoding","positional.encoding","quantum.single","quantum.single","positional.encoding","positional.encoding","raw.calculation","positional.encoding","positional.encoding","quantum.basepair","quantum.single","quantum.single","positional.encoding","quantum.single","positional.encoding","quantum.single","raw.calculation","positional.encoding","quantum.single","positional.encoding","positional.encoding","quantum.single","positional.encoding","positional.encoding","positional.encoding","raw.calculation","quantum.single","positional.encoding","raw.calculation","positional.encoding","positional.encoding","raw.calculation","positional.encoding","positional.encoding","positional.encoding","positional.encoding","positional.encoding","raw.calculation","positional.encoding","quantum.single","quantum.single","raw.calculation","quantum.dimer","raw.calculation","positional.encoding","quantum.single","raw.calculation","positional.encoding","raw.calculation","raw.calculation","raw.calculation","quantum.single","positional.encoding","quantum.single","positional.encoding","positional.encoding","positional.encoding","quantum.single","raw.calculation","raw.calculation","positional.encoding","raw.calculation","quantum.basepair","positional.encoding","quantum.basepair","positional.encoding","positional.encoding","quantum.single","positional.encoding","positional.encoding","quantum.dimer","quantum.single","positional.encoding","raw.calculation","positional.encoding")
df$feature2.group <- c("positional.encoding","raw.calculation","positional.encoding","positional.encoding","raw.calculation","raw.calculation","quantum.basepair","quantum.basepair","raw.calculation","raw.calculation","raw.calculation","raw.calculation","positional.encoding","raw.calculation","raw.calculation","quantum.single","raw.calculation","quantum.dimer","raw.calculation","positional.encoding","quantum.dimer","quantum.basepair","quantum.basepair","positional.encoding","positional.encoding","quantum.single","positional.encoding","positional.encoding","quantum.single","quantum.single","raw.calculation","quantum.dimer","quantum.basepair","positional.encoding","raw.calculation","raw.calculation","quantum.basepair","raw.calculation","raw.calculation","quantum.basepair","positional.encoding","quantum.basepair","raw.calculation","raw.calculation","raw.calculation","quantum.single","positional.encoding","raw.calculation","raw.calculation","quantum.basepair","raw.calculation","raw.calculation","raw.calculation","quantum.single","quantum.dimer","positional.encoding","quantum.basepair","raw.calculation","positional.encoding","raw.calculation","positional.encoding","positional.encoding","raw.calculation","quantum.dimer","positional.encoding","raw.calculation","raw.calculation","quantum.dimer","raw.calculation","raw.calculation","positional.encoding","raw.calculation","quantum.dimer","raw.calculation","positional.encoding","positional.encoding")
df$edge.weight <- df$rit.val.dec * 10
write.table(df, "e.coli.network.rit.group.txt", quote=F, row.names=F, sep="\t")
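The `feature1.group`/`feature2.group` vectors above are typed out by hand for all 76 edges, so a misspelled group name is easy to miss. Deriving the group from the feature label avoids that. A minimal standard-library sketch, assuming the label-to-group assignment implied by the hand-typed vectors and the node colors used above (orange = raw.calculation, yellow = positional.encoding, light blue = quantum.single, purple = quantum.basepair, green = quantum.dimer); the names `FEATURE_GROUP`, `add_groups` and `write_groups` are hypothetical:

```python
import csv

# Label -> group, following the grouping used in the hand-typed vectors above
FEATURE_GROUP = {
    **dict.fromkeys(["PAM.distance", "GC.content", "Tm", "GG", "CC", "T"], "raw.calculation"),
    **dict.fromkeys(["p1.TTTT", "p14.CC", "p15.CC", "p16.GG", "p16.ATCA", "p19.GC", "p19.CCA"],
                    "positional.encoding"),
    **dict.fromkeys(["p18.HOMOeV", "p18.LUMOeV", "p19.HOMOeV", "p19.LUMOeV"], "quantum.single"),
    **dict.fromkeys(["p20bp.HOMOeV", "p20bp.HLgap"], "quantum.basepair"),
    **dict.fromkeys(["p13.dimer.Hbond", "p15.dimer.Hbond"], "quantum.dimer"),
}


def add_groups(in_fh, weight_scale=10):
    """Yield rows of e.coli.network.rit.txt with group and edge.weight columns.

    Assumes a tab-separated file with header feature1, feature2, rit.val.dec.
    An unlisted label raises KeyError instead of producing a misspelled group.
    """
    for row in csv.DictReader(in_fh, delimiter="\t"):
        row["feature1.group"] = FEATURE_GROUP[row["feature1"]]
        row["feature2.group"] = FEATURE_GROUP[row["feature2"]]
        row["edge.weight"] = float(row["rit.val.dec"]) * weight_scale
        yield row


def write_groups(in_fh, out_fh, weight_scale=10):
    """Write the annotated rows as tab-separated text with a header line."""
    rows = list(add_groups(in_fh, weight_scale))
    if not rows:
        return
    writer = csv.DictWriter(out_fh, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
```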
# TODO: work out what /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/ritEval.py outputs

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
rit.adj <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.rit.adj", header=T, sep="\t")
rit.edge <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.rit.edge", header=T, sep="\t")
effect <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.effect", header=T, sep="\t")
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli

# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# every column except the sample ID and the target is a feature (in R, dput(colnames(df)) prints the full list)
X = df.drop(columns =['sgRNAID', 'cut.score'])

# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)

import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
f = plt.figure()
# show=False keeps the plot on the open figure so savefig() below captures it
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)

f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
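The bar version of `shap.summary_plot` ranks features by their mean absolute SHAP value. Having those numbers as a table makes it easier to line them up against the iRF importance scores. A minimal numpy sketch (function name hypothetical), assuming `shap_values` is the 2-D `(n_samples, n_features)` array that `TreeExplainer.shap_values` returns for a single-output regressor such as the random forest above:

```python
import numpy as np


def rank_by_mean_abs_shap(shap_values, feature_names, top=20):
    """Rank features by mean |SHAP| across samples, largest first."""
    values = np.asarray(shap_values, dtype=float)
    if values.ndim != 2 or values.shape[1] != len(feature_names):
        raise ValueError("expected an (n_samples, n_features) array matching feature_names")
    mean_abs = np.abs(values).mean(axis=0)      # one global score per feature
    order = np.argsort(mean_abs)[::-1][:top]    # indices sorted by descending score
    return [(feature_names[i], float(mean_abs[i])) for i in order]
```

Usage: `rank_by_mean_abs_shap(shap_values, list(X_train.columns))`.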

# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/"

Run top feature set

  • run models on the top 5, 10, 20, 50, 100, 200, 500 and 1000 features (ranked by importance score) to find where performance stops improving as features are added and starts to drop because the added features contribute only noise.
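The same question can be asked quickly with scikit-learn before committing Summit time. A minimal sketch, assuming a pandas feature table `X`, a target `y` and a list of column names ranked by importance (for example the first field of the sorted importance4 file). It uses a plain `RandomForestRegressor` and a single 80/20 split as a stand-in, not the iRF pipeline used for the runs below, and the function name is hypothetical:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def score_top_n(X, y, ranked_features, sizes=(5, 10, 20, 50), seed=0):
    """Held-out R^2 of a random forest trained on the top-n ranked features."""
    # one fixed split so every feature-set size is scored on the same samples
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=seed)
    scores = {}
    for n in sizes:
        cols = list(ranked_features[:n])
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X_train[cols], y_train)
        scores[n] = r2_score(y_test, model.predict(X_test[cols]))
    return scores
```

`max(scores, key=scores.get)` then gives the feature-set size with the best held-out R²; a curve that rises and then flattens or falls is the pattern described above.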
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.tensor.single.bp.dimers_cut.score.importance4 > cut.score/foldRuns/fold9/Runs/Set4/e.coli.tensor.single.bp.dimers_cut.score.importance4.sorted

# R
library(dplyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4/")
imp <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.sorted", header=F, sep=":", stringsAsFactors=F)
imp$V1 <- trimws(imp$V1)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", header=T, sep="\t")

prefix <- "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features"
top.sizes <- c("5"=5, "10"=10, "20"=20, "50"=50, "100"=100, "200"=200, "500"=500, "1k"=1000)

for (label in names(top.sizes)) {
  n <- top.sizes[[label]]
  # exact-name selection: matches() treats each feature name as a regular expression
  # and would also keep any longer column name that contains it
  top <- df %>% select(any_of(imp$V1[1:n]))
  if (ncol(top) < n) warning("top", label, ": only ", ncol(top), " of ", n, " features found in df")
  df.top <- cbind(df[, 1, drop=F], top)   # keep the sample-ID column and its header
  write.table(df.top, paste0(prefix, ".top", label, ".txt"), quote=F, row.names=F, sep="\t")
  write.table(df.top, paste0(prefix, ".top", label, "_overlap.txt"), quote=F, row.names=F, sep="\t")
  write.table(df.top[, -1, drop=F], paste0(prefix, ".top", label, "_overlap_noSampleIDs.txt"), quote=F, row.names=F, sep="\t")
}


# top 5 features
module load python/3.7-anaconda3
mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top5 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top5.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5/Submits/submit_full_top5_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5/Submits/submit_train_top5_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5/Submits/submit_test_top5_0.sh

module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top5
# 0.11240745945016933
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_HOMO_evraw  cut.score   0.5560700206650031
# sgRNA.gcsgRNA.raw cut.score   0.18401685074949461
# V303.xsgRNA.raw   cut.score   0.13877919464539287
# CCsgRNA.raw   cut.score   0.07303826527934598
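`sort -k3rg topVarEdges/cut.score_top95.txt | head` ranks the edges by their third whitespace-separated field, read as a general number, in descending order. A rough Python equivalent, assuming the `feature  target  weight` layout shown above (`top_edges` is a hypothetical helper; the order of ties may differ from `sort`):

```python
def top_edges(lines, k=10):
    """Return the k (feature, target, weight) rows with the largest weight."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        rows.append((fields[0], fields[1], float(fields[2])))
    rows.sort(key=lambda row: row[2], reverse=True)  # like sort -k3rg
    return rows[:k]  # like head -n k


if __name__ == "__main__":
    sample = [
        "sgRNA.gcsgRNA.raw\tcut.score\t0.18401685074949461",
        "p20bp_HOMO_evraw\tcut.score\t0.5560700206650031",
        "CCsgRNA.raw\tcut.score\t0.07303826527934598",
    ]
    for row in top_edges(sample, k=2):
        print(*row, sep="\t")
```

With a real file, pass the open handle directly, e.g. `with open("topVarEdges/cut.score_top95.txt") as fh: top_edges(fh)`.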

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top5/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top5_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3436711
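`cor()` defaults to the Pearson coefficient, computed here between observed `cut.score` and the predictions for a single held-out split (fold9, Set4). The same statistic in plain Python, as a sketch for checking the value outside R:

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # R's cor() returns NA with a warning here; this sketch raises instead
        raise ValueError("correlation is undefined for a constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 3.0, 2.0, 4.0]
    print(pearson_r(observed, predicted))  # 0.8
```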


# top 10 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top10 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top10.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10/Submits/submit_full_top10_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10/Submits/submit_train_top10_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10/Submits/submit_test_top10_0.sh

module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top10
# 0.15779734147083332
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_HOMO_evraw  cut.score   0.2779751149892543
# CCsgRNA.raw   cut.score   0.10392795171106281
# GGsgRNA.raw   cut.score   0.08649169902899904
# pam.distance0 cut.score   0.08305282825098441
# V303.xsgRNA.raw   cut.score   0.0808250888344932
# sgRNA.gcsgRNA.raw cut.score   0.08065523571159726
# p18HOMO_eVraw cut.score   0.07858676088784716
# p18LUMO_eVraw cut.score   0.07790865831751967
# sgRNA.tempsgRNA.raw   cut.score   0.07505008265603128
# V231.xsgRNA.raw   cut.score   0.05552657961221091

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top10/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top10_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4019815


# top 20 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top20 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top20.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20/Submits/submit_full_top20_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20/Submits/submit_train_top20_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20/Submits/submit_test_top20_0.sh

module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top20
# 0.20172360134328232
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p15dimer_H_bondraw    cut.score   0.07323354697021574
# pam.distance0 cut.score   0.06606994398300291
# sgRNA.gcsgRNA.raw cut.score   0.05703286182756317
# CCsgRNA.raw   cut.score   0.056726691069441004
# sgRNA.tempsgRNA.raw   cut.score   0.05473447914546529
# GGsgRNA.raw   cut.score   0.05202158281633737
# TsgRNA.raw    cut.score   0.05141008209856573
# p18HOMO_eVraw cut.score   0.05070351109634776
# p18LUMO_eVraw cut.score   0.04933637277977404
# p19LUMO_eVraw cut.score   0.04712981730229931

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top20/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top20_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4458406


# top 50 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top50 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top50.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50/Submits/submit_full_top50_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50/Submits/submit_train_top50_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50/Submits/submit_test_top50_0.sh

module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top50
# 0.24529071033783692
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p15dimer_H_bondraw    cut.score   0.05288961312310658
# p18dimer_H_bondraw    cut.score   0.048287272079531054
# sgRNA.gcsgRNA.raw cut.score   0.048042513201620105
# sgRNA.tempsgRNA.raw   cut.score   0.046670325843486814
# p20bp_HL.gab_evraw    cut.score   0.03680256204578182
# p19dimer_HOMO_eVraw   cut.score   0.036082345356197476
# p19LUMO_eVraw cut.score   0.035649593966638055
# p20bp_bondraw cut.score   0.03532168658007362
# p18LUMO_eVraw cut.score   0.035193304018897434
# CCsgRNA.raw   cut.score   0.03424359955178815

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top50/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top50_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4903894


# top 100 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top100 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top100.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100/Submits/submit_full_top100_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100/Submits/submit_train_top100_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100/Submits/submit_test_top100_0.sh

module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top100
# 0.2511902743156288
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.tempsgRNA.raw   cut.score   0.042941379113082226
# sgRNA.gcsgRNA.raw cut.score   0.040372033685997025
# p15dimer_H_bondraw    cut.score   0.03820204540598607
# p20bp_HL.gab_evraw    cut.score   0.03748607181550359
# p20bp_LUMO_evraw  cut.score   0.032674508114237846
# p18HOMO_eVraw cut.score   0.030854362074953134
# p18dimer_H_bondraw    cut.score   0.029996170116640183
# p18LUMO_eVraw cut.score   0.029986605869136346
# p20bp_bondraw cut.score   0.02958156995010671
# CCsgRNA.raw   cut.score   0.028279537131068865

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top100/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top100_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4967809


# top 200 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top200 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top200.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200/Submits/submit_full_top200_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200/Submits/submit_train_top200_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200/Submits/submit_test_top200_0.sh

module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top200
# 0.2541662419500061
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_HL.gab_evraw    cut.score   0.040370718407463375
# sgRNA.gcsgRNA.raw cut.score   0.03770659114517902
# sgRNA.tempsgRNA.raw   cut.score   0.03717516292014969
# p15dimer_H_bondraw    cut.score   0.03390945697992189
# p20bp_LUMO_evraw  cut.score   0.030311114551425513
# p20bp_bondraw cut.score   0.02994484824438354
# p18dimer_H_bondraw    cut.score   0.0298580274228511
# CCsgRNA.raw   cut.score   0.02874886974665889
# p19dimer_H_bondraw    cut.score   0.025755526794290634
# p18HOMO_eVraw cut.score   0.025255463915996947

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top200/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top200_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4984185


# top 500 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top500 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top500.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500/Submits/submit_full_top500_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500/Submits/submit_train_top500_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500/Submits/submit_test_top500_0.sh

module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top500
# 0.2564356719903999
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_bondraw cut.score   0.033351661481709816
# p20bp_HL.gab_evraw    cut.score   0.03302323848544733
# sgRNA.gcsgRNA.raw cut.score   0.03235167607143662
# sgRNA.tempsgRNA.raw   cut.score   0.030300746213757945
# p15dimer_H_bondraw    cut.score   0.028587105767771046
# p20bp_HOMO_evraw  cut.score   0.027145399197064105
# CCsgRNA.raw   cut.score   0.026952897435269286
# p20bp_LUMO_evraw  cut.score   0.026059446119664594
# p18dimer_H_bondraw    cut.score   0.026041073444393575
# p18LUMO_eVraw cut.score   0.025907985806609503

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top500/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top500_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.500109


# top 1000 features
module load python/3.7-anaconda3
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName top1k --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.top1k.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k
module load python/3.7.0-anaconda3-5.3.0
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k/Submits/submit_full_top1k_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k/Submits/submit_train_top1k_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k/Submits/submit_test_top1k_0.sh

module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt top1k
# 0.25773091055490766
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_LUMO_evraw  cut.score   0.03437642180047371
# sgRNA.gcsgRNA.raw cut.score   0.027582072433968742
# p20bp_HOMO_evraw  cut.score   0.027573956881477995
# p20bp_HL.gab_evraw    cut.score   0.02745782508958597
# sgRNA.tempsgRNA.raw   cut.score   0.026721515126104947
# p20bp_bondraw cut.score   0.026638154288020657
# CCsgRNA.raw   cut.score   0.024201018590038908
# p15dimer_H_bondraw    cut.score   0.02355067173419084
# pam.distance0 cut.score   0.023324230177193622
# p18LUMO_eVraw cut.score   0.02261733909149155

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/top.features/top1k/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("top1k_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5000614
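The fold9/Set4 test correlations recorded above rise quickly up to about 50 features and then flatten (0.490 at top50 vs 0.500 at top500 and top1k). A small sketch that tabulates those recorded numbers and finds the smallest N within a chosen tolerance of the best r; the tolerance is an arbitrary choice, and because every value comes from one split, differences of a few thousandths may not be meaningful:

```python
# Test-set Pearson r (fold9, Set4) recorded in the notes above for each top-N run.
RECORDED_R = {
    5: 0.3436711, 10: 0.4019815, 20: 0.4458406, 50: 0.4903894,
    100: 0.4967809, 200: 0.4984185, 500: 0.500109, 1000: 0.5000614,
}


def gains(results):
    """Change in r from each N to the next larger N."""
    ns = sorted(results)
    return {n: results[n] - results[prev] for prev, n in zip(ns, ns[1:])}


def smallest_n_within(results, tolerance):
    """Smallest N whose r is within `tolerance` of the best recorded r."""
    best = max(results.values())
    return min(n for n, r in results.items() if best - r <= tolerance)


if __name__ == "__main__":
    for n, delta in gains(RECORDED_R).items():
        print(f"top{n}: r = {RECORDED_R[n]:.4f} (change {delta:+.4f})")
    print("smallest N within 0.01 of best:", smallest_n_within(RECORDED_R, 0.01))
```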

Remove highly correlated features (pairwise Pearson r >= 0.9)…

# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test


# python 

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings("ignore")
np.random.seed(123)

data = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt')
data = data.iloc[:,2:-1]

label_encoder = LabelEncoder()
data.iloc[:,0] = label_encoder.fit_transform(data.iloc[:,0]).astype('float64')

corr = data.corr()
corr.to_csv("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.correlationmatrix.txt")

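# greedy filter: for each pair (i, j) with i < j and r >= 0.9, drop column j (the later one)
# note: only positive correlations are flagged; compare np.abs(corr.values) to also drop anti-correlated pairs
# vectorized equivalent of a double loop over corr.iloc[i,j]
upper = np.triu(corr.values >= 0.9, k=1)   # True above the diagonal where r >= 0.9 (NaN compares False)
columns = ~upper.any(axis=0)               # keep column j only if no earlier column i is correlated with it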

selected_columns = data.columns[columns]
data = data[selected_columns]

data.to_csv("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.pythoncorrelation.txt")
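The filter above keeps the earlier column of each pair with r >= 0.9 and drops the later one. Because it compares the signed coefficient, strongly anti-correlated pairs (r <= -0.9) are kept, and NaN entries (for example from constant columns) never trigger a drop. A pure-Python sketch of the same rule, with a hypothetical `use_abs` switch for the absolute-value variant:

```python
def correlated_columns_to_drop(corr, threshold=0.9, use_abs=False):
    """Indices of columns the greedy rule drops, given a square correlation matrix.

    For every pair i < j with r >= threshold (or |r| >= threshold when use_abs
    is True), column j is dropped, even if column i was itself already dropped,
    which mirrors the filter above.
    """
    n = len(corr)
    keep = [True] * n
    for i in range(n):
        for j in range(i + 1, n):
            r = corr[i][j]
            if use_abs:
                r = abs(r)
            if r >= threshold:  # NaN compares False, so NaN never drops a column
                keep[j] = False
    return [j for j in range(n) if not keep[j]]


if __name__ == "__main__":
    corr = [
        [1.0, 0.95, -0.97],
        [0.95, 1.0, 0.10],
        [-0.97, 0.10, 1.0],
    ]
    print(correlated_columns_to_drop(corr))                # [1]
    print(correlated_columns_to_drop(corr, use_abs=True))  # [1, 2]
```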


# R
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# run in the shell before this R session to extract the header line (column names) written by pandas:
# head -n 1 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.pythoncorrelation.txt > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.pythoncorrelation.header.txt

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df.noncor <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.pythoncorrelation.header.txt", header=F, sep=",")

df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t")
df.subset <- df[ , which(names(df) %in% df.noncor[1,])]

df.mat <- as.matrix(df.subset[,2:ncol(df.subset)])
df.mat.id <- cbind(as.data.frame(df$sgRNAID), df.mat)
write.table(df.mat.id, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.noncorrelated.txt", quote=F, row.names=F, sep="\t")

write.table(df.mat.id, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.noncorrelated.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.mat.id, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.noncorrelated.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.mat.id[,2:ncol(df.mat.id)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.noncorrelated.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

df.features <- data.frame(feature = colnames(df))
df.features.noncor <- data.frame(t(df.noncor))
colnames(df.features.noncor) <- "feature"
# features in df that are absent from the Python-filtered header (output below)
df.removedfeatures <- subset(df.features, !(df.features$feature %in% df.features.noncor$feature))
# 19    p10bp_HL.gab_evraw
# 21      p10bp_LUMO_evraw
# 25    p10No_electronsraw
# 27    p11bp_HL.gab_evraw
# 29      p11bp_LUMO_evraw
# 33    p11No_electronsraw
# 35    p12bp_HL.gab_evraw
# 37      p12bp_LUMO_evraw
# 41    p12No_electronsraw
# 43    p13bp_HL.gab_evraw
# 45      p13bp_LUMO_evraw
# 49    p13No_electronsraw
# 51    p14bp_HL.gab_evraw
# 53      p14bp_LUMO_evraw
# 57    p14No_electronsraw
# 59    p15bp_HL.gab_evraw
# 61      p15bp_LUMO_evraw
# 65    p15No_electronsraw
# 67    p16bp_HL.gab_evraw
# 69      p16bp_LUMO_evraw
# 73    p16No_electronsraw
# 75    p17bp_HL.gab_evraw
# 77      p17bp_LUMO_evraw
# 81    p17No_electronsraw
# 83    p18bp_HL.gab_evraw
# 85      p18bp_LUMO_evraw
# 89    p18No_electronsraw
# 91    p19bp_HL.gab_evraw
# 93      p19bp_LUMO_evraw
# 97    p19No_electronsraw
# 99     p1bp_HL.gab_evraw
# 101      p1bp_LUMO_evraw
# 105    p1No_electronsraw
# 107   p20bp_HL.gab_evraw
# 109     p20bp_LUMO_evraw
# 113   p20No_electronsraw
# 115    p2bp_HL.gab_evraw
# 117      p2bp_LUMO_evraw
# 121    p2No_electronsraw
# 123    p3bp_HL.gab_evraw
# 125      p3bp_LUMO_evraw
# 129    p3No_electronsraw
# 131    p4bp_HL.gab_evraw
# 133      p4bp_LUMO_evraw
# 137    p4No_electronsraw
# 139    p5bp_HL.gab_evraw
# 141      p5bp_LUMO_evraw
# 145    p5No_electronsraw
# 147    p6bp_HL.gab_evraw
# 149      p6bp_LUMO_evraw
# 153    p6No_electronsraw
# 155    p7bp_HL.gab_evraw
# 157      p7bp_LUMO_evraw
# 161    p7No_electronsraw
# 163    p8bp_HL.gab_evraw
# 165      p8bp_LUMO_evraw
# 169    p8No_electronsraw
# 171    p9bp_HL.gab_evraw
# 173      p9bp_LUMO_evraw
# 177    p9No_electronsraw
# 180  sgRNA.tempsgRNA.raw
# 6083 p10dimer_LUMO_eVraw
# 6088 p11dimer_LUMO_eVraw
# 6093 p12dimer_LUMO_eVraw
# 6098 p13dimer_LUMO_eVraw
# 6103 p14dimer_LUMO_eVraw
# 6108 p15dimer_LUMO_eVraw
# 6113 p16dimer_LUMO_eVraw
# 6118 p17dimer_LUMO_eVraw
# 6123 p18dimer_LUMO_eVraw
# 6128 p19dimer_LUMO_eVraw
# 6133  p1dimer_LUMO_eVraw
# 6138  p2dimer_LUMO_eVraw
# 6143  p3dimer_LUMO_eVraw
# 6148  p4dimer_LUMO_eVraw
# 6153  p5dimer_LUMO_eVraw
# 6158  p6dimer_LUMO_eVraw
# 6163  p7dimer_LUMO_eVraw
# 6168  p8dimer_LUMO_eVraw
# 6173  p9dimer_LUMO_eVraw
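The 80 removed features reduce to a handful of descriptor families repeated across positions: `bp_HL.gab_evraw`, `bp_LUMO_evraw` and `No_electronsraw` (p1 to p20), `dimer_LUMO_eVraw` (p1 to p19), plus `sgRNA.tempsgRNA.raw`. A sketch that collapses such a list by stripping the leading `p<position>` prefix, assuming that naming convention holds for all position-specific features (helper names are hypothetical):

```python
import re
from collections import Counter

# Assumed convention: position-specific features start with "p" + position digits.
POSITION_PREFIX = re.compile(r"^p\d+")


def feature_family(name):
    """Strip a leading p<position> prefix, e.g. p10bp_LUMO_evraw -> bp_LUMO_evraw."""
    return POSITION_PREFIX.sub("", name)


def summarize_families(names):
    """Count features per family after removing the position prefix."""
    return Counter(feature_family(n) for n in names)


if __name__ == "__main__":
    removed = ["p10bp_HL.gab_evraw", "p10bp_LUMO_evraw", "p10No_electronsraw",
               "p1bp_HL.gab_evraw", "sgRNA.tempsgRNA.raw", "p9dimer_LUMO_eVraw"]
    for family, count in summarize_families(removed).most_common():
        print(family, count)
```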



# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]
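Each top-N run above repeats this builder call with only `--RunName` and the feature file changing. A sketch that prints those commands, reusing the flag values from the calls above (`setup_command` is a hypothetical helper; it only builds strings and runs nothing):

```python
import shlex

BUILDER = "/gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py"
BASE = "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/"
Y_FILE = BASE + "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt"


def setup_command(run_name, data_file, y_file, total_nodes=50, run_time=90):
    """Builder command line with the same flags as the runs above."""
    args = ["python", BUILDER,
            "--System", "Summit", "--NodesPer", "1",
            "--TotalNodes", str(total_nodes), "--RunTime", str(run_time),
            "--Account", "SYB105", "--NumTrees", "1000", "--NumIterations", "5",
            "--RunName", run_name, "--bypass", "--targetNodeSize", "50",
            "--Prediction", data_file, y_file]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    for name in ["top5", "top10", "top20", "top50", "top100", "top200", "top500", "top1k"]:
        data = BASE + f"Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.features.{name}.txt"
        print(setup_command(name, data, Y_FILE))
```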

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.tensor.single.bp.dimers.noncorrelated --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.noncorrelated.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated/Submits/submit_full_e.coli.tensor.single.bp.dimers.noncorrelated_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated/Submits/submit_train_e.coli.tensor.single.bp.dimers.noncorrelated_0.sh
# run the test submission only after the train jobs have finished
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated/Submits/submit_test_e.coli.tensor.single.bp.dimers.noncorrelated_0.sh
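Instead of waiting for the train jobs by hand, LSF can hold a job until others finish with `bsub -w "done(<job_id>)"`. A sketch that extracts the job ID from bsub's usual `Job <id> is submitted to ... queue <name>.` message and builds the dependent test submission, assuming each submit script is a single LSF job (helper names are hypothetical):

```python
import re
import shlex

# Usual LSF confirmation, e.g. "Job <12345> is submitted to default queue <batch>."
JOB_ID = re.compile(r"Job <(\d+)>")


def parse_job_id(bsub_output):
    """Extract the numeric job ID from bsub's confirmation message."""
    match = JOB_ID.search(bsub_output)
    if match is None:
        raise ValueError(f"no job ID found in: {bsub_output!r}")
    return match.group(1)


def dependent_bsub(script, job_ids):
    """Build a bsub command that starts `script` only after all `job_ids` end successfully."""
    condition = " && ".join(f"done({jid})" for jid in job_ids)
    return f"bsub -w {shlex.quote(condition)} {shlex.quote(script)}"


if __name__ == "__main__":
    train_output = "Job <4242> is submitted to default queue <batch>."
    print(dependent_bsub("submit_test.sh", [parse_job_id(train_output)]))
    # bsub -w 'done(4242)' submit_test.sh
```

In practice the train message would come from something like `subprocess.run(["bsub", train_script], capture_output=True, text=True).stdout`, with the full `Submits/...` script paths used above.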

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# file contents: cut.score (the response column name)
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.tensor.single.bp.dimers.noncorrelated
# 0.25928279270089966
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_bondraw cut.score   0.06947945062987773
# sgRNA.gcsgRNA.raw cut.score   0.041528973085977305
# p20bp_HOMO_evraw  cut.score   0.040472249983229965
# V231.xsgRNA.raw   cut.score   0.027522152556921066
# V303.xsgRNA.raw   cut.score   0.023400629503740097
# pam.distance0 cut.score   0.02084947923491131
# p18LUMO_eVraw cut.score   0.01995639196431779
# V1110.xsgRNA.raw  cut.score   0.018461591295195666
# CCsgRNA.raw   cut.score   0.01839402874517121
# p18HOMO_eVraw cut.score   0.017933831498720857

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.tensor.single.bp.dimers.noncorrelated_cut.score.importance4 | head
# p20bp_HOMO_evraw: 125733
# p20bp_bondraw: 82183
# sgRNA.gcsgRNA.raw: 76333.2
# V303.xsgRNA.raw: 52552.2
# V231.xsgRNA.raw: 52200.2
# CCsgRNA.raw: 44234.6
# p18LUMO_eVraw: 39001.7
# GGsgRNA.raw: 38551.2
# pam.distance0: 38142.4
# V1110.xsgRNA.raw: 35370.2
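The `importance4` file uses `name: value` lines, so `sort -k2rg` ranks on the number after the colon. A Python sketch of the same ranking (`top_importances` is hypothetical; it splits on the last colon and skips lines without one):

```python
def top_importances(lines, k=10):
    """Return the k (feature, importance) pairs with the largest importance."""
    rows = []
    for line in lines:
        name, sep, value = line.rpartition(":")
        if not sep:
            continue  # blank line or no "name: value" separator
        rows.append((name.strip(), float(value)))  # float() ignores surrounding whitespace
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:k]


if __name__ == "__main__":
    sample = ["sgRNA.gcsgRNA.raw: 76333.2", "p20bp_HOMO_evraw: 125733", "p20bp_bondraw: 82183"]
    for name, score in top_importances(sample, k=2):
        print(f"{name}: {score}")
```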

# Pearson correlation between observed cut.score and predictions (fold9, Set4 test split)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.tensor.single.bp.dimers.noncorrelated_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5000786
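The Pearson r here (0.50, one test set: fold9/Set4) and the 0.259 printed by `iRF_postProcessing.py` above (recorded as R2 in the later notes) are different statistics. r measures linear association only, while R2 = 1 - SS_res/SS_tot also penalizes predictions that are shifted or scaled. r = 0.50 gives r² ≈ 0.25, but that is computed on a single set and need not match the post-processing R2. A small sketch of both, assuming the post-processing uses this standard R2 definition (not confirmed in these notes):

```python
from math import sqrt


def pearson_r(y, yhat):
    """Pearson correlation between observed and predicted values (as R's cor())."""
    n = len(y)
    mean_y = sum(y) / n
    mean_p = sum(yhat) / n
    cov = sum((a - mean_y) * (b - mean_p) for a, b in zip(y, yhat))
    var_y = sum((a - mean_y) ** 2 for a in y)
    var_p = sum((b - mean_p) ** 2 for b in yhat)
    return cov / sqrt(var_y * var_p)


def r_squared(y, yhat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot
```

With `y = [1, 2, 3, 4]` and `yhat = [2, 4, 6, 8]`, r is 1 but R2 is -5, which is why the two numbers should not be used interchangeably.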
## look at correlation matrix

library(dplyr)
library(ggplot2)
library(reshape2)
library(pheatmap)
library(RColorBrewer)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.correlationmatrix.txt", header=T, row.names=1, sep=",")

# pheatmap(mat, color = colorRampPalette(rev(brewer.pal(n = 7, name =
#   "RdYlBu")))(100), kmeans_k = NA, breaks = NA, border_color = "grey60",
#   cellwidth = NA, cellheight = NA, scale = "none", cluster_rows = TRUE,
#   cluster_cols = TRUE, clustering_distance_rows = "euclidean",
#   clustering_distance_cols = "euclidean", clustering_method = "complete",
#   clustering_callback = identity2, cutree_rows = NA, cutree_cols = NA,
#   treeheight_row = ifelse((class(cluster_rows) == "hclust") || cluster_rows,
#   50, 0), treeheight_col = ifelse((class(cluster_cols) == "hclust") ||
#   cluster_cols, 50, 0), legend = TRUE, legend_breaks = NA,
#   legend_labels = NA, annotation_row = NA, annotation_col = NA,
#   annotation = NA, annotation_colors = NA, annotation_legend = TRUE,
#   annotation_names_row = TRUE, annotation_names_col = TRUE,
#   drop_levels = TRUE, show_rownames = T, show_colnames = T, main = NA,
#   fontsize = 10, fontsize_row = fontsize, fontsize_col = fontsize,
#   angle_col = c("270", "0", "45", "90", "315"), display_numbers = F,
#   number_format = "%.2f", number_color = "grey30", fontsize_number = 0.8
#   * fontsize, gaps_row = NULL, gaps_col = NULL, labels_row = NULL,
#   labels_col = NULL, filename = NA, width = NA, height = NA,
#   silent = FALSE, na_col = "#DDDDDD", ...)

pdf("ecoli.correlation.matrix.pheatmap.pdf")
pheatmap(df)
dev.off()
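To go from the heatmap to a list of feature pairs worth pruning, a sketch that scans the upper triangle of a correlation matrix for |r| at or above a cutoff. The 0.9 cutoff is a hypothetical choice; how the `noncorrelated` feature set was actually filtered isn't recorded in this section.

```python
def correlated_pairs(names, matrix, threshold=0.9):
    """List (name_i, name_j, r) for i < j with |r| >= threshold.

    matrix is a square list of lists of correlations in the same order
    as names; None marks a missing (NA) value and is skipped.
    """
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = matrix[i][j]
            if r is not None and abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    return pairs
```

Only the upper triangle is scanned because a correlation matrix is symmetric, so each pair is reported once.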
RIT
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score e.coli.tensor.single.bp.dimers.noncorrelated

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.noncorrelated/cut.score/RIT.run

17 Feb 2022: Run iRF with specific sets

  • Use the absolute value of the H-bond quantum features (so they can be interpreted as bond strength)
  • Remove the HOMO-LUMO (H-L) gap features and run with just H-bond data (and vice versa)
  • Remove quantum properties and run with just positional encoding (and vice versa)

–> look at model metrics and identify top features

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6072,6074:6078,6080,6082:6176)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

df.abs <- df.all %>% select(grep("bondraw", names(df.all))) %>% abs()
df.all.sub <- df.all %>% select(-grep("bondraw", names(df.all))) 
df.abs.all <- cbind(df.all.sub, df.abs)
# 6173 columns --> 6171 features
write.table(df.abs.all, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.txt", quote=F, row.names=F, sep="\t")

df.minusIndOnehot <- df.abs.all[,c(1,2,18:160,166:ncol(df.abs.all))]
# 6151 features
df.minusDepOnehot <- df.abs.all %>% select(-starts_with("V")) 
# 284 features
df.minusOnehot <- df.minusIndOnehot %>% select(-starts_with("V")) 
# 264 features
df.minusHL <- df.abs.all %>% select(-grep("HL", names(df.abs.all)), -grep("HOMO", names(df.abs.all)), -grep("LUMO", names(df.abs.all))) 
# 5994 features
df.minusHbond <- df.abs.all %>% select(-grep("bondraw", names(df.abs.all))) 
# 6132 features
write.table(df.minusIndOnehot, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noIndOnehot.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusDepOnehot, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noDepOnehot.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusOnehot, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHbond, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHbond.txt", quote=F, row.names=F, sep="\t")

write.table(df.minusIndOnehot[,c(1,3:ncol(df.minusIndOnehot))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noIndOnehot.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusIndOnehot[,c(1,3:ncol(df.minusIndOnehot))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noIndOnehot.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusIndOnehot[,3:ncol(df.minusIndOnehot)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noIndOnehot.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.minusDepOnehot[,c(1,3:ncol(df.minusDepOnehot))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noDepOnehot.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusDepOnehot[,c(1,3:ncol(df.minusDepOnehot))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noDepOnehot.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusDepOnehot[,3:ncol(df.minusDepOnehot)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noDepOnehot.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.minusOnehot[,c(1,3:ncol(df.minusOnehot))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusOnehot[,c(1,3:ncol(df.minusOnehot))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusOnehot[,3:ncol(df.minusOnehot)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.minusHL[,c(1,3:ncol(df.minusHL))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,c(1,3:ncol(df.minusHL))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,3:ncol(df.minusHL)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.minusHbond[,c(1,3:ncol(df.minusHbond))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHbond.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHbond[,c(1,3:ncol(df.minusHbond))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHbond.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHbond[,3:ncol(df.minusHbond)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHbond.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.abs.all[,c(1,3:ncol(df.abs.all))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.abs.all[,c(1,3:ncol(df.abs.all))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.abs.all[,3:ncol(df.abs.all)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
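The column bookkeeping in the R block above, as a Python sketch on a plain header and rows: `abs_bond_columns` reproduces the `abs()`/`cbind()` step (the `bondraw` columns become absolute values and move to the end), and `SUBSETS` reproduces the name-based drops. `noIndOnehot` is left out because the R code selects it by column position, not by name. Note that tidyselect's `starts_with()` ignores case by default, so the sketch drops columns starting with `V` or `v`; `grep()` is case-sensitive, as is the substring test here. All helper names are hypothetical.

```python
def abs_bond_columns(header, rows, pattern="bondraw"):
    """Take |x| of columns whose name contains pattern and move them to the end.

    Other columns keep their order; None (NA) values stay None.
    """
    keep = [i for i, name in enumerate(header) if pattern not in name]
    bond = [i for i, name in enumerate(header) if pattern in name]
    new_header = [header[i] for i in keep] + [header[i] for i in bond]
    new_rows = [
        [row[i] for i in keep] + [None if row[i] is None else abs(row[i]) for i in bond]
        for row in rows
    ]
    return new_header, new_rows


# Name-based feature subsets, mirroring the select() calls above.
SUBSETS = {
    # starts_with("V") in tidyselect is case-insensitive by default
    "noDepOnehot": lambda name: name[:1].lower() == "v",
    "noHL": lambda name: any(tag in name for tag in ("HL", "HOMO", "LUMO")),
    "noHbond": lambda name: "bondraw" in name,
}


def subset_columns(header, run_name):
    """Column names kept for a run after dropping the matching features."""
    drop = SUBSETS[run_name]
    return [name for name in header if not drop(name)]
```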



# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs

python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName HbondAbs --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

mkdir noIndOnehot
cd noIndOnehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName noIndOnehot --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noIndOnehot.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

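# absolute path: a relative mkdir here would create noDepOnehot inside noIndOnehot/
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noDepOnehot
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noDepOnehot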
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName noDepOnehot --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noDepOnehot.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

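mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot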
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName noOnehot --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

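mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL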
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName noHL --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

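mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHbond
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHbond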
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName noHbond --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHbond.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt
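The setup calls differ only in the run name and the matching `.features.txt` file. A sketch that generates them, using one absolute directory per run so each run directory sits directly under the HbondAbs run root regardless of the current working directory. Paths and flags are copied from the commands above; `setup_commands` is a hypothetical helper.

```python
SEED = "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli"
BUILDER = "/gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py"
BASE = f"{SEED}/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location"
RUN_ROOT = f"{SEED}/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs"
OPTIONS = ("--System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 "
           "--Account SYB105 --NumTrees 1000 --NumIterations 5")
RUNS = ["HbondAbs", "noIndOnehot", "noDepOnehot", "noOnehot", "noHL", "noHbond"]


def setup_commands(run_name):
    """Shell lines that create a run directory and call the iRF builder script.

    The full HbondAbs run lives in RUN_ROOT; each ablation gets RUN_ROOT/<run_name>.
    """
    run_dir = RUN_ROOT if run_name == "HbondAbs" else f"{RUN_ROOT}/{run_name}"
    suffix = "" if run_name == "HbondAbs" else f".{run_name}"
    data_file = f"{BASE}.HbondAbs{suffix}.features.txt"
    y_file = f"{BASE}.score.txt"
    return [
        f"mkdir -p {run_dir}",
        f"cd {run_dir}",
        f"python {BUILDER} {OPTIONS} --RunName {run_name} --bypass "
        f"--targetNodeSize 50 --Prediction {data_file} {y_file}",
    ]


if __name__ == "__main__":
    for run in RUNS:
        print("\n".join(setup_commands(run)))
        print()
```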

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/Submits/submit_full_HbondAbs_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noIndOnehot/Submits/submit_full_noIndOnehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noDepOnehot/Submits/submit_full_noDepOnehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot/Submits/submit_full_noOnehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/Submits/submit_full_noHL_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHbond/Submits/submit_full_noHbond_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/Submits/submit_train_HbondAbs_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noIndOnehot/Submits/submit_train_noIndOnehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noDepOnehot/Submits/submit_train_noDepOnehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot/Submits/submit_train_noOnehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/Submits/submit_train_noHL_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHbond/Submits/submit_train_noHbond_0.sh
# once the train submissions are done, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/Submits/submit_test_HbondAbs_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noIndOnehot/Submits/submit_test_noIndOnehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noDepOnehot/Submits/submit_test_noDepOnehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot/Submits/submit_test_noOnehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/Submits/submit_test_noHL_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHbond/Submits/submit_test_noHbond_0.sh
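The submit scripts follow one naming pattern, `<run dir>/Submits/submit_<phase>_<run name>_0.sh`, so the 18 `bsub` lines above can be generated. A sketch that keeps the phases separate because the test jobs are only submitted after the train jobs finish (no scheduler dependency is set here); `submit_script` and `bsub_lines` are hypothetical helpers.

```python
RUN_ROOT = ("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/"
            "iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs")
RUNS = ["HbondAbs", "noIndOnehot", "noDepOnehot", "noOnehot", "noHL", "noHbond"]


def submit_script(run_name, phase):
    """Path of the generated submit script for one run and phase (full/train/test)."""
    run_dir = RUN_ROOT if run_name == "HbondAbs" else f"{RUN_ROOT}/{run_name}"
    return f"{run_dir}/Submits/submit_{phase}_{run_name}_0.sh"


def bsub_lines(phase):
    """One bsub command per run for the given phase."""
    return [f"bsub {submit_script(run, phase)}" for run in RUNS]


if __name__ == "__main__":
    # full and train can be submitted together; test is printed for later use
    for phase in ("full", "train"):
        print("\n".join(bsub_lines(phase)))
    print("# after the train jobs finish:")
    print("\n".join(bsub_lines("test")))
```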

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt HbondAbs
# R2 = 0.2593011912468103
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_HL.gab_evraw    cut.score   0.04074407176356848
# p20bp_LUMO_evraw  cut.score   0.02982532979067789
# V231.xsgRNA.raw   cut.score   0.02675069834247342
# V303.xsgRNA.raw   cut.score   0.023207693772593792
# sgRNA.gcsgRNA.raw cut.score   0.02196850968561247
# pam.distance0 cut.score   0.020873562612804073
# p20bp_HOMO_evraw  cut.score   0.02087113775865276
# sgRNA.tempsgRNA.raw   cut.score   0.020637348372202036
# p20bp_bondraw cut.score   0.01933946609162686
# p18HOMO_eVraw cut.score   0.019013992133558383

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noIndOnehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt noIndOnehot
# R2 = 0.25904726835907205
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_LUMO_evraw  cut.score   0.03740568648026862
# p20bp_HL.gab_evraw    cut.score   0.03407397259759679
# V231.xsgRNA.raw   cut.score   0.028626303338179237
# sgRNA.tempsgRNA.raw   cut.score   0.02773197638507984
# sgRNA.gcsgRNA.raw cut.score   0.026339820424743258
# pam.distance0 cut.score   0.023934353850656814
# V303.xsgRNA.raw   cut.score   0.023365743061858745
# p20bp_bondraw cut.score   0.022009149376761586
# sgRNA.structuresgRNA.raw  cut.score   0.020812371699668097
# p20bp_HOMO_evraw  cut.score   0.020086001516374175

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noDepOnehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt noDepOnehot
# R2 = 0.25357812652866174
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_LUMO_evraw  cut.score   0.04684960167834949
# p15dimer_H_bondraw    cut.score   0.044751944753825774
# p20bp_HL.gab_evraw    cut.score   0.03758546845367489
# sgRNA.tempsgRNA.raw   cut.score   0.03571010401806396
# sgRNA.gcsgRNA.raw cut.score   0.035098526732744245
# CCsgRNA.raw   cut.score   0.03137770268065862
# p18dimer_H_bondraw    cut.score   0.02847774217987957
# p18LUMO_eVraw cut.score   0.02699199481644375
# TsgRNA.raw    cut.score   0.023972177699285294
# p19dimer_H_bondraw    cut.score   0.023924804668336123

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt noOnehot
# R2 = 0.25388901215160736
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p15dimer_H_bondraw    cut.score   0.04844827160615669
# p20bp_HL.gab_evraw    cut.score   0.04557467495597135
# sgRNA.gcsgRNA.raw cut.score   0.04251073923388409
# sgRNA.tempsgRNA.raw   cut.score   0.04228681946523368
# p20bp_LUMO_evraw  cut.score   0.041264861464619725
# p18dimer_H_bondraw    cut.score   0.030493792411302015
# p18LUMO_eVraw cut.score   0.028655140236086465
# pam.distance0 cut.score   0.02475168709528717
# p18HOMO_eVraw cut.score   0.024570167196268682
# p19dimer_H_bondraw    cut.score   0.023367997635470598

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt noHL
# R2 = 0.2599369980927068
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_bondraw cut.score   0.10832436968998468
# V231.xsgRNA.raw   cut.score   0.026747712483184086
# V303.xsgRNA.raw   cut.score   0.02271752579589592
# V76.xsgRNA.raw    cut.score   0.021235582903126448
# sgRNA.tempsgRNA.raw   cut.score   0.020746628383162113
# pam.distance0 cut.score   0.020692481727169403
# p18No_electronsraw    cut.score   0.020667635563513885
# V1110.xsgRNA.raw  cut.score   0.018462048154277343
# sgRNA.gcsgRNA.raw cut.score   0.018359721190801315
# CCsgRNA.raw   cut.score   0.01809352397032094

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHbond
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt noHbond
# R2 = 0.25837914211669216
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_LUMO_evraw  cut.score   0.04205003555911055
# p20bp_HOMO_evraw  cut.score   0.03785388358245961
# V231.xsgRNA.raw   cut.score   0.031536205225035786
# p20bp_HL.gab_evraw    cut.score   0.03122862333339956
# V303.xsgRNA.raw   cut.score   0.024179787748844286
# sgRNA.gcsgRNA.raw cut.score   0.022325151165478587
# sgRNA.tempsgRNA.raw   cut.score   0.022005907003537557
# pam.distance0 cut.score   0.021674161559039572
# GGsgRNA.raw   cut.score   0.0211353898413249
# CCsgRNA.raw   cut.score   0.020240876928700906
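Reading the six top-10 lists side by side, only `sgRNA.gcsgRNA.raw` and `sgRNA.tempsgRNA.raw` appear in every run. A sketch that tallies this, with the feature names copied from the outputs above; `feature_frequency` and `in_all_runs` are hypothetical helpers.

```python
from collections import Counter

# Top-10 cut.score features per run, copied from the topVarEdges outputs above.
TOP10 = {
    "HbondAbs": ["p20bp_HL.gab_evraw", "p20bp_LUMO_evraw", "V231.xsgRNA.raw",
                 "V303.xsgRNA.raw", "sgRNA.gcsgRNA.raw", "pam.distance0",
                 "p20bp_HOMO_evraw", "sgRNA.tempsgRNA.raw", "p20bp_bondraw",
                 "p18HOMO_eVraw"],
    "noIndOnehot": ["p20bp_LUMO_evraw", "p20bp_HL.gab_evraw", "V231.xsgRNA.raw",
                    "sgRNA.tempsgRNA.raw", "sgRNA.gcsgRNA.raw", "pam.distance0",
                    "V303.xsgRNA.raw", "p20bp_bondraw", "sgRNA.structuresgRNA.raw",
                    "p20bp_HOMO_evraw"],
    "noDepOnehot": ["p20bp_LUMO_evraw", "p15dimer_H_bondraw", "p20bp_HL.gab_evraw",
                    "sgRNA.tempsgRNA.raw", "sgRNA.gcsgRNA.raw", "CCsgRNA.raw",
                    "p18dimer_H_bondraw", "p18LUMO_eVraw", "TsgRNA.raw",
                    "p19dimer_H_bondraw"],
    "noOnehot": ["p15dimer_H_bondraw", "p20bp_HL.gab_evraw", "sgRNA.gcsgRNA.raw",
                 "sgRNA.tempsgRNA.raw", "p20bp_LUMO_evraw", "p18dimer_H_bondraw",
                 "p18LUMO_eVraw", "pam.distance0", "p18HOMO_eVraw",
                 "p19dimer_H_bondraw"],
    "noHL": ["p20bp_bondraw", "V231.xsgRNA.raw", "V303.xsgRNA.raw", "V76.xsgRNA.raw",
             "sgRNA.tempsgRNA.raw", "pam.distance0", "p18No_electronsraw",
             "V1110.xsgRNA.raw", "sgRNA.gcsgRNA.raw", "CCsgRNA.raw"],
    "noHbond": ["p20bp_LUMO_evraw", "p20bp_HOMO_evraw", "V231.xsgRNA.raw",
                "p20bp_HL.gab_evraw", "V303.xsgRNA.raw", "sgRNA.gcsgRNA.raw",
                "sgRNA.tempsgRNA.raw", "pam.distance0", "GGsgRNA.raw", "CCsgRNA.raw"],
}


def feature_frequency(top_lists):
    """Number of runs in which each feature reaches the top-10 list."""
    return Counter(name for names in top_lists.values() for name in set(names))


def in_all_runs(top_lists):
    """Sorted features present in every run's top-10 list."""
    return sorted(set.intersection(*(set(names) for names in top_lists.values())))
```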



# Pearson correlation (fold9/Set4 test predictions vs. observed cut.score)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("HbondAbs_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5010311

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noIndOnehot/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("noIndOnehot_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5013474

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noDepOnehot/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("noDepOnehot_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.49789

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("noOnehot_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4986254

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("noHL_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5015522

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHbond/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("noHbond_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4995839
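Collecting the two metrics recorded above per run (R2 from `iRF_postProcessing.py`, Pearson r from the fold9/Set4 test set) makes the comparison easier to read: every ablation is within about 0.006 R2 and 0.004 r of the full HbondAbs run, and noHL is highest on both. Because r comes from a single test set, differences this small may not exceed run-to-run variation; checking that would need repeated runs. The numbers are copied from the notes; `deltas` is a hypothetical helper.

```python
# (R2, Pearson r) per run, copied from the outputs above.
METRICS = {
    "HbondAbs": (0.2593011912468103, 0.5010311),
    "noIndOnehot": (0.25904726835907205, 0.5013474),
    "noDepOnehot": (0.25357812652866174, 0.49789),
    "noOnehot": (0.25388901215160736, 0.4986254),
    "noHL": (0.2599369980927068, 0.5015522),
    "noHbond": (0.25837914211669216, 0.4995839),
}


def deltas(metrics, baseline="HbondAbs"):
    """(delta R2, delta r) of each run relative to the baseline run."""
    base_r2, base_r = metrics[baseline]
    return {
        name: (r2 - base_r2, r - base_r)
        for name, (r2, r) in metrics.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, (d_r2, d_r) in sorted(deltas(METRICS).items(), key=lambda kv: kv[1][0]):
        print(f"{name:12s} dR2={d_r2:+.4f} dr={d_r:+.4f}")
```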
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score HbondAbs
###### same issue as with full matrix in that I can only look at the top 5 features... (instead of the top 20)

sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/RIT.HbondAbs.run
# p20bp_HL.gab_evraw    cut.score   0.04074407176356848 0.06540570583232581 14892.224   23.47789509039808
# p20bp_LUMO_evraw  cut.score   0.02982532979067789 0.06852932986603244 10885.892   23.24410559501383
# V231.xsgRNA.raw   cut.score   0.02675069834247342 -0.052429449724413546   17877.631   22.94429495512176
# V303.xsgRNA.raw   cut.score   0.023207693772593792    0.05858117313590401 14956.815   21.52922058776297
# sgRNA.gcsgRNA.raw cut.score   0.02196850968561247 -0.014567245088641557   19886.711   26.422310901621934
# pam.distance0 cut.score   0.020873562612804073    -0.0191047257820114 10651.117   27.065113245420157
# p20bp_HOMO_evraw  cut.score   0.02087113775865276 -0.06852829159673891    7607.984    26.573739935276844
# sgRNA.tempsgRNA.raw   cut.score   0.020637348372202036    -0.013810720625434906   18662.106   26.340266985358248
# p20bp_bondraw cut.score   0.01933946609162686 -0.06525044005698032    7081.9  26.33594806547877
# p18HOMO_eVraw cut.score   0.019013992133558383    -0.0693252895131309 14475.231   25.420233315501324
##### same output as before (with H bond eV) just with opposite direction of effect since we are looking at the abs value for H bond strength
##### + efficiency --> + 20bp HL gap (+ 20bp LUMO, - 20bp HOMO), - 20bp H bond strength, - p15 CC, + p19 GC, - GC content (- Tm), - PAM distance
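The `+`/`-` summaries in these notes follow the sign of the fourth column of the RIT output (e.g. `p20bp_HL.gab_evraw` 0.065 gives +, `p20bp_bondraw` -0.065 gives -). Treating that column as a signed association with cut.score is an inference from the notes, not documented behavior of `runRIT.sh`. A sketch that produces the same summary from the raw rows; `direction_summary` is hypothetical.

```python
def direction_summary(lines, sign_col=3):
    """Label each RIT row '+feature' or '-feature' by the sign of one column.

    Assumes whitespace-separated rows (feature, target, score, signed value, ...)
    with the signed association at column index 3; rows that are too short are
    skipped, and an exact zero is labelled '0'.
    """
    labels = []
    for line in lines:
        fields = line.split()
        if len(fields) <= sign_col:
            continue
        value = float(fields[sign_col])
        sign = "+" if value > 0 else "-" if value < 0 else "0"
        labels.append(sign + fields[0])
    return labels
```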

sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/RIT.noIndOnehot.run
# p20bp_LUMO_evraw  cut.score   0.03740568648026862 0.06678493661923375 13354.44    23.294747824975627
# p20bp_HL.gab_evraw    cut.score   0.03407397259759679 0.06560206806550278 12140.4 23.451527654635452
# V231.xsgRNA.raw   cut.score   0.028626303338179237    -0.037431182552715485   19044.338   23.12743173068614
# sgRNA.tempsgRNA.raw   cut.score   0.02773197638507984 -0.013487715939844616   23607.278   26.061409782987504
# sgRNA.gcsgRNA.raw cut.score   0.026339820424743258    -0.012468520720419561   22618.613   26.04062919937365
# pam.distance0 cut.score   0.023934353850656814    -0.009123565254675696   11830.245   26.833122718642414
# V303.xsgRNA.raw   cut.score   0.023365743061858745    0.054219557800637495    14739.697   21.49704621067919
# p20bp_bondraw cut.score   0.022009149376761586    -0.0666407473201099 7810.324    26.414363932046037
# sgRNA.structuresgRNA.raw  cut.score   0.020812371699668097    -0.014624154257627253   12291.255   27.70968379573303
# p20bp_HOMO_evraw  cut.score   0.020086001516374175    -0.0646871973097071 7162.836    26.224404044272184
##### top features don't change

sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/RIT.noDepOnehot.run
# p20bp_LUMO_evraw  cut.score   0.04684960167834949 0.06559633254725446 15421.92    22.86735058327459
# p15dimer_H_bondraw    cut.score   0.044751944753825774    -0.018249689131871152   29467.565   25.310882754443824
# p20bp_HL.gab_evraw    cut.score   0.03758546845367489 0.06625957223659626 12294.858   23.010907788980344
# sgRNA.tempsgRNA.raw   cut.score   0.03571010401806396 -0.009117342321789484   26681.361   26.10256342430109
# sgRNA.gcsgRNA.raw cut.score   0.035098526732744245    -0.01012755071121127    26207.784   26.111151846881516
# CCsgRNA.raw   cut.score   0.03137770268065862 -0.03907716741468688    21518.278   25.3623582177715
# p18dimer_H_bondraw    cut.score   0.02847774217987957 -0.02571747651136996    18124.51    26.037486328442327
# p18LUMO_eVraw cut.score   0.02699199481644375 -0.053745338230437735   18726.339   25.41774729858125
# TsgRNA.raw    cut.score   0.023972177699285294    0.020293282462238597    11774.273   24.643862616300428
# p19dimer_H_bondraw    cut.score   0.023924804668336123    -0.019326338172870678   14164.44    23.90551517067496
##### removing the dependent binary encoding brings more H bond strength features (dimers) into the top features: positions 15, 18, 19, all anti-correlated with efficiency (stronger bonds, lower efficiency --> unwinding??)

sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/RIT.noOnehot.run
# p15dimer_H_bondraw    cut.score   0.04844827160615669 -0.01626686757785544    32700.244   25.367999127802303
# p20bp_HL.gab_evraw    cut.score   0.04557467495597135 0.06095579480035842 14733.536   22.966227259031594
# sgRNA.gcsgRNA.raw cut.score   0.04251073923388409 -0.008538980282246797   30988.33    25.830152377698347
# sgRNA.tempsgRNA.raw   cut.score   0.04228681946523368 -0.008612329070563751   30935.114   25.858476132502098
# p20bp_LUMO_evraw  cut.score   0.041264861464619725    0.06124963455515558 13341.517   22.94242827344009
# p18dimer_H_bondraw    cut.score   0.030493792411302015    -0.025126675974208045   19517.9 26.014219788816973
# p18LUMO_eVraw cut.score   0.028655140236086465    -0.04778019133430412    19304.225   25.297925346523744
# pam.distance0 cut.score   0.02475168709528717 -0.008897994974411958   11194.564   26.41533583531853
# p18HOMO_eVraw cut.score   0.024570167196268682    -0.04603703700494443    16596.839   25.365968488547143
# p19dimer_H_bondraw    cut.score   0.023367997635470598    -0.013995866371895512   13912.626   23.82128624830395
##### BUT when you remove the binary encoding you slightly decrease predictability... R2 = 0.254 (instead of 0.259) and Pearson correlation = 0.499 (instead of 0.501)

sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/RIT.noHL.run
# p20bp_bondraw cut.score   0.10832436968998468 -0.06371517105691962    40468.0 26.53584211699757
# V231.xsgRNA.raw   cut.score   0.026747712483184086    -0.04016628745063616    18083.387   22.91134775462232
# V303.xsgRNA.raw   cut.score   0.02271752579589592 0.03206212868359705 15138.146   21.29942426339952
# V76.xsgRNA.raw    cut.score   0.021235582903126448    0.02541373469402145 12825.977   20.884984161074318
# sgRNA.tempsgRNA.raw   cut.score   0.020746628383162113    -0.012244974560426967   19607.456   26.382438091132016
# pam.distance0 cut.score   0.020692481727169403    -0.016393207865661823   11038.427   27.125380443657956
# p18No_electronsraw    cut.score   0.020667635563513885    -0.044766067617166456   16715.325   26.341873200928912
# V1110.xsgRNA.raw  cut.score   0.018462048154277343    -0.04571418673448329    14411.222   27.417386748509827
# sgRNA.gcsgRNA.raw cut.score   0.018359721190801315    -0.010406803699713921   17438.614   26.531284668885103
# CCsgRNA.raw   cut.score   0.01809352397032094 -0.04048452666186056    14459.546   25.373397321161697
##### removing HL values maintains performance: R2 = 0.260 (vs. 0.259) and Pearson r = 0.5016 (vs. 0.5010)
##### + efficiency --> - 20bp H bond strength, - p15 CC, + p19 GC, + p19 T (???? how can it be both p19 GC and p19 T? also, articles claim Cas9 loading is best with G and A in the last 4 bp of the sgRNA ?????), - Tm, - PAM distance

# BuildLineTime:  0.006212472915649414
# ProcessLineTime:  0.0004608631134033203
# Feature:  p20bp_bondraw  time:  0.015341043472290039
# ['V231.xsgRNA.raw'] p20bp_bondraw
# Traceback (most recent call last):
#   File "/gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/ritEval.py", line 126, in <module>
#     vals+=ConImpSet(pathInfo,pathLoc,featList)
#   File "/gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/RIThelper.py", line 841, in ConImpSet
#     ConImp.append(((result[0][0]/result[0][1])-(impTemp[0]-result[0][0])/(impTemp[1]-result[0][1]))*result[0][1])
# ZeroDivisionError: division by zero
############## --> DOES THIS MEAN THAT p20bp_bondraw AND V231.xsgRNA.raw ALWAYS APPEAR TOGETHER???? <-- ##############
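The `ZeroDivisionError` comes from the expression on line 841 of `RIThelper.py`. Below is a minimal Python restatement, assuming (an interpretation, not checked against the helper's source) that `result[0]` holds the summed importance and path count for paths containing the feature pair, and `impTemp` holds the same totals over all paths considered. Under that reading the division fails in two cases: when the pair never occurs (`result[0][1] == 0`), or when every counted path contains the pair (`impTemp[1] == result[0][1]`). The traceback alone does not tell these apart. The function and argument names are hypothetical.

```python
from typing import Optional


def conditional_importance(imp_with: float, n_with: int,
                           imp_total: float, n_total: int) -> Optional[float]:
    """Restatement of the ConImpSet expression (hypothetical names).

    imp_with, n_with   -> assumed to be result[0][0], result[0][1]
    imp_total, n_total -> assumed to be impTemp[0], impTemp[1]
    Returns None wherever the original expression would divide by zero.
    """
    if n_with == 0:
        # The feature set never occurs on a counted path.
        return None
    n_without = n_total - n_with
    if n_without == 0:
        # Every counted path contains the feature set ("always together"),
        # so the mean importance *without* the set is undefined.
        return None
    mean_with = imp_with / n_with
    mean_without = (imp_total - imp_with) / n_without
    return (mean_with - mean_without) * n_with


print(conditional_importance(9.0, 3, 11.0, 5))  # 6.0
print(conditional_importance(4.0, 2, 4.0, 2))   # None: pair present on every path
```

A guard like this would let the evaluation continue and report the problematic pairs. Whether the second case is the one hit here would need a check of the actual counts.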


sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/RIT.noHbond.run
# p20bp_LUMO_evraw  cut.score   0.04205003555911055 0.08353960703131265 15337.372   23.485361950448553
# p20bp_HOMO_evraw  cut.score   0.03785388358245961 -0.08072346303280459    13759.12    26.00979104332283
# V231.xsgRNA.raw   cut.score   0.031536205225035786    -0.07748569309326722    23376.948   24.034224854970773
# p20bp_HL.gab_evraw    cut.score   0.03122862333339956 0.08134330387171265 11371.508   23.528872705952985
# V303.xsgRNA.raw   cut.score   0.024179787748844286    0.08807981666533397 15380.933   21.956827949221033
# sgRNA.gcsgRNA.raw cut.score   0.022325151165478587    -0.024106490691227642   20311.568   26.244890774253403
# sgRNA.tempsgRNA.raw   cut.score   0.022005907003537557    -0.023982217046430908   20150.625   26.334414158992356
# pam.distance0 cut.score   0.021674161559039572    -0.03126214752727416    10974.685   26.83810830776713
# GGsgRNA.raw   cut.score   0.0211353898413249  -0.05595029772816261    14994.22    25.551588097444174
# CCsgRNA.raw   cut.score   0.020240876928700906    -0.05637778181358408    16701.448   25.492536106674574
##### removing H-bond values slightly decreases the Pearson correlation (0.499 instead of 0.501)
##### + efficiency --> 20bp +LUMO -HOMO +HLgap, - p15 CC, + p19 GC, - GC content, - Tm, - PAM distance, -GG & -CC (presence of NGG PAM sites in seq)

--> minus one-hot & minus HL

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6072,6074:6078,6080,6082:6176)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

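# take the absolute value of every H-bond ("bondraw") column, then recombine with the remaining columns
df.abs <- df.all %>% select(grep("bondraw", names(df.all))) %>% abs()
df.all.sub <- df.all %>% select(-grep("bondraw", names(df.all)))
df.abs.all <- cbind(df.all.sub, df.abs)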
# 6173 columns --> 6171 features

df.minusIndOnehot <- df.abs.all[,c(1,2,18:160,166:ncol(df.abs.all))]
df.minusOnehot <- df.minusIndOnehot %>% select(-starts_with("V")) 
df.minusOnehot.minusHL <- df.minusOnehot %>% select(-grep("HL", names(df.minusOnehot)), -grep("HOMO", names(df.minusOnehot)), -grep("LUMO", names(df.minusOnehot))) 
# 87 features
df.Hbond <- df.abs.all %>% select(grep("bondraw", names(df.abs.all))) 
df.onlyHbond <- cbind(df.abs.all[,1:2], df.Hbond)
# 39 features

write.table(df.minusOnehot.minusHL, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.noHL.txt", quote=F, row.names=F, sep="\t")
write.table(df.onlyHbond, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.onlyHbond.txt", quote=F, row.names=F, sep="\t")

write.table(df.minusOnehot.minusHL[,c(1,3:ncol(df.minusOnehot.minusHL))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.noHL.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusOnehot.minusHL[,c(1,3:ncol(df.minusOnehot.minusHL))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.noHL.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusOnehot.minusHL[,3:ncol(df.minusOnehot.minusHL)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.noHL.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.onlyHbond[,c(1,3:ncol(df.onlyHbond))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.onlyHbond.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.onlyHbond[,c(1,3:ncol(df.onlyHbond))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.onlyHbond.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.onlyHbond[,3:ncol(df.onlyHbond)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.onlyHbond.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
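The name-based part of the column filtering above can be sanity-checked outside R. The steps are: drop the `V*` one-hot columns, then drop anything matching `HL`, `HOMO` or `LUMO`. One subtlety: dplyr's `starts_with()` ignores case by default, while `grep()` is case-sensitive. The minimal Python sketch below mirrors both behaviours on a list of header names. The example names are illustrative, and the positional column ranges (`18:160`, `166:ncol`) are not reproduced.

```python
from typing import List

# Substrings removed by the grep() calls in the R step (case-sensitive).
DROP_SUBSTRINGS = ("HL", "HOMO", "LUMO")


def keep_features(columns: List[str]) -> List[str]:
    """Drop one-hot columns (names starting with 'V', matched case-insensitively
    like dplyr::starts_with) and any column whose name contains HL, HOMO or LUMO."""
    kept = []
    for name in columns:
        if name.lower().startswith("v"):
            continue
        if any(s in name for s in DROP_SUBSTRINGS):
            continue
        kept.append(name)
    return kept


cols = ["sgRNAID", "cut.score", "V76.xsgRNA.raw", "p20bp_HL.gab_evraw",
        "p20bp_LUMO_evraw", "p20bp_bondraw", "sgRNA.gcsgRNA.raw"]
print(keep_features(cols))
# ['sgRNAID', 'cut.score', 'p20bp_bondraw', 'sgRNA.gcsgRNA.raw']
```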


# Andes
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs
mkdir noOnehot.noHL
cd noOnehot.noHL
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName noOnehot.noHL --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noOnehot.noHL.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs
mkdir onlyHbond
cd onlyHbond
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName onlyHbond --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.onlyHbond.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot.noHL/Submits/submit_full_noOnehot.noHL_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/onlyHbond/Submits/submit_full_onlyHbond_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot.noHL/Submits/submit_train_noOnehot.noHL_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/onlyHbond/Submits/submit_train_onlyHbond_0.sh
# once the train jobs have finished, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot.noHL/Submits/submit_test_noOnehot.noHL_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/onlyHbond/Submits/submit_test_onlyHbond_0.sh

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot.noHL
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt noOnehot.noHL
# 0.24799083517400664
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_bondraw cut.score   0.12443999734254331
# p19No_electronsraw    cut.score   0.06504520534831205
# p19dimer_H_bondraw    cut.score   0.05707600239485058
# p15dimer_H_bondraw    cut.score   0.054043146748815385
# p18dimer_H_bondraw    cut.score   0.050200639795243515
# sgRNA.tempsgRNA.raw   cut.score   0.0494014094372015
# sgRNA.gcsgRNA.raw cut.score   0.04800084409206308
# p18No_electronsraw    cut.score   0.03985453633191877
# p17dimer_dipoleraw    cut.score   0.03811049033126881
# sgRNA.structuresgRNA.raw  cut.score   0.029678402148282976

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/onlyHbond
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt onlyHbond
# 0.180958939451897
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_bondraw cut.score   0.1477944544885238
# p19dimer_H_bondraw    cut.score   0.08678872314376257
# p15dimer_H_bondraw    cut.score   0.08297400138624117
# p18dimer_H_bondraw    cut.score   0.07182739536386486
# p16dimer_H_bondraw    cut.score   0.05751856816469617
# p17dimer_H_bondraw    cut.score   0.0549216487126984
# p13dimer_H_bondraw    cut.score   0.053662650576422784
# p14dimer_H_bondraw    cut.score   0.04429202906856266
# p18bp_bondraw cut.score   0.044110755243141446
# p1dimer_H_bondraw cut.score   0.040268650732535265

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot.noHL/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("noOnehot.noHL_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4954905

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/onlyHbond/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("onlyHbond_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4367462
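These notes track two different accuracy numbers. `cor()` gives Pearson's r between observed and predicted scores. The post-processing R2 is presumably the coefficient of determination, 1 - SS_res/SS_tot (an assumption, since the iRF script's exact definition is not shown here). On the same predictions, R2 computed this way cannot exceed r², because the best linear recalibration of the predictions already achieves r². So an r near 0.5 goes with an R2 of roughly 0.25 or less when both are computed on the same data. A stdlib-only sketch with toy numbers:

```python
import math
from typing import Sequence


def pearson_r(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Pearson correlation between observed and predicted values."""
    n = len(y)
    my, mp = sum(y) / n, sum(yhat) / n
    cov = sum((a - my) * (b - mp) for a, b in zip(y, yhat))
    sy = math.sqrt(sum((a - my) ** 2 for a in y))
    sp = math.sqrt(sum((b - mp) ** 2 for b in yhat))
    return cov / (sy * sp)


def r_squared(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


y = [10.0, 20.0, 30.0, 40.0]
yhat = [18.0, 22.0, 26.0, 30.0]  # perfectly correlated, but shrunk toward the mean
r = pearson_r(y, yhat)
print(round(r, 3), round(r * r, 3), round(r_squared(y, yhat), 3))  # 1.0 1.0 0.632
```

The toy case shows the gap at its extreme: predictions that rank perfectly but are compressed toward the mean have r = 1 yet R2 well below 1.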

RIT.noOnehot.noHL.run
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noOnehot.noHL/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score noOnehot.noHL

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/RIT.noOnehot.noHL.run

# p20bp_bondraw cut.score   0.12443999734254331 -0.06086493294229018    39154.739   27.211546401266805
# p19No_electronsraw    cut.score   0.06504520534831205 0.008923883429441068    38520.807   23.99076976022741
# p19dimer_H_bondraw    cut.score   0.05707600239485058 -0.010939272215614184   33145.098   24.208864724972894
# p15dimer_H_bondraw    cut.score   0.054043146748815385    -0.01362492907054232    35588.614   25.712658675366907
# p18dimer_H_bondraw    cut.score   0.050200639795243515    -0.03100595933795214    31255.543   26.21521166674126
# sgRNA.tempsgRNA.raw   cut.score   0.0494014094372015  -0.013212082083621093   35564.463   25.97715494479864
# sgRNA.gcsgRNA.raw cut.score   0.04800084409206308 -0.010016722735733482   34283.919   25.90971917092089
# p18No_electronsraw    cut.score   0.03985453633191877 -0.025330903847284533   26521.346   26.413815379527346
# p17dimer_dipoleraw    cut.score   0.03811049033126881 -0.025843533243601967   24057.005   25.45356655231652
# sgRNA.structuresgRNA.raw  cut.score   0.029678402148282976    -0.01734664102468826    14977.634   26.878166015742522

RIT.onlyHbond.run
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/onlyHbond/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score onlyHbond

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/RIT.onlyHbond.run

# p20bp_bondraw cut.score   0.1477944544885238  -0.07803761940100769    38934.887   29.546222754005143
# p19dimer_H_bondraw    cut.score   0.08678872314376257 -0.005703923004214408   50515.812   24.50135206711675
# p15dimer_H_bondraw    cut.score   0.08297400138624117 -0.01968899560256148    57045.193   26.063906523542002
# p18dimer_H_bondraw    cut.score   0.07182739536386486 -0.03125822610930675    50404.83    26.76022473629682
# p16dimer_H_bondraw    cut.score   0.05751856816469617 -0.003027973505231765   37458.294   25.396221121581732
# p17dimer_H_bondraw    cut.score   0.0549216487126984  0.010782588578066393    38636.846   23.965176687226734
# p13dimer_H_bondraw    cut.score   0.053662650576422784    -0.033627563302297506   41136.794   26.82103639543603
# p14dimer_H_bondraw    cut.score   0.04429202906856266 -0.02322576289984385    32772.794   26.651173019002712
# p18bp_bondraw cut.score   0.044110755243141446    -0.05198946284229877    28658.845   27.883558122868152
# p1dimer_H_bondraw cut.score   0.040268650732535265    -0.012672799266200033   23382.961   26.305437169935242
noHL output
#### look at the raw .rit files
library(dplyr)
library(tidyr)
library(reshape2)
library(ggplot2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/cut.score")
rit <- read.delim("noHL_cut.score.rit", header=F, sep="\t")
key <- read.delim("noHL_cut.score.paths.key.out", header=F, sep=",")

colnames(key) <- c("feature", "feature.key")
colnames(rit) <- c("rit.value", "rit.features")
rit.id <- separate(rit, "rit.features", c("feature1.key", "feature2.key"))
rit.id$feature1.key <- as.numeric(rit.id$feature1.key)
rit.id$feature2.key <- as.numeric(rit.id$feature2.key)

key.1 <- key
colnames(key.1) <- c("feature1", "feature1.key")
key.2 <- key
colnames(key.2) <- c("feature2", "feature2.key")
rit.feature1.key <- left_join(rit.id, key.1, by=c("feature1.key"))
rit.key <- inner_join(rit.feature1.key, key.2, by=c("feature2.key"))
write.table(rit.key, "noHL_cut.score.rit_IDdefined.txt", quote=F, row.names=F, sep="\t")
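The joins above map the numeric keys in the `.rit` file back to feature names. Below is the same lookup in plain Python. It assumes each `.rit` line holds a value and a key pair such as `12_45`, separated by a tab, and each key-file line holds `feature,key`. These formats are inferred from the `read.delim` calls above, not checked against the files. Like tidyr's `separate()`, the sketch splits the key pair on any non-alphanumeric run. Unlike the R version, which uses `left_join` for the first feature and `inner_join` for the second, it drops a row if either key is missing.

```python
import re
from typing import Dict, Iterable, List, Tuple


def load_key(lines: Iterable[str]) -> Dict[int, str]:
    """Parse 'feature,key' lines into {key: feature}."""
    key: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        feature, k = line.rsplit(",", 1)
        key[int(k)] = feature
    return key


def decode_rit(lines: Iterable[str], key: Dict[int, str]) -> List[Tuple[float, str, str]]:
    """Turn 'value<TAB>k1_k2' lines into (value, feature1, feature2) rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value, pair = line.split("\t", 1)
        parts = [p for p in re.split(r"[^0-9A-Za-z]+", pair) if p]
        if len(parts) < 2:
            continue  # single-feature entries are skipped in this sketch
        k1, k2 = int(parts[0]), int(parts[1])
        if k1 in key and k2 in key:
            rows.append((float(value), key[k1], key[k2]))
    return rows


key = load_key(["p20bp_bondraw,1", "V231.xsgRNA.raw,2"])
print(decode_rit(["0.42\t1_2", "0.10\t1_9"], key))
# [(0.42, 'p20bp_bondraw', 'V231.xsgRNA.raw')]
```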

--> to understand the p19.T and p19.GC effects, split the feature matrix into guides with and without p19.T, run iRF on each set, and do the same for p19.GC

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.txt", header=T, sep="\t")
df.p19Tpresent <- subset(df, df$V76.xsgRNA.raw == 1)
df.p19Tabsent <- subset(df, df$V76.xsgRNA.raw == 0)
df.p19GCpresent <- subset(df, df$V303.xsgRNA.raw == 1)
df.p19GCabsent <- subset(df, df$V303.xsgRNA.raw == 0)

nrow(df.p19Tpresent)
# [1] 9965
nrow(df.p19Tabsent)
# [1] 30503
nrow(df.p19GCpresent)
# [1] 3554
nrow(df.p19GCabsent)
# [1] 36914

summary(df.p19Tpresent$cut.score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00   20.58   28.68   26.22   33.43   47.63 
summary(df.p19Tabsent$cut.score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00   16.21   26.59   24.02   32.39   48.38 
summary(df.p19GCpresent$cut.score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00   20.80   28.62   26.25   33.23   47.86 
summary(df.p19GCabsent$cut.score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00   16.92   27.02   24.40   32.63   48.38
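The present/absent split and the `summary()` calls above can be reproduced with Python's standard library. `statistics.quantiles(..., method="inclusive")` uses the same linear interpolation as R's default (type 7) quantiles, so the quartiles should agree with `summary()` up to rounding. As a consistency check, each split in the notes adds back to the full set: 9965 + 30503 = 3554 + 36914 = 40468. The sketch assumes rows held as dicts with `cut.score` and a 0/1 indicator column, and the scores are toy values.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def r_style_summary(values: Sequence[float]) -> Dict[str, float]:
    """Min, quartiles, median, mean and max; quartiles match R's type-7 default."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"Min": min(values), "1st Qu.": q1, "Median": med,
            "Mean": statistics.fmean(values), "3rd Qu.": q3, "Max": max(values)}


def split_by_indicator(rows: List[dict], column: str) -> Tuple[List[dict], List[dict]]:
    """Split rows on a 0/1 one-hot column (e.g. 'V76.xsgRNA.raw' for p19 T)."""
    present = [r for r in rows if r[column] == 1]
    absent = [r for r in rows if r[column] == 0]
    return present, absent


rows = [{"V76.xsgRNA.raw": 1, "cut.score": s} for s in (10.0, 20.0, 30.0, 40.0, 50.0)]
rows += [{"V76.xsgRNA.raw": 0, "cut.score": s} for s in (5.0, 15.0)]
present, absent = split_by_indicator(rows, "V76.xsgRNA.raw")
print(len(present), len(absent))  # 5 2
print(r_style_summary([r["cut.score"] for r in present]))
```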
   
   
write.table(df.p19Tpresent[,c(1,3:ncol(df.p19Tpresent))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tpresent.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19Tpresent[,c(1,3:ncol(df.p19Tpresent))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tpresent.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19Tpresent[,3:ncol(df.p19Tpresent)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tpresent.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19Tpresent[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tpresent.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19Tpresent[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tpresent.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.p19Tpresent[,2]), "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tpresent.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.p19Tabsent[,c(1,3:ncol(df.p19Tabsent))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tabsent.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19Tabsent[,c(1,3:ncol(df.p19Tabsent))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tabsent.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19Tabsent[,3:ncol(df.p19Tabsent)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tabsent.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19Tabsent[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tabsent.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19Tabsent[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tabsent.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.p19Tabsent[,2]), "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tabsent.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.p19GCpresent[,c(1,3:ncol(df.p19GCpresent))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCpresent.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19GCpresent[,c(1,3:ncol(df.p19GCpresent))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCpresent.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19GCpresent[,3:ncol(df.p19GCpresent)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCpresent.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19GCpresent[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCpresent.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19GCpresent[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCpresent.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.p19GCpresent[,2]), "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCpresent.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.p19GCabsent[,c(1,3:ncol(df.p19GCabsent))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCabsent.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19GCabsent[,c(1,3:ncol(df.p19GCabsent))], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCabsent.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19GCabsent[,3:ncol(df.p19GCabsent)], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCabsent.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19GCabsent[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCabsent.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.p19GCabsent[,1:2], "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCabsent.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.p19GCabsent[,2]), "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCabsent.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")




# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tpresent
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tabsent
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCpresent
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCabsent

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tpresent
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName p19Tpresent --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tpresent.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tpresent.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tabsent
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName p19Tabsent --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tabsent.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19Tabsent.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCpresent
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName p19GCpresent --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCpresent.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCpresent.score.txt

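cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCabsent
# note: the RunName for this set is "p1GCabsent" (typo for p19GCabsent); the generated Submits/ scripts and the post-processing call below use that same name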
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName p1GCabsent --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCabsent.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.p19GCabsent.score.txt

# Summit
module load python/3.7.0-anaconda3-5.3.0

-------------- 21 FEB - RUNNING (Full & Train) --------------

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL
# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tpresent/Submits/submit_full_p19Tpresent_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tabsent/Submits/submit_full_p19Tabsent_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCpresent/Submits/submit_full_p19GCpresent_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCabsent/Submits/submit_full_p1GCabsent_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tpresent/Submits/submit_train_p19Tpresent_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tabsent/Submits/submit_train_p19Tabsent_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCpresent/Submits/submit_train_p19GCpresent_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCabsent/Submits/submit_train_p1GCabsent_0.sh
# once the train jobs have finished, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tpresent/Submits/submit_test_p19Tpresent_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tabsent/Submits/submit_test_p19Tabsent_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCpresent/Submits/submit_test_p19GCpresent_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCabsent/Submits/submit_test_p1GCabsent_0.sh

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tpresent
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt p19Tpresent
# 0.1780913063671694
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p19dimer_H_bondraw    cut.score   0.05703119594092076
# p17dimer_dipoleraw    cut.score   0.05183332172486474
# p18No_electronsraw    cut.score   0.05043329755794083
# pam.distance0 cut.score   0.026685455603498196
# p20bp_bondraw cut.score   0.026168226131384743
# p15dimer_H_bondraw    cut.score   0.023678757176819998
# CCsgRNA.raw   cut.score   0.023563347329654497
# V231.xsgRNA.raw   cut.score   0.023557304929279342
# sgRNA.tempsgRNA.raw   cut.score   0.02217299100333084
# sgRNA.gcsgRNA.raw cut.score   0.021612256248516253

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19Tabsent
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt p19Tabsent
# 0.27183334641539714
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_bondraw cut.score   0.12589986706744333
# p19dimer_dipoleraw    cut.score   0.06613295137089187
# p18No_electronsraw    cut.score   0.039242199961416174
# p18dimer_H_bondraw    cut.score   0.03528859227225292
# V231.xsgRNA.raw   cut.score   0.02859306766025421
# V1110.xsgRNA.raw  cut.score   0.024069304464189856
# sgRNA.gcsgRNA.raw cut.score   0.02174813873715094
# sgRNA.tempsgRNA.raw   cut.score   0.019434033116543646
# pam.distance0 cut.score   0.01915053869578549
# p13dimer_H_bondraw    cut.score   0.017379386407298608

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCpresent
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt p19GCpresent
# 0.07216572977661447
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.tempsgRNA.raw   cut.score   0.048701028044401726
# sgRNA.gcsgRNA.raw cut.score   0.044620261446637555
# GGsgRNA.raw   cut.score   0.039082951201808325
# pam.distance0 cut.score   0.03749584630348131
# p10dimer_H_bondraw    cut.score   0.023517321133699278
# p13dimer_H_bondraw    cut.score   0.023505616534155734
# GsgRNA.raw    cut.score   0.022700068487117493
# p15dimer_H_bondraw    cut.score   0.019912304474287865
# p18No_electronsraw    cut.score   0.019511826104270377
# p11dimer_H_bondraw    cut.score   0.019277798779695328

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/p19GCabsent
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt p1GCabsent
# 0.2698276193898828
sort -k3rg topVarEdges/cut.score_top95.txt | head
# p20bp_bondraw cut.score   0.13993847837286075
# V76.xsgRNA.raw    cut.score   0.03582199087632203
# V231.xsgRNA.raw   cut.score   0.027531684162342425
# p18No_electronsraw    cut.score   0.025437384872412514
# p18bp_bondraw cut.score   0.025223490647335135
# p17dimer_dipoleraw    cut.score   0.021300485018915315
# V1110.xsgRNA.raw  cut.score   0.02023719857688435
# pam.distance0 cut.score   0.019841770829077888
# p19No_electronsraw    cut.score   0.018118988081585373
# CCsgRNA.raw   cut.score   0.017911147877539505
iRF-LOOP
python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp.py <data file> --System Summit --NodesPer <number of nodes for each model> --TotalNodes <total number of nodes used at once> --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName <Runame that you pick, make it simple and unique> --RunTime <time in minutes>

python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing_LOOP.py <path to yvec file> <Runame>
# source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
# conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP

# generate iRF-LOOP submit code
python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp.py /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.txt --System Summit --NodesPer 1 --TotalNodes 50 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName HbondAbs --RunTime 10

chmod +x Submits/submitAllFull_HbondAbs.sh # change permissions to allow execution of scripts
Submits/submitAllFull_HbondAbs.sh # submit all runs to summit queue
# Job <1862711> is submitted to default queue <batch>.
# Job <1862712> is submitted to default queue <batch>.
# Job <1862713> is submitted to default queue <batch>.
# Job <1862714> is submitted to default queue <batch>.
# Job <1862715> is submitted to default queue <batch>.
# Job <1862716> is submitted to default queue <batch>.

# make yvec file for post-processing
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.txt", header=T, sep="\t")
df.names <- data.frame(names(df))
df.yvec <- data.frame(df.names[2:nrow(df.names),])
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP/Runs")
write.table(df.yvec, "HbondAbs.iRF.LOOP.yvec.txt", quote=F, row.names=F, col.names=F, sep="\t")
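The yvec file is just every column name except the first (the `sgRNAID` identifier), written one per line. Below is a minimal Python sketch of the same step. It assumes a tab-separated matrix whose first header field is the ID column. `write_yvec` is a hypothetical name. One difference: R's `read.delim` also sanitises names with `make.names()` and this sketch does not, so it assumes the headers are already syntactic.

```python
import csv

def write_yvec(matrix_path, yvec_path):
    """Write every header field except the first (the sgRNA ID column), one per line."""
    with open(matrix_path, newline="") as fh:
        header = next(csv.reader(fh, delimiter="\t"))
    features = header[1:]  # drop the ID column, keep the remaining names in file order
    with open(yvec_path, "w") as out:
        out.write("\n".join(features) + "\n")
    return features
```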

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP/Runs
python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing_LOOP.py HbondAbs.iRF.LOOP.yvec.txt HbondAbs

# cat all the files in the normalizedEdgeFiles directory to get the final network
# I keep either the top 0.1% of edges or the top 100k, 50k, or 10k edges
cd Runs/normalizedEdgeFiles
cat *_Normalize.txt > network.txt
# 125263 rows --> top .1% = 125
sort -k3rg network.txt | head
sort -k3rg network.txt | awk '{if ($3 < 1.0) print $0}' > network.ltone.txt
sort -k3rg network.ltone.txt | head -125 > network.ltone.top0.1percent.txt
sort -k3rg network.ltone.txt | head -100000 > network.ltone.top100k.txt
sort -k3rg network.ltone.txt | head -50000 > network.ltone.top50k.txt
sort -k3rg network.ltone.txt | head -10000 > network.ltone.top10k.txt
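The edge selection above drops weight-1.0 edges, sorts by weight, and keeps either the top 0.1% or a fixed number of edges. It can be sketched in Python as follows. The sketch assumes each line of `network.txt` is `source target weight`, which is what the `sort -k3rg` calls imply. `top_edges` is a hypothetical helper. As in the notes, the 0.1% count is taken from all rows and truncated (125263 × 0.001 → 125).

```python
def top_edges(lines, fraction=None, count=None, max_weight=1.0):
    """Highest-weight edges, like sort -k3rg | awk '$3 < 1.0' | head -N."""
    total = 0
    edges = []
    for line in lines:
        fields = line.split()  # source, target, weight (whitespace-separated)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        total += 1
        weight = float(fields[2])
        if weight < max_weight:  # weight-1.0 edges come from near-duplicate features
            edges.append((fields[0], fields[1], weight))
    edges.sort(key=lambda e: e[2], reverse=True)  # descending weight, as sort -k3rg
    if fraction is not None:
        count = int(total * fraction)  # based on all rows, e.g. 125263 * 0.001 -> 125
    return edges if count is None else edges[:count]
```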

# pulling out the top edges with 'V76.xsgRNA.raw' (p19T)
## p19HOMO_eVraw, p19LUMO_eVraw, p19.TC, p16.GAGT, p16.CCGT, p16.GTTT, p18dimer_dipoleraw, p16.GTGT, p16.GGCT, p16.GGTT <-- not very illuminating
## 0.058 - 0.546

# pulling out the top edges with 'V303.xsgRNA.raw' (p19GC)
## p19.CGC, p19.GGC, p19dimer_LUMO_eVraw, p20HL.gap_eVraw, p20No_electronsraw, p17.GCGC, p19dimer_dipoleraw, p17.GGGC, p17.GAGC, p17.ACGC
## 0.101 - 0.231

# pulling out the top edges with 'V231.xsgRNA.raw' (p15CC)
## p15dimer_H_bondraw, p16.CCA, p13.GGCC, p15.CCTA, p15.CCGA, p13.CTCC, p15.CCCT, p15.CCGG, p13.TACC, p15.CCTC
## 0.090 - 0.405

# pulling out the top edges with 'cut.score'
grep 'cut.score' network.txt | sort -k3rg | head
## p8.CCGC, p8.CGCT, p8.CTTA, pam.distance0, p3.GTG, p8.CTGA, p8.CGCG, p15.GATC, p8.CTAC, p12.TG
## 0.072 - 0.092

# pulling out the top edges with 'p20bp_bondraw'
## p20bp_HOMO_evraw, p20bp_HOMO_evraw, p20bp_HL.gab_evraw, p20bp_LUMO_evraw, p20.A, cut.score, p20HOMO_eVraw, p15.GATC, p20No_electronsraw, CTsgRNA.raw
## 0.00017 - 0.514 (p20bp_HOMO_evraw & p20bp_HOMO_evraw = 1.0)

### Many edges have a weight of 1 because I am using highly correlated variables; rerun with the noHL matrix.
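Edge weights of 1.0 usually mean that two features carry essentially the same information. One way to flag such pairs before running iRF-LOOP is a pairwise Pearson correlation screen. This is a minimal stdlib sketch. The 0.95 cutoff and the helper names are assumptions; they are not the criterion that was used to build the noHL matrix.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant column: correlation undefined, treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def correlated_pairs(columns, cutoff=0.95):
    """Return (name1, name2, r) for every feature pair with |r| >= cutoff."""
    pairs = []
    for (n1, v1), (n2, v2) in combinations(columns.items(), 2):
        r = pearson(v1, v2)
        if abs(r) >= cutoff:
            pairs.append((n1, n2, r))
    return pairs
```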



# noHL
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/iRF.LOOP
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/iRF.LOOP

# generate iRF-LOOP submit code
python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp.py /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.txt --System Summit --NodesPer 1 --TotalNodes 50 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName HbondAbs.noHL --RunTime 10

chmod +x Submits/submitAllFull_HbondAbs.noHL.sh # change permissions to allow execution of scripts
Submits/submitAllFull_HbondAbs.noHL.sh # submit all runs to the Summit queue
# Job <1863359> is submitted to default queue <batch>.
# Job <1863360> is submitted to default queue <batch>.
# Job <1863361> is submitted to default queue <batch>.
# Job <1863363> is submitted to default queue <batch>.
# Job <1863364> is submitted to default queue <batch>.

-------------- 21 FEB - RUNNING --------------


# make yvec file for post-processing
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.txt", header=T, sep="\t")
df.names <- data.frame(names(df))
df.yvec <- data.frame(df.names[2:nrow(df.names),])
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/iRF.LOOP/Runs")
write.table(df.yvec, "HbondAbs.noHL.iRF.LOOP.yvec.txt", quote=F, row.names=F, col.names=F, sep="\t")

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/iRF.LOOP/Runs
python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing_LOOP.py HbondAbs.noHL.iRF.LOOP.yvec.txt HbondAbs.noHL

cd Runs/normalizedEdgeFiles
cat *_Normalize.txt > network.txt
# 47310 rows --> top 0.1% = 47 (473 would be the top 1%)
sort -k3rg network.txt | head
sort -k3rg network.txt | head -47 > network.top0.1percent.txt
sort -k3rg network.txt | head -100000 > network.top100k.txt
sort -k3rg network.txt | head -50000 > network.top50k.txt
sort -k3rg network.txt | head -10000 > network.top10k.txt


grep 'cut.score' network.txt | sort -k3rg | head
# cut.score pam.distance0   0.08841481693851837
# p20bp_bondraw cut.score   0.06110032175917336
# pam.distance0 cut.score   0.02408763351149707
# TsgRNA.raw    cut.score   0.019515037601399155
# p15dimer_H_bondraw    cut.score   0.01740265390815862
# CCsgRNA.raw   cut.score   0.016785175110388233
# sgRNA.gcsgRNA.raw cut.score   0.016711802143397854
# p13dimer_H_bondraw    cut.score   0.016327125420048758
# p18dimer_H_bondraw    cut.score   0.01590848828691507
# GsgRNA.raw    cut.score   0.015366785097371518

grep 'V76.xsgRNA.raw' network.txt | sort -k3rg | head
# p19bp_bondraw V76.xsgRNA.raw  0.15188889190473331
# V76.xsgRNA.raw    V4088sgRNA.raw  0.10756992590874931
# V76.xsgRNA.raw    V4000sgRNA.raw  0.10175957778147536
# V76.xsgRNA.raw    V3984sgRNA.raw  0.09654661286619425
# V76.xsgRNA.raw    V4096sgRNA.raw  0.08483105130995978
# V76.xsgRNA.raw    V4080sgRNA.raw  0.0836337394079397
# V76.xsgRNA.raw    V3936sgRNA.raw  0.08087753157943142
# V76.xsgRNA.raw    V3884sgRNA.raw  0.07978658332523354
# V76.xsgRNA.raw    V3904sgRNA.raw  0.07555473700109853

grep 'V303.xsgRNA.raw' network.txt | sort -k3rg | head
# V303.xsgRNA.raw   V1119.xsgRNA.raw    0.3884976598714581
# V303.xsgRNA.raw   V1151.xsgRNA.raw    0.19703972236877706
# V303.xsgRNA.raw   V4303sgRNA.raw  0.11641607416247793
# V303.xsgRNA.raw   V4319sgRNA.raw  0.1138546399195823
# V303.xsgRNA.raw   V4159sgRNA.raw  0.11270664283826766
# V303.xsgRNA.raw   V4255sgRNA.raw  0.1089358865285607
# V303.xsgRNA.raw   V4335sgRNA.raw  0.10481924119027607
# V303.xsgRNA.raw   V4351sgRNA.raw  0.10369638650870966
# V303.xsgRNA.raw   V4143sgRNA.raw  0.10236016550670296
# V303.xsgRNA.raw   V4191sgRNA.raw  0.09911827919285124

grep 'V231.xsgRNA.raw' network.txt | sort -k3rg | head
# p15dimer_H_bondraw    V231.xsgRNA.raw 0.5978295804961967
# V231.xsgRNA.raw   V918.xsgRNA.raw 0.40421967538404063
# V231.xsgRNA.raw   V3674sgRNA.raw  0.20498975279558035
# V231.xsgRNA.raw   V3681sgRNA.raw  0.14872182466757883
# p15No_electronsraw    V231.xsgRNA.raw 0.12292070595320288
# V59.xsgRNA.raw    V231.xsgRNA.raw 0.10618300926922816
# V231.xsgRNA.raw   V3675sgRNA.raw  0.10552127381664149
# V231.xsgRNA.raw   V3672sgRNA.raw  0.10398216869669075
# V231.xsgRNA.raw   V3666sgRNA.raw  0.09884702885484467
# V231.xsgRNA.raw   V3667sgRNA.raw  0.09454500241178802
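`grep 'V76.xsgRNA.raw' network.txt` treats the pattern as a regular expression, so each dot matches any character, and it matches substrings anywhere on the line. Below is a stricter alternative, sketched under the same three-column assumption. It keeps only the edges where the feature is exactly the source or the target. `edges_for_node` is a hypothetical name.

```python
def edges_for_node(lines, node, n=10):
    """Top-n edges by weight whose source or target is exactly `node`."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and node in (fields[0], fields[1]):  # exact match, not substring/regex
            hits.append((fields[0], fields[1], float(fields[2])))
    hits.sort(key=lambda e: e[2], reverse=True)  # same order as sort -k3rg
    return hits[:n]
```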



#### replace the names with the onehot labels
library(dplyr)

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/onehot")
labels <- read.delim("onehot.labels.txt", header=F, sep="\t")
labels.t <- data.frame(t(labels))
labels.df <- labels.t[22:nrow(labels.t),]

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/iRF.LOOP")
df <- read.delim("noHL.network.txt", header=F, sep="\t")

df$V1<-gsub("sgRNA.raw","",as.character(df$V1))
df$V2<-gsub("sgRNA.raw","",as.character(df$V2))

library(data.table)
colnames(labels.df) <- c("V1", "label")
df.label <- setDT(df)[labels.df, V1 := i.label , on =.(V1)]
colnames(labels.df) <- c("V2", "label")
df.label2 <- setDT(df.label)[labels.df, V2 := i.label , on =.(V2)]
write.table(df.label2, "noHL.network.label.txt", quote=F, row.names=F, sep="\t")
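The relabeling is a lookup from the stripped column ID (`V76.x`, `V4088`, ...) to its one-hot label. Names with no match, such as the QCT features, are left unchanged. Below is a dictionary-based Python sketch. It assumes the labels table has already been parsed into an ID → label dict, and `relabel_edges` is hypothetical. It uses a literal `str.replace`, which is stricter than R's `gsub`, where the `.` in `"sgRNA.raw"` matches any character.

```python
def relabel_edges(edges, labels, suffix="sgRNA.raw"):
    """Replace node IDs with one-hot labels; unmatched IDs are kept (with the suffix stripped)."""
    relabeled = []
    for src, dst, weight in edges:
        src_id, dst_id = src.replace(suffix, ""), dst.replace(suffix, "")
        relabeled.append((labels.get(src_id, src_id), labels.get(dst_id, dst_id), weight))
    return relabeled
```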
# sort -k3rg noHL.network.label.txt | head -47 > noHL.network.label.top0.1percent.txt
Efficiency cutoff
  • Separate the matrix into “good/bad” guides…
# One option is to take x% from each tail as the good and bad sets. Check where 10% from each end would put the cutoffs, or what percentage of guides score above 40.

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.txt", header=T, sep="\t")
summary(df$cut.score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00   17.24   27.18   24.56   32.69   48.38 
df.val <- subset(df, df$cut.score > 0)
nrow(df)
# 40468
nrow(subset(df, df$cut.score > 40))
# 868 / 40468 = 0.02144905

df.quarter1 <- subset(df, df$cut.score < 17.24)
df.quarter4 <- subset(df, df$cut.score > 32.69)
write.table(df.quarter1, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.FirstQuarter.txt", quote=F, row.names=F, sep="\t")
write.table(df.quarter4, "Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.FourthQuarter.txt", quote=F, row.names=F, sep="\t")
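The quartile split above hard-codes the 1st and 3rd quartiles that `summary()` printed (17.24 and 32.69), and those printed values are rounded. Below is a Python sketch that computes the cutoffs instead. `statistics.quantiles(..., method="inclusive")` uses the same linear interpolation as R's default `quantile()` (type 7). The strict `<` and `>` mirror the `subset()` calls. The function name is hypothetical.

```python
import statistics

def quartile_split(scores):
    """Split scores into (below Q1, above Q3) using R type-7 quartiles."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    first = [s for s in scores if s < q1]   # like subset(df, cut.score < Q1)
    fourth = [s for s in scores if s > q3]  # like subset(df, cut.score > Q3)
    return q1, q3, first, fourth
```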

nrow(df.val)
# 40464 * 0.1 = 4046.4
library(dplyr)
df.top <- df.val %>%                                
  arrange(desc(cut.score)) %>% 
  slice(1:4046)
head(df.top[,1:5])
nrow(df.top)

df.bottom <- df.val %>%                                
  arrange(cut.score) %>% 
  slice(1:4046)
head(df.bottom[,1:5])
nrow(df.bottom)
# Taking the top and bottom 10% I get cutting efficiency scores > 36.32 and < 7.81, respectively


nrow(df.val)
# 40464 * 0.05 = 2023.2
library(dplyr)
df.top <- df.val %>%                                
  arrange(desc(cut.score)) %>% 
  slice(1:2023)
tail(df.top[,1:5])
nrow(df.top)

df.bottom <- df.val %>%                                
  arrange(cut.score) %>% 
  slice(1:2023)
tail(df.bottom[,1:5])
nrow(df.bottom)
# Taking the top and bottom 5% I get cutting efficiency scores > 38.22 and < 4.21, respectively
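The 10% and 5% tails follow the same recipe: drop zero scores, sort, and keep the first floor(n × fraction) guides from each end (40464 × 0.1 → 4046). Below is a Python sketch of that selection. `tail_sets` is a hypothetical name. The reported cutoffs correspond to `min(top)` and `max(bottom)`.

```python
def tail_sets(scores, fraction):
    """Return (top, bottom) tails of the nonzero scores, each of size floor(n * fraction)."""
    valid = sorted(s for s in scores if s > 0)  # same filter as df$cut.score > 0
    k = int(len(valid) * fraction)              # e.g. 40464 * 0.1 -> 4046
    bottom = valid[:k]                          # lowest-scoring guides
    top = valid[::-1][:k]                       # highest-scoring guides
    return top, bottom
```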
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP.top
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP.top

# generate iRF-LOOP submit code
python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp.py /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.FourthQuarter.txt --System Summit --NodesPer 1 --TotalNodes 50 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName fourth.quarter --RunTime 10

chmod +x Submits/submitAllFull_fourth.quarter.sh # change permissions to allow execution of scripts
Submits/submitAllFull_fourth.quarter.sh # submit all runs to the Summit queue
# Job <1865784> is submitted to default queue <batch>.
# Job <1865785> is submitted to default queue <batch>.
# Job <1865786> is submitted to default queue <batch>.
# Job <1865787> is submitted to default queue <batch>.
# Job <1865788> is submitted to default queue <batch>.

# post-processing
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP.top/Runs
python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing_LOOP.py /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/iRF.LOOP/Runs/HbondAbs.noHL.iRF.LOOP.yvec.txt fourth.quarter

cd Runs/normalizedEdgeFiles
cat *_Normalize.txt > network.txt
#  rows --> top .1% = 
sort -k3rg network.txt | head
sort -k3rg network.txt | head -X > network.top0.1percent.txt
sort -k3rg network.txt | head -100000 > network.top100k.txt
sort -k3rg network.txt | head -50000 > network.top50k.txt
sort -k3rg network.txt | head -10000 > network.top10k.txt


mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP.bottom
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP.bottom

# generate iRF-LOOP submit code
python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp.py /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.FirstQuarter.txt --System Summit --NodesPer 1 --TotalNodes 50 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName first.quarter --RunTime 10

chmod +x Submits/submitAllFull_first.quarter.sh # change permissions to allow execution of scripts
Submits/submitAllFull_first.quarter.sh # submit all runs to the Summit queue
# Job <1865790> is submitted to default queue <batch>.
# Job <1865791> is submitted to default queue <batch>.
# Job <1865792> is submitted to default queue <batch>.
# Job <1865793> is submitted to default queue <batch>.
# Job <1865794> is submitted to default queue <batch>.

# post-processing
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/iRF.LOOP.bottom/Runs
python3 /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing_LOOP.py /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/iRF.LOOP/Runs/HbondAbs.noHL.iRF.LOOP.yvec.txt first.quarter

cd Runs/normalizedEdgeFiles
cat *_Normalize.txt > network.txt
#  rows --> top .1% = 
sort -k3rg network.txt | head
sort -k3rg network.txt | head -X > network.top0.1percent.txt
sort -k3rg network.txt | head -100000 > network.top100k.txt
sort -k3rg network.txt | head -50000 > network.top50k.txt
sort -k3rg network.txt | head -10000 > network.top10k.txt
Investigating matrix
  • Look at example sgRNAs that have p19.GC, p19.T, p15.CC, etc. Are they annotated properly? What does their cut.score look like? Do these features occur together?
library(dplyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dcast.na.txt", header=T, sep="\t")

p19 <- df %>% select(sgRNAID, cut.score, V303.xsgRNA.raw, V76.xsgRNA.raw) # select columns for p19.GC and p19.T
subset(p19, p19$V303.xsgRNA.raw == 1 & V76.xsgRNA.raw == 1) # no rows have both p19.GC = 1 AND p19.T = 1 which is good!

p19G <- df %>% select(sgRNAID, cut.score, V303.xsgRNA.raw, V77.xsgRNA.raw, V79.xsgRNA.raw) # select columns for p19.GC, p19.G, and p20.C
nrow(subset(p19G, p19G$V303.xsgRNA.raw == 1)) # how many guides have GC at position 19/20
nrow(subset(p19G, p19G$V303.xsgRNA.raw == 1 & p19G$V77.xsgRNA.raw == 1 & p19G$V79.xsgRNA.raw == 1)) # same number of guides with p19.GC have a G at position 19 and C at position 20, another good check
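These checks confirm that the one-hot encoding is internally consistent. A dimer flag (GC at 19/20) must agree with the matching single-base flags, and a single position cannot carry two bases. Below is a generic Python sketch over rows stored as dicts of 0/1 indicators. The row layout and the readable labels (`p19.G`, `p19.GC`) are assumptions for illustration; the real matrix uses the V-column IDs.

```python
def dimer_consistent(row, dimer, first, second):
    """True if a dimer flag (e.g. p19.GC) equals first-base AND second-base flags."""
    return bool(row[dimer]) == (bool(row[first]) and bool(row[second]))

def one_base_per_position(row, position, bases="ACGT"):
    """True if exactly one of p<position>.A/C/G/T is set."""
    return sum(row[f"p{position}.{b}"] for b in bases) == 1
```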

p19GC <- subset(df, df$V303.xsgRNA.raw == 1)
# nrow = 11056
p19T <- subset(df, df$V76.xsgRNA.raw == 1)
# nrow = 30872 <-- nearly three times as many guides have a T at position 19 as have a GC at 19/20??
p15CC <- subset(df, df$V231.xsgRNA.raw == 1)
# nrow = 8663

nrow(subset(p19GC, p19GC$V231.xsgRNA.raw == 1))
# 776 / 11056 = 0.07018813 <-- 7% of the guides with GC at position 19/20 ALSO have CC at position 15/16... anti-correlated in model so makes sense
nrow(subset(p19GC, p19GC$V76.xsgRNA.raw == 1))
# 0 <-- just another check that guides with GC at position 19/20 ARE NOT labeled as T at position 19
nrow(subset(p19T, p19T$V231.xsgRNA.raw == 1))
# 2266 / 30872 = 0.07339984 <-- also only 7% of guides with T at position 19 ALSO have CC at position 15/16... also anti-correlated in model

summary(p19GC$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   0.00   12.46   16.78   19.30   27.58   47.86  <-- fewer cases but GC has higher cut.score than T
summary(p19T$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   0.00   12.04   16.15   18.86   27.33   47.63 
summary(p15CC$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   0.01    7.68   12.90   14.51   19.64   47.08 

## what about a G at position 19 or a C at position 20 on its own? (these subsets still include guides that have both)
p19G <- subset(df, df$V77.xsgRNA.raw == 1)
# 30311
p20C <- subset(df, df$V79.xsgRNA.raw == 1)
# 41990
summary(p19G$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   0.00   12.09   16.39   19.19   28.04   48.38  <-- the G at position 19 is more important??
summary(p20C$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   0.00   10.33   14.57   16.64   23.63   47.86 

# or a CG instead of a GC?
p19CG <- subset(df, df$V297.xsgRNA.raw == 1)
# 8580
summary(p19CG$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   0.01   10.66   14.74   16.74   23.16   45.10 <-- CG has lower cut.score than GC

### what matters most is NOT having p15.CC
no.p15CC <- subset(df, df$V231.xsgRNA.raw == 0)
# 117519
summary(no.p15CC$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   0.00   11.36   15.46   18.04   26.21   48.38 

# no.p15CC with p19.GC
no.p15CC.p19GC <- subset(df, df$V231.xsgRNA.raw == 0 & df$V303.xsgRNA.raw == 1)
# 10280
summary(no.p15CC.p19GC$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   0.00   12.57   16.89   19.46   27.72   47.86  <-- higher cut.score when 15CC absent AND 19GC present
no.p15CC.p19T <- subset(df, df$V231.xsgRNA.raw == 0 & df$V76.xsgRNA.raw == 1)
# 28606
summary(no.p15CC.p19T$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#   0.00   12.21   16.34   19.09   27.61   47.63 
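Each comparison above is `summary()` of `cut.score` over the rows that match some combination of indicator columns. Below is a Python sketch that reproduces R's six-number summary, using type-7 quartiles, for an arbitrary filter. `summarize_scores` and the dict-per-row layout are hypothetical. `statistics.quantiles` needs at least two matching rows.

```python
import statistics

def six_number_summary(values):
    """Min, 1st Qu., Median, Mean, 3rd Qu., Max, in the order R's summary() prints them."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")  # R type-7 quartiles
    return (min(values), q1, med, statistics.mean(values), q3, max(values))

def summarize_scores(rows, conditions):
    """Count and six-number summary of cut.score over rows matching every column == value pair."""
    scores = [r["cut.score"] for r in rows
              if all(r[col] == val for col, val in conditions.items())]
    return len(scores), six_number_summary(scores)
```

For example, `summarize_scores(rows, {"V231.xsgRNA.raw": 0, "V303.xsgRNA.raw": 1})` would correspond to the no.p15CC.p19GC subset.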


# A or T at p20: effect on efficiency? correlation with HL/Hbond? <-- A cor = 0.703255986687413; T cor = 0.685400918800646; G cor = -0.00342552613946828; C cor = -0.74822153977598
p20.A <- subset(df, df$V78.xsgRNA.raw == 1)
p20.T <- subset(df, df$V80.xsgRNA.raw == 1)
p20.G <- subset(df, df$V81.xsgRNA.raw == 1)
p20.C <- subset(df, df$V79.xsgRNA.raw == 1)

summary(p20.A$cut.score)
summary(p20.T$cut.score)
summary(p20.G$cut.score)
summary(p20.C$cut.score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00   12.12   16.19   19.01   27.57   46.27 
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.01   13.11   18.15   20.78   29.83   48.38  <-- T at position 20 has the highest efficiency
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00   10.78   15.09   17.49   25.55   48.38 
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00   10.33   14.57   16.64   23.63   47.86 <-- definitely DO NOT want a C at position 20??

# no.p15CC with p20.T
no.p15CC.p20T <- subset(df, df$V231.xsgRNA.raw == 0 & df$V80.xsgRNA.raw == 1)
# 23242
summary(no.p15CC.p20T$cut.score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.03   13.20   18.40   20.96   30.05   48.38 



# based on the U16-Arg447 & G18-Arg71 ...
p16.T <- subset(df, df$V64.xsgRNA.raw == 1)
# 29101
p18.G <- subset(df, df$V73.xsgRNA.raw == 1)
# 27429
summary(p16.T$cut.score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00   11.76   15.82   18.49   26.80   48.38 <-- features say NO p15.CC or p16.GG
summary(p18.G$cut.score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 0.00    9.31   13.92   15.73   21.79   48.38 

iRF output figures

  • move the R2_foldResults.txt output file from each run into a combined folder and rename each file to match its run
  • generate inclusive violin plots that show the variation in R2 for kfold cross validation runs across different matrix sets
RIT+ figures
require(data.table)

### violin plots of R2 across different models

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults")

#create a list of the files from your target directory
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults")

# read every file in file_list (alphabetical order, which the colnames() below depend on)
# and bind them column-wise so that each run becomes one column of R2 values
dataset <- do.call(cbind, lapply(file_list, data.table::fread))
colnames(dataset) <- c("onehot", "onehot.QCT", "raw.onehot", "raw.onehot.QCT", "raw", "raw.QCT", "QCT", "QCT.dimers", "QCT.single.bp.dimers.noncorrelated", "QCT.single.bp.dimers", "top10", "top100", "top1k", "top20", "top200", "top5", "top50", "top500")

library(ggplot2)
library(reshape2)
library(RColorBrewer)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")

dataset.order <- dataset[,c(5,1,7,3,6,2,4,8,10,9,16,11,14,17,12,15,18,13)]
dataset.order.melt <- melt(dataset.order)
pdf("R2.order.violin.pdf")
ggplot(dataset.order.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()

dataset.subsets <- dataset[,c(5,1,7,3,6,2,4,8,10,9)]
dataset.subsets.melt <- melt(dataset.subsets)
pdf("R2.subsets.violin.pdf")
ggplot(dataset.subsets.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()

dataset.subsets2 <- dataset[,c(5,1,7,3,6,2,4,10)]
dataset.subsets2.melt <- melt(dataset.subsets2)
pdf("R2.subsets2.violin.pdf")
ggplot(dataset.subsets2.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()

dataset.top <- dataset[,c(16,11,14,17,12,15,18,13)]
dataset.top.melt <- melt(dataset.top)
pdf("R2.top.violin.pdf")
ggplot(dataset.top.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_color_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()


## scatterplot of feature importance vs the samples affected by that feature
library(ggplot2)
library(reshape2)
library(RColorBrewer)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
effect <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
# 2020
effect.sort <- effect[order(-effect$NormEdge),]
effect.sort.100 <- effect.sort[1:100,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.Effect.scatter.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect)) + scale_color_brewer(palette="Dark2") + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + geom_text(hjust=0, vjust=0, size=2)
dev.off()

library(ggrepel)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.Effect.scatter.label.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect)) + scale_color_brewer(palette="Dark2") + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2)
dev.off()

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
summary(effect.sort.100$NormEdge)
# Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
# 0.002762 0.003401 0.004474 0.007956 0.010092 0.041329 
summary(effect.sort.100$Samples)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 1199    1722    3033    5030    6255   19242 

# pdf("Imp.Effect.scatter.label.quartile.pdf")
# ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(color = dplyr::case_when(effect.sort.100$Samples > 6255 ~ "#1b9e77", effect.sort.100$Samples < 1722 ~ "#d95f02", TRUE ~ "#7570b3"), size = effect.sort.100$FeatureEffect, alpha = 0.8) +
#   geom_text_repel(data = subset(effect.sort.100, Samples > 6255),
#                   nudge_y       = 32 - subset(effect.sort.100, Samples > 6255)$Samples,
#                   size          = 2,
#                   box.padding   = 1.5,
#                   point.padding = 0.5,
#                   force         = 100,
#                   segment.size  = 0.2,
#                   segment.color = "grey50",
#                   direction     = "x") + 
#   scale_color_brewer(palette="Dark2") + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal()
# dev.off()

pdf("Imp.Effect.scatter.label.quartile.color.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect), color = dplyr::case_when(effect.sort.100$Samples > 6255 ~ "#1b9e77", effect.sort.100$Samples < 1722 ~ "#d95f02", TRUE ~ "#7570b3"), alpha = 0.8) + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal()
dev.off()

pdf("Imp.Effect.scatter.label.quartile.colorEffect.pdf")
# color by the quartiles of FeatureEffect itself (0.010092 and 0.003401 in the summary above are NormEdge quartiles)
effect.q <- quantile(effect.sort.100$FeatureEffect, c(0.25, 0.75))
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect), color = dplyr::case_when(effect.sort.100$FeatureEffect > effect.q[2] ~ "#1b9e77", effect.sort.100$FeatureEffect < effect.q[1] ~ "#d95f02", TRUE ~ "#7570b3"), alpha = 0.8) + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal()
dev.off()
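The `case_when()` coloring puts each point into one of three bins by comparing a value against the 1st and 3rd quartiles. Below is a Python sketch of that binning. It derives the cutoffs from the same variable that is being colored. The hex codes are the Dark2 values used in the plots; `quartile_colors` is a hypothetical name.

```python
import statistics

def quartile_colors(values, high="#1b9e77", low="#d95f02", mid="#7570b3"):
    """Color values above Q3 as `high`, below Q1 as `low`, everything else as `mid`."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")  # R type-7 quartiles
    return [high if v > q3 else low if v < q1 else mid for v in values]
```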


# ## heatmap of direction, size effect and importance of top features
# library(dplyr)
# library(ggplot2)
# library(reshape2)
# library(hrbrthemes)
# library(viridis)
# 
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
# effect <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
# # 2020
# effect.sort <- effect[order(-effect$NormEdge),]
# effect.sort.20 <- effect.sort[1:20,c(1,3:4)]
# effect.sort.20.dir <- effect.sort.20 %>% mutate(Direction = ifelse(FeatureEffect > 0, "Predict High Efficiency", "Predict Low Efficiency"))
# colnames(effect.sort.20.dir) <- c("Feature", "Normalized Importance", "Effect Size", "Direction of Effect")
# effect.sort.20.melt <- melt(effect.sort.20.dir, id="Feature")
# 
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
# pdf("Dir.Imp.Effect.heatmap.pdf")
# #ggplot(effect.sort.20.melt, aes(variable, Feature, fill= value)) + geom_tile() + scale_fill_viridis(discrete=FALSE) + theme_ipsum() + facet_wrap(. ~ variable, scales="free") + labs(title="Direction, Importance, and Effect (Top 20 Features)") + theme_minimal()
# ggplot(effect.sort.20.melt, aes(variable, Feature, fill= value)) + geom_tile() + facet_wrap(. ~ variable, scales="free") + labs(title="Direction, Importance, and Effect (Top 20 Features)") + theme_minimal() + theme(legend.position="bottom")
# dev.off()


## for a single feature... normalized importance across models
## normalized importance across features (for a single model)
library(ggplot2)
library(reshape2)
library(RColorBrewer)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score")
effect <- read.delim("e.coli.tensor.single.bp.dimers_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
# 2020
effect.sort <- effect[order(-effect$NormEdge),]
effect.sort.50 <- effect.sort[1:50,]
effect.sort.50$category <- c("QCT.bp", "QCT.bp", "dep.kmer2", "dep.kmer2", "raw", "raw", "QCT.bp", "raw", "QCT.nucleotide", "QCT.bp", "ind.kmer2", "ind.kmer2", "dep.kmer3", "QCT.nucleotide", "ind.kmer1", "QCT.nucleotide", "QCT.dimer", "QCT.dimer", "QCT.nucleotide", "ind.kmer1", "ind.kmer1", "QCT.dimer", "QCT.dimer", "dep.kmer2", "QCT.dimer", "QCT.dimer", "raw", "QCT.dimer", "QCT.dimer", "QCT.dimer", "ind.kmer2", "QCT.dimer", "ind.kmer2", "QCT.dimer", "QCT.dimer", "QCT.nucleotide", "dep.kmer2", "QCT.nucleotide", "ind.kmer2", "ind.kmer2", "QCT.nucleotide", "QCT.dimer", "ind.kmer1", "QCT.nucleotide", "QCT.bp", "QCT.dimer", "QCT.dimer", "dep.kmer2", "QCT.bp", "dep.kmer2")


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.bar.pdf")
ggplot(effect.sort.50, aes(x=reorder(Feature, -NormEdge), y=NormEdge, color=category)) + geom_bar(stat="identity") + labs(title="Feature Importance (Top 50 Features)", x="Feature", y="Normalized Importance") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Paired") 
dev.off()
R2 across feature subsets (Figure X)

-> calculate the average R2 for each feature set

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults

sed '1d' <FILE> | awk '{ total += $1; count++ } END { print total/count }'

# e.coli.tensor.single.bp.dimers.noncorrelated.R2_foldResults.txt = 0.259283
# e.coli.tensor.single.bp.dimers.R2_foldResults.txt = 0.25925
# e.coli.tensor.dimers.noDWT.R2_foldResults.txt = 0.252797
# cas9only.raw.onehot.tensor.R2_foldResults.txt = 0.250661
# cas9only.onehot.tensor.R2_foldResults.txt = 0.25007
# cas9only.raw.tensor.R2_foldResults.txt = 0.242835
# cas9only.raw.onehot.R2_foldResults.txt = 0.192075
# cas9only.tensor.R2_foldResults.txt = 0.238398
# cas9only.onehot.R2_foldResults.txt = 0.191217
# cas9only.raw.R2_foldResults.txt = 0.0406861
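The `sed '1d' | awk` one-liner drops the header line and averages the first column. Below is a Python sketch that does the same for every `*R2_foldResults.txt` file in a directory in one pass. It assumes each file has a single header line followed by one R2 value per fold in the first column. `mean_r2_per_file` is a hypothetical name.

```python
import glob
import os

def mean_r2_per_file(directory):
    """Map each *R2_foldResults.txt file name to the mean of its first column (header skipped)."""
    means = {}
    for path in sorted(glob.glob(os.path.join(directory, "*R2_foldResults.txt"))):
        with open(path) as fh:
            next(fh)  # skip the header line, like sed '1d'
            values = [float(line.split()[0]) for line in fh if line.strip()]
        means[os.path.basename(path)] = sum(values) / len(values)
    return means
```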

# raw = 5
# onehot = 5885 + 4 PAM = 5889
# QCT = 80
# QCT bp = 80
# QCT dimer = 94
# all = 6148
# noncorrelated = 6091
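The feature counts for the combined sets are sums of the per-group counts listed above. The full set comes to 5 + 5889 + 80 + 80 + 94 = 6148, which implies that the non-correlated filter removed 6148 − 6091 = 57 features. Below is a small Python check of that arithmetic, using only the numbers in these notes.

```python
# per-group feature counts from the notes above
groups = {"raw": 5, "onehot": 5885 + 4, "QCT": 80, "QCT bp": 80, "QCT dimer": 94}

def combined_count(*names):
    """Total number of features across the named groups."""
    return sum(groups[n] for n in names)
```

`combined_count(*groups)` gives the size of the full matrix.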

df <- data.frame(feature.set = c("raw", "onehot", "QCT", "raw + onehot", "raw + QCT", "onehot + QCT", "raw + onehot + QCT", "raw + onehot + QCT + dimers", "raw + onehot + QCT + dimers + bp", "non-correlated"), R2 = c(0.0406861, 0.191217, 0.238398, 0.192075, 0.242835, 0.25007, 0.250661, 0.252797, 0.25925, 0.259283), feature.count = c(5, 5889, 80, 5+5889, 5+80, 5889+80, 5+5889+80, 5+5889+80+94, 6148, 6091))

library(ggplot2)
library(RColorBrewer)
df$feature.set <- factor(df$feature.set, levels = df$feature.set) # keep the bars in the order listed above instead of alphabetical
scale.factor <- max(df$feature.count) / max(df$R2) # R2 (< 0.3) and feature count (up to 6148) need different scales, so rescale the count onto the R2 axis
ggplot(df) + geom_bar(aes(x=feature.set, y=R2, fill=feature.set), stat="identity") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Paired") + geom_line(aes(x=feature.set, y=feature.count / scale.factor, group=1), color="blue", size=2) + scale_y_continuous(name = "R2", sec.axis=sec_axis(~ . * scale.factor, name="Feature Count")) + labs(title = "Size and Prediction Accuracy of Feature Subsets", x = "Feature Set")
dataset metrics (Figure SX)
library(dplyr)
library(tidyr)
library(ggplot2)
library(reshape2)
library(RColorBrewer)

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")

#setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(doench)
# 6174
nrow(doench)
# 673
var(doench$cut.score)
# 0.03162405
sd(doench$cut.score)
# 0.1778315
mean(doench$cut.score)
# 0.1678287
doench.num <- mutate_all(doench[,2:ncol(doench)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
doench.num <- cbind(data.frame("sgRNAID" = doench$sgRNAID), doench.num)
var(doench.num$cut.score)
# 0.03670343
sd(doench.num$cut.score)
# 0.1915814
mean(doench.num$cut.score)
# 0.1797033

ggplot(doench, aes(x=cut.score)) + geom_density() + theme_classic() + labs(title = "H.sapiens (Doench et al., 2014)", x = "Experimental Cutting Efficiency Score", y = "Density")


#setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
# rename the merged score column before any statistics are computed on it
names(lipolytica)[names(lipolytica) == 'cut.score.x'] <- 'cut.score'
ncol(lipolytica)
# 6174
nrow(lipolytica)
# 45271
var(lipolytica$cut.score)
# 7.647933
sd(lipolytica$cut.score)
# 2.76549
mean(lipolytica$cut.score)
# -3.50674
lipolytica.num <- mutate_all(lipolytica[,2:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
lipolytica.num <- cbind(data.frame("sgRNAID" = lipolytica$sgRNAID), lipolytica.num)
var(lipolytica.num$cut.score)
# 0.02594887
sd(lipolytica.num$cut.score)
# 0.1610865
mean(lipolytica.num$cut.score)
# 0.3389167

ggplot(lipolytica, aes(x=cut.score)) + geom_density() + theme_classic() + labs(title = "Y.lipolytica (Baisya et al., 2021)", x = "Experimental Cutting Efficiency Score", y = "Density")


#setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(ecoli)
# 6174
nrow(ecoli)
# 40468
var(ecoli$cut.score)
# 110.5085
sd(ecoli$cut.score)
# 10.5123
mean(ecoli$cut.score)
# 24.56023
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
var(ecoli.num$cut.score)
# 0.04721325
sd(ecoli.num$cut.score)
# 0.2172861
mean(ecoli.num$cut.score)
# 0.5076525

ggplot(ecoli, aes(x=cut.score)) + geom_density() + theme_classic() + labs(title = "E.coli (Guo et al., 2018)", x = "Experimental Cutting Efficiency Score", y = "Density")


df <- data.frame("E.coli" = c(40468, 0.5076525, 0.04721325, 0.2172861), "Y.lipolytica" = c(45271, 0.3389167, 0.02594887, 0.1610865), "H.sapiens" = c(673, 0.1797033, 0.03670343, 0.1915814))
df$label <- c("Sample Size", "Mean", "Variance", "Standard Deviation")
df.melt <- melt(df)
ggplot(df.melt) + geom_bar(aes(x=variable, y=value, fill=variable), stat="identity") + facet_wrap(. ~ label, scales="free") + theme_classic() + theme(legend.position="bottom") + labs(x = "", y = "") + scale_fill_brewer(palette="Set1")

ecoli.num.df <- ecoli.num[,1:2]
ecoli.num.df$dataset <- "E.coli"
doench.num.df <- doench.num[,1:2]
doench.num.df$dataset <- "H.sapiens"
lipolytica.num.df <- lipolytica.num[,1:2]
lipolytica.num.df$dataset <- "Y.lipolytica"
ecoli.doench.lip <- rbind(ecoli.num.df, doench.num.df, lipolytica.num.df)

ggplot(ecoli.doench.lip, aes(x=cut.score, color=dataset)) + geom_density() + theme_classic() + theme(legend.position="bottom") + labs(x = "Normalized Cutting Efficiency Score", y = "Density") + scale_color_brewer(palette="Set2")
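The normalization in this script is a min-max rescaling, x' = (x - min) / (max - min), which shifts the mean and divides the variance by (max - min)². As a consistency check for E. coli, the scores run from 0.00 to 48.38 in the `summary(sgRNA$score)` output of the Essential genes section, and 110.5085 / 48.38² ≈ 0.0472 and 24.56 / 48.38 ≈ 0.508, in line with the normalized variance and mean recorded above. A minimal shell/awk sketch of the same computation, assuming a tab-separated file with a header line and the score in column 2 (`minmax_stats` is a hypothetical helper, not part of the pipeline), uses the n - 1 denominator so the variance agrees with R's `var()`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: min-max normalise one numeric column of a TSV (header on line 1)
# and print n, mean, sample variance (n - 1 denominator, as in R's var()) and SD.
minmax_stats() {
  local file="$1" col="${2:-2}"   # assumption: score in column 2 unless told otherwise
  awk -F'\t' -v c="$col" '
    NR == 1 { next }                                  # skip the header line
    {
      x[++n] = $c + 0
      if (n == 1 || x[n] < lo) lo = x[n]              # running minimum
      if (n == 1 || x[n] > hi) hi = x[n]              # running maximum
    }
    END {
      if (n < 2 || hi == lo) { print "need at least two distinct values" > "/dev/stderr"; exit 1 }
      for (i = 1; i <= n; i++) { z[i] = (x[i] - lo) / (hi - lo); s += z[i] }   # (x - min) / (max - min)
      mean = s / n
      for (i = 1; i <= n; i++) ss += (z[i] - mean) ^ 2
      var = ss / (n - 1)
      printf "n=%d mean=%.4f var=%.4f sd=%.4f\n", n, mean, var, sqrt(var)
    }' "$file"
}
```

For example, scores 0, 10, 20 and 40 normalize to 0, 0.25, 0.5 and 1, and `minmax_stats scores.tsv` prints `n=4 mean=0.4375 var=0.1823 sd=0.4270`.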

Test model with yeast & human data

#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J yeast.test_0
#BSUB -o yeast.test_0.o%J
#BSUB -e yeast.test_0.e%J

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/e.coli.tensor.single.bp.dimers_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix ecoli.model.yeast.test --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test/ecoli.model.yeast.test.o

# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/ecoli.model.yeast.test.sh


#### test the output
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
score <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/yeast.test/")
predict <- read.delim("ecoli.model.yeast.test.prediction", header=T, sep="\t")

score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
#-0.02424251
#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J doench.test_0
#BSUB -o doench.test_0.o%J
#BSUB -e doench.test_0.e%J

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/cut.score/e.coli.tensor.single.bp.dimers_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix ecoli.model.doench.test --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test/ecoli.model.doench.test.o

# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/ecoli.model.doench.test.sh


#### test the output
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
score <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers/doench.test/")
predict <- read.delim("ecoli.model.doench.test.prediction", header=T, sep="\t")

score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.06557198
## squared Pearson r ~ 0.004299685
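The `cor()` values in the two cross-species tests are Pearson correlations between observed and predicted scores, and the squared value follows directly: 0.06557198² ≈ 0.0043 for the human data and (-0.02424251)² ≈ 0.00059 for the yeast data. This squared correlation is not necessarily the same quantity as the R2 reported by the iRF post-processing. A minimal single-pass awk sketch computing both r and r², assuming two whitespace-separated numeric columns (observed, predicted) with no header on stdin; `pearson_r2` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: Pearson r and r^2 between two numeric columns read from stdin
# (column 1 = observed, column 2 = predicted, no header). Single pass over the data.
pearson_r2() {
  awk '
    { n++; sx += $1; sy += $2; sxx += $1 * $1; syy += $2 * $2; sxy += $1 * $2 }
    END {
      if (n < 2) { print "undefined"; exit 1 }
      cov = sxy - sx * sy / n            # sum of cross-products of deviations
      vx  = sxx - sx * sx / n            # sum of squared deviations, column 1
      vy  = syy - sy * sy / n            # sum of squared deviations, column 2
      if (vx <= 0 || vy <= 0) { print "undefined"; exit 1 }
      r = cov / sqrt(vx * vy)
      printf "r=%.4f r2=%.4f\n", r, r * r
    }'
}
```

Assuming each file has a one-line header and the value in its first column (the layout of the `.prediction` file is an assumption here), a call could look like `paste <(tail -n +2 score.txt | cut -f1) <(tail -n +2 pred.txt | cut -f1) | pearson_r2`. The running-sum formula is fine at this scale but less numerically stable than a two-pass computation for very large or badly centered values.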

Essential genes

https://www.genome.wisc.edu/Gerdes2003/

  • The control is dCas9, which binds the target without cutting it. Because it still occupies the target, it can block transcription of an essential gene and ultimately kill the cell. The dCas9 control is therefore not an accurate control, and it would lower the log2FC calculation because the control denominator would be decreased.
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
essential.genes <- read.delim("essential.genes.header.txt", header=T, sep="\t", stringsAsFactors = F)
sgRNA <- read.delim("sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)

essential.genes.df <- as.data.frame(essential.genes)
sgRNA.df <- as.data.frame(sgRNA)

sgRNA.essential <- subset(sgRNA.df, sgRNA.df$gene..promoter. %in% essential.genes.df$gene)
sgRNA.nonessential <- subset(sgRNA.df, !(sgRNA.df$gene..promoter. %in% sgRNA.essential$gene..promoter.))
length(unique(essential.genes$gene))
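# 4162  <-- close to the total E. coli gene count, so this file likely lists every assayed gene rather than only essential ones; filter on its essentiality call before the %in% subset below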
length(unique(sgRNA.df$gene..promoter.))
# 4135
length(unique(sgRNA.essential$gene..promoter.))
# 3287
nrow(sgRNA)
# 56251
nrow(sgRNA.essential)
# 42944

summary(sgRNA$score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   # 0.00   17.23   27.18   24.56   32.69   48.38   15757 
summary(sgRNA.nonessential$score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   # 0.00   17.06   27.00   24.44   32.68   46.12    5812 
summary(sgRNA.essential$score)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
   # 0.00   17.27   27.21   24.59   32.69   48.38    9945 
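The `subset(... %in% ...)` calls above split sgRNAs by whether their target gene appears in the gene list, so the list must contain only genes called essential for the split to mean anything. A minimal awk sketch of the same membership test, which tags every sgRNA row instead of writing two tables; `split_by_gene` and the default gene column (3) are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: tag each sgRNA row as essential / nonessential by gene membership,
# the awk equivalent of subset(sgRNA, gene %in% essential.genes).
#   $1  gene list, one gene name per line, no header (should contain essential genes only)
#   $2  sgRNA table, tab-separated, header on line 1
#   $3  1-based column of $2 holding the gene name (assumed 3 if omitted)
split_by_gene() {
  local genes="$1" table="$2" col="${3:-3}"
  awk -F'\t' -v OFS='\t' -v c="$col" '
    FILENAME == ARGV[1] { ess[$1] = 1; next }     # first file: load the gene set
    FNR == 1            { next }                  # second file: skip its header
    { print (($c in ess) ? "essential" : "nonessential"), $0 }
  ' "$genes" "$table"
}
```

Counting the tags, for example `split_by_gene essential.only.txt sgRNA.coord.txt 3 | cut -f1 | sort | uniq -c` (both file names and the column number are placeholders), is a quick sanity check: with a true essential-gene list, the essential group should be a small minority of sgRNAs.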

15 March 2022: Quantum Matrix

  • generate final matrix with updated quantum properties (HL and H-bond) for monomer, basepair, dimer, trimer, tetramer
  • think through incorporating DNA and RNA sequence?
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J mar15.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli
R CMD BATCH mar15.matrix.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/mar15.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
structure <- read.delim("Ecoli.allCas9.structure.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Ecoli.allCas9.nuc.count.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
onehot.ind1 <- read.delim("Ecoli.allCas9_ind1.txt", header=T, sep=" ")
# 5 columns (-1 for sgRNAID)
onehot.ind2 <- read.delim("Ecoli.allCas9_ind2.txt", header=T, sep=" ")
# 17
onehot.dep1 <- read.delim("Ecoli.allCas9_dep1.txt", header=F, sep=" ")
# 81
onehot.dep2 <- read.delim("Ecoli.allCas9_dep2.txt", header=F, sep=" ")
# 321
onehot.dep3 <- read.delim("Ecoli.allCas9_dep3.txt", header=F, sep=" ")
# 1154 <-- have 1218 for the labels??
onehot.dep4 <- read.delim("Ecoli.allCas9_dep4.txt", header=F, sep=" ")
# 4354 <-- have 5121 for the labels??
# 5926 total features...
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
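# drop the last column of each dep file; 1:(n-1) states this directly (1:n-1 evaluates to 0:(n-1) and only works because index 0 is ignored)
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")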

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

### getting the labels for the onehot matrix
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
# setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/onehot")
# onehot.ind1 <- read.delim("ind1.head.txt", header=T, sep=" ")
# onehot.ind2 <- read.delim("ind2.head.txt", header=T, sep=" ")
# onehot.dep1 <- read.delim("dep1.txt", header=F, sep=" ")
# onehot.dep2 <- read.delim("dep2.txt", header=F, sep=" ")
# onehot.dep3 <- read.delim("dep3.txt", header=F, sep=" ")
# onehot.dep3 <- onehot.dep3[,1:1154]
# onehot.dep4 <- read.delim("dep4.txt", header=F, sep=" ")
# onehot.dep4 <- onehot.dep4[,1:4354]
# colnames(onehot.dep1)[1] <- "sgRNAID"
# colnames(onehot.dep2)[1] <- "sgRNAID"
# colnames(onehot.dep3)[1] <- "sgRNAID"
# colnames(onehot.dep4)[1] <- "sgRNAID"
# 
# onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# onehot.dep12 <- full_join(onehot.dep1[,1:ncol(onehot.dep1)], onehot.dep2[,1:ncol(onehot.dep2)], by="sgRNAID")
# onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:ncol(onehot.dep3)], by="sgRNAID")
# onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:ncol(onehot.dep4)], by="sgRNAID")
# onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
# write.table(onehot, "onehot.labels.txt", quote=F, row.names=F, sep="\t")
# onehot.t <- data.frame(t(onehot))
# 6754 columns <-- corrected to match matrix used = 5926 total features

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "e.coli.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# 

# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.pam <- read.table("ecoli.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.df$id <- "Cas9"
sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")

score <- read.delim("Ecoli.allCas9.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(1:2)]
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df <- read.delim("e.coli.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))

df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
sgRNA.genes <- read.table("sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.df$id <- "Cas9"
sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)

df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 

write.table(df.pam.location, "Ecoli.allCas9.raw.matrix.txt", quote=F, row.names=F, sep="\t")
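The PAM `mutate()` step collapses each PAM to its variable N base regardless of strand: a + strand PAM reads NGG, while a - strand PAM is reported in reference orientation as its reverse complement CCN, which is why AGG and CCT both encode N = A. A minimal awk sketch applying that rule directly instead of listing the eight codes, assuming `sgRNAID<TAB>pam.code` on stdin and printing PAM.A, PAM.C, PAM.T, PAM.G in the same order as the R code (`pam_onehot` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: strand-collapsed one-hot encoding of the PAM N base.
# NGG (+ strand) -> N ; CCN (- strand, reference orientation) -> complement of N.
# Codes matching neither pattern get all zeros, as with the ifelse() calls in R.
pam_onehot() {
  awk -F'\t' -v OFS='\t' '
    BEGIN { comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A" }
    {
      pam = toupper($2); n = ""
      if (pam ~ /^[ACGT]GG$/)      n = substr(pam, 1, 1)         # NGG on the + strand
      else if (pam ~ /^CC[ACGT]$/) n = comp[substr(pam, 3, 1)]   # CCN = NGG on the - strand
      print $1, (n == "A"), (n == "C"), (n == "T"), (n == "G")
    }'
}
```

A 3-mer cannot both start with CC and end with GG, so the two patterns never conflict; `printf 's1\tCCA\n' | pam_onehot` yields the PAM.T indicator, matching the `TGG | CCA` branch above.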
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")


# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")


# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")


# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")


# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
seq <- read.delim("Ecoli.allCas9.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "") 

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Ecoli.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Ecoli.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")



setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
monomer <- read.delim("Ecoli.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Ecoli.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Ecoli.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Ecoli.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Ecoli.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)

monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Ecoli.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
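The three `unite()` chains build overlapping windows along the spacer: the window labelled p_i covers bases i to i + k - 1, so a sequence of length L yields L - k + 1 windows. A minimal awk sketch of that windowing for a single sequence (`kmers` is a hypothetical helper, not part of the pipeline):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the overlapping k-mers of one sequence, labelled by start position,
# the same windows the tidyr::unite() chains build (p1 = bases 1..k, p2 = bases 2..k+1, ...).
kmers() {
  local seq="$1" k="$2"
  awk -v s="$seq" -v k="$k" 'BEGIN {
    for (i = 1; i <= length(s) - k + 1; i++)      # L - k + 1 windows for a length-L sequence
      printf "p%d\t%s\n", i, substr(s, i, k)
  }'
}
```

For the 20-nt spacers, k = 2, 3 and 4 give 19, 18 and 17 windows, which matches the `seq.dimer[,1:20]`, `seq.trimer[,1:19]` and `seq.tetramer[,1:18]` selections (sgRNAID plus the windows).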
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.allCas9.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
tensor <- read.delim("Ecoli.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 40468

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Ecoli.finalquantum.txt", quote=F, row.names=F, sep="\t")
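`dcast(sgRNAID + cut.score ~ feature.scale, fun.aggregate=mean)` pivots the long table into one row per sgRNA and one column per feature, averaging duplicate (sgRNA, feature) pairs; cells with no observation get the mean of an empty vector (NaN), which `na.omit()` then drops. A minimal awk sketch of that pivot, assuming a headerless `sgRNAID<TAB>feature<TAB>value` stream (the `cut.score` key is left out for brevity, and `long_to_wide` is a hypothetical helper):

```shell
#!/usr/bin/env bash
# Hypothetical helper: long -> wide pivot with mean aggregation, similar in spirit to
# reshape2::dcast(id ~ feature, fun.aggregate = mean). Missing cells are written as NA.
# Unlike dcast, rows and columns keep first-seen order instead of being sorted.
long_to_wide() {
  awk -F'\t' -v OFS='\t' '
    {
      key = $1 SUBSEP $2
      sum[key] += $3; cnt[key]++                                   # accumulate for the mean
      if (!($1 in seen_id)) { seen_id[$1] = 1; ids[++ni] = $1 }    # row order
      if (!($2 in seen_ft)) { seen_ft[$2] = 1; fts[++nf] = $2 }    # column order
    }
    END {
      printf "sgRNAID"
      for (j = 1; j <= nf; j++) printf "%s%s", OFS, fts[j]
      printf "\n"
      for (i = 1; i <= ni; i++) {
        printf "%s", ids[i]
        for (j = 1; j <= nf; j++) {
          key = ids[i] SUBSEP fts[j]
          if (key in cnt) printf "%s%s", OFS, sum[key] / cnt[key]
          else            printf "%sNA", OFS
        }
        printf "\n"
      }
    }'
}
```

Holding every cell in memory is fine for a sketch; for a 40,468 × ~6,000 matrix the R route (or a sorted streaming approach) is the more practical choice.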
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df %>% select(-grep("cut.score.y.y", names(df)), -grep("cut.score.y", names(df)), -grep("cut.score.x.x", names(df))) 
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "Ecoli.finalquantum.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Ecoli.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Ecoli.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Ecoli.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Ecoli.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
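The `write.table()` calls above split the final matrix by column position for the iRF builder: sgRNAID is column 1, cut.score is column 2, and the features start at column 3. For a tab-separated matrix that already has this layout, the four distinct outputs can be sketched with `cut` (`split_matrix` and the output prefix are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a TSV matrix laid out as sgRNAID | cut.score | features...
# into the four distinct iRF inputs produced by the write.table() calls.
split_matrix() {
  local in="$1" prefix="$2"
  cut -f1,3- "$in" > "${prefix}.features.txt"                       # IDs + features
  cut -f3-   "$in" > "${prefix}.features_overlap_noSampleIDs.txt"   # features only
  cut -f1,2  "$in" > "${prefix}.score.txt"                          # IDs + score
  cut -f2    "$in" > "${prefix}.score_overlap_noSampleIDs.txt"      # score only (header kept)
}
```

In the R code, `*.features_overlap.txt` and `*.score_overlap.txt` have the same contents as `*.features.txt` and `*.score.txt`, so after `split_matrix Ecoli.finalquantum.txt Ecoli.finalquantum` they could simply be copied with `cp`.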


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/Submits/submit_full_e.coli.finalquantum_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/Submits/submit_train_e.coli.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/Submits/submit_test_e.coli.finalquantum_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.finalquantum
# 0.2491263918468429
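The post-processing call requests MAE, MAEA, MSE and R2 through `--PredAccuracy`, but these notes do not say which metric the printed value (0.2491...) is, nor how iRF_postProcessing.py computes them. As a rough cross-check, here is a minimal Python sketch of the three conventional definitions. MAEA is left out because its definition is not given here, and the function names are hypothetical.

```python
from typing import Sequence


def _check(y: Sequence[float], yhat: Sequence[float]) -> None:
    # Observed and predicted values must be aligned row by row.
    if len(y) != len(yhat) or not y:
        raise ValueError("y and yhat must be non-empty and of equal length")


def mae(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(y, yhat)
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def mse(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Mean squared error."""
    _check(y, yhat)
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)


def r2(y: Sequence[float], yhat: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(y, yhat)
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.5, 2.0, 2.5, 4.5]
    print(mae(observed, predicted), mse(observed, predicted), r2(observed, predicted))
    # -> 0.375 0.1875 0.85
```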

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.finalquantum_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 129834
# p19dimer.Hbond.stackingraw: 83816.2
# p20basepair.Hlgap.eVEraw: 72112.9
# p1tetramer.Hbond.energyraw: 47947.1
# p18trimer.Hbond.stackingraw: 45361.6
# p11tetramer.Hbond.energyraw: 44435.1
# V231.xsgRNA.raw: 40206.9  <-- p15.CC
# p18dimer.Hbond.energyraw: 39997
# p18dimer.Hbond.stackingraw: 37078.4
# sgRNA.tempsgRNA.raw: 29678.1
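`sort -k2rg FILE | head` ranks features by the numeric second field in descending order, using general-numeric comparison so values in scientific notation sort correctly. When the importance lists need further processing, the same ranking can be reproduced in Python. This is a minimal sketch, assuming whitespace-separated fields as in the output above (`top_k` is a hypothetical helper); with `col=3` it mirrors the `sort -k4rg` applied to the `.importance4.effect` file further down. Ties keep their input order and may therefore differ from GNU sort.

```python
from typing import Iterable, List


def top_k(lines: Iterable[str], col: int = 1, k: int = 10) -> List[List[str]]:
    """Return the k rows with the largest numeric value in field `col` (0-based).

    Roughly equivalent to `sort -k{col+1}rg FILE | head -n {k}`.
    Blank lines are skipped; ties keep their input order (the sort is stable).
    """
    rows = [line.split() for line in lines if line.strip()]
    rows.sort(key=lambda fields: float(fields[col]), reverse=True)
    return rows[:k]


if __name__ == "__main__":
    sample = [
        "p18dimer.Hbond.energyraw: 39997",
        "p20basepair.Hbond.energyraw: 129834",
        "sgRNA.tempsgRNA.raw: 29678.1",
    ]
    for fields in top_k(sample, col=1, k=2):
        print(*fields)
    # On a real file (path relative to the run directory):
    # with open("cut.score/foldRuns/fold9/Runs/Set4/e.coli.finalquantum_cut.score.importance4") as fh:
    #     print(top_k(fh))
```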


# Pearson correlation between observed and predicted cut.score (fold9, Set4 test split)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5026295
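R's `cor()` defaults to the Pearson coefficient and pairs the two vectors row by row, so it assumes the prediction file and `set4_Y_test_noSampleIDs.txt` list samples in the same order. A minimal Python equivalent is sketched below. `read_column` and `pearson` are hypothetical helpers, and because R rewrites column names on import, the raw header of the prediction column (seen in R as `Predictions.`) is left as a parameter rather than guessed.

```python
import csv
import math
from typing import List, Sequence


def read_column(path: str, name: str) -> List[float]:
    """Read one named column from a tab-separated file with a header row."""
    with open(path, newline="") as fh:
        return [float(row[name]) for row in csv.DictReader(fh, delimiter="\t")]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Usage inside .../cut.score/foldRuns/fold9/Runs/Set4 (PRED_COL is an assumption:
# set it to the prediction column header exactly as it appears in the file):
# y = read_column("set4_Y_test_noSampleIDs.txt", "cut.score")
# yhat = read_column("e.coli.finalquantum_Set4_test.prediction", PRED_COL)
# print(pearson(y, yhat))
```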
RIT
# interactive alternative: salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score e.coli.finalquantum

# save the batch script above as cut.score/RIT.run, then submit it:
# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/RIT.run

# p20basepair.Hlgap.eVEraw  cut.score   0.06696141562856729 0.03657642590538788 27636.877   22.7097104529399
# p19dimer.Hbond.stackingraw    cut.score   0.03464100776701448 -0.02177720806464378    23222.226   23.879833753099728
# p20basepair.Hbond.energyraw   cut.score   0.030869948756904336    -0.03726636075285378    12671.673   26.854588230689323
# p11tetramer.Hbond.energyraw   cut.score   0.02157359665811225 -0.017156906797274052   15825.268   26.12220936768207
# p1tetramer.Hbond.energyraw    cut.score   0.021430069435486497    0.009695123833459367    19851.039   27.05879242091512
# V231.xsgRNA.raw   cut.score   0.02114294017802617 -0.030188561170599513   16749.994   24.112707842750073
# p18trimer.Hbond.stackingraw   cut.score   0.018561357882087588    0.02252639747755138 13935.941   23.14311630354901
# p18dimer.Hbond.stackingraw    cut.score   0.015586899271518235    0.01981957306973018 11737.613   23.47773385851222
# p18trimer.Hbond.energyraw cut.score   0.015115507543728361    -0.007948753966886423   10510.788   23.662072722110985
# p18dimer.Hbond.energyraw  cut.score   0.014481062606810093    -0.015054629467698068   8062.311    22.76173208112195



#### sorted by feature effect (not importance)
sort -k4rg e.coli.finalquantum_cut.score.importance4.effect | head
# p11basepair.Hlgap.eVEraw  cut.score   6.66202719563372e-07    41.99212346291712   0.421   -104.56922208388855
# V246sgRNA.raw cut.score   3.915671009763828e-07   18.985163076923072  0.094   12.388724153846157
# V4055sgRNA.raw    cut.score   7.190051765502139e-07   18.389079629629627  0.114   12.940920370370371
# V161.xsgRNA.raw   cut.score   2.985695404334478e-07   16.7936 0.053   -21.827900800000002
# V137.ysgRNA.raw   cut.score   1.9847267533431217e-07  16.6169 0.051   -25.763079500000003
# V86.xsgRNA.raw    cut.score   6.236383133538739e-07   12.254980263157897  0.089   -241.95720486973687
# V132sgRNA.raw cut.score   2.7053179487203707e-07  11.168820930232553  0.178   3.776982591860481
# V209.xsgRNA.raw   cut.score   2.5923627352298865e-07  9.804176712328768   0.081   0.7271831130136945
# V42.xsgRNA.raw    cut.score   7.193642751525615e-07   7.330155415809548   0.259   -201.256495464747
# p16basepair.Hbond.energyraw   cut.score   2.54235077363212e-07    6.7097000000000016  0.062   -196.83178666000006

# p11basepair.Hlgap.eVE, p1.GGCA, p16.GCCC, p10.GG, p3.ACG, p6.CA, p1.TAAT, p13.GG, p11.A, p16basepair.Hbond.energy
QCT.Subsets
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
# 6234 columns
monomer <- df %>% select(grep("monomer", names(df)))
# 40 columns
bp <- df %>% select(grep("basepair", names(df)))
# 60 columns
dimer <- df %>% select(grep("dimer", names(df)))
# 76 columns
trimer <- df %>% select(grep("trimer", names(df)))
# 72 columns
tetramer <- df %>% select(grep("tetramer", names(df)))
# 68 columns
monomer.bp <- cbind(monomer, bp)
# 100 columns
monomer.bp.dimer <- cbind(monomer, bp, dimer)
# 176 columns
monomer.bp.dimer.trimer <- cbind(monomer, bp, dimer, trimer)
# 248 columns
monomer.bp.dimer.trimer.tetramer <- cbind(monomer, bp, dimer, trimer, tetramer)
# 316 columns
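The cumulative column counts in the comments above (100, 176, 248, 316) only add up if no column name contains two of the fragment patterns, for example if `dimer` does not also hit `trimer` or `tetramer` columns. A quick way to confirm this is sketched in Python below, using feature names copied from the importance listings in these notes as sample input. `assign_fragment` is a hypothetical helper, and plain substring matching stands in for R's regex `grep`, which behaves the same for these literal patterns.

```python
from typing import Dict, Iterable, List, Sequence

PATTERNS = ("monomer", "basepair", "dimer", "trimer", "tetramer")


def assign_fragment(names: Iterable[str],
                    patterns: Sequence[str] = PATTERNS) -> Dict[str, List[str]]:
    """Group column names by the single fragment pattern each one contains.

    Raises if a name matches more than one pattern, because the subsets
    would then overlap and the cumulative column counts would be wrong.
    Names matching no pattern (non-QCT features) are ignored.
    """
    groups: Dict[str, List[str]] = {p: [] for p in patterns}
    for name in names:
        hits = [p for p in patterns if p in name]
        if len(hits) > 1:
            raise ValueError(f"{name!r} matches several patterns: {hits}")
        if hits:
            groups[hits[0]].append(name)
    return groups


if __name__ == "__main__":
    sample = [
        "p20monomer.No.electronsraw",
        "p20basepair.Hbond.energyraw",
        "p19dimer.Hbond.stackingraw",
        "p18trimer.Hbond.energyraw",
        "p1tetramer.Hbond.energyraw",
        "V231.xsgRNA.raw",
    ]
    print({p: len(v) for p, v in assign_fragment(sample).items()})
    # The per-fragment counts in the R comments sum to the QCT block width.
    assert 40 + 60 + 76 + 72 + 68 == 316
```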

df.monomer <- cbind(df[,1:2], monomer)
df.bp <- cbind(df[,1:2], bp)
df.dimer <- cbind(df[,1:2], dimer)
df.trimer <- cbind(df[,1:2], trimer)
df.tetramer <- cbind(df[,1:2], tetramer)
df.monomer.bp <- cbind(df.monomer, bp)
df.monomer.bp.dimer <- cbind(df.monomer, bp, dimer)
df.monomer.bp.dimer.trimer <- cbind(df.monomer, bp, dimer, trimer)
df.monomer.bp.dimer.trimer.tetramer <- cbind(df.monomer, bp, dimer, trimer, tetramer)

write.table(df.monomer[,c(1,3:ncol(df.monomer))], "Ecoli.monomer.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer[,c(1,3:ncol(df.monomer))], "Ecoli.monomer.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer[,3:ncol(df.monomer)], "Ecoli.monomer.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.bp[,c(1,3:ncol(df.bp))], "Ecoli.bp.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.bp[,c(1,3:ncol(df.bp))], "Ecoli.bp.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.bp[,3:ncol(df.bp)], "Ecoli.bp.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.dimer[,c(1,3:ncol(df.dimer))], "Ecoli.dimer.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.dimer[,c(1,3:ncol(df.dimer))], "Ecoli.dimer.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.dimer[,3:ncol(df.dimer)], "Ecoli.dimer.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.trimer[,c(1,3:ncol(df.trimer))], "Ecoli.trimer.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.trimer[,c(1,3:ncol(df.trimer))], "Ecoli.trimer.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.trimer[,3:ncol(df.trimer)], "Ecoli.trimer.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.tetramer[,c(1,3:ncol(df.tetramer))], "Ecoli.tetramer.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.tetramer[,c(1,3:ncol(df.tetramer))], "Ecoli.tetramer.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.tetramer[,3:ncol(df.tetramer)], "Ecoli.tetramer.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.monomer.bp[,c(1,3:ncol(df.monomer.bp))], "Ecoli.monomer.bp.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp[,c(1,3:ncol(df.monomer.bp))], "Ecoli.monomer.bp.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp[,3:ncol(df.monomer.bp)], "Ecoli.monomer.bp.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp.dimer[,c(1,3:ncol(df.monomer.bp.dimer))], "Ecoli.monomer.bp.dimer.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp.dimer[,c(1,3:ncol(df.monomer.bp.dimer))], "Ecoli.monomer.bp.dimer.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp.dimer[,3:ncol(df.monomer.bp.dimer)], "Ecoli.monomer.bp.dimer.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp.dimer.trimer[,c(1,3:ncol(df.monomer.bp.dimer.trimer))], "Ecoli.monomer.bp.dimer.trimer.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp.dimer.trimer[,c(1,3:ncol(df.monomer.bp.dimer.trimer))], "Ecoli.monomer.bp.dimer.trimer.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp.dimer.trimer[,3:ncol(df.monomer.bp.dimer.trimer)], "Ecoli.monomer.bp.dimer.trimer.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp.dimer.trimer.tetramer[,c(1,3:ncol(df.monomer.bp.dimer.trimer.tetramer))], "Ecoli.monomer.bp.dimer.trimer.tetramer.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp.dimer.trimer.tetramer[,c(1,3:ncol(df.monomer.bp.dimer.trimer.tetramer))], "Ecoli.monomer.bp.dimer.trimer.tetramer.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.monomer.bp.dimer.trimer.tetramer[,3:ncol(df.monomer.bp.dimer.trimer.tetramer)], "Ecoli.monomer.bp.dimer.trimer.tetramer.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
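Each subset is written three ways: with the sample-ID column (`.features.txt` and an identical `.features_overlap.txt`) and without it (`.features_overlap_noSampleIDs.txt`); the second column of `df` is dropped from all three. The same layout could be produced outside R with a sketch like the one below, assuming a header row plus string-valued rows. `variants` and `write_variants` are hypothetical names, and the plain tab-joined output is meant to match `write.table(..., quote=F, row.names=F, sep="\t")`.

```python
from pathlib import Path
from typing import Dict, List, Sequence, Tuple

Table = Tuple[List[str], List[List[str]]]


def variants(header: Sequence[str], rows: Sequence[Sequence[str]]) -> Dict[str, Table]:
    """Build the three tables written for one feature subset.

    Column 1 is the sample ID; column 2 is dropped from every variant.
    """
    with_id = [0] + list(range(2, len(header)))
    without_id = list(range(2, len(header)))

    def pick(idx: List[int]) -> Table:
        return [header[i] for i in idx], [[row[i] for i in idx] for row in rows]

    return {
        "features.txt": pick(with_id),
        "features_overlap.txt": pick(with_id),  # identical to features.txt
        "features_overlap_noSampleIDs.txt": pick(without_id),
    }


def write_variants(prefix: str, header: Sequence[str],
                   rows: Sequence[Sequence[str]], outdir: str = ".") -> None:
    """Write e.g. Ecoli.dimer.features.txt and its two siblings as unquoted TSV."""
    for suffix, (hdr, body) in variants(header, rows).items():
        with open(Path(outdir) / f"{prefix}.{suffix}", "w") as fh:
            for line in [hdr, *body]:
                fh.write("\t".join(line) + "\n")
```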



# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.bp --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.bp.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.dimer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.dimer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.trimer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.trimer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.tetramer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.tetramer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer.bp --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.bp.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer.bp.dimer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.bp.dimer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer.bp.dimer.trimer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.bp.dimer.trimer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName quantum.monomer.bp.dimer.trimer.tetramer --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.monomer.bp.dimer.trimer.tetramer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
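The nine set-up calls differ only in the subset name, which determines the run directory, `--RunName quantum.<subset>` and the features file `Ecoli.<subset>.features.txt`; every other argument, and the score file, is shared. A small Python sketch that prints the same commands (it does not run them) is below. `setup_commands` is a hypothetical helper, it assumes the directory naming used above, and it uses `mkdir -p` so that re-running it is harmless.

```python
from typing import List

BASE = "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli"
BUILDER = "/gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py"
SUBSETS = [
    "monomer", "bp", "dimer", "trimer", "tetramer",
    "monomer.bp", "monomer.bp.dimer", "monomer.bp.dimer.trimer",
    "monomer.bp.dimer.trimer.tetramer",
]


def setup_commands(subset: str) -> List[str]:
    """Shell commands that create and configure the iRF run for one QCT subset."""
    run_dir = f"{BASE}/iRF.run/e.coli.finalquantum/{subset}"
    builder = [
        "python", BUILDER,
        "--System", "Summit", "--NodesPer", "1", "--TotalNodes", "50",
        "--RunTime", "90", "--Account", "SYB105", "--NumTrees", "1000",
        "--NumIterations", "5", "--RunName", f"quantum.{subset}", "--bypass",
        "--targetNodeSize", "50", "--Prediction",
        f"{BASE}/Ecoli.{subset}.features.txt",
        f"{BASE}/Ecoli.finalquantum.score.txt",
    ]
    return [f"mkdir -p {run_dir}", f"cd {run_dir}", " ".join(builder)]


if __name__ == "__main__":
    for subset in SUBSETS:
        print("\n".join(setup_commands(subset)), end="\n\n")
```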

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer/Submits/submit_full_quantum.monomer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp/Submits/submit_full_quantum.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer/Submits/submit_full_quantum.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer/Submits/submit_full_quantum.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer/Submits/submit_full_quantum.tetramer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp/Submits/submit_full_quantum.monomer.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer/Submits/submit_full_quantum.monomer.bp.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer/Submits/submit_full_quantum.monomer.bp.dimer.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer/Submits/submit_full_quantum.monomer.bp.dimer.trimer.tetramer_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer/Submits/submit_train_quantum.monomer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp/Submits/submit_train_quantum.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer/Submits/submit_train_quantum.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer/Submits/submit_train_quantum.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer/Submits/submit_train_quantum.tetramer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp/Submits/submit_train_quantum.monomer.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer/Submits/submit_train_quantum.monomer.bp.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer/Submits/submit_train_quantum.monomer.bp.dimer.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer/Submits/submit_train_quantum.monomer.bp.dimer.trimer.tetramer_0.sh
# once the train jobs have finished, submit the test jobs
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer/Submits/submit_test_quantum.monomer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp/Submits/submit_test_quantum.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer/Submits/submit_test_quantum.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer/Submits/submit_test_quantum.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer/Submits/submit_test_quantum.tetramer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp/Submits/submit_test_quantum.monomer.bp_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer/Submits/submit_test_quantum.monomer.bp.dimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer/Submits/submit_test_quantum.monomer.bp.dimer.trimer_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer/Submits/submit_test_quantum.monomer.bp.dimer.trimer.tetramer_0.sh
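The submission scripts follow the pattern `<subset>/Submits/submit_<phase>_quantum.<subset>_0.sh` for the phases `full`, `train` and `test`, and the test jobs should only be submitted after the train jobs have finished. A hedged Python sketch that prints the `bsub` lines grouped by phase is below. `bsub_commands` is hypothetical, and it deliberately leaves the train-before-test wait to the user (for example, checking with `bjobs`) rather than guessing the job names an LSF dependency would need.

```python
from typing import Dict, List

RUN_ROOT = ("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/"
            "e.coli/iRF.run/e.coli.finalquantum")
PHASES = ("full", "train", "test")


def submit_script(subset: str, phase: str) -> str:
    """Path of the builder-generated submission script for one subset and phase."""
    return f"{RUN_ROOT}/{subset}/Submits/submit_{phase}_quantum.{subset}_0.sh"


def bsub_commands(subsets: List[str]) -> Dict[str, List[str]]:
    """bsub lines grouped by phase, in the order full, train, test."""
    return {phase: [f"bsub {submit_script(s, phase)}" for s in subsets]
            for phase in PHASES}


if __name__ == "__main__":
    subsets = ["monomer", "bp", "dimer", "trimer", "tetramer", "monomer.bp",
               "monomer.bp.dimer", "monomer.bp.dimer.trimer",
               "monomer.bp.dimer.trimer.tetramer"]
    for phase, lines in bsub_commands(subsets).items():
        print(f"# {phase}")
        print("\n".join(lines))
```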

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer
# 0.23912958274324625
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer_cut.score.importance4 | head
# p20monomer.No.electronsraw: 234402
# p18monomer.No.electronsraw: 165851
# p19monomer.HLgap.eVraw: 116336
# p19monomer.No.electronsraw: 106237
# p17monomer.No.electronsraw: 79784.1
# p16monomer.No.electronsraw: 79011.8
# p15monomer.No.electronsraw: 49554.4
# p15monomer.HLgap.eVraw: 44367.4
# p17monomer.HLgap.eVraw: 36062.9
# p14monomer.HLgap.eVraw: 35593.3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.bp
# 0.10865513159923842
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.bp_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 132049
# p20basepair.Hlgap.eVEraw: 76184.5
# p18basepair.Hlgap.eVEraw: 45633
# p18basepair.Hbond.energyraw: 44610.9
# p14basepair.Hbond.energyraw: 14119.6
# p11basepair.Hbond.energyraw: 13603.3
# p14basepair.Hlgap.eVEraw: 13600.5
# p16basepair.Hbond.energyraw: 12756.2
# p16basepair.Hlgap.eVEraw: 12573.6
# p11basepair.Hlgap.eVEraw: 12563.8

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.dimer
# 0.24662278990401446
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.dimer_cut.score.importance4 | head
# p19dimer.Hbond.energyraw: 282852
# p18dimer.Hbond.energyraw: 94006.1
# p16dimer.Hbond.stackingraw: 82983.1
# p19dimer.HLgap.eVEraw: 72357.1
# p15dimer.Hbond.energyraw: 70803.2
# p18dimer.Hbond.stackingraw: 67454.3
# p15dimer.Hbond.stackingraw: 65940.3
# p13dimer.Hbond.energyraw: 63473.3
# p19dimer.Hbond.stackingraw: 55555.9
# p18dimer.HLgap.eVEraw: 51251.5

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.trimer
# 0.2343633085914134
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.trimer_cut.score.importance4 | head
# p18trimer.Hbond.energyraw: 379823
# p18trimer.Hlgap.eVEraw: 125230
# p17trimer.Hbond.stackingraw: 123012
# p15trimer.Hbond.stackingraw: 76675.7
# p14trimer.Hbond.energyraw: 62165.7
# p15trimer.Hbond.energyraw: 45814.7
# p1trimer.Hbond.energyraw: 43700.5
# p11trimer.Hbond.energyraw: 43260.1
# p12trimer.Hbond.energyraw: 42536.3
# p13trimer.Hbond.energyraw: 41079.6

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.tetramer
# 0.22809195591591117
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.tetramer_cut.score.importance4 | head
# p17tetramer.Hbond.energyraw: 313042
# p16tetramer.Hbond.stackingraw: 176979
# p17tetramer.Hlgap.eVEraw: 112160
# p16tetramer.Hbond.energyraw: 74317.7
# p15tetramer.Hbond.stackingraw: 74174.2
# p11tetramer.Hbond.energyraw: 71547.9
# p17tetramer.Hbond.stackingraw: 68452.7
# p1tetramer.Hbond.energyraw: 67379.8
# p13tetramer.Hbond.energyraw: 67226.3
# p15tetramer.Hbond.energyraw: 63265.2

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer.bp
# 0.24123214119568534
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer.bp_cut.score.importance4 | head
# p19monomer.HLgap.eVraw: 132757
# p20basepair.Hbond.energyraw: 110940
# p18monomer.No.electronsraw: 97866.1
# p20basepair.Hlgap.eVEraw: 81235.4
# p16monomer.No.electronsraw: 80132.5
# p17monomer.No.electronsraw: 74586.9
# p20monomer.No.electronsraw: 73039.8
# p19monomer.No.electronsraw: 49834.4
# p15monomer.No.electronsraw: 49326
# p15monomer.HLgap.eVraw: 47552.6

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer.bp.dimer
# 0.2503873805742469
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer.bp.dimer_cut.score.importance4 | head
# p19dimer.Hbond.energyraw: 107949
# p20basepair.Hbond.energyraw: 87396.1
# p16dimer.Hbond.stackingraw: 86288.4
# p20basepair.Hlgap.eVEraw: 76726.1
# p19dimer.Hbond.stackingraw: 72936.8
# p18dimer.Hbond.energyraw: 71045.7
# p18dimer.Hbond.stackingraw: 66421.5
# p15dimer.Hbond.energyraw: 66143.4
# p13dimer.Hbond.energyraw: 63958.4
# p15dimer.Hbond.stackingraw: 62114

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer.bp.dimer.trimer
# 0.24631083915041616
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer.bp.dimer.trimer_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 90824.1
# p19dimer.Hbond.energyraw: 76022.6
# p20basepair.Hlgap.eVEraw: 71778.3
# p18trimer.Hbond.energyraw: 71042.3
# p19dimer.Hbond.stackingraw: 67615.9
# p15trimer.Hbond.stackingraw: 58769.7
# p11trimer.Hbond.energyraw: 44117.4
# p1trimer.Hbond.energyraw: 43938.7
# p17trimer.Hbond.stackingraw: 43434.2
# p14trimer.Hbond.energyraw: 42841.4

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt quantum.monomer.bp.dimer.trimer.tetramer
# 0.24179354649146106
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/quantum.monomer.bp.dimer.trimer.tetramer_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 102256
# p20basepair.Hlgap.eVEraw: 70983
# p19dimer.Hbond.stackingraw: 68035.5
# p19dimer.Hbond.energyraw: 65745.6
# p11tetramer.Hbond.energyraw: 65184.4
# p1tetramer.Hbond.energyraw: 61460.2
# p18trimer.Hbond.energyraw: 57405.8
# p13tetramer.Hbond.energyraw: 49726
# p15tetramer.Hbond.stackingraw: 44775.1
# p18dimer.Hbond.stackingraw: 42723.5
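The QCT feature names in these listings appear to follow a `p<position><fragment>.<property>raw` pattern (e.g. `p20basepair.Hbond.energyraw`); that reading is inferred from the names and is not documented here. Under that assumption, the Python sketch below splits the names and sums the listed importances per position index, which gives a quick view of which positions dominate a top-10 list. `parse_qct_name` and `importance_by_position` are hypothetical helpers; non-QCT features such as `V231.xsgRNA.raw` return `None`, and summing importances across features is only a rough summary.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Assumed naming scheme: p<position><fragment>.<property>raw
QCT_NAME = re.compile(r"^p(\d+)(monomer|basepair|dimer|trimer|tetramer)\.(.+)raw$")


def parse_qct_name(name: str) -> Optional[Tuple[int, str, str]]:
    """Split a QCT feature name into (position, fragment, property), or None.

    A trailing ':' (as printed in the importance listings) is ignored.
    """
    m = QCT_NAME.match(name.rstrip(":"))
    if m is None:
        return None
    return int(m.group(1)), m.group(2), m.group(3)


def importance_by_position(pairs: Iterable[Tuple[str, float]]) -> Dict[int, float]:
    """Sum importance values per position over the QCT features in `pairs`."""
    totals: Dict[int, float] = defaultdict(float)
    for name, value in pairs:
        parsed = parse_qct_name(name)
        if parsed is not None:
            totals[parsed[0]] += value
    return dict(totals)


if __name__ == "__main__":
    # First entries of the monomer.bp.dimer.trimer.tetramer listing above.
    listing = [
        ("p20basepair.Hbond.energyraw", 102256.0),
        ("p20basepair.Hlgap.eVEraw", 70983.0),
        ("p19dimer.Hbond.stackingraw", 68035.5),
        ("p19dimer.Hbond.energyraw", 65745.6),
        ("p11tetramer.Hbond.energyraw", 65184.4),
    ]
    print(importance_by_position(listing))
```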


# Pearson correlation between observed and predicted cut.score for each QCT subset (fold9, Set4 test split)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4775135

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/bp/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.bp_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3430826

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/dimer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.dimer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4919964

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/trimer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.trimer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.481761

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/tetramer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.tetramer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4792102

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer.bp_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4789989

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer.bp.dimer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4953076

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer.bp.dimer.trimer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4961969

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/monomer.bp.dimer.trimer.tetramer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("quantum.monomer.bp.dimer.trimer.tetramer_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4920071
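Collected in one place, the test-set Pearson r values reported above, together with the 0.5026 obtained earlier for the full e.coli.finalquantum run, can be ranked with a few lines of Python. The numbers are copied from these notes. All of them come from a single split (fold9, Set4), so gaps of about 0.001 between the combined subsets should not be over-read.

```python
# Test-set Pearson r (fold9, Set4) copied from the R output in these notes.
PEARSON_R = {
    "e.coli.finalquantum": 0.5026295,
    "monomer": 0.4775135,
    "bp": 0.3430826,
    "dimer": 0.4919964,
    "trimer": 0.481761,
    "tetramer": 0.4792102,
    "monomer.bp": 0.4789989,
    "monomer.bp.dimer": 0.4953076,
    "monomer.bp.dimer.trimer": 0.4961969,
    "monomer.bp.dimer.trimer.tetramer": 0.4920071,
}

# Highest correlation first.
ranked = sorted(PEARSON_R.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    for name, r in ranked:
        print(f"{r:.4f}  {name}")
```

Among the QCT-only runs, monomer.bp.dimer.trimer ranks first and bp alone ranks last.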

All.Subsets

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
df <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
# 6234 columns
raw <- df[,c(1:2,3,19:21,5916)]
onehot <- df[,c(1:2,4:18,22:5915,5917:5918)]
qct <- df[,c(1:2,5919:6234)]
raw_onehot <- df[,c(1:2,3,19:21,5916,4:18,22:5915,5917:5918)]
raw_qct <- df[,c(1:2,3,19:21,5916,5919:6234)]
onehot_qct <- df[,c(1:2,4:18,22:5915,5917:5918,5919:6234)]
raw_onehot_qct <- df[,c(1:2,3,19:21,5916,4:18,22:5915,5917:5918,5919:6234)]
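The hard-coded 1-based index ranges above are easy to get wrong, so it is worth checking that `raw`, `onehot` and `qct` together cover feature columns 3 to 6234 exactly once. Here is a Python sketch of that check, with the ranges transcribed from the R code (the `expand` helper is hypothetical):

```python
from typing import List, Tuple, Union

Span = Union[int, Tuple[int, int]]


def expand(*spans: Span) -> List[int]:
    """Expand R-style 1-based column specs (single ints or inclusive a:b ranges)."""
    cols: List[int] = []
    for span in spans:
        if isinstance(span, int):
            cols.append(span)
        else:
            lo, hi = span
            cols.extend(range(lo, hi + 1))
    return cols


RAW = expand(3, (19, 21), 5916)
ONEHOT = expand((4, 18), (22, 5915), (5917, 5918))
QCT = expand((5919, 6234))

if __name__ == "__main__":
    combined = RAW + ONEHOT + QCT
    print(len(RAW), len(ONEHOT), len(QCT))  # -> 5 5911 316
    # No column is used twice, and together they cover columns 3..6234.
    assert len(combined) == len(set(combined))
    assert set(combined) == set(range(3, 6235))
```

The 316 QCT columns agree with the per-fragment total from the QCT.Subsets block.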

write.table(raw[,c(1,3:ncol(raw))], "Ecoli.raw.features.txt", quote=F, row.names=F, sep="\t")
write.table(raw[,c(1,3:ncol(raw))], "Ecoli.raw.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(raw[,3:ncol(raw)], "Ecoli.raw.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(onehot[,c(1,3:ncol(onehot))], "Ecoli.onehot.features.txt", quote=F, row.names=F, sep="\t")
write.table(onehot[,c(1,3:ncol(onehot))], "Ecoli.onehot.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(onehot[,3:ncol(onehot)], "Ecoli.onehot.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(qct[,c(1,3:ncol(qct))], "Ecoli.qct.features.txt", quote=F, row.names=F, sep="\t")
write.table(qct[,c(1,3:ncol(qct))], "Ecoli.qct.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(qct[,3:ncol(qct)], "Ecoli.qct.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot[,c(1,3:ncol(raw_onehot))], "Ecoli.raw_onehot.features.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot[,c(1,3:ncol(raw_onehot))], "Ecoli.raw_onehot.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot[,3:ncol(raw_onehot)], "Ecoli.raw_onehot.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(raw_qct[,c(1,3:ncol(raw_qct))], "Ecoli.raw_qct.features.txt", quote=F, row.names=F, sep="\t")
write.table(raw_qct[,c(1,3:ncol(raw_qct))], "Ecoli.raw_qct.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(raw_qct[,3:ncol(raw_qct)], "Ecoli.raw_qct.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(onehot_qct[,c(1,3:ncol(onehot_qct))], "Ecoli.onehot_qct.features.txt", quote=F, row.names=F, sep="\t")
write.table(onehot_qct[,c(1,3:ncol(onehot_qct))], "Ecoli.onehot_qct.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(onehot_qct[,3:ncol(onehot_qct)], "Ecoli.onehot_qct.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot_qct[,c(1,3:ncol(raw_onehot_qct))], "Ecoli.raw_onehot_qct.features.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot_qct[,c(1,3:ncol(raw_onehot_qct))], "Ecoli.raw_onehot_qct.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(raw_onehot_qct[,3:ncol(raw_onehot_qct)], "Ecoli.raw_onehot_qct.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
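The three write.table() calls per subset follow one pattern. The first writes the sample IDs plus the features (column 2 dropped). The second writes an identical "_overlap" copy. The third writes a features-only file (columns 1 and 2 dropped). The sketch below captures that pattern as a hypothetical helper, write_subset(), which returns the paths it wrote.

```r
# Hypothetical helper reproducing the three write.table() calls per subset:
#   <prefix>.features.txt                      sample IDs + features (column 2 dropped)
#   <prefix>.features_overlap.txt              identical copy of the file above
#   <prefix>.features_overlap_noSampleIDs.txt  features only (columns 1-2 dropped)
write_subset <- function(x, prefix, dir = ".") {
  out <- function(suffix) file.path(dir, paste0(prefix, suffix))
  with_ids <- x[, c(1, 3:ncol(x)), drop = FALSE]
  write.table(with_ids, out(".features.txt"),
              quote = FALSE, row.names = FALSE, sep = "\t")
  write.table(with_ids, out(".features_overlap.txt"),
              quote = FALSE, row.names = FALSE, sep = "\t")
  write.table(x[, 3:ncol(x), drop = FALSE], out(".features_overlap_noSampleIDs.txt"),
              quote = FALSE, row.names = FALSE, sep = "\t")
  invisible(out(c(".features.txt", ".features_overlap.txt",
                  ".features_overlap_noSampleIDs.txt")))
}
```

For example, write_subset(raw_qct, "Ecoli.raw_qct") should reproduce the three Ecoli.raw_qct.* files written above.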


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName raw --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.raw.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName onehot --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.onehot.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName qct --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.qct.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName raw_onehot --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.raw_onehot.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName raw_qct --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.raw_qct.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName onehot_qct --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.onehot_qct.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName raw_onehot_qct --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.raw_onehot_qct.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt
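The seven builder calls differ only in the run name. That name also sets the working directory and the feature file. The sketch below assembles the same command lines in R so they can be checked before running. setup_cmd() is a hypothetical helper, and all flags are copied verbatim from the commands above. Nothing is executed unless the system() line is uncommented.

```r
# Paths copied from the commands above
builder  <- "/gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py"
data_dir <- "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli"
run_dir  <- file.path(data_dir, "iRF.run/e.coli.finalquantum")
y_file   <- file.path(data_dir, "Ecoli.finalquantum.score.txt")

# Hypothetical helper: one "cd <run dir> && python <builder> ..." line per run
setup_cmd <- function(run) {
  sprintf(paste("cd %s && python %s --System Summit --NodesPer 1 --TotalNodes 50",
                "--RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5",
                "--RunName %s --bypass --targetNodeSize 50 --Prediction %s %s"),
          file.path(run_dir, run), builder, run,
          file.path(data_dir, paste0("Ecoli.", run, ".features.txt")), y_file)
}

runs <- c("raw", "onehot", "qct", "raw_onehot", "raw_qct", "onehot_qct", "raw_onehot_qct")
cmds <- vapply(runs, setup_cmd, character(1))  # named by run
# writeLines(cmds)                # inspect first
# for (cmd in cmds) system(cmd)   # then run on Andes
```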


# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw/Submits/submit_full_raw_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot/Submits/submit_full_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct/Submits/submit_full_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot/Submits/submit_full_raw_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct/Submits/submit_full_raw_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct/Submits/submit_full_onehot_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct/Submits/submit_full_raw_onehot_qct_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw/Submits/submit_train_raw_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot/Submits/submit_train_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct/Submits/submit_train_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot/Submits/submit_train_raw_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct/Submits/submit_train_raw_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct/Submits/submit_train_onehot_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct/Submits/submit_train_raw_onehot_qct_0.sh
# once the train submissions have finished, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw/Submits/submit_test_raw_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot/Submits/submit_test_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct/Submits/submit_test_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot/Submits/submit_test_raw_onehot_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct/Submits/submit_test_raw_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct/Submits/submit_test_onehot_qct_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct/Submits/submit_test_raw_onehot_qct_0.sh
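The submission scripts follow the naming pattern Submits/submit_<stage>_<run>_0.sh. The sketch below uses a hypothetical submit_script() helper to list them in the same order as above. It keeps the test stage separate, because those jobs should only be submitted after the train jobs have finished.

```r
run_dir <- "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum"
runs <- c("raw", "onehot", "qct", "raw_onehot", "raw_qct", "onehot_qct", "raw_onehot_qct")

# Hypothetical helper: path of the submit script for a stage (full/train/test), vectorised over runs
submit_script <- function(stage, run) {
  file.path(run_dir, run, "Submits", sprintf("submit_%s_%s_0.sh", stage, run))
}

full_train <- c(submit_script("full", runs), submit_script("train", runs))
test       <- submit_script("test", runs)
# for (s in full_train) system(paste("bsub", s))
# ... once the train jobs are done:
# for (s in test) system(paste("bsub", s))
```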

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt raw
# 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/raw_cut.score.importance4 | head
# sgRNA.tempsgRNA.raw: 77242.8
# sgRNA.gcsgRNA.raw: 76936.2
# sgRNA.structuresgRNA.raw: 4904.05
# pam.distance0: 1557.88
# gene.distance0: 0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt onehot
# 0.2600428516356858
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/onehot_cut.score.importance4 | head
# V80.xsgRNA.raw: 125603
# V78.xsgRNA.raw: 97003.8
# CCsgRNA.raw: 65756.1
# V231.xsgRNA.raw: 62699.6
# GGsgRNA.raw: 61549.4
# V76.xsgRNA.raw: 55640.9
# V303.xsgRNA.raw: 55623.9
# V73.xsgRNA.raw: 55053.2
# V72.xsgRNA.raw: 54051.9
# TsgRNA.raw: 50452

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt qct
# 0.24183122435585644
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/qct_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 94414.7
# p19dimer.Hbond.energyraw: 77736.9
# p20basepair.Hlgap.eVEraw: 72590.9
# p11tetramer.Hbond.energyraw: 64640.2
# p19dimer.Hbond.stackingraw: 63317.4
# p1tetramer.Hbond.energyraw: 60242.7
# p18trimer.Hbond.energyraw: 56528.4
# p13tetramer.Hbond.energyraw: 52013.4
# p15tetramer.Hbond.stackingraw: 44911
# p18dimer.Hbond.stackingraw: 42437

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt raw_onehot
# 0.2602828644651521
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/raw_onehot_cut.score.importance4 | head
# V80.xsgRNA.raw: 124786
# V78.xsgRNA.raw: 96487.2
# V231.xsgRNA.raw: 62187.8
# pam.distance0: 57859.9
# CCsgRNA.raw: 56013.6
# V76.xsgRNA.raw: 55856.1
# V303.xsgRNA.raw: 55425.8
# V72.xsgRNA.raw: 54023.2
# V73.xsgRNA.raw: 53518.8
# GGsgRNA.raw: 51386.6

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt raw_qct
# 0.24177446035820813
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/raw_qct_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 99208.4
# p20basepair.Hlgap.eVEraw: 69960.5
# p19dimer.Hbond.stackingraw: 67007.8
# p19dimer.Hbond.energyraw: 65094.9
# p18trimer.Hbond.energyraw: 56776.3
# p1tetramer.Hbond.energyraw: 53759
# p11tetramer.Hbond.energyraw: 48134.6
# p15tetramer.Hbond.stackingraw: 41530.7
# p18dimer.Hbond.stackingraw: 39964.4
# p19dimer.HLgap.eVEraw: 39812.1

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt onehot_qct
# 0.24905182664101577
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/onehot_qct_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 127860
# p19dimer.Hbond.stackingraw: 83010.5
# p20basepair.Hlgap.eVEraw: 73663.5
# p11tetramer.Hbond.energyraw: 54304.4
# p1tetramer.Hbond.energyraw: 49929.4
# p18trimer.Hbond.stackingraw: 44303.8
# p18dimer.Hbond.energyraw: 40919
# V231.xsgRNA.raw: 39596.2
# p13tetramer.Hbond.energyraw: 36465.7
# p18dimer.Hbond.stackingraw: 33128.6

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt raw_onehot_qct
# 0.24906667479923555
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/raw_onehot_qct_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 125357
# p19dimer.Hbond.stackingraw: 80519.8
# p20basepair.Hlgap.eVEraw: 74407.4
# p1tetramer.Hbond.energyraw: 47519
# p18trimer.Hbond.stackingraw: 46368.4
# p11tetramer.Hbond.energyraw: 43490.4
# V231.xsgRNA.raw: 40232.6
# p18dimer.Hbond.energyraw: 37379.9
# p18dimer.Hbond.stackingraw: 34258.7
# p13tetramer.Hbond.energyraw: 29700.7
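The command sort -k2rg ... | head ranks features by the second whitespace-separated field, read as a general numeric value, in descending order. The sketch below is a rough R equivalent. It assumes each line of an importance4 file looks like the output shown above ("<feature>: <importance>"). top_importance() is a hypothetical name.

```r
# Hypothetical helper: top-n features of an importance file, like sort -k2rg | head -n
top_importance <- function(path, n = 10) {
  lines  <- readLines(path)
  lines  <- lines[nzchar(trimws(lines))]              # drop blank lines
  fields <- strsplit(trimws(lines), "[[:space:]]+")   # field 1 = name, field 2 = value
  imp <- data.frame(
    Feature    = sub(":$", "", vapply(fields, `[`, character(1), 1)),
    Importance = as.numeric(vapply(fields, `[`, character(1), 2)),
    stringsAsFactors = FALSE
  )
  head(imp[order(-imp$Importance), ], n)
}
```

For example, top_importance("cut.score/foldRuns/fold9/Runs/Set4/qct_cut.score.importance4") run from the qct directory should list the same ten features as above.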

# Pearson correlation between observed cut.score and iRF predictions (fold 9, Set4 test split)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("raw_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.2007612

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("onehot_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4914184

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/qct/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("qct_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4918057

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("raw_onehot_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4931724

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_qct/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("raw_qct_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4939777

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/onehot_qct/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("onehot_qct_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.500817

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/raw_onehot_qct/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("raw_onehot_qct_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5019173
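The seven correlation blocks above repeat the same three lines with a different run name. The sketch below is a helper that computes the fold 9, Set4 test correlation for one run. It assumes the directory layout and file names used above. It also assumes that prediction and observation rows are in the same sample order, which the original code assumes as well. set4_cor() is a hypothetical name.

```r
run_dir <- "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum"
runs <- c("raw", "onehot", "qct", "raw_onehot", "raw_qct", "onehot_qct", "raw_onehot_qct")

# Hypothetical helper: Pearson r between observed cut.score and iRF predictions for one run
set4_cor <- function(run, base = run_dir) {
  set_dir <- file.path(base, run, "cut.score/foldRuns/fold9/Runs/Set4")
  pred <- read.delim(file.path(set_dir, paste0(run, "_Set4_test.prediction")),
                     header = TRUE, sep = "\t")
  y    <- read.delim(file.path(set_dir, "set4_Y_test_noSampleIDs.txt"),
                     header = TRUE, sep = "\t")
  stopifnot(nrow(pred) == nrow(y))  # rows assumed to be in the same sample order
  cor(y$cut.score, pred$Predictions.)
}
```

With that in place, sapply(runs, set4_cor) should return one named correlation per feature set.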

Figures

require(data.table)

### violin plots of R2 across different models

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults")

#create a list of the files from your target directory
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults")

#read every file once with data.table::fread and bind them column-wise (one column of per-fold R2 values per file)
dataset <- do.call(cbind, sapply(file_list, data.table::fread, simplify = FALSE))
colnames(dataset) <- c("onehot", "onehot.QCT", "raw.onehot", "raw.onehot.QCT", "raw", "raw.QCT", "QCT", "QCT.dimers", "QCT.single.bp.dimers.noncorrelated", "QCT.single.bp.dimers", "top10", "top100", "top1k", "top20", "top200", "top5", "top50", "top500")

library(ggplot2)
library(reshape2)
library(RColorBrewer)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")

# Figure 2A
dataset.subsets2 <- dataset[,c(5,1,3,7,6,2,4,10)]
colnames(dataset.subsets2) <- c("Raw", "One-hot", "Raw + One-hot", "Quantum", "Raw + Quantum", "One-hot + Quantum", "Raw + One-hot + Quantum", "Raw + One-hot + Quantum + Kmers")
dataset.subsets2.melt <- melt(dataset.subsets2)
#pdf("R2.subsets2.violin.pdf")
#ggplot(dataset.subsets2.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
pdf("R2.subsets2.violin.nocolor.pdf")
ggplot(dataset.subsets2.melt, aes(x=value, y=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + labs(title="R2 across iRF models", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()
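The violins show the spread of per-fold R2. A small numeric table alongside them can make the ordering easier to quote. The sketch below assumes the long format produced by melt() above, with columns variable and value. summarise_r2() is a hypothetical helper.

```r
# Hypothetical helper: per-model fold count, median and quartiles of R2 (quantile type 7)
summarise_r2 <- function(long) {
  by_model <- split(long$value, long$variable)
  data.frame(
    model  = names(by_model),
    n      = vapply(by_model, length, integer(1)),
    median = vapply(by_model, median, numeric(1)),
    q1     = vapply(by_model, quantile, numeric(1), probs = 0.25, names = FALSE),
    q3     = vapply(by_model, quantile, numeric(1), probs = 0.75, names = FALSE),
    row.names = NULL
  )
}
```

For example, summarise_r2(dataset.subsets2.melt) gives one row per feature set, in the same order as the violins.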

# Figure 2D
dataset.top <- dataset[,c(16,11,14,17,12,15,18,13)]
dataset.top.melt <- melt(dataset.top)
#pdf("R2.top.violin.pdf")
#ggplot(dataset.top.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
pdf("R2.top.violin.nocolor.pdf")
ggplot(dataset.top.melt, aes(x=value, y=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + labs(title="R2 across iRF runs", x="R2", y="feature run") + theme_minimal() + theme(legend.position = "none")
dev.off()


# Figure 2B
### updated output (21 March 2022)
library(ggplot2)
library(reshape2)
library(RColorBrewer)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
effect <- read.delim("e.coli.finalquantum_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
effect.sort <- effect[order(-effect$NormEdge),]
effect.sort.50 <- effect.sort[1:50,]
effect.sort.50$category <- c("QCT.bp", "QCT.dimer", "QCT.bp", "QCT.tetramer", "QCT.tetramer", "dep.kmer2", "QCT.trimer", "QCT.dimer", "QCT.trimer", "QCT.dimer", "QCT.dimer", "QCT.tetramer", "raw", "raw", "ind.kmer2", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.trimer", "QCT.tetramer", "QCT.tetramer", "QCT.trimer", "QCT.monomer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.trimer", "QCT.dimer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.trimer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.bp", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer", "QCT.tetramer")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.bar.21march.pdf")
ggplot(effect.sort.50, aes(x=reorder(Feature, -NormEdge), y=NormEdge, color=category)) + geom_bar(stat="identity") + labs(title="Feature Importance (Top 50 Features)", x="Feature", y="Normalized Importance") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Paired") 
dev.off()
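The category vectors for the top 50 features are typed by hand, so they fall out of sync silently if the ranking changes. The sketch below derives a category from the feature name instead. It makes three assumptions: quantum features are named p<position><level>... (as in the importance listings above), the five raw features are the ones listed for the raw run, and everything else is one-hot. It does not separate one-hot dimers from trimers. classify_feature() is a hypothetical name.

```r
qct_labels <- c(monomer = "Quantum Monomer", basepair = "Quantum Basepair",
                dimer = "Quantum Dimer", trimer = "Quantum Trimer",
                tetramer = "Quantum Tetramer")
# Assumed raw feature names, taken from the raw-run importance listing above
raw_features <- c("sgRNA.tempsgRNA.raw", "sgRNA.gcsgRNA.raw", "sgRNA.structuresgRNA.raw",
                  "pam.distance0", "gene.distance0")

# Hypothetical helper: map feature names to a plotting category
classify_feature <- function(feature) {
  m <- regmatches(feature,
                  regexec("^p[0-9]+(monomer|basepair|dimer|trimer|tetramer)", feature))
  level <- vapply(m, function(x) if (length(x) == 2) x[2] else NA_character_, character(1))
  out <- ifelse(is.na(level), "One-hot", qct_labels[level])  # non-quantum defaults to one-hot
  out[feature %in% raw_features] <- "Raw Calculation"
  unname(out)
}
```

Under those assumptions, effect.sort.50$Category <- classify_feature(effect.sort.50$Feature) would replace the hand-typed vector.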


effect.sort.50$Category <- c("Quantum Basepair", "Quantum Dimer", "Quantum Basepair", "Quantum Tetramer", "Quantum Tetramer", "Onehot Dimer", "Quantum Trimer", "Quantum Dimer", "Quantum Trimer", "Quantum Dimer", "Quantum Dimer", "Quantum Tetramer", "Raw Calculation", "Raw Calculation", "Onehot Dimer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Trimer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Trimer", "Quantum Monomer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Trimer", "Quantum Dimer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Trimer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Basepair", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer", "Quantum Tetramer")

effect.sort.50$Feature.Label <- c("Basepair HL-gap pos20", "Dimer H-stacking pos19", "Basepair H-bond pos20", "Tetramer H-bond pos11", "Tetramer H-bond pos1", "CC pos15", "Trimer H-stacking pos18", "Dimer H-stacking pos18", "Trimer H-bond pos18", "Dimer H-bond pos18", "Dimer HL-gap pos19", "Tetramer H-bond pos13", "GC content", "Melting Temperature", "CCA pos19", "Tetramer H-stacking pos17", "Tetramer H-bond pos2", "Tetramer H-stacking pos14", "Tetramer H-stacking pos15", "Trimer HL-gap pos18", "Tetramer H-stacking pos7", "Tetramer H-stacking pos16", "Trimer H-stacking pos15", "Monomer # of Electrons pos18", "Tetramer HL-gap pos1", "Tetramer H-bond pos14", "Tetramer HL-gap pos5", "Trimer H-bond pos1", "Dimer H-stacking pos16", "Tetramer H-bond pos8", "Tetramer HL-gap pos7", "Tetramer HL-gap pos12", "Tetramer H-bond pos15", "Tetramer HL-gap pos6", "Tetramer H-bond pos17", "Trimer H-stacking pos17", "Tetramer HL-gap pos4", "Tetramer HL-gap pos10", "Tetramer H-bond pos10", "Tetramer H-stacking pos6", "Basepair H-bond pos18", "Tetramer HL-gap pos9", "Tetramer HL-gap pos14", "Tetramer HL-gap pos11", "Tetramer H-stacking pos2", "Tetramer HL-gap pos3", "Tetramer H-stacking pos5", "Tetramer HL-gap pos2", "Tetramer H-bond pos3", "Tetramer H-stacking pos13")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.bar.19May.pdf")
ggplot(effect.sort.50, aes(x=reorder(Feature.Label, -NormEdge), y=NormEdge, fill=Category)) + geom_bar(colour="black", stat="identity") + labs(title="Feature Importance (Top 50 Features)", x="Feature", y="Normalized Importance") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=8)) + scale_fill_brewer(palette="Paired")
dev.off()


effect.sort.50$Category <- c("HL-gap", "H-stacking", "H-bond", "H-bond", "H-bond", "One-hot Dimer", "H-stacking", "H-stacking", "H-bond", "H-bond", "HL-gap", "H-bond", "Raw Calculation", "Raw Calculation", "One-hot Dimer", "H-stacking", "H-bond", "H-stacking", "H-stacking", "HL-gap", "H-stacking", "H-stacking", "H-stacking", "# of Electrons", "HL-gap", "H-bond", "HL-gap", "H-bond", "H-stacking", "H-bond", "HL-gap", "HL-gap", "H-bond", "HL-gap", "H-bond", "H-stacking", "HL-gap", "HL-gap", "H-bond", "H-stacking", "H-bond", "HL-gap", "HL-gap", "HL-gap", "H-stacking", "HL-gap", "H-stacking", "HL-gap", "H-bond", "H-stacking")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.bar.feature.19May.pdf")
ggplot(effect.sort.50, aes(x=reorder(Feature.Label, -NormEdge), y=NormEdge, fill=Category)) + geom_bar(colour="black", stat="identity") + labs(title="Feature Importance (Top 50 Features)", x="Feature", y="Normalized Importance") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size=8)) + scale_fill_brewer(palette="Paired")
dev.off()





# Figure 2C
library(ggplot2)
library(reshape2)
library(RColorBrewer)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
effect <- read.delim("e.coli.finalquantum_cut.score.importance4.effect", header=T, sep="\t", stringsAsFactors = F)
effect.sort <- effect[order(-effect$NormEdge),]
effect.sort.100 <- effect.sort[1:100,]

library(ggrepel)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
summary(effect.sort.100$Samples)
   # Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   # 1585    2776    4066    5730    6792   27637 

pdf("Imp.Effect.scatter.label.quartile.color.21March.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(size = FeatureEffect), color = dplyr::case_when(effect.sort.100$Samples > 6792 ~ "#1b9e77", effect.sort.100$Samples < 2776 ~ "#d95f02", TRUE ~ "#7570b3"), alpha = 0.8) + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal()
dev.off()
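The colour breaks 6792 and 2776 are the quartiles copied from summary(), which prints them rounded to about four significant digits. The sketch below computes the breaks with quantile() instead. summary() uses the same default quantile type, so the values agree up to that rounding. quartile_colour() is a hypothetical helper.

```r
# Hypothetical helper: green above Q3, orange below Q1, purple in between
quartile_colour <- function(samples) {
  q <- quantile(samples, c(0.25, 0.75), names = FALSE)
  ifelse(samples > q[2], "#1b9e77",
         ifelse(samples < q[1], "#d95f02", "#7570b3"))
}
```

Inside geom_point(), color = quartile_colour(effect.sort.100$Samples) could then take the place of the case_when() call.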

pdf("Imp.Sample.scatter.label.Effect.color.18May.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(color = FeatureEffect, size=NormEdge), alpha = 0.8) + geom_text_repel(aes(label = Feature), box.padding = 0.35, point.padding = 0.5, segment.color = 'grey50', size=2) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + scale_colour_gradient2()
dev.off()

effect.sort.100$Feature <- gsub('.raw', '', effect.sort.100$Feature, fixed = TRUE)
effect.sort.100$Feature <- gsub('raw', '', effect.sort.100$Feature)
pdf("Imp.Sample.scatter.LargeLabel.Effect.color.18May.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(color = FeatureEffect, size=NormEdge), alpha = 1) + geom_text_repel(aes(label = Feature), box.padding = 1, point.padding = 0.5, segment.color = 'grey50', size=4) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + scale_colour_gradient2()
dev.off()
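In gsub(), the pattern is a regular expression unless fixed = TRUE. The pattern '.raw' therefore matches any character followed by raw. On a name such as p20basepair.Hbond.energyraw it removes "yraw", not just "raw". The sketch below shows that behaviour. It also defines an anchored alternative, strip_raw() (a hypothetical helper), that only removes a trailing ".raw" or "raw".

```r
x <- c("V80.xsgRNA.raw", "p20basepair.Hbond.energyraw", "pam.distance0")

gsub(".raw", "", x)   # regex: "." matches any character
# -> "V80.xsgRNA" "p20basepair.Hbond.energ" "pam.distance0"

# Hypothetical helper: drop an optional literal dot followed by "raw" at the end of the name
strip_raw <- function(x) sub("\\.?raw$", "", x)
strip_raw(x)
# -> "V80.xsgRNA" "p20basepair.Hbond.energy" "pam.distance0"
```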

pdf("Imp.Sample.scatter.NoLabel.Effect.color.18May.pdf")
ggplot(effect.sort.100, aes(x=NormEdge, y=Samples, label=Feature)) + geom_point(aes(color = FeatureEffect, size=NormEdge), alpha = 1) + labs(title="Samples v Importance (Top 100 Features)", x="Normalized Importance", y="Samples") + theme_minimal() + scale_colour_gradient2()
dev.off()



# Figure 3A
library(dplyr)  # provides %>% and mutate() used below
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
imp <- read.delim("e.coli.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]
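The nested ifelse() maps the sign of the effect to a label. The sketch below is an equivalent that indexes by sign(). Like the nested ifelse(), it returns NA for missing effects. effect_direction() is a hypothetical helper.

```r
# Hypothetical helper: sign(effect) is -1, 0 or 1, so sign + 2 indexes the labels 1..3
effect_direction <- function(effect) c("neg", "zero", "pos")[sign(effect) + 2]
```

With dplyr this would read mutate(Effect.Direction = effect_direction(Feature.Effect)).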

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.Dir.Top20.21March.pdf")
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
dev.off()

ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + coord_flip() + xlab("") + ylab("Normalized Importance") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position="bottom") + scale_fill_brewer(palette="Set1")

# Figure 3B
pdf("Imp.Dir.Top20.Effect.21March.pdf")
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()

pdf("Imp.Dir.Top20.Effect.30March.pdf")
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()


#### Figure S3: Focus on effect size
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
imp <- read.delim("e.coli.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir$absEffect <- abs(imp.dir$Feature.Effect)
imp.dir.effectsorted <- imp.dir[order(imp.dir$absEffect, decreasing = TRUE),]
imp.dir.effectsorted.top20 <- imp.dir.effectsorted[1:20,]
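The direction labels and the |effect| ranking used for Figure S3 can also be expressed in pandas. This is a minimal sketch, assuming a table with `Feature` and `Effect` columns like the R data frame; `top_by_abs_effect` is a hypothetical helper, not part of the iRF tooling.

```python
import pandas as pd


def top_by_abs_effect(imp: pd.DataFrame, k: int = 20) -> pd.DataFrame:
    """Label each feature's effect direction and return the k rows with the largest |Effect|."""
    out = imp.copy()
    # same three-way rule as the R ifelse(): "neg" below zero, "pos" above zero, otherwise "zero"
    out["Effect.Direction"] = "zero"
    out.loc[out["Effect"] < 0, "Effect.Direction"] = "neg"
    out.loc[out["Effect"] > 0, "Effect.Direction"] = "pos"
    out["absEffect"] = out["Effect"].abs()
    # stable sort, so features with tied |Effect| keep their input order
    return out.sort_values("absEffect", ascending=False, kind="mergesort").head(k)


demo = pd.DataFrame({"Feature": ["a", "b", "c", "d"], "Effect": [0.1, -0.5, 0.0, 0.3]})
print(top_by_abs_effect(demo, k=2)[["Feature", "Effect.Direction", "absEffect"]])
```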

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Imp.Dir.Top20Effect.Effect.30March.pdf")
imp.dir.effectsorted.top20$Feature.Label <- c("CTG pos19", "Basepair HL-gap pos11", "CACC pos12", "GCTA pos6", "CAC pos12", "TC pos12", "CTG pos6", "GCCG pos8", "ACA pos15", "TACT pos3", "CAGC pos6", "TGC pos13", "AAC pos13", "GTG pos3", "CAGT pos4", "TCCT pos5", "AACA pos4", "AGCA pos4", "GTGG pos7", "GAGA pos1")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=reorder(Feature.Label, -absEffect), y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()

pdf("Imp.Dir.Top20Effect.Effect.31May.pdf")
imp.dir.effectsorted.top20$Feature.Label <- c("CTG pos19", "Basepair HL-gap pos11", "CACC pos12", "GCTA pos6", "CAC pos12", "TC pos12", "CTG pos6", "GCCG pos8", "ACA pos15", "TACT pos3", "CAGC pos6", "TGC pos13", "AAC pos13", "GTG pos3", "CAGT pos4", "TCCT pos5", "AACA pos4", "AGCA pos4", "GTGG pos7", "GAGA pos1")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=reorder(Feature.Label, absEffect), y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()






## Main E.coli feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score")
imp <- read.delim("e.coli.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
imp.dir.top20.df$Feature.Label <- c("Basepair HL-gap pos20", "Dimer H-stacking pos19", "Basepair H-bond pos20", "Tetramer H-bond pos11", "Tetramer H-bond pos1", "CC pos15", "Trimer H-stacking pos18", "Dimer H-stacking pos18", "Trimer H-bond pos18", "Dimer H-bond pos18", "Dimer HL-gap pos19", "Tetramer H-bond pos13", "GC content", "Temperature of Melting", "CCA pos19", "Tetramer H-stacking pos17", "Tetramer H-bond pos2", "Tetramer H-stacking pos14", "Tetramer H-stacking pos15", "Trimer HL-gap pos18")


library(ggplot2)
pdf("Ecoli.FeatureEngineering.pdf")
# points and segments are mapped to color, so the palette is set with scale_color_brewer; a theme() placed before theme_classic() is overwritten by it, so it is omitted
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Ecoli Top Features") + ylab("Normalized Importance") + xlab("") + scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()

library(ggplot2)
pdf("Ecoli.FeatureEngineering.nocolor.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Ecoli Top Features") + ylab("Normalized Importance") + xlab("") + theme_classic() + coord_flip()
dev.off()

library(ggplot2)
pdf("Ecoli.FeatureEngineering.31May.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Ecoli Top Features") + ylab("Normalized Importance") + xlab("") + scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()
# violin plot of iRF output from the same matrix generation across species (e.coli, y.lipolytica, h.sapien, p.putida)
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/foldRuns/results/R2_foldResults.txt e.coli.R2_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/foldRuns/results/R2_foldResults.txt y.lipolytica.R2_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/foldRuns/results/R2_foldResults.txt Doench2014.R2_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/foldRuns/results/R2_foldResults.txt putida.R2_foldResults.txt


require(data.table)

### violin plots of R2 across different models

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species")

# list only the per-species result files; the pattern keeps the PDF written below (same directory) from being read on a re-run
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species", pattern="_foldResults\\.txt$")

# read each file with data.table::fread and bind them column-wise in one step (one column per species);
# list.files() returns them alphabetically: Doench2014, e.coli, putida, y.lipolytica
dataset <- do.call(cbind, lapply(file_list, fread))
colnames(dataset) <- c("H.sapien", "E.coli", "P.putida", "Y.lipolytica")

library(ggplot2)
library(reshape2)
library(RColorBrewer)
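Reading every per-species result file and binding the values column-wise can also be sketched in Python. This assumes each `*_foldResults.txt` file is a single headerless column of per-fold values (the file layout is not shown here, so treat that as an assumption); `bind_fold_results` is a hypothetical name. Restricting the glob to the result suffix keeps PDFs saved in the same folder out of the table.

```python
from pathlib import Path

import pandas as pd


def bind_fold_results(directory: str, suffix: str = "_foldResults.txt") -> pd.DataFrame:
    """Column-bind every '*<suffix>' file in `directory`, one column per file.

    Assumes each file is a single headerless column of per-fold values.
    """
    columns = {}
    # sorted() gives a fixed, alphabetical file order; the glob skips non-result files such as PDFs
    for path in sorted(Path(directory).glob("*" + suffix)):
        name = path.name[: -len(suffix)]  # "e.coli.R2_foldResults.txt" -> "e.coli.R2"
        columns[name] = pd.read_csv(path, header=None, sep=r"\s+").iloc[:, 0]
    return pd.DataFrame(columns)
```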

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species")

dataset.order <- dataset[,c(2,3,4,1)]
dataset.order.melt <- melt(dataset.order)
pdf("R2.cross.species.violin.pdf")
ggplot(dataset.order.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="R2 across iRF runs", x="R2", y="iRF run") + theme_minimal() + theme(legend.position = "none")
dev.off()





#### run same plot with MSE instead of R2
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/foldRuns/results/MSE_foldResults.txt e.coli.MSE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/foldRuns/results/MSE_foldResults.txt y.lipolytica.MSE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/foldRuns/results/MSE_foldResults.txt Doench2014.MSE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/foldRuns/results/MSE_foldResults.txt putida.MSE_foldResults.txt


require(data.table)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE")

# list only the per-species result files; the pattern keeps the PDFs written below (same directory) from being read on a re-run
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE", pattern="_foldResults\\.txt$")

# read each file with data.table::fread and bind them column-wise in one step;
# list.files() returns them alphabetically: Doench2014, e.coli, putida, y.lipolytica
dataset <- do.call(cbind, lapply(file_list, fread))
colnames(dataset) <- c("H.sapien", "E.coli", "P.putida", "Y.lipolytica")

library(ggplot2)
library(reshape2)
library(RColorBrewer)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MSE")

dataset.order <- dataset[,c(2,3,4,1)]
dataset.order.melt <- melt(dataset.order)
pdf("MSE.cross.species.violin.pdf")
ggplot(dataset.order.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="MSE across iRF runs", x="MSE", y="iRF run") + theme_minimal() + theme(legend.position = "none")
dev.off()

# RMSE
df <- dataset[,c(2,7,8,1)]
colnames(df) <- c("E.coli", "P.putida", "Y.lipolytica", "H.sapien")
df.melt <- melt(df)
df.melt$rmse <- sqrt(df.melt$value)
pdf("RMSE.cross.species.violin.pdf")
ggplot(df.melt, aes(x=rmse, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="RMSE across iRF runs", x="RMSE", y="iRF run") + theme_minimal() + theme(legend.position = "none")
dev.off()
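The RMSE violin plot takes sqrt() of each fold's MSE. When the folds are summarized by a single number, the mean of the per-fold RMSEs and the square root of the mean MSE differ: because sqrt is concave, the first is never larger than the second. A small sketch (`rmse_summaries` is a hypothetical helper; the "pooled" interpretation assumes equal-sized test folds):

```python
import math


def rmse_summaries(fold_mse):
    """Return (mean of per-fold RMSE, sqrt of mean per-fold MSE)."""
    fold_rmse = [math.sqrt(m) for m in fold_mse]  # RMSE = sqrt(MSE), fold by fold
    mean_of_rmse = sum(fold_rmse) / len(fold_rmse)
    # equals the pooled RMSE only when every fold has the same number of test samples (assumption)
    sqrt_of_mean_mse = math.sqrt(sum(fold_mse) / len(fold_mse))
    return mean_of_rmse, sqrt_of_mean_mse


print(rmse_summaries([0.01, 0.04]))
```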








#### run same plot with MAE 
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum/cut.score/foldRuns/results/MAE_foldResults.txt e.coli.MAE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/foldRuns/results/MAE_foldResults.txt y.lipolytica.MAE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/foldRuns/results/MAE_foldResults.txt Doench2014.MAE_foldResults.txt
cp /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/foldRuns/results/MAE_foldResults.txt putida.MAE_foldResults.txt


require(data.table)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE")

# list only the per-species result files; the pattern keeps the PDF written below (same directory) from being read on a re-run
file_list <- list.files(path="/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE", pattern="_foldResults\\.txt$")

# read each file with data.table::fread and bind them column-wise in one step (MAE, MEAN, MAE/MEAN per species);
# list.files() returns them alphabetically: Doench2014, e.coli, putida, y.lipolytica
dataset <- do.call(cbind, lapply(file_list, fread))
colnames(dataset) <- c("H.sapien.MAE", "H.sapien.MEAN", "H.sapien.MAE.MEAN", "E.coli.MAE", "E.coli.MEAN", "E.coli.MAE.MEAN", "P.putida.MAE", "P.putida.MEAN", "P.putida.MAE.MEAN", "Y.lipolytica.MAE", "Y.lipolytica.MEAN", "Y.lipolytica.MAE.MEAN")

library(ggplot2)
library(reshape2)
library(RColorBrewer)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_cross.species.MAE")

dataset.order <- dataset[,c(6,9,12,3)]
dataset.order.melt <- melt(dataset.order)
pdf("MAE.cross.species.violin.pdf")
ggplot(dataset.order.melt, aes(x=value, y=variable, fill=variable)) + geom_violin(trim=FALSE) + geom_boxplot(width=0.1, fill="white") + scale_fill_brewer(palette="Dark2") + labs(title="MAE/MEAN across iRF runs", x="MAE/MEAN", y="iRF run") + theme_minimal() + theme(legend.position = "none")
dev.off()
#### summary figure of R2 and feature count: Figure S3A?

### feature set = number of features / R2 / correlation
# raw            =    5 / 0.0406861           / 0.2007612
# onehot         = 5911 / 0.2600428516356858  / 0.4914184
# qct            =  316 / 0.24183122435585644 / 0.4918057
# raw+onehot     = 5916 / 0.2602828644651521  / 0.4931724
# raw+qct        =  312 / 0.24177446035820813 / 0.4939777
# onehot+qct     = 6227 / 0.24905182664101577 / 0.500817
# raw+onehot+qct = 6232 / 0.24906667479923555 / 0.5019173

library(ggplot2)
library(reshape2)
library(RColorBrewer)

df <- data.frame(feature.set = c("raw", "onehot", "QCT", "raw+onehot", "raw+QCT", "onehot+QCT", "raw+onehot+QCT"), R2 = c(0.0406861, 0.2600428516356858, 0.24183122435585644, 0.2602828644651521, 0.24177446035820813, 0.24905182664101577, 0.24906667479923555), Correlation = c(0.2007612, 0.4914184, 0.4918057, 0.4931724, 0.4939777, 0.500817, 0.5019173), feature.count = c(5, 5911, 316, 5916, 312, 6227, 6232))

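# keep the bars in the order listed above rather than alphabetical
df$feature.set <- factor(df$feature.set, levels = df$feature.set)
# Correlation (about 0.2-0.5) and feature count (5-6232) need different scales: draw the count divided by
# scale.factor on the primary axis and label it back in original units on the secondary axis
scale.factor <- max(df$feature.count) / max(df$Correlation)
ggplot(df) + geom_bar(aes(x=feature.set, y=Correlation, fill=feature.set), stat="identity") + geom_line(aes(x=feature.set, y=feature.count/scale.factor, group=1), color="blue", size=2) + scale_y_continuous(name = "Correlation", sec.axis = sec_axis(~ . * scale.factor, name="Feature Count")) + scale_fill_brewer(palette="Paired") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + labs(title = "Size and Prediction Accuracy of Feature Subsets", x = "Feature Set")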

Remove highly correlated features…

# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test


# python 

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings("ignore")
np.random.seed(123)

data = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.txt')
# drop sgRNAID and cut.score (first two columns); note that the -1 also drops the last feature column (p9trimer.No.electronsraw)
data = data.iloc[:,2:-1]

# convert the first remaining column to integer codes so that it is included in corr()
label_encoder = LabelEncoder()
data.iloc[:,0] = label_encoder.fit_transform(data.iloc[:,0]).astype('float64')

corr = data.corr()
corr.to_csv("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.correlationmatrix.txt")

# mark column j for removal if any earlier column i < j has |r| >= 0.9 with it
# (vectorized form of a nested loop over the upper triangle; the earlier column need not be kept itself)
upper = np.triu(np.abs(corr.to_numpy()) >= 0.9, k=1)
columns = ~upper.any(axis=0)
selected_columns = data.columns[columns]
data = data[selected_columns]

data.to_csv("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.pythoncorrelation.txt")
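The filter above drops column j whenever any earlier column is highly correlated with it, even if that earlier column was itself dropped, so chains of correlated features can be over-pruned. A common alternative is a greedy pass that only compares against columns already kept. This is a sketch of that different rule, not the method used here; it assumes every column is numeric, and `greedy_uncorrelated` is a hypothetical name. On three columns where a~b and b~c exceed the threshold but a~c does not, the filter above keeps only a, while this variant keeps a and c.

```python
import numpy as np
import pandas as pd


def greedy_uncorrelated(data: pd.DataFrame, threshold: float = 0.9) -> list:
    """Scan columns left to right and keep a column unless it has |r| >= threshold
    with a column that has already been kept."""
    corr = data.corr().abs().to_numpy()
    kept = []
    for j in range(corr.shape[1]):
        earlier = np.array(kept, dtype=int)
        # NaN correlations (e.g. constant columns) compare as False, so they never trigger a drop
        if not (corr[earlier, j] >= threshold).any():
            kept.append(j)
    return list(data.columns[kept])


df = pd.DataFrame({"a": [1, -1, 1, -1], "b": [2, 0, 0, -2], "c": [1, 1, -1, -1]})
print(greedy_uncorrelated(df, threshold=0.7))
```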


# R
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# head -n 1 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.pythoncorrelation.txt > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.pythoncorrelation.header.txt

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df.noncor <- read.delim("Ecoli.finalquantum.pythoncorrelation.header.txt", header=F, sep=",")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
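df <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t")
# ncol(df) = 6234
df.subset <- df[ , which(names(df) %in% df.noncor[1,])]
# ncol(df.subset) = 6190: 44 columns removed. 41 were dropped by the |r| >= 0.9 filter; sgRNAID, cut.score and the last column (p9trimer.No.electronsraw) were already excluded by data.iloc[:,2:-1] in the Python step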

df.features <- data.frame(feature = colnames(df))
df.features.noncor <- data.frame(t(df.noncor))
colnames(df.features.noncor) <- "feature"
df.removedfeatures<- subset(df.features, !(df.features$feature %in% df.features.noncor$feature))
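The same bookkeeping (which features disappeared between the full matrix and the filtered one) can be done in Python directly from the header written by `pandas.DataFrame.to_csv()`, whose first field is an empty name for the row index. A minimal sketch; `read_header` and `removed_features` are hypothetical helpers.

```python
import csv


def read_header(path):
    """Column names from the first row of a CSV written by pandas.DataFrame.to_csv().

    to_csv() writes the row index first, under an empty header field; that field is dropped.
    """
    with open(path, newline="") as fh:
        header = next(csv.reader(fh))
    return header[1:] if header and header[0] == "" else header


def removed_features(all_features, kept_features):
    """Features present in all_features but absent from kept_features, in original order."""
    kept = set(kept_features)
    return [f for f in all_features if f not in kept]
```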
#                       feature
# 1                     sgRNAID
# 2                   cut.score
# 21        sgRNA.tempsgRNA.raw
# 5920 p10basepair.Hlgap.eVEraw
# 5926   p10monomer.HLgap.eVraw
# 5937 p11basepair.Hlgap.eVEraw
# 5943   p11monomer.HLgap.eVraw
# 5954 p12basepair.Hlgap.eVEraw
# 5960   p12monomer.HLgap.eVraw
# 5971 p13basepair.Hlgap.eVEraw
# 5977   p13monomer.HLgap.eVraw
# 5988 p14basepair.Hlgap.eVEraw
# 5994   p14monomer.HLgap.eVraw
# 6005 p15basepair.Hlgap.eVEraw
# 6011   p15monomer.HLgap.eVraw
# 6022 p16basepair.Hlgap.eVEraw
# 6028   p16monomer.HLgap.eVraw
# 6039 p17basepair.Hlgap.eVEraw
# 6045   p17monomer.HLgap.eVraw
# 6056 p18basepair.Hlgap.eVEraw
# 6062   p18monomer.HLgap.eVraw
# 6069 p19basepair.Hlgap.eVEraw
# 6075   p19monomer.HLgap.eVraw
# 6078  p1basepair.Hlgap.eVEraw
# 6084    p1monomer.HLgap.eVraw
# 6095 p20basepair.Hlgap.eVEraw
# 6097   p20monomer.HLgap.eVraw
# 6100  p2basepair.Hlgap.eVEraw
# 6106    p2monomer.HLgap.eVraw
# 6117  p3basepair.Hlgap.eVEraw
# 6123    p3monomer.HLgap.eVraw
# 6134  p4basepair.Hlgap.eVEraw
# 6140    p4monomer.HLgap.eVraw
# 6151  p5basepair.Hlgap.eVEraw
# 6157    p5monomer.HLgap.eVraw
# 6168  p6basepair.Hlgap.eVEraw
# 6174    p6monomer.HLgap.eVraw
# 6185  p7basepair.Hlgap.eVEraw
# 6191    p7monomer.HLgap.eVraw
# 6202  p8basepair.Hlgap.eVEraw
# 6208    p8monomer.HLgap.eVraw
# 6219  p9basepair.Hlgap.eVEraw
# 6225    p9monomer.HLgap.eVraw
# 6234 p9trimer.No.electronsraw
### All monomer and basepair HL-gap features were dropped; what are they highly correlated with?
cor <- read.delim("Ecoli.finalquantum.correlationmatrix.txt", header=T, sep=",")
rownames(cor) <- cor$X
library(dplyr)
cor.hl <- cor %>% select("p1monomer.HLgap.eVraw", "p1basepair.Hlgap.eVEraw")
subset(cor.hl, abs(cor.hl$p1monomer.HLgap.eVraw) > 0.9)
# -0.9921853 V3.xsgRNA.raw (p1.C)
subset(cor.hl, abs(cor.hl$p1basepair.Hlgap.eVEraw) > 0.9)
# -1 p1basepair.Hbond.energyraw
cor.hl <- cor %>% select("p19monomer.HLgap.eVraw", "p19basepair.Hlgap.eVEraw")
subset(cor.hl, abs(cor.hl$p19monomer.HLgap.eVraw) > 0.9)
# -0.9910196 V75.xsgRNA.raw (p19.C)
subset(cor.hl, abs(cor.hl$p19basepair.Hlgap.eVEraw) > 0.9)
# -1 p1basepair.Hbond.energyraw
cor.hbond <- cor %>% select("p9tetramer.Hbond.stackingraw")
subset(cor.hbond, abs(cor.hbond$p9tetramer.Hbond.stackingraw) > 0.9)
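The partner lookup above can be sketched in pandas against the same correlation matrix (for example loaded with `pd.read_csv("Ecoli.finalquantum.correlationmatrix.txt", index_col=0)`). Unlike the R `subset()` calls, this version leaves out the feature's trivial self-correlation and sorts partners by |r|; `correlated_partners` is a hypothetical name.

```python
import pandas as pd


def correlated_partners(corr: pd.DataFrame, feature: str, threshold: float = 0.9) -> pd.Series:
    """Other features whose |r| with `feature` exceeds `threshold`, strongest first."""
    col = corr[feature].drop(labels=feature)  # leave out the self-correlation of 1
    hits = col[col.abs() > threshold]
    return hits.loc[hits.abs().sort_values(ascending=False).index]
```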

### remove the basepair.Hlgap.eVEraw features
### keep Hbond.stacking but remove Hbond.energy (for dimer, trimer, and tetramer)

#df.mat <- as.matrix(df.subset[,2:ncol(df.subset)])
#df.mat.id <- cbind(as.data.frame(df$sgRNAID), df.mat)

df.rm <- df %>% select(-grep("basepair.Hlgap.eVEraw", names(df)), -grep("dimer.Hbond.energyraw", names(df)), -grep("trimer.Hbond.energyraw", names(df)), -grep("tetramer.Hbond.energyraw", names(df))) 
# 6160
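For reference, the pattern-based column removal can be sketched in pandas. One difference is deliberate: R's `grep()` treats `.` as "any character", whereas this sketch escapes the patterns so they match only as literal substrings. `drop_matching` is a hypothetical helper.

```python
import re

import pandas as pd


def drop_matching(df: pd.DataFrame, patterns) -> pd.DataFrame:
    """Drop every column whose name contains any of `patterns` as a literal substring."""
    if not patterns:
        return df.copy()  # an empty alternation would match every name
    regex = re.compile("|".join(re.escape(p) for p in patterns))
    return df[[c for c in df.columns if not regex.search(c)]]
```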

write.table(df.rm, "Ecoli.finalquantum.noncorrelated.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,c(1,3:ncol(df.rm))], "Ecoli.finalquantum.noncorrelated.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,c(1,3:ncol(df.rm))], "Ecoli.finalquantum.noncorrelated.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,3:ncol(df.rm)], "Ecoli.finalquantum.noncorrelated.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")



# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.finalquantum.noncorrelated --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.noncorrelated.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated/Submits/submit_full_e.coli.finalquantum.noncorrelated_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated/Submits/submit_train_e.coli.finalquantum.noncorrelated_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated/Submits/submit_test_e.coli.finalquantum.noncorrelated_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# YNames.txt contains the response name: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.finalquantum.noncorrelated
# 0.2471523392232555

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.finalquantum.noncorrelated_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 204495
# p19dimer.Hbond.stackingraw: 92637.5
# p18dimer.Hbond.stackingraw: 66305.6
# p18trimer.Hbond.stackingraw: 65587.7
# V231.xsgRNA.raw: 43584.4
# p15tetramer.Hbond.stackingraw: 37593.2
# sgRNA.tempsgRNA.raw: 36557.7
# p14tetramer.Hbond.stackingraw: 35886.7
# p17tetramer.Hbond.stackingraw: 32507.3
# p11tetramer.Hbond.stackingraw: 32358
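The `sort -k2rg ... | head` listing can be reproduced in Python, assuming each line of the `.importance4` file has the `feature: value` form seen in the output above (the full file layout is an assumption). Ties are kept in file order, which may differ from GNU sort's tie-breaking. `top_importances` is a hypothetical helper; pass it an open file handle.

```python
def top_importances(lines, n=10):
    """Parse 'feature: value' lines and return the n largest as (feature, value) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, _, value = line.rpartition(":")
        pairs.append((name.strip(), float(value)))
    # descending by value, like `sort -k2rg | head -n n`
    return sorted(pairs, key=lambda p: p[1], reverse=True)[:n]
```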



# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.finalquantum.noncorrelated_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4990876
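`cor()` above computes Pearson's r and pairs `y$cut.score` with `pred$Predictions.` purely by row position, so both files must list the test sgRNAs in the same order. A NumPy sketch of the same statistic with a length check (`pearson_r` is a hypothetical name):

```python
import numpy as np


def pearson_r(observed, predicted) -> float:
    """Pearson correlation between observed and predicted values (R's cor() default).

    The inputs must be row-aligned: element i of each must refer to the same sgRNA.
    """
    y = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    if y.shape != p.shape:
        raise ValueError("observed and predicted differ in length; rows cannot be aligned")
    yc, pc = y - y.mean(), p - p.mean()
    return float(yc @ pc / np.sqrt((yc @ yc) * (pc @ pc)))
```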


##### RIT:

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score e.coli.finalquantum.noncorrelated

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.noncorrelated/cut.score/RIT.run

Normalize cut scores (0-1 scale), retrain the model, and test it on yeast & human data

# normalize cut.score to be on a zero to one scale to compare across species
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)

ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
 #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 # 0.0000  0.3563  0.5618  0.5077  0.6757  1.0000 

write.table(ecoli.num[,1:2], "Ecoli.finalquantum.normalize.score.txt", quote=F, row.names=F, sep="\t")
write.table(ecoli.num[,1:2], "Ecoli.finalquantum.normalize.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = ecoli.num[,2]), "Ecoli.finalquantum.normalize.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
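The normalization above is min-max scaling, (x - min) / (max - min), applied separately to each data set. The 0 and 1 end points therefore mark each study's own extremes, and the scaling is sensitive to the single smallest and largest values. A NumPy sketch (`min_max_scale` is a hypothetical name):

```python
import numpy as np


def min_max_scale(x):
    """Rescale values to [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        raise ValueError("all values are equal; min-max scaling is undefined")
    return (x - lo) / (hi - lo)
```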

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName e.coli.finalquantum.normalize --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/Ecoli.finalquantum.normalize.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/Submits/submit_full_e.coli.finalquantum.normalize_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/Submits/submit_train_e.coli.finalquantum.normalize_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/Submits/submit_test_e.coli.finalquantum.normalize_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt
# YNames.txt contains the response name: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt e.coli.finalquantum.normalize
# 0.2489770719208607

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/e.coli.finalquantum.normalize_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 46.2351
# p20basepair.Hlgap.eVEraw: 40.0312
# p19dimer.Hbond.stackingraw: 35.2122
# sgRNA.gcsgRNA.raw: 23.9518
# p1tetramer.Hbond.energyraw: 20.35
# p11tetramer.Hbond.energyraw: 19.0148
# p18trimer.Hbond.stackingraw: 18.8612
# V231.xsgRNA.raw: 17.0118
# p18dimer.Hbond.energyraw: 16.8966
# p18dimer.Hbond.stackingraw: 15.3229


# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("e.coli.finalquantum.normalize_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5015611
# scatter plots
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli")
pred <- read.delim("e.coli.finalquantum.normalize_Set4_test.prediction", header=T, sep="\t", stringsAsFactors = F)
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t", stringsAsFactors = F)

pred.y <- cbind(pred, y)
pred.y$row_num <- seq.int(nrow(pred.y)) 
colnames(pred.y) <- c("pred", "yvec", "id")

library(ggplot2)
ggplot(pred.y, aes(x=yvec, y=pred)) + geom_point(stat="identity") + geom_smooth(method='lm') + theme_classic()
cor(pred.y$yvec, pred.y$pred)
# 0.5015611

library(dplyr)
# desc(-x) is the same as x, so these are ascending dense ranks
pred.y.rank <- pred.y %>% mutate(yvec.rank=dense_rank(yvec), pred.rank=dense_rank(pred))
ggplot(pred.y.rank, aes(x=yvec.rank, y=pred.rank)) + geom_point(stat="identity") + geom_smooth(method='lm') + theme_classic()
cor(pred.y.rank$yvec.rank, pred.y.rank$pred.rank)
# 0.4887508
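`dense_rank()` gives tied values the same rank with no gaps, whereas Spearman's rho (`cor(..., method="spearman")` in R) correlates average ranks. With no ties the two agree; with ties, the 0.4887508 above may differ slightly from Spearman's rho. A NumPy sketch of both rankings (function names are hypothetical):

```python
import numpy as np


def dense_rank(x):
    """1-based dense ranks, like dplyr::dense_rank(): ties share a rank, no gaps."""
    _, inverse = np.unique(x, return_inverse=True)
    return inverse + 1


def average_rank(x):
    """1-based ranks where tied values receive the mean of their positions."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        mask = x == v
        ranks[mask] = ranks[mask].mean()
    return ranks


def spearman_rho(a, b):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_rank(a), average_rank(b))[0, 1])
```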

### Is the model better at predicting high or low scores? Does that depend on the input data?
## look at the distribution of scores and split them into high vs. low cutting efficiency

ggplot(pred.y, aes(x=yvec)) + geom_density() + theme_classic()
pred.y.low <- subset(pred.y, pred.y$yvec < 0.5)
cor(pred.y.low$yvec, pred.y.low$pred)
# 0.2642204
pred.y.high <- subset(pred.y, pred.y$yvec > 0.5)
cor(pred.y.high$yvec, pred.y.high$pred)
# 0.2732044
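### NOPE: both halves correlate weakly (0.26 and 0.27), partly expected since restricting yvec to half its range lowers r on its own. What about binning the scores into low / mid / high instead?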

pred.y.binary <- pred.y.rank %>% mutate(yvec.binary = ifelse(yvec < 0.25, 0, ifelse(yvec > 0.75, 1, 0.5)), yvec.label =  ifelse(yvec < 0.25, "low (< 0.25)", ifelse(yvec > 0.75, "high (> 0.75)", "mid")))
cor(pred.y.binary$yvec.binary, pred.y.binary$pred)
# 0.3947551
ggplot(pred.y.binary, aes(x=yvec.label, y=pred, fill=yvec.label)) + geom_boxplot() + theme_classic()
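Despite its name, `yvec.binary` has three levels: 0 below 0.25, 1 above 0.75, and 0.5 in between. A NumPy sketch of the same coding and labels (`bin_scores` is a hypothetical helper; thresholds default to the values used above):

```python
import numpy as np


def bin_scores(y, low=0.25, high=0.75):
    """Three-level coding: 0 below `low`, 1 above `high`, 0.5 in between, plus text labels."""
    y = np.asarray(y, dtype=float)
    code = np.where(y < low, 0.0, np.where(y > high, 1.0, 0.5))
    label = np.where(y < low, f"low (< {low})", np.where(y > high, f"high (> {high})", "mid"))
    return code, label
```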
library(dplyr)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
doench.num <- mutate_all(doench[,2:ncol(doench)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
# re-attach the sample ID column dropped by [,2:ncol] (as done for E. coli) and select the written columns by name;
# the positional [,1:2] and [,2] selections did not account for the dropped ID column
doench.num <- cbind(doench[, 1, drop=FALSE], doench.num)
summary(doench.num$cut.score)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 0.00000 0.03967 0.10641 0.17970 0.26133 1.00000 
write.table(doench.num[, c(names(doench)[1], "cut.score")], "Doench2014.finalquantum.normalize.score.txt", quote=F, row.names=F, sep="\t")
write.table(doench.num[, c(names(doench)[1], "cut.score")], "Doench2014.finalquantum.normalize.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = doench.num$cut.score), "Doench2014.finalquantum.normalize.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
summary(lipolytica.num$cut.score)
 #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 # 0.0000  0.2167  0.2877  0.3389  0.4460  1.0000 
write.table(lipolytica.num[,1:2], "y.lipolytica.finalquantum.normalize.score.txt", quote=F, row.names=F, sep="\t")
write.table(lipolytica.num[,1:2], "y.lipolytica.finalquantum.normalize.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = lipolytica.num[,2]), "y.lipolytica.finalquantum.normalize.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
# test e.coli trained model with y.lipolytica data

#!/bin/bash -l

#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 1
#BSUB -J yeast.test_0
#BSUB -o yeast.test_0.o%J
#BSUB -e yeast.test_0.e%J

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/yeast.test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/yeast.test

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.finalquantum.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.finalquantum.normalize.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/cut.score/e.coli.finalquantum.normalize_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix ecoli.model.yeast.normalize.test --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/yeast.test > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/yeast.test/ecoli.model.yeast.normalize.test.o

# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/ecoli.model.yeast.test.sh


#### test the output
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
score <- read.delim("y.lipolytica.finalquantum.normalize.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/yeast.test/")
predict <- read.delim("ecoli.model.yeast.normalize.test.prediction", header=T, sep="\t")

score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# -0.02454386
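For reference, the R check above pairs the observed scores and the ranger predictions by row position and computes a Pearson correlation. Below is a minimal Python sketch of the same calculation. It assumes both files are tab-delimited, have one header line, and list the sgRNAs in the same order. Reading column 0 of each file is also an assumption (in R the prediction column shows up as `Predictions.`), so adjust `index` to match the real layout.

```python
import csv
import math


def read_column(path, index=0):
    """Read one numeric column (by position) from a tab-delimited file with a header row."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader)  # skip the header line
        return [float(row[index]) for row in reader if row]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (R's cor() default method)."""
    if len(xs) != len(ys):
        raise ValueError("observed and predicted vectors differ in length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Usage (file names from the yeast test above; column positions are assumptions):
# observed = read_column("y.lipolytica.finalquantum.normalize.score_overlap_noSampleIDs.txt")
# predicted = read_column("ecoli.model.yeast.normalize.test.prediction")
# print(pearson(observed, predicted))
```

The length check is deliberate. Pairing values by position only makes sense when both files list the same sgRNAs in the same order.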
# test e.coli trained model with Doench data

#!/bin/bash -l

#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J doench.test_0
#BSUB -o doench.test_0.o%J
#BSUB -e doench.test_0.e%J

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.test

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.finalquantum.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.finalquantum.normalize.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/cut.score/e.coli.finalquantum.normalize_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix ecoli.model.doench.normalize.test --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.test > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.test/ecoli.model.doench.normalize.test.o

# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/ecoli.model.doench.normalize.test.sh


#### test the output
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
score <- read.delim("Doench2014.finalquantum.normalize.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.test/")
predict <- read.delim("ecoli.model.doench.normalize.test.prediction", header=T, sep="\t")

score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.05821821
# test e.coli trained model with Doench and Chuai data

#!/bin/bash -l

#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J doench.chuai.test_0
#BSUB -o doench.chuai.test_0.o%J
#BSUB -e doench.chuai.test_0.e%J

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.chuai.test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.chuai.test

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.Chuai2018.finalquantum.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.Chuai2018.finalquantum.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/cut.score/e.coli.finalquantum.normalize_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix ecoli.model.doench.chuai.test --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.chuai.test > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.chuai.test/ecoli.model.doench.chuai.test.o

# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/ecoli.model.doench.chuai.test.sh


#### test the output
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
score <- read.delim("Doench2014CORRECTED.Chuai2018.finalquantum.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.finalquantum.normalize/doench.chuai.test/")
predict <- read.delim("ecoli.model.doench.chuai.test.prediction", header=T, sep="\t")

score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.01564792
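The three test jobs above differ only in the job name, the input files, the output prefix and the output directory. The sketch below renders them from a single template using only the standard library. Paths and ranger flags are copied from the scripts above. `mkdir -p` replaces `mkdir` so that re-submitting does not fail when the directory already exists. The node count is a parameter because `jsrun -n 1` starts a single resource set, which suggests most of the 50 requested nodes sit idle. That is an observation, not something tested here.

```python
# Directory layout copied from the three job scripts above (assumed unchanged).
SEED = "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed"
RUN_DIR = f"{SEED}/e.coli/iRF.run/e.coli.finalquantum.normalize"
RANGER = ("/gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/"
          "cpp_version/build/ranger")
FOREST = f"{RUN_DIR}/cut.score/e.coli.finalquantum.normalize_cut.score.forest"

# LSF script body; the trailing backslashes become bash line continuations.
TEMPLATE = """#!/bin/bash -l

#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes {nodes}
#BSUB -J {job}_0
#BSUB -o {job}_0.o%J
#BSUB -e {job}_0.e%J

mkdir -p {outdir}
cd {outdir}

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 {ranger} \\
  --file {features} --yfile {scores} --predict {forest} \\
  --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 \\
  --outprefix {prefix} --outputDirectory {outdir} > {outdir}/{prefix}.o
"""


def make_test_script(job, features, scores, prefix, outdir_name, nodes=50):
    """Return the text of one LSF job that scores a dataset with the E. coli forest."""
    outdir = f"{RUN_DIR}/{outdir_name}"
    return TEMPLATE.format(nodes=nodes, job=job, outdir=outdir, ranger=RANGER,
                           features=features, scores=scores, forest=FOREST,
                           prefix=prefix)


# Example (reproduces the Doench 2014 job above):
# script = make_test_script(
#     job="doench.test",
#     features=f"{SEED}/Doench2014/Doench2014.finalquantum.features_overlap_noSampleIDs.txt",
#     scores=f"{SEED}/Doench2014/Doench2014.finalquantum.normalize.score_overlap_noSampleIDs.txt",
#     prefix="ecoli.model.doench.normalize.test",
#     outdir_name="doench.test",
# )
# with open(f"{RUN_DIR}/ecoli.model.doench.normalize.test.sh", "w") as fh:
#     fh.write(script)
```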

Classification

  • read through the matrix and binarize the cut score at fixed thresholds (0.25, 0.50, 0.75)
    • q1: cutting efficiency < 0.25 → 0, otherwise 1
    • q2: cutting efficiency < 0.50 → 0, otherwise 1
    • q3: cutting efficiency < 0.75 → 0, otherwise 1
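The cut-offs used here are fixed values, not quantiles of the data. For the E. coli scores summarized below (1st Qu. 0.3563, median 0.5618, 3rd Qu. 0.6757), they therefore produce different class balances than a quartile split would. The Python sketch below applies the fixed cut-offs and, as an alternative, computes the empirical quartiles. `statistics.quantiles(..., method="inclusive")` uses the same interpolation as R's default `quantile()` (type 7), which is what `summary()` reports.

```python
import statistics

# Fixed cut-offs used by the R code (score < threshold -> 0, otherwise 1).
THRESHOLDS = {"q1": 0.25, "q2": 0.50, "q3": 0.75}


def binarize(scores, threshold):
    """Label each cut score 0 if it is below the threshold, else 1."""
    return [0 if s < threshold else 1 for s in scores]


def empirical_quartiles(scores):
    """Data-driven alternative: the 25th, 50th and 75th percentiles (R type 7)."""
    return statistics.quantiles(scores, n=4, method="inclusive")
```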
# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
features <- read.delim("Ecoli.finalquantum.features.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Ecoli.finalquantum.normalize.score.txt", header=T, sep="\t", stringsAsFactors = F)
summary(score$cut.score)
 #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 # 0.0000  0.3563  0.5618  0.5077  0.6757  1.0000 
 
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification")
score.q1 <- score %>% mutate(cut.score = ifelse(cut.score < 0.25, 0, 1))
score.q2 <- score %>% mutate(cut.score = ifelse(cut.score < 0.50, 0, 1))
score.q3 <- score %>% mutate(cut.score = ifelse(cut.score < 0.75, 0, 1))

# note: despite the .tsv extension, these iRF matrices are written comma-separated (sep=",")
feature.score.q1 <- left_join(score.q1, features, by="sgRNAID")
write.table(feature.score.q1[,2:ncol(feature.score.q1)], "Ecoli.finalquantum.classify.q1.iRFmatrix.tsv", quote=F, row.names=F, sep=",")
feature.score.q2 <- left_join(score.q2, features, by="sgRNAID")
write.table(feature.score.q2[,2:ncol(feature.score.q2)], "Ecoli.finalquantum.classify.q2.iRFmatrix.tsv", quote=F, row.names=F, sep=",")
feature.score.q3 <- left_join(score.q3, features, by="sgRNAID")
write.table(feature.score.q3[,2:ncol(feature.score.q3)], "Ecoli.finalquantum.classify.q3.iRFmatrix.tsv", quote=F, row.names=F, sep=",")

write.table(feature.score.q1[,1:2], "Ecoli.finalquantum.classify.q1.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q1[,1:2], "Ecoli.finalquantum.classify.q1.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q1[,2]), "Ecoli.finalquantum.classify.q1.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q2[,1:2], "Ecoli.finalquantum.classify.q2.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q2[,1:2], "Ecoli.finalquantum.classify.q2.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q2[,2]), "Ecoli.finalquantum.classify.q2.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q3[,1:2], "Ecoli.finalquantum.classify.q3.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q3[,1:2], "Ecoli.finalquantum.classify.q3.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q3[,2]), "Ecoli.finalquantum.classify.q3.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(features, "Ecoli.finalquantum.classify.features.txt", quote=F, row.names=F, sep="\t")
write.table(features, "Ecoli.finalquantum.classify.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(features[,2:ncol(features)], "Ecoli.finalquantum.classify.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

iRF

module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q1 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.q1.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q2 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.q2.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q3 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/Ecoli.finalquantum.classify.q3.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF/Submits/submit_full_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF/Submits/submit_full_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF/Submits/submit_full_classify.q3_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF/Submits/submit_train_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF/Submits/submit_train_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF/Submits/submit_train_classify.q3_0.sh
# once the train jobs have finished, submit the test jobs
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF/Submits/submit_test_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF/Submits/submit_test_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF/Submits/submit_test_classify.q3_0.sh

# Andes
module load python/3.7-anaconda3
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt classify.q1
# 0.12529151834520436
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q1_cut.score.importance4 | head
# p19dimer.Hbond.stackingraw: 110.528
# p18dimer.Hbond.energyraw: 52.9624
# p1tetramer.Hbond.energyraw: 51.0775
# V231.xsgRNA.raw: 47.3827
# p15tetramer.Hbond.stackingraw: 42.5926
# p11tetramer.Hbond.energyraw: 39.3022
# p18trimer.Hlgap.eVEraw: 32.6164
# p6tetramer.Hlgap.eVEraw: 32.1497
# p6tetramer.Hbond.stackingraw: 31.8695
# p2tetramer.Hbond.energyraw: 31.798
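The `sort -k2rg ... | head` step lists the ten features with the largest importance. Below is a Python equivalent. It assumes each line of the `.importance4` file holds a feature name (possibly ending in `:`, as printed above), followed by whitespace and a number.

```python
def top_importances(lines, k=10):
    """Return the k (feature, importance) pairs with the largest importance.

    Lines with fewer than two whitespace-separated fields are skipped;
    a trailing ':' on the feature name is removed.
    """
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        pairs.append((fields[0].rstrip(":"), float(fields[1])))
    pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs[:k]
```

Usage: `with open("cut.score/foldRuns/fold9/Runs/Set4/classify.q1_cut.score.importance4") as fh: print(top_importances(fh))`.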

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q1.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q1_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3588828

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt classify.q2
# 0.18607930324108463
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q2_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 183.638
# p20basepair.Hlgap.eVEraw: 163.451
# p19dimer.Hbond.stackingraw: 144.959
# p1tetramer.Hbond.energyraw: 82.4669
# p11tetramer.Hbond.energyraw: 80.3186
# p18trimer.Hbond.stackingraw: 79.974
# p18dimer.Hbond.energyraw: 72.5719
# p15tetramer.Hbond.stackingraw: 60.8338
# p13tetramer.Hbond.energyraw: 60.0197
# p16tetramer.Hbond.stackingraw: 54.315

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q2.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q2_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4318153


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/YNames.txt classify.q3
# 0.01631988476578034
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q3_cut.score.importance4 | head
# p20basepair.Hbond.energyraw: 27.6664
# p20basepair.Hlgap.eVEraw: 25.6897
# p12tetramer.Hlgap.eVEraw: 22.6305
# p4tetramer.Hbond.stackingraw: 22.2947
# p7tetramer.Hlgap.eVEraw: 22.028
# p1tetramer.Hlgap.eVEraw: 21.0615
# p9tetramer.Hlgap.eVEraw: 20.7667
# p12tetramer.Hbond.stackingraw: 20.737
# p11tetramer.Hbond.stackingraw: 20.2828
# p10tetramer.Hbond.stackingraw: 20.0696

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/binary.classification/q3.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q3_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.1601039
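The q1-q3 runs predict 0/1 labels, and the checks above report the Pearson correlation between those labels and the predictions (a point-biserial correlation). The area under the ROC curve is a threshold-free alternative for a binary target. Below is a standard-library sketch based on the Mann-Whitney rank formulation. It assumes the prediction column holds a continuous score, with higher values meaning "more likely class 1".

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    labels: 0/1 class labels; scores: continuous predictions.
    Tied scores receive their average rank, so a tie counts as half a win.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # extend j over the run of tied scores starting at position i
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are needed to compute AUC")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```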

Y.lipolytica

Data provided by collaborators (Bill)

# /Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/Y.Lipolytica.SupTable1.txt
# /Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/GSM552919_Ylip.fsa.txt

sgRNA dataset

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

### dataset --> Y.Lipolytica.SupTable1.txt: join Number and Gene.target into a new sgRNAID, reorder columns to sgRNAID / cut.score / nucleotide.sequence, drop rows with missing values
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration")
df <- read.delim("Y.Lipolytica.SupTable1.txt", header=T, sep="\t")

library(dplyr)
library(tidyr)

df2 <- unite(df, sgRNAID,c("Number", "Gene.target"), sep="_", remove=TRUE)
df3 <- df2[,c(1,3,2)]
colnames(df3) <- c("sgRNAID",   "cut.score",    "nucleotide.sequence")
df.na <- na.omit(df3)
# 46711
write.table(df.na, "Y.Lipolytica.txt", quote=F, row.names=F, sep="\t")


sed '1d' Y.Lipolytica.txt | awk '{print ">"$1"\n"$3}' > Y.Lipolytica.fasta
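This FASTA conversion depends on which awk field holds the sequence (field 3 here), and the column order differs between tables in this notebook. The sketch below selects columns by header name instead. The default names `sgRNAID` and `nucleotide.sequence` are the ones assigned in the R code above. The function names are hypothetical.

```python
import csv


def rows_to_fasta(lines, id_col="sgRNAID", seq_col="nucleotide.sequence"):
    """Build FASTA text from tab-delimited lines (header first), choosing columns by name."""
    records = csv.DictReader(lines, delimiter="\t")
    return "".join(f">{row[id_col]}\n{row[seq_col]}\n" for row in records)


def tsv_to_fasta(tsv_path, fasta_path, **columns):
    """File wrapper around rows_to_fasta, e.g. tsv_to_fasta("Y.Lipolytica.txt", "Y.Lipolytica.fasta")."""
    with open(tsv_path, newline="") as src, open(fasta_path, "w") as dst:
        dst.write(rows_to_fasta(src, **columns))
```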

# cd "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/"
# scp Y.Lipolytica.txt noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/.
# scp Y.Lipolytica.fasta noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/.

Baisya et al., 2021

https://www.biorxiv.org/content/10.1101/2021.09.29.461753v1.supplementary-material

sgRNA dataset

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/y.lipolytica/baisya2021.tableS3.txt" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/GCF_000002525.2_ASM252v1_genomic.fna.gz" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/GCF_000002525.2_ASM252v1_genomic.gff.gz" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/.

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
df <- read.delim("baisya2021.tableS3.txt", header=T, sep="\t")
df2 <- df[,c(1,6,3)]
colnames(df2) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
df.na <- na.omit(df2)

write.table(df.na, "baisya2021.txt", quote=F, row.names=F, sep="\t")


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/
# baisya2021.txt columns: sgRNAID, cut.score, nucleotide.sequence, so the sequence is field 3
sed '1d' baisya2021.txt | awk '{print ">"$1"\n"$3}' > baisya2021.fasta

blast

  • do a search for the sgRNA sequence in the genome
    • input fasta file of sequences, output coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

## blast
# conda install blast
# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes
# wget https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
# tar zxvpf ncbi-blast-2.11.0+-x64-linux.tar.gz
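# export PATH=$PATH:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin
# echo $PATH
# mkdir $HOME/blastdb
# export BLASTDB=$HOME/blastdb
# setenv BLASTDB $HOME/blastdb   (csh/tcsh equivalent of the export above)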

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica

/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/makeblastdb -in GCF_000002525.2_ASM252v1_genomic.fna -dbtype nucl
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query baisya2021.fasta -db GCF_000002525.2_ASM252v1_genomic.fna -out y.lipolytica.gRNA.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10

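# BLAST coordinates are 1-based and inclusive; subtract 1 from the smaller one to get a 0-based BED start
awk '{if ($9 > $10) print $2"\t"($10-1)"\t"$9"\t"$1}' y.lipolytica.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"($9-1)"\t"$10"\t"$1}' y.lipolytica.gRNA.blast.tab > tmp2.bed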
cat tmp1.bed tmp2.bed > y.lipolytica.gRNA.blast.bed
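BLAST `-outfmt 6` reports subject coordinates (`sstart` and `send`, columns 9 and 10) as 1-based inclusive positions, with `sstart > send` for minus-strand hits. BED intervals and the `bedtools makewindows` windows, by contrast, are 0-based and half-open. Below is a Python sketch of the per-line conversion, assuming the default 12-column `outfmt 6` layout.

```python
def blast_to_bed(line):
    """Convert one BLAST -outfmt 6 line into a BED record (chrom, start, end, name).

    Columns: qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore. The smaller subject coordinate is
    reduced by 1 (1-based -> 0-based); the larger one is already the
    correct half-open end.
    """
    fields = line.rstrip("\n").split("\t")
    query, subject = fields[0], fields[1]
    sstart, send = int(fields[8]), int(fields[9])
    return subject, min(sstart, send) - 1, max(sstart, send), query
```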
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/")
df <- read.delim("baisya2021.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")  # same column order as written to baisya2021.txt above
coord <- read.delim("y.lipolytica.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID

library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "y.lipolytica.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")

sliding windows

  • make 20bp sliding windows (every 1bp)
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica

faidx GCF_000002525.2_ASM252v1_genomic.fna -i chromsizes > y.lipolytica.sizes.genome
bedtools makewindows -g y.lipolytica.sizes.genome -w 20 -s 1 > y.lipolytica.20bp.sliding.bed
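`bedtools makewindows -w 20 -s 1` emits a 20bp window starting at every base of every chromosome in the sizes file. The sketch below generates the same kind of 0-based, half-open windows. Truncating the last windows at the chromosome end reflects our understanding of how `makewindows` behaves, so compare against `tail y.lipolytica.20bp.sliding.bed` before relying on it.

```python
def sliding_windows(chrom_sizes, width=20, step=1):
    """Yield (chrom, start, end) windows, 0-based and half-open like BED.

    A window starts every `step` bp; windows that would run past the
    chromosome end are truncated at the chromosome length.
    """
    for chrom, size in chrom_sizes:
        for start in range(0, size, step):
            yield chrom, start, min(start + width, size)
```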

Features

Gene density & GC content

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica

## genes
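# keep only rows whose feature type (column 3) is "gene"; grep 'gene' also matches mRNA/exon/CDS rows that carry a gene= attribute
awk -F'\t' '$3 == "gene"' GCF_000002525.2_ASM252v1_genomic.gff | sort -k 1,1 -k 4,4n > GCF_000002525.2_ASM252v1_genomic.gene.sort.gff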
bedtools intersect -wo -a y.lipolytica.20bp.sliding.bed -b GCF_000002525.2_ASM252v1_genomic.gene.sort.gff > y.lipolytica.gene.20sliding.bed
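In a RefSeq GFF, matching the substring `gene` anywhere on a line also keeps mRNA, exon and CDS rows, because their attribute column carries a `gene=` tag. The sketch below does three things:

* keeps only rows whose type column (column 3) is `gene`;
* converts GFF's 1-based inclusive coordinates to 0-based half-open ones;
* computes the per-pair overlap that `bedtools intersect -wo` reports in its last column.

The function names are illustrative.

```python
def gene_intervals(gff_lines):
    """Yield (seqid, start, end) for GFF3 'gene' rows, as 0-based half-open intervals."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and fields[2] == "gene":
            yield fields[0], int(fields[3]) - 1, int(fields[4])


def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))
```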

## GC content
bedtools nuc -fi GCF_000002525.2_ASM252v1_genomic.fna -bed y.lipolytica.20bp.sliding.bed | sed '1d' > y.lipolytica.GC.20sliding.bed

Temperature of melting (Tm)

https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)

https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n

# summit: # conda install -c conda-forge biopython 

### sgRNA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
python3

from Bio import SeqIO

# count A/C/G/T per sgRNA on the upper-cased sequence, so lowercase bases are counted too
# note: the "CG%" column holds a fraction between 0 and 1, not a percentage
with open('baisya2021.fasta') as input_file, open('nucleotide_counts_sgRNA.tsv', 'w') as output_file:
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        seq = str(cur_record.seq).upper()
        A_count = seq.count('A')
        C_count = seq.count('C')
        G_count = seq.count('G')
        T_count = seq.count('T')
        length = len(seq)
        cg_fraction = float(C_count + G_count) / length
        output_file.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' %
                          (cur_record.name, A_count, C_count, G_count, T_count, length, cg_fraction))

exit()

# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "y.lipolytica.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
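The R step applies the basic GC-content formula from the comment above, not Biopython's nearest-neighbor `Tm_NN`, so the two give different values. Below is a Python sketch of the same basic formula, useful for spot-checking a few rows of `y.lipolytica.nucleotide_counts_sgRNA_temp.txt`. Upper-casing the sequence and ignoring non-ACGT characters are assumptions of this sketch.

```python
def basic_tm(seq):
    """Basic melting temperature in degrees C:

        Tm = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)

    Counting is case-insensitive; characters other than A/C/G/T (e.g. N)
    are ignored in both numerator and denominator.
    """
    s = seq.upper()
    n_a, n_c, n_g, n_t = (s.count(base) for base in "ACGT")
    total = n_a + n_c + n_g + n_t
    if total == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 64.9 + 41 * (n_g + n_c - 16.4) / total
```

For a 20-mer with 10 G/C bases this gives 64.9 + 41 × (10 − 16.4) / 20 = 51.78 °C.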




### 20bp sliding windows
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
bedtools getfasta -fi GCF_000002525.2_ASM252v1_genomic.fna -bed y.lipolytica.20bp.sliding.bed -fo y.lipolytica.20sliding.fa

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
python3

from Bio import SeqIO

# count A/C/G/T per 20bp window on the upper-cased sequence, so soft-masked (lowercase) genome bases are counted too
# note: the "CG%" column holds a fraction between 0 and 1, not a percentage
with open('y.lipolytica.20sliding.fa') as input_file, open('nucleotide_counts_20sliding.tsv', 'w') as output_file:
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        seq = str(cur_record.seq).upper()
        A_count = seq.count('A')
        C_count = seq.count('C')
        G_count = seq.count('G')
        T_count = seq.count('T')
        length = len(seq)
        cg_fraction = float(C_count + G_count) / length
        output_file.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' %
                          (cur_record.name, A_count, C_count, G_count, T_count, length, cg_fraction))

exit()

# Melting temperature(°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
df <- read.delim("nucleotide_counts_20sliding.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "y.lipolytica.nucleotide_counts_20sliding_temp.txt", quote=F, row.names=F, sep="\t")
q()

Onehot encoding

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/
cut -f 1,3 baisya2021.txt > y.lipolytica.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py y.lipolytica.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/

sed '1d' y.lipolytica.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > y.lipolytica_ind1.txt
sed '1d' y.lipolytica.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > y.lipolytica_ind2.txt
sed '1d' y.lipolytica.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.A p1.C p1.T p1.G p2.A p2.C p2.T p2.G p3.A p3.C p3.T p3.G p4.A p4.C p4.T p4.G p5.A p5.C p5.T p5.G p6.A p6.C p6.T p6.G p7.A p7.C p7.T p7.G p8.A p8.C p8.T p8.G p9.A p9.C p9.T p9.G p10.A p10.C p10.T p10.G p11.A p11.C p11.T p11.G p12.A p12.C p12.T p12.G p13.A p13.C p13.T p13.G p14.A p14.C p14.T p14.G p15.A p15.C p15.T p15.G p16.A p16.C p16.T p16.G p17.A p17.C p17.T p17.G p18.A p18.C p18.T p18.G p19.A p19.C p19.T p19.G p20.A p20.C p20.T p20.G' | cut -d ' ' -f 1-81 > y.lipolytica_dep1.txt
sed '1d' y.lipolytica.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.AA p1.AC p1.AT p1.AG p1.CA p1.CC p1.CT p1.CG p1.TA p1.TC p1.TT p1.TG p1.GA p1.GC p1.GT p1.GG p2.AA p2.AC p2.AT p2.AG p2.CA p2.CC p2.CT p2.CG p2.TA p2.TC p2.TT p2.TG p2.GA p2.GC p2.GT p2.GG p3.AA p3.AC p3.AT p3.AG p3.CA p3.CC p3.CT p3.CG p3.TA p3.TC p3.TT p3.TG p3.GA p3.GC p3.GT p3.GG p4.AA p4.AC p4.AT p4.AG p4.CA p4.CC p4.CT p4.CG p4.TA p4.TC p4.TT p4.TG p4.GA p4.GC p4.GT p4.GG p5.AA p5.AC p5.AT p5.AG p5.CA p5.CC p5.CT p5.CG p5.TA p5.TC p5.TT p5.TG p5.GA p5.GC p5.GT p5.GG p6.AA p6.AC p6.AT p6.AG p6.CA p6.CC p6.CT p6.CG p6.TA p6.TC p6.TT p6.TG p6.GA p6.GC p6.GT p6.GG p7.AA p7.AC p7.AT p7.AG p7.CA p7.CC p7.CT p7.CG p7.TA p7.TC p7.TT p7.TG p7.GA p7.GC p7.GT p7.GG p8.AA p8.AC p8.AT p8.AG p8.CA p8.CC p8.CT p8.CG p8.TA p8.TC p8.TT p8.TG p8.GA p8.GC p8.GT p8.GG p9.AA p9.AC p9.AT p9.AG p9.CA p9.CC p9.CT p9.CG p9.TA p9.TC p9.TT p9.TG p9.GA p9.GC p9.GT p9.GG p10.AA p10.AC p10.AT p10.AG p10.CA p10.CC p10.CT p10.CG p10.TA p10.TC p10.TT p10.TG p10.GA p10.GC p10.GT p10.GG p11.AA p11.AC p11.AT p11.AG p11.CA p11.CC p11.CT p11.CG p11.TA p11.TC p11.TT p11.TG p11.GA p11.GC p11.GT p11.GG p12.AA p12.AC p12.AT p12.AG p12.CA p12.CC p12.CT p12.CG p12.TA p12.TC p12.TT p12.TG p12.GA p12.GC p12.GT p12.GG p13.AA p13.AC p13.AT p13.AG p13.CA p13.CC p13.CT p13.CG p13.TA p13.TC p13.TT p13.TG p13.GA p13.GC p13.GT p13.GG p14.AA p14.AC p14.AT p14.AG p14.CA p14.CC p14.CT p14.CG p14.TA p14.TC p14.TT p14.TG p14.GA p14.GC p14.GT p14.GG p15.AA p15.AC p15.AT p15.AG p15.CA p15.CC p15.CT p15.CG p15.TA p15.TC p15.TT p15.TG p15.GA p15.GC p15.GT p15.GG p16.AA p16.AC p16.AT p16.AG p16.CA p16.CC p16.CT p16.CG p16.TA p16.TC p16.TT p16.TG p16.GA p16.GC p16.GT p16.GG p17.AA p17.AC p17.AT p17.AG p17.CA p17.CC p17.CT p17.CG p17.TA p17.TC p17.TT p17.TG p17.GA p17.GC p17.GT p17.GG p18.AA p18.AC p18.AT p18.AG p18.CA p18.CC p18.CT p18.CG p18.TA p18.TC p18.TT p18.TG p18.GA p18.GC p18.GT p18.GG p19.AA p19.AC p19.AT p19.AG p19.CA p19.CC p19.CT p19.CG p19.TA p19.TC p19.TT p19.TG p19.GA p19.GC p19.GT p19.GG p20.AA p20.AC p20.AT p20.AG p20.CA p20.CC p20.CT p20.CG p20.TA p20.TC p20.TT p20.TG p20.GA p20.GC p20.GT p20.GG' | cut -d ' ' -f 1-321 > y.lipolytica_dep2.txt
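`encode_sequences.py` is not reproduced in this notebook, so its exact behavior is unknown from here. To illustrate the position-dependent (dep1) layout implied by the header above, the sketch below produces 20 positions × 4 bases (in A, C, T, G order), giving 80 columns. The function name is hypothetical.

```python
BASES = "ACTG"  # column order taken from the y.lipolytica_dep1.txt header


def position_onehot(seq, length=20):
    """Position-dependent one-hot encoding: one 0/1 column per (position, base).

    Returns (header, values) with names p1.A, p1.C, p1.T, p1.G, ..., p20.G.
    Characters other than A/C/T/G encode as all zeros at that position.
    """
    seq = seq.upper()
    if len(seq) != length:
        raise ValueError(f"expected a {length}-nt sequence, got {len(seq)}")
    header = [f"p{pos}.{base}" for pos in range(1, length + 1) for base in BASES]
    values = [1 if seq[pos - 1] == base else 0
              for pos in range(1, length + 1) for base in BASES]
    return header, values
```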

chemical tensors

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/
sed '1d' y.lipolytica.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > y.lipolytica.sequence.txt


# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R
library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "y.lipolytica.tensors.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.tensors.melt.txt", quote=F, row.names=F, sep="\t")
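The melt, join and dcast steps replace each base of an sgRNA with that base's row of descriptor values. They then spread the result into one column per position and descriptor; `dcast` joins the two parts with `_`, e.g. `p1_<descriptor>`. Below is a dictionary-based Python sketch of the same reshaping. The descriptor table in the test is hypothetical and stands in for `protein_rna_dna-vector_lee_nucleotide_dna_data.txt`.

```python
def positional_tensor_features(seq, tensor):
    """Map each base of an sgRNA to its per-base descriptor values.

    tensor: {descriptor_name: {"A": value, "C": value, "G": value, "T": value}}
    Returns {"p<position>_<descriptor>": value}. A base missing from the
    table (e.g. N) raises KeyError, where the R left_join would give NA.
    """
    features = {}
    for pos, base in enumerate(seq.upper(), start=1):
        for name, per_base in tensor.items():
            features[f"p{pos}_{name}"] = per_base[base]
    return features
```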

RNA structure (ViennaRNA)

https://www.tbi.univie.ac.at/RNA/tutorial/ minimum free energy (MFE) structure

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/vienna
RNAfold --noPS < ../baisya2021.fasta > y.lipolytica.gRNA.ViennaRNA.output.txt

grep '(' y.lipolytica.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > y.lipolytica.gRNA.ViennaRNA.output.value.txt
grep '>' y.lipolytica.gRNA.ViennaRNA.output.txt | sed 's/>//g' > y.lipolytica.gRNA.names.txt
paste y.lipolytica.gRNA.names.txt y.lipolytica.gRNA.ViennaRNA.output.value.txt > y.lipolytica.gRNA.ViennaRNA.output.value.id.txt
cp y.lipolytica.gRNA.ViennaRNA.output.value.id.txt ../.
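The two `grep` commands and the `paste` assume that names and energies appear in the same order, with exactly one energy per record. The sketch below parses RNAfold's stdout record by record instead. It assumes the default output: a header line, a sequence line, then the dot-bracket structure followed by the MFE in parentheses. RNAfold also writes a PostScript structure plot for each named sequence unless `--noPS` is given, which matters for the genome-wide 20bp windows.

```python
import re

# MFE at the end of an RNAfold structure line, e.g. "(((...))) ( -1.20)".
ENERGY_RE = re.compile(r"\(\s*([-+]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(lines):
    """Yield (name, mfe) pairs from RNAfold stdout.

    The name is the first whitespace-delimited token after '>'.
    """
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = (line[1:].split() or [""])[0]
            continue
        match = ENERGY_RE.search(line)
        if match and name is not None:
            yield name, float(match.group(1))
            name = None  # one energy per record
```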

# 20bp sliding fasta
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/vienna
RNAfold --noPS < ../y.lipolytica.20sliding.fa > y.lipolytica.20sliding.ViennaRNA.output.txt

grep '(' y.lipolytica.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > y.lipolytica.20sliding.ViennaRNA.output.value.txt
grep '>' y.lipolytica.20sliding.ViennaRNA.output.txt | sed 's/>//g' > y.lipolytica.20sliding.names.txt
paste y.lipolytica.20sliding.names.txt y.lipolytica.20sliding.ViennaRNA.output.value.txt > y.lipolytica.20sliding.ViennaRNA.output.value.id.txt
cp y.lipolytica.20sliding.ViennaRNA.output.value.id.txt ../.
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J ViennaRNA.ylipolytica
#SBATCH -N 2
#SBATCH -t 48:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/vienna
RNAfold < ../y.lipolytica.20sliding.fa > y.lipolytica.20sliding.ViennaRNA.output.txt

grep '(' y.lipolytica.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > y.lipolytica.20sliding.ViennaRNA.output.value.txt
grep '>' y.lipolytica.20sliding.ViennaRNA.output.txt | sed 's/>//g' > y.lipolytica.20sliding.names.txt
paste y.lipolytica.20sliding.names.txt y.lipolytica.20sliding.ViennaRNA.output.value.txt > y.lipolytica.20sliding.ViennaRNA.output.value.id.txt
cp y.lipolytica.20sliding.ViennaRNA.output.value.id.txt ../.

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/ViennaRNA.ylipolytica.sh
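The two `grep` calls plus `paste` rely on every record producing exactly one name line and one energy line, in the same order. If a record lacks a header, names and values silently shift against each other. The minimal Python sketch below keeps each name attached to its own MFE. It assumes the default RNAfold text output: `>name`, then the sequence, then the dot-bracket structure followed by the energy in parentheses. The function name `parse_rnafold` is hypothetical.

```python
import re

# Energy in parentheses at the end of an RNAfold structure line, e.g. "((...)) ( -3.40)".
ENERGY_RE = re.compile(r"\(\s*([-+]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(lines):
    """Return (name, mfe) pairs from RNAfold text output.

    Assumes records of the form ">name", sequence, "structure (energy)".
    Each energy is paired with the most recent header; a structure line with
    no preceding header is reported with name None instead of shifting names.
    """
    results = []
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = line[1:].strip()
            continue
        match = ENERGY_RE.search(line)
        if match and line.startswith(("(", ".", ")")):
            results.append((name, float(match.group(1))))
            name = None
    return results


example = [">g1", "ACGUACGU", "........ ( 0.00)", ">g2", "GGGAAACCC", "(((...))) (-1.20)"]
for name, mfe in parse_rnafold(example):
    print(f"{name}\t{mfe}")
```

Writing the pairs as tab-separated lines gives the same two-column layout as the `*.ViennaRNA.output.value.id.txt` files.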

GATC motif

  • proxy for putative methylation
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

## GATC motif
## fastaRegexFinder
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'GATC' > y.lipolytica.gatc.bed

bedtools intersect -wo -a y.lipolytica.20bp.sliding.bed -b y.lipolytica.gatc.bed > y.lipolytica.gatc.20sliding.bed
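`bedtools intersect -wo` emits one row per (window, GATC site) overlap, and the HAAR step later counts those rows per window to get a motif density. The minimal Python sketch below computes the same kind of per-window count directly from a sequence. It assumes 20 bp windows with a hypothetical step (the actual step of y.lipolytica.20bp.sliding.bed is not shown here) and counts a site when it overlaps a window by at least 1 bp. fastaRegexFinder scans both strands by default, so a palindromic motif such as GATC may be reported twice, once per strand. The sketch counts each site once.

```python
def motif_sites(seq, motif):
    """0-based starts of every (overlapping) exact motif match on the given strand."""
    seq, motif = seq.upper(), motif.upper()
    return [i for i in range(len(seq) - len(motif) + 1) if seq[i:i + len(motif)] == motif]


def window_counts(seq_len, sites, motif_len, width=20, step=1):
    """(start, end, count) for each [start, start + width) window.

    A site counts when it overlaps the window by at least 1 bp, which is what
    one row per overlap from `bedtools intersect -wo` amounts to.
    """
    rows = []
    for start in range(0, seq_len - width + 1, step):
        end = start + width
        count = sum(1 for s in sites if s < end and s + motif_len > start)
        rows.append((start, end, count))
    return rows


genome = "AAGATCAAAAGATC"
sites = motif_sites(genome, "GATC")  # GATC is palindromic, so one strand covers both
print(window_counts(len(genome), sites, 4, width=5, step=5))
```

Windows with no overlapping site get 0, matching the `gatc.win[is.na(gatc.win)] <- 0` fill in the HAAR code.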

PAM

https://www.synthego.com/guide/how-to-use-crispr/pam-sequence

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# locate NGG PAM sites (AGG, TGG, CGG, GGG) in the reference genome with fastaRegexFinder

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
# vim NGG.PAM.fasta

## fastaRegexFinder
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'AGG' > y.lipolytica.AGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'TGG' > y.lipolytica.TGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'CGG' > y.lipolytica.CGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000002525.2_ASM252v1_genomic.fna -r 'GGG' > y.lipolytica.GGG.PAM.txt

cat y.lipolytica.AGG.PAM.txt y.lipolytica.TGG.PAM.txt y.lipolytica.CGG.PAM.txt y.lipolytica.GGG.PAM.txt > y.lipolytica.NGG.PAM.txt
sort -k 1,1 -k 2,2n y.lipolytica.NGG.PAM.txt > y.lipolytica.NGG.PAM.sorted.bed

# intersect with sliding windows in the genome to get density for DWT
bedtools intersect -wo -a y.lipolytica.20bp.sliding.bed -b y.lipolytica.NGG.PAM.sorted.bed > y.lipolytica.NGG.PAM.20bp.sliding.windows.bed

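# distance from each sgRNA (all assigned the + strand) to the closest non-overlapping (-io), downstream (-iu) PAM
# y.lipolytica.sgRNA.coord.bed is created in the 'location relative to gene' step below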
awk '{print $0"\t""+"}' y.lipolytica.sgRNA.coord.bed > y.lipolytica.sgRNA.coord.strand.txt
bedtools closest -a y.lipolytica.sgRNA.coord.strand.txt -b y.lipolytica.NGG.PAM.sorted.bed -io -iu -D a > y.lipolytica.sgRNA.closestPAM.bed
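The PAM features combine two steps. First, `bedtools closest -io -iu -D a` picks the nearest non-overlapping NGG site downstream of each sgRNA. Second, the R code later one-hot encodes that site's N base, treating CCN codes as reverse-strand NGG hits. The minimal Python sketch below covers both steps. It assumes 0-based half-open coordinates and a gap of `pam_start - sgRNA_end`; bedtools' own convention for the reported distance may differ by one. Helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def pam_sites(seq):
    """NGG sites on both strands as (0-based start, text on the given strand).

    NGG = PAM on the given strand; CCN = NGG on the opposite strand.
    Overlapping matches are all kept.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri[0] in "ACGT" and tri[1:] == "GG":
            hits.append((i, tri))
        elif tri[:2] == "CC" and tri[2] in "ACGT":
            hits.append((i, tri))
    return hits


def nearest_downstream_pam(sg_end, hits):
    """(gap, code) for the closest site starting at or after sg_end, else None.

    Sites starting before sg_end are upstream or overlapping and are skipped
    (like -iu and -io); ties are broken by code here, whereas bedtools reports all ties.
    """
    downstream = [(start - sg_end, code) for start, code in hits if start >= sg_end]
    return min(downstream) if downstream else None


def pam_onehot(code):
    """One-hot of the PAM's N base, reading CCN codes as reverse complements."""
    ngg = revcomp(code) if code.upper().startswith("CC") else code.upper()
    return {f"PAM.{base}": int(ngg[0] == base) for base in "ACGT"}


genome = "ACGTAACCTAATGGA"
hits = pam_sites(genome)
dist, code = nearest_downstream_pam(5, hits)  # sgRNA occupying [0, 5)
print(hits, dist, code, pam_onehot(code))
```

The one-hot mapping reproduces the R pairs (AGG/CCT to A, CGG/CCG to C, TGG/CCA to T, GGG/CCC to G).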

location relative to gene

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
cut -f 1-4 y.lipolytica.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > y.lipolytica.sgRNA.coord.bed
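# keep only features of type 'gene' (grep 'gene' also matches mRNA/exon/CDS lines through their gene= attribute)
awk -F'\t' '$3 == "gene"' GCF_000002525.2_ASM252v1_genomic.gff | sort -k 1,1 -k 4,4n > GCF_000002525.2_ASM252v1_genomic.gene.sort.gff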
bedtools closest -a y.lipolytica.sgRNA.coord.bed -b GCF_000002525.2_ASM252v1_genomic.gene.sort.gff -D b > y.lipolytica.sgRNA.gene.closest.bed
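With `-D b`, `bedtools closest` reports the distance relative to the gene's strand. It is 0 when the sgRNA overlaps the gene and negative when the sgRNA lies upstream of it: to the left of a + strand gene, or to the right of a - strand gene. The minimal Python sketch below implements that sign rule. It assumes gene coordinates already converted to 0-based half-open intervals (bedtools does this internally for GFF) and reports the plain base gap. The exact magnitude bedtools reports for non-overlapping features may differ by one depending on version. Function names are hypothetical.

```python
def signed_gene_distance(a_start, a_end, g_start, g_end, g_strand):
    """Signed gap from an sgRNA [a_start, a_end) to a gene [g_start, g_end).

    0 when they overlap; otherwise the number of bases between them, negative
    when the sgRNA is upstream of the gene with respect to the gene's strand.
    """
    if a_start < g_end and g_start < a_end:
        return 0
    if a_end <= g_start:                  # sgRNA left of the gene
        gap, left = g_start - a_end, True
    else:                                 # sgRNA right of the gene
        gap, left = a_start - g_end, False
    upstream = left if g_strand == "+" else not left
    return -gap if upstream else gap


def closest_gene(a_start, a_end, genes):
    """Return the (start, end, strand) gene with the smallest gap and its signed distance."""
    best = min(genes, key=lambda g: abs(signed_gene_distance(a_start, a_end, *g)))
    return best, signed_gene_distance(a_start, a_end, *best)


genes = [(100, 200, "+"), (30, 40, "-")]
print(closest_gene(0, 20, genes))
```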

Raw features matrix

# salloc -A SYB105 -N 2 -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
structure <- read.delim("y.lipolytica.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("y.lipolytica.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
onehot.ind1 <- read.delim("y.lipolytica_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("y.lipolytica_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("y.lipolytica_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("y.lipolytica_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305]

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "df.id.test.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
tensor <- read.delim("y.lipolytica.tensors.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
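write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")
# note: this writes to y.lipolytica/, but the next block reads tensor.id.test from y.lipolytica.save/; copy the file across (or use one directory) before rerunning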



# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df.id <- read.delim("df.id.test.txt", header=T, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")

df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
write.table(tensor.df, "y.lipolytica.raw.onehot.tensor.txt", quote=F, row.names=F, sep="\t")

df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "y.lipolytica.raw.onehot.tensor.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast)
# 45271
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "y.lipolytica.raw.onehot.tensor.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 45271


# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.pam <- read.table("y.lipolytica.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
# one-hot encode the N of NGG; CCN codes are treated as reverse-strand hits (reverse complement of NGG)
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(
  PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0),
  PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0),
  PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0),
  PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]

score.location <- left_join(score.df, sgRNA.pam.df, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 45271
write.table(df.dcast.na, "y.lipolytica.sgRNA.pam.dcast.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df.dcast <- read.delim("y.lipolytica.sgRNA.pam.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("y.lipolytica.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

df.location <- left_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
# 45271

write.table(df.location, "y.lipolytica.raw.onehot.tensor.pam.dcast.na.txt", quote=F, row.names=F, sep="\t")



# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.genes <- read.table("y.lipolytica.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- unique(sgRNA.genes[,c(4,14)])
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")

score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 45271
write.table(df.dcast.na, "y.lipolytica.sgRNA.location.dcast.txt", quote=F, row.names=F, sep="\t")


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df.dcast <- read.delim("y.lipolytica.sgRNA.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("y.lipolytica.raw.onehot.tensor.pam.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

df.location <- inner_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
# 45271

write.table(df.location, "y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
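Every feature block above is reshaped the same way. The rows are melted to long format, the feature name and scale are pasted into `feature.scale`, and the result is spread with `dcast(sgRNAID + cut.score ~ feature.scale, fun.aggregate=mean, na.rm=TRUE)`. Duplicated (sgRNA, feature) rows, such as ties from `bedtools closest`, are therefore averaged rather than dropped. The minimal pure-Python sketch below shows that pivot. It assumes long rows given as `(sgRNAID, cut.score, feature, value)` tuples, with None standing in for NA. The function name `pivot_mean` is hypothetical.

```python
from collections import defaultdict


def pivot_mean(rows):
    """Long (sgRNAID, cut.score, feature, value) rows -> wide table of means.

    Duplicate (id, score, feature) rows are averaged and None values skipped,
    like fun.aggregate=mean with na.rm=TRUE. Cells with no usable value are
    None (dcast would fill them with NaN, the mean of an empty vector).
    Returns (feature_names, {(id, score): {feature: value}}).
    """
    acc = defaultdict(lambda: [0.0, 0])  # (id, score, feature) -> [sum, n]
    keys, features = set(), set()
    for sg, score, feature, value in rows:
        keys.add((sg, score))
        features.add(feature)
        if value is not None:
            cell = acc[(sg, score, feature)]
            cell[0] += value
            cell[1] += 1
    names = sorted(features)
    table = {}
    for key in sorted(keys):
        table[key] = {}
        for feature in names:
            cell = acc.get(key + (feature,))
            table[key][feature] = cell[0] / cell[1] if cell else None
    return names, table


rows = [
    ("g1", 1.5, "pam.distance0", 4),
    ("g1", 1.5, "pam.distance0", 6),  # tie from closest: averaged
    ("g1", 1.5, "PAM.A0", 1),
    ("g2", 0.2, "PAM.A0", 0),
    ("g2", 0.2, "pam.distance0", None),
]
names, table = pivot_mean(rows)
print(names, table)
```

Rows whose cells stay missing are what `na.omit(df.dcast)` removes afterwards.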

HAAR wavelets

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J haar.matrix
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica
R CMD BATCH haar.matrix.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/haar.matrix.sh
salloc -A SYB105 -N 2 -p gpu -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/modwt

R

library(dplyr)
library(reshape2)
library(tidyr)
library(wmtsa)
library(data.table)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
gatc <- read.table("y.lipolytica.gatc.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
#gene <- read.table("y.lipolytica.gene.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
gene <- read.table("y.lipolytica.gene.20sliding.coord.bed", header=F, sep="\t", stringsAsFactors = F)
structure <- read.table("y.lipolytica.20sliding.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F) # the pasted name/value file has no header line
nuc <- read.table("y.lipolytica.nucleotide_counts_20sliding_temp.txt", header=T, sep="\t", stringsAsFactors = F)
pam <- read.table("y.lipolytica.NGG.PAM.20bp.sliding.windows.bed", header=F, sep="\t", stringsAsFactors = F)
window <- read.table("y.lipolytica.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "sgRNAid", "cut.score", "seq")
score.df <- score[,c(1:3,5,6)]

gatc.bin <- gatc %>% group_by(V1, V2, V3) %>% mutate(gatc.count = n())
gatc.count <- unique(gatc.bin[,c(1:3,12)])

gene.bin <- gene %>% group_by(V1, V2, V3) %>% mutate(gene.count = n())
#gene.count <- unique(gene.bin[,c(1:3,14)])
gene.count <- unique(gene.bin)

pam.bin <- pam %>% group_by(V1, V2, V3) %>% mutate(pam.count = n())
pam.count <- unique(pam.bin[,c(1:3,12)])

window.v <- window[,1:3]
colnames(window.v) <- c("V1", "V2", "V3")
gatc.win <- left_join(window.v, gatc.count, by=c("V1", "V2", "V3"))
gatc.win[is.na(gatc.win)] <- 0
gene.win <- left_join(window.v, gene.count, by=c("V1", "V2", "V3"))
gene.win[is.na(gene.win)] <- 0
pam.win <- left_join(window.v, pam.count, by=c("V1", "V2", "V3"))
pam.win[is.na(pam.win)] <- 0

gene.df <- gene.win$gene.count
gatc.df <- gatc.win$gatc.count
pam.df <- pam.win$pam.count

structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/modwt")

# Haar MODWT of one per-window feature vector; writes label, value, scale and window columns
modwt.haar <- function(x, value.name, outfile) {
  modwt.df <- as.matrix(wavMODWT(x, wavelet="haar"))
  modwt.label <- data.frame(label = row.names(modwt.df), modwt.df)
  modwt.dt <- as.data.table(modwt.label)
  modwt.name <- modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
  colnames(modwt.name) <- c("label", value.name, "scale", "window")
  write.table(modwt.name, outfile, quote=F, row.names=F, sep="\t")
  modwt.name
}

temp.modwt.name <- modwt.haar(temp.df, "temp.dwt", "temp.modwt.haar.txt")
gc.modwt.name <- modwt.haar(gc.df, "gc.dwt", "gc.modwt.haar.txt")
structure.modwt.name <- modwt.haar(structure.df, "structure.dwt", "structure.modwt.haar.txt")
gene.modwt.name <- modwt.haar(gene.df, "gene.dwt", "gene.density.modwt.haar.txt")
gatc.modwt.name <- modwt.haar(gatc.df, "gatc.dwt", "gatc.density.modwt.haar.txt")
pam.modwt.name <- modwt.haar(pam.df, "pam.dwt", "pam.density.modwt.haar.txt")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica/modwt")
temp.modwt.name <- read.delim("temp.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gc.modwt.name <- read.delim("gc.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
structure.modwt.name <- read.delim("structure.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gene.modwt.name <- read.delim("gene.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gatc.modwt.name <- read.delim("gatc.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
pam.modwt.name <- read.delim("pam.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica")
window <- read.table("y.lipolytica.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "sgRNAid", "cut.score", "seq")
score.df <- score[,c(1:3,5,6)]

colnames(window) <- c("chr", "start", "end")
window$window <- seq.int(nrow(window))
window$window <- as.character(window$window-1)
window$start <- as.numeric(window$start)
window$end <- as.numeric(window$end - 1)

window.score.df <- left_join(score.df, window, by=c("chr", "start", "end"))
window.score.df$window <- as.integer(window.score.df$window)
window.score.temp <- left_join(window.score.df, temp.modwt.name[,c(3,4,2)], by="window")
window.temp.gc <- left_join(window.score.temp, gc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure <- left_join(window.temp.gc, structure.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene <- left_join(window.temp.gc.structure, gene.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene.gatc <- left_join(window.temp.gc.structure.gene, gatc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene.gatc.pam <- left_join(window.temp.gc.structure.gene.gatc, pam.modwt.name[,c(3,4,2)], by=c("window", "scale"))
nrow(window.temp.gc.structure.gene.gatc.pam)
window.temp.gc.structure.gene.gatc.pam.sgRNA <- subset(window.temp.gc.structure.gene.gatc.pam, !is.na(cut.score))
nrow(window.temp.gc.structure.gene.gatc.pam.sgRNA)
write.table(window.temp.gc.structure.gene.gatc.pam.sgRNA, "y.lipolytica.20sliding.exact.DWT.haar.txt", quote=F, row.names=F, sep="\t")

df.melt <- melt(window.temp.gc.structure.gene.gatc.pam.sgRNA[,c(4,5,7:ncol(window.temp.gc.structure.gene.gatc.pam.sgRNA))], id=c("cut.score", "scale", "sgRNAid"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAid", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAid + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast[is.na(df.dcast)] <- 0
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 45271
write.table(df.dcast.na, "y.lipolytica.20sliding.exact.DWT.haar.dcast.txt", quote=F, row.names=F, sep="\t")
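`wavMODWT(x, wavelet="haar")` is the maximal overlap DWT, which is undecimated. Every level therefore keeps one coefficient per input window, and that is what lets each 20 bp window be joined to a DWT value at every `scale`. The minimal sketch below implements the Haar MODWT pyramid algorithm with circular boundaries, using the standard MODWT filters (the DWT filters divided by the square root of 2). wmtsa's sign convention, level labels and default number of levels may differ, so the values are illustrative rather than a drop-in replacement. The function name `haar_modwt` is hypothetical.

```python
import math


def haar_modwt(x, levels):
    """Haar MODWT by the pyramid algorithm with circular boundaries.

    Returns (W, V): W[j - 1] holds the level-j wavelet coefficients and V the
    level-`levels` scaling coefficients. Every level keeps len(x) values, so
    coefficient t lines up with window t.
      W_j[t] = (V_{j-1}[t] - V_{j-1}[t - 2^(j-1)]) / 2
      V_j[t] = (V_{j-1}[t] + V_{j-1}[t - 2^(j-1)]) / 2
    """
    n = len(x)
    if n == 0 or levels < 1:
        raise ValueError("need a non-empty series and levels >= 1")
    v = [float(val) for val in x]
    w_levels = []
    for j in range(1, levels + 1):
        shift = 2 ** (j - 1)
        lagged = [v[(t - shift) % n] for t in range(n)]  # circular lag
        w_levels.append([(a - b) / 2 for a, b in zip(v, lagged)])
        v = [(a + b) / 2 for a, b in zip(v, lagged)]
    return w_levels, v


def energy(values):
    return sum(val * val for val in values)


W, V = haar_modwt([1, 2, 3, 4], levels=2)
total = sum(energy(w) for w in W) + energy(V)
print(W, V, math.isclose(total, energy([1, 2, 3, 4])))
```

The MODWT preserves energy. For [1, 2, 3, 4] the total of 30 splits into 3.0 (W1) + 2.0 (W2) + 25.0 (V2), which is a quick sanity check on any implementation.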

Raw + DWT matrix

# combine regional DWT with other features
library(tidyr)
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df.dcast.na <- read.delim("y.lipolytica.20sliding.exact.DWT.haar.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
names(df.dcast.na)[names(df.dcast.na) == 'sgRNAid'] <- 'sgRNAID'

df <- read.delim("y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- df[,c(1,1656,3:1649,1651:1655,1657)]
nrow(df)
# 45271

df.region <- inner_join(df, df.dcast.na[,c(1,3:ncol(df.dcast.na))], by=c("sgRNAID"))
nrow(df.region)
# 45271

write.table(df.region, "y.lipolytica.20sliding.raw.onehot.tensor.dwt.dcast.txt", quote=F, row.names=F, sep="\t")
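The iRF section addresses feature groups in this combined matrix by hard-coded 1-based column ranges, taken after the `df[,c(1,1656,...)]` reordering. An upstream change in column order would therefore silently mislabel the groups. The small Python helper below is hypothetical and not part of the original pipeline. It expands R-style range specs such as `"3:17,1645:1649"` and reports any columns shared between groups, and it could be run on the group list in the iRF section before trusting the indices.

```python
def expand_r_ranges(spec):
    """Expand an R-style 1-based column spec like "3:17,1650" into a sorted list."""
    cols = set()
    for part in spec.replace(" ", "").split(","):
        if not part:
            continue
        if ":" in part:
            lo, hi = (int(p) for p in part.split(":"))
            cols.update(range(min(lo, hi), max(lo, hi) + 1))
        else:
            cols.add(int(part))
    return sorted(cols)


def check_groups(groups):
    """Return {(a, b): shared columns} for every pair of groups that overlap."""
    expanded = {name: set(expand_r_ranges(spec)) for name, spec in groups.items()}
    names = sorted(expanded)
    clashes = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            common = expanded[a] & expanded[b]
            if common:
                clashes[(a, b)] = sorted(common)
    return clashes


groups = {"raw": "1642:1644,1650,1653", "dwt": "1656:1797"}
print(len(expand_r_ranges(groups["dwt"])), check_groups(groups))
```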
iRF
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J y.lipolytica.iRF
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0
#SBATCH -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save
R CMD BATCH iRF.test.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.test.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R

library(ranger)

iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight per feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale importance to range(0,1)
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
    {
      if(!saveall) rfs <- rf
      break
    }
  }
  return(rfs)
}


library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.20sliding.raw.onehot.tensor.dwt.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
set.seed(2458)
df.sample <- df[sample(nrow(df), 10000), ]

# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1645:1649,1651:1652,1654:1655)]
# one-hot dependent: [,c(18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
# chemical tensors: [,c(58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1642:1644,1650,1653)]
# DWT : [,c(1656:1797)]


df.raw <- df.sample[,c(2,1642:1644,1650,1653)]
iRF(df.raw[,2:ncol(df.raw)], df.raw$cut.score)
# iRF iteration  1 
# =================
# mtry:   2.5 
# prediction error:   7.765089 
# r^2:    -0.01215432 
# cor(y,yhat):    0.0608681 
# SNPs with importance > 0: 2 

df.dwt <- df.sample[,c(2,1656:1797)]
iRF(df.dwt[,2:ncol(df.dwt)], df.dwt$cut.score)
# iRF iteration  5 
# =================
# mtry:   24 
# prediction error:   7.457404 
# r^2:    0.02795152 
# cor(y,yhat):    0.1798774 
# SNPs with importance > 0: 40 

df.onehot <- df.sample[,c(2,3:17,1645:1649,1651:1652,1654:1655,18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
iRF(df.onehot[,2:ncol(df.onehot)], df.onehot$cut.score)
# iRF iteration  2 
# =================
# mtry:   106.5 
# prediction error:   7.143456 
# r^2:    0.06887356 
# cor(y,yhat):    0.2652687 
# SNPs with importance > 0: 149 

df.quantum <- df.sample[,c(2,58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
iRF(df.quantum[,2:ncol(df.quantum)], df.quantum$cut.score)
# iRF iteration  3 
# =================
# mtry:   164.5 
# prediction error:   7.420933 
# r^2:    0.03270536 
# cor(y,yhat):    0.1926085 
# SNPs with importance > 0: 189 

df.raw.dwt <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)])
iRF(df.raw.dwt[,2:ncol(df.raw.dwt)], df.raw.dwt$cut.score)
# iRF iteration  5 
# =================
# mtry:   22 
# prediction error:   7.444199 
# r^2:    0.02967267 
# cor(y,yhat):    0.1882058 
# SNPs with importance > 0: 35 

df.raw.onehot <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)])
iRF(df.raw.onehot[,2:ncol(df.raw.onehot)], df.raw.onehot$cut.score)
# iRF iteration  4 
# =================
# mtry:   61 
# prediction error:   7.112543 
# r^2:    0.07290298 
# cor(y,yhat):    0.2733014 
# SNPs with importance > 0: 108 

df.raw.quantum <- cbind(df.raw, df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.quantum[,2:ncol(df.raw.quantum)], df.raw.quantum$cut.score)
# iRF iteration  4 
# =================
# mtry:   105 
# prediction error:   7.344396 
# r^2:    0.04268167 
# cor(y,yhat):    0.2127093 
# SNPs with importance > 0: 164 

df.onehot.dwt <- cbind(df.onehot, df.dwt[,2:ncol(df.dwt)])
iRF(df.onehot.dwt[,2:ncol(df.onehot.dwt)], df.onehot.dwt$cut.score)
# iRF iteration  3
# =================
# mtry:   118 
# prediction error:   7.091331 
# r^2:    0.07566788 
# cor(y,yhat):    0.2752033 
# SNPs with importance > 0: 165 

df.onehot.quantum <- cbind(df.onehot, df.quantum[,2:ncol(df.quantum)])
iRF(df.onehot.quantum[,2:ncol(df.onehot.quantum)], df.onehot.quantum$cut.score)
# iRF iteration  4 
# =================
# mtry:   126 
# prediction error:   7.119273 
# r^2:    0.07202576 
# cor(y,yhat):    0.2690378 
# SNPs with importance > 0: 174 

df.quantum.dwt <- cbind(df.quantum, df.dwt[,2:ncol(df.dwt)])
iRF(df.quantum.dwt[,2:ncol(df.quantum.dwt)], df.quantum.dwt$cut.score)
# iRF iteration  4 
# =================
# mtry:   199 
# prediction error:   7.372495 
# r^2:    0.03901906 
# cor(y,yhat):    0.2007382 
# SNPs with importance > 0: 307 

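# note: despite the name, this also includes the quantum tensors (df.onehot.quantum); the log below is for raw + DWT + one-hot + tensors
df.raw.dwt.onehot <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)], df.onehot.quantum[,2:ncol(df.onehot.quantum)])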
iRF(df.raw.dwt.onehot[,2:ncol(df.raw.dwt.onehot)], df.raw.dwt.onehot$cut.score)
# iRF iteration  5 
# =================
# mtry:   140 
# prediction error:   7.054261 
# r^2:    0.08049979 
# cor(y,yhat):    0.2840999 
# SNPs with importance > 0: 221 

df.raw.dwt.quantum <- cbind(df.raw, df.dwt[,2:ncol(df.dwt)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.dwt.quantum[,2:ncol(df.raw.dwt.quantum)], df.raw.dwt.quantum$cut.score)
# iRF iteration  4 
# =================
# mtry:   203.5 
# prediction error:   7.31899 
# r^2:    0.0459933 
# cor(y,yhat):    0.2160629 
# SNPs with importance > 0: 309 

df.raw.onehot.quantum <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.onehot.quantum[,2:ncol(df.raw.onehot.quantum)], df.raw.onehot.quantum$cut.score)
# iRF iteration  5 
# =================
# mtry:   106.5 
# prediction error:   7.068021 
# r^2:    0.07870629 
# cor(y,yhat):    0.2818915 
# SNPs with importance > 0: 156 

df.dwt.onehot.quantum <- cbind(df.dwt, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.dwt.onehot.quantum[,2:ncol(df.dwt.onehot.quantum)], df.dwt.onehot.quantum$cut.score)
# iRF iteration  4 
# =================
# mtry:   213.5 
# prediction error:   7.062192 
# r^2:    0.07946603 
# cor(y,yhat):    0.282356 
# SNPs with importance > 0: 307 

df.all <- cbind(df.dwt, df.onehot[,2:ncol(df.onehot)], df.raw[,2:ncol(df.raw)], df.quantum[,2:ncol(df.quantum)])
iRF(df.all[,2:ncol(df.all)], df.all$cut.score)
# iRF iteration  5 
# =================
# mtry:   154 
# prediction error:   7.026276 
# r^2:    0.08414762 
# cor(y,yhat):    0.2901653 
# SNPs with importance > 0: 227 
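Each `iRF()` iteration refits `ranger` with `split.select.weights` taken from the previous forest's impurity-corrected importances. The weights are the importances divided by the sum of their absolute values, with negatives clamped to zero. `mtry` is half the number of features that still have positive weight. The loop stops early once fewer than max(1% of (p - 1), 10) features survive. The pure-Python sketch below isolates only that bookkeeping: no forest is fitted and the importances are supplied by hand. The helper names are hypothetical.

```python
def next_weights(importance):
    """Split-selection weights from importances: divide by sum(|imp|), clamp negatives to 0.

    If every importance is 0 this returns zeros (the R code would give NaN).
    """
    total = sum(abs(v) for v in importance)
    if total == 0:
        return [0.0] * len(importance)
    return [max(v / total, 0.0) for v in importance]


def mtry_for(weights):
    """Half the number of features with positive weight; may be fractional, as in the logs."""
    return 0.5 * sum(1 for w in weights if w > 0)


def should_stop(weights, n_features):
    """iRF() stops early when fewer than max(0.01 * (p - 1), 10) weights are positive."""
    return sum(1 for w in weights if w > 0) < max(0.01 * (n_features - 1), 10)


start = [1 / 5] * 5                      # equal weights for the 5 raw features
print(mtry_for(start))                   # first-iteration mtry
w = next_weights([2.0, -1.0, 1.0, 0.0, 0.0])
print(w, mtry_for(w), should_stop(w, 5))
```

This matches the raw-feature log above. Five columns give a first-iteration `mtry` of 2.5, and with only 2 features above zero (fewer than 10) the run stops after one iteration.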
add in kmer encoding
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/
cut -f 1,3 baisya2021.txt > y.lipolytica.noscore.txt
python ../kmer1_positional_encode.py y.lipolytica.noscore.txt
python ../kmer2_positional_encode.py y.lipolytica.noscore.txt
python ../kmer3_positional_encode.py y.lipolytica.noscore.txt
python ../kmer4_positional_encode.py y.lipolytica.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/

sed '1,2d' y.lipolytica.noscore_dependent1.txt | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep1.txt
sed '1,2d' y.lipolytica.noscore_dependent2.txt | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep2.txt
sed '1,2d' y.lipolytica.noscore_dependent3.txt | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep3.txt
sed '1,2d' y.lipolytica.noscore_dependent4.txt | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep4.txt
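The kmer*_positional_encode.py scripts are not shown in these notes. Judging from the `_dependent1` to `_dependent4` file names, they appear to produce position-dependent encodings for k = 1 to 4 as a string of 0/1 digits, which the `awk` line then splits into one column per digit. The hypothetical sketch below shows one common definition of a position-dependent k-mer one-hot: an indicator for every k-mer in ACGT^k at every start position. It yields (L - k + 1) * 4^k features, for example 19 * 16 = 304 for a 20-nt guide and k = 2. The real scripts' k-mer ordering and output format may differ.

```python
from itertools import product


def positional_kmer_onehot(seq, k):
    """Position-dependent k-mer one-hot as a 0/1 string (hypothetical layout).

    For each start position i (0..len(seq)-k) and each k-mer in ACGT^k in
    lexicographic order, emit '1' if seq[i:i+k] equals that k-mer, else '0'.
    Length is (len(seq) - k + 1) * 4**k.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    bits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        bits.extend("1" if window == kmer else "0" for kmer in kmers)
    return "".join(bits)


encoded = positional_kmer_onehot("ACG", 2)
print(" ".join(encoded))  # space-separated, like the awk gsub(/./,"& ",$2) step
```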
# salloc -A SYB105 -N 2 -p gpu -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,6)]) # sgRNAid and cut.score (column 7 is the sequence)
colnames(score.df) <- c("sgRNAID", "cut.score")

onehot.dep1 <- read.delim("y.lipolytica_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("y.lipolytica_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("y.lipolytica_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("y.lipolytica_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

# drop the empty last column left by the trailing space from the awk split
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot.score <- full_join(score.df, onehot.dep, by="sgRNAID")

df.melt <- melt(onehot.score, id=c("cut.score", "sgRNAID"))
df <- na.omit(df.melt)

colnames(df) <- c("cut.score", "sgRNAID", "variable", "value")

df$value <- as.numeric(df$value)
df.id <- df[!(is.na(df$value) | df$value==""), ]
colnames(df.id) <- c("cut.score", "sgRNAID", "feature", "value")

df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "y.lipolytica.kmer.encoding.txt", quote=F, row.names=F, sep="\t")
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J iRF.onehot.kmer
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save
R CMD BATCH iRF.onehot.kmer.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.onehot.kmer.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R

library(ranger)

iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weights for every feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # divide by total absolute importance, so weights lie in [-1, 1] before clamping
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
    {
      if(!saveall) rfs <- rf
      break
    }
  }
  return(rfs)
}
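Stripped of the ranger calls, `iRF()` is a reweighting loop: each forest's impurity-corrected importances become the split-selection weights of the next forest (negative values clamped to 0), `mtry` is set to half the number of features still carrying weight, and the loop stops early once fewer than max(1% of (ncol - 1), 10) features remain positive. One consequence is that a matrix with fewer than 10 features can never get past the first iteration. A Python sketch of just that rule, with made-up feature names and importances; the zero-total guard is an addition the R code does not have:

```python
def update_weights(importance):
    """One iRF reweighting step, mirroring the R loop.

    importance: {feature: impurity-corrected importance}; values may be negative.
    Returns split-selection weights importance / sum(|importance|), negatives -> 0.
    """
    total = sum(abs(v) for v in importance.values())
    if total == 0:
        # guard added in this sketch; the R version would produce NaN here
        return {name: 0.0 for name in importance}
    return {name: max(v / total, 0.0) for name, v in importance.items()}


def next_mtry(weights):
    """mtry for the next forest: half the number of features with positive weight."""
    return 0.5 * sum(1 for w in weights.values() if w > 0)


def should_stop(weights, n_features):
    """Early stop once fewer than max(1% of (ncol - 1), 10) features keep weight > 0."""
    n_positive = sum(1 for w in weights.values() if w > 0)
    return n_positive < max(0.01 * (n_features - 1), 10)


imp = {"gc": 3.0, "temp": -1.0, "mfe": 1.0}
w = update_weights(imp)
print(w)                          # gc 0.6, temp clamped to 0.0, mfe 0.2
print(next_mtry(w))               # 1.0
print(should_stop(w, len(imp)))   # True: 2 positive features < 10
```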


library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)
set.seed(2458)
df <- df[sample(nrow(df), 10000), ]

# kmer = 1
df.1 <- df[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)

# kmer = 2
df.2 <- df[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)

# kmer = 3
df.3 <- df[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)

# kmer = 4
df.4 <- df[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)

# kmer = 1 + 2
df.1.2 <- df[,c(2:386)]
iRF(df.1.2[,2:ncol(df.1.2)], df.1.2$cut.score)

# kmer = 1 + 2 + 3
df.1.2.3 <- df[,c(2:1538)]
iRF(df.1.2.3[,2:ncol(df.1.2.3)], df.1.2.3$cut.score)

# kmer = 1 + 2 + 3 + 4
df.1.2.3.4 <- df[,c(2:5890)]
iRF(df.1.2.3.4[,2:ncol(df.1.2.3.4)], df.1.2.3.4$cut.score)
Try again with min-max normalized cut scores
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
lipolytica <- lipolytica[,c(1,1656,3:1649,1651:1655,1657)]
ncol(lipolytica)
# 1655
nrow(lipolytica)
# 45271
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
df <- lipolytica.num

set.seed(2458)
df.sample <- df[sample(nrow(df), 10000), ]
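The normalization above is min-max scaling: the lowest cut score maps to 0 and the highest to 1. Because this is a positive affine rescaling, cor(y, yhat) and r^2 should be unaffected if the forest makes the same random choices; the prediction error (MSE) is divided by the squared range of the original scores. A minimal Python sketch; `min_max_scale` is a hypothetical helper, and raising on a constant column is a choice made here (the R one-liner would return NaN):

```python
def min_max_scale(values):
    """Rescale values to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; min-max scaling is undefined")
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_scale([-2.0, 0.0, 2.0]))  # [0.0, 0.5, 1.0]
```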

library(ranger)
iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weights for every feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # divide by total absolute importance, so weights lie in [-1, 1] before clamping
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10))
    {
      if(!saveall) rfs <- rf
      break
    }
  }
  return(rfs)
}

# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1645:1649,1651:1652,1654:1655)]
# one-hot dependent: [,c(18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
# chemical tensors: [,c(58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1642:1644,1650,1653)]

df.raw <- df.sample[,c(2,1642:1644,1650,1653)]
iRF(df.raw[,2:ncol(df.raw)], df.raw$cut.score)
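# NOTE: this log does not match df.raw: with only 5 features, mtry <= 2.5 and iRF() stops after iteration 1 because fewer than 10 features can carry weight
# iRF iteration  2 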
# =================
# mtry:   286 
# prediction error:   0.02513118 
# r^2:    0.03452986 
# cor(y,yhat):    0.1976886 
# SNPs with importance > 0: 314 

df.onehot <- df.sample[,c(2,3:17,1645:1649,1651:1652,1654:1655,18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
iRF(df.onehot[,2:ncol(df.onehot)], df.onehot$cut.score)
# iRF iteration  4 
# =================
# mtry:   57.5 
# prediction error:   0.024432 
# r^2:    0.06139041 
# cor(y,yhat):    0.2574278 
# SNPs with importance > 0: 92 

df.quantum <- df.sample[,c(2,58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
iRF(df.quantum[,2:ncol(df.quantum)], df.quantum$cut.score)
# iRF iteration  4 
# =================
# mtry:   95 
# prediction error:   0.02534101 
# r^2:    0.02646891 
# cor(y,yhat):    0.1749871 
# SNPs with importance > 0: 102 

df.raw.onehot <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)])
iRF(df.raw.onehot[,2:ncol(df.raw.onehot)], df.raw.onehot$cut.score)

df.raw.quantum <- cbind(df.raw, df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.quantum[,2:ncol(df.raw.quantum)], df.raw.quantum$cut.score)

df.onehot.quantum <- cbind(df.onehot, df.quantum[,2:ncol(df.quantum)])
iRF(df.onehot.quantum[,2:ncol(df.onehot.quantum)], df.onehot.quantum$cut.score)

df.raw.onehot.quantum <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.onehot.quantum[,2:ncol(df.raw.onehot.quantum)], df.raw.onehot.quantum$cut.score)
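The column map used for these subsets is written with R's 1-based, inclusive `a:b` ranges. If the same feature groups are needed in the Python/SHAP part of this notebook, they have to be shifted to 0-based positions. A small sketch of that translation; `r_cols_to_iloc` is a hypothetical helper, and it assumes the pandas frame has the same column order as the R frame (the `...dcast.txt` table written below is reordered with the same index vector):

```python
def r_cols_to_iloc(spec):
    """Translate an R column spec like "3:17,1645:1649,1653" (1-based,
    inclusive ranges) into 0-based positions usable with pandas .iloc."""
    positions = []
    for part in spec.split(","):
        part = part.strip()
        if ":" in part:
            start, end = (int(x) for x in part.split(":"))
            positions.extend(range(start - 1, end))  # R's a:b includes b
        else:
            positions.append(int(part) - 1)
    return positions


raw_cols = r_cols_to_iloc("1642:1644,1650,1653")
print(len(raw_cols), raw_cols)  # 5 [1641, 1642, 1643, 1649, 1652]
```

With pandas, `df.iloc[:, raw_cols]` would then select the raw gc/structure/temp/distance features.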

DONE RUNNING iRF - summit

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- df[,c(1,1656,3:1649,1651:1655,1657)]
write.table(df, "y.lipolytica.raw.onehot.tensor.pam.location.dcast.txt", quote=F, row.names=F, sep="\t")
ncol(df)
# 1655
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "y.lipolytica.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "y.lipolytica.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
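The `write.table` calls above split one matrix into separate inputs for iRF-LOOP: a feature table keyed by sgRNAID without the response, a score table with sgRNAID and cut.score, and ID-less copies. A stdlib-only sketch of the same split, assuming a tab-separated layout with sgRNAID first and cut.score second (as in `df.all`); `split_for_irf` is hypothetical, and the exact formats iRF-LOOP requires are defined by its setup script, not here:

```python
import csv
import io


def split_for_irf(rows):
    """rows: list of lists whose header is ["sgRNAID", "cut.score", feature...].
    Returns TSV strings mirroring features.txt, score.txt and
    score_overlap_noSampleIDs.txt as written by the R code."""
    def to_tsv(table):
        buf = io.StringIO()
        csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(table)
        return buf.getvalue()

    features = [[r[0]] + r[2:] for r in rows]   # df.all[, c(1, 3:ncol)]
    scores = [r[:2] for r in rows]              # df.all[, 1:2]
    scores_no_ids = [[r[1]] for r in rows]      # cut.score only
    return to_tsv(features), to_tsv(scores), to_tsv(scores_no_ids)


table = [["sgRNAID", "cut.score", "gc", "temp"],
         ["g1", "0.42", "0.55", "61.2"]]
feat, score, score_only = split_for_irf(table)
print(feat, end="")
```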

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.20sliding.raw.onehot.tensor.dwt.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(df)
# 
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.pam.location.dwt.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.pam.location.dwt.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "y.lipolytica.raw.onehot.tensor.pam.location.dwt.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run
#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName y.lipolytica.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.score.txt

#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName y.lipolytica.DWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.pam.location.dwt.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT/Submits/submit_full_y.lipolytica.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT/Submits/submit_full_y.lipolytica.DWT_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT/Submits/submit_train_y.lipolytica.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT/Submits/submit_train_y.lipolytica.DWT_0.sh
# once the train submissions have finished, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT/Submits/submit_test_y.lipolytica.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT/Submits/submit_test_y.lipolytica.DWT_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt
# contents of YNames.txt (the response name): cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.noDWT
# 0.09210298017671817
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw  cut.score   0.07789451487981215
# TTsgRNA.raw   cut.score   0.07174965288226201
# gene.distance0    cut.score   0.04313731815364106
# pam.distance0 cut.score   0.038925709394903266
# CGsgRNA.raw   cut.score   0.03330233677130654
# AAsgRNA.raw   cut.score   0.029370519036537104
# TsgRNA.raw    cut.score   0.024903402499592636
# GsgRNA.raw    cut.score   0.02376442487627081
# AsgRNA.raw    cut.score   0.021933041187672975
# GCsgRNA.raw   cut.score   0.021531833934896223
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.noDWT_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 9447.4
# TTsgRNA.raw: 8342.87
# gene.distance0: 4978.74
# pam.distance0: 4663.46
# CGsgRNA.raw: 3942.05
# AAsgRNA.raw: 3512.91
# TsgRNA.raw: 2977.86
# GCsgRNA.raw: 2885.99
# GsgRNA.raw: 2808.87
# AsgRNA.raw: 2691.89
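The `sort -k2rg ... | head` call ranks the per-fold importance file by its numeric second field. The same ranking in Python, assuming (from the logged output above) one `feature: importance` pair per line; `top_features` is a hypothetical helper:

```python
def top_features(lines, k=10):
    """Parse "feature: importance" lines and return the k largest,
    like sort -k2rg | head."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, _, value = line.rpartition(":")
        pairs.append((name.strip(), float(value)))
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs[:k]


lines = ["TTsgRNA.raw: 8342.87",
         "sgRNA.structuresgRNA.raw: 9447.4",
         "gene.distance0: 4978.74",
         ""]
print(top_features(lines, k=2))
```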

# Pearson correlation between observed and predicted test-set cut scores (R)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.noDWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3180251
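R's `cor()` computes the Pearson coefficient by default, so the value above measures linear agreement between observed and predicted scores for one held-out set (fold 9, Set4). For reference, a minimal Python version of the same statistic; the example values are made up:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (what R's cor(x, y) returns by default)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


observed = [0.1, 0.4, 0.35, 0.8]
predicted = [0.2, 0.3, 0.4, 0.7]
print(round(pearson(observed, predicted), 3))
```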


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.DWT
# 
sort -k3rg topVarEdges/cut.score_top95.txt | head
 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.DWT_cut.score.importance4 | head


# Pearson correlation between observed and predicted test-set cut scores (R)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.DWT/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.DWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save

# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.pam.location.dcast.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# features: every column except the ID and the target (in R, dput(colnames(df)) prints the full list)
X = df.drop(columns =['sgRNAID', 'cut.score'])

# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)

import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)

# show=False stops summary_plot from calling plt.show() before the figure is saved
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/baisya.noDWT.16dec.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
plt.close(f)

f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/baisya.noDWT.16dec.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
plt.close(f)

# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/baisya.noDWT.16dec.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/"

18 January

  • matrix including raw values, positional encoding kmers, quantum tensors (singleton, basepair, dimer)
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py y.lipolytica.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py y.lipolytica.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py y.lipolytica.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py y.lipolytica.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/

sed '1d' y.lipolytica.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep1.txt
sed '1d' y.lipolytica.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep2.txt
sed '1d' y.lipolytica.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep3.txt
sed '1d' y.lipolytica.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > y.lipolytica_dep4.txt
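The awk call inserts a space after every character of field 2, so each position of the encoded string becomes its own column (and a trailing space leaves one empty column at the end, which the R code drops). A Python sketch of the same explode step; `split_positions` is hypothetical, and the sample encoded string is made up:

```python
def split_positions(line):
    """Mimic awk '{gsub(/./,"& ",$2); print $0}': explode field 2 into one
    token per character so every position becomes its own column."""
    fields = line.split()
    return fields[:1] + list(fields[1]) + fields[2:]


print(split_positions("sg001 0100"))  # ['sg001', '0', '1', '0', '0']
```

Unlike the awk version, this produces no trailing empty field, so nothing needs to be dropped afterwards.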
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R
library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df[63:70,]))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "y.lipolytica.tensors.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.tensors.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
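The block above gives each base (A, C, G, T) one vector of quantum-chemical descriptors (rows 63 to 70 of the tensor table, transposed), joins that vector to every position of every sgRNA, and casts the result to one column per position and descriptor; `dcast(sgRNAID ~ position + variable)` names those columns `<position>_<descriptor>`. A Python sketch of the lookup for one sgRNA, with hypothetical descriptor names `d1`, `d2`; a base missing from the table yields a missing value, as with `left_join` (the later step sets these NAs to 0):

```python
def tensor_features(bases_by_position, table, descriptors):
    """Build one wide row of per-position descriptors.

    bases_by_position: {"p1": "A", "p2": "C", ...}
    table: {"A": {"d1": ..., "d2": ...}, ...}  per-base descriptor values
    Column names follow dcast(sgRNAID ~ position + variable).
    """
    row = {}
    for position, base in bases_by_position.items():
        values = table.get(base)
        for name in descriptors:
            # left_join semantics: an unknown base gives a missing value (None)
            row[f"{position}_{name}"] = values[name] if values is not None else None
    return row


table = {"A": {"d1": 1.0, "d2": 2.0}, "C": {"d1": 3.0, "d2": 4.0}}
row = tensor_features({"p1": "A", "p2": "C", "p3": "N"}, table, ["d1", "d2"])
print(row)
```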
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J jan18.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00

module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save
R CMD BATCH jan18.matrix.R
R CMD BATCH jan18.matrix.2.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/jan18.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
structure <- read.delim("y.lipolytica.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("y.lipolytica.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
onehot.ind1 <- read.delim("y.lipolytica_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("y.lipolytica_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("y.lipolytica_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("y.lipolytica_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("y.lipolytica_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("y.lipolytica_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last column of each dep file: it is empty because the awk split leaves a trailing space
onehot.dep12 <- full_join(onehot.dep1[, -ncol(onehot.dep1)], onehot.dep2[, -ncol(onehot.dep2)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[, -ncol(onehot.dep3)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[, -ncol(onehot.dep4)], by="sgRNAID")

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "y.lipolytica.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
tensor <- read.delim("y.lipolytica.tensors.single.bp.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")

tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.id <- read.delim("y.lipolytica.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")

df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)

df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "y.lipolytica.raw.onehot.tensor.single.bp.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 105531

# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.pam <- read.table("y.lipolytica.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
#sgRNA.pam.df$id <- "Cas9"
#sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
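The PAM encoding above treats each pair of codes as the same NGG PAM read on either strand: an sgRNA on the minus strand shows its PAM as CCN in plus-strand coordinates, so AGG and CCT both mean N = A. A Python sketch that derives N by reverse complement instead of listing the pairs; the function names are hypothetical, and it raises on an unrecognized code where the R `ifelse` chain silently yields all zeros:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pam_n_base(pam_code):
    """Return the N of an NGG PAM given as NGG (+ strand) or CCN (- strand)."""
    code = pam_code.upper()
    if code.endswith("GG"):
        return code[0]
    if code.startswith("CC"):
        return code.translate(COMPLEMENT)[::-1][0]  # reverse complement -> NGG
    raise ValueError(f"not an NGG/CCN PAM: {pam_code!r}")


def pam_onehot(pam_code):
    """One-hot columns PAM.A, PAM.C, PAM.G, PAM.T for the PAM's N base."""
    base = pam_n_base(pam_code)
    return {f"PAM.{b}": int(b == base) for b in "ACGT"}


print(pam_onehot("CCT"))  # {'PAM.A': 1, 'PAM.C': 0, 'PAM.G': 0, 'PAM.T': 0}
```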

score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.df, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)

df <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 45271


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.genes <- read.table("y.lipolytica.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
#sgRNA.genes.df$id <- "Cas9"
#sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)

df <- df.location
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 45271

write.table(df.location, "y.lipolytica.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])

rownames(seq) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "y.lipolytica.tensors.dimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.tensors.dimers.melt.txt", quote=F, row.names=F, sep="\t")
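The long `unite()` chain above pairs each position with the next one, giving 19 overlapping dinucleotides (p1p2, p2p3, ..., p19p20) for a 20-nt sgRNA; each dimer is then looked up among the 16 dimer columns of the quantum table. The same windowing in Python, on a made-up sequence; `overlapping_dimers` is hypothetical:

```python
def overlapping_dimers(bases):
    """Return overlapping dinucleotides p1p2, p2p3, ..., p(n-1)pn."""
    return [a + b for a, b in zip(bases, bases[1:])]


seq = list("ACGTACGTACGTACGTACGT")  # 20 nt, made-up sgRNA
dimers = overlapping_dimers(seq)
print(len(dimers), dimers[:3])  # 19 ['AC', 'CG', 'GT']
```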
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
tensor <- read.delim("y.lipolytica.tensors.dimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 45271

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6073,6075:6079,6081,6083:6177)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

################### scores should actually be *(-1) ###################  
df.score <- data.frame(sgRNAID = df.all[,1], cut.score = df.all[,2]*-1)

write.table(df.score, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.score, "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.score[,2]), "y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
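This sketch shows the same split into a feature table and a score table in Python, with the sign correction applied in one place. It assumes the matrix keeps sgRNAID and cut.score as its first two columns, which the positional indexing in the R code also assumes. `split_matrix` is a hypothetical name. Writing the tables out (for example with `to_csv(path, sep="\t", index=False)`) is left to the caller.

```python
import pandas as pd


def split_matrix(df: pd.DataFrame, negate: bool = True):
    """Split an [sgRNAID, cut.score, features...] matrix into feature and score tables.

    negate=True applies the cut.score * -1 correction.
    """
    if list(df.columns[:2]) != ["sgRNAID", "cut.score"]:
        raise ValueError("expected sgRNAID and cut.score as the first two columns")
    features = df.drop(columns=["cut.score"])    # like df.all[,c(1,3:ncol(df.all))]
    score = df[["sgRNAID", "cut.score"]].copy()  # like df.all[,1:2]
    if negate:
        score["cut.score"] = -score["cut.score"]
    return features, score


df = pd.DataFrame({"sgRNAID": ["g1", "g2"], "cut.score": [0.25, -1.5], "TTsgRNA.raw": [2, 0]})
features, score = split_matrix(df)
# The *_noSampleIDs variants would be features.iloc[:, 1:] and score[["cut.score"]].
```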




# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName y.lipolytica.tensor.single.bp.dimers --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/Submits/submit_full_y.lipolytica.tensor.single.bp.dimers_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/Submits/submit_train_y.lipolytica.tensor.single.bp.dimers_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/Submits/submit_test_y.lipolytica.tensor.single.bp.dimers_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt
# YNames.txt lists the response variable: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.tensor.single.bp.dimers
# 0.09359164840323947
sort -k3rg topVarEdges/cut.score_top95.txt | head
# TTsgRNA.raw   cut.score   0.06175000685655969
# sgRNA.structuresgRNA.raw  cut.score   0.056176192213938325
# gene.distance0    cut.score   0.029570807617378663
# CGsgRNA.raw   cut.score   0.02770352475569353
# pam.distance0 cut.score   0.025529473995302983
# AAsgRNA.raw   cut.score   0.020337760753082006
# TsgRNA.raw    cut.score   0.01737628965366339
# GsgRNA.raw    cut.score   0.016720949937689043
# AsgRNA.raw    cut.score   0.015102514626684835
# GCsgRNA.raw   cut.score   0.014576723834117079

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.tensor.single.bp.dimers_cut.score.importance4 | head
# TTsgRNA.raw: 7681.08
# sgRNA.structuresgRNA.raw: 7402.52
# gene.distance0: 3731.39
# CGsgRNA.raw: 3524.15
# pam.distance0: 3328.5
# AAsgRNA.raw: 2595.21
# TsgRNA.raw: 2259.45
# GCsgRNA.raw: 2173.4
# GsgRNA.raw: 2138.05
# AsgRNA.raw: 1998
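The `sort -k2rg ... | head` call ranks the per-feature importance file numerically on its second field. The Python equivalent below assumes each line has the form `feature: value`, based on the output shown. `parse_importance` is a hypothetical helper.

```python
def parse_importance(lines):
    """Return (feature, importance) pairs sorted by decreasing importance.

    Assumes one 'feature: value' entry per line; blank lines are skipped.
    """
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, _, value = line.rpartition(":")
        pairs.append((name.strip(), float(value)))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# On Andes one would pass an open file instead, e.g.
# with open("cut.score/foldRuns/fold9/Runs/Set4/"
#           "y.lipolytica.tensor.single.bp.dimers_cut.score.importance4") as fh:
#     print(parse_importance(fh)[:10])
example = ["GsgRNA.raw: 2138.05", "TTsgRNA.raw: 7681.08", "gene.distance0: 3731.39"]
ranked = parse_importance(example)
```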

# Pearson correlation between observed and predicted test-set cut.score (fold9, Set4)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3202335

### test different folds
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold1/Runs/Set4")
pred1 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y1 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold3/Runs/Set4")
pred3 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y3 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold6/Runs/Set4")
pred6 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y6 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred9 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y9 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")

cor(y1$cut.score, pred1$Predictions.)
# 0.320635
cor(y3$cut.score, pred3$Predictions.)
# 0.3195123
cor(y6$cut.score, pred6$Predictions.)
# 0.3182949
cor(y9$cut.score, pred9$Predictions.)
# 0.3202335



### test different runs
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set1")
pred1 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set1_test.prediction", header=T, sep="\t")
y1 <- read.delim("set1_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set2")
pred2 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set2_test.prediction", header=T, sep="\t")
y2 <- read.delim("set2_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set3")
pred3 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set3_test.prediction", header=T, sep="\t")
y3 <- read.delim("set3_Y_test_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred4 <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y4 <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")

cor(y1$cut.score, pred1$Predictions.)
# 0.2831959
cor(y2$cut.score, pred2$Predictions.)
# 0.320763
cor(y3$cut.score, pred3$Predictions.)
# 0.3148909
cor(y4$cut.score, pred4$Predictions.)
# 0.3202335
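The fold and set comparisons above repeat the same read-and-correlate steps with a different directory each time. This Python sketch folds them into one function. It follows the directory layout used above. It assumes the prediction column's header starts with "Predictions", since R shows it as `Predictions.` after `check.names`. `fold_set_pearson` is a hypothetical name.

```python
from pathlib import Path

import numpy as np
import pandas as pd

RUN = "y.lipolytica.tensor.single.bp.dimers"


def fold_set_pearson(base: Path, fold: int, set_id: int) -> float:
    """Pearson r between observed and predicted test-set cut.score for one fold/set."""
    run_dir = base / "cut.score" / "foldRuns" / f"fold{fold}" / "Runs" / f"Set{set_id}"
    pred = pd.read_table(run_dir / f"{RUN}_Set{set_id}_test.prediction")
    obs = pd.read_table(run_dir / f"set{set_id}_Y_test_noSampleIDs.txt")
    # Assumption: the prediction column header starts with "Predictions".
    pred_col = next(c for c in pred.columns if c.startswith("Predictions"))
    return float(np.corrcoef(obs["cut.score"], pred[pred_col])[0, 1])


# Usage on Andes (not executed here):
# base = Path("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/"
#             "y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers")
# folds = {f: fold_set_pearson(base, f, 4) for f in (1, 3, 6, 9)}
# sets = {s: fold_set_pearson(base, 9, s) for s in (1, 2, 3, 4)}
```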



########## Output once cut.score values were multiplied by -1 ... 3 February 2022
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt
# YNames.txt lists the response variable: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.tensor.single.bp.dimers
# 0.09299333027101521
sort -k3rg topVarEdges/cut.score_top95.txt | head
# TTsgRNA.raw   cut.score   0.06175630698127264
# sgRNA.structuresgRNA.raw  cut.score   0.056315801993953016
# gene.distance0    cut.score   0.029906586186827133
# CGsgRNA.raw   cut.score   0.02788713025998322
# pam.distance0 cut.score   0.025801982379830196
# AAsgRNA.raw   cut.score   0.020488288199839153
# TsgRNA.raw    cut.score   0.017637269785951606
# GsgRNA.raw    cut.score   0.016757504360449742
# AsgRNA.raw    cut.score   0.01509203103807862
# GCsgRNA.raw   cut.score   0.01453627587711077

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.tensor.single.bp.dimers_cut.score.importance4 | head
# TTsgRNA.raw: 7689.67
# sgRNA.structuresgRNA.raw: 7426.01
# gene.distance0: 3751.52
# CGsgRNA.raw: 3552.67
# pam.distance0: 3269.85
# AAsgRNA.raw: 2567.57
# TsgRNA.raw: 2270.38
# GCsgRNA.raw: 2179.87
# GsgRNA.raw: 2157.72
# AsgRNA.raw: 2007.38

# Pearson correlation between observed and predicted test-set cut.score (fold9, Set4)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3206221
RIT

** Need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw **

  • run RIT on Cas9 model with all features
  • need to run arva-rit and then runRIT.sh (3 scripts)
  • two outputs: effect size and directionality
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score y.lipolytica.tensor.single.bp.dimers

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.tensor.single.bp.dimers/cut.score/RIT.run
# sort -k3rg y.lipolytica.tensor.single.bp.dimers_cut.score.importance4.effect > y.lipolytica.tensor.single.bp.dimers_cut.score.importance4.effect_sorted

library(dplyr)
library(tidyr)
library(reshape2)
library(ggplot2)
library(RColorBrewer)

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
imp <- read.delim("y.lipolytica.tensor.single.bp.dimers_cut.score.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
#imp$Normalized.Importance <- as.numeric(substr(imp$NormEdge, 0, 4))
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + coord_flip() + xlab("") + ylab("Normalized Importance") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position="bottom") + scale_fill_brewer(palette="Set1")

# wc -l set0_Y_train_noSampleIDs.txt <-- 36217
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/36217
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
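The R block above labels each RIT feature's effect as negative, positive or zero and expresses its sample count as a proportion of the training set. A pandas sketch of the same summary follows. The column names are taken from the `colnames(imp)` line. `summarize_effects` is a hypothetical name. `n_train` stands in for the 36217 used above. Note that `wc -l` counts every line, so a header row, if the file has one, is included in that number. Unlike `imp.dir[1:20,]`, which relies on the file already being sorted, this version sorts by `NormEdge` itself.

```python
import io

import numpy as np
import pandas as pd

COLUMNS = ["Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity"]


def summarize_effects(source, n_train: int, top: int = 20) -> pd.DataFrame:
    """Top features by normalized importance, with effect direction and sample proportion."""
    imp = pd.read_csv(source, sep="\t", header=None, names=COLUMNS)
    for col in ("NormEdge", "Effect", "Samples"):
        imp[col] = pd.to_numeric(imp[col], errors="coerce")
    effect = imp["Effect"].to_numpy()
    imp["Effect.Direction"] = np.where(effect < 0, "neg", np.where(effect > 0, "pos", "zero"))
    out = imp.nlargest(top, "NormEdge").copy()
    out["Sample.Prop"] = out["Samples"] / n_train
    return out


demo = io.StringIO(
    "f1\tcut.score\t0.3\t-0.5\t100\t0.9\n"
    "f2\tcut.score\t0.6\t0.2\t50\t0.8\n"
    "f3\tcut.score\t0.1\t0\t10\t0.1\n"
)
top_effects = summarize_effects(demo, n_train=200, top=2)
```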
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli

# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])

# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)

import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False stops summary_plot() from calling plt.show(); with interactive backends that call can leave savefig() writing an empty figure
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)

f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)

# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/"
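The bar variant of `shap.summary_plot` ranks features by the mean absolute SHAP value over the samples passed in. When a text ranking is handier than a PNG (for example, to compare with the iRF importances above), the same quantity can be computed directly from the array returned by `TreeExplainer(model).shap_values(X_train)`. For a single-output regressor, that array has shape (samples, features). `mean_abs_shap_ranking` is a hypothetical helper, not part of the shap API.

```python
import numpy as np


def mean_abs_shap_ranking(shap_values, feature_names, top=10):
    """Rank features by mean |SHAP value| across samples (highest first)."""
    scores = np.abs(np.asarray(shap_values, dtype=float)).mean(axis=0)
    order = np.argsort(scores)[::-1][:top]
    return [(feature_names[i], float(scores[i])) for i in order]


# Toy 2-sample x 3-feature SHAP matrix.
demo = mean_abs_shap_ranking(np.array([[1.0, -4.0, 0.0], [-3.0, 2.0, 0.5]]), ["a", "b", "c"])
```

With the model above, `mean_abs_shap_ranking(shap_values, list(X_train.columns))` should list the features in the same order as the bar plot.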

15 March 2022: Quantum Matrix

  • generate final matrix with updated quantum properties (HL and H-bond) for monomer, basepair, dimer, trimer, tetramer
  • think through incorporating DNA and RNA sequence?
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J y.lip.matrix
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save
R CMD BATCH mar15.matrix.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/mar15.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
structure <- read.delim("y.lipolytica.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("y.lipolytica.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:7)]
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
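# score.df brings three columns (sgRNAID, cut.score, nucleotide.sequence), so the joined table has seven;
# nucleotide.sequence (column 3) is dropped before melting below
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "nucleotide.sequence", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")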

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
onehot.ind1 <- read.delim("y.lipolytica_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("y.lipolytica_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("y.lipolytica_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("y.lipolytica_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("y.lipolytica_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("y.lipolytica_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# 1:(ncol(x)-1) keeps every column except the last one of each dep file
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "y.lipolytica.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# 
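Above, the structure, melting-temperature and GC tables are joined to the scores by sgRNAID and then renamed by position. Positional renaming is easy to get out of step with the number of columns the joins actually produce. The pandas sketch below names each feature column before joining and checks that every feature table has one row per sgRNA. `join_scalar_features` and the toy values are hypothetical.

```python
from functools import reduce

import pandas as pd


def join_scalar_features(score: pd.DataFrame, *tables: pd.DataFrame) -> pd.DataFrame:
    """Left-join per-sgRNA feature tables onto the score table by sgRNAID.

    Each feature table needs an sgRNAID column plus already-named feature columns;
    validate="many_to_one" raises a MergeError (a ValueError) if a feature table
    repeats an sgRNAID.
    """
    return reduce(
        lambda left, right: left.merge(right, on="sgRNAID", how="left", validate="many_to_one"),
        tables,
        score,
    )


score = pd.DataFrame({"sgRNAID": ["g1", "g2"], "cut.score": [0.1, 0.2]})
structure = pd.DataFrame({"sgRNAID": ["g2", "g1"], "sgRNA.structure": [-3.2, -1.0]})
temp = pd.DataFrame({"sgRNAID": ["g1"], "sgRNA.temp": [55.0]})
merged = join_scalar_features(score, structure, temp)
```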

# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.pam <- read.table("y.lipolytica.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df

score <- read.delim("y.lipolytica.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df <- read.delim("y.lipolytica.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))

df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 
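The PAM one-hot above records the variable N of an NGG PAM. For a PAM reported as CCN (the reverse complement of NGG), it records the complement of N, so AGG and CCT both set PAM.A. Below is a Python sketch of the same mapping under the hypothetical name `pam_onehot`. Unlike the R `ifelse` chain, which silently yields all zeros for an unexpected code, it raises an error.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pam_onehot(pam_code: str) -> dict:
    """One-hot encode the variable PAM base: N for NGG, complement(N) for CCN."""
    code = pam_code.upper()
    if len(code) == 3 and code.endswith("GG"):
        base = code[0]
    elif len(code) == 3 and code.startswith("CC"):
        base = COMPLEMENT[code[2]]
    else:
        raise ValueError(f"not an NGG/CCN PAM: {pam_code!r}")
    return {f"PAM.{b}": int(base == b) for b in "ACTG"}
```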


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
sgRNA.genes <- read.table("y.lipolytica.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
#sgRNA.genes.df$id <- "Cas9"
#sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")
sgRNA.genes.id <- sgRNA.genes.df

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)

df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 45271

write.table(df.pam.location, "y.lipolytica.raw.matrix.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "y.lipolytica.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
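The monomer block maps each position's base to its row of quantum properties. It then spreads the result into one column per position and property (dcast `sgRNAID ~ position + variable`, which joins names with "_"). The pandas sketch below makes two assumptions. First, the sequence table has sgRNAID plus p1, p2, ... columns. Second, the property table is indexed by base; the R code gets there by transposing the HL.Bond.* file. `encode_positions` and the property names propA/propB are hypothetical. The same function would apply to the dimer, trimer and tetramer tables once the positions hold k-mers.

```python
import pandas as pd


def encode_positions(seqs: pd.DataFrame, tensor: pd.DataFrame) -> pd.DataFrame:
    """Look up each position's base in `tensor` and widen to <position>_<property> columns."""
    long = seqs.melt(id_vars="sgRNAID", var_name="position", value_name="base")
    long = long.join(tensor, on="base")  # bases missing from tensor get NaN
    wide = (
        long.drop(columns="base")
        .set_index(["sgRNAID", "position"])
        .unstack("position")
    )
    wide.columns = [f"{pos}_{prop}" for prop, pos in wide.columns]
    return wide.reset_index()


seqs = pd.DataFrame({"sgRNAID": ["g1", "g2"], "p1": ["A", "G"], "p2": ["C", "A"]})
props = pd.DataFrame(
    {"propA": [1.0, 2.0, 3.0, 4.0], "propB": [5.0, 6.0, 7.0, 8.0]},
    index=["A", "C", "G", "T"],
)
encoded = encode_positions(seqs, props)
```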


# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "y.lipolytica.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")


# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "y.lipolytica.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")


# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "y.lipolytica.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")


# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
seq <- read.delim("y.lipolytica.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "") 

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "y.lipolytica.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "y.lipolytica.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
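The `unite()` chains build overlapping k-mers from the per-position bases. A 20-nt sgRNA therefore gives 19 dimers, 18 trimers and 17 tetramers. That is why the code takes `seq.dimer[,1:20]`, `seq.trimer[,1:19]` and `seq.tetramer[,1:18]`, each including the sgRNAID column. Below is a compact Python sketch of the same windowing. It assumes single-base columns named p1..pN, and the function names are hypothetical.

```python
import pandas as pd


def kmer_columns(seq: str, k: int) -> dict:
    """Overlapping k-mers keyed p1..p(len(seq)-k+1), like the unite() chains."""
    if not 1 <= k <= len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_table(seqs: pd.DataFrame, k: int) -> pd.DataFrame:
    """seqs: sgRNAID plus single-base columns p1..pN. Returns sgRNAID plus k-mer columns."""
    n = sum(1 for c in seqs.columns if c != "sgRNAID")
    positions = [f"p{i}" for i in range(1, n + 1)]
    full = seqs[positions].astype(str).apply("".join, axis=1)
    out = pd.DataFrame([kmer_columns(s, k) for s in full])
    out.insert(0, "sgRNAID", seqs["sgRNAID"].to_numpy())
    return out


demo = pd.DataFrame({"sgRNAID": ["g1"], "p1": ["A"], "p2": ["C"], "p3": ["G"]})
dimers = kmer_table(demo, 2)
```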



setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
monomer <- read.delim("y.lipolytica.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("y.lipolytica.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("y.lipolytica.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("y.lipolytica.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("y.lipolytica.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)

monomer.basepair.dimer.trimer.tetramer <- rbind(monomer, basepair, dimer, trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "y.lipolytica.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
tensor <- read.delim("y.lipolytica.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 45271

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "y.lipolytica.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
df <- read.delim("y.lipolytica.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df %>% select(-grep("cut.score.y.y", names(df)), -grep("cut.score.y", names(df)), -grep("cut.score.x.x", names(df))) 
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "y.lipolytica.finalquantum.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "y.lipolytica.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "y.lipolytica.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "y.lipolytica.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "y.lipolytica.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
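The reshaping above takes the long tensor table, averages any duplicated sgRNA/feature cells (`dcast(..., fun.aggregate=mean)`) and drops sgRNAs that are missing a feature (`na.omit`). A minimal plain-Python sketch of that long-to-wide step follows. It is an illustration, not the pipeline's code: the function names and the toy feature names in the usage are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pivot_long_to_wide(rows):
    """rows: iterable of (sgRNA_id, feature_name, value) triples.

    Returns {sgRNA_id: {feature_name: mean value}}, analogous to
    dcast(sgRNAID ~ feature.scale, fun.aggregate=mean).
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for sg_id, feature, value in rows:
        buckets[sg_id][feature].append(float(value))
    return {
        sg_id: {feature: mean(values) for feature, values in feats.items()}
        for sg_id, feats in buckets.items()
    }


def drop_incomplete(wide, feature_names):
    """Keep only sgRNAs that have every feature, analogous to na.omit() on the dcast output."""
    return {
        sg_id: feats
        for sg_id, feats in wide.items()
        if all(name in feats for name in feature_names)
    }


# Usage with hypothetical feature names: sg1 has a duplicated cell, sg2 is incomplete.
rows = [
    ("sg1", "p1tetramer.Xraw", 1.0),
    ("sg1", "p1tetramer.Xraw", 3.0),
    ("sg1", "p2tetramer.Xraw", 2.0),
    ("sg2", "p1tetramer.Xraw", 5.0),
]
wide = pivot_long_to_wide(rows)
complete = drop_incomplete(wide, ["p1tetramer.Xraw", "p2tetramer.Xraw"])
```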


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName y.lipolytica.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/Submits/submit_full_y.lipolytica.finalquantum_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/Submits/submit_train_y.lipolytica.finalquantum_0.sh
# once the train submissions are done, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/Submits/submit_test_y.lipolytica.finalquantum_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.finalquantum
# 0.08545804711311601

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.finalquantum_cut.score.importance4 | head
# TTsgRNA.raw: 6478.7
# sgRNA.structuresgRNA.raw: 4883.34
# CGsgRNA.raw: 2532.61
# p11tetramer.Hbond.stackingraw: 2431.19
# p5tetramer.Hlgap.eVEraw: 2182.59
# p2tetramer.Hbond.stackingraw: 2180.77
# p5tetramer.Hbond.stackingraw: 2092.17
# p4tetramer.Hlgap.eVEraw: 2010.87
# p6tetramer.Hbond.stackingraw: 2001.02
# p6tetramer.Hlgap.eVEraw: 1979.47

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3024251
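R's `cor()` defaults to the Pearson coefficient, so the value above is the Pearson correlation between the held-out cut scores and the iRF predictions for fold 9 / Set 4. For reference, a self-contained sketch of the same statistic (the function name is illustrative):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance numerator and the two standard-deviation terms (the 1/n factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Usage: observed cut scores vs. predictions (toy numbers).
r = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```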
RIT
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score y.lipolytica.finalquantum

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score/RIT.run

# TTsgRNA.raw   cut.score   0.04625773976249322 0.0003649053206181409   66194.581   -3.3820271415852257
# sgRNA.structuresgRNA.raw  cut.score   0.032304910327757425    -5.1113396088919486e-05 56086.877   -3.688107542876454
# CGsgRNA.raw   cut.score   0.018116510727833473    0.0002235208147464496   46003.732   -3.5040572264423506
# p11tetramer.Hbond.stackingraw cut.score   0.015149117580774115    -2.207199826384307e-05  12336.336   -3.8326470012202316
# p5tetramer.Hbond.stackingraw  cut.score   0.01514820907514612 -2.6014164934804636e-05 15496.092   -3.8566216507153634
# p5tetramer.Hlgap.eVEraw   cut.score   0.014912532026942904    -2.648470659462676e-05  10171.831   -3.90096106338001
# p6tetramer.Hbond.stackingraw  cut.score   0.014000071727391495    -1.5348780718090023e-05 9198.55 -3.9048327611853013
# p2tetramer.Hbond.stackingraw  cut.score   0.013797742179886422    -3.2383220351173965e-05 10161.696   -3.923555913351904
# p6tetramer.Hlgap.eVEraw   cut.score   0.013487888319232852    -2.774670944787072e-05  9729.251    -3.9506660077623654
# p2tetramer.Hlgap.eVEraw   cut.score   0.013485483451394041    -4.0047331023409636e-05 9609.594    -3.948292258091099
Figures
library(dplyr)   # needed for %>% and mutate() below
library(ggplot2)
library(reshape2)
library(RColorBrewer)

# Figure 5A
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score")
imp <- read.delim("y.lipolytica.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("y.lipolytica.Imp.Dir.Top20.21March.pdf")
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("Y.lipolytica Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
dev.off()

pdf("y.lipolytica.Imp.Dir.Top20.Effect.21March.pdf")
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("y.lipolytica") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()


#### Figure 5B: Focus on effect size
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum/cut.score")
imp <- read.delim("y.lipolytica.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir$absEffect <- abs(imp.dir$Feature.Effect)
imp.dir.effectsorted <- imp.dir[order(imp.dir$absEffect, decreasing = TRUE),]
imp.dir.effectsorted.top20 <- imp.dir.effectsorted[1:20,]

pdf("y.lipolytica.Imp.Dir.Top20Effect.Effect.pdf")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=reorder(Feature, -absEffect), y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
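The two figures rely on the same bookkeeping: label each feature's effect as `neg`/`pos`/`zero` and, for Figure 5B, rank by absolute effect size. A minimal Python sketch of that logic, assuming rows carry the `Feature`, `NormEdge` and `Effect` fields named in the `colnames()` call above (the function names are hypothetical; the usage rows are copied from the RIT output listed earlier):

```python
def effect_direction(effect):
    """Label an effect as 'neg', 'pos' or 'zero', like the nested ifelse() above."""
    if effect < 0:
        return "neg"
    if effect > 0:
        return "pos"
    return "zero"


def top_by_abs_effect(rows, n=20):
    """Return the n rows with the largest |Effect|, each annotated with
    Effect.Direction and absEffect (mirroring imp.dir.effectsorted.top20)."""
    annotated = []
    for row in rows:
        effect = float(row["Effect"])
        annotated.append(
            {**row, "Effect.Direction": effect_direction(effect), "absEffect": abs(effect)}
        )
    annotated.sort(key=lambda r: r["absEffect"], reverse=True)
    return annotated[:n]


# Usage with three rows from the RIT output above.
rows = [
    {"Feature": "TTsgRNA.raw", "NormEdge": "0.04625773976249322", "Effect": "0.0003649053206181409"},
    {"Feature": "sgRNA.structuresgRNA.raw", "NormEdge": "0.032304910327757425", "Effect": "-5.1113396088919486e-05"},
    {"Feature": "CGsgRNA.raw", "NormEdge": "0.018116510727833473", "Effect": "0.0002235208147464496"},
]
top = top_by_abs_effect(rows, n=2)
```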

remove highly correlated features

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
df <- read.delim("y.lipolytica.finalquantum.txt", header=T, sep="\t")
df.rm <- df %>% select(-grep("basepair.Hlgap.eVEraw", names(df)), -grep("dimer.Hbond.energyraw", names(df)), -grep("trimer.Hbond.energyraw", names(df)), -grep("tetramer.Hbond.energyraw", names(df))) 
# 6160

write.table(df.rm, "y.lipolytica.finalquantum.noncorrelated.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,c(1,3:ncol(df.rm))], "y.lipolytica.finalquantum.noncorrelated.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,c(1,3:ncol(df.rm))], "y.lipolytica.finalquantum.noncorrelated.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,3:ncol(df.rm)], "y.lipolytica.finalquantum.noncorrelated.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
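The `select(-grep(...))` call drops every column whose name matches one of four patterns. Because R's `grep()` treats its pattern as a regular expression, the `.` in each pattern is a wildcard. A minimal Python sketch of the same filter on a header row (function name and example column names are illustrative):

```python
import re

# Patterns passed to grep() above; a column is dropped if its name matches any of them.
CORRELATED_PATTERNS = [
    "basepair.Hlgap.eVEraw",
    "dimer.Hbond.energyraw",
    "trimer.Hbond.energyraw",
    "tetramer.Hbond.energyraw",
]


def drop_matching_columns(header, patterns=CORRELATED_PATTERNS):
    """Return (kept_indices, kept_names) for columns matching none of the regex patterns."""
    compiled = [re.compile(p) for p in patterns]
    kept = [
        (i, name)
        for i, name in enumerate(header)
        if not any(c.search(name) for c in compiled)
    ]
    return [i for i, _ in kept], [name for _, name in kept]


header = [
    "sgRNAID",
    "cut.score",
    "p1basepair.Hlgap.eVEraw",
    "p1dimer.Hbond.energyraw",
    "p1dimer.Hbond.stackingraw",
    "TTsgRNA.raw",
]
kept_idx, kept_names = drop_matching_columns(header)
```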


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName y.lipolytica.finalquantum.noncorrelated --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.finalquantum.noncorrelated.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated/Submits/submit_full_y.lipolytica.finalquantum.noncorrelated_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated/Submits/submit_train_y.lipolytica.finalquantum.noncorrelated_0.sh
# once the train submissions are done, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated/Submits/submit_test_y.lipolytica.finalquantum.noncorrelated_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/YNames.txt y.lipolytica.finalquantum.noncorrelated
# 0.08512465869697025

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/y.lipolytica.finalquantum.noncorrelated_cut.score.importance4 | head
# TTsgRNA.raw: 6590.06
# sgRNA.structuresgRNA.raw: 4918.95
# p11tetramer.Hbond.stackingraw: 3215.99
# p5tetramer.Hbond.stackingraw: 3017.72
# p2tetramer.Hbond.stackingraw: 3003.06
# p6tetramer.Hbond.stackingraw: 2884.26
# p10tetramer.Hbond.stackingraw: 2821.91
# p5tetramer.Hlgap.eVEraw: 2699.19
# p13tetramer.Hbond.stackingraw: 2650.7
# CGsgRNA.raw: 2630.74



# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("y.lipolytica.finalquantum.noncorrelated_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3017328


##### RIT:

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score y.lipolytica.finalquantum.noncorrelated

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/iRF.run/y.lipolytica.finalquantum.noncorrelated/cut.score/RIT.run

# TTsgRNA.raw   cut.score   0.04640312466062963 0.00029452942624747116  66501.09    -3.3814161480636695
# sgRNA.structuresgRNA.raw  cut.score   0.032520167385408735    -4.44396235670715e-05   56130.327   -3.6895022196683107
# p5tetramer.Hbond.stackingraw  cut.score   0.021426626714034003    -2.036311453463841e-05  21490.653   -3.846722952178801
# p11tetramer.Hbond.stackingraw cut.score   0.020600608305521025    -1.132339680678625e-05  16079.544   -3.8274408504988124
# p2tetramer.Hbond.stackingraw  cut.score   0.01960410276530065 -2.7270302410628164e-05 14770.904   -3.9090958983529824
# p6tetramer.Hbond.stackingraw  cut.score   0.01909642287169399 -1.0491676778742884e-05 12609.69    -3.9090846026099033
# CGsgRNA.raw   cut.score   0.018538518246690866    0.00010930618298450694  48210.091   -3.4129122064349238
# p10tetramer.Hbond.stackingraw cut.score   0.018244839382512614    -3.4972201382700453e-06 11668.857   -3.8066371442425555
# p5tetramer.Hlgap.eVEraw   cut.score   0.01795936264838214 -1.2110628098910293e-05 12587.285   -3.866138113056994
# p4tetramer.Hbond.stackingraw  cut.score   0.01760166456136343 -1.9011298848335074e-05 12921.292   -3.9006183223511735

Human (Doench et al., 2014)

https://www.nature.com/articles/nbt.3026?report=reader#Sec15

sgRNA dataset

# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/Sprint.Opioid.ATAC/Genome/GCF_000001405.39_GRCh38.p13_genomic.fna" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/Sprint.Opioid.ATAC/Genome/GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/.

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014

R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/human/")
df <- read.delim("doench.2014.TableS7.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID",    "nucleotide.sequence", "cut.score")
df2 <- df[,c(1,3,2)]
df.na <- na.omit(df2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
write.table(df.na, "Doench2014.txt", quote=F, row.names=F, sep="\t")
q()

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
sed '1d' Doench2014.txt | awk '{print ">"$1"\n"$3}' > Doench2014.fasta
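The `sed '1d' | awk` line skips the header of `Doench2014.txt` (columns: sgRNAID, cut.score, nucleotide.sequence) and writes each row as a FASTA record of ID and sequence. A minimal Python equivalent, for illustration only (the function name is hypothetical; unlike awk, it skips blank or short lines instead of emitting empty records):

```python
def table_to_fasta(lines, id_col=0, seq_col=2):
    """Convert whitespace-separated rows (header first) into FASTA text.

    Columns are 0-based here, so awk's $1 -> 0 and $3 -> 2.
    """
    records = []
    for line in lines[1:]:  # skip the header row, as sed '1d' does
        fields = line.split()
        if len(fields) <= max(id_col, seq_col):
            continue  # skip blank or short lines
        records.append(f">{fields[id_col]}\n{fields[seq_col]}")
    return "\n".join(records) + ("\n" if records else "")


lines = [
    "sgRNAID\tcut.score\tnucleotide.sequence",
    "sg1\t0.5\tACGT",
    "sg2\t0.1\tGGCC",
]
fasta = table_to_fasta(lines)
```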

blast

  • do a search for the sgRNA sequence in the genome
    • input fasta file of sequences, output coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

## blast
# conda install blast
# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes
# wget https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
# tar zxvpf ncbi-blast-2.11.0+-x64-linux.tar.gz
# export PATH=$PATH:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin
# echo $PATH
# mkdir $HOME/blastdb
# export BLASTDB=$HOME/blastdb
# set BLASTDB=$HOME/blastdb

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014

/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/makeblastdb -in GCF_000001405.39_GRCh38.p13_genomic.fna -dbtype nucl
# /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10
# 
# # can't find sequences... do I need the complement?
# source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
# conda create --name emboss python=3.8
# conda activate emboss
# conda install -c conda-forge -c bioconda emboss
# ## revseq test.fasta -noreverse -complement -outseq test.comp.fasta
# revseq Doench2014.fasta -noreverse -complement -outseq Doench2014.comp.fasta
# 
# /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.comp.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10

#### correction: only the BLAST settings needed adjusting (-task blastn-short for these short queries); used the forward strand as provided in Table S7
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.blast.tab -outfmt 6 -task blastn-short -num_threads 10
# 105959 (1841 sgRNAs)

/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.blast.tab -outfmt 6 -evalue 0.01 -task blastn-short -num_threads 10
# 1733

awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' Doench2014.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' Doench2014.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > Doench2014.gRNA.blast.bed
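In BLAST's `-outfmt 6`, columns 9 and 10 are the subject start and end; for minus-strand hits start > end, so the two awk commands swap them to put the smaller coordinate first, then concatenate both files. A minimal Python sketch of the same conversion (function name hypothetical). Like the awk, it keeps BLAST's 1-based start; a strict 0-based BED start would be `sstart - 1`.

```python
def blast_tab_to_bed(lines):
    """Convert BLAST -outfmt 6 rows into (chrom, start, end, query) tuples.

    Minus-strand hits (sstart > send) are listed first, then plus-strand hits,
    matching `cat tmp1.bed tmp2.bed`. Hits with sstart == send are dropped,
    as neither awk condition is true for them.
    """
    minus, plus = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # not a complete outfmt 6 row
        qseqid, sseqid = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        if sstart > send:
            minus.append((sseqid, send, sstart, qseqid))
        elif send > sstart:
            plus.append((sseqid, sstart, send, qseqid))
    return minus + plus


lines = [
    "sg1\tNC_000001.11\t100.000\t20\t0\t0\t1\t20\t1000\t1019\t1e-05\t40.1",
    "sg2\tNC_000002.12\t100.000\t20\t0\t0\t1\t20\t5019\t5000\t1e-05\t40.1",
]
bed = blast_tab_to_bed(lines)
```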


#### also run the complement (Doench2014.comp.fasta comes from the commented-out revseq step above)
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.comp.fasta -db GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.gRNA.complement.blast.tab -outfmt 6 -task blastn-short -num_threads 10

awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' Doench2014.gRNA.complement.blast.tab > tmp1.comp.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' Doench2014.gRNA.complement.blast.tab > tmp2.comp.bed
cat tmp1.comp.bed tmp2.comp.bed > Doench2014.gRNA.complement.blast.bed
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/human/")
df <- read.delim("doench.2014.TableS7.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID",    "nucleotide.sequence", "cut.score")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
coord <- read.delim("Doench2014.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID

library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "Doench2014.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")

sliding windows

  • make 20bp sliding windows (every 1bp)
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014

faidx GCF_000001405.39_GRCh38.p13_genomic.fna -i chromsizes > Doench2014.sizes.genome
grep 'NC_' Doench2014.sizes.genome > Doench2014.sizes.genome.chr
bedtools makewindows -g Doench2014.sizes.genome.chr -w 20 -s 1 > Doench2014.20bp.sliding.bed
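The `grep 'NC_'` step keeps only RefSeq `NC_` sequences (the assembled chromosomes), and `makewindows -w 20 -s 1` then produces roughly one 20 bp window per base pair of those sequences. A minimal Python sketch of both steps, for illustration (function names hypothetical). It emits only full-length windows; bedtools may also emit shorter windows at chromosome ends, so counts can differ slightly.

```python
def primary_chrom_sizes(lines):
    """Parse 'name<TAB>length' rows, keeping names that contain 'NC_' (like grep 'NC_')."""
    kept = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and "NC_" in fields[0]:
            kept.append((fields[0], int(fields[1])))
    return kept


def sliding_windows(chrom, length, size=20, step=1):
    """Yield 0-based, half-open (chrom, start, end) windows of `size` bp every `step` bp."""
    for start in range(0, length - size + 1, step):
        yield (chrom, start, start + size)


sizes = primary_chrom_sizes(["NC_000001.11\t248956422", "NT_187361.1\t176845"])
windows = list(sliding_windows("chrT", 23))
```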

Features

Gene density & GC content

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014

## genes
bedtools intersect -wo -a Doench2014.20bp.sliding.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf > Doench2014.gene.20sliding.bed

## GC content
bedtools nuc -fi GCF_000001405.39_GRCh38.p13_genomic.fna -bed Doench2014.20bp.sliding.bed | sed '1d' > Doench2014.GC.20sliding.bed

Melting temperature (Tm)

https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)

https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n

# summit: # conda install -c conda-forge biopython 

### sgRNA

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
python3

from Bio import SeqIO

# Count A/C/G/T per sgRNA (upper-cased so any lowercase bases are counted too).
# Note: the "CG%" column holds a fraction (0-1), not a percentage.
with open('Doench2014.fasta') as input_file, open('nucleotide_counts_sgRNA.tsv', 'w') as output_file:
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        seq = str(cur_record.seq).upper()
        A_count = seq.count('A')
        C_count = seq.count('C')
        G_count = seq.count('G')
        T_count = seq.count('T')
        length = len(seq)
        cg_fraction = float(C_count + G_count) / length
        output_file.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (cur_record.name, A_count, C_count, G_count, T_count, length, cg_fraction))

exit()

# Melting temperature (°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "Doench2014.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
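The Tm used here is the simple GC-content formula noted above, not Biopython's nearest-neighbour `Tm_NN`. A minimal Python sketch that computes the same per-sequence summary in one place (function name hypothetical). It upper-cases the input first because the NCBI RefSeq genome FASTA is soft-masked (repeats in lowercase), which matters for the genomic sliding windows.

```python
def nucleotide_summary(seq):
    """Return (counts, gc_fraction, tm) for one sequence.

    counts: A/C/G/T counts (case-insensitive)
    gc_fraction: (G + C) / sequence length, as in the "CG%" column
    tm: 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)
    """
    s = seq.upper()
    counts = {base: s.count(base) for base in "ACGT"}
    acgt = sum(counts.values())
    gc = counts["G"] + counts["C"]
    gc_fraction = gc / len(s) if s else float("nan")
    tm = 64.9 + 41 * (gc - 16.4) / acgt if acgt else float("nan")
    return counts, gc_fraction, tm


# 20-mer with 10 G/C: Tm = 64.9 + 41 * (10 - 16.4) / 20 = 51.78
counts, gc_fraction, tm = nucleotide_summary("ACGTACGTACGTACGTACGT")
```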


### 20bp sliding windows
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
bedtools getfasta -fi GCF_000001405.39_GRCh38.p13_genomic.fna -bed Doench2014.20bp.sliding.bed -fo Doench2014.20sliding.fa

# count nucleotides
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
python3

from Bio import SeqIO

# Count A/C/G/T per 20bp window. The RefSeq genome is soft-masked (repeats in lowercase),
# so upper-case each sequence before counting. The "CG%" column holds a fraction (0-1).
with open('Doench2014.20sliding.fa') as input_file, open('nucleotide_counts_20sliding.tsv', 'w') as output_file:
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        seq = str(cur_record.seq).upper()
        A_count = seq.count('A')
        C_count = seq.count('C')
        G_count = seq.count('G')
        T_count = seq.count('T')
        length = len(seq)
        cg_fraction = float(C_count + G_count) / length
        output_file.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (cur_record.name, A_count, C_count, G_count, T_count, length, cg_fraction))

exit()

# Melting temperature (°C) = 64.9 + 41 * (nG+nC-16.4)/(nA+nT+nG+nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("nucleotide_counts_20sliding.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "Doench2014.nucleotide_counts_20sliding_temp.txt", quote=F, row.names=F, sep="\t")
q()
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J temp.melt.sliding
#SBATCH -N 1
#SBATCH -t 48:00:00
#SBATCH -o temp.melt.sliding-%j.o
#SBATCH -e temp.melt.sliding-%j.e

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014

python3 temp.melt.sliding.py

R CMD BATCH temp.melt.sliding.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/temp.melt.sliding.sh

One-hot encoding

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
cut -f 1,3 Doench2014.txt > Doench2014.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Doench2014.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/

sed '1d' Doench2014.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Doench2014_ind1.txt
sed '1d' Doench2014.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Doench2014_ind2.txt
sed '1d' Doench2014.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.A p1.C p1.T p1.G p2.A p2.C p2.T p2.G p3.A p3.C p3.T p3.G p4.A p4.C p4.T p4.G p5.A p5.C p5.T p5.G p6.A p6.C p6.T p6.G p7.A p7.C p7.T p7.G p8.A p8.C p8.T p8.G p9.A p9.C p9.T p9.G p10.A p10.C p10.T p10.G p11.A p11.C p11.T p11.G p12.A p12.C p12.T p12.G p13.A p13.C p13.T p13.G p14.A p14.C p14.T p14.G p15.A p15.C p15.T p15.G p16.A p16.C p16.T p16.G p17.A p17.C p17.T p17.G p18.A p18.C p18.T p18.G p19.A p19.C p19.T p19.G p20.A p20.C p20.T p20.G' | cut -d ' ' -f 1-81 > Doench2014_dep1.txt
sed '1d' Doench2014.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.AA p1.AC p1.AT p1.AG p1.CA p1.CC p1.CT p1.CG p1.TA p1.TC p1.TT p1.TG p1.GA p1.GC p1.GT p1.GG p2.AA p2.AC p2.AT p2.AG p2.CA p2.CC p2.CT p2.CG p2.TA p2.TC p2.TT p2.TG p2.GA p2.GC p2.GT p2.GG p3.AA p3.AC p3.AT p3.AG p3.CA p3.CC p3.CT p3.CG p3.TA p3.TC p3.TT p3.TG p3.GA p3.GC p3.GT p3.GG p4.AA p4.AC p4.AT p4.AG p4.CA p4.CC p4.CT p4.CG p4.TA p4.TC p4.TT p4.TG p4.GA p4.GC p4.GT p4.GG p5.AA p5.AC p5.AT p5.AG p5.CA p5.CC p5.CT p5.CG p5.TA p5.TC p5.TT p5.TG p5.GA p5.GC p5.GT p5.GG p6.AA p6.AC p6.AT p6.AG p6.CA p6.CC p6.CT p6.CG p6.TA p6.TC p6.TT p6.TG p6.GA p6.GC p6.GT p6.GG p7.AA p7.AC p7.AT p7.AG p7.CA p7.CC p7.CT p7.CG p7.TA p7.TC p7.TT p7.TG p7.GA p7.GC p7.GT p7.GG p8.AA p8.AC p8.AT p8.AG p8.CA p8.CC p8.CT p8.CG p8.TA p8.TC p8.TT p8.TG p8.GA p8.GC p8.GT p8.GG p9.AA p9.AC p9.AT p9.AG p9.CA p9.CC p9.CT p9.CG p9.TA p9.TC p9.TT p9.TG p9.GA p9.GC p9.GT p9.GG p10.AA p10.AC p10.AT p10.AG p10.CA p10.CC p10.CT p10.CG p10.TA p10.TC p10.TT p10.TG p10.GA p10.GC p10.GT p10.GG p11.AA p11.AC p11.AT p11.AG p11.CA p11.CC p11.CT p11.CG p11.TA p11.TC p11.TT p11.TG p11.GA p11.GC p11.GT p11.GG p12.AA p12.AC p12.AT p12.AG p12.CA p12.CC p12.CT p12.CG p12.TA p12.TC p12.TT p12.TG p12.GA p12.GC p12.GT p12.GG p13.AA p13.AC p13.AT p13.AG p13.CA p13.CC p13.CT p13.CG p13.TA p13.TC p13.TT p13.TG p13.GA p13.GC p13.GT p13.GG p14.AA p14.AC p14.AT p14.AG p14.CA p14.CC p14.CT p14.CG p14.TA p14.TC p14.TT p14.TG p14.GA p14.GC p14.GT p14.GG p15.AA p15.AC p15.AT p15.AG p15.CA p15.CC p15.CT p15.CG p15.TA p15.TC p15.TT p15.TG p15.GA p15.GC p15.GT p15.GG p16.AA p16.AC p16.AT p16.AG p16.CA p16.CC p16.CT p16.CG p16.TA p16.TC p16.TT p16.TG p16.GA p16.GC p16.GT p16.GG p17.AA p17.AC p17.AT p17.AG p17.CA p17.CC p17.CT p17.CG p17.TA p17.TC p17.TT p17.TG p17.GA p17.GC p17.GT p17.GG p18.AA p18.AC p18.AT p18.AG p18.CA p18.CC p18.CT p18.CG p18.TA p18.TC p18.TT p18.TG p18.GA p18.GC p18.GT p18.GG p19.AA p19.AC p19.AT p19.AG p19.CA p19.CC p19.CT p19.CG p19.TA p19.TC p19.TT p19.TG p19.GA p19.GC p19.GT p19.GG p20.AA p20.AC p20.AT p20.AG p20.CA p20.CC p20.CT p20.CG p20.TA p20.TC p20.TT p20.TG p20.GA p20.GC p20.GT p20.GG' | cut -d ' ' -f 1-321 > Doench2014_dep2.txt
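The `_dep1` header implies a position-dependent one-hot layout: one 0/1 column per (position, base), named `p<i>.<base>` with bases in A, C, T, G order (20 × 4 = 80 features). The values themselves come from `encode_sequences.py`, which is not shown here, so the following is only a hypothetical sketch of that column layout, not a reproduction of the script:

```python
BASES = "ACTG"  # base order used in the Doench2014_dep1.txt header


def position_onehot(seq, bases=BASES):
    """Return (names, values): one 0/1 feature per (position, base), 1-based positions."""
    names, values = [], []
    for i, nt in enumerate(seq.upper(), start=1):
        for base in bases:
            names.append(f"p{i}.{base}")
            values.append(1 if nt == base else 0)
    return names, values


names, values = position_onehot("ACGT" * 5)  # a 20-mer
```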

chemical tensors

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
sed '1d' Doench2014.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Doench2014.sequence.txt


# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R
library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.tensors.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.tensors.melt.txt", quote=F, row.names=F, sep="\t")
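The R step above is a lookup followed by a reshape. Each base at each position is replaced by that base's tensor values, and the result is spread to one row per sgRNA. Below is a minimal awk sketch of the same idea. It assumes a table with the base in column 1 and one numeric feature per remaining column. `toy_tensor.txt`, `toy_seq.txt` and their values are hypothetical. The real `protein_rna_dna-vector_lee_nucleotide_dna_data.txt` is stored features-by-base and is transposed in R.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
# Hypothetical per-base feature table: base feat1 feat2
cat > toy_tensor.txt <<'EOF'
A 1 10
C 2 20
G 3 30
T 4 40
EOF
# Hypothetical guides: sgRNAID sequence (same layout as Doench2014.noscore.txt)
cat > toy_seq.txt <<'EOF'
g1 ACGT
g2 TTAG
EOF

# Hypothetical helper: one tab-separated row per guide (ID, then every feature for p1, p2, ...)
tensor_row() {
  awk 'NR == FNR {                       # first file: store the lookup table
         for (k = 2; k <= NF; k++) val[$1, k - 1] = $k
         nfeat = NF - 1
         next
       }
       {                                 # second file: expand each sequence
         row = $1
         for (i = 1; i <= length($2); i++) {
           base = substr($2, i, 1)
           for (k = 1; k <= nfeat; k++) row = row "\t" val[base, k]
         }
         print row
       }' "$1" "$2"
}

tensor_row toy_tensor.txt toy_seq.txt
```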

RNA structure (ViennaRNA)

Minimum free energy (MFE) structure, computed with RNAfold (tutorial: https://www.tbi.univie.ac.at/RNA/tutorial/)

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/vienna
RNAfold < ../Doench2014.fasta > Doench2014.gRNA.ViennaRNA.output.txt

grep '(' Doench2014.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014.gRNA.ViennaRNA.output.value.txt
grep '>' Doench2014.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Doench2014.gRNA.names.txt
paste Doench2014.gRNA.names.txt Doench2014.gRNA.ViennaRNA.output.value.txt > Doench2014.gRNA.ViennaRNA.output.value.id.txt
cp Doench2014.gRNA.ViennaRNA.output.value.id.txt ../.
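The two `grep` passes and the `paste` above assume every record yields exactly one name line and one structure line. If a single header were missing, every MFE value would silently shift onto the wrong ID. The minimal awk sketch below pairs each name with its energy in one pass. The sample input imitates RNAfold's default output (name, then sequence, then structure followed by the energy in parentheses); its sequences and energies are made up. For the genome-wide 20 bp windows, RNAfold's `--noPS` flag is also worth considering, since it skips the PostScript plot otherwise written for each sequence.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
# Made-up records in RNAfold's default output layout
cat > toy_rnafold.txt <<'EOF'
>g1
GGGGAAAACCCC
((((....)))) ( -4.10)
>g2
AAAAAAAAAAAA
............ (  0.00)
EOF

# Hypothetical helper: print "name<TAB>MFE" for each record
mfe_table() {
  awk '/^>/ { name = substr($0, 2); next }     # header line: remember the ID
       /^[.()]+[ \t]/ {                         # structure line: energy sits in the last (...)
         e = $0
         sub(/.*\(/, "", e)                     # drop everything up to the last "("
         sub(/\).*/, "", e)                     # drop the closing ")"
         gsub(/[ \t]/, "", e)                   # drop padding spaces
         print name "\t" e
       }' "$1"
}

mfe_table toy_rnafold.txt
```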

# 20bp sliding fasta
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/vienna
RNAfold < ../Doench2014.20sliding.fa > Doench2014.20sliding.ViennaRNA.output.txt

grep '(' Doench2014.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014.20sliding.ViennaRNA.output.value.txt
grep '>' Doench2014.20sliding.ViennaRNA.output.txt | sed 's/>//g' > Doench2014.20sliding.names.txt
paste Doench2014.20sliding.names.txt Doench2014.20sliding.ViennaRNA.output.value.txt > Doench2014.20sliding.ViennaRNA.output.value.id.txt
cp Doench2014.20sliding.ViennaRNA.output.value.id.txt ../.
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J ViennaRNA.doench2014
#SBATCH -N 2
#SBATCH -t 48:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/vienna
RNAfold < ../Doench2014.20sliding.fa > Doench2014.20sliding.ViennaRNA.output.txt

grep '(' Doench2014.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014.20sliding.ViennaRNA.output.value.txt
grep '>' Doench2014.20sliding.ViennaRNA.output.txt | sed 's/>//g' > Doench2014.20sliding.names.txt
paste Doench2014.20sliding.names.txt Doench2014.20sliding.ViennaRNA.output.value.txt > Doench2014.20sliding.ViennaRNA.output.value.id.txt
cp Doench2014.20sliding.ViennaRNA.output.value.id.txt ../.

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/ViennaRNA.doench2014.sh
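`Doench2014.20sliding.fa` and `Doench2014.20bp.sliding.bed` are inputs here; they are not built in this section. For reference, below is a minimal sketch of 20 bp sliding windows over a FASTA file. It assumes a 1 bp step and BED-style 0-based, half-open coordinates in the window names; the step and naming actually used for those files are not recorded here. `sliding_windows` and `toy.fa` are hypothetical. On a real genome, `bedtools makewindows` followed by `bedtools getfasta` is the usual route.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
# Hypothetical one-record FASTA with a wrapped sequence (22 bp)
printf '>chrT\nACGTACGTAC\nGTACGTACGTAC\n' > toy.fa

# Hypothetical helper: ">chrom:start-end" + sequence for every window of width w, step s
sliding_windows() {
  awk -v w="$2" -v s="$3" '
    function emit(   n, start) {              # print all windows of the current record
      n = length(seq)
      for (start = 0; start + w <= n; start += s)
        printf ">%s:%d-%d\n%s\n", chrom, start, start + w, substr(seq, start + 1, w)
    }
    /^>/ { if (seq != "") emit(); chrom = substr($1, 2); seq = ""; next }
    { seq = seq toupper($0) }                  # join wrapped sequence lines
    END { if (seq != "") emit() }' "$1"
}

sliding_windows toy.fa 20 1
```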

GATC motif

  • proxy for putative methylation
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

## GATC motif
## fastaregex
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'GATC' > Doench2014.gatc.bed

bedtools intersect -wo -a Doench2014.20bp.sliding.bed -b Doench2014.gatc.bed > Doench2014.gatc.20sliding.bed
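GATC is its own reverse complement, so a single strand already contains every site. If the regex search also scans the reverse strand, each site may therefore be reported twice. The GATC density later derived in R (`bedtools intersect` plus `n()` per window) is essentially the number of sites per 20 bp window. Below is a minimal awk sketch on a toy sequence, with two assumptions: a 1 bp step, and counting only sites that lie fully inside a window. `intersect -wo` also counts partial overlaps, so counts at window edges can differ. `motif_density` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count motif sites lying fully inside each window
# (width w, step s) of one sequence. Output: BED-style start, end, count.
motif_density() {
  awk -v seq="$1" -v motif="$2" -v w="$3" -v s="$4" 'BEGIN {
    m = length(motif); n = length(seq)
    for (i = 1; i + m - 1 <= n; i++)          # mark every motif start (overlaps allowed)
      hit[i] = (substr(seq, i, m) == motif)
    for (start = 1; start + w - 1 <= n; start += s) {
      c = 0
      for (i = start; i + m - 1 <= start + w - 1; i++) c += hit[i]
      printf "%d\t%d\t%d\n", start - 1, start - 1 + w, c
    }
  }'
}

a12=$(printf 'A%.0s' 1 2 3 4 5 6 7 8 9 10 11 12)   # twelve A's
toy="GATC${a12}GATCGATC"                           # GATC sites start at 1, 17 and 21
motif_density "$toy" GATC 20 1
```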

PAM

https://www.synthego.com/guide/how-to-use-crispr/pam-sequence

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J bedtools
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
# build the sorted sgRNA BED first; the PAM and gene steps below both read it
cut -f 1-4 Doench2014.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Doench2014.sgRNA.coord.bed

awk '{print $0"\t""+"}' Doench2014.sgRNA.coord.bed > Doench2014.sgRNA.coord.strand.txt
bedtools closest -a Doench2014.sgRNA.coord.strand.txt -b Doench2014.NGG.PAM.sorted.bed -io -iu -D a > Doench2014.sgRNA.closestPAM.bed

bedtools intersect -wo -a Doench2014.20bp.sliding.bed -b Doench2014.NGG.PAM.sorted.bed > Doench2014.NGG.PAM.20bp.sliding.windows.bed

bedtools closest -a Doench2014.sgRNA.coord.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Doench2014.sgRNA.gene.closest.bed

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/bedtools.sh
salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# locate NGG PAM sites in the reference genome (one fastaRegexFinder run per PAM)

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
# vim NGG.PAM.fasta

## fastaRegexFinder
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'AGG' > Doench2014.AGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'TGG' > Doench2014.TGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'CGG' > Doench2014.CGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000001405.39_GRCh38.p13_genomic.fna -r 'GGG' > Doench2014.GGG.PAM.txt

cat Doench2014.AGG.PAM.txt Doench2014.TGG.PAM.txt Doench2014.CGG.PAM.txt Doench2014.GGG.PAM.txt > Doench2014.NGG.PAM.txt
sort -k 1,1 -k 2,2n Doench2014.NGG.PAM.txt > Doench2014.NGG.PAM.sorted.bed

# intersect with sliding windows in the genome to get density for DWT
bedtools intersect -wo -a Doench2014.20bp.sliding.bed -b Doench2014.NGG.PAM.sorted.bed > Doench2014.NGG.PAM.20bp.sliding.windows.bed

# closest downstream PAM for each sgRNA (-iu with -D a ignores upstream PAMs); needs Doench2014.sgRNA.coord.bed from the gene-location step below
awk '{print $0"\t""+"}' Doench2014.sgRNA.coord.bed > Doench2014.sgRNA.coord.strand.txt
bedtools closest -a Doench2014.sgRNA.coord.strand.txt -b Doench2014.NGG.PAM.sorted.bed -io -iu -D a > Doench2014.sgRNA.closestPAM.bed
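The four fastaRegexFinder runs (AGG, TGG, CGG, GGG) exist to capture every NGG site. If the regex engine reports only non-overlapping matches, a single `[ACGT]GG` pattern would miss PAMs that share bases, such as the two in `AGGG`. The per-trinucleotide runs can likewise miss overlapping GGG matches inside longer G runs. The minimal awk sketch below scans base by base instead. It reports every NGG on the forward strand and every CCN, which is an NGG on the reverse strand, as BED-style lines carrying the forward-strand bases. `pam_scan` and the toy sequence are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every NGG (+ strand) and CCN (NGG on the - strand).
# Output: name, 0-based start, end, matched forward-strand bases, strand
pam_scan() {
  awk -v name="$1" -v seq="$2" 'BEGIN {
    seq = toupper(seq)
    for (i = 1; i + 2 <= length(seq); i++) {      # step 1 bp, so overlapping sites are kept
      tri = substr(seq, i, 3)
      if (tri ~ /^[ACGT]GG$/) printf "%s\t%d\t%d\t%s\t+\n", name, i - 1, i + 2, tri
      if (tri ~ /^CC[ACGT]$/) printf "%s\t%d\t%d\t%s\t-\n", name, i - 1, i + 2, tri
    }
  }'
}

pam_scan toy "TAGGGCCAT"
# AGG and GGG overlap; CCA is a TGG PAM on the reverse strand
```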

location relative to gene

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
cut -f 1-4 Doench2014.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Doench2014.sgRNA.coord.bed
bedtools closest -a Doench2014.sgRNA.coord.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Doench2014.sgRNA.gene.closest.bed
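With `-D b`, `bedtools closest` signs each sgRNA's distance to its nearest gene using the gene's orientation: the distance is negative when the sgRNA lies upstream of the gene. The simplified awk sketch below applies that sign rule to one interval pair, for intuition only. It assumes 0-based half-open coordinates, returns 0 for overlaps and the plain gap in bp otherwise. Bedtools' exact magnitude convention, for example for book-ended features, may differ by 1. `signed_gene_distance` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Args: sgRNA_start sgRNA_end gene_start gene_end gene_strand
# Prints 0 on overlap, else the gap in bp, negative when the sgRNA is upstream
# of the gene in the gene's orientation.
signed_gene_distance() {
  awk -v astart="$1" -v aend="$2" -v bstart="$3" -v bend="$4" -v strand="$5" 'BEGIN {
    astart += 0; aend += 0; bstart += 0; bend += 0   # force numeric comparison
    if (astart < bend && bstart < aend) { print 0; exit }
    if (aend <= bstart) { gap = bstart - aend; left = 1 }   # sgRNA left of gene
    else                { gap = astart - bend; left = 0 }   # sgRNA right of gene
    upstream = (strand == "-") ? !left : left               # flip for minus-strand genes
    out = upstream ? -gap : gap
    if (out == 0) out = 0                                   # avoid printing -0
    print out
  }'
}

signed_gene_distance 100 120 200 500 +   # -80: upstream of a + strand gene
signed_gene_distance 100 120 200 500 -   # 80: downstream of a - strand gene
```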

Raw features matrix

# salloc -A SYB105 -N 2 -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
structure <- read.delim("Doench2014.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F) # paste output has no header row
nuc <- read.delim("Doench2014.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7)])
colnames(score.df) <- c("sgRNAID", "cut.score")

# structure, gc, temp as one-column data frames (nuc: column 7 = GC, column 8 = melting temperature)
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
onehot.ind1 <- read.delim("Doench2014_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("Doench2014_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305] # keep sgRNAID + p1-p19 (19 x 16 = 304 columns); a 20-nt guide has only 19 dinucleotide positions

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "df.id.test.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
tensor <- read.delim("Doench2014.tensors.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")



# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df.id <- read.delim("df.id.test.txt", header=T, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")

df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
write.table(tensor.df, "Doench2014.raw.onehot.tensor.txt", quote=F, row.names=F, sep="\t")

df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Doench2014.raw.onehot.tensor.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Doench2014.raw.onehot.tensor.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)


# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.pam <- read.table("Doench2014.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
# one-hot encode the first base of the PAM; a CCN match is an NGG PAM on the reverse strand (e.g. CCT is the reverse complement of AGG)
sgRNA.pam.onehot <- sgRNA.pam.sub %>%
  mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0),
         PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0),
         PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0),
         PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]

score.location <- left_join(score.df, sgRNA.pam.df, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "Doench2014.sgRNA.pam.dcast.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df.dcast <- read.delim("Doench2014.sgRNA.pam.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df.dcast.sep <- df.dcast[,c(1,3:7)]

df <- read.delim("Doench2014.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast.sep, by=c("sgRNAID"))
nrow(df.location)
# 1825

write.table(df.location, "Doench2014.raw.onehot.tensor.pam.dcast.na.txt", quote=F, row.names=F, sep="\t")



# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.genes <- read.table("Doench2014.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")

score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 
write.table(df.dcast.na, "Doench2014.sgRNA.location.dcast.txt", quote=F, row.names=F, sep="\t")


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df.dcast <- read.delim("Doench2014.sgRNA.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)

df <- read.delim("Doench2014.raw.onehot.tensor.pam.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

df.location <- inner_join(df, df.dcast, by=c("sgRNAID"))
nrow(df.location)
# 1825

write.table(df.location, "Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.y'] <- 'cut.score.x'
df <- df[,c(1:1654,1656)]
write.table(df, "Doench2014.raw.onehot.tensor.pam.location.dcast.corrected.txt", quote=F, row.names=F, sep="\t")
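`sgRNA.gc` and `sgRNA.temp` are read from `Doench2014.nucleotide_counts_sgRNA_temp.txt`, and its construction is not shown here. As a reference point only, the minimal awk sketch below computes the GC fraction and a Wallace-rule melting temperature, Tm = 2*(A+T) + 4*(G+C) deg C, for each guide. The Wallace rule is an assumption chosen for illustration; it is a rough rule for short oligos and may not be the formula behind the `temp` column. `gc_tm` and the toy guides are hypothetical.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
# Hypothetical helper. Input: "sgRNAID sequence" per line.
# Output: ID, GC fraction, Wallace-rule Tm (deg C).
gc_tm() {
  awk '{
    s = toupper($2); at = 0; gc = 0
    for (i = 1; i <= length(s); i++) {
      b = substr(s, i, 1)
      if (b == "A" || b == "T") at++
      else if (b == "G" || b == "C") gc++
    }
    if (at + gc == 0) next                      # skip lines without A/C/G/T
    printf "%s\t%.2f\t%d\n", $1, gc / (at + gc), 2 * at + 4 * gc
  }' "$1"
}

printf 'g1 GGGGGCCCCCAAAAATTTTT\ng2 AATTG\n' > toy_guides.txt
gc_tm toy_guides.txt
```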

Haar wavelets

–> MAJOR CHALLENGE WITH WAVELETS FOR HUMAN DATA: the genome is too large; running the MODWT in R over genome-wide 20 bp sliding windows needs too much memory, so the modwt files could not be generated

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J haar.matrix
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
# R CMD BATCH haar.matrix.R
R CMD BATCH haar.matrix.gatc.R
R CMD BATCH haar.matrix.gene.R
R CMD BATCH haar.matrix.structure.R
R CMD BATCH haar.matrix.nuc.R
R CMD BATCH haar.matrix.pam.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/haar.matrix.sh
salloc -A SYB105 -N 2 -p gpu -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/modwt

R

library(dplyr)
library(reshape2)
library(tidyr)
library(wmtsa)
library(data.table)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
gatc <- read.table("Doench2014.gatc.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
gene <- read.table("Doench2014.gene.20sliding.bed", header=F, sep="\t", stringsAsFactors = F)
structure <- read.table("Doench2014.20sliding.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F) # paste output has no header row; header=T would drop the first window and shift the series
nuc <- read.table("Doench2014.nucleotide_counts_20sliding_temp.txt", header=T, sep="\t", stringsAsFactors = F)
pam <- read.table("Doench2014.NGG.PAM.20bp.sliding.windows.bed", header=F, sep="\t", stringsAsFactors = F)
window <- read.table("Doench2014.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "id", "seq", "id2", "cut.score", "gid", "change.val", "quality")
score.df <- score[,c(1:4,8)]

gatc.bin <- gatc %>% group_by(V1, V2, V3) %>% mutate(gatc.count = n())
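gatc.count <- unique(gatc.bin[,c("V1", "V2", "V3", "gatc.count")]) # select by name: positional indices break when the BED column count changes

gene.bin <- gene %>% group_by(V1, V2, V3) %>% mutate(gene.count = n())
gene.count <- unique(gene.bin[,c("V1", "V2", "V3", "gene.count")])

pam.bin <- pam %>% group_by(V1, V2, V3) %>% mutate(pam.count = n())
pam.count <- unique(pam.bin[,c("V1", "V2", "V3", "pam.count")])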

window.v <- window[,1:3]
colnames(window.v) <- c("V1", "V2", "V3")
gatc.win <- left_join(window.v, gatc.count, by=c("V1", "V2", "V3"))
gatc.win[is.na(gatc.win)] <- 0
gene.win <- left_join(window.v, gene.count, by=c("V1", "V2", "V3"))
gene.win[is.na(gene.win)] <- 0
pam.win <- left_join(window.v, pam.count, by=c("V1", "V2", "V3"))
pam.win[is.na(pam.win)] <- 0

gene.df <- gene.win$gene.count
gatc.df <- gatc.win$gatc.count
pam.df <- pam.win$pam.count

structure.df <- structure[,2]
gc.df <- nuc[,7]
temp.df <- nuc[,8]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/modwt")

temp.modwt <- wavMODWT(temp.df, wavelet="haar")
temp.modwt.df <- as.matrix(temp.modwt)
temp.modwt.label <- data.frame(label = row.names(temp.modwt.df), temp.modwt.df)
temp.modwt.dt <- as.data.table(temp.modwt.label)
temp.modwt.name <- temp.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(temp.modwt.name) <- c("label", "temp.dwt", "scale", "window")
write.table(temp.modwt.name, "temp.modwt.haar.txt", quote=F, row.names=F, sep="\t")

gc.modwt <- wavMODWT(gc.df, wavelet="haar")
gc.modwt.df <- as.matrix(gc.modwt)
gc.modwt.label <- data.frame(label = row.names(gc.modwt.df), gc.modwt.df)
gc.modwt.dt <- as.data.table(gc.modwt.label)
gc.modwt.name <- gc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gc.modwt.name) <- c("label", "gc.dwt", "scale", "window")
write.table(gc.modwt.name, "gc.modwt.haar.txt", quote=F, row.names=F, sep="\t")

structure.modwt <- wavMODWT(structure.df, wavelet="haar")
structure.modwt.df <- as.matrix(structure.modwt)
structure.modwt.label <- data.frame(label = row.names(structure.modwt.df), structure.modwt.df)
structure.modwt.dt <- as.data.table(structure.modwt.label)
structure.modwt.name <- structure.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(structure.modwt.name) <- c("label", "structure.dwt", "scale", "window")
write.table(structure.modwt.name, "structure.modwt.haar.txt", quote=F, row.names=F, sep="\t")

# IPD: ipd.df is never defined for the human Doench2014 data and the IPD coefficients are not joined below, so this transform is skipped
# ipd.modwt <- wavMODWT(ipd.df, wavelet="haar")
# ipd.modwt.df <- as.matrix(ipd.modwt)
# ipd.modwt.label <- data.frame(label = row.names(ipd.modwt.df), ipd.modwt.df)
# ipd.modwt.dt <- as.data.table(ipd.modwt.label)
# ipd.modwt.name <- ipd.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
# colnames(ipd.modwt.name) <- c("label", "ipd.dwt", "scale", "window")
# write.table(ipd.modwt.name, "ipd.modwt.haar.txt", quote=F, row.names=F, sep="\t")

gene.modwt <- wavMODWT(gene.df, wavelet="haar")
gene.modwt.df <- as.matrix(gene.modwt)
gene.modwt.label <- data.frame(label = row.names(gene.modwt.df), gene.modwt.df)
gene.modwt.dt <- as.data.table(gene.modwt.label)
gene.modwt.name <- gene.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gene.modwt.name) <- c("label", "gene.dwt", "scale", "window")
write.table(gene.modwt.name, "gene.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")

gatc.modwt <- wavMODWT(gatc.df, wavelet="haar")
gatc.modwt.df <- as.matrix(gatc.modwt)
gatc.modwt.label <- data.frame(label = row.names(gatc.modwt.df), gatc.modwt.df)
gatc.modwt.dt <- as.data.table(gatc.modwt.label)
gatc.modwt.name <- gatc.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(gatc.modwt.name) <- c("label", "gatc.dwt", "scale", "window")
write.table(gatc.modwt.name, "gatc.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")

pam.modwt <- wavMODWT(pam.df, wavelet="haar")
pam.modwt.df <- as.matrix(pam.modwt)
pam.modwt.label <- data.frame(label = row.names(pam.modwt.df), pam.modwt.df)
pam.modwt.dt <- as.data.table(pam.modwt.label)
pam.modwt.name <- pam.modwt.dt[, c("name", "number") := tstrsplit(label, "[^[:alnum:]]+")]
colnames(pam.modwt.name) <- c("label", "pam.dwt", "scale", "window")
write.table(pam.modwt.name, "pam.density.modwt.haar.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/modwt")
temp.modwt.name <- read.delim("temp.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gc.modwt.name <- read.delim("gc.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
structure.modwt.name <- read.delim("structure.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gene.modwt.name <- read.delim("gene.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
gatc.modwt.name <- read.delim("gatc.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
# ipd.modwt.name <- read.delim("ipd.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)
pam.modwt.name <- read.delim("pam.density.modwt.haar.txt", header=T, sep="\t", stringsAsFactors = F)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
window <- read.table("Doench2014.20bp.sliding.bed", header=F, sep="\t", stringsAsFactors = F)
score <- read.table("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
colnames(score) <- c("chr", "start", "end", "sgRNA", "id", "seq", "id2", "cut.score", "gid", "change.val", "quality")
score.df <- score[,c(1:4,8)]

colnames(window) <- c("chr", "start", "end")
window$window <- seq.int(nrow(window))
window$window <- as.character(window$window-1)
window$start <- as.numeric(window$start)
window$end <- as.numeric(window$end - 1)

window.score.df <- left_join(score.df, window, by=c("chr", "start", "end"))
window.score.df$window <- as.integer(window.score.df$window)
window.score.temp <- left_join(window.score.df, temp.modwt.name[,c(3,4,2)], by="window")
window.temp.gc <- left_join(window.score.temp, gc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure <- left_join(window.temp.gc, structure.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene <- left_join(window.temp.gc.structure, gene.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene.gatc <- left_join(window.temp.gc.structure.gene, gatc.modwt.name[,c(3,4,2)], by=c("window", "scale"))
window.temp.gc.structure.gene.gatc.pam <- left_join(window.temp.gc.structure.gene.gatc, pam.modwt.name[,c(3,4,2)], by=c("window", "scale"))
nrow(window.temp.gc.structure.gene.gatc.pam)
# 
window.temp.gc.structure.gene.gatc.pam.sgRNA <- subset(window.temp.gc.structure.gene.gatc.pam, !is.na(cut.score))
nrow(window.temp.gc.structure.gene.gatc.pam.sgRNA)
# 
write.table(window.temp.gc.structure.gene.gatc.pam.sgRNA, "Doench2014.20sliding.exact.DWT.haar.txt", quote=F, row.names=F, sep="\t")

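# columns 7 onward: scale plus the six DWT features (temp, gc, structure, gene, gatc, pam); the table has only 13 columns, so 7:15 would fail
df.melt <- melt(window.temp.gc.structure.gene.gatc.pam.sgRNA[,c(4,5,7:ncol(window.temp.gc.structure.gene.gatc.pam.sgRNA))], id=c("cut.score", "scale", "sgRNA"))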
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNA", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNA + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 
write.table(df.dcast.na, "Doench2014.20sliding.exact.DWT.haar.dcast.txt", quote=F, row.names=F, sep="\t")
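`wavMODWT(..., wavelet="haar")` computes a maximal overlap DWT. At level j, the wavelet and scaling coefficients are the half-difference and half-sum of the previous level's scaling series and a copy of it lagged by 2^(j-1), with circular wrap-around; level 0 is the input series. The minimal awk sketch below illustrates that recursion for one series. It is not meant to reproduce wmtsa's output exactly: the sign convention and boundary handling are assumptions. It also keeps the whole series in memory, so it does not address the genome-scale problem noted above. `haar_modwt` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: Haar MODWT (pyramid algorithm, circular boundary) of the
# numbers on one input line, to J levels. Prints "W<j>" lines, then "V<J>".
#   W_j[t] = (V_{j-1}[t] - V_{j-1}[t - 2^(j-1) mod N]) / 2
#   V_j[t] = (V_{j-1}[t] + V_{j-1}[t - 2^(j-1) mod N]) / 2
haar_modwt() {
  awk -v J="$1" '{
    n = NF
    for (t = 0; t < n; t++) v[t] = $(t + 1)          # V_0 = X
    for (j = 1; j <= J; j++) {
      lag = 2 ^ (j - 1)
      line = "W" j
      for (t = 0; t < n; t++) {
        u = t - lag; while (u < 0) u += n            # circular index
        line = line " " (v[t] - v[u]) / 2
        vn[t] = (v[t] + v[u]) / 2
      }
      print line
      for (t = 0; t < n; t++) v[t] = vn[t]
    }
    line = "V" J
    for (t = 0; t < n; t++) line = line " " v[t]
    print line
  }'
}

echo "1 2 3 4" | haar_modwt 2
```

As a sanity check, the MODWT preserves energy. For `1 2 3 4`, the squared coefficients across W1, W2 and V2 sum to 30, the same as the sum of the squared inputs.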

Raw + DWT matrix

# combine regional DWT with other features 
library(tidyr)
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df.dcast.na <- read.delim("Doench2014.20sliding.exact.DWT.haar.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df.dcast.sep <- df.dcast.na %>% separate(sgRNA, c("sgRNA", "ID"), sep="_")
df.dcast.dwt <- df.dcast.sep[,c(4:ncol(df.dcast.sep))]
colnames(df.dcast.dwt) <- paste0('sgRNA_', colnames(df.dcast.dwt))
df.dcast <- cbind(df.dcast.sep[,1:3], df.dcast.dwt)

df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df.sep <- df %>% separate(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
nrow(df.sep)
# 

df.sep.region <- inner_join(df.sep, df.dcast[,c(1,2,4:ncol(df.dcast))], by=c("sgRNA", "ID"))
df.sep.region.id <- df.sep.region %>% unite(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
nrow(df.sep.region.id)
# 

write.table(df.sep.region.id, "Doench2014.20sliding.raw.onehot.tensor.dwt.dcast.txt", quote=F, row.names=F, sep="\t")
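The join above matches two ID formats. The DWT table is keyed by `sgRNA_ID` and the feature table by `sgRNA_ID_type`, so the feature IDs are split and their first two parts serve as the key. The minimal awk sketch below performs the same inner join on tab-separated files with a header. It assumes IDs in column 1 with underscore-separated parts as described; the file names and values are hypothetical.

```shell
#!/usr/bin/env bash
cd "$(mktemp -d)"
# Hypothetical inputs: regional DWT features keyed by sgRNA_ID,
# per-guide features keyed by sgRNA_ID_type
printf 'sgRNA\tsgRNA_gc.dwt1\ng1_7\t0.25\ng2_8\t-0.10\n' > toy_dwt.txt
printf 'sgRNAID\tcut.score\tsgRNA.gc\ng1_7_tx\t0.61\t0.45\ng3_9_tx\t0.12\t0.55\n' > toy_features.txt

# Hypothetical helper: keep feature rows whose first two "_" parts match a DWT key,
# appending the DWT columns (the DWT key column itself is not repeated)
join_dwt() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { key = $1; $1 = ""; dwt[key] = substr($0, 2); next }
    {
      split($1, p, "_")
      k = (FNR == 1) ? "sgRNA" : p[1] "_" p[2]   # header rows pair up via "sgRNA"
      if (k in dwt) print $0, dwt[k]
    }' "$1" "$2"
}

join_dwt toy_dwt.txt toy_features.txt
```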

iRF

–> run iRF without wavelets (due to computational limitations)

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J doench.iRF
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
R CMD BATCH iRF.test.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.test.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R

library(ranger)

iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with equal split-selection weight for every feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # scale importance to range(0,1)
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
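    if(saveall) rfs[[i]] <- rf
    # stop early once too few features keep a non-zero weight
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)) break
  }
  if(!saveall) rfs <- rf # keep only the final forest, also when all iterations run
  return(rfs)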
}


library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df$cut.score <- df$cut.score.x
df.cut <- df[,c(1,1657, 3:1654, 1656)]

# sgRNAID: [,1]
# cut.score: [,2]

iRF(df.cut[,3:ncol(df.cut)], df.cut$cut.score)
# iRF iteration  5 
# =================
# mtry:   213 
# prediction error:   0.01827123 
# r^2:    0.5248384 
# cor(y,yhat):    0.734981 
# SNPs with importance > 0: 355 


library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:1654,1656)]
write.table(df, "Doench2014.raw.onehot.tensor.pam.location.dcast.txt", quote=F, row.names=F, sep="\t")

# sgRNAID: [,1]
# cut.score: [,2]
# one-hot independent: [,c(3:17,1645:1649,1651:1652,1654:1655)]
# one-hot dependent: [,c(18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
# chemical tensors: [,c(58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
# raw (gc, structure, temp, gene.distance, pam.distance): [,c(1642:1644,1650,1653)]

df.raw <- df[,c(2,1642:1644,1650,1653)]
iRF(df.raw[,2:ncol(df.raw)], df.raw$cut.score)
# iRF iteration  1 
# =================
# mtry:   2.5 
# prediction error:   0.03886899 
# r^2:    -0.01082707 
# cor(y,yhat):    0.1496061 
# SNPs with importance > 0: 1 

df.onehot <- df[,c(2,3:17,1645:1649,1651:1652,1654:1655,18:57,120:139,202:221,284:303,366:385,448:467,530:549,612:631,694:713,776:795,920:943,1068:1087,1150:1169,1232:1251,1314:1333,1396:1415,1478:1497,1560:1579)]
iRF(df.onehot[,2:ncol(df.onehot)], df.onehot$cut.score)
# iRF iteration  5 
# =================
# mtry:   58.5 
# prediction error:   0.01801721 
# r^2:    0.5314444 
# cor(y,yhat):    0.7364577 
# SNPs with importance > 0: 94 

df.quantum <- df[,c(2,58:119,140:201,222:283,304:365,386:447,468:529,550:611,632:693,714:775,796:919,944:1067,1088:1149,1170:1231,1252:1313,1334:1395,1416:1477,1498:1559,1580:1641)]
iRF(df.quantum[,2:ncol(df.quantum)], df.quantum$cut.score)
# iRF iteration  4 
# =================
# mtry:   216 
# prediction error:   0.02016961 
# r^2:    0.4754692 
# cor(y,yhat):    0.6990759 
# SNPs with importance > 0: 366 

df.raw.onehot <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)])
iRF(df.raw.onehot[,2:ncol(df.raw.onehot)], df.raw.onehot$cut.score)
# iRF iteration  5 
# =================
# mtry:   56 
# prediction error:   0.01830529 
# r^2:    0.5239526 
# cor(y,yhat):    0.7311999 
# SNPs with importance > 0: 94 

df.raw.quantum <- cbind(df.raw, df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.quantum[,2:ncol(df.raw.quantum)], df.raw.quantum$cut.score)
# iRF iteration  5 
# =================
# mtry:   176.5 
# prediction error:   0.02064207 
# r^2:    0.4631822 
# cor(y,yhat):    0.6891123 
# SNPs with importance > 0: 300 

df.onehot.quantum <- cbind(df.onehot, df.quantum[,2:ncol(df.quantum)])
iRF(df.onehot.quantum[,2:ncol(df.onehot.quantum)], df.onehot.quantum$cut.score)
# iRF iteration  5 
# =================
# mtry:   208 
# prediction error:   0.01845882 
# r^2:    0.51996 
# cor(y,yhat):    0.7299223 
# SNPs with importance > 0: 356 

df.raw.onehot.quantum <- cbind(df.raw, df.onehot[,2:ncol(df.onehot)], df.quantum[,2:ncol(df.quantum)])
iRF(df.raw.onehot.quantum[,2:ncol(df.raw.onehot.quantum)], df.raw.onehot.quantum$cut.score)
# iRF iteration  5 
# =================
# mtry:   196 
# prediction error:   0.01821351 
# r^2:    0.5263394 
# cor(y,yhat):    0.7332925 
# SNPs with importance > 0: 329 
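Each iteration of the `iRF` function above turns the forest's impurity-corrected importances into the `split.select.weights` for the next forest. The weights are the importances divided by the sum of absolute importances, with negatives set to zero. The next `mtry` is half the number of features with non-zero weight. The loop stops early once fewer than max(0.01 * (number of features - 1), 10) features keep a non-zero weight. The minimal awk sketch below reproduces just that bookkeeping for a hypothetical list of importances; no forest is trained.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one iRF reweighting step, mirroring the R loop
#   wt <- imp / sum(abs(imp)); wt[wt < 0] <- 0; mtry <- 0.5 * sum(wt > 0)
# Input: importances on one line. Output: weights, next mtry, early-stop flag.
irf_reweight() {
  awk '{
    total = 0
    for (i = 1; i <= NF; i++) total += ($i < 0 ? -$i : $i)
    kept = 0; line = "weights:"
    for (i = 1; i <= NF; i++) {
      wt = (total > 0) ? $i / total : 0
      if (wt < 0) wt = 0                      # negative importance -> never sampled
      if (wt > 0) kept++
      line = line " " wt
    }
    print line
    print "mtry: " 0.5 * kept
    cutoff = 0.01 * (NF - 1); if (cutoff < 10) cutoff = 10
    print "stop: " (kept < cutoff ? "yes" : "no")
  }'
}

echo "3 -1 1 0" | irf_reweight
```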

DONE RUNNING iRF - summit

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df$cut.score <- df$cut.score.x
df.cut <- df[,c(1,1657, 3:1654, 1656)]
ncol(df.cut)
# 1655
df.id <- separate(df.cut, sgRNAID, c("data", "sgRNAID"))
df.num <- mutate_all(df.id[,2:ncol(df.id)], function(x) as.numeric(as.character(x)))

write.table(df.num[,c(1,3:ncol(df.num))], "Doench2014.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.num[,c(1,3:ncol(df.num))], "Doench2014.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.num[,3:ncol(df.num)], "Doench2014.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.num[,1:2], "Doench2014.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.num[,1:2], "Doench2014.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.num[,2]), "Doench2014.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Submits/submit_full_Doench2014.noDWT_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Submits/submit_train_Doench2014.noDWT_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Submits/submit_test_Doench2014.noDWT_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# YNames.txt contains a single line with the Y variable name: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.noDWT
# 0.44364214097908705
sort -k3rg topVarEdges/cut.score_top95.txt | head
# GGsgRNA.raw   cut.score   0.05970314755902555
# p20relativenum_Hatomsraw  cut.score   0.04573441936072455
# p16.CCsgRNA.raw   cut.score   0.0434682191939938
# pam.distance0 cut.score   0.03126013639591283
# p20num_ringsraw   cut.score   0.03026777683362737
# p19.CGsgRNA.raw   cut.score   0.020910013660982794
# p20num_doublebondsraw cut.score   0.02076397222225037
# p8.TAsgRNA.raw    cut.score   0.01734592200141828
# p2.TAsgRNA.raw    cut.score   0.016856617209545403
# p18xy_quadrupoleraw   cut.score   0.015472105481533837
 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.noDWT_cut.score.importance4 | head
# GGsgRNA.raw: 1.98761
# pam.distance0: 1.53691
# p19.CGsgRNA.raw: 1.52857
# p20num_ringsraw: 1.29585
# p20num_doublebondsraw: 1.10066
# p16.CCsgRNA.raw: 1.0482
# p20relativenum_Hatomsraw: 1.0156
# p8.TAsgRNA.raw: 0.799865
# p16num_singlebondsraw: 0.639571
# gene.distance0: 0.581855


# Pearson correlation between observed and predicted cut.score on the fold 9 / Set4 test split
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.7221353
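The iRF summaries report both r^2 and cor(y,yhat), and these are different statistics. ranger's r.squared is 1 - prediction.error / var(y), computed on out-of-bag predictions. cor(y,yhat) is a Pearson correlation, so it ignores any bias or scaling in the predictions. The logged pairs are consistent with this: prediction error / (1 - r^2) comes to about 0.038 for every run on this data. The sketch below assumes nothing beyond NumPy, and the helper name is hypothetical. It shows how the two statistics can diverge.

```python
import numpy as np


def r2_and_pearson(y, yhat):
    """Return (1 - MSE/Var(y), Pearson r) for observed y and predictions yhat."""
    y = np.asarray(y, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    mse = np.mean((y - yhat) ** 2)
    r2 = 1.0 - mse / np.var(y, ddof=1)  # ddof=1 matches R's var()
    r = np.corrcoef(y, yhat)[0, 1]
    return r2, r


y = np.array([0.1, 0.4, 0.35, 0.8])  # hypothetical cut scores
print(r2_and_pearson(y, 2 * y))       # r is 1, but r2 is negative: predictions are scaled
print(r2_and_pearson(y, y))           # perfect predictions: both are 1
```

A high test-fold correlation such as 0.72 therefore does not by itself say how well calibrated the predicted scores are.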
add in kmer encoding
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
cut -f 1,3 Doench2014.txt > Doench2014.noscore.txt
python ../kmer1_positional_encode.py Doench2014.noscore.txt
python ../kmer2_positional_encode.py Doench2014.noscore.txt
python ../kmer3_positional_encode.py Doench2014.noscore.txt
python ../kmer4_positional_encode.py Doench2014.noscore.txt
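The kmer{k}_positional_encode.py scripts are not reproduced in this notebook. Suppose each one one-hot encodes every overlapping k-mer at its start position; this is an assumption, as are the alphabet order A, C, T, G (borrowed from the ind1/ind2 headers) and the helper name. Under that assumption, a 20-nt spacer gives (20 - k + 1) * 4^k features. That is 80, 304, 1152 and 4352 for k = 1 to 4, which matches the column ranges 3:82, 83:386, 387:1538 and 1539:5890 used in the k-mer iRF runs.

```python
from itertools import product

ALPHABET = "ACTG"  # assumed order, borrowed from the ind1/ind2 headers


def positional_kmer_onehot(seq, k, alphabet=ALPHABET):
    """One-hot encode every overlapping k-mer of seq at its start position.

    Returns a flat list of 0/1 values of length (len(seq) - k + 1) * 4**k.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    features = []
    for start in range(len(seq) - k + 1):
        block = [0] * len(kmers)
        block[index[seq[start:start + k]]] = 1  # raises KeyError for non-ACGT bases
        features.extend(block)
    return features


spacer = "GACGCATAAAGATGAGACGC"  # hypothetical 20-nt spacer
for k in range(1, 5):
    print(k, len(positional_kmer_onehot(spacer, k)))  # 80, 304, 1152, 4352
```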


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/

sed '1d' Doench2014.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep1.txt
sed '1d' Doench2014.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep2.txt
sed '1d' Doench2014.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep3.txt
sed '1d' Doench2014.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep4.txt
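The awk step rewrites field 2 (the encoded string) with a space after every character, so each position becomes its own column. awk then rebuilds the line with single spaces, and the last character also gets a trailing space. As a result, read.delim(..., sep=" ") sees one extra empty column at the end, which is why the joins keep only columns 1 to ncol - 1 of each dep table. The Python sketch below reproduces the transformation. It assumes each input line is `<sgRNAID> <encoded string>`, and the function name is hypothetical.

```python
def split_encoding_line(line):
    """Mimic awk '{gsub(/./,"& ",$2); print $0}' for a two-field line."""
    fields = line.split()                                 # awk default: split on runs of whitespace
    fields[1] = "".join(ch + " " for ch in fields[1])     # a space after every character
    return " ".join(fields)                               # awk rebuilds $0 with OFS = " "


out = split_encoding_line("sg1\t1000")
print(repr(out))        # 'sg1 1 0 0 0 '
print(out.split(" "))   # ['sg1', '1', '0', '0', '0', '']  <- empty trailing column
```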
# salloc -A SYB105 -N 2 -p gpu -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7)])
colnames(score.df) <- c("sgRNAID", "cut.score")

onehot.dep1 <- read.delim("Doench2014_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

# 1:(ncol-1) drops the empty trailing column left by the trailing space from awk
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
onehot.score <- full_join(score.df, onehot.dep, by="sgRNAID")

df.melt <- melt(onehot.score, id=c("cut.score", "sgRNAID"))
df <- na.omit(df.melt)

colnames(df) <- c("cut.score", "sgRNAID", "variable", "value")

df$value <- as.numeric(df$value)
df.id <- df[!(is.na(df$value) | df$value==""), ]
colnames(df.id) <- c("cut.score", "sgRNAID", "feature", "value")

df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "Doench2014.kmer.encoding.txt", quote=F, row.names=F, sep="\t")
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J iRF.onehot.kmer
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
R CMD BATCH iRF.onehot.kmer.R

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.onehot.kmer.sh
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R

library(ranger)

iRF <- function(xmat, y, ntree=200, iter=5, classification=F, threads=1, alwayssplits=NULL, saveall=T)
{
  tmp <- cbind(xmat, Y = y)
  wt <- rep(1/ncol(xmat), ncol(xmat)) # start with an equal split-selection weight for every feature
  rfs <- list()
  for(i in 1:iter)
  {
    cat("\niRF iteration ",i,"\n")
    cat("=================\n")
    mtry = 0.5*sum(wt>0)
    rf <- ranger::ranger(dependent.variable.name = "Y", data = tmp, num.trees=ntree,
                         split.select.weights = wt, classification = classification,
                         mtry = mtry, importance = "impurity_corrected", num.threads=threads, write.forest = T,
                         always.split.variables = alwayssplits)
    wt        <- rf$variable.importance / sum(abs(rf$variable.importance)) # normalise so absolute importances sum to 1
    wt[wt<0]  <- 0 # set negative weights to zero
    cat("mtry:  ", mtry, "\n")
    cat("prediction error:  ",rf$prediction.error,"\n")
    if(classification==FALSE) cat("r^2:   ",rf$r.squared,"\n")
    if(classification==TRUE) print(rf$confusion.matrix)
    cat("cor(y,yhat):   ",cor(rf$predictions,y),"\n")
    cat("SNPs with importance > 0:",sum(wt>0),"\n")
    if(saveall) rfs[[i]] <- rf else rfs <- rf # keep every forest, or only the latest one
    if(sum(wt>0) < max(0.01*(ncol(xmat)-1), 10)) break # stop early once few features keep a positive weight
  }
  return(rfs)
}
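The reweighting inside iRF() can be checked without fitting a forest. The NumPy sketch below reproduces only the bookkeeping between iterations, and it assumes an importance vector is already available. The steps are:

- divide the importances by the sum of their absolute values;
- set negative weights to zero;
- set the next mtry to half the number of features with positive weight (on the first pass every feature has weight 1/p);
- stop once fewer than max(0.01 * (p - 1), 10) features keep a positive weight.

The helper names are hypothetical. In the R function, ranger consumes these weights through split.select.weights.

```python
import numpy as np


def next_weights(importance):
    """Divide importances by the sum of their absolute values, then zero the negatives."""
    importance = np.asarray(importance, dtype=float)
    wt = importance / np.abs(importance).sum()
    wt[wt < 0] = 0.0
    return wt


def next_mtry(wt):
    """Half the number of features with positive weight (can be fractional, e.g. 176.5)."""
    return 0.5 * np.count_nonzero(wt > 0)


def should_stop(wt, n_features):
    """Same test as the R loop: fewer than max(0.01 * (p - 1), 10) features left."""
    return np.count_nonzero(wt > 0) < max(0.01 * (n_features - 1), 10)


imp = np.array([0.5, -0.25, 0.25, 0.0])  # hypothetical impurity_corrected importances
wt = next_weights(imp)
print(wt)                          # the negative importance is clipped to 0
print(next_mtry(wt))               # 1.0
print(should_stop(wt, len(imp)))   # True: only 2 features keep weight (< 10)
```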


library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.kmer.encoding.txt", header=T, sep="\t", stringsAsFactors = F)

# kmer = 1
df.1 <- df[,c(2:82)]
iRF(df.1[,2:ncol(df.1)], df.1$cut.score)
# iRF iteration  2 
# =================
# mtry:   32.5 
# prediction error:   0.0196492 
# r^2:    0.488806 
# cor(y,yhat):    0.7078255 
# SNPs with importance > 0: 55 

# kmer = 2
df.2 <- df[,c(2,83:386)]
iRF(df.2[,2:ncol(df.2)], df.2$cut.score)
# iRF iteration  2 
# =================
# mtry:   94 
# prediction error:   0.01810885 
# r^2:    0.5288797 
# cor(y,yhat):    0.7320499 
# SNPs with importance > 0: 138 

# kmer = 3
df.3 <- df[,c(2,387:1538)]
iRF(df.3[,2:ncol(df.3)], df.3$cut.score)
# iRF iteration  4 
# =================
# mtry:   176.5 
# prediction error:   0.02355923 
# r^2:    0.3870824 
# cor(y,yhat):    0.6366728 
# SNPs with importance > 0: 300 

# kmer = 4
df.4 <- df[,c(2,1539:5890)]
iRF(df.4[,2:ncol(df.4)], df.4$cut.score)
# iRF iteration  5 
# =================
# mtry:   426 
# prediction error:   0.02256984 
# r^2:    0.4128225 
# cor(y,yhat):    0.6467192 
# SNPs with importance > 0: 706 

# kmer = 1 + 2
df.1.2 <- df[,c(2:386)]
iRF(df.1.2[,2:ncol(df.1.2)], df.1.2$cut.score)
# iRF iteration  3 
# =================
# mtry:   88 
# prediction error:   0.01712434 
# r^2:    0.5544926 
# cor(y,yhat):    0.7554629 
# SNPs with importance > 0: 136 

# kmer = 1 + 2 + 3
df.1.2.3 <- df[,c(2:1538)]
iRF(df.1.2.3[,2:ncol(df.1.2.3)], df.1.2.3$cut.score)
# iRF iteration  5 
# =================
# mtry:   157 
# prediction error:   0.01667377 
# r^2:    0.5662148 
# cor(y,yhat):    0.7629953 
# SNPs with importance > 0: 254 

# kmer = 1 + 2 + 3 + 4
df.1.2.3.4 <- df[,c(2:5890)]
iRF(df.1.2.3.4[,2:ncol(df.1.2.3.4)], df.1.2.3.4$cut.score)
# iRF iteration  5 
# =================
# mtry:   286 
# prediction error:   0.01691536 
# r^2:    0.5599295 
# cor(y,yhat):    0.7625464 
# SNPs with importance > 0: 444 
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014

# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.pam.location.dcast.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# every remaining column except the sgRNA ID is a feature (the same set as colnames(df) in R)
X = df.drop(columns=['sgRNAID', 'cut.score'])

# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)

import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False stops summary_plot from calling plt.show(), so the figure is still available to save
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.noDWT.16dec.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)

f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.noDWT.16dec.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)


# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.noDWT.16dec.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
RIT

** Need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw **

  • run RIT on Cas9 model with all features
  • need to run arva-rit and then runRIT.sh (3 scripts)
  • two outputs: effect size and directionality
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

# /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/runRIT.sh
## cp /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/runRIT.sh /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh
# usage: runRIT.sh <feature> <name>   ### feature = name of the Y variable; name = name of the run

# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run

# python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.score.txt

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.noDWT

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/cut.score/RIT.run

sort -k3rg Doench2014.noDWT_cut.score.importance4.effect | head
# GGsgRNA.raw   cut.score   0.05970314755902555 -2.4922830521170305e-06 1068.364    0.154811002130185
# p20relativenum_Hatomsraw  cut.score   0.04573441936072455 -4.842518672415318e-06  332.522 0.23191751459995405
# p16.CCsgRNA.raw   cut.score   0.0434682191939938  6.666536737294725e-06   626.932 0.20406466107233728
# pam.distance0 cut.score   0.03126013639591283 8.518014576197689e-07   557.416 0.1616350915002599
# p20num_ringsraw   cut.score   0.03026777683362737 5.3520511137451035e-06  204.4   0.1238110408020338
# p19.CGsgRNA.raw   cut.score   0.020910013660982794    8.373493754214167e-06   201.136 0.2964272539549334
# p20num_doublebondsraw cut.score   0.02076397222225037 0.0009766008330982072   146.0   0.1467377696311431
# p8.TAsgRNA.raw    cut.score   0.01734592200141828 8.389209122949194e-06   445.251 0.13091950466369892
# p2.TAsgRNA.raw    cut.score   0.016856617209545403    7.935778676155901e-06   228.942 0.21242520749190558
# p18xy_quadrupoleraw   cut.score   0.015472105481533837    1.8887582466112958e-06  209.254 0.2185088382511396

### TODO: correlate the SHAP values with the RIT output, using either FeatureEffect (column 4 of the .effect file) or EffectSize (column 3, carrying the sign of column 4)?
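One way to approach the open question on correlating SHAP with RIT is sketched below in NumPy, under stated assumptions. It summarises each feature's SHAP values by their mean, a signed summary that can be compared with a signed RIT effect. It then computes Pearson correlations against both candidate measures: FeatureEffect (column 4 of the .effect file) and a signed effect size (column 3 carrying the sign of column 4).

Several details are assumptions rather than existing pipeline code:

- the input layout: a samples x features SHAP matrix with matching feature names, plus rows split from the whitespace-separated .effect file;
- the choice of the mean as the per-feature summary;
- the function name and the example data, which are hypothetical.

```python
import numpy as np


def shap_vs_rit(shap_values, feature_names, effect_rows):
    """Correlate per-feature mean SHAP values with two RIT-derived measures.

    shap_values: array of shape (n_samples, n_features), columns in feature_names order.
    effect_rows: .effect lines split on whitespace:
        feature, target, importance (col 3), effect (col 4), ...
    """
    mean_shap = dict(zip(feature_names, np.asarray(shap_values, dtype=float).mean(axis=0)))
    shap_col, effect, signed_size = [], [], []
    for row in effect_rows:
        name, importance, eff = row[0], float(row[2]), float(row[3])
        if name not in mean_shap:
            continue  # feature absent from the SHAP model
        shap_col.append(mean_shap[name])
        effect.append(eff)
        signed_size.append(np.sign(eff) * abs(importance))
    return {
        "FeatureEffect": np.corrcoef(shap_col, effect)[0, 1],
        "SignedEffectSize": np.corrcoef(shap_col, signed_size)[0, 1],
    }


# hypothetical inputs, for illustration only
shap = [[-1.0, 2.0, 1.0], [-1.0, 2.0, 1.0]]
rows = [r.split() for r in [
    "f1 cut.score 0.5 -2",
    "f2 cut.score 0.25 4",
    "f3 cut.score 0.1 2",
    "f4 cut.score 0.3 1",   # not in the SHAP model, so it is skipped
]]
print(shap_vs_rit(shap, ["f1", "f2", "f3"], rows))
```

A rank-based correlation may be more robust here, because the RIT effects in the .effect file span several orders of magnitude.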

18 January

  • matrix including raw values, positional encoding kmers, quantum tensors (singleton, basepair, dimer)
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Doench2014.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Doench2014.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Doench2014.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Doench2014.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/

sed '1d' Doench2014.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep1.txt
sed '1d' Doench2014.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep2.txt
sed '1d' Doench2014.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep3.txt
sed '1d' Doench2014.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014_dep4.txt

sed '1d' Doench2014.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Doench2014_ind1.txt
sed '1d' Doench2014.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Doench2014_ind2.txt
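The ind1 and ind2 tables get hand-written headers: A C T G, then the 16 dimers in the same A, C, T, G order. The encoder upstream must therefore emit its values in exactly that order, or the column labels will be wrong. The encoder is not shown here, so the following is an assumption: the independent encodings are position-independent counts of overlapping k-mers. The sketch below (function name hypothetical) emits counts in the header order and regenerates the ind2 header for comparison.

```python
from itertools import product

ORDER = "ACTG"  # order used in the sed '1i ...' headers for ind1 and ind2


def kmer_counts(seq, k, order=ORDER):
    """Count overlapping k-mers, returned in the same order as the ind1/ind2 headers."""
    kmers = ["".join(p) for p in product(order, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for start in range(len(seq) - k + 1):
        counts[seq[start:start + k]] += 1
    return kmers, [counts[kmer] for kmer in kmers]


header2, _ = kmer_counts("", 2)
print(" ".join(["sgRNAID"] + header2))
# sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG
```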
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R
library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
#tensor.t <- as.data.frame(t(tensor.df[63:70,]))
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T") # assumes columns 2:5 of the tensor file are in A, C, G, T order

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

#write.table(seq.tensor.dcast, "Doench2014.tensors.single.bp.txt", quote=F, row.names=F, sep="\t")
#write.table(seq.tensor.melt, "Doench2014.tensors.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.dcast, "Doench2014.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
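The R block above does two things. It first joins each position's base to a row of per-nucleotide descriptors. It then spreads the result to one column per position x descriptor; after the later unite(..., sep = "") step these become names such as p20num_ringsraw in the importance lists. The dependency-free sketch below does the same lookup and widening. It uses a one-descriptor table (ring counts: purines 2, pyrimidines 1) in place of the real tensor file. The function name and table are illustrative assumptions.

With k=2 the same function covers the dimer tensors: a 20-nt spacer then has 19 overlapping dimers at positions p1 to p19, as in the dimer section.

```python
def tensor_features(seq, descriptors, k=1, scale="raw"):
    """Map each overlapping k-mer of seq to its descriptor values.

    descriptors: {k-mer: {descriptor_name: value}}
    Returns {f"p{position}{descriptor}{scale}": value}, mirroring
    unite(feature.scale, c(position, variable, scale), sep = "").
    Unknown k-mers get 0, like tensor[is.na(tensor)] <- 0 in the R code.
    """
    names = sorted({d for table in descriptors.values() for d in table})
    features = {}
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        for name in names:
            features[f"p{start + 1}{name}{scale}"] = descriptors.get(kmer, {}).get(name, 0.0)
    return features


# illustrative one-descriptor table; the real file holds many more descriptors
mono = {"A": {"num_rings": 2}, "G": {"num_rings": 2}, "C": {"num_rings": 1}, "T": {"num_rings": 1}}
feats = tensor_features("GACGCATAAAGATGAGACGC", mono)  # hypothetical spacer
print(len(feats))                 # 20
print(feats["p20num_ringsraw"])   # 1 (position 20 is C)
```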
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J jan18.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00

module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
R CMD BATCH jan18.matrix.R
R CMD BATCH jan18.matrix.2.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/jan18.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
structure <- read.delim("Doench2014.gRNA.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Doench2014.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7)])
colnames(score.df) <- c("sgRNAID", "cut.score")

# structure (MFE), GC content, melting temperature
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
onehot.ind1 <- read.delim("Doench2014_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
# dep 1 = V2.x - V81.x
# dep 2 = V2.y - V81.y & V82 - V305
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
# dep 1 = V2.x - V81.x
# dep 2 = V2.y - V81.y & V82.x - V305.x
# dep 3 = V2 - V81 & V82.y - V305.y & V306 - V1153
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")
# dep 1 = V2.x - V81.x
# dep 2 = V2.y - V81.y & V82.x - V305.x
# dep 3 = V2.x.x - V81.x.x & V82.y - V305.y & V306.x - V1153.x
# dep 4 = V2.y.y - V81.y.y & V82 - V305 & V306.y - V1153.y & V1154 - V4353
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Doench2014.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
#tensor <- read.delim("Doench2014.tensors.single.bp.melt.txt", header=T, sep="\t")
tensor <- read.delim("Doench2014.tensorsAll.single.bp.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t") # checkpoint; re-read below so the rest can also run in a fresh session

tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.id <- read.delim("Doench2014.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7)])
colnames(score.df) <- c("sgRNAID", "cut.score")

df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)

df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast[is.na(df.dcast)] <- 0
df.dcast.na <- na.omit(df.dcast)
#write.table(df.dcast.na, "Doench2014.raw.onehot.tensor.single.bp.dcast.na.txt", quote=F, row.names=F, sep="\t")
write.table(df.dcast.na, "Doench2014.raw.onehot.tensorAll.single.bp.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 929


# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.pam <- read.table("Doench2014.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
#sgRNA.pam.df$id <- "Cas9"
#sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.pam.df, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)

#df <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Doench2014.raw.onehot.tensorAll.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 673
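The four PAM.* indicators encode the variable N of the NGG PAM. On the plus strand N is the first base of the PAM (AGG gives A). A minus-strand site appears as CCN, so N is the complement of the last base (CCT gives A). The sketch below reproduces the eight-case ifelse table in the mutate() call above. Like the R version, it returns all zeros for any other code. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def pam_onehot(pam_code):
    """Return (PAM.A, PAM.C, PAM.T, PAM.G) for an NGG (plus) or CCN (minus) PAM code."""
    if len(pam_code) == 3 and pam_code.endswith("GG"):
        n = pam_code[0]                      # plus strand: N of NGG
    elif len(pam_code) == 3 and pam_code.startswith("CC"):
        n = COMPLEMENT.get(pam_code[2])      # minus strand: N = complement of the last base
    else:
        n = None                             # anything else -> all zeros, as in the R code
    return tuple(int(n == base) for base in "ACTG")


for code in ["AGG", "CCT", "TGG"]:
    print(code, pam_onehot(code))
# AGG (1, 0, 0, 0)
# CCT (1, 0, 0, 0)
# TGG (0, 0, 1, 0)
```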


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.genes <- read.table("Doench2014.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
#sgRNA.genes.df$id <- "Cas9"
#sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# nrow(df.dcast.na) = 930

df <- df.location
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 673

#write.table(df.location, "Doench2014.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
write.table(df.location, "Doench2014.raw.onehot.tensorAll.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
# overlapping dimers: p1 = bases 1-2, p2 = bases 2-3, ..., p19 = bases 19-20
seq.dimer <- data.frame(sgRNAID = seq$sgRNAID, stringsAsFactors = F)
for (i in 1:19) seq.dimer[[paste0("p", i)]] <- paste0(seq[[paste0("p", i)]], seq[[paste0("p", i + 1)]])

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])

rownames(seq) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.tensors.dimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.tensors.dimers.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
#df <- read.delim("Doench2014.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("Doench2014.raw.onehot.tensorAll.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
tensor <- read.delim("Doench2014.tensors.dimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
#write.table(df.dcast.na, "Doench2014.raw.onehot.tensor.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
write.table(df.dcast.na, "Doench2014.raw.onehot.tensorAll.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 673

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
#write.table(df.location, "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
write.table(df.location, "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6073,6075:6079,6081,6083:6177)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
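The hard-coded column indices in `df[,c(1:6073,6075:6079,...)]` drop the duplicate score columns that dplyr adds (`.x`/`.y` suffixes) when joined tables share `cut.score`. A sketch of selecting those columns by name instead, assuming the duplicates are named `cut.score.y`, `cut.score.y.y` and `cut.score.x.x` as in the 15 March 2022 cleanup step. The pattern and the helper `clean_columns` are hypothetical.

```python
import re

# Join-suffixed copies of cut.score assumed to be redundant (hypothetical list).
DUPLICATE = re.compile(r"^cut\.score\.(x\.x|y|y\.y)$")


def clean_columns(header):
    """Return (new_header, keep_idx): rename cut.score.x to cut.score and
    drop the other join-suffixed copies of cut.score."""
    new_header, keep_idx = [], []
    for i, name in enumerate(header):
        if DUPLICATE.match(name):
            continue  # redundant copy created by an inner_join/left_join
        new_header.append("cut.score" if name == "cut.score.x" else name)
        keep_idx.append(i)
    return new_header, keep_idx
```

Selecting by name keeps working when the number of feature columns changes between runs, which a fixed range such as 6075:6079 does not.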


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.tensor.single.bp.dimers --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/Submits/submit_full_Doench2014.tensor.single.bp.dimers_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/Submits/submit_train_Doench2014.tensor.single.bp.dimers_0.sh
# once the train submissions finish, run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/Submits/submit_test_Doench2014.tensor.single.bp.dimers_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# YNames.txt lists the response variable(s), one per line: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.tensor.single.bp.dimers
# 
sort -k3rg topVarEdges/cut.score_top95.txt | head

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.tensor.single.bp.dimers_cut.score.importance4 | head

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.tensor.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 
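`cor()` in R defaults to the Pearson correlation. The stdlib Python version below is a sketch for checking predictions outside R; the function name `pearson` is illustrative. Pearson r is a different statistic from an R2 reported by the post-processing script, so the two numbers need not agree.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (the default method of R's cor())."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```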





### Why is this prediction lower? Is it because I removed the other quantum chemical properties? What happens if I add them back?
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:7313,7315,7317,7319:7413)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.tensorAll.single.bp.dimers --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensorAll.single.bp.dimers.pam.location.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/Submits/submit_full_Doench2014.tensorAll.single.bp.dimers_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/Submits/submit_train_Doench2014.tensorAll.single.bp.dimers_0.sh
# once the train submissions finish, run the test submissions
# test
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/Submits/submit_test_Doench2014.tensorAll.single.bp.dimers_0.sh

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.tensorAll.single.bp.dimers
# 0.396284289347527
sort -k3rg topVarEdges/cut.score_top95.txt | head
# GGsgRNA.raw   cut.score   0.08286483432754792
# p14dimer_H_bondraw    cut.score   0.060711831264174475
# V297.xsgRNA.raw   cut.score   0.05803372127628595
# AsgRNA.raw    cut.score   0.040184665740256094
# p20relativenum_Hatomsraw  cut.score   0.03882577467341015
# V247.xsgRNA.raw   cut.score   0.0305350646682211
# V3927sgRNA.raw    cut.score   0.02846495673084694
# sgRNA.structuresgRNA.raw  cut.score   0.01961441799504762
# p20num_ringsraw   cut.score   0.019048393493482585
# p20relativenum_singlebondsraw cut.score   0.016330563330558483
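The `sort -k3rg ... | head` calls rank features by a numeric column, largest first. A Python sketch of the same ranking, assuming whitespace-separated records with the feature name in the first field (a trailing `:` is stripped, as in the `.importance4` lines). The helper `top_features` is hypothetical.

```python
def top_features(lines, value_col, n=10):
    """Rank whitespace-separated records by a numeric column, largest first,
    like `sort -k<value_col>rg | head -n <n>`. value_col is 1-based."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < value_col:
            continue  # blank or short line
        try:
            value = float(fields[value_col - 1])
        except ValueError:
            continue  # header or non-numeric line
        records.append((value, fields[0].rstrip(":")))
    records.sort(key=lambda r: r[0], reverse=True)
    return [(name, value) for value, name in records[:n]]
```

`float()` accepts scientific notation, which is why `sort -g` (general numeric) rather than `sort -n` is the matching shell option.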

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.tensorAll.single.bp.dimers_cut.score.importance4 | head
# GGsgRNA.raw: 1.41962
# V297.xsgRNA.raw: 0.808977       <-- dependent 2 (p19.CG)
# V985.xsgRNA.raw: 0.767533       <-- dependent 3 (p17.TCG)
# p14dimer_H_bondraw: 0.568758
# V247.xsgRNA.raw: 0.561394       <-- dependent 2 (p16.CC)
# AsgRNA.raw: 0.465157
# sgRNA.structuresgRNA.raw: 0.454485
# p20num_doublebondsraw: 0.398546
# p20relativenum_Hatomsraw: 0.384287
# V259.xsgRNA.raw: 0.374241       <-- dependent 2 (p17.AC)
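The annotations above (for example "dependent 2 (p19.CG)") refer to position-dependent k-mer indicators, while names such as `GGsgRNA.raw` look like position-independent k-mer counts. A sketch of both encodings for one guide, assuming overlapping k-mer counts and `p<start>.<kmer>` names. This does not reproduce the `V<number>` column indexing of the Doench2014_dep*.txt files; `kmer_features` is a hypothetical helper.

```python
from itertools import product

BASES = "ACGT"


def kmer_features(seq, k):
    """Return (independent, dependent) k-mer features for one guide.
    independent: count of each k-mer anywhere in the guide (overlapping).
    dependent:   1/0 indicator for each k-mer at each 1-based start position."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    independent = {km: windows.count(km) for km in kmers}
    dependent = {
        f"p{i + 1}.{km}": int(win == km)
        for i, win in enumerate(windows)
        for km in kmers
    }
    return independent, dependent
```

For a 20-nt guide and k = 2, this yields 16 independent counts and 19 x 16 = 304 dependent indicators.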


# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.tensorAll.single.bp.dimers_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.7325874
RIT

** Need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw **

  • run RIT on Cas9 model with all features
  • need to run arva-rit and then runRIT.sh (3 scripts)
  • two outputs: effect size and effect direction
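For orientation, here is a simplified sketch of the random intersection trees idea. It repeatedly intersects the feature sets of randomly drawn observations, and sets that survive every level become candidate interactions. This is not the ritw / runRIT.sh implementation. The input format, one set of "active" features per observation, is an assumption, and effect size and direction are not modelled.

```python
import random
from collections import Counter


def random_intersection_trees(active_sets, n_trees=100, depth=3, branch=2, seed=0):
    """Simplified RIT: each tree starts from a random observation's feature set
    and intersects it with `branch` random observations per level, `depth`
    times. Returns how many trees each surviving feature set appeared in."""
    rng = random.Random(seed)
    found = Counter()
    for _ in range(n_trees):
        frontier = [frozenset(rng.choice(active_sets))]
        for _ in range(depth):
            children = []
            for node in frontier:
                for _ in range(branch):
                    child = node & rng.choice(active_sets)
                    if child:  # empty intersections are pruned
                        children.append(child)
            frontier = children
        found.update(set(frontier))  # count each set at most once per tree
    return found
```

Feature sets shared by many observations survive the intersections most often, which is why the counts work as a rough interaction-prevalence score.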
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

#cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensor.single.bp.dimers/cut.score
#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.tensor.single.bp.dimers

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.tensorAll.single.bp.dimers
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.top50.sh cut.score Doench2014.tensorAll.single.bp.dimers

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.tensorAll.single.bp.dimers/cut.score/RIT.run
# sort -k3rg Doench2014.tensorAll.single.bp.dimers_cut.score.importance4.effect > Doench2014.tensorAll.single.bp.dimers_cut.score.importance4.effect_sorted

library(dplyr)
library(tidyr)
library(reshape2)
library(ggplot2)
library(RColorBrewer)

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP")
imp <- read.delim("Doench2014.tensorAll.single.bp.dimers_cut.score.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
#imp$Normalized.Importance <- as.numeric(substr(imp$NormEdge, 0, 4))
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_bar(aes(y=Normalized.Importance, fill=Effect.Direction), stat="identity") + coord_flip() + xlab("") + ylab("Normalized Importance") + theme_classic() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), legend.position="bottom") + scale_fill_brewer(palette="Set1")

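# wc -l set0_Y_train_noSampleIDs.txt <-- 744 lines; the set*_Y_*_noSampleIDs.txt files are read with header=T, so this may be one more than the number of samples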
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/744
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
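The two plots above classify each feature's effect as negative, positive or zero and scale its sample count by the training-set size. A Python sketch of that summary, assuming the records are already sorted by normalized importance (as the `sort -k3rg` output is). The sample total is passed in rather than hard-coded; `summarize_effects` is a hypothetical helper.

```python
def summarize_effects(records, n_samples, top=20):
    """records: (feature, normalized_importance, effect, sample_count) tuples,
    sorted by importance. Adds Effect.Direction and Sample.Prop like the R code."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    rows = []
    for feature, importance, effect, count in records[:top]:
        direction = "neg" if effect < 0 else "pos" if effect > 0 else "zero"
        rows.append({"Feature": feature,
                     "Normalized.Importance": importance,
                     "Effect.Direction": direction,
                     "Sample.Prop": count / n_samples})
    return rows
```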
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014

# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])

# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)

import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False stops summary_plot from calling plt.show(), which can close the
# figure in interactive sessions before it is saved
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
plt.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
plt.close()

shap.summary_plot(shap_values, X_train, show=False)
plt.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
plt.close()

# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."

15 March 2022: Quantum Matrix

  • generate final matrix with updated quantum properties (HL and H-bond) for monomer, basepair, dimer, trimer, tetramer
  • think through incorporating DNA and RNA sequence?
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J doench.matrix
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014
R CMD BATCH mar15.matrix.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/mar15.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
structure <- read.delim("Doench2014.gRNA.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Doench2014.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")

# structure, gc, temp (one-column data frames; scale and sgRNAID are added below)
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
onehot.ind1 <- read.delim("Doench2014_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last column of each dep file (presumably empty, from a trailing space separator)
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Doench2014.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# 

# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.pam <- read.table("Doench2014.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df

score <- read.delim("Doench2014.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
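# check: the structure/gc/temp step above takes sgRNAID, cut.score and the sequence from columns 5, 7 and 6 of this file; confirm that columns 5:6 here really are sgRNAID and cut.score
score.df <- score[,c(5:6)]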
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df <- read.delim("Doench2014.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))

df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
sgRNA.genes <- read.table("Doench2014.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)

df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 

write.table(df.pam.location, "Doench2014.raw.matrix.txt", quote=F, row.names=F, sep="\t")
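The PAM one-hot step above maps a forward-strand NGG code and its reverse-strand CCN counterpart to the same variable base (AGG and CCT both set PAM.A). A Python sketch of that mapping via the reverse complement. One difference from the R `ifelse` chain is intentional: unrecognised codes raise an error here instead of becoming all zeros. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pam_variable_base(pam_code):
    """Return the N of an NGG PAM. Codes starting with 'CC' are treated as the
    reverse-strand CCN, whose reverse complement is NGG."""
    pam_code = pam_code.upper()
    if pam_code.endswith("GG"):
        return pam_code[0]
    if pam_code.startswith("CC"):
        return pam_code[::-1].translate(COMPLEMENT)[0]
    raise ValueError(f"not an NGG/CCN PAM: {pam_code}")


def pam_onehot(pam_code):
    """One-hot columns PAM.A, PAM.C, PAM.T, PAM.G for the variable PAM base."""
    base = pam_variable_base(pam_code)
    return {f"PAM.{b}": int(b == base) for b in "ACTG"}
```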
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")


# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")


# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")


# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")


# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
seq <- read.delim("Doench2014.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "") 

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")



setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
monomer <- read.delim("Doench2014.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Doench2014.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Doench2014.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Doench2014.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Doench2014.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)

monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Doench2014.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
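The monomer, dimer, trimer and tetramer blocks above follow the same pattern with a different window length k: slide a k-mer window along the guide, look up that k-mer's HL/H-bond properties, and emit one value per (position, property). A generalised Python sketch, assuming the table is already parsed as {kmer: {property: value}}. The helper name is hypothetical. The oligomer label is folded into each column name, modelled on existing names such as p14dimer_H_bondraw. If the HL.Bond.* files share property names (not checked here), this keeps a monomer value at p1 from being averaged with a dimer value at p1 by the later dcast(..., fun.aggregate=mean).

```python
def kmer_property_features(seq, k, table, label):
    """Emit {'p<pos><label>_<property>': value} for every k-mer window of seq.

    table: {kmer: {property: value}}, e.g. parsed from an HL.Bond.* file.
    label: oligomer name ('monomer', 'dimer', ...) used in the column names."""
    features = {}
    for i in range(len(seq) - k + 1):
        props = table.get(seq[i:i + k])
        if props is None:
            continue  # k-mer not in the table; the R pipeline later sets NA to 0
        for prop, value in props.items():
            features[f"p{i + 1}{label}_{prop}"] = value
    return features
```

A 20-nt guide gives 20, 19, 18 and 17 window positions for k = 1 to 4, matching the p1..p20, p1..p19, p1..p18 and p1..p17 columns built above. The basepair table would use k = 1 with its own label.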
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
tensor <- read.delim("Doench2014.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 673

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Doench2014.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
df <- read.delim("Doench2014.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df %>% select(-grep("cut.score.y.y", names(df)), -grep("cut.score.y", names(df)), -grep("cut.score.x.x", names(df))) 
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "Doench2014.finalquantum.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/Submits/submit_full_Doench2014.finalquantum_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/Submits/submit_train_Doench2014.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/Submits/submit_test_Doench2014.finalquantum_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.finalquantum
# 0.32761105236921945

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.finalquantum_cut.score.importance4 | head
# p13tetramer.Hbond.stackingraw: 1.68426
# p20monomer.HLgap.eVraw: 0.92112
# p14tetramer.Hbond.energyraw: 0.817467
# p20monomer.No.electronsraw: 0.773501
# p15dimer.Hbond.stackingraw: 0.672684
# V111.xsgRNA.raw: 0.564113
# p17tetramer.Hbond.stackingraw: 0.543763
# p13tetramer.Hbond.energyraw: 0.423488
# p12trimer.Hbond.energyraw: 0.257399
# AsgRNA.raw: 0.246818


# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3024251
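`cor()` defaults to Pearson's r between the observed and predicted cut scores. For spot-checking outside R, here is a standard-library sketch of the same statistic. The name `pearson_r` is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient (the default method of R's cor(x, y))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)
```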
RIT
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.finalquantum

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score/RIT.run

# p13tetramer.Hbond.stackingraw cut.score   0.11125470557540898 -0.0007895928967698461  459.388 0.19550837155006962
# p20monomer.HLgap.eVraw    cut.score   0.08848169905650763 0.0009442158277223906   311.728 0.10390178588720794
# p20monomer.No.electronsraw    cut.score   0.08696537886884295 0.0009343689766738636   302.87  0.1055127545554004
# p17tetramer.Hbond.stackingraw cut.score   0.07192122631089481 0.0003078482264406661   272.63  0.14631521294929967
# p15dimer.Hbond.stackingraw    cut.score   0.06680307047040686 -0.0006860065623903445  299.334 0.186762337828679
# p19dimer.Hbond.stackingraw    cut.score   0.054833195861223226    0.0008171232034317624   191.97  0.1406645538533321
# p14trimer.Hbond.energyraw cut.score   0.05269802703309119 -0.00044701900945406616 204.47  0.1810428619996191
# p13tetramer.Hbond.energyraw   cut.score   0.04698573790321235 -0.0006259350766183389  170.246 0.1997767598362409
# p14tetramer.Hbond.energyraw   cut.score   0.04173947787065836 6.88679903870876e-05    206.602 0.12690147469165897
# p11tetramer.Hbond.energyraw   cut.score   0.03248012764980142 -0.0002955808579939091  166.789 0.1572346687082362
Figures
library(ggplot2)
library(reshape2)
library(RColorBrewer)

# Figure 5A
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score")
imp <- read.delim("Doench2014.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("Doench2014.Imp.Dir.Top20.21March.pdf")
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("Doench2014 Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
dev.off()

pdf("Doench2014.Imp.Dir.Top20.Effect.21March.pdf")
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("Doench2014") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()


#### Figure 5B: Focus on effect size 
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score")
imp <- read.delim("Doench2014.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir$absEffect <- abs(imp.dir$Feature.Effect)
imp.dir.effectsorted <- imp.dir[order(imp.dir$absEffect, decreasing = TRUE),]
imp.dir.effectsorted.top20 <- imp.dir.effectsorted[1:20,]

pdf("Doench2014.Imp.Dir.Top20Effect.Effect.pdf")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=Feature, y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
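The Figure 5A/5B code labels each feature's RIT effect as neg/pos/zero. For 5B it then re-ranks features by |effect| and keeps the top 20. Below is a Python sketch of that logic. It assumes rows are `(feature, normalized_importance, effect)` tuples taken from columns 1, 3 and 4 of the `.effect_sorted` file (`Feature`, `NormEdge`, `Effect`). The function names are illustrative.

```python
def effect_direction(effect):
    """Same three-way split as the nested ifelse() in the R code."""
    if effect < 0:
        return "neg"
    if effect > 0:
        return "pos"
    return "zero"


def top_by_abs_effect(rows, n=20):
    """Return the n rows with the largest |effect|, annotated with direction.

    rows: (feature, normalized_importance, effect) tuples (assumed layout).
    """
    ranked = sorted(rows, key=lambda row: abs(row[2]), reverse=True)
    return [(feature, imp, eff, effect_direction(eff)) for feature, imp, eff in ranked[:n]]
```

The test values below are taken from the first three RIT output lines of the quantum run.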





## Main H. sapiens feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum/cut.score")
imp <- read.delim("Doench2014.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
imp.dir.top20.df$Feature.Label <- c("Tetramer H-stacking pos13", "Monomer HL-gap pos20", "Monomer No.Electrons pos20", "Tetramer H-stacking pos17", "Dimer H-stacking pos15", "Dimer H-stacking pos19", "Trimer H-bond pos14", "Tetramer H-bond pos13", "Tetramer H-bond pos14", "Tetramer H-bond pos11", "Tetramer H-stacking pos11", "Tetramer H-stacking pos1", "Tetramer H-bond pos12", "Trimer H-bond pos12", "Tetramer H-stacking pos10", "Tetramer H-stacking pos14", "Dimer HL-gap pos8", "Adenines count", "Trimer H-stacking pos15", "Trimer H-stacking pos14")

library(ggplot2)
pdf("Doench2014.FeatureEngineering.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Doench2014 Top Features") + ylab("Normalized Importance") + xlab("") + scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()

pdf("Doench2014.FeatureEngineering.nocolor.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir)) + geom_point(size=3, color="black") + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir), color="black") + labs(title="Doench2014 Top Features") + ylab("Normalized Importance") + xlab("") + theme_classic() + coord_flip()
dev.off()

remove highly correlated

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
df <- read.delim("Doench2014.finalquantum.txt", header=T, sep="\t")
df.rm <- df %>% select(-grep("basepair.Hlgap.eVEraw", names(df)), -grep("dimer.Hbond.energyraw", names(df)), -grep("trimer.Hbond.energyraw", names(df)), -grep("tetramer.Hbond.energyraw", names(df))) 
# 6160

write.table(df.rm, "Doench2014.finalquantum.noncorrelated.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,c(1,3:ncol(df.rm))], "Doench2014.finalquantum.noncorrelated.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,c(1,3:ncol(df.rm))], "Doench2014.finalquantum.noncorrelated.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,3:ncol(df.rm)], "Doench2014.finalquantum.noncorrelated.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
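Per the heading, the feature families dropped above (basepair HL-gap, and dimer/trimer/tetramer H-bond energy) were removed as highly correlated. The code removes them by name pattern. A common data-driven alternative, which is *not* what was run here, is a greedy filter: it drops any column whose |Pearson r| with an already-kept column exceeds a threshold. Below is a standard-library sketch. `drop_correlated` and the 0.9 threshold are illustrative assumptions.

```python
import math


def _pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant column: treated as uncorrelated
    return cov / math.sqrt(vx * vy)


def drop_correlated(columns, threshold=0.9):
    """Greedy correlation filter over {name: values} (hypothetical helper).

    Columns are visited in insertion order. A column is kept only if its
    |r| with every previously kept column is <= threshold.
    """
    kept = {}
    for name, values in columns.items():
        if all(abs(_pearson(values, other)) <= threshold for other in kept.values()):
            kept[name] = values
    return list(kept)
```

The visiting order decides which member of a correlated group survives. In practice you might order columns by prior importance first.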


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.finalquantum.noncorrelated --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.finalquantum.noncorrelated.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated/Submits/submit_full_Doench2014.finalquantum.noncorrelated_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated/Submits/submit_train_Doench2014.finalquantum.noncorrelated_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated/Submits/submit_test_Doench2014.finalquantum.noncorrelated_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.finalquantum.noncorrelated
# 0.33266930365052927

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.finalquantum.noncorrelated_cut.score.importance4 | head
# p13tetramer.Hbond.stackingraw: 2.11042
# p20monomer.HLgap.eVraw: 0.998273
# p15dimer.Hbond.stackingraw: 0.723382
# p20monomer.No.electronsraw: 0.709679
# V111.xsgRNA.raw: 0.621192
# p17tetramer.Hbond.stackingraw: 0.611309
# p10tetramer.Hbond.stackingraw: 0.388563
# p14tetramer.Hbond.stackingraw: 0.382813
# AsgRNA.raw: 0.376975
# p15trimer.Hbond.stackingraw: 0.322423



# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.finalquantum.noncorrelated_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.560227


##### RIT:

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.finalquantum.noncorrelated

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/Doench2014.finalquantum.noncorrelated/cut.score/RIT.run

# p13tetramer.Hbond.stackingraw cut.score   0.16082085490802894 -0.0027729880753438736  623.936 0.21228153825454327
# p20monomer.No.electronsraw    cut.score   0.09244978290616876 0.002887388395177515    320.869 0.09010609848341739
# p20monomer.HLgap.eVraw    cut.score   0.08652195118841843 0.0029125430874947385   295.543 0.0925376686643795
# p15dimer.Hbond.stackingraw    cut.score   0.08233542944820899 -0.002566124585379375   341.938 0.20322230295895138
# p17tetramer.Hbond.stackingraw cut.score   0.07824150655246016 0.0009113678739205416   277.214 0.14197966625744085
# p19dimer.Hbond.stackingraw    cut.score   0.052635215849004116    0.0017581666268891225   187.0   0.1360211855511102
# p11tetramer.Hbond.stackingraw cut.score   0.048849805200897434    -0.00167068427757621    331.105 0.14239429779198848
# p14tetramer.Hbond.stackingraw cut.score   0.03845411687972666 0.0012722245148982382   199.168 0.0905293010774468
# p10tetramer.Hbond.stackingraw cut.score   0.03606679821574927 -0.0015246836695670192  175.219 0.1708593708517629
# p14trimer.Hbond.stackingraw   cut.score   0.029417674198179745    -0.0013247495209787671  129.038 0.1728652461223355

Doench 2014: Correction (wrong supplemental table?)

  • 18 May 2022: Realized I was using the sgRNA score column of Supplemental Table 7, which is the value predicted by the model rather than the raw experimental cutting-efficiency score (does a raw score exist?). Instead, use Supplemental Table 10, which has 1,278 sgRNAs targeting 414 essential genes in A375 cells; sgRNA activity there is expressed as the log2 fold change in abundance over two weeks of growth.

Run with Supp Table 7 column 8 (Gene % Rank)

“Within each gene, passing sgRNAs were first ranked, with the best sgRNA receiving the rank of 1. This number was then divided by the total number of sgRNAs, which was then subtracted from 1 to determine a percent-rank. This results in the worst sgRNA for a gene receiving a percent-rank of 0, while the best sgRNA will have a percent-rank approaching 1. Percent-rank values were averaged for genes that were assayed in more than one cell line.”
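The quoted definition works out to percent-rank = 1 - rank/n within each gene. Here rank 1 is the best sgRNA and n is the number of passing sgRNAs for that gene. Values are then averaged per sgRNA over the cell lines in which it was assayed. Below is a sketch of that arithmetic. It assumes a higher activity value means a better sgRNA and that filtering to "passing" sgRNAs has already happened. Ties are broken arbitrarily, because the paper's tie handling is not stated in the quote.

```python
from collections import defaultdict


def gene_percent_rank(rows):
    """rows: iterable of (gene, sgrna_id, activity); higher activity = better (assumed).

    Returns {sgrna_id: 1 - rank/n}: the best sgRNA of a gene gets 1 - 1/n,
    the worst gets 0.
    """
    by_gene = defaultdict(list)
    for gene, sg_id, activity in rows:
        by_gene[gene].append((sg_id, activity))

    result = {}
    for guides in by_gene.values():
        n = len(guides)
        ordered = sorted(guides, key=lambda g: g[1], reverse=True)
        for rank, (sg_id, _) in enumerate(ordered, start=1):
            result[sg_id] = 1 - rank / n
    return result


def average_across_cell_lines(per_line):
    """Average each sgRNA's percent-rank over the cell lines it appears in."""
    values = defaultdict(list)
    for ranks in per_line:
        for sg_id, pr in ranks.items():
            values[sg_id].append(pr)
    return {sg_id: sum(v) / len(v) for sg_id, v in values.items()}
```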

sgRNA dataset
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/human/Doench.et.al.2014.supp7.txt" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/Doench2014.genepercentrank.score.txt

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank

R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
df <- read.delim("Doench2014.genepercentrank.score.txt", header=T, sep="\t")

library(dplyr)
df2 <- df %>% mutate(id = row_number())
df3 <- df2[,c(10,2,8)]
colnames(df3) <- c("sgRNAID",   "nucleotide.sequence", "cut.score")
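# assuming column 2 is the 30-mer context (4 nt + 20-nt protospacer + NGG + 3 nt): positions 5-27 = protospacer + PAM
df3$nucleotide.sequence <- substr(df3$nucleotide.sequence, 5, 27)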
df.na <- na.omit(df3)
write.table(df.na, "Doench2014.genepercentrank.ngg.txt", quote=F, row.names=F, sep="\t")

df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Doench2014.genepercentrank.txt", quote=F, row.names=F, sep="\t")
# 1841
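The `substr()` calls assume the Supplemental Table 7 sequence is a 30-mer: 4 nt of upstream context, the 20-nt protospacer, the 3-nt PAM, then 3 nt downstream. Under that layout, positions 5-27 give protospacer + PAM, and positions 21-23 of that 23-mer give the PAM (as in the PAM section). Below is a Python sketch of the same slicing, with a sanity check that the PAM is NGG. `split_30mer` is a hypothetical helper, and the layout is the assumption just stated.

```python
def split_30mer(seq30):
    """Split a 30-nt context sequence into (protospacer, pam).

    Assumed layout: 4 nt context + 20 nt protospacer + 3 nt PAM + 3 nt context.
    R's substr(x, 5, 27) is 1-based inclusive, i.e. Python slice [4:27].
    """
    if len(seq30) != 30:
        raise ValueError(f"expected 30 nt, got {len(seq30)}")
    seq23 = seq30[4:27]
    protospacer, pam = seq23[:20], seq23[20:23]
    if pam[1:] != "GG":
        raise ValueError(f"PAM {pam!r} is not NGG")
    return protospacer, pam
```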

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/
sed '1d' Doench2014.genepercentrank.txt | awk '{print ">"$1"\n"$2}' > Doench2014.genepercentrank.fasta

blast

  • do a search for the sgRNA sequence in the genome
    • input fasta file of sequences, output coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank

/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014.genepercentrank.fasta -db ../Doench2014/GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014.genepercentrank.gRNA.blast.tab -outfmt 6 -task blastn-short -num_threads 10

awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' Doench2014.genepercentrank.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' Doench2014.genepercentrank.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > Doench2014.genepercentrank.gRNA.blast.bed
# 105959
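The two `awk` lines reorder BLAST's tabular (`-outfmt 6`) subject coordinates so that start < end on either strand. BLAST reports 1-based, inclusive positions, whereas BED is 0-based and half-open. A strict conversion therefore subtracts 1 from the start; the awk commands above keep BLAST's start unchanged. Below is a Python sketch of the strict conversion, assuming the standard 12 columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). Unlike the awk version, it also records the strand. The score column is a placeholder 0.

```python
def blast_tab_to_bed(lines):
    """Convert BLAST -outfmt 6 lines to BED6 tuples (chrom, start0, end, name, score, strand).

    start0 = BLAST's 1-based start minus 1, end unchanged (half-open interval).
    Score is a placeholder 0; strand is '-' when sstart > send.
    """
    records = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        qseqid, sseqid = fields[0], fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        strand = "+" if sstart <= send else "-"
        lo, hi = min(sstart, send), max(sstart, send)
        records.append((sseqid, lo - 1, hi, qseqid, 0, strand))
    return records
```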
coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
df <- read.delim("Doench2014.genepercentrank.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID",    "nucleotide.sequence", "cut.score")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
coord <- read.delim("Doench2014.genepercentrank.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID

library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "Doench2014.genepercentrank.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
length(unique(df.coord$sgRNAID))
# 1826

RNA structure (ViennaRNA)

https://www.tbi.univie.ac.at/RNA/tutorial/ minimum free energy (MFE) structure

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/vienna
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/vienna
RNAfold < ../Doench2014.genepercentrank.fasta > Doench2014.genepercentrank.gRNA.ViennaRNA.output.txt

grep '(' Doench2014.genepercentrank.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.txt
grep '>' Doench2014.genepercentrank.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Doench2014.genepercentrank.gRNA.names.txt
paste Doench2014.genepercentrank.gRNA.names.txt Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.txt > Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.id.txt
cp Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.id.txt ../.
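The `grep`/`paste` pipeline depends on the names file and the values file lining up line by line. Below is a slightly more defensive Python sketch that pairs each FASTA header with the MFE on its structure line. It assumes RNAfold's default output for FASTA input: header, sequence, then the dot-bracket structure followed by the energy in parentheses, e.g. `((((....)))) ( -3.40)`. `parse_rnafold` is an illustrative name.

```python
import re

# energy in parentheses at the end of the structure line, e.g. "( -3.40)" or "(  0.00)"
_MFE = re.compile(r"\(\s*([+-]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(text):
    """Return [(name, mfe), ...] from RNAfold stdout (assumed default format)."""
    records = []
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            continue
        match = _MFE.search(line)
        if match and name is not None:
            records.append((name, float(match.group(1))))
            name = None  # one MFE per record
    return records
```

Usage would be along the lines of `parse_rnafold(open("Doench2014.genepercentrank.gRNA.ViennaRNA.output.txt").read())`, then writing the pairs as a two-column TSV.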

Temperature of melting (Tm)

https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

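# Biopython nearest-neighbor Tm, signature kept for reference (the Tm feature below uses the basic GC formula instead):
# Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)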

https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank
python3

from Bio import SeqIO

# per-sgRNA base counts; the "Window" column holds the sgRNA ID and "CG%" is a GC fraction (0-1), not a percentage
with open('Doench2014.genepercentrank.fasta', 'r') as input_file, open('nucleotide_counts_sgRNA.tsv', 'w') as output_file:
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        gene_name = cur_record.name
        A_count = cur_record.seq.count('A')
        C_count = cur_record.seq.count('C')
        G_count = cur_record.seq.count('G')
        T_count = cur_record.seq.count('T')
        length = len(cur_record.seq)
        cg_fraction = float(C_count + G_count) / length
        output_file.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % (gene_name, A_count, C_count, G_count, T_count, length, cg_fraction))

exit()

# Melting temperature (°C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "Doench2014.genepercentrank.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
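The R step applies the basic GC-content formula from the comment above: Tm (°C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC). This formula is usually quoted for oligos longer than 13 nt. Below is a Python sketch of the same arithmetic for one sequence, useful for spot-checking values in the output table. `basic_tm` is an illustrative name.

```python
def basic_tm(seq):
    """Basic GC-content melting temperature (deg C), same formula as the R step.

    The sequence is upper-cased first; only A/C/G/T are counted.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence has no A/C/G/T")
    return 64.9 + 41 * (g + c - 16.4) / total
```

For example, a 20-mer with 10 G/C gives 64.9 + 41 * (10 - 16.4) / 20 = 51.78.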

Positional Encoding

# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/
cut -f 1-2 Doench2014.genepercentrank.txt > Doench2014.genepercentrank.noscore.txt

python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Doench2014.genepercentrank.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Doench2014.genepercentrank.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Doench2014.genepercentrank.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Doench2014.genepercentrank.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/

sed '1d' Doench2014.genepercentrank.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014.genepercentrank_dep1.txt
sed '1d' Doench2014.genepercentrank.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014.genepercentrank_dep2.txt
sed '1d' Doench2014.genepercentrank.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014.genepercentrank_dep3.txt
sed '1d' Doench2014.genepercentrank.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014.genepercentrank_dep4.txt

python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Doench2014.genepercentrank.noscore.txt
sed '1d' Doench2014.genepercentrank.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Doench2014.genepercentrank_ind1.txt
sed '1d' Doench2014.genepercentrank.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Doench2014.genepercentrank_ind2.txt
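The `kmer*_positional_encode.py` and `encode_sequences.py` scripts are not reproduced in this notebook. Judging only from the output names and the `ind1`/`ind2` headers (A C T G; AA AC AT AG ...), the features are position-independent k-mer counts ("ind") and position-dependent k-mer indicators ("dep"). Below is a hypothetical sketch of both encodings, using the A, C, T, G ordering from those headers. Whether the real scripts count k-mers or only flag their presence is an assumption.

```python
from itertools import product

BASES = "ACTG"  # ordering taken from the ind1/ind2 headers above


def _kmers(k):
    return ["".join(p) for p in product(BASES, repeat=k)]


def independent_counts(seq, k):
    """Position-independent overlapping k-mer counts, keyed in ACTG product order."""
    counts = dict.fromkeys(_kmers(k), 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skips windows containing N or other symbols
            counts[kmer] += 1
    return counts


def dependent_onehot(seq, k):
    """Position-dependent one-hot: a 0/1 flag for every (start position, k-mer) pair."""
    kmers = _kmers(k)
    features = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        features.extend(1 if window == kmer else 0 for kmer in kmers)
    return features
```

For a 20-nt protospacer this gives 4**k independent columns and (21 - k) * 4**k dependent columns, e.g. 304 for k = 2.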

Quantum tensors

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/
sed '1d' Doench2014.genepercentrank.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Doench2014.genepercentrank.sequence.txt


# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test


library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.genepercentrank.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
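The R block joins each base of the 20-nt sequence to that nucleotide's row of the tensor table. It then spreads the result to one column per position/property pair, using the `dcast(sgRNAID ~ position + variable)` naming `p<position>_<property>`. Below is a Python sketch of the same mapping. The property name and values in the test are made-up placeholders, not the contents of `protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt`.

```python
def positional_properties(seq, table):
    """Map each base of seq to its per-nucleotide property values.

    table: {base: {property_name: value}} (assumed shape of the tensor table).
    Columns are named p<position>_<property> with 1-based positions.
    """
    features = {}
    for pos, base in enumerate(seq, start=1):
        if base not in table:
            raise KeyError(f"no tensor values for base {base!r} at position {pos}")
        for prop, value in table[base].items():
            features[f"p{pos}_{prop}"] = value
    return features
```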

PAM

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
df <- read.delim("Doench2014.genepercentrank.ngg.txt")
df$ngg <- substr(df$nucleotide.sequence, 21, 23)
df$nucleotide.sequence <- substr(df$nucleotide.sequence, 1, 20)
df$pam.distance <- 1
write.table(df, "Doench2014.genepercentrank.sgRNA.closestPAM.bed", quote=F, row.names=F, sep='\t')

location relative to gene

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank
cut -f 1-4 Doench2014.genepercentrank.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Doench2014.genepercentrank.sgRNA.coord.bed
bedtools closest -a Doench2014.genepercentrank.sgRNA.coord.bed -b ../Doench2014/GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Doench2014.genepercentrank.sgRNA.gene.closest.bed

Feature matrix

# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
structure <- read.delim("Doench2014.genepercentrank.gRNA.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Doench2014.genepercentrank.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014.genepercentrank.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
onehot.ind1 <- read.delim("Doench2014.genepercentrank_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014.genepercentrank_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014.genepercentrank_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014.genepercentrank_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014.genepercentrank_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014.genepercentrank_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
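# drop the last column of each dep table (the empty field left by the trailing space from awk's gsub);
# parenthesise ncol()-1, since 1:ncol(x)-1 parses as (1:ncol(x))-1
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")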

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Doench2014.genepercentrank.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
length(unique(df.id$sgRNAID))
# 1825 sgRNAIDs

# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
sgRNA.pam <- read.table("Doench2014.genepercentrank.sgRNA.closestPAM.bed", header=T, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(1,4,5)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df
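The PAM one-hot keys on the variable N of the NGG PAM. A reverse-strand hit appears as CCN, so CCT counts as A, CCG as C, CCA as T and CCC as G. Below is a minimal Python sketch of the same mapping. The function name `encode_pam` is hypothetical and not part of the pipeline:

```python
# Complement table used to read the N of NGG from a reverse-strand CCN code.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def encode_pam(pam_code: str) -> dict:
    """One-hot encode the variable PAM base, mirroring the R ifelse() rules.

    NGG codes give N directly; CCN codes are reverse complements, so the third
    base is complemented (CCT -> A). Unrecognised codes give all zeros.
    """
    if len(pam_code) == 3 and pam_code.endswith("GG"):
        n = pam_code[0]
    elif len(pam_code) == 3 and pam_code.startswith("CC"):
        n = COMPLEMENT.get(pam_code[2], "")
    else:
        n = ""
    # same column order as the R code: PAM.A, PAM.C, PAM.T, PAM.G
    return {f"PAM.{b}": int(n == b) for b in "ACTG"}
```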

score <- read.delim("Doench2014.genepercentrank.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5,7)]
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df <- read.delim("Doench2014.genepercentrank.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))

df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 1825
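Each feature block is melted to long format and cast back to one row per sgRNA with `dcast(..., fun.aggregate=mean, na.rm=TRUE)`. If an sgRNA has no value for a feature, that cell becomes `mean()` of an empty vector, which is `NaN`. This is why `na.omit()` after the cast drops incomplete sgRNAs. Below is a rough Python sketch of that cast using plain dicts. `pivot_mean` and `drop_incomplete` are hypothetical helpers, not pipeline code:

```python
import math
from collections import defaultdict


def _mean_or_nan(values):
    # mean(numeric(0)) is NaN in R; mirror that for empty cells
    return sum(values) / len(values) if values else math.nan


def pivot_mean(records):
    """Cast long rows (sgRNAID, cut.score, feature, value) to one dict per sgRNA.

    Repeated (sgRNA, feature) values are averaged and NaN/None values are
    skipped (fun.aggregate = mean, na.rm = TRUE); missing cells become NaN.
    """
    cells = defaultdict(list)
    rows, features = {}, {}  # dicts used as insertion-ordered sets
    for sg_id, score, feature, value in records:
        key = (sg_id, score)
        rows.setdefault(key, None)
        features.setdefault(feature, None)
        if value is not None and not math.isnan(value):
            cells[(key, feature)].append(float(value))
    return {
        key: {f: _mean_or_nan(cells.get((key, f), [])) for f in features}
        for key in rows
    }


def drop_incomplete(wide):
    """Keep rows with no NaN cell, like na.omit() on the cast table."""
    return {k: row for k, row in wide.items()
            if not any(math.isnan(x) for x in row.values())}
```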


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
sgRNA.genes <- read.table("Doench2014.genepercentrank.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)

df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 1825

df.final <- df.pam.location[,c(1:3,5:5915,5917:5921)]
ncol(df.final)
# 5919
write.table(df.final, "Doench2014.genepercentrank.raw.matrix.txt", quote=F, row.names=F, sep="\t")

# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")


# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")


# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")


# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")


# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
seq <- read.delim("Doench2014.genepercentrank.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "") 

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014.genepercentrank.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014.genepercentrank.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
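The dimer, trimer and tetramer blocks use chained `unite()` calls to turn the single-base columns p1..p20 into overlapping k-mers. A 20-nt spacer therefore yields 19 dimers, 18 trimers and 17 tetramers. That is why the code keeps `seq.dimer[,1:20]`, `seq.trimer[,1:19]` and `seq.tetramer[,1:18]`, each including the sgRNAID column.

Below is a Python sketch of the same windowing and per-k-mer lookup. Both helper names are hypothetical. The `table` argument is an assumed k-mer to {feature: value} mapping that stands in for the HL.Bond.*.txt files, and the value in the test is a toy number:

```python
def sliding_kmers(seq, k):
    """Overlapping k-mers keyed p1, p2, ... like the chained unite() columns."""
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_features(seq, k, table, feature_names):
    """Join each positional k-mer to its tensor row (left-join style).

    Column names follow the dcast "position_variable" pattern; k-mers missing
    from the table give None, as left_join would leave NA.
    """
    out = {}
    for pos, kmer in sliding_kmers(seq, k).items():
        row = table.get(kmer, {})
        for name in feature_names:
            out[f"{pos}_{name}"] = row.get(name)
    return out
```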



setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/")
monomer <- read.delim("Doench2014.genepercentrank.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Doench2014.genepercentrank.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Doench2014.genepercentrank.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Doench2014.genepercentrank.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Doench2014.genepercentrank.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)

monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Doench2014.genepercentrank.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")

# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
df <- read.delim("Doench2014.genepercentrank.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
tensor <- read.delim("Doench2014.genepercentrank.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 1825

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Doench2014.genepercentrank.finalquantum.txt", quote=F, row.names=F, sep="\t")

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank")
df <- read.delim("Doench2014.genepercentrank.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"


write.table(df.all, "Doench2014.genepercentrank.finalquantum.df.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.genepercentrank.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014.genepercentrank.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014.genepercentrank.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.genepercentrank.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014.genepercentrank.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014.genepercentrank.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014.genepercentrank.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/Doench2014.genepercentrank.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/Doench2014.genepercentrank.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/Submits/submit_full_Doench2014.genepercentrank.finalquantum_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/Submits/submit_train_Doench2014.genepercentrank.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/Submits/submit_test_Doench2014.genepercentrank.finalquantum_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# YNames.txt contains the response name: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014.genepercentrank.finalquantum
# 0.2755723456206027

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014.genepercentrank.finalquantum_cut.score.importance4 | head
# p20monomer.HLgap.eVraw: 6.55918
# p20monomer.No.electronsraw: 5.46649
# GGsgRNA.raw: 4.50132
# AsgRNA.raw: 3.85251
# sgRNA.structuresgRNA.raw: 3.82253
# p18trimer.Hbond.energyraw: 3.6557
# p17tetramer.Hlgap.eVEraw: 2.57827
# p17tetramer.Hbond.energyraw: 1.76065
# p15dimer.Hbond.stackingraw: 1.63949
# p16tetramer.Hbond.energyraw: 1.40244

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014.genepercentrank.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.566357
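By default `cor()` computes Pearson's r, here between the observed cut.score and the held-out predictions. For reference, here is a dependency-free Python version of the statistic. `pearson_r` is a hypothetical helper, and the two inputs must be in the same sgRNA order:

```python
import math


def pearson_r(x, y):
    """Pearson correlation, as R's cor(x, y) with the default method."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```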

RIT
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014.genepercentrank.finalquantum

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014.genepercentrank/iRF.run/Doench2014.genepercentrank.finalquantum/cut.score/RIT.run

Run with Supp Table 10

sgRNA dataset
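# run locally, after the mkdir below has created the destination directory; quote the source path because it contains spaces and parentheses
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/human/Doench.et.al.2014.supp10.txt" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.score.txt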

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED

R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
df <- read.delim("Doench2014CORRECTED.score.txt", header=T, sep="\t")

library(dplyr)
df2 <- df %>% mutate(id = row_number())
df3 <- df2[,c(3,2,13)]
colnames(df3) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
df3$nucleotide.sequence <- substr(df3$nucleotide.sequence, 5, 27)
df.na <- na.omit(df3)
write.table(df.na, "Doench2014CORRECTED.ngg.txt", quote=F, row.names=F, sep="\t")

df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Doench2014CORRECTED.txt", quote=F, row.names=F, sep="\t")
# 1278
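The supplementary table stores a longer target context. `substr(x, 5, 27)` keeps the 20-nt spacer plus its 3-nt PAM (1-based positions 5-27). `substr(x, 1, 20)` then trims to the spacer, and the PAM section later recovers the PAM as `substr(x, 21, 23)`. Together these imply a 4-nt 5' flank.

In Python's 0-based slices the same cut looks like the sketch below. `split_target` and `is_ngg` are hypothetical helpers, and the 30-nt example context in the test is made up:

```python
def split_target(context, flank5=4, spacer_len=20, pam_len=3):
    """Return (spacer, pam) from a target context string.

    Mirrors substr(x, 5, 27) followed by substr(..., 1, 20) and
    substr(..., 21, 23): R's 1-based positions 5..27 are Python's [4:27].
    """
    core = context[flank5:flank5 + spacer_len + pam_len]
    if len(core) != spacer_len + pam_len:
        raise ValueError("context too short for flank + spacer + PAM")
    return core[:spacer_len], core[spacer_len:]


def is_ngg(pam):
    """True for an NGG PAM."""
    return len(pam) == 3 and pam[1:] == "GG"
```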

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/
sed '1d' Doench2014CORRECTED.txt | awk '{print ">"$1"\n"$2}' > Doench2014CORRECTED.fasta

blast

  • do a search for the sgRNA sequence in the genome
    • input fasta file of sequences, output coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED

/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query Doench2014CORRECTED.fasta -db ../Doench2014/GCF_000001405.39_GRCh38.p13_genomic.fna -out Doench2014CORRECTED.gRNA.blast.tab -outfmt 6 -task blastn-short -num_threads 10

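# BLAST -outfmt 6: $1 qseqid, $2 sseqid, $9 sstart, $10 send; sstart > send marks a minus-strand hit.
# Both passes write chr, min(sstart,send), max(sstart,send), sgRNA ID. BLAST positions are 1-based inclusive and the start is not shifted by -1, so this is BED layout with 1-based starts.
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' Doench2014CORRECTED.gRNA.blast.tab > tmp1.bed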
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' Doench2014CORRECTED.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > Doench2014CORRECTED.gRNA.blast.bed
# 42037
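The two awk passes reorder sstart/send so that start < end on both strands. The Python sketch below does the same in one pass. As a variation not used above, it can also shift the start by -1 to give strict 0-based BED starts, and it records the strand. `blast_to_bed` is hypothetical, and the column positions assume the default `-outfmt 6` field order:

```python
def blast_to_bed(line, zero_based=True):
    """Convert one tabular BLAST hit (-outfmt 6) to (chrom, start, end, name, strand).

    Default outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore; sstart > send is a minus-strand hit.
    """
    f = line.rstrip("\n").split("\t")
    qseqid, sseqid = f[0], f[1]
    sstart, send = int(f[8]), int(f[9])
    strand = "-" if sstart > send else "+"
    lo, hi = min(sstart, send), max(sstart, send)
    # BLAST is 1-based inclusive; BED is 0-based half-open
    start = lo - 1 if zero_based else lo
    return (sseqid, start, hi, qseqid, strand)
```

Calling it with `zero_based=False` reproduces the start column written by the awk commands.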

coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
df <- read.delim("Doench2014CORRECTED.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
coord <- read.delim("Doench2014CORRECTED.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID

library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "Doench2014CORRECTED.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
length(unique(df.coord$sgRNAID))
#1278

RNA structure (ViennaRNA)

Minimum free energy (MFE) structure, following the ViennaRNA tutorial: https://www.tbi.univie.ac.at/RNA/tutorial/

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

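mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/vienna
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/vienna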
RNAfold < ../Doench2014CORRECTED.fasta > Doench2014CORRECTED.gRNA.ViennaRNA.output.txt

grep '(' Doench2014CORRECTED.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Doench2014CORRECTED.gRNA.ViennaRNA.output.value.txt
grep '>' Doench2014CORRECTED.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Doench2014CORRECTED.gRNA.names.txt
paste Doench2014CORRECTED.gRNA.names.txt Doench2014CORRECTED.gRNA.ViennaRNA.output.value.txt > Doench2014CORRECTED.gRNA.ViennaRNA.output.value.id.txt
cp Doench2014CORRECTED.gRNA.ViennaRNA.output.value.id.txt ../.
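The two `grep` calls rely on RNAfold's plain-text layout: a `>name` line, the sequence, then the dot-bracket structure with the MFE in parentheses at the end of the line. Below is a Python sketch that pairs each name with its energy directly, so the two `paste`d files cannot drift out of step. `parse_rnafold` is hypothetical, and the energies in the test's sample text are made up:

```python
import re

# MFE at the end of a structure line, e.g. "..((...)).. ( -1.20)"
ENERGY_RE = re.compile(r"\(\s*([-+]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(text):
    """Map each FASTA name to its MFE (kcal/mol) from RNAfold stdout."""
    mfe, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].strip()
        elif name is not None:
            m = ENERGY_RE.search(line)
            if m:  # only the structure line carries a parenthesised energy
                mfe[name] = float(m.group(1))
    return mfe
```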

Melting temperature (Tm)

https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

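# Biopython nearest-neighbour Tm, signature for reference (the steps below use the basic GC-content formula instead):
Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)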

https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED
python3

from Bio import SeqIO

# per-sgRNA base counts; the "CG%" column holds the G+C fraction (0-1), not a percentage
with open('Doench2014CORRECTED.fasta', 'r') as input_file, open('nucleotide_counts_sgRNA.tsv', 'w') as output_file:
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        gene_name = cur_record.name
        A_count = cur_record.seq.count('A')
        C_count = cur_record.seq.count('C')
        G_count = cur_record.seq.count('G')
        T_count = cur_record.seq.count('T')
        length = len(cur_record.seq)
        cg_percentage = float(C_count + G_count) / length
        output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
            (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
        output_file.write(output_line)

exit()

# Melting temperature (°C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)   (basic GC-content formula)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "Doench2014CORRECTED.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
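The GC fraction from the Python counter and the `MeltingTemp` column added in R can be reproduced per sequence with the sketch below. It assumes uppercase A/C/G/T input. Note one difference: in the counter above, other characters still count toward `Length` and the GC denominator, but they are excluded from the A+T+G+C denominator of the Tm formula. The function names are hypothetical:

```python
def base_counts(seq):
    """Counts of A, C, G, T (non-overlapping, case-sensitive)."""
    return {b: seq.count(b) for b in "ACGT"}


def gc_fraction(seq):
    """(G + C) / length, the value written under the 'CG%' header."""
    c = base_counts(seq)
    return (c["G"] + c["C"]) / len(seq)


def basic_tm(seq):
    """64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC), in degrees C."""
    c = base_counts(seq)
    n = c["A"] + c["T"] + c["G"] + c["C"]
    return 64.9 + 41 * (c["G"] + c["C"] - 16.4) / n
```

For a 20-mer with 10 G/C bases this gives 64.9 + 41 * (10 - 16.4) / 20 = 51.78 °C.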

Positional Encoding

# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/
cut -f 1-2 Doench2014CORRECTED.txt > Doench2014CORRECTED.noscore.txt

python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Doench2014CORRECTED.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Doench2014CORRECTED.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Doench2014CORRECTED.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Doench2014CORRECTED.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/

sed '1d' Doench2014CORRECTED.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014CORRECTED_dep1.txt
sed '1d' Doench2014CORRECTED.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014CORRECTED_dep2.txt
sed '1d' Doench2014CORRECTED.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014CORRECTED_dep3.txt
sed '1d' Doench2014CORRECTED.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Doench2014CORRECTED_dep4.txt

python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Doench2014CORRECTED.noscore.txt
sed '1d' Doench2014CORRECTED.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Doench2014CORRECTED_ind1.txt
sed '1d' Doench2014CORRECTED.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Doench2014CORRECTED_ind2.txt
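The `kmer*_positional_encode.py` and `encode_sequences.py` scripts are not reproduced here. As a hedged illustration of the two kinds of feature their outputs are named for, the sketch below builds a position-dependent one-hot (one indicator per k-mer per start position) and a position-independent k-mer count.

The A, C, T, G ordering follows the ind1/ind2 headers above. The following are assumptions, not taken from those scripts:
- that the real scripts count k-mers rather than flag presence;
- how they order the dependent columns;
- the helper names.

```python
from itertools import product

ALPHABET = "ACTG"  # order used in the ind1/ind2 headers (A C T G; AA AC AT AG ...)


def kmer_vocab(k):
    """All 4**k k-mers in header order."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def independent_counts(seq, k):
    """Count of each k-mer anywhere in the sequence (position-independent)."""
    vocab = kmer_vocab(k)
    counts = dict.fromkeys(vocab, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return [counts[v] for v in vocab]


def dependent_onehot(seq, k):
    """One 4**k indicator block per start position (position-dependent)."""
    vocab = kmer_vocab(k)
    index = {v: j for j, v in enumerate(vocab)}
    out = []
    for i in range(len(seq) - k + 1):
        block = [0] * len(vocab)
        j = index.get(seq[i:i + k])
        if j is not None:
            block[j] = 1
        out.extend(block)
    return out
```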

Quantum tensors

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/
sed '1d' Doench2014CORRECTED.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Doench2014CORRECTED.sequence.txt


# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014CORRECTED.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")

PAM

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
df <- read.delim("Doench2014CORRECTED.ngg.txt")
df$ngg <- substr(df$nucleotide.sequence, 21, 23)
df$nucleotide.sequence <- substr(df$nucleotide.sequence, 1, 20)
df$pam.distance <- 1
write.table(df, "Doench2014CORRECTED.sgRNA.closestPAM.bed", quote=F, row.names=F, sep='\t')

location relative to gene

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED
cut -f 1-4 Doench2014CORRECTED.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Doench2014CORRECTED.sgRNA.coord.bed
bedtools closest -a Doench2014CORRECTED.sgRNA.coord.bed -b ../Doench2014/GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Doench2014CORRECTED.sgRNA.gene.closest.bed
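With `-D b`, `bedtools closest` reports the nearest gene for each sgRNA plus a distance signed relative to the gene's strand. The distance is 0 for overlap and negative when the sgRNA lies upstream of the gene. The feature matrix reads this distance from column 14: 4 BED columns + 9 GTF columns + 1.

The sketch below approximates that sign convention for a single gene in half-open coordinates. It is an illustration only: bedtools' handling of ties and book-ended features, and its exact distance arithmetic, may differ by one. `signed_gene_distance` is hypothetical:

```python
def signed_gene_distance(sg_start, sg_end, gene_start, gene_end, gene_strand):
    """Signed gap between an sgRNA and a gene, half-open [start, end) coordinates.

    0 if they overlap; negative if the sgRNA is upstream of the gene with
    respect to the gene's strand (the -D b convention), positive if downstream.
    """
    if sg_start < gene_end and gene_start < sg_end:
        return 0
    if sg_end <= gene_start:          # sgRNA lies to the left of the gene
        gap, left = gene_start - sg_end, True
    else:                             # sgRNA lies to the right of the gene
        gap, left = sg_start - gene_end, False
    # on the minus strand, "upstream" means higher coordinates
    upstream = left if gene_strand == "+" else not left
    return -gap if upstream else gap
```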

Feature matrix

# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
structure <- read.delim("Doench2014CORRECTED.gRNA.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Doench2014CORRECTED.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014CORRECTED.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")

# structure (MFE), GC content, melting temperature
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
onehot.ind1 <- read.delim("Doench2014CORRECTED_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Doench2014CORRECTED_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Doench2014CORRECTED_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Doench2014CORRECTED_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Doench2014CORRECTED_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Doench2014CORRECTED_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last column of each dep table (the empty field left by the trailing space from the awk character split)
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Doench2014CORRECTED.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# 1277 sgRNAIDs

# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
sgRNA.pam <- read.table("Doench2014CORRECTED.sgRNA.closestPAM.bed", header=T, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(1,4,5)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
# one-hot encode the variable base N of the NGG PAM; CCN codes are the reverse complement (e.g. CCT = AGG)
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(
  PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0),
  PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0),
  PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0),
  PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df

score <- read.delim("Doench2014CORRECTED.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5,7)]
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df <- read.delim("Doench2014CORRECTED.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))

df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 1277
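Each feature table is first built in long format, with columns `sgRNAID`, `cut.score`, `feature.scale` and `value`. The `feature.scale` key is the feature name pasted to its scale label by `unite(..., sep = "")`, for example `PAM.C` + `0` → `PAM.C0`. The table is then pivoted wide with `dcast(..., fun.aggregate=mean, na.rm=TRUE)`. Below is a minimal Python sketch of that pivot; the record layout is an assumption modelled on the R columns. Duplicate (sgRNA, feature) entries are averaged and missing values are skipped.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# (sgRNAID, cut.score, feature, scale, value)
Record = Tuple[str, float, str, str, Optional[float]]


def long_to_wide(records: Iterable[Record]) -> Dict[Tuple[str, float], Dict[str, float]]:
    """Pivot long rows to {(sgRNAID, cut.score): {feature+scale: mean value}}.

    Cells with no non-missing values are simply absent here; dcast would
    return NaN for them instead.
    """
    cells: Dict[Tuple[str, float], Dict[str, list]] = defaultdict(lambda: defaultdict(list))
    for sg_id, score, feature, scale, value in records:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue                                  # na.rm=TRUE
        cells[(sg_id, score)][feature + str(scale)].append(value)  # unite(sep = "")
    return {
        row: {col: sum(vals) / len(vals) for col, vals in cols.items()}
        for row, cols in cells.items()
    }


records = [
    ("sg1", 0.5, "PAM.A", "0", 1.0),
    ("sg1", 0.5, "PAM.A", "0", 0.0),
    ("sg1", 0.5, "gene.distance", "0", 120.0),
    ("sg2", 0.1, "PAM.A", "0", None),
]
print(long_to_wide(records))
```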


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
sgRNA.genes <- read.table("Doench2014CORRECTED.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)

df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 1277

df.final <- df.pam.location[,c(1:3,5:5915,5917:5921)]
ncol(df.final)
# 5919
write.table(df.final, "Doench2014CORRECTED.raw.matrix.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)  # unite() is used in the dimer/trimer/tetramer blocks

# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")


# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")


# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")


# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")


# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
seq <- read.delim("Doench2014CORRECTED.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "") 

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Doench2014CORRECTED.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Doench2014CORRECTED.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
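The long `unite()` chains in the Dimer, Trimer and Tetramer blocks build overlapping k-mers from the single-base columns `p1` to `p20`. Position i holds bases i to i+k-1, so a 20-nt guide yields 19 dimers, 18 trimers and 17 tetramers. These counts match the `[,1:20]`, `[,1:19]` and `[,1:18]` column selections, which also include `sgRNAID`. Below is a compact Python sketch of the same windowing, assuming one sequence string per sgRNA.

```python
from typing import Dict


def overlapping_kmers(seq: str, k: int) -> Dict[str, str]:
    """Return {"p1": seq[0:k], "p2": seq[1:k+1], ...} for every full window."""
    if k < 1:
        raise ValueError("k must be >= 1")
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}


guide = "GACGCATAAAGATGAGACGC"  # 20 nt
dimers = overlapping_kmers(guide, 2)
print(len(dimers), dimers["p1"], dimers["p19"])  # 19 GA GC
```

Each k-mer string then plays the role of `base` when joining to the corresponding tensor table.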



setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
monomer <- read.delim("Doench2014CORRECTED.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Doench2014CORRECTED.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Doench2014CORRECTED.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Doench2014CORRECTED.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Doench2014CORRECTED.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)

monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Doench2014CORRECTED.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
df <- read.delim("Doench2014CORRECTED.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
tensor <- read.delim("Doench2014CORRECTED.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 16748

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Doench2014CORRECTED.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
df <- read.delim("w", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"


write.table(df.all, "Doench2014CORRECTED.finalquantum.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014CORRECTED.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Doench2014CORRECTED.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Doench2014CORRECTED.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014CORRECTED.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Doench2014CORRECTED.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Doench2014CORRECTED.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014CORRECTED.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/Submits/submit_full_Doench2014CORRECTED.finalquantum_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/Submits/submit_train_Doench2014CORRECTED.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/Submits/submit_test_Doench2014CORRECTED.finalquantum_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt
# YNames.txt lists the response name(s), one per line:
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/iRF.run/YNames.txt Doench2014CORRECTED.finalquantum
# 0.38912071429062073

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014CORRECTED.finalquantum_cut.score.importance4 | head
# p19dimer.Hbond.stackingraw: 3.98127
# V247.xsgRNA.raw: 2.42589
# PAM.C0: 1.86213
# p15dimer.Hbond.stackingraw: 1.20198
# p14dimer.Hbond.energyraw: 1.09491
# p13tetramer.Hbond.stackingraw: 0.796051
# p3tetramer.Hbond.energyraw: 0.746174
# p20monomer.HLgap.eVraw: 0.700829
# p20monomer.No.electronsraw: 0.529749
# p14trimer.Hbond.energyraw: 0.500073

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014CORRECTED.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.6525512
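`cor()` defaults to the Pearson coefficient: the covariance of the held-out `cut.score` values and the iRF predictions, divided by the product of their standard deviations. Below is a dependency-free Python sketch that could be used to cross-check the R value on the same two columns. On Python 3.10+, `statistics.correlation` computes the same quantity.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
```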
RIT
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014CORRECTED.finalquantum

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/cut.score/RIT.run

# columns: Feature  YVec  NormEdge  Effect  Samples  Linearity (names as assigned in the Figures step)
# p19dimer.Hbond.stackingraw    cut.score   0.16562004904649372 1.800345917904429e-05   1213.151    0.17853491514892303
# V247.xsgRNA.raw   cut.score   0.09835326754653839 0.0010176665362924215   924.223 0.1464096649214905
# PAM.C0    cut.score   0.09179251740061076 7.345814511089212e-06   961.303 0.2927093722046682
# p3tetramer.Hbond.energyraw    cut.score   0.042723720919565535    -2.9640260062614164e-06 348.364 0.23253379511409317
# p14dimer.Hbond.energyraw  cut.score   0.04044322736269804 -8.124362014625185e-06  378.372 0.23879281375413255
# p15dimer.Hbond.stackingraw    cut.score   0.038112174026197654    -0.0006000076543151262  485.299 0.19449052993622917
# p20monomer.No.electronsraw    cut.score   0.03045501914851094 0.0007592500338267883   386.478 0.12710706117366816
# p17tetramer.Hbond.stackingraw cut.score   0.028125521497537727    -1.4651686023597412e-06 250.067 0.17998194182849084
# p2dimer.HLgap.eVEraw  cut.score   0.027402481292298345    0.0003752894963077648   425.63  0.13126023383927354
# p14dimer.HLgap.eVEraw cut.score   0.026716706826732422    0.0007290664463055949   169.792 0.16361717817517185
Figures
library(dplyr)
library(ggplot2)
library(reshape2)
library(RColorBrewer)

## Main H. sapiens feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/Doench2014CORRECTED.finalquantum/cut.score")
imp <- read.delim("Doench2014CORRECTED.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
imp.dir.top20.df$Feature.Label <- c()  # fill with 20 display labels (one per row, in row order) before plotting

library(ggplot2)
pdf("Doench2014CORRECTED.FeatureEngineering.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Doench2014CORRECTED Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()


pdf("Doench2014CORRECTED.FeatureEngineering.nocolor.pdf")
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir), color="black") + geom_point(size=3) + geom_segment(aes(x=Feature.Label, xend=Feature.Label, y=0, yend=imp.dir)) + labs(title="Doench2014CORRECTED Top Features") + ylab("Normalized Importance") + xlab("") + theme(axis.text.x = element_text(angle=90, vjust=0.6)) + theme_classic() + coord_flip()
dev.off()
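The lollipop plots show the top 20 features by normalized importance. A feature's value is flipped below zero when its RIT effect is negative (`imp.dir`). Below is a small Python sketch of that transformation. It assumes rows of (feature, normalized importance, effect) already sorted by importance, like the `effect_sorted` table. The two example rows are copied from the RIT output above.

```python
from typing import List, Tuple


def signed_importance(rows: List[Tuple[str, float, float]],
                      top: int = 20) -> List[Tuple[str, float, str]]:
    """Return (feature, signed importance, direction) for the first `top` rows."""
    out = []
    for feature, importance, effect in rows[:top]:
        direction = "neg" if effect < 0 else ("pos" if effect > 0 else "zero")
        signed = -importance if direction == "neg" else importance
        out.append((feature, signed, direction))
    return out


rows = [
    ("p19dimer.Hbond.stackingraw", 0.16562004904649372, 1.800345917904429e-05),
    ("p3tetramer.Hbond.energyraw", 0.042723720919565535, -2.9640260062614164e-06),
]
for feature, value, direction in signed_importance(rows):
    print(feature, value, direction)
```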

Chuai 2018: New Human Dataset to Assess

https://github.com/Peppags/CNN-SVR/blob/master/data/training_example.csv

https://www.frontiersin.org/articles/10.3389/fgene.2019.01303/full

In order to evaluate the performance of our method, we used four public experimental validated gRNA on-target cleavage efficacy independent human datasets, which were integrated and processed by Chuai et al (Chuai et al., 2018). These experimented-based datasets were originally collected from public datasets (Wang et al., 2014; Hart et al., 2015; Doench et al., 2016). They covered gRNAs targeting 1071 genes from four different cell lines, including HCT116 (4239 samples) (Hart et al., 2015), HEK293T (2333 samples) (Doench et al., 2016), HELA (8101 samples) (Hart et al., 2015), and HL60 (2076 samples) (Wang et al., 2014) with redundancy removed. The gRNA on-target activity was strictly restricted to experimental assay, where the cleavage efficiency was defined as the log-fold change in the measured knockout efficacy. Readouts of cleavage efficacies without in vivo (in vitro) experimental validation were excluded.

sgRNA dataset
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/Sprint.Opioid.ATAC/Genome/GCF_000001405.39_GRCh38.p13_genomic.fna" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/Sprint.Opioid.ATAC/Genome/GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/human/Chuai.et.al.2018/Chuai.2018.score.txt" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/.

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018

R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
df <- read.delim("Chuai.2018.score.txt", header=T, sep="\t")

library(dplyr)
df2 <- df %>% mutate(id = row_number())
df3 <- df2[,c(7,5,6)]
colnames(df3) <- c("sgRNAID", "nucleotide.sequence", "cut.score")
df.na <- na.omit(df3)
write.table(df.na, "Chuai2018.ngg.txt", quote=F, row.names=F, sep="\t")

df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Chuai2018.txt", quote=F, row.names=F, sep="\t")
# 16750

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/
sed '1d' Chuai2018.txt | awk '{print ">"$1"\n"$2}' > Chuai2018.fasta
coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018

sed 's/chr10/NC_000010.11/g' Chuai.2018.score.txt | sed 's/chr11/NC_000011.10/g' | sed 's/chr12/NC_000012.12/g' | sed 's/chr13/NC_000013.11/g' | sed 's/chr14/NC_000014.9/g' | sed 's/chr15/NC_000015.10/g' | sed 's/chr16/NC_000016.10/g' | sed 's/chr17/NC_000017.11/g' | sed 's/chr18/NC_000018.10/g' | sed 's/chr19/NC_000019.10/g' | sed 's/chr20/NC_000020.11/g' | sed 's/chr21/NC_000021.9/g' | sed 's/chr22/NC_000022.11/g' | sed 's/chr1/NC_000001.11/g' | sed 's/chr2/NC_000002.12/g' | sed 's/chr3/NC_000003.12/g' | sed 's/chr4/NC_000004.12/g' | sed 's/chr5/NC_000005.10/g' | sed 's/chr6/NC_000006.12/g' | sed 's/chr7/NC_000007.14/g' | sed 's/chr8/NC_000008.11/g' | sed 's/chr9/NC_000009.12/g' | sed 's/chrX/NC_000023.11/g' | sed 's/chrY/NC_000024.10/g' > Chuai.2018.score.chr.txt
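The chained `sed` calls rename UCSC-style chromosome names to the GRCh38.p13 RefSeq accessions used in the reference FASTA. Their order matters: `s/chr1/.../` would otherwise also rewrite the prefix of `chr10` to `chr19`. Below is a Python sketch that avoids the ordering issue by replacing whole field values through a lookup table. The mapping is copied from the `sed` commands. The sketch assumes the chromosome is the first tab-separated field, which is adjustable via `column`.

```python
from typing import Iterable, Iterator

# UCSC name -> RefSeq accession, copied from the sed chain above
CHROM_TO_REFSEQ = {
    "chr1": "NC_000001.11", "chr2": "NC_000002.12", "chr3": "NC_000003.12",
    "chr4": "NC_000004.12", "chr5": "NC_000005.10", "chr6": "NC_000006.12",
    "chr7": "NC_000007.14", "chr8": "NC_000008.11", "chr9": "NC_000009.12",
    "chr10": "NC_000010.11", "chr11": "NC_000011.10", "chr12": "NC_000012.12",
    "chr13": "NC_000013.11", "chr14": "NC_000014.9", "chr15": "NC_000015.10",
    "chr16": "NC_000016.10", "chr17": "NC_000017.11", "chr18": "NC_000018.10",
    "chr19": "NC_000019.10", "chr20": "NC_000020.11", "chr21": "NC_000021.9",
    "chr22": "NC_000022.11", "chrX": "NC_000023.11", "chrY": "NC_000024.10",
}


def rename_chroms(lines: Iterable[str], column: int = 0) -> Iterator[str]:
    """Replace the chromosome field (exact match only) in tab-separated lines.

    Lines are yielded without their trailing newline; unknown names are kept.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            fields[column] = CHROM_TO_REFSEQ.get(fields[column], fields[column])
        yield "\t".join(fields)


for out in rename_chroms(["chr1\t10\t30\n", "chr10\t5\t25\n"]):
    print(out)
```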

R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
df <- read.delim("Chuai.2018.score.chr.txt", header=T, sep="\t")
# add a row-number ID; keep chr, start, end, ID (twice: sgRNA and sgRNAID), sequence, score
df.id <- df %>% mutate(id = row_number())
df.coord <- df.id[,c(1:3,7,7,5,6)]
colnames(df.coord) <- c("chr", "start", "end", "sgRNA", "sgRNAID", "nucleotide.sequence", "cut.score")
write.table(df.coord, "Chuai2018.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")

RNA structure (ViennaRNA)

Minimum free energy (MFE) structure: https://www.tbi.univie.ac.at/RNA/tutorial/

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/vienna
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/vienna
RNAfold < ../Chuai2018.fasta > Chuai2018.gRNA.ViennaRNA.output.txt

grep '(' Chuai2018.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Chuai2018.gRNA.ViennaRNA.output.value.txt
grep '>' Chuai2018.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Chuai2018.gRNA.names.txt
paste Chuai2018.gRNA.names.txt Chuai2018.gRNA.ViennaRNA.output.value.txt > Chuai2018.gRNA.ViennaRNA.output.value.id.txt
cp Chuai2018.gRNA.ViennaRNA.output.value.id.txt ../.
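For each FASTA record, RNAfold writes the header, the sequence, and a dot-bracket structure followed by the MFE in parentheses (kcal/mol). The `grep` calls above pull out the energies and the names separately, and `paste` pairs them by order. Below is a Python sketch that parses both in a single pass, so names and energies cannot drift out of register. It assumes RNAfold's default text output with one `>` header per record.

```python
import re
from typing import Iterable, List, Tuple

# energy at the end of a structure line, e.g. "....((....))........ ( -1.20)"
MFE_RE = re.compile(r"\(\s*([+-]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Return [(record name, MFE)] from RNAfold text output."""
    results: List[Tuple[str, float]] = []
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = line[1:].strip()
            continue
        match = MFE_RE.search(line)
        if match and name is not None:
            results.append((name, float(match.group(1))))
            name = None
    return results


sample = [
    ">1", "GACGCAUAAAGAUGAGACGC", "....((....))........ ( -1.20)",
    ">2", "AAAAAAAAAAAAAAAAAAAA", "....................  (  0.00)",
]
print(parse_rnafold(sample))
```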

Temperature of melting (Tm)

https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)

https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
python3

from Bio import SeqIO

# count A/C/G/T per sgRNA; the "CG%" column is written as a fraction (0-1), not a percentage
with open('Chuai2018.fasta', 'r') as input_file, open('nucleotide_counts_sgRNA.tsv', 'w') as output_file:
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        gene_name = cur_record.name
        A_count = cur_record.seq.count('A')
        C_count = cur_record.seq.count('C')
        G_count = cur_record.seq.count('G')
        T_count = cur_record.seq.count('T')
        length = len(cur_record.seq)
        cg_percentage = float(C_count + G_count) / length
        output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
            (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
        output_file.write(output_line)

exit()

# Melting temperature (°C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC), the basic GC-content approximation (Tm_NN above is not used here)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "Chuai2018.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
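To spot-check individual guides, the GC fraction and the basic Tm formula above can be written in plain Python. This sketch uses hypothetical helper names. It reproduces the 64.9 + 41 * (nG + nC - 16.4) / N approximation, not the nearest-neighbour `Tm_NN` model linked earlier.

```python
def gc_fraction(seq):
    """Fraction of G + C (0-1), as written to the 'CG%' column above."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def tm_basic(seq):
    """Tm (degrees C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)."""
    seq = seq.upper()
    n = {base: seq.count(base) for base in "ACGT"}
    return 64.9 + 41 * (n["G"] + n["C"] - 16.4) / (n["A"] + n["T"] + n["G"] + n["C"])


guide = "GCGCGCGCGCAAAAAAAAAA"  # 20 nt, 10 of them G/C
print(gc_fraction(guide))         # 0.5
print(round(tm_basic(guide), 2))  # 51.78
```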

Positional Encoding

# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/
cut -f 1-2 Chuai2018.txt > Chuai2018.noscore.txt

python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Chuai2018.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Chuai2018.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Chuai2018.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Chuai2018.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/

sed '1d' Chuai2018.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018_dep1.txt
sed '1d' Chuai2018.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018_dep2.txt
sed '1d' Chuai2018.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018_dep3.txt
sed '1d' Chuai2018.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018_dep4.txt

python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Chuai2018.noscore.txt
sed '1d' Chuai2018.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Chuai2018_ind1.txt
sed '1d' Chuai2018.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Chuai2018_ind2.txt
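The `kmer*_positional_encode.py` and `encode_sequences.py` scripts are not reproduced here. The ind1/ind2 headers (A C T G; AA AC AT AG ...) suggest one plausible reading. The position-independent files would count each k-mer across the guide, and the position-dependent files would one-hot encode the k-mer starting at each position. The sketch below implements that reading. It is an assumption about the scripts, not their code, and it expects sequences containing only A, C, G and T.

```python
from itertools import product

ALPHABET = "ACTG"  # order used in the ind1/ind2 column headers above


def kmers(k):
    """All 4**k k-mers in header order (AA, AC, AT, AG, CA, ...)."""
    return ["".join(p) for p in product(ALPHABET, repeat=k)]


def position_independent(seq, k):
    """Count of each k-mer anywhere in the guide (one column per k-mer)."""
    counts = dict.fromkeys(kmers(k), 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    return [counts[km] for km in kmers(k)]


def position_dependent(seq, k):
    """Flattened one-hot: for each start position, a 4**k indicator vector."""
    vocab = kmers(k)
    features = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        features.extend(1 if km == window else 0 for km in vocab)
    return features


print(kmers(2)[:4])                          # ['AA', 'AC', 'AT', 'AG']
print(position_independent("AACG", 2)[:8])   # [1, 1, 0, 0, 0, 0, 0, 1]
print(len(position_dependent("A" * 20, 2)))  # 304 = 19 positions x 16 dimers
```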

Quantum tensors

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/
sed '1d' Chuai2018.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Chuai2018.sequence.txt


# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
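Together, the melt, left_join and dcast steps above perform a per-position lookup. Each base at position pN is replaced by its row of tensor values, and the result is spread into columns named `<position>_<feature>`. Below is a minimal Python sketch using a made-up toy table; the real feature names and values come from the tensor file, which is not shown. Unlike `left_join`, which leaves NA for a base missing from the table, this sketch raises `KeyError`.

```python
def tensor_features(seq, table):
    """Wide feature dict keyed like the dcast output: 'p<i>_<feature>'.

    table maps a base to {feature name: value}; positions are named p1..pN
    to match the columns of Chuai2018.sequence.txt.
    """
    row = {}
    for i, base in enumerate(seq, start=1):
        for feature, value in table[base].items():
            row[f"p{i}_{feature}"] = value
    return row


# toy lookup table with made-up values, for illustration only
toy_table = {
    "A": {"f1": 1.0, "f2": 0.1},
    "C": {"f1": 2.0, "f2": 0.2},
    "G": {"f1": 3.0, "f2": 0.3},
    "T": {"f1": 4.0, "f2": 0.4},
}
print(tensor_features("GA", toy_table))
# {'p1_f1': 3.0, 'p1_f2': 0.3, 'p2_f1': 1.0, 'p2_f2': 0.1}
```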

PAM

https://www.synthego.com/guide/how-to-use-crispr/pam-sequence

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J bedtools
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

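cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
# sorted sgRNA coordinates; created first because both closest searches below read this file
cut -f 1-4 Chuai2018.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Chuai2018.sgRNA.coord.bed

# nearest non-overlapping downstream NGG PAM per sgRNA (-io: ignore overlaps, -iu: ignore upstream, -D a: distance relative to A)
awk '{print $0"\t""+"}' Chuai2018.sgRNA.coord.bed > Chuai2018.sgRNA.coord.strand.txt
bedtools closest -a Chuai2018.sgRNA.coord.strand.txt -b Chuai2018.NGG.PAM.sorted.bed -io -iu -D a > Chuai2018.sgRNA.closestPAM.bed

# NGG PAM sites overlapping each 20 bp sliding window (-wo: report both entries plus bp of overlap)
bedtools intersect -wo -a Chuai2018.20bp.sliding.bed -b Chuai2018.NGG.PAM.sorted.bed > Chuai2018.NGG.PAM.20bp.sliding.windows.bed

# nearest gene (-D b: distance relative to the gene's strand)
bedtools closest -a Chuai2018.sgRNA.coord.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Chuai2018.sgRNA.gene.closest.bed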

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/bedtools.sh
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("Chuai2018.ngg.txt")
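df$ngg <- substr(df$nucleotide.sequence, 21, 23)                # PAM: positions 21-23 of the 23-nt target
df$nucleotide.sequence <- substr(df$nucleotide.sequence, 1, 20) # 20-nt protospacer
df$pam.distance <- 1                                            # the PAM directly follows the protospacer, so the distance is constant
# note: the write.table below overwrites the bedtools closestPAM output of the same name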
write.table(df, "Chuai2018.sgRNA.closestPAM.bed", quote=F, row.names=F, sep='\t')
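Positions 21-23 of each 23-nt target hold the PAM. The sketch below splits a target the same way. It then one-hot encodes the PAM's variable first base on either strand, mirroring the AGG/CCT, CGG/CCG, TGG/CCA and GGG/CCC `ifelse()` rules used in the feature-matrix step. Helper names are hypothetical, and the 23-nt length check is an addition not present in the R code.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def split_target(seq23):
    """20-nt protospacer and 3-nt PAM, like substr(x, 1, 20) and substr(x, 21, 23)."""
    if len(seq23) != 23:
        raise ValueError(f"expected 23 nt, got {len(seq23)}")
    return seq23[:20], seq23[20:]


def pam_onehot(pam):
    """One-hot of the PAM's N base, accepting NGG or its reverse complement CCN."""
    forward = pam if pam.endswith("GG") else pam.translate(COMPLEMENT)[::-1]
    return {f"PAM.{base}": int(forward == base + "GG") for base in "ACTG"}


protospacer, pam = split_target("ACGTACGTACGTACGTACGT" + "TGG")
print(protospacer, pam)   # ACGTACGTACGTACGTACGT TGG
print(pam_onehot(pam))    # {'PAM.A': 0, 'PAM.C': 0, 'PAM.T': 1, 'PAM.G': 0}
print(pam_onehot("CCT"))  # {'PAM.A': 1, 'PAM.C': 0, 'PAM.T': 0, 'PAM.G': 0}
```

As in the R rules, a PAM that is neither NGG nor CCN gets all four indicators set to 0.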

location relative to gene

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
cut -f 1-4 Chuai2018.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Chuai2018.sgRNA.coord.bed
bedtools closest -a Chuai2018.sgRNA.coord.bed -b GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Chuai2018.sgRNA.gene.closest.bed
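For each sgRNA, `bedtools closest` reports the nearest gene and a distance. Below is a simplified Python sketch of nearest-interval search on one chromosome, assuming half-open BED-style intervals. It has three limitations. It ignores strand, so it never returns the negative (upstream) distances that `-D b` can report. It keeps only the first of any tied genes. Its gap convention may differ by one base from bedtools'. `nearest_gene` is a hypothetical helper.

```python
def nearest_gene(sg_start, sg_end, genes):
    """Return (gene_name, distance) for the gene closest to [sg_start, sg_end).

    genes: list of (name, start, end) half-open intervals on the same chromosome.
    Distance is 0 for overlap, otherwise the number of bases in the gap.
    """
    best = None
    for name, start, end in genes:
        if start < sg_end and sg_start < end:
            dist = 0                  # overlapping
        elif end <= sg_start:
            dist = sg_start - end     # gene lies to the left
        else:
            dist = start - sg_end     # gene lies to the right
        if best is None or dist < best[1]:
            best = (name, dist)
    return best


genes = [("geneA", 100, 200), ("geneB", 500, 900)]
print(nearest_gene(250, 270, genes))  # ('geneA', 50)
print(nearest_gene(480, 503, genes))  # ('geneB', 0)
```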

Feature matrix

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J chuari.matrix
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018
R CMD BATCH chuari.matrix.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/chuari.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
structure <- read.delim("Chuai2018.gRNA.ViennaRNA.output.value.id.txt", header=T, sep="\t", stringsAsFactors = F)
nuc <- read.delim("Chuai2018.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Chuai2018.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")

# structure, gc, temp (one single-column data frame per feature)
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
onehot.ind1 <- read.delim("Chuai2018_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Chuai2018_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Chuai2018_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Chuai2018_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Chuai2018_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Chuai2018_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
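# drop the last (empty) column left by the trailing space from awk's gsub; the parentheses matter, since 1:ncol(x)-1 parses as (1:ncol(x)) - 1
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")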

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Chuai2018.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")


# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
sgRNA.pam <- read.table("Chuai2018.sgRNA.closestPAM.bed", header=T, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(1,4,5)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df

score <- read.delim("Chuai2018.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5,7)]
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df <- read.delim("Chuai2018.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))

df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 16748


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
sgRNA.genes <- read.table("Chuai2018.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)

df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 16748

df.final <- df.pam.location[,c(1:3,5:5915,5917:5921)]
ncol(df.final)
# 5919
write.table(df.final, "Chuai2018.raw.matrix.txt", quote=F, row.names=F, sep="\t")
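Each feature block above follows the same pattern. The features are melted to long form, `unite()` joins each feature name with its scale, and `dcast(..., fun.aggregate=mean)` pivots back to one row per sgRNA. A minimal Python sketch of that pivot follows, with a hypothetical helper and toy records. Missing cells are simply absent here, whereas in R they become NaN and are then removed by `na.omit()`.

```python
from collections import defaultdict


def long_to_wide(records):
    """Pivot (sgRNAID, cut_score, feature, value) records to one row per
    (sgRNAID, cut_score), averaging repeated cells like dcast(..., mean)."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for sg_id, score, feature, value in records:
        sums[(sg_id, score)][feature] += value
        counts[(sg_id, score)][feature] += 1
    return {
        key: {feature: total / counts[key][feature] for feature, total in row.items()}
        for key, row in sums.items()
    }


records = [
    ("sg1", 0.8, "PAM.A0", 1.0),
    ("sg1", 0.8, "pam.distance0", 1.0),
    ("sg1", 0.8, "pam.distance0", 3.0),  # duplicate cell, averaged
    ("sg2", 0.1, "PAM.A0", 0.0),
]
print(long_to_wide(records))
# {('sg1', 0.8): {'PAM.A0': 1.0, 'pam.distance0': 2.0}, ('sg2', 0.1): {'PAM.A0': 0.0}}
```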

# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")


# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")


# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")


# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")


# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
seq <- read.delim("Chuai2018.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "") 

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")



setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/")
monomer <- read.delim("Chuai2018.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Chuai2018.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Chuai2018.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Chuai2018.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Chuai2018.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)

monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Chuai2018.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
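Each `unite()` chain above joins positions i through i+k-1 into a column named p_i. For a 20-nt guide, this gives 19 dimers, 18 trimers and 17 tetramers. The same windowing fits in a single loop; below is a Python sketch with a hypothetical helper.

```python
def sliding_kmers(seq, k):
    """Overlapping k-mers named p1..p(L-k+1), as built by the unite() chains above."""
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}


guide = "ACGTACGTACGTACGTACGT"  # 20 nt
for k, label in [(2, "dimer"), (3, "trimer"), (4, "tetramer")]:
    windows = sliding_kmers(guide, k)
    print(label, len(windows), windows["p1"])
# dimer 19 AC
# trimer 18 ACG
# tetramer 17 ACGT
```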

# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("Chuai2018.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
tensor <- read.delim("Chuai2018.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 16748

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "Chuai2018.finalquantum.txt", quote=F, row.names=F, sep="\t")

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("Chuai2018.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"


# note: this writes to a file literally named "w" in the working directory
write.table(df.all, "w", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "Chuai2018.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Chuai2018.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Chuai2018.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Chuai2018.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Chuai2018.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Chuai2018.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/Chuai2018.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/Chuai2018.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/Submits/submit_full_Chuai2018.finalquantum_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/Submits/submit_train_Chuai2018.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/Submits/submit_test_Chuai2018.finalquantum_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt
# YNames.txt lists the response variable(s), one per line; here it contains: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.finalquantum
# 0.23170011706359436

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.finalquantum_cut.score.importance4 | head
# p18monomer.No.electronsraw: 5.83976
# p18monomer.HLgap.eVraw: 4.99657
# p17tetramer.Hbond.energyraw: 3.53595
# p13tetramer.Hbond.stackingraw: 3.50499
# p5tetramer.Hbond.stackingraw: 3.40497
# p17tetramer.Hlgap.eVEraw: 3.37027
# p3tetramer.Hlgap.eVEraw: 3.25658
# p1tetramer.Hbond.stackingraw: 3.16035
# p11tetramer.Hlgap.eVEraw: 3.15086
# p8tetramer.Hbond.stackingraw: 3.06078

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.503659
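By default, `cor()` computes the Pearson coefficient: the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. Below is a dependency-free Python sketch of the same computation, with a hypothetical helper name. It could be used to cross-check values like the one above against the `.prediction` and `set4_Y_test` files.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (the default method of R's cor())."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


print(pearson([1, 2, 3], [2, 4, 6]))  # 1.0
print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0
```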

Remove trimer/tetramer features

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
df <- read.delim("Chuai2018.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
df.nokmer <- df %>% select(-grep("trimer", names(df)), -grep("tetramer", names(df))) 

write.table(df.nokmer[,c(1,3:ncol(df.nokmer))], "Chuai2018.finalquantum.nokmer.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.nokmer[,c(1,3:ncol(df.nokmer))], "Chuai2018.finalquantum.nokmer.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.nokmer[,3:ncol(df.nokmer)], "Chuai2018.finalquantum.nokmer.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/Chuai2018.finalquantum.nokmer.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/Chuai2018.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer/Submits/submit_full_Chuai2018.finalquantum_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer/Submits/submit_train_Chuai2018.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer/Submits/submit_test_Chuai2018.finalquantum_0.sh

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.finalquantum
# 0.22948997895639026

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.finalquantum_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 6.25575
# p18monomer.HLgap.eVraw: 6.14648
# gene.distance0: 5.43944
# p18monomer.No.electronsraw: 4.85526
# p13dimer.HLgap.eVEraw: 4.40427
# p14dimer.HLgap.eVEraw: 4.19683
# p14dimer.Hbond.energyraw: 3.81849
# TsgRNA.raw: 3.57828
# p9dimer.HLgap.eVEraw: 3.49469
# p18dimer.Hbond.energyraw: 3.1656

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum.nokmer/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.486193

RIT

# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Chuai2018.finalquantum

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Chuai2018.finalquantum/cut.score/RIT.run

Chuai by cell type

# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines

R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
hct <- read.delim("HCT116.csv", header=T, sep=",")
hek <- read.delim("HEK293T.csv", header=T, sep=",")
hel <- read.delim("HELA.csv", header=T, sep=",")
hl <- read.delim("HL60.csv", header=T, sep=",")

hct$cell.line <- "HCT116"
hek$cell.line <- "HEK293T"
hel$cell.line <- "HELA"
hl$cell.line <- "HL60"

all <- rbind(hct, hek, hel, hl)
write.table(all, "Chuai2018.cell.lines.dataset.txt", quote=F, row.names=F, sep="\t")
sgRNA dataset
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.dataset.txt", header=T, sep="\t")

library(dplyr)
library(tidyr)
df2 <- df %>% group_by(cell.line) %>% mutate(id = row_number())
df2.id <- unite(df2, "sgRNAID", c(cell.line, id), sep="_")
df3 <- df2.id[,c(12,5,10)]
colnames(df3) <- c("sgRNAID",   "nucleotide.sequence", "cut.score")
df.na <- na.omit(df3)
write.table(df.na, "Chuai2018.cell.lines.ngg.txt", quote=F, row.names=F, sep="\t")

df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Chuai2018.cell.lines.txt", quote=F, row.names=F, sep="\t")
# 16749

df3 <- df2.id[,c(12,5,11)]
colnames(df3) <- c("sgRNAID",   "nucleotide.sequence", "cut.score")
df.na <- na.omit(df3)
df.na$nucleotide.sequence <- substr(df.na$nucleotide.sequence, 1, 20)
write.table(df.na, "Chuai2018.cell.lines.classification.txt", quote=F, row.names=F, sep="\t")

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
sed '1d' Chuai2018.cell.lines.txt | awk '{print ">"$1"\n"$2}' > Chuai2018.cell.lines.fasta
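The per-cell-line IDs above come from `group_by(cell.line) %>% mutate(id = row_number())`. That numbering runs before `na.omit()`, so dropped rows leave gaps in the IDs.

Below is a minimal Python sketch of the same ID scheme and of the FASTA that the `sed`/`awk` one-liner writes. It assumes hypothetical input tuples of (cell line, 23-nt target, score), with `None` standing in for a missing score.

```python
from collections import defaultdict


def assign_ids(rows):
    """rows: iterable of (cell_line, target_23nt, score).

    Returns (sgRNAID, first 20 nt, score) tuples. IDs are numbered per cell
    line before rows with a missing score are dropped, as in the R code.
    """
    counters = defaultdict(int)
    records = []
    for cell_line, seq, score in rows:
        counters[cell_line] += 1
        sgrna_id = f"{cell_line}_{counters[cell_line]}"
        if score is None:  # na.omit() runs after numbering
            continue
        records.append((sgrna_id, seq[:20], score))  # substr(seq, 1, 20)
    return records


def to_fasta(records):
    """One '>ID' line followed by one sequence line per record."""
    return "".join(f">{sid}\n{seq}\n" for sid, seq, _ in records)
```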
coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines

# map UCSC chromosome names to RefSeq GRCh38 accessions; chr10-chr22 are substituted
# before chr1/chr2 so that, e.g., "chr1" does not rewrite the prefix of "chr10"
sed -e 's/chr10/NC_000010.11/g' -e 's/chr11/NC_000011.10/g' -e 's/chr12/NC_000012.12/g' \
    -e 's/chr13/NC_000013.11/g' -e 's/chr14/NC_000014.9/g' -e 's/chr15/NC_000015.10/g' \
    -e 's/chr16/NC_000016.10/g' -e 's/chr17/NC_000017.11/g' -e 's/chr18/NC_000018.10/g' \
    -e 's/chr19/NC_000019.10/g' -e 's/chr20/NC_000020.11/g' -e 's/chr21/NC_000021.9/g' \
    -e 's/chr22/NC_000022.11/g' -e 's/chr1/NC_000001.11/g' -e 's/chr2/NC_000002.12/g' \
    -e 's/chr3/NC_000003.12/g' -e 's/chr4/NC_000004.12/g' -e 's/chr5/NC_000005.10/g' \
    -e 's/chr6/NC_000006.12/g' -e 's/chr7/NC_000007.14/g' -e 's/chr8/NC_000008.11/g' \
    -e 's/chr9/NC_000009.12/g' -e 's/chrX/NC_000023.11/g' -e 's/chrY/NC_000024.10/g' \
    Chuai2018.cell.lines.dataset.txt > Chuai2018.cell.lines.score.chr.txt

R
library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.score.chr.txt", header=T, sep="\t")
df2 <- df %>% group_by(cell.line) %>% mutate(id = row_number())
df.id <- unite(df2, "sgRNAID", c(cell.line, id), sep="_")
df.coord <- df.id[,c(1:3,12,12,5,10)]
colnames(df.coord) <- c("chr", "start", "end", "sgRNA", "sgRNAID", "nucleotide.sequence", "cut.score")
write.table(df.coord, "Chuai2018.cell.lines.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")
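The `sed` chain only works because of substitution order. A bare `s/chr1/NC_000001.11/g` would also rewrite the prefix of `chr10` to `chr19`, so the two-digit names must come first.

Below is a sketch of an order-independent alternative in Python. It assumes each chromosome name is a whole token, such as the first tab-delimited column. It matches complete names only, so `chr1` can never hit `chr10`, and unmapped names such as `chrM` are left unchanged.

```python
import re

# UCSC-style name -> RefSeq GRCh38 accession (same pairs as the sed chain)
CHROM_MAP = {
    "chr1": "NC_000001.11", "chr2": "NC_000002.12", "chr3": "NC_000003.12",
    "chr4": "NC_000004.12", "chr5": "NC_000005.10", "chr6": "NC_000006.12",
    "chr7": "NC_000007.14", "chr8": "NC_000008.11", "chr9": "NC_000009.12",
    "chr10": "NC_000010.11", "chr11": "NC_000011.10", "chr12": "NC_000012.12",
    "chr13": "NC_000013.11", "chr14": "NC_000014.9", "chr15": "NC_000015.10",
    "chr16": "NC_000016.10", "chr17": "NC_000017.11", "chr18": "NC_000018.10",
    "chr19": "NC_000019.10", "chr20": "NC_000020.11", "chr21": "NC_000021.9",
    "chr22": "NC_000022.11", "chrX": "NC_000023.11", "chrY": "NC_000024.10",
}

# whole-word match: the greedy digit run plus \b stops "chr1" matching inside "chr10"
_CHROM_RE = re.compile(r"\bchr(?:[0-9]+|X|Y)\b")


def to_refseq(line):
    """Replace every whole chromosome name in a line; unknown names are kept."""
    return _CHROM_RE.sub(lambda m: CHROM_MAP.get(m.group(0), m.group(0)), line)
```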

RNA structure (ViennaRNA)

Minimum free energy (MFE) structure prediction with RNAfold (tutorial: https://www.tbi.univie.ac.at/RNA/tutorial/)

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/vienna
RNAfold < ../Chuai2018.cell.lines.fasta > Chuai2018.cell.lines.gRNA.ViennaRNA.output.txt

grep '(' Chuai2018.cell.lines.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.txt
grep '>' Chuai2018.cell.lines.gRNA.ViennaRNA.output.txt | sed 's/>//g' > Chuai2018.cell.lines.gRNA.names.txt
paste Chuai2018.cell.lines.gRNA.names.txt Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.txt > Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.id.txt
cp Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.id.txt ../.
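The `grep '('` step relies on RNAfold's output layout. Each structure line is printed in dot-bracket notation, followed by the free energy (kcal/mol) in parentheses. The `paste` step also relies on names and energies coming out in the same order.

Below is a sketch that parses both in a single pass, keyed by FASTA name. It assumes the standard RNAfold stdout layout of header, sequence and structure lines. Note that RNAfold also writes one PostScript plot per sequence unless `--noPS` is given.

```python
import re

# trailing "(energy)" on an RNAfold structure line, e.g. "(((...))) ( -1.20)"
_MFE_RE = re.compile(r"\(\s*([-+]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(text):
    """Return [(name, mfe)] from RNAfold stdout produced from FASTA input."""
    results = []
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
        elif name is not None:
            match = _MFE_RE.search(line)
            if match:  # sequence lines have no "(energy)" and are skipped
                results.append((name, float(match.group(1))))
                name = None
    return results
```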

Melting temperature (Tm)

https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)

https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
python3

from Bio import SeqIO

# one row per sgRNA: the "Window" column holds the sgRNA ID, and "CG%" holds the
# GC fraction (0-1), not a percentage; column names are kept for the R steps below
with open('Chuai2018.cell.lines.fasta') as input_file, open('nucleotide_counts_sgRNA.tsv', 'w') as output_file:
    output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
    for cur_record in SeqIO.parse(input_file, "fasta"):
        A_count = cur_record.seq.count('A')
        C_count = cur_record.seq.count('C')
        G_count = cur_record.seq.count('G')
        T_count = cur_record.seq.count('T')
        length = len(cur_record.seq)
        cg_fraction = float(C_count + G_count) / length
        output_file.write('%s\t%i\t%i\t%i\t%i\t%i\t%f\n' %
                          (cur_record.name, A_count, C_count, G_count, T_count, length, cg_fraction))

exit()

# Basic GC-content melting temperature (intended for oligos longer than 13 nt):
# Tm (°C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "Chuai2018.cell.lines.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
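The formula above is the basic GC-content approximation, not the nearest-neighbor model implemented by `Tm_NN`. Below is a Python sketch of the same calculation. Like the R code, it counts only A, C, G and T in the denominator.

```python
def basic_tm(seq):
    """Basic GC-content Tm in deg C: 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)."""
    s = seq.upper()
    a, c, g, t = (s.count(base) for base in "ACGT")
    n = a + c + g + t
    if n == 0:
        raise ValueError("sequence contains no A/C/G/T")
    return 64.9 + 41 * (g + c - 16.4) / n
```

Worked example: a 20-nt guide with 10 G/C bases gives 64.9 + 41 * (10 - 16.4) / 20 = 64.9 - 13.12 = 51.78 °C.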

Positional Encoding

# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
cut -f 1-2 Chuai2018.cell.lines.txt > Chuai2018.cell.lines.noscore.txt

python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py Chuai2018.cell.lines.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py Chuai2018.cell.lines.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py Chuai2018.cell.lines.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py Chuai2018.cell.lines.noscore.txt


# split each encoded string into space-separated characters so that every position becomes its own column (feature)
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines

sed '1d' Chuai2018.cell.lines.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018.cell.lines_dep1.txt
sed '1d' Chuai2018.cell.lines.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018.cell.lines_dep2.txt
sed '1d' Chuai2018.cell.lines.noscore_dependent3.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018.cell.lines_dep3.txt
sed '1d' Chuai2018.cell.lines.noscore_dependent4.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' > Chuai2018.cell.lines_dep4.txt

python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py Chuai2018.cell.lines.noscore.txt
sed '1d' Chuai2018.cell.lines.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > Chuai2018.cell.lines_ind1.txt
sed '1d' Chuai2018.cell.lines.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > Chuai2018.cell.lines_ind2.txt
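The `kmer*_positional_encode.py` scripts are not shown here, so the following is only a hypothetical sketch of a typical position-dependent one-hot encoding. Each start position and each possible k-mer gets one binary feature. The A, C, T, G base order is borrowed from the `_ind1`/`_ind2` headers, and the real scripts' column order and output format may differ.

```python
from itertools import product


def kmer_alphabet(k, alphabet="ACTG"):
    """All k-mers in the order used by the ind1/ind2 headers (A C T G, AA AC AT AG, ...)."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def positional_onehot(seq, k, alphabet="ACTG"):
    """1 if the window starting at position i equals the k-mer, else 0, for every (i, k-mer)."""
    kmers = kmer_alphabet(k, alphabet)
    bits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        bits.extend(1 if window == km else 0 for km in kmers)
    return bits


def feature_names(length, k, alphabet="ACTG"):
    """Hypothetical column names such as 'A_1', 'C_1', ..., matching positional_onehot order."""
    kmers = kmer_alphabet(k, alphabet)
    return [f"{km}_{i + 1}" for i in range(length - k + 1) for km in kmers]
```

For a 20-nt guide this gives 80, 304, 1152 and 4352 features for k = 1 to 4. Windows containing a non-ACGT character are encoded as all zeros.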

Quantum tensors

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
sed '1d' Chuai2018.cell.lines.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > Chuai2018.cell.lines.sequence.txt


# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.cell.lines.tensorsAll.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.tensorsAll.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
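The melt, `left_join` and `dcast` sequence above attaches each base's property vector to every position of the guide. The result is a wide table with one column per position and property.

Below is a dictionary-based Python sketch of the same lookup. The property values in the test are placeholders, not values from `protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt`. Bases absent from the table yield `None`, like the `NA` that `left_join()` produces.

```python
def tensor_features(seqs, table):
    """seqs: {sgRNAID: sequence}; table: {base: {property: value}}.

    Returns {sgRNAID: {"p<i>_<property>": value}}, one entry per position and
    property, mirroring the position_property columns of the dcast() output.
    """
    prop_names = sorted({p for props in table.values() for p in props})
    out = {}
    for sgrna_id, seq in seqs.items():
        row = {}
        for i, base in enumerate(seq, start=1):
            props = table.get(base, {})  # unknown base -> all properties missing
            for prop in prop_names:
                row[f"p{i}_{prop}"] = props.get(prop)
        out[sgrna_id] = row
    return out
```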

PAM

https://www.synthego.com/guide/how-to-use-crispr/pam-sequence

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J bedtools
#SBATCH -N 1
#SBATCH -p gpu
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
cut -f 1-4 Chuai2018.cell.lines.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Chuai2018.cell.lines.sgRNA.coord.bed
# add a "+" strand column (needs the BED file created on the line above)
awk '{print $0"\t""+"}' Chuai2018.cell.lines.sgRNA.coord.bed > Chuai2018.cell.lines.sgRNA.coord.strand.txt

bedtools closest -a Chuai2018.cell.lines.sgRNA.coord.bed -b ../GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Chuai2018.cell.lines.sgRNA.gene.closest.bed

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/bedtools.sh
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.ngg.txt")
df$ngg <- substr(df$nucleotide.sequence, 21, 23)
df$nucleotide.sequence <- substr(df$nucleotide.sequence, 1, 20)
df$pam.distance <- 1
write.table(df, "Chuai2018.cell.lines.sgRNA.closestPAM.bed", quote=F, row.names=F, sep='\t')
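`substr(..., 21, 23)` takes the three nucleotides after the 20-nt protospacer, i.e. the NGG PAM. The feature-matrix step later one-hot encodes only the PAM's variable first base as `PAM.A`, `PAM.C`, `PAM.T` and `PAM.G`. It treats `CCN` triplets as reverse-strand PAMs via their reverse complement, so `CCA` encodes like `TGG`.

Below is a Python sketch of that mapping. Any other triplet gets all zeros, as the `ifelse()` calls produce.

```python
# variable base of an NGG PAM; CCN triplets map through their reverse complement
PAM_TO_BASE = {
    "AGG": "A", "CGG": "C", "TGG": "T", "GGG": "G",
    "CCT": "A", "CCG": "C", "CCA": "T", "CCC": "G",
}


def split_target(target):
    """Split a 23-nt target into (20-nt protospacer, 3-nt PAM), like substr(1, 20) / substr(21, 23)."""
    return target[:20], target[20:23]


def pam_onehot(pam):
    """One-hot columns PAM.A/C/T/G; unrecognized triplets give all zeros."""
    base = PAM_TO_BASE.get(pam)
    return {f"PAM.{b}": int(base == b) for b in "ACTG"}
```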

location relative to gene

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines
cut -f 1-4 Chuai2018.cell.lines.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > Chuai2018.cell.lines.sgRNA.coord.bed
bedtools closest -a Chuai2018.cell.lines.sgRNA.coord.bed -b ../GCF_000001405.39_GRCh38.p13_genomic.gene.sorted.gtf -D b > Chuai2018.cell.lines.sgRNA.gene.closest.bed

Feature matrix

# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
structure <- read.delim("Chuai2018.cell.lines.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)  # the pasted name/MFE file has no header row
nuc <- read.delim("Chuai2018.cell.lines.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Chuai2018.cell.lines.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5,7,6)])
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
onehot.ind1 <- read.delim("Chuai2018.cell.lines_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("Chuai2018.cell.lines_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("Chuai2018.cell.lines_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("Chuai2018.cell.lines_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("Chuai2018.cell.lines_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("Chuai2018.cell.lines_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# drop the last (empty) column left by the space awk appends after the final character
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "Chuai2018.cell.lines.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")


# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
sgRNA.pam <- read.table("Chuai2018.cell.lines.sgRNA.closestPAM.bed", header=T, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(1,4,5)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
sgRNA.pam.id <- sgRNA.pam.df

score <- read.delim("Chuai2018.cell.lines.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5,7)]
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df <- read.delim("Chuai2018.cell.lines.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))

df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 16748


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
sgRNA.genes <- read.table("Chuai2018.cell.lines.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)

df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 16748

df.final <- df.pam.location[,c(1:3,5:5915,5917:5921)]
ncol(df.final)
# 5919
write.table(df.final, "Chuai2018.cell.lines.raw.matrix.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")


# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")


# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")


# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")


# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
seq <- read.delim("Chuai2018.cell.lines.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "") 

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "Chuai2018.cell.lines.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "Chuai2018.cell.lines.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")



setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
monomer <- read.delim("Chuai2018.cell.lines.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("Chuai2018.cell.lines.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("Chuai2018.cell.lines.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("Chuai2018.cell.lines.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("Chuai2018.cell.lines.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)

monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "Chuai2018.cell.lines.quantum.melt.txt", quote=F, row.names=F, sep="\t")
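Each chained `unite()` call concatenates neighboring position columns into overlapping windows. A 20-nt guide therefore yields 20 monomer/base-pair positions, 19 dimers, 18 trimers and 17 tetramers. Adding the `sgRNAID` column accounts for `seq.dimer[,1:20]`, `seq.trimer[,1:19]` and `seq.tetramer[,1:18]`.

Below is a Python sketch of the same overlapping windows.

```python
def sliding_kmers(seq, k):
    """Overlapping k-mers named p1..p(L-k+1), like the chained tidyr::unite() calls."""
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return {f"p{i + 1}": seq[i:i + k] for i in range(len(seq) - k + 1)}
```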
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
tensor <- read.delim("Chuai2018.cell.lines.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 16748

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
ncol(df.location)
# 6236
write.table(df.location, "Chuai2018.cell.lines.finalquantum.txt", quote=F, row.names=F, sep="\t")
iRF - quantitative
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all[,c(1,3:ncol(df.all))], "Chuai2018.cell.lines.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "Chuai2018.cell.lines.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "Chuai2018.cell.lines.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Chuai2018.cell.lines.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "Chuai2018.cell.lines.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "Chuai2018.cell.lines.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

library(stringr)
df.1 <- df.all %>% filter(str_detect(sgRNAID, "HCT116"))
write.table(df.1[,c(1,3:ncol(df.1))], "Chuai2018.HCT116.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,c(1,3:ncol(df.1))], "Chuai2018.HCT116.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,3:ncol(df.1)], "Chuai2018.HCT116.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,1:2], "Chuai2018.HCT116.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,1:2], "Chuai2018.HCT116.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.1[,2]), "Chuai2018.HCT116.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

df.2 <- df.all %>% filter(str_detect(sgRNAID, "HEK293T"))
write.table(df.2[,c(1,3:ncol(df.2))], "Chuai2018.HEK293T.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,c(1,3:ncol(df.2))], "Chuai2018.HEK293T.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,3:ncol(df.2)], "Chuai2018.HEK293T.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,1:2], "Chuai2018.HEK293T.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,1:2], "Chuai2018.HEK293T.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.2[,2]), "Chuai2018.HEK293T.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

df.3 <- df.all %>% filter(str_detect(sgRNAID, "HELA"))
write.table(df.3[,c(1,3:ncol(df.3))], "Chuai2018.HELA.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,c(1,3:ncol(df.3))], "Chuai2018.HELA.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,3:ncol(df.3)], "Chuai2018.HELA.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,1:2], "Chuai2018.HELA.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,1:2], "Chuai2018.HELA.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.3[,2]), "Chuai2018.HELA.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

df.4 <- df.all %>% filter(str_detect(sgRNAID, "HL60"))
write.table(df.4[,c(1,3:ncol(df.4))], "Chuai2018.HL60.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,c(1,3:ncol(df.4))], "Chuai2018.HL60.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,3:ncol(df.4)], "Chuai2018.HL60.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,1:2], "Chuai2018.HL60.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,1:2], "Chuai2018.HL60.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.4[,2]), "Chuai2018.HL60.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
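The four per-cell-line blocks above repeat the same six `write.table` calls. Below is a minimal Python sketch of the same split. It makes these assumptions:

- The merged table is tab-separated.
- `sgRNAID` is in column 0, the score (named `cut.score`) is in column 1, and the features follow, as in `df.all`.
- Cell-line membership is a plain substring of `sgRNAID`, which is what `str_detect` does for these literal patterns.

The function names are hypothetical. As in the R code, each `_overlap` file is an identical copy of its non-overlap counterpart.

```python
import csv
import os

CELL_LINES = ["HCT116", "HEK293T", "HELA", "HL60"]


def variant_columns(n_cols):
    """Column indices per output variant: 0 = sgRNAID, 1 = cut.score, 2.. = features."""
    features = list(range(2, n_cols))
    return {
        "features": [0] + features,
        "features_overlap": [0] + features,  # same content as "features"
        "features_overlap_noSampleIDs": features,
        "score": [0, 1],
        "score_overlap": [0, 1],  # same content as "score"
        "score_overlap_noSampleIDs": [1],
    }


def write_tsv(path, header, rows):
    """Write a header row plus data rows as tab-separated text."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


def write_variants(header, rows, prefix, outdir):
    """Write the six <prefix>.finalquantum.<variant>.txt files and return their paths."""
    paths = []
    for variant, cols in variant_columns(len(header)).items():
        path = os.path.join(outdir, f"{prefix}.finalquantum.{variant}.txt")
        write_tsv(path, [header[c] for c in cols], [[row[c] for c in cols] for row in rows])
        paths.append(path)
    return paths


def split_by_cell_line(header, rows, outdir, study="Chuai2018"):
    """Keep rows whose sgRNAID contains each cell-line tag, then write that subset's files."""
    return {
        line: write_variants(header, [r for r in rows if line in r[0]], f"{study}.{line}", outdir)
        for line in CELL_LINES
    }
```

A cell line with no matching rows still gets header-only files. The R code behaves the same way on an empty filter result.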


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.cell.lines --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.cell.lines.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.cell.lines.finalquantum.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HCT116 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HCT116.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HCT116.finalquantum.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HEK293T --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HEK293T.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HEK293T.finalquantum.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HELA --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HELA.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HELA.finalquantum.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HL60 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HL60.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/Chuai2018.HL60.finalquantum.score.txt
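The five builder invocations above differ only in the run name and directory. This Python sketch generates the same command lines from a loop. It uses only the flags and paths recorded above. `builder_command` is a hypothetical helper, and the script prints the commands rather than running them, so they can be checked first. Pointing `base` at `.../cell.lines/classification` should reproduce the classification runs further down.

```python
import os
import shlex

# Paths copied from the commands above.
BASE = "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines"
BUILDER = "/gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py"
RUNS = ["cell.lines", "HCT116", "HEK293T", "HELA", "HL60"]


def builder_command(run, base=BASE):
    """Return (run_dir, argv) for one iRF setup run, using the flags recorded above."""
    if run == "cell.lines":
        run_dir = os.path.join(base, "iRF.run")  # pooled run sits directly in iRF.run
    else:
        run_dir = os.path.join(base, "iRF.run", run)
    argv = [
        "python", BUILDER,
        "--System", "Summit", "--NodesPer", "1", "--TotalNodes", "50",
        "--RunTime", "90", "--Account", "SYB105", "--NumTrees", "1000",
        "--NumIterations", "5", "--RunName", f"Chuai2018.{run}", "--bypass",
        "--targetNodeSize", "50", "--Prediction",
        os.path.join(base, f"Chuai2018.{run}.finalquantum.features.txt"),
        os.path.join(base, f"Chuai2018.{run}.finalquantum.score.txt"),
    ]
    return run_dir, argv


if __name__ == "__main__":
    # Print shell lines for review instead of executing them.
    for run in RUNS:
        run_dir, argv = builder_command(run)
        quoted = shlex.quote(run_dir)
        print(f"mkdir -p {quoted} && cd {quoted} && {shlex.join(argv)}")
```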

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/Submits/submit_full_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116/Submits/submit_full_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T/Submits/submit_full_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA/Submits/submit_full_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60/Submits/submit_full_Chuai2018.HL60_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/Submits/submit_train_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116/Submits/submit_train_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T/Submits/submit_train_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA/Submits/submit_train_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60/Submits/submit_train_Chuai2018.HL60_0.sh
# run the test submissions only after all train jobs have finished
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/Submits/submit_test_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116/Submits/submit_test_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T/Submits/submit_test_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA/Submits/submit_test_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60/Submits/submit_test_Chuai2018.HL60_0.sh
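Instead of waiting by hand for the train jobs, LSF can hold each test job with a `bsub -w "done(<jobID>)"` dependency. Below is a minimal Python sketch of that approach. It assumes each `submit_*_0.sh` script is a single LSF job, and that `bsub` prints its usual `Job <ID> is submitted to queue <...>.` line. All helper names are hypothetical.

```python
import re
import subprocess

JOB_ID = re.compile(r"Job <(\d+)> is submitted")


def parse_job_id(bsub_stdout):
    """Extract the numeric job ID from bsub's confirmation line."""
    match = JOB_ID.search(bsub_stdout)
    if match is None:
        raise ValueError(f"no job ID in: {bsub_stdout!r}")
    return match.group(1)


def submit_scripts(run_dir, run_name):
    """Paths of the full/train/test scripts, following the Submits/ naming used above."""
    return {
        stage: f"{run_dir}/Submits/submit_{stage}_{run_name}_0.sh"
        for stage in ("full", "train", "test")
    }


def dependent_test_command(test_script, train_job_ids):
    """bsub argv that holds the test job until every listed train job ends successfully."""
    condition = " && ".join(f"done({job_id})" for job_id in train_job_ids)
    return ["bsub", "-w", condition, test_script]


def submit(argv):
    """Run bsub and return the reported job ID (needs LSF on the login node)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return parse_job_id(result.stdout)
```

For example, the train job ID would come from `submit(["bsub", scripts["train"]])`. It would then be passed to `dependent_test_command(scripts["test"], [train_id])`, and that command is what gets submitted.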

# Andes
#module load python/3.7-anaconda3
module load python/3.7.0-anaconda3-5.3.0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.cell.lines
# 0.2215839985177171
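The post-processing step is asked for MAE, MAEA, MSE and R2. This Python sketch shows the textbook definitions of three of them:

- MAE: mean absolute error.
- MSE: mean squared error.
- R² as 1 − SS_res/SS_tot, the coefficient of determination. It can be negative and is not the same as the squared Pearson correlation.

MAEA is not defined in this notebook, so it is left out. Two things are assumptions that the source does not confirm: that `iRF_postProcessing.py` uses these exact definitions, and which metric the number printed above is.

```python
import math


def regression_metrics(y_true, y_pred):
    """MAE, MSE and R^2 (1 - SS_res / SS_tot) for paired observed/predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(r * r for r in residuals)
    # R^2 is undefined when the observed values have zero variance.
    r2 = 1 - ss_res / ss_tot if ss_tot else math.nan
    return {"MAE": mae, "MSE": mse, "R2": r2}
```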
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.cell.lines_cut.score.importance4 | head
# p18monomer.HLgap.eVraw: 5.59946
# p18monomer.No.electronsraw: 4.61317
# p17tetramer.Hbond.energyraw: 4.45884
# p3tetramer.Hlgap.eVEraw: 3.59588
# p8tetramer.Hbond.stackingraw: 3.52185
# p5tetramer.Hbond.stackingraw: 3.39563
# p17tetramer.Hbond.stackingraw: 3.22668
# p13tetramer.Hbond.stackingraw: 3.03115
# p11tetramer.Hlgap.eVEraw: 2.97624
# p17tetramer.Hlgap.eVEraw: 2.95762
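`sort -k2rg file | head` orders lines by the second whitespace-separated field, numerically in general notation and descending, and then keeps the first 10 lines. Below is a Python sketch of the same ranking. It assumes each line of the `.importance4` file is a `feature: score` pair, which is inferred from the output pasted above. Ties may come out in a different order than GNU `sort` produces.

```python
def top_importances(lines, n=10):
    """Parse 'feature: score' lines and return the n highest-scoring (feature, score) pairs."""
    parsed = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        parsed.append((fields[0].rstrip(":"), float(fields[1])))
    parsed.sort(key=lambda item: item[1], reverse=True)
    return parsed[:n]
```

Usage would be `with open(path) as fh: top_importances(fh)`.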

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.cell.lines_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4950757
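R's `cor()` defaults to the Pearson correlation. Below is a dependency-free Python sketch of the same check on the held-out set. It reads the observed and predicted columns from the two tab-separated files.

`read.delim` sanitizes headers with `make.names` by default. The raw header in the `.prediction` file may therefore differ from `Predictions.`, which is why the column name is a parameter here rather than a known value.

```python
import csv
import math


def read_column(path, name):
    """Read one numeric column from a tab-separated file with a header row."""
    with open(path, newline="") as fh:
        return [float(row[name]) for row in csv.DictReader(fh, delimiter="\t")]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # R's cor() returns NA when one input is constant
    return sxy / math.sqrt(sxx * syy)
```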

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HCT116
# 0.09946901052076344
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HCT116_cut.score.importance4 | head
# p17tetramer.Hbond.stackingraw: 2.50685
# V71.xsgRNA.raw: 1.81098
# p18monomer.No.electronsraw: 1.51649
# p18monomer.HLgap.eVraw: 1.36099
# p14tetramer.Hbond.energyraw: 1.33923
# p14trimer.Hlgap.eVEraw: 1.07283
# p18trimer.Hbond.stackingraw: 1.05095
# p11tetramer.Hbond.stackingraw: 0.965189
# p12trimer.Hlgap.eVEraw: 0.890748
# p14trimer.Hbond.energyraw: 0.890312

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HCT116/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HCT116_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3326642


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HEK293T
# 0.07072574111118635
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HEK293T_cut.score.importance4 | head
# p17tetramer.Hbond.energyraw: 1.07532
# GCsgRNA.raw: 0.500497
# p18trimer.Hbond.stackingraw: 0.486274
# gene.distance0: 0.453329
# p17tetramer.Hlgap.eVEraw: 0.418314
# p16tetramer.Hbond.energyraw: 0.415888
# p2tetramer.Hbond.energyraw: 0.296519
# AsgRNA.raw: 0.283171
# p16trimer.Hlgap.eVEraw: 0.263133
# CTsgRNA.raw: 0.255572

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HEK293T/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HEK293T_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.2178619


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HELA
# 0.1073818896135052
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HELA_cut.score.importance4 | head
# V71.xsgRNA.raw: 6.58317
# p17tetramer.Hbond.energyraw: 5.05808
# p5tetramer.Hbond.stackingraw: 2.18043
# p3tetramer.Hlgap.eVEraw: 2.09323
# p16tetramer.Hbond.stackingraw: 1.83042
# p9tetramer.Hbond.stackingraw: 1.74105
# p10tetramer.Hlgap.eVEraw: 1.60667
# p6tetramer.Hlgap.eVEraw: 1.57217
# p16monomer.No.electronsraw: 1.528
# p11tetramer.Hlgap.eVEraw: 1.52691

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HELA/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HELA_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3185771


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HL60
# 0.12180358813347845
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HL60_cut.score.importance4 | head
# p18trimer.Hbond.energyraw: 0.825499
# p18trimer.Hbond.stackingraw: 0.543373
# p20monomer.No.electronsraw: 0.449361
# p7tetramer.Hbond.stackingraw: 0.321361
# p17tetramer.Hbond.energyraw: 0.29536
# p17tetramer.Hlgap.eVEraw: 0.247908
# p14trimer.Hbond.energyraw: 0.230709
# p10tetramer.Hlgap.eVEraw: 0.211915
# p3tetramer.Hlgap.eVEraw: 0.203093
# sgRNA.structuresgRNA.raw: 0.202799

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/iRF.run/HL60/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HL60_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4012635
iRF - classification
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines")
df <- read.delim("Chuai2018.cell.lines.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df[,c(1:5919,5921:6236)]
df.na <- na.omit(df.cut)  # note: df.na is not used below; df.num is built from df.cut, so rows with NA values are kept
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

score <- read.delim("Chuai2018.cell.lines.classification.txt", header=T, sep="\t")
df.score <- inner_join(score[,c(1,3)], df.all[,c(1,3:ncol(df.all))], by=c("sgRNAID"))
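The `inner_join` above keeps only the sgRNAIDs present in both the classification file and the feature table. In the joined table, column 3 of the classification file replaces the regression `cut.score`. Below is a minimal Python sketch of that join on row dicts. It assumes `sgRNAID` is unique in the feature table; `dplyr::inner_join` would emit one row per match if it were not. Column-name clashes are not given the `.x`/`.y` suffixes that dplyr adds.

```python
def inner_join(score_rows, feature_rows, key="sgRNAID"):
    """Keep score rows whose key is also in feature_rows, appending that row's features.

    Output follows the order of score_rows.
    """
    features_by_key = {}
    for row in feature_rows:
        if row[key] in features_by_key:
            raise ValueError(f"duplicate {key}: {row[key]}")
        features_by_key[row[key]] = row
    joined = []
    for row in score_rows:
        match = features_by_key.get(row[key])
        if match is not None:  # drop IDs missing from the feature table
            joined.append({**row, **{k: v for k, v in match.items() if k != key}})
    return joined
```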

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification")
write.table(df.score[,c(1,3:ncol(df.score))], "Chuai2018.cell.lines.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.score[,c(1,3:ncol(df.score))], "Chuai2018.cell.lines.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.score[,3:ncol(df.score)], "Chuai2018.cell.lines.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.score[,1:2], "Chuai2018.cell.lines.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.score[,1:2], "Chuai2018.cell.lines.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.score[,2]), "Chuai2018.cell.lines.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

library(stringr)
df.1 <- df.score %>% filter(str_detect(sgRNAID, "HCT116"))
write.table(df.1[,c(1,3:ncol(df.1))], "Chuai2018.HCT116.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,c(1,3:ncol(df.1))], "Chuai2018.HCT116.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,3:ncol(df.1)], "Chuai2018.HCT116.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,1:2], "Chuai2018.HCT116.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.1[,1:2], "Chuai2018.HCT116.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.1[,2]), "Chuai2018.HCT116.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

df.2 <- df.score %>% filter(str_detect(sgRNAID, "HEK293T"))
write.table(df.2[,c(1,3:ncol(df.2))], "Chuai2018.HEK293T.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,c(1,3:ncol(df.2))], "Chuai2018.HEK293T.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,3:ncol(df.2)], "Chuai2018.HEK293T.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,1:2], "Chuai2018.HEK293T.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.2[,1:2], "Chuai2018.HEK293T.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.2[,2]), "Chuai2018.HEK293T.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

df.3 <- df.score %>% filter(str_detect(sgRNAID, "HELA"))
write.table(df.3[,c(1,3:ncol(df.3))], "Chuai2018.HELA.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,c(1,3:ncol(df.3))], "Chuai2018.HELA.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,3:ncol(df.3)], "Chuai2018.HELA.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,1:2], "Chuai2018.HELA.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.3[,1:2], "Chuai2018.HELA.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.3[,2]), "Chuai2018.HELA.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

df.4 <- df.score %>% filter(str_detect(sgRNAID, "HL60"))
write.table(df.4[,c(1,3:ncol(df.4))], "Chuai2018.HL60.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,c(1,3:ncol(df.4))], "Chuai2018.HL60.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,3:ncol(df.4)], "Chuai2018.HL60.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,1:2], "Chuai2018.HL60.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.4[,1:2], "Chuai2018.HL60.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.4[,2]), "Chuai2018.HL60.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.cell.lines --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.cell.lines.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.cell.lines.finalquantum.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HCT116 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HCT116.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HCT116.finalquantum.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HEK293T --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HEK293T.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HEK293T.finalquantum.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HELA --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HELA.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HELA.finalquantum.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Chuai2018.HL60 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HL60.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/Chuai2018.HL60.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/Submits/submit_full_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116/Submits/submit_full_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T/Submits/submit_full_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA/Submits/submit_full_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60/Submits/submit_full_Chuai2018.HL60_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/Submits/submit_train_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116/Submits/submit_train_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T/Submits/submit_train_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA/Submits/submit_train_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60/Submits/submit_train_Chuai2018.HL60_0.sh
# run the test submissions only after all train jobs have finished
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/Submits/submit_test_Chuai2018.cell.lines_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116/Submits/submit_test_Chuai2018.HCT116_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T/Submits/submit_test_Chuai2018.HEK293T_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA/Submits/submit_test_Chuai2018.HELA_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60/Submits/submit_test_Chuai2018.HL60_0.sh

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.cell.lines
# 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.cell.lines_cut.score.importance4 | head

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.cell.lines_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HCT116
# 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HCT116_cut.score.importance4 | head

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HCT116/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HCT116_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HEK293T
# 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HEK293T_cut.score.importance4 | head

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HEK293T/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HEK293T_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HELA
# 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HELA_cut.score.importance4 | head

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HELA/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HELA_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Chuai2018.HL60
# 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Chuai2018.HL60_cut.score.importance4 | head

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/cell.lines/classification/iRF.run/HL60/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Chuai2018.HL60_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 

Doench & Chuai

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018")
chuai <- read.delim("Chuai2018.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
doench <- read.delim("Doench2014CORRECTED.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
doench.chuai <- rbind(doench, chuai)
nrow(doench)
# 1277
nrow(chuai)
# 16748
ncol(doench.chuai)
# 6235
nrow(doench.chuai)
# 17421
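For data frames, `rbind` matches columns by name and returns `nrow(doench) + nrow(chuai)` rows. The counts recorded above do not add up: 1277 + 16748 = 18025, not 17421. That is worth re-checking before the combined files are used.

Below is a small Python sketch of the same stacking on `(header, rows)` tables, with the column-name check made explicit. The `rbind` helper here is hypothetical.

```python
def rbind(first, second):
    """Stack two (header, rows) tables, reordering the second table's columns by name."""
    header1, rows1 = first
    header2, rows2 = second
    if sorted(header1) != sorted(header2):
        raise ValueError("column names differ between the two tables")
    order = [header2.index(name) for name in header1]  # map to the first table's layout
    combined = rows1 + [[row[i] for i in order] for row in rows2]
    return header1, combined
```

After stacking, checking `len(combined) == len(rows1) + len(rows2)` is the equivalent of comparing the three `nrow` values above.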

write.table(doench.chuai, "Doench2014CORRECTED.Chuai2018.finalquantum.txt", quote=F, row.names=F, sep="\t")

write.table(doench.chuai[,c(1,3:ncol(doench.chuai))], "Doench2014CORRECTED.Chuai2018.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(doench.chuai[,c(1,3:ncol(doench.chuai))], "Doench2014CORRECTED.Chuai2018.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(doench.chuai[,3:ncol(doench.chuai)], "Doench2014CORRECTED.Chuai2018.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(doench.chuai[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(doench.chuai[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = doench.chuai[,2]), "Doench2014CORRECTED.Chuai2018.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName Doench2014CORRECTED.Chuai2018 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.Chuai2018.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/Doench2014CORRECTED.Chuai2018.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/Submits/submit_full_Doench2014CORRECTED.Chuai2018_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/Submits/submit_train_Doench2014CORRECTED.Chuai2018_0.sh
# once the train jobs have finished, submit the test jobs
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/Submits/submit_test_Doench2014CORRECTED.Chuai2018_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt
# YNames.txt contains the response name: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/YNames.txt Doench2014CORRECTED.Chuai2018
# 0.2116713321128397

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/Doench2014CORRECTED.Chuai2018_cut.score.importance4 | head
# p18monomer.HLgap.eVraw: 5.89143
# p18monomer.No.electronsraw: 5.39423
# p17tetramer.Hbond.energyraw: 4.44324
# gene.distance0: 3.76072
# p20monomer.No.electronsraw: 3.65848
# p1tetramer.Hbond.stackingraw: 3.53852
# p5tetramer.Hbond.stackingraw: 3.51513
# p13tetramer.Hbond.stackingraw: 3.49967
# p11tetramer.Hlgap.eVEraw: 3.42702
# p14tetramer.Hbond.energyraw: 3.33662

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("Doench2014CORRECTED.Chuai2018_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4964907
# scatter plots
setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/human")
pred <- read.delim("Doench2014CORRECTED.Chuai2018_Set4_test.prediction", header=T, sep="\t", stringsAsFactors = F)
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t", stringsAsFactors = F)

pred.y <- cbind(pred, y)
pred.y$row_num <- seq.int(nrow(pred.y)) 
colnames(pred.y) <- c("pred", "yvec", "id")

library(ggplot2)
ggplot(pred.y, aes(x=yvec, y=pred)) + geom_point(stat="identity") + geom_smooth(method='lm') + theme_classic()
cor(pred.y$yvec, pred.y$pred)
# 0.4964907

library(dplyr)
pred.y.rank <- pred.y %>% mutate(yvec.rank=dense_rank(yvec), pred.rank=dense_rank(pred))
ggplot(pred.y.rank, aes(x=yvec.rank, y=pred.rank)) + geom_point(stat="identity") + geom_smooth(method='lm') + theme_classic()
cor(pred.y.rank$yvec.rank, pred.y.rank$pred.rank)
# 0.4823223
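The rank correlation above is Pearson's r computed on `dense_rank()` values. A minimal base-R sketch on toy vectors (not the study's predictions) shows that Pearson's r on ordinary ranks is exactly Spearman's rho, and that dense ranks give the same value when there are no ties. With tied scores, dense ranks and the average ranks used by `method = "spearman"` differ, so the two numbers can drift apart slightly. `dense_rank_base()` is a hypothetical base-R stand-in for `dplyr::dense_rank()`.

```r
# Toy vectors standing in for observed (yvec) and predicted (pred) scores
yvec <- c(0.10, 0.45, 0.30, 0.80, 0.05, 0.62)
pred <- c(0.20, 0.35, 0.40, 0.70, 0.15, 0.50)

# Spearman's rho computed directly
rho <- cor(yvec, pred, method = "spearman")

# Pearson's r on ranks (rank() gives tied values their average rank)
r_on_ranks <- cor(rank(yvec), rank(pred))

# Hypothetical base-R equivalent of dplyr::dense_rank(): 1, 2, 3, ... with no gaps
dense_rank_base <- function(x) match(x, sort(unique(x)))
r_on_dense <- cor(dense_rank_base(yvec), dense_rank_base(pred))

print(c(spearman = rho, pearson_on_ranks = r_on_ranks, pearson_on_dense = r_on_dense))
```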


### Does the model predict low or high observed scores better?
## Look at the distribution of observed scores and split the test set into low (< 0.25) and high (>= 0.25) cutting efficiency

ggplot(pred.y, aes(x=yvec)) + geom_density() + theme_classic()
pred.y.low <- subset(pred.y, pred.y$yvec < 0.25)
cor(pred.y.low$yvec, pred.y.low$pred)
# 0.2370957
pred.y.high <- subset(pred.y, pred.y$yvec >= 0.25)
cor(pred.y.high$yvec, pred.y.high$pred)
# 0.3081182
### No: both subsets correlate more weakly than the full test set (0.4964907). Next, treat the observed score as binary (low < 0.25, high >= 0.25).
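A lower correlation inside each subset is expected even if the model were equally good across the whole range, because restricting the range of the observed score shrinks Pearson's r. A base-R simulation on synthetic data illustrates this. It assumes bivariate-normal scores with a population correlation of 0.5 and splits them at the median instead of at 0.25.

```r
set.seed(42)
n <- 5000
yvec <- rnorm(n)
pred <- 0.5 * yvec + sqrt(1 - 0.5^2) * rnorm(n)   # population correlation 0.5

threshold <- median(yvec)                         # low/high split point
r_all  <- cor(yvec, pred)
r_low  <- cor(yvec[yvec <  threshold], pred[yvec <  threshold])
r_high <- cor(yvec[yvec >= threshold], pred[yvec >= threshold])
print(round(c(all = r_all, low = r_low, high = r_high), 3))
```

Both subset correlations come out well below the full-range value, although the relationship is identical everywhere.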

pred.y.binary <- pred.y.rank %>% mutate(yvec.binary = ifelse(yvec < 0.25, 0, 1), yvec.label =  ifelse(yvec < 0.25, "low", "high"))
cor(pred.y.binary$yvec.binary, pred.y.binary$pred)
# 0.4339365
ggplot(pred.y.binary, aes(x=yvec.label, y=pred, fill=yvec.label)) + geom_boxplot() + theme_classic()
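Correlating the 0/1 label with the continuous prediction, as above, gives the point-biserial correlation. The base-R sketch below uses assumed toy numbers. It checks that `cor()` on a 0/1 vector equals the closed form r_pb = (M1 - M0) / s_n * sqrt(p * q). Here M1 and M0 are the mean predictions in the high and low groups, s_n is the population standard deviation of the predictions, and p and q are the group proportions. Because of the sqrt(p * q) factor, the value also depends on the class balance.

```r
# Toy observed scores and predictions (illustrative only)
yvec <- c(0.05, 0.12, 0.20, 0.30, 0.41, 0.55, 0.18, 0.90)
pred <- c(0.10, 0.22, 0.15, 0.35, 0.30, 0.50, 0.25, 0.60)

# Binarize at 0.25 as in the notebook: < 0.25 -> 0 (low), otherwise 1 (high)
ybin <- ifelse(yvec < 0.25, 0, 1)

# Point-biserial correlation as Pearson's r on the 0/1 vector
r_cor <- cor(ybin, pred)

# Closed form: difference in group means scaled by the population SD
p   <- mean(ybin == 1)
q   <- 1 - p
m1  <- mean(pred[ybin == 1])
m0  <- mean(pred[ybin == 0])
s_n <- sqrt(mean((pred - mean(pred))^2))   # divides by n, not n - 1
r_pb <- (m1 - m0) / s_n * sqrt(p * q)

print(c(cor = r_cor, closed_form = r_pb))
```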

pred.y.binary <- pred.y.rank %>% mutate(yvec.binary = ifelse(yvec < 0.2, 0, ifelse(yvec > 0.4, 1, 0.5)), yvec.label =  ifelse(yvec < 0.2, "low (< 0.2)", ifelse(yvec > 0.4, "high (> 0.4)", "mid")))
ggplot(pred.y.binary, aes(x=yvec.label, y=pred, fill=yvec.label)) + geom_boxplot() + theme_classic()
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00
# Save the following batch script as cut.score/RIT.run and submit it with the sbatch command below

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score Doench2014CORRECTED.Chuai2018

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score/RIT.run

# Feature    YVec    NormEdge    Effect    Samples    Linearity
# p18monomer.HLgap.eVraw    cut.score   0.024405835175336094    -6.973813582017135e-08  8187.692    0.2760956586167866
# p17tetramer.Hbond.energyraw   cut.score   0.02169399861124001 1.5233381891088458e-08  9650.78 0.22856400907187682
# p18monomer.No.electronsraw    cut.score   0.021145261282387834    -6.358427541271254e-08  7186.54 0.2823827475128112
# p5tetramer.Hbond.stackingraw  cut.score   0.016134459563739226    6.340779483919579e-09   3446.598    0.24080745334174244
# p3tetramer.Hlgap.eVEraw   cut.score   0.015529620296296136    2.7170416771668338e-08  3525.94 0.23787769424407473
# gene.distance0    cut.score   0.015285783299130412    2.7145958692695434e-09  4248.845    0.2148993919163693
# p11tetramer.Hlgap.eVEraw  cut.score   0.01502393955729636 8.05942291919635e-11    3311.552    0.22581979882306788
# p8tetramer.Hbond.stackingraw  cut.score   0.014360361525453497    5.2422111591566174e-09  2841.643    0.2374983709116446
# p20monomer.No.electronsraw    cut.score   0.014346761055492067    4.712984943453509e-08   7186.305    0.21207099138346416
# p13tetramer.Hbond.stackingraw cut.score   0.01425665794199756 -6.537779000631687e-09  3322.548    0.2333450315725802
Figures
library(ggplot2)
library(reshape2)
library(RColorBrewer)
library(dplyr)   # %>% and mutate() are used below

## Main H. sapiens feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score")
imp <- read.delim("Doench2014CORRECTED.Chuai2018.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
imp.dir.top20.df$Feature.Label <- c("Monomer HL-gap pos18", "Tetramer H-bond pos17", "Monomer # of Electrons pos18", "Tetramer H-stacking pos5", "Tetramer HL-gap pos3", "Distance to Gene", "Tetramer HL-gap pos11", "Tetramer H-stacking pos8", "Monomer # of Electrons pos20", "Tetramer H-stacking pos13", "Tetramer H-stacking pos1", "Tetramer HL-gap pos17", "Tetramer H-stacking pos9", "Tetramer H-stacking pos11", "Tetramer H-bond pos14", "Tetramer HL-gap pos1", "Tetramer HL-gap pos13", "Tetramer HL-gap pos8", "Tetramer HL-gap pos7", "Tetramer H-stacking pos17")

library(ggplot2)
pdf("DoenchChuai.FeatureEngineering.pdf")
# reorder by -importance: after coord_flip() the most important feature is drawn at the bottom
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, -Normalized.Importance), y=imp.dir, color=Effect.Direction)) +
  geom_point(size=3) +
  geom_segment(aes(xend=reorder(Feature.Label, -Normalized.Importance), y=0, yend=imp.dir)) +
  labs(title="H. sapiens Top Features") + ylab("Normalized Importance") + xlab("") +
  scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()

library(ggplot2)
pdf("DoenchChuai.FeatureEngineering.31May.pdf")
# reorder by +importance: after coord_flip() the most important feature is drawn at the top
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, Normalized.Importance), y=imp.dir, color=Effect.Direction)) +
  geom_point(size=3) +
  geom_segment(aes(xend=reorder(Feature.Label, Normalized.Importance), y=0, yend=imp.dir)) +
  labs(title="H. sapiens Top Features") + ylab("Normalized Importance") + xlab("") +
  scale_color_brewer(palette="Set1") + theme_classic() + coord_flip()
dev.off()
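Typing the 20 feature labels by hand ties them to the current row order and invites typos. A sketch of deriving the labels from the feature names instead is shown below. `make_feature_label()` and its lookup tables are hypothetical. Their display strings are copied from the hand-typed labels above, and the regular expression assumes the `p<position><unit>.<property>raw` naming seen in the importance output.

```r
# Hypothetical lookup tables; display names copied from the hand-typed labels
property_labels <- c(
  "hlgap.ev"       = "HL-gap",
  "hlgap.eve"      = "HL-gap",
  "hbond.energy"   = "H-bond",
  "hbond.stacking" = "H-stacking",
  "no.electrons"   = "# of Electrons"
)
other_labels <- c("gene.distance0" = "Distance to Gene")

make_feature_label <- function(feature) {
  # p<pos><unit>.<property>raw, e.g. p18monomer.HLgap.eVraw
  pattern <- "^p([0-9]+)(monomer|basepair|dimer|trimer|tetramer)\\.(.+)raw$"
  vapply(feature, function(f) {
    if (f %in% names(other_labels)) return(unname(other_labels[f]))
    if (!grepl(pattern, f)) return(f)              # unknown features stay as-is
    pos   <- sub(pattern, "\\1", f)
    unit  <- sub(pattern, "\\2", f)
    prop  <- tolower(sub(pattern, "\\3", f))       # HLgap and Hlgap map together
    label <- property_labels[prop]
    if (is.na(label)) return(f)
    paste0(toupper(substr(unit, 1, 1)), substr(unit, 2, nchar(unit)),
           " ", unname(label), " pos", pos)
  }, character(1), USE.NAMES = FALSE)
}

feats <- c("p18monomer.HLgap.eVraw", "p17tetramer.Hbond.energyraw",
           "p18monomer.No.electronsraw", "p5tetramer.Hbond.stackingraw",
           "p3tetramer.Hlgap.eVEraw", "gene.distance0")
print(make_feature_label(feats))
```

To apply it to the plot data, assign the result to the label column: `imp.dir.top20.df$Feature.Label <- make_feature_label(imp.dir.top20.df$Feature)`. The labels then stay aligned with their rows if the ranking changes.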




#### Figure S3: Focus on effect size
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Chuai2018/iRF.run/Doench2014CORRECTED.Chuai2018/cut.score")
imp <- read.delim("Doench2014CORRECTED.Chuai2018.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir$absEffect <- abs(imp.dir$Feature.Effect)
imp.dir.effectsorted <- imp.dir[order(imp.dir$absEffect, decreasing = TRUE),]
imp.dir.effectsorted.top20 <- imp.dir.effectsorted[1:20,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("DoenchChuai.Top20Effect.Effect.30March.pdf")
imp.dir.effectsorted.top20$Feature.Label <- c("CTG pos2", "TCC pos15", "CC pos19", "AC pos1", "CACC pos12", "TGCA pos3", "AGAG pos10", "GATC pos1", "CACC pos7", "GCA pos1", "TCAG pos7", "GAC pos7", "ATGT pos2", "CAC pos5", "CAAT pos12", "CCTA pos9", "CACC pos2", "CTCC pos11", "GATG pos1", "GTAC pos13")
ggplot(imp.dir.effectsorted.top20) + geom_point(aes(x=reorder(Feature.Label, -absEffect), y=absEffect, color=Effect.Direction, size=Normalized.Importance)) + xlab("") + ylab("abs(Effect Size)") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()
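`imp.dir.effectsorted[1:20,]` pads the result with NA rows if a run reports fewer than 20 features. The sketch below performs the same "sort by absolute effect, keep the top n" step with `head()`, which simply returns fewer rows in that case. `top_by_abs_effect()` is a hypothetical helper, and the toy table only reuses the notebook's column names; its values are made up.

```r
# Toy importance table using the notebook's column names (values are made up)
imp <- data.frame(
  Feature = c("f1", "f2", "f3", "f4"),
  Normalized.Importance = c(0.02, 0.01, 0.03, 0.005),
  Feature.Effect = c(-4e-8, 1e-7, 2e-9, -3e-7),
  stringsAsFactors = FALSE
)

top_by_abs_effect <- function(df, n = 20) {
  df$absEffect <- abs(df$Feature.Effect)
  df$Effect.Direction <- ifelse(df$Feature.Effect < 0, "neg",
                         ifelse(df$Feature.Effect > 0, "pos", "zero"))
  ranked <- df[order(df$absEffect, decreasing = TRUE), ]
  head(ranked, n)   # returns fewer rows, not NA rows, if nrow(df) < n
}

top2 <- top_by_abs_effect(imp, n = 2)
print(top2)
```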

Classification

  • binarize the cut score at three fixed cutoffs, named q1-q3 (the cutoffs are applied to the score itself; they are not data quantiles)
    • q1: cut.score < 0.25 → 0, otherwise 1
    • q2: cut.score < 0.50 → 0, otherwise 1
    • q3: cut.score < 0.75 → 0, otherwise 1
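All three binary labels can be built in one call and their class balance checked before training. `binarize_scores()` is a hypothetical helper that reproduces the `ifelse(cut.score < t, 0, 1)` rule used below for each cutoff. The cutoffs 0.25/0.50/0.75 match quartiles only if the score distribution happens to place them there. If data quartiles were intended, `quantile(score, c(0.25, 0.5, 0.75))` could be passed as `thresholds` instead.

```r
# Hypothetical helper: one 0/1 column per cutoff; scores below the cutoff -> 0
binarize_scores <- function(score, thresholds = c(q1 = 0.25, q2 = 0.50, q3 = 0.75)) {
  out <- lapply(thresholds, function(t) as.integer(score >= t))
  as.data.frame(out)
}

scores <- c(0.10, 0.25, 0.49, 0.50, 0.80)
labels <- binarize_scores(scores)
print(labels)
# share of "high" (1) labels per cutoff; a very unbalanced split is harder to learn
print(colMeans(labels))
```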
# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification
  
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/")
features <- read.delim("Doench2014CORRECTED.Chuai2018.finalquantum.features.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("Doench2014CORRECTED.Chuai2018.finalquantum.score.txt", header=T, sep="\t", stringsAsFactors = F)
summary(score$cut.score)

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification")
score.q1 <- score %>% mutate(cut.score = ifelse(cut.score < 0.25, 0, 1))
score.q2 <- score %>% mutate(cut.score = ifelse(cut.score < 0.50, 0, 1))
score.q3 <- score %>% mutate(cut.score = ifelse(cut.score < 0.75, 0, 1))

feature.score.q1 <- left_join(score.q1, features, by="sgRNAID")
write.table(feature.score.q1[,2:ncol(feature.score.q1)], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.iRFmatrix.tsv", quote=F, row.names=F, sep="\t")
feature.score.q2 <- left_join(score.q2, features, by="sgRNAID")
write.table(feature.score.q2[,2:ncol(feature.score.q2)], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.iRFmatrix.tsv", quote=F, row.names=F, sep="\t")
feature.score.q3 <- left_join(score.q3, features, by="sgRNAID")
write.table(feature.score.q3[,2:ncol(feature.score.q3)], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.iRFmatrix.tsv", quote=F, row.names=F, sep="\t")

write.table(feature.score.q1[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q1[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q1[,2]), "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q2[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q2[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q2[,2]), "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q3[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.score.txt", quote=F, row.names=F, sep="\t")
write.table(feature.score.q3[,1:2], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = feature.score.q3[,2]), "Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(features, "Doench2014CORRECTED.Chuai2018.finalquantum.classify.features.txt", quote=F, row.names=F, sep="\t")
write.table(features, "Doench2014CORRECTED.Chuai2018.finalquantum.classify.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(features[,2:ncol(features)], "Doench2014CORRECTED.Chuai2018.finalquantum.classify.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
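Every iRF input here is written three times: once with IDs, once more with IDs under the `_overlap` name, and once without the ID column under `_overlap_noSampleIDs`. A small helper would keep that convention in one place. `write_irf_triplet()` is hypothetical and assumes the ID is the first column. With `suffix = "score"` it produces the score-file names used above.

```r
# Hypothetical helper mirroring the notebook's iRF file convention:
#   <prefix>.<suffix>.txt and <prefix>.<suffix>_overlap.txt keep the ID column,
#   <prefix>.<suffix>_overlap_noSampleIDs.txt drops it
write_irf_triplet <- function(df, prefix, suffix = "features", id_col = 1) {
  files <- c(
    paste0(prefix, ".", suffix, ".txt"),
    paste0(prefix, ".", suffix, "_overlap.txt"),
    paste0(prefix, ".", suffix, "_overlap_noSampleIDs.txt")
  )
  write.table(df, files[1], quote = FALSE, row.names = FALSE, sep = "\t")
  write.table(df, files[2], quote = FALSE, row.names = FALSE, sep = "\t")
  write.table(df[, -id_col, drop = FALSE], files[3],
              quote = FALSE, row.names = FALSE, sep = "\t")
  invisible(files)
}

# Usage on toy data, written to a temporary directory
toy <- data.frame(sgRNAID = c("a", "b"), gc = c(0.4, 0.6), tm = c(55, 60),
                  stringsAsFactors = FALSE)
out <- write_irf_triplet(toy, file.path(tempdir(), "toy"))
no_ids <- read.delim(out[3], header = TRUE, sep = "\t")
print(colnames(no_ids))
```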

iRF

module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q1 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.q1.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q2 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.q2.score.txt
mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName classify.q3 --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/Doench2014CORRECTED.Chuai2018.finalquantum.classify.q3.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF/Submits/submit_full_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF/Submits/submit_full_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF/Submits/submit_full_classify.q3_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF/Submits/submit_train_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF/Submits/submit_train_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF/Submits/submit_train_classify.q3_0.sh
# once the train jobs have finished, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF/Submits/submit_test_classify.q1_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF/Submits/submit_test_classify.q2_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF/Submits/submit_test_classify.q3_0.sh

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/YNames.txt classify.q1
# 0.1941056538650136
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q1_cut.score.importance4 | head
# p16tetramer.Hbond.stackingraw: 31.572
# p10tetramer.Hlgap.eVEraw: 29.6904
# p1tetramer.Hlgap.eVEraw: 29.2804
# p14tetramer.Hbond.energyraw: 28.0641
# p1tetramer.Hbond.stackingraw: 27.6199
# p18trimer.Hbond.stackingraw: 27.5189
# p5tetramer.Hbond.stackingraw: 27.0375
# p7tetramer.Hlgap.eVEraw: 26.6611
# p11tetramer.Hlgap.eVEraw: 26.4983
# p6tetramer.Hlgap.eVEraw: 25.6329

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q1.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q1_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.4893618

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/YNames.txt classify.q2
# 0.1337298835000708
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q2_cut.score.importance4 | head
# gene.distance0: 15.7001
# p8tetramer.Hbond.stackingraw: 12.7385
# p5tetramer.Hbond.stackingraw: 11.9665
# p1tetramer.Hbond.stackingraw: 11.7926
# p11tetramer.Hbond.stackingraw: 11.3131
# p13tetramer.Hbond.stackingraw: 9.38647
# p9tetramer.Hbond.stackingraw: 9.28884
# p1tetramer.Hlgap.eVEraw: 9.09503
# p8tetramer.Hlgap.eVEraw: 8.49876
# p11tetramer.Hlgap.eVEraw: 8.14378

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q2.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q2_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.3747989


cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/iRF.run/YNames.txt classify.q3
# -0.011371164612896233
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/classify.q3_cut.score.importance4 | head
# gene.distance0: 3.33147
# p8tetramer.Hbond.stackingraw: 1.53449
# p3tetramer.Hbond.stackingraw: 1.2385
# p1tetramer.Hbond.stackingraw: 1.18453
# p8tetramer.Hlgap.eVEraw: 1.16985
# p13tetramer.Hbond.stackingraw: 0.786996
# p9tetramer.Hbond.stackingraw: 0.637593
# p1tetramer.Hlgap.eVEraw: 0.622084
# p17tetramer.Hbond.stackingraw: 0.619632
# p10tetramer.Hbond.stackingraw: 0.617121

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED/binary.classification/q3.iRF/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("classify.q3_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.1548719
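The q1-q3 runs above are scored by Pearson's r between the 0/1 label and the prediction. That value depends on the class balance, which changes from one cutoff to the next. A threshold-free complement is ROC AUC, which is not part of the iRF post-processing output. `auc_mw()` is a hypothetical base-R helper that computes it from the Mann-Whitney rank statistic, with ties receiving average ranks. Its inputs would be the `cut.score` and `Predictions.` columns read above.

```r
# ROC AUC from the Mann-Whitney U statistic: probability that a random
# positive (label 1) is scored above a random negative (label 0)
auc_mw <- function(label, score) {
  stopifnot(all(label %in% c(0, 1)))
  r  <- rank(score)                     # average ranks for ties
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

label <- c(0, 0, 1, 1, 0, 1)
score <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.70)
print(auc_mw(label, score))
```

Here 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9.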

Combine matrices

Train multi-species model

→ Train one model on the combined human (Doench 2014), Y. lipolytica and E. coli data, then test its predictions on each dataset separately.

# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species

######################## Normalize cut.score within each dataset before combining ########################
# min-max scaling to a 0-1 range: z_i = (x_i - min(x)) / (max(x) - min(x))
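The per-dataset scaling applied below can be written once as a function. `min_max()` is a hypothetical helper. It uses `na.rm = TRUE` because a plain `min()` returns NA, and so turns every scaled score into NA, if any score is missing. It also stops on a constant vector, where the denominator would be zero.

```r
# Min-max scaling to [0, 1]: z_i = (x_i - min(x)) / (max(x) - min(x))
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[2] == rng[1]) {
    stop("min_max(): x is constant, scaling is undefined")
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Each dataset is scaled on its own before the rows are combined
human <- c(2, 4, 6, 10)
ecoli <- c(-1, 0, 3)
print(min_max(human))
print(min_max(ecoli))
```

Min-max scaling aligns only the endpoints of each dataset. The per-dataset summaries recorded below still have different medians (about 0.11, 0.29 and 0.56).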

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
doench$cut.score <- doench$cut.score.x
doench.cut <- doench[,c(1,1657, 3:1654, 1656)]
ncol(doench.cut)
# 1655
nrow(doench.cut)
# 1825
doench.id <- separate(doench.cut, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
summary(doench.num$cut.score)
#    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 0.00000 0.04388 0.11479 0.19639 0.28086 1.00000 

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
lipolytica <- lipolytica[,c(1,1656,3:1649,1651:1655,1657)]
ncol(lipolytica)
# 1655
nrow(lipolytica)
# 45271
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
summary(lipolytica.num$cut.score)
 #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 # 0.0000  0.2167  0.2877  0.3389  0.4460  1.0000 


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.allCas9.raw.onehot.tensor.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.sep <- ecoli %>% separate(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
ecoli.cas9 <- subset(ecoli.sep, ecoli.sep$type == "Cas9")
ecoli <- ecoli.cas9[,c(1:3,1658,5:1651,1653:1657,1659)]
ecoli <- ecoli %>% unite(sgRNAID, c("sgRNA", "ID", "type"), sep="_")
ncol(ecoli)
# 1655
nrow(ecoli)
# 40468
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
 #   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 # 0.0000  0.3563  0.5618  0.5077  0.6757  1.0000 
ecoli.num.sample <- ecoli.num[sample(nrow(ecoli.num), 1000), ]

all <- rbind(doench.num, lipolytica.num, ecoli.num)
ncol(all)
# 1655
nrow(all)
# 87564

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species")
write.table(all, "doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.id.score.txt", quote=F, row.names=F, sep="\t")

write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.noDWT.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.noDWT.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.baisya.ecoli.noDWT.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


doench.num.sample <- doench.num[sample(nrow(doench.num), 1000), ]
doench.num.sample$sgRNAID <- paste0("doench_", doench.num.sample$sgRNAID)
lipolytica.num.sample <- lipolytica.num[sample(nrow(lipolytica.num), 1000), ]
lipolytica.num.sample$sgRNAID <- paste0("lipolytica_", lipolytica.num.sample$sgRNAID)
ecoli.num.sample <- ecoli.num[sample(nrow(ecoli.num), 1000), ]
ecoli.num.sample$sgRNAID <- paste0("ecoli_", ecoli.num.sample$sgRNAID)

all <- rbind(doench.num.sample, lipolytica.num.sample, ecoli.num.sample)
ncol(all)
# 1655
nrow(all)
# 3000
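The `sample()` calls above run without `set.seed()`, so a new session draws a different 3,000-row subset. The sketch below shows reproducible per-dataset sampling with ID prefixes. `sample_with_prefix()` is a hypothetical helper, the seed value is arbitrary, and the toy data frames stand in for `doench.num` and the other matrices.

```r
# Hypothetical helper: draw up to n rows from one dataset and prefix its IDs
sample_with_prefix <- function(df, n, prefix, id_col = "sgRNAID") {
  n <- min(n, nrow(df))                      # never ask for more rows than exist
  out <- df[sample(nrow(df), n), , drop = FALSE]
  out[[id_col]] <- paste0(prefix, "_", out[[id_col]])
  out
}

set.seed(1)   # fixes which rows are drawn, so the subset can be rebuilt later
a <- data.frame(sgRNAID = paste0("g", 1:10), cut.score = runif(10), stringsAsFactors = FALSE)
b <- data.frame(sgRNAID = paste0("g", 1:4),  cut.score = runif(4),  stringsAsFactors = FALSE)
combined <- rbind(sample_with_prefix(a, 3, "doench"),
                  sample_with_prefix(b, 3, "ecoli"))
print(combined)
```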

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species")
write.table(all, "sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.id.score.txt", quote=F, row.names=F, sep="\t")

write.table(all[,c(1,3:ncol(all))], "sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "sample.doench.baisya.ecoli.noDWT.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "sample.doench.baisya.ecoli.noDWT.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "sample.doench.baisya.ecoli.noDWT.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")



# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.noDWT.score.txt

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName sample.doench.baisya.ecoli.noDWT --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/sample.doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/sample.doench.baisya.ecoli.noDWT.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_full_doench.baisya.ecoli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample/Submits/submit_full_sample.doench.baisya.ecoli.noDWT_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_train_doench.baisya.ecoli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample/Submits/submit_train_sample.doench.baisya.ecoli.noDWT_0.sh
# once the train jobs have finished, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_test_doench.baisya.ecoli.noDWT_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample/Submits/submit_test_sample.doench.baisya.ecoli.noDWT_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt
# YNames.txt contains the response name: cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.baisya.ecoli.noDWT
# 0.3350460737657166
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw  cut.score   0.33219874783701125
# TTsgRNA.raw   cut.score   0.02721536407114881
# pam.distance0 cut.score   0.025521266163737327
# p20homo_lumo_energygapraw cut.score   0.023461680283060747
# GGsgRNA.raw   cut.score   0.019885541473588488
# CCsgRNA.raw   cut.score   0.018555923802205367
# gene.distance0    cut.score   0.017976016828707926
# sgRNA.gcsgRNA.raw cut.score   0.016818264162424525
# sgRNA.tempsgRNA.raw   cut.score   0.016453280057777794
# p20xz_quadrupoleraw   cut.score   0.01605316256855977
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.noDWT_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 572.797
# TTsgRNA.raw: 47.8516
# pam.distance0: 45.4937
# GGsgRNA.raw: 35.4175
# p20yz_quadrupoleraw: 35.102
# p20xz_quadrupoleraw: 35.0073
# CCsgRNA.raw: 31.7809
# gene.distance0: 31.2655
# sgRNA.tempsgRNA.raw: 30.1631
# sgRNA.gcsgRNA.raw: 28.0852

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5848896

cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5577428
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"

id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("sgRNA", "ID", "group"), "_")
pred.Cas9 <- subset(id.pred.y.group, id.pred.y.group$group == "Cas9")
cor(pred.Cas9$cut.score, pred.Cas9$Predictions., method=c("pearson"))
# 0.4994605
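The `cor()` calls above pair observed `cut.score` values with `Predictions.` by row position. They therefore rely on the prediction file and the Y file being in the same order.

The sketch below computes the same two statistics in Python under that assumption. `pearson_spearman` is an illustrative name and is not part of the pipeline. Spearman's rho is the Pearson correlation of the ranks, with tied values given their average rank; R's `method="spearman"` ranks the same way.

```python
import numpy as np
import pandas as pd


def pearson_spearman(observed, predicted):
    """Return (Pearson r, Spearman rho) for two equal-length sequences.

    Values are paired by position, like cor(y$cut.score, pred$Predictions.).
    """
    obs = pd.Series(np.asarray(observed, dtype=float))
    pred = pd.Series(np.asarray(predicted, dtype=float))
    if len(obs) != len(pred):
        raise ValueError("observed and predicted must have the same length")
    pearson = obs.corr(pred)  # Series.corr defaults to Pearson
    # Spearman: Pearson correlation of average ranks (ties share a rank)
    spearman = obs.rank().corr(pred.rank())
    return float(pearson), float(spearman)
```

A monotone but non-linear relationship, such as `pearson_spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])`, gives a Spearman rho of 1 and a Pearson r slightly below 1.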




cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt sample.doench.baisya.ecoli.noDWT
# 0.4030488131364838
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw  cut.score   0.3805553226209416
# PAM.A0    cut.score   0.07049413297499693
# GGsgRNA.raw   cut.score   0.04075023565524509
# gene.distance0    cut.score   0.021694200998125943
# pam.distance0 cut.score   0.021492929092492872
# PAM.T0    cut.score   0.021038666169178367
# CGsgRNA.raw   cut.score   0.017749487963395524
# PAM.G0    cut.score   0.015037086014790793
# sgRNA.tempsgRNA.raw   cut.score   0.014170446749402476
# sgRNA.gcsgRNA.raw cut.score   0.011947409859095641

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/sample.doench.baisya.ecoli.noDWT_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 32.0997
# PAM.A0: 6.05361
# GGsgRNA.raw: 3.06696
# PAM.G0: 1.77555
# PAM.T0: 1.71547
# pam.distance0: 1.69503
# gene.distance0: 1.6383
# CGsgRNA.raw: 1.35543
# GsgRNA.raw: 1.25625
# TsgRNA.raw: 0.960653


# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/sample/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("sample.doench.baisya.ecoli.noDWT_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6125933
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.6135849
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"

id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("group", "ID"), "_")
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$group == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.3820894
pred.doench <- subset(id.pred.y.group, id.pred.y.group$group == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.5427935
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$group == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.09821948
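The per-species correlations above split `sgRNAID` on `_` and then subset one species at a time. A compact Python sketch does the same with a single `groupby`. It assumes IDs of the form `<species>_<original ID>`, as created by the `paste0("ecoli_", ...)` step in the 18 January section.

Only the text before the first underscore is treated as the species, so original IDs that contain underscores are not truncated. `per_species_pearson` is a hypothetical helper.

```python
import numpy as np
import pandas as pd


def per_species_pearson(sample_ids, observed, predicted, sep="_"):
    """Pearson r of observed vs predicted for each species prefix of the IDs.

    Assumes IDs shaped like '<species><sep><original ID>'; only the text before
    the first separator is used. Species with fewer than two rows get NaN.
    """
    df = pd.DataFrame({
        "species": [str(s).split(sep, 1)[0] for s in sample_ids],
        "observed": np.asarray(observed, dtype=float),
        "predicted": np.asarray(predicted, dtype=float),
    })
    result = {}
    for species, sub in df.groupby("species", sort=True):
        if len(sub) < 2:
            result[species] = float("nan")
        else:
            result[species] = float(sub["observed"].corr(sub["predicted"]))
    return result
```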
RIT
  • two outputs: effect size and direction
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score doench.baisya.ecoli.noDWT

# python /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/ritEval.py doench.baisya.ecoli.noDWT_cut.score.importance4 cut.score

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/RIT.run

sort -k3rg doench.baisya.ecoli.noDWT_cut.score.importance4.effect | head
# Feature   YVec    NormEdge    FeatureEffect   Samples Linearity
# sgRNA.structuresgRNA.raw  cut.score   0.33219874783701114 1.5015673448633625e-06  171364.954  0.3852079008244159
# TTsgRNA.raw   cut.score   0.027215364071148804    2.7005144780116256e-06  91778.339   0.3869038711036754
# pam.distance0 cut.score   0.02552126616373732 -1.7832494531292952e-06 39195.414   0.3945925333338561
# p20homo_lumo_energygapraw cut.score   0.02346168028306074 2.4595695339546074e-06  20252.915   0.4824571403940234
# GGsgRNA.raw   cut.score   0.01988554147358848 -1.0526693263682948e-06 33003.564   0.45044545253435986
# CCsgRNA.raw   cut.score   0.018555923802205363    -1.8442117417059407e-06 37633.404   0.4250683975531926
# gene.distance0    cut.score   0.017976016828707923    -1.7360295260056e-07    83554.548   0.321049884310981
# sgRNA.gcsgRNA.raw cut.score   0.016818264162424518    -1.3087550076601961e-06 35681.872   0.4454515249372095
# sgRNA.tempsgRNA.raw   cut.score   0.01645328005777779 -1.5365531662233492e-06 35027.959   0.4443609858962564
# p20xz_quadrupoleraw   cut.score   0.016053162568559768    -2.058420202656411e-06  16640.759   0.4722778763484967
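The `.effect` table above has a signed `FeatureEffect` column next to `NormEdge`. This sketch reads the sign as the direction of the effect and the absolute value as its size. That is an interpretation of the two RIT outputs noted above, not documented behavior of `runRIT.sh`.

The sketch also assumes a whitespace-delimited file whose header matches the one shown. Under those assumptions, a hypothetical helper could annotate the table like this:

```python
import io

import pandas as pd


def effect_direction(table_text):
    """Add EffectSize (|FeatureEffect|) and Direction (sign) columns to a RIT effect table.

    Assumes a whitespace-delimited table with 'Feature', 'NormEdge' and
    'FeatureEffect' columns; rows are returned sorted by NormEdge, largest first.
    """
    df = pd.read_csv(io.StringIO(table_text), sep=r"\s+")
    df["EffectSize"] = df["FeatureEffect"].abs()
    df["Direction"] = df["FeatureEffect"].apply(
        lambda v: "positive" if v > 0 else ("negative" if v < 0 else "none"))
    return df.sort_values("NormEdge", ascending=False)
```

To use it on the real file, pass `open(path).read()`.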
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species

# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.noDWT.raw.onehot.tensor.pam.location.features.id.score.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])

# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)

import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False stops shap from calling plt.show(), so the figure is still intact when it is saved
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
plt.close(f)

f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
plt.close(f)


# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
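The bar summary plot ranks features by their mean absolute SHAP value across samples. A text version of that ranking can be kept alongside the PNG. The sketch below computes it with NumPy only.

It assumes `shap_values` is the 2-D array that `TreeExplainer(model).shap_values(X_train)` returns for this single-output regressor. `mean_abs_shap` is a hypothetical helper.

```python
import numpy as np


def mean_abs_shap(shap_values, feature_names, top=10):
    """Rank features by mean |SHAP value| across samples (the bar-plot quantity).

    shap_values: array of shape (n_samples, n_features) for a single-output model.
    """
    values = np.asarray(shap_values, dtype=float)
    if values.ndim != 2 or values.shape[1] != len(feature_names):
        raise ValueError("expected an (n_samples, n_features) array matching feature_names")
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(importance)[::-1][:top]  # largest mean |SHAP| first
    return [(feature_names[i], float(importance[i])) for i in order]
```

In the session above, this would be called as `mean_abs_shap(shap_values, list(X_train.columns))`.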

test on e.coli

### Summit

#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J multi.ecoli_0
#BSUB -o multi.ecoli_0.o%J
#BSUB -e multi.ecoli_0.e%J

#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9

#/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold0/Runs/Set0/doench.baisya.ecoli.noDWT_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix multi.ecoli --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/ > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/multi.ecoli_test.o

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/e.coli.cas9.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.noDWT_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix multi.ecoli --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/ > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9/multi.ecoli_test4.o

# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/multi.ecoli_test_submit.sh

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
score <- read.delim("e.coli.cas9.score_overlap_noSampleIDs.txt", header=T, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/ecoli.cas9")
predict <- read.delim("multi.ecoli.prediction", header=T, sep="\t")

score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.7084526

pdf("multi.ecoli.prediction.scatter.pdf")
library(ggplot2)
ggplot(score.predict, aes(x=cut.score, y=Predictions.)) + geom_point() + theme_classic()
dev.off()
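The cross-species check above reports only Pearson r, while the iRF post-processing step is run with `--PredAccuracy MAE,MAEA,MSE,R2`. Below is a sketch of the standard MAE, MSE and R² definitions for the same observed/predicted pair. Whether `iRF_postProcessing.py` uses exactly these formulas is an assumption. MAEA is left out because its definition is not given here.

R² here is 1 - SS_res/SS_tot. Unlike r², it falls when predictions track the truth but are systematically offset. That can happen when cut scores from different datasets sit on different scales.

```python
import numpy as np


def regression_metrics(observed, predicted):
    """MAE, MSE, R^2 (1 - SS_res / SS_tot) and Pearson r for paired arrays."""
    y = np.asarray(observed, dtype=float)
    yhat = np.asarray(predicted, dtype=float)
    if y.shape != yhat.shape:
        raise ValueError("observed and predicted must have the same shape")
    resid = y - yhat
    mae = float(np.mean(np.abs(resid)))
    mse = float(np.mean(resid ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - float(np.sum(resid ** 2)) / ss_tot if ss_tot > 0 else float("nan")
    r = float(np.corrcoef(y, yhat)[0, 1])
    return {"MAE": mae, "MSE": mse, "R2": r2, "pearson": r}
```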

test on y.lipolytica

### Summit

#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J multi.lipolytica_0
#BSUB -o multi.lipolytica_0.o%J
#BSUB -e multi.lipolytica_0.e%J

#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica/
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/y.lipolytica.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold0/Runs/Set0/doench.baisya.ecoli.noDWT_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix multi.lipolytica --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica/ > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica/multi.lipolytica_test.o

# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/multi.lipolytica_test_submit.sh

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save/")
score <- read.delim("y.lipolytica.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/lipolytica")
predict <- read.delim("multi.lipolytica.prediction", header=T, sep="\t")

score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.7169777

pdf("multi.lipolytica.prediction.scatter.pdf")
library(ggplot2)
ggplot(score.predict, aes(x=cut.score, y=Predictions.)) + geom_point() + theme_classic()
dev.off()

test on human

### Summit

#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J multi.doench_0
#BSUB -o multi.doench_0.o%J
#BSUB -e multi.doench_0.e%J

#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench/
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/Doench2014.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold0/Runs/Set0/doench.baisya.ecoli.noDWT_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix multi.doench --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench/ > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench/multi.doench_test.o

# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/multi.doench_test_submit.sh


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014/")
score <- read.delim("Doench2014.score_overlap_noSampleIDs.txt", header=T, sep="\t")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/doench")
predict <- read.delim("multi.doench.prediction", header=T, sep="\t")

score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
# 0.7219359

pdf("multi.doench.prediction.scatter.pdf")
library(ggplot2)
ggplot(score.predict, aes(x=cut.score, y=Predictions.)) + geom_point() + theme_classic()
dev.off()

18 January

  • run with the most up-to-date matrices (raw values, positional encoding k-mers, quantum tensors (singleton, basepair, dimer))
# salloc -A SYB105 -N 2 -t 4:00:00

# mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species

######################## need to normalize cut.score within each dataset so the scores are comparable across datasets ######################## 
# min-max scaling: x_scaled = (x_i - min(x)) / (max(x) - min(x))

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(doench)
# 6173
nrow(doench)
# 673
doench.num <- mutate_all(doench[,2:ncol(doench)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
doench.num <- cbind(data.frame("sgRNAID" = doench$sgRNAID), doench.num)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(lipolytica)
# 6173
nrow(lipolytica)
# 45271
names(lipolytica)[names(lipolytica) == 'cut.score.x'] <- 'cut.score'
lipolytica.num <- mutate_all(lipolytica[,2:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
lipolytica.num <- cbind(data.frame("sgRNAID" = lipolytica$sgRNAID), lipolytica.num)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.allCas9.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(ecoli)
# 6173
nrow(ecoli)
# 40468
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
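Above, each dataset's `cut.score` is min-max scaled on its own. The sketch below applies the same transform in Python, with two differences from the R code:

- NaNs are ignored when finding the minimum and maximum. R's `min()`/`max()` would return NA instead.
- A constant column, where max(x) = min(x), is mapped to 0 rather than divided by zero.

Scaling each dataset to [0, 1] aligns the ranges but not the distributions. The per-dataset means recorded after the merge (0.18, 0.52, 0.33) still differ. `min_max_scale` is an illustrative name.

```python
import numpy as np


def min_max_scale(values):
    """Rescale to [0, 1] with (x_i - min(x)) / (max(x) - min(x)).

    NaNs are skipped when finding min/max and stay NaN in the output.
    A constant input (max == min) is mapped to 0 instead of dividing by zero.
    """
    x = np.asarray(values, dtype=float)
    lo, hi = np.nanmin(x), np.nanmax(x)
    span = hi - lo
    if span == 0:
        return np.where(np.isnan(x), np.nan, 0.0)
    return (x - lo) / span
```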

###### need to adjust sgRNA IDs to include species name...
###### need to subset data so we are taking equal number of samples from each species (limited by Doench dataset)

doench.num.sample <- doench.num[sample(nrow(doench.num), 673), ]
doench.num.sample$sgRNAID <- paste0("doench_", doench.num.sample$sgRNAID)
lipolytica.num.sample <- lipolytica.num[sample(nrow(lipolytica.num), 673), ]
lipolytica.num.sample$sgRNAID <- paste0("lipolytica_", lipolytica.num.sample$sgRNAID)
ecoli.num.sample <- ecoli.num[sample(nrow(ecoli.num), 673), ]
ecoli.num.sample$sgRNAID <- paste0("ecoli_", ecoli.num.sample$sgRNAID)
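The draws above call `sample()` without any `set.seed()` in this session. A rerun would therefore pick a different subset of E. coli and Y. lipolytica sgRNAs. For Doench, drawing 673 of 673 rows only reorders them.

Below is a pandas sketch of the same balanced draw with a fixed seed. It assumes each dataset is a DataFrame with an `sgRNAID` column. `balanced_sample` and the seed value are illustrative, not the settings used here.

```python
import pandas as pd


def balanced_sample(frames, n, seed=0, id_col="sgRNAID"):
    """Draw n rows from each dataset and prefix IDs with the dataset label.

    frames: dict mapping a label (e.g. "doench") to its DataFrame.
    A fixed random_state makes the draw reproducible across runs.
    """
    parts = []
    for label, frame in frames.items():
        if len(frame) < n:
            raise ValueError(f"{label} has only {len(frame)} rows, need {n}")
        part = frame.sample(n=n, random_state=seed).copy()
        part[id_col] = label + "_" + part[id_col].astype(str)
        parts.append(part)
    return pd.concat(parts, ignore_index=True)
```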

#doench.lipolytica <- dplyr::bind_rows(doench.num.sample, lipolytica.num.sample)
#all <- dplyr::bind_rows(doench.lipolytica, ecoli.num.sample)

d.names <- names(doench.num.sample)
l.names <- names(lipolytica.num.sample)
e.names <- names(ecoli.num.sample)
setdiff(d.names, l.names)
setdiff(l.names, e.names)
setdiff(e.names, l.names)

# rename E. coli columns V306sgRNA.raw ... V320sgRNA.raw to the names used in the other datasets (V306.xsgRNA.raw ...)
for (i in 306:320) {
  old <- paste0("V", i, "sgRNA.raw")
  names(ecoli.num.sample)[names(ecoli.num.sample) == old] <- paste0("V", i, ".xsgRNA.raw")
}
names(ecoli.num.sample)[names(ecoli.num.sample) == 'V81sgRNA.raw'] <- 'V81.x.xsgRNA.raw'
ecoli.num.sample$V81.y.ysgRNA.raw <- 0
e.names <- names(ecoli.num.sample)
setdiff(l.names, e.names)
setdiff(e.names, l.names)
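The `setdiff()` checks and renames above line up the E. coli columns with the other two datasets before `rbind()`, which matches data-frame columns by name. The same alignment can be sketched in pandas, taking one table (e.g. Y. lipolytica) as the reference column set.

`align_columns` and `ECOLI_RENAME` are hypothetical names. The mapping mirrors the renames applied above.

```python
import pandas as pd

# Column renames applied to the E. coli table above, written as a mapping.
ECOLI_RENAME = {f"V{i}sgRNA.raw": f"V{i}.xsgRNA.raw" for i in range(306, 321)}
ECOLI_RENAME["V81sgRNA.raw"] = "V81.x.xsgRNA.raw"


def align_columns(reference, frame, rename=None, fill_value=0):
    """Return (frame with exactly reference's columns in order, list of added columns).

    `rename` ({old: new}) is applied first. Reference columns still missing are
    added and filled with `fill_value` (the R code sets V81.y.ysgRNA.raw to 0).
    Columns absent from `reference` raise an error; rbind() would also fail on them.
    """
    if rename:
        frame = frame.rename(columns=rename)
    extra = [c for c in frame.columns if c not in reference.columns]
    if extra:
        raise ValueError(f"columns not in reference: {extra}")
    missing = [c for c in reference.columns if c not in frame.columns]
    aligned = frame.reindex(columns=reference.columns, fill_value=fill_value)
    return aligned, missing
```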

all <- rbind(doench.num.sample, lipolytica.num.sample, ecoli.num.sample)
ncol(all)
# 6174
nrow(all)
# 2019


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species")
write.table(all, "doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.id.score.txt", quote=F, row.names=F, sep="\t")
# var --> doench=0.03670343, ecoli=0.04583629, lipolytica=0.02404219
# sd --> doench=0.1915814, ecoli=0.2140941, lipolytica=0.1550555
# mean --> doench=0.1797033, ecoli=0.5210303, lipolytica=0.3284551
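The per-dataset variance, standard deviation and mean recorded above can be recomputed from the merged table in one step. The pandas sketch below assumes the `species_` ID prefix added earlier. pandas `std()` and `var()` use the n - 1 denominator by default, as R's `sd()` and `var()` do. `score_summary` is a hypothetical helper.

```python
import pandas as pd


def score_summary(all_df, id_col="sgRNAID", score_col="cut.score", sep="_"):
    """Mean, sample SD and sample variance of score_col for each species ID prefix."""
    species = all_df[id_col].astype(str).str.split(sep, n=1).str[0].rename("species")
    return all_df.groupby(species)[score_col].agg(["mean", "std", "var"])
```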


write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.18jan.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.18jan.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.baisya.ecoli.18jan.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

#mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.18jan --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_full_doench.baisya.ecoli.18jan_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_train_doench.baisya.ecoli.18jan_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/Submits/submit_test_doench.baisya.ecoli.18jan_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.baisya.ecoli.18jan
# 0.4228729850812137
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw  cut.score   0.462981027783319
# pam.distance0 cut.score   0.04477021223536862
# GGsgRNA.raw   cut.score   0.03985650177869358
# p20No_electronsraw    cut.score   0.02293728111265287
# sgRNA.tempsgRNA.raw   cut.score   0.018801638583876092
# sgRNA.gcsgRNA.raw cut.score   0.018569414804966544
# CCsgRNA.raw   cut.score   0.018239097698237207
# p20HL.gap_eVraw   cut.score   0.018139195557280316
# p20LUMO_eVraw cut.score   0.014351123429771385
# p20HOMO_eVraw cut.score   0.013971272670979858

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.18jan_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 25.9978
# pam.distance0: 2.82849
# GGsgRNA.raw: 2.47276
# p20No_electronsraw: 1.47081
# CCsgRNA.raw: 1.24198
# p20HOMO_eVraw: 1.21726
# sgRNA.gcsgRNA.raw: 1.03332
# sgRNA.tempsgRNA.raw: 0.957661
# p20HL.gap_eVraw: 0.881469
# p20LUMO_eVraw: 0.778138

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.18jan_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.6810235

cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6810235
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.6804178
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"

id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.5247512

pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# -0.0006011889

pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.4311619
RIT
  • two outputs: effect size and direction
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score doench.baisya.ecoli.18jan

# python /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/ritEval.py doench.baisya.ecoli.18jan_cut.score.importance4 cut.score

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/cut.score/RIT.run.18jan

sort -k3rg doench.baisya.ecoli.18jan_cut.score.importance4.effect | head
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species

# python
import pandas as pd
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
df = pd.read_table('/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.id.score.txt') # Load the data
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.ensemble import RandomForestRegressor
# The target variable is 'cut.score'.
Y = df['cut.score']
# get list of features from R... dput(colnames(df))
X = df.drop(columns =['sgRNAID', 'cut.score'])

# Split the data into train and test data:
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size = 0.2)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6,random_state=0,n_estimators=10)
model.fit(X_train, Y_train)

import shap
shap_values = shap.TreeExplainer(model).shap_values(X_train)
# show=False stops shap from calling plt.show(), so the figure is still intact when it is saved
f = plt.figure()
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
plt.close(f)

f = plt.figure()
shap.summary_plot(shap_values, X_train, show=False)
f.savefig("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
plt.close(f)

# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/multi.species.18jan.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/."
group split
# split each Cas9 group into two groups... 
# add the --groupFile option (one group label per sample)
# add the --sampleSize option

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/")
df <- read.delim("doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.txt")
df.id <- data.frame(df$sgRNAID)

library(tidyr)
df.sep <- separate(df.id, df.sgRNAID, c("species", "sgRNAID"), sep="_")
df.ecoli <- subset(df.sep, df.sep$species == "ecoli")
# 673 / 2 = 336.5
df.lipolytica <- subset(df.sep, df.sep$species == "lipolytica")
df.doench <- subset(df.sep, df.sep$species == "doench")

df.ecoli.1 <- df.ecoli[1:336,]
df.ecoli.2 <- df.ecoli[337:673,]
df.lipolytica.1 <- df.lipolytica[1:336,]
df.lipolytica.2 <- df.lipolytica[337:673,]
df.doench.1 <- df.doench[1:336,]
df.doench.2 <- df.doench[337:673,]

df.ecoli.1$group <- "ecoli.group1"
df.ecoli.2$group <- "ecoli.group2"
df.lipolytica.1$group <- "lipolytica.group1"
df.lipolytica.2$group <- "lipolytica.group2"
df.doench.1$group <- "doench.group1"
df.doench.2$group <- "doench.group2"

df5 <- rbind(df.ecoli.1, df.ecoli.2, df.lipolytica.1, df.lipolytica.2, df.doench.1, df.doench.2)

library(dplyr)
df.order <- left_join(df.sep, df5, by=c("sgRNAID", "species"))
df.group <- data.frame(df.order$group)
colnames(df.group) <- "groupID"

write.table(df.group, "doench.baisya.ecoli.18jan.groupfile.txt", quote=F, row.names=F, sep="\t")
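The group file above splits each species into a first half (`group1`) and a second half (`group2`) in file order. The R code then restores the original row order with `left_join`.

The pandas sketch below computes the labels by position, so no join on sgRNA IDs is needed. That avoids one risk of the join. If any original ID contained an underscore, `separate()` on `_` would truncate it, and the join keys could become ambiguous.

`make_group_labels` is a hypothetical helper. The species is assumed to be the ID text before the first underscore.

```python
import pandas as pd


def make_group_labels(sample_ids, sep="_"):
    """Label each sample '<species>.group1' or '<species>.group2'.

    Within each species (ID prefix before the first `sep`), the first n // 2
    rows in file order go to group1 and the remaining rows to group2, so
    673 rows split 336 / 337 as in the R code. Labels keep the input row order.
    """
    species = pd.Series([str(s).split(sep, 1)[0] for s in sample_ids])
    position = species.groupby(species).cumcount()   # 0-based index within species
    half = species.map(species.value_counts()) // 2  # n // 2 for each row's species
    group = (position >= half).map({False: "group1", True: "group2"})
    return species + "." + group
```

The result can be written like the R output with `pd.DataFrame({"groupID": labels}).to_csv(path, sep="\t", index=False)`, where `path` is whatever group-file name is wanted.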


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/
mkdir group.features

module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 120 --Account SYB105 --NumTrees 1000 --NumIterations 6 --RunName doench.baisya.ecoli.group --bypass --Prediction --sampleSize 20000 --groupFile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.groupfile.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.raw.onehot.tensor.pam.location.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/doench.baisya.ecoli.18jan.score.txt 

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/Submits/submit_full_doench.baisya.ecoli.group_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/Submits/submit_train_doench.baisya.ecoli.group_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/Submits/submit_test_doench.baisya.ecoli.group_0.sh

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 6 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.baisya.ecoli.group
# 0.09236658363398188

# correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.group_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.4468816
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5049132


# correlation - by Cas9 type
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"

id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 336
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.4468816
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# NA
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# NA
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.4468816

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set3")
pred <- read.delim("doench.baisya.ecoli.group_Set3_test.prediction", header=T, sep="\t")
y <- read.delim("set3_Y_test_noSampleIDs.txt", header=T, sep="\t")
id <- read.delim("set3_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 632
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.4252533
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.3815656
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# -0.103618
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# NA

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set2")
pred <- read.delim("doench.baisya.ecoli.group_Set2_test.prediction", header=T, sep="\t")
y <- read.delim("set2_Y_test_noSampleIDs.txt", header=T, sep="\t")
id <- read.delim("set2_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 337
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.5566255
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# NA
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# NA
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.5566255


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set1")
pred <- read.delim("doench.baisya.ecoli.group_Set1_test.prediction", header=T, sep="\t")
y <- read.delim("set1_Y_test_noSampleIDs.txt", header=T, sep="\t")
id <- read.delim("set1_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 337
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.0346217
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# NA
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.0346217
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# NA


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs/Set0")
pred <- read.delim("doench.baisya.ecoli.group_Set0_test.prediction", header=T, sep="\t")
y <- read.delim("set0_Y_test_noSampleIDs.txt", header=T, sep="\t")
id <- read.delim("set0_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)
library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("species", "sgRNAID"), "_")
# 377
cor(id.pred.y.group$cut.score, id.pred.y.group$Predictions., method=c("pearson"))
# 0.2406913
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$species == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.2406913
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$species == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# NA
pred.doench <- subset(id.pred.y.group, id.pred.y.group$species == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# NA
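The five Set0-Set4 blocks above repeat the same read, split and correlate steps. Below is a minimal sketch that collapses them into two functions. The function names are hypothetical and not part of the iRF scripts. The file names follow the pattern used above, and the species is taken as the text before the first "_" in the sample ID, the same part `separate()` puts in the `species` column. One difference from the blocks above: a species that is absent from a set is left out, not reported as NA.

```r
# Hypothetical helpers (not part of the iRF scripts).

# Pearson correlation overall and within each species, where the species is
# the text before the first "_" in the sample ID (e.g. "ecoli_123" -> "ecoli").
per_species_cor <- function(ids, obs, pred) {
  species <- sub("_.*$", "", ids)
  groups <- split(seq_along(ids), species)
  by_species <- vapply(groups, function(i) {
    # fewer than two points has no defined correlation;
    # cor() itself returns NA (with a warning) for a constant vector
    if (length(i) < 2) return(NA_real_)
    cor(obs[i], pred[i], method = "pearson")
  }, numeric(1))
  c(all = cor(obs, pred, method = "pearson"), by_species)
}

# Read one fold9 test set (Set<k>) using the file names from the blocks above.
read_set_cor <- function(set_dir, run_name, k) {
  pred <- read.delim(file.path(set_dir, paste0(run_name, "_Set", k, "_test.prediction")),
                     header = TRUE, sep = "\t")
  y <- read.delim(file.path(set_dir, paste0("set", k, "_Y_test_noSampleIDs.txt")),
                  header = TRUE, sep = "\t")
  id <- read.delim(file.path(set_dir, paste0("set", k, "_test_SampleIDs.txt")),
                   header = FALSE, sep = "\t")
  per_species_cor(as.character(id[[1]]), y$cut.score, pred$Predictions.)
}

# Usage (not run here):
# runs <- "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/group.features/cut.score/foldRuns/fold9/Runs"
# lapply(0:4, function(k) read_set_cor(file.path(runs, paste0("Set", k)), "doench.baisya.ecoli.group", k))
```

`lapply` is used rather than `sapply` because each set can contain a different number of species, so the result vectors can differ in length.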

Multi-species model: 15 March 2022: Quantum Matrix

Use the human, Y. lipolytica, and E. coli datasets to train the model, then test the output on each dataset.

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated

######################## normalize cut scores across datasets (min-max scaling to [0, 1]) ######################## 
# x_norm = (x_i - min(x)) / (max(x) - min(x))

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(doench)
# 6161
nrow(doench)
# 673
doench.id <- separate(doench, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
summary(doench.num$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#0.00000 0.03967 0.10641 0.17970 0.26133 1.00000 

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(lipolytica)
# 6161
nrow(lipolytica)
# 45271
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))
summary(lipolytica.num$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 0.0000  0.2167  0.2877  0.3389  0.4460  1.0000 


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ncol(ecoli)
# 6160
nrow(ecoli)
# 40468
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 0.0000  0.3563  0.5618  0.5077  0.6757  1.0000 
ecoli.num.sample <- ecoli.num[sample(nrow(ecoli.num), 1000), ]
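The min-max rescaling is written out inline for each dataset in this section. Below is a minimal sketch of it as a reusable function. `min_max` is a hypothetical name. The function differs from the inline version in two ways:

- It ignores NAs when finding the range. The inline `min()`/`max()` calls return NA if any score is NA.
- It returns zeros for a constant input, which would otherwise give 0/0 = NaN.

This is min-max scaling, not a z-score.

```r
# Hypothetical helper for the min-max rescaling applied to each dataset above:
# x_norm = (x - min(x)) / (max(x) - min(x)), which maps the scores onto [0, 1].
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)  # NAs are ignored here and stay NA in the output
  if (rng[1] == rng[2]) {
    # constant input: 0/0 would give NaN, so return zeros (NAs preserved)
    return(x - rng[1])
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Usage, equivalent to the inline lines above when cut.score has no NAs:
# doench.num$cut.score     <- min_max(doench.num$cut.score)
# lipolytica.num$cut.score <- min_max(lipolytica.num$cut.score)
# ecoli.num$cut.score      <- min_max(ecoli.num$cut.score)
```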

#### column names don't match across datasets: find the mismatched columns and remove them
ecoli.names <- names(ecoli.num)
lipolytica.names <- names(lipolytica.num)
doench.names <- names(doench.num)
setdiff(lipolytica.names, doench.names)
# character(0)
setdiff(lipolytica.names, ecoli.names)
#  [1] "V306.xsgRNA.raw"  "V307.xsgRNA.raw"  "V308.xsgRNA.raw"  "V309.xsgRNA.raw" 
#  [5] "V310.xsgRNA.raw"  "V311.xsgRNA.raw"  "V312.xsgRNA.raw"  "V313.xsgRNA.raw" 
#  [9] "V314.xsgRNA.raw"  "V315.xsgRNA.raw"  "V316.xsgRNA.raw"  "V317.xsgRNA.raw" 
# [13] "V318.xsgRNA.raw"  "V319.xsgRNA.raw"  "V320.xsgRNA.raw"  "V81.x.xsgRNA.raw"
# [17] "V81.y.ysgRNA.raw"
setdiff(ecoli.names, lipolytica.names)
#  [1] "V306sgRNA.raw" "V307sgRNA.raw" "V308sgRNA.raw" "V309sgRNA.raw"
#  [5] "V310sgRNA.raw" "V311sgRNA.raw" "V312sgRNA.raw" "V313sgRNA.raw"
#  [9] "V314sgRNA.raw" "V315sgRNA.raw" "V316sgRNA.raw" "V317sgRNA.raw"
# [13] "V318sgRNA.raw" "V319sgRNA.raw" "V320sgRNA.raw" "V81sgRNA.raw" 

ecoli.drop <- c(paste0("V", 306:320, "sgRNA.raw"), "V81sgRNA.raw")
ecoli.num.df <- ecoli.num[, setdiff(names(ecoli.num), ecoli.drop)]

lipolytica.drop <- c(paste0("V", 306:320, ".xsgRNA.raw"), "V81.x.xsgRNA.raw", "V81.y.ysgRNA.raw")
lipolytica.num.df <- lipolytica.num[, setdiff(names(lipolytica.num), lipolytica.drop)]

# doench has the same column names as lipolytica (setdiff above is empty), so drop the same list
doench.num.df <- doench.num[, setdiff(names(doench.num), lipolytica.drop)]
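The column lists dropped above were read off the `setdiff()` output by hand. Below is a minimal sketch of an alternative that keeps only the columns present in every data frame. `keep_shared_columns` is a hypothetical name. With these three inputs it should drop the same mismatched columns, as long as the `setdiff()` checks above capture every mismatch. Column order follows the first data frame, and `rbind()` matches data frame columns by name.

```r
# Hypothetical helper: keep only the columns present in every data frame,
# in the column order of the first one, so rbind() lines up by name.
keep_shared_columns <- function(dfs) {
  shared <- Reduce(intersect, lapply(dfs, names))
  lapply(dfs, function(d) d[, shared, drop = FALSE])
}

# Usage (not run here):
# aligned <- keep_shared_columns(list(doench.num, lipolytica.num, ecoli.num))
# all <- do.call(rbind, aligned)
```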

all <- rbind(doench.num.df, lipolytica.num.df, ecoli.num.df)
ncol(all)
# 6144
nrow(all)
# 86412

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated")
write.table(all, "doench.baisya.ecoli.finalquantum.noncorrelated.features.id.score.txt", quote=F, row.names=F, sep="\t")

####### START HERE ####### 
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated")
all <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated.features.id.score.txt", header=T, sep="\t", stringsAsFactors = F)

write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.finalquantum.noncorrelated.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.finalquantum.noncorrelated.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.finalquantum.noncorrelated.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.baisya.ecoli.finalquantum.noncorrelated.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

doench.num.sample <- doench.num.df[sample(nrow(doench.num.df), 600), ]
doench.num.sample$sgRNAID <- paste0("doench_", doench.num.sample$sgRNAID)
lipolytica.num.sample <- lipolytica.num.df[sample(nrow(lipolytica.num.df), 600), ]
lipolytica.num.sample$sgRNAID <- paste0("lipolytica_", lipolytica.num.sample$sgRNAID)
ecoli.num.sample <- ecoli.num.df[sample(nrow(ecoli.num.df), 600), ]
ecoli.num.sample$sgRNAID <- paste0("ecoli_", ecoli.num.sample$sgRNAID)

all <- rbind(doench.num.sample, lipolytica.num.sample, ecoli.num.sample)
ncol(all)
# 6144
nrow(all)
# 1800
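The 600-per-dataset draw above calls `sample()` without a seed, so the same subset cannot be regenerated. Below is a minimal sketch of the same step with a fixed seed. The function name and the default seed value are hypothetical. A seeded draw will not reproduce the subset behind the results recorded here. Calling `set.seed()` inside the function also resets the session's random-number state.

```r
# Hypothetical helper: draw n rows per dataset with a fixed seed and prefix
# each sgRNAID with its dataset label, as in the sampling step above.
sample_per_dataset <- function(dfs, n, seed = 1) {
  set.seed(seed)  # the seed value is arbitrary; it only makes the draw repeatable
  parts <- lapply(names(dfs), function(label) {
    d <- dfs[[label]]
    d <- d[sample(nrow(d), min(n, nrow(d))), , drop = FALSE]
    d$sgRNAID <- paste0(label, "_", d$sgRNAID)
    d
  })
  do.call(rbind, parts)
}

# Usage (not run here):
# all <- sample_per_dataset(list(doench = doench.num.df, lipolytica = lipolytica.num.df,
#                                ecoli = ecoli.num.df), 600)
```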

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated")
write.table(all, "sample.doench.baisya.ecoli.finalquantum.noncorrelated.features.id.score.txt", quote=F, row.names=F, sep="\t")

write.table(all[,c(1,3:ncol(all))], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "sample.doench.baisya.ecoli.finalquantum.noncorrelated.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "sample.doench.baisya.ecoli.finalquantum.noncorrelated.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")



# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.finalquantum.noncorrelated --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.baisya.ecoli.finalquantum.noncorrelated.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.baisya.ecoli.finalquantum.noncorrelated.score.txt

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName sample.doench.baisya.ecoli.finalquantum.noncorrelated --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/sample.doench.baisya.ecoli.finalquantum.noncorrelated.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/sample.doench.baisya.ecoli.finalquantum.noncorrelated.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/Submits/submit_full_doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample/Submits/submit_full_sample.doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/Submits/submit_train_doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample/Submits/submit_train_sample.doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
# once the train submissions are done, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/Submits/submit_test_doench.baisya.ecoli.finalquantum.noncorrelated_0.sh
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample/Submits/submit_test_sample.doench.baisya.ecoli.finalquantum.noncorrelated_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt doench.baisya.ecoli.finalquantum.noncorrelated
# 0.31954645322248604
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.finalquantum.noncorrelated_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 528.312
# p20basepair.Hbond.energyraw: 84.5808
# p19dimer.Hbond.stackingraw: 43.9609
# p18trimer.Hbond.stackingraw: 37.1693
# TTsgRNA.raw: 29.1179
# p18dimer.Hbond.stackingraw: 28.198
# p15tetramer.Hbond.stackingraw: 26.2502
# p11tetramer.Hbond.stackingraw: 24.5269
# p13tetramer.Hbond.stackingraw: 24.1983
# p1tetramer.Hbond.stackingraw: 23.2881

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.5687324
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5455017
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"

id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("sgRNA", "ID", "group"), "_")
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$group == "Cas9")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.4936689



cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt sample.doench.baisya.ecoli.finalquantum.noncorrelated
# 0.4209199735102055
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/sample.doench.baisya.ecoli.finalquantum.noncorrelated_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 23.207
# pam.distance0: 2.61693
# p15dimer.Hbond.stackingraw: 2.50501
# GGsgRNA.raw: 1.92695
# p13tetramer.Hbond.stackingraw: 1.67273
# p20monomer.No.electronsraw: 1.51487
# p20monomer.HLgap.eVraw: 1.40704
# p19dimer.Hbond.stackingraw: 1.31469
# p17tetramer.Hbond.stackingraw: 1.21605
# sgRNA.tempsgRNA.raw: 0.876837


# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/sample/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("sample.doench.baisya.ecoli.finalquantum.noncorrelated_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6068327
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5982605
id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"

id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

library(tidyr)
id.pred.y.group <- id.pred.y %>% separate(sgRNAID, c("group", "ID"), "_")
pred.ecoli <- subset(id.pred.y.group, id.pred.y.group$group == "ecoli")
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.2953007
pred.doench <- subset(id.pred.y.group, id.pred.y.group$group == "doench")
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.6683965
pred.lipolytica <- subset(id.pred.y.group, id.pred.y.group$group == "lipolytica")
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.02491944
RIT
  • two outputs: effect size and direction
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score doench.baisya.ecoli.finalquantum.noncorrelated

# python /gpfs/alpine/syb105/proj-shared/Personal/jromero/PathAnalysis/ritEval.py doench.baisya.ecoli.finalquantum.noncorrelated_cut.score.importance4 cut.score

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/cut.score/RIT.run

sort -k3rg doench.baisya.ecoli.finalquantum.noncorrelated_cut.score.importance4.effect | head

Use species as a feature

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
doench.id <- separate(doench, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)

ecoli.drop <- c(paste0("V", 306:320, "sgRNA.raw"), "V81sgRNA.raw")
ecoli.num.df <- ecoli.num[, setdiff(names(ecoli.num), ecoli.drop)]

lipolytica.drop <- c(paste0("V", 306:320, ".xsgRNA.raw"), "V81.x.xsgRNA.raw", "V81.y.ysgRNA.raw")
lipolytica.num.df <- lipolytica.num[, setdiff(names(lipolytica.num), lipolytica.drop)]

# doench shares the lipolytica column names, so the same list is dropped
doench.num.df <- doench.num[, setdiff(names(doench.num), lipolytica.drop)]

ecoli.num.df$species <- 1
lipolytica.num.df$species <- 2
doench.num.df$species <- 3
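The species is coded above as one integer column: 1 = ecoli, 2 = lipolytica, 3 = doench. An alternative is one 0/1 indicator column per species. Below is a minimal sketch using base R's `model.matrix()`. The function name and the `species.<name>` column names are hypothetical. Whether this changes the iRF results here is untested. The integer code already lets trees isolate each species, but the ordering 1 < 2 < 3 is arbitrary.

```r
# Hypothetical alternative to the integer species code: one 0/1 indicator
# column per species, built with base R's model.matrix().
one_hot_species <- function(species_code,
                            labels = c("1" = "ecoli", "2" = "lipolytica", "3" = "doench")) {
  f <- factor(labels[as.character(species_code)], levels = unname(labels))
  if (anyNA(f)) stop("unknown species code")  # model.matrix() would silently drop such rows
  m <- model.matrix(~ f - 1)  # "- 1" drops the intercept so every level gets a column
  colnames(m) <- paste0("species.", levels(f))
  as.data.frame(m)
}

# Usage (not run here):
# all <- cbind(all[, setdiff(names(all), "species")], one_hot_species(all$species))
```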
all <- rbind(doench.num.df, lipolytica.num.df, ecoli.num.df)
all$cut.score <- as.numeric(all$cut.score)

write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.species.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.species.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.finalquantum.noncorrelated.species.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.finalquantum.noncorrelated.species.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.baisya.ecoli.finalquantum.noncorrelated.species.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.baisya.ecoli.finalquantum.noncorrelated.species.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.finalquantum.noncorrelated.species --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.baisya.ecoli.finalquantum.noncorrelated.species.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.baisya.ecoli.finalquantum.noncorrelated.species.score.txt


# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species/Submits/submit_full_doench.baisya.ecoli.finalquantum.noncorrelated.species_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species/Submits/submit_train_doench.baisya.ecoli.finalquantum.noncorrelated.species_0.sh
# once the train submissions are done, run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species/Submits/submit_test_doench.baisya.ecoli.finalquantum.noncorrelated.species_0.sh

#### TODO: check whether the species feature (coded 1/2/3 above) needs a different numeric encoding

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt doench.baisya.ecoli.finalquantum.noncorrelated.species
# 0.32658425662799506
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.finalquantum.noncorrelated.species_cut.score.importance4 | head
# species: 518.485
# p20basepair.Hbond.energyraw: 86.6616
# p19dimer.Hbond.stackingraw: 44.8908
# p18trimer.Hbond.stackingraw: 36.5359
# p18dimer.Hbond.stackingraw: 29.2181
# p13tetramer.Hbond.stackingraw: 28.2456
# TTsgRNA.raw: 26.6635
# p15tetramer.Hbond.stackingraw: 26.4083
# p11tetramer.Hbond.stackingraw: 24.6197
# p1tetramer.Hbond.stackingraw: 23.3195

# pearson correlation
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.5507857
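The two `cor()` calls above score the held-out fold in two ways. Pearson's r measures linear agreement and Spearman's rho measures rank agreement. Below is a minimal standard-library Python sketch of both. Tied values get average ranks, which is the default in R's `rank()`. The sketch only illustrates the calculation and is not the code that produced the numbers above.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's r: covariance of x and y over the product of their spreads."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))
```

Spearman's rho does not change under any monotone rescaling of either vector. That makes it a useful companion to Pearson's r when cut scores come from differently scaled screens.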

id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated.species.features.txt", header=T, sep="\t")
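species.df <- species[, c("sgRNAID", "species")]  # ID and species code, looked up by name rather than column index

library(dplyr)  # left_join() comes from dplyr
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")
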
pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
# 8095
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.4936689
pred.lipolytica <- subset(id.pred.y.species, id.pred.y.species$species == 2)
# 9108
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.3320421
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 3)
# 191
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.5422075
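The join-then-subset steps above work in two stages. First, each test sgRNA gets its species code, matched on `sgRNAID`. Second, observed and predicted scores are correlated within each species. A minimal Python sketch of that grouping follows.

The `SPECIES_LABELS` map is an assumption copied from the three-species runs in this notebook: E. coli = 1, Y. lipolytica = 2, Doench = 3. The two-species runs code Doench as 2, so the map has to be adjusted per run.

```python
from collections import defaultdict
from math import sqrt

# Species codes of the three-species runs; the two-species runs use doench = 2.
SPECIES_LABELS = {1: "ecoli", 2: "lipolytica", 3: "doench"}


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def per_species_pearson(test_rows, species_by_id, labels=SPECIES_LABELS):
    """Group test rows by species and return {label: (n, pearson_r)}.

    test_rows     -- iterable of (sgRNAID, observed cut.score, predicted score)
    species_by_id -- dict mapping sgRNAID -> integer species code
    Rows without a species entry are skipped, as subset() drops NA matches.
    """
    groups = defaultdict(lambda: ([], []))
    for sg_id, observed, predicted in test_rows:
        code = species_by_id.get(sg_id)
        if code is None:
            continue
        obs, pred = groups[labels[code]]
        obs.append(observed)
        pred.append(predicted)
    return {name: (len(obs), pearson(obs, pred)) for name, (obs, pred) in groups.items()}
```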
Equal contribution (500 sgRNAs sampled per species)
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
doench.id <- separate(doench, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/y.lipolytica.save")
lipolytica <- read.delim("y.lipolytica.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
lipolytica.num <- mutate_all(lipolytica[,1:ncol(lipolytica)], function(x) as.numeric(as.character(x)))
lipolytica.num$cut.score <- (lipolytica.num$cut.score - min(lipolytica.num$cut.score)) / (max(lipolytica.num$cut.score) - min(lipolytica.num$cut.score))

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
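Each dataset's `cut.score` is min-max scaled on its own before the tables are stacked. A scaled value therefore records where a guide falls within its own screen's range. Below is a minimal Python sketch of the same formula. It assumes scores arrive as plain lists keyed by dataset name, which is a hypothetical layout.

```python
def min_max_scale(scores):
    """Rescale so min(scores) -> 0.0 and max(scores) -> 1.0."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # R returns NaN here (0 / 0); raise instead so the problem is visible.
        raise ValueError("cannot min-max scale a constant vector")
    return [(s - lo) / (hi - lo) for s in scores]


def scale_each_dataset(datasets):
    """Scale every dataset independently; datasets maps name -> list of cut scores."""
    return {name: min_max_scale(scores) for name, scores in datasets.items()}
```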

# drop the V306-V320 and V81 sgRNA.raw columns by exact name
drop.cols <- c(paste0("V", 306:320, "sgRNA.raw"), "V81sgRNA.raw")
ecoli.num.df <- ecoli.num %>% select(-any_of(drop.cols))

# drop the same columns under their .x/.y-suffixed names
drop.cols <- c(paste0("V", 306:320, ".xsgRNA.raw"), "V81.x.xsgRNA.raw", "V81.y.ysgRNA.raw")
lipolytica.num.df <- lipolytica.num %>% select(-any_of(drop.cols))

# drop the same columns under their .x/.y-suffixed names
drop.cols <- c(paste0("V", 306:320, ".xsgRNA.raw"), "V81.x.xsgRNA.raw", "V81.y.ysgRNA.raw")
doench.num.df <- doench.num %>% select(-any_of(drop.cols))

ecoli.num.df$species <- 1
lipolytica.num.df$species <- 2
doench.num.df$species <- 3

doench.num.sample <- doench.num.df[sample(nrow(doench.num.df), 500), ]
lipolytica.num.sample <- lipolytica.num.df[sample(nrow(lipolytica.num.df), 500), ]
ecoli.num.sample <- ecoli.num.df[sample(nrow(ecoli.num.df), 500), ]

all <- rbind(doench.num.sample, lipolytica.num.sample, ecoli.num.sample)
all$cut.score <- as.numeric(all$cut.score)
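The equal-contribution table is built by drawing the same number of rows (500 here) from each species. The R code calls `sample()` without `set.seed()`, so every rerun draws a different subset. Below is a minimal Python sketch of per-species sampling with an optional seed. The `seed` argument and the dict-per-row layout are assumptions for illustration.

```python
import random


def sample_per_species(rows, n, species_key="species", seed=None):
    """Draw n rows without replacement from each species group.

    Raises ValueError if a species has fewer than n rows, as R's
    sample(nrow(df), n) does when n exceeds the population.
    """
    rng = random.Random(seed)  # private generator, so a seed makes the draw repeatable
    by_species = {}
    for row in rows:
        by_species.setdefault(row[species_key], []).append(row)
    picked = []
    for code in sorted(by_species):
        group = by_species[code]
        if len(group) < n:
            raise ValueError(f"species {code}: {len(group)} rows, {n} requested")
        picked.extend(rng.sample(group, n))
    return picked
```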

write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.baisya.ecoli.finalquantum.noncorrelated.species.equal --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.baisya.ecoli.finalquantum.noncorrelated.species.score.txt


# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal/Submits/submit_full_doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal/Submits/submit_train_doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal/Submits/submit_test_doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_0.sh

#### need to make species feature numeric??

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.baisya.ecoli.finalquantum.noncorrelated.species.equal
# 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 1.03285
# p12tetramer.Hbond.stackingraw: 0.979958
# p10tetramer.Hbond.stackingraw: 0.741546
# V2358sgRNA.raw: 0.713869
# p5tetramer.Hbond.stackingraw: 0.598242
# V1103.xsgRNA.raw: 0.580338
# p3tetramer.Hlgap.eVEraw: 0.577636
# p6tetramer.Hlgap.eVEraw: 0.482052
# p15tetramer.Hlgap.eVEraw: 0.428174
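`sort -k2rg ... | head` sorts the importance file on its second whitespace-separated field, numerically and largest first, then keeps the top 10 lines. The `-g` flag also accepts exponent notation. Below is a rough Python equivalent. It assumes each line looks like `feature: value`, as in the output above. Tie order may differ from GNU sort.

```python
def top_features(lines, k=10):
    """Return the k (feature, importance) pairs with the largest importance."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        records.append((fields[0].rstrip(":"), float(fields[1])))
    records.sort(key=lambda rec: rec[1], reverse=True)
    return records[:k]
```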

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run.species.equal/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated.species.equal_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.2670445

cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.3238192

id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("doench.baisya.ecoli.finalquantum.noncorrelated.species.equal.features.txt", header=T, sep="\t")
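species.df <- species[, c("sgRNAID", "species")]  # ID and species code, looked up by name rather than column index

library(dplyr)  # left_join() comes from dplyr
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")
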
pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
# 98
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# -0.02541041
pred.lipolytica <- subset(id.pred.y.species, id.pred.y.species$species == 2)
# 97
cor(pred.lipolytica$cut.score, pred.lipolytica$Predictions., method=c("pearson"))
# 0.1163934
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 3)
# 105
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# -0.05232319
E. coli + H. sapiens

–> try with only H. sapiens and E. coli

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014")
doench <- read.delim("Doench2014.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
doench.id <- separate(doench, sgRNAID, c("data", "sgRNAID"))
doench.num <- mutate_all(doench.id[,2:ncol(doench.id)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.noncorrelated.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)

# drop the V306-V320 and V81 sgRNA.raw columns by exact name
drop.cols <- c(paste0("V", 306:320, "sgRNA.raw"), "V81sgRNA.raw")
ecoli.num.df <- ecoli.num %>% select(-any_of(drop.cols))

# drop the same columns under their .x/.y-suffixed names
drop.cols <- c(paste0("V", 306:320, ".xsgRNA.raw"), "V81.x.xsgRNA.raw", "V81.y.ysgRNA.raw")
doench.num.df <- doench.num %>% select(-any_of(drop.cols))

ecoli.num.df$species <- 1
doench.num.df$species <- 2

doench.num.sample <- doench.num.df[sample(nrow(doench.num.df), 500), ]
ecoli.num.sample <- ecoli.num.df[sample(nrow(ecoli.num.df), 500), ]

all <- rbind(doench.num.sample, ecoli.num.sample)
all$cut.score <- as.numeric(all$cut.score)

write.table(all[,c(1,3:ncol(all))], "doench.ecoli.finalquantum.noncorrelated.species.equal.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "doench.ecoli.finalquantum.noncorrelated.species.equal.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "doench.ecoli.finalquantum.noncorrelated.species.equal.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(all[,1:2], "doench.ecoli.finalquantum.noncorrelated.species.equal.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "doench.ecoli.finalquantum.noncorrelated.species.equal.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "doench.ecoli.finalquantum.noncorrelated.species.equal.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
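The six `write.table()` calls produce four distinct layouts, since the `_overlap` files repeat the with-ID versions:

- features with `sgRNAID`
- features without `sgRNAID`
- score with `sgRNAID`
- score without `sgRNAID`

The R code picks columns by position, with column 1 = `sgRNAID` and column 2 = `cut.score`. Below is a minimal Python sketch that selects them by name instead. The function names are hypothetical.

```python
import csv
import io


def split_tables(header, rows, id_col="sgRNAID", y_col="cut.score"):
    """Split one table into the four layouts, selecting columns by name.

    Returns (features_with_ids, features_no_ids, score_with_ids, score_no_ids);
    each is a list of rows with the header row first.
    """
    i_id, i_y = header.index(id_col), header.index(y_col)
    feature_idx = [i for i in range(len(header)) if i not in (i_id, i_y)]

    def take(idx):
        return [[header[i] for i in idx]] + [[row[i] for i in idx] for row in rows]

    return take([i_id] + feature_idx), take(feature_idx), take([i_id, i_y]), take([i_y])


def to_tsv(table):
    """Serialize rows as tab-separated text; plain fields stay unquoted, like quote=F."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(table)
    return buf.getvalue()
```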


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName doench.ecoli.equal --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.ecoli.finalquantum.noncorrelated.species.equal.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/doench.ecoli.finalquantum.noncorrelated.species.equal.score.txt


# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal/Submits/submit_full_doench.ecoli.equal_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal/Submits/submit_train_doench.ecoli.equal_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal/Submits/submit_test_doench.ecoli.equal_0.sh

#### need to make species feature numeric??

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/iRF.run/YNames.txt
# cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt doench.ecoli.equal
# 0.46586381290648016
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/doench.ecoli.equal_cut.score.importance4 | head
# species: 14.6898
# sgRNA.structuresgRNA.raw: 6.48601
# p13tetramer.Hbond.stackingraw: 1.97119
# p19dimer.Hbond.stackingraw: 1.31008
# p20monomer.HLgap.eVraw: 0.975087
# p15tetramer.Hbond.stackingraw: 0.80502
# p20monomer.No.electronsraw: 0.737543
# p17tetramer.Hbond.stackingraw: 0.603806
# p6trimer.Hlgap.eVEraw: 0.449374
# p14trimer.Hbond.stackingraw: 0.442922

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/doench.ecoli.equal/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("doench.ecoli.equal_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.687478
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.7021931

id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("doench.ecoli.finalquantum.noncorrelated.species.equal.features.txt", header=T, sep="\t")
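species.df <- species[, c("sgRNAID", "species")]  # ID and species code, looked up by name rather than column index

library(dplyr)  # left_join() comes from dplyr
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")
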
pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
# 99
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.3457917
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 2)
# 101
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.6503844
E. coli + H. sapiens (Doench & Chuai)
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/Doench2014CORRECTED")
doench <- read.delim("Doench2014CORRECTED.Chuai2018.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
doench.num <- mutate_all(doench[,2:ncol(doench)], function(x) as.numeric(as.character(x)))
doench.num$cut.score <- (doench.num$cut.score - min(doench.num$cut.score)) / (max(doench.num$cut.score) - min(doench.num$cut.score))
doench.num <- cbind(data.frame("sgRNAID" = doench$sgRNAID), doench.num)
summary(doench.num$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
# 0.0000  0.1238  0.2058  0.2471  0.3410  1.0000

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli")
ecoli <- read.delim("Ecoli.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
ecoli.num <- mutate_all(ecoli[,2:ncol(ecoli)], function(x) as.numeric(as.character(x)))
ecoli.num$cut.score <- (ecoli.num$cut.score - min(ecoli.num$cut.score)) / (max(ecoli.num$cut.score) - min(ecoli.num$cut.score))
ecoli.num <- cbind(data.frame("sgRNAID" = ecoli$sgRNAID), ecoli.num)
summary(ecoli.num$cut.score)
#   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
# 0.0000  0.3563  0.5618  0.5077  0.6757  1.0000 

setdiff(names(ecoli.num), names(doench.num))
setdiff(names(doench.num), names(ecoli.num))
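`rbind()` on data frames matches columns by name and stops with an error if the names differ. That is why the `setdiff()` checks above matter. For example, the E. coli table uses names such as `V306sgRNA.raw` where the Doench table has `V306.xsgRNA.raw`. Below is a small Python sketch of the same two-way comparison, which also returns the shared columns.

```python
def compare_columns(cols_a, cols_b):
    """Return (only_in_a, only_in_b, shared), each in the order of its source list."""
    set_a, set_b = set(cols_a), set(cols_b)
    only_a = [c for c in cols_a if c not in set_b]
    only_b = [c for c in cols_b if c not in set_a]
    shared = [c for c in cols_a if c in set_b]
    return only_a, only_b, shared
```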

# drop the V306-V320 and V81 sgRNA.raw columns by exact name
drop.cols <- c(paste0("V", 306:320, "sgRNA.raw"), "V81sgRNA.raw")
ecoli.num.df <- ecoli.num %>% select(-any_of(drop.cols))

# drop the same columns under their .x/.y-suffixed names
drop.cols <- c(paste0("V", 306:320, ".xsgRNA.raw"), "V81.x.xsgRNA.raw", "V81.y.ysgRNA.raw")
doench.num.df <- doench.num %>% select(-any_of(drop.cols))

ecoli.num.df$species <- 1
doench.num.df$species <- 2

doench.num.sample <- doench.num.df[sample(nrow(doench.num.df), 15000), ]
ecoli.num.sample <- ecoli.num.df[sample(nrow(ecoli.num.df), 15000), ]

all <- rbind(doench.num.sample, ecoli.num.sample)
all$cut.score <- as.numeric(all$cut.score)

write.table(all[,c(1,3:ncol(all))], "hsapien.ecoli.finalquantum.noncorrelated.species.features.txt", quote=F, row.names=F, sep="\t")
write.table(all[,c(1,3:ncol(all))], "hsapien.ecoli.finalquantum.noncorrelated.species.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(all[,3:ncol(all)], "hsapien.ecoli.finalquantum.noncorrelated.species.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(all[,1:2], "hsapien.ecoli.finalquantum.noncorrelated.species.score.txt", quote=F, row.names=F, sep="\t")
write.table(all[,1:2], "hsapien.ecoli.finalquantum.noncorrelated.species.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = all[,2]), "hsapien.ecoli.finalquantum.noncorrelated.species.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName hsapien.ecoli --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/hsapien.ecoli.finalquantum.noncorrelated.species.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/hsapien.ecoli.finalquantum.noncorrelated.species.score.txt


# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/Submits/submit_full_hsapien.ecoli_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/Submits/submit_train_hsapien.ecoli_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/Submits/submit_test_hsapien.ecoli_0.sh

#### need to make species feature numeric??

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt hsapien.ecoli
# 
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/hsapien.ecoli_cut.score.importance4 | head
# species: 403.04
# p20basepair.Hbond.energyraw: 15.5363
# p20basepair.Hlgap.eVEraw: 14.0459
# p18trimer.Hbond.energyraw: 13.7647
# p18monomer.No.electronsraw: 13.7079
# p19dimer.HLgap.eVEraw: 11.9187
# p1tetramer.Hbond.energyraw: 11.1035
# p11tetramer.Hbond.energyraw: 10.1444
# p17tetramer.Hbond.stackingraw: 8.17368
# p19dimer.Hbond.stackingraw: 8.04079

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("hsapien.ecoli_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6972761
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.6865943

id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("hsapien.ecoli.finalquantum.noncorrelated.species.features.txt", header=T, sep="\t")
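species.df <- species[, c("sgRNAID", "species")]  # ID and species code, looked up by name rather than column index

library(dplyr)  # left_join() comes from dplyr
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")
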
pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
# 2999
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.5042479
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 2)
# 3001
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.4909198



library(ggplot2)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("hsapien.ecoli.finalquantum.species.correlation.pdf")
# factor(species) gives each species a discrete color and its own lm fit; a numeric species column gets a continuous color scale and a single pooled fit
ggplot(id.pred.y.species, aes(x=cut.score, y=Predictions., color=factor(species))) + geom_point() + theme_classic() + geom_smooth(method='lm') + labs(color="species")
dev.off()

pdf("hsapien.ecoli.finalquantum.species1.correlation.pdf")
ggplot(pred.ecoli, aes(x=cut.score, y=Predictions., color=factor(species))) + geom_point() + theme_classic() + geom_smooth(method='lm') + labs(color="species")
dev.off()

pdf("hsapien.ecoli.finalquantum.species2.correlation.pdf")
ggplot(pred.doench, aes(x=cut.score, y=Predictions., color=factor(species))) + geom_point() + theme_classic() + geom_smooth(method='lm') + labs(color="species")
dev.off()
# interactive alternative: salloc -A SYB105 -p gpu -N 1 -t 4:00:00

# contents of RIT.run (submitted with the sbatch command below):
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score hsapien.ecoli

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/cut.score/RIT.run

# columns: Feature   YVec   NormEdge   Effect   Samples   Linearity
# species   cut.score   0.4406124933876367  -2.8183428048041573e-05 30000.0 0.5000142472067156
# p20basepair.Hbond.energyraw   cut.score   0.019823692363659887    -4.240483571948726e-06  8765.894    0.5327902858841508
# p20basepair.Hlgap.eVEraw  cut.score   0.01445105897913007 2.21313949456048e-06    6433.343    0.4819049333913702
# p18monomer.No.electronsraw    cut.score   0.013388172085470917    -7.504843419688948e-07  16044.203   0.4364978373568982
# p18trimer.Hbond.energyraw cut.score   0.012540958441570604    -1.0643703405694754e-07 12592.634   0.3618404018991294
# p19dimer.Hbond.stackingraw    cut.score   0.011375769441069144    -8.87869358943278e-08   10569.825   0.369172303679948
# p19dimer.HLgap.eVEraw cut.score   0.011323883334471187    2.125886575572766e-08   9245.464    0.4125654176032728
# p1tetramer.Hbond.energyraw    cut.score   0.010794039709261714    1.114493561333129e-07   8173.915    0.4253986571396801
# p11tetramer.Hbond.energyraw   cut.score   0.010290398567884208    -1.164100814647948e-07  7445.994    0.4031929448565761
# p17tetramer.Hbond.energyraw   cut.score   0.008051417943653588    -3.3000720861262885e-08 9328.267    0.32286973343101755
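Each RIT row above holds six fields, named in the R code that reads the sorted file: Feature, YVec, NormEdge, Effect, Samples, Linearity. The plotting step labels each feature `neg`, `pos` or `zero` by the sign of Effect. It also flips the sign of NormEdge for negative effects. Below is a minimal Python sketch of that parsing and signing. It assumes tab-separated rows, as `read.delim(sep="\t")` expects.

```python
FIELDS = ("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
NUMERIC = ("NormEdge", "Effect", "Samples", "Linearity")


def parse_rit(lines):
    """Parse tab-separated RIT rows and add effect direction and signed importance."""
    records = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != len(FIELDS):
            continue  # skip blank or malformed rows
        rec = dict(zip(FIELDS, parts))
        for key in NUMERIC:
            rec[key] = float(rec[key])
        effect = rec["Effect"]
        rec["Direction"] = "neg" if effect < 0 else ("pos" if effect > 0 else "zero")
        # importance keeps its magnitude but takes a minus sign for negative effects
        rec["SignedImportance"] = -rec["NormEdge"] if effect < 0 else rec["NormEdge"]
        records.append(rec)
    return records
```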
library(ggplot2)
library(reshape2)
library(RColorBrewer)
library(dplyr)

## Main H.sapien feature figure
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli/cut.score")
imp <- read.delim("hsapien.ecoli.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
imp.dir.top20.df <- imp.dir.top20 %>% mutate(imp.dir = ifelse(Effect.Direction == "neg", Normalized.Importance*-1, Normalized.Importance))
imp.dir.top20.df$Feature.Label <- c("Species", "Basepair H-bond pos20", "Basepair HL-gap pos20", "Monomer # of Electrons pos18", "Trimer H-bond pos18", "Dimer H-stacking pos19", "Dimer HL-gap pos19", "Tetramer H-bond pos1", "Tetramer H-bond pos11", "Tetramer H-bond pos17", "Tetramer H-bond pos14", "Tetramer H-bond pos13", "Tetramer H-stacking pos17", "Tetramer H-bond pos15", "Tetramer H-bond pos10", "Tetramer H-stacking pos16", "Tetramer H-stacking pos15", "Trimer H-bond pos15", "Tetramer HL-gap pos7", "Tetramer HL-gap pos3")

library(ggplot2)
pdf("hsapien.ecoli.FeatureEngineering.pdf")
# points and segments are mapped to colour, so the palette needs scale_color_brewer (scale_fill_brewer had no effect); the earlier theme(axis.text.x ...) call was overridden by theme_classic()
ggplot(imp.dir.top20.df, aes(x=reorder(Feature.Label, Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(xend=reorder(Feature.Label, Normalized.Importance), y=0, yend=imp.dir)) + labs(title="Multi-species model Top Features") + ylab("Normalized Importance") + xlab("") + theme_classic() + scale_color_brewer(palette="Set1") + coord_flip()
dev.off()

imp.dir.top20.nospecies <- imp.dir.top20.df[2:20,]
imp.dir.top20.nospecies$Feature.Label <- c("Basepair H-bond pos20", "Basepair HL-gap pos20", "Monomer # of Electrons pos18", "Trimer H-bond pos18", "Dimer H-stacking pos19", "Dimer HL-gap pos19", "Tetramer H-bond pos1", "Tetramer H-bond pos11", "Tetramer H-bond pos17", "Tetramer H-bond pos14", "Tetramer H-bond pos13", "Tetramer H-stacking pos17", "Tetramer H-bond pos15", "Tetramer H-bond pos10", "Tetramer H-stacking pos16", "Tetramer H-stacking pos15", "Trimer H-bond pos15", "Tetramer HL-gap pos7", "Tetramer HL-gap pos3")
pdf("hsapien.ecoli.FeatureEngineering.minusspecies.pdf")
ggplot(imp.dir.top20.nospecies, aes(x=reorder(Feature.Label, Normalized.Importance), y=imp.dir, color=Effect.Direction)) + geom_point(size=3) + geom_segment(aes(xend=reorder(Feature.Label, Normalized.Importance), y=0, yend=imp.dir)) + labs(title="Multi-species model Top Features [Minus Species]") + ylab("Normalized Importance") + xlab("") + theme_classic() + scale_color_brewer(palette="Set1") + coord_flip()
dev.off()

-> Remove the species feature from the feature list and re-run the model

# R
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
df <- read.delim("hsapien.ecoli.finalquantum.noncorrelated.species.features.txt", header=T, sep="\t", stringsAsFactors = F)
df.nospecies <- df[,1:6217]  # keep sgRNAID (column 1) and the features; drop column 6218 (species)

write.table(df.nospecies, "hsapien.ecoli.finalquantum.noncorrelated.nospecies.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.nospecies, "hsapien.ecoli.finalquantum.noncorrelated.nospecies.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.nospecies[,2:ncol(df.nospecies)], "hsapien.ecoli.finalquantum.noncorrelated.nospecies.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
score <- read.delim("hsapien.ecoli.finalquantum.noncorrelated.species.score.txt", header=T, sep="\t", stringsAsFactors = F)

library(ggplot2)
library(dplyr)
score.species <- left_join(score, df[,c(1,6218)], by="sgRNAID")
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
score.species.df <- score.species %>% mutate(species.id = ifelse(species == 1, "E.coli", ifelse(species == 2, "H.sapien", "Unknown")))
pdf("ecoli.hsapien.multispecies.score.violin.pdf")
ggplot(score.species.df) + geom_violin(aes(x=species.id, y=cut.score)) + theme_classic()
dev.off()

# p20basepair.Hbond.energyraw
# p20basepair.Hlgap.eVEraw
# p18monomer.No.electronsraw
# p18trimer.Hbond.energyraw
# p19dimer.Hbond.stackingraw
df.species <- df %>% select(grep("sgRNAID", names(df)), grep("p20basepair.Hbond.energyraw", names(df)), grep("p20basepair.Hlgap.eVEraw", names(df)), grep("p18monomer.No.electronsraw", names(df)), grep("p18trimer.Hbond.energyraw", names(df)), grep("p19dimer.Hbond.stackingraw", names(df)), grep("species", names(df)))
df.species.id <- df.species %>% mutate(species.id = ifelse(species == 1, "E.coli", ifelse(species == 2, "H.sapien", "Unknown")))
df.species.melt <- melt(df.species.id[,c(1:6,8)])
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("ecoli.hsapien.multispecies.featuredistribution.violin.pdf")
ggplot(df.species.melt) + geom_violin(aes(x=species.id, y=value, fill=species.id)) + theme_classic() + facet_grid(. ~ variable)
dev.off()
pdf("ecoli.hsapien.multispecies.featuredistribution.boxplot.pdf")
ggplot(df.species.melt) + geom_boxplot(aes(x=species.id, y=value, fill=species.id)) + theme_classic() + facet_grid(. ~ variable)
dev.off()

df.species.score <- left_join(score, df.species.id, by="sgRNAID")
df.species.score.melt <- melt(df.species.score[,c(1:7,9)], id=c("sgRNAID", "species.id", "cut.score"))
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("ecoli.hsapien.multispecies.topfeature.score.scatter.pdf")
ggplot(df.species.score.melt) + geom_point(aes(x=value, y=cut.score, color=species.id)) + theme_classic() + facet_grid(. ~ variable)
dev.off()

pdf("ecoli.hsapien.multispecies.p20basepair.Hbond.score.scatter.pdf")
ggplot(df.species.score) + geom_point(aes(x=p20basepair.Hbond.energyraw, y=cut.score, color=species.id)) + theme_classic()
dev.off()
pdf("ecoli.hsapien.multispecies.p20basepair.Hbond.violin.pdf")
df.species.score.melt <- melt(df.species.score[,c(1:3,9)], id=c("sgRNAID", "species.id"))
ggplot(df.species.score.melt) + geom_violin(aes(x=species.id, y=value, color=species.id)) + theme_classic() + facet_grid(. ~ variable)
dev.off()

pdf("ecoli.hsapien.multispecies.p20basepair.Hbond.density.pdf")
df.species.score.factor <- df.species.score %>% mutate(p20bp.Hbond = ifelse(p20basepair.Hbond.energyraw == 27.11950899, "high", ifelse(p20basepair.Hbond.energyraw == 8.563800343, "low", "NA")))
ggplot(df.species.score.factor) + geom_density(aes(x=cut.score, color=p20bp.Hbond)) + theme_classic() + facet_grid(. ~ species.id)
dev.off()
pdf("ecoli.hsapien.multispecies.p20basepair.Hlgap.density.pdf")
df.species.score.factor <- df.species.score %>% mutate(p20bp.Hlgap = ifelse(p20basepair.Hlgap.eVEraw == 3.284, "high", ifelse(p20basepair.Hlgap.eVEraw == 3.161, "low", "NA")))
ggplot(df.species.score.factor) + geom_density(aes(x=cut.score, color=p20bp.Hlgap)) + theme_classic() + facet_grid(. ~ species.id)
dev.off()
pdf("ecoli.hsapien.multispecies.p18monomer.electrons.density.pdf")
df.species.score.factor <- df.species.score %>% mutate(p18monomer = ifelse(p18monomer.No.electronsraw > 49, "high", ifelse(p18monomer.No.electronsraw < 49, "low", "NA")))
ggplot(df.species.score.factor) + geom_density(aes(x=cut.score, color=p18monomer)) + theme_classic() + facet_grid(. ~ species.id)
dev.off()
pdf("ecoli.hsapien.multispecies.p18trimer.Hbond.density.pdf")
df.species.score.factor <- df.species.score %>% mutate(p18trimer = ifelse(p18trimer.Hbond.energyraw <= 63, "first.quarter", ifelse(p18trimer.Hbond.energyraw <= 77, "second.quarter", ifelse(p18trimer.Hbond.energyraw <= 84, "third.quarter", "fourth.quarter"))))
ggplot(df.species.score.factor) + geom_density(aes(x=cut.score, color=p18trimer)) + theme_classic() + facet_grid(. ~ species.id)
dev.off()






# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName hsapien.ecoli.nospecies --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/hsapien.ecoli.finalquantum.noncorrelated.nospecies.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/hsapien.ecoli.finalquantum.noncorrelated.species.score.txt


# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies/Submits/submit_full_hsapien.ecoli.nospecies_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies/Submits/submit_train_hsapien.ecoli.nospecies_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies/Submits/submit_test_hsapien.ecoli.nospecies_0.sh

#### need to make species feature numeric??

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species/iRF.run/YNames.txt hsapien.ecoli.nospecies
# 0.4531928800756321
sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/hsapien.ecoli.nospecies_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 410.319
# p20basepair.Hlgap.eVEraw: 15.0505
# p18trimer.Hbond.energyraw: 14.6217
# p20basepair.Hbond.energyraw: 14.1491
# p18monomer.No.electronsraw: 12.331
# p19dimer.HLgap.eVEraw: 12.2639
# p1tetramer.Hbond.energyraw: 10.8594
# p11tetramer.Hbond.energyraw: 9.91986
# p17tetramer.Hbond.stackingraw: 8.06495
# p13tetramer.Hbond.energyraw: 7.77037

# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/multi.species.finalquantum.noncorrelated/hsapien.ecoli.nospecies/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("hsapien.ecoli.nospecies_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions., method=c("pearson"))
# 0.6954827
cor(y$cut.score, pred$Predictions., method=c("spearman"))
# 0.684502

id <- read.delim("set4_test_SampleIDs.txt", header=F, sep="\t")
colnames(id) <- "sgRNAID"
id.pred <- cbind(id, pred)
id.pred.y <- cbind(id.pred, y)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
species <- read.delim("hsapien.ecoli.finalquantum.noncorrelated.species.features.txt", header=T, sep="\t")
species.df <- species[,c(1,6218)]

library(dplyr)  # needed for left_join below
id.pred.y.species <- left_join(id.pred.y, species.df, by="sgRNAID")

pred.ecoli <- subset(id.pred.y.species, id.pred.y.species$species == 1)
cor(pred.ecoli$cut.score, pred.ecoli$Predictions., method=c("pearson"))
# 0.4987557
pred.doench <- subset(id.pred.y.species, id.pred.y.species$species == 2)
cor(pred.doench$cut.score, pred.doench$Predictions., method=c("pearson"))
# 0.4875551
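The per-species correlations above can be reproduced outside R with the sketch below. It assumes the joined table is available as (species, observed cut.score, predicted) tuples; the tuple layout and the function names are hypothetical. Spearman is computed as the Pearson correlation of ranks with ties averaged, which is how R's cor(method="spearman") treats ties.

```python
# Hypothetical sketch: per-species Pearson and Spearman correlation between
# observed cut.score and iRF predictions (mirrors the R cor() calls above).
from collections import defaultdict
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def per_species(rows):
    """rows: (species, observed, predicted) tuples -> {species: (pearson, spearman)}."""
    groups = defaultdict(lambda: ([], []))
    for species, obs, pred in rows:
        groups[species][0].append(obs)
        groups[species][1].append(pred)
    return {s: (pearson(o, p), spearman(o, p)) for s, (o, p) in groups.items()}
```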

Putida

sgRNA dataset

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

setwd("/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/putida")
id <- read.delim("Lib1_Cas9_library_database.csv", header=T, sep=",", stringsAsFactors = F)
data <- read.delim("deseq2_lib_vs_cas_tf.csv", header=T, sep=",", stringsAsFactors = F)

library(tidyverse)
library(dplyr)
id$sgRNAID <- str_extract_all(id$gRNA, "[A-Z]+")
colnames(id) <- c("gRNA", "seq", "nucleotide.sequence")
data.id <- left_join(data, id[,c(1,3)], by="gRNA")

df <- data.id[,c(1,3,8)]
colnames(df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")
df.mat <- as.matrix(df)
df.na <- na.omit(df.mat)

write.table(df.na, "putida.txt", quote=F, row.names=F, sep="\t")



mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/putida/putida.txt" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/putida/GCF_000412675.1_ASM41267v1_genomic.fna" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/.
# scp "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/putida/GCF_000412675.1_ASM41267v1_genomic.gff" noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/.

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
sed '1d' putida.txt | awk '{print ">"$1"\n"$3}' > putida.fasta

blast

  • do a search for the sgRNA sequence in the genome
    • input fasta file of sequences, output coordinates
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

## blast
# conda install blast
# cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes
# wget https://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST/ncbi-blast-2.11.0+-x64-linux.tar.gz
# tar zxvpf ncbi-blast-2.11.0+-x64-linux.tar.gz
# export PATH=$PATH:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin
# echo $PATH
# mkdir $HOME/blastdb
# export BLASTDB=$HOME/blastdb
# set BLASTDB=$HOME/blastdb

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida

/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/makeblastdb -in GCF_000412675.1_ASM41267v1_genomic.fna -dbtype nucl
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query putida.fasta -db GCF_000412675.1_ASM41267v1_genomic.fna -out putida.gRNA.blast.tab -outfmt 6 -evalue 0.0005 -task blastn -num_threads 10

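# tmp1: minus-strand hits (sstart > send); tmp2: plus-strand hits
# BLAST sstart/send are 1-based and inclusive; the starts are not shifted to 0-based BED (same conversion as the E. coli run)
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' putida.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' putida.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > putida.gRNA.blast.bed
## not capturing all of the guides (only 28971)... why?? Possibly because a perfect 20-nt hit against the whole genome has an E-value close to the 0.0005 cutoff; the run below drops -evalue (default 10)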

#/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query putida.fasta -db GCF_000412675.1_ASM41267v1_genomic.fna -out putida.gRNA.blast2.tab -outfmt 6 -evalue 0.001 -task blastn -num_threads 10

/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/ncbi-blast-2.11.0+/bin/blastn -query putida.fasta -db GCF_000412675.1_ASM41267v1_genomic.fna -out putida.gRNA.blast.tab -outfmt 6 -task blastn -num_threads 10 
awk '{if ($9 > $10) print $2"\t"$10"\t"$9"\t"$1}' putida.gRNA.blast.tab > tmp1.bed
awk '{if ($10 > $9) print $2"\t"$9"\t"$10"\t"$1}' putida.gRNA.blast.tab > tmp2.bed
cat tmp1.bed tmp2.bed > putida.gRNA.blast.bed
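Because the second run drops the E-value cutoff, putida.gRNA.blast.tab can also contain partial and mismatched hits. A minimal sketch of a filter that keeps only perfect, full-length hits and writes 0-based BED with the strand inferred from sstart/send; it assumes the default 12-column -outfmt 6 layout and that each guide's length is known (20 nt here). The function name is hypothetical. BLAST+ also offers -task blastn-short for very short queries such as guides, which may be worth comparing against -task blastn.

```python
# Hypothetical helper: perfect full-length BLAST hits -> 0-based BED rows.
# Assumes default -outfmt 6 columns:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
def blast6_to_bed(lines, query_lengths):
    """Yield (sseqid, start0, end, qseqid, strand) for 100%-identity, full-length hits."""
    for line in lines:
        f = line.rstrip("\n").split("\t")
        qseqid, sseqid = f[0], f[1]
        pident, length = float(f[2]), int(f[3])
        sstart, send = int(f[8]), int(f[9])
        # a perfect hit spans the whole guide with no mismatches or gaps
        if pident < 100.0 or length != query_lengths.get(qseqid):
            continue
        strand = "+" if sstart <= send else "-"
        lo, hi = min(sstart, send), max(sstart, send)
        # BLAST coordinates are 1-based inclusive; BED starts are 0-based
        yield (sseqid, lo - 1, hi, qseqid, strand)
```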
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# R

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
df <- read.delim("putida.txt", header=T, sep="\t")
colnames(df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")  # column order as written to putida.txt
coord <- read.delim("putida.gRNA.blast.bed", header=F, sep="\t")
colnames(coord) <- c("chr", "start", "end", "sgRNA")
df$sgRNA <- df$sgRNAID

library(dplyr)
df.coord <- left_join(coord, df, by="sgRNA")
write.table(df.coord, "putida.sgRNA.coord.txt", quote=F, row.names=F, sep="\t")

sliding windows

  • make 20bp sliding windows (every 1bp)
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida

faidx GCF_000412675.1_ASM41267v1_genomic.fna -i chromsizes > putida.sizes.genome
bedtools makewindows -g putida.sizes.genome -w 20 -s 1 > putida.20bp.sliding.bed
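For reference, the windows that bedtools makewindows -w 20 -s 1 produces can be sketched in a few lines; this is a hypothetical re-implementation, not bedtools' code. It assumes a {chrom: length} mapping (the content of putida.sizes.genome) and, by default, skips the shorter windows bedtools may also emit at each chromosome end. With a 1-bp step there is one window per base, so the BED file has roughly as many rows as the genome has bases.

```python
# Hypothetical sketch of 0-based, half-open sliding windows (width 20, step 1 by default).
def sliding_windows(chrom_sizes, width=20, step=1, full_only=True):
    """Yield (chrom, start0, end); with full_only=True, truncated end windows are skipped."""
    for chrom, size in chrom_sizes.items():
        for start in range(0, size, step):
            end = min(start + width, size)
            if full_only and end - start < width:
                break  # windows only get shorter from here on
            yield (chrom, start, end)
```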

Features

Gene density & GC content

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida

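## genes (note: grep 'gene' also keeps CDS and other rows whose attributes mention a gene, not only rows with $3 == "gene")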
grep 'gene' GCF_000412675.1_ASM41267v1_genomic.gff | sort -k 1,1 -k 4,4n > GCF_000412675.1_ASM41267v1_genomic.sort.gff
bedtools intersect -wo -a putida.20bp.sliding.bed -b GCF_000412675.1_ASM41267v1_genomic.sort.gff > putida.gene.20sliding.bed

## GC content
bedtools nuc -fi GCF_000412675.1_ASM41267v1_genomic.fna -bed putida.20bp.sliding.bed | sed '1d' > putida.GC.20sliding.bed
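bedtools nuc reports per-window base composition. Since consecutive 1-bp-step windows share 19 of their 20 bases, the GC fraction can also be updated incrementally in one pass. A sketch under the assumption that N and other non-ACGT characters count as non-GC and the denominator is the window width (bedtools' own handling of N may differ); the function name is hypothetical.

```python
# Hypothetical sketch: GC fraction of every full-length window, O(n) rolling update.
def rolling_gc(seq, width=20):
    """Return the G+C fraction for windows [i, i+width), i = 0, 1, ..."""
    seq = seq.upper()
    is_gc = [1 if b in "GC" else 0 for b in seq]
    if len(seq) < width:
        return []
    count = sum(is_gc[:width])
    out = [count / width]
    for i in range(width, len(seq)):
        # add the base entering the window, drop the one leaving it
        count += is_gc[i] - is_gc[i - width]
        out.append(count / width)
    return out
```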

Melting temperature (Tm)

https://biopython.org/docs/1.75/api/Bio.SeqUtils.MeltingTemp.html

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

Bio.SeqUtils.MeltingTemp.Tm_NN(seq, check=True, strict=True, c_seq=None, shift=0, nn_table=None, tmm_table=None, imm_table=None, de_table=None, dnac1=25, dnac2=25, selfcomp=False, Na=50, K=0, Tris=0, Mg=0, dNTPs=0, saltcorr=5)

https://warwick.ac.uk/fac/sci/moac/people/students/peter_cock/python/fasta_n

# summit: # conda install -c conda-forge biopython 

### sgRNA
# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
python3

input_file = open('putida.fasta', 'r')
output_file = open('nucleotide_counts_sgRNA.tsv','w')
# note: the CG% column holds the G+C fraction (0-1), not a percentage
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta") :
    gene_name = cur_record.name
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
    (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)
    
output_file.close()
input_file.close()
exit()

# Melting temperature (°C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("nucleotide_counts_sgRNA.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "putida.nucleotide_counts_sgRNA_temp.txt", quote=F, row.names=F, sep="\t")
q()
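The R one-liner applies the basic GC formula quoted in the comment, not Biopython's Tm_NN. A Python equivalent with a worked check: a 20-mer with 10 G/C gives 64.9 + 41 * (10 - 16.4) / 20 = 51.78 °C. If every guide is 20 nt (as the 20-position sequence split assumes), this Tm is a linear function of the G+C count, so sgRNA.temp and sgRNA.gc carry the same information. The function name is hypothetical.

```python
# Basic GC-content Tm formula from the comment above (not nearest-neighbour Tm_NN).
def tm_basic(seq):
    """Tm (°C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)."""
    s = seq.upper()
    n_a, n_c, n_g, n_t = (s.count(b) for b in "ACGT")
    total = n_a + n_c + n_g + n_t  # non-ACGT characters are ignored
    return 64.9 + 41 * (n_g + n_c - 16.4) / total
```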




### 20bp sliding windows
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
bedtools getfasta -fi GCF_000412675.1_ASM41267v1_genomic.fna -bed putida.20bp.sliding.bed -fo putida.20sliding.fa

# count nucleotides
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
python3

input_file = open('putida.20sliding.fa', 'r')
output_file = open('nucleotide_counts_20sliding.tsv','w')
# note: the CG% column holds the G+C fraction (0-1), not a percentage
output_file.write('Window\tA\tC\tG\tT\tLength\tCG%\n')
from Bio import SeqIO
for cur_record in SeqIO.parse(input_file, "fasta") :
    gene_name = cur_record.name
    A_count = cur_record.seq.count('A')
    C_count = cur_record.seq.count('C')
    G_count = cur_record.seq.count('G')
    T_count = cur_record.seq.count('T')
    length = len(cur_record.seq)
    cg_percentage = float(C_count + G_count) / length
    output_line = '%s\t%i\t%i\t%i\t%i\t%i\t%f\n' % \
    (gene_name, A_count, C_count, G_count, T_count, length, cg_percentage)
    output_file.write(output_line)
    
output_file.close()
input_file.close()
exit()

# Melting temperature (°C) = 64.9 + 41 * (nG + nC - 16.4) / (nA + nT + nG + nC)
R

library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("nucleotide_counts_20sliding.tsv", header=T, sep="\t")
df.melt <- df %>% mutate(MeltingTemp = 64.9 + 41 * (G+C-16.4) / (A+T+G+C))

write.table(df.melt, "putida.nucleotide_counts_20sliding_temp.txt", quote=F, row.names=F, sep="\t")
q()

One-hot encoding

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
cut -f 1,3 putida.txt > putida.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/encode_sequences.py putida.noscore.txt


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/

sed '1d' putida.noscore_independent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID A C T G' | cut -d ' ' -f 1-5 > putida_ind1.txt
sed '1d' putida.noscore_independent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID AA AC AT AG CA CC CT CG TA TC TT TG GA GC GT GG' | cut -d ' ' -f 1-17 > putida_ind2.txt
sed '1d' putida.noscore_dependent1.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.A p1.C p1.T p1.G p2.A p2.C p2.T p2.G p3.A p3.C p3.T p3.G p4.A p4.C p4.T p4.G p5.A p5.C p5.T p5.G p6.A p6.C p6.T p6.G p7.A p7.C p7.T p7.G p8.A p8.C p8.T p8.G p9.A p9.C p9.T p9.G p10.A p10.C p10.T p10.G p11.A p11.C p11.T p11.G p12.A p12.C p12.T p12.G p13.A p13.C p13.T p13.G p14.A p14.C p14.T p14.G p15.A p15.C p15.T p15.G p16.A p16.C p16.T p16.G p17.A p17.C p17.T p17.G p18.A p18.C p18.T p18.G p19.A p19.C p19.T p19.G p20.A p20.C p20.T p20.G' | cut -d ' ' -f 1-81 > putida_dep1.txt
sed '1d' putida.noscore_dependent2.txt | sed '1d' | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1.AA p1.AC p1.AT p1.AG p1.CA p1.CC p1.CT p1.CG p1.TA p1.TC p1.TT p1.TG p1.GA p1.GC p1.GT p1.GG p2.AA p2.AC p2.AT p2.AG p2.CA p2.CC p2.CT p2.CG p2.TA p2.TC p2.TT p2.TG p2.GA p2.GC p2.GT p2.GG p3.AA p3.AC p3.AT p3.AG p3.CA p3.CC p3.CT p3.CG p3.TA p3.TC p3.TT p3.TG p3.GA p3.GC p3.GT p3.GG p4.AA p4.AC p4.AT p4.AG p4.CA p4.CC p4.CT p4.CG p4.TA p4.TC p4.TT p4.TG p4.GA p4.GC p4.GT p4.GG p5.AA p5.AC p5.AT p5.AG p5.CA p5.CC p5.CT p5.CG p5.TA p5.TC p5.TT p5.TG p5.GA p5.GC p5.GT p5.GG p6.AA p6.AC p6.AT p6.AG p6.CA p6.CC p6.CT p6.CG p6.TA p6.TC p6.TT p6.TG p6.GA p6.GC p6.GT p6.GG p7.AA p7.AC p7.AT p7.AG p7.CA p7.CC p7.CT p7.CG p7.TA p7.TC p7.TT p7.TG p7.GA p7.GC p7.GT p7.GG p8.AA p8.AC p8.AT p8.AG p8.CA p8.CC p8.CT p8.CG p8.TA p8.TC p8.TT p8.TG p8.GA p8.GC p8.GT p8.GG p9.AA p9.AC p9.AT p9.AG p9.CA p9.CC p9.CT p9.CG p9.TA p9.TC p9.TT p9.TG p9.GA p9.GC p9.GT p9.GG p10.AA p10.AC p10.AT p10.AG p10.CA p10.CC p10.CT p10.CG p10.TA p10.TC p10.TT p10.TG p10.GA p10.GC p10.GT p10.GG p11.AA p11.AC p11.AT p11.AG p11.CA p11.CC p11.CT p11.CG p11.TA p11.TC p11.TT p11.TG p11.GA p11.GC p11.GT p11.GG p12.AA p12.AC p12.AT p12.AG p12.CA p12.CC p12.CT p12.CG p12.TA p12.TC p12.TT p12.TG p12.GA p12.GC p12.GT p12.GG p13.AA p13.AC p13.AT p13.AG p13.CA p13.CC p13.CT p13.CG p13.TA p13.TC p13.TT p13.TG p13.GA p13.GC p13.GT p13.GG p14.AA p14.AC p14.AT p14.AG p14.CA p14.CC p14.CT p14.CG p14.TA p14.TC p14.TT p14.TG p14.GA p14.GC p14.GT p14.GG p15.AA p15.AC p15.AT p15.AG p15.CA p15.CC p15.CT p15.CG p15.TA p15.TC p15.TT p15.TG p15.GA p15.GC p15.GT p15.GG p16.AA p16.AC p16.AT p16.AG p16.CA p16.CC p16.CT p16.CG p16.TA p16.TC p16.TT p16.TG p16.GA p16.GC p16.GT p16.GG p17.AA p17.AC p17.AT p17.AG p17.CA p17.CC p17.CT p17.CG p17.TA p17.TC p17.TT p17.TG p17.GA p17.GC p17.GT p17.GG p18.AA p18.AC p18.AT p18.AG p18.CA p18.CC p18.CT p18.CG p18.TA p18.TC p18.TT p18.TG p18.GA p18.GC p18.GT p18.GG p19.AA p19.AC p19.AT p19.AG p19.CA p19.CC p19.CT p19.CG p19.TA p19.TC p19.TT p19.TG p19.GA p19.GC p19.GT p19.GG p20.AA p20.AC p20.AT p20.AG p20.CA p20.CC p20.CT p20.CG p20.TA p20.TC p20.TT p20.TG p20.GA p20.GC p20.GT p20.GG' | cut -d ' ' -f 1-321 > putida_dep2.txt
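encode_sequences.py is not shown here, so the following is only a sketch of what the position-dependent columns appear to encode, inferred from the headers: per position, one 0/1 column per base in A C T G order (dep1), and per adjacent pair of positions one column per dinucleotide (dep2). A 20-nt guide gives 80 dep1 columns and 19 * 16 = 304 dep2 columns, which matches trimming onehot.dep2 to 305 columns (sgRNAID + 304) in the raw feature matrix step. Function names are hypothetical.

```python
# Hypothetical sketch of position-dependent one-hot encodings (order taken from the headers).
BASES = "ACTG"


def onehot_dep1(seq):
    """{'p1.A': 0/1, 'p1.C': 0/1, ..., 'p20.G': 0/1} for a guide sequence."""
    features = {}
    for pos, base in enumerate(seq.upper(), start=1):
        for b in BASES:
            features[f"p{pos}.{b}"] = 1 if base == b else 0
    return features


def onehot_dep2(seq):
    """Dinucleotide at positions (pos, pos+1) -> 16 columns per pos, e.g. 'p1.AA'."""
    s = seq.upper()
    features = {}
    for pos in range(1, len(s)):
        pair = s[pos - 1:pos + 1]
        for a in BASES:
            for b in BASES:
                features[f"p{pos}.{a}{b}"] = 1 if pair == a + b else 0
    return features
```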

chemical tensors

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
sed '1d' putida.noscore.txt | awk '{gsub(/./,"& ",$2);print $0}' | sed '1i sgRNAID p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 p13 p14 p15 p16 p17 p18 p19 p20' | cut -d ' ' -f 1-21 > putida.sequence.txt


# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R
library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
write.table(seq.tensor.melt, "putida.tensors.melt.txt", quote=F, row.names=F, sep="\t")

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")
write.table(seq.tensor.dcast, "putida.tensors.txt", quote=F, row.names=F, sep="\t")
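The melt / left_join / dcast sequence above maps each position's base to the per-base descriptors in the nucleotide tensor table. A Python sketch of the same mapping, assuming the table has been loaded as {base: {descriptor: value}}; the names mirror the later unite(position, variable, scale, sep="") step, giving names such as p18monomer.No.electronsraw. Bases missing from the table produce no columns here, whereas the R pipeline later fills those NAs with 0. The function name is hypothetical.

```python
# Hypothetical sketch: per-position quantum-chemical descriptors for one guide.
def tensor_features(seq, per_base):
    """per_base: {base: {descriptor: value}} -> {'p<pos><descriptor>raw': value}."""
    out = {}
    for pos, base in enumerate(seq.upper(), start=1):
        for name, value in per_base.get(base, {}).items():
            out[f"p{pos}{name}raw"] = value
    return out
```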

RNA structure (ViennaRNA)

Minimum free energy (MFE) structure: https://www.tbi.univie.ac.at/RNA/tutorial/

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/vienna
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/vienna
RNAfold < ../putida.fasta > putida.gRNA.ViennaRNA.output.txt

grep '(' putida.gRNA.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > putida.gRNA.ViennaRNA.output.value.txt
grep '>' putida.gRNA.ViennaRNA.output.txt | sed 's/>//g' > putida.gRNA.names.txt
paste putida.gRNA.names.txt putida.gRNA.ViennaRNA.output.value.txt > putida.gRNA.ViennaRNA.output.value.id.txt
cp putida.gRNA.ViennaRNA.output.value.id.txt ../.
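The two grep passes plus paste rely on the name list and the energy list staying in the same order and having the same length. A one-pass parser keeps each name with its structure and MFE; this is a sketch assuming RNAfold's usual stdout layout (header, sequence, then the dot-bracket line ending in the energy in parentheses), and the function name is hypothetical. For the 20-bp sliding-window run, RNAfold's --noPS option avoids writing one PostScript structure plot per sequence.

```python
import re

# energy in parentheses at the end of the structure line, e.g. "(((...))) ( -1.20)"
ENERGY = re.compile(r"\(\s*([-+]?\d+(?:\.\d+)?)\)\s*$")


def parse_rnafold(lines):
    """Yield (name, dot_bracket, mfe) from RNAfold stdout with '>' headers."""
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            name = line[1:].split()[0]
        else:
            m = ENERGY.search(line)
            if m:  # sequence lines have no trailing energy and are skipped
                yield name, line[:m.start()].strip(), float(m.group(1))
```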

# 20bp sliding fasta
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/vienna
RNAfold < ../putida.20sliding.fa > putida.20sliding.ViennaRNA.output.txt

grep '(' putida.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > putida.20sliding.ViennaRNA.output.value.txt
grep '>' putida.20sliding.ViennaRNA.output.txt | sed 's/>//g' > putida.20sliding.names.txt
paste putida.20sliding.names.txt putida.20sliding.ViennaRNA.output.value.txt > putida.20sliding.ViennaRNA.output.value.id.txt
cp putida.20sliding.ViennaRNA.output.value.id.txt ../.
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J ViennaRNA.putida
#SBATCH -N 2
#SBATCH -t 48:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate ViennaRNA

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/vienna
RNAfold < ../putida.20sliding.fa > putida.20sliding.ViennaRNA.output.txt

grep '(' putida.20sliding.ViennaRNA.output.txt | grep -Eo '[+-]?[0-9]+([.][0-9]+)?' > putida.20sliding.ViennaRNA.output.value.txt
grep '>' putida.20sliding.ViennaRNA.output.txt | sed 's/>//g' > putida.20sliding.names.txt
paste putida.20sliding.names.txt putida.20sliding.ViennaRNA.output.value.txt > putida.20sliding.ViennaRNA.output.value.id.txt
cp putida.20sliding.ViennaRNA.output.value.id.txt ../.

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/ViennaRNA.putida.sh

GATC motif

  • proxy for putative methylation
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

## GATC motif
## fastaregex
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'GATC' > putida.gatc.bed

bedtools intersect -wo -a putida.20bp.sliding.bed -b putida.gatc.bed > putida.gatc.20sliding.bed
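GATC is its own reverse complement, so scanning one strand finds every site. Note that bedtools intersect -wo reports only overlapping window/site pairs, so windows without a site are absent from putida.gatc.20sliding.bed and would need to be treated as zero when per-window densities are computed. A sketch of per-window site counts under those assumptions (exact matches only; function names are hypothetical):

```python
# Hypothetical sketch: exact motif sites and the number overlapping each 20-bp window.
def motif_starts(seq, motif="GATC"):
    """0-based starts of every (possibly overlapping) exact match on the + strand."""
    s, m = seq.upper(), motif.upper()
    return [i for i in range(len(s) - len(m) + 1) if s[i:i + len(m)] == m]


def sites_per_window(seq_len, starts, motif_len=4, width=20):
    """Count sites overlapping each full window [w, w+width), step 1 (zeros included)."""
    n_windows = max(seq_len - width + 1, 0)
    counts = [0] * n_windows
    for s in starts:
        # window [w, w+width) overlaps site [s, s+motif_len) iff s - width < w < s + motif_len
        for w in range(max(0, s - width + 1), min(n_windows, s + motif_len)):
            counts[w] += 1
    return counts
```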

PAM

https://www.synthego.com/guide/how-to-use-crispr/pam-sequence

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

# locate NGG PAM sites in the reference genome (fastaRegexFinder, one search per N)

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
cut -f 1-4 putida.sgRNA.coord.txt | sed '1d' | sort -k 1,1 -k 2,2n > putida.sgRNA.coord.bed

# vim NGG.PAM.fasta

## fastaRegexFinder
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'AGG' > putida.AGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'TGG' > putida.TGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'CGG' > putida.CGG.PAM.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/fastaRegexFinder.py -q -f GCF_000412675.1_ASM41267v1_genomic.fna -r 'GGG' > putida.GGG.PAM.txt

cat putida.AGG.PAM.txt putida.TGG.PAM.txt putida.CGG.PAM.txt putida.GGG.PAM.txt > putida.NGG.PAM.txt
sort -k 1,1 -k 2,2n putida.NGG.PAM.txt > putida.NGG.PAM.sorted.bed
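The four fastaRegexFinder runs collect NGG sites one N at a time. For illustration, a one-pass sketch that finds NGG on both strands: a regex lookahead makes overlapping matches count (GGGG contains two forward NGG sites), and reverse-strand NGG shows up on the forward sequence as CCN. This is not fastaRegexFinder's implementation, and whether its output includes reverse-strand or overlapping matches depends on its options. The function name is hypothetical.

```python
import re


def find_ngg(seq):
    """Sorted (start0, end, strand) for every NGG PAM on either strand (3 bp each)."""
    s = seq.upper()
    # zero-width lookahead so overlapping PAMs (e.g. in GGGG) are all reported
    hits = [(m.start(), m.start() + 3, "+") for m in re.finditer(r"(?=[ACGT]GG)", s)]
    # NGG on the reverse strand reads CCN on the forward strand
    hits += [(m.start(), m.start() + 3, "-") for m in re.finditer(r"(?=CC[ACGT])", s)]
    return sorted(hits)
```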

# intersect with sliding windows in the genome to get density for DWT
bedtools intersect -wo -a putida.20bp.sliding.bed -b putida.NGG.PAM.sorted.bed > putida.NGG.PAM.20bp.sliding.windows.bed

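# closest with gRNAs to identify distance (downstream, strand)
# BED6 keeps the strand in column 6, so add a placeholder score (column 5) before it
# note: every guide is labeled "+", so for guides BLAST placed on the minus strand (sstart > send) "downstream" points away from their PAM
awk 'BEGIN{OFS="\t"} {print $0, "0", "+"}' putida.sgRNA.coord.bed > putida.sgRNA.coord.strand.txt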
bedtools closest -a putida.sgRNA.coord.strand.txt -b putida.NGG.PAM.sorted.bed -io -iu -D a > putida.sgRNA.closestPAM.bed
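A strand-aware version of the distance-to-PAM lookup, as a hypothetical sketch: it assumes the guide strand comes from the BLAST hit (sstart > send means minus strand), PAM starts are split by strand and sorted, and distance is the gap in bases between the guide's 3' end and the nearest PAM 3' of it on the same strand (0 when adjacent). bedtools closest uses its own distance convention, which may differ by one.

```python
from bisect import bisect_left, bisect_right


def downstream_pam_distance(guide, plus_pams, minus_pams):
    """guide: (start0, end, strand); *_pams: sorted 0-based starts of 3-bp PAMs.
    Returns the gap to the nearest PAM 3' of the guide on its own strand, or None."""
    start, end, strand = guide
    if strand == "+":
        i = bisect_left(plus_pams, end)  # first PAM starting at or after the guide end
        return plus_pams[i] - end if i < len(plus_pams) else None
    # minus strand: the guide's 3' end is its lowest coordinate
    j = bisect_right(minus_pams, start - 3) - 1  # last PAM with pam_start + 3 <= start
    return start - (minus_pams[j] + 3) if j >= 0 else None
```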

location relative to gene

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
awk '{if ($3 == "gene") print $0}' GCF_000412675.1_ASM41267v1_genomic.gff | sort -k 1,1 -k 4,4n > GCF_000412675.1_ASM41267v1_genomic.gene.gff
bedtools closest -a putida.sgRNA.coord.bed -b GCF_000412675.1_ASM41267v1_genomic.gene.gff -D b > putida.sgRNA.gene.closest.bed
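bedtools closest -D b reports the signed distance to the nearest gene but not where inside the gene a guide falls. A sketch of one way to express that as a fraction along the gene, 5' to 3' on the gene's strand; it assumes GFF gene coordinates (1-based, inclusive) and BED guide coordinates (0-based, half-open), and the function is hypothetical.

```python
# Hypothetical sketch: guide midpoint as a fraction along its gene.
def relative_position_in_gene(guide_start, guide_end, gene_start1, gene_end1, gene_strand):
    """0 = gene 5' end, 1 = gene 3' end (strand-aware); None if the midpoint is outside."""
    g0, g1 = gene_start1 - 1, gene_end1  # GFF 1-based inclusive -> 0-based half-open
    mid = (guide_start + guide_end) / 2
    if not g0 <= mid <= g1:
        return None
    frac = (mid - g0) / (g1 - g0)
    return frac if gene_strand == "+" else 1 - frac
```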

Raw features matrix

# salloc -A SYB105 -N 2 -t 4:00:00

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
structure <- read.delim("putida.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("putida.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5:6)])
colnames(score.df) <- c("sgRNAID", "cut.score")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
onehot.ind1 <- read.delim("putida_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("putida_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("putida_dep1.txt", header=T, sep=" ")
onehot.dep2 <- read.delim("putida_dep2.txt", header=T, sep=" ")
onehot.dep2 <- onehot.dep2[,1:305]  # sgRNAID + 19 dinucleotide positions x 16; a 20-nt guide has no p20 dinucleotide

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
onehot.dep <- full_join(onehot.dep1, onehot.dep2, by="sgRNAID")
onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "df.id.test.txt", quote=F, row.names=F, sep="\t")

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor <- read.delim("putida.tensors.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")



# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df.id <- read.delim("df.id.test.txt", header=T, sep="\t")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")

df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

head(df.id)
head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)
write.table(tensor.df, "putida.raw.onehot.tensor.txt", quote=F, row.names=F, sep="\t")

df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
write.table(df.dcast, "putida.raw.onehot.tensor.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast)
# 149437
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "putida.raw.onehot.tensor.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 149437
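The melt, unite, and dcast steps above reshape each feature table to long format, paste the variable and scale labels together with no separator to name each feature (e.g. sgRNA.gc + sgRNA.raw gives sgRNA.gcsgRNA.raw), and then pivot back to one row per sgRNA, averaging any repeated (sgRNAID, feature) pair. A minimal pandas sketch of the same round trip on toy values (not real data), useful for checking the shape logic:

```python
import pandas as pd

# Toy wide table, one row per sgRNA (values are illustrative, not real data)
wide = pd.DataFrame({
    "sgRNAID": ["sg1", "sg2"],
    "cut.score": [0.8, -0.3],
    "scale": ["sgRNA.raw", "sgRNA.raw"],
    "sgRNA.gc": [0.55, 0.40],
    "sgRNA.temp": [62.1, 58.4],
})

# melt(): wide -> long, keeping the id columns
long = wide.melt(id_vars=["cut.score", "scale", "sgRNAID"],
                 var_name="variable", value_name="value")

# unite(feature.scale, c(variable, scale), sep = ""): names are pasted with no separator
long["feature.scale"] = long["variable"] + long["scale"]
long = long.drop(columns=["variable", "scale"])

# A repeated (sgRNAID, feature) pair, e.g. one sgRNA matched to two genes
dup = pd.DataFrame({"cut.score": [0.8], "sgRNAID": ["sg1"],
                    "value": [0.65], "feature.scale": ["sgRNA.gcsgRNA.raw"]})
long = pd.concat([long, dup], ignore_index=True)

# dcast(sgRNAID + cut.score ~ feature.scale, fun.aggregate = mean): long -> wide,
# averaging repeated pairs
matrix = long.pivot_table(index=["sgRNAID", "cut.score"], columns="feature.scale",
                          values="value", aggfunc="mean").reset_index()
matrix.columns.name = None

# na.omit(): keep only sgRNAs that have every feature
matrix = matrix.dropna()
print(matrix)
```

Because the pivot averages duplicates, an unfiltered join (such as a non-unique gene table) changes feature values silently rather than adding rows, so row counts alone will not reveal it.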


# pam (distance and nucleotide)
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
# sgRNA.pam <- read.table("putida.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
# sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
# colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
# sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
# sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
# 
# score.location <- left_join(score.df, sgRNA.pam.df, by=c("sgRNAID"))
# score.location$scale <- 0
# 
# df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
# df <- na.omit(df.melt)
# colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
# 
# df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
# df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
# df.dcast.na <- na.omit(df.dcast)
# # 27345
# write.table(df.dcast.na, "putida.sgRNA.pam.dcast.txt", quote=F, row.names=F, sep="\t")
# 
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
# df.dcast <- read.delim("putida.sgRNA.pam.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
# df <- read.delim("putida.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
# 
# df.location <- left_join(df, df.dcast, by=c("sgRNAID"))
# nrow(df.location)
# # 27363
# 
# write.table(df.location, "putida.raw.onehot.tensor.pam.dcast.na.txt", quote=F, row.names=F, sep="\t")
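The commented-out PAM block one-hot encodes the variable base N of an NGG PAM, treating CCN codes as the reverse complement (CCT as AGG, CCG as CGG, CCA as TGG, CCC as GGG). A minimal Python sketch of that mapping, assuming the pam.code column holds three-letter codes as in putida.sgRNA.closestPAM.bed:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pam_onehot(pam_code):
    """One-hot encode the variable base N of an NGG PAM.

    CCN codes (PAM read off the opposite strand) are reverse-complemented first,
    so CCT -> AGG (N = A), CCG -> CGG, CCA -> TGG, CCC -> GGG. Codes that fit
    neither pattern get all zeros, like the unmatched case of the ifelse() calls.
    """
    code = pam_code.upper()
    if len(code) == 3 and code.startswith("CC"):
        code = code.translate(COMPLEMENT)[::-1]
    n = code[0] if len(code) == 3 and code.endswith("GG") else None
    return {f"PAM.{b}": int(n == b) for b in "ACGT"}


print(pam_onehot("CCT"))  # same encoding as AGG
```

Deriving the column from the reverse complement keeps the two strands consistent without listing eight literal codes.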



# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
sgRNA.genes <- read.table("putida.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- unique(sgRNA.genes[,c(4,14)])
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")

score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
# 148591
write.table(df.dcast.na, "putida.sgRNA.location.dcast.txt", quote=F, row.names=F, sep="\t")


setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df.dcast.na <- read.delim("putida.sgRNA.location.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
#df <- read.delim("putida.raw.onehot.tensor.pam.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
df <- read.delim("putida.raw.onehot.tensor.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 148591

write.table(df.location, "putida.raw.onehot.tensor.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
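The gene.distance feature is column 14 of putida.sgRNA.gene.closest.bed, presumably a bedtools closest distance. The notebook does not record the options used, so the sign and off-by-one conventions of that column are unknown. A brute-force Python sketch of one common definition (the gap in bp between half-open intervals, 0 when they overlap or touch), assuming sgRNA and gene coordinates on the same chromosome:

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Gap in bp between half-open intervals [start, end); 0 when they overlap."""
    if a_end <= b_start:
        return b_start - a_end
    if b_end <= a_start:
        return a_start - b_end
    return 0


def closest_gene_distance(sgrna, genes):
    """Smallest gap from one sgRNA interval to any gene interval (brute force)."""
    start, end = sgrna
    return min(interval_gap(start, end, g_start, g_end) for g_start, g_end in genes)


genes = [(100, 200), (500, 800)]  # hypothetical gene coordinates
print(closest_gene_distance((210, 230), genes))  # gap of 10 bp to the first gene
```

For genome-scale inputs a sorted sweep (as bedtools does) replaces the brute-force minimum; the sketch only pins down the distance definition.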

18 January

  • matrix including raw values, positional encoding kmers, quantum tensors (singleton, basepair, dimer)
# positional encoding kmers 1-4
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer1_positional_encode.py putida.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer2_positional_encode.py putida.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer3_positional_encode.py putida.noscore.txt
python /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/kmer4_positional_encode.py putida.noscore.txt
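The kmer*_positional_encode.py scripts are not reproduced here, so their exact output layout is unknown. The annotations on the iRF results (e.g. V4087 mapped to p16.GGCC) and features such as GGsgRNA.raw suggest position-dependent k-mer indicators named pN.KMER alongside position-independent k-mer counts. A minimal Python sketch of that kind of encoding, under those assumptions:

```python
from itertools import product


def kmer_features(seq, k):
    """Position-independent k-mer counts plus position-dependent k-mer indicators.

    counts:     k-mer -> number of occurrences anywhere in seq
    positional: 'p{i}.{kmer}' -> 1 if that k-mer starts at 1-based position i, else 0
    Windows containing characters other than A/C/G/T are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    n_pos = len(seq) - k + 1
    counts = dict.fromkeys(kmers, 0)
    positional = {f"p{i}.{km}": 0 for i in range(1, n_pos + 1) for km in kmers}
    for i in range(n_pos):
        km = seq[i:i + k]
        if km in counts:
            counts[km] += 1
            positional[f"p{i + 1}.{km}"] = 1
    return counts, positional


counts, positional = kmer_features("ACGTACGTAC", 2)
print(counts["AC"], positional["p1.AC"])
```

For a 20-nt spacer, k = 1 to 4 gives 80 + 304 + 1152 + 4352 = 5888 positional indicators under this layout, which is why the dependent tables dominate the feature count.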


# separate nucleotide sequence values into individual columns in data frame so each position counts as one feature
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/

sed '1,2d' putida.noscore_dependent1.txt | awk '{gsub(/./,"& ",$2);print $0}' > putida_dep1.txt
sed '1,2d' putida.noscore_dependent2.txt | awk '{gsub(/./,"& ",$2);print $0}' > putida_dep2.txt
sed '1,2d' putida.noscore_dependent3.txt | awk '{gsub(/./,"& ",$2);print $0}' > putida_dep3.txt
sed '1,2d' putida.noscore_dependent4.txt | awk '{gsub(/./,"& ",$2);print $0}' > putida_dep4.txt
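The awk call inserts a space after every character of field 2 so that each character becomes its own space-separated column. Because awk rebuilds the line with single spaces and keeps the space added after the last character, a line whose final field is $2 ends in a trailing space, which read.delim(sep=" ") turns into an empty last column; this appears to be why the R code below drops the last column of each dependent table. A small Python emulation, assuming each input line is `sgRNAID <string>`:

```python
def split_second_field(line):
    """Mimic awk '{gsub(/./,"& ",$2); print $0}' followed by a split on single spaces."""
    fields = line.split()                      # awk's default field splitting
    fields[1] = "".join(ch + " " for ch in fields[1])
    rebuilt = " ".join(fields)                 # $0 is rebuilt with OFS = " "
    return rebuilt.split(" ")                  # a sep=" " reader sees these columns


print(split_second_field("sg1 0100"))  # note the empty final column
```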
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

R
library(dplyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:5]
tensor.t <- as.data.frame(t(tensor.df[63:70,]))
tensor.t$base <- c("A", "C", "G", "T")

rownames(seq) <- seq[,1]
seq.df <- seq[,2:21]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "putida.tensors.single.bp.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.tensors.single.bp.melt.txt", quote=F, row.names=F, sep="\t")
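The single-base tensor step is a lookup: each base at each position is replaced by its row of quantum-chemical properties, and dcast spreads the result to one column per position and property (joined with "_"). A pandas sketch of the same join-and-spread, using a hypothetical two-property table whose values are placeholders, not the numbers in protein_rna_dna-vector_lee_nucleotide_dna_data_15dec.txt:

```python
import pandas as pd

# Hypothetical per-base property table; values are placeholders, not real tensor data
props = pd.DataFrame({"base": list("ACGT"),
                      "propA": [1.0, 2.0, 3.0, 4.0],
                      "propB": [0.1, 0.2, 0.3, 0.4]})

seq = pd.DataFrame({"sgRNAID": ["sg1", "sg2"],
                    "p1": ["A", "G"], "p2": ["C", "T"]})

# melt(seq, id = "sgRNAID"): one row per (sgRNA, position, base)
long = seq.melt(id_vars="sgRNAID", var_name="position", value_name="base")
# left_join(..., by = "base"): attach the property vector of each base
long = long.merge(props, on="base", how="left")
# melt again so each property becomes its own row
long = long.melt(id_vars=["sgRNAID", "position", "base"],
                 var_name="variable", value_name="value")
# dcast(sgRNAID ~ position + variable): one column per position/property pair
wide = long.pivot_table(index="sgRNAID", columns=["position", "variable"], values="value")
wide.columns = [f"{pos}_{var}" for pos, var in wide.columns]
wide = wide.reset_index()
print(wide)
```

A base missing from the property table (e.g. N) would produce NaN after the left join, which the later `tensor[is.na(tensor)] <- 0` step converts to 0.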
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J jan18.matrix
#SBATCH -N 4
#SBATCH -t 10:00:00

module load python
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
R CMD BATCH mar8.matrix.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/mar8.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
structure <- read.delim("putida.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("putida.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5:6)])
colnames(score.df) <- c("sgRNAID", "cut.score")

# structure (ViennaRNA value), GC content, melting temperature
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
onehot.ind1 <- read.delim("putida_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("putida_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("putida_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("putida_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("putida_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("putida_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
# keep all but the last column; 1:ncol(x)-1 parses as (1:ncol(x))-1 and only worked because index 0 is ignored
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "putida.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor <- read.delim("putida.tensors.single.bp.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0
write.table(tensor.id, "tensor.id.test", quote=F, row.names=F, sep="\t")


library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor.id <- read.delim("tensor.id.test", header=T, sep="\t")
df.id <- read.delim("putida.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- unique(score[,c(5:6)])
colnames(score.df) <- c("sgRNAID", "cut.score")

#df.score <- unique(df.id[,c(1,3)])
tensor.score <- inner_join(tensor.id, score.df, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]

#head(df.id)
#head(tensor.score.order)
tensor.df <- rbind(df.id, tensor.score.order)

df.dcast <- tensor.df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast, "putida.raw.onehot.tensor.single.bp.dcast.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 96002
nrow(df.dcast)
# 149625

# pam (distance and nucleotide)
# setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
# sgRNA.pam <- read.table("putida.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
# sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
# colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
# sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
# sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
# #sgRNA.pam.df$id <- "Cas9"
# #sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
# 
# score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
# score.df <- score[,c(5:6)]
# colnames(score.df) <- c("sgRNAID", "cut.score")
# 
# score.location <- left_join(score.df, sgRNA.pam.df, by="sgRNAID")
# score.location$scale <- 0
# 
# df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
# df <- na.omit(df.melt)
# colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")
# 
# df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
# df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
# df.dcast.na <- na.omit(df.dcast)
# 
# df <- read.delim("putida.raw.onehot.tensor.single.bp.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
# 
# df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
# nrow(df.location)
# 


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
sgRNA.genes <- read.table("putida.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
#sgRNA.genes.df$id <- "Cas9"
#sgRNA.genes.id <- unite(sgRNA.genes.df, "sgRNAID", c(sgRNAID, id), sep="_")

score.location <- left_join(score.df, sgRNA.genes.df, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)

#df <- df.location
df <- read.delim("putida.raw.onehot.tensor.single.bp.dcast.txt", header=T, sep="\t", stringsAsFactors = F)
df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
nrow(df.location)
# 148591

write.table(df.location, "putida.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
# add new DNA/RNA dimer features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(tidyr)
library(reshape2)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/")
tensor <- read.delim("quantum_dimers_20dec.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:17]
tensor.t <- as.data.frame(t(tensor.df))
#tensor.t$base <- c("A", "C", "G", "T")
tensor.t$base <- names(tensor[,2:17])

rownames(seq) <- seq.dimer[,1]
seq.df <- seq.dimer[,2:20]
seq.melt <- melt(seq.dimer, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))

seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "putida.tensors.dimers.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.tensors.dimers.melt.txt", quote=F, row.names=F, sep="\t")
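The chained unite() calls build the 19 overlapping dinucleotides of the 20-nt spacer (p1 = bases 1 and 2, through p19 = bases 19 and 20), which are then looked up in the dimer tensor table. The same windows can be generated with a loop instead of 19 hand-written calls; a minimal Python sketch:

```python
def overlapping_dimers(bases):
    """Overlapping dinucleotides of a per-position base list (n bases -> n - 1 dimers)."""
    return ["".join(bases[i:i + 2]) for i in range(len(bases) - 1)]


def dimer_columns(bases):
    """Name the dimers p1..p{n-1}, dimer i joining bases i and i+1 (1-based)."""
    return {f"p{i + 1}": d for i, d in enumerate(overlapping_dimers(bases))}


cols = dimer_columns(list("ACGTACGTACGTACGTACGT"))
print(cols["p1"], cols["p19"])
```

The same window logic extends to trimers and tetramers by changing the slice width, which keeps the position naming aligned across tensor levels.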
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.raw.onehot.tensor.single.bp.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor <- read.delim("putida.tensors.dimers.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
write.table(df.dcast.na, "putida.raw.onehot.tensor.single.bp.dimers.dcast.na.txt", quote=F, row.names=F, sep="\t")
nrow(df.dcast.na)
# 27345

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6073,6075:6079,6081,6083:6177)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "putida.raw.onehot.tensor.single.bp.dimers.pam.location.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
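Joining two tables that both carry cut.score leaves suffixed copies (cut.score.x, cut.score.y, and so on), and the cleanup blocks here remove them with hard-coded column ranges that differ between two passes over the same input file: c(1:6073, 6075:6079, 6081, 6083:6177) versus c(1:6072, 6074:6078, 6080, 6082:6176). Selecting by name avoids depending on column positions. A pandas sketch, assuming dplyr-style `.x`/`.y` suffixes and hypothetical toy tables:

```python
import re

import pandas as pd


def collapse_score_copies(df, score="cut.score"):
    """Replace suffixed copies of the score column (cut.score.x, cut.score.y, ...)
    with a single column, after checking that the copies agree."""
    pattern = re.compile(rf"^{re.escape(score)}(\.[xy])+$")
    copies = [c for c in df.columns if pattern.match(c)]
    if not copies:
        return df
    first = df[copies[0]]
    for c in copies[1:]:
        if not first.equals(df[c]):
            raise ValueError(f"{c} disagrees with {copies[0]}")
    out = df.drop(columns=copies)
    out.insert(1, score, first.to_numpy())  # score goes right after the ID column
    return out


# Hypothetical toy tables that both carry the score, as in the inner_join() above
left = pd.DataFrame({"sgRNAID": ["sg1", "sg2"], "cut.score": [0.8, -0.3], "f1": [1, 2]})
right = pd.DataFrame({"sgRNAID": ["sg1", "sg2"], "cut.score": [0.8, -0.3], "f2": [3, 4]})
merged = left.merge(right, on="sgRNAID", suffixes=(".x", ".y"))  # dplyr-style suffixes
out = collapse_score_copies(merged)
print(list(out.columns))
```

Checking that the copies agree also catches a join that paired sgRNAs with the wrong scores, which position-based dropping would hide.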
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df <- df[,c(1:6072,6074:6078,6080,6082:6176)]
df.num <- mutate_all(df[,2:ncol(df)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

df.abs <- df.all %>% select(grep("bondraw", names(df.all))) %>% abs()
df.all.sub <- df.all %>% select(-grep("bondraw", names(df.all))) 
df.abs.all <- cbind(df.all.sub, df.abs)
# drop leftover cut.score.x / cut.score.y(.y) copies by name, matched against the current columns
# (grep() indices taken from df.abs.all no longer line up once an earlier select() has removed columns)
df.abs.all3 <- df.abs.all %>% select(-matches("^cut\\.score\\.(x|y)"))
write.table(df.abs.all3, "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.txt", quote=F, row.names=F, sep="\t")

df.minusHL <- df.abs.all3 %>% select(-grep("HL", names(df.abs.all3)), -grep("HOMO", names(df.abs.all3)), -grep("LUMO", names(df.abs.all3))) 
# 5991 features

write.table(df.minusHL, "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.txt", quote=F, row.names=F, sep="\t")

write.table(df.minusHL[,c(1,3:ncol(df.minusHL))], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,c(1,3:ncol(df.minusHL))], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,3:ncol(df.minusHL)], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")

write.table(df.minusHL[,c(1:2)], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.minusHL[,c(1:2)], "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame(cut.score = df.minusHL[,2]), "putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
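The HbondAbs/noHL matrix applies two column-wise rules: take the absolute value of every H-bond column (names containing "bondraw") and drop every column whose name contains HL, HOMO, or LUMO. A pandas sketch of those rules, using hypothetical column names in the notebook's position + feature + scale style; unlike the cbind() above, it keeps the original column order:

```python
import pandas as pd


def hbond_abs_no_hl(df):
    """Take |value| of H-bond columns, then drop HL/HOMO/LUMO columns.

    Columns are matched by substring, as grep() does in the R code, so "HL"
    matches any column name containing those two capital letters.
    """
    out = df.copy()
    bond_cols = [c for c in out.columns if "bondraw" in c]
    out[bond_cols] = out[bond_cols].abs()
    drop = [c for c in out.columns if any(tag in c for tag in ("HL", "HOMO", "LUMO"))]
    return out.drop(columns=drop)


# Hypothetical column names and placeholder values
toy = pd.DataFrame({
    "sgRNAID": ["sg1"], "cut.score": [0.8],
    "p13dimer_H_bondraw": [-0.5], "p1HLraw": [3.1],
    "p2HOMOraw": [-6.0], "p3LUMOraw": [-1.2], "GGsgRNA.raw": [2.0],
})
print(hbond_abs_no_hl(toy))
```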


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL

python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName noHL --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/Submits/submit_full_noHL_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/Submits/submit_train_noHL_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/Submits/submit_test_noHL_0.sh

# Andes
module load python/3.7-anaconda3

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/YNames.txt noHL
# 0.2564941121557385
sort -k3rg topVarEdges/cut.score_top95.txt | head
# sgRNA.structuresgRNA.raw  cut.score   0.2676690008756795
# sgRNA.gcsgRNA.raw cut.score   0.04810165968914591
# sgRNA.tempsgRNA.raw   cut.score   0.045682202832182765
# V4087sgRNA.raw    cut.score   0.028748615100844917  <-- p16.GGCC
# GGsgRNA.raw   cut.score   0.028323663567439376
# pam.distance0 cut.score   0.018764516297564895
# V4343sgRNA.raw    cut.score   0.017372547699812658  <-- p17.GGCC
# V4312sgRNA.raw    cut.score   0.014410503258427519  <-- p17.GCCT
# p13dimer_H_bondraw    cut.score   0.014158699462834634
# p11dimer_H_bondraw    cut.score   0.011419832884509533

# Pearson correlation between observed and predicted cut.score (fold9, Set4 test split)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("noHL_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5316399
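cor() defaults to the Pearson coefficient and pairs values by row position, so noHL_Set4_test.prediction and set4_Y_test_noSampleIDs.txt must list sgRNAs in the same order. The statistic itself, written out in Python for reference:

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient (the default method of R's cor())."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


print(pearson([1, 2, 3, 4], [1, 3, 2, 4]))
```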
RIT

** Need to compile the C++ file /gpfs/alpine/syb105/proj-shared/Personal/jromero/codesnippets/ritw **

  • run RIT on Cas9 model with all features
  • need to run arva-rit and then runRIT.sh (3 scripts)
  • two outputs: effect size and directionality
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score noHL

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/noHL/cut.score/RIT.run

# sgRNA.structuresgRNA.raw  cut.score   0.2676690008756795  -0.017026901627220408   70802.78    -0.4160079400908415
# sgRNA.gcsgRNA.raw cut.score   0.04810165968914591 0.009099651611727953    30993.431   -1.2138644207966751
# sgRNA.tempsgRNA.raw   cut.score   0.045682202832182765    0.009947162040008827    29374.496   -1.201684899390678
# V4087sgRNA.raw    cut.score   0.028748615100844917    0.018105634883326425    24193.116   -1.020549171992924
# GGsgRNA.raw   cut.score   0.028323663567439376    0.007262151543367551    20337.142   -1.1698805330346533
# pam.distance0 cut.score   0.018764516297564895    0.0022152333872347044   12920.775   -1.3359447252109964
# V4343sgRNA.raw    cut.score   0.017372547699812658    0.015554623512120214    16979.811   -0.9183917130469017
# V4312sgRNA.raw    cut.score   0.014410503258427519    0.011493453405739885    16654.431   -1.2281994074729736
# p13dimer_H_bondraw    cut.score   0.014158699462834634    0.005819836565006202    6787.281    -0.9978164560081595
# p11dimer_H_bondraw    cut.score   0.011419832884509533    0.005690488639647928    4136.188    -1.1907651715285015
SHAP
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate shap

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida

# python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import shap
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

np.random.seed(0)
outdir = "/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/"
df = pd.read_table(outdir + "putida.raw.onehot.tensor.single.bp.dimers.pam.location.dcast.na.corrected.txt")  # load the data

# The target variable is 'cut.score'; every other column except the ID is a feature
# (the feature list can be printed from R with dput(colnames(df)))
Y = df['cut.score']
X = df.drop(columns=['sgRNAID', 'cut.score'])

# Split the data into train and test sets (fixed seed for a reproducible split):
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
# Build the model with the random forest regression algorithm:
model = RandomForestRegressor(max_depth=6, random_state=0, n_estimators=10)
model.fit(X_train, Y_train)

shap_values = shap.TreeExplainer(model).shap_values(X_train)

# show=False keeps each plot on the current figure so savefig() writes it to disk
shap.summary_plot(shap_values, X_train, plot_type="bar", show=False)
plt.savefig(outdir + "putida.raw.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_bar.png", bbox_inches='tight', dpi=600)
plt.close()

shap.summary_plot(shap_values, X_train, show=False)
plt.savefig(outdir + "putida.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png", bbox_inches='tight', dpi=600)
plt.close()

# scp noshayjm@dtn.ccs.ornl.gov:/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.onehot.tensor.single.bp.dimers.pam.location.shap_summary_plot_varimp.png "/Users/27n/Dropbox (ORNL)/ORNL.Noshay/Projects/SEED/ExploratoryDataForModelGeneration/e.coli/SHAP/"

test putida data with E.coli model

#!/bin/bash -l
#BSUB -P SYB105
#BSUB -W 02:15
#BSUB -nnodes 50
#BSUB -J putida.test_0
#BSUB -o putida.test_0.o%J
#BSUB -e putida.test_0.e%J

mkdir -p /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test

/usr/bin/time -f "%e" jsrun -n 1 -a 1 -c 40 -bpacked:40 /gpfs/alpine/syb105/proj-shared/Projects/iRF/IterativeRanger/cpp_version/build/ranger --file /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.features_overlap_noSampleIDs.txt --yfile /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score_overlap_noSampleIDs.txt --predict /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/e.coli.tensor.single.bp.dimers.HbondAbs/noHL/cut.score/noHL_cut.score.forest --treetype 3 --depvarname cut.score --impmeasure 1 --nthreads 160 --useMPI 0 --outprefix ecoli.model.putida.test --outputDirectory /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test > /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test/ecoli.model.putida.test.o

# bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.putida.test.sh


#### test the output
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
score <- read.delim("putida.raw.onehot.tensor.single.bp.dimers.pam.location.HbondAbs.noHL.score_overlap_noSampleIDs.txt", header=T, sep="\t")  # observed cut.score (the --yfile given to ranger), not the feature file
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/ecoli.model.test/")
predict <- read.delim("ecoli.model.putida.test.prediction", header=T, sep="\t")

score.predict <- cbind(score, predict)
cor(score.predict$cut.score, score.predict$Predictions.)
#-
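R's `cor()` defaults to Pearson's r, and because `cbind` pairs the score and prediction tables by row position, the check above is only meaningful if both files list the sgRNAs in the same order. A minimal Python sketch of the same statistic (the function name `pearson_r` is hypothetical, not part of the pipeline):

```python
import math
from typing import Sequence


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences paired by position.

    Same statistic as R's cor(x, y) with the default method = "pearson".
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    var_x = sum((x - mean_x) ** 2 for x in observed)
    var_y = sum((y - mean_y) ** 2 for y in predicted)
    return cov / math.sqrt(var_x * var_y)


# Toy numbers, not experimental data: a perfectly linear pair gives r close to 1.
print(pearson_r([0.1, 0.4, 0.7], [1.0, 2.0, 3.0]))
```

For tens of thousands of rows a vectorised routine (e.g. numpy's `corrcoef`) is the practical choice; the explicit sums only make the formula visible.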

15 March 2022: Quantum Matrix

  • generate final matrix with updated quantum properties (HL and H-bond) for monomer, basepair, dimer, trimer, tetramer
  • think through incorporating DNA and RNA sequence?
#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J putida.matrix
#SBATCH -N 1
#SBATCH -t 10:00:00
#SBATCH -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida
R CMD BATCH mar15.matrix.R

#sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/mar15.matrix.sh
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
structure <- read.delim("putida.gRNA.ViennaRNA.output.value.id.txt", header=F, sep="\t", stringsAsFactors = F)
nuc <- read.delim("putida.nucleotide_counts_sgRNA_temp.txt", header=T, sep="\t", stringsAsFactors = F)
score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:7)]
colnames(score.df) <- c("sgRNAID", "cut.score", "nucleotide.sequence")

# structure, gc, temp
structure.df <- data.frame(structure[,2])
gc.df <- data.frame(nuc[,7])
temp.df <- data.frame(nuc[,8])

structure.df$scale <- "sgRNA.raw"
gc.df$scale <- "sgRNA.raw"
temp.df$scale <- "sgRNA.raw"

structure.df$sgRNAID <- structure[,1]
gc.df$sgRNAID <- nuc[,1]
temp.df$sgRNAID <- nuc[,1]

structure.temp <- left_join(structure.df, temp.df, by=c("sgRNAID", "scale"))
structure.temp.gc <- left_join(structure.temp, gc.df, by=c("sgRNAID", "scale"))
score.structure.temp.gc <- left_join(score.df, structure.temp.gc, by=c("sgRNAID"))
colnames(score.structure.temp.gc) <- c("sgRNAID", "cut.score", "seq", "sgRNA.structure", "scale", "sgRNA.temp", "sgRNA.gc")

## add one-hot encoding of sequence
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
onehot.ind1 <- read.delim("putida_ind1.txt", header=T, sep=" ")
onehot.ind2 <- read.delim("putida_ind2.txt", header=T, sep=" ")
onehot.dep1 <- read.delim("putida_dep1.txt", header=F, sep=" ")
onehot.dep2 <- read.delim("putida_dep2.txt", header=F, sep=" ")
onehot.dep3 <- read.delim("putida_dep3.txt", header=F, sep=" ")
onehot.dep4 <- read.delim("putida_dep4.txt", header=F, sep=" ")
colnames(onehot.dep1)[1] <- "sgRNAID"
colnames(onehot.dep2)[1] <- "sgRNAID"
colnames(onehot.dep3)[1] <- "sgRNAID"
colnames(onehot.dep4)[1] <- "sgRNAID"

onehot.ind <- full_join(onehot.ind1, onehot.ind2, by="sgRNAID")
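# drop the last column of each dep table; 1:(ncol(x)-1) is the intended index
# (1:ncol(x)-1 parses as (1:ncol(x))-1, i.e. 0:(n-1), and only works because index 0 is ignored)
onehot.dep12 <- full_join(onehot.dep1[,1:(ncol(onehot.dep1)-1)], onehot.dep2[,1:(ncol(onehot.dep2)-1)], by="sgRNAID")
onehot.dep123 <- full_join(onehot.dep12, onehot.dep3[,1:(ncol(onehot.dep3)-1)], by="sgRNAID")
onehot.dep <- full_join(onehot.dep123, onehot.dep4[,1:(ncol(onehot.dep4)-1)], by="sgRNAID")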

onehot <- full_join(onehot.ind, onehot.dep, by="sgRNAID")
onehot$scale <- "sgRNA.raw"

data.onehot <- left_join(score.structure.temp.gc, onehot, by=c("sgRNAID", "scale"))

df.melt <- melt(data.onehot[,c(1,2,4:ncol(data.onehot))], id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.id$value <- as.numeric(df.id$value)
df.id <- df.id[!(is.na(df.id$value) | df.id$value==""), ]
colnames(df.id) <- c("cut.score", "feature.scale", "sgRNAID", "value")
write.table(df.id, "putida.structure.temp.gc.onehot1to4.txt", quote=F, row.names=F, sep="\t")
# 
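The ind/dep tables are read in pre-computed. Judging by feature names such as `GGsgRNA.raw` and the thousands of V-numbered columns, ind-k looks like position-independent k-mer counts and dep-k like position-specific k-mer one-hot indicators; that reading is an assumption, not something documented here. A sketch of both encodings under that assumption (function names and the `p<pos>_<kmer>` column naming are hypothetical):

```python
from itertools import product
from typing import Dict

BASES = "ACGT"


def position_independent_counts(seq: str, k: int) -> Dict[str, int]:
    """Count each of the 4**k possible k-mers anywhere in seq (assumed 'ind' encoding)."""
    counts = {"".join(p): 0 for p in product(BASES, repeat=k)}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
    return counts


def position_dependent_onehot(seq: str, k: int) -> Dict[str, int]:
    """One 0/1 feature per (start position, k-mer) pair (assumed 'dep' encoding)."""
    kmers = ["".join(p) for p in product(BASES, repeat=k)]
    features: Dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        for kmer in kmers:
            # hypothetical column name: 1-based start position plus k-mer, e.g. "p3_CG"
            features[f"p{i + 1}_{kmer}"] = int(window == kmer)
    return features


spacer = "GACGCATAAAGATGAGACGC"  # toy 20-nt spacer, not taken from the dataset
print(position_independent_counts(spacer, 2)["GA"])
print(len(position_dependent_onehot(spacer, 4)))
```

For a 20-nt spacer this gives 20×4, 19×16, 18×64 and 17×256 dep columns for k = 1 to 4; 4352 columns for k = 4 would be consistent with names like V4343 coming from the dep4 table.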

# pam (distance and nucleotide)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
sgRNA.pam <- read.table("putida.sgRNA.closestPAM.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.pam.sub <- sgRNA.pam[,c(4,12,13)]
colnames(sgRNA.pam.sub) <- c("sgRNAID", "pam.code", "pam.distance")
sgRNA.pam.onehot <- sgRNA.pam.sub %>% mutate(PAM.A = ifelse(pam.code == "AGG" | pam.code == "CCT", 1, 0), PAM.C = ifelse(pam.code == "CGG" | pam.code == "CCG", 1, 0), PAM.T = ifelse(pam.code == "TGG" | pam.code == "CCA", 1, 0), PAM.G = ifelse(pam.code == "GGG" | pam.code == "CCC", 1, 0))
sgRNA.pam.df <- sgRNA.pam.onehot[,c(1,3:7)]
#sgRNA.pam.df$id <- "Cas9"
#sgRNA.pam.id <- unite(sgRNA.pam.df, "sgRNAID", c(sgRNAID, id), sep="_")
sgRNA.pam.id <- sgRNA.pam.df

score <- read.delim("putida.sgRNA.coord.txt", header=T, sep="\t", stringsAsFactors = F)
score.df <- score[,c(5:6)]
colnames(score.df) <- c("sgRNAID", "cut.score")

score.location <- left_join(score.df, sgRNA.pam.id, by="sgRNAID")
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.pam.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df <- read.delim("putida.structure.temp.gc.onehot1to4.txt", header=T, sep="\t")
df.onehot.dcast <- df %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)

df.onehot.pam <- left_join(df.onehot.dcast, df.pam.dcast, by=c("sgRNAID"))

df.onehot.pam.na <- na.omit(df.onehot.pam)
nrow(df.onehot.pam.na)
# 
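The `mutate()` call collapses each PAM and its reverse complement into one indicator for the variable N base (AGG on one strand reads as CCT on the other). A Python sketch of the same mapping, with a check that every pair really is a reverse-complement pair; the names `revcomp` and `encode_pam` are illustrative only:

```python
from typing import Dict, Set

# Copied from the mutate() call above: an NGG PAM and its reverse complement
# (CCN, as read on the other strand) share one indicator for the N base.
PAM_GROUPS: Dict[str, Set[str]] = {
    "PAM.A": {"AGG", "CCT"},
    "PAM.C": {"CGG", "CCG"},
    "PAM.T": {"TGG", "CCA"},
    "PAM.G": {"GGG", "CCC"},
}


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def encode_pam(pam_code: str) -> Dict[str, int]:
    """Four 0/1 PAM columns for one PAM string; unlisted codes give all zeros."""
    return {name: int(pam_code in codes) for name, codes in PAM_GROUPS.items()}


# Sanity check: within each group, the NGG code's reverse complement is the other code.
for codes in PAM_GROUPS.values():
    ngg = next(code for code in codes if code.endswith("GG"))
    assert revcomp(ngg) in codes and len(codes) == 2

print(encode_pam("CCA"))
```

As with the `ifelse()` version, a PAM string outside the eight listed codes gets 0 in all four columns.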


# location relative to gene
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
sgRNA.genes <- read.table("putida.sgRNA.gene.closest.bed", header=F, sep="\t", stringsAsFactors = F)
sgRNA.genes.df <- sgRNA.genes[,c(4,14)]
colnames(sgRNA.genes.df) <- c("sgRNAID", "gene.distance")
sgRNA.genes.id <- sgRNA.genes.df

score.location <- left_join(score.df, sgRNA.genes.id, by=c("sgRNAID"))
score.location$scale <- 0

df.melt <- melt(score.location, id=c("cut.score", "scale", "sgRNAID"))
df <- na.omit(df.melt)
colnames(df) <- c("cut.score", "scale", "sgRNAID", "variable", "value")

df.id <- df %>% unite(feature.scale, c(variable, scale), sep = "")
df.location.dcast <- df.id %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.location.dcast.na <- na.omit(df.location.dcast)

df.pam.location <- inner_join(df.location.dcast.na, df.onehot.pam.na, by=c("sgRNAID"))
nrow(df.pam.location)
# 

write.table(df.pam.location, "putida.raw.matrix.txt", quote=F, row.names=F, sep="\t")
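Each `dcast(..., fun.aggregate=mean, na.rm=TRUE)` above pivots long (sgRNAID, cut.score, feature, value) rows into one row per sgRNA and averages any duplicate cells, for instance if a closest-feature BED lists more than one hit for the same sgRNA (an assumption about where duplicates would come from). A rough Python analogue; `long_to_wide` is a hypothetical name:

```python
import math
from typing import Dict, Iterable, Tuple

Key = Tuple[str, float]              # (sgRNAID, cut.score)
Row = Tuple[str, float, str, float]  # (sgRNAID, cut.score, feature.scale, value)


def long_to_wide(rows: Iterable[Row]) -> Dict[Key, Dict[str, float]]:
    """Pivot long rows to one record per (sgRNAID, cut.score), averaging duplicate cells."""
    totals: Dict[Key, Dict[str, Tuple[float, int]]] = {}
    for sg_id, score, feature, value in rows:
        if math.isnan(value):
            continue  # mimics na.rm = TRUE
        cell = totals.setdefault((sg_id, score), {})
        running_sum, count = cell.get(feature, (0.0, 0))
        cell[feature] = (running_sum + value, count + 1)
    return {
        key: {feature: s / n for feature, (s, n) in cell.items()}
        for key, cell in totals.items()
    }


rows = [
    ("sg1", 0.5, "gene.distance0", 10.0),
    ("sg1", 0.5, "gene.distance0", 20.0),  # duplicate cell, averaged to 15.0
    ("sg1", 0.5, "pam.distance0", 3.0),
    ("sg2", 0.1, "gene.distance0", float("nan")),
]
print(long_to_wide(rows))
```

Unlike `dcast`, a cell whose values are all NA is simply omitted here rather than set to NaN, and an sgRNA with no usable values drops out entirely.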
# add new DNA/RNA features to data table
# salloc -A SYB105 -N 2 -t 4:00:00
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)

# Monomer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Monomer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "putida.quantum.monomer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.monomer.melt.txt", quote=F, row.names=F, sep="\t")
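The monomer block transposes the property table so each base becomes a row, joins it onto the melted sequence by `base`, and casts back to one column per position and property (`dcast` joins the two with "_"). A sketch of that lookup; the `TENSOR` values are made-up placeholders, not the numbers in HL.Bond.Monomer.txt, and `featurize` is a hypothetical name:

```python
from typing import Dict, List

# Placeholder property table: property -> unit -> value.
# These numbers are made up; the real ones come from HL.Bond.Monomer.txt.
TENSOR: Dict[str, Dict[str, float]] = {
    "Hbond.energy": {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0},
    "Hlgap.eV": {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4},
}


def featurize(units: List[str], tensor: Dict[str, Dict[str, float]] = TENSOR) -> Dict[str, float]:
    """Wide feature row: one value per (position, property), named 'p<pos>_<property>'."""
    row: Dict[str, float] = {}
    for pos, unit in enumerate(units, start=1):
        for prop, table in tensor.items():
            # an unknown unit becomes NaN, like an unmatched row after left_join()
            row[f"p{pos}_{prop}"] = table.get(unit, float("nan"))
    return row


print(featurize(list("GAC")))
```

Passing a list of k-mers instead of single bases (with a k-mer keyed table) covers the dimer, trimer and tetramer cases. Missing lookups stay NaN in this sketch; the later R step replaces NA with 0 (`tensor[is.na(tensor)] <- 0`).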


# Basepair
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Basepair.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq) <- seq[,1]
seq.melt <- melt(seq, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "putida.quantum.basepair.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.basepair.melt.txt", quote=F, row.names=F, sep="\t")


# Dimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Dimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.dimer <- seq %>% unite("p1", p1:p2, remove=F, sep= "") %>% unite("p2", p2:p3, remove=F, sep= "") %>% unite("p3", p3:p4, remove=F, sep= "") %>% unite("p4", p4:p5, remove=F, sep= "") %>% unite("p5", p5:p6, remove=F, sep= "") %>% unite("p6", p6:p7, remove=F, sep= "") %>% unite("p7", p7:p8, remove=F, sep= "") %>% unite("p8", p8:p9, remove=F, sep= "") %>% unite("p9", p9:p10, remove=F, sep= "") %>% unite("p10", p10:p11, remove=F, sep= "") %>% unite("p11", p11:p12, remove=F, sep= "") %>% unite("p12", p12:p13, remove=F, sep= "") %>% unite("p13", p13:p14, remove=F, sep= "") %>% unite("p14", p14:p15, remove=F, sep= "") %>% unite("p15", p15:p16, remove=F, sep= "") %>% unite("p16", p16:p17, remove=F, sep= "") %>% unite("p17", p17:p18, remove=F, sep= "") %>% unite("p18", p18:p19, remove=F, sep= "") %>% unite("p19", p19:p20, remove=T, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.dimer) <- seq.dimer[,1]
seq.df <- seq.dimer[,1:20]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "putida.quantum.dimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.dimer.melt.txt", quote=F, row.names=F, sep="\t")


# Trimer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Trimer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.trimer <- seq %>% unite("p1", p1:p3, remove=F, sep= "") %>% unite("p2", p2:p4, remove=F, sep= "") %>% unite("p3", p3:p5, remove=F, sep= "") %>% unite("p4", p4:p6, remove=F, sep= "") %>% unite("p5", p5:p7, remove=F, sep= "") %>% unite("p6", p6:p8, remove=F, sep= "") %>% unite("p7", p7:p9, remove=F, sep= "") %>% unite("p8", p8:p10, remove=F, sep= "") %>% unite("p9", p9:p11, remove=F, sep= "") %>% unite("p10", p10:p12, remove=F, sep= "") %>% unite("p11", p11:p13, remove=F, sep= "") %>% unite("p12", p12:p14, remove=F, sep= "") %>% unite("p13", p13:p15, remove=F, sep= "") %>% unite("p14", p14:p16, remove=F, sep= "") %>% unite("p15", p15:p17, remove=F, sep= "") %>% unite("p16", p16:p18, remove=F, sep= "") %>% unite("p17", p17:p19, remove=F, sep= "") %>% unite("p18", p18:p20, remove=F, sep= "")

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.trimer) <- seq.trimer[,1]
seq.df <- seq.trimer[,1:19]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "putida.quantum.trimer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.trimer.melt.txt", quote=F, row.names=F, sep="\t")


# Tetramer
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/")
tensor <- read.delim("HL.Bond.Tetramer.txt", header=T, sep="\t", stringsAsFactors = F)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
seq <- read.delim("putida.sequence.txt", header=T, sep=" ", stringsAsFactors = F)
seq.tetramer <- seq %>% unite("p1", p1:p4, remove=F, sep= "") %>% unite("p2", p2:p5, remove=F, sep= "") %>% unite("p3", p3:p6, remove=F, sep= "") %>% unite("p4", p4:p7, remove=F, sep= "") %>% unite("p5", p5:p8, remove=F, sep= "") %>% unite("p6", p6:p9, remove=F, sep= "") %>% unite("p7", p7:p10, remove=F, sep= "") %>% unite("p8", p8:p11, remove=F, sep= "") %>% unite("p9", p9:p12, remove=F, sep= "") %>% unite("p10", p10:p13, remove=F, sep= "") %>% unite("p11", p11:p14, remove=F, sep= "") %>% unite("p12", p12:p15, remove=F, sep= "") %>% unite("p13", p13:p16, remove=F, sep= "") %>% unite("p14", p14:p17, remove=F, sep= "") %>% unite("p15", p15:p18, remove=F, sep= "") %>% unite("p16", p16:p19, remove=F, sep= "") %>% unite("p17", p17:p20, remove=F, sep= "") 

tensor.features <- tensor[,1]
rownames(tensor) <- tensor[,1]
tensor.df <- tensor[,2:ncol(tensor)]
tensor.t <- as.data.frame(t(tensor.df))
tensor.t$base <- names(tensor[,2:ncol(tensor)])

rownames(seq.tetramer) <- seq.tetramer[,1]
seq.df <- seq.tetramer[,1:18]
seq.melt <- melt(seq.df, id="sgRNAID")
colnames(seq.melt) <- c("sgRNAID", "position", "base")

seq.tensor <- left_join(seq.melt, tensor.t, by="base")
seq.tensor.melt <- melt(seq.tensor, id=c("sgRNAID", "position", "base"))
seq.tensor.dcast <- dcast(seq.tensor.melt, sgRNAID ~ position + variable, value.var="value")

write.table(seq.tensor.dcast, "putida.quantum.tetramer.txt", quote=F, row.names=F, sep="\t")
write.table(seq.tensor.melt, "putida.quantum.tetramer.melt.txt", quote=F, row.names=F, sep="\t")
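The chained `unite()` calls build overlapping windows: 19 dimers, 18 trimers and 17 tetramers for a 20-nt spacer, which is why the subsets keep columns 1:20, 1:19 and 1:18 (sgRNAID plus the windows). The same windows in Python (`overlapping_kmers` is an illustrative name):

```python
from typing import List


def overlapping_kmers(seq: str, k: int) -> List[str]:
    """Return the len(seq) - k + 1 overlapping windows of length k, in order."""
    if k < 1 or k > len(seq):
        raise ValueError("k must be between 1 and len(seq)")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


spacer = "GACGCATAAAGATGAGACGC"  # toy 20-nt spacer, not from the dataset
for k, name in [(2, "dimer"), (3, "trimer"), (4, "tetramer")]:
    windows = overlapping_kmers(spacer, k)
    print(name, len(windows), windows[:3])
```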



setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
monomer <- read.delim("putida.quantum.monomer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
basepair <- read.delim("putida.quantum.basepair.melt.txt", header=T, sep="\t", stringsAsFactors = F)
dimer <- read.delim("putida.quantum.dimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
trimer <- read.delim("putida.quantum.trimer.melt.txt", header=T, sep="\t", stringsAsFactors = F)
tetramer <- read.delim("putida.quantum.tetramer.melt.txt", header=T, sep="\t", stringsAsFactors = F)

monomer.basepair <- rbind(monomer, basepair)
monomer.basepair.dimer <- rbind(monomer.basepair, dimer)
monomer.basepair.dimer.trimer <- rbind(monomer.basepair.dimer, trimer)
monomer.basepair.dimer.trimer.tetramer <- rbind(monomer.basepair.dimer.trimer, tetramer)
write.table(monomer.basepair.dimer.trimer.tetramer, "putida.15mar22.quantum.melt.txt", quote=F, row.names=F, sep="\t")
# salloc -A SYB105 -N 2 -t 4:00:00 -p gpu

source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)
library(reshape2)
library(tidyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.raw.matrix.txt", header=T, sep="\t", stringsAsFactors = F)

# quantum chemical tensors
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
tensor <- read.delim("putida.15mar22.quantum.melt.txt", header=T, sep="\t")
tensor[is.na(tensor)] <- 0

tensor$scale <- "raw"
tensor.id <- tensor %>% unite(feature.scale, c(position, variable, scale), sep = "")
tensor.id$value <- as.numeric(tensor.id$value)
tensor.id[is.na(tensor.id)] <- 0

df.score <- unique(df[,c(1,2)])
tensor.score <- inner_join(tensor.id, df.score, by="sgRNAID")
tensor.score.order <- tensor.score[,c(5,2,1,4)]
colnames(tensor.score.order) <- c("cut.score", "feature.scale", "sgRNAID", "value")

df.dcast <- tensor.score.order %>% dcast(sgRNAID + cut.score ~ feature.scale, value.var = "value", fun.aggregate=mean, na.rm=TRUE)
df.dcast.na <- na.omit(df.dcast)
nrow(df.dcast.na)
# 27345

df.location <- inner_join(df, df.dcast.na, by=c("sgRNAID"))
write.table(df.location, "putida.finalquantum.txt", quote=F, row.names=F, sep="\t")
library(dplyr)
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida")
df <- read.delim("putida.finalquantum.txt", header=T, sep="\t", stringsAsFactors = F)
names(df)[names(df) == 'cut.score.x'] <- 'cut.score'
df.cut <- df %>% select(-grep("cut.score.y.y", names(df)), -grep("cut.score.y", names(df)), -grep("cut.score.x.x", names(df))) 
df.num <- mutate_all(df.cut[,2:ncol(df.cut)], function(x) as.numeric(as.character(x)))
df.all <- cbind(df.cut[,1], df.num)
colnames(df.all)[1] <- "sgRNAID"

write.table(df.all, "putida.finalquantum.txt", quote=F, row.names=F, sep="\t")

write.table(df.all[,c(1,3:ncol(df.all))], "putida.finalquantum.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,c(1,3:ncol(df.all))], "putida.finalquantum.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,3:ncol(df.all)], "putida.finalquantum.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "putida.finalquantum.score.txt", quote=F, row.names=F, sep="\t")
write.table(df.all[,1:2], "putida.finalquantum.score_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(data.frame("cut.score" = df.all[,2]), "putida.finalquantum.score_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName putida.finalquantum --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.finalquantum.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/Submits/submit_full_putida.finalquantum_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/Submits/submit_train_putida.finalquantum_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/Submits/submit_test_putida.finalquantum_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/YNames.txt
# YNames.txt contents (one response name per line): cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/YNames.txt putida.finalquantum
# 0.2497100288038237

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/putida.finalquantum_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 22369.2
# sgRNA.tempsgRNA.raw: 4369.7
# sgRNA.gcsgRNA.raw: 3938.45
# V4087sgRNA.raw: 2280.42
# GGsgRNA.raw: 1981.13
# p11tetramer.Hbond.energyraw: 1816.39
# V4343sgRNA.raw: 1530.88
# p13tetramer.Hbond.energyraw: 1508.66
# p17tetramer.Hlgap.eVEraw: 1142.82
# p7tetramer.Hbond.energyraw: 1131.84
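`sort -k2rg FILE | head` orders lines by the second whitespace-separated field read as a general (floating-point) number, largest first, and keeps the first ten. A Python equivalent, assuming each line holds a feature name and an importance value; an optional trailing colon on the name is tolerated because the output comments above show one. `top_features` is a hypothetical helper:

```python
from typing import Iterable, List, Tuple


def top_features(lines: Iterable[str], n: int = 10) -> List[Tuple[str, float]]:
    """Rough analogue of `sort -k2rg FILE | head -n N` for 'feature value' lines."""
    rows: List[Tuple[str, float]] = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name = fields[0].rstrip(":")  # tolerate 'feature: value' style
        rows.append((name, float(fields[1])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


example = ["sgRNA.tempsgRNA.raw: 4369.7", "sgRNA.structuresgRNA.raw: 22369.2", "GGsgRNA.raw: 1981.13"]
print(top_features(example, n=2))
```

Tied values may come out in a different order than with GNU sort, which breaks ties by comparing whole lines.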


# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("putida.finalquantum_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5215119
RIT
# salloc -A SYB105 -p gpu -N 1 -t 4:00:00

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0
#SBATCH -p gpu

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score putida.finalquantum

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score/RIT.run

# sgRNA.structuresgRNA.raw  cut.score   0.23350755818800148 -0.003092209374979174   62056.212   -0.7346448352143302
# sgRNA.gcsgRNA.raw cut.score   0.04289250555917792 0.0036930635871796277   28767.83    -1.2115198992908018
# sgRNA.tempsgRNA.raw   cut.score   0.03905222514980143 0.0038157350050903737   26623.295   -0.9897214396709659
# V4087sgRNA.raw    cut.score   0.025792547076915376    0.010823889309003597    23344.273   -1.06989025798672
# p11tetramer.Hbond.energyraw   cut.score   0.019625679832632137    0.001709177788478303    8774.493    -0.9065756264264376
# GGsgRNA.raw   cut.score   0.01851639846123979 0.002251535744183914    14006.253   -0.821462737942859
# V4343sgRNA.raw    cut.score   0.015418165378914536    0.007039166873850421    14269.541   -0.9449242024857153
# p7tetramer.Hbond.energyraw    cut.score   0.013642788627073781    -0.0007493166151327098  7707.85 -1.1241960466240646
# p13tetramer.Hbond.energyraw   cut.score   0.013434957932850747    0.002999438556682073    6821.335    -1.2045936711515992
# p17tetramer.Hlgap.eVEraw  cut.score   0.01259884826856768 -3.894005154055647e-05  6126.836    -1.0959472886330697
Figures
library(dplyr)  # needed for %>% and mutate() below
library(ggplot2)
library(reshape2)
library(RColorBrewer)

# Figure 3A
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum/cut.score")
imp <- read.delim("putida.finalquantum.importance4.effect_sorted", header=F, sep="\t", stringsAsFactors = F)
colnames(imp) <- c("Feature", "YVec", "NormEdge", "Effect", "Samples", "Linearity")
imp$Normalized.Importance <- as.numeric(imp$NormEdge)
imp$Feature.Effect <- as.numeric(imp$Effect)
imp$SampleCount <- as.numeric(imp$Samples)
imp.dir <- imp %>% mutate(Effect.Direction = ifelse(Feature.Effect < 0, "neg", ifelse(Feature.Effect > 0, "pos", "zero")))
imp.dir.top20 <- imp.dir[1:20,]

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/e.coli/iRF.run/iRF.R2_foldResults_pdf")
pdf("putida.Imp.Dir.Top20.21March.pdf")
ggplot(imp.dir.top20) + geom_bar(aes(x=reorder(Feature, -Normalized.Importance), y=Normalized.Importance, fill=Effect.Direction), stat="identity") + theme_classic() + xlab("putida Top Features") + ylab("Normalized Importance") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_fill_brewer(palette="Set1")
dev.off()

pdf("putida.Imp.Dir.Top20.Effect.21March.pdf")
imp.dir.top20$Sample.Prop <- imp.dir.top20$SampleCount/32374
ggplot(imp.dir.top20, aes(x=reorder(Feature, -Normalized.Importance))) + geom_point(aes(y=Sample.Prop, color=Effect.Direction, size=Feature.Effect)) + xlab("putida") + ylab("Avg Proportion of Samples that Features Influence") + theme_minimal() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) + scale_color_brewer(palette="Set2")
dev.off()

remove highly correlated

# R
source /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/scripts/loadcondaandes.sh
conda activate /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/Libraries/andes/envs/test
R

library(dplyr)

setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/")
df <- read.delim("putida.finalquantum.txt", header=T, sep="\t")
df.rm <- df %>% select(-grep("basepair.Hlgap.eVEraw", names(df)), -grep("dimer.Hbond.energyraw", names(df)), -grep("trimer.Hbond.energyraw", names(df)), -grep("tetramer.Hbond.energyraw", names(df))) 
# 6160

write.table(df.rm, "putida.finalquantum.noncorrelated.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,c(1,3:ncol(df.rm))], "putida.finalquantum.noncorrelated.features.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,c(1,3:ncol(df.rm))], "putida.finalquantum.noncorrelated.features_overlap.txt", quote=F, row.names=F, sep="\t")
write.table(df.rm[,3:ncol(df.rm)], "putida.finalquantum.noncorrelated.features_overlap_noSampleIDs.txt", quote=F, row.names=F, sep="\t")
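The step above drops whole feature families by name (basepair HL gap, and dimer, trimer and tetramer H-bond energy), presumably based on a correlation check done elsewhere. A generic alternative, not the method used here, is a greedy filter that keeps a column only if its absolute Pearson correlation with every already-kept column is below a threshold; the 0.9 default and the function names are arbitrary placeholders:

```python
import math
from typing import Dict, List, Sequence


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson r; a constant column is treated as uncorrelated (an assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def drop_correlated(columns: Dict[str, Sequence[float]], threshold: float = 0.9) -> List[str]:
    """Greedy filter: keep a column only if |r| < threshold against every column kept so far."""
    kept: List[str] = []
    for name, values in columns.items():
        if all(abs(_pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


toy = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}  # b is a scaled copy of a
print(drop_correlated(toy))
```

This is O(p²n) in pure Python and depends on column order; for a matrix with thousands of columns a vectorised correlation matrix would be needed.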


# run python scripts on Andes
# run job submissions on Summit

# Builder script: /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py
# [python iRF_LOOP_SetUp_CrossLayer.py --DataFile --YFile --System Summit --NodesPer 1 --TotalNodes 10 --RunTime 2 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName iRF.XX --bypass --Prediction]

# Andes
module load python/3.7-anaconda3

mkdir /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_LOOP_SetUp_CrossLayer.py --System Summit --NodesPer 1 --TotalNodes 50 --RunTime 90 --Account SYB105 --NumTrees 1000 --NumIterations 5 --RunName putida.finalquantum.noncorrelated --bypass --targetNodeSize 50 --Prediction /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.finalquantum.noncorrelated.features.txt /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/putida.finalquantum.score.txt

# Summit
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated
module load python/3.7.0-anaconda3-5.3.0

# full
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated/Submits/submit_full_putida.finalquantum.noncorrelated_0.sh
# train 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated/Submits/submit_train_putida.finalquantum.noncorrelated_0.sh
# once the train submissions are done run the test submissions
# test 
bsub /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated/Submits/submit_test_putida.finalquantum.noncorrelated_0.sh

# Andes
module load python/3.7-anaconda3

vim /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/YNames.txt
# YNames.txt contents (one response name per line): cut.score
cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated
python /gpfs/alpine/syb105/proj-shared/Projects/iRF/iRF_postProcessing.py --Iterations 5 --Prediction --PredAccuracy MAE,MAEA,MSE,R2 --varTot 95 /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/YNames.txt putida.finalquantum.noncorrelated
# 0.24496076349534782

sort -k2rg cut.score/foldRuns/fold9/Runs/Set4/putida.finalquantum.noncorrelated_cut.score.importance4 | head
# sgRNA.structuresgRNA.raw: 22226
# sgRNA.tempsgRNA.raw: 4397.14
# sgRNA.gcsgRNA.raw: 4086.12
# V4087sgRNA.raw: 2340.2
# GGsgRNA.raw: 2080.96
# V4343sgRNA.raw: 1587.08
# p14tetramer.Hbond.stackingraw: 1510.35
# p13tetramer.Hbond.stackingraw: 1364.15
# p9tetramer.Hbond.stackingraw: 1349.97
# p17tetramer.Hlgap.eVEraw: 1326.07


# pearson correlation
setwd("/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated/cut.score/foldRuns/fold9/Runs/Set4")
pred <- read.delim("putida.finalquantum.noncorrelated_Set4_test.prediction", header=T, sep="\t")
y <- read.delim("set4_Y_test_noSampleIDs.txt", header=T, sep="\t")
cor(y$cut.score, pred$Predictions.)
# 0.5155685


##### RIT:

#!/bin/bash
#SBATCH -A SYB105
#SBATCH -J RIT.run
#SBATCH -N 2
#SBATCH -t 48:00:00
#SBATCH --mem-per-cpu=0

cd /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated/cut.score
/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/runRIT.sh cut.score putida.finalquantum.noncorrelated

# sbatch /gpfs/alpine/syb105/proj-shared/Personal/noshayjm/projects/seed/putida/iRF.run/putida.finalquantum.noncorrelated/cut.score/RIT.run

# sgRNA.structuresgRNA.raw  cut.score   0.23132463786518545 -0.011995223541752633   62023.052   -0.6854654563328648
# sgRNA.gcsgRNA.raw cut.score   0.04192840688619521 0.006112590311440782    28058.078   -0.8019374059558255
# sgRNA.tempsgRNA.raw   cut.score   0.0413068447610841  0.007020801584949681    28738.998   -0.8431022877243979
# V4087sgRNA.raw    cut.score   0.02560023664147605 0.01811197791757538 23371.355   -0.8178490559922933
# GGsgRNA.raw   cut.score   0.020047269899403895    0.009717129675941157    16346.393   -1.1170746632784563
# V4343sgRNA.raw    cut.score   0.015044749974598627    0.01881497930109692 15797.125   -0.8970033415493099
# p14tetramer.Hbond.stackingraw cut.score   0.015038379062361742    0.0032346566223878325   9702.037    -1.1723079521358957
# p12tetramer.Hbond.stackingraw cut.score   0.014996569950807183    0.0014439501079288695   7482.037    -1.026837369152886
# p7tetramer.Hbond.stackingraw  cut.score   0.014917729911875727    -0.0012121146589381287  8812.361    -1.2274738338024733
# p17tetramer.Hlgap.eVEraw  cut.score   0.014278090323292447    0.0016227017955795773   6678.93 -1.1505361898914646

Bacterial investigation

Putida & E. coli:
  • look into top features in both models
  • distribution of feature values
  • generate multi-species model
  • distribution of cutting efficiency scores

** Work in Jupyter notebook with r4environment connection [/gpfs/alpine/syb105/proj-shared/Personal/noshayjm/bacterial.sgRNA.iRF.ipynb]

Potential References:

- https://academic.oup.com/nar/article/46/14/7052/5047272#120184448
- https://www.biorxiv.org/content/10.1101/2021.09.14.460134v1.full.pdf
- https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0227994
- https://www.science.org/lookup/doi/10.1126/science.aad5227
- https://github.com/bm2-lab/iGWOS
- https://github.com/maximilianh/crisporWebsite
- http://www.ams.sunysb.edu/~pfkuan/predictSGRNA/demopredictSGRNA_1.0.1.pdf
- https://www.chemistryworld.com/news/machine-learning-accurately-predicts-rna-structures-using-tiny-dataset/4014347.article
- https://www.biorxiv.org/content/10.1101/605790v1.full
- https://science.sciencemag.org/content/373/6558/964.full
- https://www.nature.com/articles/s41467-021-23576-0
- https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-020-3395-z
- https://www.future-science.com/doi/full/10.2144/btn-2018-0187
- https://onlinelibrary.wiley.com/doi/epdf/10.1002/advs.201902312
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5795621/
- https://www.nature.com/articles/nbt.3026
- https://www.nature.com/articles/s42003-020-01452-9#MOESM4
- https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1697-6
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6921152/
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4338555/
- https://www.embopress.org/doi/full/10.15252/embj.201899466 - paper showing that tandem PAMs affect Cas9 binding to the target
- https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-019-3151-4#Sec2
- https://www.sciencedirect.com/science/article/pii/S2001037021000738#s0010 - CNN model with only sgRNA sequence input
- https://www.sciencedirect.com/science/article/pii/S2001037019303551 - summary of current methods
- https://pubmed.ncbi.nlm.nih.gov/30988204/ - integrating the energetics of R-loop formation under Cas9 binding, the effect of the protospacer adjacent motif sequence, and the folding stability of the whole single guide RNA, we devised a unified, physical model that can apply to any cleavage-activity dataset.
- https://www.biorxiv.org/content/10.1101/269910v1.full.pdf - “inefficient RNAs have a significantly higher average melting temperature than efficient ones”. The melting temperature (Tm) is the temperature at which 50% of double-stranded DNA has dissociated into single strands; the higher the guanine-cytosine (GC) content, the higher the Tm. Wallace rule (short oligonucleotides only): Tm (°C) = 2 × (A + T) + 4 × (G + C), where A, T, G and C are nucleotide counts.

- https://www.cambridge.org/core/journals/quarterly-reviews-of-biophysics/article/key-role-of-the-rec-lobe-during-crisprcas9-activation-by-sensing-regulating-and-locking-the-catalytic-hnh-domain/DD8DCCAC11DC69C73C9B2AEB15E4B656

- https://pubs.acs.org/doi/10.1021/jacs.7b13047
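The Tm formula noted with the references above (the Wallace rule, Tm = 2 °C × (A + T) + 4 °C × (G + C)) can be computed per spacer with a small shell helper. This is a sketch, not the pipeline's own Tm code: the function name `wallace_tm` is hypothetical, and it assumes a plain nucleotide string (case-insensitive, U read as T, N and other symbols ignored). The rule is only a rough approximation for short oligos; nearest-neighbor models are more accurate.

```shell
#!/usr/bin/env bash
# Sketch: Wallace-rule melting temperature, Tm = 2*(A+T) + 4*(G+C) in degrees C.
# Assumptions (hypothetical helper): case-insensitive input, U counted as T,
# N and any other symbols ignored.
wallace_tm() {
  printf '%s\n' "$1" | awk '{
    s = toupper($0)
    gsub(/U/, "T", s)
    at = gsub(/[AT]/, "", s)   # gsub returns the number of substitutions made
    gc = gsub(/[GC]/, "", s)
    print 2 * at + 4 * gc
  }'
}

# Usage: 10 A/T + 10 G/C -> 2*10 + 4*10
wallace_tm GGGGCCCCAAAATTTTACGT   # prints 60
```

Because the count is done with `gsub` on a copy of the sequence, each base is counted exactly once regardless of its position.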

Mechanism Literature Notes

https://pubs.acs.org/doi/10.1021/jacs.7b13047
- “Collectively, the current understanding of RNA-guided DNA targeting and cleavage by Cas9 involves (1) sgRNA binding to elicit an active Cas9 conformation, (9, 10) (2) PAM recognition, (8) (3) local DNA duplex unwinding and RNA strand invasion, (4) complete directional unwinding of the DNA to form the RNA-DNA heteroduplex, (11) and (5) coupled conformational changes within Cas9 necessary for subsequent DNA cleavage. (3, 12)”
- “Our results directly show that HNH cleaves faster than RuvC (Figure 7A,B) and therefore expand this hypothesis to suggest that conformational activation of the HNH domain is a prerequisite for cleavage of tDNA and ntDNA, thereby controlling when (i.e., timing) DNA cleavage occurs. However, it is the slow rate (0.37 s⁻¹, k5b, Figure 2A) for RuvC isomerization which limits the overall rate of double-stranded DNA cleavage from the pre-formed ternary complex.”
- “This result corroborates the findings from the multiple-turnover kinetic assays (Figure 8) indicating that DNA product release is the slowest mechanistic step (k7, Figure 2A) and also demonstrates that Cas9 can remain tightly bound to even large DNA products for a substantial amount of time following double-stranded DNA cleavage.”

https://www.nature.com/articles/nature13579
- “Our structural observations suggest that the interaction between the target DNA strand and the phosphate lock loop might stabilize target DNA immediately upstream of the PAM in an unwound conformation, thereby linking PAM recognition with local strand separation.”
- “Together, these structures reveal that even in the absence of compensatory base pairing to the guide RNA, target DNA binding by Cas9–RNA results in local strand separation immediately upstream of the PAM. Importantly, the interaction of the +1 phosphate with the phosphate lock loop is maintained in both structures, supporting the hypothesis that the loop contributes to stabilizing the target DNA strand in the unwound state.”

https://www.annualreviews.org/doi/10.1146/annurev-biophys-062215-010822
- “Mismatches in this seed region severely impair or completely abrogate target DNA binding and cleavage, whereas close homology in the seed region often leads to off-target binding events even with many mismatches elsewhere (78).”
- “The most prominent conformational change takes place in the REC lobe, in particular Hel-III, which moves ∼65 Å toward the HNH domain upon sgRNA binding. In contrast, Cas9 exhibits much smaller conformational changes upon binding to target DNA and PAM sequence (Figure 5), which indicates that the majority of the extensive structural rearrangements occur prior to target DNA binding, reinforcing the notion that guide RNA loading is a key regulator of Cas9 enzyme function”
- “Once Cas9 has found a target site with the appropriate PAM, it triggers local DNA melting at the PAM-adjacent nucleation site, followed by RNA strand invasion to form an RNA–DNA hybrid and a displaced DNA strand (termed R-loop) from PAM-proximal to PAM-distal ends (94, 96). Perfect complementarity between the seed region of sgRNA and target DNA is necessary for Cas9-mediated DNA targeting and cleavage, whereas imperfect base pairing at the nonseed region is much more tolerated for target binding specificity”
- “In the PAM duplex–bound structure (Figure 5d), a sharp kink turn is observed in the target strand immediately upstream of the PAM, with the connecting phosphodiester group (referred to as +1 phosphate) stabilized by a phosphate lock loop (K1107–S1109) located in the PAM-interacting CTD domain (3). Such a kink-turn configuration is necessary for driving the target strand DNA to transition from pairing with the nontarget strand to pairing with the guide RNA”
- “As observed in the PAM duplex–bound structure (3), the unwound target DNA strand kinks at the +1 phosphodiester linkage and then pairs with the spacer region to form a pseudo-A-form RNA–DNA hybrid. In contrast to the target strand, which runs the length of the central channel formed between the two Cas9 lobes, the displaced nontarget DNA strand threads into a tight side tunnel located within the NUC lobe”
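Because several of these notes hinge on seed-region complementarity, a small shell sketch for counting guide/target mismatches in the PAM-proximal seed may be handy when screening blast hits. The helper name `seed_mismatches` and the default seed length of 10 nt are assumptions (reported seed lengths are on the order of 8 to 10 bp); both sequences are taken 5′→3′ in the non-target-strand orientation, in the DNA alphabet, with the PAM immediately 3′ of the last base.

```shell
#!/usr/bin/env bash
# Sketch: count mismatches between a guide and a protospacer in the
# PAM-proximal seed (the last SEED_LEN bases; PAM assumed immediately 3').
# Assumptions (hypothetical helper): both sequences 5'->3' in the
# non-target-strand orientation, same length, DNA alphabet;
# SEED_LEN defaults to 10.
seed_mismatches() {
  local guide=${1^^} target=${2^^} seed_len=${3:-10}
  local len=${#guide} n=0 i
  if (( len != ${#target} || seed_len > len )); then
    echo "seed_mismatches: length error" >&2
    return 1
  fi
  for (( i = len - seed_len; i < len; i++ )); do
    if [[ ${guide:i:1} != "${target:i:1}" ]]; then
      n=$(( n + 1 ))
    fi
  done
  echo "$n"
}

# Usage: one mismatch at the PAM-proximal end vs one far from the PAM.
seed_mismatches GACGCATAAAGATGAGACGC GACGCATAAAGATGAGACGA   # prints 1
seed_mismatches GACGCATAAAGATGAGACGC TACGCATAAAGATGAGACGC   # prints 0
```

Passing the full length as the third argument (e.g. `20`) turns the helper into a plain whole-spacer mismatch count.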

https://www.science.org/doi/10.1126/sciadv.abe5496
- “Across all sgRNAs, most RNA:DNA mismatches or bulges had small effects on final fraction bound (Fig. 2C and table S4). Single RNA:DNA mismatches had particularly modest impact, generally only visible in first seven positions of the seed. Curiously, the presence of multiple distal mismatches slightly increased the final fraction bound for many sgRNAs (Fig. 2A). Recent single-molecule studies suggest that distal mismatches decrease the fraction of RNP:target complexes in an unwound state even while stably bound (10, 28), which could correspond to differences in complex stability or adherence to nitrocellulose. We also observed that the sensitivity of a target to perturbation (as ordered in Fig. 2A) inversely correlated with the number of internal PAMs contained within the target sequence (Spearman R = −0.31, P = 0.01).”
- “Previous work has shown that cleavage is much more sensitive to imperfect matches than is binding (23) due to a conformational change required for target DNA cleavage (7, 31, 32). Our data are consistent with these findings. Across all sgRNAs, more than 85% of targets with 17 bp of complementarity exhibited detectable cleavage (Fig. 3F). Additional mismatches substantially decreased the fraction of targets cleaved: 38% of targets with 16 bp of complementarity exhibited cleavage below the threshold of detection, as did 62% of targets with 15 bp of complementarity”
- “The fit parameters indicate that the presence of a G at the nearest 3′ position (NGGG-extended PAM) slows association, in this case by 27% (table S9). However, as suggested by an analysis of CRISPRi/a data (22), an extended PAM consisting of a 3′ CC (NGGCC) slowed the association rate even more. When combined with an additional 3′ C (NGGCCC), the model predicted over a twofold drop in association rate, more than double the reduction predicted for an NGGG-extended PAM.”
- “Across context variants of all guides, association rates were typically the slowest to targets containing a G at the nearest 3′ base, consistent with an NGGH-extended PAM motif for achieving the most rapid association”
- “Our maximal productive binding measurement instead appears to align with the conventional understanding of Cas9 targets, which have an 8- to 10-bp seed region that is sensitive to disruption, an 8- to 11-bp PAM-distal region that is largely resilient, and an intermediate zone sensitive to large perturbations”
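Two of the target-context effects quoted above, internal PAMs within the target and the base immediately 3′ of the NGG, could be turned into simple per-guide features. The following is a sketch under stated assumptions: the helper name `pam_context` is hypothetical, it scans only the forward (non-target) strand, counts an internal PAM as any NGG lying wholly inside the protospacer, and only distinguishes NGGG from NGGH (it does not model the NGGCC/NGGCCC contexts from the same paper).

```shell
#!/usr/bin/env bash
# Sketch: simple PAM-context features for one target (hypothetical helper).
# Input : protospacer (default 20 nt) + PAM (3 nt) + at least 1 flanking base,
#         5'->3' on the non-target strand. Only this strand is scanned.
# Output: tab-separated PAM, internal NGG count, extended-PAM class.
pam_context() {
  local seq=${1^^} plen=${2:-20}
  local proto=${seq:0:plen} pam=${seq:plen:3} next=${seq:plen+3:1}
  local internal=0 i ext

  # Internal PAMs: any NGG wholly inside the protospacer, i.e. a GG starting
  # at 0-based position 1 or later (overlapping GGs are each counted).
  for (( i = 1; i + 1 < plen; i++ )); do
    if [[ ${proto:i:2} == GG ]]; then internal=$(( internal + 1 )); fi
  done

  # Extended PAM: G right after the NGG (NGGG) vs A/C/T (NGGH).
  if [[ ${pam:1:2} != GG ]]; then
    ext=nonNGG
  else
    case $next in
      G) ext=NGGG ;;
      A|C|T) ext=NGGH ;;
      *) ext=unknown ;;
    esac
  fi
  printf '%s\t%s\t%s\n' "$pam" "$internal" "$ext"
}

pam_context GACGCATAAAGATGAGACGCTGGG   # TGG, 0 internal, NGGG
pam_context AGGAAATGGC 6               # TGG, 1 internal, NGGH (6-nt toy protospacer)
```

The second argument only exists to make short toy examples easy; real spacers would use the default of 20.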

https://www.frontiersin.org/articles/10.3389/fmolb.2021.653262/full

https://www.sciencedirect.com/science/article/pii/S0092867414001561?via%3Dihub#fig2
- “Here, we report the crystal structure of Streptococcus pyogenes Cas9 in complex with sgRNA and its target DNA at 2.5 Å resolution. The structure revealed a bilobed architecture composed of target recognition and nuclease lobes, accommodating the sgRNA:DNA heteroduplex in a positively charged groove at their interface. Whereas the recognition lobe is essential for binding sgRNA and DNA, the nuclease lobe contains the HNH and RuvC nuclease domains, which are properly positioned for cleavage of the complementary and noncomplementary strands of the target DNA, respectively. The nuclease lobe also contains a carboxyl-terminal domain responsible for the interaction with the protospacer adjacent motif (PAM).”
- “These observations suggested that the 3′-NCC-5′ sequence complementary to the 5′-NGG-3′ PAM is not recognized by Cas9 and are consistent with previous biochemical data showing that Cas9-catalyzed DNA cleavage requires the 5′-NGG-3′ PAM on the noncomplementary strand, but not the 3′-NCC-5′ sequence on the complementary strand (Jinek et al., 2012).”
- “The backbone phosphate groups of the guide region (nucleotides 2, 4–6, and 13–20) interact with the REC1 domain (Arg165, Gly166, Arg403, Asn407, Lys510, Tyr515, and Arg661) and the bridge helix (Arg63, Arg66, Arg70, Arg71, Arg74, and Arg78)”
- “The sgRNA guide region is recognized by Cas9 in a sequence-independent manner, except for the U16-Arg447 and G18-Arg71 interactions (Figures 5 and 6A). This base-specific G18-Arg71 interaction may partly explain the observed preference of Cas9 for sgRNAs with guanines in the four PAM-proximal guide regions (Wang et al., 2014).”

–> Description of DWTs removed from manuscript draft (keeping for potential future use):

Discrete Wavelet Transformations: A discrete wavelet transformation (21 scales) was performed on several features calculated for every 20bp sliding window of the genome. These features were GATC motif density, gene density, GC content, PAM site density, IPD ratio, MFE (ViennaRNA), and melting temperature (Tm), obtained through a combination of counts, calculations, and motif searches. Sliding windows were generated from the genome assembly using the bedtools makewindows command with -w 20 and -s 1 (a 20-bp window sliding by 1 bp), and a fasta file of the window sequences was used to calculate the feature values for each window. Each feature was assessed individually by building a vector of its values across all windows. Each vector was then transformed with a Haar maximal overlap discrete wavelet transform (MODWT) using the wavMODWT function of the R package wmtsa, and the resulting Haar wavelet coefficient at each scale was extracted for every 20bp sliding window of the genome. These transformed values were compiled across all features for each sgRNA, resulting in an additional 184 features.
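For illustration, the Haar MODWT step can be written as the MODWT pyramid algorithm in awk. This is not wmtsa's implementation: it assumes periodic (circular) boundary handling and the Percival and Walden filter convention (wavelet filter (1/2, −1/2), scaling filter (1/2, 1/2)), so signs or boundary values may differ from wavMODWT output. The function name `haar_modwt` is hypothetical; input is one feature value per window, one per line.

```shell
#!/usr/bin/env bash
# Sketch: Haar MODWT via the pyramid algorithm (Percival & Walden convention).
# Assumptions: periodic boundary; wavelet filter (1/2, -1/2), scaling filter
# (1/2, 1/2). wmtsa::wavMODWT may differ in sign or boundary handling.
# Input : one numeric value per line (e.g. one feature value per window).
# Output: one tab-separated row per position t: W_1 ... W_J, then V_J.
haar_modwt() {
  local levels=${1:-1}
  awk -v J="$levels" '
    { v[NR - 1] = $1 }                        # V_0 = X, 0-based index
    END {
      N = NR
      for (j = 1; j <= J; j++) {
        shift = 2 ^ (j - 1)                   # level-j filter spacing
        for (t = 0; t < N; t++) {
          s = ((t - shift) % N + N) % N       # (t - 2^(j-1)) mod N
          w[j, t] = 0.5 * v[t] - 0.5 * v[s]   # wavelet (detail) coefficient
          nv[t]   = 0.5 * v[t] + 0.5 * v[s]   # scaling (smooth) coefficient
        }
        for (t = 0; t < N; t++) v[t] = nv[t]  # V_j feeds level j+1
      }
      for (t = 0; t < N; t++) {
        row = ""
        for (j = 1; j <= J; j++) row = row w[j, t] "\t"
        print row v[t]
      }
    }'
}

printf '1\n2\n3\n4\n' | haar_modwt 2
```

For the toy input 1, 2, 3, 4 this prints the rows `-1.5 0 2.5`, `0.5 -1 2.5`, `0.5 0 2.5`, `0.5 1 2.5` (tab-separated). The squared outputs sum to 30, the same as 1² + 2² + 3² + 4², which reflects the MODWT's energy preservation and is a quick sanity check. Selecting the row for the window that overlaps a guide gives that guide's per-scale values for one feature.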

- R package: wmtsa
- 21 scales per feature per guide
- 20bp sliding windows
- Features:
  - GC content
  - Temperature of melting
  - RNA structure (ViennaRNA MFE)
  - Gene density
  - Gene expression (RNA-seq GEO: GSM2267479) ??
  - GATC motif density ??
  - IPD ratio (GEO: GSM3264688) ??
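A sketch of the first feature in the list, GC content on 20-bp windows sliding by 1 bp, computed in a single streaming pass with awk (a ring buffer of the last 20 bases) rather than by extracting every window sequence. Assumptions: the FASTA holds a single record (reading stops at a second header), only full-length windows are reported, and any non-G/C base (including N) counts as non-GC. The helper name `gc_windows` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: GC fraction of every W-bp window sliding by 1 bp (default W = 20).
# Assumptions (hypothetical helper): single-record FASTA (stops at a second
# header), full-length windows only, non-G/C bases (incl. N) count as non-GC.
# Output is BED-like: chrom, 0-based start, end, GC fraction.
gc_windows() {
  local fasta=$1 w=${2:-20}
  awk -v W="$w" '
    /^>/ {
      if (chrom != "") exit             # only the first record is used
      chrom = substr($1, 2); next
    }
    {
      line = toupper($0)
      sub(/\r$/, "", line)              # tolerate CRLF line endings
      n = length(line)
      for (k = 1; k <= n; k++) {
        b = substr(line, k, 1)
        isgc = (b == "G" || b == "C")
        slot = pos % W                  # ring buffer of the last W bases
        if (pos >= W) gc -= ring[slot]  # base leaving the window
        ring[slot] = isgc
        gc += isgc
        pos++
        if (pos >= W)
          printf "%s\t%d\t%d\t%.4f\n", chrom, pos - W, pos, gc / W
      }
    }' "$fasta"
}
```

Usage against the E. coli assembly from the blast step would be `gc_windows genome/GCF_000005845.2_ASM584v2_genomic.fna > gc.w20.bed`; column 4 is then the per-window vector that goes into the wavelet transform. The running count keeps the cost linear in genome length instead of window length × genome length.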