Just one way to transform the output of KofamKOALA/GhostKOALA/BlastKOALA
head hadza_kofamkoala_kos.txt
## K00287
## K00547
## K00640
## K00986
## K01153
## K01154
## K01174
## K01185
## K01462
## K02010
The tabular DB of KEGG categories was generated by following this step-by-step guide: http://merenlab.org/2018/01/17/importing-ghostkoala-annotations/
head KO_Orthology_ko00001.txt
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K00844 HK; hexokinase [EC:2.7.1.1]
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K12407 GCK; glucokinase [EC:2.7.1.2]
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K00845 glk; glucokinase [EC:2.7.1.2]
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K01810 GPI, pgi; glucose-6-phosphate isomerase [EC:5.3.1.9]
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K06859 pgi1; glucose-6-phosphate isomerase, archaeal [EC:5.3.1.9]
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K13810 tal-pgi; transaldolase / glucose-6-phosphate isomerase [EC:2.2.1.2 5.3.1.9]
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K15916 pgi-pmi; glucose/mannose-6-phosphate isomerase [EC:5.3.1.9 5.3.1.8]
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K24182 PFK9; 6-phosphofructokinase [EC:2.7.1.11]
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K00850 pfkA, PFK; 6-phosphofructokinase 1 [EC:2.7.1.11]
## 09100 Metabolism 09101 Carbohydrate metabolism 00010 Glycolysis / Gluconeogenesis [PATH:ko00010] K16370 pfkB; 6-phosphofructokinase 2 [EC:2.7.1.11]
#create a db with only the KOs that were found
grep -f hadza_kofamkoala_kos.txt KO_Orthology_ko00001.txt > hadza_kofamkoala_list2.txt
#remove the numeric codes and put a tab after the KO number (only there, so category names keep their spaces)
sed 's/^[0-9]* //' hadza_kofamkoala_list2.txt | sed 's/\t[0-9]* /\t/g' | sed -E 's/(K[0-9]{5}) +/\1\t/' > hadza_kofamkoala_list2_edited.txt
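A small sketch of the filter-and-clean step on toy data, to show what the edited table should look like. Everything here is an assumption for illustration: the `ko_demo_filter` directory, the file names, the three rows, and the layout of the KEGG table (tab-separated levels, two spaces between the KO number and its description). It uses `grep -wF` as a stricter optional variant of `grep -f`, and `awk` in place of `sed` so that it does not depend on GNU `sed` escapes.

```shell
# Toy run of the filter-and-clean step; all file names and rows are made up.
d=ko_demo_filter
mkdir -p "$d"

# Hypothetical KO list, one KO number per line.
printf 'K00844\nK00850\n' > "$d/kos.txt"

# Hypothetical excerpt of the KEGG table (assumed layout: tab-separated levels,
# two spaces between the KO number and its description).
levels=$(printf '09100 Metabolism\t09101 Carbohydrate metabolism\t00010 Glycolysis / Gluconeogenesis [PATH:ko00010]')
printf '%s\t%s\n' \
  "$levels" 'K00844  HK; hexokinase [EC:2.7.1.1]' \
  "$levels" 'K12407  GCK; glucokinase [EC:2.7.1.2]' \
  "$levels" 'K00850  pfkA, PFK; 6-phosphofructokinase 1 [EC:2.7.1.11]' \
  > "$d/ko_table.txt"

# -f: read patterns from a file, -F: fixed strings, -w: whole-word matches only.
grep -wFf "$d/kos.txt" "$d/ko_table.txt" > "$d/filtered.txt"

# Strip the numeric code in front of each level, then turn the first run of
# spaces after the KO number into a single tab.
awk -F'\t' -v OFS='\t' '{
  for (i = 1; i <= 3; i++) sub(/^[0-9]+ /, "", $i)
  sub(/ +/, "\t", $4)
  print
}' "$d/filtered.txt" > "$d/edited.txt"

cat "$d/edited.txt"
```

In the result the KO number sits alone in column 4, which is what the later join on field 4 relies on.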
head hadza_kofamkoala_list.txt
## K00287 2
## K00547 1
## K00640 1
## K00986 3
## K01153 1
## K01154 1
## K01174 1
## K01185 1
## K01462 1
## K02010 1
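#join the KO count table with the db (sort both on the tab-delimited join field; -a 2 also keeps KOs that have a count but no db row)
join -1 4 -2 1 -a 2 -t $'\t' <(sort -t $'\t' -k4,4 hadza_kofamkoala_list2_edited.txt) <(sort -t $'\t' -k1,1 hadza_kofamkoala_list.txt) > hadza_kofamkoala_final.txt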
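A minimal sketch of the join on toy data, mainly to show the column order of the result and what `-a 2` does. The `ko_demo_join` directory, the file names, the counts, and the KO `K99999` are placeholders, not part of the real data. Temporary sorted files are used instead of process substitution so the sketch also runs in a plain POSIX shell.

```shell
# Toy run of the join step; file names, counts, and K99999 are made up.
d=ko_demo_join
mkdir -p "$d"
tab=$(printf '\t')

# Cleaned db rows: top category, category, pathway, KO number, description.
levels=$(printf 'Metabolism\tCarbohydrate metabolism\tGlycolysis / Gluconeogenesis [PATH:ko00010]')
printf '%s\t%s\t%s\n' \
  "$levels" 'K00850' 'pfkA, PFK; 6-phosphofructokinase 1 [EC:2.7.1.11]' \
  "$levels" 'K00844' 'HK; hexokinase [EC:2.7.1.1]' \
  > "$d/db.txt"

# KO counts; K99999 is a placeholder KO with no row in the db.
printf 'K00844\t2\nK99999\t5\nK00850\t1\n' > "$d/counts.txt"

# join needs both inputs sorted on the join field, split on the same delimiter.
sort -t "$tab" -k4,4 "$d/db.txt" > "$d/db.sorted.txt"
sort -t "$tab" -k1,1 "$d/counts.txt" > "$d/counts.sorted.txt"

# Output columns: KO number, remaining db columns, count.
# -a 2 also prints lines of the counts file that found no partner (K99999).
join -t "$tab" -1 4 -2 1 -a 2 "$d/db.sorted.txt" "$d/counts.sorted.txt" > "$d/final.txt"

cat "$d/final.txt"
```

Matched rows have six columns with the count in the last one; the unmatched `K99999` row has only the KO number and its count, so it carries no category into the later steps.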
Level1:
#get level1 (split on tabs only, since category names contain spaces)
awk -F'\t' '{print $3"\t"$6}' hadza_kofamkoala_final.txt | sort -k1 > hadza_kofamkoala_final_level1.txt
#sum the counts of repeated categories
awk -F'\t' '{ seen[$1] += $2 } END { for (i in seen) print i"\t"seen[i] }' hadza_kofamkoala_final_level1.txt | sort > hadza_kofamkoala_final_level1_sum.txt
#convert the counts to relative abundance (%)
awk -F'\t' 'FNR==NR{s+=$2;next;} {printf "%s\t%s\n",$1,100*$2/s}' hadza_kofamkoala_final_level1_sum.txt hadza_kofamkoala_final_level1_sum.txt > hadza_kofamkoala_final_level1_abrel.txt
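The three steps above (extract, sum, convert to percent) can be checked on a toy table. This is only a sketch: the `ko_demo_level1` directory, the file names, the rows, and the counts are made up, and the assumed column layout is KO, top category, category, pathway, description, count.

```shell
# Toy run of the per-category summary; file names and counts are made up.
d=ko_demo_level1
mkdir -p "$d"

# Assumed joined layout: KO, top category, category, pathway, description, count.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
  K00844 Metabolism 'Carbohydrate metabolism' 'Glycolysis / Gluconeogenesis [PATH:ko00010]' 'HK; hexokinase [EC:2.7.1.1]' 2 \
  K00850 Metabolism 'Carbohydrate metabolism' 'Glycolysis / Gluconeogenesis [PATH:ko00010]' 'pfkA, PFK; 6-phosphofructokinase 1 [EC:2.7.1.11]' 1 \
  K00287 Metabolism 'Metabolism of cofactors and vitamins' 'Folate biosynthesis [PATH:ko00790]' 'DHFR; dihydrofolate reductase [EC:1.5.1.3]' 1 \
  > "$d/final.txt"

# Keep the category (column 3) and the count (column 6).
awk -F'\t' -v OFS='\t' '{ print $3, $6 }' "$d/final.txt" > "$d/level1.txt"

# Sum the counts per category.
awk -F'\t' -v OFS='\t' '{ sum[$1] += $2 } END { for (k in sum) print k, sum[k] }' \
  "$d/level1.txt" | sort > "$d/level1_sum.txt"

# Read the same file twice: first pass adds up the grand total,
# second pass prints each category as a percentage of that total.
awk -F'\t' 'FNR == NR { total += $2; next } { printf "%s\t%s\n", $1, 100 * $2 / total }' \
  "$d/level1_sum.txt" "$d/level1_sum.txt" > "$d/level1_abrel.txt"

cat "$d/level1_abrel.txt"
```

With these toy counts the two categories come out as 75 and 25, which sum to 100 as a quick sanity check on the real output.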
Level2:
#get level2 (split on tabs only, since category names contain spaces)
awk -F'\t' '{print $3"\t"$4"\t"$6}' hadza_kofamkoala_final.txt | sort -t $'\t' -k1,2 > hadza_kofamkoala_final_level2.txt
#sum the counts (a tab inside the key keeps the two categories as separate columns)
awk -F'\t' '{ seen[$1"\t"$2] += $3 } END { for (i in seen) print i"\t"seen[i] }' hadza_kofamkoala_final_level2.txt | sort > hadza_kofamkoala_final_level2_sum.txt
#convert to relative abundance (%)
awk -F'\t' 'FNR==NR{s+=$3;next;} {printf "%s\t%s\t%s\n",$1,$2,100*$3/s}' hadza_kofamkoala_final_level2_sum.txt hadza_kofamkoala_final_level2_sum.txt > hadza_kofamkoala_final_level2_abrel.txt
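As an alternative sketch, the two-column summary can be done in a single `awk` pass that sums and converts at once. This is not the method used above, just one possible shortcut. The `ko_demo_level2` directory, the file names, the rows, and `K99999` are placeholders. One design choice differs on purpose: rows without categories (those kept by `join -a 2`) are skipped, so they do not inflate the total.

```shell
# Toy single-pass summary on two category columns; all data here is made up.
d=ko_demo_level2
mkdir -p "$d"

# Assumed joined layout: KO, top category, category, pathway, description, count.
carb=$(printf 'Metabolism\tCarbohydrate metabolism')
printf '%s\t%s\t%s\t%s\t%s\n' \
  K00844 "$carb" 'Glycolysis / Gluconeogenesis [PATH:ko00010]' 'HK; hexokinase [EC:2.7.1.1]' 2 \
  K00850 "$carb" 'Glycolysis / Gluconeogenesis [PATH:ko00010]' 'pfkA, PFK; 6-phosphofructokinase 1 [EC:2.7.1.11]' 1 \
  K00850 "$carb" 'Pentose phosphate pathway [PATH:ko00030]' 'pfkA, PFK; 6-phosphofructokinase 1 [EC:2.7.1.11]' 1 \
  > "$d/final.txt"

# A row as kept by join -a 2: placeholder KO with a count but no categories.
printf 'K99999\t5\n' >> "$d/final.txt"

# Output columns: category, pathway, summed count, percent of the total.
awk -F'\t' -v OFS='\t' '
  NF < 6 { next }                          # skip rows without categories
  { sum[$3 OFS $4] += $6; total += $6 }    # tab in the key keeps two columns
  END { for (k in sum) print k, sum[k], 100 * sum[k] / total }
' "$d/final.txt" | sort > "$d/level2_abrel.txt"

cat "$d/level2_abrel.txt"
```

Building the array key as `$3 OFS $4` is what keeps the two category names apart in the output; concatenating them directly would fuse them into one column.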