Just one way to turn the output of kofamkoala/ghostkoala/blastkoala into relative abundances of KEGG categories

head hadza_kofamkoala_kos.txt
## K00287
## K00547
## K00640
## K00986
## K01153
## K01154
## K01174
## K01185
## K01462
## K02010

The tabular DB of KEGG categories was generated by following this step-by-step guide: http://merenlab.org/2018/01/17/importing-ghostkoala-annotations/
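
For reference, the flattening step can be sketched as follows. This is not the tutorial's script: it is a minimal sketch that assumes the KEGG htext layout, in which each line starts with A, B, C, or D to mark its depth. The file name toy.keg and its contents are made up for illustration.

```shell
# Hypothetical miniature of a KEGG htext file (ko00001.keg-style layout assumed).
tmp=$(mktemp -d)
cat > "$tmp/toy.keg" <<'EOF'
+D KO
!
A09100 Metabolism
B  09101 Carbohydrate metabolism
C    00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D      K00844  HK; hexokinase [EC:2.7.1.1]
D      K12407  GCK; glucokinase [EC:2.7.1.2]
!
EOF

# Remember the latest A/B/C heading and print one tab-separated row per D line.
flat=$(awk '
/^[ABCD]/ {
    level = substr($0, 1, 1)
    text = substr($0, 2)
    sub(/^[ \t]+/, "", text)          # drop the indentation after the level letter
    if (text == "") next              # skip lines that carry only the level letter
    if (level == "A") a = text
    else if (level == "B") b = text
    else if (level == "C") c = text
    else print a "\t" b "\t" c "\t" text
}' "$tmp/toy.keg")
printf '%s\n' "$flat"
```

Each D line becomes one row with four tab-separated columns, which is the layout shown by the head of KO_Orthology_ko00001.txt.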

head KO_Orthology_ko00001.txt
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K00844  HK; hexokinase [EC:2.7.1.1]
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K12407  GCK; glucokinase [EC:2.7.1.2]
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K00845  glk; glucokinase [EC:2.7.1.2]
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K01810  GPI, pgi; glucose-6-phosphate isomerase [EC:5.3.1.9]
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K06859  pgi1; glucose-6-phosphate isomerase, archaeal [EC:5.3.1.9]
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K13810  tal-pgi; transaldolase / glucose-6-phosphate isomerase [EC:2.2.1.2 5.3.1.9]
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K15916  pgi-pmi; glucose/mannose-6-phosphate isomerase [EC:5.3.1.9 5.3.1.8]
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K24182  PFK9; 6-phosphofructokinase [EC:2.7.1.11]
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K00850  pfkA, PFK; 6-phosphofructokinase 1 [EC:2.7.1.11]
## 09100 Metabolism 09101 Carbohydrate metabolism   00010 Glycolysis / Gluconeogenesis [PATH:ko00010]   K16370  pfkB; 6-phosphofructokinase 2 [EC:2.7.1.11]
#create a db with only the KOs that were found (-F: fixed strings, -w: whole-word matches)
grep -F -w -f hadza_kofamkoala_kos.txt KO_Orthology_ko00001.txt > hadza_kofamkoala_list2.txt

#remove the numeric codes and put a tab after the KO number
sed 's/^[0-9]* //g' hadza_kofamkoala_list2.txt | sed 's/\t[0-9]* /\t/g'  | sed 's/  /\t/g' > hadza_kofamkoala_list2_edited.txt
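
To see what the three substitutions do, here is a sketch that runs them on a single db row taken from the listing above. A literal tab is built with printf instead of \t so the sketch does not depend on GNU sed; the variable names are made up for illustration.

```shell
tab=$(printf '\t')
# One row of KO_Orthology_ko00001.txt: three category columns plus the KO column.
row=$(printf '%s\t%s\t%s\t%s' \
    '09100 Metabolism' \
    '09101 Carbohydrate metabolism' \
    '00010 Glycolysis / Gluconeogenesis [PATH:ko00010]' \
    'K00844  HK; hexokinase [EC:2.7.1.1]')

# 1) drop the leading code, 2) drop the codes that follow a tab,
# 3) turn the double space after the KO number into a tab.
edited=$(printf '%s\n' "$row" | sed "s/^[0-9]* //; s/${tab}[0-9]* /${tab}/g; s/  /${tab}/g")
printf '%s\n' "$edited"

# Number of tab-separated columns after editing
ncols=$(printf '%s\n' "$edited" | awk -F'\t' '{print NF}')
```

The edited row has five columns, with the KO number alone in column 4, which is the field the join step uses.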
head hadza_kofamkoala_list.txt
## K00287   2
## K00547   1
## K00640   1
## K00986   3
## K01153   1
## K01154   1
## K01174   1
## K01185   1
## K01462   1
## K02010   1
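
These notes do not show how the count table was produced. One possible way, sketched here under the assumption of a two-column gene-to-KO file (the name gene_to_ko.txt and its rows are hypothetical), is to count how often each KO occurs:

```shell
tmp=$(mktemp -d)
# Hypothetical annotation table: gene ID, then KO (empty when no KO was assigned).
printf '%s\t%s\n' gene1 K00287 gene2 K00547 gene3 K00287 gene4 '' > "$tmp/gene_to_ko.txt"

# Keep rows that have a KO, count each KO, and print "KO<tab>count".
counts=$(awk -F'\t' '$2 != "" {n[$2]++} END {for (k in n) print k "\t" n[k]}' "$tmp/gene_to_ko.txt" | sort)
printf '%s\n' "$counts"
```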
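#join the KO count table with the db (both inputs must be sorted on the tab-separated join field)
#-a 2 also keeps KOs that have a count but no db row; those lines carry only the KO and the count
join -1 4 -2 1 -a 2 -t $'\t' <(sort -t $'\t' -k4,4 hadza_kofamkoala_list2_edited.txt) <(sort -t $'\t' -k1,1 hadza_kofamkoala_list.txt) > hadza_kofamkoala_final.txt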
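
The column order of the joined table matters for the awk steps that follow. A small sketch with toy rows (the counts are made up, and temporary files replace the process substitution so it also runs in plain sh) shows that join prints the join field first, then the remaining db columns, then the count:

```shell
tmp=$(mktemp -d)
tab=$(printf '\t')

# Edited db rows: three category columns, KO, description.
printf '%s\t%s\t%s\t%s\t%s\n' \
    'Metabolism' 'Carbohydrate metabolism' 'Glycolysis / Gluconeogenesis [PATH:ko00010]' 'K00845' 'glk; glucokinase [EC:2.7.1.2]' \
    'Metabolism' 'Carbohydrate metabolism' 'Glycolysis / Gluconeogenesis [PATH:ko00010]' 'K00844' 'HK; hexokinase [EC:2.7.1.1]' \
    > "$tmp/db.txt"
# Count table rows: KO, count.
printf '%s\t%s\n' K00845 3 K00844 2 > "$tmp/counts.txt"

# join needs both inputs sorted on the join field, using the same separator.
sort -t "$tab" -k4,4 "$tmp/db.txt" > "$tmp/db.sorted"
sort -t "$tab" -k1,1 "$tmp/counts.txt" > "$tmp/counts.sorted"
joined=$(join -1 4 -2 1 -t "$tab" "$tmp/db.sorted" "$tmp/counts.sorted")
printf '%s\n' "$joined"

# The join field comes first, so the KO is column 1 and the count is column 6.
first=$(printf '%s\n' "$joined" | head -n 1 | cut -f1,3,6)
```

With this layout the category columns are 2 to 4, the description is column 5, and the count is column 6.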

Level1:

#get level1 (column 3 of the joined table) and the counts (column 6); the fields are tab-separated
awk -F'\t' '{print $3"\t"$6}' hadza_kofamkoala_final.txt | sort -k1 > hadza_kofamkoala_final_level1.txt

#sum the counts of repeated categories
awk -F'\t' '{ seen[$1] += $2 } END { for (i in seen) print i"\t"seen[i] }' hadza_kofamkoala_final_level1.txt | sort > hadza_kofamkoala_final_level1_sum.txt

#convert the counts to relative abundance (%)
awk -F'\t' 'FNR==NR{s+=$2;next;} {printf "%s\t%s\n",$1,100*$2/s}' hadza_kofamkoala_final_level1_sum.txt hadza_kofamkoala_final_level1_sum.txt > hadza_kofamkoala_final_level1_abrel.txt
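
The relative abundance command passes the same file to awk twice. On the first read FNR==NR holds, so only the total is accumulated; on the second read each count is divided by that total. A minimal sketch with made-up counts:

```shell
tmp=$(mktemp -d)
# Toy summed table: category, count (values are made up).
printf '%s\t%s\n' 'Carbohydrate metabolism' 3 'Energy metabolism' 1 > "$tmp/sum.txt"

# First pass: s = 3 + 1. Second pass: print 100 * count / s for every row.
abrel=$(awk -F'\t' 'FNR==NR {s += $2; next} {printf "%s\t%s\n", $1, 100*$2/s}' "$tmp/sum.txt" "$tmp/sum.txt")
printf '%s\n' "$abrel"
```

The two toy categories come out as 75 and 25, which add up to 100.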

Level2:

#get level2 (columns 3 and 4 of the joined table) and the counts (column 6); the fields are tab-separated
awk -F'\t' '{print $3"\t"$4"\t"$6}' hadza_kofamkoala_final.txt | sort -t $'\t' -k1,2 > hadza_kofamkoala_final_level2.txt

#sum the counts, keeping a tab inside the key so both category columns survive
awk -F'\t' '{ seen[$1"\t"$2] += $3 } END { for (i in seen) print i"\t"seen[i] }' hadza_kofamkoala_final_level2.txt | sort > hadza_kofamkoala_final_level2_sum.txt

#convert to relative abundance (%)
awk -F'\t' 'FNR==NR{s+=$3;next;} {printf "%s\t%s\t%s\n",$1,$2,100*$3/s}' hadza_kofamkoala_final_level2_sum.txt hadza_kofamkoala_final_level2_sum.txt > hadza_kofamkoala_final_level2_abrel.txt
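
When summing on two columns at once, the array key has to keep the columns apart, otherwise they are glued into a single field and the count is no longer in column 3. A minimal sketch with made-up rows, joining the key parts with a tab:

```shell
tmp=$(mktemp -d)
# Toy level2 table: category, subcategory, count (values are made up).
printf '%s\t%s\t%s\n' \
    'Carbohydrate metabolism' 'Glycolysis / Gluconeogenesis [PATH:ko00010]' 2 \
    'Carbohydrate metabolism' 'Glycolysis / Gluconeogenesis [PATH:ko00010]' 3 \
    'Energy metabolism' 'Oxidative phosphorylation [PATH:ko00190]' 5 \
    > "$tmp/level2.txt"

# The key is "col1<tab>col2", so printing the key restores both columns.
summed=$(awk -F'\t' '{seen[$1 "\t" $2] += $3} END {for (k in seen) print k "\t" seen[k]}' "$tmp/level2.txt" | sort)
printf '%s\n' "$summed"
```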