Folder assembly

/CIBIO/sharedCM/projects/metagenomicassembly/agambiae/evaluation/results_megahit

Two samples

Both samples have Enterobacter Ag1, genome here,

  1. How different are EspZ and Ag1?
  2. Extract all assembly from sample H2
  3. Annotation with Prokka + script
  4. Roary
  5. Tree with other Enterobacter

New sample: Sample_zanzf150 with presence of Microsporidiae. No assembly was possible due to no enough contigs.

H1’s reconstruced Ag1 covered 99.something % of AG refgen, H2 only 67%.

Genome extraction from Edo’s assemblies

Extract all Enterobacter sp. Ag1 and two Cedecea neteri contigs and store in a new fasta file ready annotable with PROKKA

python ~/pyphlan/fna_sss.py --select --ids <( egrep "Ag1|NODE_16_|NODE_91" /CIBIO/sharedCM/projects/metagenomicassembly/agambiae/evaluation/results_megahit/Sample_H1_2_140_out.txt | cut -f1 ) /CIBIO/sharedCM/projects/metagenomicassembly/agambiae/Sample_H1_2_140_megahit/contigs.fasta > /CIBIO/sharedCM/projects/agambie/H1/Enterobacter_Ag1_extracted.fasta

Alignment and comparison with progressiveMauve

Download Mauve aligner, run the Move contig tool from Tools menu. Add refgen first and then the assembled genome. Last alignemnt folder contains the ordered contigs.

PROKKA annotation

export PATH=/CIBIO/sharedCM/tools/prokka-1.11/bin:/CIBIO/sharedCM/tools/barrnap-0.7/bin:$PATH
prokka --setupdb
prokka /CIBIO/sharedCM/projects/agambie/H1/Enterobacter_Ag1_extracted.fasta --outdir /CIBIO/sharedCM/projects/agambie/H1/prokkaanotation --force --metagenome --cpus 30

Download all Enterobacter spp. genomes from NCBI

while read line; do
    rsync --copy-links --recursive --times --verbose rsync://$line /CIBIO/sharedCM/projects/agambie/NCBI_enterobacter/.
done < <(wget -qO - ftp://ftp.ncbi.nlm.nih.gov/genomes/ASSEMBLY_REPORTS/assembly_summary_refseq.txt | grep Enterobacter | cut -f20 | sed 's/ftp:\/\///g')

Convert NCBI’s GenBank to GFF3

mkdir corrected_gff parallel bp_genbank2gff3 {} --outdir - > /CIBIO/sharedCM/projects/agambie/corrected_gff/{.}.gff ::: /CIBIO/sharedCM/projects/agambie/NCBI_enterobacter/*/*gbff.gz

Run Roary

Roary with nearest genomes

Species Strain Assembly
Enterobacter sp. Ag1 GCA_000277545.1
Cedecea neteri ND14a GCA_000758325.1
Cedecea neteri M006 GCA_000758305.1
Klebsiella michiganensis RC10 GCA_000963575.1
Cedecea neteri SSMD04 GCA_000757825.1
Cedecea neteri NBRC 105707 GCA_001571265.1
Cedecea daviseae DSM 4568 GCA_000412335.2
Enterobacter cloaceae ATCC 13047 GCA_000025565.1
Escherichia coli K12MG1655 G001265845
Pseudomonas aeroginosa G000796765
Serratia marcensis FGI94 G000330865
Yersinia pestis CO92 G000009065
Pantoea agglomerans P10c G001288285
Cronobacter sakazakii 8399 G000467775
Enterobacter asburiae ATCC 35953 G001521715
Enterobacter hormaechei oharae G000807405
Enterobacter lignolyticus SCF1 G000164865
Enterobacter soli ATCC BAA-2102 G001654845
Enterobacter cancerogenus M004 G000773965
Klebsiella pneumoniae HS11286 G000240185
Salmonella enterica LT2 G000006945
Shigella dysenteriae Sd197 G000012005
export PATH=/CIBIO/sharedCM/tools/Roary_git/bin:$PATH
export PERL5LIB=/CIBIO/sharedCM/tools/Roary_git/lib:$PERL5LIB
export PERL5LIB=/home/e.pasolli/perl5/lib/perl5:$PERL5LIB
roary /CIBIO/sharedCM/projects/agambie/NCBI_enterobacter/corrected_gff/*gff -p 50 -e -f /CIBIO/sharedCM/projects/agambie/roary_out/roary_out -f /CIBIO/sharedCM/projects/agambie/roary_out -g 1000000

FastTree -nt -gtr core_gene* > tree.nwk

Core genes tree

coretree <- read.tree("/CIBIO/sharedCM/projects/agambie/H1/neargenomes/out_p80/tree.nwk.rerooted")
coretree$tip.label <- gsub("\\_"," ", coretree$tip.label)
p <- ggtree(coretree, aes(linesize=32)) +
    geom_tiplab(fontface="italic", hjust = -0.03, family="sans-serif") +
    geom_tippoint() +
    geom_treescale()
p <- ggtree::rotate(p, 22)
svg("out.svg", width = 10, height = 10)
p
dev.off()
## png 
##   2

Rebuild 16S rRNA from reads

No 16S gene from PROKKA annotation, so let’s build it from the reference and the reads!

Buenaventura

qiime join_paired_ends.py \
    -f $seqdir/Undetermined_S0_L001_R1_001.fastq.gz \
    -r $seqdir/Undetermined_S0_L001_R2_001.fastq.gz \
    -b $seqdir/Undetermined_S0_L001_I1_001.fastq.gz \
    -p 3 \
    -j $min_overlap \
    -o merged_reads

qiime split_libraries_fastq.py \
    -i merged_reads/fastqjoin.join.fastq \
    -b merged_reads/fastqjoin.join_barcodes.fastq \
    -m validated/mapping_file_corrected.txt \
    --rev_comp_barcode \
    -q 29 \
    -o splitted \
    --barcode_type 12 
for read in `ls fastq | cut -f1 -d"_"  | sort | uniq`; do
    qiime join_paired_ends.py \
        -f fastq/$read"_R1.fastq.gz" \
        -r fastq/$read"_R2.fastq.gz" \
        -j 18 \
        -o fastq_reads_joined/$read.fastq
done

for sample in `ls fastq_reads_joined`; do
    qiime extract_barcodes.py -f fastq_reads_joined/$sample/fastqjoin.join.fastq -c barcode_single_end --bc1_len 6 -o fastq_reads_joined/$sample/processed_seqs
    samplename=`echo $sample | cut -f2 -d "."`
    qiime split_libraries_fastq.py \
        -i fastq_reads_joined/$sample/processed_seqs/reads.fastq \
        -o fastq_reads_joined/$sample/split \
        --barcode_type 'not-barcoded' \
        --sample_ids $samplename \
        -m mapping_file.txt
done

qiime validate_mapping_file.py -m mapping_file.txt -o validated -s 

qiime pick_open_reference_otus.py -i splitted/seqs.fna -p otu_picking_params.txt.silva -o otus -a -O 50

qiime summarize_taxa_through_plots.py -i otus/otu_table_mc2_w_tax.biom -o taxa_nonsummary -m validated/mapping_file_corrected.txt -f

qiime alpha_rarefaction.py -i otus/otu_table_mc2_w_tax.biom -o alpha_div -p alpha_params.txt -m validated/mapping_file_corrected.txt -t otus/rep_set.tre -f -a -O 50

qiime beta_diversity_through_plots.py -i otus/otu_table_mc2_w_tax.biom -o beta_div -m validated/mapping_file_corrected.txt -p beta_params.txt -t otus_silva/rep_set.tre -a -O 20 -f

Remove B65 B66 B34 CN B5 BC1 B18

echo -e "B65\nB66\nB34\nCN\nB5\nBC1\nB18\nBL5" > ids.txt
qiime filter_samples_from_otu_table.py -i otus/otu_table_mc2_w_tax.biom -o otus/cleaned.biom -m validated/mapping_file_corrected.txt --sample_id_fp ids.txt --negate_sample_id_fp

Extract larvae and emergednte

mkdir larvae_emerg

qiime filter_samples_from_otu_table.py -i otus/cleaned.biom -o otus/larvae_emerg.biom -m validated/mapping_file_corrected.txt -s 'Stage:E,L'

qiime summarize_taxa_through_plots.py -i otus/larvae_emerg.biom -o larvae_emerg/taxa -m validated/mapping_file_corrected.txt -f

qiime alpha_rarefaction.py -i otus/larvae_emerg.biom -o larvae_emerg/alpha_div -p alpha_params.txt -m validated/mapping_file_corrected.txt -t otus/rep_set.tre -f -a -O 50

qiime beta_diversity_through_plots.py -i otus/larvae_emerg.biom -o larvae_emerg/beta_div -m validated/mapping_file_corrected.txt -p beta_params.txt -t otus/rep_set.tre -a -O 20 -f

qiime make_2d_plots.py -i larvae_emerg/beta_div/unweighted_unifrac_pc.txt -m validated/mapping_file_corrected.txt -b Stage -o larvae_emerg/pcoa

/CIBIO/sharedCM/projects/anofele/bue-fastq/larvae_emerg : Resultados larvas y emergente

/CIBIO/sharedCM/projects/anofele/bue-fastq/taxa_nonsummary/ : Taxonomy all samples

Core microbiome

qiime compute_core_microbiome.py -i otu_table.biom -o otu_table_core_fast --mapping_fp map.txt --valid_states "Treatment:Fast"

Taxonomy

Load QIIME data

# system("biom add-metadata -i /CIBIO/sharedCM/projects/anofele/bue-fastq/otus/larvae_emerg.biom -o /CIBIO/sharedCM/projects/anofele/bue-fastq/larvae_emerg/larvae_emerg_wtax.biom -m /CIBIO/sharedCM/projects/anofele/bue-fastq/mapping_file.txt")

# system("biom convert -i /CIBIO/sharedCM/projects/anofele/bue-fastq/larvae_emerg/larvae_emerg_wtax.biom -o /CIBIO/sharedCM/projects/anofele/bue-fastq/larvae_emerg/larvae_emerg_wtax.txt --to-json --table-type \"OTU table\"")

# otu_table <- "/CIBIO/sharedCM/projects/anofele/bue-fastq/larvae_emerg/larvae_emerg_wtax.txt"
# mapping_file <- "/CIBIO/sharedCM/projects/anofele/bue-fastq/mapping_file.txt"
# tree_otu <- "/CIBIO/sharedCM/projects/anofele/bue-fastq/otus/rep_set.tre"
# rep_set <- "/CIBIO/sharedCM/projects/anofele/bue-fastq/otus/new_refseqs.fna"

#qiime_data <- import_biom(BIOMfilename = otu_table, treefilename = read_tree(tree_otu), refseqfilename = rep_set, refseqFunction = parse_taxonomy_default)
#plot_bar(filter_taxa(qiime_data, function(x) sum(x > 50) > (0.5*length(x)), TRUE), fill="Rank5")

taxonomy <- read.delim("/CIBIO/sharedCM/projects/anofele/bue-fastq/larvae_emerg/taxa/larvae_emerg_L5.txt", skip = 1)
rownames(taxonomy) <- taxonomy$X.OTU.ID
taxonomy <- taxonomy[,-1]
threshold <- 0.005
taxonomy[apply(taxonomy, 1, mean)<threshold,"phylum"] <- "Other"
taxonomy[apply(taxonomy[,-which(names(taxonomy)=="phylum")], 1, function(x) {mean(x, na.rm = T)})>threshold,"phylum"] <- rownames(taxonomy[apply(taxonomy[,-which(names(taxonomy)=="phylum")], 1, function(x) {mean(x, na.rm = T)})>threshold,])
x<-melt(taxonomy, id.vars = c("phylum"))
x$phylum <- sapply(sapply(lapply(x$phylum, strsplit, ";"),`[[`,1),tail, n=1)
x$phylum <- gsub("D_4__","",x$phylum)
x <- melt(dcast(x, phylum~variable, fun.aggregate = sum))
## Using phylum as id variables
x[,"stage"] <- sapply(x$variable, substr, 2,2)
# svg("taxonomy.svg", width = 11)
ggplot() + 
    geom_bar(data=x, aes(x=variable, y=value, fill=phylum), stat="identity") +
    scale_fill_manual(values=c("#47004b","#00f7b7","#d10090","#01deca","#ed2300","#017adb","#3d6200","#ff83cf","#005f43","#ff8f7c","#003d96","#ffe89f","#1a000b","#0000CC","#4b0016","#dfc7ff"), name="Genus") + 
    facet_grid( ~ stage, scale="free", space = "free_x") 

# dev.off()