Salmonella report

Overview of the data

Quality trimming summary

Reads are trimmed from the right with a minimum Phred score of 20 to a minimum length of 180 bases. Homopolymers at the left and right ends are then trimmed.
stage     reads    bases
original  280,370  70,092,500
filtered  232,742  57,123,859
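
The trimming rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the tool that produced the table: it assumes that trailing bases below Q20 are removed, that reads shorter than 180 bases after quality trimming are discarded, and that a terminal homopolymer is any run of one base at least `min_run` long. The function names and the `min_run` value of 4 are hypothetical.

```python
from typing import List, Optional, Tuple


def trim_right_by_quality(seq: str, quals: List[int], min_q: int = 20) -> Tuple[str, List[int]]:
    """Drop trailing bases whose Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def _run_length(seq: str) -> int:
    """Length of the homopolymer run that starts at the first base of seq."""
    if not seq:
        return 0
    n = 1
    while n < len(seq) and seq[n] == seq[0]:
        n += 1
    return n


def trim_terminal_homopolymers(seq: str, quals: List[int], min_run: int = 4) -> Tuple[str, List[int]]:
    """Remove a homopolymer run from each end if it is at least min_run long (assumed threshold)."""
    left = _run_length(seq)
    start = left if left >= min_run else 0
    seq, quals = seq[start:], quals[start:]
    right = _run_length(seq[::-1])
    end = len(seq) - right if right >= min_run else len(seq)
    return seq[:end], quals[:end]


def trim_read(seq: str, quals: List[int], min_q: int = 20, min_len: int = 180,
              min_run: int = 4) -> Optional[Tuple[str, List[int]]]:
    """Quality-trim, apply the length filter, then trim terminal homopolymers.

    Returns None when the read is too short after quality trimming.
    """
    seq, quals = trim_right_by_quality(seq, quals, min_q)
    if len(seq) < min_len:
        return None
    return trim_terminal_homopolymers(seq, quals, min_run)
```

The length filter is applied before homopolymer trimming here because the summary lists the steps in that order; the actual pipeline may differ.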

Mapping

  1. Map 232,742 reads at 100% identity allowing for multiple positions:
    Canu
    pb_636_20
    statistic      100% identity + secondary
    reads mapped   155,574
    bases mapped   38,328,553
    ref covered    4,620,743
    ref uncovered  31,377
    gaps           464

    155,574 reads map perfectly in at least one place; the remaining 77,168 reads have not yet been mapped.

    To visualize where the gaps are in the alignment, we can look at the sum of all gaps within each 100 Kb portion of the reference genome:

    Subsequent mapping steps will fill in these gaps and show which of them contain variants.
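
One way to compute the per-window gap totals mentioned in step 1 is sketched below. It assumes gaps are available as 0-based, half-open `(start, end)` intervals that lie within the reference; the interval format, the function name, and the example coordinates are hypothetical and are not taken from this report. The reference length of 4,652,120 is the sum of the covered and uncovered bases in the table above.

```python
from typing import List, Tuple


def gap_bases_per_window(gaps: List[Tuple[int, int]], genome_length: int,
                         window: int = 100_000) -> List[int]:
    """Sum uncovered bases in each fixed-size window of the reference.

    A gap that crosses a window boundary is split, so each base is
    counted in the window it actually falls in.
    """
    n_windows = (genome_length + window - 1) // window
    totals = [0] * n_windows
    for start, end in gaps:
        pos = start
        while pos < end:
            w = pos // window
            chunk_end = min(end, (w + 1) * window)
            totals[w] += chunk_end - pos
            pos = chunk_end
    return totals


# Made-up gap coordinates, for illustration only.
example_gaps = [(50, 150), (99_950, 100_050), (4_652_000, 4_652_120)]
totals = gap_bases_per_window(example_gaps, genome_length=4_652_120)
for i, n in enumerate(totals):
    if n:
        print(f"window {i * 100_000:>9,} to {(i + 1) * 100_000:>9,}: {n} gap bases")
```

Plotting `totals` as a bar chart gives one bar per 100 kb window.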
  2. Map the 155,574 reads at 100% identity and identify reads that map to multiple sites:
Canu
pb_636_20
type       reads    bases
unmapped:  1,342    331,590
mapped:    155,568  38,327,062

The 1,342 reads were remapped at 80% identity and no variants were found.

3. The 77,168 reads not mapped at 100% identity can now be mapped at 95% identity (no multiple alignments).
The table shows how many reads mapped and how much more of the genome is covered:
Canu
pb_636_20
statistic      100% identity + secondary + 95% identity
reads mapped   26,469
bases mapped   6,456,783
ref covered    4,631,605
ref uncovered  20,515
gaps           337


Call variants in the gap regions:
Canu
pb_636_20_95pid
Molecule Pos Ref Alt Qual Type ref_seq alt_seq
canu 180,936 C CG 34.63 INS TATGACTTTGCCGGGGTTTTCGC TATGACTTTGCCGGGGGTTTTCGC
canu 734,742 A G 22.89 SUB TTCTTCATTGCACTCAGTCTGTT TTCTTCATTGCGCTCAGTCTGTT
canu 1,110,949 T TA 38.50 INS ACAGCATCTAATAAAAAGAGAAA ACAGCATCTAATAAAAAAGAGAAA
canu 1,437,932 C A 22.55 SUB CGGTGCAGCAACTCATGCAGACG CGGTGCAGCAAATCATGCAGACG
canu 1,731,702 C G 27.93 SUB CTTCTGAATATCGTGGCGGGCGT CTTCTGAATATGGTGGCGGGCGT
canu 1,917,704 G GC 38.52 INS ACATCATCTCTGCCCCCGGGCAA ACATCATCTCTGCCCCCCGGGCAA
canu 2,058,140 A T 22.20 SUB GGCGCGACCATATTCCCCAGCCC GGCGCGACCATTTTCCCCAGCCC
canu 2,595,883 C A 22.17 SUB CGCCTTGCGCGCCGGGCACGTTG CGCCTTGCGCGACGGGCACGTTG
canu 2,672,224 C T 22.84 SUB GGTAAACCCGTCACGGAAACGCT GGTAAACCCGTTACGGAAACGCT
canu 2,738,375 A AC 27.91 INS GTGATGTCATAACCCCCCCCTTT GTGATGTCATAACCCCCCCCCTTT
canu 2,794,618 T TC 37.75 INS TAAAACGCTTTTCCCCATCCAAT TAAAACGCTTTTCCCCCATCCAAT
canu 3,144,273 C T 27.33 SUB TGCTGGACCGCCTTCTGGTAGAC TGCTGGACCGCTTTCTGGTAGAC
canu 4,529,193 T C 22.64 SUB CTCACGCTGGCTGTCCTGCGCAG CTCACGCTGGCCGTCCTGCGCAG
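
The rows of this table can be sanity-checked with a short script. The sketch below assumes that `ref_seq` is a 23-nt window with the `Ref` base at its centre (index 11), which is what the listed rows suggest, and that applying `Alt` at that position should reproduce `alt_seq`. The `Variant` type and the function names are hypothetical, the `DEL` label is included only for completeness (no deletions appear in the table), and only the first three rows are copied in.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    pos: int
    ref: str
    alt: str
    ref_seq: str
    alt_seq: str


def classify(ref: str, alt: str) -> str:
    """Label a variant by comparing allele lengths."""
    if len(ref) == len(alt):
        return "SUB"
    return "INS" if len(alt) > len(ref) else "DEL"


def apply_variant(ref_seq: str, ref: str, alt: str) -> str:
    """Replace the centred Ref allele in the context window with Alt."""
    center = len(ref_seq) // 2
    if ref_seq[center:center + len(ref)] != ref:
        raise ValueError("Ref allele is not at the centre of ref_seq")
    return ref_seq[:center] + alt + ref_seq[center + len(ref):]


ROWS = [
    Variant(180_936, "C", "CG", "TATGACTTTGCCGGGGTTTTCGC", "TATGACTTTGCCGGGGGTTTTCGC"),
    Variant(734_742, "A", "G", "TTCTTCATTGCACTCAGTCTGTT", "TTCTTCATTGCGCTCAGTCTGTT"),
    Variant(1_110_949, "T", "TA", "ACAGCATCTAATAAAAAGAGAAA", "ACAGCATCTAATAAAAAAGAGAAA"),
]

for v in ROWS:
    consistent = apply_variant(v.ref_seq, v.ref, v.alt) == v.alt_seq
    print(f"{v.pos:>9,} {classify(v.ref, v.alt)} consistent={consistent}")
```

In the rows shown, each insertion extends an existing run of the same base (for example GGGG to GGGGG at 180,936), which may be worth keeping in mind when judging these calls.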


4. The remaining 50,699 reads are mapped at 80% identity (allowing multiple alignments):
Canu
pb_636_20
statistic      (100% + secondary) + 95% + (80% + secondary)
reads mapped   44,316
bases mapped   10,855,244
ref covered    4,640,362
ref uncovered  11,758
gaps           230
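
To compare the three cumulative coverage tables (steps 1, 3 and 4), the covered fraction can be computed from the reported numbers. This is a small sketch; the stage labels and the function name are illustrative, while the counts are copied from the tables above.

```python
from typing import List, Tuple

# (label, ref covered, ref uncovered, gaps) as reported after steps 1, 3 and 4
STAGES = [
    ("100% + secondary", 4_620_743, 31_377, 464),
    ("+ 95%", 4_631_605, 20_515, 337),
    ("+ 80% + secondary", 4_640_362, 11_758, 230),
]


def coverage_summary(stages) -> List[Tuple[str, int, float, int]]:
    """Return (label, reference length, percent covered, gaps) per stage."""
    rows = []
    for label, covered, uncovered, gaps in stages:
        length = covered + uncovered
        rows.append((label, length, 100.0 * covered / length, gaps))
    return rows


for label, length, pct, gaps in coverage_summary(STAGES):
    print(f"{label:<20} {length:>10,} {pct:7.3f}% {gaps:>4}")
```

Covered plus uncovered bases give the same reference length at every stage, which is a quick consistency check on the tables.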



Using the reads mapped at 80% identity, 235 new variants were detected.
In total, the reconstruction will therefore have 248 variants (13 from the 95% step plus 235 from the 80% step) and 11,758 nt of unverified sequence.
The remaining 6,383 unmapped reads will be assembled de novo and searched with BLAST.
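
The read and variant counts quoted across the steps can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers from this report; the variable and function names are illustrative, and the count of 13 is the number of rows in the variant table of step 3.

```python
from typing import List, Tuple

FILTERED_READS = 232_742
MAPPED_PER_STAGE = [
    ("100% identity", 155_574),
    ("95% identity", 26_469),
    ("80% identity", 44_316),
]
VARIANTS_95 = 13   # rows in the variant table of step 3
VARIANTS_80 = 235  # new variants reported for the 80% step


def remaining_after_each_stage(total: int, stages) -> List[Tuple[str, int]]:
    """Subtract the reads mapped at each stage and record what is left."""
    remaining = total
    out = []
    for label, mapped in stages:
        remaining -= mapped
        out.append((label, remaining))
    return out


for label, left in remaining_after_each_stage(FILTERED_READS, MAPPED_PER_STAGE):
    print(f"after {label}: {left:,} reads still unmapped")
print(f"total variants: {VARIANTS_95 + VARIANTS_80}")
```

The remainders after each stage match the read counts carried into steps 3 and 4 and the final number of unmapped reads.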