Overview of the data
Quality trimming summary
Reads are trimmed from the right with a minimum Phred score of 20 to a minimum length of 180. Then left and right homopolymers are trimmed.

| stage | reads | bases |
|---|---|---|
| original | 280,370 | 70,092,500 |
| filtered | 232,742 | 57,123,859 |
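The trimming rules above can be sketched in Python. This is a minimal illustration, not the tool actually used for this dataset: the function names are hypothetical, and the homopolymer rule (remove a terminal run of 5 or more identical bases) is an assumption, since the text does not give a run-length threshold.

```python
MIN_PHRED = 20        # from the text: minimum Phred score
MIN_LENGTH = 180      # from the text: minimum read length after quality trimming
MIN_HOMOPOLYMER = 5   # ASSUMPTION: shortest terminal run that gets removed


def trim_right_by_quality(seq, quals, min_phred=MIN_PHRED):
    """Drop bases from the right end while their Phred score is below min_phred."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_phred:
        end -= 1
    return seq[:end], quals[:end]


def trim_terminal_homopolymers(seq, quals, min_run=MIN_HOMOPOLYMER):
    """Remove a homopolymer run at either end if it is at least min_run long."""
    if not seq:
        return seq, quals
    left = len(seq) - len(seq.lstrip(seq[0]))
    right = len(seq) - len(seq.rstrip(seq[-1]))
    start = left if left >= min_run else 0
    stop = len(seq) - right if right >= min_run else len(seq)
    if start >= stop:  # the whole read was a single homopolymer
        return seq[:0], quals[:0]
    return seq[start:stop], quals[start:stop]


def process_read(seq, quals):
    """Quality-trim, apply the length filter, then trim terminal homopolymers.

    Returns None when the read is shorter than MIN_LENGTH after quality trimming.
    """
    seq, quals = trim_right_by_quality(seq, quals)
    if len(seq) < MIN_LENGTH:
        return None
    return trim_terminal_homopolymers(seq, quals)
```

The length filter is applied before homopolymer trimming because that is the order in which the text lists the steps; a real pipeline may order them differently.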
Mapping
1. Map the 232,742 filtered reads at 100% identity, allowing multiple mapping positions:
Canu, pb_636_20

| statistic | 100% identity + secondary |
|---|---|
| reads mapped | 155,574 |
| bases mapped | 38,328,553 |
| ref covered | 4,620,743 |
| ref uncovered | 31,377 |
| gaps | 464 |

155,574 reads map perfectly in at least one place; 77,168 reads have not yet been mapped.
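One way the "ref covered", "ref uncovered", and "gaps" statistics could be derived from per-base read depth is sketched below. It assumes a gap is a maximal run of zero-depth reference positions; the report does not state its exact definition, and `coverage_summary` is a hypothetical helper, not part of any mapping tool.

```python
def coverage_summary(depth):
    """Summarise coverage from a per-base depth list (one integer per reference base).

    ASSUMPTION: a gap is a maximal run of consecutive positions with depth 0.
    """
    covered = sum(1 for d in depth if d > 0)
    uncovered = len(depth) - covered

    gaps = 0
    in_gap = False
    for d in depth:
        if d == 0 and not in_gap:
            gaps += 1  # a new zero-depth run starts here
        in_gap = d == 0

    return {"ref covered": covered, "ref uncovered": uncovered, "gaps": gaps}


# Toy example: 9 reference bases with three separate zero-depth runs.
print(coverage_summary([0, 0, 3, 4, 0, 1, 0, 0, 0]))
```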
To visualize where the gaps are in the alignment, we can look at the sum of all gaps within each 100 Kb portion of the reference genome:
Subsequent mapping steps will fill in these gaps and show which have variants.
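The per-window gap sum described above can be sketched as follows. The gap intervals are assumed to be 0-based and half-open, `(start, end)`, and the function name is hypothetical. The reference length of 4,652,120 is taken as "ref covered" plus "ref uncovered" from the mapping table.

```python
WINDOW = 100_000  # 100 Kb windows, as in the text


def gap_bases_per_window(gaps, ref_length, window=WINDOW):
    """Sum uncovered bases in each fixed-size window of the reference.

    gaps: iterable of (start, end) intervals, 0-based and half-open (ASSUMPTION).
    A gap that spans a window boundary is split between the windows it touches.
    """
    n_windows = -(-ref_length // window)  # ceiling division
    totals = [0] * n_windows
    for start, end in gaps:
        pos = start
        while pos < end:
            idx = pos // window
            segment_end = min((idx + 1) * window, end)
            totals[idx] += segment_end - pos
            pos = segment_end
    return totals


# Toy example: the second gap straddles the boundary between windows 0 and 1.
toy_gaps = [(10, 60), (99_990, 100_020), (250_000, 250_100)]
print(gap_bases_per_window(toy_gaps, ref_length=300_000))
```

The resulting list can be passed to any plotting library as a bar chart, with one bar per 100 Kb window.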
2. Map the 155,574 reads at 100% identity and identify reads that map to multiple sites:
Canu, pb_636_20

| type | reads | bases |
|---|---|---|
| unmapped: | 1,342 | 331,590 |
| mapped: | 155,568 | 38,327,062 |

The 1,342 reads were remapped at 80% identity and no variants were found.
3. The 77,168 reads can now be mapped at 95% identity (no multiple alignments)
Identify how many reads mapped and how much more genome is covered:
Canu, pb_636_20

| statistic | 100% identity + secondary + 95% identity |
|---|---|
| reads mapped | 26,469 |
| bases mapped | 6,456,783 |
| ref covered | 4,631,605 |
| ref uncovered | 20,515 |
| gaps | 337 |

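The gain from the 95% identity step can be worked out directly from the reported coverage figures. The snippet below is only arithmetic on numbers taken from this report; the dictionary layout and function name are illustrative.

```python
# Coverage statistics as reported for the first two mapping stages.
stages = {
    "100% + secondary": {"covered": 4_620_743, "uncovered": 31_377, "gaps": 464},
    "+ 95%": {"covered": 4_631_605, "uncovered": 20_515, "gaps": 337},
}


def percent_covered(stats):
    """Share of the reference covered by at least one read, in percent."""
    total = stats["covered"] + stats["uncovered"]
    return 100 * stats["covered"] / total


before, after = stages["100% + secondary"], stages["+ 95%"]
newly_covered = after["covered"] - before["covered"]
gaps_closed = before["gaps"] - after["gaps"]

print(f"newly covered bases: {newly_covered:,}")
print(f"gaps closed: {gaps_closed}")
print(f"covered: {percent_covered(before):.2f}% -> {percent_covered(after):.2f}%")
```

Both stages sum to the same reference length (4,652,120 nt), which is a quick consistency check on the two tables.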
Call variants in the gap regions:
Canu, pb_636_20_95pid

| Molecule | Pos | Ref | Alt | Qual | Type | ref_seq | alt_seq |
|---|---|---|---|---|---|---|---|
| canu | 180,936 | C | CG | 34.63 | INS | TATGACTTTGCCGGGGTTTTCGC | TATGACTTTGCCGGGGGTTTTCGC |
| canu | 734,742 | A | G | 22.89 | SUB | TTCTTCATTGCACTCAGTCTGTT | TTCTTCATTGCGCTCAGTCTGTT |
| canu | 1,110,949 | T | TA | 38.50 | INS | ACAGCATCTAATAAAAAGAGAAA | ACAGCATCTAATAAAAAAGAGAAA |
| canu | 1,437,932 | C | A | 22.55 | SUB | CGGTGCAGCAACTCATGCAGACG | CGGTGCAGCAAATCATGCAGACG |
| canu | 1,731,702 | C | G | 27.93 | SUB | CTTCTGAATATCGTGGCGGGCGT | CTTCTGAATATGGTGGCGGGCGT |
| canu | 1,917,704 | G | GC | 38.52 | INS | ACATCATCTCTGCCCCCGGGCAA | ACATCATCTCTGCCCCCCGGGCAA |
| canu | 2,058,140 | A | T | 22.20 | SUB | GGCGCGACCATATTCCCCAGCCC | GGCGCGACCATTTTCCCCAGCCC |
| canu | 2,595,883 | C | A | 22.17 | SUB | CGCCTTGCGCGCCGGGCACGTTG | CGCCTTGCGCGACGGGCACGTTG |
| canu | 2,672,224 | C | T | 22.84 | SUB | GGTAAACCCGTCACGGAAACGCT | GGTAAACCCGTTACGGAAACGCT |
| canu | 2,738,375 | A | AC | 27.91 | INS | GTGATGTCATAACCCCCCCCTTT | GTGATGTCATAACCCCCCCCCTTT |
| canu | 2,794,618 | T | TC | 37.75 | INS | TAAAACGCTTTTCCCCATCCAAT | TAAAACGCTTTTCCCCCATCCAAT |
| canu | 3,144,273 | C | T | 27.33 | SUB | TGCTGGACCGCCTTCTGGTAGAC | TGCTGGACCGCTTTCTGGTAGAC |
| canu | 4,529,193 | T | C | 22.64 | SUB | CTCACGCTGGCTGTCCTGCGCAG | CTCACGCTGGCCGTCCTGCGCAG |
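The Type, ref_seq, and alt_seq columns can be cross-checked against Ref and Alt with a short script. This sketch assumes that each ref_seq is a 23 nt window with the variant position at its centre (index 11, 0-based), which holds for the rows checked here but is not stated in the report. Both function names are hypothetical.

```python
def classify(ref, alt):
    """Label a variant by comparing allele lengths."""
    if len(ref) == len(alt):
        return "SUB"
    return "INS" if len(alt) > len(ref) else "DEL"


def apply_to_context(ref_seq, ref, alt):
    """Apply a variant at the centre of its reference context window.

    ASSUMPTION: the Ref allele starts at the middle base of ref_seq.
    """
    centre = len(ref_seq) // 2
    if ref_seq[centre:centre + len(ref)] != ref:
        raise ValueError("Ref allele does not match the centre of ref_seq")
    return ref_seq[:centre] + alt + ref_seq[centre + len(ref):]


# First two rows of the table above.
rows = [
    ("C", "CG", "INS", "TATGACTTTGCCGGGGTTTTCGC", "TATGACTTTGCCGGGGGTTTTCGC"),
    ("A", "G", "SUB", "TTCTTCATTGCACTCAGTCTGTT", "TTCTTCATTGCGCTCAGTCTGTT"),
]
for ref, alt, vtype, ref_seq, alt_seq in rows:
    assert classify(ref, alt) == vtype
    assert apply_to_context(ref_seq, ref, alt) == alt_seq
```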
4. The remaining 50,699 reads are mapped at 80% identity (allowing multiple alignments):
Canu, pb_636_20

| statistic | (100% + secondary) + 95% + (80% + secondary) |
|---|---|
| reads mapped | 44,316 |
| bases mapped | 10,855,244 |
| ref covered | 4,640,362 |
| ref uncovered | 11,758 |
| gaps | 230 |

There were 235 new variants detected using the 80% reads.
In total, the reconstruction will have 248 variants (13 from the 95% identity reads plus 235 from the 80% identity reads) and 11,758 nt of unverified bases.
The remaining 6,383 unmapped reads will be assembled de novo and searched with BLAST.
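The read counts quoted at each step follow from subtracting the reads mapped at that step from the reads still unmapped. A small bookkeeping sketch, using only numbers from this report (the function name is illustrative):

```python
FILTERED_READS = 232_742

# Reads mapped at each successive stage, as reported above.
mapped_per_stage = [
    ("100% identity + secondary", 155_574),
    ("95% identity", 26_469),
    ("80% identity + secondary", 44_316),
]


def remaining_after_each_stage(total, stages):
    """Return the number of still-unmapped reads after every stage."""
    remaining = []
    left = total
    for _label, mapped in stages:
        left -= mapped
        remaining.append(left)
    return remaining


for (label, _), left in zip(mapped_per_stage,
                            remaining_after_each_stage(FILTERED_READS, mapped_per_stage)):
    print(f"after {label}: {left:,} reads unmapped")

# Variant total: 13 calls in the 95% identity table plus 235 from the 80% reads.
total_variants = 13 + 235
```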