Samples were evaluated with a reprep_threshold of 0.5 and a repool_threshold of 0.75.
This interactive notebook generates a summary of QC statistics to assess the success of a Mad4hatter run.
QC Summaries
The notebook provides the following analyses:
- Plate Layout: Location of samples and controls on the plate
- Primer Dimer Content: Input reads vs. % of reads attributed to primer dimers
- Balancing Across Batches: Swarm plot of reads output by the pipeline, grouped by batch
- Control Summary, including:
  - Polyclonality of Positive Controls
  - Read Summary for Negative Controls
  - Contamination Maps for Negative Controls
- Successful Amplification summary, including:
  - Sample Reads vs. Number of Successfully Amplified Loci
  - Parasitemia vs. Number of Successfully Amplified Loci
- Amplification Plate Maps, including:
  - Reads Heatmap
  - Successful Amplification Heatmap
Note: A locus is considered successfully amplified if it has more than read_threshold reads. The read_threshold value can be set below.
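Below is a minimal sketch of this rule. It assumes per-locus read counts are available as (sample, locus, reads) tuples. That input shape and the sample/locus names are hypothetical and are not the pipeline's file format. Because the rule is "more than", a locus with exactly read_threshold reads does not count.

```python
def count_amplified_loci(rows, read_threshold=100):
    """Count, per sample, loci with strictly more than read_threshold reads.

    rows: iterable of (sample_id, locus, reads) tuples (hypothetical input shape).
    """
    counts = {}
    for sample_id, _locus, reads in rows:
        counts.setdefault(sample_id, 0)  # keep samples with zero successful loci
        if reads > read_threshold:       # strictly greater, so exactly 100 reads fails
            counts[sample_id] += 1
    return counts


rows = [
    ("S1", "locus_a", 250),
    ("S1", "locus_b", 100),
    ("S1", "locus_c", 101),
    ("S2", "locus_a", 3),
]
print(count_amplified_loci(rows))  # {'S1': 2, 'S2': 0}
```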
Setup
Results Directory: /Users/mallo/OneDrive/Desktop/Personal/Fall 2025/Berube/Sequence data/QC_ZMZHF_MDBG16_3
Manifest File: C:/Users/mallo/OneDrive/Desktop/Personal/Fall 2025/Berube/Sequence data/manifest_QC_ZMZHF_MDBG16_3_v3.csv
Read Threshold for Successful Amplification: 100
Minimum Reads per ASV (Positive Control Filter): 5
Minimum Allele Frequency per ASV (Positive Control Filter): 0.01
Here we filter out two types of loci:
- Loci targeting non-Plasmodium falciparum species, which are expected to amplify only if those species are present.
- Long amplicons (>275 bp, including primers), which tend to underperform because of their length.
These loci should not be considered when assessing the success of a sequencing run.
Note: If you modify the filtered loci, be sure to update the loci counts below accordingly.
Below are the loci counts per reaction, set on the assumption that you are running Mad4hatter pools D1.1, R1.2, and R2.1. The loci filtered out above are excluded from these counts.
- If you are not using these pools, please update these numbers manually.
- Alternatively, you can uncomment the section below to calculate loci counts directly from the allele_table. Note: If a locus fails to amplify in all samples, it will not be included in the counts derived from the allele_table.
| reaction | nreactionloci |
|---|---|
| 1 | 205 |
| 2 | 25 |
Uncomment the following to count the number of loci per reaction based on the allele_data table.
| reaction | nreactionloci |
|---|---|
| 1 | 205 |
| 2 | 24 |
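The allele-table route counts distinct loci per reaction among the rows that actually appear. The sketch below assumes rows carry hypothetical `reaction` and `locus` keys. A locus with no alleles in any sample has no rows, so it is never counted. That fits the note above and the difference here (24 vs. 25 loci in reaction 2), although this sketch cannot confirm which locus is missing.

```python
def loci_per_reaction(allele_rows):
    """Count distinct loci per reaction among rows present in an allele table.

    allele_rows: iterable of dicts with hypothetical "reaction" and "locus" keys.
    A locus with no alleles in any sample never appears, so it is not counted.
    """
    loci = {}
    for row in allele_rows:
        loci.setdefault(row["reaction"], set()).add(row["locus"])
    return {reaction: len(names) for reaction, names in sorted(loci.items())}


allele_rows = [
    {"sample": "S1", "reaction": 1, "locus": "locus_a"},
    {"sample": "S2", "reaction": 1, "locus": "locus_a"},  # same locus, counted once
    {"sample": "S1", "reaction": 1, "locus": "locus_b"},
    {"sample": "S1", "reaction": 2, "locus": "locus_c"},
]
print(loci_per_reaction(allele_rows))  # {1: 2, 2: 1}
```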
285 samples in allele data file
288 samples in manifest and sample coverage file
288 samples in manifest and amplicon coverage file
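The allele data file lists three fewer samples than the manifest (285 vs. 288). A set difference on sample IDs shows which samples are affected. The sketch assumes both ID lists are already loaded; the IDs below are illustrative. One plausible explanation, not confirmed here, is that a sample with no called alleles has no rows in the allele table.

```python
def compare_sample_sets(manifest_ids, allele_ids):
    """Report sample IDs present in only one of the two collections."""
    manifest, alleles = set(manifest_ids), set(allele_ids)
    return {
        "missing_from_alleles": sorted(manifest - alleles),
        "missing_from_manifest": sorted(alleles - manifest),
    }


manifest_ids = ["S1", "S2", "S3", "NC1"]  # hypothetical IDs
allele_ids = ["S1", "S3"]
print(compare_sample_sets(manifest_ids, allele_ids))
# {'missing_from_alleles': ['NC1', 'S2'], 'missing_from_manifest': []}
```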
Plate Layout
The plate layout provides a visual representation of sample organization, including the placement of positive and negative controls. This allows you to verify that controls are positioned correctly, which the later plots rely on to validate assay performance.
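The sample IDs in this run appear to end in a type tag and a well (for example `..._Pos_B04`, `..._NC1_Neg_F02`, `..._Sample_A12`). Under the assumption that this naming convention holds throughout, a text version of the plate layout can be sketched like this. The regular expression and helper names are hypothetical.

```python
import re

# Assumed naming convention: ..._<Pos|Neg|Sample>_<row letter A-H><2-digit column>
WELL_RE = re.compile(r"_(Pos|Neg|Sample)_([A-H])(\d{2})$")
ROWS = "ABCDEFGH"


def parse_sample_id(sample_id):
    """Return (type, row, column) from a sample ID, or None if it does not match."""
    m = WELL_RE.search(sample_id)
    if m is None:
        return None
    kind, row, col = m.group(1), m.group(2), int(m.group(3))
    return (kind, row, col) if 1 <= col <= 12 else None


def layout_grid(sample_ids):
    """Render a 96-well layout as 8 strings: P/N/S by sample type, '.' for empty wells."""
    grid = [["."] * 12 for _ in ROWS]
    for sid in sample_ids:
        parsed = parse_sample_id(sid)
        if parsed is not None:
            kind, row, col = parsed
            grid[ROWS.index(row)][col - 1] = kind[0]
    return ["".join(r) for r in grid]


ids = [
    "ZMZHF_MH_02_KL_2025-07-14_8076967381_Pos_B04",
    "ZMZHF_MH_02_KL_2025-07-14_NC1_Neg_F02",
    "ZMZHF_MH_02_KL_2025-07-14_8078139464_Sample_A12",
]
for line in layout_grid(ids):
    print(line)
```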
Primer Dimer Content
Here we visualise the proportion of sequencing reads that are classified as primer dimers, which occur when primers anneal to each other instead of the target DNA. This plot is useful for assessing the efficiency of the amplification process, as high primer dimer levels can indicate suboptimal reaction conditions, reduced sequencing efficiency, and potential issues with sample quality or reagent performance.
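The quantity on this plot is the share of input reads attributed to primer dimers. A minimal sketch follows, assuming per-sample input and primer-dimer read counts are already available. The dictionary below is hypothetical. Samples with no input reads are reported as None rather than dividing by zero.

```python
def primer_dimer_percent(input_reads, dimer_reads):
    """Percentage of a sample's input reads classified as primer dimers (None if no input)."""
    if input_reads == 0:
        return None
    return 100.0 * dimer_reads / input_reads


# Hypothetical per-sample counts: sample -> (input reads, primer-dimer reads)
counts = {"S1": (20000, 1000), "S2": (5000, 4000), "S3": (0, 0)}
pct = {s: primer_dimer_percent(*c) for s, c in counts.items()}
# Rank samples with the highest dimer fraction first, skipping samples with no input
worst = sorted((s for s in pct if pct[s] is not None), key=pct.get, reverse=True)
print(pct)    # {'S1': 5.0, 'S2': 80.0, 'S3': None}
print(worst)  # ['S2', 'S1']
```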
Balancing Across Batches
Here we show the distribution of total reads per sample across different batches, helping to assess whether sequencing depth is consistent. This plot is useful when you have multiple batches in the same sequencing run to identify imbalances in sequencing, which can arise due to variations in library preparation, loading efficiency, or sequencing conditions.
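A numeric companion to the swarm plot is a per-batch median of total reads, plus the ratio between the highest and lowest batch medians. The sketch assumes (batch, total_reads) pairs as input. The ratio is an illustrative summary only; the notebook does not define it, and it has no established cutoff.

```python
from statistics import median


def batch_read_summary(sample_reads):
    """Group (batch, total_reads) pairs and report sample count and median reads per batch."""
    by_batch = {}
    for batch, reads in sample_reads:
        by_batch.setdefault(batch, []).append(reads)
    return {b: {"n": len(r), "median": median(r)} for b, r in sorted(by_batch.items())}


def median_ratio(summary):
    """Highest / lowest batch median; values well above 1 suggest imbalance."""
    medians = [s["median"] for s in summary.values()]
    low = min(medians)
    return float("inf") if low == 0 else max(medians) / low


pairs = [("batch1", 100), ("batch1", 300), ("batch2", 50)]  # hypothetical data
summary = batch_read_summary(pairs)
print(summary)                # {'batch1': {'n': 2, 'median': 200.0}, 'batch2': {'n': 1, 'median': 50}}
print(median_ratio(summary))  # 4.0
```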
Control Summary
Polyclonality of Positive Controls
Here we inspect the positive controls to ensure that they perform as expected, without contamination or unwanted diversity. For example, if the control you included was monoclonal, you would expect few or no loci to be reported as having more than one allele (polyclonal). In some cases you may allow some level of false-positive detection within a monoclonal control; these filters (Minimum Reads per ASV and Minimum Allele Frequency per ASV) are set in the Setup section above.
| Sample | Locus | Allele (pseudoCIGAR) | Reads | Allele frequency | Alleles at locus | Category |
|---|---|---|---|---|---|---|
| ZMZHF_MH_02_KL_2025-07-14_8076967381_Pos_B04 | Pf3D7_08_v3-1375025-1375284-1B | 10G17T22G33D=TA39+8N58+8N66G67+53N132G134A136G139T151A168A195A | 259 | 0.1265885 | 2 | 2 Alleles |
| ZMZHF_MH_02_KL_2025-07-14_8076967381_Pos_B04 | Pf3D7_08_v3-1375025-1375284-1B | 39+8N58+8N67+53N | 1787 | 0.8734115 | 2 | 2 Alleles |
| ZMZHF_MH_02_KL_2025-07-14_8076967381_Pos_B04 | Pf3D7_13_v3-1146764-1147010-1A | 64C136C146+9N182+8N | 176 | 0.0112654 | 2 | 2 Alleles |
| ZMZHF_MH_02_KL_2025-07-14_8076967381_Pos_B04 | Pf3D7_13_v3-1146764-1147010-1A | 64C146+9N182+8N | 15447 | 0.9887346 | 2 | 2 Alleles |
| ZMZHF_MH_02_KL_2025-07-14_8076967387_Pos_E10 | Pf3D7_08_v3-1375025-1375284-1B | 10G17T22G33D=TA39+8N58+8N66G67+53N132G134A136G139T151A168A195A | 245 | 0.1440329 | 2 | 2 Alleles |
| ZMZHF_MH_02_KL_2025-07-14_8076967387_Pos_E10 | Pf3D7_08_v3-1375025-1375284-1B | 39+8N58+8N67+53N | 1456 | 0.8559671 | 2 | 2 Alleles |
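In the table above, each allele frequency equals that allele's reads divided by the total reads for its sample and locus (for example, 259 / (259 + 1787) ≈ 0.1265885). The sketch below applies that definition, then the two positive-control filters (minimum 5 reads, minimum frequency 0.01), then flags loci that keep more than one allele. Two details are assumptions and have not been checked against the notebook's code: frequencies are computed before filtering, and the comparisons are inclusive (>=). The allele labels and `hypothetical_locus` are illustrative. The sample key is shortened from the real ID.

```python
def polyclonal_loci(asv_rows, min_reads=5, min_freq=0.01):
    """Return {(sample, locus): [(allele, freq), ...]} for loci with >1 allele after filtering.

    asv_rows: iterable of (sample, locus, allele, reads). Frequency is reads divided by
    the sample/locus total, computed before filtering (assumed order of operations).
    """
    rows = list(asv_rows)  # iterated twice below
    totals = {}
    for sample, locus, _allele, reads in rows:
        totals[(sample, locus)] = totals.get((sample, locus), 0) + reads
    kept = {}
    for sample, locus, allele, reads in rows:
        total = totals[(sample, locus)]
        if total == 0:
            continue  # no reads at this locus, so frequency is undefined
        freq = reads / total
        if reads >= min_reads and freq >= min_freq:
            kept.setdefault((sample, locus), []).append((allele, freq))
    return {key: alleles for key, alleles in kept.items() if len(alleles) > 1}


rows = [
    ("Pos_B04", "Pf3D7_08_v3-1375025-1375284-1B", "allele_1", 259),
    ("Pos_B04", "Pf3D7_08_v3-1375025-1375284-1B", "allele_2", 1787),
    ("Pos_B04", "Pf3D7_13_v3-1146764-1147010-1A", "allele_1", 176),
    ("Pos_B04", "Pf3D7_13_v3-1146764-1147010-1A", "allele_2", 15447),
    ("Pos_B04", "hypothetical_locus", "allele_1", 5000),
    ("Pos_B04", "hypothetical_locus", "allele_2", 3),  # fails the 5-read filter
]
for key, alleles in polyclonal_loci(rows).items():
    print(key, [(a, round(f, 7)) for a, f in alleles])
```

With these inputs, the printed frequencies match the values in the table above.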
Negative Control Contamination
Read Summary per Negative Control
In an ideal scenario, negative controls should have minimal or no amplification. If a negative control shows a high number of targets amplified with significant reads, it suggests potential contamination. By plotting the number of reads against the number of targets for each sample, any outliers or unexpected amplification in negative controls can be easily flagged.
| negative_control_amplified_loci.csv |
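The per-control summary behind this plot can be sketched as total reads plus the number of loci above read_threshold for each negative control. Negative controls are identified by the `_Neg_` tag seen in this run's sample IDs (for example `..._NC1_Neg_F02`). That tag, the input tuples, and the use of read_threshold rather than "any reads" to define an amplified target are assumptions.

```python
def negative_control_summary(rows, read_threshold=100, neg_tag="_Neg_"):
    """Per negative control: (total reads, number of loci with more than read_threshold reads).

    rows: iterable of (sample_id, locus, reads); negatives are matched by neg_tag (assumed).
    """
    summary = {}
    for sample_id, _locus, reads in rows:
        if neg_tag not in sample_id:
            continue
        total, n_loci = summary.get(sample_id, (0, 0))
        summary[sample_id] = (total + reads, n_loci + int(reads > read_threshold))
    return summary


rows = [  # hypothetical reads
    ("run_NC1_Neg_F02", "locus_a", 200),
    ("run_NC1_Neg_F02", "locus_b", 37),
    ("run_NC2_Neg_D06", "locus_a", 0),
    ("run_S1_Sample_A01", "locus_a", 9999),  # not a negative control, ignored
]
print(negative_control_summary(rows))
# {'run_NC1_Neg_F02': (237, 1), 'run_NC2_Neg_D06': (0, 0)}
```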
Aggregated Reads per Target
This plot aggregates the total reads for each locus across all negative controls. In negative controls, if there is an unexpected spike in reads for specific loci, it could indicate contamination in the form of cross-sample contamination or environmental contamination. Analyzing these total summed reads helps pinpoint specific loci where contamination might have occurred, offering insights into which steps in the process might have introduced contaminants.
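The aggregation described here is a sum of reads per locus over negative controls only, sorted so the loci most likely to point at contamination come first. As in the other sketches, the `_Neg_` tag and the (sample_id, locus, reads) input shape are assumptions.

```python
def reads_per_target_in_negatives(rows, neg_tag="_Neg_"):
    """Sum reads per locus across negative controls, highest total first."""
    totals = {}
    for sample_id, locus, reads in rows:
        if neg_tag in sample_id:
            totals[locus] = totals.get(locus, 0) + reads
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


rows = [  # hypothetical reads
    ("run_NC1_Neg_F02", "locus_a", 120),
    ("run_NC2_Neg_D06", "locus_a", 80),
    ("run_NC1_Neg_F02", "locus_b", 15),
    ("run_S1_Sample_A01", "locus_b", 5000),  # ignored: not a negative control
]
print(reads_per_target_in_negatives(rows))  # [('locus_a', 200), ('locus_b', 15)]
```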
Successful Amplification
Parasitemia vs. Number of Successfully Amplified Loci
This plot examines the relationship between a sample’s parasitemia (qPCR-determined parasite load) and the number of loci that successfully amplified. This plot helps assess whether lower parasitemia samples struggle with amplification, which can indicate potential limitations in sensitivity.
Sample Reads vs. Number of Successfully Amplified Loci
This plot illustrates the relationship between the total number of reads per sample and the number of loci in that sample that passed the amplification threshold (read_threshold). It helps evaluate whether samples with higher read counts achieve better amplification success, and it can reveal issues such as insufficient sequencing depth or inefficient amplification. Ideally, you should see a positive correlation, with higher read counts corresponding to more successfully amplified loci.
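To put a number on "positive correlation", one option is Spearman's rank correlation. Read counts are usually heavily skewed, and a rank-based measure is unaffected by that skew. The notebook itself only plots the relationship. This stdlib sketch (ties get average ranks) is an illustrative addition, and the input values are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / (sx * sy)


reads = [120, 2864, 1697, 26785, 15000]  # hypothetical total reads per sample
loci = [3, 150, 120, 210, 205]           # hypothetical successfully amplified loci
print(round(spearman(reads, loci), 3))   # 1.0 (the two rankings agree exactly)
```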
| Sample | Batch | Reaction | Action | Reason | Reads |
|---|---|---|---|---|---|
| ZMZHF_MH_02_KL_2025-07-14_8078139464_Sample_A12 | ZMZHF_MH_02_KL_2025-07-14 | 2 | reprep | < reprep threshold 0.5 | 2864 |
| ZMZHF_MH_02_KL_2025-07-14_8078139520_Sample_A07 | ZMZHF_MH_02_KL_2025-07-14 | 1 | reprep | < reprep threshold 0.5 | 26785 |
| ZMZHF_MH_02_KL_2025-07-14_8078140039_Sample_H04 | ZMZHF_MH_02_KL_2025-07-14 | 2 | reprep | < reprep threshold 0.5 | 1697 |
| ZMZHF_MH_02_KL_2025-07-14_NC1_Neg_F02 | ZMZHF_MH_02_KL_2025-07-14 | 1 | reprep | < reprep threshold 0.5 | 237 |
| ZMZHF_MH_02_KL_2025-07-14_NC1_Neg_F02 | ZMZHF_MH_02_KL_2025-07-14 | 2 | reprep | < reprep threshold 0.5 | 19 |
| ZMZHF_MH_02_KL_2025-07-14_NC2_Neg_D06 | ZMZHF_MH_02_KL_2025-07-14 | 1 | reprep | < reprep threshold 0.5 | 0 |
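The exact rule behind these reprep calls is not shown in this report. One reading consistent with the thresholds (reprep 0.5, repool 0.75) and the "< reprep threshold 0.5" reason is sketched below, and it is an assumption throughout. A sample/reaction is judged by its fraction of successfully amplified loci out of nreactionloci. It is called "reprep" below 0.5, "repool" below 0.75, and "pass" otherwise. The loci counts come from the manual table earlier in this report.

```python
# Loci per reaction from the manual counts earlier in this report (pools D1.1, R1.2, R2.1).
NREACTIONLOCI = {1: 205, 2: 25}


def qc_call(n_amplified, reaction, reprep_threshold=0.5, repool_threshold=0.75,
            nreactionloci=NREACTIONLOCI):
    """Classify one sample/reaction by its fraction of successfully amplified loci.

    Assumed rule (not confirmed from the notebook's code): below reprep_threshold
    -> "reprep"; below repool_threshold -> "repool"; otherwise "pass".
    """
    fraction = n_amplified / nreactionloci[reaction]
    if fraction < reprep_threshold:
        return "reprep", fraction
    if fraction < repool_threshold:
        return "repool", fraction
    return "pass", fraction


print(qc_call(50, 1)[0])   # reprep (50 / 205 is about 0.24)
print(qc_call(13, 2)[0])   # repool (13 / 25 = 0.52)
print(qc_call(200, 1)[0])  # pass
```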
Amplification Plate Maps
Reads Heatmap
The reads heatmap provides a visual representation of read distribution across the plate, helping to identify inconsistencies in sequencing efficiency. This can highlight potential issues such as edge effects, batch effects, or pipetting errors that may impact data quality and interpretation.
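A heatmap like this needs the per-sample values arranged in an 8 x 12 plate grid. The sketch below assumes each sample ID ends in its well (for example `_A12`), as the IDs in this run appear to. It also applies a log10(reads + 1) transform, a common choice that keeps low-read wells distinguishable on a color scale; the notebook's actual scale is not stated. The sample IDs and values are hypothetical. Passing per-sample counts of successfully amplified loci instead of reads gives the grid for the successful amplification heatmap.

```python
import math
import re

ROWS = "ABCDEFGH"
WELL_RE = re.compile(r"_([A-H])(\d{2})$")  # assumed: sample IDs end in a well such as _A12


def plate_matrix(values_by_sample):
    """Place per-sample values into an 8 x 12 grid; wells with no sample stay None."""
    grid = [[None] * 12 for _ in ROWS]
    for sample_id, value in values_by_sample.items():
        m = WELL_RE.search(sample_id)
        if m and 1 <= int(m.group(2)) <= 12:
            grid[ROWS.index(m.group(1))][int(m.group(2)) - 1] = value
    return grid


def log10_reads(grid):
    """log10(reads + 1) per well, rounded to 2 decimals; empty wells stay None."""
    return [[None if v is None else round(math.log10(v + 1), 2) for v in row] for row in grid]


reads = {"demo_Sample_A12": 3000, "demo_Sample_H04": 1500, "demo_Neg_D06": 0}  # hypothetical
grid = plate_matrix(reads)
for row_label, row in zip(ROWS, log10_reads(grid)):
    print(row_label, row)
```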
Successful Amplification Heatmap
Here we visualise the success rate of amplification across the plate, allowing for the identification of poorly amplified regions or wells. This is useful for assessing the quality of the PCR process, ensuring that loci across all samples are adequately amplified, and helping to spot potential issues with specific wells or sample groups.
Save filtered allele table