QC Report

Samples were run with a reprep_threshold of 0.5 and a repool_threshold of 0.75.

This interactive notebook generates a summary of QC statistics to assess the success of a Mad4hatter run.

Required Inputs

To proceed, you must provide the results directory from the Mad4hatter pipeline, which should include the following files:

  • sample_coverage.txt
  • amplicon_coverage.txt
  • allele_data.txt

Additionally, a sample manifest is required. This file must contain the following fields:

  • SampleID – Unique identifier for each sample.
  • SampleType – Specifies whether the entry is a sample, positive control, or negative control.
  • Batch – Identifies a group of samples processed simultaneously by the same individual.
  • Column – The well column where the sample was placed in the plate.
  • Row – The well row where the sample was placed in the plate.
  • Parasitemia – The qPCR value for the sample.
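
The manifest requirements above can be checked before running the rest of the notebook. The following is a minimal Python sketch, not the notebook's own code; it assumes the manifest is a comma-separated file with a header row, and the helper name `missing_manifest_fields` is hypothetical.

```python
import csv
import io

# Field names taken from the manifest description above.
REQUIRED_FIELDS = ["SampleID", "SampleType", "Batch", "Column", "Row", "Parasitemia"]

def missing_manifest_fields(handle):
    """Return the required fields that are absent from a CSV manifest header."""
    reader = csv.DictReader(handle)
    present = set(reader.fieldnames or [])
    return [field for field in REQUIRED_FIELDS if field not in present]

# Hypothetical manifest that lacks the Parasitemia column.
example = io.StringIO("SampleID,SampleType,Batch,Column,Row\nS1,sample,B1,1,A\n")
print(missing_manifest_fields(example))
```

An empty list means every required field is present; for a real file, pass an open file handle instead of the in-memory example.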

QC Summaries

The notebook provides the following analyses:

  • Plate Layout: Location of samples and controls on the plate
  • Primer Dimer Content: Input reads vs. % attributed to primer dimers
  • Balancing Across Batches: Swarm plot of reads output by the pipeline by batch
  • Control Summary, including:
    • Polyclonality of Positive Controls
    • Read Summary for Negative Controls
    • Contamination Maps for Negative Controls
  • Successful Amplification summary, including:
    • Sample Reads vs. Number of Successfully Amplified Loci
    • Parasitemia vs. Number of Successfully Amplified Loci
  • Amplification Plate Maps, including:
    • Reads Heatmap
    • Successful Amplification Heatmap

Note: A locus is considered successfully amplified if it has more than read_threshold reads, where read_threshold is a threshold that can be set below.
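
The rule in the note can be sketched as follows. This is an illustrative Python version under stated assumptions: the input is a list of (sample, locus, reads) tuples, which is not necessarily the schema of amplicon_coverage.txt, and the function name is hypothetical.

```python
def count_amplified_loci(rows, read_threshold=100):
    """Count, per sample, the loci with strictly more than read_threshold reads.

    rows is an iterable of (sample_id, locus, reads) tuples. This layout is an
    assumption for illustration, not the exact schema of amplicon_coverage.txt.
    """
    counts = {}
    for sample_id, _locus, reads in rows:
        counts.setdefault(sample_id, 0)  # keep samples with zero amplified loci
        if reads > read_threshold:
            counts[sample_id] += 1
    return counts

# Hypothetical coverage rows for illustration only.
rows = [("S1", "locusA", 150), ("S1", "locusB", 100), ("S2", "locusA", 20)]
print(count_amplified_loci(rows))
```

Note that a locus with exactly read_threshold reads (locusB for S1 here) is not counted, because the definition requires more than read_threshold reads.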

Setup

User Input Summary
Results Directory: /Users/mallo/OneDrive/Desktop/Personal/Fall 2025/Berube/Sequence data/QC_ZMZHF_MDBG16_3 
Manifest File: C:/Users/mallo/OneDrive/Desktop/Personal/Fall 2025/Berube/Sequence data/manifest_QC_ZMZHF_MDBG16_3_v3.csv 
Read Threshold for Successful Amplification: 100 
Minimum Reads per ASV (Positive Control Filter): 5 
Minimum Allele Frequency per ASV (Positive Control Filter): 0.01 

Here we filter out two types of loci:

  • Loci targeting non-Plasmodium falciparum species, which are expected to amplify only if other species are present.

  • Long amplicons (>275 bp, including primers), which tend to underperform due to their length.

These loci should not be considered when assessing the success of a sequencing run.

Note: If you modify the filtered loci, be sure to update the loci counts below accordingly.
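
One way the two filters could be expressed is sketched below in Python. It rests on assumptions that the text does not confirm: that locus names follow a chromosome-start-end-pool pattern (as in names such as Pf3D7_08_v3-1375025-1375284-1B that appear later in this report), that P. falciparum loci can be recognised by a Pf3D7 prefix, and that end minus start gives the amplicon length including primers. The function names and the last two example loci are hypothetical.

```python
MAX_AMPLICON_LENGTH = 275  # bp, the cutoff stated in the text above

def parse_locus(name):
    """Split 'chrom-start-end-pool' into its parts (assumed naming scheme)."""
    chrom, start, end, pool = name.rsplit("-", 3)
    return chrom, int(start), int(end), pool

def keep_locus(name, pf_prefix="Pf3D7"):
    """Return True if the locus passes both filters described above."""
    chrom, start, end, _pool = parse_locus(name)
    is_falciparum = chrom.startswith(pf_prefix)  # assumption: prefix marks the species
    length = end - start  # assumption: coordinates span the amplicon with primers
    return is_falciparum and length <= MAX_AMPLICON_LENGTH

loci = [
    "Pf3D7_08_v3-1375025-1375284-1B",  # locus name from this report, 259 bp
    "Pf3D7_01_v3-1000-1300-1A",        # hypothetical long amplicon, 300 bp
    "PmUG01_12_v1-100-300-1A",         # hypothetical non-falciparum target
]
print([name for name in loci if keep_locus(name)])
```

If the actual list of filtered loci is maintained by hand in the notebook, that list takes precedence over any rule of this kind.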

Below are the loci counts per reaction, set with the expectation that you are running Mad4hatter pools D1.1, R1.2, and R2.1. The loci filtered out above are excluded from these counts.

  • If you are not using these pools, please update these numbers manually.
  • Alternatively, you can uncomment the section below to calculate loci counts directly from the allele_table. Note: If a locus fails to amplify in all samples, it will not be included in the counts when derived from the allele_table.

  reaction nreactionloci
1        1           205
2        2            25

Uncomment the following to count the number of loci per reaction based on the allele_data table.

  reaction nreactionloci
1        1           205
2        2            24
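
Deriving the counts from the allele data amounts to counting distinct loci per reaction. The Python sketch below illustrates this; it assumes that the reaction number is the first character of the pool suffix in the locus name (for example 1B means reaction 1), which the text does not state, and the last locus name is hypothetical.

```python
def loci_per_reaction(locus_names):
    """Count distinct loci per reaction from a list of observed locus names."""
    loci_by_reaction = {}
    for name in set(locus_names):
        pool = name.rsplit("-", 1)[1]
        reaction = int(pool[0])  # assumption: leading digit of the pool suffix
        loci_by_reaction.setdefault(reaction, set()).add(name)
    return {reaction: len(loci) for reaction, loci in sorted(loci_by_reaction.items())}

observed = [
    "Pf3D7_08_v3-1375025-1375284-1B",
    "Pf3D7_08_v3-1375025-1375284-1B",  # the same locus seen in another sample
    "Pf3D7_13_v3-1146764-1147010-1A",
    "Pf3D7_01_v3-1000-1200-2",         # hypothetical reaction 2 locus
]
print(loci_per_reaction(observed))
```

Because only observed loci are counted, a locus that never amplifies in any sample is absent from the result, which is the caveat given above for counts derived from the allele table.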

Sample Summary
288 samples in manifest 
285 samples in allele data file 
288 samples in manifest and sample coverage file 
288 samples in manifest and amplicon coverage file 
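
When the counts differ between the manifest and a results file, as they do here for the allele data file, a set comparison identifies the samples involved. This is a small Python sketch with hypothetical sample identifiers and a hypothetical function name.

```python
def compare_sample_sets(manifest_ids, data_ids):
    """Report which sample IDs are shared and which appear on one side only."""
    manifest_ids, data_ids = set(manifest_ids), set(data_ids)
    return {
        "in_both": sorted(manifest_ids & data_ids),
        "manifest_only": sorted(manifest_ids - data_ids),
        "data_only": sorted(data_ids - manifest_ids),
    }

# Hypothetical identifiers for illustration only.
result = compare_sample_sets(["S1", "S2", "S3"], ["S1", "S3"])
print(result["manifest_only"])
```

Samples listed under manifest_only have no rows in the results file, which can happen, for example, when a sample yields no alleles.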

Plate Layout

The plate layout provides a visual representation of sample organization, including the placement of positive and negative controls. This allows verification that controls are positioned correctly to validate assay performance with later plots.

Primer Dimer Content

Here we visualise the proportion of sequencing reads that are classified as primer dimers, which occur when primers anneal to each other instead of the target DNA. This plot is useful for assessing the efficiency of the amplification process, as high primer dimer levels can indicate suboptimal reaction conditions, reduced sequencing efficiency, and potential issues with sample quality or reagent performance.
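
The percentage plotted here can be thought of as the share of input reads lost to primer dimers. The Python sketch below shows that arithmetic; it assumes that a per-sample count of input reads and a count of reads remaining after dimer removal are both available, and it does not claim to reproduce how the pipeline itself defines these stages.

```python
def percent_primer_dimers(input_reads, non_dimer_reads):
    """Percentage of input reads attributed to primer dimers.

    Assumes non_dimer_reads is the number of reads left after dimer removal.
    Returns None when there are no input reads, to avoid dividing by zero.
    """
    if input_reads == 0:
        return None
    return 100.0 * (input_reads - non_dimer_reads) / input_reads

# Hypothetical counts: 10,000 input reads, 7,500 left after dimer removal.
print(percent_primer_dimers(10000, 7500))
```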

Balancing Across Batches

Here we show the distribution of total reads per sample across different batches, helping to assess whether sequencing depth is consistent. This plot is useful when you have multiple batches in the same sequencing run to identify imbalances in sequencing, which can arise due to variations in library preparation, loading efficiency, or sequencing conditions.

Control Summary

Polyclonality of Positive Controls

Here we inspect the positive controls to confirm that they perform as expected, without contamination or unwanted diversity. For example, if the control you included is monoclonal, you would expect few or no loci to be reported as having more than one allele (polyclonal). In some cases you may want to tolerate some level of false-positive detection within a monoclonal control; the minimum reads per ASV and minimum allele frequency per ASV filters listed in the User Input Summary are applied to the positive controls for this purpose.

SampleID                                      Locus                           PseudoCIGAR                                                     Reads  AlleleFreq  NumASVs_Meeting_Threshold  Category
ZMZHF_MH_02_KL_2025-07-14_8076967381_Pos_B04  Pf3D7_08_v3-1375025-1375284-1B  10G17T22G33D=TA39+8N58+8N66G67+53N132G134A136G139T151A168A195A  259    0.1265885   2                          2 Alleles
ZMZHF_MH_02_KL_2025-07-14_8076967381_Pos_B04  Pf3D7_08_v3-1375025-1375284-1B  39+8N58+8N67+53N                                                1787   0.8734115   2                          2 Alleles
ZMZHF_MH_02_KL_2025-07-14_8076967381_Pos_B04  Pf3D7_13_v3-1146764-1147010-1A  64C136C146+9N182+8N                                             176    0.0112654   2                          2 Alleles
ZMZHF_MH_02_KL_2025-07-14_8076967381_Pos_B04  Pf3D7_13_v3-1146764-1147010-1A  64C146+9N182+8N                                                 15447  0.9887346   2                          2 Alleles
ZMZHF_MH_02_KL_2025-07-14_8076967387_Pos_E10  Pf3D7_08_v3-1375025-1375284-1B  10G17T22G33D=TA39+8N58+8N66G67+53N132G134A136G139T151A168A195A  245    0.1440329   2                          2 Alleles
ZMZHF_MH_02_KL_2025-07-14_8076967387_Pos_E10  Pf3D7_08_v3-1375025-1375284-1B  39+8N58+8N67+53N                                                1456   0.8559671   2                          2 Alleles
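
The effect of the positive control filters can be illustrated with the values in the table above. The following Python sketch counts, per sample and locus, the ASVs that meet both minimums; the function name is hypothetical, and treating each minimum as inclusive (greater than or equal to) is an assumption.

```python
def alleles_per_locus(asvs, min_reads=5, min_freq=0.01):
    """Count ASVs per (sample, locus) that meet both minimums (assumed inclusive)."""
    counts = {}
    for sample_id, locus, reads, freq in asvs:
        if reads >= min_reads and freq >= min_freq:
            key = (sample_id, locus)
            counts[key] = counts.get(key, 0) + 1
    return counts

# Reads and allele frequencies for control B04, copied from the table above.
b04 = [
    ("B04", "Pf3D7_08_v3-1375025-1375284-1B", 259, 0.1265885),
    ("B04", "Pf3D7_08_v3-1375025-1375284-1B", 1787, 0.8734115),
    ("B04", "Pf3D7_13_v3-1146764-1147010-1A", 176, 0.0112654),
    ("B04", "Pf3D7_13_v3-1146764-1147010-1A", 15447, 0.9887346),
]
print(alleles_per_locus(b04))                  # both loci retain 2 alleles
print(alleles_per_locus(b04, min_freq=0.02))   # the 176-read ASV is dropped
```

With the settings used in this run (5 reads, 0.01 frequency) both loci are reported as having two alleles. Raising the frequency minimum to 0.02 would leave the Pf3D7_13 locus with a single allele, while the Pf3D7_08 locus would still show two.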

Negative Control Contamination

Read Summary per Negative Control

In an ideal scenario, negative controls should have minimal or no amplification. If a negative control shows a high number of targets amplified with significant reads, it suggests potential contamination. By plotting the number of reads against the number of targets for each sample, any outliers or unexpected amplification in negative controls can be easily flagged.


negative_control_amplified_loci.csv

Aggregated Reads per Target

This plot aggregates the total reads for each locus across all negative controls. In negative controls, if there is an unexpected spike in reads for specific loci, it could indicate contamination in the form of cross-sample contamination or environmental contamination. Analyzing these total summed reads helps pinpoint specific loci where contamination might have occurred, offering insights into which steps in the process might have introduced contaminants.
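
The aggregation behind this plot is a sum of reads per locus over all negative controls. A minimal Python sketch follows, using hypothetical sample names, locus names, and read counts.

```python
from collections import Counter

def aggregate_reads_per_locus(rows):
    """Sum reads per locus over all rows and sort from most to fewest reads.

    rows is an iterable of (sample_id, locus, reads) tuples restricted to
    negative controls; this layout is an assumption for illustration.
    """
    totals = Counter()
    for _sample_id, locus, reads in rows:
        totals[locus] += reads
    return totals.most_common()

# Hypothetical negative control rows.
neg_rows = [("NC1", "locusA", 200), ("NC2", "locusA", 37), ("NC1", "locusB", 19)]
print(aggregate_reads_per_locus(neg_rows))
```

Loci at the top of the sorted list are the first candidates to inspect for contamination.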

Successful Amplification

Parasitemia vs. Number of Successfully Amplified Loci

This plot examines the relationship between a sample’s parasitemia (qPCR-determined parasite load) and the number of loci that successfully amplified. This plot helps assess whether lower parasitemia samples struggle with amplification, which can indicate potential limitations in sensitivity.

Sample Reads vs. Number of Successfully Amplified Loci

This illustrates the relationship between the total number of reads per sample and the number of loci that passed the amplification threshold for that sample. This plot helps evaluate whether samples with higher read counts achieve better amplification success and can reveal potential issues such as insufficient sequencing depth or inefficient amplification. Ideally, a positive correlation should be observed, where higher read counts result in more successfully amplified loci.

SampleID                                         Batch                      reaction  status  reason                  reads_per_reaction
ZMZHF_MH_02_KL_2025-07-14_8078139464_Sample_A12  ZMZHF_MH_02_KL_2025-07-14  2         reprep  < reprep threshold 0.5  2864
ZMZHF_MH_02_KL_2025-07-14_8078139520_Sample_A07  ZMZHF_MH_02_KL_2025-07-14  1         reprep  < reprep threshold 0.5  26785
ZMZHF_MH_02_KL_2025-07-14_8078140039_Sample_H04  ZMZHF_MH_02_KL_2025-07-14  2         reprep  < reprep threshold 0.5  1697
ZMZHF_MH_02_KL_2025-07-14_NC1_Neg_F02            ZMZHF_MH_02_KL_2025-07-14  1         reprep  < reprep threshold 0.5  237
ZMZHF_MH_02_KL_2025-07-14_NC1_Neg_F02            ZMZHF_MH_02_KL_2025-07-14  2         reprep  < reprep threshold 0.5  19
ZMZHF_MH_02_KL_2025-07-14_NC2_Neg_D06            ZMZHF_MH_02_KL_2025-07-14  1         reprep  < reprep threshold 0.5  0
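
The report does not spell out how the status column is derived. One plausible reading, sketched below in Python, is that each reaction is classified by the fraction of its loci that amplified successfully, compared against the reprep and repool thresholds. This interpretation, the function name, and the "pass" label are all assumptions.

```python
def classify_reaction(n_amplified, n_reaction_loci,
                      reprep_threshold=0.5, repool_threshold=0.75):
    """Classify one sample and reaction by its fraction of amplified loci.

    Assumed rule: below reprep_threshold -> "reprep"; otherwise below
    repool_threshold -> "repool"; otherwise "pass" (hypothetical label).
    """
    fraction = n_amplified / n_reaction_loci
    if fraction < reprep_threshold:
        return "reprep"
    if fraction < repool_threshold:
        return "repool"
    return "pass"

# Loci counts per reaction (205 and 25) are the values listed in this report;
# the numbers of amplified loci are hypothetical.
print(classify_reaction(10, 25))    # 0.40 of reaction 2 loci
print(classify_reaction(130, 205))  # about 0.63 of reaction 1 loci
print(classify_reaction(160, 205))  # about 0.78 of reaction 1 loci
```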

Amplification Plate Maps

Reads Heatmap

The reads heatmap provides a visual representation of read distribution across the plate, helping to identify inconsistencies in sequencing efficiency. This can highlight potential issues such as edge effects, batch effects, or pipetting errors that may impact data quality and interpretation.
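
A heatmap of this kind starts from a grid of values indexed by the manifest's Row and Column fields. The Python sketch below builds such a grid for a single plate; the 96-well layout (rows A to H, columns 1 to 12), the function name, and the read counts are assumptions for illustration.

```python
ROWS = "ABCDEFGH"       # assumed 96-well plate rows
COLUMNS = range(1, 13)  # assumed 96-well plate columns

def plate_grid(wells):
    """Arrange (row_letter, column_number, reads) tuples into an 8 x 12 grid.

    Wells without a sample stay as None so they can be drawn as empty.
    """
    grid = [[None] * len(COLUMNS) for _ in ROWS]
    for row, column, reads in wells:
        grid[ROWS.index(row)][column - 1] = reads
    return grid

# Hypothetical wells and read counts.
grid = plate_grid([("A", 12, 2864), ("H", 4, 1697)])
print(grid[0][11], grid[7][3])
```

The resulting nested list can be passed to any plotting library that accepts a two-dimensional array; runs with several plates would need one grid per plate.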

Successful Amplification Heatmap

Here we visualise the success rate of amplification across the plate, allowing for the identification of poorly amplified regions or wells. This is useful for assessing the quality of the PCR process, ensuring that loci across all samples are adequately amplified, and helping to spot potential issues with specific wells or sample groups.

Save filtered allele table