ISFG, Washington, DC, August 30th 2022

Learning Outcomes

  • Understanding the possibilities with STR-validator.
  • Experimental design and analysis according to ENFSI recommendations and SWGDAM guidelines.
  • Documentation and interpretation of the validation result.
  • Demonstration of automated analysis and report generation using R markdown.
  • Useful learning resources for the R universe.

Workshop Schedule

14:00 Workshop begins

  • Introduction to STR-validator, R, RStudio, R markdown
  • Validation and guidelines
  • Analytical Thresholds (Exercise)

15:30-16:00 COFFEE BREAK

  • Allele Sizing Precision (Exercise)

  • Peak Balance (Exercise)

  • Stutter Ratios (Exercise)

  • Demonstration of automated analysis using R markdown

18:00 Workshop ends

Questions about STR-validator

How many of you have:

  • Installed strvalidator?
  • Previously used STR-validator?
  • Validated/verified an STR kit?

Questions about R and programming

How many of you have:

  • Used RStudio?
  • Written your own functions or analysed data in R/Rgui/RStudio?
  • Programmed in another language?

Introduction to STR-validator and R

What is STR-validator?

  • Easy-to-use graphical user interface on top of R
  • Aims to simplify, standardize, and streamline validation
  • Process control and contamination monitoring
  • Free and open source software with high quality assurance standards
  • Partly funded by the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n°285487 (EUROFORGEN-NoE)

STR-validator website

https://sites.google.com/site/forensicapps/strvalidator

STR-validator paper

  • Overview of core functions and motivating examples
  • Explains the design philosophy and technical details
  • Use this reference to cite STR-validator

What is R?

  • Advanced system for statistical calculations and graphics
  • Programming language
  • Platform for specialized ‘packages’

What is RStudio?

  • Integrated development environment (IDE) for R
  • Greatly simplifies the use of R
  • Free and open source

Hands-On Programming with R

R for Data Science

  • Learn the basics of R and the tools of the modern R universe
  • Freely available online: http://r4ds.had.co.nz/

Advanced R

  • The book is designed primarily for R users who want to improve their programming skills and understanding of the language.
  • Freely available online: https://adv-r.hadley.nz/index.html

Efficient R Programming

Mastering Software Development in R

  • This book is designed to be used in conjunction with the course sequence Mastering Software Development in R, available on Coursera.
  • Freely available online: https://bookdown.org/rdpeng/RProgDA/

What is R Markdown?

R Markdown is a powerful tool for combining analysis and reporting into the same document. Turn your analyses into high quality documents, reports, presentations and dashboards. https://rmarkdown.rstudio.com/

R Markdown Cookbook

R Markdown: The Definitive Guide

bookdown: Authoring Books and Technical Documents with R Markdown

STR-validator

Demonstration using RStudio

  • Command line functions

  • GUI-wrapped functions

  • STR-validator main GUI

  • Quick introduction to RStudio

Possibly followed by installation and a break.

Why Validate?

SWGDAM Validation Guidelines for Forensic DNA Analysis Methods

Validation is a process by which a procedure is evaluated to determine its efficacy and reliability for forensic casework and/or database analysis.

  • Determination of conditions and limitations of a new or novel DNA methodology for use in forensic genetics
  • Demonstration that established methods and procedures perform as expected in the laboratory

Why Validate?

To interpret results, it is necessary to characterize loci by their key features:

  • Stutter ratio
  • Heterozygote peak balance and inter-locus balance
  • Stochastic threshold

Validation Guidelines

The European Network of Forensic Science Institutes

http://enfsi.eu/documents/forensic-guidelines/

Analytical Thresholds

The laboratory should establish an analytical threshold (AT) based on signal-to-noise analyses of internally derived empirical data.
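
One common signal-to-noise approach sets the AT at the mean noise height plus k standard deviations of the noise observed in negative controls. The sketch below is in Python for illustration only (the workshop exercise itself uses STR-validator in R). The function name `analytical_threshold`, the choice k = 3, and the example peak heights are assumptions, not values from the workshop data.

```python
from statistics import mean, stdev

def analytical_threshold(noise_rfu, k=3.0):
    """Return mean + k * sample SD of noise peak heights (RFU)."""
    if len(noise_rfu) < 2:
        raise ValueError("need at least two noise peaks to estimate the SD")
    return mean(noise_rfu) + k * stdev(noise_rfu)

# Hypothetical noise peak heights (RFU) from one dye channel of negative controls.
blue_noise = [3, 5, 4, 6, 2, 4, 5, 3]
at_blue = analytical_threshold(blue_noise)
print(f"AT (blue): {at_blue:.1f} RFU")
```

Running the function once per dye channel gives one threshold per colour. The result is only as good as the assumption that the noise is roughly normally distributed.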

STR-validator Exercise

Estimate Analytical Thresholds (~10 min)

Experimental Design

  • 8 individual negative PCR controls
  • Amplified using SureID 27comp with 29 PCR cycles
  • No stutter filter and all global settings were set to 0.0
  • Peak Amplitude Threshold was set to 1 RFU

Note: The GeneMapper option “Use Normalization, if applicable” is likely to affect the calculations because it rescales the peak heights.

Instructions for exercises: https://rpubs.com/OskarHansson/935925

References

Analytical Thresholds

  • Assumptions should apply to the observed distributions
  • Six methods to determine analytical thresholds compared
  • Errors are common when stating the confidence
  • AT is not synonymous with sensitivity
  • Color specific thresholds are recommended

  • Evaluates log-normal, Gaussian (normal), and gamma distributions
  • Log-normal distribution best describes the data
  • Gaussian distribution class gives the worst fit
  • Remove N-2 stutters as they impact the noise
  • Estimate AT from data with realistic DNA amounts
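
As a rough illustration of a threshold based on a log-normal noise model, the Python sketch below fits a normal distribution to the log-transformed noise heights and back-transforms an upper quantile. It is not the procedure of the study summarised above. The function name `lognormal_threshold`, the default coverage of 0.999, and the example heights are assumptions made for this example.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev

def lognormal_threshold(noise_rfu, coverage=0.999):
    """Upper quantile of a log-normal fitted to noise peak heights (RFU).

    All heights must be > 0 so that the logarithm is defined.
    """
    logs = [log(h) for h in noise_rfu]
    z = NormalDist().inv_cdf(coverage)  # standard normal quantile
    return exp(mean(logs) + z * stdev(logs))

# Hypothetical noise peak heights (RFU) from one dye channel.
noise = [2, 3, 3, 4, 4, 5, 6, 8]
print(f"AT at 99.9% coverage: {lognormal_threshold(noise):.1f} RFU")
```

A higher coverage lowers the expected rate of noise peaks above the threshold, at the cost of a higher threshold.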

  • Multiple ATs were tested: 3 calculated from negative samples, 1 from RFU signal to DNA input relationship, 3 fixed at 50, 150 and 200 RFU
  • True positive vs false positive rates were evaluated
  • Color-specific ATs from baseline signal had the lowest total error
  • Arbitrary ATs had high incidences of allelic drop-out

Allele Sizing Precision

ENFSI Guideline

The precision of the instrument should be such that all measured alleles fall within a ± 0.5 bp window around the measured size for the corresponding allele in the allelic ladder.

Allele Sizing Precision

Advanced Topics in Forensic DNA Typing: Interpretation

The standard deviation (SD) should be <0.15 bp so that 3 SD will be <0.5 bp. Then >99% of the time alleles differing by a single base pair can be distinguished from one another.
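
The arithmetic behind this rule is 3 × 0.15 bp = 0.45 bp, which is below the 0.5 bp window. If sizing errors are approximately normal, about 99.7% of measurements fall within ±3 SD. The Python sketch below checks the SD criterion for repeated measurements of one ladder allele. The function name `sizing_precision` and the example sizes are hypothetical.

```python
from statistics import stdev

def sizing_precision(sizes_bp, max_sd=0.15):
    """Return (sd, passes) for repeated size measurements of one allele."""
    sd = stdev(sizes_bp)  # sample standard deviation in bp
    return sd, sd < max_sd

# Hypothetical sizes (bp) of the same ladder allele across four injections.
sizes = [120.02, 120.05, 119.98, 120.03]
sd, ok = sizing_precision(sizes)
print(f"SD = {sd:.3f} bp, 3 SD = {3 * sd:.3f} bp, passes: {ok}")
```

In practice the check would be repeated for every allele in the ladder, since precision can vary with fragment size.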

STR-validator Exercise

Allele Sizing Precision (~15 min)

Experimental Design

  • 4 allelic ladders

The dataset in this exercise comes from 4 allelic ladders. Ideally, a full plate should be set up to maximize the data collected and to capture any variation between injections. To check whether a ‘bad’ allelic ladder was selected as “Allelic Ladder”, the data can be analyzed in GeneMapper multiple times, with different wells assigned as “Allelic Ladder”.

Instructions for exercises: https://rpubs.com/OskarHansson/935925

References

Allele Sizing Precision

  • The largest allele sizes yield the greatest standard deviation
  • 3500 and 3500xL Genetic Analyzers: SD <0.07 bases
  • 3130 and 3130xl Genetic Analyzers: SD <0.1 bases
  • Precision is sufficient to distinguish alleles that differ by 1 base

Peak Balance

The peak balance ratios of heterozygote alleles within a locus (intra-locus balance) and of alleles between all loci (inter-locus or profile balance) should be >60% for good quality samples.

Calculation of Heterozygote Balance (intra-locus)
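
Heterozygote balance (Hb) is generally expressed as a ratio of the two allele peak heights at a locus. Two conventions are in common use: smaller peak over larger peak, which is always at most 1 and can be compared directly with a 60% criterion, and high molecular weight (HMW) peak over low molecular weight (LMW) peak, which keeps the direction of the imbalance. The Python sketch below shows both. The function names and example heights are assumptions for illustration.

```python
def hb_smaller_over_larger(height_a, height_b):
    """Hb as smaller peak height divided by larger peak height (0 < Hb <= 1)."""
    if height_a <= 0 or height_b <= 0:
        raise ValueError("peak heights must be positive")
    return min(height_a, height_b) / max(height_a, height_b)

def hb_hmw_over_lmw(lmw_height, hmw_height):
    """Hb as HMW peak height divided by LMW peak height (may exceed 1)."""
    if lmw_height <= 0 or hmw_height <= 0:
        raise ValueError("peak heights must be positive")
    return hmw_height / lmw_height

# Hypothetical heterozygous locus: LMW allele 1000 RFU, HMW allele 800 RFU.
print(hb_smaller_over_larger(1000, 800))
print(hb_hmw_over_lmw(lmw_height=1000, hmw_height=800))
```

Whichever convention is chosen, it should be stated in the validation report, because the two give different values whenever the HMW peak is the taller one.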

Calculation of Profile Balance (inter-locus)
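
Profile balance can be defined in several ways. One simple option, sketched below in Python, sums the peak heights per locus and divides each sum by the largest locus sum, so that the best-performing locus scores 1 and the lowest score can be compared with a 60% criterion. This normalisation, the function name `locus_balance`, and the example heights are assumptions for illustration, not necessarily the definition used in the workshop.

```python
def locus_balance(locus_heights):
    """Map each locus to (sum of its peak heights) / (largest locus sum)."""
    sums = {locus: sum(heights) for locus, heights in locus_heights.items()}
    largest = max(sums.values())
    if largest <= 0:
        raise ValueError("at least one locus must have a positive peak height")
    return {locus: total / largest for locus, total in sums.items()}

# Hypothetical profile: peak heights (RFU) per locus; TH01 is homozygous.
profile = {"D3S1358": [900, 1000], "TH01": [1500], "FGA": [700, 720]}
balance = locus_balance(profile)
for locus, value in balance.items():
    print(f"{locus}: {value:.2f}")
```

An alternative is to report each locus as a proportion of the total profile peak height, which sums to 1 across loci but is not directly comparable with a fixed percentage criterion.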

STR-validator Exercise

Peak Balance (~15 min)

References

Peak Balance

  • Heterozygotes that are separated by one repeat unit will be affected by stuttering of the HMW allele onto the LMW allele
  • Hb vs. difference in repeat number shows the expected slight downward trend
  • Hb is more variable at low average peak height

  • Relative parameters stutter, heterozygote balance, and mixture proportion were very similar between the two instrument models
  • Interpretation guidelines developed on one machine are likely to be transportable to different CE instrument models and different machines of the same model

  • Hb variance starts to decrease at very low amounts of DNA
  • Pristine diluted DNA is an acceptable approximation in validations to infer Hb
  • Sampling effects are illustrated in the supplement

Stutter Ratios

Stutters

  • Stutters are well-characterized PCR artefacts caused by strand slippage during the replication process

  • Stutter ratios should be characterized to indicate when a stutter can be ignored as an artefact of the PCR

Calculation of Stutter Ratios
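
The stutter ratio is conventionally the height of the stutter peak divided by the height of its parent allele peak. The Python sketch below computes the ratio for a few observations at one locus and summarises them. The function name `stutter_ratio` and the example heights are hypothetical and do not come from the workshop dataset.

```python
from statistics import mean

def stutter_ratio(stutter_height, parent_height):
    """Return stutter peak height divided by parent allele peak height."""
    if parent_height <= 0:
        raise ValueError("parent peak height must be positive")
    return stutter_height / parent_height

# Hypothetical (stutter RFU, parent RFU) pairs for -1 repeat stutter at one locus.
observations = [(80, 1000), (95, 1100), (60, 900)]
ratios = [stutter_ratio(s, p) for s, p in observations]
print(f"mean = {mean(ratios):.3f}, max = {max(ratios):.3f}")
```

Ratios are usually summarised per locus and per stutter type, because both strongly influence the expected value.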

STR-validator Exercise

Stutter Ratios (~10 min)

Experimental Design

  • 85 reference samples
  • Amplified using 29 cycles

Instructions for exercises: https://rpubs.com/OskarHansson/935925

References

Stutter Ratios

  • Confirms characteristics using synthetic oligonucleotides
  • Linear relationship between stutter ratio and repeat number
  • Increased A–T content increases stutter ratio
  • Interruptions in repeating sequences decreased stutter ratios

  • Characterisation of +1 and -2 repeat stutters
  • Guidelines for interpretation

  • Length of uniform repeats vs. total number of repeats
  • Stutter ratios correlate with length of uniform repeats (LUS)

Thanks for Your Attention!

Sign Up for STR-validator News: oskhan@ous-hf.no