2025-06-07

Introduction to Forensic DNA Analysis

  • Forensic DNA analysis is used to identify individuals by their genetic profiles.
    It involves collecting DNA from evidence at a crime scene (like blood, hair, or skin cells) and comparing it to DNA from a suspect or a known individual. Because every person (except identical twins) has a unique DNA sequence, this technique can strongly link a person to a crime or exclude them.

  • DNA profiles are constructed using Short Tandem Repeats (STRs). STRs are small, repeating sequences in the genome that vary in length between individuals. Forensic labs analyze specific STR locations (loci) that are known to be highly variable in the population. These STR patterns make up a person’s DNA profile.

  • A match means the suspect and crime scene samples have the same STR alleles. When two DNA profiles have the same alleles (STR lengths) at all examined loci, it’s considered a “match.” But because a match can occur by chance in rare cases, statistics are needed to determine how likely that is and how strong the evidence really is.
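
The matching rule described above can be sketched in a few lines of Python. This is only an illustration: the locus names are real STR markers, but the allele values, the `normalize` and `profiles_match` helpers, and the dictionary layout are assumptions made for the example, not a forensic lab's actual software.

```python
# Hypothetical STR profiles: locus name -> pair of allele repeat counts.
# The allele values below are invented for illustration.

def normalize(profile):
    """Sort each allele pair so (17, 15) and (15, 17) compare as equal."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}

def profiles_match(evidence, reference):
    """Return True only if both profiles cover the same loci and
    carry the same alleles at every one of them."""
    ev, ref = normalize(evidence), normalize(reference)
    if ev.keys() != ref.keys():
        return False
    return all(ev[locus] == ref[locus] for locus in ev)

evidence = {"D3S1358": (15, 17), "vWA": (16, 16), "FGA": (21, 24)}
suspect_a = {"D3S1358": (17, 15), "vWA": (16, 16), "FGA": (21, 24)}
suspect_b = {"D3S1358": (15, 17), "vWA": (16, 18), "FGA": (21, 24)}

print(profiles_match(evidence, suspect_a))  # True: same alleles at all loci
print(profiles_match(evidence, suspect_b))  # False: differs at vWA
```

A single differing locus is enough to exclude a person in this sketch, which mirrors the "same alleles at all examined loci" definition of a match.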

Role of Statistics in DNA Evidence

  • DNA matching is not absolute; it’s based on probability.
    Even if two DNA samples appear to match, there’s always some chance the match occurred randomly. We can’t say with 100% certainty that two profiles come from the same person, only how likely or unlikely a coincidental match is.

  • Random Match Probability (RMP) estimates how likely a random person matches.
    RMP tells us how often someone from the general population would randomly match the DNA profile from the crime scene. For example, an RMP of 1 in 10,000 means one person out of 10,000 would be expected to have the same DNA profile by chance.

  • Statistics is used to test hypotheses about identity.
    We use hypothesis testing to decide whether the DNA match is statistically significant or likely to be coincidental. By comparing observed DNA data to population frequency data, we can test whether a suspect is likely the source of the DNA found at the scene.
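
One common way to arrive at an RMP is the product rule: multiply the population frequency of the observed genotype at each locus, assuming the loci are independent. The sketch below uses invented per-locus frequencies and hypothetical function names, chosen so the result lands on the 1 in 10,000 figure used in these slides.

```python
import math

def random_match_probability(genotype_freqs):
    """Product rule: multiply per-locus genotype frequencies.
    Assumes the loci are statistically independent."""
    return math.prod(genotype_freqs)

def expected_coincidental_matches(rmp, population_size):
    """Expected number of unrelated people in a population who would
    share the profile by chance."""
    return rmp * population_size

# Hypothetical genotype frequencies at four loci (illustrative values only).
locus_freqs = [0.1, 0.05, 0.2, 0.1]

rmp = random_match_probability(locus_freqs)
print(f"RMP = {rmp:.6f}")
print(f"Expected matches in 1,000,000 people: "
      f"{expected_coincidental_matches(rmp, 1_000_000):.0f}")
```

With these made-up numbers the RMP works out to about 1 in 10,000, so roughly 100 people in a population of one million would be expected to share the profile by chance.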

Hypothesis Testing Basics

\[ \begin{aligned} H_0 &: \text{DNA samples come from different individuals} \\ H_a &: \text{DNA samples come from the same individual} \end{aligned} \]

  • A low p-value is strong evidence against \(H_0\).
  • The test helps determine whether a match is random or meaningful.

DNA Match Example

  • Let’s say the DNA profile match probability is 1 in 10,000.

  • Then the p-value is 0.0001: \[ p = P(\text{random match}) = 0.0001 \]

  • A very small p-value → strong evidence against \(H_0\), i.e. that the samples are from the same person.
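
Written out as arithmetic, and assuming a conventional significance level of \(\alpha = 0.05\) (the slides do not specify one), the decision in this example looks like:

\[ p = \frac{1}{10{,}000} = 10^{-4} = 0.0001 < \alpha = 0.05 \;\Rightarrow\; \text{reject } H_0 \]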

Allele Frequency Distribution (ggplot)

What the Allele Frequency Chart tells us

  • The chart shows how often each allele occurs in the population, with Allele A being the most common and Allele E the rarest.
  • Allele E’s rarity matters: if a suspect’s DNA and the crime scene DNA share this rare allele, the match is statistically less likely to be a coincidence.
  • This frequency data is used to calculate Random Match Probability (RMP) and likelihood ratios in hypothesis testing.
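
To see why a rare allele carries more weight, allele frequencies can be turned into expected genotype frequencies using the Hardy-Weinberg formulas (p² for two copies of the same allele, 2pq for two different alleles). The frequencies below are invented stand-ins for the chart's values, and `genotype_frequency` is a hypothetical helper written for this example.

```python
# Hypothetical allele frequencies at one locus (NOT the chart's actual values).
allele_freqs = {"A": 0.40, "B": 0.25, "C": 0.20, "D": 0.10, "E": 0.05}

def genotype_frequency(allele_1, allele_2, freqs):
    """Expected genotype frequency under Hardy-Weinberg equilibrium:
    p^2 for a homozygote, 2pq for a heterozygote."""
    p, q = freqs[allele_1], freqs[allele_2]
    return p * p if allele_1 == allele_2 else 2 * p * q

common = genotype_frequency("A", "A", allele_freqs)  # two copies of A
mixed = genotype_frequency("A", "E", allele_freqs)   # one A, one E
rare = genotype_frequency("E", "E", allele_freqs)    # two copies of E

print(f"A/A: {common:.4f}")
print(f"A/E: {mixed:.4f}")
print(f"E/E: {rare:.4f}")
```

Under these assumed frequencies, a genotype built from the rare allele E is expected in far fewer people than one built from the common allele A, so a match on it is harder to explain as coincidence.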

Comparing Match vs Random Match Likelihoods

Match vs Random Match Likelihoods

  • As the chart shows, the chance that someone who was not involved in the crime matches the crime scene DNA at random is far lower than the chance of a match from the true source. It is not zero, however, because some alleles are common in the population.
  • So even a full DNA profile cannot rule out a coincidental match entirely; it can only show that one is very unlikely.
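
The comparison between the two likelihoods is often summarized as a likelihood ratio: the probability of seeing the match if the suspect is the source, divided by the probability of seeing it if an unrelated person is the source (the RMP). The sketch below assumes a true source always produces a match (probability 1, ignoring lab error and degraded samples); the function name and default value are hypothetical.

```python
def likelihood_ratio(rmp, p_match_if_same_source=1.0):
    """LR = P(match | same source) / P(match | different source).
    Assumes, by default, that a true source always matches."""
    if not 0 < rmp <= 1:
        raise ValueError("rmp must be in the interval (0, 1]")
    return p_match_if_same_source / rmp

# RMP of 1 in 10,000, as in the earlier example.
lr = likelihood_ratio(1 / 10_000)
print(f"The match is about {lr:,.0f} times more likely "
      f"if the suspect is the source than if a random person is.")
```

A larger ratio means stronger support for the same-source explanation, but as long as the RMP is above zero the ratio stays finite, which is the "unlikely but not impossible" point made above.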

In Conclusion

  • Hypothesis testing and statistics are essential tools in forensic DNA analysis and in forensic science more broadly. DNA analysis is one of many forensic tasks in which statistical analysis plays a central role.