2025-06-08

Hardy Weinberg Equilibrium - Using Statistics to Understand Evolution

  • Hardy Weinberg Equilibrium is a concept in genetics which states that allele frequencies within a given population will remain constant if there are no evolutionary forces acting on that population.

  • This is rarely, if ever, observed in reality, but this principle allows geneticists to predict allele frequencies under equilibrium assumptions and compare observed allele frequencies to these expected values. This can provide valuable information regarding the evolutionary forces that are acting on a population.

Fundamentals of Hardy Weinberg - Part 1

For a gene with two alleles whose frequencies are \(p\) and \(q\) (so that \(p + q = 1\)), the Hardy Weinberg equation is:

\(p^2 + 2pq + q^2 = 1\)

When working with genotypes of two alleles, \(p^2\) is the frequency of the homozygous dominant genotype, \(2pq\) is the frequency of the heterozygous genotype, and \(q^2\) is the frequency of the homozygous recessive genotype. When working with more than two alleles, we must generalize the equation:

\(p^2 + q^2 + r^2 + 2pq + 2pr + 2qr + ... = 1\)
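As a rough illustration of the generalized equation, the sketch below builds the expected genotype frequencies for any number of alleles and checks that they sum to 1. The function name hw_expected_genotypes and the three example frequencies are invented for this illustration and are not part of the Sturgeon analysis.

```r
# Expected genotype frequencies under Hardy Weinberg for k alleles.
# `freqs` is a named numeric vector of allele frequencies that sums to 1.
hw_expected_genotypes <- function(freqs) {
  stopifnot(!is.null(names(freqs)), all(freqs >= 0), abs(sum(freqs) - 1) < 1e-8)
  # Cell [i, j] of the outer product is freq_i * freq_j.
  m <- outer(freqs, freqs)
  # Keep the diagonal (homozygotes) and the upper triangle (heterozygotes).
  idx <- which(upper.tri(m, diag = TRUE), arr.ind = TRUE)
  # Homozygotes keep freq^2; each heterozygote appears twice in the full
  # table, so its upper-triangle cell is doubled (2pq, 2pr, 2qr, ...).
  value <- ifelse(idx[, 1] == idx[, 2], m[idx], 2 * m[idx])
  names(value) <- paste(names(freqs)[idx[, 1]], names(freqs)[idx[, 2]], sep = "/")
  value
}

# Hypothetical frequencies for three alleles p, q, and r
geno <- hw_expected_genotypes(c(p = 0.5, q = 0.3, r = 0.2))
print(round(geno, 3))
print(sum(geno))
```

With three alleles there are six genotypes: three homozygous terms and three heterozygous terms, matching the terms written out in the equation.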

Fundamentals of Hardy Weinberg - Part 2

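Often, when working with a gene with two alleles, we will have data on the number of individuals in the population that are homozygous dominant, heterozygous, and homozygous recessive. We can use this information to find the frequency of each allele. Here \(p^2\), \(2pq\), and \(q^2\) stand for the observed genotype frequencies (each count divided by the total number of individuals).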

Frequency of p: \(p^2 + 0.5(2pq)\)

Frequency of q: \(q^2 + 0.5(2pq)\)

or, more simply, 1 - (frequency of p)
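The same calculation can be done directly from genotype counts by counting allele copies. The sketch below is one way to write it; the function name allele_freqs and the example counts are hypothetical.

```r
# Allele frequencies from observed genotype counts at a two-allele locus.
allele_freqs <- function(n_pp, n_pq, n_qq) {
  n <- n_pp + n_pq + n_qq
  # Each pp individual carries two copies of p and each pq individual one,
  # out of 2 * n allele copies in total. This is the same quantity as
  # (observed pp frequency) + 0.5 * (observed pq frequency).
  p <- (2 * n_pp + n_pq) / (2 * n)
  c(p = p, q = 1 - p)
}

# Hypothetical counts: 20 pp, 50 pq, 30 qq
freqs <- allele_freqs(20, 50, 30)
print(freqs)
```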

Applying Hardy Weinberg Principle

  • Soon we will apply the Hardy Weinberg Principle to real data. First, let’s take a quick look at the data set that we’ll be using.
  • This data set comes from the United States Geological Survey (USGS) and contains data on Atlantic Sturgeon from 18 different populations. For each Sturgeon sampled, we have data on 12 loci, as well as the fish’s length, sex, and a few other metrics.
  • Later on we will examine two alleles at the LS68 locus to demonstrate how to test for Hardy Weinberg equilibrium. The next slide shows the frequencies of the alleles of interest by population.
  • You can view and download the data set here

Applying Hardy Weinberg Principle

Applying the Hardy Weinberg Principle to Atlantic Sturgeon Data - Step 1

Our first step is to determine the frequency of each allele in this data set.

The code below will retrieve the allelic frequencies for the “Canadian Rivers” Distinct Population Segment (DPS).

To keep this demonstration simple, we select just two of the five alleles that Sturgeon in this DPS carry at this locus.

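library(dplyr)

# Assumes `sturg` is the USGS data frame, with the two allele calls at the
# LS68 locus stored in the columns LS68 and LS68.1.
# Count each genotype made up only of alleles 151 and 155.
alleles <- sturg %>% 
  filter(DPS == "Canadian Rivers", LS68 %in% c(151, 155), 
         LS68.1 %in% c(151, 155)) %>% 
  select(DPS, LS68, LS68.1) %>% 
  group_by(DPS, LS68, LS68.1) %>% 
  summarise(count = n(), .groups = "drop")

# Convert the genotype counts to genotype frequencies.
allele_frequencies <- alleles %>% 
  mutate(frequency = round(count / sum(count), 3))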

Applying the Hardy Weinberg Principle to Atlantic Sturgeon Data - Step 2

Let’s assign each allele in the following way:

p = Allele 151, q = Allele 155

Then, the observed genotypes have the following frequencies:

\(p^2 = 0.107\), \(2pq = 0.321\), \(q^2 = 0.571\)

From here, we can calculate the frequency of each allele:

\(\text{frequency of p} = p^2 + 0.5(2pq) = 0.268\)
\(\text{frequency of q} = 1 - p = 0.732\)

Now that we know the observed frequency of each allele, we can use these values to calculate the genotype frequencies expected under equilibrium.

\(\text{frequency of pp} = p^2 = 0.268^2 = 0.072\)
\(\text{frequency of pq} = 2pq = 2 \times 0.268 \times 0.732 = 0.392\)
\(\text{frequency of qq} = q^2 = 0.732^2 = 0.536\)

Determining if Population is in Equilibrium

Using a \(\chi^2\) test, we can check whether the Sturgeon population is in Hardy Weinberg equilibrium at the LS68 locus. Hardy Weinberg equilibrium is our null hypothesis, so if the observed genotype counts differ enough from those expected under equilibrium, we reject the null hypothesis and conclude that the population is not in equilibrium. Otherwise, we fail to reject the null hypothesis: the data are consistent with equilibrium.

\(\chi^2 = \sum \frac{(\text{O} - \text{E})^2}{\text{E}}\)

Where O is the observed value and E is the expected value. In the \(\chi^2\) test, we use counts of individuals with each genotype, rather than frequencies.

Using the expected frequencies and a sample of 28 fish, the expected counts for the pp, pq, and qq genotypes are 2, 11, and 15 (rounded to whole individuals), while the observed counts are 3, 9, and 16, respectively.

Thus,

\(\chi^2 = \frac{(3-2)^2}{2} + \frac{(9-11)^2}{11} + \frac{(16-15)^2}{15} = 0.93\)

With 1 degree of freedom (three genotype classes, minus one, minus one for the allele frequency estimated from the data), a \(\chi^2\) value of 0.93 corresponds to a p-value of roughly 0.33, meaning that we fail to reject the null hypothesis: the data are consistent with Hardy Weinberg equilibrium at this locus.
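The worked example can be reproduced in a few lines of R. This is a sketch rather than the code used to build these slides: it starts from the observed counts quoted in the text, rounds the expected counts to whole fish as the text does, and assumes one degree of freedom because one allele frequency is estimated from the same data.

```r
# Observed genotype counts (151/151, 151/155, 155/155) quoted in the text
observed <- c(pp = 3, pq = 9, qq = 16)
n <- sum(observed)

# Allele frequencies estimated from the counts
p <- (2 * observed[["pp"]] + observed[["pq"]]) / (2 * n)
q <- 1 - p

# Expected counts under Hardy Weinberg, rounded to whole individuals
expected <- round(n * c(pp = p^2, pq = 2 * p * q, qq = q^2))

# Chi-squared statistic: sum over genotypes of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)

# Assumption: df = 1 (3 genotype classes - 1 - 1 estimated allele frequency)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

print(expected)
print(round(chi_sq, 2))
print(p_value)
```

Skipping the rounding of the expected counts gives a slightly smaller statistic, which does not change the conclusion here.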

Simulating Changes in Frequency

  • Using a simple function shown on the next slide, we can see how far a population departs from equilibrium under different conditions. This can be used to explore how real world events like migration may affect equilibrium.
  • The function takes the observed frequencies of the two homozygous genotypes (the “p” and “q” arguments, for pp and qq), as well as the population size, and returns the \(\chi^2\) value. The heterozygote frequency is whatever remains.

Function to Simulate Changes in Frequency

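# p and q are the observed frequencies of the pp and qq genotypes;
# n is the number of individuals.
hw_simulator <- function(p, q, n) {
  stopifnot(p >= 0, q >= 0, p + q <= 1)
  pq <- 1 - (p + q)  # observed heterozygote frequency
  
  # Allele frequencies implied by the observed genotype frequencies
  p_freq <- p + 0.5 * pq
  q_freq <- 1 - p_freq
  
  # Genotype frequencies expected under Hardy Weinberg equilibrium
  pp_freq <- p_freq^2
  pq_freq <- 2 * p_freq * q_freq
  qq_freq <- q_freq^2
  
  # Observed and expected counts, rounded to whole individuals
  obs_p_count <- round(p * n)
  obs_q_count <- round(q * n)
  obs_pq_count <- round(pq * n)
  exp_p_count <- round(pp_freq * n)
  exp_q_count <- round(qq_freq * n)
  exp_pq_count <- round(pq_freq * n)
    
  # Chi-squared statistic; not defined if an expected count rounds to 0
  chi_squared <- ((obs_p_count - exp_p_count)^2) / exp_p_count + 
    ((obs_q_count - exp_q_count)^2) / exp_q_count +
    ((obs_pq_count - exp_pq_count)^2) / exp_pq_count
  
  return(chi_squared)
}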

Visualizing the Simulations

We can plot this to get a visual representation of how different factors may affect the Hardy Weinberg equilibrium of a population. On the x-axis is the “p” value passed to the function and on the y-axis is the “q” value. The size of each point corresponds to the \(\chi^2\) value. Larger \(\chi^2\) values correspond to populations further away from equilibrium.
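The points for a plot like this could be generated along the following lines. This is a sketch under stated assumptions, not the code behind the slide: hw_chi_sq is a hypothetical compact stand-in for the simulator function that works on the two homozygous genotype frequencies and skips the rounding of counts, and the grid spacing and sample size of 100 are arbitrary choices.

```r
# Chi-squared distance from Hardy Weinberg proportions, given the observed
# frequencies of the two homozygous genotypes and the sample size.
hw_chi_sq <- function(pp, qq, n) {
  pq <- max(0, 1 - (pp + qq))   # observed heterozygote frequency
  p <- pp + 0.5 * pq            # allele frequencies implied by the genotypes
  q <- 1 - p
  observed <- n * c(pp, pq, qq)
  expected <- n * c(p^2, 2 * p * q, q^2)
  sum((observed - expected)^2 / expected)
}

# Grid of homozygote frequencies; keep only combinations that sum to at most 1
grid <- expand.grid(pp = seq(0.05, 0.9, by = 0.05),
                    qq = seq(0.05, 0.9, by = 0.05))
grid <- grid[grid$pp + grid$qq <= 1 + 1e-9, ]

# One chi-squared value per grid point, for a hypothetical sample of 100
grid$chi_sq <- mapply(hw_chi_sq, grid$pp, grid$qq, MoreArgs = list(n = 100))

# Grid points closest to equilibrium come first
print(head(grid[order(grid$chi_sq), ]))
```

Each row of grid then holds one point, with chi_sq available to map to point size in whichever plotting library is used.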

Visualizing the Simulations

We can also plot the frequencies of the three genotypes in 3D to see how all three values affect the chi-squared value.
