1. Introduction and Quantitative-Genetic Context

In classical quantitative genetics, we often start from a single biallelic locus to formalize the relationship between genotype and phenotype. This is a minimal building block for understanding more complex, polygenic traits.

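Consider a locus with alleles A (often referred to as the “dominant” allele) and a (the “recessive” allele). We assume:

  • Allele A has frequency p and allele a has frequency q = 1 − p.
  • Genotypes occur in Hardy-Weinberg proportions: p² (AA), 2pq (Aa), q² (aa).

We describe the genotypic values (e.g., for a quantitative trait like height, yield, etc.) using two parameters:

  • a: the deviation of each homozygote from the midpoint of the two homozygotes (half the difference between AA and aa).
  • d: the deviation of the heterozygote from that same midpoint (the dominance deviation).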

A common parameterization of genotypic values is:

\[ \begin{aligned} G_{AA} &= +a \\ G_{Aa} &= d \\ G_{aa} &= -a \end{aligned} \]

Your mathematical formula uses a sign convention equivalent to this, but arranged as:

\[ \mu = p^2 a + 2pq\,d - q^2 a \]

which is the frequency-weighted sum of the genotypic values under Hardy-Weinberg proportions: p² · (+a) + 2pq · d + q² · (−a). Note that the parameter a (a genotypic value) is distinct from the allele label a. The important point: a and d describe the pattern of gene action (additive vs dominance), and p, q describe population genetics (allele frequencies).
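As a short worked step, the mean can be regrouped using p² − q² = (p − q)(p + q) = p − q, since p + q = 1:

\[ \mu = a\,(p^2 - q^2) + 2pq\,d = a\,(p - q) + 2pq\,d \]

In this compact form the first term is the contribution of the homozygotes and the second is the contribution of the heterozygotes.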

Below, we will:

  1. Compute a population mean from a single locus.
  2. Explore how the population mean changes with allele frequency under different dominance scenarios.
  3. Explore the Fisherian average effect of allele substitution (often denoted α), which connects the genetic model to the additive genetic variance underlying heritability.

2. A Single Example: Population Mean for Fixed Allele Frequencies

Here we compute the population mean phenotype given fixed allele frequencies and genotypic values.

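Interpreting this: with p = 0.7 and q = 0.3, the genotype frequencies are p² = 0.49 (AA), 2pq = 0.42 (Aa) and q² = 0.09 (aa). With a = d = 2 (complete dominance), the mean is 0.49 · 2 + 0.42 · 2 − 0.09 · 2 = 1.64.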

knitr::opts_chunk$set(echo = TRUE, fig.align = "center")

# Fixed allele frequencies and genotypic values

p <- 0.7          # frequency of allele A (labeled 'dominant' here)
q <- 1 - p        # frequency of allele a

d <- 2            # heterozygote value, as a deviation from the homozygote midpoint
a <- 2            # homozygote deviation from the midpoint: G_AA = +a, G_aa = -a

# Population mean:
# mu = (p^2)*a + 2*p*q*d - (q^2)*a
# This is the same structural form as E(G) = Σ (genotype frequency × genotype value),
# given the chosen coding for genotypic values.

pop.mean <- (p^2)*a + 2*p*q*d - (q^2)*a
pop.mean
## [1] 1.64
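The same number can be reproduced by writing the mean explicitly as a sum of genotype frequency times genotypic value. The sketch below assumes Hardy-Weinberg proportions; the helper name pop_mean is hypothetical and not part of the original code. It handles one value of p at a time.

```r
# Hypothetical helper (not in the original code): population mean as the
# frequency-weighted sum of genotypic values, assuming Hardy-Weinberg proportions.
pop_mean <- function(p, a, d) {
  q <- 1 - p
  freq  <- c(AA = p^2, Aa = 2 * p * q, aa = q^2)  # genotype frequencies
  value <- c(AA = a,   Aa = d,         aa = -a)   # genotypic values
  sum(freq * value)
}

mu_direct  <- pop_mean(p = 0.7, a = 2, d = 2)
mu_compact <- 2 * (0.7 - 0.3) + 2 * 0.7 * 0.3 * 2   # a(p - q) + 2pqd

# Both routes should agree up to floating-point error
all.equal(mu_direct, mu_compact)
```

Keeping the frequencies and values as named vectors makes it easy to see which genotype contributes what to the mean.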

3. Population Mean as a Function of Allele Frequency under Different Gene-Action Models

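We now examine how the population mean changes as the allele frequency p varies from 0 to 1, under several standard dominance models:

  • Complete dominance (d = a)
  • Partial dominance (0 < d < a)
  • Additive, no dominance (d = 0)
  • Overdominance (d > a)

These scenarios correspond to different shapes of the genotype–phenotype relationship, i.e. where the heterozygote falls relative to the two homozygotes: equal to AA, between the midpoint and AA, exactly at the midpoint, or beyond AA.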

# Sequence of allele frequencies p
p <- seq(0, 1, length.out = 101)
q <- 1 - p

# 3.1 Complete dominance model (a = d)
d_dom <- 2
a_dom <- 2
pop.mean.dom <- (p^2)*a_dom + 2*p*q*d_dom - (q^2)*a_dom

# Plot population mean under complete dominance
plot(p, pop.mean.dom,
     xlab = "Allele frequency p",
     ylab = "Expected population mean",
     type = "l", lwd = 3, col = "black",
     ylim = c(-3, 3))

# Shade region where q > p (i.e., p < 0.5)
rect(xleft = -0.1, ybottom = -3, xright = 0.5, ytop = 3, col = "grey90", border = NA)

# Redraw the dominance model line on top of the shaded region
lines(p, pop.mean.dom, lwd = 3, col = "black")

# 3.2 Partial dominance model (a > d)
d_part <- 1
a_part <- 2
pop.mean.part <- (p^2)*a_part + 2*p*q*d_part - (q^2)*a_part
lines(p, pop.mean.part, lty = 2, lwd = 3, col = "red")

# 3.3 Additive (no dominance) model (d = 0)
d_add <- 0
a_add <- 2
pop.mean.add <- (p^2)*a_add + 2*p*q*d_add - (q^2)*a_add
lines(p, pop.mean.add, lty = 3, lwd = 3, col = "darkgreen")

# 3.4 Overdominance model (d > a)
d_over <- 2
a_over <- 1
pop.mean.over <- (p^2)*a_over + 2*p*q*d_over - (q^2)*a_over
lines(p, pop.mean.over, lty = 3, lwd = 3, col = "blue")

# Reference lines at p = 0.5 and mean = 0
lines(c(-0.1, 1.1), c(0, 0), lwd = 0.5, lty = 3, col = "grey")
lines(c(0.5, 0.5), c(-3, 3), lwd = 0.5, lty = 3, col = "grey")

legend("topleft",
       legend = c("Complete dominance (a = d)",
                  "Partial dominance (a > d)",
                  "Additive (d = 0)",
                  "Overdominance (d > a)"),
       col = c("black", "red", "darkgreen", "blue"),
       lwd = 3, lty = c(1, 2, 3, 3), bty = "n")

Interpretation

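  • Additive model (green line): the mean is linear in p, since μ = a(p − q) = a(2p − 1) when d = 0; the slope is the constant 2a.
  • Complete dominance (black line): the curve is nonlinear (μ = a(1 − 2q²) when d = a). It rises steeply at low p, where most A alleles occur in heterozygotes and raise the genotypic value from −a to +a, and it flattens as p approaches 1, where additional A alleles mostly convert Aa into AA, which has the same genotypic value.
  • Partial dominance (red line): intermediate curvature, between the additive and complete-dominance curves.
  • Overdominance (blue line): the mean peaks at an intermediate frequency, p = (a + d)/(2d) = 0.75 for the values plotted, where it exceeds the value of either homozygote; it falls toward +a or −a as either allele approaches fixation.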

These patterns are foundational for understanding how allele frequency changes (evolution) interact with gene action to change population means.
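The shapes described above can be checked numerically. The following is a minimal sketch, assuming Hardy-Weinberg proportions; the function names pop_mean_curve and mean_slope are hypothetical. Differentiating μ = a(2p − 1) + 2p(1 − p)d with respect to p gives a slope of 2[a + d(q − p)], which is twice the quantity that Section 5 calls α.

```r
# Hypothetical helper: population mean as a function of p (with q = 1 - p)
pop_mean_curve <- function(p, a, d) a * (2 * p - 1) + 2 * p * (1 - p) * d

# Analytic slope of the mean with respect to p: 2 * (a + d * (q - p))
mean_slope <- function(p, a, d) 2 * (a + d * ((1 - p) - p))

# Overdominance values used in the plot (a = 1, d = 2)
a_over <- 1
d_over <- 2

# The slope is zero at p = (a + d) / (2d), which lies in (0, 1) only if d > a
p_star <- (a_over + d_over) / (2 * d_over)

# Numerical check with base R's one-dimensional optimizer
opt <- optimize(pop_mean_curve, interval = c(0, 1), maximum = TRUE,
                a = a_over, d = d_over)

c(analytic = p_star, numeric = opt$maximum, max_mean = opt$objective)
```

For the additive model the same slope function returns the constant 2a, which is why the green line is straight.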

4. Extra Visualization: Genotypic Values vs Allele Frequency

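To better connect the mean curves to the actual genotype-specific values, we can plot the genotypic values and the population mean together for one scenario.

This figure visually separates the three fixed genotypic values (horizontal dashed lines), which do not depend on allele frequency, from the population mean (solid curve), which moves from G(aa) at p = 0 to G(AA) at p = 1.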

p <- seq(0, 1, length.out = 101)
q <- 1 - p

# Choose a model, e.g., partial dominance
a <- 2
d <- 1

# Genotypic values for AA, Aa, aa under the usual (a, d) interpretation:
G_AA <- a
G_Aa <- d
G_aa <- -a

# Population mean (same form as before, consistent with your code)
pop.mean <- (p^2)*a + 2*p*q*d - (q^2)*a

plot(p, pop.mean, type = "l", lwd = 3, col = "red",
     ylim = c(-2.5, 2.5),
     xlab = "Allele frequency p",
     ylab = "Genotypic values / population mean")

abline(h = G_AA, col = "blue", lty = 2, lwd = 2)
abline(h = G_Aa, col = "purple", lty = 2, lwd = 2)
abline(h = G_aa, col = "darkgreen", lty = 2, lwd = 2)

legend("topleft",
       legend = c("Population mean", "G(AA)", "G(Aa)", "G(aa)"),
       col = c("red", "blue", "purple", "darkgreen"),
       lwd = c(3, 2, 2, 2), lty = c(1, 2, 2, 2), bty = "n")

5. Average Effect of Allele Substitution (Fisher’s α)

The average effect of allele substitution (often denoted α) captures the expected change in genotypic value when we substitute one allele for another at random in the population, accounting for the current genotype composition.

For a biallelic locus, Fisher showed that the average effect of substituting allele A for a can be written (under a particular parameterization of a and d) as:

\[ \alpha = a + d (q - p) \]

In the code, the plotted quantity (B.effect) is not α itself but α scaled by q:

\[ \text{B.effect} = q \cdot \left(a + d(q - p)\right) = q\,\alpha \]

This is the average effect of allele A itself (labeled B in the code), expressed as a deviation from the population mean. The corresponding average effect of allele a is −pα, so the difference between the two is α, the average effect of substitution.
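One way to see what α measures: it is the slope of the regression of genotypic value on the number of A alleles carried, with each genotype weighted by its frequency. The sketch below checks this numerically under Hardy-Weinberg proportions; the function name alpha_by_regression and the example values are assumptions for illustration, and p must lie strictly between 0 and 1 so that all weights are positive.

```r
# Hypothetical helper: alpha as a frequency-weighted regression slope
alpha_by_regression <- function(p, a, d) {
  q <- 1 - p
  dat <- data.frame(
    n_A  = c(2, 1, 0),               # copies of allele A in AA, Aa, aa
    G    = c(a, d, -a),              # genotypic values
    freq = c(p^2, 2 * p * q, q^2)    # Hardy-Weinberg genotype frequencies
  )
  fit <- lm(G ~ n_A, data = dat, weights = freq)
  unname(coef(fit)["n_A"])
}

# Illustrative values (partial dominance)
p0 <- 0.3
a0 <- 2
d0 <- 1

alpha_reg     <- alpha_by_regression(p0, a0, d0)
alpha_formula <- a0 + d0 * ((1 - p0) - p0)

c(regression = alpha_reg, formula = alpha_formula)
```

The two values should agree up to floating-point error, which is what makes α an additive (regression) effect rather than a raw genotypic difference.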

# Average effect for different dominance models

p <- seq(0, 1, length.out = 101)
q <- 1 - p

# Complete dominance model (a = d)
d_dom <- 2
a_dom <- 2
B.effect.D <- q * (a_dom + d_dom * (q - p))

plot(p, B.effect.D, xlab = "Allele frequency p",
     ylab = "Average effect (B.effect)",
     type = "l", lwd = 3, col = "black",
     ylim = c(-3, 5))

# Shade region for q > p (p < 0.5)
rect(-0.1, -3, 0.5, 5, col = "grey90", border = NA)

# Redraw the complete dominance line
lines(p, B.effect.D, lwd = 3, col = "black")

# Partial dominance model (a > d)
d_part <- 1
a_part <- 2
B.effect.P <- q * (a_part + d_part * (q - p))
lines(p, B.effect.P, lty = 2, lwd = 3, col = "red")

# Additive model (d = 0)
d_add <- 0
a_add <- 2
B.effect.A <- q * (a_add + d_add * (q - p))
lines(p, B.effect.A, lty = 3, lwd = 3, col = "darkgreen")

# Overdominance model (d > a)
d_over <- 2
a_over <- 1
B.effect.O <- q * (a_over + d_over * (q - p))
lines(p, B.effect.O, lty = 3, lwd = 3, col = "blue")

legend("topleft",
       legend = c("Complete dominance (a = d)",
                  "Partial dominance (a > d)",
                  "Additive (d = 0)",
                  "Overdominance (d > a)"),
       col = c("black", "red", "darkgreen", "blue"),
       lwd = 3, lty = c(1, 2, 3, 3), bty = "n")

Interpretation

  • When dominance is absent (d = 0), α = a is constant: the substitution effect does not depend on allele frequency. The plotted quantity B.effect = qα = qa still declines linearly with p, but only because of the factor q.
  • When dominance is present, the average effect depends on genotype frequencies, hence on p and q. This makes the mapping from allele substitution to change in mean context-dependent: the same allelic substitution has different expected effects in different populations.

The average effect is crucial because:

  • The additive genetic variance VA at a locus equals 2pqα² under Hardy-Weinberg proportions (for the standard parameterization).
  • In polygenic models, the breeder’s equation R = h²S implicitly relies on the notion of average effects rather than raw genotypic values.
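To make the link between α and the variance components concrete, the sketch below computes VA = 2pqα² and the dominance variance VD = (2pqd)², then checks that they add up to the total genotypic variance computed directly from the genotype frequencies. It assumes Hardy-Weinberg proportions; the function name variance_components and the example values are hypothetical.

```r
# Hypothetical helper: single-locus variance components
variance_components <- function(p, a, d) {
  q <- 1 - p
  alpha <- a + d * (q - p)           # average effect of substitution
  VA <- 2 * p * q * alpha^2          # additive genetic variance
  VD <- (2 * p * q * d)^2            # dominance variance

  # Total genotypic variance computed directly from the three genotypes
  freq  <- c(p^2, 2 * p * q, q^2)
  value <- c(a, d, -a)
  M  <- sum(freq * value)
  VG <- sum(freq * (value - M)^2)

  c(VA = VA, VD = VD, VG = VG)
}

vc <- variance_components(p = 0.6, a = 2, d = 1)
vc
all.equal(vc[["VA"]] + vc[["VD"]], vc[["VG"]])
```

With d = 0 the dominance term vanishes and all genotypic variance at the locus is additive.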

6. Additional Plot: Phenotypic Distributions under Different Gene Actions

To link this single-locus theory to phenotypic distributions, we can simulate a population, assume some environmental noise, and visualize the phenotype distributions under different gene-action models at a fixed allele frequency.

This illustrates how dominance patterns, even at a single locus, can affect the shape and mean of the phenotypic distribution.

set.seed(123)

n <- 10000
p <- 0.6
q <- 1 - p

# Function to simulate phenotypes for given a, d
simulate_phenotypes <- function(n, p, a, d, env_sd = 1) {
  # Genotype frequencies under Hardy-Weinberg
  geno <- sample(c("AA", "Aa", "aa"), size = n, replace = TRUE,
                 prob = c(p^2, 2*p*(1-p), (1-p)^2))
  
  # Genotypic values under the (a, d) scheme:
  G <- ifelse(geno == "AA",  a,
       ifelse(geno == "Aa",  d,
              -a))
  
  # Add environmental noise (normally distributed)
  P <- G + rnorm(n, mean = 0, sd = env_sd)
  data.frame(geno = geno, G = G, P = P)
}

# Define four models
models <- list(
  complete = list(a = 2, d = 2),
  partial  = list(a = 2, d = 1),
  additive = list(a = 2, d = 0),
  overdom  = list(a = 1, d = 2)
)

# Simulate for each model
sim_data <- lapply(names(models), function(m) {
  pars <- models[[m]]
  df <- simulate_phenotypes(n, p, a = pars$a, d = pars$d, env_sd = 1)
  df$model <- m
  df
})

sim_data <- do.call(rbind, sim_data)

# Plot distributions of phenotypes for each model
par(mfrow = c(2, 2))
for (m in names(models)) {
  df <- subset(sim_data, model == m)
  hist(df$P, breaks = 40, freq = FALSE,
       main = paste("Phenotype distribution:", m),
       xlab = "Phenotype",
       col = "grey80", border = "white")
  lines(density(df$P), col = "black", lwd = 2)
}

par(mfrow = c(1, 1))

Interpretation

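  • Each phenotype distribution is a mixture of three normal components (one per genotype), weighted by the genotype frequencies. It is not normal in general: with env_sd = 1, the complete-dominance case is clearly bimodal, because the aa class (16% at p = 0.6) sits four environmental standard deviations below the AA and Aa classes.
  • Under overdominance the heterozygotes, the most common class (48% at p = 0.6), have the highest genotypic value, so most of the mass sits at the upper end and the aa class forms a lower tail. This raises the mean above the additive expectation a(p − q) and skews the distribution to the left.
  • In the additive model the genotypic values (−a, 0, +a) are symmetric around 0, but the genotype frequencies are not unless p = 0.5. Because the environmental noise is exactly normal, any non-normality comes from the discrete genotype classes (plus sampling error).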
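The simulated histograms can be compared against their theoretical moments. The following sketch computes the expected mean and total phenotypic variance for each of the four models at p = 0.6, assuming Hardy-Weinberg proportions and independent normal environmental noise as in the simulation. The names expected_moments and models_chk are hypothetical.

```r
# Hypothetical helper: expected phenotypic mean and variance at one locus
expected_moments <- function(p, a, d, env_sd = 1) {
  q <- 1 - p
  freq  <- c(p^2, 2 * p * q, q^2)    # AA, Aa, aa
  value <- c(a, d, -a)
  M  <- sum(freq * value)            # environmental noise has mean 0
  VG <- sum(freq * (value - M)^2)    # genotypic variance
  c(mean = M, var = VG + env_sd^2)   # noise assumed independent of genotype
}

# Same (a, d) pairs as the four simulated models
models_chk <- list(
  complete = c(a = 2, d = 2),
  partial  = c(a = 2, d = 1),
  additive = c(a = 2, d = 0),
  overdom  = c(a = 1, d = 2)
)

moments <- t(sapply(models_chk, function(m) {
  expected_moments(p = 0.6, a = m[["a"]], d = m[["d"]])
}))
moments
```

The sample mean and variance of each simulated phenotype vector should fall close to the corresponding row, up to sampling error.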

7. Summary

This single-locus framework extends to many loci in polygenic models. In the absence of epistasis and linkage disequilibrium, the total genotypic mean and variance are sums of locus-specific contributions, and the same principles (mean, dominance, average effects, and their dependence on allele frequencies) scale up to the familiar tools of quantitative genetics (e.g., VA, VD, h², R = h²S).
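As a rough illustration of that summation, the sketch below adds up single-locus contributions for three loci and computes a narrow-sense heritability. All locus values and the environmental variance VE are made up for illustration, and the calculation assumes Hardy-Weinberg proportions, no epistasis, and no linkage disequilibrium.

```r
# Hypothetical three-locus example; every number here is illustrative
loci <- data.frame(
  p = c(0.7, 0.5, 0.2),
  a = c(2.0, 1.0, 0.5),
  d = c(2.0, 0.0, 0.25)
)

loci$q     <- 1 - loci$p
loci$alpha <- loci$a + loci$d * (loci$q - loci$p)                 # average effect
loci$M     <- loci$a * (loci$p - loci$q) + 2 * loci$p * loci$q * loci$d
loci$VA    <- 2 * loci$p * loci$q * loci$alpha^2                  # additive variance
loci$VD    <- (2 * loci$p * loci$q * loci$d)^2                    # dominance variance

VE <- 1  # assumed environmental variance

totals <- c(mean = sum(loci$M), VA = sum(loci$VA), VD = sum(loci$VD))
h2 <- totals[["VA"]] / (totals[["VA"]] + totals[["VD"]] + VE)     # narrow-sense h^2

totals
h2
```

Only VA enters the numerator of h², which is why the average effect α, rather than the raw genotypic values, drives the response predicted by R = h²S.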