In classical quantitative genetics, we often start from a single biallelic locus to formalize the relationship between genotype and phenotype. This is a minimal building block for understanding more complex, polygenic traits.
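Consider a locus with alleles A (often referred to as the “dominant” allele) and a (the “recessive” allele), with frequencies p and q = 1 − p. We assume Hardy-Weinberg proportions, so the genotypes occur as AA with frequency p², Aa with frequency 2pq, and aa with frequency q².
We describe the genotypic values (e.g., for a quantitative trait like height or yield) using two parameters: a, half the difference between the two homozygotes, and d, the value of the heterozygote relative to the homozygote midpoint. (The parameter a is distinct from the allele a; context distinguishes them.)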
A common parameterization of genotypic values is:
\[ \begin{aligned} G_{AA} &= +a \\ G_{Aa} &= d \\ G_{aa} &= -a \end{aligned} \]
Your mathematical formula uses a sign convention equivalent to this, but arranged as:
\[ \mu = p^2 a + 2pq\,d - q^2 a \]
which is the frequency-weighted sum p²·G_AA + 2pq·G_Aa + q²·G_aa under the standard parameterization, with A taken as the allele that increases the trait (a > 0). The important point: a and d describe the pattern of gene action (additive vs dominance), and p, q describe population genetics (allele frequencies).
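Because p + q = 1, the expression for the mean can be written more compactly. A short derivation, using only the symbols already defined:

\[ \begin{aligned} \mu &= p^2 a + 2pq\,d - q^2 a \\ &= a\,(p^2 - q^2) + 2pq\,d \\ &= a\,(p - q)(p + q) + 2pq\,d \\ &= a\,(p - q) + 2pq\,d \end{aligned} \]

The first term is the net contribution of the two homozygotes and the second is the contribution of the heterozygotes; with d = 0 the mean is linear in p.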
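Below, we will compute the population mean at fixed allele frequencies, trace it across the full range of p under four dominance models, compare it with the genotypic values, compute the average effect of allele substitution, and simulate phenotype distributions.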
Here we compute the population mean phenotype given fixed allele frequencies and genotypic values.
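Interpreting this: with a = d = 2 the heterozygote has the same genotypic value as AA (complete dominance), so at p = 0.7 the population mean is pulled close to +a.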
knitr::opts_chunk$set(echo = TRUE, fig.align = "center")
# Fixed allele frequencies and genotypic values
p <- 0.7 # frequency of the (here labeled) 'dominant' allele A
q <- 1 - p # frequency of the recessive allele a
d <- 2 # genotypic value of the heterozygote Aa (relative to the homozygote midpoint)
a <- 2 # genotypic value of AA is +a and of aa is -a
# Population mean:
# mu = (p^2)*a + 2*p*q*d - (q^2)*a
# This is the same structural form as E(G) = Σ (genotype frequency × genotype value),
# given the chosen coding for genotypic values.
pop.mean <- (p^2)*a + 2*p*q*d - (q^2)*a
pop.mean
## [1] 1.64
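As a cross-check, the same mean can be computed as an explicit frequency-weighted sum over the three genotypes. The sketch below is illustrative only: the function name `pop_mean_weighted` is hypothetical and not part of the original code, and it assumes Hardy-Weinberg genotype frequencies.

```r
# Hypothetical helper (not in the original code): population mean as
# sum(genotype frequency * genotypic value), assuming Hardy-Weinberg proportions.
pop_mean_weighted <- function(p, a, d) {
  q <- 1 - p
  freq  <- c(AA = p^2, Aa = 2 * p * q, aa = q^2)  # genotype frequencies
  value <- c(AA = a,   Aa = d,         aa = -a)   # genotypic values
  sum(freq * value)
}

mu_weighted <- pop_mean_weighted(p = 0.7, a = 2, d = 2)
mu_formula  <- (0.7^2) * 2 + 2 * 0.7 * 0.3 * 2 - (0.3^2) * 2

# The two routes should agree up to floating-point error
all.equal(mu_weighted, mu_formula)
```

Writing the frequencies and values as named vectors keeps the genotype-to-value mapping visible, which makes it easy to swap in a different coding of genotypic values.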
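We now examine how the population mean changes as the allele frequency p varies from 0 to 1, under four standard dominance models: complete dominance (a = d), partial dominance (a > d > 0), additive gene action (d = 0), and overdominance (d > a).
These scenarios correspond to different shapes of the genotype–phenotype relationship. Under complete dominance, Aa has the same value as AA; under partial dominance, Aa lies between the homozygote midpoint and AA; under the additive model, Aa is exactly midway between AA and aa; under overdominance, Aa exceeds both homozygotes.
# Sequence of allele frequencies p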
p <- seq(0, 1, length = 101)
q <- 1 - p
# 3.1 Complete dominance model (a = d)
d_dom <- 2
a_dom <- 2
pop.mean.dom <- (p^2)*a_dom + 2*p*q*d_dom - (q^2)*a_dom
# Plot population mean under complete dominance
plot(p, pop.mean.dom,
xlab = "Allele frequency p",
ylab = "Expected population mean",
type = "l", lwd = 3, col = "black",
ylim = c(-3, 3))
# Shade region where q > p (i.e., p < 0.5)
rect(xleft = -0.1, ybottom = -3, xright = 0.5, ytop = 3, col = "grey90", border = NA)
# Redraw the dominance model line on top of the shaded region
lines(p, pop.mean.dom, lwd = 3, col = "black")
# 3.2 Partial dominance model (a > d)
d_part <- 1
a_part <- 2
pop.mean.part <- (p^2)*a_part + 2*p*q*d_part - (q^2)*a_part
lines(p, pop.mean.part, lty = 2, lwd = 3, col = "red")
# 3.3 Additive (no dominance) model (d = 0)
d_add <- 0
a_add <- 2
pop.mean.add <- (p^2)*a_add + 2*p*q*d_add - (q^2)*a_add
lines(p, pop.mean.add, lty = 3, lwd = 3, col = "darkgreen")
# 3.4 Overdominance model (d > a)
d_over <- 2
a_over <- 1
pop.mean.over <- (p^2)*a_over + 2*p*q*d_over - (q^2)*a_over
lines(p, pop.mean.over, lty = 3, lwd = 3, col = "blue")
# Reference lines at p = 0.5 and mean = 0
lines(c(-0.1, 1.1), c(0, 0), lwd = 0.5, lty = 3, col = "grey")
lines(c(0.5, 0.5), c(-3, 3), lwd = 0.5, lty = 3, col = "grey")
legend("topleft",
legend = c("Complete dominance (a = d)",
"Partial dominance (a > d)",
"Additive (d = 0)",
"Overdominance (d > a)"),
col = c("black", "red", "darkgreen", "blue"),
lwd = 3, lty = c(1, 2, 3, 3), bty = "n")
These patterns are foundational for understanding how allele frequency changes (evolution) interact with gene action to change population means.
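One consequence worth checking numerically: under overdominance, the mean is maximized at an intermediate allele frequency rather than at fixation. Differentiating μ = p²a + 2pqd − q²a with respect to p (with q = 1 − p) gives 2[a + d(q − p)], which is zero at p* = (a + d) / (2d) when d > a. The sketch below uses the overdominance values from the plot (a = 1, d = 2) and compares this analytical value with a numerical search via `optimize()`. The helper name `pop_mean` is hypothetical and not part of the original code.

```r
# Hypothetical helper (not in the original code): population mean at frequency p
pop_mean <- function(p, a, d) {
  q <- 1 - p
  (p^2) * a + 2 * p * q * d - (q^2) * a
}

a_over <- 1
d_over <- 2

# Analytical optimum, valid when d > a (overdominance)
p_star <- (a_over + d_over) / (2 * d_over)

# Numerical optimum on [0, 1]
opt <- optimize(pop_mean, interval = c(0, 1), a = a_over, d = d_over,
                maximum = TRUE)

c(analytical = p_star, numerical = opt$maximum, max_mean = opt$objective)
```

For the other three models the derivative does not change sign inside (0, 1), so the mean increases monotonically with p and the maximum sits at p = 1.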
To better connect the mean curves to the actual genotype-specific values, we can plot the genotype means and the population mean together for one scenario.
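This figure visually separates the genotypic values (G_AA, G_Aa, G_aa), which are determined by a and d and do not depend on p, from the population mean, which changes with p because the genotype frequencies change.
p <- seq(0, 1, length = 101)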
q <- 1 - p
# Choose a model, e.g., partial dominance
a <- 2
d <- 1
# Genotypic values for AA, Aa, aa under the usual (a, d) interpretation:
G_AA <- a
G_Aa <- d
G_aa <- -a
# Population mean (same form as before, consistent with your code)
pop.mean <- (p^2)*a + 2*p*q*d - (q^2)*a
plot(p, pop.mean, type = "l", lwd = 3, col = "red",
ylim = c(-2.5, 2.5),
xlab = "Allele frequency p",
ylab = "Genotypic values / population mean")
abline(h = G_AA, col = "blue", lty = 2, lwd = 2)
abline(h = G_Aa, col = "purple", lty = 2, lwd = 2)
abline(h = G_aa, col = "darkgreen", lty = 2, lwd = 2)
legend("topleft",
legend = c("Population mean", "G(AA)", "G(Aa)", "G(aa)"),
col = c("red", "blue", "purple", "darkgreen"),
lwd = c(3, 2, 2, 2), lty = c(1, 2, 2, 2), bty = "n")
The average effect of allele substitution (often denoted α) captures the expected change in phenotype when we substitute one allele for another at random in the population, accounting for the current genotype composition.
For a biallelic locus, Fisher showed that the average effect of
substituting allele A for a can be written
(under a particular parameterization of a and d)
as:
\[ \alpha = a + d (q - p) \]
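In the code, the quantity named B.effect is the average effect of a single allele rather than of the substitution. In this document's notation the allele in question is A (the code labels it B):
\[ \text{B.effect} = q \cdot \left(a + d(q - p)\right) = q\,\alpha \]
This is the mean deviation from the population mean of individuals that received allele A from one parent, with the other allele drawn at random. The corresponding value for allele a is −pα, so the difference between the two allele effects is α itself.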
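One way to make α concrete is its regression interpretation: α equals the slope of the regression of genotypic value on the number of A alleles carried, with genotypes weighted by their Hardy-Weinberg frequencies. The following sketch checks this for one arbitrary set of illustrative values (p = 0.7, a = 2, d = 1). The variable names are assumptions made for this example and do not appear in the original code.

```r
# Illustrative values (assumed for this example only)
p0 <- 0.7
q0 <- 1 - p0
a0 <- 2
d0 <- 1

geno <- data.frame(
  n_A  = c(2, 1, 0),                   # copies of allele A in AA, Aa, aa
  G    = c(a0, d0, -a0),               # genotypic values
  freq = c(p0^2, 2 * p0 * q0, q0^2)    # Hardy-Weinberg frequencies
)

# Weighted least-squares regression of genotypic value on allele count
fit <- lm(G ~ n_A, data = geno, weights = freq)
alpha_regression <- unname(coef(fit)["n_A"])

# Closed-form average effect of substitution
alpha_formula <- a0 + d0 * (q0 - p0)

c(regression = alpha_regression, formula = alpha_formula)
```

Because the weights are the genotype frequencies, the fitted slope changes with p even though the genotypic values stay fixed, which is the frequency dependence the plot below displays.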
# Average effect for different dominance models
p <- seq(0, 1, length = 101)
q <- 1 - p
# Complete dominance model (a = d)
d_dom <- 2
a_dom <- 2
B.effect.D <- q * (a_dom + d_dom * (q - p))
plot(p, B.effect.D, xlab = "Allele frequency p",
ylab = "Average effect (B.effect)",
type = "l", lwd = 3, col = "black",
ylim = c(-3, 5))
# Shade region for q > p (p < 0.5)
rect(-0.1, -3, 0.5, 5, col = "grey90", border = NA)
# Redraw the complete dominance line
lines(p, B.effect.D, lwd = 3, col = "black")
# Partial dominance model (a > d)
d_part <- 1
a_part <- 2
B.effect.P <- q * (a_part + d_part * (q - p))
lines(p, B.effect.P, lty = 2, lwd = 3, col = "red")
# Additive model (d = 0)
d_add <- 0
a_add <- 2
B.effect.A <- q * (a_add + d_add * (q - p))
lines(p, B.effect.A, lty = 3, lwd = 3, col = "darkgreen")
# Overdominance model (d > a)
d_over <- 2
a_over <- 1
B.effect.O <- q * (a_over + d_over * (q - p))
lines(p, B.effect.O, lty = 3, lwd = 3, col = "blue")
legend("topleft",
legend = c("Complete dominance (a = d)",
"Partial dominance (a > d)",
"Additive (d = 0)",
"Overdominance (d > a)"),
col = c("black", "red", "darkgreen", "blue"),
lwd = 3, lty = c(1, 2, 3, 3), bty = "n")
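The average effect is crucial because it depends on the allele frequencies as well as on a and d, so the same locus can have a different additive effect in different populations, and because it determines the additive genetic variance contributed by the locus (VA = 2pqα²).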
To link this single-locus theory to phenotypic distributions, we can simulate a population, assume some environmental noise, and visualize the phenotype distributions under different gene-action models at a fixed allele frequency.
This illustrates how dominance patterns, even at a single locus, can affect the shape and mean of the phenotypic distribution.
set.seed(123)
n <- 10000
p <- 0.6
q <- 1 - p
# Function to simulate phenotypes for given a, d
simulate_phenotypes <- function(n, p, a, d, env_sd = 1) {
# Genotype frequencies under Hardy-Weinberg
geno <- sample(c("AA", "Aa", "aa"), size = n, replace = TRUE,
prob = c(p^2, 2*p*(1-p), (1-p)^2))
# Genotypic values under the (a, d) scheme:
G <- ifelse(geno == "AA", a,
ifelse(geno == "Aa", d,
-a))
# Add environmental noise (normally distributed)
P <- G + rnorm(n, mean = 0, sd = env_sd)
data.frame(geno = geno, G = G, P = P)
}
# Define four models
models <- list(
complete = list(a = 2, d = 2),
partial = list(a = 2, d = 1),
additive = list(a = 2, d = 0),
overdom = list(a = 1, d = 2)
)
# Simulate for each model
sim_data <- lapply(names(models), function(m) {
pars <- models[[m]]
df <- simulate_phenotypes(n, p, a = pars$a, d = pars$d, env_sd = 1)
df$model <- m
df
})
sim_data <- do.call(rbind, sim_data)
# Plot distributions of phenotypes for each model
par(mfrow = c(2, 2))
for (m in names(models)) {
df <- subset(sim_data, model == m)
hist(df$P, breaks = 40, freq = FALSE,
main = paste("Phenotype distribution:", m),
xlab = "Phenotype",
col = "grey80", border = "white")
lines(density(df$P), col = "black", lwd = 2)
}
par(mfrow = c(1, 1))
This single-locus framework generalizes to many loci in polygenic models, where the total mean and variance are sums of locus-specific contributions, and the same principles (mean, dominance, average effects, and their dependence on allele frequencies) scale up to the familiar tools of quantitative genetics (e.g., VA, VD, h², R = h²S).
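As a pointer toward those tools, the single-locus additive and dominance variances have standard closed forms under Hardy-Weinberg proportions, VA = 2pqα² and VD = (2pqd)², and together they account for the full genotypic variance at the locus. The sketch below checks this numerically for the partial-dominance values used in the simulation (p = 0.6, a = 2, d = 1). The variable names are illustrative assumptions, not part of the original code.

```r
# Illustrative values (partial dominance at p = 0.6)
p1 <- 0.6
q1 <- 1 - p1
a1 <- 2
d1 <- 1

alpha <- a1 + d1 * (q1 - p1)       # average effect of allele substitution
VA <- 2 * p1 * q1 * alpha^2        # additive genetic variance
VD <- (2 * p1 * q1 * d1)^2         # dominance variance

# Genotypic variance computed directly from frequencies and values
freq  <- c(p1^2, 2 * p1 * q1, q1^2)
value <- c(a1, d1, -a1)
VG <- sum(freq * value^2) - sum(freq * value)^2

c(VA = VA, VD = VD, VA_plus_VD = VA + VD, VG = VG)
```

With an environmental standard deviation of 1, as in the simulation, the environmental variance is 1, so the single-locus narrow-sense heritability under these assumptions would be VA / (VG + 1).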