1. Introduction and Quantitative-Genetic Context

In classical quantitative genetics, we often start from a single biallelic locus to formalize the relationship between genotype and phenotype. This is a minimal building block for understanding more complex, polygenic traits.

Consider a locus with alleles A (often referred to as the “dominant” allele) and a (the “recessive” allele). We assume:

Allele frequencies in the population:
- p = frequency of allele A
- q = 1 − p = frequency of allele a

Genotypes and Hardy–Weinberg frequencies:
- AA with frequency p²
- Aa with frequency 2pq
- aa with frequency q²

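We describe the genotypic values (e.g., for a quantitative trait such as height or yield) using two parameters:

a: the additive deviation of each homozygote from the midpoint between the two homozygotes (AA lies at +a, aa at −a). This parameter shares its symbol with the recessive allele a; context distinguishes the two.
d: the dominance deviation of the heterozygote from that same midpoint.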

A common parameterization of genotypic values is:

\[ \begin{aligned} G_{AA} &= +a \\ G_{Aa} &= d \\ G_{aa} &= -a \end{aligned} \]

Weighting each genotypic value by its Hardy–Weinberg frequency gives the population mean:

\[ \mu = p^2 a + 2pq\,d - q^2 a \]

which simplifies to μ = a(p − q) + 2pqd, because p² − q² = (p − q)(p + q) = p − q. The important point: a and d describe the pattern of gene action (additive vs dominance), and p, q describe population genetics (allele frequencies).
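As a quick check of this bookkeeping, the sketch below computes the mean directly as the sum of genotype frequency times genotypic value and compares it with the closed form a(p − q) + 2pqd. The helper name `locus_mean` and the example values are illustrative assumptions, not part of the original code.

```r
# Hypothetical helper: population mean at one biallelic locus under
# Hardy-Weinberg proportions, with genotypic values AA = +a, Aa = d, aa = -a.
locus_mean <- function(p, a, d) {
  q <- 1 - p
  freq  <- c(AA = p^2, Aa = 2 * p * q, aa = q^2)  # genotype frequencies
  value <- c(AA = a,   Aa = d,         aa = -a)   # genotypic values
  sum(freq * value)                               # E(G) = sum(freq * value)
}

p <- 0.7; a <- 2; d <- 2          # illustrative values
mu_sum    <- locus_mean(p, a, d)
mu_closed <- a * (p - (1 - p)) + 2 * p * (1 - p) * d
all.equal(mu_sum, mu_closed)      # the two forms agree
```

Writing the frequencies and values as named vectors keeps the correspondence between genotype and value explicit, which is easy to lose in the expanded formula.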

Below, we will:

Compute a population mean from a single locus.
Explore how the population mean changes with allele frequency under different dominance scenarios.
Explore the Fisherian average effect of allele substitution (often denoted α), which connects the genetic model to the additive genetic variance underlying heritability.

2. A Single Example: Population Mean for Fixed Allele Frequencies

Here we compute the population mean phenotype given fixed allele frequencies and genotypic values.

Two features of the formula are worth keeping in mind when reading the result:

When dominance is present (d ≠ 0), the population mean depends nonlinearly on allele frequency, because the heterozygote frequency 2pq enters the expression.
In a purely additive model (no dominance, d = 0), the mean is a linear function of allele frequency.
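Both statements can be read off the derivative of the mean with respect to p. The short derivation below is a sketch that assumes Hardy–Weinberg proportions and substitutes q = 1 − p:

\[ \begin{aligned} \mu &= a(p - q) + 2pq\,d \\ \frac{\mathrm{d}\mu}{\mathrm{d}p} &= 2a + 2d(1 - 2p) = 2\left[a + d(q - p)\right] \end{aligned} \]

With d = 0 the slope is the constant 2a, so the mean is a straight line in p. With d ≠ 0 the slope itself changes with p, which is the source of the curvature.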

knitr::opts_chunk$set(echo = TRUE, fig.align = "center")

# Fixed allele frequencies and genotypic values

p <- 0.7          # frequency of allele A (the allele labeled 'dominant')
q <- 1 - p        # frequency of allele a; equals 0.3 here

d <- 2            # genotypic value of the heterozygote Aa (deviation from the homozygote midpoint)
a <- 2            # AA has genotypic value +a and aa has -a (deviation from the midpoint)

# Population mean:
# mu = (p^2)*a + 2*p*q*d - (q^2)*a
# This is the same structural form as E(G) = Σ (genotype frequency × genotype value),
# given the chosen coding for genotypic values.

pop.mean <- (p^2)*a + 2*p*q*d - (q^2)*a
pop.mean

## [1] 1.64

3. Population Mean as a Function of Allele Frequency under Different Gene-Action Models

We now examine how the population mean changes as the allele frequency p varies from 0 to 1, under several standard dominance models:

Complete dominance: a = d
Partial dominance: a > d > 0
Additive (no dominance): d = 0
Overdominance: d > a > 0

These scenarios correspond to different shapes of the genotype–phenotype relationship:

In additive models, Aa is exactly midway between AA and aa.
In complete dominance, the heterozygote has the same genotypic value as the AA homozygote (d = a).
In overdominance, the heterozygote exceeds both homozygotes (e.g., heterosis).
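The four scenarios differ only in where d falls relative to a, so they can be told apart with a small classifier. The sketch below is illustrative: the function name `classify_gene_action` is hypothetical, and it assumes a > 0 and d ≥ 0 as in the list above (negative d, i.e. dominance toward the a allele, is not handled).

```r
# Hypothetical helper: label the gene action implied by a and d,
# assuming a > 0 and d >= 0 as in the four scenarios listed above.
classify_gene_action <- function(a, d) {
  stopifnot(a > 0, d >= 0)
  if (d == 0) {
    "additive"
  } else if (d < a) {
    "partial dominance"
  } else if (d == a) {
    "complete dominance"
  } else {
    "overdominance"
  }
}

# The four (a, d) pairs used in the plots that follow
sapply(list(complete = c(2, 2), partial = c(2, 1),
            additive = c(2, 0), overdom = c(1, 2)),
       function(x) classify_gene_action(a = x[1], d = x[2]))
```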

# Sequence of allele frequencies p
p <- seq(0, 1, length = 101)
q <- 1 - p

# 3.1 Complete dominance model (a = d)
d_dom <- 2
a_dom <- 2
pop.mean.dom <- (p^2)*a_dom + 2*p*q*d_dom - (q^2)*a_dom

# Plot population mean under complete dominance
plot(p, pop.mean.dom,
     xlab = "Allele frequency p",
     ylab = "Expected population mean",
     type = "l", lwd = 3, col = "black",
     ylim = c(-3, 3))

# Shade region where q > p (i.e., p < 0.5)
rect(xleft = -0.1, ybottom = -3, xright = 0.5, ytop = 3, col = "grey90", border = NA)

# Redraw the dominance model line on top of the shaded region
lines(p, pop.mean.dom, lwd = 3, col = "black")

# 3.2 Partial dominance model (a > d)
d_part <- 1
a_part <- 2
pop.mean.part <- (p^2)*a_part + 2*p*q*d_part - (q^2)*a_part
lines(p, pop.mean.part, lty = 2, lwd = 3, col = "red")

# 3.3 Additive (no dominance) model (d = 0)
d_add <- 0
a_add <- 2
pop.mean.add <- (p^2)*a_add + 2*p*q*d_add - (q^2)*a_add
lines(p, pop.mean.add, lty = 3, lwd = 3, col = "darkgreen")

# 3.4 Overdominance model (d > a)
d_over <- 2
a_over <- 1
pop.mean.over <- (p^2)*a_over + 2*p*q*d_over - (q^2)*a_over
lines(p, pop.mean.over, lty = 4, lwd = 3, col = "blue")

# Reference lines at p = 0.5 and mean = 0
lines(c(-0.1, 1.1), c(0, 0), lwd = 0.5, lty = 3, col = "grey")
lines(c(0.5, 0.5), c(-3, 3), lwd = 0.5, lty = 3, col = "grey")

# Each model gets its own line type so the curves stay distinguishable without colour
legend("topleft",
       legend = c("Complete dominance (a = d)",
                  "Partial dominance (a > d)",
                  "Additive (d = 0)",
                  "Overdominance (d > a)"),
       col = c("black", "red", "darkgreen", "blue"),
       lwd = 3, lty = c(1, 2, 3, 4), bty = "n")

Interpretation

Additive model (green line): the mean is a linear function of p with constant slope 2a, because with d = 0 the mean reduces to a(p − q).
Complete dominance (black line): the curve is concave. The mean rises steeply at low p, where most A alleles occur in heterozygotes that already express the full value +a, and flattens as p approaches 1.
Partial dominance (red line): curvature intermediate between the additive and complete-dominance curves.
Overdominance (blue line): the mean peaks at an intermediate allele frequency (p = 0.75 for the plotted a = 1, d = 2) and is lower when either allele is fixed.

These patterns are foundational for understanding how allele frequency changes (evolution) interact with gene action to change population means.
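The location of the overdominance peak can be checked numerically. Setting the derivative of the mean to zero gives an interior maximum at p = (a + d) / (2d) whenever d > a > 0; the sketch below compares that value with a numerical search using base R's `optimize()`. The function name `locus_mean` is an illustrative assumption.

```r
# Population mean at one locus as a function of p (Hardy-Weinberg assumed)
locus_mean <- function(p, a, d) {
  q <- 1 - p
  p^2 * a + 2 * p * q * d - q^2 * a
}

a <- 1; d <- 2                        # overdominance values from the plot above
p_star <- (a + d) / (2 * d)           # analytic maximiser

# Numerical maximisation over 0 <= p <= 1
opt <- optimize(locus_mean, interval = c(0, 1), a = a, d = d, maximum = TRUE)

c(analytic = p_star, numeric = opt$maximum)
locus_mean(p_star, a, d)              # mean at the maximum
```

For d ≤ a the analytic value is at or beyond 1, so the mean is highest at fixation of A and there is no interior peak.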

4. Extra Visualization: Genotypic Values vs Allele Frequency

To better connect the mean curves to the actual genotype-specific values, we can plot the genotype means and the population mean together for one scenario.

This figure visually separates two things:

The fixed genotypic values (G_AA, G_Aa, G_aa), which are determined by a and d and do not depend on p.
The population mean, which is a frequency-weighted average of these genotypic values and therefore varies with p.

p <- seq(0, 1, length = 101)
q <- 1 - p

# Choose a model, e.g., partial dominance
a <- 2
d <- 1

# Genotypic values for AA, Aa, aa under the usual (a, d) interpretation:
G_AA <- a
G_Aa <- d
G_aa <- -a

# Population mean (same formula as in Sections 2 and 3)
pop.mean <- (p^2)*a + 2*p*q*d - (q^2)*a

plot(p, pop.mean, type = "l", lwd = 3, col = "red",
     ylim = c(-2.5, 2.5),
     xlab = "Allele frequency p",
     ylab = "Genotypic values / population mean")

abline(h = G_AA, col = "blue", lty = 2, lwd = 2)
abline(h = G_Aa, col = "purple", lty = 2, lwd = 2)
abline(h = G_aa, col = "darkgreen", lty = 2, lwd = 2)

legend("topleft",
       legend = c("Population mean", "G(AA)", "G(Aa)", "G(aa)"),
       col = c("red", "blue", "purple", "darkgreen"),
       lwd = c(3, 2, 2, 2), lty = c(1, 2, 2, 2), bty = "n")

5. Average Effect of Allele Substitution (Fisher’s α)

The average effect of allele substitution (often denoted α) captures the expected change in genotypic value when an a allele chosen at random from the population is replaced by A, given the current genotype frequencies.

For a biallelic locus, Fisher showed that the average effect of substituting allele A for a can be written (under a particular parameterization of a and d) as:

\[ \alpha = a + d (q - p) \]
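An equivalent way to view α is as the slope of the frequency-weighted least-squares regression of genotypic value on the number of A alleles carried. The sketch below checks this numerically with `lm()`; the allele frequency and the (a, d) values are illustrative assumptions.

```r
p <- 0.6; q <- 1 - p        # illustrative allele frequencies
a <- 2;   d <- 1            # illustrative partial-dominance values

n_A  <- c(0, 1, 2)                  # copies of allele A in aa, Aa, AA
G    <- c(-a, d, a)                 # genotypic values in the same order
freq <- c(q^2, 2 * p * q, p^2)      # Hardy-Weinberg frequencies

# Frequency-weighted least-squares regression of G on allele count
fit <- lm(G ~ n_A, weights = freq)

alpha_regression <- unname(coef(fit)["n_A"])
alpha_formula    <- a + d * (q - p)
c(regression = alpha_regression, formula = alpha_formula)
```

The regression view makes clear why α depends on p when d ≠ 0: the weights change with allele frequency, so the best-fitting line through the three genotypic values tilts.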

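In the code below, the plotted quantity (stored as B.effect) is α scaled by q:

\[ \text{B.effect} = q \cdot \left(a + d(q - p)\right) \]

In the standard notation this is the average effect of the individual allele A, α₁ = qα: the mean deviation from the population mean of individuals that received A from one parent. The corresponding average effect of allele a is α₂ = −pα, so that α = α₁ − α₂. The variable name B.effect is kept from the code; in this document's notation the allele in question is A.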
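The expression q(a + d(q − p)) can be recovered from first principles: take the mean genotypic value of individuals that received allele A from one parent (with the partner allele drawn at random), and subtract the population mean. The sketch below does this for both alleles; the numerical values and variable names are illustrative assumptions.

```r
p <- 0.6; q <- 1 - p
a <- 2;   d <- 1

mu <- a * (p - q) + 2 * p * q * d     # population mean

# An A gamete meets A with probability p (giving AA) or a with probability q (giving Aa)
mean_given_A <- p * a + q * d
# An a gamete meets A with probability p (giving Aa) or a with probability q (giving aa)
mean_given_a <- p * d + q * (-a)

effect_A <- mean_given_A - mu         # equals  q * (a + d * (q - p))
effect_a <- mean_given_a - mu         # equals -p * (a + d * (q - p))

alpha <- a + d * (q - p)
all.equal(effect_A,  q * alpha)
all.equal(effect_a, -p * alpha)
all.equal(effect_A - effect_a, alpha) # the substitution effect is the difference
```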

# Average effect for different dominance models

p <- seq(0, 1, length = 101)
q <- 1 - p

# Complete dominance model (a = d)
d_dom <- 2
a_dom <- 2
B.effect.D <- q * (a_dom + d_dom * (q - p))

plot(p, B.effect.D, xlab = "Allele frequency p",
     ylab = "Average effect (B.effect)",
     type = "l", lwd = 3, col = "black",
     ylim = c(-3, 5))

# Shade region for q > p (p < 0.5)
rect(-0.1, -3, 0.5, 5, col = "grey90", border = NA)

# Redraw the complete dominance line
lines(p, B.effect.D, lwd = 3, col = "black")

# Partial dominance model (a > d)
d_part <- 1
a_part <- 2
B.effect.P <- q * (a_part + d_part * (q - p))
lines(p, B.effect.P, lty = 2, lwd = 3, col = "red")

# Additive model (d = 0)
d_add <- 0
a_add <- 2
B.effect.A <- q * (a_add + d_add * (q - p))
lines(p, B.effect.A, lty = 3, lwd = 3, col = "darkgreen")

# Overdominance model (d > a)
d_over <- 2
a_over <- 1
B.effect.O <- q * (a_over + d_over * (q - p))
lines(p, B.effect.O, lty = 4, lwd = 3, col = "blue")

# Each model gets its own line type so the curves stay distinguishable without colour
legend("topleft",
       legend = c("Complete dominance (a = d)",
                  "Partial dominance (a > d)",
                  "Additive (d = 0)",
                  "Overdominance (d > a)"),
       col = c("black", "red", "darkgreen", "blue"),
       lwd = 3, lty = c(1, 2, 3, 4), bty = "n")

Interpretation

When dominance is absent (d = 0), α = a at every allele frequency: the substitution effect does not depend on p. The plotted quantity B.effect = qα then falls linearly from a at p = 0 to 0 at p = 1, purely because of the factor q.
When dominance is present, α = a + d(q − p) depends on the allele frequencies. This makes the mapping from allele substitution to change in mean context-dependent: the same allelic substitution has different expected effects in populations with different allele frequencies.

The average effect is crucial because:

The additive genetic variance contributed by the locus is V_A = 2pqα² (under Hardy–Weinberg proportions and the standard parameterization).
In polygenic models, the breeder’s equation R = h²S implicitly relies on the notion of average effects rather than raw genotypic values.
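To make the link to variance components concrete, the sketch below computes the total genotypic variance at one locus directly from the genotype frequencies and checks that it splits into V_A = 2pqα² plus the standard single-locus dominance variance V_D = (2pqd)². The parameter values are illustrative, and the environmental variance V_E = 1 used for the heritability line is a purely hypothetical choice.

```r
p <- 0.6; q <- 1 - p
a <- 2;   d <- 1

freq <- c(p^2, 2 * p * q, q^2)        # AA, Aa, aa
G    <- c(a, d, -a)

mu  <- sum(freq * G)
V_G <- sum(freq * (G - mu)^2)         # total genotypic variance

alpha <- a + d * (q - p)
V_A <- 2 * p * q * alpha^2            # additive variance
V_D <- (2 * p * q * d)^2              # dominance variance

c(V_G = V_G, V_A_plus_V_D = V_A + V_D)

# Narrow-sense heritability, assuming (hypothetically) V_E = 1
V_E <- 1
h2  <- V_A / (V_G + V_E)
h2
```

Only V_A enters the numerator of h², which is why the response to selection is governed by average effects and not by the raw genotypic values.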

6. Additional Plot: Phenotypic Distributions under Different Gene Actions

To link this single-locus theory to phenotypic distributions, we can simulate a population, assume some environmental noise, and visualize the phenotype distributions under different gene-action models at a fixed allele frequency.

This illustrates how dominance patterns, even at a single locus, can affect the shape and mean of the phenotypic distribution.

set.seed(123)

n <- 10000
p <- 0.6
q <- 1 - p

# Function to simulate phenotypes for given a, d
simulate_phenotypes <- function(n, p, a, d, env_sd = 1) {
  # Genotype frequencies under Hardy-Weinberg
  geno <- sample(c("AA", "Aa", "aa"), size = n, replace = TRUE,
                 prob = c(p^2, 2*p*(1-p), (1-p)^2))
  
  # Genotypic values under the (a, d) scheme:
  G <- ifelse(geno == "AA",  a,
       ifelse(geno == "Aa",  d,
              -a))
  
  # Add environmental noise (normally distributed)
  P <- G + rnorm(n, mean = 0, sd = env_sd)
  data.frame(geno = geno, G = G, P = P)
}

# Define four models
models <- list(
  complete = list(a = 2, d = 2),
  partial  = list(a = 2, d = 1),
  additive = list(a = 2, d = 0),
  overdom  = list(a = 1, d = 2)
)

# Simulate for each model
sim_data <- lapply(names(models), function(m) {
  pars <- models[[m]]
  df <- simulate_phenotypes(n, p, a = pars$a, d = pars$d, env_sd = 1)
  df$model <- m
  df
})

sim_data <- do.call(rbind, sim_data)

# Plot distributions of phenotypes for each model
par(mfrow = c(2, 2))
for (m in names(models)) {
  df <- subset(sim_data, model == m)
  hist(df$P, breaks = 40, freq = FALSE,
       main = paste("Phenotype distribution:", m),
       xlab = "Phenotype",
       col = "grey80", border = "white")
  lines(density(df$P), col = "black", lwd = 2)
}

par(mfrow = c(1, 1))

Interpretation

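Each distribution is a mixture of normal components, one per distinct genotypic value, with mixing weights equal to the genotype frequencies. Whether it looks approximately normal depends on how far apart the genotypic values are relative to the environmental standard deviation; with a = d = 2 and env_sd = 1, the complete-dominance panel is visibly bimodal.
Overdominance places the most common genotype (Aa, with frequency 2pq = 0.48 at p = 0.6) at the highest genotypic value, which shifts mass toward the upper end of the distribution and raises the mean relative to a model with the same a but smaller d.
In the purely additive model the genotypic values (−a, 0, +a) are symmetric around 0, but the genotype frequencies are symmetric only at p = 0.5. At p = 0.6 the mixture is therefore mildly skewed, and any departure from normality comes from the discrete genotype classes, not from the normally distributed environmental noise.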
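A simple sanity check on such a simulation is to compare the sample mean of the phenotypes with the theoretical mean a(p − q) + 2pqd, since the environmental noise has mean zero. The sketch below does this for the complete-dominance values; it is a self-contained re-implementation with an arbitrary seed, not the simulation function used above.

```r
set.seed(1)                 # arbitrary seed for reproducibility
n <- 10000; p <- 0.6; q <- 1 - p
a <- 2; d <- 2; env_sd <- 1 # complete-dominance values used above

geno <- sample(c("AA", "Aa", "aa"), size = n, replace = TRUE,
               prob = c(p^2, 2 * p * q, q^2))
G <- unname(c(AA = a, Aa = d, aa = -a)[geno])  # look up genotypic value by name
P <- G + rnorm(n, mean = 0, sd = env_sd)       # add environmental noise

mu_theory <- a * (p - q) + 2 * p * q * d
c(simulated = mean(P), theory = mu_theory)

# Under complete dominance AA and Aa share one genotypic value, so the
# phenotype distribution is a two-component mixture centred at +a and -a.
table(G)
```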

7. Summary

The population mean for a single locus depends on allele frequencies and the pattern of gene action (additive vs dominance).
Different dominance scenarios produce distinct relationships between allele frequency and mean phenotype: linear in p when d = 0, and curved (quadratic in p) when d ≠ 0.
The average effect of allele substitution formalizes how a change in allele frequency translates into a change in population mean and lies at the heart of the additive genetic variance and response to selection.
Even with a single locus, dominance can noticeably alter the phenotypic distribution, most visibly when the environmental variance is small relative to the differences between genotypic values.

This single-locus framework generalizes to many loci in polygenic models, where the total mean and variance are sums of locus-specific contributions, and the same principles (mean, dominance, average effects, and their dependence on allele frequencies) scale up to the familiar tools of quantitative genetics (e.g., V_A, V_D, h², R = h²S).
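As a minimal illustration of this additivity across loci, the sketch below sums the per-locus mean, V_A, and V_D over three loci. It assumes no epistasis and linkage equilibrium, and the allele frequencies and (a, d) values are hypothetical numbers chosen only for the example.

```r
# Hypothetical parameters for three independent loci
loci <- data.frame(
  p = c(0.7, 0.5, 0.2),
  a = c(2.0, 1.0, 0.5),
  d = c(2.0, 0.0, 1.0)
)
loci$q     <- 1 - loci$p
loci$mu    <- with(loci, a * (p - q) + 2 * p * q * d)   # per-locus mean
loci$alpha <- with(loci, a + d * (q - p))               # per-locus average effect
loci$V_A   <- with(loci, 2 * p * q * alpha^2)           # per-locus additive variance
loci$V_D   <- with(loci, (2 * p * q * d)^2)             # per-locus dominance variance

# Totals across loci (valid only under the stated assumptions)
totals <- colSums(loci[, c("mu", "V_A", "V_D")])
totals
```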

PHENOTYPIC DISTRIBUTIONS BASED ON THE AMOUNTS OF GENES INVOLVED

GENE643/SCSC643
