Sewall Wright Bayes and Binomial Examples

Author

Dr Andrew Dalby

Bayes Rule as the Probability of Causes

I have been reading Sewall Wright’s Evolution and the Genetics of Populations (Wright 1984). It is a foundational work in population genetics, but Wright was also one of the first people to think about path analysis and causality.

When he looks at genotypes determined by many allelic sites, he introduces Bayes' rule as the probability of causes (vol. 1, p. 144).

\[ P_{A_{i}|B_{j}}=\dfrac{P_{A_{i}}P_{B_{j}|A_{i}}}{P_{B_{j}}}= \dfrac{(P_{A_{i}}P_{B_{j}|A_{i}})}{\sum_{i}(P_{A_{i}}P_{B_{j}|A_{i}})} \]

Here \(A_{i}\) denotes the set of alleles that make up a specific genotype and \(B_{j}\) is the phenotype caused by those alleles.

He gives an example for a single site D/d, considering the dominant individuals produced by crossing Dd x Dd. Among these, Dd and DD occur in a 2:1 ratio, but they cannot be distinguished by phenotype. If a dominant individual is then mated with a recessive (dd) partner, the two genotypes can be distinguished with increasing confidence as the number of dominant offspring increases. \(P_{n}\) is the probability of n dominants and no recessives.

Table of Bayesian Calculations

| Mating type \(m_{i}\) | \(P_{m_{i}}\) | \(P_{n \mid m_{i}}\) | \(P_{m_{i}}P_{n \mid m_{i}}\) | Posterior \(P_{m_{i} \mid n}\) |
|---|---|---|---|---|
| \(m_{1}= DD \times dd\) | \(1/3\) | \(1\) | \(1/3\) | \(2^{n-1}/(2^{n-1}+1)\) |
| \(m_{2}= Dd \times dd\) | \(2/3\) | \((1/2)^{n}\) | \(2/3(1/2)^{n}\) | \(1/(2^{n-1}+1)\) |
| Total | \(1\) | | \(P_{n}=\dfrac{1}{3} \bigg[\dfrac{2^{n-1}+1}{2^{n-1}} \bigg]\) | \(1\) |

As soon as you see one recessive offspring you know the parent cannot be DD, and the more dominant offspring you see without a recessive, the more likely it is that the parent is DD.
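To make the table concrete, here is a minimal sketch in R that applies Bayes' rule numerically and compares the result with the closed-form posterior for DD. The function name `posterior_DD` is my own and is only for illustration.

```r
# Posterior probability that a dominant-phenotype parent is DD (mating type m1)
# after n dominant and no recessive offspring from a test cross with dd.
posterior_DD <- function(n) {
  prior <- c(DD = 1/3, Dd = 2/3)        # priors among dominants from Dd x Dd
  likelihood <- c(DD = 1, Dd = 0.5^n)   # P(n dominants, no recessives | mating type)
  joint <- prior * likelihood           # numerator of Bayes' rule
  joint[["DD"]] / sum(joint)            # divide by P_n, the sum over mating types
}

n <- 1:10
post <- sapply(n, posterior_DD)
closed_form <- 2^(n - 1) / (2^(n - 1) + 1)  # posterior column of the table

data.frame(n = n, posterior = post, closed_form = closed_form)
```

With a single dominant offspring the two mating types are equally likely, and each further dominant offspring shifts the balance towards DD.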

Binomial Examples

Wright then gives the derivations of the summary statistics and the moments of the distributions in order to characterise actual biological distributions by showing that they have those properties. He also gives a demonstration of how properties approach the normal distribution when you have a large number of contributing sites (analogous to the Likert examples I have given before).

For the binomial case he used sets of 10 coins thrown 100 times by each of 166 students in his biometry class, giving 16,600 throws in total.

library("ggplot2")
library("e1071")

# Number of heads (0 to 10) in each throw of 10 coins, repeated by observed frequency
heads_freq <- c(11, 160, 736, 1923, 3419, 4145, 3364, 1878, 786, 156, 22)
coins <- data.frame(Heads = rep(0:10, times = heads_freq))
ggplot(coins, aes(x=Heads, y= after_stat(density)))+
  geom_histogram(fill="#560591",color="White", binwidth = 1)

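# Sample moments of the observed distribution.
# e1071::kurtosis() returns excess kurtosis (0 for a normal distribution).
m1 <- mean(coins$Heads)
v1 <- var(coins$Heads)
s1 <- skewness(coins$Heads)
k1 <- kurtosis(coins$Heads)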
Table of Actual and Expected Values

| Statistic | Observed | Expected |
|---|---|---|
| Mean | 5.0026506 | \(5.0000\pm 0.0123\) |
| Variance | 2.5041197 | \(2.5000 \pm 0.0260\) |
| Skewness | 0.0262008 | \(0 \pm 0.019\) |
| Kurtosis | -0.1911256 | \(-0.200 \pm 0.038\) |
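The expected column can be reproduced from the binomial moments for 10 coins with p = 0.5. The sketch below is my own reconstruction, not taken from Wright: it assumes the standard large-sample standard errors (\(\sqrt{6/N}\) for skewness and \(\sqrt{24/N}\) for kurtosis, and the usual formula for the variance of a sample variance), which agree with the quoted values to the precision shown.

```r
n <- 10; p <- 0.5; q <- 1 - p   # 10 coins per throw, fair coins
N <- 166 * 100                  # 16,600 throws in total

# Theoretical binomial moments (kurtosis is excess kurtosis)
expected <- c(
  mean     = n * p,
  variance = n * p * q,
  skewness = (q - p) / sqrt(n * p * q),
  kurtosis = (1 - 6 * p * q) / (n * p * q)
)

# Fourth central moment, needed for the standard error of the variance
mu4 <- expected[["variance"]]^2 * (3 + expected[["kurtosis"]])

# Assumed large-sample standard errors
std_error <- c(
  mean     = sqrt(expected[["variance"]] / N),
  variance = sqrt((mu4 - expected[["variance"]]^2 * (N - 3) / (N - 1)) / N),
  skewness = sqrt(6 / N),
  kurtosis = sqrt(24 / N)
)

round(cbind(expected, std_error), 4)
```

All four observed statistics fall within about one and a half of these standard errors of the expected values.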

The next set of data is biological backcross data from a paper by Detlefsen (1918).

Detlefsen Data

I have the original paper (Detlefsen 1918) and I think that Sewall Wright has slightly abused the data by using the same litters multiple times in his summary. The numbers of litters reported for sizes 5, 6 and 7 are 110, 127 and 103, while the tabulated totals are 330, 381 and 309. That is because each animal has three characteristics that are reported, each of which can be dominant or recessive, so every litter is counted three times.

Table of Data for Dominant Traits at Different Litter Sizes

Detlefsen (1918) Data Table

The dominant characters are Agouti, Dark Eye and Black, determined by the AaPpBb allelic sites. The crosses here are with aappbb recessives.

What Wright has done is total the numbers of dominants for each characteristic, where a pup might be dominant in one characteristic but not in another. My issue is that this is not a completely fair test, as the three sets of data are not guaranteed to be independent if there is any kind of linkage.
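The independence concern can be illustrated with a small simulation. This is only a sketch under assumptions of my own: two hypothetical linked loci in coupling phase, an arbitrary recombination fraction `r`, and an arbitrary number of simulated litters. It says nothing about whether Detlefsen's loci are actually linked.

```r
set.seed(1)
r <- 0.1            # hypothetical recombination fraction (assumption)
litter_size <- 5
n_litters <- 2000   # arbitrary number of simulated litters

# Count dominants for two traits in one backcross litter
simulate_litter <- function(size, r) {
  first  <- rbinom(size, 1, 0.5)                   # 1 = dominant for the first trait
  recomb <- rbinom(size, 1, r)                     # 1 = recombinant gamete
  second <- ifelse(recomb == 1, 1 - first, first)  # linked second trait
  c(first = sum(first), second = sum(second))
}

counts <- t(replicate(n_litters, simulate_litter(litter_size, r)))

colMeans(counts)                             # each trait still averages litter_size / 2
cor(counts[, "first"], counts[, "second"])   # but the two counts are strongly correlated
```

Each trait on its own still looks binomial, but the counts from the same litter move together, so pooling them overstates the effective sample size.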

library("ggplot2")
pheno1 <- c(rep(0,9),rep(1,47),rep(2,106),rep(3,103),rep(4,51),rep(5,14))
pheno1 <- as.data.frame(pheno1)
colnames(pheno1) <- c("Dominants")

df_dist1 <- data.frame(x=0:5, prob = dbinom(0:5, size=5, prob=0.5))


ggplot(pheno1, aes(x=Dominants, y= after_stat(density)))+
  geom_histogram(fill="#560591",color="White", binwidth = 1)+
  geom_line(data = df_dist1, aes(x = x, y = prob),
            color = "orange", linewidth=0.7)

pheno2 <- c(rep(0,3),rep(1,32),rep(2,99),rep(3,101),rep(4,98),rep(5,40),rep(6,8))
pheno2 <- as.data.frame(pheno2)
colnames(pheno2) <- c("Dominants")

df_dist2 <- data.frame(x=0:6, prob = dbinom(0:6, size=6, prob=0.5))


ggplot(pheno2, aes(x=Dominants, y= after_stat(density)))+
  geom_histogram(fill="#560591",color="White", binwidth = 1)+
  geom_line(data = df_dist2, aes(x = x, y = prob),
            color = "orange", linewidth=0.7)

pheno3 <- c(rep(0,2),rep(1,23),rep(2,55),rep(3,79),rep(4,85),rep(5,48),rep(6,16),rep(7,1))
pheno3 <- as.data.frame(pheno3)
colnames(pheno3) <- c("Dominants")

df_dist3 <- data.frame(x=0:7, prob = dbinom(0:7, size=7, prob=0.5))


ggplot(pheno3, aes(x=Dominants, y= after_stat(density)))+
  geom_histogram(fill="#560591",color="White", binwidth = 1)+
  geom_line(data = df_dist3, aes(x = x, y = prob),
            color = "orange", linewidth=0.7)
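As a numerical complement to the plots, the following sketch compares the observed mean and variance at each litter size with the binomial expectations \(np\) and \(npq\) for p = 0.5. The helper `compare_to_binomial` is a name I have made up for illustration, and the comparison treats the pooled counts as independent, which is exactly the assumption questioned above.

```r
# Observed vs expected mean and variance for counts of 0..size dominants
compare_to_binomial <- function(counts, size, p = 0.5) {
  x <- 0:size
  N <- sum(counts)
  obs_mean <- sum(x * counts) / N
  obs_var  <- sum(counts * (x - obs_mean)^2) / (N - 1)  # sample variance
  c(N = N,
    obs_mean = obs_mean, exp_mean = size * p,
    obs_var  = obs_var,  exp_var  = size * p * (1 - p))
}

# Frequencies of 0, 1, 2, ... dominants, as used for the plots
litters <- list(
  size5 = c(9, 47, 106, 103, 51, 14),
  size6 = c(3, 32, 99, 101, 98, 40, 8),
  size7 = c(2, 23, 55, 79, 85, 48, 16, 1)
)

summary_tab <- t(sapply(litters, function(cnt) compare_to_binomial(cnt, length(cnt) - 1)))
round(summary_tab, 3)
```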

References

Detlefsen, J. A. 1918. “Fluctuations of Sampling in a Mendelian Population.” Genetics 3 (6): 599. https://pmc.ncbi.nlm.nih.gov/articles/PMC1200451/.
Wright, Sewall. 1984. Evolution and the Genetics of Populations, Volume 1: Genetic and Biometric Foundations. University of Chicago Press. https://books.google.com/books?hl=en&lr=&id=4pTdTWi83ecC&oi=fnd&pg=PP7&dq=evolution+and+the+genetics+of+populations&ots=WV7VZEpbox&sig=LwrRUwUQZjP6HvtyLaQw7ac7QZ4.