Suppose a general population splits into two subpopulations, which I will
refer to as Mac and Don, making up proportions s and 1-s of the
population, respectively.
Observe: s, 1-s ∈ (0,1).
Mac and Don are each in Hardy-Weinberg equilibrium (HWE), meaning there is random mating, no inbreeding, infinite population size, discrete generations, equivalent allele frequencies in males and females, and no mutation, migration, or selection. The frequencies of allele A in Mac and Don are \(p_1\) and \(p_2\), respectively. We are interested in sampling 100 individuals from this population to compare the observed overall population allele frequency to the expected overall population allele frequency \(\hat{p}\).
For each sample, we need to set random values of s, \(p_1\), and \(p_2\) that fall within [0,1]. Then, we will
randomly sample 100 individuals from the population and estimate the
frequency of allele A. For a sample of n individuals, the estimate is
\(\hat{p} = \frac{2n_{AA} + n_{Aa}}{2n}\). We can then use \(\hat{p}\) to generate the expected HWE proportions
for genotypes AA, Aa, and aa: \(\hat{p}^2\), \(2\hat{p}(1-\hat{p})\), and \((1-\hat{p})^2\).
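As a minimal sketch of the two formulas above (not necessarily the code used for this report), the following Python computes \(\hat{p}\) from genotype counts and then the HWE genotype proportions it implies. The function names are hypothetical, and the counts are illustrative: they correspond to observed frequencies of 0.120, 0.420, and 0.460 in a sample of 100.

```python
def allele_frequency(n_AA, n_Aa, n_aa):
    """Estimate the frequency of allele A from genotype counts."""
    n = n_AA + n_Aa + n_aa
    return (2 * n_AA + n_Aa) / (2 * n)


def hwe_expected(p_hat):
    """Genotype proportions expected under HWE for allele frequency p_hat."""
    q_hat = 1 - p_hat
    return {"AA": p_hat ** 2, "Aa": 2 * p_hat * q_hat, "aa": q_hat ** 2}


# Illustrative counts for a sample of 100 individuals.
p_hat = allele_frequency(n_AA=12, n_Aa=42, n_aa=46)
expected = hwe_expected(p_hat)

print(f"p_hat = {p_hat:.3f}")
for genotype, freq in expected.items():
    print(f"{genotype}: {freq:.3f}")
```

The expected proportions always sum to 1, since \(\hat{p}^2 + 2\hat{p}(1-\hat{p}) + (1-\hat{p})^2 = 1\).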
With Mac and Don composing our population, the overall allele frequency can be split into a sum of the two subpopulations' contributions, \(p = s\,p_1 + (1-s)\,p_2\): Mac contributes \(s\,\frac{2n_{AA} + n_{Aa}}{2n}\) and Don contributes \((1-s)\,\frac{2n_{AA} + n_{Aa}}{2n}\), with the genotype counts and n taken within each subpopulation.
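The sampling procedure described above can be sketched as follows. This is one possible implementation under stated assumptions, not the code behind the tables: each individual is assigned to Mac with probability s, and a genotype is drawn as two independent alleles (which is what HWE within a subpopulation implies). The names `sample_genotype` and `simulate_sample` are hypothetical.

```python
import random


def sample_genotype(p, rng):
    """Draw one genotype under HWE: two independent alleles, each A with probability p."""
    n_A = (rng.random() < p) + (rng.random() < p)
    return ("aa", "Aa", "AA")[n_A]


def simulate_sample(s, p1, p2, n=100, seed=None):
    """Sample n individuals from a Mac/Don mixture and summarise genotype frequencies."""
    rng = random.Random(seed)
    counts = {"AA": 0, "Aa": 0, "aa": 0}
    for _ in range(n):
        # Assumption: an individual belongs to Mac with probability s, else to Don.
        p = p1 if rng.random() < s else p2
        counts[sample_genotype(p, rng)] += 1

    observed = {g: c / n for g, c in counts.items()}
    p_hat = (2 * counts["AA"] + counts["Aa"]) / (2 * n)
    expected = {
        "AA": p_hat ** 2,
        "Aa": 2 * p_hat * (1 - p_hat),
        "aa": (1 - p_hat) ** 2,
    }
    return observed, expected, p_hat


observed, expected, p_hat = simulate_sample(s=0.5, p1=0.3, p2=0.8, seed=42)
print("observed:", observed)
print("expected:", {g: round(f, 3) for g, f in expected.items()})
print("p_hat:", p_hat)
```

Because the expected proportions are computed from the sample's own \(\hat{p}\), any gap between observed and expected reflects both sampling noise and the population structure, not a difference in allele frequency.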
Observed and Expected Genotype Frequencies (s = 0.5, p1 = 0.3, p2 = 0.3)

| Genotype | Observed Frequency | Expected HWE Frequency |
|---|---|---|
| AA | 0.120 | 0.109 |
| Aa | 0.420 | 0.442 |
| aa | 0.460 | 0.449 |

Overall allele frequency (p̂): 0.33

Observed and Expected Genotype Frequencies (s = 0.5, p1 = 0.3, p2 = 0.8)

| Genotype | Observed Frequency | Expected HWE Frequency |
|---|---|---|
| AA | 0.320 | 0.265 |
| Aa | 0.390 | 0.500 |
| aa | 0.290 | 0.235 |

Overall allele frequency (p̂): 0.515

Observed and Expected Genotype Frequencies (s = 0.5, p1 = 0.5, p2 = 0.3)

| Genotype | Observed Frequency | Expected HWE Frequency |
|---|---|---|
| AA | 0.190 | 0.144 |
| Aa | 0.380 | 0.471 |
| aa | 0.430 | 0.384 |

Overall allele frequency (p̂): 0.38

Observed and Expected Genotype Frequencies (s = 0.5, p1 = 0.5, p2 = 0.8)

| Genotype | Observed Frequency | Expected HWE Frequency |
|---|---|---|
| AA | 0.470 | 0.442 |
| Aa | 0.390 | 0.446 |
| aa | 0.140 | 0.112 |

Overall allele frequency (p̂): 0.665

Observed and Expected Genotype Frequencies (s = 0.7, p1 = 0.3, p2 = 0.3)

| Genotype | Observed Frequency | Expected HWE Frequency |
|---|---|---|
| AA | 0.110 | 0.081 |
| Aa | 0.350 | 0.408 |
| aa | 0.540 | 0.511 |

Overall allele frequency (p̂): 0.285

Observed and Expected Genotype Frequencies (s = 0.7, p1 = 0.3, p2 = 0.8)

| Genotype | Observed Frequency | Expected HWE Frequency |
|---|---|---|
| AA | 0.240 | 0.226 |
| Aa | 0.470 | 0.499 |
| aa | 0.290 | 0.276 |

Overall allele frequency (p̂): 0.475

Observed and Expected Genotype Frequencies (s = 0.7, p1 = 0.5, p2 = 0.3)

| Genotype | Observed Frequency | Expected HWE Frequency |
|---|---|---|
| AA | 0.140 | 0.130 |
| Aa | 0.440 | 0.461 |
| aa | 0.420 | 0.410 |

Overall allele frequency (p̂): 0.36

Observed and Expected Genotype Frequencies (s = 0.7, p1 = 0.5, p2 = 0.8)

| Genotype | Observed Frequency | Expected HWE Frequency |
|---|---|---|
| AA | 0.350 | 0.348 |
| Aa | 0.480 | 0.484 |
| aa | 0.170 | 0.168 |

Overall allele frequency (p̂): 0.59
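To put the eight observed values of p̂ next to what the mixture would give in expectation, the short sketch below uses the standard mixture identity \(p = s\,p_1 + (1-s)\,p_2\). The observed values are copied from the tables above; the helper name `mixture_allele_frequency` is hypothetical.

```python
def mixture_allele_frequency(s, p1, p2):
    """Expected overall frequency of allele A in a two-subpopulation mixture."""
    return s * p1 + (1 - s) * p2


# (s, p1, p2, observed p-hat), copied from the eight tables above.
configs = [
    (0.5, 0.3, 0.3, 0.33),
    (0.5, 0.3, 0.8, 0.515),
    (0.5, 0.5, 0.3, 0.38),
    (0.5, 0.5, 0.8, 0.665),
    (0.7, 0.3, 0.3, 0.285),
    (0.7, 0.3, 0.8, 0.475),
    (0.7, 0.5, 0.3, 0.36),
    (0.7, 0.5, 0.8, 0.59),
]

rows = []
for s, p1, p2, p_obs in configs:
    p_exp = mixture_allele_frequency(s, p1, p2)
    rows.append((s, p1, p2, p_obs, p_exp, p_obs - p_exp))
    print(f"s={s} p1={p1} p2={p2} "
          f"observed={p_obs:.3f} expected={p_exp:.3f} diff={p_obs - p_exp:+.3f}")
```

With only 100 individuals per sample, differences of a few hundredths between the observed and expected allele frequency are within ordinary sampling variation.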
In simple cases, the recurrence risk ratio depends on the degree of
relatedness of the two relatives, the underlying genetic model, and the
disease allele frequency (p). For siblings, the expected value used here is
\[E[\lambda_s]=\frac{1}{4} + \frac{1}{2p} + \frac{1}{4p^2}\]
\(E[\lambda_s | p=0.01] = \frac{1}{4} +
\frac{1}{2(0.01)} + \frac{1}{4(0.01)^2}=2550.25\)
\(E[\lambda_s | p=0.1] = \frac{1}{4} +
\frac{1}{2(0.1)} + \frac{1}{4(0.1)^2}=30.25\)
\(E[\lambda_s | p=0.25] = \frac{1}{4} +
\frac{1}{2(0.25)} + \frac{1}{4(0.25)^2}=6.25\)
\(E[\lambda_s | p=0.5] = \frac{1}{4} +
\frac{1}{2(0.5)} + \frac{1}{4(0.5)^2}= 2.25\)
\(E[\lambda_s | p=0.75] = \frac{1}{4} +
\frac{1}{2(0.75)} + \frac{1}{4(0.75)^2}= 1.361\)
\(E[\lambda_s | p=0.9] = \frac{1}{4} +
\frac{1}{2(0.9)} + \frac{1}{4(0.9)^2}= 1.114\)
\(E[\lambda_s | p=0.99] = \frac{1}{4} +
\frac{1}{2(0.99)} + \frac{1}{4(0.99)^2}= 1.010\)
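The arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the closed-form expression given earlier; the function name `expected_lambda_s` is hypothetical.

```python
def expected_lambda_s(p):
    """Evaluate E[lambda_s] = 1/4 + 1/(2p) + 1/(4p^2) for allele frequency p."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return 0.25 + 1 / (2 * p) + 1 / (4 * p ** 2)


for p in (0.01, 0.1, 0.25, 0.5, 0.75, 0.9, 0.99):
    print(f"p = {p:<4}  E[lambda_s] = {expected_lambda_s(p):.3f}")
```

The same expression can be written as \(\left(\frac{1+p}{2p}\right)^2\), which makes it easy to see that it never falls below 1 for p in (0, 1].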
As the disease allele frequency p approaches 1, \(\lambda_s\) approaches 1. Meanwhile, as p
approaches 0, \(\lambda_s\) approaches
infinity. In other words, the recurrence risk ratio converges to 1 as
the disease allele frequency increases and overtakes the non-disease
allele. This makes sense, because with a high disease allele frequency
the prevalence of disease among family members becomes equivalent to
the overall population disease prevalence. This means that the
recurrence risk ratio more powerfully confirms the existence of disease
susceptibility loci (DSLs) for monogenic diseases with very small minor
allele frequencies and small disease prevalence.
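The limiting behaviour described above follows directly from the formula. As a short worked check, the expression factors into a perfect square, from which both limits and the monotone decrease can be read off:

\[E[\lambda_s] = \frac{1}{4} + \frac{1}{2p} + \frac{1}{4p^2} = \left(\frac{1+p}{2p}\right)^2\]

\[\lim_{p \to 1} E[\lambda_s] = 1, \qquad \lim_{p \to 0^+} E[\lambda_s] = \infty, \qquad \frac{d}{dp}E[\lambda_s] = -\frac{1}{2p^2} - \frac{1}{2p^3} < 0 \ \text{for } p \in (0,1]\]

Since the derivative is negative on the whole interval, \(E[\lambda_s]\) decreases strictly toward its minimum of 1 at p = 1.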
We can compare these expected values to the observed \(\lambda_s\), computed using \(\lambda_s = P(Y_1 = 1,Y_2 = 1)/P(Y_1 =
1)^2\), where \(Y_1\) and \(Y_2\) indicate the disease status of the two siblings and \(K = P(Y_1 = 1)\) is the disease prevalence in the population. We
can also estimate the recurrence risk ratio with \(\hat{\lambda}_s=
\frac{s_{case}}{\hat{K}}\).
## p observed_lambda_s expected_lambda_s
## 1 0.1 15.0000000 30.250000
## 2 0.2 7.8125000 9.000000
## 3 0.3 7.0493827 4.694444
## 4 0.4 4.5742187 3.062500
## 5 0.5 3.2336000 2.250000
## 6 0.6 2.3248457 1.777778
## 7 0.7 1.5743440 1.474490
## 8 0.8 1.0788574 1.265625
## 9 0.9 0.7297668 1.114198
We can see that the expected and observed \(\lambda_s\) differ, in places substantially (15.0 versus 30.25 at p = 0.1), but the trend is the same: as p increases from 0 to 1, \(\lambda_s\) decreases monotonically. I believe I have an issue with my computation of the observed recurrence risk ratio, since ideally it should not dip below 1, as it does at p = 0.9 (0.73). A value below 1 would imply that the population disease prevalence is greater than the probability that a full sibling has the disease given that their sibling has it, which is not logical.
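One way to cross-check the observed values is to apply the ratio \(P(Y_1 = 1, Y_2 = 1)/P(Y_1 = 1)^2\) to simulated sib pairs. The sketch below is not the computation used for the table above. It assumes a fully penetrant recessive disease with parents drawn under HWE, a model under which the sibling ratio works out to \(\left(\frac{1+p}{2p}\right)^2\), the same value as the expected formula used earlier. The function names and the choice to pool both siblings when estimating prevalence are illustrative assumptions.

```python
import random


def observed_lambda_s(pairs):
    """Estimate lambda_s = P(Y1=1, Y2=1) / P(Y1=1)^2 from (y1, y2) sib pairs of 0/1 statuses."""
    n = len(pairs)
    both_affected = sum(y1 * y2 for y1, y2 in pairs) / n
    # Prevalence is estimated by pooling both siblings of every pair.
    prevalence = sum(y1 + y2 for y1, y2 in pairs) / (2 * n)
    if prevalence == 0:
        raise ValueError("no affected individuals in the sample")
    return both_affected / prevalence ** 2


def simulate_sib_pairs(p, n_pairs, seed=None):
    """Simulate sib pairs under an ASSUMED fully penetrant recessive model."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Each parent carries two alleles; True marks the disease allele.
        mother = (rng.random() < p, rng.random() < p)
        father = (rng.random() < p, rng.random() < p)
        statuses = []
        for _sib in range(2):
            # Each child inherits one randomly chosen allele from each parent
            # and is affected only when both inherited alleles are disease alleles.
            affected = rng.choice(mother) and rng.choice(father)
            statuses.append(int(affected))
        pairs.append(tuple(statuses))
    return pairs


pairs = simulate_sib_pairs(p=0.5, n_pairs=100_000, seed=1)
print(f"observed lambda_s at p = 0.5: {observed_lambda_s(pairs):.2f}")
```

With a large number of pairs the estimate should sit close to the expected value (2.25 at p = 0.5 under this model). With few pairs, or when the disease is rare in the sample, the ratio becomes noisy and can fall below 1 by chance alone, so the sample size is worth checking before concluding that the computation is wrong.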