Remember to re-run this code every time you re-open this R Notebook.
#Load libraries
library(ggplot2)
## Warning in register(): Can't find generic `scale_type` in package ggplot2 to
## register S3 method.
In this exercise you will conduct a number of simple calculations. R recognizes all basic mathematical functions for addition/subtraction (+, -), multiplication/division (*, /), and exponents/logarithms (^, log10). R can also apply these functions algebraically, as long as you first assign numerical values to the symbols. So, you can solve the multiplication of 11 and 13 in two ways:
Specific solution:
11*13
## [1] 143
General solution:
x <- 11
y <- 13
x*y
## [1] 143
Note that the general solution seems more cumbersome at first. However, it allows you to write some complex code that processes data, and if you want to do the same thing with different input variables, you don’t have to rewrite the entire code, you just redefine the input variables at the beginning. So, it’s a good habit to write code using general rather than specific solutions.
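One way to take the general approach a step further is to wrap the calculation in a function, so the same code can be reused with any inputs. This is only a minimal sketch; the function name `multiply` is hypothetical and not part of the original exercise:

```r
# Hypothetical helper (name chosen for illustration): multiplies any two numbers
multiply <- function(x, y) {
  x * y
}

# Same inputs as in the example above
result_a <- multiply(11, 13)
result_a

# New inputs, without rewriting any of the code
result_b <- multiply(2.5, 4)
result_b
```

Only the function call changes when the inputs change, which is the same idea as redefining x and y at the top of a script.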
The Hardy–Weinberg principle states that allele and genotype frequencies in a population will remain constant from generation to generation in the absence of evolutionary forces acting on that population. Evolutionary influences include natural selection, mutation, genetic drift, and migration (and then there is non-random mating, too; we’ll look at these forces in more detail in the coming modules). In the simplest case of a single locus with two alleles denoted A and a with frequencies f(A) = p and f(a) = q, respectively, the expected genotype frequencies under random mating are f(AA) = p^2 for the AA homozygotes, f(aa) = q^2 for the aa homozygotes, and f(Aa) = 2pq for the heterozygotes (see figure below for the modified Punnett square that explains this graphically). Note that p^2 + 2pq + q^2 always equals 1, just as p + q equals 1 for biallelic loci. In the absence of evolutionary forces, allele frequencies p and q are constant between generations.
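The two claims in this paragraph (genotype frequencies sum to 1, and allele frequencies stay constant between generations) can be checked numerically for any single value of p. The sketch below assumes an arbitrary example value of p = 0.3, and the helper name `hw_expected` is hypothetical:

```r
# Hypothetical helper: expected genotype frequencies for a given frequency p of allele A
hw_expected <- function(p) {
  q <- 1 - p                         # only two alleles, so q = 1 - p
  c(AA = p^2, Aa = 2 * p * q, aa = q^2)
}

p <- 0.3                             # arbitrary example value
geno <- hw_expected(p)
geno

# Genotype frequencies sum to 1 because p^2 + 2pq + q^2 = (p + q)^2 = 1
total <- sum(geno)
total

# Frequency of A in the next generation:
# all alleles carried by AA individuals plus half of those carried by Aa individuals
p_next <- unname(geno["AA"] + 0.5 * geno["Aa"])
all.equal(p_next, p)                 # TRUE: the allele frequency is unchanged
```

Algebraically, the last step is p^2 + pq = p(p + q) = p, which is why the allele frequency does not change under random mating alone.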
The power of the Hardy-Weinberg principle is that it allows for a simple test of whether evolutionary forces are acting on a specific locus. We can go out into natural populations, measure genotype frequencies, and infer allele frequencies from them. In the absence of any evolutionary forces, the measured genotype frequencies should be equal to the idealized genotype frequencies predicted by the Hardy-Weinberg principle. If the measured and the predicted (idealized) genotype frequencies do not match, the population is in Hardy-Weinberg disequilibrium and some evolutionary force must be acting on that particular locus, causing some genotypes to be over- or underrepresented.
Before we apply the Hardy-Weinberg principle in practice, we should think more clearly about the theoretical predictions. For any given allele frequency of p at a biallelic locus, there is a clear mathematical prediction of what the genotype frequencies should be. Here, you will explore the relationship between p and the frequency of all genotypes.
Imagine a locus with two alleles, A (with frequency p) and a (with frequency q). In this section you will calculate the frequency of genotypes for values of p between 0 and 1.
#Make a table with the allele frequency of A (p) between 0 and 1 (in 0.01-step increments) using the seq command
data <- as.data.frame(seq(0, 1, by = 0.01))
#Add column name
colnames(data) <- "p"
#Calculate alternate allele frequency (q)
data$q <- 1-data$p
#Calculate the theoretical frequencies for genotypes AA, Aa, and aa
data$AA <- data$p^2
data$Aa <- 2*data$p*data$q
data$aa <- data$q^2
#Calculate the total genotype frequency (sum of all three genotype frequencies; should always equal 1)
data$total <- (data$p^2)+(2*data$p*data$q)+(data$q^2)
#Show table
data
## p q AA Aa aa total
## 1 0.00 1.00 0.0000 0.0000 1.0000 1
## 2 0.01 0.99 0.0001 0.0198 0.9801 1
## 3 0.02 0.98 0.0004 0.0392 0.9604 1
## 4 0.03 0.97 0.0009 0.0582 0.9409 1
## 5 0.04 0.96 0.0016 0.0768 0.9216 1
## 6 0.05 0.95 0.0025 0.0950 0.9025 1
## 7 0.06 0.94 0.0036 0.1128 0.8836 1
## 8 0.07 0.93 0.0049 0.1302 0.8649 1
## 9 0.08 0.92 0.0064 0.1472 0.8464 1
## 10 0.09 0.91 0.0081 0.1638 0.8281 1
## 11 0.10 0.90 0.0100 0.1800 0.8100 1
## 12 0.11 0.89 0.0121 0.1958 0.7921 1
## 13 0.12 0.88 0.0144 0.2112 0.7744 1
## 14 0.13 0.87 0.0169 0.2262 0.7569 1
## 15 0.14 0.86 0.0196 0.2408 0.7396 1
## 16 0.15 0.85 0.0225 0.2550 0.7225 1
## 17 0.16 0.84 0.0256 0.2688 0.7056 1
## 18 0.17 0.83 0.0289 0.2822 0.6889 1
## 19 0.18 0.82 0.0324 0.2952 0.6724 1
## 20 0.19 0.81 0.0361 0.3078 0.6561 1
## 21 0.20 0.80 0.0400 0.3200 0.6400 1
## 22 0.21 0.79 0.0441 0.3318 0.6241 1
## 23 0.22 0.78 0.0484 0.3432 0.6084 1
## 24 0.23 0.77 0.0529 0.3542 0.5929 1
## 25 0.24 0.76 0.0576 0.3648 0.5776 1
## 26 0.25 0.75 0.0625 0.3750 0.5625 1
## 27 0.26 0.74 0.0676 0.3848 0.5476 1
## 28 0.27 0.73 0.0729 0.3942 0.5329 1
## 29 0.28 0.72 0.0784 0.4032 0.5184 1
## 30 0.29 0.71 0.0841 0.4118 0.5041 1
## 31 0.30 0.70 0.0900 0.4200 0.4900 1
## 32 0.31 0.69 0.0961 0.4278 0.4761 1
## 33 0.32 0.68 0.1024 0.4352 0.4624 1
## 34 0.33 0.67 0.1089 0.4422 0.4489 1
## 35 0.34 0.66 0.1156 0.4488 0.4356 1
## 36 0.35 0.65 0.1225 0.4550 0.4225 1
## 37 0.36 0.64 0.1296 0.4608 0.4096 1
## 38 0.37 0.63 0.1369 0.4662 0.3969 1
## 39 0.38 0.62 0.1444 0.4712 0.3844 1
## 40 0.39 0.61 0.1521 0.4758 0.3721 1
## 41 0.40 0.60 0.1600 0.4800 0.3600 1
## 42 0.41 0.59 0.1681 0.4838 0.3481 1
## 43 0.42 0.58 0.1764 0.4872 0.3364 1
## 44 0.43 0.57 0.1849 0.4902 0.3249 1
## 45 0.44 0.56 0.1936 0.4928 0.3136 1
## 46 0.45 0.55 0.2025 0.4950 0.3025 1
## 47 0.46 0.54 0.2116 0.4968 0.2916 1
## 48 0.47 0.53 0.2209 0.4982 0.2809 1
## 49 0.48 0.52 0.2304 0.4992 0.2704 1
## 50 0.49 0.51 0.2401 0.4998 0.2601 1
## 51 0.50 0.50 0.2500 0.5000 0.2500 1
## 52 0.51 0.49 0.2601 0.4998 0.2401 1
## 53 0.52 0.48 0.2704 0.4992 0.2304 1
## 54 0.53 0.47 0.2809 0.4982 0.2209 1
## 55 0.54 0.46 0.2916 0.4968 0.2116 1
## 56 0.55 0.45 0.3025 0.4950 0.2025 1
## 57 0.56 0.44 0.3136 0.4928 0.1936 1
## 58 0.57 0.43 0.3249 0.4902 0.1849 1
## 59 0.58 0.42 0.3364 0.4872 0.1764 1
## 60 0.59 0.41 0.3481 0.4838 0.1681 1
## 61 0.60 0.40 0.3600 0.4800 0.1600 1
## 62 0.61 0.39 0.3721 0.4758 0.1521 1
## 63 0.62 0.38 0.3844 0.4712 0.1444 1
## 64 0.63 0.37 0.3969 0.4662 0.1369 1
## 65 0.64 0.36 0.4096 0.4608 0.1296 1
## 66 0.65 0.35 0.4225 0.4550 0.1225 1
## 67 0.66 0.34 0.4356 0.4488 0.1156 1
## 68 0.67 0.33 0.4489 0.4422 0.1089 1
## 69 0.68 0.32 0.4624 0.4352 0.1024 1
## 70 0.69 0.31 0.4761 0.4278 0.0961 1
## 71 0.70 0.30 0.4900 0.4200 0.0900 1
## 72 0.71 0.29 0.5041 0.4118 0.0841 1
## 73 0.72 0.28 0.5184 0.4032 0.0784 1
## 74 0.73 0.27 0.5329 0.3942 0.0729 1
## 75 0.74 0.26 0.5476 0.3848 0.0676 1
## 76 0.75 0.25 0.5625 0.3750 0.0625 1
## 77 0.76 0.24 0.5776 0.3648 0.0576 1
## 78 0.77 0.23 0.5929 0.3542 0.0529 1
## 79 0.78 0.22 0.6084 0.3432 0.0484 1
## 80 0.79 0.21 0.6241 0.3318 0.0441 1
## 81 0.80 0.20 0.6400 0.3200 0.0400 1
## 82 0.81 0.19 0.6561 0.3078 0.0361 1
## 83 0.82 0.18 0.6724 0.2952 0.0324 1
## 84 0.83 0.17 0.6889 0.2822 0.0289 1
## 85 0.84 0.16 0.7056 0.2688 0.0256 1
## 86 0.85 0.15 0.7225 0.2550 0.0225 1
## 87 0.86 0.14 0.7396 0.2408 0.0196 1
## 88 0.87 0.13 0.7569 0.2262 0.0169 1
## 89 0.88 0.12 0.7744 0.2112 0.0144 1
## 90 0.89 0.11 0.7921 0.1958 0.0121 1
## 91 0.90 0.10 0.8100 0.1800 0.0100 1
## 92 0.91 0.09 0.8281 0.1638 0.0081 1
## 93 0.92 0.08 0.8464 0.1472 0.0064 1
## 94 0.93 0.07 0.8649 0.1302 0.0049 1
## 95 0.94 0.06 0.8836 0.1128 0.0036 1
## 96 0.95 0.05 0.9025 0.0950 0.0025 1
## 97 0.96 0.04 0.9216 0.0768 0.0016 1
## 98 0.97 0.03 0.9409 0.0582 0.0009 1
## 99 0.98 0.02 0.9604 0.0392 0.0004 1
## 100 0.99 0.01 0.9801 0.0198 0.0001 1
## 101 1.00 0.00 1.0000 0.0000 0.0000 1
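A table of 101 rows is hard to read by eye, so it can help to query it directly. The sketch below rebuilds the same table under a hypothetical name (`hw_table`) so that it runs on its own, then looks up the row where heterozygotes are most common and confirms that every row sums to 1:

```r
# Rebuild the table so this block runs on its own (hw_table is a hypothetical name)
p <- seq(0, 1, by = 0.01)
q <- 1 - p
hw_table <- data.frame(p = p, q = q, AA = p^2, Aa = 2 * p * q, aa = q^2)

# Row with the highest heterozygote frequency
peak <- hw_table[which.max(hw_table$Aa), ]
peak

# Genotype frequencies sum to 1 in every row (within floating-point tolerance)
row_totals <- hw_table$AA + hw_table$Aa + hw_table$aa
all_one <- isTRUE(all.equal(row_totals, rep(1, nrow(hw_table))))
all_one
```

The heterozygote frequency peaks at p = 0.5, where it reaches 0.5, matching row 51 of the table above.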
Just to practice our visualization skills, let’s use ggplot to visualize these data using a line graph with geom_line() with p on the x-axis and the genotype frequencies on the y-axis. You can plot the values for AA, Aa, aa, and the sum of all three into a single plot by layering multiple geom_line() terms with different aesthetics:
#Visualize the expected genotype frequencies as a function of p
#Note that you can add additional elements (lines) by just adding additional geoms, with the aesthetics specified within the brackets
ggplot(data, aes(x=p, y=AA)) +
geom_line()+
geom_line(aes(x=p, y=Aa), color="blue") +
geom_line(aes(x=p, y=aa), color="red") +
geom_line(aes(x=p, y=total), color="purple") +
xlab("p (Allele Frequency)") +
ylab("Genotype Frequency") +
theme_classic()
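The layered plot above does not produce a legend, because the colors are set outside of aes(). One possible alternative, shown here only as a sketch, is to reshape the table into long format and map color to genotype. The names `wide`, `long`, and `hw_plot` are hypothetical, and the plotting step is skipped if ggplot2 is not installed:

```r
# Rebuild the p / genotype-frequency table so this block runs on its own
p <- seq(0, 1, by = 0.01)
q <- 1 - p
wide <- data.frame(p = p, AA = p^2, Aa = 2 * p * q, aa = q^2)

# Stack the three genotype columns into long format: one row per (p, genotype) pair
long <- data.frame(
  p = rep(wide$p, times = 3),
  genotype = factor(rep(c("AA", "Aa", "aa"), each = nrow(wide)),
                    levels = c("AA", "Aa", "aa")),
  frequency = c(wide$AA, wide$Aa, wide$aa)
)

# Mapping colour to genotype inside aes() draws one line per genotype and adds a legend
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  hw_plot <- ggplot(long, aes(x = p, y = frequency, colour = genotype)) +
    geom_line() +
    xlab("p (frequency of allele A)") +
    ylab("Genotype frequency") +
    theme_classic()
  print(hw_plot)
}
```

The long format has three rows for every value of p, one per genotype, which is the layout ggplot2 expects when a variable is mapped to an aesthetic.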
In your own words, can you explain the Hardy-Weinberg principle and why the sum of all genotypes always equals 1?
The Hardy-Weinberg principle says that allele and genotype frequencies within a population stay constant unless they are disturbed by other factors such as mutations. Mutations disrupt the equilibrium by introducing new alleles into a population. Other factors, such as natural selection and genetic drift, can also disrupt the equilibrium. The sum of all genotype frequencies always equals 1 because 1 represents the entire population: every individual has exactly one of the possible genotypes, so the frequencies cannot add up to less than the whole population.
Fruit flies of the genus Drosophila are a workhorse for genetic and evolutionary studies. A frequently studied trait is the eye color polymorphism some populations exhibit. Wild-type flies have a bright red eye coloration. A mutation in a single gene controlling the expression of eye color, however, can turn the eyes white. The allele causing the white-eye phenotype (we call it e) is recessive, therefore the wild-type allele (we call it E) is dominant.
You collect 1000 flies from a natural population and bring your samples back to the laboratory for genotyping. You find that 720 flies are homozygous for the E allele, 120 flies are homozygous for the e allele, and 160 are heterozygous.
What are the relative frequencies (f) of each genotype in the population?
#Calculate observed genotype frequencies (genotype count divided by the 1000 flies sampled)
f_EE = 720/1000
f_Ee = 160/1000
f_ee = 120/1000
#Write results into a table
#Make a list of possible genotypes
genotype <- c("EE", "Ee", "ee")
#Make a list of observed genotype frequencies
f_observed <- c(f_EE, f_Ee, f_ee)
#Merge the two lists into a table
results1 <- data.frame(genotype, f_observed)
results1
## genotype f_observed
## 1 EE 0.72
## 2 Ee 0.16
## 3 ee 0.12
Based on the measured genotype frequencies, you can calculate allele frequencies for E and e (hint: flies are diploid):
#Calculate allele frequency for E (p) and e (q): each homozygote carries two copies of its allele and each heterozygote carries one copy of each, out of 2000 alleles in total
p = ((2*720)+160)/2000
q = ((2*120)+160)/2000
Based on the allele frequencies, we can apply the HW principle and calculate idealized genotype frequencies:
#Calculate idealized (i.e., theoretically predicted) genotype frequencies based on allele frequencies
f_EEi = p^2
f_Eei = 2*p*q
f_eei = q^2
#Make a list of idealized genotype frequencies
f_idealized <- c(f_EEi, f_Eei, f_eei)
#Add list to the results table
results2 <- cbind(results1, f_idealized)
results2
## genotype f_observed f_idealized
## 1 EE 0.72 0.64
## 2 Ee 0.16 0.32
## 3 ee 0.12 0.04
What do you observe? Is the population in HW equilibrium? If it is not, what could explain the difference between the observed and the predicted genotype frequencies?
The given population is not in equilibrium. We know this because the idealized genotype frequencies differ from the ones that were actually observed. This difference could be due to many different factors, such as genetic drift or mutations (really any outside influence). In this particular sample, it is highly likely that the flies collected and used for the experiment do not accurately represent the entire population.
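Following the "general solution" habit from the start of this exercise, the whole comparison can be wrapped in a function that takes the three genotype counts as input. This is a sketch only; the function name `hw_compare` and its argument names are hypothetical:

```r
# Hypothetical helper: takes genotype counts (homozygote 1, heterozygote, homozygote 2)
# and returns observed vs. Hardy-Weinberg idealized genotype frequencies
hw_compare <- function(n_hom1, n_het, n_hom2, labels = c("EE", "Ee", "ee")) {
  n <- n_hom1 + n_het + n_hom2              # number of individuals sampled
  p <- (2 * n_hom1 + n_het) / (2 * n)       # diploid: each individual carries two alleles
  q <- 1 - p
  data.frame(
    genotype = labels,
    f_observed = c(n_hom1, n_het, n_hom2) / n,
    f_idealized = c(p^2, 2 * p * q, q^2)
  )
}

# Eye-color counts from this exercise
eye <- hw_compare(720, 160, 120)
eye
```

For the eye-color locus, the heterozygotes are observed at a frequency of 0.16 but expected at 0.32, while both homozygotes are more common than expected.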
Using the same flies, you also quantify genotype frequencies at the bithorax locus, which has two alleles, G and g, and causes abnormal development of the fly’s halteres. You find that 10 flies are homozygous for the G allele, 810 flies are homozygous for the g allele, and 180 are heterozygous.
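#Calculate observed genotype frequencies (genotype count divided by the 1000 flies sampled)
f_GG = 10/1000
f_Gg = 180/1000
f_gg = 810/1000
#Write results into a table
genotype <- c("GG", "Gg", "gg")
f_observed <- c(f_GG, f_Gg, f_gg)
results3 <- data.frame(genotype, f_observed)
#Calculate allele frequency for G (p) and g (q) out of 2000 alleles in total
p = (2*10+180)/2000
q = (2*810+180)/2000
#Calculate idealized genotype frequencies based on allele frequencies
f_GGi = p^2
f_Ggi = 2*p*q
f_ggi = q^2
f_idealized <- c(f_GGi, f_Ggi, f_ggi)
#Add idealized frequencies to the results table
results4 <- cbind(results3, f_idealized)
results4
## genotype f_observed f_idealized
## 1 GG 0.01 0.01
## 2 Gg 0.18 0.18
## 3 gg 0.81 0.81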
Is the population in HW equilibrium? If it is not, what could explain the difference between the observed and the predicted genotype frequencies?
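The population actually appears to be in equilibrium at this locus: the observed genotype proportions (10, 180, and 810 out of 1000 flies) match the idealized frequencies.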
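Comparing the two columns by eye works here because the differences are either large or absent. A common way to make the comparison more formal is a chi-square goodness-of-fit test. This test is not part of the original exercise; the sketch below is an assumption about how it could be applied to both loci, and the function name `hw_chisq` is hypothetical:

```r
# Hypothetical helper: chi-square goodness-of-fit test against Hardy-Weinberg proportions
hw_chisq <- function(n_hom1, n_het, n_hom2) {
  observed <- c(n_hom1, n_het, n_hom2)
  n <- sum(observed)
  p <- (2 * n_hom1 + n_het) / (2 * n)       # allele frequency estimated from the sample
  q <- 1 - p
  expected <- n * c(p^2, 2 * p * q, q^2)    # expected genotype counts
  chi_sq <- sum((observed - expected)^2 / expected)
  # 3 genotype classes - 1 - 1 allele frequency estimated from the data = 1 degree of freedom
  p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
  c(chi_sq = chi_sq, p_value = p_value)
}

eye <- hw_chisq(720, 160, 120)       # eye-color locus
bithorax <- hw_chisq(10, 180, 810)   # bithorax locus
eye
bithorax
```

For the eye-color locus the statistic is 250 (10 + 80 + 160 from the three genotype classes), which corresponds to a very small p-value, while for the bithorax locus the statistic is essentially 0 because the observed and expected counts coincide.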
This exercise does not contain original data.
Consulting additional resources to solve this assignment is absolutely allowed, but failure to disclose those resources is plagiarism. Please list any collaborators you worked with and resources you used below or state that you have not used any.
N/A