Next week we will perform a hybrid cross of Wisconsin fastplants. Today, we are going go use R to simulating how this (and other crosses) might turn out.
The are a number of ways of creating simulations in R but today we’re going to use the sample function. The sample function takes several inputs including:
Let’s simulate two six-sided die rolls to see how this works. Before looking ahead, think, are we sampling with replacement or without replacement?
sample(1:6, 2, replace=TRUE)
Make sure that you understand the line of code above before moving on. You may want to try out other probabilities to see if and how they work. Note that we also could have done the following to indicate that each die roll has a 1/6 chance of occuring however since all rolls are equally likely we don’t need to:
sample(1:6, 2, replace=TRUE, prob=rep(1/6, 6))
Now, let’s imagine that we don’t just want to imagine one single roll of two dice but rather want to see all of the possibilities and their relative likelihoods. To do that, let’s replicate this simple “experiment” 1000 times:
replicate(1000, sample(1:6, 2, replace=TRUE, prob=rep(1/6, 6)))
This creates a matrix with 2 rows and 1000 columns. Each column is one result and each row is one of the two dice. Let’s assign a matrix like this to a variable and then get the sum of the two dice for every experiment. To get this sums we can use the apply function.
die.results <- replicate(1000, sample(1:6, 2, replace=TRUE, prob=rep(1/6, 6)))
apply(die.results, 1, sum) #sum of every row
apply(die.results, 2, sum) #sum of every column
Now let’s take several different looks at the sums of two die rolls:
two.dice <- apply(die.results, 2, sum) #sum of every column
hist(two.dice)
table(two.dice)
mean(two.dice)
var(two.dice) #variance
sd(two.dice) #standard deviation
summary(two.dice)
Are these results what you expected?
Now let’s replace dice with crosses between two plants that are both hetrozygous for gene A. Let’s imagine that as class, we’re going to produce 200 plants:
alleles.A <- c("A", "a")
num.plants <- 200
Next, we’ll sample from these alleles 200 games for each of two gametes:
m <- sample(alleles.A, num.plants, replace=TRUE)
f <- sample(alleles.A, num.plants, replace=TRUE)
table(m,f)
Finally, and this is literally the sexy part, let’s get these gametes together. I’m also going to replace aA’s with Aa’s since we don’t care where our alleles came from.
f2 <- paste0(m,f)
f2 <- replace(f2, f2=="aA", "Aa")
f2 #let's look at our plants!
table(f2)
How many plants of the 200 plants had each possible phenotype? Is this roughly what we would expect?
As with our dice, we’re likely more interested in all the things that could happen, rather than merely what did happen in one imaginary experiment with 200 plants. To explore this, we’ll once again use the replicate function. This time, however, we’re going to replicate a series of instruction, enclosed within curly brackets, {}. Simulating 1000 experiments (each with 200 crosses looks like this):
f2 <- replicate(1000,
{m <- sample(alleles.A, num.plants, replace=TRUE)
f <- sample(alleles.A, num.plants, replace=TRUE)
f2 <- paste0(m,f)
replace(f2, f2=="aA", "Aa")
}
)
f2 is now a matrix and a somewhat unweildy one at that. dim shows us the dimensions of this matrix. We have 200 rows and 1000 columns. Each column is one simulated experiment.
f2
dim(f2)
We can use apply to get a summary of each experiment:
f2.tables <- apply(f2, 2, table)
f2.tables
f2.tables has 1000 columns but only 3 rows (as opposed to the 200 in f2). The rows represent the numbers of aa, Aa and AA plants in each simulation of the experiment (each with, if you recall 200 plants). Here are a few different ways of looking at these results:
hist(f2.tables[1, ], main="Number of aa plants")
hist(f2.tables[2, ], main="Number of Aa plants")
hist(f2.tables[2, ], main="Number of AA plants")
apply(f2.tables, 1, mean)
apply(f2.tables, 1, sd)
apply(f2.tables, 1, var)
apply(f2.tables, 1, summary)
apply(f2.tables>=45, 1, sum) #how often were there at least 45 of these phenotypes?
table(apply(f2.tables, 2, which.max)) #how often each phenotype was most plentiful in our results
Do these results match your expectations?
The following code will simulate 1000 experiments each with 200 dihybrid crosses. Spend a couple of minutes looking over this code to see if you can understand what it’s doing and then use some of the same tools we used above to explore these results.
alleles.B <- c("B", "b")
f2 <- replicate(1000,
{m.a <- sample(alleles.A, num.plants, replace=TRUE)
f.a <- sample(alleles.A, num.plants, replace=TRUE)
f2.a <- paste0(m.a,f.a)
f2.a <- replace(f2.a, f2.a=="aA", "Aa")
m.b <- sample(alleles.B, num.plants, replace=TRUE)
f.b <- sample(alleles.B, num.plants, replace=TRUE)
f2.b <- paste0(m.b,f.b)
f2.b <- replace(f2.b, f2.b=="bB", "Bb")
paste0(f2.a,f2.b)
}
)
In the above simulations, we assume independence but what if genes A and B are nearby on the same chromosome. In the following simulation, if a gamete passes along A it has a 60% chance of passes along B (and only a 40% chance of passing along b) and if it passes along a it has a 60% chance of passing along b. The code here is a bit more complex, so don’t worry if you can’t follow it (although I encourage you to try). See how the results of this simulation compare to the simulation above where A and B are independent.
f2 <- replicate(1000,
{m.a <- sample(alleles.A, num.plants, replace=TRUE);
f.a <- sample(alleles.A, num.plants, replace=TRUE);
m.prob.B <- .4+.2*(m.a=="A") # 60% or 40% depending on A or a
f.prob.B <- .4+.2*(f.a=="A")
m.b <- rep(NA, num.plants)
f.b <- rep(NA, num.plants)
for (i in 1:num.plants){
m.b[i] <- sample(alleles.B, 1, replace=TRUE, prob=c(m.prob.B[i], 1-m.prob.B[i]));
f.b[i] <- sample(alleles.B, 1, replace=TRUE, prob=c(m.prob.B[i], 1-m.prob.B[i]));
}
f2.a <- paste0(m.a,f.a)
f2.a <- replace(f2.a, f2.a=="aA", "Aa")
f2.b <- paste0(m.b,f.b)
f2.b <- replace(f2.b, f2.b=="bB", "Bb")
paste0(f2.a,f2.b)
}
)