In this demonstration, I will show the effect of selection in the genetic variance (referred to as the Bulmer effect) and then we will discuss the advantages and disadvantages of multi-stage selection
Read in coded marker data on 100 individuals at 5 loci. The individuals are in rows and the markers are in columns
dat<- read.csv('class11data.csv')
We will assume that all of the genetic variance can be explained by the 5 markers in the dataset. Thus, each marker explains 1/5 of the total genetic variance. We will assume that the genetic variance 10 in this example. We are simulating randomly distributed marker effects using the function rnorm().
Va<- 20
mrkEff<- rnorm(5, 0, sqrt(Va/5))
First look at the marker data and the effects that we will use to compute the breeding value
head(as.matrix(dat[,-1])) #see the first 6 rows of the marker data
## loc1 loc2 loc3 loc4 loc5
## [1,] 1 0 0 1 1
## [2,] 0 0 -1 -1 -1
## [3,] -1 0 -1 1 1
## [4,] 0 1 -1 0 0
## [5,] 1 0 0 -1 -1
## [6,] 0 1 -1 1 1
as.matrix(mrkEff) #see the marker effects
## [,1]
## [1,] -0.04763857
## [2,] -1.63998958
## [3,] 1.03046678
## [4,] 1.53952975
## [5,] -0.26268408
Use matrix multiplication to compute the breeding values. When you do matrix multiplication, the columns of the matrix on the left must equal the rows of the matrix on the right
bv<- as.matrix(dat[,-1]) %*% as.matrix(mrkEff) #compute the breeding values
head(bv) #see the first 6 breeding values
## [,1]
## [1,] 1.2292071
## [2,] -2.3073124
## [3,] 0.2940175
## [4,] -2.6704564
## [5,] -1.3244842
## [6,] -1.3936107
Now that we have simulated breeding values based on markers and marker effects. We can simulate phenotype by adding in residual error. We will assume that the error variance is 10 and then use the rnorm() function to simulate a vector of residuals. Then we calculate the phenotypes as breeding value + error.
Verr<- 10
resid<- rnorm(100, 0, sqrt(Verr))
pheno<- bv+resid
Next we will compute the heritability for the trait that we just simulated within the simulated population. The additive genetic variance is the variance of the breeding values. The phenotypic variance is the variance of the phenotypes. The heritability is the genetic variance divided by the phenotypic variance.
genvar<- var(bv)
phenovar<- var(pheno)
h2<- genvar/phenovar
h2
## [,1]
## [1,] 0.1662661
For phenotypic selection, we begin ranking individuals based on their phenotype. The rank() function gives high values lower ranks and low values higer ranks. To rank so that high values are ranked at the top, we compute the rank of -1 * phenotype. We then create the pheno2 object which is a matrix that contains the phenotypes in the first column and the ranks in the second column.
pheno2<- cbind(pheno, rank(-pheno))
head(pheno2)
## [,1] [,2]
## [1,] 3.5489441 19
## [2,] -6.6359866 97
## [3,] 3.4777911 21
## [4,] -10.9668963 100
## [5,] 0.4192054 46
## [6,] 5.0007840 5
Next we identify which individuals are ranked in the top 20.
ix<- which(pheno2[,2]<=20)
ix
## [1] 1 6 8 12 13 15 18 25 31 37 42 45 49 52 58 61 68 79 81 90
The ix vector contains the row numbers corresponding to these best individuals.
To illustrate how selection affects phenotypic and genetic variances we will calculate and observe variances within selected groups and across the whole popualtion.
First we calculate and compare phenotypic variance for selected and whole population
var(pheno[ix]) #selected
## [1] 1.541125
var(pheno) #whole
## [,1]
## [1,] 13.18356
Next we calculate and compare genetic variance for selected and whole population
var(bv[ix]) #selected
## [1] 1.030349
var(bv) #whole population
## [,1]
## [1,] 2.19198
Compute the pairwise correlations between individual locus breeding values for the whole population
datBV<- dat
for(i in 1:5){
datBV[,1+i]<- dat[,1+i]*mrkEff[i]
}
head(datBV)
## id loc1 loc2 loc3 loc4 loc5
## 1 1 -0.04763857 0.00000 0.000000 1.53953 -0.2626841
## 2 2 0.00000000 0.00000 -1.030467 -1.53953 0.2626841
## 3 3 0.04763857 0.00000 -1.030467 1.53953 -0.2626841
## 4 4 0.00000000 -1.63999 -1.030467 0.00000 0.0000000
## 5 5 -0.04763857 0.00000 0.000000 -1.53953 0.2626841
## 6 6 0.00000000 -1.63999 -1.030467 1.53953 -0.2626841
Compute the pairwise correlations breeding values at individual loci for the selected individuals and the entire population. You should see the correlations become more negative after selection.
round(cor(datBV[,-1]), 2)#whole population
## loc1 loc2 loc3 loc4 loc5
## loc1 1.00 -0.08 0.01 0.12 -0.07
## loc2 -0.08 1.00 -0.07 0.08 -0.16
## loc3 0.01 -0.07 1.00 -0.01 0.00
## loc4 0.12 0.08 -0.01 1.00 -0.84
## loc5 -0.07 -0.16 0.00 -0.84 1.00
round(cor(datBV[ix,-1]), 2)#selected individuals
## loc1 loc2 loc3 loc4 loc5
## loc1 1.00 -0.23 0.09 -0.11 0.13
## loc2 -0.23 1.00 -0.28 -0.12 0.00
## loc3 0.09 -0.28 1.00 0.14 -0.17
## loc4 -0.11 -0.12 0.14 1.00 -0.84
## loc5 0.13 0.00 -0.17 -0.84 1.00
For this we will use a formula. First we need to compute the paramter \(v\), \(v= k(k-x)\). k is the selection intensity and x is the difference between the mean and the truncation point in units of standard deviation
p<- 0.2
k<- dnorm(qnorm(p))/p
x<- -qnorm(p)
v<- k*(k-x)
Genetic variance after selection: \(\sigma_{a_1}^2= (1-vh^2)\sigma_{a_0}^2\)
(1-v*h2)*genvar
## [,1]
## [1,] 1.907212
How does this compare to what we observed in our simulation?