In this demonstration, I will show the effect of selection in the genetic variance (referred to as the Bulmer effect) and then we will discuss the advantages and disadvantages of multi-stage selection

Read in data

Read in coded marker data on 100 individuals at 5 loci. The individuals are in rows and the markers are in columns

dat<- read.csv('class11data.csv')

Simulate marker effects

We will assume that all of the genetic variance can be explained by the 5 markers in the dataset. Thus, each marker explains 1/5 of the total genetic variance. We will assume that the genetic variance 10 in this example. We are simulating randomly distributed marker effects using the function rnorm().

Va<- 20
mrkEff<- rnorm(5, 0, sqrt(Va/5))

Compute breeding values

First look at the marker data and the effects that we will use to compute the breeding value

head(as.matrix(dat[,-1])) #see the first 6 rows of the marker data
##      loc1 loc2 loc3 loc4 loc5
## [1,]    1    0    0    1    1
## [2,]    0    0   -1   -1   -1
## [3,]   -1    0   -1    1    1
## [4,]    0    1   -1    0    0
## [5,]    1    0    0   -1   -1
## [6,]    0    1   -1    1    1
as.matrix(mrkEff) #see the marker effects
##             [,1]
## [1,] -0.04763857
## [2,] -1.63998958
## [3,]  1.03046678
## [4,]  1.53952975
## [5,] -0.26268408

Use matrix multiplication to compute the breeding values. When you do matrix multiplication, the columns of the matrix on the left must equal the rows of the matrix on the right

bv<- as.matrix(dat[,-1]) %*% as.matrix(mrkEff) #compute the breeding values
head(bv) #see the first 6 breeding values
##            [,1]
## [1,]  1.2292071
## [2,] -2.3073124
## [3,]  0.2940175
## [4,] -2.6704564
## [5,] -1.3244842
## [6,] -1.3936107

Simulate phenotypes

Now that we have simulated breeding values based on markers and marker effects. We can simulate phenotype by adding in residual error. We will assume that the error variance is 10 and then use the rnorm() function to simulate a vector of residuals. Then we calculate the phenotypes as breeding value + error.

Verr<- 10
resid<- rnorm(100, 0, sqrt(Verr))
pheno<- bv+resid

Compute the heritability

Next we will compute the heritability for the trait that we just simulated within the simulated population. The additive genetic variance is the variance of the breeding values. The phenotypic variance is the variance of the phenotypes. The heritability is the genetic variance divided by the phenotypic variance.

genvar<- var(bv)
phenovar<- var(pheno)
h2<- genvar/phenovar
h2
##           [,1]
## [1,] 0.1662661

Select based on phenotype

For phenotypic selection, we begin ranking individuals based on their phenotype. The rank() function gives high values lower ranks and low values higer ranks. To rank so that high values are ranked at the top, we compute the rank of -1 * phenotype. We then create the pheno2 object which is a matrix that contains the phenotypes in the first column and the ranks in the second column.

pheno2<- cbind(pheno, rank(-pheno))
head(pheno2)
##             [,1] [,2]
## [1,]   3.5489441   19
## [2,]  -6.6359866   97
## [3,]   3.4777911   21
## [4,] -10.9668963  100
## [5,]   0.4192054   46
## [6,]   5.0007840    5

Next we identify which individuals are ranked in the top 20.

ix<- which(pheno2[,2]<=20)
ix
##  [1]  1  6  8 12 13 15 18 25 31 37 42 45 49 52 58 61 68 79 81 90

The ix vector contains the row numbers corresponding to these best individuals.

Compute the phenotypic and genetic variances

To illustrate how selection affects phenotypic and genetic variances we will calculate and observe variances within selected groups and across the whole popualtion.

First we calculate and compare phenotypic variance for selected and whole population

var(pheno[ix]) #selected
## [1] 1.541125
var(pheno) #whole
##          [,1]
## [1,] 13.18356

Next we calculate and compare genetic variance for selected and whole population

var(bv[ix]) #selected
## [1] 1.030349
var(bv) #whole population
##         [,1]
## [1,] 2.19198

Look at LD covariance due to selection

Compute the pairwise correlations between individual locus breeding values for the whole population

datBV<- dat
for(i in 1:5){
   datBV[,1+i]<- dat[,1+i]*mrkEff[i]
}
head(datBV)
##   id        loc1     loc2      loc3     loc4       loc5
## 1  1 -0.04763857  0.00000  0.000000  1.53953 -0.2626841
## 2  2  0.00000000  0.00000 -1.030467 -1.53953  0.2626841
## 3  3  0.04763857  0.00000 -1.030467  1.53953 -0.2626841
## 4  4  0.00000000 -1.63999 -1.030467  0.00000  0.0000000
## 5  5 -0.04763857  0.00000  0.000000 -1.53953  0.2626841
## 6  6  0.00000000 -1.63999 -1.030467  1.53953 -0.2626841

Compute the pairwise correlations breeding values at individual loci for the selected individuals and the entire population. You should see the correlations become more negative after selection.

round(cor(datBV[,-1]), 2)#whole population
##       loc1  loc2  loc3  loc4  loc5
## loc1  1.00 -0.08  0.01  0.12 -0.07
## loc2 -0.08  1.00 -0.07  0.08 -0.16
## loc3  0.01 -0.07  1.00 -0.01  0.00
## loc4  0.12  0.08 -0.01  1.00 -0.84
## loc5 -0.07 -0.16  0.00 -0.84  1.00
round(cor(datBV[ix,-1]), 2)#selected individuals
##       loc1  loc2  loc3  loc4  loc5
## loc1  1.00 -0.23  0.09 -0.11  0.13
## loc2 -0.23  1.00 -0.28 -0.12  0.00
## loc3  0.09 -0.28  1.00  0.14 -0.17
## loc4 -0.11 -0.12  0.14  1.00 -0.84
## loc5  0.13  0.00 -0.17 -0.84  1.00

Computing expected genetic variance after selection

For this we will use a formula. First we need to compute the paramter \(v\), \(v= k(k-x)\). k is the selection intensity and x is the difference between the mean and the truncation point in units of standard deviation

p<- 0.2
k<- dnorm(qnorm(p))/p
x<- -qnorm(p)
v<- k*(k-x)

Genetic variance after selection: \(\sigma_{a_1}^2= (1-vh^2)\sigma_{a_0}^2\)

(1-v*h2)*genvar
##          [,1]
## [1,] 1.907212

How does this compare to what we observed in our simulation?

Discuss: What are some advantages and disadvantages of multi-stage selection?