We will use the mixed model equations (Equation 4 from lecture 12 notes) to predict breeding values for days-to-heading measured on 557 wheat lines in two environments, there is no replication within environment. Then we will calculate the reliabilities and the selection accuracies.

load("Class16data.RData")

Lets assume the the error and genetic variance in the base population is 115 and 13 respectively

Ve<- 115
Va<- 13

Calculate alpha, the shrinkage parameter

alpha<- Ve/Va
alpha
## [1] 8.846154

Create the design matrix X using the model.matrix function

X<- model.matrix(DTHD~environment, data=pheno)
head(X)
##   (Intercept) environmentoptimal
## 1           1                  0
## 2           1                  0
## 3           1                  0
## 4           1                  0
## 5           1                  0
## 6           1                  0

Create the design matrix Z using the model.matrix function

Z<- model.matrix(DTHD~genotype_identifier-1, data=pheno)
Z[1:3, 1:3]
##   genotype_identifier6569128 genotype_identifier6688880
## 1                          1                          0
## 2                          0                          1
## 3                          0                          0
##   genotype_identifier6688916
## 1                          0
## 2                          0
## 3                          1

The model.matrix function has appended ‘genotype_identifier’ to the column names of Z. So that the column names of Z will match the column names of A, we will remove ‘genotype_identifier’ from the column names of Z using the gsub() function.

colnames(Z)<- gsub("genotype_identifier",  "", colnames(Z))

Make sure A rows and columns are in the same order as Z columns

A<- A[colnames(Z),colnames(Z)]
A[1:3, 1:3]
##         6569128 6688880 6688916
## 6569128  1.9930  0.3972  0.3736
## 6688880  0.3972  1.9892  1.0466
## 6688916  0.3736  1.0466  1.9618

Calculate the inverse of A

Ainv<- solve(A)
Ainv[1:3, 1:3]
##               6569128       6688880      6688916
## 6569128  1.2624028157  0.0009917869 -0.002926766
## 6688880  0.0009917869  0.9279932611 -0.149256306
## 6688916 -0.0029267663 -0.1492563058  1.385856866

Create the vector y

y<-as.vector(pheno$DTHD)

Create the matrix on the left hand side

C11<- t(X)%*%X
C12<- t(X)%*%Z
C21<- t(Z)%*%X
C22<- t(Z)%*%Z + Ainv*alpha
upper_LHS<- cbind(C11, C12)
lower_LHS<- cbind(C21, C22)
LHS= rbind(upper_LHS, lower_LHS)

Create the matrix on the right hand side

RHS<- rbind(t(X)%*%y,t(Z)%*%y)

Calculate the inverse of the left hand side

invLHS<- solve(LHS)

Calculate the solutions for the fixed and random effects

b_u<- invLHS %*%RHS

Calculate PEV matrix

PEVmat<- invLHS*Ve

Get the diagonals of the PEV mat

dPEV<- diag(PEVmat)

Subset the diagonals of the PEV mat to get the PEVs of the EBVs only. (The first two PEVs correspond to the fixed effect estimates in this example)

pev_ebv<-dPEV[-c(1:2)]

Calculate the reliabilities (remember to use the Va among individuals, not Va in the base population)

Vf<-diag(A)*Va
rel<- 1-pev_ebv/Vf

Make a histogram of the reliabilities

hist(rel)

Calculate the median of the reliabilities

mean(rel)
## [1] 0.4801748

Calculate the median selection accuracy

mean(sqrt(rel))
## [1] 0.6897284

Compare this to the accuracy of phenotypic selection among the lines and compare with the selection accuracy for the EBVs

Vf<- Va*median(diag(A))
sqrt(Vf/c(Vf+Ve/2))
## [1] 0.5567591