In this demonstration you will estimate variance components using a dataset consisting of phenotypes on individual plants (GID) belonging to 30 half-sib families (HSfam). The data are from two locations
library(lme4)
## Loading required package: Matrix
library(rrBLUP)
library(pedigreemm)
library(arm)
## Loading required package: MASS
##
## arm (Version 1.13-1, built: 2022-8-25)
## Working directory is /Users/jrut/Library/Mobile Documents/com~apple~CloudDocs/Documents/Teaching/CPSC554/2024/Class 7
setwd("~/Library/Mobile Documents/com~apple~CloudDocs/Documents/Teaching/CPSC554")
load('HS example data.RData')
The GID column contains indivudal unique IDs. The HSfam indicates to which half-sib family each individual belongs. The l
head(pheno)
## GID HSfam loc pheno bv
## 51 81 1 1 -0.9553381 -0.463061920
## 80 110 1 1 -1.0753473 -1.379104345
## 87 117 1 1 3.1308638 1.377296145
## 88 118 1 1 0.7319875 -1.186054227
## 162 192 1 1 -1.0200256 -0.005779062
## 174 204 1 1 -0.1315346 -1.224348681
str(pheno)
## 'data.frame': 500 obs. of 5 variables:
## $ GID : int 81 110 117 118 192 204 206 232 337 358 ...
## $ HSfam: num 1 1 1 1 1 1 1 1 1 1 ...
## $ loc : num 1 1 1 1 1 1 1 1 2 2 ...
## $ pheno: num -0.955 -1.075 3.131 0.732 -1.02 ...
## $ bv : num -0.46306 -1.3791 1.3773 -1.18605 -0.00578 ...
The loc, HSfam, and GID variables were recognized as integer or numeric, but they are actually factors. They need to be changed to factors using as.factor()
pheno$loc<- as.factor(pheno$loc)
pheno$HSfam<- as.factor(pheno$HSfam)
pheno$GID<- as.factor(pheno$GID)
In this model, pheno is the response variable, location is a fixed factor, and HSfam is a random factor. This allows us to estimate how much of the total phenotypic variance can be attributed to half-sib family, \(\sigma^2_{f}\). This is also the same as the genetic covariance variance between individuals within a half-sib family
mod<- lmer(pheno~loc+(1|HSfam), data=pheno)
Get the variance components
vc<- data.frame(VarCorr(mod))
Varfam<- vc[1,'vcov']
Varresid<- vc[2,'vcov']
Recall that \(\sigma^2_{f}=r_{xy}\times\sigma^2_{a}\) where \(\sigma^2_{f}\) is the variance due to half-sib family, \(\sigma^2_{a}\) is the additive genetic variance in the base population, and \(r_{xy}\) is the coefficient of relationship between inividuals within a family.
Varabase<- 4*Varfam
First we estimate the what is often referred to as the ‘family-mean heritability.’ Better known as the reliability. This tells us how well whole family phenotypic means or BLUPs correspond with the true family-mean breeding values
nreps<- mean(table(pheno$HSfam))
H2fam<- Varfam/(Varfam+ Varresid/nreps)
H2fam
## [1] 0.6434489
Next we estimate the actual narrow-sense heritability. This tells us the correspondence of phenotypes and genotypes observed on a single non-inbred individual in a single environment.
Varabase/(Varabase+Varresid)
## [1] 0.3022194
BLUPs are Best Linear Unbiased Predictors, they are random effect coefficients.These values are an estimate of true average breeding value for each family. In our example, this also tells us the 1/2 the breeding value of the female parents (the mother of each half-sib family). Multiply by 2 to get the breeding value estimate of the female parent of the half-sib family
Parentebv1<- ranef(mod)$HSfam[,1] *2
Next we can calculate the reliability of these BLUPs using the prediction error variance (PEV) and the half-sib family variance
sep<- se.ranef(mod)$HSfam[,1]
pev<- sep^2
rel<- 1-pev/Varfam
mean(rel)
## [1] 0.6382942
Now we will use complete pedigree information to estimate the same parameters from the half-sib family data. These estimates will be more accurate because we will account for the fact that some half-sib families are more related to eachother than others.
We first need to generate the pedigree relationship matrix to use it in the model
p2<- editPed(ped$p2, ped$p1, ped$GID) #order the pedigree
p3<- pedigree(p2$sire, p2$dam, p2$label) #create pedigree object
A<- as.matrix(getA(p3)) #create pedigree relationship matrix
## 'as(<dtTMatrix>, "dtCMatrix")' is deprecated.
## Use 'as(., "CsparseMatrix")' instead.
## See help("Deprecated") and help("Matrix-deprecated").
First the fixed effects design matrix
X<- model.matrix(pheno~1+loc, pheno)
Next the random effects design matrix
pheno$GID<- factor(pheno$GID, levels= colnames(A))
Z<- model.matrix(pheno~0+GID, pheno)
colnames(Z)<- gsub("GID", "", colnames(Z)) #remove the prefix from the columns
Ensure that columns of Z match with rows of A
A<- A[colnames(Z),]
The vector of phenotypes, design matrices, and relationship matrices are used as input to the mixed model solver, mixed.solve()
mod2<- mixed.solve(y= pheno$pheno, Z, K=A, X)
First look at the outputs from the model
str(mod2)
## List of 5
## $ Vu : num 1.67
## $ Ve : num 2.65
## $ beta: num [1:2(1d)] -1.15 -0.226
## ..- attr(*, "dimnames")=List of 1
## .. ..$ : chr [1:2] "(Intercept)" "loc2"
## $ u : num [1:557(1d)] 0.389 -0.533 -0.264 -0.92 -0.171 ...
## ..- attr(*, "dimnames")=List of 1
## .. ..$ : chr [1:557] "-1" "-10" "-11" "-12" ...
## $ LL : num -1058
Next look at the distribution if the BLUPs. Unlike in the previous model, there is a BLUP for each individual.
hist(mod2$u)
The estimate of the additive genetic variance in the base population is in Vu, and the estimate of the residual error variance is in Ve
VarabaseMX<- mod2$Vu
VarresidMX<- mod2$Ve
Next we can estimate the heritability in the base population
VarabaseMX/c(VarabaseMX+VarresidMX)
## [1] 0.3868276
Note that the value is similar to what we estimated earlier.
We can also get the BLUPs of the female parents (the mothers) of each of the half-sib families. These are GIDs 1-30.
Parentebv2<- mod2$u[as.character(1:30)]
Compare these breeding value estimates with the ones estimated previously.
plot(Parentebv1, Parentebv2)