library(breedR)
library(ggplot2)
library(plyr)
library(dplyr)
theme_set(theme_bw())
Simulate a progeny-trial dataset with a somewhat “extreme” pedigree (only 6 families) in order to highlight the desired structure
simdat <- function(N = 1e3, vc = list(gen = .5, res = 1)) {
breedR.sample.phenotype(
fixed = c(mu = 1),
genetic = list(
model = 'add_animal',
Nparents = c(3, 2), # 3*2 "families"
sigma2_a = vc$gen,
check.factorial = FALSE),
residual.variance = vc$res,
N = N)[-(1:20),] # remove founders
}
Fit an additive-genetic model
fitdat <- function(dat = simdat()) {
suppressMessages(
remlf90(
fixed = phenotype ~ 1,
genetic = list(
model = 'add_animal',
ped = dat[, 1:3],
id = 'self'
),
data = dat
)
)
}
Observed effect: positive correlation between PBV and residuals
set.seed(12345)
res <- fitdat()
PBV = model.matrix(res)$genetic %*%
ranef(res)$genetic
qplot(resid(res), PBV) +
geom_smooth(method = 'lm') +
coord_fixed()
summary(res)
## Linear Mixed Model with pedigree and spatial effects fit by AI-REMLF90 ver. 1.122
## Data: dat
## AIC BIC logLik
## 2950 2960 -1473
##
## Parameters of special components:
##
##
## Variance components:
## Estimated variances S.E.
## genetic 0.2874 0.2448
## Residual 1.0069 0.1331
##
## Estimate S.E.
## Heritability 0.2047 0.1752
##
## Fixed effects:
## value s.e.
## Intercept 1.2688 0.2471
lm(PBV ~ resid(res))
##
## Call:
## lm(formula = PBV ~ resid(res))
##
## Coefficients:
## (Intercept) resid(res)
## -0.01832 0.14605
The observed phenotype is split into the model components:
all.equal(
model.matrix(res)$genetic %*% ranef(res)$genetic + resid(res),
res$mf$phenotype - fixef(res)$Intercept[, 'value'],
check.attributes = FALSE
)
## [1] TRUE
… in a way such that the ratio between the genetic blups and empirical residual is constant and equal to the heritability, up to some shift which pools the estimates towards the family mean.
Assess correlation between BVs and residuals
simcorr <- function(res = fitdat()) {
PBV = model.matrix(res)$genetic %*%
ranef(res)$genetic
return(
data.frame(
est_sigma_a = res$var[1, 1],
est_sigma_e = res$var[2, 1],
corr_a_e = cor(PBV, resid(res))
)
)
}
Nrep <- 30
repdat <- as.data.frame(t(replicate(Nrep, unlist(simcorr()))))
Plot density estimators of the sampling distributions of variance component estimators and the correlation between PBV and residuals
repdat %>%
lapply(density) %>%
ldply(function(x) as.data.frame(x[c('x', 'y')])) %>%
ggplot(aes(x, y)) +
geom_line() +
facet_grid(~ .id) +
theme_bw()
Conclusions:
Interpretation:
The prediction of random effects essentially splits the deviation between the observed phenotype and the expected value (given by the fixed effects - in this case only the global mean) between the available random effects, acording to the relative magnitude of variance estimates.
This means that individuals whose phenotype (i.e. “real” BV+residual) is higher than average will have positive estimated residual and breeding value.
The kinship structure and the spatial effect only introduce some relative perturbations around this, but the underlying effect remains.
Note that the regression slope shown above is very similar to the estimated heritability
Simulate two independent random variables with a variance ratio of 2 (say, a spatial effect and the residual effect). But in this case we remove the autocorrelation in the random effect such that each blup is predicted based on the individual’s phenotype only.
set.seed(1234)
N <- 500
dat <-
data.frame(
S = rnorm(N),
R = rnorm(N, sd = sqrt(2))
)
For every individual (point), we only observe the phenotype, which is the sum of both
ind <- 11
ggplot(dat, aes(R, S)) +
geom_point(col = 'gray', size = .1) +
geom_point(dat = dat[ind, ], colour = 'red') +
geom_abline(intercept = sum(dat[ind,]), slope = -1, col = 'darkgray') +
coord_fixed()
The predicted blups are computed by splitting the observed phenotype according to the estimated variance ratios. This is, the point of intersection between the phenotypic line and the variance ratio line of slope \(1/(1+2)\) The prediction is a projection of the true points on this line.
ggplot(dat, aes(R, S)) +
geom_point(col = 'gray', size = .1) +
geom_point(dat = dat[ind, ], colour = 'red') +
geom_abline(intercept = sum(dat[ind,]), slope = -1, col = 'darkgray') +
geom_abline(intercept = 0, slope = 1/3, col = 'blue') +
coord_fixed()
While this model is not identifiable, we force blup prediction by making only one round or reml estimation. The variance estimates are very close to the initial values
res2 <- remlf90(
fixed = pheno ~ 1,
random = ~ ind,
data = transform(
dat,
pheno = R + S,
ind = seq_len(nrow(dat))
),
var.ini = list(ind = 1, resid = 2),
progsf90.options = 'maxrounds 1'
)
summary(res2)
## Linear Mixed Model with pedigree and spatial effects fit by AI-REMLF90 ver. 1.122
## Data: transform(dat, pheno = R + S, ind = seq_len(nrow(dat)))
## AIC BIC logLik
## 1997 2006 -996.7
##
## Parameters of special components:
##
##
## Variance components:
## Estimated variances S.E.
## ind 1.070 92845
## Residual 2.062 92845
##
## Fixed effects:
## value s.e.
## Intercept -0.07599 0.0775
vratio <- res2$var[1, 1]/sum(res2$var[, 1])
The interecept also need to be split acording to the variance ratio
dat <- transform(
dat,
Shat = fixef(res2)$Intercept[, 'value']*vratio +
model.matrix(res2)$ind %*% ranef(res2)$ind,
Rhat = fixef(res2)$Intercept[, 'value']*(1-vratio) + resid(res2)
)
ggplot(dat) +
geom_point(aes(R, S), col = 'gray', size = .1) +
geom_point(aes(R, S), dat = dat[ind, ], colour = 'red') +
geom_abline(intercept = sum(dat[ind, c('R', 'S')]), slope = -1, col = 'darkgray') +
geom_abline(intercept = 0, slope = 1/2, col = 'blue') +
geom_point(aes(Rhat, Shat), size = .1) +
geom_point(aes(Rhat, Shat), dat = dat[ind, ], colour = 'darkred') +
coord_fixed()
This procedure of prediction produces two effects:
Even if we know (or assume) that the random effect and the residuals are independent, the respective blups are conditioned to the observed phenotype. This necessarily introduces a correlation in the predictions that is more or less strong according to how much additional information about an individual we can get from other individuals (e.g. kinship, neighbourhood).
The variances of the blups are smaller than the estimated variances, as a consequance of the projection.
Furthermore, the sum of the variances and twice the covariance gives an approximation to the phenotypic variance:
var(dat$Rhat)+
var(dat$Shat)+
2*cov(dat$Rhat, dat$Shat)
## [1] 3.138393