We generate 3 samples of 10 standard normal variates each, and a corresponding factor variable with levels A, B, and C:
set.seed(1234)
(dat <- rnorm(n=30))
## [1] -1.20707 0.27743 1.08444 -2.34570 0.42912 0.50606 -0.57474
## [8] -0.54663 -0.56445 -0.89004 -0.47719 -0.99839 -0.77625 0.06446
## [15] 0.95949 -0.11029 -0.51101 -0.91120 -0.83717 2.41584 0.13409
## [22] -0.49069 -0.44055 0.45959 -0.69372 -1.44820 0.57476 -1.02366
## [29] -0.01514 -0.93595
(fac <- rep(LETTERS[1:3], each=10))
## [1] "A" "A" "A" "A" "A" "A" "A" "A" "A" "A" "B" "B" "B" "B" "B" "B" "B"
## [18] "B" "B" "B" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C"
We fit a linear model to the data using R’s standard parameterization (i.e., mean of sample A, differences of means of samples B and C from mean of sample A):
(lma <- lm(dat ~ fac))
##
## Call:
## lm(formula = dat ~ fac)
##
## Coefficients:
## (Intercept) facB facC
## -0.38316 0.26499 -0.00479
vcov(lma)
## (Intercept) facB facC
## (Intercept) 0.08581 -0.08581 -0.08581
## facB -0.08581 0.17162 0.08581
## facC -0.08581 0.08581 0.17162
We want to investigate the difference between the means of samples B and C using a simple linear contrast:
mat <- rbind(c(0, 1, -1))
The associated variance should be:
mat %*% vcov(lma) %*% t(mat)
## [,1]
## [1,] 0.1716
Now observe the difference between estimated variances:
library(multcomp)
vcov(glht(lma, mat))
## 1
## 1 0.1716
vcov(glht(mmm(lma), mlf(mat)))
## lma: 1
## lma: 1 0.1486
We can use an alternative parameterization for the linear model (i.e., means of samples A, B, and C):
(lmb <- lm(dat ~ fac - 1))
##
## Call:
## lm(formula = dat ~ fac - 1)
##
## Coefficients:
## facA facB facC
## -0.383 -0.118 -0.388
vcov(lmb)
## facA facB facC
## facA 0.08581 0.00000 0.00000
## facB 0.00000 0.08581 0.00000
## facC 0.00000 0.00000 0.08581
Again we are interested in the difference between the means of samples B and C. We can use the same linear contrast as above, and the associated variance should be the same as above:
mat %*% vcov(lmb) %*% t(mat)
## [,1]
## [1,] 0.1716
This time we find that the estimated variances are the same:
vcov(glht(lmb, mat))
## 1
## 1 0.1716
vcov(glht(mmm(lmb), mlf(mat)))
## lmb: 1
## lmb: 1 0.1716
We increase the sample size from 10 to 10,000:
DAT <- rnorm(n=30000)
FAC <- rep(LETTERS[1:3], each=10000)
We redo the analysis using R’s standard parameterization:
lmA <- lm(DAT ~ FAC)
vcov(lmA)
## (Intercept) FACB FACC
## (Intercept) 9.954e-05 -9.954e-05 -9.954e-05
## FACB -9.954e-05 1.991e-04 9.954e-05
## FACC -9.954e-05 9.954e-05 1.991e-04
mat %*% vcov(lmA) %*% t(mat)
## [,1]
## [1,] 0.0001991
vcov(glht(lmA, mat))
## 1
## 1 0.0001991
vcov(glht(mmm(lmA), mlf(mat)))
## lmA: 1
## lmA: 1 0.0002022
The relative discrepancy between variance estimates is much smaller than with a sample size of 10.
Using the alternative parameterization, the variance estimates are again identical.
lmB <- lm(DAT ~ FAC - 1)
vcov(lmB)
## FACA FACB FACC
## FACA 9.954e-05 0.000e+00 0.000e+00
## FACB 0.000e+00 9.954e-05 0.000e+00
## FACC 0.000e+00 0.000e+00 9.954e-05
mat %*% vcov(lmB) %*% t(mat)
## [,1]
## [1,] 0.0001991
vcov(glht(lmB, mat))
## 1
## 1 0.0001991
vcov(glht(mmm(lmB), mlf(mat)))
## lmB: 1
## lmB: 1 0.0001991
So we see that the alternative parameterization is preferable here since it uses different portions of the data to estimate different parameters. The problem decreases with sample size because the observed and expected Fisher information, and thus variance, are (only) asymptotically equal.
This plot shows the average percentage deviation (plus/minus standard deviation) of the observed from the expected variance as estimated from 1000 simulated datasets under the above setting (3 balanced samples of standard normal variates):
We repeat the simulation, now with 5 balanced samples of standard normal variates:
A <- rbind(c(1, 0, 0), c(1, 1, 0), c(1, 0, 1))
B <- rbind(c(1, 0, 0), c(0, 1, 0), c(0, 0, 1))
A %*% vcov(lma) %*% t(A)
## [,1] [,2] [,3]
## [1,] 8.581e-02 1.388e-17 -1.388e-17
## [2,] 1.388e-17 8.581e-02 1.388e-17
## [3,] -1.388e-17 1.388e-17 8.581e-02
vcov(glht(lma, A))
## 1 2 3
## 1 8.581e-02 1.388e-17 -1.388e-17
## 2 1.388e-17 8.581e-02 1.388e-17
## 3 -1.388e-17 1.388e-17 8.581e-02
vcov(glht(mmm(lma), mlf(A)))
## lma: 1 lma: 2 lma: 3
## lma: 1 0.085812 0.0030258 -0.0150624
## lma: 2 0.003026 0.0918635 -0.0005311
## lma: 3 -0.015062 -0.0005311 0.0556872
B %*% vcov(lmb) %*% t(B)
## [,1] [,2] [,3]
## [1,] 0.08581 0.00000 0.00000
## [2,] 0.00000 0.08581 0.00000
## [3,] 0.00000 0.00000 0.08581
vcov(glht(lmb, B))
## 1 2 3
## 1 0.08581 0.00000 0.00000
## 2 0.00000 0.08581 0.00000
## 3 0.00000 0.00000 0.08581
vcov(glht(mmm(lmb), mlf(B)))
## lmb: 1 lmb: 2 lmb: 3
## lmb: 1 0.08581 0.00000 0.00000
## lmb: 2 0.00000 0.08581 0.00000
## lmb: 3 0.00000 0.00000 0.08581