Stat892 - Assignment 2

R Markdown

Statistics 892 -Assignment 2

1. View videos Stat902-02.01.17a; Stat902-02.08.17(1); Stat902-013017 and Stat902-2.12.19 on box:https://unl.app.box.com/folder/66083066660?s=idup7tp7rvkh8w3fxjqpo52u6f2xzc9m

2. A 2x3 factorial treatment structure in a completely randomized design structure produced the following results:

a. Using R lm() and Anova() in R (or a similar statistical programsuch as SAS GLIMMIX, GLM or MIXED ), obtain the ANOVA table for this problem.

library(car); library(emmeans); library(multcomp);

## Loading required package: carData

## Loading required package: mvtnorm

## Loading required package: survival

## Loading required package: TH.data

## Loading required package: MASS

## 
## Attaching package: 'TH.data'

## The following object is masked from 'package:MASS':
## 
##     geyser

y <- c(20,25,26,22,25,25,26,27,22,31)
Y <- matrix(y, nrow = 10)
t <- factor(c(rep(1,6), rep(2,4)))
b <- factor(c(1,2,2,3,3,3,1,1,2,3))
Trt <- interaction(t,b)
data <- data.frame(Y, t, b, Trt)

options(contrasts=c("contr.sum", "contr.sum"))
fit <- lm(Y ~ t + b + t*b, data = data)
summary(fit)

## 
## Call:
## lm(formula = Y ~ t + b + t * b, data = data)
## 
## Residuals:
##          1          2          3          4          5          6          7 
## -1.300e-16 -5.000e-01  5.000e-01 -2.000e+00  1.000e+00  1.000e+00 -5.000e-01 
##          8          9         10 
##  5.000e-01 -3.921e-16  3.018e-16 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  24.8333     0.4590  54.107 6.98e-07 ***
## t1           -1.6667     0.4590  -3.631  0.02213 *  
## b1           -1.5833     0.6553  -2.416  0.07306 .  
## b2           -1.0833     0.6553  -1.653  0.17363    
## t1:b1        -1.5833     0.6553  -2.416  0.07306 .  
## t1:b2         3.4167     0.6553   5.214  0.00645 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.323 on 4 degrees of freedom
## Multiple R-squared:  0.9176, Adjusted R-squared:  0.8145 
## F-statistic: 8.903 on 5 and 4 DF,  p-value: 0.02733

Type I - sequential sum of squares, this tests the main effect for factor T, followed by main effect factor B after the main effect of T, followed by the interaction effect TB, after the main effects.

Because of the sequential nature and the fact that the two main factors are tested in a particular order, this type of sums of squares will give different results for unbalanced data depending on which main effect is considered first.

For unbalanced data, this approach tests for a difference in the weighted marginal means. In practical terms, this means that the results are dependent on the realized sample sizes, namely the proportions in the particular data set. In other words, it is testing the first factor without controlling for the other factor

Note that this is often not the hypothesis that is of interest when dealing with unbalanced data.

anova1 <- anova(fit)
anova1

## Analysis of Variance Table
## 
## Response: Y
##           Df Sum Sq Mean Sq F value  Pr(>F)  
## t          1 17.067  17.067  9.7524 0.03543 *
## b          2 12.980   6.490  3.7086 0.12275  
## t:b        2 47.853  23.927 13.6724 0.01629 *
## Residuals  4  7.000   1.750                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type II - This type tests for each main effect after the other main effect.

SS(A | B) for factor A. SS(B | A) for factor B.

Note that no significant interaction is assumed (in other words, you should test for interaction first (SS(AB | A, B)) and only if AB is not significant, continue with the analysis for main effects).

If there is indeed no interaction, then type II is statistically more powerful than type III (see Langsrud [3] for further details).

Computationally, this is equivalent to running a type I analysis with different orders of the factors, and taking the appropriate output (the second, where one main effect is run after the other, in the example above).

anova2 <- Anova(fit, type = "II")
anova2

## Anova Table (Type II tests)
## 
## Response: Y
##           Sum Sq Df F value  Pr(>F)  
## t         25.230  1 14.4171 0.01915 *
## b         12.980  2  3.7086 0.12275  
## t:b       47.853  2 13.6724 0.01629 *
## Residuals  7.000  4                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Type III - This type tests for the presence of a main effect after the other main effect and interaction. This approach is therefore valid in the presence of significant interactions.

SS(A | B, AB) for factor A.SS(B | A, AB) for factor B.

However, it is often not interesting to interpret a main effect if interactions are present (generally speaking, if a significant interaction is present, the main effects should not be further analysed).

If the interactions are not significant, type II gives a more powerful test.

anova3 <- Anova(fit, type = "III")
anova3

## Anova Table (Type III tests)
## 
## Response: Y
##             Sum Sq Df   F value    Pr(>F)    
## (Intercept) 5123.3  1 2927.6044 6.985e-07 ***
## t             23.1  1   13.1868   0.02213 *  
## b             31.1  2    8.8724   0.03384 *  
## t:b           47.9  2   13.6724   0.01629 *  
## Residuals      7.0  4                        
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

NOTE: when data is balanced, the factors are orthogonal, and types I, II and III all give the same results.

Summary: Usually the hypothesis of interest is about the significance of one factor while controlling for the level of the other factors. This equates to using type II or III SS. In general, if there is no significant interaction effect, then type II is more powerful, and follows the principle of marginality. If interaction is present, then type II is inappropriate while type III can still be used, but results need to be interpreted with caution (in the presence of interactions, main effects are rarely interpretable).

a. cont’d - Also, obtain the least-squares means via emmeans in R (or LSMEANS in SAS)for the levels of T and B and the interaction.

em <- emmeans(fit,list(~t,~b,~t*b))

## NOTE: Results may be misleading due to involvement in interactions
## NOTE: Results may be misleading due to involvement in interactions

em$`emmeans of t`

##  t emmean    SE df lower.CL upper.CL
##  1   23.2 0.597  4     21.5     24.8
##  2   26.5 0.697  4     24.6     28.4
## 
## Results are averaged over the levels of: b 
## Confidence level used: 0.95

em$`emmeans of b`

##  b emmean    SE df lower.CL upper.CL
##  1   23.2 0.810  4     21.0     25.5
##  2   23.8 0.810  4     21.5     26.0
##  3   27.5 0.764  4     25.4     29.6
## 
## Results are averaged over the levels of: t 
## Confidence level used: 0.95

em$`emmeans of t, b`

##  t b emmean    SE df lower.CL upper.CL
##  1 1   20.0 1.323  4     16.3     23.7
##  2 1   26.5 0.935  4     23.9     29.1
##  1 2   25.5 0.935  4     22.9     28.1
##  2 2   22.0 1.323  4     18.3     25.7
##  1 3   24.0 0.764  4     21.9     26.1
##  2 3   31.0 1.323  4     27.3     34.7
## 
## Confidence level used: 0.95

emb <- emmeans(fit,pairwise~b)

## NOTE: Results may be misleading due to involvement in interactions

emb

## $emmeans
##  b emmean    SE df lower.CL upper.CL
##  1   23.2 0.810  4     21.0     25.5
##  2   23.8 0.810  4     21.5     26.0
##  3   27.5 0.764  4     25.4     29.6
## 
## Results are averaged over the levels of: t 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast estimate   SE df t.ratio p.value
##  1 - 2       -0.50 1.15  4 -0.436  0.9028 
##  1 - 3       -4.25 1.11  4 -3.817  0.0403 
##  2 - 3       -3.75 1.11  4 -3.368  0.0594 
## 
## Results are averaged over the levels of: t 
## P value adjustment: tukey method for comparing a family of 3 estimates

b. Obtain the following using matrix operations in R or SAS PROC IML(or a similar matrix processor):

(1) Using the SUM to zero restriction, set up the design matrix for this problem.

x <- matrix(c(1,1,1,1,1,1,1,1,1,1,
              1,1,1,1,1,1,-1,-1,-1,-1,
              1,0,0,-1,-1,-1,1,1,0,-1,
              0,1,1,-1,-1,-1,0,0,1,-1,
              1,0,0,-1,-1,-1,-1,-1,0,1,
              0,1,1,-1,-1,-1,0,0,-1,1), nrow = 10)
x

##       [,1] [,2] [,3] [,4] [,5] [,6]
##  [1,]    1    1    1    0    1    0
##  [2,]    1    1    0    1    0    1
##  [3,]    1    1    0    1    0    1
##  [4,]    1    1   -1   -1   -1   -1
##  [5,]    1    1   -1   -1   -1   -1
##  [6,]    1    1   -1   -1   -1   -1
##  [7,]    1   -1    1    0   -1    0
##  [8,]    1   -1    1    0   -1    0
##  [9,]    1   -1    0    1    0   -1
## [10,]    1   -1   -1   -1    1    1

(2) Obtain the least squares solution vector for this design matrix.

beta=solve(t(x)%*%x)%*%t(x)%*%Y
beta

##           [,1]
## [1,] 24.833333
## [2,] -1.666667
## [3,] -1.583333
## [4,] -1.083333
## [5,] -1.583333
## [6,]  3.416667

(3) For this solution vector, develop the K’ matrix for testing the hypothesis that all three lsmeans for Factor B are all equal, ie µ.1 = µ.2 = µ.3

b1 <- as.matrix(c(0,0,1,0,0,0)); t(b1)%*%beta; #b1 contrast matrix;

##           [,1]
## [1,] -1.583333

b2 <- as.matrix(c(0,0,0,1,0,0)); t(b2)%*%beta; #b2 contrast matrix;

##           [,1]
## [1,] -1.083333

Kb <- matrix(c(0,0,0,0,1,0,0,1,0,0,0,0), nrow =2); Kb%*%beta; #b.main contrast matrix

##           [,1]
## [1,] -1.583333
## [2,] -1.083333

(4) Use this K’ matrix to compute the appropriate F value to test H0: K’B= 0.

e <- y-x%*%beta
epe <- t(e)%*%e
mse <- epe/4 #divide by df of residual
xpx <- t(x)%*%x
r = matrix(c(0,0))
f = t(Kb%*%beta - r)%*%solve(Kb%*%solve(xpx)%*%t(Kb))%*%(Kb%*%beta -r) / (2%*%mse)
f

##          [,1]
## [1,] 8.872381

#testing with glht to compute F-score
b.main <- glht(fit, linfct = Kb); b.main; summary(b.main, test = Ftest());

## 
##   General Linear Hypotheses
## 
## Linear Hypotheses:
##        Estimate
## 1 == 0   -1.583
## 2 == 0   -1.083

## 
##   General Linear Hypotheses
## 
## Linear Hypotheses:
##        Estimate
## 1 == 0   -1.583
## 2 == 0   -1.083
## 
## Global Test:
##       F DF1 DF2  Pr(>F)
## 1 8.872   2   4 0.03384

(5) Develop the needed K’ vector to estimate µ.1 - µ.2 the difference between B1 and B2 lsmeans and compute its standard error.

Having issues with computing standard error.

b.diff <- as.matrix(c(0,0,1,1,-1,-1)); t(b.diff)%*%beta; #b.diff contrast matrix

##      [,1]
## [1,] -4.5

(6) Verify that your computed F value in (4) and your difference and standard error in (5) are correct compared to the packaged program you used in part a above. NOTE: Which ever software you use, when computing the anova, make sure to compute type III SS. If you are using R, make sure to use the Anova() function with type=”III”. If you are using SAS, the type III SS may be obtained with Proc MIXED method=type3 or with Proc GLM.

the computed f value in part 4, which is equal to 8.872 matches what I calculated in

(7) For the given package you used in part a above, explain the type of restriction the method uses to obtain a solution to the normal equations.No matter which restriction is used(eg sum to zero or another restriction),the sums of squares,mean squares and F values from the anova and lsmeans and standard errors will be identical across restrictions used.What general concept does this equivalence of results using different restriction illustrate? Why is this important?

options(contrasts=c("contr.sum", "contr.poly"))
fit.sum <- lm(Y ~ t + b + t*b, data = data)
summary(fit.sum)

## 
## Call:
## lm(formula = Y ~ t + b + t * b, data = data)
## 
## Residuals:
##          1          2          3          4          5          6          7 
## -1.300e-16 -5.000e-01  5.000e-01 -2.000e+00  1.000e+00  1.000e+00 -5.000e-01 
##          8          9         10 
##  5.000e-01 -3.921e-16  3.018e-16 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  24.8333     0.4590  54.107 6.98e-07 ***
## t1           -1.6667     0.4590  -3.631  0.02213 *  
## b1           -1.5833     0.6553  -2.416  0.07306 .  
## b2           -1.0833     0.6553  -1.653  0.17363    
## t1:b1        -1.5833     0.6553  -2.416  0.07306 .  
## t1:b2         3.4167     0.6553   5.214  0.00645 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.323 on 4 degrees of freedom
## Multiple R-squared:  0.9176, Adjusted R-squared:  0.8145 
## F-statistic: 8.903 on 5 and 4 DF,  p-value: 0.02733

options(contrasts=c("contr.treatment", "contr.poly"))
fit.set <- lm(Y ~ t + b + t*b, data = data)
summary(fit.set)

## 
## Call:
## lm(formula = Y ~ t + b + t * b, data = data)
## 
## Residuals:
##          1          2          3          4          5          6          7 
## -1.590e-17 -5.000e-01  5.000e-01 -2.000e+00  1.000e+00  1.000e+00 -5.000e-01 
##          8          9         10 
##  5.000e-01 -1.981e-16 -1.981e-16 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   20.000      1.323  15.119 0.000112 ***
## t2             6.500      1.620   4.012 0.015972 *  
## b2             5.500      1.620   3.395 0.027412 *  
## b3             4.000      1.528   2.619 0.058885 .  
## t2:b2        -10.000      2.291  -4.364 0.012021 *  
## t2:b3          0.500      2.227   0.225 0.833338    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.323 on 4 degrees of freedom
## Multiple R-squared:  0.9176, Adjusted R-squared:  0.8145 
## F-statistic: 8.903 on 5 and 4 DF,  p-value: 0.02733

em.sum <-emmeans(fit.sum,list(~t,~b,~t*b))

## NOTE: Results may be misleading due to involvement in interactions
## NOTE: Results may be misleading due to involvement in interactions

em.sum

## $`emmeans of t`
##  t emmean    SE df lower.CL upper.CL
##  1   23.2 0.597  4     21.5     24.8
##  2   26.5 0.697  4     24.6     28.4
## 
## Results are averaged over the levels of: b 
## Confidence level used: 0.95 
## 
## $`emmeans of b`
##  b emmean    SE df lower.CL upper.CL
##  1   23.2 0.810  4     21.0     25.5
##  2   23.8 0.810  4     21.5     26.0
##  3   27.5 0.764  4     25.4     29.6
## 
## Results are averaged over the levels of: t 
## Confidence level used: 0.95 
## 
## $`emmeans of t, b`
##  t b emmean    SE df lower.CL upper.CL
##  1 1   20.0 1.323  4     16.3     23.7
##  2 1   26.5 0.935  4     23.9     29.1
##  1 2   25.5 0.935  4     22.9     28.1
##  2 2   22.0 1.323  4     18.3     25.7
##  1 3   24.0 0.764  4     21.9     26.1
##  2 3   31.0 1.323  4     27.3     34.7
## 
## Confidence level used: 0.95

em.set <-emmeans(fit.set,list(~t,~b,~t*b))

## NOTE: Results may be misleading due to involvement in interactions
## NOTE: Results may be misleading due to involvement in interactions

em.set

## $`emmeans of t`
##  t emmean    SE df lower.CL upper.CL
##  1   23.2 0.597  4     21.5     24.8
##  2   26.5 0.697  4     24.6     28.4
## 
## Results are averaged over the levels of: b 
## Confidence level used: 0.95 
## 
## $`emmeans of b`
##  b emmean    SE df lower.CL upper.CL
##  1   23.2 0.810  4     21.0     25.5
##  2   23.8 0.810  4     21.5     26.0
##  3   27.5 0.764  4     25.4     29.6
## 
## Results are averaged over the levels of: t 
## Confidence level used: 0.95 
## 
## $`emmeans of t, b`
##  t b emmean    SE df lower.CL upper.CL
##  1 1   20.0 1.323  4     16.3     23.7
##  2 1   26.5 0.935  4     23.9     29.1
##  1 2   25.5 0.935  4     22.9     28.1
##  2 2   22.0 1.323  4     18.3     25.7
##  1 3   24.0 0.764  4     21.9     26.1
##  2 3   31.0 1.323  4     27.3     34.7
## 
## Confidence level used: 0.95

c. Briefly explain why the TYPE III sums of squares are usually the relevant sums of squares in this type of an unbalanced design when the data are from a designed experiment.

Type III - This type tests for the presence of a main effect after the other main effect and interaction. This approach is therefore valid in the presence of significant interactions.

SS(A | B, AB) for factor A.SS(B | A, AB) for factor B.

If the interactions are not significant, type II gives a more powerful test.

When data is balanced, the factors are orthogonal, and types I, II and III all give the same results. Usually the hypothesis of interest is about the significance of one factor while controlling for the level of the other factors. This equates to using type II or III SS. In general, if there is no significant interaction effect, then type II is more powerful, and follows the principle of marginality. If interaction is present, then type II is inappropriate while type III can still be used, but results need to be interpreted with caution (in the presence of interactions, main effects are rarely interpretable).

3. A split-plot experiment is conducted where the whole plot factor (A) has 3 levels and is set out as a randomized complete block design with 2 random blocks. The split plot factor (B) has two levels.

library(lme4)

## Loading required package: Matrix

## Registered S3 methods overwritten by 'lme4':
##   method                          from
##   cooks.distance.influence.merMod car 
##   influence.merMod                car 
##   dfbeta.influence.merMod         car 
##   dfbetas.influence.merMod        car

options(contrasts = c("contr.treatment","contr.poly"))
dat = matrix(c(1,2,3,1,2,3,1,2,3,1,2,3,
               1,1,1,2,2,2,1,1,1,2,2,2,
               1,1,1,1,1,1,2,2,2,2,2,2,
               2,3,4,6,9,8,1,3,3,7,8,8), nrow = 12)
dat

##       [,1] [,2] [,3] [,4]
##  [1,]    1    1    1    2
##  [2,]    2    1    1    3
##  [3,]    3    1    1    4
##  [4,]    1    2    1    6
##  [5,]    2    2    1    9
##  [6,]    3    2    1    8
##  [7,]    1    1    2    1
##  [8,]    2    1    2    3
##  [9,]    3    1    2    3
## [10,]    1    2    2    7
## [11,]    2    2    2    8
## [12,]    3    2    2    8

A = factor(dat[,1])
B = factor(dat[,2])
Block = factor(dat[,3])
Y = dat[,4]
datframe <- data.frame(Y,Block,A,B)
result <- lmer(Y ~ A*B + (1|Block), data = datframe)
summary(result)

## Linear mixed model fit by REML ['lmerMod']
## Formula: Y ~ A * B + (1 | Block)
##    Data: datframe
## 
## REML criterion at convergence: 14.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.8661 -0.8660  0.0000  0.8660  0.8661 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  Block    (Intercept) 5.889e-06 0.002427
##  Residual             3.333e-01 0.577345
## Number of obs: 12, groups:  Block, 2
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   1.5000     0.4082   3.674
## A2            1.5000     0.5773   2.598
## A3            2.0000     0.5773   3.464
## B2            5.0000     0.5773   8.660
## A2:B2         0.5000     0.8165   0.612
## A3:B2        -0.5000     0.8165  -0.612
## 
## Correlation of Fixed Effects:
##       (Intr) A2     A3     B2     A2:B2 
## A2    -0.707                            
## A3    -0.707  0.500                     
## B2    -0.707  0.500  0.500              
## A2:B2  0.500 -0.707 -0.354 -0.707       
## A3:B2  0.500 -0.354 -0.707 -0.707  0.500

Anova(result, type="III", test = "F")

## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
## 
## Response: Y
##                   F Df Df.res    Pr(>F)    
## (Intercept) 13.5000  1      6 0.0104017 *  
## A            6.5001  2      5 0.0406659 *  
## B           75.0013  1      5 0.0003392 ***
## A:B          0.7500  2      5 0.5189640    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

a. Define the linear model for Yijk for this experiment, being sure to define all fixed and random effects making all ‘standard’ assumptions.

\(Y_{ijk}\) = \(\mu\) + \(\alpha_i\) + \(\beta_j\) + \(\alpha\beta_{ij}\) + \(\gamma_k\) + \(\delta_{ik}\) + \(\epsilon_{ijk}\)

where i = 1,2,3 and j = 1,2 and k = 1,2

the terms in this model have the following meanings:

µ: the overall mean

\(\alpha_i\): the effect of the \(i^{th}\) level of factor A, the whole unit treatments; a fixed effect, \(\sum\)\(\alpha_i = 0\)

\(\beta_j\): the effect of the \(j^{th}\) level of factor B, the subunit treatment; a fixed effect, \(\sum\)\(\beta_j = 0\)

\(\alpha\beta_{ij}\): the interaction effect between the \(i^{th}\) level of factor A and the \(j^{th}\) level of factor B

\(\gamma_k\): the \(k^{th}\) block effect; blocks are random

\(\delta_{ik}\): the whole-unit random component, \(\delta_{ik}\) IND(0, \(\sigma^2_D\))

\(\epsilon_{ijk}\): the subunit random component, \(\epsilon_{ijk}\) IND(0, \(\sigma^2\))

b. Give the skeleton ANOVA (source df and expected mean squares) and show the proper ratio of Mean squares to test A, B and AB effects.

c. Assume you will use the mixed model approach to analyze the data. Give the relevant X and Z matrices for the mixed model incorporating the set to zero restriction on the X matrix only.

X <- model.matrix(result)
Xp <- t(X)
Z <- matrix(c(1,0,0,0,0,0,
              1,0,0,0,0,0,
              0,1,0,0,0,0,
              0,1,0,0,0,0,
              0,0,1,0,0,0,
              0,0,1,0,0,0,
              0,0,0,1,0,0,
              0,0,0,1,0,0,
              0,0,0,0,1,0,
              0,0,0,0,1,0,
              0,0,0,0,0,1,
              0,0,0,0,0,1),nrow =12)
Z

##       [,1] [,2] [,3] [,4] [,5] [,6]
##  [1,]    1    0    0    0    0    0
##  [2,]    0    1    0    0    0    0
##  [3,]    0    0    1    0    0    0
##  [4,]    0    0    0    1    0    0
##  [5,]    0    0    0    0    1    0
##  [6,]    0    0    0    0    0    1
##  [7,]    1    0    0    0    0    0
##  [8,]    0    1    0    0    0    0
##  [9,]    0    0    1    0    0    0
## [10,]    0    0    0    1    0    0
## [11,]    0    0    0    0    1    0
## [12,]    0    0    0    0    0    1

sigma.squared.u = 0.333
sigma.squared = 5.889e-06

V = sigma.squared.u*(Z%*%t(Z)) + sigma.squared*diag(12)
V

##            [,1]      [,2]      [,3]      [,4]      [,5]      [,6]      [,7]
##  [1,] 0.3330059 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3330000
##  [2,] 0.0000000 0.3330059 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [3,] 0.0000000 0.0000000 0.3330059 0.0000000 0.0000000 0.0000000 0.0000000
##  [4,] 0.0000000 0.0000000 0.0000000 0.3330059 0.0000000 0.0000000 0.0000000
##  [5,] 0.0000000 0.0000000 0.0000000 0.0000000 0.3330059 0.0000000 0.0000000
##  [6,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3330059 0.0000000
##  [7,] 0.3330000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3330059
##  [8,] 0.0000000 0.3330000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [9,] 0.0000000 0.0000000 0.3330000 0.0000000 0.0000000 0.0000000 0.0000000
## [10,] 0.0000000 0.0000000 0.0000000 0.3330000 0.0000000 0.0000000 0.0000000
## [11,] 0.0000000 0.0000000 0.0000000 0.0000000 0.3330000 0.0000000 0.0000000
## [12,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.3330000 0.0000000
##            [,8]      [,9]     [,10]     [,11]     [,12]
##  [1,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [2,] 0.3330000 0.0000000 0.0000000 0.0000000 0.0000000
##  [3,] 0.0000000 0.3330000 0.0000000 0.0000000 0.0000000
##  [4,] 0.0000000 0.0000000 0.3330000 0.0000000 0.0000000
##  [5,] 0.0000000 0.0000000 0.0000000 0.3330000 0.0000000
##  [6,] 0.0000000 0.0000000 0.0000000 0.0000000 0.3330000
##  [7,] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
##  [8,] 0.3330059 0.0000000 0.0000000 0.0000000 0.0000000
##  [9,] 0.0000000 0.3330059 0.0000000 0.0000000 0.0000000
## [10,] 0.0000000 0.0000000 0.3330059 0.0000000 0.0000000
## [11,] 0.0000000 0.0000000 0.0000000 0.3330059 0.0000000
## [12,] 0.0000000 0.0000000 0.0000000 0.0000000 0.3330059

V.inv = solve(V)
beta <- solve(Xp%*%V.inv%*%X)%*%Xp%*%V.inv%*%Y
beta

##             [,1]
## (Intercept)  1.5
## A2           1.5
## A3           2.0
## B2           5.0
## A2:B2        0.5
## A3:B2       -0.5

d. For the X matrix you developed in part c above, give the relevant K’ matrix for the null hypothesis of no effects of A.

Ka = matrix(c(0,1,1,0,0.5,0.5)) # µ,A1,A2,B1,A1:B1,A2:B1

e. Give the form of V(don’t write out zeros and 1’s, just use symbols)for this problem using appropriate terms defined in part a above and appropriately defined component matrices V.

Y = X\(\beta\) + Zu + \(\epsilon\)

V = \(\hat\sigma^{2}_u\)ZZ’ + \(\hat\sigma^{2}\)I

\(\hat\sigma^{2}\) = MSE

\(\hat\sigma^{2}_u\) = (MS\(\_{b(A)}\) - MSE)/2

f. Show how to use ANOVA and expected mean squares to estimate the relevant variance components for V.

From the summary report of the Anova looking at the variance components in the random effects, Block = 5.889e-06 (\(\hat\sigma^{2}_u\)), Residual = 3.333e-01 (\(\hat\sigma^{2}\)).

summary(result)

## Linear mixed model fit by REML ['lmerMod']
## Formula: Y ~ A * B + (1 | Block)
##    Data: datframe
## 
## REML criterion at convergence: 14.6
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.8661 -0.8660  0.0000  0.8660  0.8661 
## 
## Random effects:
##  Groups   Name        Variance  Std.Dev.
##  Block    (Intercept) 5.889e-06 0.002427
##  Residual             3.333e-01 0.577345
## Number of obs: 12, groups:  Block, 2
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   1.5000     0.4082   3.674
## A2            1.5000     0.5773   2.598
## A3            2.0000     0.5773   3.464
## B2            5.0000     0.5773   8.660
## A2:B2         0.5000     0.8165   0.612
## A3:B2        -0.5000     0.8165  -0.612
## 
## Correlation of Fixed Effects:
##       (Intr) A2     A3     B2     A2:B2 
## A2    -0.707                            
## A3    -0.707  0.500                     
## B2    -0.707  0.500  0.500              
## A2:B2  0.500 -0.707 -0.354 -0.707       
## A3:B2  0.500 -0.354 -0.707 -0.707  0.500

g. Give two other methods that are used for estimating variance components. Just give their names. Other methods to estimate variance components are beyond the scope of this class.

https://www.agriculturejournals.cz/publicFiles/52286.pdf

frequency approach - ANOVA-method maximum-likelihiood method (ML) restricted maximum-likelihood (REML) modified maximum-likelihood (MML) Federer’s estimator (bayesian)

Handout Example

Example - Mixed Model Analysis of CRD split-plot.

A study is conducted to compare two instructors (a=2) regarding student perceptions of gender bias in their lectures. For each instructor, two classes (r=2) are randomly selected from the current classes that the instructors are teaching. Then within a class, a female and a male student are randomly selected.Since a student of a certain gender “splits” the class into two “treatments”, gender is assumed to be the split-plot treatment (b=2) and student is the split-plot experimental unit while instructor is the whole plot treatment and class is the whole-plot experimental unit. After the class, the selected students are asked to fill out a questionnaire assessing student perceptions of gender bias. One total score is obtained from each students’ questionnaire for a total of 8 responses (yijk).

options(contrasts = c("contr.SAS", "contr.poly"))
library(MASS);library(emmeans);library(car);library(lme4);library(pbkrtest);
dat=matrix(c(1, 1, 1, 1 , 1, 1, 2, 2, 1, 2, 1, 3, 1, 2, 2, 3, 2, 1, 1, 4, 2, 1, 2 , 6, 2, 2, 1, 5, 2, 2, 2, 7),nrow=4);
m=t(dat);
a=factor(m[,1]);
r=factor(m[,2]);
b=factor(m[,3]);
y=m[,4]; 
m;

##      [,1] [,2] [,3] [,4]
## [1,]    1    1    1    1
## [2,]    1    1    2    2
## [3,]    1    2    1    3
## [4,]    1    2    2    3
## [5,]    2    1    1    4
## [6,]    2    1    2    6
## [7,]    2    2    1    5
## [8,]    2    2    2    7

datframe1=data.frame(y,a,r,b);
model<-lmer(y~a * b + (1|a:r) ,data=datframe1);
summary(model)

## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ a * b + (1 | a:r)
##    Data: datframe1
## 
## REML criterion at convergence: 10.9
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -0.8703 -0.2176  0.0000  0.2176  0.8703 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  a:r      (Intercept) 0.750    0.8660  
##  Residual             0.125    0.3536  
## Number of obs: 8, groups:  a:r, 4
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept)   6.5000     0.6614   9.827
## a1           -4.0000     0.9354  -4.276
## b1           -2.0000     0.3536  -5.657
## a1:b1         1.5000     0.5000   3.000
## 
## Correlation of Fixed Effects:
##       (Intr) a1     b1    
## a1    -0.707              
## b1    -0.267  0.189       
## a1:b1  0.189 -0.267 -0.707

Anova(model, type="III", test="F");

## Analysis of Deviance Table (Type III Wald F tests with Kenward-Roger df)
## 
## Response: y
##                  F Df Df.res   Pr(>F)   
## (Intercept) 96.571  1 2.3059 0.006245 **
## a           18.286  1 2.3059 0.039086 * 
## b           32.000  1 2.0000 0.029857 * 
## a:b          9.000  1 2.0000 0.095466 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

emmeans(model,list(~a,~b,~a*b));emmeans(model,pairwise~a);

## NOTE: Results may be misleading due to involvement in interactions
## NOTE: Results may be misleading due to involvement in interactions

## $`emmeans of a`
##  a emmean    SE df lower.CL upper.CL
##  1   2.25 0.637  2   -0.492     4.99
##  2   5.50 0.637  2    2.758     8.24
## 
## Results are averaged over the levels of: b 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $`emmeans of b`
##  b emmean    SE   df lower.CL upper.CL
##  1   3.25 0.468 2.31     1.47     5.03
##  2   4.50 0.468 2.31     2.72     6.28
## 
## Results are averaged over the levels of: a 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $`emmeans of a, b`
##  a b emmean    SE   df lower.CL upper.CL
##  1 1    2.0 0.661 2.31  -0.5131     4.51
##  2 1    4.5 0.661 2.31   1.9869     7.01
##  1 2    2.5 0.661 2.31  -0.0131     5.01
##  2 2    6.5 0.661 2.31   3.9869     9.01
## 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95

## NOTE: Results may be misleading due to involvement in interactions

## $emmeans
##  a emmean    SE df lower.CL upper.CL
##  1   2.25 0.637  2   -0.492     4.99
##  2   5.50 0.637  2    2.758     8.24
## 
## Results are averaged over the levels of: b 
## Degrees-of-freedom method: kenward-roger 
## Confidence level used: 0.95 
## 
## $contrasts
##  contrast estimate    SE df t.ratio p.value
##  1 - 2       -3.25 0.901  2 -3.606  0.0691 
## 
## Results are averaged over the levels of: b 
## Degrees-of-freedom method: kenward-roger

aov(y ~ a*b + Error(a:r))

## Warning in aov(y ~ a * b + Error(a:r)): Error() model is singular

## 
## Call:
## aov(formula = y ~ a * b + Error(a:r))
## 
## Grand Mean: 3.875
## 
## Stratum 1: a:r
## 
## Terms:
##                      a Residuals
## Sum of Squares  21.125     3.250
## Deg. of Freedom      1         2
## 
## Residual standard error: 1.274755
## 1 out of 2 effects not estimable
## Estimated effects are balanced
## 
## Stratum 2: Within
## 
## Terms:
##                     b   a:b Residuals
## Sum of Squares  3.125 1.125     0.250
## Deg. of Freedom     1     1         2
## 
## Residual standard error: 0.3535534
## Estimated effects may be unbalanced

y = matrix(c(1, 2, 3, 3, 4, 6, 5, 7))
x = matrix(c(1,1,1,1,1,1,0,0,1,1,1,1,1,1,0,0,1,0,1,0,1,0,0,0,1,0,1,0,1,0,0,0), ncol = 8)
xp <- x
x <- t(x)

I = matrix(c(1,0,0,0,0,0,0,0,
             0,1,0,0,0,0,0,0,
             0,0,1,0,0,0,0,0,
             0,0,0,1,0,0,0,0,
             0,0,0,0,1,0,0,0,
             0,0,0,0,0,1,0,0,
             0,0,0,0,0,0,1,0,
             0,0,0,0,0,0,0,1), nrow = 8)

z = matrix(c(1,0,0,0,
             1,0,0,0,
             0,1,0,0,
             0,1,0,0,
             0,0,1,0,
             0,0,1,0,
             0,0,0,1,
             0,0,0,1), ncol = 8)

V = .75*(t(z)%*%z) + .125*I
V.inv = solve(V) #V^-1

beta <- solve(xp%*%V.inv%*%x)%*%xp%*%V.inv%*%y
betap <- t(beta) #B'

µ1. = matrix(c(1,1,.5,.5))
µ1.p = c(1,1,.5,.5)
lsmean.ins1 <- µ1.p%*%beta

µ2. = matrix(c(1,0,.5,0))
µ2.p = c(1,0,.5,0) # k'
lsmean.ins2 <- µ2.p%*%beta

µ.diff = matrix(c(0,1,0,.5))
µ.diff.p = c(0,1,0,.5) # k'
lsmean.ins.diff = µ.diff.p%*%beta
#lsmean.ins.diff = lsmean.ins1 - lsmean.ins2 

KA = matrix(c(0,1,0,.5))
KB = matrix(c(0,0,1,.5))
KAB = matrix(c(0,0,0,1))
XVIXI = solve(t(x)%*%V.inv%*%x)

FA = betap%*%KA%*%solve(t(KA)%*%XVIXI%*%KA)%*%t(KA)%*%beta / 1
FB = betap%*%KB%*%solve(t(KB)%*%XVIXI%*%KB)%*%t(KB)%*%beta / 1
FAB = betap%*%KAB%*%solve(t(KAB)%*%XVIXI%*%KAB)%*%t(KAB)%*%beta / 1