Analysis of Covariance

Analysis of covariance (ANCOVA) incorporates one or more regression variables into an analysis of variance. As such, we can think of it as analogous to the two-way ANOVA; except that instead of having two different factor variables as predictors, we have one factor variable and one continuous variable.

Assumptions

  1. The covariate(s) and the factor variable(s) are independent – The covariate and the factor variable should be independent of each other, since adding a covariate term into the model only makes sense if the covariate and the factor variable act independently on the response variable.
  2. The covariate(s) are continuous data. The covariates should be continuous.
  3. Homogeneity of variances – The variances among the groups should be roughly equal.
  4. Independence – The observations in each group should be independent.
  5. Normality – The data should be roughly normally distributed in each group.
  6. No extreme outliers – There should be no extreme outliers in any of the groups that could significantly affect the results of the ANCOVA.

Null Hypothesis

  1. All groups (w.r.t categorical predictor) have same mean.
  2. The response and the continuous predictor variable are independent.
  3. The categorical and continuous predictor variables have no interaction.

Fisher’s Cats data

Our primary interest is to determine whether females’ heart weights differ from males’ heart weights when both have received digitalis.

library(MASS)
data(cats)
str(cats)
## 'data.frame':    144 obs. of  3 variables:
##  $ Sex: Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Bwt: num  2 2 2 2.1 2.1 2.1 2.1 2.1 2.1 2.1 ...
##  $ Hwt: num  7 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5 ...
par(mfrow = c(1,2),mar = c(3,3.5,1,1))
boxplot(Hwt ~ Sex, data = cats)
boxplot(Hwt ~ Bwt, data = cats)

Fisher provided both heart weights and body weights, so we can ask a more complex question, “Is there a sex difference in the heart weights over and above the fact that male cats are naturally larger than female cats?”

Inspect interactions

par(mfrow=c(1,1))
interaction.plot(cats$Sex, cats$Bwt, cats$Hwt)

The means of response variable vary w.r.t change in factors (lines are not parallel), indicating strong interaction between factor (Sex) and covariate (Bwt).

Apply ANCOVA

To apply ANCOVA, first fit a linear model with interaction Bwt*Sex as predictor.

Bwt_Sex_model <- lm(Hwt ~ Bwt * Sex, data = cats)
print(summary(Bwt_Sex_model))
## 
## Call:
## lm(formula = Hwt ~ Bwt * Sex, data = cats)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.7728 -1.0118 -0.1196  0.9272  4.8646 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.9813     1.8428   1.618 0.107960    
## Bwt           2.6364     0.7759   3.398 0.000885 ***
## SexM         -4.1654     2.0618  -2.020 0.045258 *  
## Bwt:SexM      1.6763     0.8373   2.002 0.047225 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.442 on 140 degrees of freedom
## Multiple R-squared:  0.6566, Adjusted R-squared:  0.6493 
## F-statistic: 89.24 on 3 and 140 DF,  p-value: < 2.2e-16

These results indicate that Bwt and Bwt*Sex interaction are two factors that has a statistically significant effect on Hwt. Significant interaction term implies that the effect of Bwt is not consistent across Sex. That is, a male cat will have more Hwt compared to a female cat while the effect of Bwt is blocked.

print(anova(Bwt_Sex_model))
## Analysis of Variance Table
## 
## Response: Hwt
##            Df Sum Sq Mean Sq  F value  Pr(>F)    
## Bwt         1 548.09  548.09 263.6448 < 2e-16 ***
## Sex         1   0.15    0.15   0.0745 0.78535    
## Bwt:Sex     1   8.33    8.33   4.0077 0.04722 *  
## Residuals 140 291.05    2.08                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANCOVA summary shows the significant contribution of Bwt:Sex interaction.

Compare the mean values

Let’s inspect the adjusted means w.r.t vs after blocking effect of Bwt. You can also plot the difference in adjusted means using effect function.

adj.means <- effect("Sex", Bwt_Sex_model)
## NOTE: Sex is not a high-order term in the model
print(adj.means)
## 
##  Sex effect
## Sex
##        F        M 
## 10.16188 10.56197
plot(adj.means)

The effects::effect() function uses the output of the ANCOVA model to estimate the means of each factor level, corrected by the effect of the covariate (Bwt).

We can also compare the grouped means w.r.t vs without taking into consideration (blocking) the effect of its interaction with hp.

grouped.means <- aggregate(cats$Hwt, list(cats$Sex), FUN=mean )
print(grouped.means)
##   Group.1         x
## 1       F  9.202128
## 2       M 11.322680
Sex_model <- lm(Hwt~Sex, data=cats)
plot(effect("Sex", Sex_model))

Interestingly, the grouped means are different than adjusted means. Explain why ?

Summary

The effect of Sex on Hwt changes when its interaction with Bwt is taken into account.