Analysis of Covariates

Analysis of covariance (ACOVA) incorporates one or more regression variables into an analysis of variance. As such, we can think of it as analogous to the two-way ANOVA; except that instead of having two different factor variables as predictors, we have one factor variable and one continuous variable.

Assumptions

  1. The covariate(s) and the factor variable(s) are independent – The covariate and the factor variable should be independent of each other, since adding a covariate term into the model only makes sense if the covariate and the factor variable act independently on the response variable.
  2. The covariate(s) are continuous data. The covariates should be continuous.
  3. Homogeneity of variances – The variances among the groups should be roughly equal.
  4. Independence – The observations in each group should be independent.
  5. Normality – The data should be roughly normally distributed in each group.
  6. No extreme outliers – There should be no extreme outliers in any of the groups that could significantly affect the results of the ANCOVA.

Null Hypothesis

  1. All groups (w.r.t categorical predictor) have same mean.
  2. The response and the continuous predictor variable are independent.
  3. The categorical and continuous predictor variables have no interaction.

Case Study: Miles Per Gallon

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).

Check the effect of engine shape (vs) and horse power (hp) and miles per gallon (mpg). First lets have a look at the response (mpg), factor (vs) and covariate (hp).

mtcars$vs <- as.factor(mtcars$vs) 
summary(mtcars[c('mpg', 'vs', 'hp')])
##       mpg        vs           hp       
##  Min.   :10.40   0:18   Min.   : 52.0  
##  1st Qu.:15.43   1:14   1st Qu.: 96.5  
##  Median :19.20          Median :123.0  
##  Mean   :20.09          Mean   :146.7  
##  3rd Qu.:22.80          3rd Qu.:180.0  
##  Max.   :33.90          Max.   :335.0

Example 1: vs+hp

Let’s study the interaction between engine shape (vs) and horse power (hp) predictors. The model will estimate the effect of vs and hp on mpg response variable.

Visualize

First, visualize the change in mpg due to vs and hp using box plots.

par(mfrow=c(1,2))
boxplot(mpg ~ hp, data = mtcars)
boxplot(mpg ~ vs, data = mtcars)

The box plots show that the mean of response variable changes/responds to changes in factor (vs) and covariate (mpg).

Visualize interactions

Inspect the interactions between factor (vs) and covariate (hp).

interaction.plot(mtcars$vs, mtcars$hp, mtcars$mpg)

The change in response variable w.r.t vs is not consistent for different values of the continuous predictor (hp). That is the lines not parallel, indicating interactions between factor (vs) and covariate (hp). Go ahead and confirm this observation using ANCOVA.

Apply ANCOVA

To apply ANCOVA, first fit a linear model with interaction conc*Treatment as predictor.

hp_vs_model <- lm(mpg ~ hp * vs, data = mtcars)
print(summary(hp_vs_model))
## 
## Call:
## lm(formula = mpg ~ hp * vs, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.5821 -1.7710 -0.3612  1.5969  9.2646 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 24.49637    2.73893   8.944 1.07e-09 ***
## hp          -0.04153    0.01379  -3.011  0.00547 ** 
## vs1         14.50418    4.58160   3.166  0.00371 ** 
## hp:vs1      -0.11657    0.04130  -2.822  0.00868 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.428 on 28 degrees of freedom
## Multiple R-squared:  0.7077, Adjusted R-squared:  0.6764 
## F-statistic:  22.6 on 3 and 28 DF,  p-value: 1.227e-07

The hp:vs1 coefficient is very significant. Which confirms our prior observation on interaction plots. Lets input the model to anova function (fit ANCOVA).

print(anova(hp_vs_model))
## Analysis of Variance Table
## 
## Response: mpg
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## hp         1 678.37  678.37 57.7135 2.826e-08 ***
## vs         1  24.94   24.94  2.1216  0.156358    
## hp:vs      1  93.62   93.62  7.9649  0.008677 ** 
## Residuals 28 329.12   11.75                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANCOVA summary shows the significant contribution of hp:vs interaction.

Let’s inspect the adjusted means w.r.t vs after blocking effect of hp. You can also plot the difference in adjusted means using effect function.

adj.means <- effect("vs", hp_vs_model)
## NOTE: vs is not a high-order term in the model
print(adj.means)
## 
##  vs effect
## vs
##        0        1 
## 18.40402 15.80951
plot(adj.means)

We can also compare the grouped means w.r.t vs without taking into consideration (blocking) the effect of its interaction with hp.

grouped.means <- aggregate(mtcars$mpg, list(mtcars$vs), FUN=mean )
print(grouped.means)
##   Group.1        x
## 1       0 16.61667
## 2       1 24.55714
vs_model <- lm(mpg~vs, data=mtcars)
plot(effect("vs", vs_model))

Summary

The effect of vs on mpg changes when its interaction with hp is taken into account.