Blog 4: Analysis of Covariance (ANCOVA)

Analysis of covariance (ANCOVA) is a statistical method that accounts for third variables, called covariates, when investigating the relationship between an independent and a dependent variable. A covariate is a continuous variable that is observed rather than manipulated, and it is never the key independent variable of the study. For example, consider a study that estimates the effect of training (independent variable) on job performance (dependent variable). There will be unexplained variation between those who took the training and those who didn’t, because people differ in ability before any training takes place. To account for this variation, a performance test can be given prior to the training to measure everyone’s baseline performance. This baseline test score can then serve as the covariate.
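The training example can be sketched with simulated data. This is only an illustration under stated assumptions: the variable names (baseline, training, performance), the sample size, and the effect sizes are all invented here and do not come from any real study. The sketch compares the residual standard error of a model that ignores the baseline score with one that includes it as a covariate.

```r
# Simulated data only: every number below is invented for illustration.
set.seed(42)
n <- 100
baseline    <- rnorm(n, mean = 50, sd = 10)   # pre-training test score (covariate)
training    <- rep(c(0, 1), each = n / 2)     # 0 = no training, 1 = training
performance <- 5 * training + 0.8 * baseline + rnorm(n, sd = 5)

fit_no_cov <- lm(performance ~ training)             # ignores the baseline
fit_ancova <- lm(performance ~ baseline + training)  # adjusts for the baseline

# Residual standard error of each model
c(without_covariate = sigma(fit_no_cov),
  with_covariate    = sigma(fit_ancova))
```

In this simulated setting the model with the covariate leaves less unexplained variation, which is the motivation for measuring a baseline in the first place.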

We use regression analysis to build models that describe how variation in the independent variables affects the dependent variable. When one of the predictors is a categorical variable with binary values such as Yes/No or Male/Female, a simple regression fitted separately for each value gives a separate result for each level of the categorical variable. ANCOVA lets us see the effect of the categorical variable by including it in the model alongside the continuous predictor and comparing the regression lines for each level of the categorical variable.

In this example we will use selected columns from the built-in R dataset mtcars. The columns we will use are:
am: the type of transmission (0 = automatic, 1 = manual)
mpg: miles per gallon
hp: horsepower

We want to see the effect of “am” on the regression of “mpg” on “hp”.

library(dplyr)   # select() and the %>% pipe
library(skimr)   # skim() summary

# Keep only the three columns used in the analysis
mtcars_2 <- mtcars %>% select(am, mpg, hp)
skim(mtcars_2)

Data summary
Name	mtcars_2
Number of rows	32
Number of columns	3
_______________________
Column type frequency:
numeric	3
________________________
Group variables	None

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
am	1	0.41	0.50	0.0	0.00	0.0	1.0	1.0	▇▁▁▁▆
mpg	1	20.09	6.03	10.4	15.43	19.2	22.8	33.9	▃▇▅▁▂
hp	1	146.69	68.56	52.0	96.50	123.0	180.0	335.0	▇▇▆▃▁
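Before fitting a single model, it can help to look at the regression line of mpg on hp within each transmission type. The following is a minimal sketch in base R, not part of the original analysis; the object names fits and lines_by_am are made up for this illustration.

```r
# Fit mpg ~ hp separately for each level of am (0 = automatic, 1 = manual)
fits <- lapply(split(mtcars, mtcars$am),
               function(d) coef(lm(mpg ~ hp, data = d)))

# One row per transmission type, columns: intercept and hp slope
lines_by_am <- do.call(rbind, fits)
lines_by_am
```

If the two slopes are similar and mainly the intercepts differ, that suggests the lines are roughly parallel, which is the situation an ANCOVA without an interaction term describes.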

mtcars_m <- aov(mpg~hp*am,data = mtcars_2)
summary(mtcars_m)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## hp           1  678.4   678.4  77.391 1.50e-09 ***
## am           1  202.2   202.2  23.072 4.75e-05 ***
## hp:am        1    0.0     0.0   0.001    0.981    
## Residuals   28  245.4     8.8                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

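The output above shows that both hp and am are significant, while the hp:am interaction is not (p = 0.981). In other words, there is no evidence that the slope of mpg against hp differs between the two transmission types. We can now fit the model without the interaction between the categorical variable and the predictor variable and see that both variables still have a significant effect on mpg.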

mtcars_mv2 <- aov(mpg~hp+am,data = mtcars_2)
summary(mtcars_mv2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## hp           1  678.4   678.4   80.15 7.63e-10 ***
## am           1  202.2   202.2   23.89 3.46e-05 ***
## Residuals   29  245.4     8.5                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
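The ANOVA table reports significance but not the size or direction of each effect. One way to see those, sketched here as an addition to the original analysis, is to fit the same additive formula with lm() and read off the coefficients. The name fit_add is chosen for this illustration only.

```r
# Same additive model as above, fitted with lm() to get coefficients
fit_add <- lm(mpg ~ hp + am, data = mtcars)

# (Intercept): expected mpg for an automatic car (am = 0) at hp = 0
# hp:          change in mpg per extra unit of horsepower, same for both groups
# am:          shift in mpg for manual cars (am = 1) at any fixed hp
coef(fit_add)
```

Because am is coded 0/1, its coefficient is the vertical distance between the two parallel regression lines.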

We can compare the two models directly to check whether the interaction between the variables is truly insignificant.

m1 <- aov(mpg~hp*am,data = mtcars_2)
m2 <- aov(mpg~hp+am,data = mtcars_2)

anova(m1,m2)

## Analysis of Variance Table
## 
## Model 1: mpg ~ hp * am
## Model 2: mpg ~ hp + am
##   Res.Df    RSS Df  Sum of Sq     F Pr(>F)
## 1     28 245.43                           
## 2     29 245.44 -1 -0.0052515 6e-04 0.9806

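From the result (p = 0.9806) we can see that the interaction between hp and am is not significant, so dropping it does not make the fit meaningfully worse and the simpler model without the interaction is adequate.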
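A common follow-up in ANCOVA is to report group means adjusted for the covariate. The sketch below is an addition, not part of the original analysis: it predicts mpg for each transmission type at the overall mean hp, using the model without the interaction. The names fit_add, new_cars, and adj_means are made up for this illustration.

```r
# Additive ANCOVA model: parallel lines for the two transmission types
fit_add <- lm(mpg ~ hp + am, data = mtcars)

# Predict mpg for both groups at the same (mean) horsepower
new_cars  <- data.frame(hp = mean(mtcars$hp), am = c(0, 1))
adj_means <- predict(fit_add, newdata = new_cars)
names(adj_means) <- c("automatic", "manual")
adj_means
```

The difference between the two adjusted means equals the am coefficient of the model, since both predictions use the same hp value.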

Dhairav Chhatbar

12/9/2020