Analysis of Covariance

Approach: Study the effect of a categorical variable by including it in the model alongside a continuous predictor variable, then comparing the regression lines fitted for each level of the categorical variable.

Example 1: mtcars data

input <- mtcars[,c("am","mpg","hp")]
input$am<-as.factor(input$am)
str(input)
## 'data.frame':    32 obs. of  3 variables:
##  $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
##  $ mpg: num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ hp : num  110 110 93 110 175 105 245 62 95 123 ...
head(input)
##                   am  mpg  hp
## Mazda RX4          1 21.0 110
## Mazda RX4 Wag      1 21.0 110
## Datsun 710         1 22.8  93
## Hornet 4 Drive     0 21.4 110
## Hornet Sportabout  0 18.7 175
## Valiant            0 18.1 105

am (transmission type) - Categorical variable
hp (horsepower) - Continuous predictor variable
mpg (miles per gallon) - Response variable

Model1: Interaction between hp and am

summary(lm(mpg~hp*am - 1,data = input))
## 
## Call:
## lm(formula = mpg ~ hp * am - 1, data = input)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3818 -2.2696  0.1344  1.7058  5.8752 
## 
## Coefficients:
##          Estimate Std. Error t value Pr(>|t|)    
## hp     -0.0591370  0.0129449  -4.568 9.02e-05 ***
## am0    26.6248479  2.1829432  12.197 1.01e-12 ***
## am1    31.8425012  1.5288820  20.827  < 2e-16 ***
## hp:am1  0.0004029  0.0164602   0.024    0.981    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.961 on 28 degrees of freedom
## Multiple R-squared:  0.9825, Adjusted R-squared:   0.98 
## F-statistic: 393.5 on 4 and 28 DF,  p-value: < 2.2e-16
fit1 <- aov(mpg~hp*am,data = input)
summary(fit1)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## hp           1  678.4   678.4  77.391 1.50e-09 ***
## am           1  202.2   202.2  23.072 4.75e-05 ***
## hp:am        1    0.0     0.0   0.001    0.981    
## Residuals   28  245.4     8.8                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
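
To connect the coefficients above with the idea of one regression line per level, the sketch below fits a separate simple regression for each transmission type and compares the result with the interaction model. It assumes the same `input` data frame as above; the names `full`, `line_am0`, `line_am1`, `implied_am0` and `implied_am1` are illustrative. Because the formula uses `- 1`, `am0` and `am1` are the two intercepts, `hp` is the slope for `am = 0`, and `hp:am1` is the difference in slope for `am = 1`. Removing the intercept also makes R compute R-squared relative to zero rather than the mean of `mpg`, which is why the value reported above is so high.

```r
# Rebuild the data frame used in Example 1
input <- mtcars[, c("am", "mpg", "hp")]
input$am <- as.factor(input$am)

# Interaction model without a global intercept
full <- lm(mpg ~ hp * am - 1, data = input)
b <- coef(full)

# Separate simple regressions, one per transmission type
line_am0 <- coef(lm(mpg ~ hp, data = subset(input, am == "0")))
line_am1 <- coef(lm(mpg ~ hp, data = subset(input, am == "1")))

# Intercept and slope implied by the interaction model for each level
implied_am0 <- c(b[["am0"]], b[["hp"]])
implied_am1 <- c(b[["am1"]], b[["hp"]] + b[["hp:am1"]])

# Both comparisons should be TRUE: the interaction model reproduces
# the per-level regression lines
all.equal(unname(line_am0), implied_am0)
all.equal(unname(line_am1), implied_am1)
```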

Model2: No interaction between hp and am

fit2 <- aov(mpg~hp+am,data = input)
summary(fit2)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## hp           1  678.4   678.4   80.15 7.63e-10 ***
## am           1  202.2   202.2   23.89 3.46e-05 ***
## Residuals   29  245.4     8.5                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing Two Models:
Objective: to check whether the interaction between the variables is truly insignificant

anova(fit1,fit2)
## Analysis of Variance Table
## 
## Model 1: mpg ~ hp * am
## Model 2: mpg ~ hp + am
##   Res.Df    RSS Df  Sum of Sq     F Pr(>F)
## 1     28 245.43                           
## 2     29 245.44 -1 -0.0052515 6e-04 0.9806
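
The F statistic in this table can be reproduced by hand from the residual sums of squares of the two models, which may make the comparison less opaque. The sketch below assumes the `input` data frame from above and uses `lm()` fits (the names `full` and `reduced` are illustrative). It is meant as an illustration of the partial F test, not a replacement for `anova()`. With a p-value of about 0.98, the interaction term explains almost nothing beyond the additive model, so the simpler model with parallel lines is adequate here.

```r
input <- mtcars[, c("am", "mpg", "hp")]
input$am <- as.factor(input$am)

full    <- lm(mpg ~ hp * am, data = input)  # with interaction
reduced <- lm(mpg ~ hp + am, data = input)  # without interaction

# Residual sums of squares and residual degrees of freedom
rss_full    <- sum(resid(full)^2)
rss_reduced <- sum(resid(reduced)^2)
df_full     <- df.residual(full)
df_reduced  <- df.residual(reduced)

# Partial F test: extra variation explained per extra parameter,
# relative to the residual variance of the full model
f_stat  <- ((rss_reduced - rss_full) / (df_reduced - df_full)) /
           (rss_full / df_full)
p_value <- pf(f_stat, df_reduced - df_full, df_full, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```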

Example 2: swiss Fertility data

Load the swiss data and bin the Catholic variable into 2 equal-width bins

data(swiss)
input<-swiss[,c("Fertility","Agriculture","Catholic")]
input$Catholic<-as.factor(as.numeric(cut(input$Catholic,2)))
head(input)
##              Fertility Agriculture Catholic
## Courtelary        80.2        17.0        1
## Delemont          83.1        45.1        2
## Franches-Mnt      92.5        39.7        2
## Moutier           85.8        36.5        1
## Neuveville        76.9        43.5        1
## Porrentruy        76.1        35.3        2
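
Since `cut(x, 2)` splits the range of `x` into two equal-width intervals rather than two equally sized groups, it can be worth checking how many provinces fall into each bin before modelling. A minimal check, where `bins` is an illustrative name:

```r
data(swiss)

# cut() with breaks = 2 divides the observed range into 2 equal-width intervals
bins <- cut(swiss$Catholic, 2)

levels(bins)  # interval labels chosen by cut()
table(bins)   # number of provinces in each interval
```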

Model1: Interaction between Agriculture and Catholic

fit1 <- aov(Fertility~Agriculture*Catholic,data = input)
summary(fit1)
##                      Df Sum Sq Mean Sq F value  Pr(>F)   
## Agriculture           1    895   894.8   7.377 0.00948 **
## Catholic              1   1067  1066.5   8.793 0.00492 **
## Agriculture:Catholic  1      1     0.7   0.006 0.93900   
## Residuals            43   5216   121.3                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model2: No Interaction between Agriculture and Catholic

fit2 <- aov(Fertility~Agriculture+Catholic,data = input)
summary(fit2)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## Agriculture  1    895   894.8   7.548 0.00868 **
## Catholic     1   1067  1066.5   8.996 0.00444 **
## Residuals   44   5217   118.6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Comparing Two Models:

anova(fit1,fit2)
## Analysis of Variance Table
## 
## Model 1: Fertility ~ Agriculture * Catholic
## Model 2: Fertility ~ Agriculture + Catholic
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     43 5215.9                           
## 2     44 5216.6 -1  -0.71869 0.0059  0.939

Adjusting for another variable

Adding a regressor to a linear model lets us investigate its role in the relationship between two other variables. For example, smoking status can be added to investigate the relationship between lung infection and mint usage. The adjustment variable (here, smoking status) is held constant while that relationship is investigated.
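
As a rough illustration using the mtcars data from Example 1 (not the smoking example, for which no data is given here), the sketch below compares the estimated effect of transmission type on `mpg` before and after adjusting for `hp`. The model names `unadjusted` and `adjusted` are illustrative.

```r
input <- mtcars[, c("am", "mpg", "hp")]
input$am <- as.factor(input$am)

# Unadjusted: difference in mean mpg between manual (am = 1)
# and automatic (am = 0) cars
unadjusted <- lm(mpg ~ am, data = input)

# Adjusted: the same comparison with hp held constant
adjusted <- lm(mpg ~ am + hp, data = input)

# Compare the estimated am effect under both models
c(unadjusted = coef(unadjusted)[["am1"]],
  adjusted   = coef(adjusted)[["am1"]])
```

If the `am1` estimate changes noticeably once `hp` is included, part of the raw difference between transmission types is attributable to differences in horsepower.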