Approach: Study the effect of the categorical variable by including it in the model alongside the continuous predictor, then compare the regression lines fitted for each level of the categorical variable.
input <- mtcars[,c("am","mpg","hp")]
input$am<-as.factor(input$am)
str(input)
## 'data.frame': 32 obs. of 3 variables:
## $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
## $ mpg: num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
head(input)
## am mpg hp
## Mazda RX4 1 21.0 110
## Mazda RX4 Wag 1 21.0 110
## Datsun 710 1 22.8 93
## Hornet 4 Drive 0 21.4 110
## Hornet Sportabout 0 18.7 175
## Valiant 0 18.1 105
am (transmission type: 0 = automatic, 1 = manual) - Categorical variable
hp (gross horsepower) - Continuous predictor variable
mpg (miles per gallon) - Response variable
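With these roles, the two models fitted in this section can be written out explicitly. The notation is introduced here for illustration and does not appear in the original code: $A_i$ is an indicator equal to 1 when am = 1 and 0 otherwise, and $\varepsilon_i$ is the error term.

```latex
\begin{aligned}
\text{Model 1 (interaction):}\quad
  & \text{mpg}_i = \beta_0 + \beta_1\,\text{hp}_i + \beta_2\,A_i
    + \beta_3\,(\text{hp}_i \times A_i) + \varepsilon_i \\
\text{Model 2 (no interaction):}\quad
  & \text{mpg}_i = \beta_0 + \beta_1\,\text{hp}_i + \beta_2\,A_i + \varepsilon_i
\end{aligned}
```

In Model 1 the slope on hp is $\beta_1$ for am = 0 and $\beta_1 + \beta_3$ for am = 1, so testing the interaction amounts to testing $\beta_3 = 0$, i.e. whether the two regression lines are parallel.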
Model1: Interaction between hp and am
summary(lm(mpg~hp*am - 1,data = input))
##
## Call:
## lm(formula = mpg ~ hp * am - 1, data = input)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3818 -2.2696 0.1344 1.7058 5.8752
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## hp -0.0591370 0.0129449 -4.568 9.02e-05 ***
## am0 26.6248479 2.1829432 12.197 1.01e-12 ***
## am1 31.8425012 1.5288820 20.827 < 2e-16 ***
## hp:am1 0.0004029 0.0164602 0.024 0.981
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.961 on 28 degrees of freedom
## Multiple R-squared: 0.9825, Adjusted R-squared: 0.98
## F-statistic: 393.5 on 4 and 28 DF, p-value: < 2.2e-16
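Because the model is fitted without a common intercept, the am0 and am1 coefficients are the intercepts of the two regression lines, hp is the slope for am = 0, and hp:am1 is the difference in slope for am = 1. Note also that R computes R-squared relative to zero rather than to the mean when the intercept is removed, so the 0.9825 above is not comparable with the R-squared of a model that has an intercept. The following is a minimal sketch of how the per-level lines could be recovered and drawn; the data frame name `lines_by_am` and the colours are illustrative choices, not part of the original analysis.

```r
# Rebuild the data frame used in this section.
input <- mtcars[, c("am", "mpg", "hp")]
input$am <- as.factor(input$am)

# Same no-intercept interaction model as above.
fit <- lm(mpg ~ hp * am - 1, data = input)
b <- coef(fit)  # names: "hp", "am0", "am1", "hp:am1"

# Intercept and slope of the fitted line for each level of am.
lines_by_am <- data.frame(
  am        = c("0", "1"),
  intercept = unname(c(b["am0"], b["am1"])),
  slope     = unname(c(b["hp"], b["hp"] + b["hp:am1"]))
)
print(lines_by_am)

# Scatter plot with one fitted line per level of am.
plot(mpg ~ hp, data = input, col = c("blue", "red")[input$am], pch = 19)
abline(a = lines_by_am$intercept[1], b = lines_by_am$slope[1], col = "blue")
abline(a = lines_by_am$intercept[2], b = lines_by_am$slope[2], col = "red")
legend("topright", legend = c("am = 0", "am = 1"),
       col = c("blue", "red"), lty = 1)
```

If the interaction were important, the two lines would visibly diverge or cross; here the slopes differ only by the hp:am1 estimate, so the lines are close to parallel.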
fit1 <- aov(mpg~hp*am,data = input)
summary(fit1)
## Df Sum Sq Mean Sq F value Pr(>F)
## hp 1 678.4 678.4 77.391 1.50e-09 ***
## am 1 202.2 202.2 23.072 4.75e-05 ***
## hp:am 1 0.0 0.0 0.001 0.981
## Residuals 28 245.4 8.8
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model2: No interaction between hp and am
fit2 <- aov(mpg~hp+am,data = input)
summary(fit2)
## Df Sum Sq Mean Sq F value Pr(>F)
## hp 1 678.4 678.4 80.15 7.63e-10 ***
## am 1 202.2 202.2 23.89 3.46e-05 ***
## Residuals 29 245.4 8.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comparing Two Models:
Objective: to determine whether the interaction between the variables is truly insignificant
anova(fit1,fit2)
## Analysis of Variance Table
##
## Model 1: mpg ~ hp * am
## Model 2: mpg ~ hp + am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 28 245.43
## 2 29 245.44 -1 -0.0052515 6e-04 0.9806
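The p-value of 0.9806 indicates that dropping the interaction term does not significantly worsen the fit, so the slope of mpg on hp can be treated as the same for both transmission types. As a rough sketch, the same decision could be made programmatically. The object names and the 0.05 threshold are assumptions made for illustration; the smaller model is passed first here so that Df and Sum of Sq come out positive.

```r
# Rebuild the data frame used in this section.
input <- mtcars[, c("am", "mpg", "hp")]
input$am <- as.factor(input$am)

fit_interaction <- aov(mpg ~ hp * am, data = input)
fit_additive    <- aov(mpg ~ hp + am, data = input)

# Nested-model F test: additive model first, interaction model second.
cmp <- anova(fit_additive, fit_interaction)
p_value <- cmp[["Pr(>F)"]][2]

alpha <- 0.05  # assumed significance level
if (p_value > alpha) {
  message("Interaction not significant (p = ", round(p_value, 4),
          "); the additive model is adequate.")
} else {
  message("Interaction significant (p = ", round(p_value, 4),
          "); keep the interaction term.")
}
```

Reversing the order of the models changes only the signs of Df and Sum of Sq, not the F statistic or the p-value.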
Load the swiss data and bin the Catholic variable into 2 equal-width bins
data(swiss)
input<-swiss[,c("Fertility","Agriculture","Catholic")]
input$Catholic<-as.numeric(cut(input$Catholic,2))
head(input)
## Fertility Agriculture Catholic
## Courtelary 80.2 17.0 1
## Delemont 83.1 45.1 2
## Franches-Mnt 92.5 39.7 2
## Moutier 85.8 36.5 1
## Neuveville 76.9 43.5 1
## Porrentruy 76.1 35.3 2
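The code above stores the two bins as the numbers 1 and 2. With only two bins this gives the same fit as a factor, but with more than two bins numeric codes would be modelled as a single linear trend rather than as separate levels. A possible alternative, sketched here under the assumption that the bin labels "low" and "high" are acceptable names (they are hypothetical and not in the original), keeps the binned variable as a factor:

```r
data(swiss)
input_f <- swiss[, c("Fertility", "Agriculture", "Catholic")]

# cut() with breaks = 2 splits the observed range into two equal-width
# intervals; "low" and "high" are illustrative labels.
input_f$Catholic <- cut(input_f$Catholic, breaks = 2,
                        labels = c("low", "high"))

str(input_f$Catholic)    # a factor with 2 levels
table(input_f$Catholic)  # number of provinces in each bin
```

The result is stored under a separate name, `input_f`, so that the numeric version used in the rest of this section is left unchanged.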
Model1: Interaction between Agriculture and Catholic
fit1 <- aov(Fertility~Agriculture*Catholic,data = input)
summary(fit1)
## Df Sum Sq Mean Sq F value Pr(>F)
## Agriculture 1 895 894.8 7.377 0.00948 **
## Catholic 1 1067 1066.5 8.793 0.00492 **
## Agriculture:Catholic 1 1 0.7 0.006 0.93900
## Residuals 43 5216 121.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Model2: No Interaction between Agriculture and Catholic
fit2 <- aov(Fertility~Agriculture+Catholic,data = input)
summary(fit2)
## Df Sum Sq Mean Sq F value Pr(>F)
## Agriculture 1 895 894.8 7.548 0.00868 **
## Catholic 1 1067 1066.5 8.996 0.00444 **
## Residuals 44 5217 118.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Comparing Two Models:
anova(fit1,fit2)
## Analysis of Variance Table
##
## Model 1: Fertility ~ Agriculture * Catholic
## Model 2: Fertility ~ Agriculture + Catholic
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 43 5215.9
## 2 44 5216.6 -1 -0.71869 0.0059 0.939
Adding a regressor to a linear model makes it possible to investigate its role in the relationship between two other variables. E.g., adding smoking status to investigate the relationship between lung infection and mint usage. The adjustment variable (e.g., smoking status) is held constant while the relationship is investigated.
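As an illustration of adjustment, the swiss data from the previous example can stand in for the smoking example, for which no data set is given here. The sketch below compares the Agriculture coefficient before and after adjusting for the binned Catholic variable; the object names are illustrative.

```r
data(swiss)
input <- swiss[, c("Fertility", "Agriculture", "Catholic")]
input$Catholic <- as.numeric(cut(input$Catholic, 2))

# Unadjusted: Fertility on Agriculture alone.
unadjusted <- lm(Fertility ~ Agriculture, data = input)

# Adjusted: Catholic is held constant while the Agriculture slope is estimated.
adjusted <- lm(Fertility ~ Agriculture + Catholic, data = input)

slopes <- c(
  unadjusted = unname(coef(unadjusted)["Agriculture"]),
  adjusted   = unname(coef(adjusted)["Agriculture"])
)
print(slopes)
```

A large change between the two slopes would suggest that the adjustment variable accounts for part of the apparent relationship; a small change would suggest it plays little role.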