Categorical Variables

Categorical variables are also called factor or qualitative variables.

They take a limited number of values, called levels.

library(car)
## Loading required package: carData
Salaries <- Salaries  # copy the lazy-loaded carData dataset into the workspace

A categorical variable with n levels is transformed into n-1 dummy (indicator) variables, each taking the value 0 or 1.

levels(Salaries$rank)
## [1] "AsstProf"  "AssocProf" "Prof"

This 3-level variable will be recoded into 2 variables, AssocProf and Prof:

- If rank = AssocProf, then the column AssocProf is coded with a 1 and Prof with a 0.
- If rank = Prof, then the column AssocProf is coded with a 0 and Prof with a 1.
- If rank = AsstProf, then both columns AssocProf and Prof are coded with a 0.

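Recoding this way corresponds to a contrast matrix (treatment contrasts, R's default for unordered factors). model.matrix() applies it to every row to build the design matrix: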

res<-model.matrix(~rank,data=Salaries)
head(res)
##   (Intercept) rankAssocProf rankProf
## 1           1             0        1
## 2           1             0        1
## 3           1             0        0
## 4           1             0        1
## 5           1             0        1
## 6           1             1        0
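
The coding rule described above can also be inspected directly, without building the full design matrix. The sketch below uses a small hypothetical factor, `rank_demo`, which is not part of the original notes and only mirrors the three levels of `Salaries$rank`. It assumes R's default treatment contrasts are in effect.

```r
# Hypothetical stand-in for Salaries$rank, with the same level order
rank_demo <- factor(
  c("AsstProf", "AssocProf", "Prof"),
  levels = c("AsstProf", "AssocProf", "Prof")
)

# One row per level, one column per dummy variable (n - 1 = 2 columns)
cm <- contrasts(rank_demo)
print(cm)
##           AssocProf Prof
## AsstProf          0    0
## AssocProf         1    0
## Prof              0    1
```

The first level, AsstProf, gets a 0 in every column, which is why it does not appear as a column in the model matrix.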

Make Model

model_anova<-lm(salary~rank, data = Salaries)
summary(model_anova)
## 
## Call:
## lm(formula = salary ~ rank, data = Salaries)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -68972 -16376  -1580  11755 104773 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      80776       2887  27.976  < 2e-16 ***
## rankAssocProf    13100       4131   3.171  0.00164 ** 
## rankProf         45996       3230  14.238  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 23630 on 394 degrees of freedom
## Multiple R-squared:  0.3943, Adjusted R-squared:  0.3912 
## F-statistic: 128.2 on 2 and 394 DF,  p-value: < 2.2e-16

Interpretation

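There are 2 dummy variables; the reference (baseline) level, coded 0 in both, is AsstProf.

Each Estimate is not a slope but a difference in group averages (AssocProf vs AsstProf and Prof vs AsstProf). The intercept (80776) is the mean salary for AsstProf; on average, AssocProf earn 13100 more and Prof earn 45996 more.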
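
To see why the estimates are differences in group averages, a minimal sketch on made-up data may help. The data frame `toy` and its salary values are hypothetical and chosen only so the group means are easy to read. They are not taken from `Salaries`.

```r
# Hypothetical toy data: 3 observations per rank
toy <- data.frame(
  rank = factor(
    rep(c("AsstProf", "AssocProf", "Prof"), each = 3),
    levels = c("AsstProf", "AssocProf", "Prof")
  ),
  salary = c(70, 80, 90, 90, 100, 110, 120, 130, 140)
)

# Group means computed directly
group_means <- tapply(toy$salary, toy$rank, mean)

# Same information from the linear model
fit <- lm(salary ~ rank, data = toy)

# Intercept = mean of the reference level;
# other coefficients = group mean minus reference mean
print(group_means)
print(coef(fit))
print(group_means[-1] - group_means[1])
```

In this toy example the group means are 80, 100 and 130, so the fitted coefficients are 80 (intercept), 20 and 50.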

Goodness of fit tests

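The p-values reported by summary() can be interpreted as usual for this model, provided the residual assumptions hold.

The same applies to the confidence intervals for the coefficients: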

confint(model_anova)
##                   2.5 %   97.5 %
## (Intercept)   75099.519 86452.45
## rankAssocProf  4979.187 21221.72
## rankProf      39644.872 52347.38
ggResidpanel::resid_panel(model_anova)
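
The intervals reported by confint() can be approximately reproduced by hand from the summary table as estimate ± t-quantile × standard error. The sketch below does this for rankAssocProf, using the rounded values printed above (13100, 4131, 394 residual degrees of freedom). The variable names are illustrative, and the result differs slightly from the confint() output because of that rounding.

```r
# Values copied from the summary(model_anova) output (rounded as printed)
est <- 13100   # estimate for rankAssocProf
se  <- 4131    # its standard error
df  <- 394     # residual degrees of freedom

# Critical value for a two-sided 95% interval
t_crit <- qt(0.975, df)

# Manual confidence interval: estimate +/- t_crit * standard error
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
print(ci)
```

The result should be close to the 4979.187 and 21221.72 shown by confint() for rankAssocProf.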