Categorical variables are also called factors or qualitative variables.
They take a limited number of values, called levels.
library(car)
## Loading required package: carData
Salaries<-Salaries
A categorical variable with n levels is transformed into n-1 dummy variables, each with 2 levels (0 or 1).
levels(Salaries$rank)
## [1] "AsstProf" "AssocProf" "Prof"
This 3-level variable is recoded into 2 variables, AssocProf and Prof:
- If rank = AssocProf, the column AssocProf is coded 1 and Prof is coded 0.
- If rank = Prof, the column AssocProf is coded 0 and Prof is coded 1.
- If rank = AsstProf, both columns, AssocProf and Prof, are coded 0.
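This coding scheme is described by a contrast matrix (R's default treatment contrasts); model.matrix() applies it to every row to build the design matrix.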
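The recoding can be sketched on a small hand-built factor, so the example runs in base R without the car package. The six values mirror the ranks of the first six rows of Salaries; the name rank_demo is made up for illustration, and the sketch assumes R's default contrast settings.

```r
# Hand-built factor with the same three levels as Salaries$rank
# (rank_demo is an illustrative name, not part of the original analysis)
rank_demo <- factor(
  c("Prof", "Prof", "AsstProf", "Prof", "Prof", "AssocProf"),
  levels = c("AsstProf", "AssocProf", "Prof")
)

# Contrast matrix: one row per level, one column per dummy variable.
# With default treatment contrasts, the first level (AsstProf) is all zeros.
cm <- contrasts(rank_demo)
print(cm)

# Design matrix: the contrast coding applied to each observation,
# plus an intercept column of 1s
dm <- model.matrix(~ rank_demo)
print(dm)
```

The contrast matrix has 3 rows and 2 columns (n levels, n-1 dummies), while the design matrix has one row per observation.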
res<-model.matrix(~rank,data=Salaries)
head(res)
##   (Intercept) rankAssocProf rankProf
## 1           1             0        1
## 2           1             0        1
## 3           1             0        0
## 4           1             0        1
## 5           1             0        1
## 6           1             1        0
model_anova<-lm(salary~rank, data = Salaries)
summary(model_anova)
##
## Call:
## lm(formula = salary ~ rank, data = Salaries)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -68972 -16376  -1580  11755 104773
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)      80776       2887  27.976  < 2e-16 ***
## rankAssocProf    13100       4131   3.171  0.00164 **
## rankProf         45996       3230  14.238  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 23630 on 394 degrees of freedom
## Multiple R-squared: 0.3943, Adjusted R-squared: 0.3912
## F-statistic: 128.2 on 2 and 394 DF, p-value: < 2.2e-16
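There are 2 dummy variables; the reference level (both dummies coded 0) is AsstProf.
Each Estimate is not a slope but a difference in group means relative to AsstProf: the intercept (80776) is the mean AsstProf salary, AssocProf averages 13100 more, and Prof averages 45996 more.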
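A minimal sketch of why the estimates are differences in group means. The salaries below are made-up numbers for illustration only (not the Salaries data), and the names toy, fit and group_means are hypothetical.

```r
# Made-up salaries for illustration only (not the Salaries data)
toy <- data.frame(
  rank = factor(
    rep(c("AsstProf", "AssocProf", "Prof"), each = 3),
    levels = c("AsstProf", "AssocProf", "Prof")
  ),
  salary = c(78, 80, 82, 90, 95, 100, 120, 125, 130)
)

fit <- lm(salary ~ rank, data = toy)

# Mean salary within each rank
group_means <- tapply(toy$salary, toy$rank, mean)

# Intercept = mean of the reference level (AsstProf);
# each other coefficient = that level's mean minus the reference mean
print(coef(fit))
print(group_means - group_means["AsstProf"])
```

The second and third coefficients match the last two entries of the printed differences, and the intercept matches the AsstProf mean.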
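The p-values are valid for this model: each rank coefficient's p-value tests whether that group's mean salary differs from the AsstProf mean.
The confidence intervals are valid for this model too; confint() returns a 95% interval for each coefficient by default.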
confint(model_anova)
##                   2.5 %   97.5 %
## (Intercept)   75099.519 86452.45
## rankAssocProf  4979.187 21221.72
## rankProf      39644.872 52347.38
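For intuition, the intervals can be rebuilt by hand as estimate ± t critical value × standard error, using the residual degrees of freedom. The sketch below uses made-up data (the names toy, fit and manual_ci are hypothetical) and compares the hand calculation with confint().

```r
# Made-up salaries for illustration only (not the Salaries data)
toy <- data.frame(
  rank = factor(
    rep(c("AsstProf", "AssocProf", "Prof"), each = 3),
    levels = c("AsstProf", "AssocProf", "Prof")
  ),
  salary = c(78, 80, 82, 90, 95, 100, 120, 125, 130)
)
fit <- lm(salary ~ rank, data = toy)

# Estimates and standard errors from the coefficient table
est <- coef(summary(fit))[, "Estimate"]
se  <- coef(summary(fit))[, "Std. Error"]

# Critical t value for a 95% interval, using the residual degrees of freedom
t_crit <- qt(0.975, df = df.residual(fit))

# Hand-built interval: estimate minus/plus t_crit standard errors
manual_ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)
print(manual_ci)
print(confint(fit))  # should agree with manual_ci
```

Applied to the rounded values in the summary above (for example 13100 and 4131 for rankAssocProf, with 394 degrees of freedom), the same formula gives roughly the interval reported by confint(model_anova), up to rounding.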
ggResidpanel::resid_panel(model_anova)