Dummy Variable Trap with Example

Zahid Asghar

11/25/2020

QUALITATIVE VARIABLES

Qualitative variables are nominal-scale variables: their categories have no inherent numerical values.
We can “quantify” them by creating so-called dummy variables, which take the values 0 and 1:
- 0 indicates the absence of an attribute
- 1 indicates the presence of the attribute
For example, a variable denoting gender can be quantified as female = 1 and male = 0 or vice versa.

Dummy variables are also called indicator variables, categorical variables, or qualitative variables. Examples of qualitative variables include gender, race, color, religion, nationality, geographical region, party affiliation, and political upheavals.
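
The 0/1 coding described above can be sketched in a few lines of R. The `gender` vector here is hypothetical illustration data, not part of the teaching-ratings data set.

```r
# Hypothetical data: a small character vector of gender labels
gender <- c("female", "male", "male", "female", "male")

# 1 marks the presence of the attribute "female", 0 its absence
female <- as.integer(gender == "female")

# The reverse coding ("or vice versa") is simply 1 - female
male <- 1L - female

data.frame(gender, female, male)
```

Either coding carries the same information; the choice only determines which group becomes the comparison group.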

DUMMY VARIABLE TRAP

If an intercept is included in the model and if a qualitative variable has m categories, then introduce only (m – 1) dummy variables.
- For example, gender has only two categories; hence we introduce only one dummy variable for gender.
- This is because if a female gets a value of 1, ipso facto a male gets a value of zero.
If we consider self-reported health as a choice among excellent, good, and poor, we can have at most two dummy variables to represent the three categories.
If we do not follow this rule, we fall into what is called the dummy variable trap: the m dummies sum to one for every observation, which duplicates the intercept column and produces perfect collinearity.
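
One way to see the trap is to compare the rank of the design matrix with its number of columns. The sketch below uses a hypothetical three-category health variable; `X_ok` and `X_trap` are illustrative names.

```r
# Hypothetical self-reported health variable with m = 3 categories
health <- factor(c("excellent", "good", "poor", "good", "excellent", "poor"))

# Intercept plus (m - 1) = 2 dummies: full column rank
X_ok <- model.matrix(~ health)

# All m = 3 dummies, one per category
dummies_all <- model.matrix(~ health + 0)

# Intercept plus all m dummies: the dummies sum to the intercept column
X_trap <- cbind(intercept = 1, dummies_all)

ncol(X_ok);   qr(X_ok)$rank     # 3 columns, rank 3
ncol(X_trap); qr(X_trap)$rank   # 4 columns, rank 3: perfect collinearity
```

When the rank is smaller than the number of columns, the coefficients cannot all be estimated separately.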

REFERENCE CATEGORY

The category that gets the value of 0 is called the reference, benchmark, or comparison category.
All comparisons are made in relation to the reference category.
If there are several dummy variables, you must keep track of the reference category; otherwise, it will be difficult to interpret the results.
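
In R, the reference category of a factor is its first level, and `relevel()` changes it. A minimal sketch with a hypothetical health variable:

```r
# Hypothetical three-category variable
health <- factor(c("excellent", "good", "poor", "good", "excellent", "poor"))

# By default the first level (alphabetical order) is the reference category
ref_default <- levels(health)[1]
cols_default <- colnames(model.matrix(~ health))

# relevel() makes "poor" the reference category instead
health_poor_ref <- relevel(health, ref = "poor")
cols_releveled <- colnames(model.matrix(~ health_poor_ref))

ref_default      # "excellent"
cols_default     # dummies for "good" and "poor" only
cols_releveled   # dummies for "excellent" and "good" only
```

The category with no dummy column is the one against which every coefficient is compared.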

POINTS TO KEEP IN MIND

If there is an intercept in the regression model, the number of dummy variables must be one less than the number of classifications of each qualitative variable.
If you drop the (common) intercept from the model, you can have as many dummy variables as the number of categories of the qualitative variable.
The coefficient of a dummy variable must always be interpreted in relation to the reference category.
Dummy variables can interact with quantitative regressors as well as with qualitative regressors. If a model has several qualitative variables with several categories, introduction of dummies for all the combinations can consume a large number of degrees of freedom.
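
The counting rules above can be checked by building design matrices with `model.matrix()`. The data frame `d` below is a hypothetical design with one two-category and one three-category variable.

```r
# Hypothetical design: two qualitative variables with 2 and 3 categories
d <- expand.grid(gender = c("female", "male"),
                 region = c("north", "south", "west"),
                 stringsAsFactors = TRUE)

# With an intercept: 1 + (2 - 1) + (3 - 1) = 4 columns
k_intercept <- ncol(model.matrix(~ gender + region, data = d))

# Without an intercept, the first factor keeps all of its categories:
# 2 + (3 - 1) = 4 columns
k_no_intercept <- ncol(model.matrix(~ 0 + gender + region, data = d))

# Adding interactions costs (2 - 1) * (3 - 1) = 2 more columns: 6 in total
k_interaction <- ncol(model.matrix(~ gender * region, data = d))

c(k_intercept, k_no_intercept, k_interaction)
```

With more variables and more categories, the interaction count grows multiplicatively, which is where the degrees of freedom go.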

INTERPRETATION OF DUMMY VARIABLES

The coefficients on dummy variables are often called differential intercept coefficients, for they show the difference in the intercept of the category that gets the value of 1 as compared to the reference category.
The common intercept value refers to all those categories that take a value of 0.
If we have: Yi = B1 + B2 Fi where Y = wage and F = female dummy variable
Then, on average, females earn a wage of (B1 + B2) and males earn a wage of B1. (Note that B2 can be negative.)
Thus females earn a wage that is B2 higher than males.
Since wages tend to be skewed to the right, we might instead model the wage function as: lnYi = B1 + B2 Fi
In this case, females earn (exp(B2) – 1)*100% more than males on average.
On average, male wages are equal to exp(B1), and female wages are equal to exp(B1+B2).
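
The percentage interpretation in the log-wage model follows from the ratio of the two wage levels, exp(B1 + B2) / exp(B1) = exp(B2). A small numerical sketch, where the values of `B1` and `B2` are hypothetical and not estimated from any data:

```r
# Hypothetical coefficients for ln(Y) = B1 + B2 * F
B1 <- 2.5
B2 <- -0.20

male_wage   <- exp(B1)        # F = 0
female_wage <- exp(B1 + B2)   # F = 1

# Exact percentage difference of females relative to males
pct_diff <- (exp(B2) - 1) * 100

# The same number computed directly from the two wage levels
pct_direct <- (female_wage / male_wage - 1) * 100

pct_diff     # about -18.1
pct_direct   # identical up to rounding
```

Reading 100 * B2 = -20 directly as a percentage is only an approximation, and it gets worse as B2 moves away from zero.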

Data on Teachers Evaluation and Beauty

library(readxl)  # read_excel()
library(dplyr)   # sample_n(), mutate(), %>%
TeachingRatings <- read_excel("C:/Users/hp/Dropbox/Applied Econometrics SBP/Stock and Watson Data sets/TeachingRatings.xls")
sample_n(TeachingRatings, size=5)

age	female	beauty	course_eval	intro
49	0	1.05	4	0
42	0	0.217	3.8	0
38	1	-1.02	3.5	1
35	0	0.275	4.2	0
39	0	0.577	4.2	0

TeachingRatings1<-TeachingRatings %>% mutate(male=(1-female), advanced=(1-intro))
set.seed(12345)
sample_data<-sample_n(TeachingRatings1, size=20)

sample_data

minority	age	female	beauty	course_eval	intro	nnenglish	male	advanced
0	47	0	0.541	4.7	0	0	1	1
0	33	1	0.724	4.4	0	0	0	1
0	64	0	-0.111	4.4	0	0	1	1
1	52	0	0.212	3.5	0	1	1	1
1	52	0	0.212	3.9	0	1	1	1
0	42	0	0.217	3.7	0	0	1	1
0	57	0	0.632	4.2	0	0	1	1
0	32	0	1.23	4.3	1	0	1	0
0	47	1	0.339	3.8	1	0	0	0
0	52	1	-1.09	4.4	0	0	0	1
1	52	0	0.212	4.6	0	1	1	1
1	47	0	-1.05	3.4	0	0	1	1
0	42	0	1.77	4.9	1	0	1	0
0	60	1	-0.0567	4	0	1	0	1
0	62	0	-0.728	4	0	0	1	1
0	40	1	-0.678	4.6	0	0	0	1
0	52	1	-1.09	3.7	0	0	0	1
0	60	0	-0.395	4.5	0	0	1	1
0	57	0	-0.767	4.7	1	0	1	0
0	37	0	0.933	3.5	0	0	1	1

Dummy Variable

lm_dummy<-lm(course_eval~beauty+female,data = sample_data)

# female + male = 1 duplicates the intercept, so lm() cannot estimate male and reports NA for it
lm_saturated<-lm(course_eval~beauty+female+male,data = sample_data)

lm_nointercept<-lm(course_eval~beauty+female+male+0,data = sample_data)


library(huxtable)  # huxreg(), set_caption()
huxreg("One Less Dummy"=lm_dummy,"Constant+Dummies"=lm_saturated, "full dummies without constant"=lm_nointercept) %>% set_caption("Teaching Evaluation as function of Beauty")

Teaching Evaluation as function of Beauty
	One Less Dummy	Constant+Dummies	full dummies without constant
(Intercept)	4.140 ***	4.140 ***
	(0.130)	(0.130)
beauty	0.117	0.117	0.117
	(0.142)	(0.142)	(0.142)
female	0.046	0.046	4.186 ***
	(0.242)	(0.242)	(0.198)
male			4.140 ***
			(0.130)
N	20	20	20
R2	0.039	0.039	0.989
logLik	-11.760	-11.760	-11.760
AIC	31.520	31.520	31.520
*** p < 0.001; ** p < 0.01; * p < 0.05.
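
The table suggests that the first and third columns are the same regression written two ways: the no-intercept coefficient on male matches the intercept (4.140), and the one on female matches intercept plus differential (4.140 + 0.046 = 4.186). The sketch below checks this on simulated data. The data frame `sim` and its parameters are hypothetical and are not the TeachingRatings sample.

```r
set.seed(1)  # hypothetical simulated data
n <- 50
sim <- data.frame(beauty = rnorm(n), female = rbinom(n, 1, 0.5))
sim$male <- 1 - sim$female
sim$course_eval <- 4 + 0.1 * sim$beauty + 0.05 * sim$female + rnorm(n, sd = 0.5)

m_dummy <- lm(course_eval ~ beauty + female, data = sim)
m_noint <- lm(course_eval ~ beauty + female + male + 0, data = sim)

# Same fitted values: the two parameterizations describe the same fit
same_fit <- all.equal(fitted(m_dummy), fitted(m_noint))

# No-intercept coefficients are the group-specific intercepts
gap_male   <- coef(m_noint)[["male"]] - coef(m_dummy)[["(Intercept)"]]
gap_female <- coef(m_noint)[["female"]] -
  (coef(m_dummy)[["(Intercept)"]] + coef(m_dummy)[["female"]])

# Without an intercept, summary.lm() measures R2 around zero rather than the mean
r2_dummy <- summary(m_dummy)$r.squared
r2_noint <- summary(m_noint)$r.squared

same_fit
c(gap_male, gap_female)   # both numerically zero
c(r2_dummy, r2_noint)
```

This is why logLik and AIC agree across the columns of the table while R2 does not: the larger R2 in the no-intercept column reflects a different baseline, not a better fit.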