The variable race has three levels:
1 = Hispanic
2 = Black
3 = Non-Black, Non-Hispanic
Create two dummy variables from the race variable: race_black (children who identify as Black are coded 1 and all other children are coded 0) and race_hispanic (children who identify as Hispanic are coded 1 and all other children are coded 0).
The following code creates the two dummy variables.
NLSY <- read.csv("/Users/YanfeiQin/Desktop/NLSY.csv", header=TRUE, sep=",")
NLSY$race_black <- ifelse(NLSY$race == 2, 1, 0)
NLSY$race_hispanic <- ifelse(NLSY$race == 1, 1, 0)
head(NLSY)
## CID race gender birthord magebirth medu bthwht breastfed math read hhnum
## 1 303 3 0 3 24 13 0 1 74 80 5
## 2 1601 3 1 1 31 13 0 1 105 135 5
## 3 1901 3 0 1 29 12 0 1 91 109 2
## 4 2001 3 0 1 30 16 0 0 112 135 3
## 5 2501 3 1 1 30 14 0 1 105 117 4
## 6 2701 3 0 1 27 16 0 1 102 120 4
## race_black race_hispanic
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
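The first six rows shown by head() all have race = 3, so they do not confirm that the coding worked for the other two groups. A cross-tabulation of race against each dummy is one way to check. The sketch below uses a small made-up data frame (toy, with hypothetical values) in place of the real NLSY file, and applies the same ifelse() rule as above.

```r
# Toy data standing in for NLSY$race (hypothetical values, not the real file)
toy <- data.frame(race = c(1, 2, 3, 3, 2, 1, 3))

# Same coding rule as in the text
toy$race_black    <- ifelse(toy$race == 2, 1, 0)
toy$race_hispanic <- ifelse(toy$race == 1, 1, 0)

# Cross-tabulate the original variable (rows) against each dummy (columns)
black_check    <- table(toy$race, toy$race_black)
hispanic_check <- table(toy$race, toy$race_hispanic)
print(black_check)
print(hispanic_check)

# The categories are mutually exclusive, so no row may be 1 on both dummies
stopifnot(all(toy$race_black + toy$race_hispanic <= 1))
```

On the real data the same check would be table(NLSY$race, NLSY$race_black) and table(NLSY$race, NLSY$race_hispanic): every child with race = 2 should fall in the 1 column of the first table, and every child with race = 1 in the 1 column of the second.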
Estimate a regression model where reading achievement scores (read) are regressed on race_black and race_hispanic.
The following code estimates the regression model.
# Regress read on the two dummies; children coded 0 on both (race == 3) form the reference group
lm <- lm(read ~ race_black + race_hispanic, data = NLSY)
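As a side note, R can build the same dummy variables itself when race is supplied as a factor. The sketch below is an illustration on simulated data, not part of the original solution: the data frame toy, the column race_f, and the simulated read scores are all assumptions made for the example. It sets level 3 (Non-Black, Non-Hispanic) as the reference with relevel() and checks that the factor model reproduces the coefficients of the manual-dummy model.

```r
set.seed(42)  # arbitrary seed so the toy example is reproducible

# Simulated stand-in for the NLSY data (hypothetical values)
toy <- data.frame(race = rep(1:3, each = 50))
toy$read <- 100 + rnorm(150, sd = 10)

# Manual dummies, coded as in the text
toy$race_black    <- ifelse(toy$race == 2, 1, 0)
toy$race_hispanic <- ifelse(toy$race == 1, 1, 0)
fit_dummy <- lm(read ~ race_black + race_hispanic, data = toy)

# Factor version with level "3" (Non-Black, Non-Hispanic) as the reference
toy$race_f <- relevel(factor(toy$race), ref = "3")
fit_factor <- lm(read ~ race_f, data = toy)

# race_f2 plays the role of race_black, race_f1 of race_hispanic
same_black <- isTRUE(all.equal(as.numeric(coef(fit_dummy)["race_black"]),
                               as.numeric(coef(fit_factor)["race_f2"])))
same_hispanic <- isTRUE(all.equal(as.numeric(coef(fit_dummy)["race_hispanic"]),
                                  as.numeric(coef(fit_factor)["race_f1"])))
print(c(same_black, same_hispanic))
```

The manual dummies make the reference group explicit, which is useful for interpretation. The factor version avoids adding extra columns and scales more easily to variables with many levels.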
Interpret the regression coefficients for race_black and race_hispanic.
First, print out the summary table of the regression model.
summary(lm)
##
## Call:
## lm(formula = read ~ race_black + race_hispanic, data = NLSY)
##
## Residuals:
## Min 1Q Median 3Q Max
## -42.768 -9.336 0.232 9.232 35.664
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 107.7678 0.3746 287.719 <2e-16 ***
## race_black -8.4314 0.6156 -13.695 <2e-16 ***
## race_hispanic -5.8581 0.6846 -8.557 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 14.39 on 2973 degrees of freedom
## Multiple R-squared: 0.06466, Adjusted R-squared: 0.06403
## F-statistic: 102.8 on 2 and 2973 DF, p-value: < 2.2e-16
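The estimates in the table can be turned into predicted group means by plugging in the dummy values for each group. The arithmetic below uses the coefficients rounded to two decimals; the layout assumes the amsmath package.

```latex
\widehat{\text{read}} = 107.77 - 8.43\,\text{race\_black} - 5.86\,\text{race\_hispanic}

\begin{aligned}
\text{Non-Black, Non-Hispanic } (0, 0):\quad & 107.77 \\
\text{Black } (1, 0):\quad & 107.77 - 8.43 = 99.34 \\
\text{Hispanic } (0, 1):\quad & 107.77 - 5.86 = 101.91
\end{aligned}
```

The intercept is the predicted score for children coded 0 on both dummies, and each slope is the gap between one group and that reference group.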
Children who identify as Black (race_black): The parameter estimate for race_black is -8.43 (rounded). Because both dummy variables are in the model, the comparison group is the children coded 0 on both, that is, Non-Black, Non-Hispanic children. We can interpret the estimate as: On average, children who identify as Black are predicted to have a PIAT Reading Recognition Standard Score (read) that is 8.43 points (rounded) lower than that of Non-Black, Non-Hispanic children.
Children who identify as Hispanic (race_hispanic): The parameter estimate for race_hispanic is -5.86 (rounded). We can interpret this as: On average, children who identify as Hispanic are predicted to have a PIAT Reading Recognition Standard Score (read) that is 5.86 points (rounded) lower than that of Non-Black, Non-Hispanic children.
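The reading of each coefficient as a difference from the Non-Black, Non-Hispanic group can be checked numerically. The following is a minimal sketch on simulated data: the data frame sim and its read values are made up for illustration and are not the NLSY scores. It compares the fitted coefficients with the raw group means.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Simulated stand-in for the NLSY data (hypothetical values)
sim <- data.frame(race = rep(1:3, times = c(60, 80, 160)))
sim$read <- 100 + rnorm(nrow(sim), sd = 10)
sim$race_black    <- ifelse(sim$race == 2, 1, 0)
sim$race_hispanic <- ifelse(sim$race == 1, 1, 0)

fit <- lm(read ~ race_black + race_hispanic, data = sim)

# Mean of read within each race level (names are "1", "2", "3")
means <- tapply(sim$read, sim$race, mean)

# Intercept = mean of the reference group (race == 3);
# each slope = that group's mean minus the reference-group mean
ok_intercept <- isTRUE(all.equal(as.numeric(coef(fit)["(Intercept)"]),
                                 as.numeric(means["3"])))
ok_black <- isTRUE(all.equal(as.numeric(coef(fit)["race_black"]),
                             as.numeric(means["2"] - means["3"])))
ok_hispanic <- isTRUE(all.equal(as.numeric(coef(fit)["race_hispanic"]),
                                as.numeric(means["1"] - means["3"])))
print(c(ok_intercept, ok_black, ok_hispanic))
```

With only the two race dummies as predictors, the model simply reproduces the three group means, so neither coefficient compares a group with "everyone else".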