Answer the following questions using the NLSY dataset and the NLSY Codebook.

For each question, provide your code and the answer.


Q1:

The variable race has three levels: 1 = Hispanic, 2 = Black, 3 = Non-Black, Non-Hispanic. Create two dummy variables from the race variable: race_black (where children who identify as Black are coded as 1 and all other children are coded as 0) and race_hispanic (where children who identify as Hispanic are coded as 1 and all other children are coded as 0).

Answer - Q1:

The following code creates the two dummy variables.

NLSY <- read.csv("/Users/YanfeiQin/Desktop/NLSY.csv", header=TRUE, sep=",")
NLSY$race_black <- ifelse(NLSY$race == 2, 1, 0)
NLSY$race_hispanic <- ifelse(NLSY$race == 1, 1, 0)
head(NLSY)
##    CID race gender birthord magebirth medu bthwht breastfed math read hhnum
## 1  303    3      0        3        24   13      0         1   74   80     5
## 2 1601    3      1        1        31   13      0         1  105  135     5
## 3 1901    3      0        1        29   12      0         1   91  109     2
## 4 2001    3      0        1        30   16      0         0  112  135     3
## 5 2501    3      1        1        30   14      0         1  105  117     4
## 6 2701    3      0        1        27   16      0         1  102  120     4
##   race_black race_hispanic
## 1          0             0
## 2          0             0
## 3          0             0
## 4          0             0
## 5          0             0
## 6          0             0
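The first six rows above all have race = 3, so they only show the 0/0 pattern. As a quick check that the coding rule behaves as intended for every level, the sketch below applies the same ifelse() logic to a small made-up data frame (toy is hypothetical and is not the NLSY data) and cross-tabulates each dummy against race.

```r
# Toy data standing in for NLSY$race (hypothetical values, not the real data)
toy <- data.frame(race = c(1, 2, 3, 2, 1, 3, 3))

# Same coding rule as above: 1 = Hispanic, 2 = Black, 3 = Non-Black, Non-Hispanic
toy$race_black    <- ifelse(toy$race == 2, 1, 0)
toy$race_hispanic <- ifelse(toy$race == 1, 1, 0)

# Each race level should fall in exactly one column of each table
print(table(race = toy$race, race_black = toy$race_black))
print(table(race = toy$race, race_hispanic = toy$race_hispanic))

# The categories are mutually exclusive, so no row may be 1 on both dummies
stopifnot(all(toy$race_black + toy$race_hispanic <= 1))
```

Running the same two table() calls on NLSY instead of toy would give the corresponding check on the real data.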


Q2:

Estimate a regression model where reading achievement scores (read) are regressed on race_black and race_hispanic.

Answer - Q2:

The following code is used to estimate the required regression model.

# Regress read on the two race dummies; Non-Black, Non-Hispanic (race = 3) is the reference group
fit <- lm(read ~ race_black + race_hispanic, data = NLSY)


Q3:

Interpret the regression coefficients for race_black and race_hispanic.

Answer - Q3:

First, print the summary table of the regression model.

summary(fit)
## 
## Call:
## lm(formula = read ~ race_black + race_hispanic, data = NLSY)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -42.768  -9.336   0.232   9.232  35.664 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   107.7678     0.3746 287.719   <2e-16 ***
## race_black     -8.4314     0.6156 -13.695   <2e-16 ***
## race_hispanic  -5.8581     0.6846  -8.557   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.39 on 2973 degrees of freedom
## Multiple R-squared:  0.06466,    Adjusted R-squared:  0.06403 
## F-statistic: 102.8 on 2 and 2973 DF,  p-value: < 2.2e-16
Interpretation

Children who identify as Black (race_black): The parameter estimate for race_black is -8.43 (rounded). Because race_hispanic is also in the model, the comparison group is the omitted category, Non-Black, Non-Hispanic children (race = 3), whose predicted score is the intercept, 107.77 (rounded). We can interpret the estimate as: On average, children who identify as Black are predicted to have PIAT Reading Recognition Standard Scores (read) that are 8.43 points (rounded) lower than those of Non-Black, Non-Hispanic children. The difference is statistically significant (p < .001).

Children who identify as Hispanic (race_hispanic): The parameter estimate for race_hispanic is -5.86 (rounded). We can interpret this as: On average, children who identify as Hispanic are predicted to have PIAT Reading Recognition Standard Scores (read) that are 5.86 points (rounded) lower than those of Non-Black, Non-Hispanic children. The difference is statistically significant (p < .001).
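Since the two dummies are the only predictors, the predicted score for each group is the intercept plus that group's coefficient. The sketch below works through that arithmetic using the estimates copied from the summary table; the variable names (b0, b_black, b_hispanic, pred) are chosen here for illustration only.

```r
# Estimates copied from the summary table above
b0         <- 107.7678  # intercept: predicted read for Non-Black, Non-Hispanic children
b_black    <- -8.4314   # difference for Black children relative to the reference group
b_hispanic <- -5.8581   # difference for Hispanic children relative to the reference group

# Predicted reading score for each group
pred <- c(
  "Non-Black, Non-Hispanic" = b0,
  "Black"                   = b0 + b_black,
  "Hispanic"                = b0 + b_hispanic
)
print(round(pred, 2))
# Non-Black, Non-Hispanic: 107.77, Black: 99.34, Hispanic: 101.91
```

With the data loaded, tapply(NLSY$read, NLSY$race, mean, na.rm = TRUE) should give the same three values, assuming the regression used the same rows.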