Problem Set 4: Quantifying Effects Using Regression Models

Question 1

Answer

Interpret the intercept
The intercept of 37 is the expected literacy score for the control group, i.e., children who were not exposed to the educational shows.

Interpret the slope coefficient
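The slope of 5.7 is the expected difference in literacy score between the treatment group and the control group. Children who watch the educational shows are expected to score about 5.7 points higher than children in the control group (37 + 5.7 = 42.7 points).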
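To make the dummy-variable reading concrete, here is a minimal sketch that plugs the reported intercept (37) and slope (5.7) into the fitted equation for each group. It assumes the treatment indicator is coded 0 for the control group and 1 for children who watched the shows. The variable names are illustrative only.

```r
# Reported coefficients from the Question 1 model
b0 <- 37   # intercept: expected literacy score when treatment = 0 (control)
b1 <- 5.7  # slope: expected difference, treatment minus control

# Assumed coding: 0 = control, 1 = watched the educational shows
treatment <- c(control = 0, treated = 1)

# Fitted equation: expected score = b0 + b1 * treatment
expected_score <- b0 + b1 * treatment
print(expected_score)

# The gap between the two groups recovers the slope
expected_score[["treated"]] - expected_score[["control"]]
```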

Interpret the R^2 value
The \(R^2\) of .34 indicates that about 34% of the variation in literacy scores is explained by whether children watched the educational shows.
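"Variation explained" has a precise meaning: \(R^2 = 1 - SS_{res}/SS_{tot}\). The sketch below computes that ratio by hand and compares it with the value R reports. The problem-set data are not available here, so it uses R's built-in mtcars data as a stand-in. The binary predictor am (0 = automatic, 1 = manual) plays the role of the treatment indicator.

```r
# Stand-in data: built-in mtcars, with binary am playing the role of treatment
fit <- lm(mpg ~ am, data = mtcars)
y <- mtcars$mpg

# Residual and total sums of squares
ss_res <- sum(residuals(fit)^2)
ss_tot <- sum((y - mean(y))^2)

# Proportion of the variance in y explained by the model
r2_manual <- 1 - ss_res / ss_tot

r2_manual
summary(fit)$r.squared  # same value, as reported by summary()
```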

What can you conclude from this model in real-world terms?
This model suggests that exposure to educational TV has a positive and fairly sizeable effect on children’s literacy scores, raising expected scores by about 5.7 points.

In two or three sentences, discuss one concern you have with this analysis
A major concern with this analysis is the lack of baseline (pre-treatment) literacy scores for each group. Without them, we cannot rule out that the groups already differed before the experiment. For example, children in the control group could already have had high literacy scores, which would distort the estimated effect. A better design would record before-and-after scores so that changes after watching the literacy videos can be compared between groups.

Question 2

Answer

## The regression function for whether women have children or not and their wage is:

## y ~ 17.61 - 3.14 * child.bin

## The mean difference in expected hourly wage between women who have children and women who don’t have children is 3.137.

## The expected hourly wage of women with and without children is as follows:

## data$child.bin: Children
## [1] 14.4697
## ------------------------------------------------------------ 
## data$child.bin: No Children
## [1] 17.60668
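The regression and the group means agree for a reason. With a single 0/1 predictor, the intercept is the mean of the group coded 0 and the slope is the difference between the two group means. Here, 17.61 is the mean wage for women without children, and 14.47 - 17.61 = -3.14. The sketch below checks this identity on R's built-in mtcars data (am coded 0/1), which serves as a stand-in for the wage data.

```r
# Stand-in data: built-in mtcars, where am is coded 0 (automatic) / 1 (manual)
fit <- lm(mpg ~ am, data = mtcars)

# Group means of the outcome for am = 0 and am = 1
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

intercept <- unname(coef(fit)["(Intercept)"])
slope <- unname(coef(fit)["am"])

# Intercept = mean of the group coded 0; slope = difference in group means
all.equal(intercept, unname(group_means["0"]))
all.equal(slope, unname(group_means["1"] - group_means["0"]))
```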

Code

model1 = lm(wage.dollars ~ child.bin, data = data)
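cat("The regression function for whether women have children or not and their wage is: ")
# Print the fitted equation; "%+.2f" writes an explicit sign before each slope,
# so the string stays valid whether a coefficient is positive or negative
as.formula(
  paste0("y ~ ", round(coefficients(model1)[1], 2),
    paste(sprintf(" %+.2f * %s",
                  coefficients(model1)[-1],
                  names(coefficients(model1)[-1])),
          collapse = "")
  )
)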

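# meanDiff() is not part of base R; it comes from an add-on package assumed to be loaded earlier (not shown)
mean.diff = meanDiff(data$wage.dollars ~ data$child.bin)

cat("The mean difference in expected hourly wage between women who have children and women who don’t have children is", paste0(round(mean.diff[["meanDiff"]], digits = 3), "."))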

# Relabel the 0/1 indicator (0 = no children, 1 = children) for readable group output
data$child.bin = factor(data$child.bin, levels = c(1, 0), labels = c("Children", "No Children"))

mean.child = by(data = data$wage.dollars, INDICES = data$child.bin, FUN = mean)

cat("The expected hourly wage of women with and without children is as follows:")

mean.child
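The equation-printing code is repeated almost verbatim for model1 here and for model2 in Question 3. A small reusable helper could format any fitted lm() model as a readable equation string instead. In the sketch below, format_equation() is a hypothetical name, and the checks use the built-in mtcars data rather than the wage data.

```r
# Hypothetical helper: format a fitted lm() model as "y ~ b0 +/- b1 * x1 ..."
format_equation <- function(model, response = "y") {
  b <- coef(model)
  slopes <- b[-1]
  # Write each term with an explicit sign so negative slopes read "- 3.14 * x"
  terms <- sprintf("%s %.2f * %s",
                   ifelse(slopes < 0, "-", "+"),
                   abs(slopes),
                   names(slopes))
  paste(response, "~", sprintf("%.2f", b[[1]]), paste(terms, collapse = " "))
}

# Usage on built-in data (stand-in for model1 / model2)
format_equation(lm(mpg ~ am, data = mtcars))
format_equation(lm(mpg ~ wt + hp, data = mtcars))
```

Called as format_equation(model1), it should print the Question 2 equation directly, with no need to round-trip the string through as.formula().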

Question 3

Answer

Education level may be a confounding variable because it is related to both the predictor and the outcome. Higher education is often associated with higher-paying jobs, and women who do not have children may have had more time to pursue advanced degrees. If so, part of the wage gap between women with and without children could reflect differences in education rather than the effect of children themselves.
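When a confounder is added as a control, a predictor's coefficient can change. The sketch below illustrates this on R's built-in mtcars data, not the wage data. Car weight (wt) is related both to transmission type (am) and to fuel economy (mpg), so the am coefficient shifts substantially once wt is controlled for. The children coefficient could shift in a similar way once education is added.

```r
# Stand-in data: built-in mtcars
naive <- lm(mpg ~ am, data = mtcars)           # transmission only
adjusted <- lm(mpg ~ am + wt, data = mtcars)   # controlling for weight

# Compare the transmission coefficient with and without the control
am_effect <- c(naive = unname(coef(naive)["am"]),
               adjusted = unname(coef(adjusted)["am"]))
round(am_effect, 2)

# The control is itself related to the predictor: manual cars weigh less on average
tapply(mtcars$wt, mtcars$am, mean)
```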

## The regression function for wage and number of children with a control variable of education level is:

## y ~ 7.17 - 0.55 * numChildren + 3.4 * educ.level

Interpret the coefficient of number of children
Controlling for education level, a woman’s expected hourly wage decreases by about $0.55 (55 cents) for each additional child she has.

Interpret the coefficient of education level
Controlling for the number of children, a woman’s expected hourly wage increases by about $3.40 for each additional level of education.
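"Controlling for" means comparing two women who differ in one predictor while the other is held fixed. Here is a minimal sketch that plugs the reported Question 3 coefficients (7.17, -0.55, 3.40) into the fitted equation. It assumes educ.level is coded numerically, as its single coefficient suggests. expected_wage() is a hypothetical helper, and the education level of 2 is an arbitrary value chosen for illustration.

```r
# Reported coefficients from model2 (Question 3)
b <- c(intercept = 7.17, numChildren = -0.55, educ.level = 3.40)

# Hypothetical helper: expected hourly wage (dollars) from the fitted equation
expected_wage <- function(numChildren, educ.level) {
  unname(b["intercept"] + b["numChildren"] * numChildren + b["educ.level"] * educ.level)
}

# One more child, same education level: the difference equals the numChildren slope
child_gap <- expected_wage(numChildren = 1, educ.level = 2) -
  expected_wage(numChildren = 0, educ.level = 2)

# One more education level, same number of children: equals the educ.level slope
educ_gap <- expected_wage(numChildren = 0, educ.level = 3) -
  expected_wage(numChildren = 0, educ.level = 2)

round(c(child_gap = child_gap, educ_gap = educ_gap), 2)
```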

Interpret the R^2
The \(R^2\) of .16 means that the number of children a woman has and her level of education together explain about 16% of the variance in hourly wage.

Identify another useful control variable
Marital status could be a useful control variable to add to this model. A single woman with children is less likely to have the flexibility of a woman with spousal support or with no children at all. That could lead her to forgo higher education, work fewer hours, or both.
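You don't need to refit from scratch to add a control to an existing model; update() can append a term to the formula. For the wage data this would look like update(model2, . ~ . + marital.status). Note that marital.status is a hypothetical column name that is not confirmed to exist in the data. The runnable sketch below shows the same pattern on R's built-in mtcars data, adding hp as a control.

```r
# Stand-in data: built-in mtcars
base_model <- lm(mpg ~ am + wt, data = mtcars)

# Append a control variable to the existing formula
extended_model <- update(base_model, . ~ . + hp)
formula(extended_model)

# Compare the coefficient of interest and R-squared before and after adding the control
coef(base_model)["am"]
coef(extended_model)["am"]
c(base = summary(base_model)$r.squared,
  extended = summary(extended_model)$r.squared)
```

In an ordinary least squares model, \(R^2\) can only stay the same or rise when a predictor is added. A control should therefore be justified on substantive grounds, like the marital status argument, not by the \(R^2\) gain alone.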

Conclusion of the model
The analysis suggests that having children affects the hourly wage of women. However, the effect is small once education is controlled for (about $0.55 per child). With an \(R^2\) of only .16, most of the variation in wages is also left unexplained. Other variables, including potential confounders not captured by the current controls, should be added to the model to better understand the effect of children on women’s wages.

Code

model2 = lm(wage.dollars ~ numChildren + educ.level, data = data)
cat("The regression function for wage and number of children with a control variable of education level is: ")
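# Print the fitted equation; "%+.2f" writes an explicit sign before each slope,
# so the string stays valid whether a coefficient is positive or negative
as.formula(
  paste0("y ~ ", round(coefficients(model2)[1], 2),
    paste(sprintf(" %+.2f * %s",
                  coefficients(model2)[-1],
                  names(coefficients(model2)[-1])),
          collapse = "")
  )
)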

r.2 = round(summary(model2)$r.squared, digits = 2)


Rachel Burgess

July 2022