Using the added variable plot method, assess whether the highest grade completed by the child’s mother (medu) is an omitted relevant variable in the following regression model:
read = magebirth + breastfed
(include all relevant plots in your answer)
NLSY <- read.csv("/Users/YanfeiQin/Desktop/Fall 2021/897-002 Applied Linear Modeling/Lab 5/NLSY-3.csv", header=TRUE, sep=",")
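# Keep complete cases only, so both regressions use the same observations
NLSY2 <- na.omit(NLSY[,c("read","magebirth","breastfed","medu")])
# Step 1: regress the outcome on the included predictors and keep the residuals
reg1 <- lm(read ~ magebirth + breastfed, data = NLSY2)
resid_read <- as.data.frame(reg1$residuals)
# Step 2: regress the candidate variable (medu) on the same predictors and keep the residuals
reg2 <- lm(medu ~ magebirth + breastfed, data = NLSY2)
resid_medu <- as.data.frame(reg2$residuals)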
plot(density(resid(reg1)))
plot(density(resid(reg2)))
qqnorm(resid(reg1))
qqline(resid(reg1))
qqnorm(resid(reg2))
qqline(resid(reg2))
Judging from the density plots and the normal quantile (Q-Q) plots of the two sets of residuals, the residuals from regression 1 appear approximately normally distributed, while the residuals from regression 2 do not.
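The visual reading of the density and Q-Q plots could be supplemented with a formal check. The following is only a sketch on simulated residuals, not the NLSY data: the vectors e_normal and e_skewed are made-up stand-ins, and shapiro.test (base R, accepts between 3 and 5000 observations) is one possible choice of test, not something the lab requires.

```r
# Simulated stand-ins for the two residual vectors (NOT the NLSY residuals)
set.seed(2)
e_normal <- rnorm(300)       # roughly normal, like a well-behaved residual
e_skewed <- rexp(300) - 1    # right-skewed, centered near zero

# Shapiro-Wilk test: a small p-value is evidence against normality
sw_normal <- shapiro.test(e_normal)
sw_skewed <- shapiro.test(e_skewed)

sw_normal$p.value
sw_skewed$p.value
```

With the real data, the same calls would be applied to resid(reg1) and resid(reg2), provided the sample has no more than 5000 complete cases.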
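plot(reg2$residuals, reg1$residuals, xlab = "Residuals of medu ~ magebirth + breastfed", ylab = "Residuals of read ~ magebirth + breastfed", main = "Added variable plot for medu")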
abline(lm(reg1$residuals ~ reg2$residuals), col = "red")
lines(lowess(reg1$residuals ~ reg2$residuals), col = "blue")
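Since neither the regression line nor the lowess line in the added variable plot is horizontal, read is still related to medu after magebirth and breastfed have been partialled out of both variables. There is therefore reason to suspect that the highest grade completed by the child’s mother (medu) is an omitted relevant variable in the regression model read = magebirth + breastfed.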
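The slope of the regression line in an added variable plot equals the coefficient the candidate variable would receive if it were added to the model (the Frisch-Waugh-Lovell result), so the visual impression can be checked numerically. The sketch below uses simulated data, not the NLSY file: the data frame sim and all coefficient values in it are invented for illustration, and only the variable names mirror the lab.

```r
# Simulated stand-in data; names mirror the lab, values are made up
set.seed(1)
n <- 500
magebirth <- rnorm(n, 25, 5)
breastfed <- rbinom(n, 1, 0.5)
medu <- 8 + 0.15 * magebirth + 0.5 * breastfed + rnorm(n, 0, 2)
read <- 20 + 0.3 * magebirth + 1.5 * breastfed + 0.8 * medu + rnorm(n, 0, 5)
sim <- data.frame(read, magebirth, breastfed, medu)

# Residuals from the two auxiliary regressions (the axes of the plot)
e_read <- resid(lm(read ~ magebirth + breastfed, data = sim))
e_medu <- resid(lm(medu ~ magebirth + breastfed, data = sim))

# Slope of the added variable plot's regression line
av_slope <- unname(coef(lm(e_read ~ e_medu))["e_medu"])

# Coefficient of medu when it is included in the full model
full <- lm(read ~ magebirth + breastfed + medu, data = sim)
full_slope <- unname(coef(full)["medu"])

# The two slopes agree up to numerical tolerance
all.equal(av_slope, full_slope)

# Estimate, standard error, t value and p-value for medu in the full model
summary(full)$coefficients["medu", ]
```

On the real data, the same comparison would use NLSY2 in place of sim. A medu coefficient that is clearly different from zero in the full model would support the conclusion drawn from the plot.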