You are still working as a research consultant for an evaluation study examining the effect of the Women, Infant and Children (WIC) Nutrition Program participation during pregnancy on the raw scores of child mathematics achievement in 1997. Using the Child Development Supplement to the Panel Study of Income Dynamics (economic.good), the evaluation team has asked you to examine if the effect of WIC program participation during pregnancy is moderated by (1) family income, (2) race, and (3) the current age of the child. Although assumptions still need to be evaluated it is not necessary to report the evaluation of the assumptions as was asked for in assignment 2. Simply state what changes were made to the data and model in order to account for assumption violations.
\[ mathraw97 = \beta_0 + \beta_1WICpreg + \beta_2faminc97 + \beta_3AGE97 + \beta_4race + \beta_5homeC + \beta_6bthwht + \epsilon \]
y = mathraw97
x = WICpreg, faminc97, AGE97, CHRACE, homeC
library(Hmisc)
library(interplot)
library(ggplot2)
library(sjPlot)
library(stats)good <- read.csv("good.csv")
goodMini <- na.omit(good[,c("CHID","mathraw97","AGE97","faminc97","HOME97","bthwht","CHRACE","WICpreg")])
str(goodMini)## 'data.frame': 2042 obs. of 8 variables:
## $ CHID : int 4041 4180 5032 6034 7041 7044 7046 7039 7040 7045 ...
## $ mathraw97: int 8 58 75 70 11 6 6 40 33 11 ...
## $ AGE97 : int 4 12 12 13 6 4 3 9 7 4 ...
## $ faminc97 : num 8420 5828 89109 115002 24635 ...
## $ HOME97 : num 15.1 19.1 21 23.1 13.5 16 17.7 18.6 16.6 16 ...
## $ bthwht : int 1 1 1 1 1 1 0 0 0 1 ...
## $ CHRACE : int 1 1 1 1 1 1 1 1 1 1 ...
## $ WICpreg : int 1 0 1 0 1 1 1 1 1 1 ...
## - attr(*, "na.action")= 'omit' Named int 1 2 4 5 6 7 10 15 23 26 ...
## ..- attr(*, "names")= chr "1" "2" "4" "5" ...
goodMini$ageC <- goodMini$AGE97 - 8
goodMini$ageC2 <- goodMini$ageC^2
goodMini$logfaminc <- ifelse(goodMini$faminc97 <= 1, 1, goodMini$faminc97)
goodMini$logfaminc <- log(goodMini$logfaminc)
goodMini$incomeC <- goodMini$logfaminc - mean(goodMini$logfaminc)
goodMini$homeC <- goodMini$HOME97- mean(goodMini$HOME97)
goodMini$race <- factor(ifelse(goodMini$CHRACE == 1, .5,
ifelse(goodMini$CHRACE == 2, -.5, NA)))
summary(goodMini$race)## -0.5 0.5 NA's
## 878 970 194
# We can choose the base category as we want. here `black` is the base category:
goodMini$race<-factor(goodMini$race, levels=c(-.5, .5),
labels=c("black","white"))
table(goodMini$race)##
## black white
## 878 970
# We can choose the base category as we want. Here, we set up `0` i.e., "non-participant" as the base category:
goodMini$WICpreg<-factor(goodMini$WICpreg, levels=c(0,1),
labels=c("non-participant","participant"))
table(goodMini$WICpreg)##
## non-participant participant
## 1191 851
# Create a Clean Dataset
goodMini <- na.omit(goodMini)
summary(goodMini)## CHID mathraw97 AGE97 faminc97
## Min. : 4041 Min. : 0.00 Min. : 3.00 Min. : 0
## 1st Qu.:1284780 1st Qu.:15.00 1st Qu.: 5.00 1st Qu.: 20687
## Median :2521676 Median :37.00 Median : 7.00 Median : 43042
## Mean :3343332 Mean :35.98 Mean : 7.39 Mean : 53452
## 3rd Qu.:5678037 3rd Qu.:54.00 3rd Qu.:10.00 3rd Qu.: 69294
## Max. :6872031 Max. :98.00 Max. :13.00 Max. :784611
## HOME97 bthwht CHRACE WICpreg
## Min. : 7.90 Min. :0.0000 Min. :1.000 non-participant:1103
## 1st Qu.:18.07 1st Qu.:0.0000 1st Qu.:1.000 participant : 745
## Median :20.60 Median :0.0000 Median :1.000
## Mean :20.23 Mean :0.3674 Mean :1.475
## 3rd Qu.:22.50 3rd Qu.:1.0000 3rd Qu.:2.000
## Max. :27.00 Max. :1.0000 Max. :2.000
## ageC ageC2 logfaminc incomeC
## Min. :-5.0000 Min. : 0.000 Min. : 0.000 Min. :-10.42329
## 1st Qu.:-3.0000 1st Qu.: 1.000 1st Qu.: 9.937 1st Qu.: -0.48605
## Median :-1.0000 Median : 9.000 Median :10.670 Median : 0.24663
## Mean :-0.6098 Mean : 8.904 Mean :10.442 Mean : 0.01837
## 3rd Qu.: 2.0000 3rd Qu.:16.000 3rd Qu.:11.146 3rd Qu.: 0.72282
## Max. : 5.0000 Max. :25.000 Max. :13.573 Max. : 3.14966
## homeC race
## Min. :-12.28521 black:878
## 1st Qu.: -2.11021 white:970
## Median : 0.41479
## Mean : 0.04969
## 3rd Qu.: 2.31479
## Max. : 6.81479
# str(goodMini)
attach(goodMini)lm1 without Interaction Terms\[ mathraw97 = \beta_0 + \beta_1WICpreg + \beta_2incomeC + \beta_3ageC + \beta_4ageC2 + \beta_5race + \beta_6homeC + \beta_7bthwht + \epsilon \]
lm1<-lm(mathraw97~WICpreg + incomeC + ageC + ageC2 + race + homeC + bthwht, data=goodMini)
summary(lm1) # R2 = 0.87##
## Call:
## lm(formula = mathraw97 ~ WICpreg + incomeC + ageC + ageC2 + race +
## homeC + bthwht, data = goodMini)
##
## Residuals:
## Min 1Q Median 3Q Max
## -40.379 -4.719 0.069 4.831 28.133
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.77160 0.45658 89.298 < 2e-16 ***
## WICpregparticipant -1.30307 0.46291 -2.815 0.004931 **
## incomeC 0.55479 0.18265 3.038 0.002419 **
## ageC 6.83480 0.07348 93.015 < 2e-16 ***
## ageC2 -0.10179 0.02612 -3.896 0.000101 ***
## racewhite 2.49186 0.46117 5.403 7.39e-08 ***
## homeC 0.57915 0.07412 7.814 9.25e-15 ***
## bthwht -1.47936 0.41990 -3.523 0.000437 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.996 on 1840 degrees of freedom
## Multiple R-squared: 0.8723, Adjusted R-squared: 0.8718
## F-statistic: 1796 on 7 and 1840 DF, p-value: < 2.2e-16
par(mfrow=c(2,2))
plot(lm1)# Our critical Cooks Distance value is Cook's D > 4/N OR Cook's D > 4/1848
plot(cooks.distance(lm1), pch="*", cex=2, main="Influential Obs by Cook's distance Using Official Threshold (4/1848)")
abline(h = 4/1848, col="red") # add cutoff lineoutliers <- goodMini
outliers$cd <- cooks.distance(lm1)
large_cd<-subset(outliers, cd > (4/1964))
quantile(large_cd$cd, probs = seq(0, 1, 0.05)) # from 0 to 1, by 0.05 ## 0% 5% 10% 15% 20% 25%
## 0.002057488 0.002143064 0.002234165 0.002297108 0.002401858 0.002527369
## 30% 35% 40% 45% 50% 55%
## 0.002661663 0.002748061 0.002925112 0.003048601 0.003238979 0.003484519
## 60% 65% 70% 75% 80% 85%
## 0.003698902 0.003869497 0.004344361 0.004640185 0.005175765 0.005861974
## 90% 95% 100%
## 0.007532305 0.009118859 0.030975654
# 85% = 0.00586; 90% = 0.0075
cd <- cooks.distance(lm1)
plot(cooks.distance(lm1), pch="*", cex=2, main="Influential Obs by Cooks distance")
abline(h = 0.00586, col="red") # add cutoff line
text(x=1:length(cd), y=cd, labels=ifelse(cd > 0.0058,names(cd),""), col="red", pos = 4) # add labels # pos = 4 to position at the right
goodTrim <- subset(goodMini, cd < 0.00586) # n = 1848 -> 1831
summary(goodTrim)## CHID mathraw97 AGE97 faminc97
## Min. : 4041 Min. : 0.00 Min. : 3.000 Min. : 0
## 1st Qu.:1284780 1st Qu.:15.00 1st Qu.: 5.000 1st Qu.: 20792
## Median :2518104 Median :37.00 Median : 7.000 Median : 43042
## Mean :3342144 Mean :35.82 Mean : 7.358 Mean : 53613
## 3rd Qu.:5678037 3rd Qu.:54.00 3rd Qu.:10.000 3rd Qu.: 69382
## Max. :6872031 Max. :93.00 Max. :13.000 Max. :784611
## HOME97 bthwht CHRACE WICpreg
## Min. : 7.90 Min. :0.0000 Min. :1.000 non-participant:1095
## 1st Qu.:18.07 1st Qu.:0.0000 1st Qu.:1.000 participant : 737
## Median :20.60 Median :0.0000 Median :1.000
## Mean :20.24 Mean :0.3641 Mean :1.474
## 3rd Qu.:22.50 3rd Qu.:1.0000 3rd Qu.:2.000
## Max. :27.00 Max. :1.0000 Max. :2.000
## ageC ageC2 logfaminc incomeC
## Min. :-5.0000 Min. : 0.000 Min. : 0.000 Min. :-10.42329
## 1st Qu.:-3.0000 1st Qu.: 1.000 1st Qu.: 9.942 1st Qu.: -0.48095
## Median :-1.0000 Median : 9.000 Median :10.670 Median : 0.24663
## Mean :-0.6419 Mean : 8.871 Mean :10.461 Mean : 0.03732
## 3rd Qu.: 2.0000 3rd Qu.:16.000 3rd Qu.:11.147 3rd Qu.: 0.72410
## Max. : 5.0000 Max. :25.000 Max. :13.573 Max. : 3.14966
## homeC race
## Min. :-12.28521 black:869
## 1st Qu.: -2.11021 white:963
## Median : 0.41479
## Mean : 0.05234
## 3rd Qu.: 2.31479
## Max. : 6.81479
Using the goodTrim dateset, we create three separate models (lm2, lm3 and lm4) to investigate the effect of WICpreg on Math Scores moderated by (1) family income, (2) race, and (3) the current age respectively.
WICpreg*incomeC\[ mathraw97 = \beta_0 + \beta_1WICpreg + \beta_2incomeC + \beta_3WICpreg \cdot incomeC + \beta_4ageC + \beta_5ageC + \beta_6race + \beta_7homeC + \beta_8bthwht + \epsilon \]
summary(lm2<-lm(mathraw97 ~ WICpreg*incomeC +
ageC + ageC2 + race + homeC + bthwht, data=goodTrim)) # R2 = 0.88##
## Call:
## lm(formula = mathraw97 ~ WICpreg * incomeC + ageC + ageC2 + race +
## homeC + bthwht, data = goodTrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.8197 -4.7074 -0.0887 4.7501 23.9006
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.76579 0.43994 92.663 < 2e-16 ***
## WICpregparticipant -1.33672 0.44741 -2.988 0.002849 **
## incomeC 1.16015 0.26356 4.402 1.14e-05 ***
## ageC 6.86917 0.07105 96.679 < 2e-16 ***
## ageC2 -0.09106 0.02519 -3.615 0.000309 ***
## racewhite 2.05379 0.44468 4.619 4.13e-06 ***
## homeC 0.60320 0.07175 8.407 < 2e-16 ***
## bthwht -1.53993 0.40199 -3.831 0.000132 ***
## WICpregparticipant:incomeC -0.92190 0.35584 -2.591 0.009652 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.634 on 1823 degrees of freedom
## Multiple R-squared: 0.8825, Adjusted R-squared: 0.882
## F-statistic: 1712 on 8 and 1823 DF, p-value: < 2.2e-16
WICpreg*incomeC\[ mathraw97 = 40.76 -1.34WICpreg + 1.16incomeC - 0.92WICpreg \cdot incomeC + 6.87ageC + -0.09ageC + 2.05raceWhite + 0.60homeC -1.54bthwht + \epsilon \]
For IncomeC = 1 and WICpreg = 1:
\[ mathraw97 = 40.76 -1.34\cdot 1 + 1.16\cdot 1 - 0.92 \cdot 1 \cdot 1 + 6.87 \cdot 0 + -0.09 \cdot 0 + 2.05 \cdot 0 + 0.60 \cdot 0 -1.54 \cdot 0 + \epsilon \]
\[ mathraw97 = 40.76 -1.34\cdot 1 + 1.16\cdot 1 - 0.92 \cdot 1 \cdot 1 + \epsilon \]
For IncomeC = 0 and WICpreg = 1:
\[ mathraw97 = 40.76 -1.34\cdot 1 + 1.16\cdot 0 - 0.92 \cdot 1 \cdot 0 + \epsilon \]
\[ mathraw97 = 40.76 -1.34\cdot 1 + \epsilon \]
Hence for Y(incomeC=1|WICpreg=1) vs. Y(incomeC=0|WICpreg=1)
\[ mathraw97 = (40.76 -1.34 + 1.16 - 0.92) - (40.76 -1.34) \]
\[ mathraw97 = (1.16 - 0.92) \]
Y(incomeC=1|WICpreg=1) vs. Y(incomeC=0|WICpreg=1): For average children whose mothers were WIC participants (WICpreg=1), a one percent increase in family income corresponds to an increase in math scores in 0.0024 units (i.e., [(1.161)-(0.921*1)]/100), controlling other variables constant.
Y(incomeC=1|WICpreg=0) vs. Y(incomeC=0|WICpreg=0): For average children whose mothers were NOT WIC participants (WICpreg=0), a one percent increase in family income corresponds to an increase in math scores in 0.0116 units (i.e., ((1.16*1)/100)), controlling other variables constant.
WICpregFor average children whose mothers were WIC participants (WICpreg=1), their math scores are lower than the math scores of their counterparts whose mothers were NOT WIC participants (WICpreg=0) by 1.34 units, controlling other variables constant.
(Do the same for other control variables)
Y(incomeC=1 & WICpreg=1) vs. Y(incomeC=0 & WICpreg=0): When family income is one unit above the average, the math scores of average children whose mothers were WIC participants (WICpreg=1) are lower than the math scores of their counterparts whose family income are at the mean and mothers were NOT WIC participants (WICpreg=0) by 1.1 units (i.e., 1.16-0.92-1.34), controlling other variables constant.
Y(incomeC=1 & WICpreg=0) vs. Y(incomeC=0 & WICpreg=0): When family income is one unit above the average, the math scores of average children whose mothers were NOT WIC participants (WICpreg=0) are higher than the math scores of their counterparts whose family income are at the mean and mothers were NOT WIC participants (WICpreg=0) by 1.16 units, controlling other variables constant.
# install.packages("interplot")
# `interplot` is a generic function to produce a plot of the coefficient estimates of one variable in a two-way interaction conditional on the values of the other variable in the interaction term.
# lm2: WICpreg*incomeC
plotlm2 <- interplot(m = lm2, var1 = "incomeC", var2 = "WICpreg")
plotlm2View(plotlm2$data)
plotlm2b <- interplot(m = lm2, var1 = "WICpreg", var2 = "incomeC")
plotlm2b# plot_model(lm2, type = "pred", terms = c("WICpreg", "incomeC"))
# interaction.plot(goodMini$incomeC, goodMini$WICpreg, goodMini$mathraw97)WICpreg*ageC\[ mathraw97 = \beta_0 + \beta_1WICpreg + \beta_2ageC + \beta_3ageC2 + \beta_4WICpreg \cdot ageC + \beta_5incomeC + \beta_6race + \beta_7homeC + \beta_8bthwht + \epsilon \]
summary(lm3<-lm(mathraw97 ~ WICpreg*ageC + ageC2 +
incomeC + race + homeC + bthwht, data=goodTrim))##
## Call:
## lm(formula = mathraw97 ~ WICpreg * ageC + ageC2 + incomeC + race +
## homeC + bthwht, data = goodTrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -26.2175 -4.5482 -0.0446 4.9069 23.5718
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 41.03247 0.43772 93.741 < 2e-16 ***
## WICpregparticipant -1.52604 0.45274 -3.371 0.000766 ***
## ageC 7.03722 0.08612 81.711 < 2e-16 ***
## ageC2 -0.09819 0.02523 -3.891 0.000103 ***
## incomeC 0.70441 0.19024 3.703 0.000220 ***
## racewhite 2.13268 0.44256 4.819 1.56e-06 ***
## homeC 0.61734 0.07159 8.623 < 2e-16 ***
## bthwht -1.54934 0.40134 -3.860 0.000117 ***
## WICpregparticipant:ageC -0.44349 0.12564 -3.530 0.000426 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.622 on 1823 degrees of freedom
## Multiple R-squared: 0.8829, Adjusted R-squared: 0.8824
## F-statistic: 1718 on 8 and 1823 DF, p-value: < 2.2e-16
WICpreg*ageCY(age=1|WICpreg=1) vs. Y(age=0|WICpreg=1): For average children whose mothers were WIC participants (WICpreg=1), a one unit increase in age corresponds to an increase in math scores in 6.6 units (i.e., 7.04-0.44), controlling other variables constant.
Y(age=1|WICpreg=0) vs. Y(age=0|WICpreg=0): For average children whose mothers were NOT WIC participants (WICpreg=0), a one unit increase in age corresponds to an increase in math scores in 7.04 units, controlling other variables constant.
WICpregFor average children whose mothers were WIC participants (WICpreg=1), their math scores are lower than the math scores of their counterparts whose mothers were NOT WIC participants (WICpreg=0) by 1.53 units, controlling other variables constant.
(Do the same for other control variables)
Y(ageC=1 + ageC2=1 & WICpreg=1) vs. Y(ageC=0 + ageC2=1 & WICpreg=0): The math scores of children who are one year older than the average children and whose mothers were WIC participants (WICpreg=1) are higher than the average children whose mothers were NOT WIC participants (WICpreg=0) by 4.87 units (i.e., 7.04-(2*0.098) -0.44-1.526), controlling other variables constant.
Y(ageC=1 + ageC2=1 & WICpreg=0) vs. Y(ageC=0 + ageC2=1 & WICpreg=0): Whereas, the math scores of children who are one year older than the average children and whose mothers were NOT WIC participants (WICpreg=0) are higher than the average children whose mothers were NOT WIC participants (WICpreg=0) by 6.84 units (i.e., 7.04-(0.098*2)), controlling other variables constant.
# The function `interaction.plot` creates a visual representation of the interaction between the effects of two factors, or between a factor and a numeric variable. I
# lm3: WICpreg*ageC
interaction.plot(goodTrim$ageC, goodTrim$WICpreg, goodTrim$mathraw97)plotlm3<-interplot(m = lm3, var1 = "ageC", var2 = "WICpreg")
plotlm3$data## fake coef1 ub lb value
## 1 0 7.038536 7.208818 6.871647 WICpregparticipant
## 2 1 6.595128 6.798574 6.394700 WICpregparticipant
WICpreg*race\[ mathraw97 = \beta_0 + \beta_1WICpreg + \beta_2race + \beta_3WICpreg \cdot race + \beta_4ageC + \beta_5ageC2 + \beta_6incomeC + \beta_7homeC + \beta_8bthwht + \epsilon \]
summary(lm4<-lm(mathraw97 ~ WICpreg*race +
ageC + ageC2 + incomeC + homeC + bthwht, data=goodTrim))##
## Call:
## lm(formula = mathraw97 ~ WICpreg * race + ageC + ageC2 + incomeC +
## homeC + bthwht, data = goodTrim)
##
## Residuals:
## Min 1Q Median 3Q Max
## -25.737 -4.671 -0.008 4.854 23.932
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 40.30368 0.49701 81.093 < 2e-16 ***
## WICpregparticipant -0.36143 0.56174 -0.643 0.520036
## racewhite 2.97123 0.55093 5.393 7.83e-08 ***
## ageC 6.86719 0.07104 96.661 < 2e-16 ***
## ageC2 -0.08982 0.02520 -3.565 0.000374 ***
## incomeC 0.67696 0.19055 3.553 0.000391 ***
## homeC 0.60261 0.07177 8.396 < 2e-16 ***
## bthwht -1.53407 0.40205 -3.816 0.000140 ***
## WICpregparticipant:racewhite -2.14013 0.84883 -2.521 0.011777 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.635 on 1823 degrees of freedom
## Multiple R-squared: 0.8825, Adjusted R-squared: 0.882
## F-statistic: 1712 on 8 and 1823 DF, p-value: < 2.2e-16
WICpreg*ageCY(race=white|WICpreg=1) vs. Y(race=black|WICpreg=1): For average children whose mothers were WIC participants (WICpreg=1), being
whitecorresponds to an increase in math scores in 0.83 units (i.e., 2.97-2.14), controlling other variables constant.
Y(race=white|WICpreg=0) vs. Y(race=black|WICpreg=0): For average children whose mothers were NOT WIC participants (WICpreg=0), being
whitecorresponds to an increase in math scores in 2.97 units, controlling other variables constant.
WICpregNote that the coefficient of WICpreg is not significant at the 0.05 level. (For average children whose mothers were WIC participants (WICpreg=1), their math scores are lower than the math scores of their counterparts whose mothers were NOT WIC participants (WICpreg=0) by 0.36 units, controlling other variables constant.)
(Do the same for other control variables)
Y(race=white & WICpreg=1) vs. Y(race=black & WICpreg=0): For average
whitechildren whose mothers were WIC participants (WICpreg=1), their math scores are higher than those for averageblackchildren whose mothers were NOT WIC participants (WICpreg=0) by 0.47 units (i.e., 2.97-2.14-0.36), controlling other variables constant.
Y(race=white & WICpreg=0) vs. Y(race=black & WICpreg=0): For average
whitechildren whose mothers were NOT WIC participants (WICpreg=0), their math scores are higher than those for averageblackchildren whose mothers were NOT WIC participants (WICpreg=0) by 2.97 units, controlling other variables constant.
# lm4: WICpreg*race
interaction.plot(goodTrim$race, goodTrim$WICpreg, goodTrim$mathraw97)# interplot(m = lm4, var1 = "race", var2 = "WICpreg") # does not support interactions between two factors.
plotlm4 <- plot_model(lm4, type = "int", terms = c("WICpreg", "race"))
plotlm4View(plotlm4$data)