m1 <- lm(bwt ~ smoke * nonwhite, data = birthwt)
summary(m1)
##
## Call:
## lm(formula = bwt ~ smoke * nonwhite, data = birthwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2407.75 -416.85 31.25 483.25 1561.25
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3428.7 102.7 33.378 < 2e-16 ***
## smokesmoker -601.9 139.6 -4.312 2.62e-05 ***
## nonwhitenonwhite -604.2 130.7 -4.622 7.12e-06 ***
## smokesmoker:nonwhitenonwhite 419.5 217.1 1.932 0.0548 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 681.4 on 185 degrees of freedom
## Multiple R-squared: 0.1408, Adjusted R-squared: 0.1268
## F-statistic: 10.1 on 3 and 185 DF, p-value: 3.393e-06
Interpretation for each group
- Non-smoking, white mothers: This is the reference group, so their mean birth weight is the intercept, about 3429 g.
- Smoking, white mothers: Mean birth weight is 601.9 g lower than in the reference group, about 2827 g.
- Non-smoking, non-white mothers: Mean birth weight is 604.2 g lower than in the reference group, about 2825 g, which is essentially the same as for smoking white mothers.
- Smoking, non-white mothers: This group has the lowest mean birth weight, about 2642 g (3428.7 - 601.9 - 604.2 + 419.5). The positive interaction term offsets part of the two negative main effects, but not enough to compensate for them.
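Interpretation of the interaction term: The coefficient of 419.5 is the difference in the smoking effect between non-white and white mothers. Smoking is associated with a 601.9 g lower mean birth weight among white mothers but only a 182.4 g lower mean (-601.9 + 419.5) among non-white mothers; equivalently, the gap between white and non-white mothers is smaller among smokers. The interaction is not statistically significant at the 0.05 level (p = 0.055).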
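The four group means implied by m1 can be reproduced by hand from the coefficient table. The sketch below is a minimal illustration that hard-codes the rounded estimates printed above, so it runs without the data; the names `b`, `groups`, `gap_white` and `gap_nonwhite` are introduced here for illustration and are not part of the original analysis.

```r
# Rounded estimates copied from the summary(m1) output above (grams)
b <- c(intercept = 3428.7, smoker = -601.9, nonwhite = -604.2, interaction = 419.5)

# One row per combination of the two 0/1 indicators
groups <- expand.grid(smoker = c(0, 1), nonwhite = c(0, 1))

# Predicted mean = intercept + main effects + interaction (only when both are 1)
groups$pred <- b[["intercept"]] +
  b[["smoker"]] * groups$smoker +
  b[["nonwhite"]] * groups$nonwhite +
  b[["interaction"]] * groups$smoker * groups$nonwhite

# Smoking gap (smoker minus non-smoker) within each group
gap_white <- groups$pred[2] - groups$pred[1]
gap_nonwhite <- groups$pred[4] - groups$pred[3]

print(groups)
print(c(gap_white = gap_white, gap_nonwhite = gap_nonwhite))
```

The difference between the two gaps is the interaction coefficient, which is one way to read what that term measures.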
#2. Nominal by continuous
m2 <- lm(bwt ~ smoke * age, data = birthwt)
summary(m2)
##
## Call:
## lm(formula = bwt ~ smoke * age, data = birthwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2189.27 -458.46 51.46 527.26 1521.39
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2406.06 292.19 8.235 3.18e-14 ***
## smokesmoker 798.17 484.34 1.648 0.1011
## age 27.73 12.15 2.283 0.0236 *
## smokesmoker:age -46.57 20.45 -2.278 0.0239 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 709.3 on 185 degrees of freedom
## Multiple R-squared: 0.06909, Adjusted R-squared: 0.054
## F-statistic: 4.577 on 3 and 185 DF, p-value: 0.004068
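Without interaction: With only main effects, we assume that smoking and age each have their own relationship with birth weight and that neither depends on the other. The smoker vs. non-smoker difference is the same at every age, and the age slope is the same for both groups (parallel lines). In the model above, the smoke and age coefficients are both positive, but they are conditional effects (the smoker difference at age 0 and the age slope among non-smokers), and only the age coefficient is significant at the 0.05 level.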
With interaction: Including an interaction term, we allow the relationship between smoking and birth weight to change with age (equivalently, we allow the age slope to differ between smokers and non-smokers). The estimated interaction is negative and statistically significant at the 0.05 level (p = 0.024).
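One way to compare the two sets of assumptions is to fit both models and test the added term. The sketch below assumes the data are `MASS::birthwt` with `smoke` recoded to a factor; the label "non-smoker" is a guess (only "smoker" is visible in the output above), and `m_main`, `m_int`, `cmp`, `p_f` and `p_t` are names introduced here.

```r
# Assumption: the worksheet's data come from MASS::birthwt, with smoke (0/1)
# recoded to a factor. The reference label "non-smoker" is hypothetical.
data(birthwt, package = "MASS")
birthwt$smoke <- factor(birthwt$smoke, levels = 0:1,
                        labels = c("non-smoker", "smoker"))

m_main <- lm(bwt ~ smoke + age, data = birthwt)  # parallel lines
m_int  <- lm(bwt ~ smoke * age, data = birthwt)  # separate age slopes

# F test for the single added interaction term
cmp <- anova(m_main, m_int)
print(cmp)

# With one added term, the F test and the coefficient's t test agree
p_f <- cmp[["Pr(>F)"]][2]
p_t <- summary(m_int)$coefficients["smokesmoker:age", "Pr(>|t|)"]
print(c(anova_p = p_f, t_test_p = p_t))
```

Because only one coefficient is added, the F statistic is the square of the interaction's t value, so both approaches lead to the same conclusion.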
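Interpretation of the interaction coefficient: The interaction term has a negative coefficient (-46.57), meaning that the age slope is 46.57 g per year lower for women who smoke than for women who do not. The older the smoking mothers are, the lower their babies' predicted birth weights.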
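Interpretation
- For non-smokers: Predicted birth weight increases by 27.73 g for each additional year of the mother's age (p = 0.024).
- For smokers: Predicted birth weight decreases by 18.84 g for each additional year of age (27.73 - 46.57). The smoke coefficient of 798.17 is the smoker vs. non-smoker difference at age 0, an extrapolation far outside the data, so it should not be read as a benefit of smoking.
- Interaction: The difference between the two slopes is -46.57 g per year, so the gap between smokers and non-smokers shifts against smokers as age increases. This difference is statistically significant at the 0.05 level (p = 0.024).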
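The group-specific slopes follow directly from the coefficient table. This is a small sketch using the rounded estimates from the summary(m2) output; the variable names are introduced here for illustration only.

```r
# Rounded estimates copied from the summary(m2) output above
b_smoke <- 798.17   # smoker minus non-smoker difference at age 0 (g)
b_age   <- 27.73    # age slope for the reference group, non-smokers (g/year)
b_int   <- -46.57   # smoker-by-age interaction (g/year)

slope_nonsmoker <- b_age
slope_smoker    <- b_age + b_int

# Age at which the two fitted lines cross (smoker gap equals zero)
age_cross <- -b_smoke / b_int

print(c(nonsmoker = slope_nonsmoker,
        smoker = slope_smoker,
        crossing_age = age_cross))
```

Below the crossing age the fitted smoker line lies above the non-smoker line, and above it the smoker line lies below, which is why the main effect at age 0 looks so odd on its own.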
Tip: Note that the main effect of smoking here gives the mean difference between smokers and non-smokers for age = 0. It may be easier to interpret models with nominal by continuous interactions if you first center the continuous variable (at mean, median or other relevant value).
median(birthwt$age)
## [1] 23
birthwt$agec <- birthwt$age - 23
m2c <- lm(bwt ~ smoke * agec, data = birthwt)
summary(m2c)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3043.87967 66.34054 45.882648 5.136217e-103
## smokesmoker -272.97916 105.82868 -2.579444 1.067228e-02
## agec 27.73138 12.14910 2.282587 2.359245e-02
## smokesmoker:agec -46.57191 20.44711 -2.277677 2.388962e-02
Now, interpret the intercept, coefficients of smoke and agec, as well as the interaction term.
The intercept, about 3043.9 g, is the predicted mean birth weight for a non-smoking mother at the median age of 23. The smoke coefficient shows that at age 23 the babies of smokers are about 273 g lighter on average, and this difference is statistically significant (p = 0.011). The agec coefficient shows that among non-smokers birth weight increases by about 27.7 g for every additional year of the mother's age, which is also significant at the 0.05 level (p = 0.024). The interaction term (-46.6, p = 0.024) shows that the smoker vs. non-smoker gap shifts by about 46.6 g against smokers for each additional year of age; equivalently, the age slope for smokers is 27.7 - 46.6 = -18.8 g per year.
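To see how the smoker gap changes with age, and that centering does not change the model itself, the gap can be computed from both parameterizations. The sketch below uses the printed estimates from m2 and m2c; `gap_raw`, `gap_centered` and the ages chosen are illustrative.

```r
# Smoker minus non-smoker difference in predicted birth weight at a given age.
# Estimates are copied from the m2 (uncentered) and m2c (centered) output above.
gap_raw      <- function(age) 798.17 + (-46.57) * age
gap_centered <- function(age) -272.97916 + (-46.57191) * (age - 23)

ages <- c(18, 23, 30)
gaps <- data.frame(age = ages,
                   raw = gap_raw(ages),
                   centered = gap_centered(ages))
print(gaps)
```

The two columns agree up to rounding of the printed coefficients: centering only moves the point at which the main effect of smoking is evaluated, from age 0 to age 23.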
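Without interaction: With only main effects, we assume that each independent variable's relationship with birth weight is the same at every value of the other. The slope for the mother's age does not depend on her weight, and the slope for her weight does not depend on her age.
With interaction: We allow the relationship between the mother's weight (lwt, recorded in pounds at the last menstrual period) and the baby's birth weight to change with the mother's age, and vice versa.
Interpretation of the interaction term: The coefficient is the change in the slope of birth weight on the mother's weight for each additional year of her age. In the model fitted below the estimate is negative (-0.2992), so the positive association between the mother's weight and the baby's weight weakens as mothers get older. It does not by itself mean that heavier mothers have lighter babies.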
Tip: Unless x1 = 0 and x2 = 0 are meaningful in your dataset, you may end up with strange values for the intercept or other main effect estimates. If this happens, try centering continuous variables.
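The effect of centering can be checked on simulated data. The sketch below is an illustration under made-up assumptions (the variables `x1`, `x2`, `y` and all the numbers in the simulation are invented, not taken from birthwt): it shows that centering changes the intercept and main effects but leaves the interaction coefficient and the fitted values unchanged.

```r
set.seed(42)
n  <- 300
x1 <- runif(n, 15, 45)    # stand-in for a variable such as age
x2 <- runif(n, 80, 250)   # stand-in for a variable such as weight
y  <- 2000 + 10 * x1 + 5 * x2 - 0.05 * x1 * x2 + rnorm(n, sd = 50)

d <- data.frame(y, x1, x2,
                x1c = x1 - median(x1),
                x2c = x2 - median(x2))

raw <- lm(y ~ x1 * x2, data = d)     # main effects evaluated at x = 0
cen <- lm(y ~ x1c * x2c, data = d)   # main effects evaluated at the medians

print(coef(raw))
print(coef(cen))

# The interaction coefficient and the fitted values are identical
same_interaction <- all.equal(unname(coef(raw)["x1:x2"]),
                              unname(coef(cen)["x1c:x2c"]))
same_fit <- all.equal(unname(fitted(raw)), unname(fitted(cen)))

# The centered intercept is the raw model's prediction at the medians
pred_at_median <- predict(raw, newdata = data.frame(x1 = median(x1),
                                                    x2 = median(x2)))
```

Centering is therefore a change of reference point, not a different model: it only makes the intercept and main effects refer to values that occur in the data.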
median(birthwt$lwt)
## [1] 121
birthwt$lwtc <- birthwt$lwt - 121
m3 <- lm(bwt ~ agec * lwtc, data = birthwt)
summary(m3)
##
## Call:
## lm(formula = bwt ~ agec * lwtc, data = birthwt)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2258.87 -477.29 16.28 512.40 1824.01
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2912.1115 54.8888 53.055 <2e-16 ***
## agec 11.7363 10.8076 1.086 0.279
## lwtc 4.4237 1.7645 2.507 0.013 *
## agec:lwtc -0.2992 0.3227 -0.927 0.355
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 719.4 on 185 degrees of freedom
## Multiple R-squared: 0.04229, Adjusted R-squared: 0.02676
## F-statistic: 2.723 on 3 and 185 DF, p-value: 0.04569
Now interpret the coefficients of all variables here.
The intercept, about 2912.1 g, is the predicted birth weight for a mother of median age (23) and median weight (121 lb). For a mother of median weight, each additional year of age is associated with an 11.7 g increase in birth weight; this is not statistically significant (p = 0.279). For a mother of median age, each additional pound of her weight is associated with a 4.4 g increase in the baby's birth weight on average, which is significant at the 0.05 level (p = 0.013). Finally, the interaction term (-0.30) shows that the slope for the mother's weight decreases by about 0.3 g per pound for each additional year of age, so the mother's weight matters slightly less for older mothers. This is also not statistically significant (p = 0.355).
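Because both variables are continuous, the slope of one is a linear function of the other. As a quick sketch using the printed m3 estimates (the ages below are chosen for illustration), the slope for the mother's weight at a given age is the lwtc coefficient plus the interaction times the centered age.

```r
# Estimates copied from the summary(m3) output above
b_lwtc <- 4.4237    # g per pound at age 23 (agec = 0)
b_int  <- -0.2992   # change in the weight slope per additional year of age

age <- c(15, 23, 35)
lwt_slope <- b_lwtc + b_int * (age - 23)

print(data.frame(age, lwt_slope))
```

The slope shrinks toward zero as age increases, which is the pattern the interaction term describes, although the estimate is too imprecise here to rule out a constant slope.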
library(ggplot2)
nd <- expand.grid(agec = seq(15, 35, 5) - 23, lwtc = seq(75, 200, 25) - 121)
nd$pred <- predict(m3, newdata = nd)
nd$age <- nd$agec + 23
nd$lwt <- nd$lwtc + 121
# Predicted birth weight against age, one line per value of the mother's weight
ggplot(nd, aes(x = age, y = pred, color = factor(lwt))) + geom_line() + ylim(2000, 4000)
# Predicted birth weight against the mother's weight, one line per age
ggplot(nd, aes(x = lwt, y = pred, color = factor(age))) + geom_line() + ylim(2000, 4000)
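Question: Which plot makes more sense, and why?
I think the chart with age on the x-axis is easier to interpret. The lines for the different mother's weights converge as age increases, which plainly shows that the mother's weight matters less for the baby's predicted weight at older ages. Predicted birth weight rises with age for lighter mothers and falls slightly for the heaviest ones (175 and 200 lb), so it does not decrease with age across the board.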
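The direction of each line in the age plot can be checked from the coefficients. This sketch uses the printed m3 estimates and the same grid of mother's weights as the plotting code; `age_slope` and `lwt_flat` are names introduced here.

```r
# Estimates copied from the summary(m3) output above
b_agec <- 11.7363   # g per year at lwt = 121 (lwtc = 0)
b_int  <- -0.2992   # change in the age slope per additional pound

# Age slope for each mother's weight used in the plot grid
lwt <- seq(75, 200, 25)
age_slope <- b_agec + b_int * (lwt - 121)
print(data.frame(lwt, age_slope))

# Mother's weight at which the fitted age slope is zero
lwt_flat <- 121 - b_agec / b_int
print(lwt_flat)
```

Lines for weights below `lwt_flat` slope upward with age and lines above it slope downward, which is what makes them converge on the right-hand side of the plot.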