1. Nominal by nominal

m1 <- lm(bwt ~ smoke * nonwhite, data = birthwt)
summary(m1)
## 
## Call:
## lm(formula = bwt ~ smoke * nonwhite, data = birthwt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2407.75  -416.85    31.25   483.25  1561.25 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    3428.7      102.7  33.378  < 2e-16 ***
## smokesmoker                    -601.9      139.6  -4.312 2.62e-05 ***
## nonwhitenonwhite               -604.2      130.7  -4.622 7.12e-06 ***
## smokesmoker:nonwhitenonwhite    419.5      217.1   1.932   0.0548 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 681.4 on 185 degrees of freedom
## Multiple R-squared:  0.1408, Adjusted R-squared:  0.1268 
## F-statistic:  10.1 on 3 and 185 DF,  p-value: 3.393e-06

Interpretation for each group
- Non-smoking, white mothers: This is the reference group, so their mean birth weight equals the intercept, about 3429 g.
- Smoking, white mothers: Mean birth weight is about 602 g lower than in the reference group (about 2827 g).
- Non-smoking, non-white mothers: Mean birth weight is about 604 g lower than in the reference group (about 2825 g), nearly identical to smoking white mothers.
- Smoking, non-white mothers: This group has the lowest mean birth weight (about 2642 g). The positive interaction term offsets part of the two negative main effects, but not enough to compensate for them.

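Interpretation of the interaction term: The interaction coefficient is positive (419.5), so the birth-weight deficit associated with smoking is about 420 g smaller among non-white mothers (about 182 g) than among white mothers (about 602 g). The term is not statistically significant at the 0.05 level (p = 0.055).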
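The four group means can be rebuilt by hand from the coefficient table. The sketch below hard-codes the rounded estimates printed by summary(m1) rather than refitting the model, so the variable names (b, cells, effect_white, effect_nonwhite) are illustrative only and the results are accurate to the rounding of the printed output.

```r
# Coefficients copied from summary(m1) above (grams, rounded as printed).
b <- c(intercept = 3428.7, smoker = -601.9, nonwhite = -604.2,
       smoker_nonwhite = 419.5)

# Every combination of the two dummy variables (0 = reference level).
cells <- expand.grid(smoker = 0:1, nonwhite = 0:1)
cells$mean_bwt <- b[["intercept"]] +
  b[["smoker"]] * cells$smoker +
  b[["nonwhite"]] * cells$nonwhite +
  b[["smoker_nonwhite"]] * cells$smoker * cells$nonwhite
print(cells)

# Smoking effect within each race group (rows follow expand.grid order).
effect_white    <- cells$mean_bwt[2] - cells$mean_bwt[1]
effect_nonwhite <- cells$mean_bwt[4] - cells$mean_bwt[3]

# The difference between the two smoking effects is the interaction term.
effect_nonwhite - effect_white
```

With the fitted model in the session, predict(m1, newdata = ...) on the four factor combinations should give the same means without the rounding.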

2. Nominal by continuous

m2 <- lm(bwt ~ smoke * age, data = birthwt)
summary(m2)
## 
## Call:
## lm(formula = bwt ~ smoke * age, data = birthwt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2189.27  -458.46    51.46   527.26  1521.39 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      2406.06     292.19   8.235 3.18e-14 ***
## smokesmoker       798.17     484.34   1.648   0.1011    
## age                27.73      12.15   2.283   0.0236 *  
## smokesmoker:age   -46.57      20.45  -2.278   0.0239 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 709.3 on 185 degrees of freedom
## Multiple R-squared:  0.06909,    Adjusted R-squared:  0.054 
## F-statistic: 4.577 on 3 and 185 DF,  p-value: 0.004068

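Without interaction: With only main effects, we assume that smoking and age each have their own relationship with birth weight, so the age slope is the same for smokers and non-smokers. In the output above, the smokesmoker and age coefficients are both positive, but they describe the smoking difference at age 0 and the age slope for non-smokers rather than overall effects; age is significant at the 0.05 level (p = 0.024) and smokesmoker is not (p = 0.101).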

With interaction: Including an interaction term, we allow the relationship between age and birth weight to differ between smokers and non-smokers. The interaction coefficient is negative and statistically significant at the 0.05 level (p = 0.024).

Interpretation of interaction coefficient: The interaction term has a negative coefficient (-46.57), meaning the age slope is about 47 g per year lower for smokers than for non-smokers. Among women who smoke, predicted birth weight falls as age increases.

Interpretation
- For non-smokers: Predicted birth weight rises by about 27.7 g for each additional year of age.
- For smokers: The age slope is 27.73 - 46.57 = -18.84, so predicted birth weight falls by about 18.8 g for each additional year of age.
- Interaction: The difference between the two slopes (-46.57 g per year) is statistically significant at the 0.05 level.
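To make the two group-specific lines explicit, the following sketch rebuilds them from the estimates printed by summary(m2). The numbers are hard-coded from that output, and the names fits and smoke_gap are made up for this illustration.

```r
# Coefficients copied from summary(m2) above (uncentered age, grams).
b0 <- 2406.06; b_smoke <- 798.17; b_age <- 27.73; b_int <- -46.57

# Fitted line for each group: intercept and slope in age.
fits <- data.frame(
  group     = c("nonsmoker", "smoker"),
  intercept = c(b0, b0 + b_smoke),
  slope     = c(b_age, b_age + b_int)
)
print(fits)

# Smoker minus non-smoker difference in predicted birth weight at a given age.
smoke_gap <- function(age) b_smoke + b_int * age
smoke_gap(c(15, 23, 35))

# Age at which the two fitted lines cross (the gap is zero).
-b_smoke / b_int
```

The crossing point falls at roughly age 17, near the young end of the ages plotted later in this document, so for most mothers the smoker line lies below the non-smoker line and the gap grows with age.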

Tip: Note that the main effect of smoking here gives the mean difference between smokers and non-smokers for age = 0. It may be easier to interpret models with nominal by continuous interactions if you first center the continuous variable (at mean, median or other relevant value).

median(birthwt$age)
## [1] 23
birthwt$agec <- birthwt$age - 23
m2c <- lm(bwt ~ smoke * agec, data = birthwt)
summary(m2c)$coef
##                    Estimate Std. Error   t value      Pr(>|t|)
## (Intercept)      3043.87967   66.34054 45.882648 5.136217e-103
## smokesmoker      -272.97916  105.82868 -2.579444  1.067228e-02
## agec               27.73138   12.14910  2.282587  2.359245e-02
## smokesmoker:agec  -46.57191   20.44711 -2.277677  2.388962e-02

Now, interpret the intercept, coefficients of smoke and agec, as well as the interaction terms.

The intercept shows that the expected birth weight for a non-smoking mother of median age (23) is about 3044 g. For smokers of the same age it is about 273 g lower, and this difference is statistically significant (p = 0.011). Among non-smokers, birth weight increases by about 28 g for every additional year of the mother's age, which is also statistically significant (p = 0.024). The interaction term shows that the age slope is about 47 g per year lower for smokers, so the smoker deficit grows as mothers get older.
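Centering only moves the origin of the age axis, so the centered coefficients can be recovered from the uncentered ones. The sketch below does that arithmetic with the rounded estimates from summary(m2); the vector name centered is illustrative, and small discrepancies from the m2c table are due to rounding of the inputs.

```r
# Coefficients copied from summary(m2) (uncentered age, grams).
b0 <- 2406.06; b_smoke <- 798.17; b_age <- 27.73; b_int <- -46.57
center <- 23  # median age, the value subtracted to build agec

# Re-express the same fitted lines around age = 23.
centered <- c(
  intercept   = b0 + b_age * center,       # non-smoker prediction at age 23
  smoker      = b_smoke + b_int * center,  # smoker gap at age 23
  agec        = b_age,                     # slopes do not change
  smoker_agec = b_int
)
round(centered, 2)
```

Only the intercept and the smoke coefficient move; the age slope and the interaction are identical in m2 and m2c, which is why their t values and p-values match across the two tables.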

3. Continuous by continuous

Tip: Unless x1 = 0 and x2 = 0 are meaningful in your dataset, you may end up with strange values for the intercept or other main effect estimates. If this happens, try centering continuous variables.

median(birthwt$lwt)
## [1] 121
birthwt$lwtc <- birthwt$lwt - 121
m3 <- lm(bwt ~ agec * lwtc, data = birthwt)
summary(m3)
## 
## Call:
## lm(formula = bwt ~ agec * lwtc, data = birthwt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2258.87  -477.29    16.28   512.40  1824.01 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 2912.1115    54.8888  53.055   <2e-16 ***
## agec          11.7363    10.8076   1.086    0.279    
## lwtc           4.4237     1.7645   2.507    0.013 *  
## agec:lwtc     -0.2992     0.3227  -0.927    0.355    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 719.4 on 185 degrees of freedom
## Multiple R-squared:  0.04229,    Adjusted R-squared:  0.02676 
## F-statistic: 2.723 on 3 and 185 DF,  p-value: 0.04569

Now interpret the coefficient of all variables here.

For a mother of median weight (121 lb), each additional year of age is associated with an 11.7 g increase in birth weight. For a mother of median age (23), each additional pound of the mother's weight is associated with a 4.4 g increase in the baby's birth weight, on average. The former relationship is not statistically significant (p = 0.279) but the latter is (p = 0.013). Finally, the interaction term shows that the association between the mother's weight and the baby's birth weight weakens slightly with age, by about 0.3 g per pound for each additional year. This is also not statistically significant (p = 0.355).
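In a continuous by continuous interaction, each slope depends on the value of the other variable. As a rough check on the interpretation, this sketch evaluates both conditional slopes from the estimates printed by summary(m3); the values are hard-coded from that output and the variable names are illustrative.

```r
# Coefficients copied from summary(m3) above (age centered at 23, lwt at 121).
b_age <- 11.7363; b_lwt <- 4.4237; b_int <- -0.2992

# Slope of mother's weight (g per lb) at several maternal ages.
ages <- c(15, 23, 35)
lwt_slope <- b_lwt + b_int * (ages - 23)
print(data.frame(age = ages, lwt_slope = round(lwt_slope, 2)))

# Slope of age (g per year) at several maternal weights.
lwts <- c(75, 121, 200)
age_slope <- b_age + b_int * (lwts - 121)
print(data.frame(lwt = lwts, age_slope = round(age_slope, 2)))

# Maternal weight (lb) at which the age slope changes sign.
121 - b_age / b_int
```

The weight slope shrinks toward zero at older ages, and the age slope is positive for lighter mothers but negative for the heaviest ones, changing sign at roughly 160 lb. Because the interaction is not statistically significant, these differences in slope should be read as descriptive only.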

library(ggplot2)
nd <- expand.grid(agec = seq(15, 35, 5) - 23, lwtc = seq(75, 200, 25) - 121)
nd$pred <- predict(m3, newdata = nd)
nd$age <- nd$agec + 23
nd$lwt <- nd$lwtc + 121
ggplot(nd, aes(age, pred, color = factor(lwt))) + geom_line() + ylim(2000, 4000)

ggplot(nd, aes(lwt, pred, color = factor(age))) + geom_line() + ylim(2000, 4000)

Question: which plot makes more sense, and why?

I think the chart with age on the x-axis is easier to interpret: it plainly shows that predicted birth weight rises with age for lighter mothers and falls with age for the heaviest mothers (above roughly 160 lb), and that the lines converge, so the mother's weight matters less for the baby's weight as age increases.