8.3 Baby weights, Part III. We considered the variables smoke and parity, one at a time, in modeling birth weights of babies in Exercises 8.1 and 8.2. A more realistic approach to modeling infant weights is to consider all possibly related variables at once. Other variables of interest include length of pregnancy in days (gestation), mother’s age in years (age), mother’s height in inches (height), and mother’s pregnancy weight in pounds (weight). Below are three observations from this data set.

        bwt  gestation  parity  age  height  weight  smoke
1       120        284       0   27      62     100      0
2       113        282       0   33      64     135
...     ...        ...     ...  ...     ...     ...    ...
1236    117        297       0   38      65     129

The summary table below shows the results of a regression model for predicting the average birth weight of babies based on all of the variables included in the data set.


              Estimate  Std. Error  t value   Pr(>|t|) 
  (Intercept)   -80.41  14.35       -5.60       0.0000 
  gestation       0.44   0.03       15.26       0.0000 
  parity         -3.33   1.13       -2.95       0.0033 
  age            -0.01   0.09       -0.10       0.9170 
  height          1.15   0.21        5.63       0.0000 
  weight          0.05   0.03        1.99       0.0471 
  smoke          -8.40   0.95       -8.81       0.0000
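Each p-value in the table is a two-sided tail probability from a t distribution with n - k - 1 residual degrees of freedom. The minimal R sketch below assumes n = 1236 observations and k = 6 predictors (so 1229 degrees of freedom) and recovers the reported p-values from the rounded t values. Small discrepancies from the table are expected because the t values shown are themselves rounded.

```r
# Reported (rounded) t values from the regression table
t_values <- c(gestation = 15.26, parity = -2.95, age = -0.10,
              height = 5.63, weight = 1.99, smoke = -8.81)

# Residual degrees of freedom: n - k - 1, assuming n = 1236 observations
# and k = 6 predictors
df_resid <- 1236 - 6 - 1

# Two-sided p-value: probability of a t statistic at least this extreme
p_values <- 2 * pt(-abs(t_values), df = df_resid)
round(p_values, 4)
```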

Question

A. Write the equation of the regression line that includes all of the variables.

Answer :

predicted baby_weight = -80.41 + 0.44 × gestation - 3.33 × parity - 0.01 × age + 1.15 × height + 0.05 × weight - 8.40 × smoke
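The regression equation can be wrapped in a small R function that returns a predicted birth weight in ounces. This is a sketch: the function name `predict_bwt` and the input values in the example call (a hypothetical mother who smokes, with one previous pregnancy) are illustrative assumptions, not part of the data set, and the coefficients are the rounded estimates from the table.

```r
# Coefficients copied from the regression table (rounded to two decimals)
coefs <- c(intercept = -80.41, gestation = 0.44, parity = -3.33,
           age = -0.01, height = 1.15, weight = 0.05, smoke = -8.40)

# Hypothetical helper: predicted birth weight (oz) from the fitted equation
predict_bwt <- function(gestation, parity, age, height, weight, smoke) {
  unname(coefs["intercept"] +
    coefs["gestation"] * gestation +
    coefs["parity"] * parity +
    coefs["age"] * age +
    coefs["height"] * height +
    coefs["weight"] * weight +
    coefs["smoke"] * smoke)
}

# Hypothetical baby: 280-day gestation, parity 1, mother aged 30,
# 64 inches tall, 130 lb, smoker
predict_bwt(gestation = 280, parity = 1, age = 30,
            height = 64, weight = 130, smoke = 1)
# [1] 110.86
```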

Question

B. Interpret the slopes of gestation and age in this context.

Answer :

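Holding all other variables constant, the model predicts that each additional day of gestation is associated with a birth weight that is 0.44 oz higher on average.

Similarly, holding all other variables constant, each additional year of the mother's age is associated with a predicted birth weight that is 0.01 oz lower. With a p-value of 0.917, however, the age effect is not statistically distinguishable from zero.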
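"Holding all other variables constant" can be made concrete: change one predictor by one unit, leave the rest fixed, and the predicted weight moves by exactly that predictor's slope. The R sketch below uses a hypothetical baseline baby (values chosen for illustration only) and the rounded coefficients from the table.

```r
# Coefficients from the regression table: intercept, gestation, parity,
# age, height, weight, smoke
b <- c(-80.41, 0.44, -3.33, -0.01, 1.15, 0.05, -8.40)

# Predicted weight for a predictor vector (leading 1 multiplies the intercept)
pred <- function(x) sum(b * c(1, x))

# Hypothetical baseline: gestation, parity, age, height, weight, smoke
base <- c(280, 0, 30, 64, 130, 0)

# One more day of gestation, everything else unchanged
more_gest <- base
more_gest[1] <- more_gest[1] + 1
pred(more_gest) - pred(base)   # equals the gestation slope, 0.44

# Mother one year older, everything else unchanged
older <- base
older[3] <- older[3] + 1
pred(older) - pred(base)       # equals the age slope, -0.01
```

Because the model is linear, the baseline values do not matter: any starting point gives the same one-unit change.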

Question

C. The coefficient for parity is different than in the linear model shown in Exercise 8.2. Why might there be a difference?

Answer :

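In Exercise 8.2, parity was the only predictor, so its coefficient also absorbed the effects of any other variables correlated with parity. In this model, the parity coefficient describes the association between parity and birth weight while holding gestation, age, height, weight, and smoking status constant. When predictors are correlated with one another (collinearity), adding or removing variables changes the estimated coefficients, so the two models need not agree.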
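A small simulation illustrates why a coefficient can change when correlated predictors are added. The data below are entirely hypothetical (simulated, not the baby-weight data): x2 is built to be correlated with x1, and y depends on both. Fitting y on x1 alone gives a slope near 2, because x1 partly stands in for x2; adding x2 brings the x1 slope back near its true value of 1.

```r
set.seed(42)  # reproducible simulated data
n <- 1000

# Hypothetical simulated predictors: x1 is correlated with x2
x2 <- rnorm(n)
x1 <- x2 + rnorm(n)

# True effect of x1, holding x2 fixed, is 1
y <- 1 * x1 + 2 * x2 + rnorm(n)

# x1 alone: its slope also absorbs part of x2's effect (about 2 here)
simple_slope <- unname(coef(lm(y ~ x1))["x1"])

# x1 and x2 together: the slope estimates x1's effect holding x2 constant
multiple_slope <- unname(coef(lm(y ~ x1 + x2))["x1"])

c(simple = simple_slope, multiple = multiple_slope)
```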

Question

D. Calculate the residual for the first observation in the data set.

Answer :

predicted baby_weight = -80.41 + 0.44 × gestation - 3.33 × parity - 0.01 × age + 1.15 × height + 0.05 × weight - 8.40 × smoke

     bwt  gestation  parity  age  height  weight  smoke
1    120        284       0   27      62     100      0

Observed birth weight = 120 oz
Predicted birth weight = -80.41 + 0.44 × 284 - 3.33 × 0 - 0.01 × 27 + 1.15 × 62 + 0.05 × 100 - 8.40 × 0 = 120.58 oz

Residual = observed - predicted = 120 - 120.58 = -0.58 oz
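The same arithmetic can be written as a dot product between the coefficient vector and the observation's predictor values, with a leading 1 for the intercept. This minimal R sketch uses the rounded coefficients from the table; the variable names are illustrative.

```r
# Coefficients from the regression table (intercept first)
b <- c(-80.41, 0.44, -3.33, -0.01, 1.15, 0.05, -8.40)

# First observation: observed birth weight and the six predictors in model order
bwt <- 120
x   <- c(gestation = 284, parity = 0, age = 27,
         height = 62, weight = 100, smoke = 0)

# Prediction is the dot product of the coefficients with (1, x)
y_hat <- sum(b * c(1, x))

# Residual = observed - predicted
residual <- bwt - y_hat
c(predicted = y_hat, residual = residual)
```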

Question

E. The variance of the residuals is 249.28, and the variance of the birth weights of all babies in the data set is 332.57. Calculate the R² and the adjusted R². Note that there are 1,236 observations in the data set.

Answer :

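$$R^2 = 1 - \frac{\mathrm{Var}(e_i)}{\mathrm{Var}(y_i)}$$

$$R^2_{adj} = 1 - \frac{\mathrm{Var}(e_i)}{\mathrm{Var}(y_i)} \times \frac{n - 1}{n - k - 1}$$

where n = 1236 is the number of observations and k = 6 is the number of predictor variables, so n - 1 = 1235 and n - k - 1 = 1229.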

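var_e <- 249.28   # variance of the residuals
var_y <- 332.57   # variance of the birth weights
n <- 1236         # number of observations
k <- 6            # number of predictor variables
r2 <- 1 - var_e / var_y
r2
## [1] 0.2504435
r2adj <- 1 - (var_e / var_y) * ((n - 1) / (n - k - 1))
r2adj
## [1] 0.2467842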
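As a sanity check on the two variance-based formulas, a minimal R sketch can compare them against the values that `summary()` reports for an `lm()` fit. The data below are simulated (hypothetical, not the baby-weight data) with k = 3 predictors. For least squares with an intercept, the residuals have mean zero, so both comparisons should return TRUE.

```r
set.seed(1)  # reproducible simulated data
n <- 200
k <- 3       # number of predictors in this hypothetical model

# Hypothetical simulated predictors and response
X <- matrix(rnorm(n * k), ncol = k)
y <- drop(X %*% c(1, -0.5, 0.25)) + rnorm(n)
fit <- lm(y ~ X)

# Variance-based formulas for R^2 and adjusted R^2
r2    <- 1 - var(resid(fit)) / var(y)
r2adj <- 1 - (var(resid(fit)) / var(y)) * ((n - 1) / (n - k - 1))

# Compare with the values reported by summary()
all.equal(r2, summary(fit)$r.squared)
all.equal(r2adj, summary(fit)$adj.r.squared)
```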