Q8.7
Backward elimination starts with the model that includes all potential predictor
variables. Variables are eliminated one-at-a-time from the model until we cannot
improve the adjusted R^2. The strategy within each elimination step is to eliminate
the variable that leads to the largest improvement in adjusted R^2.
Therefore, we compare the adjusted R^2 of each candidate model and remove the
variable whose removal gives the largest value. Here, that variable is age.
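The elimination loop described above can be sketched in Python. This is only an
illustration under stated assumptions: the function names `solve`, `adjusted_r2`
and `backward_eliminate`, as well as the toy data, are hypothetical and are not
taken from the exercise. The least-squares fit uses a small hand-written solver so
that the sketch needs nothing beyond the standard library.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def adjusted_r2(y, columns):
    """Adjusted R^2 of a least-squares fit of y on an intercept plus `columns`."""
    n, k = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = k + 1
    # Normal equations: (X'X) beta = X'y
    XtX = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = solve(XtX, Xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


def backward_eliminate(y, predictors):
    """predictors: dict of name -> list of values. Returns (kept names, adjusted R^2)."""
    kept = dict(predictors)
    best = adjusted_r2(y, list(kept.values()))
    while kept:
        # Adjusted R^2 of each model obtained by dropping exactly one variable.
        candidates = {
            name: adjusted_r2(y, [v for other, v in kept.items() if other != name])
            for name in kept
        }
        drop = max(candidates, key=candidates.get)
        if candidates[drop] <= best:
            break  # no single removal improves adjusted R^2, so stop
        best = candidates[drop]
        del kept[drop]
    return list(kept), best


# Hypothetical toy data: y depends on x1, while z is unrelated to y.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
z = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0]
y = [3.0, 3.0, 5.0, 9.0, 11.0, 11.0, 13.0, 17.0]
print(backward_eliminate(y, {"x1": x1, "z": z}))
```

In the exercise itself the adjusted R^2 values for each candidate model are already
given, so only the comparison step inside the loop is needed to pick the variable.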
Q8.13
Multiple regression methods using the model generally depend on the following
four assumptions:
1. the residuals of the model are nearly normal,
2. the variability of the residuals is nearly constant,
3. the residuals are independent, and
4. each variable is linearly related to the outcome.
Left graph has nearly normal residuals: The normal probability plot shows a
nearly normal distribution of the residuals; however, there are some minor
irregularities at the tails. With a data set so large, these would not be
a concern.
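As a rough illustration of what a normal probability plot compares, the sketch
below pairs sorted residuals with theoretical standard normal quantiles. The
function names `normal_qq_points` and `correlation`, the plotting positions
(i - 0.5)/n, and the toy residuals are assumptions made for this example and
are not part of the exercise.

```python
from statistics import NormalDist


def normal_qq_points(residuals):
    """Return (theoretical quantile, observed residual) pairs, smallest first."""
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common convention; others exist.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def correlation(pairs):
    """Pearson correlation of the two coordinates in a list of pairs."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical residuals, for illustration only.
toy_residuals = [-1.8, -0.9, -0.4, -0.1, 0.0, 0.2, 0.5, 1.0, 1.6]
points = normal_qq_points(toy_residuals)
print(correlation(points))
```

Points that fall close to a straight line (a correlation near 1) are consistent
with nearly normal residuals; departures at the two ends correspond to the tail
irregularities mentioned above.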
Right graph has constant variability of residuals: The scatterplot of the
residuals versus the fitted values does not show any overall structure.
However, observations with very low or very high fitted values also appear to
have somewhat larger outliers.
Left graph has linear relationships between the response variable and numerical
explanatory variables: The residuals do appear to be randomly distributed around 0.
Right graph has linear relationships between the response variable and numerical
explanatory variables: The residuals vs. length of gestation plot does not show
any clear or strong remaining structures, with the possible exception of very short
or long gestations.
Left graph has constant variability of residuals: The residuals do appear to
have constant variability between the two parity groups; any differences are
relatively minor.
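One simple numerical companion to a residuals-by-group plot is the standard
deviation of the residuals within each group. The sketch below assumes a
hypothetical helper `spread_by_group` and made-up residuals and group labels;
none of these come from the exercise.

```python
from statistics import stdev


def spread_by_group(residuals, groups):
    """Sample standard deviation of the residuals within each group label."""
    by_group = {}
    for r, g in zip(residuals, groups):
        by_group.setdefault(g, []).append(r)
    return {g: stdev(vals) for g, vals in by_group.items()}


# Hypothetical residuals and group labels, for illustration only.
toy_residuals = [0.5, -0.7, 1.1, -0.9, 0.6, -0.4, 0.8, -1.0]
toy_groups = ["first born", "not first born"] * 4
print(spread_by_group(toy_residuals, toy_groups))
```

Similar standard deviations across groups are consistent with the constant
variability assumption; the same check applies to any categorical predictor.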
Right graph has linear relationships between the response variable and numerical
explanatory variables: The residuals do appear to be randomly distributed around 0.
Left graph has linear relationships between the response variable and numerical
explanatory variables: The residuals vs. weight of mother plot shows residuals
randomly distributed around 0.
Right graph has constant variability of residuals: The residuals do appear to
have constant variability between smoking status groups; any differences are
relatively minor.
All concerns raised here are relatively mild. There are some outliers, but there is
so much data that the influence of such observations will be minor.