Q8.7

Backward elimination starts with the model that includes all potential predictor 
variables. Variables are eliminated one-at-a-time from the model until we cannot 
improve the adjusted R^2. The strategy within each elimination step is to eliminate 
the variable that leads to the largest improvement in adjusted R^2.
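
The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated, the variable names (x1, x2, noise) and the helper functions are hypothetical, and the least-squares fit is hand-rolled so the example needs nothing beyond the standard library. It is not the data or software used in the exercise.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def adjusted_r2(data, y, predictors):
    """Fit y on the given predictors (plus an intercept); return adjusted R^2."""
    n, k = len(y), len(predictors)
    X = [[1.0] + [data[p][i] for p in predictors] for i in range(n)]
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k + 1)]
           for a in range(k + 1)]
    Xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k + 1)]
    beta = solve(XtX, Xty)
    ybar = sum(y) / n
    sse = sum((y[i] - sum(bj * xj for bj, xj in zip(beta, X[i]))) ** 2
              for i in range(n))
    sst = sum((v - ybar) ** 2 for v in y)
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

def backward_eliminate(data, y):
    """Drop one variable at a time while adjusted R^2 keeps improving."""
    current = list(data)
    best = adjusted_r2(data, y, current)
    while current:
        # Try removing each variable; remember the adjusted R^2 of each reduced model.
        trials = {v: adjusted_r2(data, y, [p for p in current if p != v])
                  for v in current}
        drop = max(trials, key=trials.get)
        if trials[drop] <= best:  # no removal improves the model: stop
            break
        best = trials[drop]
        current.remove(drop)
    return current, best

# Simulated data (an assumption for illustration): y depends on x1 and x2 only.
random.seed(1)
n = 200
data = {name: [random.gauss(0, 1) for _ in range(n)]
        for name in ("x1", "x2", "noise")}
y = [2 * a - 3 * b + random.gauss(0, 1) for a, b in zip(data["x1"], data["x2"])]

kept, score = backward_eliminate(data, y)
print(kept, round(score, 3))
```

At each step the sketch compares the adjusted R^2 of every one-variable-smaller model with the current model and removes the variable whose removal gives the highest value, stopping once no removal helps.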

Therefore, we remove the variable whose elimination gives the largest adjusted R^2. We should remove age.
Q8.13

Multiple regression methods generally depend on the following 
four assumptions:
1. the residuals of the model are nearly normal,
2. the variability of the residuals is nearly constant,
3. the residuals are independent, and
4. each variable is linearly related to the outcome.
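
As a rough illustration of how the first two assumptions might be checked numerically, the sketch below pairs standardized residuals with theoretical normal quantiles (the points of a normal probability plot) and compares the residual spread across bins of fitted values. The residuals and fitted values are simulated, and the function names qq_points and spread_by_group are hypothetical; independence and linearity are not covered here.

```python
import random
from statistics import NormalDist, mean, pstdev

def qq_points(residuals):
    """Pair each theoretical normal quantile with a sorted, standardized residual."""
    n = len(residuals)
    m, s = mean(residuals), pstdev(residuals)
    z = sorted((r - m) / s for r in residuals)
    q = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(q, z))

def spread_by_group(fitted, residuals, bins=4):
    """Standard deviation of the residuals within equal-size bins of fitted values."""
    order = sorted(range(len(fitted)), key=lambda i: fitted[i])
    size = len(order) // bins
    return [pstdev([residuals[i] for i in order[b * size:(b + 1) * size]])
            for b in range(bins)]

# Simulated fitted values and residuals (an assumption for illustration only).
random.seed(2)
n = 500
fitted = [random.uniform(0, 10) for _ in range(n)]
residuals = [random.gauss(0, 1) for _ in range(n)]

points = qq_points(residuals)                 # near the line y = x if nearly normal
spreads = spread_by_group(fitted, residuals)  # similar values if variability is constant
print([round(s, 2) for s in spreads])
```

Points that stay close to the line y = x suggest nearly normal residuals, and similar standard deviations across the bins suggest nearly constant variability. In practice the plots themselves are usually inspected, as in the answers that follow.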

Left graph has nearly normal residuals: The normal probability plot shows a 
nearly normal distribution of the residuals, although there are some minor 
irregularities at the tails. With a data set so large, these would not be 
a concern.

Right graph has constant variability of residuals: The scatterplot of the 
residuals versus the fitted values does not show any overall structure. 
However, observations with very low or very high fitted values appear to also
have somewhat larger outliers.

Left graph has linear relationships between the response variable and numerical 
explanatory variables: The residuals appear to be randomly distributed around 0.

Right graph has linear relationships between the response variable and numerical 
explanatory variables: The residuals vs. length of gestation plot does not show
any clear or strong remaining structures, with the possible exception of very short 
or long gestations. 

Left graph has constant variability of residuals: The residuals appear to
have roughly constant variability between the two parity groups; any 
differences are relatively minor.

Right graph has linear relationships between the response variable and numerical 
explanatory variables: The residuals appear to be randomly distributed around 0.

Left graph has linear relationships between the response variable and numerical 
explanatory variables: In the residuals vs. weight of mother plot, the residuals 
are randomly distributed around 0.

Right graph has constant variability of residuals: The residuals appear to
have roughly constant variability between smoking status groups; any 
differences are relatively minor.

All concerns raised here are relatively mild. There are some outliers, but there is 
so much data that the influence of such observations will be minor.