8.15 Possum classification, Part I. The common brushtail possum of the Australia region is a bit cuter than its distant cousin, the American opossum (see Figure 7.5 on page 334). We consider 104 brushtail possums from two regions in Australia, where the possums may be considered a random sample from the population. The first region is Victoria, which is in the eastern half of Australia and traverses the southern coast. The second region consists of New South Wales and Queensland, which make up eastern and northeastern Australia.
We use logistic regression to differentiate between possums in these two regions. The outcome variable, called population, takes value 1 when a possum is from Victoria and 0 when it is from New South Wales or Queensland. We consider five predictors: sex male (an indicator for a possum being male), head length, skull width, total length, and tail length. Each variable is summarized in a histogram. The full logistic regression model and a reduced model after variable selection are summarized in the table.
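As a rough sketch of how a model like this could be fit in Python (the file name possum.csv and the column names are assumptions for illustration, not part of the exercise):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names -- the exercise's actual data frame is not given
possum = pd.read_csv("possum.csv")

# Full logistic regression: population (1 = Victoria, 0 = NSW/Queensland)
# on the five predictors described above
full = smf.logit(
    "population ~ sex_male + head_length + skull_width + total_length + tail_length",
    data=possum,
).fit()
print(full.summary())
```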

a) Examine each of the predictors. Are there any outliers that are likely to have a very large influence on the logistic regression model?

Each numerical predictor has a couple of points that could be influential. Head length and skull width each have a couple of unusually large values, total length has a couple of unusually small values, and tail length has one unusually large value.
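A minimal sketch of this check, again assuming hypothetical column names:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names, as in the sketch above
possum = pd.read_csv("possum.csv")
numeric_vars = ["head_length", "skull_width", "total_length", "tail_length"]

# One histogram per numerical predictor to spot potentially influential points
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, var in zip(axes.ravel(), numeric_vars):
    ax.hist(possum[var], bins=20)
    ax.set_title(var)
plt.tight_layout()
plt.show()
```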

b) The summary table for the full model indicates that at least one variable should be eliminated when using the p-value approach for variable selection: head length. The second component of the table summarizes the reduced model following variable selection. Explain why the remaining estimates change between the two models.

When predictor variables are correlated, they are said to be collinear. Collinear variables in a model can affect the parameter estimates and their standard errors. When the head length parameter is removed, the skull width parameter changes by over 30%. This makes some intuitive sense, as head length and skull width are likely to be highly correlated. The standard error for the skull width parameter also decreases when head length is removed.
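A small simulation can make this concrete. The sketch below uses made-up data standing in for head length and skull width (none of these numbers come from the possum data) and shows how dropping a collinear predictor shifts the remaining estimate and shrinks its standard error:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 104

# Simulated stand-ins for head length and skull width: strongly correlated
head = rng.normal(size=n)
skull = head + rng.normal(scale=0.2, size=n)

# Hypothetical outcome driven by skull width alone
p = 1 / (1 + np.exp(-0.8 * skull))
y = rng.binomial(1, p)

X_full = sm.add_constant(np.column_stack([head, skull]))
X_reduced = sm.add_constant(skull)

full = sm.Logit(y, X_full).fit(disp=0)
reduced = sm.Logit(y, X_reduced).fit(disp=0)

# Dropping the collinear predictor changes the skull-width estimate and
# typically shrinks its standard error
print("full model:    coef =", full.params[2], "  se =", full.bse[2])
print("reduced model: coef =", reduced.params[1], "  se =", reduced.bse[1])
```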

To get more intuition for why the parameter estimates change when collinear variables are added or removed, start with an arbitrary model with one predictor:

\[\hat{y} = \beta_0 + 0.8 x_1\]

Now suppose we add another predictor that is nearly perfectly correlated with the first. If both predictors are normalized so that they are practically the same vector, the new predictor adds very little information, and the fitted value for each point shouldn't change very much. The new model would therefore have to look something like:

\[\hat{y} = \beta_0 + 0.4 x_1 + 0.41 x_2\]
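The same splitting can be seen numerically. The sketch below uses simulated data and an ordinary least squares fit (not the possum model) with a near-duplicate predictor; the two slopes together carry roughly the original 0.8, while each one individually is poorly determined:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200

x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly the same vector as x1
y = 2.0 + 0.8 * x1 + rng.normal(scale=0.5, size=n)

one = sm.OLS(y, sm.add_constant(x1)).fit()
both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print("one predictor: ", one.params)    # intercept and a slope near 0.8
print("two predictors:", both.params)   # the two slopes sum to roughly 0.8
print("standard errors:", both.bse)     # each slope alone is poorly determined
```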