8.15 Possum classification, Part I. The common brushtail possum of the Australia region is a bit cuter than its distant cousin, the American opossum (see Figure 7.5 on page 334). We consider 104 brushtail possums from two regions in Australia, where the possums may be considered a random sample from the population. The first region is Victoria, which is in the eastern half of Australia and traverses the southern coast. The second region consists of New South Wales and Queensland, which make up eastern and northeastern Australia.
We use logistic regression to differentiate between possums in these two regions. The outcome variable, called population, takes value 1 when a possum is from Victoria and 0 when it is from New South Wales or Queensland. We consider five predictors: sex male (an indicator for a possum being male), head length, skull width, total length, and tail length. Each variable is summarized in a histogram. The full logistic regression model and a reduced model after variable selection are summarized in the table.
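As a rough sketch of how a model like this could be fit in Python (the file name possum.csv and the column names are assumptions for illustration, not part of the exercise):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical file and column names -- the exercise's actual data frame is not given
possum = pd.read_csv("possum.csv")

# Full logistic regression: population (1 = Victoria, 0 = NSW/Queensland)
# on the five predictors described above
full = smf.logit(
    "population ~ sex_male + head_length + skull_width + total_length + tail_length",
    data=possum,
).fit()
print(full.summary())
```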

a) Examine each of the predictors. Are there any outliers that are likely to have a very large influence on the logistic regression model?

Each numerical predictor has a couple of points that could be influential. Head length and skull width each have a couple of unusually large values, total length has a couple of unusually small values, and tail length has one unusually large value.
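A minimal sketch of this check, again assuming hypothetical column names:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names, as in the sketch above
possum = pd.read_csv("possum.csv")
numeric_vars = ["head_length", "skull_width", "total_length", "tail_length"]

# One histogram per numerical predictor to spot potentially influential points
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
for ax, var in zip(axes.ravel(), numeric_vars):
    ax.hist(possum[var], bins=20)
    ax.set_title(var)
plt.tight_layout()
plt.show()
```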

b) The summary table for the full model indicates that at least one variable should be eliminated when using the p-value approach for variable selection: head length. The second component of the table summarizes the reduced model following variable selection. Explain why the remaining estimates change between the two models.

When predictor variables are correlated, they are said to be collinear. Collinear variables in a model can affect the parameter estimates and their standard errors. When the head length parameter is removed, the skull width parameter changes by over 30%. This makes some intuitive sense, as head length and skull width are likely to be highly correlated. The standard error for the skull width parameter also decreases when head length is removed.
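A small simulation can make this concrete. The sketch below uses made-up data standing in for head length and skull width (none of these numbers come from the possum data) and shows how dropping a collinear predictor shifts the remaining estimate and shrinks its standard error:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 104

# Simulated stand-ins for head length and skull width: strongly correlated
head = rng.normal(size=n)
skull = head + rng.normal(scale=0.2, size=n)

# Hypothetical outcome driven by skull width alone
p = 1 / (1 + np.exp(-0.8 * skull))
y = rng.binomial(1, p)

X_full = sm.add_constant(np.column_stack([head, skull]))
X_reduced = sm.add_constant(skull)

full = sm.Logit(y, X_full).fit(disp=0)
reduced = sm.Logit(y, X_reduced).fit(disp=0)

# Dropping the collinear predictor changes the skull-width estimate and
# typically shrinks its standard error
print("full model:    coef =", full.params[2], "  se =", full.bse[2])
print("reduced model: coef =", reduced.params[1], "  se =", reduced.bse[1])
```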

To get more intuition for why the parameter estimates change when collinear variables are added or removed, start with an arbitrary model with one predictor:

\[\hat{y} = \beta_0 + 0.8 x_1\]

Now suppose we add another predictor that is nearly perfectly correlated with the first. If both predictors are normalized so that they are practically the same vector, the new predictor adds very little information, and the fitted value for each point shouldn't change very much. The new model would therefore have to look something like:

\[\hat{y} = \beta_0 + 0.4 x_1 + 0.41 x_2\]
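The same splitting can be seen numerically. The sketch below uses simulated data and an ordinary least squares fit (not the possum model) with a near-duplicate predictor; the two slopes together carry roughly the original 0.8, while each one individually is poorly determined:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200

x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly the same vector as x1
y = 2.0 + 0.8 * x1 + rng.normal(scale=0.5, size=n)

one = sm.OLS(y, sm.add_constant(x1)).fit()
both = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print("one predictor: ", one.params)    # intercept and a slope near 0.8
print("two predictors:", both.params)   # the two slopes sum to roughly 0.8
print("standard errors:", both.bse)     # each slope alone is poorly determined
```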