Problem 8.15

Possum Classification, Part 1.

The common brushtail possum of the Australia region is a bit cuter than its distant cousin, the American opossum (see Figure 7.5 on page 334). We consider 104 brushtail possums from two regions in Australia, where the possums may be considered a random sample from the population. The first region is Victoria, which is in the eastern half of Australia and traverses the southern coast. The second region consists of New South Wales and Queensland, which make up eastern and northeastern Australia. We use logistic regression to differentiate between possums in these two regions. The outcome variable, called population, takes value 1 when a possum is from Victoria and 0 when it is from New South Wales or Queensland. We consider five predictors: sex male (an indicator for a possum being male), head length, skull width, total length, and tail length. Each variable is summarized in a histogram. The full logistic regression model and a reduced model after variable selection are summarized in the table.
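Written out, the full model described above has the following form, where p_i is the probability that possum i is from Victoria (population = 1). The β symbols are generic coefficient labels, not values taken from the table; the reduced model simply drops the head_length term.

```latex
\log\!\left(\frac{p_i}{1 - p_i}\right)
  = \beta_0
  + \beta_1\,\text{sex\_male}_i
  + \beta_2\,\text{head\_length}_i
  + \beta_3\,\text{skull\_width}_i
  + \beta_4\,\text{total\_length}_i
  + \beta_5\,\text{tail\_length}_i,
\qquad p_i = P(\text{population}_i = 1)
```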

Table

Table

  1. Examine each of the predictors. Are there any outliers that are likely to have a very large influence on the logistic regression model?
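# The context here is that a predictor with extreme skew or isolated outliers can have an outsized influence on a logistic regression fit, so such predictors are generally transformed or eliminated.
# We find outliers by examining the histogram of each predictor for observations far from the bulk of the data. The P(>|Z|) values in the table do not identify outliers; they test whether each coefficient differs from zero (the largest one belongs to head_length).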

# Any mild outliers in the histograms should not have a *very large* influence, because a sample of 104 possums is large enough that a few unusual points do not dominate the fit.
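The answer above rests on eyeballing histograms. A rough numeric complement, which is a substitute technique and not the textbook's method, is Tukey's rule of flagging values beyond 1.5 × IQR from the quartiles. The helper name `flag_outliers` and the sample values are hypothetical; applied to one predictor column at a time, it only lists candidates, and whether they actually sway the logistic fit remains a judgement call.

```python
import statistics


def flag_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR].

    This is a rule-of-thumb screen on a single predictor; it does not
    measure a point's influence on a fitted logistic regression model.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements (not the possum data): one value sits far from the rest.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(flag_outliers(sample))  # → [100]
```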
  2. The summary table for the full model indicates that at least one variable should be eliminated when using the p-value approach for variable selection: head length. The second component of the table summarizes the reduced model following variable selection. Explain why the remaining estimates change between the two models.
# Removing head_length in the reduced model increased the magnitude of the Z values for sex_male (-1.86 --> -2.20) and skull_width (-1.52 --> -2.27), which moved both of their p-values from above 0.05 to below 0.05.
# The remaining estimates change because the predictors are correlated with one another (collinearity): once head_length is dropped, part of the association it carried is absorbed by the correlated predictors that stay in the model.
# This makes sense because skull_width and the sex of the possum are likely correlated with head_length.
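To see the collinearity effect in isolation, here is a small simulation sketch on synthetic data, not the possum data. The names `x1` and `x2` are hypothetical stand-ins (think skull_width and head_length), and `x2` is deliberately built from `x1`. The model is fit by Newton-Raphson in plain Python so the sketch has no dependencies; statistical software typically fits the same maximum-likelihood model with an equivalent iterative method. The Z value is estimate / SE and the two-sided p-value uses the normal approximation.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information X^T W X at beta."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, yi in zip(X, y):
        mu = sigmoid(sum(b * v for b, v in zip(beta, row)))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += row[j] * (yi - mu)
            for k in range(p):
                info[j][k] += w * row[j] * row[k]
    return grad, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    Each row of X starts with 1.0 (intercept); y holds 0/1 outcomes.
    Returns (estimates, standard errors).
    """
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        grad, info = score_and_information(X, y, beta)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(X, y, beta)
    # Standard errors: square roots of the diagonal of the inverse information matrix.
    se = []
    for j in range(len(beta)):
        unit = [1.0 if i == j else 0.0 for i in range(len(beta))]
        se.append(math.sqrt(solve(info, unit)[j]))
    return beta, se


# Synthetic data (NOT the possum data): x2 is built from x1, so the two are strongly correlated.
rng = random.Random(2024)
x1, x2, y = [], [], []
for _ in range(1000):
    a = rng.gauss(0.0, 1.0)
    b = a + 0.5 * rng.gauss(0.0, 1.0)
    x1.append(a)
    x2.append(b)
    y.append(1 if rng.random() < sigmoid(0.5 + a + b) else 0)

full_beta, full_se = fit_logistic([[1.0, a, b] for a, b in zip(x1, x2)], y)
reduced_beta, reduced_se = fit_logistic([[1.0, a] for a in x1], y)

for label, beta, se in (("full", full_beta, full_se), ("reduced", reduced_beta, reduced_se)):
    z = beta[1] / se[1]
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, normal approximation
    print(f"{label:<8} x1: estimate={beta[1]:.3f} SE={se[1]:.3f} Z={z:.2f} p={p_value:.2g}")
```

With this setup, dropping `x2` should make the `x1` estimate larger (it absorbs the correlated predictor's association) and its standard error smaller, so |Z| grows. That is the same pattern the answer describes for sex_male and skull_width. The exact numbers depend on the simulated draw.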

This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. This lab was adapted for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel from a lab written by Mark Hansen of UCLA Statistics.