The common brushtail possum of the Australia region is a bit cuter than its distant cousin, the American opossum (see Figure 7.5 on page 334). We consider 104 brushtail possums from two regions in Australia, where the possums may be considered a random sample from the population. The first region is Victoria, which is in the eastern half of Australia and traverses the southern coast. The second region consists of New South Wales and Queensland, which make up eastern and northeastern Australia. We use logistic regression to differentiate between possums in these two regions. The outcome variable, called population, takes value 1 when a possum is from Victoria and 0 when it is from New South Wales or Queensland. We consider five predictors: sex_male (an indicator for a possum being male), head_length, skull_width, total_length, and tail_length. Each variable is summarized in a histogram. The full logistic regression model and a reduced model after variable selection are summarized in the table.
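For reference, the full model described above can be written in the usual logit form. This is a sketch of the standard notation, not a formula taken from the text: the symbols $p_i$ and $\beta_0, \dots, \beta_5$ are introduced here for illustration, with $p_i$ denoting the probability that possum $i$ is from Victoria (population = 1).

```latex
\log\!\left(\frac{p_i}{1 - p_i}\right)
  = \beta_0
  + \beta_1\,\text{sex\_male}_i
  + \beta_2\,\text{head\_length}_i
  + \beta_3\,\text{skull\_width}_i
  + \beta_4\,\text{total\_length}_i
  + \beta_5\,\text{tail\_length}_i
```

Under this notation, the reduced model is the same equation with one or more of the $\beta_j$ terms dropped.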
Table
# The context here is that we generally eliminate predictor variables with extreme skew.
# Outliers are judged from the histograms. The variable to eliminate is found by examining the highest P(|Z|) value (the largest p-value), which is for the variable head_length.
# The outliers will not have a *very large* influence, because the sample size of 104 is large enough.
# We notice that removing the variable head_length in the reduced model increased the magnitude of the z-statistics of sex_male (-1.86 --> -2.20) and skull_width (-1.52 --> -2.27), which lowers their p-values.
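The figures quoted above are negative, so they read as z-statistics rather than p-values. Assuming they are Wald z-statistics compared against a standard normal distribution (an assumption, since the table itself is not reproduced here), a two-sided p-value can be recovered as in the minimal sketch below. The function name `two_sided_p` is hypothetical.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for Z ~ N(0, 1).
    return math.erfc(abs(z) / math.sqrt(2.0))


# z-statistics quoted in the notes: (full model, reduced model).
quoted = {"sex_male": (-1.86, -2.20), "skull_width": (-1.52, -2.27)}

for name, (z_full, z_reduced) in quoted.items():
    print(f"{name}: p = {two_sided_p(z_full):.3f} (full) -> "
          f"{two_sided_p(z_reduced):.3f} (reduced)")
```

Under that assumption, both predictors move from above the 0.05 threshold in the full model to below it in the reduced model (roughly 0.06 to 0.03 for sex_male and 0.13 to 0.02 for skull_width).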
# Fitting a reduced model in logistic regression changes the "remaining estimates" because of collinearity among the predictor variables.
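The effect of collinearity on the remaining estimates can be illustrated on simulated data. The sketch below does not use the possum data: `x1` and `x2` are hypothetical correlated predictors, the true coefficients are chosen arbitrarily, and the model is fitted with plain gradient ascent rather than the iteratively reweighted least squares that statistical software typically uses.

```python
import math
import random


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(rows, labels, steps=600, lr=1.0):
    """Fit logistic regression by gradient ascent on the mean log-likelihood.

    rows holds predictor lists without an intercept column.
    Returns [intercept, slope_1, slope_2, ...].
    """
    n, k = len(rows), len(rows[0]) + 1
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x, y in zip(rows, labels):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
            err = y - sigmoid(eta)  # observed minus fitted probability
            grad[0] += err
            for j, v in enumerate(x, start=1):
                grad[j] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Simulated (not real) data: x2 is strongly correlated with x1.
random.seed(0)
n = 600
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * a + 0.6 * random.gauss(0, 1) for a in x1]
y = [1 if random.random() < sigmoid(a + b) else 0 for a, b in zip(x1, x2)]

full = fit_logistic([[a, b] for a, b in zip(x1, x2)], y)   # y ~ x1 + x2
reduced = fit_logistic([[a] for a in x1], y)               # y ~ x1 only

print("full model    [intercept, x1, x2]:", [round(b, 2) for b in full])
print("reduced model [intercept, x1]:    ", [round(b, 2) for b in reduced])
```

In this simulation the slope on `x1` should come out larger in the reduced fit, because `x1` absorbs part of the effect of the omitted, correlated `x2`. If the two predictors were uncorrelated, dropping one would leave the other's estimate largely unchanged.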
# The shift in the sex_male and skull_width estimates makes sense because the skull_width and sex of a possum are correlated with head_length.
This is a product of OpenIntro that is released under a Creative Commons Attribution-ShareAlike 3.0 Unported license. This lab was adapted for OpenIntro by Andrew Bray and Mine Çetinkaya-Rundel from a lab written by Mark Hansen of UCLA Statistics.