Answer (a):

Data

The data file was downloaded from http://www.stat.tamu.edu/~sheather/book/data_sets.php. The original file is tab-separated; I converted it to a comma-separated CSV file.

options(scipen=999)
library(dplyr)
library(tidyverse)
library(knitr)
library(kableExtra)
library(car)
library(ggrepel)
library(ggplot2)

#Load data

MissAm.df <- read.csv("https://raw.githubusercontent.com/akulapa/Data621-Week08-Discussion/master/MissAmericato2008.csv", header = TRUE, stringsAsFactors = FALSE)

Dataset structure: the variable Top10 is the number of times a state made the top 10 list between 2000 and 2008, inclusive (nine years). The number of times a state did not make the top 10 list is therefore 9 - Top10.

str(MissAm.df)
## 'data.frame':    51 obs. of  7 variables:
##  $ abbreviation  : chr  "AL" "AK" "AZ" "AR" ...
##  $ Top10         : int  6 0 0 4 5 0 2 0 2 3 ...
##  $ LogPopulation : num  11.9 9.8 12.1 11.3 14 ...
##  $ LogContestants: num  3.9 2.71 2.86 3.77 3.94 ...
##  $ LogTotalArea  : num  10.9 13.4 11.6 10.9 12 ...
##  $ Latitude      : num  32.4 58.4 33.4 34.7 38.5 ...
##  $ Longitude     : num  86.4 134.6 112 92.2 121.5 ...
summary(MissAm.df)
##  abbreviation           Top10       LogPopulation    LogContestants 
##  Length:51          Min.   :0.000   Min.   : 9.693   Min.   :1.910  
##  Class :character   1st Qu.:0.000   1st Qu.:10.843   1st Qu.:2.857  
##  Mode  :character   Median :2.000   Median :11.757   Median :3.091  
##                     Mean   :1.765   Mean   :11.682   Mean   :3.136  
##                     3rd Qu.:3.000   3rd Qu.:12.354   3rd Qu.:3.450  
##                     Max.   :7.000   Max.   :14.001   Max.   :4.040  
##   LogTotalArea      Latitude       Longitude     
##  Min.   : 4.22   Min.   :21.33   Min.   : 69.80  
##  1st Qu.:10.49   1st Qu.:35.99   1st Qu.: 78.06  
##  Median :10.94   Median :39.75   Median : 89.67  
##  Mean   :10.62   Mean   :39.40   Mean   : 93.14  
##  3rd Qu.:11.34   3rd Qu.:42.77   3rd Qu.:102.78  
##  Max.   :13.40   Max.   :58.37   Max.   :157.92
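state.df <- data.frame(abbreviation = state.abb, State = state.name, stringsAsFactors = FALSE)
# InTop10 = 1 if the state reached the top 10 at least once, else 0.
# state.abb holds the 50 states only, so the inner join keeps 50 of the 51 rows.
MissAm.df <- MissAm.df %>% 
  mutate(InTop10 = ifelse(Top10 > 0, 1, 0)) %>% 
  inner_join(state.df, by = "abbreviation")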

MissAm.df %>% 
  select(State, Top10, InTop10, LogPopulation, LogContestants, LogTotalArea, Latitude, Longitude) %>% 
  kable("html", caption = "Miss America Contest - Number of Times State Produced Top 10 Finalist (2000 - 2008)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", font_size = 12) %>%
  scroll_box(width = "100%", height = "250px")
Miss America Contest - Number of Times State Produced Top 10 Finalist (2000 - 2008)
State Top10 InTop10 LogPopulation LogContestants LogTotalArea Latitude Longitude
Alabama 6 1 11.9249 3.895894 10.8670 32.3833 86.367
Alaska 0 0 9.8011 2.708050 13.4049 58.3667 134.583
Arizona 0 0 12.0543 2.862201 11.6439 33.4333 112.017
Arkansas 4 1 11.2702 3.766997 10.8814 34.7333 92.233
California 5 1 14.0005 3.935739 12.0058 38.5167 121.500
Colorado 0 0 11.8820 2.944439 11.5530 39.7500 104.867
Connecticut 2 1 11.5742 2.944439 8.6203 41.7333 72.650
Delaware 0 0 10.3397 2.852631 7.8196 39.1333 75.467
Florida 3 1 13.0882 3.725693 11.0937 30.3833 84.367
Georgia 4 1 12.5558 3.875359 10.9925 33.6500 84.433
Hawaii 3 1 10.6205 2.564949 9.2994 21.3333 157.917
Idaho 0 0 10.6565 2.970414 11.3334 43.5667 116.217
Illinois 2 1 13.0341 3.238678 10.9667 39.8333 89.667
Indiana 2 1 12.3026 3.126761 10.5028 39.7333 86.283
Iowa 1 1 11.6306 2.879199 10.9380 41.5333 93.650
Kansas 1 1 11.4371 2.978925 11.3178 39.0667 95.633
Kentucky 2 1 11.7568 3.433987 10.6068 38.2000 84.867
Louisiana 2 1 12.1042 3.465736 10.8559 30.5333 91.150
Maine 0 0 10.6066 2.484907 10.4740 44.3167 69.800
Maryland 3 1 12.1239 3.218876 9.4260 39.0000 76.083
Massachusetts 2 1 12.4428 2.917771 9.2644 42.3667 71.033
Michigan 3 1 12.8257 3.367296 11.4795 42.7833 84.600
Minnesota 0 0 12.1286 2.724579 11.3730 44.8833 93.217
Mississippi 3 1 11.6233 3.703768 10.7879 32.3167 90.083
Missouri 0 0 12.1716 3.526361 11.1520 38.5667 92.183
Montana 0 0 10.2920 2.852631 11.8985 46.6000 112.000
Nebraska 0 0 11.0494 2.871680 11.2561 40.8500 96.750
Nevada 1 1 10.9462 2.549445 11.6133 39.1667 119.767
New Hampshire 1 1 10.6631 2.785011 9.1431 43.2000 71.500
New Jersey 2 1 12.5266 3.242592 9.0735 40.2167 74.767
New Mexico 0 0 11.0420 3.091042 11.7084 35.6167 106.083
New York 3 1 13.4873 3.072693 10.9070 42.7500 73.800
North Carolina 2 1 12.5005 3.332205 10.8934 35.8667 78.783
North Dakota 0 0 10.1843 3.032546 11.1662 46.7667 100.750
Ohio 0 0 12.9287 3.212187 10.7105 40.0000 82.883
Oklahoma 5 1 11.6138 3.784190 11.1548 35.4000 97.600
Oregon 1 1 11.6680 3.100092 11.4966 44.9167 123.017
Pennsylvania 4 1 13.0212 3.178054 10.7376 40.2000 76.767
Rhode Island 1 1 10.7402 2.656757 7.3428 41.7333 71.433
South Carolina 1 1 11.8946 3.725693 10.3741 33.9500 81.117
South Dakota 0 0 10.2306 2.674149 11.2531 44.3833 100.283
Tennessee 1 1 12.1059 3.564827 10.6488 36.1167 86.683
Texas 7 1 13.4640 3.811097 12.5009 30.3000 97.700
Utah 2 1 11.5123 4.040123 11.3492 40.7667 111.967
Vermont 0 0 10.0402 2.335375 9.1710 44.2667 72.567
Virginia 3 1 12.4058 3.258097 10.6637 37.5000 77.333
Washington 1 1 12.2114 2.995732 11.1747 46.9667 122.900
West Virginia 2 1 10.9746 3.091042 10.0953 38.3667 81.600
Wisconsin 3 1 12.2358 3.212187 11.0898 43.1333 89.333
Wyoming 0 0 9.6926 1.909542 11.4908 41.1500 104.817

Model

Coefficients of the full logistic regression model (a generalized linear model with a logit link).

#Build model

MissAm.glm <- glm(InTop10 ~ LogPopulation + LogContestants + LogTotalArea + Latitude + Longitude, family=binomial(link = "logit"), data = MissAm.df)

MissAm.glm
## 
## Call:  glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea + 
##     Latitude + Longitude, family = binomial(link = "logit"), 
##     data = MissAm.df)
## 
## Coefficients:
##    (Intercept)   LogPopulation  LogContestants    LogTotalArea  
##      -11.68988         1.28041         2.81720        -1.30420  
##       Latitude       Longitude  
##       -0.01544         0.03781  
## 
## Degrees of Freedom: 49 Total (i.e. Null);  44 Residual
## Null Deviance:       62.69 
## Residual Deviance: 35.97     AIC: 47.97

Summary of the model.

summary(MissAm.glm)
## 
## Call:
## glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea + 
##     Latitude + Longitude, family = binomial(link = "logit"), 
##     data = MissAm.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2553  -0.3656   0.3466   0.5164   1.9389  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)  
## (Intercept)    -11.68988    9.58609  -1.219   0.2227  
## LogPopulation    1.28041    0.64357   1.990   0.0466 *
## LogContestants   2.81720    1.65071   1.707   0.0879 .
## LogTotalArea    -1.30420    0.62712  -2.080   0.0376 *
## Latitude        -0.01544    0.10896  -0.142   0.8873  
## Longitude        0.03781    0.03688   1.025   0.3052  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 62.687  on 49  degrees of freedom
## Residual deviance: 35.972  on 44  degrees of freedom
## AIC: 47.972
## 
## Number of Fisher Scoring iterations: 5
#Manually calculating Predicted and Fitted values
# man.pre<- MissAm.glm$coefficients[1] + 
#   MissAm.glm$coefficients[2] * MissAm.df$LogPopulation +
#   MissAm.glm$coefficients[3] * MissAm.df$LogContestants +
#   MissAm.glm$coefficients[4] * MissAm.df$LogTotalArea +
#   MissAm.glm$coefficients[5] * MissAm.df$Latitude +
#   MissAm.glm$coefficients[6] * MissAm.df$Longitude
# 
# man.fitted <- 1/(1 + (1/exp(man.pre)))

#Get coefficients
MissAm.Coe <- round(MissAm.glm$coefficients,4)

Summary Explanation

The first part of the summary, Call, echoes the model formula: the response variable and the predictor variables.

Logistic regression equation

\[ln \bigg(\frac{P}{1-P}\bigg) = \beta_0 + \beta_1{X_1} + \beta_2{X_2} + \beta_3{X_3} + \beta_4{X_4} + \beta_5{X_5}\]

\[ln \bigg(\frac{P}{1-P}\bigg) = -11.6899 + 1.2804\,\text{LogPopulation} + 2.8172\,\text{LogContestants} - 1.3042\,\text{LogTotalArea} - 0.0154\,\text{Latitude} + 0.0378\,\text{Longitude}\]

Probability is

\[P = \frac{e^{\beta_0 + \beta_1{X_1} + \beta_2{X_2} + \beta_3{X_3} + \beta_4{X_4} + \beta_5{X_5}}}{1 + {e^{\beta_0 + \beta_1{X_1} + \beta_2{X_2} + \beta_3{X_3} + \beta_4{X_4} + \beta_5{X_5}}}}\]
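
As a minimal sketch of the two formulas above, the rounded coefficients can be plugged in by hand for one state (Alabama, using its row in the data table). The names `beta`, `x_alabama`, `eta`, `p_manual` and `p_hat` are illustrative and not part of the original analysis; `plogis` is base R's logistic function.

```r
# Rounded coefficients of the full model, copied from the output above
beta <- c(Intercept = -11.6899, LogPopulation = 1.2804, LogContestants = 2.8172,
          LogTotalArea = -1.3042, Latitude = -0.0154, Longitude = 0.0378)

# Alabama's predictor values from the data table (leading 1 for the intercept)
x_alabama <- c(1, 11.9249, 3.895894, 10.8670, 32.3833, 86.367)

eta      <- sum(beta * x_alabama)        # linear predictor = log odds
p_manual <- exp(eta) / (1 + exp(eta))    # probability formula written out
p_hat    <- plogis(eta)                  # same quantity via base R

print(c(log_odds = eta, probability = p_hat))
```

Because the coefficients are rounded to four decimals, the result will differ slightly from `fitted(MissAm.glm)` for the same row.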

  • For every one-unit increase in LogPopulation, \(ln \bigg(\frac{p}{1-p}\bigg)\) increases by 1.2804 when all other predictor variables are held constant. In other words, as the log of a state's population increases by one unit, the log odds (logit) of the state making the Top10 list of finalists increase by 1.2804.

  • For every one-unit increase in LogContestants, \(ln \bigg(\frac{p}{1-p}\bigg)\) increases by 2.8172 when all other predictor variables are held constant. As the log of the number of contestants from a state increases by one unit, the log odds of the state making the Top10 list of finalists increase by 2.8172.

  • For every one-unit increase in LogTotalArea, \(ln \bigg(\frac{p}{1-p}\bigg)\) decreases by 1.3042 when all other predictor variables are held constant. As the log of a state's total area increases by one unit, the log odds of the state making the Top10 list of finalists decrease by 1.3042.

  • For every one-degree increase in Latitude, \(ln \bigg(\frac{p}{1-p}\bigg)\) decreases by 0.0154 when all other predictor variables are held constant. Latitude is not log-transformed: as the latitude of the state capital increases by one degree, the log odds of the state making the Top10 list of finalists decrease by 0.0154.

  • For every one-degree increase in Longitude, \(ln \bigg(\frac{p}{1-p}\bigg)\) increases by 0.0378 when all other predictor variables are held constant. Longitude is not log-transformed: as the longitude of the state capital increases by one degree, the log odds of the state making the Top10 list of finalists increase by 0.0378.
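
Log odds are hard to read directly, so a common companion view is the odds ratio, obtained by exponentiating each coefficient. The sketch below does this with the rounded estimates from the summary; the names `beta` and `odds_ratio` are illustrative.

```r
# Rounded slope estimates of the full model, copied from the summary output
beta <- c(LogPopulation = 1.2804, LogContestants = 2.8172,
          LogTotalArea = -1.3042, Latitude = -0.0154, Longitude = 0.0378)

# exp(beta) is the multiplicative change in the odds for a one-unit increase
odds_ratio <- exp(beta)
print(round(odds_ratio, 3))
```

An odds ratio above 1 means the odds of making the Top10 list rise with the predictor, and a value below 1 means they fall. For example, a one-unit increase in LogPopulation multiplies the odds by about 3.6, other predictors held constant.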

Model also suggests,

  • The variables LogPopulation and LogTotalArea are significant at the 5% level.
  • The variable LogContestants contributes to the model and is significant at the 10% level.
  • The p-values for Latitude and Longitude are high (0.8873 and 0.3052), so these variables are not significant in the model.
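
The z values and p-values behind these statements are Wald statistics: each estimate divided by its standard error, compared with a standard normal distribution. The sketch below recomputes them from the printed estimates and standard errors; the names `est`, `se`, `z` and `p` are illustrative.

```r
# Estimates and standard errors copied from summary(MissAm.glm)
est <- c(LogPopulation = 1.28041, LogContestants = 2.81720,
         LogTotalArea = -1.30420, Latitude = -0.01544, Longitude = 0.03781)
se  <- c(0.64357, 1.65071, 0.62712, 0.10896, 0.03688)

z <- est / se                 # Wald z statistic
p <- 2 * pnorm(-abs(z))       # two-sided p-value under N(0, 1)

print(round(cbind(z = z, p = p), 4))
```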

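The null deviance is 62.687 on 49 degrees of freedom and the residual deviance is 35.972 on 44 degrees of freedom. The drop of 26.715 for 5 added parameters indicates that the predictors, taken together, improve on the intercept-only model. For a fixed data set, a lower deviance means a closer fit.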
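
The comparison of the two deviances can be made formal with a likelihood-ratio test: the drop in deviance is referred to a chi-square distribution whose degrees of freedom equal the number of added parameters. This is a sketch using the printed values; the variable names are illustrative.

```r
# Deviances and degrees of freedom copied from the summary output
null_dev  <- 62.687   # intercept-only model, 49 df
resid_dev <- 35.972   # full model, 44 df

lr_stat <- null_dev - resid_dev          # likelihood-ratio statistic
df_diff <- 49 - 44                       # number of predictors added
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

print(c(LR = lr_stat, df = df_diff, p = p_value))
```

The statistic is well above 11.07, the 5% critical value of a chi-square distribution with 5 degrees of freedom.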

The Akaike information criterion (AIC) measures model quality by trading goodness of fit against the number of parameters; the lower the AIC, the better the model. Since this is the full model, let's use the step function to see whether the AIC improves when variables are removed.

step(MissAm.glm, test="LRT")
## Start:  AIC=47.97
## InTop10 ~ LogPopulation + LogContestants + LogTotalArea + Latitude + 
##     Longitude
## 
##                  Df Deviance    AIC    LRT Pr(>Chi)  
## - Latitude        1   35.992 45.992 0.0202  0.88703  
## - Longitude       1   37.292 47.292 1.3194  0.25071  
## <none>                35.972 47.972                  
## - LogContestants  1   39.712 49.712 3.7397  0.05313 .
## - LogPopulation   1   40.709 50.709 4.7364  0.02953 *
## - LogTotalArea    1   42.067 52.067 6.0949  0.01356 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=45.99
## InTop10 ~ LogPopulation + LogContestants + LogTotalArea + Longitude
## 
##                  Df Deviance    AIC    LRT Pr(>Chi)   
## - Longitude       1   37.981 45.981 1.9886 0.158485   
## <none>                35.992 45.992                   
## - LogPopulation   1   40.992 48.992 4.9993 0.025358 * 
## - LogContestants  1   41.302 49.302 5.3095 0.021209 * 
## - LogTotalArea    1   44.344 52.344 8.3518 0.003853 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=45.98
## InTop10 ~ LogPopulation + LogContestants + LogTotalArea
## 
##                  Df Deviance    AIC    LRT Pr(>Chi)  
## <none>                37.981 45.981                  
## - LogPopulation   1   41.708 47.708 3.7274  0.05353 .
## - LogContestants  1   42.521 48.521 4.5404  0.03310 *
## - LogTotalArea    1   44.446 50.446 6.4649  0.01100 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:  glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea, 
##     family = binomial(link = "logit"), data = MissAm.df)
## 
## Coefficients:
##    (Intercept)   LogPopulation  LogContestants    LogTotalArea  
##        -9.4502          1.0702          2.5447         -0.9303  
## 
## Degrees of Freedom: 49 Total (i.e. Null);  46 Residual
## Null Deviance:       62.69 
## Residual Deviance: 37.98     AIC: 45.98
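
The AIC values in the step trace can be checked by hand. For a 0/1 response the residual deviance equals minus twice the log-likelihood, so the AIC is the residual deviance plus twice the number of estimated coefficients. The helper `aic_from_deviance` is a hypothetical name used only for this sketch.

```r
# AIC for an ungrouped binary GLM: deviance + 2 * (number of coefficients)
aic_from_deviance <- function(deviance, n_coef) deviance + 2 * n_coef

aic_full    <- aic_from_deviance(35.972, 6)  # intercept + 5 predictors
aic_step1   <- aic_from_deviance(35.992, 5)  # Latitude removed
aic_reduced <- aic_from_deviance(37.981, 4)  # Latitude and Longitude removed

print(c(full = aic_full, step1 = aic_step1, reduced = aic_reduced))
```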

The model without Latitude and Longitude has a lower AIC (45.98 versus 47.97 for the full model). The output of step suggests that Latitude and Longitude add little to the model. Let’s build a model without the Latitude and Longitude variables.

MissAm.glm_v1 <- glm(InTop10 ~ LogPopulation + LogContestants + LogTotalArea, family=binomial(link = "logit"), data = MissAm.df)

summary(MissAm.glm_v1)
## 
## Call:
## glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea, 
##     family = binomial(link = "logit"), data = MissAm.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3100  -0.4617   0.3657   0.4814   2.0845  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)     -9.4502     6.1757  -1.530   0.1260  
## LogPopulation    1.0702     0.6007   1.782   0.0748 .
## LogContestants   2.5447     1.3779   1.847   0.0648 .
## LogTotalArea    -0.9303     0.4293  -2.167   0.0302 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 62.687  on 49  degrees of freedom
## Residual deviance: 37.981  on 46  degrees of freedom
## AIC: 45.981
## 
## Number of Fisher Scoring iterations: 5
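
Because the reduced model is nested in the full model, the two can also be compared with a likelihood-ratio test on the difference in residual deviance. The sketch below uses the printed deviances; with the fitted objects available, `anova(MissAm.glm_v1, MissAm.glm, test = "Chisq")` should give the same comparison.

```r
# Residual deviances and degrees of freedom copied from the two summaries
dev_reduced <- 37.981   # 46 df, without Latitude and Longitude
dev_full    <- 35.972   # 44 df, all five predictors

lr_stat <- dev_reduced - dev_full
df_diff <- 46 - 44
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

print(c(LR = lr_stat, df = df_diff, p = p_value))
```

A large p-value here means that dropping Latitude and Longitude does not significantly worsen the fit, which agrees with the step output.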

Marginal Plots For Full Model

mmps(MissAm.glm,layout=c(2,3),key=T)

Marginal Plots For The Model Without Latitude And Longitude

mmps(MissAm.glm_v1,layout=c(2,3),key=T)

There is not much difference between the two sets of plots. In both sets of marginal plots, the curve drawn from the model and the curve drawn from the data stay between 0 and 1 for all variables.

Answer (b):

The leverage \(h_i\) measures how far the \(x\) values of the \(i^{th}\) data point lie from the mean of the \(x\) values over all \(n\) data points. A point whose leverage exceeds \(2\times \frac{number\ of\ variables + 1}{number\ of\ observations}\) is considered a high leverage point. Leverage values are also known as hat values; we obtain them with the function hatvalues.
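
For a logistic regression the hat values are the diagonal of the weighted hat matrix \(W^{1/2}X(X^TWX)^{-1}X^TW^{1/2}\), where \(W\) holds the working weights of the fit. The sketch below checks this against hatvalues on a stand-in data set (R's built-in `mtcars`, an assumption made only so the example is self-contained); it is not the Miss America model.

```r
# Stand-in logistic regression on a built-in data set
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)

X  <- model.matrix(fit)        # design matrix, including the intercept column
w  <- fit$weights              # working weights from the final IWLS iteration
WX <- sqrt(w) * X              # each row of X scaled by sqrt(weight)

# Diagonal of W^(1/2) X (X'WX)^(-1) X' W^(1/2)
h_manual  <- diag(WX %*% solve(crossprod(WX)) %*% t(WX))
h_builtin <- hatvalues(fit)

# Rule-of-thumb cutoff: 2 * (number of predictors + 1) / n
cutoff <- 2 * ncol(X) / nrow(X)
print(which(h_builtin > cutoff))
```

The hat values always sum to the number of estimated coefficients, which is why the cutoff is twice their average.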

Let’s get the hat values for the model without the Latitude and Longitude variables. Residuals are standardized by dividing by \(\sqrt{1 - h_i}\). The code below divides the Pearson residual \(p_i\) by this factor, so strictly the result is a standardized Pearson residual; a standardized deviance residual would use the deviance residual in the numerator instead.

\[Standardized\ Pearson\ Residual(r_i) = \frac{p_i}{\sqrt{(1 - h_i)}}\]
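
Pearson and deviance residuals can both be standardized with the same \(\sqrt{1 - h_i}\) factor, and base R's rstandard offers both through its `type` argument. The sketch below shows the two side by side on a stand-in model fitted to the built-in `mtcars` data (an assumption for self-containment, not the Miss America model).

```r
# Stand-in logistic regression on a built-in data set
fit <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
h   <- hatvalues(fit)

# Manual standardization of each residual type
std_pearson  <- residuals(fit, type = "pearson")  / sqrt(1 - h)
std_deviance <- residuals(fit, type = "deviance") / sqrt(1 - h)

# Built-in equivalents
rs_pearson  <- rstandard(fit, type = "pearson")
rs_deviance <- rstandard(fit, type = "deviance")

print(head(cbind(std_pearson, std_deviance)))
```

The two versions usually agree in sign but can differ noticeably in size for poorly fitted points, so it is worth stating which one a plot shows.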

Leverage points can be identified using influencePlot function from car package or calculated manually.

#Cutoff for leverage
#we have 3 predictors (plus the intercept) and 50 observations
highLeverageHat = 2 * (3+1)/50

#Leverage values
MissAm.df$hatVal <- hatvalues(MissAm.glm_v1)

#standardized Pearson residuals (stored as sdr)
#Get pearson residuals
MissAm.df$pearsonResd <- residuals(MissAm.glm_v1,'pearson')

MissAm.df$sdr <- MissAm.df$pearsonResd / (sqrt(1 - MissAm.df$hatVal))

#Cook's distance
MissAm.df$cookd <- cooks.distance(MissAm.glm_v1)

#Residual cutoffs
#flag data points more than 2 standard deviations from the mean standardized residual
highLeverageSdrU <- mean(MissAm.df$sdr) + (2*sd(MissAm.df$sdr))
highLeverageSdrL <- mean(MissAm.df$sdr) - (2*sd(MissAm.df$sdr))

#Cook's distance cutoffs
#flag data points more than 2 standard deviations from the mean Cook's distance
highLeverageCookdU <- mean(MissAm.df$cookd) + (2*sd(MissAm.df$cookd))
highLeverageCookdL <- mean(MissAm.df$cookd) - (2*sd(MissAm.df$cookd))

MissAm.df$Outlier <- ifelse((MissAm.df$hatVal > highLeverageHat | MissAm.df$sdr >  highLeverageSdrU | MissAm.df$sdr <  highLeverageSdrL | MissAm.df$cookd > highLeverageCookdU | MissAm.df$cookd < highLeverageCookdL),'Yes','No')

Identifying leverage data points using the influencePlot function

influencePlot(MissAm.glm_v1, col="red",id.n=5)

##       StudRes        Hat      CookD
## 8  -2.2315944 0.22191328 0.45381657
## 11  1.3549485 0.15762426 0.06839409
## 23 -1.2075931 0.17274282 0.05612712
## 25 -2.2347980 0.04654186 0.11263216
## 28  2.2400994 0.07957585 0.18272920
## 34 -0.7564691 0.15896707 0.01655727
## 35 -2.4581400 0.05002072 0.18585607
## 39  0.5455872 0.21222184 0.01187986
## 45 -0.7364479 0.17412150 0.01749629

Manual calculation to identify leverage data points.

ggplot(data=MissAm.df, aes(hatVal,sdr)) + 
  geom_point(aes(col=Outlier)) + 
  scale_color_manual(values=c("black", "red")) +
  geom_vline(xintercept=highLeverageHat, color="blue") +
  geom_hline(yintercept=c(highLeverageSdrU, highLeverageSdrL), color="blue") +
  geom_text_repel(data=filter(MissAm.df, (Outlier == 'Yes')), aes(hatVal,sdr, label=State), size=3) +
  labs(title = "High Leverage Data Points Using GGPlot - Manually") + xlab("Leverage(Hat-Values)") +
  ylab("Standardized Pearson Residuals") +
  annotate("text", x = 0.04, y = -2.3, label = 'SDR - Lower Bound', colour="blue", size = 3) + 
  annotate("text", x = 0.04, y = 2.3, label = 'SDR - Upper Bound', colour="blue", size = 3) +
  annotate("text", x = 0.18, y = 2.5, label = 'High Leverage Hat Value', colour="blue", size = 3)

MissAm.df %>% 
  select(State, Top10, InTop10, LogPopulation, LogContestants, LogTotalArea, Latitude, Longitude, pearsonResd, hatVal, sdr, cookd, Outlier) %>% 
  filter(Outlier == 'Yes') %>% 
  kable("html",caption = "Miss America Contest - High Leverage Data Points") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", font_size = 12) %>%
  scroll_box(width = "100%", height = "250px")
Miss America Contest - High Leverage Data Points
State Top10 InTop10 LogPopulation LogContestants LogTotalArea Latitude Longitude pearsonResd hatVal sdr cookd Outlier
Delaware 0 0 10.3397 2.852631 7.8196 39.1333 75.467 -2.2253921 0.2219133 -2.5228564 0.4538166 Yes
Minnesota 0 0 12.1286 2.724579 11.3730 44.8833 93.217 -0.9430982 0.1727428 -1.0368994 0.0561271 Yes
Missouri 0 0 12.1716 3.526361 11.1520 38.5667 92.183 -2.9664769 0.0465419 -3.0380167 0.1126322 Yes
Nevada 1 1 10.9462 2.549445 11.6133 39.1667 119.767 2.7895320 0.0795758 2.9076180 0.1827292 Yes
Ohio 0 0 12.9287 3.212187 10.7105 40.0000 82.883 -3.6623304 0.0500207 -3.7575127 0.1858561 Yes
Rhode Island 1 1 10.7402 2.656757 7.3428 41.7333 71.433 0.3727730 0.2122218 0.4199934 0.0118799 Yes
Vermont 0 0 10.0402 2.335375 9.1710 44.2667 72.567 -0.5235920 0.1741215 -0.5761491 0.0174963 Yes
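
The sdr and cookd columns in the table can be reproduced from the Pearson residual and the hat value. The sketch below does this for the Delaware row; the Cook's distance expression is, to my understanding, the one R uses for a GLM with dispersion 1, and the variable names are illustrative.

```r
# Delaware's values copied from the table above
pearson <- -2.2253921   # Pearson residual
hat     <- 0.2219133    # hat value (leverage)
n_coef  <- 4            # intercept + 3 predictors in the reduced model

# Standardized Pearson residual
sdr <- pearson / sqrt(1 - hat)

# Cook's distance for a GLM with dispersion 1
cookd <- (pearson / (1 - hat))^2 * hat / n_coef

print(c(sdr = sdr, cookd = cookd))
```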

Leverage Data Points

  • Rhode Island: the capital is at a relatively high latitude (41.7333), and LogTotalArea (7.3428) and LogContestants (2.656757) are both low, yet its contestants made the Top10 list. Its hat value (0.2122) is above the 0.16 cutoff, but its standardized residual is small (0.42), so it is a high leverage point rather than an outlier.

  • Delaware: LogTotalArea (7.8196) is far below the first quartile (10.49), and its contestants never made the Top10 list in nine years. It combines a high hat value (0.2219) with a large negative residual (-2.52) and the largest Cook's distance in the table (0.4538), so it looks like both a high leverage point and an outlier.

  • Ohio: LogPopulation (12.9287) is above the third quartile and LogContestants (3.212187) is slightly above the median, yet its contestants never made the Top10 list in nine years. Its hat value is low (0.0500) but its residual is the largest in magnitude (-3.76), so it is an outlier rather than a high leverage point.

  • Missouri: LogContestants (3.526361) is above the third quartile, yet its contestants never made the Top10 list in nine years. Its hat value is low (0.0465) and its residual is large (-3.04), so it is an outlier rather than a high leverage point.

  • Vermont: LogContestants (2.335375), LogTotalArea (9.1710) and LogPopulation (10.0402) are all low, and its contestants never made the Top10 list in nine years. Its hat value (0.1741) is above the cutoff but its residual is small (-0.58), so it is a high leverage point rather than an outlier.

  • Minnesota: LogContestants (2.724579) is low relative to LogTotalArea (11.3730) and LogPopulation (12.1286), and its contestants never made the Top10 list in nine years. Its hat value (0.1727) is above the cutoff and its residual is moderate (-1.04), so it is a high leverage point rather than an outlier.

  • Nevada: LogContestants (2.549445) is very low, yet contestants from the state made the Top10 list. Its hat value (0.0796) is below the cutoff and its residual is large (2.91), so it is an outlier but not a bad leverage point.
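
The distinction between leverage points and outliers can be made mechanical. The sketch below classifies the seven flagged states from the table using the hat cutoff of 2 * (3+1)/50 and, as a swapped-in convention, an absolute standardized residual above 2. That residual rule is an assumption for illustration and is not the mean plus or minus two standard deviations rule used in the code above. The data frame `flagged` and its new columns are illustrative names.

```r
# hatVal and sdr copied from the high leverage table above
flagged <- data.frame(
  State  = c("Delaware", "Minnesota", "Missouri", "Nevada",
             "Ohio", "Rhode Island", "Vermont"),
  hatVal = c(0.2219133, 0.1727428, 0.0465419, 0.0795758,
             0.0500207, 0.2122218, 0.1741215),
  sdr    = c(-2.5228564, -1.0368994, -3.0380167, 2.9076180,
             -3.7575127, 0.4199934, -0.5761491),
  stringsAsFactors = FALSE
)

hat_cutoff <- 2 * (3 + 1) / 50                    # 0.16, as in the text
flagged$HighLeverage  <- flagged$hatVal > hat_cutoff
flagged$LargeResidual <- abs(flagged$sdr) > 2     # assumed convention

flagged$Type <- with(flagged,
  ifelse(HighLeverage & LargeResidual, "leverage + outlier",
  ifelse(HighLeverage, "leverage only",
  ifelse(LargeResidual, "outlier only", "neither"))))

print(flagged[, c("State", "hatVal", "sdr", "Type")])
```

Under these two rules only Delaware is both a high leverage point and an outlier, which matches its large Cook's distance.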

Answer (c):

MissAm.glm <- glm(InTop10 ~ LogPopulation + LogContestants + LogTotalArea, family=binomial(link = "logit"), data = MissAm.df)
MissAm.Coe <- round(MissAm.glm$coefficients,4)
summary(MissAm.glm)
## 
## Call:
## glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea, 
##     family = binomial(link = "logit"), data = MissAm.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3100  -0.4617   0.3657   0.4814   2.0845  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)     -9.4502     6.1757  -1.530   0.1260  
## LogPopulation    1.0702     0.6007   1.782   0.0748 .
## LogContestants   2.5447     1.3779   1.847   0.0648 .
## LogTotalArea    -0.9303     0.4293  -2.167   0.0302 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 62.687  on 49  degrees of freedom
## Residual deviance: 37.981  on 46  degrees of freedom
## AIC: 45.981
## 
## Number of Fisher Scoring iterations: 5
  • The intercept (-9.4502) is smaller in magnitude than in the full model (-11.6899) after removing the Latitude and Longitude variables.

  • For every one-unit increase in LogPopulation, \(ln \bigg(\frac{p}{1-p}\bigg)\) increases by 1.0702 when all other predictor variables are held constant. In other words, as the log of a state's population increases by one unit, the log odds (logit) of the state making the Top10 list of finalists increase by 1.0702.

  • For every one-unit increase in LogContestants, \(ln \bigg(\frac{p}{1-p}\bigg)\) increases by 2.5447 when all other predictor variables are held constant. As the log of the number of contestants from a state increases by one unit, the log odds of the state making the Top10 list of finalists increase by 2.5447.

  • For every one-unit increase in LogTotalArea, \(ln \bigg(\frac{p}{1-p}\bigg)\) decreases by 0.9303 when all other predictor variables are held constant. As the log of a state's total area increases by one unit, the log odds of the state making the Top10 list of finalists decrease by 0.9303.

Model also suggests,

  • The variable LogTotalArea is significant at the 5% level.
  • The variables LogContestants and LogPopulation contribute to the model and are significant at the 10% level.
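
The same conclusions can be read from approximate 95% Wald confidence intervals on the log-odds scale: an interval that excludes zero corresponds to significance at the 5% level. This is a sketch from the printed estimates and standard errors; the names `est`, `se`, `ci` and `excludes_zero` are illustrative.

```r
# Estimates and standard errors copied from the reduced-model summary
est <- c(LogPopulation = 1.0702, LogContestants = 2.5447, LogTotalArea = -0.9303)
se  <- c(0.6007, 1.3779, 0.4293)

z_crit <- qnorm(0.975)                      # about 1.96
ci <- cbind(lower = est - z_crit * se,
            upper = est + z_crit * se)

# TRUE when the interval lies entirely on one side of zero
excludes_zero <- ci[, "lower"] > 0 | ci[, "upper"] < 0

print(round(ci, 4))
print(excludes_zero)
```

Only the LogTotalArea interval excludes zero, in line with its p-value of 0.0302.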

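The null deviance is 62.687 on 49 degrees of freedom and the residual deviance is 37.981 on 46 degrees of freedom. The drop of 24.706 for 3 added parameters indicates that the predictors, taken together, improve on the intercept-only model. For a fixed data set, a lower deviance means a closer fit.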