Data621-Week08-Discussion

Answer (a):

Data

The data file was downloaded from http://www.stat.tamu.edu/~sheather/book/data_sets.php. The original file is tab-separated; I converted it into a comma-separated CSV file.

options(scipen=999)
library(dplyr)
library(tidyverse)
library(knitr)
library(kableExtra)
library(car)
library(ggrepel)
library(ggplot2)

#Load data (comma-separated copy of the book's tab-separated file)

MissAm.df <- read.csv("https://raw.githubusercontent.com/akulapa/Data621-Week08-Discussion/master/MissAmericato2008.csv", header = TRUE, stringsAsFactors = FALSE)

Dataset structure: the variable Top10 is the number of times a state placed in the Top 10 in the contests from 2000 to 2008, inclusive (nine contests). The number of times a state did not place in the Top 10 is therefore 9 - Top10.

str(MissAm.df)

## 'data.frame':    51 obs. of  7 variables:
##  $ abbreviation  : chr  "AL" "AK" "AZ" "AR" ...
##  $ Top10         : int  6 0 0 4 5 0 2 0 2 3 ...
##  $ LogPopulation : num  11.9 9.8 12.1 11.3 14 ...
##  $ LogContestants: num  3.9 2.71 2.86 3.77 3.94 ...
##  $ LogTotalArea  : num  10.9 13.4 11.6 10.9 12 ...
##  $ Latitude      : num  32.4 58.4 33.4 34.7 38.5 ...
##  $ Longitude     : num  86.4 134.6 112 92.2 121.5 ...
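As a small sketch of how the counts relate, the first ten Top10 values from the str() output above can be split into placements and misses (9 - Top10) and turned into the binary indicator InTop10 that the model later uses as its response. The vector is copied by hand from that output; `n_contests` is only an illustrative label for the nine contests.

```r
# First ten Top10 values as printed by str(MissAm.df)
top10 <- c(6, 0, 0, 4, 5, 0, 2, 0, 2, 3)
n_contests <- 9                      # contests from 2000 to 2008, inclusive

not_top10 <- n_contests - top10      # contests in which the state missed the Top 10
in_top10  <- as.integer(top10 > 0)   # 1 if the state placed at least once

counts <- data.frame(top10, not_top10, in_top10)
print(counts)
```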

summary(MissAm.df)

##  abbreviation           Top10       LogPopulation    LogContestants 
##  Length:51          Min.   :0.000   Min.   : 9.693   Min.   :1.910  
##  Class :character   1st Qu.:0.000   1st Qu.:10.843   1st Qu.:2.857  
##  Mode  :character   Median :2.000   Median :11.757   Median :3.091  
##                     Mean   :1.765   Mean   :11.682   Mean   :3.136  
##                     3rd Qu.:3.000   3rd Qu.:12.354   3rd Qu.:3.450  
##                     Max.   :7.000   Max.   :14.001   Max.   :4.040  
##   LogTotalArea      Latitude       Longitude     
##  Min.   : 4.22   Min.   :21.33   Min.   : 69.80  
##  1st Qu.:10.49   1st Qu.:35.99   1st Qu.: 78.06  
##  Median :10.94   Median :39.75   Median : 89.67  
##  Mean   :10.62   Mean   :39.40   Mean   : 93.14  
##  3rd Qu.:11.34   3rd Qu.:42.77   3rd Qu.:102.78  
##  Max.   :13.40   Max.   :58.37   Max.   :157.92

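#Add full state names; state.abb and state.name are base R vectors covering the 50 states
state.df <- data.frame(abbreviation = state.abb, State = state.name, stringsAsFactors = FALSE)
#InTop10 = 1 if the state placed in the Top 10 at least once from 2000 to 2008
MissAm.df <- MissAm.df %>% 
  mutate(InTop10 = ifelse(Top10 > 0, 1, 0)) %>% 
  inner_join(state.df, by = "abbreviation")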
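inner_join() keeps only rows whose abbreviation appears in both tables, so any abbreviation missing from R's built-in state.abb (the 50 states) is dropped without warning. This is why the models below report 49 null degrees of freedom (50 observations) although str() shows 51 rows. A quick base-R check, sketched with a hypothetical `abbr` vector standing in for MissAm.df$abbreviation (run it on the real column before the join):

```r
# Hypothetical stand-in for MissAm.df$abbreviation
abbr <- c("AL", "AK", "AZ", "DC", "WY")

# Abbreviations with no match in state.abb are the rows inner_join() drops
dropped <- setdiff(abbr, state.abb)
print(dropped)
```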

MissAm.df %>% 
  select(State, Top10, InTop10, LogPopulation, LogContestants, LogTotalArea, Latitude, Longitude) %>% 
  kable("html",caption = "Miss America Contest - Number of Times State Produced Top 10 Finalist (2000 - 2008)") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", font_size = 12) %>%
  scroll_box(width = "100%", height = "250px")

Miss America Contest - Number of Times State Produced Top 10 Finalist (2000 - 2008)
State	Top10	InTop10	LogPopulation	LogContestants	LogTotalArea	Latitude	Longitude
Alabama	6	1	11.9249	3.895894	10.8670	32.3833	86.367
Alaska	0	0	9.8011	2.708050	13.4049	58.3667	134.583
Arizona	0	0	12.0543	2.862201	11.6439	33.4333	112.017
Arkansas	4	1	11.2702	3.766997	10.8814	34.7333	92.233
California	5	1	14.0005	3.935739	12.0058	38.5167	121.500
Colorado	0	0	11.8820	2.944439	11.5530	39.7500	104.867
Connecticut	2	1	11.5742	2.944439	8.6203	41.7333	72.650
Delaware	0	0	10.3397	2.852631	7.8196	39.1333	75.467
Florida	3	1	13.0882	3.725693	11.0937	30.3833	84.367
Georgia	4	1	12.5558	3.875359	10.9925	33.6500	84.433
Hawaii	3	1	10.6205	2.564949	9.2994	21.3333	157.917
Idaho	0	0	10.6565	2.970414	11.3334	43.5667	116.217
Illinois	2	1	13.0341	3.238678	10.9667	39.8333	89.667
Indiana	2	1	12.3026	3.126761	10.5028	39.7333	86.283
Iowa	1	1	11.6306	2.879199	10.9380	41.5333	93.650
Kansas	1	1	11.4371	2.978925	11.3178	39.0667	95.633
Kentucky	2	1	11.7568	3.433987	10.6068	38.2000	84.867
Louisiana	2	1	12.1042	3.465736	10.8559	30.5333	91.150
Maine	0	0	10.6066	2.484907	10.4740	44.3167	69.800
Maryland	3	1	12.1239	3.218876	9.4260	39.0000	76.083
Massachusetts	2	1	12.4428	2.917771	9.2644	42.3667	71.033
Michigan	3	1	12.8257	3.367296	11.4795	42.7833	84.600
Minnesota	0	0	12.1286	2.724579	11.3730	44.8833	93.217
Mississippi	3	1	11.6233	3.703768	10.7879	32.3167	90.083
Missouri	0	0	12.1716	3.526361	11.1520	38.5667	92.183
Montana	0	0	10.2920	2.852631	11.8985	46.6000	112.000
Nebraska	0	0	11.0494	2.871680	11.2561	40.8500	96.750
Nevada	1	1	10.9462	2.549445	11.6133	39.1667	119.767
New Hampshire	1	1	10.6631	2.785011	9.1431	43.2000	71.500
New Jersey	2	1	12.5266	3.242592	9.0735	40.2167	74.767
New Mexico	0	0	11.0420	3.091042	11.7084	35.6167	106.083
New York	3	1	13.4873	3.072693	10.9070	42.7500	73.800
North Carolina	2	1	12.5005	3.332205	10.8934	35.8667	78.783
North Dakota	0	0	10.1843	3.032546	11.1662	46.7667	100.750
Ohio	0	0	12.9287	3.212187	10.7105	40.0000	82.883
Oklahoma	5	1	11.6138	3.784190	11.1548	35.4000	97.600
Oregon	1	1	11.6680	3.100092	11.4966	44.9167	123.017
Pennsylvania	4	1	13.0212	3.178054	10.7376	40.2000	76.767
Rhode Island	1	1	10.7402	2.656757	7.3428	41.7333	71.433
South Carolina	1	1	11.8946	3.725693	10.3741	33.9500	81.117
South Dakota	0	0	10.2306	2.674149	11.2531	44.3833	100.283
Tennessee	1	1	12.1059	3.564827	10.6488	36.1167	86.683
Texas	7	1	13.4640	3.811097	12.5009	30.3000	97.700
Utah	2	1	11.5123	4.040123	11.3492	40.7667	111.967
Vermont	0	0	10.0402	2.335375	9.1710	44.2667	72.567
Virginia	3	1	12.4058	3.258097	10.6637	37.5000	77.333
Washington	1	1	12.2114	2.995732	11.1747	46.9667	122.900
West Virginia	2	1	10.9746	3.091042	10.0953	38.3667	81.600
Wisconsin	3	1	12.2358	3.212187	11.0898	43.1333	89.333
Wyoming	0	0	9.6926	1.909542	11.4908	41.1500	104.817

Model

Coefficients of the full logistic regression model (a generalized linear model with a binomial family and logit link).

#Build model

MissAm.glm <- glm(InTop10 ~ LogPopulation + LogContestants + LogTotalArea + Latitude + Longitude, family=binomial(link = "logit"), data = MissAm.df)

MissAm.glm

## 
## Call:  glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea + 
##     Latitude + Longitude, family = binomial(link = "logit"), 
##     data = MissAm.df)
## 
## Coefficients:
##    (Intercept)   LogPopulation  LogContestants    LogTotalArea  
##      -11.68988         1.28041         2.81720        -1.30420  
##       Latitude       Longitude  
##       -0.01544         0.03781  
## 
## Degrees of Freedom: 49 Total (i.e. Null);  44 Residual
## Null Deviance:       62.69 
## Residual Deviance: 35.97     AIC: 47.97

Summary of the model.

summary(MissAm.glm)

## 
## Call:
## glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea + 
##     Latitude + Longitude, family = binomial(link = "logit"), 
##     data = MissAm.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.2553  -0.3656   0.3466   0.5164   1.9389  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)  
## (Intercept)    -11.68988    9.58609  -1.219   0.2227  
## LogPopulation    1.28041    0.64357   1.990   0.0466 *
## LogContestants   2.81720    1.65071   1.707   0.0879 .
## LogTotalArea    -1.30420    0.62712  -2.080   0.0376 *
## Latitude        -0.01544    0.10896  -0.142   0.8873  
## Longitude        0.03781    0.03688   1.025   0.3052  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 62.687  on 49  degrees of freedom
## Residual deviance: 35.972  on 44  degrees of freedom
## AIC: 47.972
## 
## Number of Fisher Scoring iterations: 5

#Manually calculating the linear predictor (log odds) and the fitted probabilities
# man.pre<- MissAm.glm$coefficients[1] + 
#   MissAm.glm$coefficients[2] * MissAm.df$LogPopulation +
#   MissAm.glm$coefficients[3] * MissAm.df$LogContestants +
#   MissAm.glm$coefficients[4] * MissAm.df$LogTotalArea +
#   MissAm.glm$coefficients[5] * MissAm.df$Latitude +
#   MissAm.glm$coefficients[6] * MissAm.df$Longitude
# 
# man.fitted <- 1/(1 + exp(-man.pre))   # same as plogis(man.pre), i.e. fitted(MissAm.glm)

#Get coefficients
MissAm.Coe <- round(MissAm.glm$coefficients,4)

Summary Explanation

The first part of the output, Call, shows the model formula: the response variable and the predictor variables.

Logistic regression equation

\[ln \bigg(\frac{P}{1-P}\bigg) = \beta_0 + \beta_1{X_1} + \beta_2{X_2} + \beta_3{X_3} + \beta_4{X_4} + \beta_5{X_5}\]

\[ln \bigg(\frac{P}{1-P}\bigg) = -11.6899 + 1.2804\,\text{LogPopulation} + 2.8172\,\text{LogContestants} - 1.3042\,\text{LogTotalArea} - 0.0154\,\text{Latitude} + 0.0378\,\text{Longitude}\]

Solving for \(P\) gives the probability:

\[P = \frac{e^{\beta_0 + \beta_1{X_1} + \beta_2{X_2} + \beta_3{X_3} + \beta_4{X_4} + \beta_5{X_5}}}{1 + {e^{\beta_0 + \beta_1{X_1} + \beta_2{X_2} + \beta_3{X_3} + \beta_4{X_4} + \beta_5{X_5}}}}\]
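A worked instance of the probability formula: plugging Alabama's values from the data table above into the full-model coefficients gives its log odds and fitted probability. plogis() is base R's inverse logit, equivalent to \(e^x/(1+e^x)\). With the fitted object, predict(MissAm.glm, type = "response") returns this quantity for every state; the hand-copied numbers here are only for illustration.

```r
# Full-model coefficients, copied from the printed glm output
beta <- c(Intercept = -11.68988, LogPopulation = 1.28041, LogContestants = 2.81720,
          LogTotalArea = -1.30420, Latitude = -0.01544, Longitude = 0.03781)

# Alabama's predictor values from the data table (1 multiplies the intercept)
alabama <- c(1, 11.9249, 3.895894, 10.8670, 32.3833, 86.367)

eta <- sum(beta * alabama)              # log odds, ln(P / (1 - P))
p_manual <- exp(eta) / (1 + exp(eta))   # probability formula above
p_plogis <- plogis(eta)                 # base R inverse logit

print(c(log_odds = eta, probability = p_plogis))
```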

For every unit increase in LogPopulation, \(ln \bigg(\frac{P}{1-P}\bigg)\) increases by 1.2804 when all other predictors are held constant. In other words, as the log of a state's population increases by one unit, the log odds (logit) of the state placing in the Top 10 increase by 1.2804, so LogPopulation has a positive effect.
For every unit increase in LogContestants, \(ln \bigg(\frac{P}{1-P}\bigg)\) increases by 2.8172 when all other predictors are held constant. As the log of the number of contestants from a state increases by one unit, the log odds of the state placing in the Top 10 increase by 2.8172, so LogContestants has a positive effect.
For every unit increase in LogTotalArea, \(ln \bigg(\frac{P}{1-P}\bigg)\) decreases by 1.3042 when all other predictors are held constant. As the log of a state's total area increases by one unit, the log odds of the state placing in the Top 10 decrease by 1.3042, so LogTotalArea has a negative effect.
For every one-degree increase in the Latitude of the state capital, \(ln \bigg(\frac{P}{1-P}\bigg)\) decreases by 0.0154 when all other predictors are held constant, so Latitude has a small negative effect.
For every one-degree increase in the Longitude of the state capital (measured in degrees west, so larger values are farther west), \(ln \bigg(\frac{P}{1-P}\bigg)\) increases by 0.0378 when all other predictors are held constant, so Longitude has a small positive effect.
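Because each coefficient is a change in log odds, exponentiating it gives a multiplicative change in the odds (an odds ratio), which is often easier to read. A sketch using the slope estimates copied from the full-model output; with the fitted object, exp(coef(MissAm.glm)) gives the same numbers.

```r
# Full-model slope estimates copied from the glm output
beta <- c(LogPopulation = 1.28041, LogContestants = 2.81720, LogTotalArea = -1.30420,
          Latitude = -0.01544, Longitude = 0.03781)

# Odds ratio for a one-unit increase in each predictor, others held constant
odds_ratio <- exp(beta)

# Percentage change in the odds of a Top 10 finish
pct_change <- 100 * (odds_ratio - 1)
print(round(cbind(odds_ratio, pct_change), 3))
```

An odds ratio above 1 means the odds of a Top 10 finish rise with the predictor; below 1 means they fall.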

The model also suggests that:

Variables LogPopulation and LogTotalArea are significant at the 5% level.
Variable LogContestants contributes to the model and is significant at the 10% level.
Variables Latitude and Longitude have high p-values (0.8873 and 0.3052), so they are not significant in the model.
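The z value in the summary is the Wald statistic, Estimate / Std. Error, and Pr(>|z|) is the two-sided tail probability of the standard normal distribution. Recomputing the LogPopulation row from the printed numbers:

```r
# LogPopulation row of the full-model summary
estimate  <- 1.28041
std_error <- 0.64357

z <- estimate / std_error            # Wald z statistic
p_value <- 2 * pnorm(-abs(z))        # two-sided p-value from the standard normal

print(c(z = z, p_value = p_value))
```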

The null deviance is 62.687 on 49 degrees of freedom and the residual deviance is 35.972 on 44 degrees of freedom. This large drop suggests the predictors are needed in the model; the lower the deviance, the better the fit.
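The drop from the null to the residual deviance can be turned into a likelihood ratio test: if all five slopes were zero, the difference would be approximately chi-square with 49 - 44 = 5 degrees of freedom. A sketch using the printed deviances:

```r
null_dev  <- 62.687   # null deviance, 49 df
resid_dev <- 35.972   # residual deviance, 44 df

lr_stat <- null_dev - resid_dev       # likelihood ratio statistic
lr_df   <- 49 - 44                    # number of slope parameters tested
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = lr_df, p_value = p_value))
```

A p-value this small indicates that the predictors, taken together, clearly improve on the intercept-only model.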

The Akaike information criterion (AIC) measures the quality of a model by trading off goodness of fit against the number of parameters; the lower the AIC, the better the model. Since this is the full model, let's use the step function to see whether AIC improves when variables are removed.

step(MissAm.glm, test="LRT")

## Start:  AIC=47.97
## InTop10 ~ LogPopulation + LogContestants + LogTotalArea + Latitude + 
##     Longitude
## 
##                  Df Deviance    AIC    LRT Pr(>Chi)  
## - Latitude        1   35.992 45.992 0.0202  0.88703  
## - Longitude       1   37.292 47.292 1.3194  0.25071  
## <none>                35.972 47.972                  
## - LogContestants  1   39.712 49.712 3.7397  0.05313 .
## - LogPopulation   1   40.709 50.709 4.7364  0.02953 *
## - LogTotalArea    1   42.067 52.067 6.0949  0.01356 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=45.99
## InTop10 ~ LogPopulation + LogContestants + LogTotalArea + Longitude
## 
##                  Df Deviance    AIC    LRT Pr(>Chi)   
## - Longitude       1   37.981 45.981 1.9886 0.158485   
## <none>                35.992 45.992                   
## - LogPopulation   1   40.992 48.992 4.9993 0.025358 * 
## - LogContestants  1   41.302 49.302 5.3095 0.021209 * 
## - LogTotalArea    1   44.344 52.344 8.3518 0.003853 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Step:  AIC=45.98
## InTop10 ~ LogPopulation + LogContestants + LogTotalArea
## 
##                  Df Deviance    AIC    LRT Pr(>Chi)  
## <none>                37.981 45.981                  
## - LogPopulation   1   41.708 47.708 3.7274  0.05353 .
## - LogContestants  1   42.521 48.521 4.5404  0.03310 *
## - LogTotalArea    1   44.446 50.446 6.4649  0.01100 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

## 
## Call:  glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea, 
##     family = binomial(link = "logit"), data = MissAm.df)
## 
## Coefficients:
##    (Intercept)   LogPopulation  LogContestants    LogTotalArea  
##        -9.4502          1.0702          2.5447         -0.9303  
## 
## Degrees of Freedom: 49 Total (i.e. Null);  46 Residual
## Null Deviance:       62.69 
## Residual Deviance: 37.98     AIC: 45.98

Removing Latitude and Longitude lowers the AIC from 47.97 to 45.98. The step output suggests that Latitude and Longitude add little to the model, so let's build a model without them.
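For a logistic regression with a 0/1 response, AIC equals the residual deviance plus twice the number of estimated coefficients, so the comparison made by step() can be checked by hand. `aic_from_deviance` is a hypothetical helper for this sketch:

```r
# Hypothetical helper: AIC from residual deviance and number of coefficients
aic_from_deviance <- function(deviance, n_coef) deviance + 2 * n_coef

aic_full    <- aic_from_deviance(35.972, 6)  # intercept + 5 predictors
aic_reduced <- aic_from_deviance(37.981, 4)  # intercept + 3 predictors

print(c(full = aic_full, reduced = aic_reduced))
```

Dropping the two variables costs about 2 units of deviance but saves 4 units of penalty, which is why the reduced model has the lower AIC.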

MissAm.glm_v1 <- glm(InTop10 ~ LogPopulation + LogContestants + LogTotalArea, family=binomial(link = "logit"), data = MissAm.df)

summary(MissAm.glm_v1)

## 
## Call:
## glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea, 
##     family = binomial(link = "logit"), data = MissAm.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3100  -0.4617   0.3657   0.4814   2.0845  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)     -9.4502     6.1757  -1.530   0.1260  
## LogPopulation    1.0702     0.6007   1.782   0.0748 .
## LogContestants   2.5447     1.3779   1.847   0.0648 .
## LogTotalArea    -0.9303     0.4293  -2.167   0.0302 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 62.687  on 49  degrees of freedom
## Residual deviance: 37.981  on 46  degrees of freedom
## AIC: 45.981
## 
## Number of Fisher Scoring iterations: 5
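step() dropped Latitude and Longitude one at a time. They can also be tested jointly by comparing the two nested models; with the fitted objects this is anova(MissAm.glm_v1, MissAm.glm, test = "Chisq"). The arithmetic behind that test, using the printed residual deviances:

```r
dev_reduced <- 37.981   # InTop10 ~ LogPopulation + LogContestants + LogTotalArea (46 df)
dev_full    <- 35.972   # full model with Latitude and Longitude (44 df)

lr_stat <- dev_reduced - dev_full     # deviance explained by the two extra terms
lr_df   <- 46 - 44
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = lr_df, p_value = p_value))
```

A p-value of roughly 0.37 supports dropping both variables together.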

Marginal Model Plots for the Full Model

mmps(MissAm.glm,layout=c(2,3),key=T)

Marginal Model Plots for the Model Without Latitude and Longitude

mmps(MissAm.glm_v1,layout=c(2,3),key=T)

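There is little difference between the two sets of plots. In both, for every variable, the curve implied by the model closely follows the nonparametric curve fitted to the data across the 0 to 1 range, which suggests that both models fit the data adequately.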

Answer (b):

The leverage \(h_i\) measures how far the predictor values of the \(i^{th}\) data point are from the mean predictor values of all \(n\) data points; in a logistic regression it also depends on the fitted probabilities through the IRLS weights. A point whose leverage exceeds \(2\times \frac{number\ of\ predictors + 1}{number\ of\ observations}\) is considered a high-leverage point. Leverage values are also known as hat values, and we obtain them with the function hatvalues.
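For a GLM, the hat values are the diagonal of \(H = W^{1/2} X (X^T W X)^{-1} X^T W^{1/2}\), where \(W\) holds the working weights from the final IRLS iteration (close to \(p_i(1-p_i)\) for the logit link). The sketch below checks this against hatvalues() on simulated data (an assumption for illustration, not the Miss America data) and applies the \(2(p+1)/n\) cutoff.

```r
set.seed(42)
# Simulated data for illustration only
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(0.5 + x1 - 0.8 * x2))
fit <- glm(y ~ x1 + x2, family = binomial(link = "logit"))

# Hat matrix of the weighted design: W^(1/2) X (X'WX)^(-1) X' W^(1/2)
X  <- model.matrix(fit)
Xw <- X * sqrt(fit$weights)          # scales row i by sqrt(w_i)
H  <- Xw %*% solve(crossprod(Xw)) %*% t(Xw)
h_manual <- diag(H)

# Rule-of-thumb cutoff; ncol(X) = number of predictors + 1 for the intercept
cutoff <- 2 * ncol(X) / n
high_leverage <- which(h_manual > cutoff)
print(high_leverage)
```

The hat values always sum to the number of coefficients, so their average is \((p+1)/n\) and the cutoff flags points with more than twice the average leverage.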

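Let's get the hat values for the model without Latitude and Longitude. Raw residuals do not have constant variance (the variance of the \(i^{th}\) residual is roughly \(1 - h_i\)), so we standardize each Pearson residual \(p_i\) by dividing it by \(\sqrt{1 - h_i}\). Strictly, this gives the standardized Pearson residual; the standardized deviance residual uses the deviance residual \(d_i\) in the numerator instead. The code below computes the Pearson version and stores it as sdr.

\[Standardized\ Pearson\ Residual\ (r_{P_i}) = \frac{p_i}{\sqrt{1 - h_i}}\]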
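Dividing by \(\sqrt{1 - h_i}\) is what rstandard() does for a GLM: type = "pearson" standardizes the Pearson residuals (the quantity used in this analysis), while the default type = "deviance" standardizes the deviance residuals. A check on simulated data (an assumption, not the contest data):

```r
set.seed(7)
# Simulated data for illustration only
n <- 60
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.3 + 1.2 * x))
fit <- glm(y ~ x, family = binomial(link = "logit"))

h <- hatvalues(fit)

# Standardized Pearson residuals: p_i / sqrt(1 - h_i)
std_pearson  <- residuals(fit, type = "pearson") / sqrt(1 - h)
# Standardized deviance residuals: d_i / sqrt(1 - h_i)
std_deviance <- residuals(fit, type = "deviance") / sqrt(1 - h)

print(head(cbind(std_pearson, std_deviance)))
```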

Leverage points can be identified with the influencePlot function from the car package, or calculated manually.

#Cutoff for high leverage: 2 * (number of predictors + 1) / number of observations
#we have 3 predictors (plus the intercept) and 50 observations
highLeverageHat = 2 * (3+1)/50

#Leverage values
MissAm.df$hatVal <- hatvalues(MissAm.glm_v1)

#Standardized Pearson residuals (stored as sdr)
#Get Pearson residuals
MissAm.df$pearsonResd <- residuals(MissAm.glm_v1,'pearson')

MissAm.df$sdr <- MissAm.df$pearsonResd / (sqrt(1 - MissAm.df$hatVal))

#Cook's distance
MissAm.df$cookd <- cooks.distance(MissAm.glm_v1)

#Residual outliers: standardized residuals more than 2 standard deviations from their mean
highLeverageSdrU <- mean(MissAm.df$sdr) + (2*sd(MissAm.df$sdr))
highLeverageSdrL <- mean(MissAm.df$sdr) - (2*sd(MissAm.df$sdr))

#Influential points: Cook's distance more than 2 standard deviations above its mean
#(Cook's distance is never negative, so only an upper bound is needed)
highLeverageCookdU <- mean(MissAm.df$cookd) + (2*sd(MissAm.df$cookd))

#Flag a state with high leverage, a large standardized residual, or a large Cook's distance
MissAm.df$Outlier <- ifelse((MissAm.df$hatVal > highLeverageHat | MissAm.df$sdr >  highLeverageSdrU | MissAm.df$sdr <  highLeverageSdrL | MissAm.df$cookd > highLeverageCookdU),'Yes','No')
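Cook's distance combines the standardized residual and the leverage. For a glm, R's cooks.distance() is equivalent to \(D_i = r_{P_i}^2 h_i / (k(1 - h_i))\), where \(r_{P_i}\) is the standardized Pearson residual and \(k\) the number of coefficients (4 here). A check against the Delaware and Ohio rows of the high-leverage table that follows:

```r
# Values copied from the high-leverage table (sdr = standardized Pearson residual)
sdr <- c(Delaware = -2.5228564, Ohio = -3.7575127)
hat <- c(Delaware = 0.2219133, Ohio = 0.0500207)
k <- 4   # intercept + LogPopulation + LogContestants + LogTotalArea

cook <- sdr^2 * hat / (k * (1 - hat))
print(round(cook, 4))
```

Delaware's distance is driven by its leverage, Ohio's by its large residual.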

Identifying leverage points using the influencePlot function

#car >= 3.0 labels points with id = list(n = ...); older versions used id.n
influencePlot(MissAm.glm_v1, col="red", id=list(n=5))

##       StudRes        Hat      CookD
## 8  -2.2315944 0.22191328 0.45381657
## 11  1.3549485 0.15762426 0.06839409
## 23 -1.2075931 0.17274282 0.05612712
## 25 -2.2347980 0.04654186 0.11263216
## 28  2.2400994 0.07957585 0.18272920
## 34 -0.7564691 0.15896707 0.01655727
## 35 -2.4581400 0.05002072 0.18585607
## 39  0.5455872 0.21222184 0.01187986
## 45 -0.7364479 0.17412150 0.01749629

Manual calculation to identify high-leverage points and outliers, plotted with ggplot2.

ggplot(data=MissAm.df, aes(hatVal,sdr)) + 
  geom_point(aes(col=Outlier)) + 
  scale_color_manual(values=c("black", "red")) +
  geom_vline(xintercept=highLeverageHat, color="blue") +
  geom_hline(yintercept=c(highLeverageSdrU, highLeverageSdrL), color="blue") +
  geom_text_repel(data=filter(MissAm.df, (Outlier == 'Yes')), aes(hatVal,sdr, label=State), size=3) +
  labs(title = "High Leverage Data Points Using GGPlot - Manually",
       x = "Leverage (Hat-Values)", y = "Standardized Pearson Residuals") +
  annotate("text", x = 0.04, y = -2.3, label = 'SDR - Lower Bound', colour="blue", size = 3) + 
  annotate("text", x = 0.04, y = 2.3, label = 'SDR - Upper Bound', colour="blue", size = 3) +
  annotate("text", x = 0.18, y = 2.5, label = 'High Leverage Hat Value', colour="blue", size = 3)

MissAm.df %>% 
  select(State, Top10, InTop10, LogPopulation, LogContestants, LogTotalArea, Latitude, Longitude, pearsonResd, hatVal, sdr, cookd, Outlier) %>% 
  filter(Outlier == 'Yes') %>% 
  kable("html",caption = "Miss America Contest - High Leverage Data Points") %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = T, position = "left", font_size = 12) %>%
  scroll_box(width = "100%", height = "250px")

Miss America Contest - High Leverage Data Points
State	Top10	InTop10	LogPopulation	LogContestants	LogTotalArea	Latitude	Longitude	pearsonResd	hatVal	sdr	cookd	Outlier
Delaware	0	0	10.3397	2.852631	7.8196	39.1333	75.467	-2.2253921	0.2219133	-2.5228564	0.4538166	Yes
Minnesota	0	0	12.1286	2.724579	11.3730	44.8833	93.217	-0.9430982	0.1727428	-1.0368994	0.0561271	Yes
Missouri	0	0	12.1716	3.526361	11.1520	38.5667	92.183	-2.9664769	0.0465419	-3.0380167	0.1126322	Yes
Nevada	1	1	10.9462	2.549445	11.6133	39.1667	119.767	2.7895320	0.0795758	2.9076180	0.1827292	Yes
Ohio	0	0	12.9287	3.212187	10.7105	40.0000	82.883	-3.6623304	0.0500207	-3.7575127	0.1858561	Yes
Rhode Island	1	1	10.7402	2.656757	7.3428	41.7333	71.433	0.3727730	0.2122218	0.4199934	0.0118799	Yes
Vermont	0	0	10.0402	2.335375	9.1710	44.2667	72.567	-0.5235920	0.1741215	-0.5761491	0.0174963	Yes

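High-Leverage Points and Outliers

Rhode Island has the smallest LogTotalArea of the 50 states (7.3428) and a low LogContestants (2.656757), which gives it high leverage (0.212). The model predicts a high probability of a Top 10 finish and the state did place, so its standardized residual is small (0.42); it is a high-leverage point that the model fits well rather than an outlier.
Delaware has the second-smallest LogTotalArea (7.8196), which gives it the highest leverage (0.222). Because a small area raises the predicted odds, the model gives Delaware a high probability of a Top 10 finish, yet it never placed in nine years. High leverage combined with a large standardized residual (-2.52) gives it the largest Cook's distance (0.454), making it a bad leverage point.
Ohio has a large LogPopulation (12.9287) and an above-average LogContestants (3.212187), so the model predicts a high probability of a Top 10 finish, yet Ohio never placed in nine years. Its leverage is low (0.050), but its standardized residual (-3.76) is large, so it is an outlier.
Missouri has a high LogContestants (3.526361, above the third quartile), so the model predicts a high probability of a Top 10 finish, yet Missouri never placed in nine years. Like Ohio, it has low leverage (0.047) and a large standardized residual (-3.04), so it is an outlier.
Vermont has one of the lowest LogContestants (2.335375) together with a small LogPopulation (10.0402) and a small LogTotalArea (9.1710); this unusual combination gives it high leverage (0.174). The model predicts a low probability of a Top 10 finish and Vermont never placed, so its standardized residual is small (-0.58); it is a high-leverage point that the model fits well.
Minnesota combines a large LogPopulation (12.1286) with a low LogContestants (2.724579, below the first quartile), which gives it high leverage (0.173). Its standardized residual (-1.04) is modest, so it is a high-leverage point rather than an outlier.
Nevada has a low LogContestants (2.549445), so the model predicts a low probability of a Top 10 finish, yet Nevada placed once. Its leverage (0.080) is below the 0.16 cutoff, so it is an outlier rather than a bad leverage point.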
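For a 0/1 response the Pearson residual is \((y_i - p_i)/\sqrt{p_i(1 - p_i)}\), so the fitted probability behind each row of the high-leverage table can be recovered from InTop10 and the Pearson residual, which makes the comments above concrete. With the fitted object, fitted(MissAm.glm_v1) gives these probabilities directly; `prob_from_pearson` is a hypothetical helper for this sketch.

```r
# Hypothetical helper: invert r = (y - p) / sqrt(p * (1 - p)) for a 0/1 response
# y = 1: r^2 = (1 - p) / p  =>  p = 1 / (1 + r^2)
# y = 0: r^2 = p / (1 - p)  =>  p = r^2 / (1 + r^2)
prob_from_pearson <- function(y, r) ifelse(y == 1, 1 / (1 + r^2), r^2 / (1 + r^2))

# InTop10 and Pearson residuals copied from the high-leverage table
y <- c(Delaware = 0, Minnesota = 0, Missouri = 0, Nevada = 1,
       Ohio = 0, RhodeIsland = 1, Vermont = 0)
r <- c(-2.2253921, -0.9430982, -2.9664769, 2.7895320,
       -3.6623304, 0.3727730, -0.5235920)

fitted_p <- prob_from_pearson(y, r)
print(round(fitted_p, 3))
```

Delaware, Missouri and Ohio receive fitted probabilities above 0.8 but never placed, and Nevada receives about 0.11 but placed once; these are the poorly fitted states.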

Answer (c):

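#Refit the reduced model (same specification as MissAm.glm_v1); this replaces the full model stored in MissAm.glm
MissAm.glm <- glm(InTop10 ~ LogPopulation + LogContestants + LogTotalArea, family=binomial(link = "logit"), data = MissAm.df)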
MissAm.Coe <- round(MissAm.glm$coefficients,4)
summary(MissAm.glm)

## 
## Call:
## glm(formula = InTop10 ~ LogPopulation + LogContestants + LogTotalArea, 
##     family = binomial(link = "logit"), data = MissAm.df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3100  -0.4617   0.3657   0.4814   2.0845  
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)  
## (Intercept)     -9.4502     6.1757  -1.530   0.1260  
## LogPopulation    1.0702     0.6007   1.782   0.0748 .
## LogContestants   2.5447     1.3779   1.847   0.0648 .
## LogTotalArea    -0.9303     0.4293  -2.167   0.0302 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 62.687  on 49  degrees of freedom
## Residual deviance: 37.981  on 46  degrees of freedom
## AIC: 45.981
## 
## Number of Fisher Scoring iterations: 5

The intercept changed from -11.6899 to -9.4502 (it moved closer to zero) after removing Latitude and Longitude.
For every unit increase in LogPopulation, \(ln \bigg(\frac{P}{1-P}\bigg)\) increases by 1.0702 when all other predictors are held constant: as the log of a state's population increases by one unit, the log odds of the state placing in the Top 10 increase by 1.0702. LogPopulation has a positive effect, although it is now significant only at the 10% level.
For every unit increase in LogContestants, \(ln \bigg(\frac{P}{1-P}\bigg)\) increases by 2.5447 when all other predictors are held constant: as the log of the number of contestants from a state increases by one unit, the log odds of the state placing in the Top 10 increase by 2.5447. LogContestants has a positive effect and is significant at the 10% level.
For every unit increase in LogTotalArea, \(ln \bigg(\frac{P}{1-P}\bigg)\) decreases by 0.9303 when all other predictors are held constant: as the log of a state's total area increases by one unit, the log odds of the state placing in the Top 10 decrease by 0.9303. LogTotalArea has a negative effect and remains significant at the 5% level.
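Because LogPopulation and LogContestants are now significant only at the 10% level, 95% Wald confidence intervals for the odds ratios show the uncertainty more directly. A sketch from the printed estimates and standard errors; with the fitted object, exp(confint.default(MissAm.glm)) gives the same Wald intervals, while confint() gives profile-likelihood intervals that can differ somewhat.

```r
# Reduced-model estimates and standard errors from the summary output
est <- c(LogPopulation = 1.0702, LogContestants = 2.5447, LogTotalArea = -0.9303)
se  <- c(LogPopulation = 0.6007, LogContestants = 1.3779, LogTotalArea = 0.4293)

z <- qnorm(0.975)                     # about 1.96 for a 95% interval
ci <- cbind(odds_ratio = exp(est),
            lower = exp(est - z * se),
            upper = exp(est + z * se))
print(round(ci, 3))
```

Only the LogTotalArea interval excludes 1, matching its 5% significance in the summary.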