This week we are discussing logistic regression. Here is the problem we did in class.

###6.8.1

a. The predictor is poor grades; the response is whether or not the student binge drinks.
b. The predictors are GPA and MCAT score; the response is whether or not the student gets into medical school.
c. The predictor is whether the child is a boy; the response is the likelihood of marrying the father.
d. The predictor is whether the student participates in sports; the response is whether the student is more or less likely to graduate.
e. The predictor is exposure to a particular chemical; the response is whether or not the person has cancer.

In order to use logistic regression, we have to meet certain assumptions:

- The response must be dichotomous (only two possible responses).
- The observations must be independent of each other.
- The variance is np(1-p), which is highest at p = .5.
- The log odds, log(p/(1-p)), must be a linear function of x.

###How to interpret coefficients

Logistic regression: every 1 unit increase in x is associated with a b1 unit increase in the log odds, OR every 1 unit increase in x is associated with a multiplicative change in the odds by a factor of exp(b1).
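The two interpretations of b1 describe the same change on two different scales. Here is a minimal sketch that checks this numerically; the values of `b0`, `b1`, and `x` are hypothetical and chosen only for illustration.

```r
# Hypothetical coefficients, not taken from any fitted model
b0 <- -2
b1 <- 0.5

# Odds implied by the logistic model at a given x
odds_at <- function(x) exp(b0 + b1 * x)

x <- 3  # arbitrary starting value

# Additive change on the log odds scale for a 1 unit increase in x
log_odds_change <- log(odds_at(x + 1)) - log(odds_at(x))

# Multiplicative change on the odds scale for the same increase
odds_factor <- odds_at(x + 1) / odds_at(x)

log_odds_change  # matches b1
odds_factor      # matches exp(b1)
```

The starting value of `x` does not matter: the log odds always go up by b1, and the odds are always multiplied by exp(b1).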

crabs <- read.csv("http://www.cknudson.com/data/crabs.csv")
weightmod <- glm(y ~ weight, data = crabs, family = binomial)
summary(weightmod)
## 
## Call:
## glm(formula = y ~ weight, family = binomial, data = crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1108  -1.0749   0.5426   0.9122   1.6285  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -3.6947264  0.8801975  -4.198 2.70e-05 ***
## weight       0.0018151  0.0003767   4.819 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 195.74  on 171  degrees of freedom
## AIC: 199.74
## 
## Number of Fisher Scoring iterations: 4

Estimated log odds = -3.6947264 + 0.0018151(weight)

Heavier crabs have a higher chance of having satellites: a 1 unit increase in the weight of a female crab is associated with a multiplicative change in the odds of having a satellite by a factor of exp(0.0018151) = 1.001817.
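The factor 1.001817 comes from exponentiating the weight coefficient. As a small sketch, the coefficient is typed in by hand from the summary output above so the code runs without downloading the data; the 100 unit increase is an extra illustration, not something from the class problem.

```r
# Weight coefficient copied from the summary output
b1_weight <- 0.0018151

# Multiplicative change in the odds for a 1 unit increase in weight
odds_factor_1 <- exp(b1_weight)

# Illustration only: the change for a 100 unit increase in weight
odds_factor_100 <- exp(100 * b1_weight)

round(odds_factor_1, 6)
round(odds_factor_100, 3)
```

With the fitted model available, `exp(coef(weightmod))` should give the same factor for a 1 unit increase.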

width1 <- 28
ans <- -12.3508 + .4972*(width1)
prob <- exp(ans)/(1 + exp(ans))
prob
## [1] 0.8278976
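The step `exp(ans)/(1 + exp(ans))` is the inverse logit. Base R's `plogis()` computes the same quantity, so the calculation can also be sketched like this, using the same coefficients as above (the name `log_odds` is my own):

```r
width1 <- 28
log_odds <- -12.3508 + 0.4972 * width1

# Inverse logit written out by hand
prob_manual <- exp(log_odds) / (1 + exp(log_odds))

# Same calculation with the built-in logistic distribution function
prob_plogis <- plogis(log_odds)

prob_manual
prob_plogis
```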

A female crab with a carapace width of 28 cm has approximately an 82.79% chance of having at least 1 satellite crab. Creating a fictional list of possible widths and then plotting them against their corresponding probabilities of having at least 1 satellite crab provides this chart:

Looks like there is a linear relationship between a female crab’s weight and her log odds of satellites. The p value of the coefficient for weight is 1.45*10^-6, which is significant at the alpha = 0.05 level.

H0: no lack of fit
Ha: lack of fit

pchisq(deviance(modelname), df = df.residual(modelname), lower.tail = FALSE)

Here the degrees of freedom are the residual degrees of freedom, that is, the number of observations minus the number of betas.
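As a rough sketch of the lack of fit test for the weight model, the residual deviance and its degrees of freedom can be typed in from the summary output above. This assumes the residual deviance is compared to a chi-square distribution on the residual degrees of freedom; the variable names are my own.

```r
# Values copied from the summary output of the weight model
resid_dev <- 195.74  # residual deviance
resid_df  <- 171     # residual degrees of freedom

# Upper tail probability of the chi-square distribution
p_value <- pchisq(resid_dev, df = resid_df, lower.tail = FALSE)
p_value

# With the fitted model object, the same test would be
# pchisq(deviance(weightmod), df = df.residual(weightmod), lower.tail = FALSE)
```

A large p value means we fail to reject H0, so there is no evidence of lack of fit.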

width1 <- c(15,17,19,21,23,24,26,28,30,31,33,37)
ans <- -12.3508 + .4972*(width1)

prob <- exp(ans)/(1 + exp(ans))

plot(width1,prob, type = "l", ylab = "Probability", xlab = "Width " )

Looks like this does a great job of visually representing the relationship between width and the probability of having at least one satellite.

###Review

“Odds” is always a confusing term for me because growing up it was synonymous with the probability of something happening. However, in the statistical world, the word means something a little different.

Odds = #successes/#failures or Odds = p/(1-p)

If I go to the gym, shoot 100 free throws, and make 63 of them, the probability of me making a free throw is .63.

However, the odds of me making a free throw are .63/(1-.63), or approximately 1.703 to 1.
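The free throw example can be checked in a few lines of R. This is just a sketch of the arithmetic, and the variable names are my own.

```r
makes    <- 63
attempts <- 100

# Probability of making a free throw
p <- makes / attempts

# Odds from the probability: p / (1 - p)
odds <- p / (1 - p)

# Odds from the counts: successes / failures
odds_from_counts <- makes / (attempts - makes)

# Converting odds back to a probability
p_back <- odds / (1 + odds)

p
odds
odds_from_counts
p_back
```

Both ways of computing the odds agree, and converting the odds back recovers the original probability.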
