Exercise 9.14

library(Stat2Data)
## Warning: package 'Stat2Data' was built under R version 3.4.4
data(MedGPA)
head(MedGPA)
##   Accept Acceptance Sex BCPM  GPA VR PS WS BS MCAT Apps
## 1      D          0   F 3.59 3.62 11  9  9  9   38    5
## 2      A          1   M 3.75 3.84 12 13  8 12   45    3
## 3      A          1   F 3.24 3.23  9 10  5  9   33   19
## 4      A          1   F 3.74 3.69 12 11  7 10   40    5
## 5      A          1   F 3.53 3.38  9 11  4 11   35   11
## 6      A          1   M 3.59 3.72 10  9  7 10   36    5

9.14 Medical school acceptance.

The datafile MedGPA used in Example 9.4 also contains information on the medical school admission test (MCAT) scores for the same sample of 55 students. Fit a logistic regression model to predict the Acceptance status using the MCAT scores.

a. Write down the estimated versions of both the logit and probability forms of this model.

lm(MedGPA$Acceptance~MedGPA$MCAT)
## 
## Call:
## lm(formula = MedGPA$Acceptance ~ MedGPA$MCAT)
## 
## Coefficients:
## (Intercept)  MedGPA$MCAT  
##    -1.01263      0.04295
logit <- function(mcat) -1.01263 + 0.04295 * mcat  # estimated logit as a function of the MCAT score

We estimate the logit form of this model to be \[\log\left(\frac{\pi}{1-\pi}\right) = -1.01263 + 0.04295 \cdot \text{MCAT}\].

We estimate the probability form of this model to be \[\pi = \frac{e^{-1.01263 + 0.04295 \cdot \text{MCAT}}}{1 + e^{-1.01263 + 0.04295 \cdot \text{MCAT}}}\].
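Note that `lm()` fits an ordinary least-squares line, so the coefficients above are on the probability scale rather than the logit scale. A minimal sketch of the logistic fit the exercise asks for, using `glm()` with `family = binomial`, might look like the following. The object name `fit_mcat` is an illustrative assumption, the block assumes the Stat2Data package is installed, and its estimates will differ from the values printed above.

```r
# Sketch: fit the logistic model with glm() instead of lm().
# Assumes the Stat2Data package is installed; the fit is skipped otherwise.
fit_mcat <- NULL
if (requireNamespace("Stat2Data", quietly = TRUE)) {
  data("MedGPA", package = "Stat2Data")
  fit_mcat <- glm(Acceptance ~ MCAT, data = MedGPA, family = binomial)

  # Intercept and slope on the logit scale
  print(coef(fit_mcat))

  # Estimated acceptance probability at MCAT = 40
  print(predict(fit_mcat, newdata = data.frame(MCAT = 40), type = "response"))
}
```

Passing `data = MedGPA` instead of writing `MedGPA$MCAT` in the formula also lets `predict()` accept new MCAT values directly.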

b. What would the estimated model say about the chance that a student with MCAT = 40 is accepted to medical school?

(exp(-1.01263+.04295*40))/(1+(exp(-1.01263+.04295 *40)))
## [1] 0.6693773

The model estimates about a 66.94% chance that a student with MCAT = 40 is accepted into medical school.

(exp(-1.01263+.04295*40))
## [1] 2.024596

The estimated odds of this student getting into medical school are about 2.02 to 1.
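The conversion between logit, odds, and probability can be sketched in base R as a check on the arithmetic above. The names `b0`, `b1`, and `eta` are illustrative assumptions; `plogis()` is base R's inverse-logit function.

```r
# Coefficients as printed in part (a)
b0 <- -1.01263
b1 <- 0.04295

eta  <- b0 + b1 * 40   # linear predictor at MCAT = 40
odds <- exp(eta)       # odds of acceptance
prob <- plogis(eta)    # same as odds / (1 + odds)

c(odds = odds, prob = prob)
```

Using `plogis()` avoids retyping the exponent in both the numerator and the denominator.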

c. For approximately what MCAT score would a student have roughly a 50-50 chance of being accepted to medical school? (Hint: You might look at a graph or solve one of the equations algebraically.)

# A 50-50 chance means the odds equal 1, so solve 1 = exp(-1.01263 + 0.04295*x):
# log(1) = -1.01263 + 0.04295*x
# log(1) + 1.01263 = 0.04295*x
# (log(1) + 1.01263)/0.04295 = x
x<-(log(1)+1.01263)/.04295
(exp(-1.01263+.04295*x))/(1+(exp(-1.01263+.04295 *x)))
## [1] 0.5
x
## [1] 23.57695

As shown in the code, I set the odds equal to 1 (equivalently, the logit equal to 0), since 1:1 odds correspond to a 50% chance of being accepted, and then solved for the MCAT score x. Plugging x back into the probability form confirms an estimated probability of 0.5. So we expect that an MCAT score of about 23.58 would give a 50-50 chance of being accepted into medical school.
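The same 50-50 point can be found numerically, which is a useful check when the algebra is less simple. This is a sketch using the coefficients printed in part (a); the names `x_closed` and `x_root` are illustrative assumptions.

```r
b0 <- -1.01263
b1 <- 0.04295

# Closed form: the logit is 0 when x = -b0 / b1
x_closed <- -b0 / b1

# Numerical check: find where the estimated probability crosses 0.5
x_root <- uniroot(function(x) plogis(b0 + b1 * x) - 0.5,
                  interval = c(0, 100), tol = 1e-9)$root

c(closed_form = x_closed, numeric = x_root)
```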

Exercise 9.15

9.15 Metastasizing cancer. In a study of 31 patients with esophageal cancer, it was found that in 18 of the patients the cancer had metastasized to the lymph nodes. Thus, an overall estimate of the probability of metastasis is 18/31 = 0.58. A predictor variable measured on each patient is Size of the tumor (in cm). The fitted logistic model, with the coefficients used in the calculations below, is \(\log(\pi/(1-\pi)) = -2.086 + 0.5117 \cdot Size\).

a. Use this model to estimate the odds of metastasis, \(\pi/(1-\pi)\), if a patient’s tumor size is 6 cm.

exp(-2.086+.5117*6)
## [1] 2.67567

We estimate the odds of metastasis for a patient with a 6 cm tumor to be about 2.68 to 1.

b. Use the model to predict the probability of metastasis if a patient’s tumor size is 6 cm.

(exp(-2.086+.5117*6))/(1+(exp(-2.086+.5117*6)))
## [1] 0.7279408

We estimate the probability of metastasis for a patient with a 6 cm tumor to be about 72.79%.

c. How much do the estimated odds change if the tumor size changes from 6 cm to 7 cm? Provide and interpret an odds ratio.

exp(-2.086+.5117*7) # the odds of metastasis with a 7 cm tumor
## [1] 4.463352
exp(-2.086+.5117*7)-exp(-2.086+.5117*6)#The difference in odds between a 7 cm and a 6cm tumor.
## [1] 1.787681
exp(-2.086+.5117*7)/exp(-2.086+.5117*6)#The ratio in odds between a 7 cm and a 6cm tumor.
## [1] 1.668125

The estimated odds of metastasis for a patient with a 7 cm tumor are 4.463352 to 1, which is 1.787681 higher than the odds for a patient with a 6 cm tumor. Odds of about 4.46 to 1 mean that roughly 82 of every 100 patients with a 7 cm tumor would be expected to have metastasis. The odds ratio is 1.668125: a person with a 7 cm tumor has about 1.67 times the odds of metastasis of a person with a 6 cm tumor, so the odds of metastasis increase with the size of the tumor.
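In a logistic model, the odds ratio for a 1 cm increase is the exponentiated slope, so it does not depend on the starting size. A short sketch under that reading, using the coefficients from the exercise (the helper `odds_at` is a hypothetical name):

```r
b0 <- -2.086
b1 <- 0.5117

odds_at <- function(size) exp(b0 + b1 * size)   # hypothetical helper

sizes <- 2:9
# Ratio of the odds at (size + 1) cm to the odds at size cm
ratios <- odds_at(sizes + 1) / odds_at(sizes)

ratios    # every entry equals exp(b1)
exp(b1)
```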

d. How much does the estimate of \(\pi\) change if the tumor size changes from 6 cm to 7 cm?

(exp(-2.086+.5117*7))/(1+(exp(-2.086+.5117*7))) # the probability of metastasis with a 7 cm tumor
## [1] 0.8169622
(exp(-2.086+.5117*7))/(1+(exp(-2.086+.5117*7)))-(exp(-2.086+.5117*6))/(1+(exp(-2.086+.5117*6))) # the difference in the probability of metastasis between a 7 cm and a 6 cm tumor
## [1] 0.08902139

We estimate the probability of metastasis for a patient with a 7 cm tumor to be about 81.70%, which is about 8.90 percentage points higher than for a patient with a 6 cm tumor.
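Unlike the odds ratio, the change in probability per 1 cm depends on where on the curve the tumor size falls. The following sketch tabulates that change over a range of sizes; the helper `prob_at` and the chosen range of 1 to 10 cm are illustrative assumptions.

```r
b0 <- -2.086
b1 <- 0.5117

prob_at <- function(size) plogis(b0 + b1 * size)   # hypothetical helper

sizes <- 1:10
# Change in the estimated probability for each 1 cm increase
changes <- data.frame(from = sizes, to = sizes + 1,
                      change = prob_at(sizes + 1) - prob_at(sizes))
changes
```

The row with `from = 6` reproduces the difference computed above.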

Exercise 9.18

data(Titanic)

9.18 Titanic: Survival and sex.

a. Use a two-way table to explore whether survival is related to the sex of the passenger. What do you conclude from this table alone? Write a summary statement that interprets Sex as the explanatory variable and Survived as the response variable, and that uses simple comparisons of conditional proportions or percentages.

tab<-table(Titanic$Survived, Titanic$Sex)
tab
##    
##     female male
##   0    154  709
##   1    308  142

The table shows that 308 females survived and 154 died, while 142 males survived and 709 died. This suggests a relationship between sex and survival: twice as many women survived as died, whereas almost five times as many men died as survived.

ms<-142/(709+142)
fs<-308/(154+308)
ms
## [1] 0.1668625
fs
## [1] 0.6666667

The survival rate was 2/3 (66.67%) for females and 16.69% for males, showing a clear relationship between survival and sex.
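The conditional proportions can also be read off in one step with `prop.table()`. This sketch rebuilds the two-way table from the counts printed above so that it runs without the dataset; the name `col_props` is an illustrative assumption.

```r
# Counts copied from the two-way table above
tab <- matrix(c(154, 308, 709, 142), nrow = 2,
              dimnames = list(Survived = c("0", "1"),
                              Sex = c("female", "male")))

# Column proportions: distribution of Survived within each Sex
col_props <- prop.table(tab, margin = 2)
round(col_props, 4)
```

With the real data, `prop.table(table(Titanic$Survived, Titanic$Sex), margin = 2)` should give the same proportions.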

b. Use software to fit a logistic model to the survival and sex variables to decide whether there is a statistically significant relationship between sex and survival. If there is, what are the nature and magnitude of the relationship? Does the relationship found by the logistic model confirm the descriptive analysis? (Note: You will actually use SexCode as the predictor in the logistic model.)

logm <- glm(Survived ~ SexCode, data=Titanic, family=binomial)
summary(logm)
## 
## Call:
## glm(formula = Survived ~ SexCode, family = binomial, data = Titanic)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.4823  -0.6042  -0.6042   0.9005   1.8924  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.60803    0.09194  -17.49   <2e-16 ***
## SexCode      2.30118    0.13488   17.06   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1688.1  on 1312  degrees of freedom
## Residual deviance: 1355.5  on 1311  degrees of freedom
## AIC: 1359.5
## 
## Number of Fisher Scoring iterations: 4

So \(\beta_0\) is -1.60803 and \(\beta_1\) is 2.30118.

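From the logistic model, the p-value for SexCode is less than 2e-16, which is essentially 0, so we reject the hypothesis that there is no relationship between sex and survival on the Titanic. The relationship is positive: since SexCode is 1 for females, the estimated odds of survival for females are exp(2.30118), or about 9.99 times, the odds for males. This confirms the descriptive analysis, where the survival odds were 308/154 = 2 for females and 142/709, or about 0.20, for males.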
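With a single binary predictor, the fitted logistic model should reproduce the odds from the two-way table. The following sketch compares the two, assuming SexCode is 0 for males and 1 for females (as the `head(Titanic)` output suggests); all variable names are illustrative.

```r
b0 <- -1.60803   # intercept: log-odds of survival when SexCode = 0
b1 <-  2.30118   # SexCode coefficient

odds_male_model   <- exp(b0)
odds_female_model <- exp(b0 + b1)
or_model          <- exp(b1)

# Same quantities computed from the two-way table counts
odds_male_table   <- 142 / 709
odds_female_table <- 308 / 154
or_table          <- odds_female_table / odds_male_table

rbind(model = c(male = odds_male_model, female = odds_female_model, ratio = or_model),
      table = c(male = odds_male_table, female = odds_female_table, ratio = or_table))
```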

Exercise 10.2

10.2 Empirical logits. Here is a plot of empirical logits for a dataset with two predictors: x, a continuous variable, and Group, a categorical variable with two levels, 1 and 2. The circles are for Group = 1 and the triangles are for Group = 2. What model is suggested by this plot?

The model suggested by the plot is \(\log(\pi/(1-\pi)) = \beta_0 + \beta_1 x + \beta_2 Group2\), where Group2 is an indicator variable for Group = 2.

Exercise 10.4

The model suggested by the plot is \(\log(\pi/(1-\pi)) = \beta_0 + \beta_1 Group2\). This is because Group2 greatly increases the logit, while Group 1 stays around the intercept.

Exercise 10.12

data(Titanic)
head(Titanic)
##                                            Name PClass   Age    Sex
## 1                  Allen, Miss Elisabeth Walton    1st 29.00 female
## 2                   Allison, Miss Helen Loraine    1st  2.00 female
## 3           Allison, Mr Hudson Joshua Creighton    1st 30.00   male
## 4 Allison, Mrs Hudson JC (Bessie Waldo Daniels)    1st 25.00 female
## 5                 Allison, Master Hudson Trevor    1st  0.92   male
## 6                            Anderson, Mr Harry    1st 47.00   male
##   Survived SexCode
## 1        1       1
## 2        0       1
## 3        0       0
## 4        0       1
## 5        1       0
## 6        1       0

10.12 Sinking of the Titanic (continued). In Exercises 9.17-9.20, we considered data on the passengers who survived and those who died when the oceanliner Titanic sank on its maiden voyage in 1912. The dataset in Titanic includes the following variables:

  1. In Exercises 9.17-9.20, you fit separate logistic regression models for the binary response Survived using Age and then SexCode. Now fit a multiple logistic model using these two predictors. Write down both the logit and probability forms for the fitted model.
logm <- glm(Survived ~ SexCode+Age, data=Titanic, family=binomial)
summary(logm)
## 
## Call:
## glm(formula = Survived ~ SexCode + Age, family = binomial, data = Titanic)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.7541  -0.6905  -0.6504   0.7576   1.8628  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.159839   0.219651  -5.280 1.29e-07 ***
## SexCode      2.465996   0.178455  13.819  < 2e-16 ***
## Age         -0.006352   0.006187  -1.027    0.305    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1025.57  on 755  degrees of freedom
## Residual deviance:  795.59  on 753  degrees of freedom
##   (557 observations deleted due to missingness)
## AIC: 801.59
## 
## Number of Fisher Scoring iterations: 4

We estimate the logit form of this model to be \[\log\left(\frac{\pi}{1-\pi}\right) = -1.159839 - 0.006352 \cdot \text{Age} + 2.465996 \cdot \text{SexCode}\].

We estimate the probability form of this model to be \[\pi = \frac{e^{-1.159839 - 0.006352 \cdot \text{Age} + 2.465996 \cdot \text{SexCode}}}{1 + e^{-1.159839 - 0.006352 \cdot \text{Age} + 2.465996 \cdot \text{SexCode}}}\].
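The probability form can be wrapped in a small function so that the later parts do not require retyping the coefficients. This is a sketch only: `p_survive` is a hypothetical helper, and it assumes SexCode is 1 for females and 0 for males, as in the `head(Titanic)` output.

```r
b0    <- -1.159839
b_sex <-  2.465996
b_age <- -0.006352

# Hypothetical helper: estimated survival probability from the fitted model
p_survive <- function(age, sexcode) plogis(b0 + b_sex * sexcode + b_age * age)

p_survive(age = 18, sexcode = 0)   # 18-year-old man
p_survive(age = 18, sexcode = 1)   # 18-year-old woman
```

With the fitted object available, `predict(logm, newdata = data.frame(Age = 18, SexCode = 0), type = "response")` is the equivalent call.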

  1. Comment on the effectiveness of each of the predictors in the two-predictor model.

Looking at the summary, SexCode is a significant predictor of survival, since its p-value is close to 0 (below 2e-16), but Age is not significant once SexCode is in the model, since its p-value (0.305) is above 0.05.
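The z values and p-values in the summary are Wald tests, which can be reproduced from the estimates and standard errors. The sketch below does this and adds an approximate 95% Wald interval for the Age odds ratio; the variable names are illustrative, and the interval is an addition for illustration rather than part of the original output.

```r
# Estimates and standard errors copied from the model summary
est <- c(SexCode = 2.465996, Age = -0.006352)
se  <- c(SexCode = 0.178455, Age = 0.006187)

z <- est / se
p <- 2 * pnorm(-abs(z))   # two-sided p-value from the standard normal

round(cbind(z = z, p = p), 4)

# Approximate 95% Wald interval for the Age odds ratio (per year of age)
ci_age <- exp(est["Age"] + c(-1, 1) * qnorm(0.975) * se["Age"])
ci_age
```

An interval for the Age odds ratio that contains 1 is consistent with Age not being significant in this model.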

  1. According to the fitted model, estimate the probability and odds that an 18-year-old man would survive the Titanic sinking.
(exp(-1.159839-.006352 * 18)/(1+(exp(-1.159839-.006352 * 18 ))))
## [1] 0.2185434

The model estimates a 21.85% probability that an 18-year-old man would survive the Titanic sinking.

(exp(-1.159839-.006352 * 18))
## [1] 0.2796616
1/(exp(-1.159839-.006352 * 18))
## [1] 3.57575

The estimated odds of survival for an 18-year-old man are about 0.28 to 1: for every 18-year-old man who survived, the model expects about 3.58 who did not.

  1. Repeat the calculations for an 18-year-old woman and find the odds ratio compared to a man of the same age.
(exp(-1.159839-.006352 * 18 +2.465996))/(1+(exp(-1.159839-.006352 * 18 +2.465996)))
## [1] 0.7670666

The model estimates a 76.71% probability that an 18-year-old woman would survive the Titanic sinking.

(exp(-1.159839-.006352 * 18+2.465996))
## [1] 3.293072
1/(exp(-1.159839-.006352 * 18+2.465996))
## [1] 0.3036678

The estimated odds of survival for an 18-year-old woman are about 3.29 to 1: for every 18-year-old woman who died, the model expects about 3.29 who survived (equivalently, about 0.30 deaths per survivor). Compare this with odds of only about 0.28 to 1 for an 18-year-old man.

3.293072/0.2796616
## [1] 11.7752

The odds ratio is 11.7752: the estimated odds of survival for an 18-year-old woman are about 11.8 times the odds for an 18-year-old man.

  1. Redo both (b) and (c) for a man and woman of age 50.
(exp(-1.159839-.006352 * 50)/(1+(exp(-1.159839-.006352 * 50 ))))
## [1] 0.1858146

The model estimates an 18.58% probability that a 50-year-old man would survive the Titanic sinking.

(exp(-1.159839-.006352 * 50))
## [1] 0.2282214
1/(exp(-1.159839-.006352 * 50))
## [1] 4.38171

The estimated odds of survival for a 50-year-old man are about 0.23 to 1: for every 50-year-old man who survived, the model expects about 4.38 who did not.

(exp(-1.159839-.006352 * 50 +2.465996))/(1+(exp(-1.159839-.006352 * 50 +2.465996)))
## [1] 0.7288028

The model estimates a 72.88% probability that a 50-year-old woman would survive the Titanic sinking.

(exp(-1.159839-.006352 * 50+2.465996))
## [1] 2.687354
1/(exp(-1.159839-.006352 * 50+2.465996))
## [1] 0.3721133

The estimated odds of survival for a 50-year-old woman are about 2.69 to 1: for every 50-year-old woman who died, the model expects about 2.69 who survived (equivalently, about 0.37 deaths per survivor). Compare this with odds of only about 0.23 to 1 for a 50-year-old man.

2.687354/.22822
## [1] 11.77528

The odds ratio is about 11.78: the estimated odds of survival for a 50-year-old woman are about 11.8 times the odds for a 50-year-old man.

  1. What happens to the odds ratio (female to male of the same age) when the age increases in the Titanic data? Will this always be the case?

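The odds ratio for females versus males was the same at both ages (about 11.78). This will always be the case for our model, because it has no interaction between Age and SexCode: the Age term cancels when the female odds are divided by the male odds, leaving exp(2.465996), or about 11.78, at every age. The ratio could change with age only in a model that includes an Age-by-SexCode interaction term.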
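A quick numerical check of this, using the fitted coefficients from part 1, is sketched below. The helper `odds_survive` and the grid of ages are illustrative assumptions.

```r
b0    <- -1.159839
b_sex <-  2.465996
b_age <- -0.006352

# Hypothetical helper: estimated odds of survival from the fitted model
odds_survive <- function(age, sexcode) exp(b0 + b_sex * sexcode + b_age * age)

ages <- seq(0, 80, by = 10)
# Female-to-male odds ratio at each age
or_by_age <- odds_survive(ages, 1) / odds_survive(ages, 0)

or_by_age    # the same value at every age
exp(b_sex)   # the exponentiated SexCode coefficient
```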