Math 228/Hon 309 Module 11 Assignment
Names
1. The data set Shuttle contains data for the 23 shuttle flights that occurred before the Challenger mission disaster in 1986. The first variable (Temp) is the temperature (in degrees F) at launch time. The second is an indicator (TD) of whether (1) or not (0) at least one of the O-rings suffered thermal distress. [The Challenger disaster occurred as a direct consequence of the failure of an O-ring in the right solid-rocket booster. An O-ring is a simple device that was used to seal parts of the Challenger rocket engine. In cold weather, the O-ring used on the Shuttle could become brittle and deformed, allowing high-temperature gases to escape and ignite.] Obtain the output for the logistic regression model with TD as the response variable and temperature as the explanatory variable.
Shuttle <- read.csv("C:/Users/aizax94/Downloads/Shuttle.CSV")
View(Shuttle)
model <- glm(TD ~ Temp, binomial, Shuttle)
model
##
## Call: glm(formula = TD ~ Temp, family = binomial, data = Shuttle)
##
## Coefficients:
## (Intercept) Temp
## 15.0429 -0.2322
##
## Degrees of Freedom: 22 Total (i.e. Null); 21 Residual
## Null Deviance: 28.27
## Residual Deviance: 20.32 AIC: 24.32
k <- data.frame(Temp = 31)
p <- predict(model, k, type = "response")
p
## 1
## 0.9996088
Based on the fitted model, the predicted probability of thermal distress at 31 degrees F is 0.9996.
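As a check on predict(), the same probability can be computed by hand from the logistic formula. This is a minimal sketch that assumes the rounded coefficients printed in the output above, so the result agrees with predict() only to about four decimal places; the names b0, b1, eta, and p_hand are illustrative.

```r
# Rounded coefficients copied from the glm output above (assumption: 4-decimal rounding)
b0 <- 15.0429
b1 <- -0.2322
temp <- 31

eta <- b0 + b1 * temp           # linear predictor (log odds) at 31 degrees F
p_hand <- 1 / (1 + exp(-eta))   # inverse logit gives the predicted probability
p_hand
plogis(eta)                     # built-in inverse logit, same value
```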
The predicted probability equals 0.5 when the log odds b0 + b1*x equal 0. With b0 as 15.0429 and b1 as -0.2322, solving for x gives x = -b0/b1 = 15.0429/0.2322 = 64.78. The predicted probability equals 0.5 when the temperature is 64.78 F.
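The algebra for the 0.5 probability point can be sketched in R as follows, again assuming the rounded coefficients from the output above (x50 is an illustrative name).

```r
# Rounded coefficients copied from the glm output above
b0 <- 15.0429
b1 <- -0.2322

# p = 0.5  <=>  log(p / (1 - p)) = 0  <=>  b0 + b1 * x = 0
x50 <- -b0 / b1
round(x50, 2)

# Check: plugging x50 back into the model returns a probability of 0.5
plogis(b0 + b1 * x50)
```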
exp(-0.2322)
## [1] 0.7927875
The odds of an O-ring suffering thermal distress are multiplied by 0.7928 for every one degree increase in temperature, that is, they decrease by about 20.7% per degree.
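The odds ratio can also be expressed as a percent change in the odds, and the same idea extends to larger temperature changes. The sketch below assumes the rounded slope from the output above; the 10-degree comparison is only an illustration.

```r
b1 <- -0.2322                      # slope for Temp, copied from the output above

or_1deg <- exp(b1)                 # odds ratio for a 1-degree increase
pct_change <- 100 * (or_1deg - 1)  # percent change in the odds per degree (negative = decrease)
or_10deg <- exp(10 * b1)           # odds ratio for a 10-degree increase

round(c(or_1deg = or_1deg, pct_change = pct_change, or_10deg = or_10deg), 4)
```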
fit <- fitted(model)
predTD <- numeric(23)
predTD[fit >= 0.25] <- 1
n <- data.frame(Shuttle$Temp, predTD)
t <- table(Shuttle$TD, predTD)
t
## predTD
## 0 1
## 0 10 6
## 1 3 4
The model correctly predicts the value for TD for 62.5% (10/16) of the launches that didn’t experience thermal distress, and 57.14% (4/7) of those that did experience thermal distress.
Overall, the model correctly predicts the value for TD for 60.87% of the observations: (10 + 4)/(10 + 4 + 6 + 3) = 14/23.
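These percentages can be recomputed directly from the classification table. The sketch below re-enters the four counts by hand (rows are the observed TD, columns the predicted TD at the 0.25 cutoff); the names tab, specificity, sensitivity, and accuracy are illustrative.

```r
# Counts copied from the classification table above
tab <- matrix(c(10, 6,
                 3, 4),
              nrow = 2, byrow = TRUE,
              dimnames = list(TD = c("0", "1"), predTD = c("0", "1")))

specificity <- tab["0", "0"] / sum(tab["0", ])  # correct among launches without distress
sensitivity <- tab["1", "1"] / sum(tab["1", ])  # correct among launches with distress
accuracy    <- sum(diag(tab)) / sum(tab)        # correct overall

round(100 * c(specificity = specificity,
              sensitivity = sensitivity,
              accuracy = accuracy), 2)
```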
summary(model)
##
## Call:
## glm(formula = TD ~ Temp, family = binomial, data = Shuttle)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.0611 -0.7613 -0.3783 0.4524 2.2175
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 15.0429 7.3786 2.039 0.0415 *
## Temp -0.2322 0.1082 -2.145 0.0320 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 28.267 on 22 degrees of freedom
## Residual deviance: 20.315 on 21 degrees of freedom
## AIC: 24.315
##
## Number of Fisher Scoring iterations: 5
confint(model)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 3.3305848 34.34215133
## Temp -0.5154718 -0.06082076
The null hypothesis is that the odds of a shuttle experiencing thermal distress do not depend on the temperature at take-off (beta1 = 0). The alternative hypothesis is that the odds of thermal distress do depend on the temperature at take-off (beta1 does not equal 0).
From the Z-test, since the p-value (0.0320) is less than 0.05, we can reject the null hypothesis at the 5% level of significance. This means there is evidence that thermal distress is not independent of temperature.
From the CI, we can be 95% confident that the value for beta1 is between -0.5155 and -0.0608. Since this interval does not contain 0, we again reject the null hypothesis at the 5% level: thermal distress is not independent of temperature.
Both approaches lead to the same conclusion here, but they are not exactly equivalent: the Z-test in summary() is a Wald test based on the estimate and its standard error, while confint() computes the interval by profiling the likelihood.
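For comparison, the Wald version of both the test and the interval can be sketched from the estimate and standard error in the summary table. The values are copied from the output above; the names est, se, z, p_wald, and wald_ci are illustrative. The Wald interval will not match the confint() interval exactly, because confint() profiles the likelihood.

```r
# Estimate and standard error for Temp, copied from the summary output above
est <- -0.2322
se  <- 0.1082

z <- est / se                                   # Wald z statistic
p_wald <- 2 * pnorm(-abs(z))                    # two-sided p-value
wald_ci <- est + c(-1, 1) * qnorm(0.975) * se   # 95% Wald interval for beta1

round(c(z = z, p = p_wald), 4)
round(wald_ci, 4)
```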
2. Every year in the United States, over 120,000 undergraduates submit applications in hopes of realizing their dreams to become physicians. Medical school applicants invest endless hours studying to boost their GPAs. They also invest considerable time in studying for the medical school admission test, or MCAT. The data set MedSchApp allows us to investigate the influence of these scores on the success of applicants. The data set is based on responses from 55 medical school applicants from a large Midwest university. The variables are listed below:
Count  Name    Description
55     Accept  1 = accepted, 0 = not accepted
55     Sex     F or M
55     GPA
55     MCAT    MCAT scores
55     NumApp  Number of medical school applications
MedSchApp <- read.csv("C:/Users/aizax94/Downloads/MedSchApp.CSV")
View(MedSchApp)
x <- table(MedSchApp$Accept, MedSchApp$Sex)
y <- addmargins(x, 2)
y
##
## F M Sum
## 0 10 15 25
## 1 18 12 30
100*prop.table(y,2)
##
## F M Sum
## 0 35.71429 55.55556 45.45455
## 1 64.28571 44.44444 54.54545
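The proportion of female applicants admitted to medical school is 18/28 = 0.6429, so the odds of being admitted for females are 18/10 = 1.8. The proportion of males not admitted (55.56%) is 1.56 times that of females (35.71%). In terms of odds, the odds of a male not being admitted (15/12 = 1.25) are 2.25 times those of a female (10/18 = 0.5556).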
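The odds and the odds ratio can be computed directly from the two-way table. The sketch below re-enters the counts by hand rather than reading MedSchApp; the names counts, odds_admit, and or_m_vs_f are illustrative.

```r
# Counts copied from the Accept-by-Sex table above
counts <- matrix(c(10, 15,
                   18, 12),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Accept = c("0", "1"), Sex = c("F", "M")))

odds_admit <- counts["1", ] / counts["0", ]       # odds of admission by sex
or_m_vs_f  <- odds_admit["M"] / odds_admit["F"]   # odds ratio, males vs. females

odds_admit
unname(or_m_vs_f)        # odds ratio of admission, M vs. F
unname(1 / or_m_vs_f)    # odds ratio of NOT being admitted, M vs. F
```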
model2 <- glm(Accept ~ Sex, binomial, MedSchApp)
model2
##
## Call: glm(formula = Accept ~ Sex, family = binomial, data = MedSchApp)
##
## Coefficients:
## (Intercept) SexM
## 0.5878 -0.8109
##
## Degrees of Freedom: 54 Total (i.e. Null); 53 Residual
## Null Deviance: 75.79
## Residual Deviance: 73.59 AIC: 77.59
exp(-0.8109)
## [1] 0.4444579
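The coefficient for SexM is -0.8109. This means that the predicted log odds of being admitted to medical school are 0.8109 lower for males than for females. The odds ratio, e^(-0.8109), is 0.4444: the odds of admission for males are 0.4444 times the odds for females, which agrees with the table above, (12/15)/(18/10) = 0.4444.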
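Because Sex is the only predictor, the fitted model should reproduce the observed admission proportions for each sex. The sketch below checks this with the rounded coefficients from the output above (b0, b1, p_f, and p_m are illustrative names). Note that the predicted probability for males (12/27) and the odds ratio (0.8/1.8) both happen to round to 0.4444 in this data set, although they are different quantities.

```r
# Rounded coefficients copied from the model2 output above
b0 <- 0.5878    # intercept: log odds of admission for females
b1 <- -0.8109   # SexM: change in log odds for males relative to females

p_f <- plogis(b0)        # predicted probability of admission, females
p_m <- plogis(b0 + b1)   # predicted probability of admission, males
odds_ratio <- exp(b1)    # odds ratio, males vs. females

round(c(p_f = p_f, p_m = p_m, odds_ratio = odds_ratio), 4)
```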
model3 <- glm(Accept ~ MCAT + NumApp + Sex, binomial, MedSchApp)
summary(model3)
##
## Call:
## glm(formula = Accept ~ MCAT + NumApp + Sex, family = binomial,
## data = MedSchApp)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9331 -0.9523 0.4993 1.0016 1.8226
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.850244 3.474122 -2.547 0.01085 *
## MCAT 0.265827 0.094524 2.812 0.00492 **
## NumApp -0.006829 0.062359 -0.110 0.91280
## SexM -1.072282 0.636063 -1.686 0.09183 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 75.791 on 54 degrees of freedom
## Residual deviance: 61.699 on 51 degrees of freedom
## AIC: 69.699
##
## Number of Fisher Scoring iterations: 4
i The null hypothesis in this situation is that the odds of being admitted to medical school do not depend on MCAT score, the number of applications, or sex (the coefficients of MCAT, NumApp, and SexM are all 0). The alternative hypothesis is that at least one of these coefficients is not 0.
For the joint test of all three variables MCAT, NumApp, and Sex, the drop in deviance is 75.791 - 61.699 = 14.092 on 54 - 51 = 3 degrees of freedom, which gives a chi-square p-value of about 0.003. Since this is less than 0.05, we reject the null hypothesis: at least one of the three variables is associated with admission.
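One common way to carry out a joint test of all the predictors is a drop-in-deviance (likelihood ratio) test, which compares the null and residual deviances reported by summary(). The sketch below assumes that this is the intended test and uses the deviances and degrees of freedom copied from the output above; G, df, and p_lrt are illustrative names.

```r
# Deviances and degrees of freedom copied from the summary(model3) output above
null_dev  <- 75.791
resid_dev <- 61.699

G  <- null_dev - resid_dev   # drop in deviance
df <- 54 - 51                # number of predictors tested jointly

p_lrt <- pchisq(G, df = df, lower.tail = FALSE)  # upper-tail chi-square p-value
c(G = G, df = df, p = p_lrt)
```

With the data loaded, the same comparison could be obtained from anova() on an intercept-only model and model3 with test = "Chisq".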
ii The p-value for MCAT is 0.00492, for NumApp it is 0.9128, and for SexM it is 0.09183. Based on these numbers I would definitely keep MCAT in the model, but get rid of NumApp. If the Z-test were one-sided, the p-value for SexM would be 0.09183/2 = 0.0459, so I would keep SexM as well.
# Odds ratios: exponentiate each slope on its own (the intercept is not part of an odds ratio)
oddsMCAT <- exp(0.265827)
oddsNumApp <- exp(-0.006829)
oddsSexM <- exp(-1.072282)
oddsMCAT
## [1] 1.304509
oddsNumApp
## [1] 0.9931943
oddsSexM
## [1] 0.3422267
After adjusting for the number of applications and the applicant's sex, the odds of being admitted to med school are multiplied by 1.3045 (an increase of about 30.5%) with every one point increase in MCAT score.
After adjusting for MCAT scores and the applicant's sex, the odds of being admitted to med school are multiplied by 0.9932 (a decrease of about 0.7%) for every additional application that is submitted. This effect is not statistically significant (p-value 0.9128).
After adjusting for MCAT scores and the number of applications, the odds of being admitted to med school for males are 0.3422 times the odds for female applicants. This odds ratio is different from the 0.4444 we obtained before because in this model we are adjusting for MCAT scores and the number of applications.
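To show the uncertainty in these adjusted odds ratios, approximate 95% Wald intervals can be sketched from the estimates and standard errors in the summary table. This is only an illustration using values copied from the output above; the names est, se, or, lower, and upper are hypothetical, and a profile-likelihood interval from confint() would differ somewhat.

```r
# Estimates and standard errors copied from the summary(model3) output above
est <- c(MCAT = 0.265827, NumApp = -0.006829, SexM = -1.072282)
se  <- c(MCAT = 0.094524, NumApp = 0.062359, SexM = 0.636063)

or    <- exp(est)                        # adjusted odds ratios
lower <- exp(est - qnorm(0.975) * se)    # lower 95% Wald limit on the odds-ratio scale
upper <- exp(est + qnorm(0.975) * se)    # upper 95% Wald limit on the odds-ratio scale

round(cbind(OR = or, lower = lower, upper = upper), 4)
```

An interval that contains 1 corresponds to a coefficient that is not significant at the 5% level in the two-sided Z-test.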