AizaKabeer_Assignment

Math 228/Hon 309 Module 11 Assignment

Names

1. The data set Shuttle contains data for the 23 shuttle flights that occurred before the Challenger mission disaster in 1986. The first variable (Temp) is the temperature (in degrees F) at launch time. The second is an indicator (TD) of whether (1) or not (0) at least one of the O-rings suffered thermal distress. [The Challenger disaster occurred as a direct consequence of the failure of an O-ring in the right solid-rocket booster. An O-ring is a simple device that was used to seal parts of the Challenger rocket engine. In cold weather, the O-ring used on the Shuttle became brittle and deformed and could allow high-temperature gases to escape and ignite.] Obtain the output for a logistic regression model with TD as the response variable and temperature as the explanatory variable.

Shuttle <- read.csv("C:/Users/aizax94/Downloads/Shuttle.CSV")
View(Shuttle)

# Logistic regression of thermal distress (TD) on launch temperature (Temp)
model <- glm(TD ~ Temp, family = binomial, data = Shuttle)

Use the fitted model to predict the probability of thermal distress at 31 degrees F, the temperature at the time of the Challenger flight.

model

## 
## Call:  glm(formula = TD ~ Temp, family = binomial, data = Shuttle)
## 
## Coefficients:
## (Intercept)         Temp  
##     15.0429      -0.2322  
## 
## Degrees of Freedom: 22 Total (i.e. Null);  21 Residual
## Null Deviance:       28.27 
## Residual Deviance: 20.32     AIC: 24.32

k <- data.frame(Temp = 31)
p <- predict(model, k, type = "response")
p

##         1 
## 0.9996088

Based on the fitted model, the predicted probability of thermal distress at 31 degrees F is 0.9996.
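The same prediction can be reproduced by hand from the printed coefficients. The sketch below is only a check, assuming the rounded values 15.0429 and -0.2322 from the output above; the names b0, b1, log_odds_31 and p_31 are illustrative and not part of the original analysis.

```r
# Coefficients copied from the printed model output (rounded to 4 decimals)
b0 <- 15.0429   # intercept
b1 <- -0.2322   # slope for Temp

# Linear predictor (log-odds of thermal distress) at 31 degrees F
log_odds_31 <- b0 + b1 * 31

# Inverse logit: plogis(x) = 1 / (1 + exp(-x))
p_31 <- plogis(log_odds_31)
print(round(p_31, 4))
```

Because the coefficients are rounded, the result agrees with predict() only to about four decimal places.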

At what temperature does the predicted probability equal 0.5?

Setting the predicted probability to 0.5 makes the log-odds equal to 0, so b0 + b1*x = 0 and x = -b0/b1. With b0 = 15.0429 and b1 = -0.2322, this gives x = 15.0429/0.2322 = 64.78. The predicted probability equals 0.5 when the temperature is about 64.78 degrees F.
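A short check of this calculation, again assuming the rounded coefficients from the printed output (temp_half and p_check are hypothetical names introduced for illustration):

```r
b0 <- 15.0429   # intercept from the printed output
b1 <- -0.2322   # slope for Temp from the printed output

# p = 0.5 corresponds to log-odds 0, so solve b0 + b1 * x = 0 for x
temp_half <- -b0 / b1
print(round(temp_half, 2))

# Plugging the solution back in should return a probability of 0.5
p_check <- plogis(b0 + b1 * temp_half)
print(p_check)
```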

Interpret the odds ratio in this case.

exp(-0.2322)

## [1] 0.7927875

The odds of an O-ring suffering thermal distress are multiplied by 0.7928 for every one-degree increase in temperature. In other words, each additional degree F at launch lowers the odds of thermal distress by about 20.7%.
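To make the size of the effect concrete, the odds ratio can be converted to a percent change and scaled to a larger temperature difference. This is a sketch using the rounded slope from the output; the 10-degree comparison is an added illustration, not part of the assignment.

```r
b1 <- -0.2322                 # slope for Temp from the printed output

or_1deg <- exp(b1)            # odds ratio for a 1-degree increase
pct_change <- (or_1deg - 1) * 100   # percent change in the odds per degree

# Odds ratios multiply, so a 10-degree increase gives exp(10 * b1)
or_10deg <- exp(10 * b1)

print(round(c(or_1deg = or_1deg, pct_change = pct_change, or_10deg = or_10deg), 4))
```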

In what percentage of the 23 observations does your model predict the correct “value” for TD?

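fit <- fitted(model)               # fitted probabilities for the 23 flights
predTD <- numeric(23)              # start with every prediction set to 0
predTD[fit >= 0.25] <- 1           # classify as 1 when the fitted probability is at least 0.25
n <- data.frame(Shuttle$Temp, predTD)   # temperature next to the predicted class
tab <- table(Shuttle$TD, predTD)   # rows: observed TD, columns: predicted TD
tab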

##    predTD
##      0  1
##   0 10  6
##   1  3  4

Using the 0.25 cutoff on the fitted probabilities, the model correctly predicts the value of TD for 62.5% (10 of 16) of the launches that didn’t experience thermal distress and for 57.14% (4 of 7) of those that did.

Overall, the model correctly predicts the value of TD for 60.87% of the observations: (10 + 4) / (10 + 4 + 6 + 3) = 14/23.
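The percentages can be checked directly from the classification table. The sketch below re-enters the four counts from the printed table by hand (rows are observed TD, columns are predicted TD); the object names are illustrative.

```r
# Counts copied from the printed table: rows = observed TD, columns = predicted TD
conf <- matrix(c(10, 6,
                  3, 4),
               nrow = 2, byrow = TRUE,
               dimnames = list(TD = c("0", "1"), predTD = c("0", "1")))

accuracy    <- sum(diag(conf)) / sum(conf)         # correct predictions overall
specificity <- conf["0", "0"] / sum(conf["0", ])   # correct among flights without distress
sensitivity <- conf["1", "1"] / sum(conf["1", ])   # correct among flights with distress

print(round(100 * c(accuracy = accuracy,
                    specificity = specificity,
                    sensitivity = sensitivity), 2))
```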

Test whether thermal distress is independent of temperature. Do this with (i) a Z-test and (ii) a CI for the population odds ratio. In each case state the null and alternative hypotheses and your conclusion. Explain why the two tests are equivalent.

summary(model)

## 
## Call:
## glm(formula = TD ~ Temp, family = binomial, data = Shuttle)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0611  -0.7613  -0.3783   0.4524   2.2175  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  15.0429     7.3786   2.039   0.0415 *
## Temp         -0.2322     0.1082  -2.145   0.0320 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 28.267  on 22  degrees of freedom
## Residual deviance: 20.315  on 21  degrees of freedom
## AIC: 24.315
## 
## Number of Fisher Scoring iterations: 5

confint(model)

## Waiting for profiling to be done...

##                  2.5 %      97.5 %
## (Intercept)  3.3305848 34.34215133
## Temp        -0.5154718 -0.06082076

The null hypothesis is that thermal distress is independent of the temperature at take-off, that is, the log-odds of thermal distress do not change with temperature (beta1 = 0, or equivalently an odds ratio of 1). The alternative hypothesis is that the odds of thermal distress depend on the temperature at take-off (beta1 does not equal 0).

From the Z-test for Temp (z = -2.145, p-value = 0.0320), since the p-value is less than 0.05, we reject the null hypothesis at the 5% level of significance. This means thermal distress is not independent of temperature.

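From the CI, we can be 95% confident that beta1 lies between -0.5155 and -0.0608. Exponentiating the endpoints gives a 95% CI for the population odds ratio of about (0.597, 0.941). Because the interval for beta1 excludes 0 (equivalently, the interval for the odds ratio excludes 1), we reject the null hypothesis at the 5% level. Thermal distress is not independent of temperature.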

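The two tests are equivalent because a two-sided test at the 5% level rejects beta1 = 0 exactly when the corresponding 95% CI for beta1 excludes 0, which is the same as the CI for the odds ratio e^beta1 excluding 1. Strictly, this correspondence is exact for the Wald interval (estimate plus or minus 1.96 standard errors); confint() reports a profile-likelihood interval, which here leads to the same conclusion.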
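As a rough illustration of how the Z-test and the interval line up, the sketch below rebuilds the Wald interval from the rounded estimate and standard error in the summary output and exponentiates both it and the profile-likelihood limits printed by confint(). The variable names are assumptions made for this example.

```r
est <- -0.2322   # estimate for Temp from summary(model)
se  <- 0.1082    # its standard error from summary(model)

# Wald Z-test: z statistic and two-sided p-value
z <- est / se
p_two_sided <- 2 * pnorm(-abs(z))

# 95% Wald interval for beta1 and for the odds ratio exp(beta1)
wald_ci    <- est + c(-1, 1) * qnorm(0.975) * se
or_wald_ci <- exp(wald_ci)

# Profile-likelihood limits copied from the confint(model) output
profile_ci    <- c(-0.5154718, -0.06082076)
or_profile_ci <- exp(profile_ci)

print(round(c(z = z, p = p_two_sided), 4))
print(round(rbind(wald = or_wald_ci, profile = or_profile_ci), 4))
```

Both odds-ratio intervals lie entirely below 1, matching the rejection of beta1 = 0 by the Z-test.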

2. Every year in the United States, over 120,000 undergraduates submit applications in hopes of realizing their dreams to become physicians. Medical school applicants invest endless hours studying to boost their GPAs. They also invest considerable time in studying for the medical school admission test, or MCAT. The data set MedSchApp allows us to investigate the influence of these scores on the success of applicants. The data set is based on responses from 55 medical school applicants from a large Midwest university. The variables are listed below:

Count  Name    Description
55     Accept  1 = accepted, 0 = not accepted
55     Sex     F or M
55     GPA
55     MCAT    MCAT scores
55     NumApp  Number of medical school applications

MedSchApp <- read.csv("C:/Users/aizax94/Downloads/MedSchApp.CSV")
View(MedSchApp)

Obtain a contingency table of acceptance status by sex. Obtain two forms of the odds (of acceptance) ratio, one less than 1 and the other greater than 1. Interpret the two values.

x <- table(MedSchApp$Accept, MedSchApp$Sex)
y <- addmargins(x, 2)
y

##    
##      F  M Sum
##   0 10 15  25
##   1 18 12  30

100*prop.table(y,2)

##    
##            F        M      Sum
##   0 35.71429 55.55556 45.45455
##   1 64.28571 44.44444 54.54545

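The odds of being admitted to medical school are 18/10 = 1.8 for females and 12/15 = 0.8 for males. The odds ratio for males relative to females is 0.8/1.8 = 0.444: the odds of a male being admitted are 0.444 times those of a female. The odds ratio for females relative to males is 1.8/0.8 = 2.25: the odds of a female being admitted are 2.25 times those of a male.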
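The two forms of the odds ratio can be computed from the four cell counts. In this sketch the counts are typed in from the contingency table above rather than read from MedSchApp, and the object names are illustrative.

```r
# Counts copied from the contingency table: rows = Accept, columns = Sex
tab <- matrix(c(10, 15,
                18, 12),
              nrow = 2, byrow = TRUE,
              dimnames = list(Accept = c("0", "1"), Sex = c("F", "M")))

# Odds of acceptance within each sex: accepted / not accepted
odds_accept <- tab["1", ] / tab["0", ]

# The two forms of the odds ratio are reciprocals of each other
or_M_vs_F <- odds_accept[["M"]] / odds_accept[["F"]]   # less than 1
or_F_vs_M <- 1 / or_M_vs_F                             # greater than 1

print(odds_accept)
print(round(c(M_vs_F = or_M_vs_F, F_vs_M = or_F_vs_M), 4))
```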

Fit a logistic regression model predicting acceptance status from sex. What is the slope of the line? What is e^slope?

model2 <- glm(Accept ~ Sex, binomial, MedSchApp)
model2

## 
## Call:  glm(formula = Accept ~ Sex, family = binomial, data = MedSchApp)
## 
## Coefficients:
## (Intercept)         SexM  
##      0.5878      -0.8109  
## 
## Degrees of Freedom: 54 Total (i.e. Null);  53 Residual
## Null Deviance:       75.79 
## Residual Deviance: 73.59     AIC: 77.59

exp(-0.8109)

## [1] 0.4444579

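The slope of the line is -0.8109. This means that the predicted log-odds of being admitted to medical school are 0.8109 lower for males than for females. e^(slope) = 0.4444 is the odds ratio: the odds of admission for males are 0.4444 times the odds for females, which matches the value (12/15)/(18/10) from the contingency table.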
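The link between the fitted coefficients and the contingency table can be seen by back-transforming them. This is a sketch that assumes the rounded coefficients 0.5878 and -0.8109 from the output, with F as the reference level; the names are introduced only for this example.

```r
b0 <- 0.5878    # intercept: log-odds of acceptance for females (reference level)
b1 <- -0.8109   # SexM: change in log-odds for males relative to females

odds_F <- exp(b0)        # about 18/10
odds_M <- exp(b0 + b1)   # about 12/15

p_F <- plogis(b0)        # about 18/28
p_M <- plogis(b0 + b1)   # about 12/27

print(round(c(odds_F = odds_F, odds_M = odds_M, p_F = p_F, p_M = p_M), 4))
```

With a single binary predictor, the model reproduces the observed acceptance proportions for each sex, up to rounding of the coefficients.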

Fit a logistic regression model predicting acceptance status from MCAT, NumApp, and sex. Perform the overall (joint) test of significance. (i) Write the null and the alternative hypothesis and your conclusion. (ii) Perform the three Z tests. Which of the three variables would you definitely throw out? Which one would you definitely retain? Which one might you keep had the Z-test been one-sided?

model3 <- glm(Accept ~ MCAT + NumApp + Sex, binomial, MedSchApp)
summary(model3)

## 
## Call:
## glm(formula = Accept ~ MCAT + NumApp + Sex, family = binomial, 
##     data = MedSchApp)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9331  -0.9523   0.4993   1.0016   1.8226  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -8.850244   3.474122  -2.547  0.01085 * 
## MCAT         0.265827   0.094524   2.812  0.00492 **
## NumApp      -0.006829   0.062359  -0.110  0.91280   
## SexM        -1.072282   0.636063  -1.686  0.09183 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 75.791  on 54  degrees of freedom
## Residual deviance: 61.699  on 51  degrees of freedom
## AIC: 69.699
## 
## Number of Fisher Scoring iterations: 4

i The null hypothesis is that none of the three predictors is related to acceptance, that is, the coefficients of MCAT, NumApp, and SexM are all 0, so the model does no better than the intercept-only model. The alternative hypothesis is that at least one of these coefficients is not 0.

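For the joint test, the drop in deviance is G = 75.791 - 61.699 = 14.092 on 54 - 51 = 3 degrees of freedom, which gives a chi-square p-value of about 0.0028. Since this is less than 0.05, we reject the null hypothesis and conclude that at least one of MCAT, NumApp, and Sex is a useful predictor of acceptance.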
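A minimal sketch of the overall (likelihood ratio) test, using only the null and residual deviances and their degrees of freedom as printed in the summary output. The names null_dev, resid_dev, G and df_diff are illustrative.

```r
null_dev  <- 75.791   # null deviance, 54 degrees of freedom
resid_dev <- 61.699   # residual deviance, 51 degrees of freedom

# Test statistic: drop in deviance when the three predictors are added
G <- null_dev - resid_dev
df_diff <- 54 - 51

# Upper-tail chi-square probability
p_value <- pchisq(G, df = df_diff, lower.tail = FALSE)

print(c(G = G, df = df_diff))
print(signif(p_value, 3))
```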

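ii The p-value for MCAT is 0.00492, for NumApp it is 0.9128, and for SexM it is 0.09183. Based on these numbers I would definitely keep MCAT in the model and definitely throw out NumApp. If the Z-test were one-sided (in the direction of the estimate), the p-value for SexM would be halved to about 0.046, which is below 0.05, so I might keep SexM as well.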
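The one-sided p-values can be sketched from the z values in the summary table. This assumes each one-sided alternative points in the direction of the observed estimate, which is the only case in which the one-sided p-value is half the two-sided one.

```r
# z values and two-sided p-values copied from summary(model3)
z <- c(MCAT = 2.812, NumApp = -0.110, SexM = -1.686)
p_two_sided <- c(MCAT = 0.00492, NumApp = 0.91280, SexM = 0.09183)

# One-sided p-value in the direction of each estimate
p_one_sided <- pnorm(-abs(z))

print(round(cbind(z, p_two_sided, p_one_sided), 4))
```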

Obtain the three odds ratios. Interpret the odds ratio for sex. Why is it different to that in part (b)?

# An odds ratio is exp(coefficient); the intercept is not part of it
oddsMCAT <- exp(0.265827)
oddsNumApp <- exp(-0.006829)
oddsSexM <- exp(-1.072282)

oddsMCAT

## [1] 1.304509

oddsNumApp

## [1] 0.9931943

oddsSexM

## [1] 0.3422267

After adjusting for the number of applications and the applicant's sex, the odds of being admitted to med school are multiplied by 1.3045 (an increase of about 30.5%) with every one-point increase in MCAT score.

After adjusting for MCAT scores and the applicant's sex, the odds of being admitted to med school are multiplied by 0.9932 for every additional application that is submitted, which is essentially no change.

After adjusting for MCAT scores and the number of applications, the odds of being admitted to med school for males are 0.3422 times the odds for female applicants. This odds ratio is different from the 0.4444 obtained in part (b) because that value was unadjusted, while in this model we are adjusting for MCAT scores and the number of applications.
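For reference, the three odds ratios can be computed together as exp(coefficient), along with approximate 95% Wald intervals. This is a sketch built from the estimates and standard errors printed in summary(model3); the intervals are an added illustration and were not part of the original answer.

```r
# Estimates and standard errors copied from summary(model3)
est <- c(MCAT = 0.265827, NumApp = -0.006829, SexM = -1.072282)
se  <- c(MCAT = 0.094524, NumApp = 0.062359, SexM = 0.636063)

# Odds ratio for each predictor: exponentiate the coefficient only
odds_ratio <- exp(est)

# Approximate 95% Wald limits on the odds-ratio scale
z_crit <- qnorm(0.975)
or_table <- cbind(OR    = odds_ratio,
                  lower = exp(est - z_crit * se),
                  upper = exp(est + z_crit * se))

print(round(or_table, 4))
```

An interval that contains 1 corresponds to a predictor whose two-sided Z-test is not significant at the 5% level.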

AizaKabeer_Assignment_M11

Aiza K

April 26, 2016