A.

[All Students] Using the “LI” data from pp.113-114, exercise 4.1,

(a)

Plot the jittered data (as on p.92) with the GAM and logistic models fits superimposed (and with a legend for the two plots in the top left corner of the plot) – use two distinct colors for the plots

LI = c(8,8,10,10,12,12,12,14,14,14,16,16,16,18,20,20,20,22,22,24,26,28,32,34,38,38,38)
y = c(rep(0,13), rep(1,3), 0,1,0,0,1,1,0,1,1,1,0)
fit <- glm(y ~ LI, family = binomial)
 plot(jitter(y, 0.08) ~ LI, pch = 19, col = "purple", xlab = "Percentage of Labeled Cells", ylab = "Remission")
library(gam)

## Loading required package: splines

## Loading required package: foreach

## Loaded gam 1.16

gam.fit <- gam(y ~ s(LI), family = binomial)
curve(predict(gam.fit, data.frame(LI = x), type = "resp"), col = "blue", lty = 4, lwd = 3, add = T)
curve(predict(fit, data.frame(LI = x), type = "resp"), col = "darkgreen", lty = 2, lwd = 3, add = T)
legend("topleft", legend=c("Logistic Fit","GAM Fit"), lty = c(2, 4), lwd = 3, col = c("darkgreen", "blue"))

(b)

Showing all needed calculations, find, report and interpret the 𝑳𝑫𝟕𝟓 – the point on the x axis where the predicted response reaches 𝟕𝟓% – and clearly indicate the units

summary(fit)

## 
## Call:
## glm(formula = y ~ LI, family = binomial)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9448  -0.6465  -0.4947   0.6571   1.6971  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -3.77714    1.37862  -2.740  0.00615 **
## LI           0.14486    0.05934   2.441  0.01464 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 34.372  on 26  degrees of freedom
## Residual deviance: 26.073  on 25  degrees of freedom
## AIC: 30.073
## 
## Number of Fisher Scoring iterations: 4

Using the Logit model:

#log(.75/.25) = -3.77714 + 0.14486*LI
#Solving with algebra
(log(.75/.25)+3.77714)/.14486

## [1] 33.65838

#Checking Answer
predict(fit, data.frame(LI = 33.61875), type = "response")

##         1 
## 0.7489423

Holly found a package that is more accurate for some reason.

library(MASS)
dose.p(fit, p = .75)

##               Dose       SE
## p = 0.75: 33.65763 5.931823

ld75 = 33.65753
predict(fit, data.frame(LI = ld75), type = "response")

##         1 
## 0.7499972

Units are “Percentage of”labeled" cells.

(c)

Showing all needed calculations, find, report and interpret the 𝑳𝑫𝟏𝟎 – the point on the x axis where the predicted response reaches 𝟏𝟎% – and clearly indicate the units.

(log(.1/.9)+3.7714)/.14486

## [1] 10.86687

dose.p(fit, p = .10)

##              Dose       SE
## p = 0.1: 10.90626 5.566491

ld10 = 10.90626
predict(fit, data.frame(LI = ld10), type = "response")

##   1 
## 0.1

Following the same logic as (b), units are “Percentage of”labeled" cells.

B.

[All Students] p. 115, exercise 4.8. Show all work in your calculations and attach R programs and output/graph. Be clear and detailed in your interpretation of the AUC in part (d).

a.

crabs <- read.table("http://www.stat.ufl.edu/~aa/cat/data/Crabs.dat", header = T)
fitB <- glm(y ~ weight, family = binomial, data=crabs)
summary(fitB)

## 
## Call:
## glm(formula = y ~ weight, family = binomial, data = crabs)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.1108  -1.0749   0.5426   0.9122   1.6285  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -3.6947     0.8802  -4.198 2.70e-05 ***
## weight        1.8151     0.3767   4.819 1.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 225.76  on 172  degrees of freedom
## Residual deviance: 195.74  on 171  degrees of freedom
## AIC: 199.74
## 
## Number of Fisher Scoring iterations: 4

$logit(pi) = -3.6947 + 1.8151*weight$

#i
exp(1.8151)

## [1] 6.14169

This is the estimated multiplicative increase in odds of having a satellite for each 1-kg increase in weight.

#ii
exp(1.8151*.10)

## [1] 1.199027

This is the estimated multiplicative increase in odds of having a satellite for each .10-kg increase in weight.

#iii
exp(1.8151*.58)

## [1] 2.865543

This is the estimated multiplicative increase in odds of having a satellite for each .58-kg increase in weight.

b

exp(1.8151*.10)

## [1] 1.199027

This is the estimated multiplicative increase in odds of having a satellite for each .10-kg increase in weight.

c

prop = sum(crabs$y)/nrow(crabs);prop

## [1] 0.6416185

predicted = as.numeric(fitted(fitB) > prop)

t = xtabs(~crabs$y + predicted);addmargins(t)

##        predicted
## crabs$y   0   1 Sum
##     0    45  17  62
##     1    43  68 111
##     Sum  88  85 173

Sensitivity = 68/111; Sensitivity

## [1] 0.6126126

Specificity = 45/62; Specificity

## [1] 0.7258065

#Correct Total:
(68+45)/173

## [1] 0.6531792

Thus the Model correctly predicts a satellite 0.6126126 and correctly predicts not having a satellite 0.7258065.

#comparing to .5
predicted2 = as.numeric(fitted(fitB) > .5)

t2 = xtabs(~crabs$y + predicted2);addmargins(t2)

##        predicted2
## crabs$y   0   1 Sum
##     0    27  35  62
##     1    20  91 111
##     Sum  47 126 173

d

 library(pROC)

## Type 'citation("pROC")' for a citation.

## 
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

        rocplot <- roc(y ~ fitted(fitB), data = crabs)
        plot.roc(rocplot, legacy.axes = T, col = "purple", lwd = 3, lty = 1)

        auc = auc(rocplot); auc

## Area under the curve: 0.7379

The concordance index 0.7379396 estimates the probability that the model will correctly predict if crabs will have at least one satellite or not. If the model was just guessing, then the probability would be 0.5. Therefore this model seems to be better at predicting than just guessing.

C.

[All Students] pp. 116-117, exercise 4.12. Clearly label your answers with these designations:

(a)

Fit the binomial logit-link logistic model using R (attach program and output)

df = read.table("http://users.stat.ufl.edu/~aa/cda/data/MBTI.dat", header = TRUE)
fitC = glm(drink/n ~ EI + SN + TF + JP , family = binomial, weights = n, data=df)
summary(fitC)

## 
## Call:
## glm(formula = drink/n ~ EI + SN + TF + JP, family = binomial, 
##     data = df, weights = n)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2712  -0.8062  -0.1063   0.1124   1.5807  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -2.1140     0.2715  -7.788 6.82e-15 ***
## EIi          -0.5550     0.2170  -2.558  0.01053 *  
## SNs          -0.4292     0.2340  -1.834  0.06664 .  
## TFt           0.6873     0.2206   3.116  0.00184 ** 
## JPp           0.2022     0.2266   0.893  0.37209    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 30.488  on 15  degrees of freedom
## Residual deviance: 11.149  on 11  degrees of freedom
## AIC: 73.99
## 
## Number of Fisher Scoring iterations: 4

(b) and (c)

Report the fitted or prediction equation and clearly indicate how you (or R) set up the 4 variables (c) Clearly/thoroughly explain why the highest predicted probability is associated with the ENTP run Add the following parts:

$logit(pi) = -2.114 - .555*EI - .4292* SN + .6873*TF + .2022*JP$ where pi is the probability of reporting drinking alcohol frequenty, 1 for EI is I, 1 for SN is S, 1 for TF is T, and 1 for JP is P. Since the Estimate for the Betas corresponding to EI and SN are negative, with those “off” (AKA NOT introverted and NOT Sensing), and with values of 1 for TF (Thinking) and JP (Perceiving) we get the maximum value for our logit response (hence ENTP is the highest value for the logit).

(d)

Using the prediction equation, give the predicted probability associated with this ENTP run showing all relevant work

predict(fitC, data.frame(EI = "e", SN = "n", TF = "t", JP = "p"), type = "response")

##         1 
## 0.2271486

(e)

Using R (and providing your R program and output), provide a table (i.e., the R output) with each of the 16 cases here as rows – include the raw data, the predicted probabilities (also called fitted values), and the 95% confidence intervals for the predicted probabilities. Hint: see p.96.

untransform <- function(x) {
      exp(x)/ (1 + exp(x))
      }
pred.prob = fitted(fitC)
lp = predict(fitC, se.fit = TRUE)
LB = lp$fit-qnorm(.975)*lp$se.fit
UB = lp$fit+qnorm(.975)*lp$se.fit
LB.p = untransform(LB)
UB.p = untransform(UB)
cbind(as.character(df$EI), as.character(df$SN), as.character(df$TF), as.character(df$JP), pred.prob, LB.p, UB.p)

##                    pred.prob            LB.p                
## 1  "e" "s" "t" "j" "0.135185995617697"  "0.0931172482722204"
## 2  "e" "s" "t" "p" "0.160618495610535"  "0.104683668260431" 
## 3  "e" "s" "f" "j" "0.0728847929649104" "0.0475781110856337"
## 4  "e" "s" "f" "p" "0.0877863430939439" "0.056945742765649" 
## 5  "e" "n" "t" "j" "0.193611499337586"  "0.122230341820276" 
## 6  "e" "n" "t" "p" "0.227148569323643"  "0.150430654325489" 
## 7  "e" "n" "f" "j" "0.107739004955063"  "0.0662299916180012"
## 8  "e" "n" "f" "p" "0.128776805961202"  "0.087368053391938" 
## 9  "i" "s" "t" "j" "0.0823472222532527" "0.0557651787705972"
## 10 "i" "s" "t" "p" "0.0989768623938524" "0.0634001298871951"
## 11 "i" "s" "f" "j" "0.0431811803063737" "0.0272291405566716"
## 12 "i" "s" "f" "p" "0.0523526558573392" "0.032964391018237" 
## 13 "i" "n" "t" "j" "0.121135227579985"  "0.0727636218369248"
## 14 "i" "n" "t" "p" "0.144365624878002"  "0.0911440303870178"
## 15 "i" "n" "f" "j" "0.0648240222100406" "0.0376315312815638"
## 16 "i" "n" "f" "p" "0.0782165584699526" "0.0500686110443195"
##    UB.p                
## 1  "0.192231981663945" 
## 2  "0.238479683006856" 
## 3  "0.11009605188184"  
## 4  "0.132974420092835" 
## 5  "0.292774426605533" 
## 6  "0.327891347602674" 
## 7  "0.17051280884275"  
## 8  "0.185815618486869" 
## 9  "0.119990147919009" 
## 10 "0.15129251895222"  
## 11 "0.0678270189260874"
## 12 "0.0821752633991756"
## 13 "0.194903955094926" 
## 14 "0.221104337446394" 
## 15 "0.109431440893027" 
## 16 "0.120186799360624"

#When I thought I was being clever...:
#dat = expand.grid(EI = c("e","i"), SN = c("s","n"), TF = c("t","f"), JP = c("j", "p"))
#predict(fitC, data.frame(dat), type = "response")

D.

[All Students] pp. 117-118, exercise 4.14. Using these data (accessed via the Infection data file on the text website), fit the related logistic model(s) to do the following:

D = read.table("http://users.stat.ufl.edu/~aa/cda/data/Infection.dat", header = TRUE)
fitD = glm(y/n ~ factor(center) + treat, family = binomial, weights = n, data=D)
summary(fitD)

## 
## Call:
## glm(formula = y/n ~ factor(center) + treat, family = binomial, 
##     data = D, weights = n)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.87919  -0.77729  -0.00401   0.47293   0.92934  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -1.3220     0.3165  -4.177 2.95e-05 ***
## factor(center)2   2.0554     0.4201   4.893 9.94e-07 ***
## factor(center)3   1.1529     0.4246   2.715  0.00662 ** 
## factor(center)4  -1.4185     0.6636  -2.138  0.03255 *  
## factor(center)5  -0.5199     0.5338  -0.974  0.33007    
## factor(center)6  -2.1469     1.0614  -2.023  0.04310 *  
## factor(center)7  -0.7977     0.8149  -0.979  0.32764    
## factor(center)8   2.2079     0.7195   3.069  0.00215 ** 
## treat             0.7769     0.3067   2.533  0.01130 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 93.5545  on 15  degrees of freedom
## Residual deviance:  9.7463  on  7  degrees of freedom
## AIC: 66.136
## 
## Number of Fisher Scoring iterations: 4

(a)

Using the likelihood-based test, test whether one can accept that the odds ratios are equal across the 8 centers. Here, ‘odds’ refers to the Success as response comparing Drug and Control. Clearly give the hypotheses, test statistic, degrees of freedom, p-value, and your clear conclusion.

$H_0:\theta_2 = \theta_3 =...= \theta_8$ $H_a:$ At least one $\theta_i \neq \theta_j$ Where $i\neq j$ and $\theta$ is the odds ratios of success at each center. Note $\theta_2$ refers to center 2 and $\alpha$ absorbs center 1 and $\theta_1$ corresponds to treatment.

fit.red = glm(y/n ~ treat, family = binomial, weights = n, data=D)
an = anova(fitD, fit.red, test = "LRT"); an

## Analysis of Deviance Table
## 
## Model 1: y/n ~ factor(center) + treat
## Model 2: y/n ~ treat
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1         7      9.746                          
## 2        14     90.960 -7  -81.214 7.788e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

stat = an$Deviance[2]
df = an$Df[2]

With a difference in Deviance of -81.2139004 and a corresponding p-value of 7.788215110^{-15} < $\alpha = 0.05$ we reject tjhe null and conclude that at least one odds ratio of success at a center is different.

(b)

In the logistic model involving only center (as a factor) and treatment/drug (as a dummy variable), obtain and clearly interpret the estimated odds ratio associated with (i) center 2

OR.Center2 <- exp(coefficients(fitD)[2]); OR.Center2

## factor(center)2 
##         7.81023

Controlling for treatment/drug, the estimated odds of success at center 2 is 7.8102302 times the estimated odds for the other centers.

center 4, and

OR.Center4 <- exp(coefficients(fitD)[4]); OR.Center4

## factor(center)4 
##       0.2420881

Controlling for treatment/drug, the estimated odds of success at center 4 is 0.2420881 times the estimated odds for the other centers.

treatment/drug

OR.treat <- exp(coefficients(fitD)[9]); OR.treat

##    treat 
## 2.174764

Controlling for center, the estimated odds of success at for treatment is 2.1747643 times the estimated odds of control.

(c)

In the logistic model involving only center (as a factor) and treatment/drug (as a dummy variable), use the likelihood-based test to test whether treatment/drug is significant in this model. Give the hypotheses, test statistic, p-value, and your clear conclusion.

$H_0:\beta_{treat} =0$ $H_a:\beta_{treat} \neq 0$

fit.null = glm(y/n ~ factor(center), family = binomial, weights = n, data=D)
an = anova(fitD, fit.null, test = "LRT"); an

## Analysis of Deviance Table
## 
## Model 1: y/n ~ factor(center) + treat
## Model 2: y/n ~ factor(center)
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)   
## 1         7     9.7463                        
## 2         8    16.4151 -1  -6.6688 0.009811 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

stat = an$Deviance[2]
df = an$Df[2]

With a difference in Deviance of -6.6688176 with a corresponding p-value of 0.0098114 < $= 0.05 $ we reject tjhe null and conclude that treatment/drug is significant in this model.

E.

[STAT-410 students only] pp. 118-119, exercise 4.16. Perform the likelihood test in part (a), clearly giving the hypotheses, test statistic, p-value and your clear answer. Add the following parts:

Input = ("
D  t  Y  
45 0 0 
15 0 0 
40 0 1 
83 1 1 
90 1 1 
25 1 1
35 0 1
65 0 1
95 0 1
35 0 1
75 0 1
45 1 1
50 1 0
75 1 1
30 0 0
25 0 1
20 1 0
60 1 1
70 1 1
30 0 1
60 0 1
61 0 0
65 0 1
15 1 0
20 1 0
45 0 1
15 1 0
25 0 1
15 1 0
30 0 1
40 0 1
15 1 0
135 1 1
20 1 0
40 1 0
")
dfE = read.table(textConnection(Input),header=TRUE)

a. Fit model with interaction

fitE = glm(Y ~ D + t + D*t, family = binomial, data=dfE)
summary(fitE)

## 
## Call:
## glm(formula = Y ~ D + t + D * t, family = binomial, data = dfE)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.9707  -0.3779   0.3448   0.7292   1.9961  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  0.04979    1.46940   0.034   0.9730  
## D            0.02848    0.03429   0.831   0.4062  
## t           -4.47224    2.46707  -1.813   0.0699 .
## D:t          0.07460    0.05777   1.291   0.1966  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 46.180  on 34  degrees of freedom
## Residual deviance: 28.321  on 31  degrees of freedom
## AIC: 36.321
## 
## Number of Fisher Scoring iterations: 6

a(i). t = 1

When t = 1: $logit(pi) = 0.04979 + 0.02848*D - 4.47224* (1) + 0.07460*D*(1)$ $logit(pi) = -4.42245 + 0.10308*D$

exp(0.10308)

## [1] 1.10858

So when using a tracheal tube, the estimated odds of having a sore throat on waking from surgery increases by a multiplicative factor of 1.10858 for each minute increasing in duration of surgery.

a(ii). t = 0

When t = 0: $logit(pi) = 0.04979 + 0.02848*D - 4.47224* (0) + 0.07460*D*(0)$ $logit(pi) = 0.04979 + 0.02848*D$

exp(0.02848)

## [1] 1.028889

So when using a laryngeal mask, the estimated odds of having a sore throat on waking from surgery increases by a multiplicative factor of 1.028889 for each minute increasing in duration of surgery.

a(iii). testing the interaction term

$H_0:\beta_{interaction} =0$ $H_a:\beta_{interaction} \neq 0$

fitE2 = glm(Y ~ D + t, family = binomial, data=dfE)
aovE = anova(fitE, fitE2, test = "LRT"); aovE

## Analysis of Deviance Table
## 
## Model 1: Y ~ D + t + D * t
## Model 2: Y ~ D + t
##   Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1        31     28.321                     
## 2        32     30.138 -1  -1.8169   0.1777

With a difference in deviance of NA, -1.8168856 and since the p-value 0.1776844 > $\alpha = .05$, we fail-to-reject the null and cannot conclude the interaction term is significant in this model. So we can remove it.

(b)

cor(dfE$Y, fitted(fitE))

## [1] 0.6598764

cor(dfE$Y, fitted(fitE2))

## [1] 0.6528899

Since these are very close, the simpler model (without interaction) does essentially as well as the full model.

(c)

In the model with no interaction term (between 𝒅 and 𝒕), obtain and clearly interpret the odds ratios for both 𝒅 (duration) and 𝒕 (type of device)

exp(fitE2$coefficients)

## (Intercept)           D           t 
##   0.2423575   1.0710910   0.1903389

The estimated odds of having a sore throat after waking up from surgery increases by a multiplicative factor of 1.071091 for each additional minute of surgery (duration), given type of device t stays constant.

The estimated odds of having a sore throat after waking up from surgery increases by a multiplicative factor of 0.1903389 when changing from laryngeal mask to a tracheal tube, given duration of surgery d stays constant.

(d)

For the model with the 𝒅 × 𝒕 interaction term, using R obtain and submit the fitted logistic curves for the 𝒕 = 𝟎 and 𝒕 = 𝟏 cases juxtaposed on the same plot

#range(dfE$D) Use this to find range of xx
xx <- seq(15, 135, length = 1000)
yy1 = predict(fitE, data.frame(D = xx, t = 0), type = "response")
yy2 = predict(fitE, data.frame(D = xx, t = 1), type = "response")
xrange = range(xx)
yrange = c(0,1)
plot(xx, yy1, xlim = xrange, ylim = yrange, col = "blue",  xlab = "Duration of Surgery", ylab = "Percent Sore Throat")
par(new = T)
plot(xx, yy2, xlim = xrange, ylim = yrange, col = "red", xlab = "", ylab = "")
legend("bottomright", lwd = 3, lty = c(1, 1), col = c("blue", "red"), legend = c("t = 0 Laryngeal Mask", "t = 1 Tracheal Tube"))

(e)

For the model without the 𝒅 × 𝒕 interaction term, using R obtain and submit the fitted logistic curves for the 𝒕 = 𝟎 and 𝒕 = 𝟏 cases juxtaposed on the same plot.

xx <- seq(15, 135, length = 1000)
yy1 = predict(fitE2, data.frame(D = xx, t = 0), type = "response")
yy2 = predict(fitE2, data.frame(D = xx, t = 1), type = "response")
xrange = range(xx)
yrange = c(0,1)
plot(xx, yy1, xlim = xrange, ylim = yrange, col = "blue", xlab = "Duration of Surgery", ylab = "Percent Sore Throat")
par(new = T)
plot(xx, yy2, xlim = xrange, ylim = yrange, col = "red", xlab = "", ylab = "")
legend("bottomright", lwd = 3, lty = c(1, 1), col = c("blue", "red"),
             legend = c("t = 0 Laryngeal Mask", "t = 1 Tracheal Tube"))

Figuring it out using curve function instead of plot:

curve(predict(fitE2, data.frame(D = x, t = 0), type = "resp"), col = "blue", lty = 4, lwd = 3, xlim = xrange, ylim = yrange, xlab = "Duration of Surgery", ylab = "Percent Sore Throat")
par(new = T)
curve(predict(fitE2, data.frame(D = x, t = 1), type = "resp"), col = "red", lty = 1, lwd = 3, xlim = xrange, ylim = yrange, xlab = "", ylab = "")
legend("bottomright", lwd = 3, lty = c(4, 1), col = c("blue", "red"), legend = c("t = 0 Laryngeal Mask", "t = 1 Tracheal Tube"))

(f)

For the model in part (e), give the estimated 𝐿𝐷50s for 𝒅 in both the 𝒕 = 𝟎 and 𝒕 = 𝟏 cases. F.

Using on the plots (since this is estimated) the LD50 for t = 0 (Laryngeal Mask) is about 23 minutes of surgery, while the LD50 for t = 1 (Tracheal Tube) is about 44 minutes of surgery.

Now to calculate them:

summary(fitE2)

## 
## Call:
## glm(formula = Y ~ D + t, family = binomial, data = dfE)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.3802  -0.5358   0.3047   0.7308   1.7821  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -1.41734    1.09457  -1.295  0.19536   
## D            0.06868    0.02641   2.600  0.00931 **
## t           -1.65895    0.92285  -1.798  0.07224 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 46.180  on 34  degrees of freedom
## Residual deviance: 30.138  on 32  degrees of freedom
## AIC: 36.138
## 
## Number of Fisher Scoring iterations: 5

$logit(pi) = -1.41734 + 0.06868*D - 1.65895*t$

t = 0: $logit(pi) = -1.41734 + 0.06868*D$ ld50 for t = 0

(log(.5/.5) + 1.41734)/ 0.06868

## [1] 20.63687

#Note log(1)=0, so ld50 is just alpha/beta

t = 1: $logit(pi) = -1.41734 + 0.06868*D - 1.65895*1$ $\iff logit(pi) = -3.07529 + 0.06868*D$ ld50 for t = 1

(log(.5/.5) + 3.07629)/ 0.06868

## [1] 44.79164

These are about the same as my estimations from the graph.

F.

[STAT-410 students only] p. 120, exercise 4.22. Using the reported data, obtain and report both logistic model estimated parameters (𝛼̂ and 𝛽). Show all relevant calculations.

Model: $logit(pi) = \alpha + \beta*x$

With x = 18, pi = 80: $log(.80/.20) = \alpha + \beta*18$

and with x = 65, pi = 20: $log(.20/.80) = \alpha + \beta*65$

So solving the system using elimination. I multiply the second equation by -1 and add them together: $log(.80/.20) = \alpha + \beta*18$ $+ - log(.20/.80) = -\alpha - \beta*65$

Yields: $log(.80/.20) - log(.20/.80) = -47\beta$

Solving for $\beta$:

(log(.8/.2)-log(.2/.8))/47

## [1] 0.05899125

So $\beta \approx 0.05899125$

410 HW4

Alex Blohm

3/6/2019

A.

(a)

(b)

(c)

B.

a.

b

c

d

C.

(a)

(b) and (c)

(d)

(e)

D.

(a)

(b)

(c)

E.

a. Fit model with interaction

a(i). t = 1

a(ii). t = 0

a(iii). testing the interaction term

(b)

(c)

(d)

(e)

(f)

F.