Week 4 homework

Exercise One

For this exercise, you will use the ICU dataset. You can download the icu.dta Stata file, or you can also access the data through this CSV file.

Complete the following:

a) Consider the ICU data and use as the outcome variable vital status (STA) and race (RACE) as a covariate. Prepare a table showing the coding of the two design variables for RACE using the value RACE = 1, white, as the reference group.

icu_cc <- read.csv("icu_correct_coding.csv")
names(icu_cc) <- tolower(names(icu_cc))
icu_cc[,2] <- as.factor(icu_cc[,2])
icu_cc[,5] <- factor(icu_cc[,5], levels = c("1", "2", "3"), labels = c("White", "Black", "Other"))
fit.icu <- glm(sta ~ race, data = icu_cc, family = binomial(link = logit))
summary(fit.icu)

## 
## Call:
## glm(formula = sta ~ race, family = binomial(link = logit), data = icu_cc)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.6893  -0.6893  -0.6893  -0.3715   2.3272  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.31634    0.18513  -7.110 1.16e-12 ***
## raceBlack   -1.32272    1.05152  -1.258    0.208    
## raceOther   -0.06996    0.81196  -0.086    0.931    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 200.16  on 199  degrees of freedom
## Residual deviance: 197.90  on 197  degrees of freedom
## AIC: 203.9
## 
## Number of Fisher Scoring iterations: 5

contrasts(icu_cc$race)

##       Black Other
## White     0     0
## Black     1     0
## Other     0     1

b) Show that the estimated log-odds ratios obtained from the cross-classification of STA by RACE, using RACE = 1 as the reference group, are identical to estimated slope coefficients for the two design variables from the logistic regression of STA on RACE.

\[logit(race = White) = \ln{\pi}{1 - \pi} = \beta_0 - 1.323*Black - 0.07*Other\]

library(car)
(LogOdds1 <- logit(predict(fit.icu, newdata = data.frame(race = "White"), type = "response")))

##         1 
## -1.316336

(LogOdds2 <- logit(predict(fit.icu, newdata = data.frame(race = "Black"), type = "response")))

##         1 
## -2.639057

(LogOdds3 <- logit(predict(fit.icu, newdata = data.frame(race = "Other"), type = "response")))

##         1 
## -1.386294

(OR21 <- exp(LogOdds2 - LogOdds1))   #odds ratio of dying for people of black race compared to people of white race

##         1 
## 0.2664093

log(OR21)    # taking natural logarithm of odds ratio

##         1 
## -1.322722

coef(fit.icu)[2]

## raceBlack 
## -1.322722

(OR31 <- exp(LogOdds3 - LogOdds1))  # ratio of odds of death for people of other races, that are not black, versus whites.

##         1 
## 0.9324324

log(OR31)  #log of odds ratio

##           1 
## -0.06995859

coef(fit.icu)[3]   #coefficient for race = 3, or "Other" from logistic regression output

##   raceOther 
## -0.06995859

c) Verify that the estimated standard errors of the estimated slope coefficients for the two design variables for RACE are identical to the square root of the sum of the inverse of the cell frequencies from the cross-classification of STA by RACE used to calculate the odds ratio.

(smt <- summary(fit.icu)$coef)

##                Estimate Std. Error     z value     Pr(>|z|)
## (Intercept) -1.31633577  0.1851308 -7.11030229 1.157891e-12
## raceBlack   -1.32272156  1.0515203 -1.25791343 2.084231e-01
## raceOther   -0.06995859  0.8119565 -0.08616051 9.313388e-01

(tbl <- table(icu_cc$sta, icu_cc$race))

##    
##     White Black Other
##   0   138    14     8
##   1    37     1     2

(beta_B <- smt[2,1])

## [1] -1.322722

(SE_Black <- sqrt(1/138 + 1/37 + 1/14 + 1/1))

## [1] 1.051524

smt[2, 2]

## [1] 1.05152

(beta_O <- smt[3, 1])

## [1] -0.06995859

(SE_Other <- sqrt(1/138 + 1/37 + 1/8 + 1/2))

## [1] 0.8119565

smt[3, 2]

## [1] 0.8119565

d) Use either set of computations to compute the 95% Confidence Interval for the odds ratios.

(CI_OR21 <- exp(beta_B + c(-1, 1) * qnorm(0.975, lower.tail = TRUE) * SE_Black))

## [1] 0.03392273 2.09222254

(CI31 <- exp(beta_O + c(-1, 1) * qnorm(0.975, lower.tail = TRUE) * SE_Other))

## [1] 0.1898798 4.5788458

Odds of dying are from 93% less to 2 times higher for blacks versus whites, while odds of dying for people of other races than black or white are from 81% lower to 4.5 times higher than odds of whites of dying when addmitted to intensive care unit.

Exercise Two

For this exercise, you will continue using the ICU dataset.

Complete the following:

a) Create design variables for RACE using the method typically employed in ANOVA.

#  "method typically employed in ANOVA" -> deviation from means coding:
(mt <- rbind( c( -1, -1), diag( c( 1, 1))))

##      [,1] [,2]
## [1,]   -1   -1
## [2,]    1    0
## [3,]    0    1

contrasts(icu_cc$race)

##       Black Other
## White     0     0
## Black     1     0
## Other     0     1

contrasts(icu_cc$race) <- mt
contrasts(icu_cc$race)  #deviation from means coding

##       [,1] [,2]
## White   -1   -1
## Black    1    0
## Other    0    1

b) Perform the logistic regression of STA on RACE.

fit.icu.dfmc <- glm(sta ~ race, data = icu_cc, family = binomial(link = logit))
summary(fit.icu.dfmc)

## 
## Call:
## glm(formula = sta ~ race, family = binomial(link = logit), data = icu_cc)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.6893  -0.6893  -0.6893  -0.3715   2.3272  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.7806     0.4385  -4.060  4.9e-05 ***
## race1        -0.8585     0.7412  -1.158    0.247    
## race2         0.3943     0.6330   0.623    0.533    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 200.16  on 199  degrees of freedom
## Residual deviance: 197.90  on 197  degrees of freedom
## AIC: 203.9
## 
## Number of Fisher Scoring iterations: 5

(coefs <- fit.icu.dfmc$coef)

## (Intercept)       race1       race2 
##  -1.7805625  -0.8584948   0.3942681

c) Show by calculation that the estimated logit differences of RACE = 2 versus RACE = 1 and RACE = 3 versus RACE = 1 are equivalent to the values of the log-odds ratio obtained in exercise 1.

(logitW <- logit(predict(fit.icu.dfmc, newdata = data.frame(race = "White"), type = "response")))

##         1 
## -1.316336

coefs[1] + coefs[2] * (-1) + coefs[3] * (-1)  # equivalent to logitW, as well as the intercept from the reference cell coding fit

## (Intercept) 
##   -1.316336

(logitB <- logit(predict(fit.icu.dfmc, newdata = data.frame(race = "Black"), type = "response")))

##         1 
## -2.639057

coefs[1] + coefs[2] * 1 + coefs[3] * 0

## (Intercept) 
##   -2.639057

(logitO <- logit(predict(fit.icu.dfmc, newdata = data.frame(race = "Other"), type = "response")))

##         1 
## -1.386294

coefs[1] + coefs[2] * 0 + coefs[3] * 1

## (Intercept) 
##   -1.386294

coef(fit.icu)  # coefficients from reference cell coding

## (Intercept)   raceBlack   raceOther 
## -1.31633577 -1.32272156 -0.06995859

(logOR21 <- logitB - logitW)   # identical to coefficient for race = "Black" from ref. cell coding

##         1 
## -1.322722

(logOR31 <- logitO - logitW)   # identical to coefficient for race = "Other" from rcc

##           1 
## -0.06995859

d) Use the results of the logistic regression to obtain the 95% Confidence Interval for the odds ratios and verify that they are the same limits as obtained in exercise 1. Note that the estimated covariance matrix for the estimated coefficients is needed to obtain the estimated variances of the logit differences.

\(in\ deviation\ from\ means\ coding:\) \[logit(race = "White") = \beta_0 + \beta_1 * (-1) + \beta_2 * (-1)\] \[logit(race = "Black") = \beta_0 + \beta_1 * 1 + \beta_2 * 0\] \[logit(race = "Other") = \beta_0 + \beta_1 * 0 + \beta_2 * 1\] \[\log(OR("Black", "White")) = logit(race = "Black") - logit(race = "White")\]

\[\log(OR("Black", "White")) = \beta_0 + \beta_1 - \beta_0 + \beta_1 + \beta_2 = 2 * \beta_1 + \beta_2\] \[Var(log(OR("Black", "White"))) = (2 * \beta_1 + \beta_2)^2\] \[Var(log(OR("Black", "White"))) = 4 * Var(\beta_1) + 4 * Cov(\beta_1, \beta_2) + Var(\beta_2) \]

\[\log(OR("Other", "White")) = logit(race = "Other") - logit(race = "White") \] \[\log(OR("Other", "White")) = \beta_0 + \beta_2 - \beta_0 + \beta_1 + \beta_2 = 2 * \beta_2 + \beta_1\] \[Var(log(OR("Other", "White"))) = (2 * \beta_2 + \beta_1)^2\] \[Var(log(OR("Other", "White"))) = 4 * Var(\beta_2) + 4 * Cov(\beta_1, \beta_2) + Var(\beta_1) \]

(vc_dfmc <- vcov(fit.icu.dfmc))

##             (Intercept)      race1       race2
## (Intercept)  0.19229945  0.1648411  0.01603389
## race1        0.16484109  0.5494400 -0.37317442
## race2        0.01603389 -0.3731744  0.40063278

(VarOR_BW <- 4 * vc_dfmc[2,2] + 4 * vc_dfmc[2,3] + vc_dfmc[3,3])

## [1] 1.105695

(seOR_BW <- sqrt(VarOR_BW))

## [1] 1.05152

(VarOR_OW <- 4 * vc_dfmc[3,3] + 4 * vc_dfmc[2,3] + vc_dfmc[2,2])

## [1] 0.6592734

(seOR_OW <- sqrt(VarOR_OW))

## [1] 0.8119565

(ciOR_BW <- exp(logOR21 + c(-1, 1) * qnorm(0.975, lower.tail = TRUE) * seOR_BW))

## [1] 0.03392295 2.09220897

(ciOR_OW <- exp(logOR31 + c(-1, 1) * qnorm(0.975, lower.tail = TRUE) * seOR_OW))

## [1] 0.1898798 4.5788458