Question 1

prediabetes = read.csv("prediabetes.csv", header = TRUE)

table(prediabetes$racecode, prediabetes$predia)

##       
##          0   1
##   -0.5 308  78
##   0.5   65  38

#########################################
#Probabilities and odds from table above:
#########################################
prob_diabetes_white = 78/(308+78) #probability of prediabetes given White = .202
prob_diabetes_asian = 38/(65+38) #probability of prediabetes given Asian = .369

odds_diabetes_white = prob_diabetes_white/(1-prob_diabetes_white) #odds of prediabetes given White = .253
odds_diabetes_asian = prob_diabetes_asian/(1-prob_diabetes_asian) #odds of prediabetes given Asian = .585

odds_ratio = odds_diabetes_asian/odds_diabetes_white #prediabetes odds ratio, Asians relative to Whites = 2.308

Briefly, the odds ratio calculated above compares the odds of prediabetes among Asians with the same odds in the “baseline” White group. Its value of ~2.3 means that the odds of being prediabetic rather than not are more than twice as high among Asians as among Whites in this sample, which seems large and potentially important.
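As a cross-check, the same odds ratio can be obtained straight from the cell counts as a cross-product ratio. The sketch below retypes the counts from the table above into a matrix (the object names are illustrative and not part of the original analysis) and also computes the usual large-sample standard error of the log odds ratio, which should line up with the racecode standard error reported by the logistic model in Question 2.

```r
# Counts retyped from the table above (rows: race, columns: predia = 0 / 1)
counts <- matrix(c(308, 78,
                   65,  38),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(race = c("White", "Asian"),
                                 predia = c("0", "1")))

# Cross-product ratio:
# (Asian cases * White non-cases) / (Asian non-cases * White cases)
odds_ratio_xprod <- (counts["Asian", "1"] * counts["White", "0"]) /
                    (counts["Asian", "0"] * counts["White", "1"])

# Large-sample standard error of log(odds ratio):
# square root of the summed reciprocal cell counts
se_log_or <- sqrt(sum(1 / counts))

round(odds_ratio_xprod, 3)  # about 2.308, as computed above
round(se_log_or, 4)
```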

Question 2

#estimate and examine logistic models testing for effect of race on prediabetic probability

predia_raceOnly_A <- glm(predia ~ racecode, data=prediabetes, family=binomial)
predia_raceOnly_C <- glm(predia ~ 1, data=prediabetes, family=binomial)

summary(predia_raceOnly_A)

## 
## Call:
## glm(formula = predia ~ racecode, family = binomial, data = prediabetes)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9595  -0.6719  -0.6719  -0.6719   1.7884  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.9551     0.1202  -7.948  1.9e-15 ***
## racecode      0.8366     0.2403   3.481    5e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 535.80  on 488  degrees of freedom
## Residual deviance: 524.15  on 487  degrees of freedom
## AIC: 528.15
## 
## Number of Fisher Scoring iterations: 4

summary(predia_raceOnly_C)

## 
## Call:
## glm(formula = predia ~ 1, family = binomial, data = prediabetes)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7359  -0.7359  -0.7359  -0.7359   1.6963  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.1680     0.1063  -10.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 535.8  on 488  degrees of freedom
## Residual deviance: 535.8  on 488  degrees of freedom
## AIC: 537.8
## 
## Number of Fisher Scoring iterations: 4

exp(predia_raceOnly_C$coefficients[1]) #exponentiate Model C intercept coeff: e^(intercept) = .311

## (Intercept) 
##    0.310992

exp(predia_raceOnly_A$coefficients[1]) #exponentiate Model A intercept coeff: e^(intercept) = .385

## (Intercept) 
##   0.3847752

exp(predia_raceOnly_A$coefficients[2]) #exponentiate slope coeff: e^(slope) = 2.308

## racecode 
## 2.308481

anova(predia_raceOnly_C, predia_raceOnly_A, test="Chisq") #effect of race highly significant (p = .0006405)

## Analysis of Deviance Table
## 
## Model 1: predia ~ 1
## Model 2: predia ~ racecode
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       488     535.80                          
## 2       487     524.15  1   11.654 0.0006405 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

white_odds = exp(predia_raceOnly_A$coefficients[1] - .5*(predia_raceOnly_A$coefficients[2])) #odds for Whites (racecode = -.5) = .253
asian_odds = exp(predia_raceOnly_A$coefficients[1] + .5*(predia_raceOnly_A$coefficients[2])) #odds for Asians (racecode = .5) = .585

In the output above, the exponentiated intercept and slope estimates have precise interpretations. The exponentiated intercept from Model C is the odds of prediabetes in the full sample of Asians and Whites combined (an odds, not an odds ratio); the model estimate (\(e^{\beta_{0-ModelC}} = .311\)) matches the value obtained directly from the sample: \[ p_{total} = (78+38)/489 = .237 \] \[ odds_{total} = p_{total}/(1 - p_{total}) = 116/373 \approx .311 \]

Similarly, the exponentiated slope of race in Model A is the odds ratio of prediabetes for Asians relative to Whites (\(e^{\beta_{1-ModelA}} = 2.308\)), which equals the value found in Question 1. This works because racecode differs by exactly one unit between the two groups (-.5 versus .5).

Finally, the exponentiated intercept from Model A (\(e^{\beta_{0-ModelA}} = .385\)) is the odds of prediabetic status when racecode = 0, which corresponds to a hypothetical person midway between Whites and Asians on the log-odds scale. This quantity can be derived directly from the data by averaging the log odds for Whites and Asians and exponentiating the result, which gives the geometric mean of the two groups' odds: \[ (log(odds_{Asian}) + log(odds_{White}))/2 = (log(.585) + log(.253))/2 = -.955 \] \[ odds_{racecode = 0} = e^{-.955} \approx .385 \] This matches the exponentiated Model A intercept and lies between the two group odds calculated in Question 1.
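These three interpretations can be checked numerically from the cell counts alone. The following is a minimal sketch, assuming the counts from the Question 1 table; the variable names are made up for illustration.

```r
# Cell counts from the Question 1 table
white_no <- 308; white_yes <- 78
asian_no <- 65;  asian_yes <- 38

# Model C intercept: odds of prediabetes in the pooled sample
odds_total <- (white_yes + asian_yes) / (white_no + asian_no)

# Odds within each group
odds_white <- white_yes / white_no
odds_asian <- asian_yes / asian_no

# Model A slope: odds ratio, Asians relative to Whites
or_asian_white <- odds_asian / odds_white

# Model A intercept: exponentiated mean of the two log odds,
# i.e. the geometric mean of the group odds
odds_racecode0 <- exp(mean(log(c(odds_white, odds_asian))))

# Compare with exp() of the model coefficients: 0.3110, 2.3085, 0.3848
round(c(odds_total = odds_total,
        or_asian_white = or_asian_white,
        odds_racecode0 = odds_racecode0), 4)
```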

An A-C model comparison shows that, without controls, race is a significant logistic predictor of prediabetic status (\(\chi^{2}(1) = 11.65\), p = .00064).
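The test statistic in this comparison is the drop in deviance between the two models. For 0/1 outcome data the deviance is simply -2 times the log-likelihood, so the comparison can be reproduced by hand. The sketch below assumes the cell counts from Question 1; `loglik` is a hypothetical helper written for this illustration.

```r
# Binomial log-likelihood for grouped 0/1 data:
# sum over groups of yes*log(p) + (n - yes)*log(1 - p)
loglik <- function(yes, n, p) sum(yes * log(p) + (n - yes) * log(1 - p))

yes <- c(White = 78,  Asian = 38)   # prediabetic counts from Question 1
n   <- c(White = 386, Asian = 103)  # group sizes

# Model C: one common probability; Model A: a separate probability per group
dev_C <- -2 * loglik(yes, n, sum(yes) / sum(n))
dev_A <- -2 * loglik(yes, n, yes / n)

lr_stat <- dev_C - dev_A                                # drop in deviance
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)  # upper tail, 1 df

round(c(dev_C = dev_C, dev_A = dev_A, lr_stat = lr_stat), 2)
signif(p_value, 3)
```

The printed deviances should agree with the null and residual deviances in the Model A summary.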

Exponentiating the linear equation with racecode set to -.5 (Whites) and .5 (Asians) shows that prediabetic odds for both groups are well below 1, meaning that prediabetes is less likely than not in each group: \[ odds_{White} = e^{\beta_{0} -.5*\beta_{1}} = e^{-.955 -.5*.837} = .253 \] \[ odds_{Asian} = e^{\beta_{0} +.5*\beta_{1}} = e^{-.955 +.5*.837} = .585 \] These results agree with those calculated directly from the data in Question 1.
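To tie the model back to the probabilities in Question 1, the fitted odds can be converted with p = odds/(1 + odds). This is a small sketch using the Model A coefficients as rounded in the summary output; the object names are illustrative.

```r
# Model A coefficients, copied (rounded) from the summary output above
b0 <- -0.9551
b1 <-  0.8366

racecode <- c(White = -0.5, Asian = 0.5)

log_odds <- b0 + b1 * racecode
odds     <- exp(log_odds)
prob     <- odds / (1 + odds)   # equivalent to plogis(log_odds)

round(rbind(log_odds, odds, prob), 3)
```

The `prob` row should reproduce the sample proportions 78/386 and 38/103 up to rounding of the coefficients.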

Question 3

#estimate and examine logistic models testing for effect of BMIz on prediabetic probability

predia_BMI_A <- glm(predia ~ BMIz, data=prediabetes, family=binomial)
predia_BMI_C <- glm(predia ~ 1, data=prediabetes, family=binomial)

summary(predia_BMI_A)

## 
## Call:
## glm(formula = predia ~ BMIz, family = binomial, data = prediabetes)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.0947  -0.7653  -0.6535  -0.4729   2.1483  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.2147     0.1106 -10.986  < 2e-16 ***
## BMIz          0.4291     0.1104   3.888 0.000101 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 535.80  on 488  degrees of freedom
## Residual deviance: 520.06  on 487  degrees of freedom
## AIC: 524.06
## 
## Number of Fisher Scoring iterations: 4

summary(predia_BMI_C)

## 
## Call:
## glm(formula = predia ~ 1, family = binomial, data = prediabetes)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.7359  -0.7359  -0.7359  -0.7359   1.6963  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.1680     0.1063  -10.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 535.8  on 488  degrees of freedom
## Residual deviance: 535.8  on 488  degrees of freedom
## AIC: 537.8
## 
## Number of Fisher Scoring iterations: 4

anova(predia_BMI_C, predia_BMI_A, test="Chisq") #effect of BMIz highly significant (p = 7.27e-5)

## Analysis of Deviance Table
## 
## Model 1: predia ~ 1
## Model 2: predia ~ BMIz
##   Resid. Df Resid. Dev Df Deviance  Pr(>Chi)    
## 1       488     535.80                          
## 2       487     520.06  1    15.74 7.268e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

For every 1 SD increase in BMIz, the log odds of prediabetes increase by a constant .4291 units. The same is not true of the odds themselves: each 1 SD increase in BMIz multiplies the odds by a constant factor (\(e^{.4291} \approx 1.536\)), so the absolute change in odds depends on the starting value of BMIz. Compare, for instance, the effect of increasing BMIz from 1 to 2 versus from 3 to 4 on the log odds and on the odds:

Log odds: \[ log(odds_{BMIz = 1}) = -1.21 + .4291 = -.781 \\ log(odds_{BMIz = 2}) = -1.21 + 2*.4291 = -.352 \\ Difference = .43 \] \[ log(odds_{BMIz = 3}) = -1.21 + 3*.4291 = .0773 \\ log(odds_{BMIz = 4}) = -1.21 + 4*.4291 = .506 \\ Difference = .43 \]

Odds: \[ odds_{BMIz = 1} = e^{-1.21 + .4291} = .458 \\ odds_{BMIz = 2} = e^{-1.21 + 2*.4291} = .703 \\ Difference = .245 \] \[ odds_{BMIz = 3} = e^{-1.21 + 3*.4291} = 1.08 \\ odds_{BMIz = 4} = e^{-1.21 + 4*.4291} = 1.659 \\ Difference = .579 \]

Notice that the differences are the same in the case of log odds but not odds.
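The same comparison can be run in a few lines of R. This is a rough sketch using the coefficients as rounded in the worked example above (so the intercept is -1.21 rather than the full-precision estimate); the vector names are illustrative.

```r
# Model coefficients as rounded in the worked example above
b0 <- -1.21
b1 <- 0.4291

bmiz     <- 1:4
log_odds <- b0 + b1 * bmiz
odds     <- exp(log_odds)

diff(log_odds)                   # additive change per 1 SD: always b1
diff(odds)                       # additive change in odds: grows with BMIz
odds[-1] / odds[-length(odds)]   # multiplicative change: always exp(b1)
```

The last line shows why the BMIz effect is usually reported as an odds ratio: on the odds scale, the constant quantity is the ratio between adjacent values, not the difference.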

An A-C model comparison shows that, without controls, BMIz is a significant logistic predictor of prediabetic status (Deviance(1) = 15.74, p = 7.27e-5).

HM22_SMM

Spencer Moore

4/22/2021
