🎚 Katy Perry - Hot N Cold


agenda for today

check-in
logistic regression recap
examples in SAS


announcements & check-in

PDF versions of my handouts will be available on Sakai.
“self-care” in the middle of the semester: Virtual Wellness Week events


logistic regression recap

poll (10 mins) + answers


Q1. For which outcome variable(s) would you prefer logistic regression over linear regression?
a. individual daily consumption of vegetables (unit: serving)
b. parental depressive symptoms, measured by a validated 9-item scale
c. youth vaping experience (have vaped vs never vaped)

Q2. Which residual assumptions are violated for a binary outcome?
a. normality
b. independence
c. homoscedasticity

Q3. What is the function used in logistic regression?
a. log function
b. logit function
c. logistic function

Q4. What is the yhat that we are modeling in logistic regression?
a. probability that outcome=1
b. odds that outcome=1
c. log odds that outcome=1

Q5. What is the equivalent term of partial regression coefficients (MLR) in a logistic regression model?
a. odds
b. crude/unadjusted odds ratio
c. adjusted odds ratio
d. relative odds

Q6. If the probability that a UNC undergraduate ever studied abroad is p=0.25, what are the odds of no study abroad experience (vs study abroad experience)?
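
A minimal sketch of the conversion behind Q6, written as a SAS data step. The dataset name odds_demo and the variable names are made up for illustration; only p=0.25 comes from the question.

```sas
/* hypothetical helper dataset: convert a probability to odds and log odds */
data odds_demo;
    p_abroad     = 0.25;                 /* P(studied abroad), from Q6 */
    p_none       = 1 - p_abroad;         /* P(no study abroad experience) */
    odds_none    = p_none / p_abroad;    /* odds of no study abroad (vs study abroad) */
    odds_abroad  = p_abroad / p_none;    /* odds of study abroad (vs none) */
    logodds_none = log(odds_none);       /* log() is the natural log in SAS */
run;

proc print data=odds_demo noobs;
run;
```

The two odds are reciprocals of each other, so their log odds differ only in sign.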

Q7. For a predictor in a logistic regression, if its point estimate is 0.80 and SE is 0.33, what are the OR and 95%CI?
a. OR=0.80, 95%CI=0.15,1.45
b. OR=2.23, 95%CI=1.58,2.88
c. OR=2.23, 95%CI=1.17,4.25
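
The calculation behind Q7 can be sketched as a data step: build the confidence interval on the log-odds scale first, then exponentiate the estimate and both limits. The dataset and variable names are hypothetical, and the sketch assumes the usual normal critical value (about 1.96) for a 95% interval.

```sas
/* hypothetical helper dataset: OR and 95% CI from a logistic coefficient */
data or_ci_demo;
    b  = 0.80;                           /* point estimate (log odds ratio), from Q7 */
    se = 0.33;                           /* standard error of b, from Q7 */
    z  = quantile("normal", 0.975);      /* critical value, about 1.96 */
    lower_logodds = b - z*se;            /* CI limits on the log-odds scale */
    upper_logodds = b + z*se;
    odds_ratio = exp(b);                 /* back-transform to the odds ratio scale */
    or_lower   = exp(lower_logodds);
    or_upper   = exp(upper_logodds);
run;

proc print data=or_ci_demo noobs;
    format _numeric_ 8.3;
run;
```

The interval is symmetric around b on the log-odds scale, so after exponentiation it is not symmetric around the OR.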

examples in SAS

syntax file: HB761_Recitation_Week7.sas
data file: heart.sas7bdat, downloaded from the UCI Machine Learning Repository


outcome: hd: diagnosis of heart disease (yes = 1, no = 0)
predictors: sex: female = 0, male = 1; age: age in yrs; trestbps: resting blood pressure in mm Hg; chol: serum cholesterol in mg/dl
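
The procedures in this handout attach the formats hd_f. and sex_f., which are not shown here and are presumably defined in the syntax file. A minimal sketch, assuming they simply label the 0/1 codes listed above (the label text is a guess):

```sas
/* assumed definitions of the two formats; the actual labels in the syntax file may differ */
proc format;
    value hd_f  0 = "no heart disease"
                1 = "heart disease";
    value sex_f 0 = "female"
                1 = "male";
run;
```

PROC FORMAT has to run before any step that uses these formats, otherwise SAS reports that the format could not be found.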


check distributions and explore bivariate relationships

proc freq data=heart;
    tables hd*sex/OR; *OR output the odds ratio;
    format hd hd_f.
           sex sex_f.;
run;
proc univariate data=heart; *name the data set explicitly instead of relying on the most recently created one;
    var age trestbps chol;
run;



proc sgplot data=heart;
    title "boxplot of age by heart disease diagnosis";
    vbox age/ group = hd; 
    yaxis label = "age";
    format hd hd_f.;
run;
title; *reset title;



proc sgplot data=heart;
    title "boxplot of resting blood pressure by heart disease diagnosis";
    vbox trestbps/ group = hd; 
    yaxis label = "resting blood pressure";
    format hd hd_f.;
run;



proc sgplot data=heart;
    title "boxplot of cholesterol by heart disease diagnosis";
    vbox chol / group = hd;
    yaxis label = "serum cholesterol";
    format hd hd_f.;
run;



logistic regression models


crude model: logit(p(hd=1)) = a + b*sex

proc logistic data=heart descending; /*need "descending" to model hd=1*/
    title "logistic regression of heart disease";
    model hd=sex;
run; quit;
/*or specify event as 1*/
proc logistic data=heart;
    title "logistic regression of heart disease";
    model hd (event="1")=sex;
run; quit;


Interpretations

exp(a): the odds that a female is diagnosed with heart disease are 0.35
exp(b): the odds of being diagnosed with heart disease are 3.574 times as high for males as for females (crude OR=3.574, 95%CI=2.095, 6.097).
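
To connect these odds back to probabilities, one can use p = odds/(1+odds). A small sketch using the rounded values reported above; the dataset and variable names are hypothetical, and the results will differ slightly from the SAS output because of rounding.

```sas
/* hypothetical helper dataset: odds and probabilities implied by the crude model */
data crude_demo;
    odds_female = 0.35;                              /* exp(a), rounded */
    or_sex      = 3.574;                             /* exp(b), crude OR */
    odds_male   = odds_female * or_sex;              /* exp(a + b) = exp(a) * exp(b) */
    p_female    = odds_female / (1 + odds_female);   /* odds -> probability */
    p_male      = odds_male   / (1 + odds_male);
    risk_ratio  = p_male / p_female;                 /* ratio of the two probabilities */
run;

proc print data=crude_demo noobs;
    format _numeric_ 8.3;
run;
```

The implied probabilities are roughly 0.26 for females and 0.56 for males, so the ratio of probabilities is smaller than the OR: an OR compares odds, not probabilities.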


adjusted model: logit(p(hd=1)) = a + b1* sex + b2* age + b3* chol + b4* trestbps

proc logistic data=heart;
    model hd (event="1")=sex age chol trestbps;
run;quit;


Interpretations

exp(a): meaningless (the odds of heart disease for a female with age, serum cholesterol, and resting blood pressure all equal to 0)
exp(b1): controlling for age, serum cholesterol, and resting blood pressure, the odds of being diagnosed with heart disease are 5.318 times as high for males as for females (aOR=5.318, 95%CI=2.899, 9.755).
exp(b2): controlling for sex, serum cholesterol, and resting blood pressure, a one-year increase in age is associated with a 5.8% increase in the odds of being diagnosed with heart disease, i.e. the odds are multiplied by 1.058 (aOR=1.058, 95%CI=1.026, 1.091).
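
A one-year change in age is a small unit. As a sketch, the UNITS statement in PROC LOGISTIC requests odds ratios for a custom change in a continuous predictor. The units chosen here (10 years, 10 mm Hg, 20 mg/dl) are only an illustration.

```sas
proc logistic data=heart;
    model hd (event="1") = sex age chol trestbps;
    /* adjusted ORs per 10 yrs of age, per 10 mm Hg, and per 20 mg/dl (illustrative units) */
    units age=10 trestbps=10 chol=20;
run; quit;
```

The aOR for a 10-year increase equals exp(10*b2), i.e. the one-year aOR raised to the 10th power (about 1.76 when computed from the rounded value 1.058).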


use estimate statement to further probe sex effects

proc logistic data=heart descending;
    model hd=sex age chol trestbps;
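    /*"intercept 1" is needed so that the estimate is the log odds, not just the sum of the slope terms*/
    estimate "log odds of hd=1 (female, 55 yrs, chol= 270, trestbps= 140)" intercept 1 sex 0 age 55 chol 270 trestbps 140;
    estimate "log odds of hd=1 (male, 55 yrs, chol= 270, trestbps= 140)" intercept 1 sex 1 age 55 chol 270 trestbps 140;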
run;quit;
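
Log odds are hard to read, so it can help to ask for predicted probabilities and for the sex contrast directly. A sketch using options of the ESTIMATE statement (ilink, exp, cl); the covariate values are chosen for illustration only.

```sas
proc logistic data=heart;
    model hd (event="1") = sex age chol trestbps;
    /* intercept 1 includes the intercept; ilink adds the predicted probability, cl adds confidence limits */
    estimate "P(hd=1): female, 55 yrs, chol=270, trestbps=140"
             intercept 1 sex 0 age 55 chol 270 trestbps 140 / ilink cl;
    estimate "P(hd=1): male, 55 yrs, chol=270, trestbps=140"
             intercept 1 sex 1 age 55 chol 270 trestbps 140 / ilink cl;
    /* the difference between the two rows above is b1; exp turns it into the aOR for sex */
    estimate "male vs female (aOR)" sex 1 / exp cl;
run; quit;
```

Because the model has no interaction terms, the male vs female aOR is the same at every value of age, chol, and trestbps, while the predicted probabilities are not.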


(optional) measures of association between exposure and a binary outcome

risk ratio, relative risk
  • risk, attack rate, incidence proportion: % diseased
  • RR = risk in exposed group / risk in non-exposed group
  • RR > 1 -> positive relation between exposure and outcome
  • RR < 1 -> exposure is a protective factor
odds ratio, relative odds
  • odds of disease (vs non-diseased)
  • OR = odds in exposed group / odds in non-exposed group
  • OR > 1 -> positive relation between exposure and outcome
RR or OR?
  • RR is easier to interpret. For comparison, one can also calculate the risk difference.
  • OR can be used in case-control studies, where risk cannot be estimated directly: compare the odds of exposure (vs non-exposure) between cases and controls.
  • For rare outcomes, OR is close to RR.
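
The measures above can be compared side by side with a small data step. The counts below are hypothetical (they are not from the heart data), and the dataset and variable names are made up for illustration.

```sas
/* hypothetical 2x2 counts: one common-outcome and one rare-outcome scenario */
data assoc_demo;
    input scenario $ a b c d;
    /* a = exposed & diseased,     b = exposed & not diseased
       c = non-exposed & diseased, d = non-exposed & not diseased */
    risk_exposed    = a / (a + b);
    risk_nonexposed = c / (c + d);
    risk_ratio      = risk_exposed / risk_nonexposed;
    risk_diff       = risk_exposed - risk_nonexposed;
    odds_exposed    = a / b;
    odds_nonexposed = c / d;
    odds_ratio      = odds_exposed / odds_nonexposed;
    datalines;
common 30 70 10 90
rare 3 97 1 99
;
run;

proc print data=assoc_demo noobs;
    format risk_exposed -- odds_ratio 8.3;
run;
```

In both rows RR = 3, but the OR is about 3.86 in the common-outcome row and about 3.06 in the rare-outcome row, which illustrates why OR is close to RR only for rare outcomes.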