🎺 Katy Perry - Hot N Cold


agenda for today

check-in
logistic regression recap questions - 11:40
examples in SAS - Noon


check-in


logistic regression recap


Q1. For which outcome variable(s) would you prefer logistic regression over linear regression?
a. individual daily consumption of vegetables (unit: serving)
b. parental depressive symptoms, measured by a validated 9-item scale
c. youth vaping experience (have vaped vs never vaped)

Q2. Which pairs of models could you compare with an LRT (i.e., which models are nested)? (More than one correct answer.)
a. a logistic regression model that predicts Alzheimer’s diagnosis using a scale of memory impairment
b. a logistic regression model that predicts Alzheimer’s diagnosis using a scale of memory impairment and a scale of changes in personality
c. a logistic regression model that predicts Alzheimer’s diagnosis using a scale of memory impairment, a scale of cognitive skills, and age
d. a null model of Alzheimer’s diagnosis

Q3. What is the link function used in logistic regression?
a. log function
b. logit function
c. logistic function

Q4. What is the yhat that we want to model in logistic regression?
a. probability that outcome=1
b. odds that outcome=1
c. log odds that outcome=1

Q5. What is the yhat that we would get if we do not specify the DESCENDING option in PROC LOGISTIC?
a. probability that outcome=1
b. odds that outcome=1
c. probability that outcome=0

Q7. What is the logistic regression counterpart of the partial regression coefficients in MLR?
a. odds
b. crude/unadjusted odds ratio
c. adjusted odds ratio
d. relative odds

Q8. If the probability that a UNC undergraduate ever studied abroad is p=0.25, what are the odds of no study abroad experience (vs study abroad experience)?
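
A minimal sketch of the arithmetic behind Q8, assuming only the p=0.25 given in the question. Odds are the probability of the event divided by the probability of its complement, and here the "event" is no study abroad experience. The dataset and variable names (odds_q8, p_abroad, odds_none) are made up for illustration.

```sas
data odds_q8;
    p_abroad  = 0.25;                        /* P(ever studied abroad), from Q8 */
    odds_none = (1 - p_abroad) / p_abroad;   /* odds of NO study abroad vs study abroad */
    put odds_none=;                          /* written to the log */
run;
```

The odds come out to 0.75/0.25 = 3, i.e. 3 to 1 in favor of no study abroad experience.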

Q9. For a predictor in a logistic regression, if its point estimate is 0.80 and SE is 0.33, what are the OR and 95%CI?
a. OR=0.80, 95%CI=0.15,1.45
b. OR=2.23, 95%CI=1.58,2.88
c. OR=2.23, 95%CI=1.16,4.26
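
A sketch of how the Q9 calculation could be checked in a DATA step. The point is that the confidence interval is built on the log-odds scale (estimate ± z × SE) and only then exponentiated. The names or_ci, odds_ratio, lower, and upper are illustrative.

```sas
data or_ci;
    b  = 0.80;                          /* point estimate on the log-odds scale */
    se = 0.33;                          /* its standard error */
    z  = quantile("normal", 0.975);     /* about 1.96 for a 95% CI */
    odds_ratio = exp(b);
    lower = exp(b - z*se);              /* exponentiate the limits, not the SE */
    upper = exp(b + z*se);
    put odds_ratio=8.3 lower=8.3 upper=8.3;
run;
```

The result should land on option c, possibly differing in the last digit because of rounding. Option a reports the limits without exponentiating, and option b adds ± z × SE directly on the OR scale, which produces an interval that is symmetric around the OR.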

examples in SAS: interpret binary predictor and practice model evaluation and model comparison

syntax file: HB761_Recitation_Week7.sas
data file: heart.sas7bdat, downloaded from the UCI Machine Learning Repository


outcome: hd: diagnosis of heart disease (yes = 1, no = 0)
predictors: sex: female = 0, male = 1; age: age in yrs; trestbps: resting blood pressure in mm Hg; chol: serum cholesterol in mg/dl


check distributions and explore bivariate relationships

proc freq data=heart;
    tables hd*sex/OR; *OR output the odds ratio;
    format hd hd_f.
           sex sex_f.;
run;
proc univariate data=heart; /*name the data set explicitly rather than relying on the most recent one*/
    var age trestbps chol;
run;



proc sgplot data=heart;
    title "boxplot of age by heart disease diagnosis";
    vbox age/ group = hd; 
    yaxis label = "age";
    format hd hd_f.;
run;
title; *reset title;



proc sgplot data=heart;
    title "boxplot of resting blood pressure by heart disease diagnosis";
    vbox trestbps/ group = hd; 
    yaxis label = "resting blood pressure";
    format hd hd_f.;
run;



proc sgplot data=heart;
    title "boxplot of cholesterol by heart disease diagnosis";
    vbox chol / group = hd; 
    yaxis label = "serum cholesterol";
    format hd hd_f.;
run;



logistic regression models


crude model: logit(p(hd=1)) = a + b*sex

proc logistic data=heart descending; /*need "descending" to model hd=1*/
    title "logistic regression of heart disease";
    model hd=sex;
run; quit;
/*or specify event as 1*/
proc logistic data=heart;
    title "logistic regression of heart disease";
    model hd (event="1")=sex;
run; quit;


Interpretations

exp(a): the odds that a female is diagnosed with heart disease are 0.35
exp(b): compared with females, males have 3.574 times the odds of being diagnosed with heart disease (crude OR=3.574, 95%CI=2.095, 6.097). Note that this is a ratio of odds, not of probabilities, so "times more likely" overstates it.
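
A small sketch, using only the rounded estimates reported above (exp(a)=0.35, exp(b)=3.574), of how the two quantities combine into group-specific odds and probabilities. The variable names are illustrative, and results will differ slightly from SAS output because the inputs are rounded.

```sas
data crude_check;
    odds_female = 0.35;                        /* exp(a): odds of hd=1 for females */
    or_sex      = 3.574;                       /* exp(b): crude odds ratio, male vs female */
    odds_male   = odds_female * or_sex;        /* odds of hd=1 for males */
    p_female    = odds_female / (1 + odds_female);
    p_male      = odds_male   / (1 + odds_male);
    risk_ratio  = p_male / p_female;           /* ratio of probabilities, for contrast with the OR */
    put odds_male=8.3 p_female=8.3 p_male=8.3 risk_ratio=8.3;
run;
```

The predicted probabilities are about 0.26 for females and 0.56 for males, so the ratio of probabilities is about 2.1, noticeably smaller than the odds ratio of 3.574.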


adjusted model: logit(p(hd=1)) = a + b1* sex + b2* age + b3* chol + b4* trestbps

proc logistic data=heart;
    model hd (event="1")=sex age chol trestbps;
run;quit;


Interpretations

exp(a): not meaningful here; it is the odds of heart disease for a female whose age, serum cholesterol, and resting blood pressure are all 0, which is far outside the data
exp(b1): controlling for age, serum cholesterol, and resting blood pressure, males have 5.318 times the odds of being diagnosed with heart disease compared with females (aOR=5.318, 95%CI=2.899, 9.755).
exp(b2): controlling for sex, serum cholesterol, and resting blood pressure, each one-year increase in age multiplies the odds of being diagnosed with heart disease by 1.058, i.e. a 5.8% increase (aOR=1.058, 95%CI=1.026, 1.091).


Because the crude model is nested within the adjusted model, we can conduct an LRT to compare them.


Demonstrate calculation in class😄
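
One possible way to reproduce the LRT in SAS, as a sketch rather than the in-class method. It assumes the ODS table is named FitStatistics and has the columns Criterion and InterceptAndCovariates (run "ods trace on;" to confirm in your SAS version), and that both models are fit to the same observations. The data set names fit_crude, fit_adj, and lrt are made up. The test statistic is the drop in -2 Log L, with df equal to the number of added parameters (age, chol, trestbps, so df=3).

```sas
/* save the fit statistics of each model */
ods output FitStatistics=fit_crude;
proc logistic data=heart;
    model hd (event="1")=sex;
run;

ods output FitStatistics=fit_adj;
proc logistic data=heart;
    model hd (event="1")=sex age chol trestbps;
run;

/* LRT: chi-square = (-2 Log L crude) - (-2 Log L adjusted) */
data lrt;
    merge fit_crude (where=(Criterion="-2 Log L")
                     keep=Criterion InterceptAndCovariates
                     rename=(InterceptAndCovariates=neg2ll_crude))
          fit_adj   (where=(Criterion="-2 Log L")
                     keep=Criterion InterceptAndCovariates
                     rename=(InterceptAndCovariates=neg2ll_adj));
    chisq   = neg2ll_crude - neg2ll_adj;
    df      = 3;                          /* parameters added: age, chol, trestbps */
    p_value = 1 - probchi(chisq, df);
run;

proc print data=lrt noobs;
    var neg2ll_crude neg2ll_adj chisq df p_value;
run;
```

A small p-value suggests that adding age, chol, and trestbps improves fit over the sex-only model.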


use estimate statement to further probe sex effects

proc logistic data=heart descending;
    model hd=sex age chol trestbps;
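    /*"intercept 1" is required; without it the estimate leaves out a and is not the log odds*/
    estimate "log odds of hd=1 (female, 55 yrs, chol= 140, trestbps= 270)" intercept 1 sex 0 age 55 chol 140 trestbps 270;
    estimate "log odds of hd=1 (male, 55 yrs, chol= 140, trestbps= 270)" intercept 1 sex 1 age 55 chol 140 trestbps 270;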
run;quit;
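
A possible extension of the ESTIMATE approach, sketched with the same covariate values as above. The ILINK option applies the inverse logit, so the output also shows the predicted probability for each profile. The EXP option exponentiates an estimate, which turns the male-vs-female contrast (just b1) into the adjusted OR. The labels are illustrative.

```sas
proc logistic data=heart;
    model hd (event="1")=sex age chol trestbps;
    /* predicted probability of hd=1 for each profile (ILINK), with confidence limits (CL) */
    estimate "female, 55 yrs, chol=140, trestbps=270"
             intercept 1 sex 0 age 55 chol 140 trestbps 270 / ilink cl;
    estimate "male, 55 yrs, chol=140, trestbps=270"
             intercept 1 sex 1 age 55 chol 140 trestbps 270 / ilink cl;
    /* the two profiles differ only in sex, so their difference in log odds is b1 */
    estimate "male vs female (adjusted OR)" sex 1 / exp cl;
run;
```

The last estimate should match the aOR for sex in the Odds Ratio Estimates table, because the model has no interaction terms and so the sex effect is the same at every covariate profile.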


(some optional contents; review independently if interested) measures of association between exposure and a binary outcome

risk ratio (RR), also called relative risk
  • risk, attack rate, incidence proportion: % diseased
  • RR = risk in exposed group / risk in non-exposed group
  • RR > 1 -> positive relation between exposure and outcome
  • RR < 1 -> exposure is a protective factor
odds ratio (OR)
  • odds of disease (vs non-diseased)
  • OR = odds in exposed group / odds in non-exposed group
  • OR > 1 -> positive relation between exposure and outcome
RR or OR?
  • RR is easier to understand. For comparison, one can also calculate the risk difference.
  • OR can be used in case-control studies, where risk cannot be estimated directly: compare the odds of exposure (vs non-exposure) among cases with those among controls.
  • For rare outcomes, OR is close to RR.
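
To make the comparison concrete, here is a sketch with two hypothetical 2x2 tables (the counts are invented for illustration, not taken from the heart data). Here a and b are the diseased and non-diseased counts among the exposed, and c and d are the same among the non-exposed.

```sas
data rr_or;
    input scenario $ a b c d;
    risk_exposed   = a / (a + b);
    risk_unexposed = c / (c + d);
    rr         = risk_exposed / risk_unexposed;   /* risk ratio */
    rd         = risk_exposed - risk_unexposed;   /* risk difference */
    odds_ratio = (a * d) / (b * c);               /* (a/b) / (c/d) */
    datalines;
common 30 70 10 90
rare 3 97 1 99
;
run;

proc print data=rr_or noobs;
run;
```

In the "common" scenario RR = 3 but OR is about 3.86. In the "rare" scenario RR is still 3 and OR is about 3.06, which illustrates why the OR approximates the RR only when the outcome is rare.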


Next week: Generalized Linear Models