agenda for today

check-in

recap: generalized linear model

ordinal, poisson, and negative binomial regression in SAS


recap

Hint: some questions may have more than one best answer 😄

Q1. Which term includes more models, general linear model or generalized linear model?

  1. generalized linear model
  2. general linear model
  3. aren’t they the same thing?


Q2. Which model would you choose if your outcome variable is ordinal with equal spacing?

  1. poisson regression
  2. multinomial regression
  3. ordinary linear regression with a quadratic term
  4. proportional odds regression


Q3. If you are trying to predict individuals’ favorite Girl Scout cookies using their demographic information, which modeling approach would you choose? (Note: you must choose one!)

  1. poisson regression
  2. multinomial regression
  3. ordinary linear regression with quadratic terms
  4. proportional odds regression


Q4. What if you are predicting how many boxes of cookies individuals bought over the past two weeks?

  1. poisson regression
  2. negative binomial regression
  3. ordinary linear regression
  4. proportional odds regression


model count and ordinal variables in SAS

syntax file: HB761_Recitation_Week9.sas


model count variables

PROC GENMOD - Generalized Linear Models


data source: BRFSS, North Carolina

outcome: MUD (range=0-30, count/integer), number of perceived mentally unhealthy days in the past 30 days.

predictors: sex (male=1, female=0); sleep (average daily hours of sleep in the past month).
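The models below use sleep_c rather than sleep. Because the intercept is later interpreted at 7 hours of daily sleep, sleep_c appears to be sleep centered at 7. A minimal sketch of that step, assuming the raw variable is named sleep in brfss and the centering value is 7 (both are assumptions inferred from the interpretation, not taken from the syntax file):

```sas
/* Assumption: sleep_c = average daily sleep centered at 7 hours,  */
/* so sleep_c = 0 means 7 hours and sleep_c = 1 means 8 hours.     */
data brfss;
    set brfss;
    if not missing(sleep) then sleep_c = sleep - 7;
run;
```

Centering only changes the intercept (it now describes someone sleeping 7 hours); the sleep coefficient is the same as with raw sleep.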


Poisson regression: link=log, dist=poisson
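In equation form, link=log and dist=poisson specify the model below, where μ_i is the expected MUD for person i. Exponentiating a coefficient gives a rate ratio (a multiplicative change in expected MUD), which is what the EXP option on the ESTIMATE statements reports. Note that the Poisson model forces the variance to equal the mean:

```latex
\mathrm{MUD}_i \sim \mathrm{Poisson}(\mu_i), \qquad
\log \mu_i = \beta_0 + \beta_1\,\mathrm{sleep\_c}_i + \beta_2\,\mathrm{male}_i, \qquad
\operatorname{Var}(\mathrm{MUD}_i) = \mu_i, \qquad
\frac{\mu_i(\mathrm{male}=1)}{\mu_i(\mathrm{male}=0)} = e^{\beta_2}
```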

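proc genmod data=brfss;
    /* Poisson GLM: log link, Poisson-distributed counts */
    model MUD=sleep_c male/link=log dist=poisson;
    /* save the predicted (expected) MUD for each person */
    output out=pofit pred=yhat_po;
    /* log rate ratios; EXP also prints exp(estimate), i.e. the rate ratio */
    estimate "log rate ratio - male" intercept 0 sleep_c 0 male 1 / exp;
    estimate "log rate ratio - sleep_c" intercept 0 sleep_c 1 male 0 / exp;
run;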




Interpretations

Intercept: the expected MUD for a female with 7 hours of daily sleep (male = 0, sleep_c = 0) is exp(2.9132) ≈ 18.4 days.
Male: being male was associated with expected MUD 0.77 times that of females (about 23% fewer), controlling for daily sleep hours.
Sleep_c: each additional hour of average daily sleep is associated with expected MUD multiplied by exp(-0.2107) = 0.81 (about 19% fewer), controlling for sex.
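The percent changes come from (exponentiated coefficient − 1) × 100%, and because the model is multiplicative on the count scale, effects combine by multiplication. Using the estimates quoted above (the male coefficient itself is not shown, so its exponentiated value 0.77 is used directly), here is the worked arithmetic, ending with the expected MUD for a male sleeping 8 hours (sleep_c = 1):

```latex
\begin{aligned}
e^{\hat\beta_{\mathrm{sleep\_c}}} &= e^{-0.2107} \approx 0.81
  &&\Rightarrow\ (0.81 - 1) \times 100\% \approx -19\% \text{ per extra hour of sleep} \\
e^{\hat\beta_{\mathrm{male}}} &\approx 0.77
  &&\Rightarrow\ (0.77 - 1) \times 100\% = -23\% \text{ for males vs. females} \\
\hat\mu(\mathrm{male}=1,\ \mathrm{sleep\_c}=1) &= e^{2.9132 - 0.2107} \times 0.77
  = e^{2.7025} \times 0.77 \approx 11.5 \text{ days}
\end{aligned}
```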


Negative Binomial regression: link=log, dist=nb


proc genmod data=brfss;
    /* negative binomial GLM: log link; also estimates the dispersion parameter */
    model MUD=sleep_c male/link=log dist=nb;
    output out=nbfit pred=yhat_nb;
run;


We would choose negative binomial over Poisson because the estimated dispersion parameter is significantly greater than zero (estimate = 8, 95% CI: 7.53, 8.52), indicating overdispersion.
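The dispersion parameter matters because of how it enters the variance. In PROC GENMOD's negative binomial parameterization, with dispersion parameter k, the variance grows faster than the mean, and k = 0 gives back the Poisson model:

```latex
\operatorname{Var}(\mathrm{MUD}_i) = \mu_i + k\,\mu_i^2, \qquad
k = 0 \;\Rightarrow\; \operatorname{Var}(\mathrm{MUD}_i) = \mu_i \quad (\text{Poisson})
```

For illustration only: with k ≈ 8, a person with an expected MUD of 5 days has a model-implied variance of about 5 + 8 × 25 = 205, versus 5 under Poisson.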


We can also compare AIC/BIC (smaller is preferred):

Poisson: AIC=56676.6212, BIC=56695.8739
Negative binomial: AIC=16384.7893, BIC=16410.4597 ❤️
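Both criteria add a penalty for the number of estimated parameters p (n = number of observations) to −2 log-likelihood. The negative binomial model estimates one more parameter (k) than the Poisson model, but its fit improves by far more than that penalty:

```latex
\mathrm{AIC} = -2\log L + 2p, \qquad \mathrm{BIC} = -2\log L + p\log n
\\[4pt]
\Delta\mathrm{AIC} = 56676.6212 - 16384.7893 = 40291.8319, \qquad
\Delta\mathrm{BIC} = 56695.8739 - 16410.4597 = 40285.4142
```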


Interpretations: similar to Poisson regression.


OLS: link=id, dist=normal


proc genmod data=brfss;
     model MUD=sleep_c male/link=id dist=normal;
     output out=lfit pred=yhat_l;
run;
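With link=id and dist=normal, PROC GENMOD fits ordinary linear regression by maximum likelihood, so its coefficient estimates should match the least-squares estimates from PROC REG; standard errors can differ slightly because GENMOD's scale estimate divides by n rather than n − p. A minimal cross-check sketch (the output dataset name regest is hypothetical):

```sas
/* Cross-check: OLS via PROC REG should reproduce GENMOD's point estimates */
proc reg data=brfss outest=regest;   /* regest: hypothetical name for the coefficient table */
    model MUD = sleep_c male;
run;
quit;

/* one row: intercept, slopes, and root MSE */
proc print data=regest;
    var Intercept sleep_c male _RMSE_;
run;
```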


Logistic regression: link=logit, dist=binomial

proc genmod data=temp1 descending;
    class male;
    /* descending: model P(FMD = 1) rather than P(FMD = 0) */
    model FMD=sleep male/link=logit dist=binomial;
run;
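This model reads a dataset temp1 and a binary outcome FMD that are not defined in this section. If FMD stands for frequent mental distress, a common definition is 14 or more mentally unhealthy days in the past 30. A minimal sketch of deriving it from MUD under that assumption (both the meaning of FMD and the 14-day cutoff are assumptions):

```sas
/* Assumption: FMD = frequent mental distress, defined as MUD >= 14 days */
data temp1;
    set brfss;
    if missing(MUD) then FMD = .;
    else FMD = (MUD >= 14);   /* 1 = frequent mental distress, 0 = otherwise */
run;
```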


[optional] Some visualizations

proc gplot data=pofit;
    plot yhat_po*MUD;
run; quit;


proc sgplot data=pofit;
    histogram MUD/ binwidth=1 transparency=0.5
                    name='o' legendlabel= "observed";
    histogram yhat_po/ binwidth=1 transparency=0.5
                    name='p' legendlabel= "yhat_Poisson";
    keylegend 'o' 'p'/ location=inside position=topright across=1 noborder;
    yaxis offsetmin=0;
    xaxis display=(nolabel);
run;



proc sort data=pofit;
    by male sleep_c;
run;
proc sgplot data = pofit;
  series x = sleep_c y = yhat_po/group=male;
run;




model ordinal variables

proportional odds assumption:

predictor effects, b, on the odds of being in a higher versus a lower response category are constant across all cutpoints between adjacent categories (i.e., one b shared by every cumulative logit)
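For the function status example, with the descending option used in the PROC LOGISTIC call, the cumulative logit model has one intercept per cutpoint but a single set of slopes shared by all cutpoints; that sharing is the proportional odds assumption. A sketch in equation form:

```latex
\operatorname{logit} P(\mathrm{fs}_i \ge j)
  = \alpha_j + \beta_1\,\mathrm{metastasis}_i + \beta_2\,\mathrm{sex}_i + \beta_3\,\mathrm{ageDiag}_i,
\qquad j = 1, 2, 3, 4
```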

outcome variable: levels of function status, ranging from fully active (0) to disabled (4)

predictor of focal interest: whether or not the patient had liver metastasis


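proc logistic data=hcc descending;
    /* descending: model the odds of higher fs values (poorer function);        */
    /* the score test for the proportional odds assumption is printed by default */
    model fs = metastasis sex ageDiag /aggregate;
run;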


Score test: a non-significant p-value means the proportional odds assumption is not rejected, so it is considered appropriate; a significant p-value suggests the assumption is violated.
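The score test compares the proportional odds model with a model that lets every predictor have its own slope at each cutpoint. Assuming all five fs levels (0 to 4) are observed and metastasis, sex, and ageDiag each contribute one parameter, the null hypothesis and degrees of freedom are:

```latex
H_0:\ \boldsymbol{\beta}^{(1)} = \boldsymbol{\beta}^{(2)} = \boldsymbol{\beta}^{(3)} = \boldsymbol{\beta}^{(4)}
\quad (\text{one slope vector per cumulative logit}), \qquad
\mathrm{df} = p\,(J - 2) = 3 \times (5 - 2) = 9
```

Here p = 3 slopes and J = 5 response levels: the unconstrained model has p(J − 1) = 12 slopes, the proportional odds model has p = 3, and the difference is 9.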


For practice purposes, let’s still try to interpret the outputs!


Intercepts: [e.g., Intercept 4] with the descending option, the log odds of being in the disabled category (fs=4) versus being in any of the categories selfcare (fs=3), ambulatory (fs=2), restricted (fs=1), or active (fs=0), when all covariates equal 0 (i.e., female, 60 yrs, no metastasis).


metastasis: HCC patients with metastasis had 2.67 times the odds of being in a higher function status category (i.e., poorer function) than those without metastasis (OR = 2.67, 95% CI: 1.34, 5.28), after adjusting for age at diagnosis and sex.
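Under the proportional odds assumption this single odds ratio applies at every cutpoint, and it is the exponentiated metastasis coefficient. The coefficient value below is back-calculated from the rounded OR, not read from the output:

```latex
\frac{\operatorname{odds}(\mathrm{fs} \ge j \mid \mathrm{metastasis}=1)}
     {\operatorname{odds}(\mathrm{fs} \ge j \mid \mathrm{metastasis}=0)}
  = e^{\hat\beta_{\mathrm{met}}} = 2.67 \quad \text{for every } j = 1, \dots, 4,
\qquad \hat\beta_{\mathrm{met}} = \ln 2.67 \approx 0.98
```

Both odds are taken at the same values of sex and ageDiag.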


additional resources

POISSON REGRESSION | SAS DATA ANALYSIS EXAMPLES
NEGATIVE BINOMIAL REGRESSION | SAS DATA ANALYSIS EXAMPLES
One example of visualization for model comparison (Zhou et al.)
👧 Indulgent treats