🥁 Statistics Rap [Linear Regression]


Agenda for today

1. Check in understanding (20 mins)

2. Implementing multiple linear regression in SAS (30 mins)

3. DAGs think-pair-share (15 mins)


Check in understanding

Q1: If there is a confounder and a collider in your DAG, which would you control for in your model?

Q2: What does \(sr^2\) in MLR stand for?

Q3: If the two predictors (x1, x2) in your MLR are highly correlated, the sum of their \(sr^2\) will be … (hint: draw a Ballentine diagram)?

Q4: What are the degrees of freedom and the \(\chi^2\) value for the \(\chi^2\) test comparing two nested models (M1 and M2)?

Q5: To conduct a cross-validation, in which sample(s) would you run OLS (i.e., obtain parameter estimates)?

Q6: Which statement(s) about the variance inflation factor (VIF) are correct?


Multiple linear regression examples in SAS

Data source: 2018 North Carolina BRFSS study (n = 4,526)


First, fit the SLR from Week 2: MUD = a + b * sleep + error (Model 1), where MUD is the number of mentally unhealthy days in the previous month.

The output suggested that in the current sample, longer daily sleep is associated with fewer mentally unhealthy days (or better mental health).
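
A minimal sketch of how Model 1 might be fit with PROC REG. The data set name (work.brfss_nc) and the variable names (mud, sleep) are assumptions for illustration and may differ from the class data set.

```sas
/* Model 1: simple linear regression of MUD on sleep.          */
/* Assumed names: data set work.brfss_nc; variables mud, sleep. */
proc reg data=work.brfss_nc;
    model mud = sleep / clb;   /* clb adds 95% confidence limits for the estimates */
run;
quit;
```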



Because gender may confound the relationship between sleep and MUD, Model 2 adds the male variable (male = 1, female = 0).


Model 2: MUD = a + b1 * sleep + b2 * male + error
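
Model 2 could be fit the same way by adding male to the MODEL statement. As before, work.brfss_nc, mud, sleep, and male are assumed names, not necessarily those in the class data set.

```sas
/* Model 2: MUD regressed on sleep and male (1 = male, 0 = female). */
/* Assumed names: data set work.brfss_nc; variables mud, sleep, male. */
proc reg data=work.brfss_nc;
    model mud = sleep male / clb;   /* partial regression coefficients with 95% CIs */
run;
quit;
```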



How would you interpret these results?

\(R^2\)
Intercept
Partial regression coefficients: Sleep, Male



Interpretations
Intercept: In the current sample, the expected number of mentally unhealthy days in the previous month for a female (i.e., male = 0) with no sleep (sleep = 0) was 9.86 days (95% CI: 8.67, 11.05).
Partial regression coefficient for sleep: Controlling/adjusting for sex (i.e., holding sex constant), a one-hour increase in daily sleep was significantly associated with an expected 0.78 fewer mentally unhealthy days in the previous month (95% CI: -0.94, -0.62).
Partial regression coefficient for male: Holding daily sleep time constant, males were expected to have 1.01 fewer mentally unhealthy days in the previous month than females (95% CI: -1.49, -0.54).
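
As a quick check on these interpretations, the rounded estimates can be plugged into the fitted equation. The 7 hours of sleep used here is an illustrative value chosen for this example, not a number from the output.

\[
\widehat{\text{MUD}} = 9.86 - 0.78 \times \text{sleep} - 1.01 \times \text{male}
\]

\[
\text{Female, 7 hours: } 9.86 - 0.78(7) = 4.40
\qquad
\text{Male, 7 hours: } 4.40 - 1.01 = 3.39
\]

The two predictions differ by exactly 1.01 days, which is the coefficient for male, because sleep is held at the same value in both.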


We can also compare Model 2 with Model 1: test the significance of the change in \(R^2\)

\(\chi^2\) = \(\Delta\)F = 86.27 - 52.10 = 34.17
\(\Delta\)df = 2 - 1 = 1
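
One way to request a test of the added predictor directly in SAS is the TEST statement in PROC REG, sketched below. It produces a partial F test for male, which, when a single predictor is added, corresponds to testing the change in \(R^2\) and equals the squared t statistic for that predictor. The data set and variable names are assumptions.

```sas
/* Partial F test for adding male to the sleep-only model.            */
/* Assumed names: data set work.brfss_nc; variables mud, sleep, male. */
proc reg data=work.brfss_nc;
    model mud = sleep male;
    added_male: test male = 0;   /* H0: the coefficient for male is 0 */
run;
quit;
```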

Multicollinearity checking

Model 3: MUD = a + b1 * sleep + b2 * male + b3 * kid + error
Check the VIFs and condition indices: both look okay!
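
A sketch of how these diagnostics might be requested in PROC REG. The VIF and COLLIN options on the MODEL statement print variance inflation factors and condition indices, respectively. The data set name and the variable names (including kid) are assumptions.

```sas
/* Model 3 with multicollinearity diagnostics.                             */
/* Assumed names: data set work.brfss_nc; variables mud, sleep, male, kid. */
proc reg data=work.brfss_nc;
    model mud = sleep male kid / vif collin;   /* vif: variance inflation factors;   */
                                               /* collin: eigenvalues and condition indices */
run;
quit;
```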


Breakout room activity: DAGs think-pair-share

  • Present your DAG to your partner (1~2 mins/person)
  • Then you may discuss the following questions:
    • How did you decide the variables (and their relationships) to include in your DAG?
    • What are some challenges that you had working on this exercise?
    • What are some unexpected paths or variables that you identified through this exercise or found in your partner’s DAG?

 

(Optional) Weighted least squares & PROC SURVEYREG

Like many population-based surveys, BRFSS has a specific survey design. Researchers often want to use the survey design information in their analyses so that they can make inferences about the population (in this case, the NC population) rather than only the sample.
After accounting for the survey design in SAS, the same model yields different estimates (compared with the unweighted Model 2).
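
A hedged sketch of a design-adjusted version of Model 2 using PROC SURVEYREG. The design variable names (_ststr for strata, _psu for clusters, _llcpwt for the final weight) follow the naming commonly seen in BRFSS public-use files, but they are assumptions here and should be checked against the codebook for the class data set. The data set and analysis variable names are also assumed.

```sas
/* Model 2 accounting for the BRFSS survey design.                     */
/* Assumed names: data set work.brfss_nc; variables mud, sleep, male;  */
/* design variables _ststr, _psu, _llcpwt (verify against the codebook). */
proc surveyreg data=work.brfss_nc;
    strata  _ststr;                  /* stratification variable          */
    cluster _psu;                    /* primary sampling unit            */
    weight  _llcpwt;                 /* final sampling weight            */
    model mud = sleep male / clparm; /* clparm: confidence limits for estimates */
run;
```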


Next week: Assumptions and diagnostics

Additional resources