The primary research question was to examine how time and demographic variables predict depressive symptom severity as measured by the Patient Health Questionnaire‑9 (PHQ‑9). To evaluate the robustness of results to measurement error and outliers, three analytic approaches are compared:
Demographic predictors included:
The comparison of these models allow assessment of (a) whether latent modeling alters inferences relative to summed scores, and (b) whether results are stable when down‑weighting outliers.
The secondary research question was the following: based on the Interpersonal-Psychological Theory of Suicide Behavior (commonly known as Joiner’s theory) and reflecting the established trends of increasing risk of suicide in college aged young adults, we hypothesize increased distress over time in the following variables will relate to corresponding increases in mean PHQ9 scores over time. The same analytical methods mentioned above will be used.
For Model 1 for both research questions, missingness in the total PHQ‑9 score followed listwise deletion unless otherwise noted. For Model 2 and Model 3 for both research questions, the CFA estimated using Full Information Maximum Likelihood (FIML) to retain cases with partial item‑level missingness.
Categorical predictors (race, gender, sexual orientation, class year, varsity athlete, transfer status) were dummy‑coded. Reference groups are listed above and were selected based on theoretical relevance or sample size. Time was coded as a numeric index, representing semester, starting with Fall 2017.
Before Models 2 and 3, a Confirmatory Factor Analysis (CFA) was fit to the nine PHQ‑9 items using a single‑factor model, consistent with evidence supporting a strong general depression factor in the PHQ‑9.
| Gender | n (%) |
|---|---|
| Woman | 1850 (76.7%) |
| Man | 477 (19.8%) |
| Non-binary | 46 (1.9%) |
| PNA | 21 (0.9%) |
| Trans man | 10 (0.4%) |
| Trans woman | 5 (0.2%) |
| NA | 4 (0.2%) |
| Class | n (%) |
|---|---|
| First year | 662 (27.4%) |
| Junior | 578 (24%) |
| Sophomore | 539 (22.3%) |
| Senior | 415 (17.2%) |
| Senior+ | 178 (7.4%) |
| Post bacc | 40 (1.7%) |
| NA | 1 (0%) |
| Sexual Orientation | n (%) |
|---|---|
| Heterosexual | 1634 (67.7%) |
| Bisexual | 293 (12.1%) |
| PNA | 108 (4.5%) |
| Gay/lesbian | 106 (4.4%) |
| Questioning | 85 (3.5%) |
| Asexual | 66 (2.7%) |
| Queer | 46 (1.9%) |
| Pansexual | 37 (1.5%) |
| DNI | 28 (1.2%) |
| Panromantic | 5 (0.2%) |
| NA | 5 (0.2%) |
| Race | n (%) |
|---|---|
| Native American/Alaskan Native | 1807 (74.9%) |
| African/Afro-Caribbean/Black | 224 (9.3%) |
| Multi-ethnic | 118 (4.9%) |
| White | 90 (3.7%) |
| Asia/PI | 90 (3.7%) |
| PNA | 48 (2%) |
| DNI | 15 (0.6%) |
| Arab/ME | 13 (0.5%) |
| NA | 8 (0.3%) |
| Varsity athlete | n (%) |
|---|---|
| No | 2319 (96.1%) |
| Yes | 89 (3.7%) |
| NA | 5 (0.2%) |
| Transfer | n (%) |
|---|---|
| No | 1858 (77%) |
| Yes | 541 (22.4%) |
| NA | 14 (0.6%) |
A linear regression model was estimated: PHQ9 Total=β0+β1Time+β2Race+…+βkTransfer+ϵ.
Assumptions of linearity, homoscedasticity, normality of residuals, and influence diagnostics (standardized residuals, leverage, Cook’s distance) were examined. Diagnostics identical to Model 1 were performed to evaluate residual patterns and influential observations. These graphs are suppressed for this report but available.
Call:
lm(formula = Score ~ Period + MClass + MRace + MSorient + MGender +
Varsitya2 + Transfer2, data = data)
Residuals:
Min 1Q Median 3Q Max
-15.7981 -3.8806 -0.0383 3.8505 13.4061
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.80676 0.40946 31.277 < 2e-16 ***
Period 0.07407 0.02685 2.758 0.00585 **
MClassNot first year 0.03937 0.25881 0.152 0.87910
MRaceWhite 0.79909 0.57905 1.380 0.16772
MSorientNot heterosexual 2.11096 0.24074 8.769 < 2e-16 ***
MGenderNot a man/DNI -0.21507 0.28056 -0.767 0.44340
Varsitya2Yes -0.89803 0.60141 -1.493 0.13552
Transfer2Yes 0.38951 0.27442 1.419 0.15591
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.387 on 2373 degrees of freedom
(32 observations deleted due to missingness)
Multiple R-squared: 0.04041, Adjusted R-squared: 0.03758
F-statistic: 14.27 on 7 and 2373 DF, p-value: < 2.2e-16
An ordinary least squares (OLS) regression was fitted with the PHQ 9 total score as the dependent variable. Predictors included time and the six demographic variables.
The variables that were found to significantly predict PHQ9 were the time variable Period(B=0.074, p=.00585) and Sexual Orientation (B=2.111, p<2e-16).
The extracted PHQ‑9 factor score will be used as the dependent variable:
Latent Depression Factor=β0+β1Time+β2Race+…+βkTransfer+ϵ
This model adjusts for measurement error in PHQ‑9 items by using a more precise estimate of underlying depression severity. Assumptions of linearity, homoscedasticity, normality of residuals, and influence diagnostics (standardized residuals, leverage, Cook’s distance) were examined. Diagnostics were performed to evaluate residual patterns and influential observations. These graphs are suppressed for this report but available.
Call:
lm(formula = Score_factor ~ Period + MClass + MSorient + MRace +
MGender + Varsitya2 + Transfer2, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.67221 -0.67273 0.00352 0.66765 2.11393
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.183584 0.068651 -2.674 0.00754 **
Period 0.012520 0.004502 2.781 0.00546 **
MClassNot first year 0.015495 0.043393 0.357 0.72106
MSorientNot heterosexual 0.347262 0.040363 8.604 < 2e-16 ***
MRaceWhite 0.091624 0.097086 0.944 0.34540
MGenderNot a man/DNI -0.071377 0.047039 -1.517 0.12930
Varsitya2Yes -0.123856 0.100835 -1.228 0.21945
Transfer2Yes 0.068724 0.046010 1.494 0.13539
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9031 on 2373 degrees of freedom
(32 observations deleted due to missingness)
Multiple R-squared: 0.03911, Adjusted R-squared: 0.03628
F-statistic: 13.8 on 7 and 2373 DF, p-value: < 2.2e-16
A confirmatory factor analysis (CFA) estimated a single latent depression factor from the nine PHQ 9 items. The CFA was estimated using FIML with the latent factor variance fixed to 1. Individual factor scores were extracted using regression based scoring. These latent scores served as the dependent variable in an OLS regression with the same set of demographic and time predictors as Model 1.
The variables that were found to significantly predict PHQ9 were the time variable Period(B=0.013, p=.008) and Sexual Orientation (B=.347, p<2e-16).
Model 3: Robust regression with latent factor score To evaluate sensitivity to influential observations and distributional assumptions, a robust M‑estimation regression was conducted: Latent Depression Factor=β0+β1Time+β2Race+…+βkTransfer+ϵrobust
Implementation: rlm() (MASS). Estimator: Huber. This model down‑weights extreme residuals rather than deleting them. Coefficient estimates and standard errors were compared to OLS results to assess robustness.
Call:
lmrob(formula = Score_factor ~ Period + MClass + MSorient + MRace + MGender +
Varsitya2 + Transfer2, data = data, fast.s.large.n = Inf)
\--> method = "MM"
Residuals:
Min 1Q Median 3Q Max
-2.699479 -0.677933 0.001548 0.655582 2.117162
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.178827 0.076756 -2.330 0.01990 *
Period 0.012730 0.004738 2.687 0.00727 **
MClassNot first year 0.011714 0.047015 0.249 0.80327
MSorientNot heterosexual 0.380852 0.043962 8.663 < 2e-16 ***
MRaceWhite 0.080930 0.105184 0.769 0.44172
MGenderNot a man/DNI -0.080203 0.051775 -1.549 0.12150
Varsitya2Yes -0.115778 0.094449 -1.226 0.22038
Transfer2Yes 0.068361 0.047719 1.433 0.15211
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Robust residual standard error: 0.9718
(32 observations deleted due to missingness)
Multiple R-squared: 0.04242, Adjusted R-squared: 0.03959
Convergence in 11 IRWLS iterations
Robustness weights:
199 weights are ~= 1. The remaining 2182 ones are summarized as
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.4205 0.8778 0.9496 0.9185 0.9844 0.9990
Algorithmic parameters:
tuning.chi bb tuning.psi refine.tol
1.548e+00 5.000e-01 4.685e+00 1.000e-07
rel.tol scale.tol solve.tol zero.tol
1.000e-07 1.000e-10 1.000e-07 1.000e-10
eps.outlier eps.x warn.limit.reject warn.limit.meanrw
4.200e-05 2.910e-11 5.000e-01 5.000e-01
nResample max.it best.r.s k.fast.s k.max maxit.scale
500 50 2 1 200 200
trace.lev mts compute.rd
0 1000 0
psi subsampling cov
"bisquare" "nonsingular" ".vcov.avar1"
compute.outlier.stats
"SM"
seed : int(0)
To assess sensitivity to influential observations, a robust regression model (M‑estimation, Huber or bisquare loss function) was fitted using the same predictors and the same latent factor score outcome as in Model 2. Robust regression down‑weights observations with large residuals rather than removing them. This approach provides coefficient estimates that are more stable in the presence of outliers or heteroskedasticity.
The variables that were found to significantly predict PHQ9 were the time variable Period(B=0.013, p=.008) and Sexual Orientation (B=.381, p<2e-16).
The three models performed similarly. The predictor variables Period and Sexual Orientation were significant in all three models, with no other significant predictors.
Based on the Interpersonal-Psychological Theory of Suicide Behavior (commonly known as Joiner’s theory) and reflecting the established trends of increasing risk of suicide in college aged young adults, we hypothesize increased distress over time in the following variables will relate to corresponding increases in mean PHQ9 scores over time-
The models below use a new selection of factors to predict PHQ9. The models are fit in the same three methods as the models above: 1) OLS on the PHQ9 total, 2) OLS on the PHQ9 latentOLS depression factor score derived from factor analysis (CFA), 3) Robust regression (M‑estimation) using the latent factor score outcome to down‑weight influential observations.
Model 1: The variables that were found to significantly predict PHQ9 were the time variable Period(B=0.069, p=.0003), Loneliness (B=1.03 , p<2-16), Hopelessness (B=1.45, p,2-16), Desperate feelings (B=.459, p=7.45e-5), Out of Control Feelings (B=1.04, p<2e-16), Drug Use (B=0.269, p=.0218), Suicidal Thoughts (B=1.14, p=7.25e-10), Suicidal Actions (B=.61, p=.009), and Suicidal Attempts (B=.901, p=.0004).
Model 2: The variables that were found to significantly predict PHQ9 were the time variable Period(B=0.012, p=.0002), Loneliness (B=0.189, p<2-16), Hopelessness (B=0.287, p,2-16), Desperate feelings (B=.076, p=5.55e-5), Out of Control Feelings (B=0.159, p<2e-16), Drug Use (B=0.042, p=.0268), Suicidal Thoughts (B=0.167, p=2.33e-8), and Suicidal Attempts (B=.144, p=.0004).
Suicidal actions (B=0.071, p=.064) did not meet the 5% significance level threshold as it did in model 1.
Model 3: The variables that were found to significantly predict PHQ9 were the time variable Period(B=0.011, p=.0004), Loneliness (B=0.199, p<2-16), Hopelessness (B=0.293, p,2-16), Desperate feelings (B=.076, p=.0001), Out of Control Feelings (B=0.166, p<2e-16), Drug Use (B=0.040, p=.046), Suicidal Thoughts (B=0.166, p=7.93e-8), and Suicidal Attempts (B=.158, p=.0001).
Models 2 and 3 performed nearly identically.
Call:
lm(formula = Score ~ Period + Lonely + Hopeless + Desperat +
Control + Drink + TooMuch + Drugs + Thoughts + Plans + Actions +
Attempt, data = data)
Residuals:
Min 1Q Median 3Q Max
-13.4455 -2.6818 0.0053 2.7105 13.6749
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.76309 0.35739 16.126 < 2e-16 ***
Period 0.06964 0.01932 3.604 0.000320 ***
Lonely 1.02545 0.09479 10.818 < 2e-16 ***
Hopeless 1.45184 0.12255 11.847 < 2e-16 ***
Desperat 0.45893 0.11566 3.968 7.46e-05 ***
Control 1.03982 0.09617 10.812 < 2e-16 ***
Drink 0.25312 0.18219 1.389 0.164859
TooMuch 0.14293 0.18994 0.752 0.451834
Drugs 0.26873 0.11709 2.295 0.021821 *
Thoughts 1.13672 0.18374 6.187 7.25e-10 ***
Plans 0.42179 0.26973 1.564 0.118008
Actions 0.60894 0.23278 2.616 0.008958 **
Attempt 0.90122 0.25207 3.575 0.000357 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.885 on 2318 degrees of freedom
(82 observations deleted due to missingness)
Multiple R-squared: 0.5022, Adjusted R-squared: 0.4996
F-statistic: 194.8 on 12 and 2318 DF, p-value: < 2.2e-16
Call:
lm(formula = Score_factor ~ Period + Lonely + Hopeless + Desperat +
Control + Drink + TooMuch + Drugs + Thoughts + Plans + Actions +
Attempt, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.21545 -0.42733 0.00515 0.43480 2.35759
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.428759 0.058041 -24.617 < 2e-16 ***
Period 0.011602 0.003138 3.697 0.000223 ***
Lonely 0.189447 0.015395 12.306 < 2e-16 ***
Hopeless 0.287089 0.019903 14.425 < 2e-16 ***
Desperat 0.075859 0.018783 4.039 5.55e-05 ***
Control 0.158954 0.015618 10.177 < 2e-16 ***
Drink 0.020831 0.029588 0.704 0.481466
TooMuch 0.035200 0.030846 1.141 0.253924
Drugs 0.042126 0.019016 2.215 0.026837 *
Thoughts 0.167249 0.029840 5.605 2.33e-08 ***
Plans 0.047635 0.043804 1.087 0.276947
Actions 0.070107 0.037805 1.854 0.063801 .
Attempt 0.143878 0.040936 3.515 0.000449 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6309 on 2318 degrees of freedom
(82 observations deleted due to missingness)
Multiple R-squared: 0.5315, Adjusted R-squared: 0.5291
F-statistic: 219.2 on 12 and 2318 DF, p-value: < 2.2e-16
Call:
lmrob(formula = Score_factor ~ Period + Lonely + Hopeless + Desperat + Control +
Drink + TooMuch + Drugs + Thoughts + Plans + Actions + Attempt, data = data,
fast.s.large.n = Inf)
\--> method = "MM"
Residuals:
Min 1Q Median 3Q Max
-2.237177 -0.420806 0.002442 0.437523 2.387504
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.479074 0.059819 -24.726 < 2e-16 ***
Period 0.011496 0.003212 3.578 0.000353 ***
Lonely 0.199107 0.015975 12.464 < 2e-16 ***
Hopeless 0.293227 0.020784 14.108 < 2e-16 ***
Desperat 0.075725 0.019500 3.883 0.000106 ***
Control 0.166486 0.016549 10.060 < 2e-16 ***
Drink 0.023790 0.030019 0.792 0.428151
TooMuch 0.032039 0.030407 1.054 0.292143
Drugs 0.039985 0.019988 2.000 0.045571 *
Thoughts 0.165659 0.030757 5.386 7.93e-08 ***
Plans 0.051581 0.039575 1.303 0.192579
Actions 0.064476 0.034198 1.885 0.059502 .
Attempt 0.158464 0.041434 3.824 0.000135 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Robust residual standard error: 0.6348
(82 observations deleted due to missingness)
Multiple R-squared: 0.5455, Adjusted R-squared: 0.5431
Convergence in 10 IRWLS iterations
Robustness weights:
202 weights are ~= 1. The remaining 2129 ones are summarized as
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1264 0.8747 0.9492 0.9086 0.9847 0.9990
Algorithmic parameters:
tuning.chi bb tuning.psi refine.tol
1.548e+00 5.000e-01 4.685e+00 1.000e-07
rel.tol scale.tol solve.tol zero.tol
1.000e-07 1.000e-10 1.000e-07 1.000e-10
eps.outlier eps.x warn.limit.reject warn.limit.meanrw
4.290e-05 2.910e-11 5.000e-01 5.000e-01
nResample max.it best.r.s k.fast.s k.max maxit.scale
500 50 2 1 200 200
trace.lev mts compute.rd
0 1000 0
psi subsampling cov
"bisquare" "nonsingular" ".vcov.avar1"
compute.outlier.stats
"SM"
seed : int(0)