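The primary objective was to examine how time and demographic variables predict depressive symptom severity as measured by the Patient Health Questionnaire‑9 (PHQ‑9). To evaluate the robustness of results to measurement error and outliers, three analytic approaches were compared: (1) ordinary least squares (OLS) regression on the PHQ‑9 total score, (2) OLS regression on a latent factor score derived from a confirmatory factor analysis of the PHQ‑9 items, and (3) robust regression on the same latent factor score.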
Demographic predictors included race, gender, sexual orientation, class year, varsity athlete status, and transfer status.
Comparing these models allows assessment of (a) whether latent modeling alters inferences relative to summed scores, and (b) whether results are stable when outliers are down‑weighted.
For Model 1, missingness in the total PHQ‑9 score was handled by listwise deletion unless otherwise noted. For Model 2 and Model 3, the CFA was estimated using Full Information Maximum Likelihood (FIML) to retain cases with partial item‑level missingness.
Categorical predictors (race, gender, sexual orientation, class year, varsity athlete, transfer status) were dummy‑coded. Reference groups are listed above and were selected based on theoretical relevance or sample size. Time was coded as a numeric index, representing semester, starting with Fall 2017.
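The coding scheme described above can be sketched in base R. This is a minimal illustration on toy data, not the study's actual preprocessing: the column names, the category labels, the choice of reference levels, and the assumption that the time index starts at 1 and advances by one per semester are all hypothetical.

```r
# Hypothetical toy data; names and labels are illustrative only.
toy <- data.frame(
  semester = c("Fall 2017", "Spring 2018", "Fall 2018", "Spring 2019"),
  transfer = c("No", "Yes", "No", "Yes"),
  gender   = c("Man", "Woman", "Non-binary", "Woman"),
  stringsAsFactors = FALSE
)

# Numeric time index (assumed here: 1 = Fall 2017, then +1 per semester).
semester_levels <- c("Fall 2017", "Spring 2018", "Fall 2018", "Spring 2019")
toy$period <- match(toy$semester, semester_levels)

# Convert to factors and set the reference level explicitly
# (reference levels are assumptions, not the study's choices).
toy$transfer <- relevel(factor(toy$transfer), ref = "No")
toy$gender   <- relevel(factor(toy$gender), ref = "Man")

# model.matrix() expands each k-level factor into k - 1 dummy columns.
X <- model.matrix(~ period + transfer + gender, data = toy)
print(colnames(X))
```

Setting the reference level with relevel() before fitting makes each dummy coefficient interpretable as a contrast against that group, which is the interpretation lm() applies to the coefficient tables in this section.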
Before Models 2 and 3, a Confirmatory Factor Analysis (CFA) was fit to the nine PHQ‑9 items using a single‑factor model, consistent with evidence supporting a strong general depression factor in the PHQ‑9.
A linear regression model was estimated: $\text{PHQ-9 Total} = \beta_0 + \beta_1\,\text{Time} + \beta_2\,\text{Race} + \dots + \beta_k\,\text{Transfer} + \epsilon$.
Assumptions (linearity, normality, homoscedasticity) were examined through residual plots and histograms, and through Cook’s distance and leverage diagnostics.
Call:
lm(formula = Score ~ Period + Class3 + Race2 + Sorient2 + Gender2 +
Varsitya2 + Transfer2, data = data)
Residuals:
Min 1Q Median 3Q Max
-16.1554 -3.8352 -0.0159 3.8910 13.1338
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.93893 0.69595 20.029 < 2e-16 ***
Period 0.06464 0.02690 2.403 0.016341 *
Class3NFY 0.07094 0.25838 0.275 0.783696
Race2African/Afro-Caribbean/Black 0.01260 0.67231 0.019 0.985051
Race2Arab/ME 2.44067 1.59458 1.531 0.126002
Race2Asia/PI -0.56167 0.80469 -0.698 0.485254
Race2DNI 1.44930 1.50480 0.963 0.335587
Race2Multi-ethnic -0.42339 0.75537 -0.561 0.575184
Race2Native American/Alaskan Native -1.06638 0.57957 -1.840 0.065903 .
Race2PNA -1.48569 0.98132 -1.514 0.130170
Sorient2Asexual 0.41756 0.67783 0.616 0.537936
Sorient2Bisexual 2.26260 0.34840 6.494 1.01e-10 ***
Sorient2DNI 0.46054 1.02503 0.449 0.653261
Sorient2Gay/lesbian 1.48896 0.56515 2.635 0.008478 **
Sorient2Panromantic -0.75578 2.40823 -0.314 0.753677
Sorient2Pansexual 3.20230 0.93230 3.435 0.000603 ***
Sorient2PNA 1.30768 0.55202 2.369 0.017922 *
Sorient2Queer 2.24161 0.87232 2.570 0.010239 *
Sorient2Questioning 1.84994 0.59938 3.086 0.002049 **
Gender2Non-binary 2.57723 0.90041 2.862 0.004243 **
Gender2PNA 2.90491 1.22869 2.364 0.018148 *
Gender2Trans man 1.04368 1.73324 0.602 0.547129
Gender2Trans woman 1.07705 2.44196 0.441 0.659212
Gender2Woman -0.42231 0.28197 -1.498 0.134335
Varsitya2Yes -0.85524 0.60356 -1.417 0.156620
Transfer2Yes 0.28514 0.27367 1.042 0.297550
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5.351 on 2353 degrees of freedom
(34 observations deleted due to missingness)
Multiple R-squared: 0.0604, Adjusted R-squared: 0.05041
F-statistic: 6.05 on 25 and 2353 DF, p-value: < 2.2e-16
An ordinary least squares (OLS) regression was fitted with the PHQ‑9 total score as the dependent variable. Predictors included time and the six demographic variables. The assumptions of linearity, homoscedasticity, and normality of residuals were examined, along with influence diagnostics (standardized residuals, leverage, Cook’s distance).
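The influence diagnostics named above can be computed with base R functions. The following is a minimal sketch on simulated data: the variable names, the simulated effect sizes, and the rule-of-thumb cutoffs are assumptions for illustration and are not taken from the study.

```r
set.seed(1)

# Simulated stand-in data (hypothetical variables and effect sizes).
n <- 200
sim <- data.frame(
  period   = sample(1:8, n, replace = TRUE),
  transfer = factor(sample(c("No", "Yes"), n, replace = TRUE))
)
sim$score <- 10 + 0.1 * sim$period + 0.5 * (sim$transfer == "Yes") +
  rnorm(n, sd = 5)

fit_ols <- lm(score ~ period + transfer, data = sim)

# Per-observation influence measures.
influence_tbl <- data.frame(
  std_resid = rstandard(fit_ols),      # standardized residuals
  leverage  = hatvalues(fit_ols),      # diagonal of the hat matrix
  cooks_d   = cooks.distance(fit_ols)  # Cook's distance
)

# Conventional screening cutoffs (assumed here, not the study's criteria).
p <- length(coef(fit_ols))
flagged <- which(
  abs(influence_tbl$std_resid) > 3 |
    influence_tbl$leverage > 2 * p / n |
    influence_tbl$cooks_d > 4 / n
)
print(length(flagged))

# Optional visual checks:
# plot(fit_ols, which = 1:2)   # residuals vs fitted, normal Q-Q
# hist(residuals(fit_ols))
```

Flagged observations are candidates for inspection rather than automatic removal.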
[Summary of Model 1 results to be included here.]
The extracted PHQ‑9 factor score was used as the dependent variable:
$\text{Latent Depression Factor} = \beta_0 + \beta_1\,\text{Time} + \beta_2\,\text{Race} + \dots + \beta_k\,\text{Transfer} + \epsilon$
This model adjusts for measurement error in PHQ‑9 items by using a more precise estimate of underlying depression severity. Diagnostics identical to Model 1 were performed to evaluate residual patterns and influential observations.
Call:
lm(formula = Score_factor ~ Period + Class3 + Sorient2 + Race2 +
Gender2 + Varsitya2 + Transfer2, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.72055 -0.64612 -0.00145 0.64783 2.14441
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.038930 0.116764 -0.333 0.738855
Period 0.011023 0.004513 2.443 0.014658 *
Class3NFY 0.020355 0.043350 0.470 0.638727
Sorient2Asexual 0.074947 0.113724 0.659 0.509945
Sorient2Bisexual 0.377336 0.058452 6.455 1.31e-10 ***
Sorient2DNI 0.134129 0.171975 0.780 0.435507
Sorient2Gay/lesbian 0.225334 0.094819 2.376 0.017558 *
Sorient2Panromantic -0.092927 0.404043 -0.230 0.818116
Sorient2Pansexual 0.558064 0.156418 3.568 0.000367 ***
Sorient2PNA 0.207915 0.092616 2.245 0.024866 *
Sorient2Queer 0.348791 0.146354 2.383 0.017242 *
Sorient2Questioning 0.303994 0.100562 3.023 0.002530 **
Race2African/Afro-Caribbean/Black 0.059886 0.112798 0.531 0.595527
Race2Arab/ME 0.382694 0.267532 1.430 0.152717
Race2Asia/PI -0.066964 0.135008 -0.496 0.619942
Race2DNI 0.296697 0.252469 1.175 0.240042
Race2Multi-ethnic -0.032135 0.126733 -0.254 0.799851
Race2Native American/Alaskan Native -0.134687 0.097238 -1.385 0.166147
Race2PNA -0.196042 0.164643 -1.191 0.233887
Gender2Non-binary 0.375173 0.151067 2.483 0.013079 *
Gender2PNA 0.443753 0.206145 2.153 0.031449 *
Gender2Trans man 0.074819 0.290795 0.257 0.796977
Gender2Trans woman 0.068991 0.409702 0.168 0.866288
Gender2Woman -0.105838 0.047307 -2.237 0.025363 *
Varsitya2Yes -0.111287 0.101263 -1.099 0.271883
Transfer2Yes 0.052370 0.045915 1.141 0.254153
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.8978 on 2353 degrees of freedom
(34 observations deleted due to missingness)
Multiple R-squared: 0.05791, Adjusted R-squared: 0.04791
F-statistic: 5.786 on 25 and 2353 DF, p-value: < 2.2e-16
A confirmatory factor analysis (CFA) estimated a single latent depression factor from the nine PHQ‑9 items. The CFA was estimated using FIML with the latent factor variance fixed to 1. Individual factor scores were extracted using regression‑based scoring. These latent scores served as the dependent variable in an OLS regression with the same set of demographic and time predictors as Model 1.
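A single‑factor CFA with FIML, a unit‑variance latent factor, and regression‑based factor scores could be specified as follows. This sketch assumes the lavaan package, which the text does not name. The item names (phq1 to phq9) and the simulated data are hypothetical, and the items are treated as continuous, as FIML estimation implies.

```r
# Runs only if lavaan is installed (the package is an assumption).
if (requireNamespace("lavaan", quietly = TRUE)) {
  set.seed(2)

  # Simulated stand-in for nine item responses driven by one factor.
  n   <- 300
  eta <- rnorm(n)
  items <- as.data.frame(
    sapply(1:9, function(j) 0.7 * eta + rnorm(n, sd = 0.7))
  )
  names(items) <- paste0("phq", 1:9)    # hypothetical item names
  items$phq3[sample(n, 15)] <- NA       # some item-level missingness

  # One latent factor measured by all nine items.
  model <- paste("depression =~", paste(names(items), collapse = " + "))

  fit_cfa <- lavaan::cfa(
    model,
    data    = items,
    missing = "fiml",  # full information maximum likelihood
    std.lv  = TRUE     # fix the latent variance to 1
  )

  # Regression-based factor scores, one row per respondent.
  scores <- lavaan::lavPredict(fit_cfa, method = "regression")
  print(summary(as.numeric(scores)))
}
```

The resulting score column can then be attached to the analysis data set and used as the outcome in lm().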
[Summary of Model 2 results to be included here.]
Model 3: Robust regression with latent factor score
To evaluate sensitivity to influential observations and distributional assumptions, a robust M‑estimation regression was conducted: $\text{Latent Depression Factor} = \beta_0 + \beta_1\,\text{Time} + \beta_2\,\text{Race} + \dots + \beta_k\,\text{Transfer} + \epsilon_{\text{robust}}$
Implementation: rlm() (MASS). Estimator: Huber. This model down‑weights extreme residuals rather than deleting them. Coefficient estimates and standard errors were compared to OLS results to assess robustness.
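For reference, the down‑weighting performed by a Huber M‑estimator can be written as a weight function of the scaled residual. This is the textbook form: $r_i$ denotes the residual divided by a robust scale estimate, and $k$ is the tuning constant, for which rlm() uses 1.345 by default. Whether the default tuning was used in this analysis is not stated in the text.

```latex
w(r_i) =
\begin{cases}
  1, & |r_i| \le k \\[4pt]
  \dfrac{k}{|r_i|}, & |r_i| > k
\end{cases}
```

Observations with small scaled residuals keep full weight, while the weight of larger residuals shrinks in proportion to their size, so no case is dropped outright.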
Call:
lmrob(formula = Score_factor ~ Period + Class3 + Sorient2 + Race2 + Gender2 +
Varsitya2 + Transfer2, data = data, fast.s.large.n = Inf)
\--> method = "S"
Residuals:
Min 1Q Median 3Q Max
-3.057342 -0.650301 0.007254 0.613016 2.904135
Algorithm did not converge
Coefficients of the *initial* S-estimator:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.08517 NA NA NA
Period 0.01899 NA NA NA
Class3NFY -0.07406 NA NA NA
Sorient2Asexual 0.01310 NA NA NA
Sorient2Bisexual 0.60959 NA NA NA
Sorient2DNI 0.33356 NA NA NA
Sorient2Gay/lesbian 0.43992 NA NA NA
Sorient2Panromantic -1.00444 NA NA NA
Sorient2Pansexual 0.93944 NA NA NA
Sorient2PNA 0.39898 NA NA NA
Sorient2Queer 0.17349 NA NA NA
Sorient2Questioning 0.55375 NA NA NA
Race2African/Afro-Caribbean/Black 0.40469 NA NA NA
Race2Arab/ME 0.45269 NA NA NA
Race2Asia/PI -0.09538 NA NA NA
Race2DNI -0.57985 NA NA NA
Race2Multi-ethnic 0.25615 NA NA NA
Race2Native American/Alaskan Native -0.15563 NA NA NA
Race2PNA -0.71041 NA NA NA
Gender2Non-binary 0.55046 NA NA NA
Gender2PNA 0.29716 NA NA NA
Gender2Trans man 0.06888 NA NA NA
Gender2Trans woman -0.18632 NA NA NA
Gender2Woman -0.10301 NA NA NA
Varsitya2Yes -0.09963 NA NA NA
Transfer2Yes -0.01465 NA NA NA
Robustness weights:
266 observations c(7,20,65,72,82,83,84,91,94,95,106,113,115,118,120,164,168,186,196,198,199,207,211,212,221,223,231,236,237,243,244,247,248,253,255,256,263,267,268,269,274,293,295,296,299,300,302,317,328,349,352,356,364,370,374,378,379,389,390,395,402,404,411,413,414,417,429,433,455,512,516,536,538,559,564,594,595,599,617,620,639,645,659,662,676,683,696,726,738,757,766,775,779,812,815,820,842,844,852,858,862,875,886,890,907,912,915,926,929,947,956,958,965,972,980,981,982,990,997,998,999,1002,1006,1066,1069,1070,1072,1074,1079,1081,1086,1092,1097,1101,1110,1120,1168,1176,1178,1186,1193,1197,1203,1204,1215,1222,1231,1234,1240,1250,1257,1264,1265,1267,1280,1284,1286,1299,1300,1322,1324,1326,1333,1343,1388,1389,1404,1468,1483,1494,1495,1500,1501,1509,1510,1511,1523,1527,1530,1533,1535,1537,1542,1551,1573,1574,1587,1596,1598,1613,1634,1649,1655,1657,1660,1662,1663,1668,1671,1681,1690,1691,1709,1710,1734,1737,1757,1759,1764,1773,1783,1790,1823,1871,1874,1891,1894,1895,1905,1906,1915,1926,1928,1933,1973,1979,1981,1987,1990,2009,2013,2017,2018,2023,2037,2056,2080,2093,2095,2129,2168,2176,2189,2190,2197,2199,2204,2214,2232,2242,2256,2265,2279,2282,2285,2290,2321,2324,2325,2327,2337,2339,2344,2355,2356,2374)
are outliers with |weight| <= 4e-05 ( < 4.2e-05);
60 weights are ~= 1. The remaining 2053 ones are summarized as
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000595 0.3793000 0.7170000 0.6326000 0.9236000 0.9990000
Algorithmic parameters:
tuning.chi bb tuning.psi refine.tol
1.548e+00 5.000e-01 4.685e+00 1.000e-07
rel.tol scale.tol solve.tol zero.tol
1.000e-07 1.000e-10 1.000e-07 1.000e-10
eps.outlier eps.x warn.limit.reject warn.limit.meanrw
4.203e-05 2.910e-11 5.000e-01 5.000e-01
nResample max.it best.r.s k.fast.s k.max maxit.scale
500 50 2 1 200 200
trace.lev mts compute.rd
0 1000 0
psi subsampling cov
"bisquare" "nonsingular" ".vcov.avar1"
compute.outlier.stats
"SM"
seed : int(0)
To assess sensitivity to influential observations, a robust regression model (M‑estimation, Huber or bisquare loss function) was fitted using the same predictors and the same latent factor score outcome as in Model 2. Robust regression down‑weights observations with large residuals rather than removing them. This approach provides coefficient estimates that are more stable in the presence of outliers or heteroskedasticity.
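A comparison of OLS and Huber M‑estimation on data with a few contaminated observations can be sketched with rlm() from MASS. The data below are simulated and the variable names are hypothetical. The sketch illustrates the down‑weighting mechanism only and does not reproduce the study's model.

```r
library(MASS)  # provides rlm() and psi.huber

set.seed(3)

# Simulated data with a known slope and ten shifted outcomes.
n <- 200
sim <- data.frame(x = rnorm(n))
sim$y <- 1 + 0.5 * sim$x + rnorm(n)
sim$y[1:10] <- sim$y[1:10] + 15   # contaminate the first ten cases

fit_ols <- lm(y ~ x, data = sim)
fit_rob <- rlm(y ~ x, data = sim, psi = psi.huber, maxit = 50)

# Side-by-side coefficient comparison.
comparison <- cbind(OLS = coef(fit_ols), Huber = coef(fit_rob))
print(round(comparison, 3))

# Final iteratively reweighted least squares weights:
# 1 means full weight, values below 1 indicate down-weighting.
down_weighted <- which(fit_rob$w < 1)
print(length(down_weighted))
```

Inspecting the weight vector shows which cases drive any difference between the OLS and robust estimates.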