This is an R Markdown document for PHQ9 data across waves. The Patient Health Questionnaire (PHQ9) is a brief depression severity measure that rates the frequency of symptoms; these ratings are summed into a severity score. The scale uses a response format from 0 (Not at all) to 3 (Nearly every day), and the possible total range is 0-27. PHQ9 scores of 5, 10, 15 and 20 are the lower cut points for mild, moderate, moderately severe and severe depression. 303, 255 and 227 participants finished the PHQ9 instrument at waves 1, 2 and 3, respectively, and 216 participants finished it at all three waves.
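The scoring rule described above can be sketched in a few lines. This is an illustrative Python snippet, not the code used for the analysis (which was run in R); the function names `phq9_total` and `phq9_severity` are hypothetical.

```python
from bisect import bisect_right

# Lower cut points for each band above "None-Minimal", as stated in the text.
CUT_POINTS = [5, 10, 15, 20]
LABELS = ["None-Minimal", "Mild", "Moderate", "Moderately Severe", "Severe"]

def phq9_total(items):
    """Sum nine item responses, each scored 0 (Not at all) to 3 (Nearly every day)."""
    if len(items) != 9 or any(i not in (0, 1, 2, 3) for i in items):
        raise ValueError("PHQ9 needs nine responses, each in 0..3")
    return sum(items)

def phq9_severity(total):
    """Map a total score (0-27) to its severity band."""
    if not 0 <= total <= 27:
        raise ValueError("PHQ9 total must be in 0..27")
    # bisect_right counts how many cut points are <= total.
    return LABELS[bisect_right(CUT_POINTS, total)]
```

For example, nine responses of 1 give a total of 9, which falls in the Mild band (5-9).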

Table 1. PHQ9 Depression Severity across Waves

| Wave | None-Minimal (0-4) | Mild (5-9) | Moderate (10-14) | Moderately Severe (15-19) | Severe (20+) | PHQ9 Total Score |
|---|---|---|---|---|---|---|
| wave1 | 135 (0.64) | 40 (0.19) | 22 (0.1) | 8 (0.04) | 7 (0.03) | 4.95 (5.62) |
| wave2 | 105 (0.5) | 52 (0.25) | 32 (0.15) | 12 (0.06) | 11 (0.05) | 6.37 (6.12) |
| wave3 | 92 (0.44) | 59 (0.28) | 31 (0.15) | 17 (0.08) | 11 (0.05) | 6.98 (6.26) |

Note: Categorical variables (risk group) are presented as n (proportion) and the continuous variable (total score) is presented as mean (sd).
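As a quick check on how the bracketed values relate to the counts, the sketch below divides each count in Table 1 by its row total (212, 212 and 210). It is written in Python for illustration only and is not the original R code.

```python
# Counts per severity band, copied from Table 1
# (None-Minimal, Mild, Moderate, Moderately Severe, Severe).
counts = {
    "wave1": [135, 40, 22, 8, 7],
    "wave2": [105, 52, 32, 12, 11],
    "wave3": [92, 59, 31, 17, 11],
}

def proportions(row):
    """Share of each severity band within one wave, rounded to two decimals."""
    total = sum(row)
    return [round(n / total, 2) for n in row]

props = {wave: proportions(row) for wave, row in counts.items()}
```

The resulting shares agree with the bracketed values in the table, which confirms they are proportions of the participants in each row rather than percentages.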

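Use the model below to reveal the relationship between time and PHQ9 total score, where \(Time_2\) and \(Time_3\) are indicators for waves 2 and 3 and wave 1 is the reference level:
\(Y_{ij}=\beta_0+\beta_1Time_2+\beta_2Time_3\).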
Table 2. Results of Linear Regression to Estimate the Association between Wave and PHQ9 Total Score

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 4.95 | 0.41 | 12.00 | 0.00 |
| factor(time)2 | 1.42 | 0.58 | 2.44 | 0.01 |
| factor(time)3 | 2.03 | 0.58 | 3.48 | 0.00 |

Note: Linear regression without considering the correlation between measurements.
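Because time enters the model as two indicator variables, each coefficient is a difference from the wave 1 mean. A minimal Python sketch (illustrative only, not the original R code) that turns the estimates in the linear regression table above into fitted wave means:

```python
# Estimates copied from the linear regression table (wave 1 is the reference level).
beta0, beta1, beta2 = 4.95, 1.42, 2.03

def fitted_mean(wave):
    """Fitted PHQ9 total score for wave 1, 2 or 3 under the dummy-coded model."""
    time2 = 1 if wave == 2 else 0
    time3 = 1 if wave == 3 else 0
    return beta0 + beta1 * time2 + beta2 * time3

means = {w: round(fitted_mean(w), 2) for w in (1, 2, 3)}
```

The fitted means are 4.95, 6.37 and 6.98, the same as the mean total scores reported in Table 1.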
##           [,1]      [,2]      [,3]
## [1,] 1.0000000 0.5092685 0.1102938
## [2,] 0.5092685 1.0000000 0.4251096
## [3,] 0.1102938 0.4251096 1.0000000
##   (Intercept) factor(time)2 factor(time)3 
##      4.948113      1.424528      2.032839
Table 3. Results of GEE to Estimate the Association between Wave and PHQ9 Total Score

| Term | Estimate | Naive S.E. | Naive z | Robust S.E. | Robust z |
|---|---|---|---|---|---|
| (Intercept) | 4.96 | 0.41 | 12.10 | 0.39 | 12.88 |
| factor(time)2 | 1.38 | 0.31 | 4.41 | 0.30 | 4.57 |
| factor(time)3 | 2.01 | 0.39 | 5.22 | 0.36 | 5.54 |

Note: Generalized estimating equation (GEE) model.
Conclusion: The measurement at one time point is correlated with measurements at nearby time points, and the correlation declines as the time between two measurements increases (0.51 between waves 1 and 2, 0.43 between waves 2 and 3, and 0.11 between waves 1 and 3). An unstructured correlation matrix is therefore used when fitting the GEE models. The results from the two models are similar, indicating that the PHQ9 total score increases significantly over time.
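The GEE table reports z statistics but no p-values. As a rough illustration of why the time effects are described as significant, the Python sketch below converts the robust z values into two-sided p-values. It assumes a standard normal reference distribution (the usual Wald test) and is not part of the original R analysis.

```python
from math import erfc, sqrt

# Robust z statistics copied from the GEE table above.
robust_z = {"factor(time)2": 4.57, "factor(time)3": 5.54}

def two_sided_p(z):
    """Two-sided p-value for a Wald z statistic under a standard normal reference."""
    return erfc(abs(z) / sqrt(2))

p_values = {term: two_sided_p(z) for term, z in robust_z.items()}
```

Both p-values are far below 0.001, in line with the conclusion drawn from the linear regression.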
Table 4. PHQ9 Total Score across Waves, by Sex

| Wave | Male | Female | Lower 95% | Upper 95% | P-value |
|---|---|---|---|---|---|
| wave1 | 3.92 | 5.36 | 3.89 | 5.42 | <0.0001 |
| wave2 | 5.15 | 6.87 | 5.25 | 6.91 | <0.0001 |
| wave3 | 5.67 | 7.48 | 5.84 | 7.54 | <0.0001 |

Note: A two-sample Student's t-test was applied to compare the PHQ9 total score between the two sex groups.
Conclusion: The average PHQ9 total score is higher among females than among males, and the difference is significant at all three waves.
Table 5. PHQ9 Severity Group across Waves, by Sex

| Wave | P-value |
|---|---|
| wave1 | 0.3715291 |
| wave2 | 0.4772003 |
| wave3 | 0.1127171 |

Note: A chi-squared test of independence was applied to compare the PHQ9 risk groups between the two sex groups.

A possible interaction with sex is suggested by the figure (the rate of change in total score is not the same in the two sex groups).

Use the model below to reveal the relationship between time, sex and PHQ9 total score. Backward elimination (model selection) was used to decide on the best-fitting model. \(Y_{ij}=\beta_0+\beta_1Time_2+\beta_2Time_3+\beta_3Sex_i+\beta_4Time_2\times Sex_i+\beta_5Time_3\times Sex_i\).
Table 6. Results of Linear Regression to Estimate the Association between Wave, Sex and PHQ9 Total Score

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 5.36 | 0.48 | 11.06 | 0.00 |
| factor(time)2 | 1.51 | 0.69 | 2.20 | 0.03 |
| factor(time)3 | 2.13 | 0.68 | 3.10 | 0.00 |
| sex | -1.44 | 0.91 | -1.58 | 0.11 |
| factor(time)2:sex | -0.28 | 1.28 | -0.22 | 0.83 |
| factor(time)3:sex | -0.37 | 1.30 | -0.29 | 0.78 |

Note: Linear regression without considering the correlation between measurements.
## Start:  AIC=2271.75
## total ~ factor(time) * sex
## 
##                    Df Sum of Sq   RSS    AIC
## - factor(time):sex  2     3.179 22392 2267.8
## <none>                          22389 2271.8
## 
## Step:  AIC=2267.84
## total ~ factor(time) + sex
## 
##                Df Sum of Sq   RSS    AIC
## <none>                      22392 2267.8
## - sex           1    351.60 22744 2275.7
## - factor(time)  2    456.77 22849 2276.7
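The AIC values in the elimination trace can be reproduced from the residual sums of squares. For linear models, R's `step()` reports n·log(RSS/n) + 2·edf, where edf is the number of estimated coefficients. The Python sketch below (illustrative only) assumes n = 634 observations, the sum of the row totals in Table 1; this sample size is an inference and is not stated in the output.

```python
from math import log

N_OBS = 634  # ASSUMPTION: 212 + 212 + 210 observations, the row totals of Table 1

def lm_aic(rss, n_params, n=N_OBS):
    """AIC as reported by R's step() for a linear model: n*log(RSS/n) + 2*edf."""
    return n * log(rss / n) + 2 * n_params

# RSS values copied from the trace; coefficient counts follow from the formulas.
candidates = {
    "total ~ factor(time) * sex": lm_aic(22389, 6),
    "total ~ factor(time) + sex": lm_aic(22392, 4),
    "total ~ factor(time)": lm_aic(22744, 3),
    "total ~ sex": lm_aic(22849, 2),
}
best = min(candidates, key=candidates.get)
```

Under that assumption the computed values agree with the trace to the printed precision, and the lowest AIC belongs to `total ~ factor(time) + sex`, which is where the elimination stops.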
Table 7. Results of Final Linear Regression to Estimate the Association between Wave, Sex and PHQ9 Total Score after Backward Elimination

| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 5.42 | 0.44 | 12.43 | 0.00 |
| factor(time)2 | 1.43 | 0.58 | 2.47 | 0.01 |
| factor(time)3 | 2.02 | 0.58 | 3.48 | 0.00 |
| sex | -1.65 | 0.53 | -3.15 | 0.00 |

Note: Linear regression without considering the correlation between measurements.
##       (Intercept)     factor(time)2     factor(time)3               sex 
##         5.3552632         1.5122865         2.1250000        -1.4385965 
## factor(time)2:sex factor(time)3:sex 
##        -0.2814122        -0.3692529
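With both main effects and the interaction included, the model has one parameter per wave-by-sex cell, so its fitted values are simply the cell means. The Python sketch below (illustrative only, not the original R code) rebuilds those means from the coefficients printed above. The coding of `sex` is not stated in the output; the comparison with the by-sex table of mean total scores suggests that `sex = 1` corresponds to males, which is an inference.

```python
# Coefficients copied from the R output above.
coef = {
    "(Intercept)": 5.3552632,
    "factor(time)2": 1.5122865,
    "factor(time)3": 2.1250000,
    "sex": -1.4385965,
    "factor(time)2:sex": -0.2814122,
    "factor(time)3:sex": -0.3692529,
}

def cell_mean(wave, sex):
    """Fitted mean for a wave (1-3) and sex indicator (0 or 1)."""
    mean = coef["(Intercept)"] + coef["sex"] * sex
    if wave in (2, 3):
        mean += coef[f"factor(time){wave}"]
        mean += coef[f"factor(time){wave}:sex"] * sex
    return round(mean, 2)

cell_means = {(w, s): cell_mean(w, s) for w in (1, 2, 3) for s in (0, 1)}
```

The `sex = 0` cells come out as 5.36, 6.87 and 7.48 and the `sex = 1` cells as 3.92, 5.15 and 5.67, matching the Female and Male columns of the by-sex table.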
Table 8. Results of GEE to Estimate the Association between Wave, Sex and PHQ9 Total Score

| Term | Estimate | Naive S.E. | Naive z | Robust S.E. | Robust z |
|---|---|---|---|---|---|
| (Intercept) | 5.33 | 0.48 | 11.03 | 0.46 | 11.56 |
| factor(time)2 | 1.51 | 0.37 | 4.13 | 0.36 | 4.22 |
| factor(time)3 | 2.18 | 0.45 | 4.80 | 0.44 | 5.00 |
| sex | -1.26 | 0.90 | -1.39 | 0.83 | -1.52 |
| factor(time)2:sex | -0.48 | 0.69 | -0.69 | 0.66 | -0.72 |
| factor(time)3:sex | -0.62 | 0.86 | -0.71 | 0.78 | -0.79 |

Note: Generalized estimating equation (GEE) model.

Conclusion: Sex should also be included in the model when analyzing the change in PHQ9 total score. However, the time-by-sex interaction is not retained, which means that the rate of change in PHQ9 total score across waves does not differ between the sex groups.

Table 9. PHQ9 Severity Group across Waves, by Campus

| Wave | P-value |
|---|---|
| wave1 | 0.8700883 |
| wave2 | 0.2513914 |
| wave3 | 0.2743083 |

Note: A chi-squared test of independence was applied to compare the PHQ9 risk groups across campuses.

Conclusion: There is no difference in either PHQ9 total score or severity group among the different campuses.

Summary: The mean PHQ9 total score increases with time enrolled in our study. Sex should be taken into consideration when analyzing the PHQ9 total score.

Limitations: The plots suggest an interaction between sex and PHQ9 severity, but we did not fit a logistic regression to quantify the relationship, and we still need to decide whether a nominal or an ordinal logistic regression is appropriate. Model diagnostics should also be conducted to assess the performance of the models.