Introduction
Depression is a prevalent mental health condition that significantly affects individuals’ quality of life and has substantial public health implications. Because it is one of the leading causes of disability-adjusted life years (DALYs) worldwide, understanding its key determinants is crucial for developing effective interventions.

This study examines how gender, diet (vegetable consumption), socializing, and education influence depression.

Sample Description

The analytic sample consisted of 2,865 participants, including 1,339 males and 1,526 females. The mean age of the sample was 52.5 years.

Dependent Variable: Depression
To assess depressive symptoms, this study utilized a Depression Score, which was calculated based on participants’ self-reported experiences over the past week. The score captured various aspects of emotional well-being and psychological distress by measuring both negative and positive mental states. Responses were recorded on a standardized numerical scale, ensuring comparability across participants.
To maintain consistency in interpretation, items reflecting positive emotions were reverse scored so that higher values consistently indicated worse mental health. The final Depression Score was computed by averaging the responses across all included items, providing a reliable measure of depressive symptomatology.

The score was derived from the following eight indicators:

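# Convert the factor responses to numeric codes 1-4
# (1 = "None or almost none of the time", 4 = "All or almost all of the time");
# this assumes the four answer categories are the only factor levels
df$d20 = as.numeric(df$fltdpr)   # felt depressed
df$d21 = as.numeric(df$flteeff)  # felt everything was an effort
df$d22 = as.numeric(df$slprl)    # sleep was restless
df$d23 = as.numeric(df$wrhpp)    # were happy (positive item)
df$d24 = as.numeric(df$fltlnl)   # felt lonely
df$d25 = as.numeric(df$enjlf)    # enjoyed life (positive item)
df$d26 = as.numeric(df$fltsd)    # felt sad
df$d27 = as.numeric(df$cldgng)   # could not get going

# Reverse the two positive items (d23, d25) so that higher = worse: 1 <-> 4, 2 <-> 3
df$d23 = 5 - df$d23
df$d25 = 5 - df$d25

# Depression score: mean of the eight items (NA if any item is missing)
df$depression = rowSums(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]) / 8

# Cronbach's alpha for the eight items (d23 and d25 already reversed)
alpha <- psych::alpha(df[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")])$total$raw_alpha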

The internal consistency of the depression scale was assessed using Cronbach’s alpha. The analysis resulted in a reliability coefficient of 0.838 based on responses from 40,156 participants, indicating good internal reliability. This suggests that the eight items measure a common underlying construct, which justifies using a composite depression score in subsequent analyses.
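To make the reliability coefficient more concrete, the following is a minimal sketch of how Cronbach's alpha follows from the item variances and the variance of the total score. The data are simulated, and the helper name `cronbach_alpha` is hypothetical; the analysis itself relies on psych::alpha.

```r
# Simulated stand-in for eight recoded items (NOT the survey data).
set.seed(1)
n <- 500
latent <- rnorm(n)                                    # shared underlying construct
items <- sapply(1:8, function(i) latent + rnorm(n))   # eight noisy indicators
colnames(items) <- paste0("d", 20:27)

# Hypothetical helper:
# alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
cronbach_alpha <- function(x) {
  x <- x[complete.cases(x), , drop = FALSE]  # drop rows with any missing item
  k <- ncol(x)
  item_var <- apply(x, 2, var)
  total_var <- var(rowSums(x))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

a <- cronbach_alpha(items)
print(round(a, 3))
```

Alpha rises when the items covary strongly relative to their individual variances, which is why a high value is read as evidence of a common construct.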

# Hypothesis 1: Female Italians have a higher risk of suffering from depression than males

Women report a higher mean depression score (1.83) than men (1.68), which is consistent with the hypothesis that depressive symptoms differ by gender.

# Hypothesis 2: Italians with lower education levels have a higher risk of depression

A clear inverse relationship between education level and depression was observed. Individuals with a “high” level of education reported the lowest mean depression score (1.59), followed by those with a “medium” level (1.67), while those in the “low” education category had the highest mean score (1.92). The ANOVA test confirmed the significance of these differences (F = 116.54, p < 0.001). This result suggests that higher education levels may be associated with better coping mechanisms, greater economic stability, and increased access to mental health resources, all of which could contribute to lower depression scores.
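The comparison of group means can be sketched with a one-way ANOVA as follows. This is an illustration on simulated data whose group means loosely follow the reported pattern; the variable names `toy`, `edu` and `depression` are assumptions, and the resulting F value will not match the one reported above.

```r
# Simulated data (NOT the survey data)
set.seed(2)
edu <- factor(rep(c("low", "medium", "high"), each = 200),
              levels = c("low", "medium", "high"))
true_means <- c(low = 1.92, medium = 1.67, high = 1.59)
depression <- rnorm(600, mean = true_means[as.character(edu)], sd = 0.5)
toy <- data.frame(edu, depression)

# Mean depression score per education level
group_means <- tapply(toy$depression, toy$edu, mean)
print(round(group_means, 2))

# One-way ANOVA: do the group means differ more than chance would suggest?
fit <- aov(depression ~ edu, data = toy)
anova_table <- summary(fit)[[1]]
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]
print(anova_table)
```

A significant F test only says that at least one group mean differs; pairwise comparisons (for example with TukeyHSD) would be needed to say which ones.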

# Hypothesis 3: Italians eating many vegetables have a lower risk of suffering from depression (-> Mediterranean diet)

Daily vegetable consumption is associated with lower average depression scores. Individuals who eat vegetables daily have a mean depression score of 1.72, compared with 1.88 for those who do not. This difference of approximately -0.16 points is statistically significant (t = -7.51, df = 2749, p < 0.001), with a 95% confidence interval from -0.20 to -0.12. Because the interval excludes zero, the difference is unlikely to have occurred by chance.
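As a consistency check, the confidence interval can be reconstructed from the reported difference, t statistic and degrees of freedom. This sketch uses only the rounded values quoted above, so small deviations from the original output are to be expected; the variable names are illustrative.

```r
# Values reported in the text above (rounded)
diff_means <- 1.72 - 1.88     # daily minus not daily
t_stat <- -7.51
dof <- 2749

se <- diff_means / t_stat                 # standard error implied by t = diff / se
crit <- qt(0.975, dof)                    # two-sided 95% critical value
ci <- diff_means + c(-1, 1) * crit * se   # lower and upper bound
p_value <- 2 * pt(-abs(t_stat), dof)      # two-sided p value

print(round(ci, 2))
print(p_value)
```

The reconstructed bounds agree with the reported interval to two decimal places, which suggests the reported statistics are internally consistent.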

# Hypothesis 4: Italians who often meet socially with friends, relatives or colleagues have a lower risk of depression

The analysis suggests a general trend where higher socializing frequency is associated with lower depression levels. Individuals who met with others daily (socializing frequency = 7) reported the lowest mean depression score (1.6), whereas those who socialized the least (socializing frequency = 1) had the highest mean depression score (2.47). While the trend supports the hypothesis, some variations were observed across intermediate levels of social interaction. These findings reinforce the well-documented protective effects of social engagement on mental health.

Regression Model: Predicting Depression Scores based on the four independent variables of education, eating vegetables, gender and socially meeting
term                    estimate   std.error   statistic   p.value
(Intercept)                2.149       0.031       68.98    < .001
edumedium                 -0.203       0.019      -10.48    < .001
eduhigh                   -0.268       0.025      -10.56    < .001
eatveg_groupnot_daily      0.142       0.020        6.97    < .001
gndrFemale                 0.139       0.018        7.90    < .001
sclmeet                   -0.077       0.006      -13.60    < .001

This linear regression model shows that higher educational attainment, daily vegetable consumption, and more frequent social interaction are all associated with lower depression scores, while being female is associated with a slightly higher score (0.139 points). The largest difference between categories is for high versus low education (-0.268 points). Each one-step increase in socializing frequency is associated with a 0.077-point lower score, which amounts to about 0.46 points across the full range from 1 to 7. Individuals who do not eat vegetables daily score 0.142 points higher than those who do.
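To illustrate how the coefficients combine, the sketch below computes predicted depression scores for two contrasting profiles. It assumes that the reference categories are low education, daily vegetable consumption and male, and that sclmeet is coded 1 to 7; the helper `predict_score` is a hypothetical name introduced only for this example.

```r
# Coefficients copied from the regression table above
coefs <- c(intercept = 2.149, edumedium = -0.203, eduhigh = -0.268,
           not_daily = 0.142, female = 0.139, sclmeet = -0.077)

# Hypothetical helper: linear prediction for one profile
predict_score <- function(edu = "low", not_daily = FALSE, female = FALSE, sclmeet = 1) {
  unname(coefs["intercept"] +
           (edu == "medium") * coefs["edumedium"] +
           (edu == "high") * coefs["eduhigh"] +
           not_daily * coefs["not_daily"] +
           female * coefs["female"] +
           sclmeet * coefs["sclmeet"])
}

# Woman, low education, no daily vegetables, least frequent socializing
high_risk <- predict_score("low", not_daily = TRUE, female = TRUE, sclmeet = 1)
# Man, high education, daily vegetables, daily socializing
low_risk <- predict_score("high", not_daily = FALSE, female = FALSE, sclmeet = 7)

print(c(high_risk = high_risk, low_risk = low_risk))
```

The gap between the two profiles is roughly one point on the 1-4 score, with socializing frequency contributing the largest share because it spans six steps.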

Likert Scale

Mean scores and the number of valid responses were computed for eight Likert-scale items related to mental well-being. The results were summarized in a table to provide an overview of central tendencies and response completeness for each item.

Response distribution in percent, item mean (codes 1-4) and number of valid responses

Item      None or almost none   Some of    Most of    All or almost all   Mean    Count
          of the time           the time   the time   of the time
fltdpr    71.1                  24.0        4.1        0.8                1.347   2840
flteeff   52.6                  37.4        7.3        2.7                1.601   2838
slprl     53.7                  38.1        6.0        2.2                1.567   2850
wrhpp      7.7                  34.2       43.3       14.8                2.652   2817
fltlnl    61.8                  29.6        5.9        2.7                1.497   2837
enjlf     14.7                  45.7       29.7        9.8                2.346   2803
fltsd     48.3                  45.3        4.5        1.9                1.600   2832
cldgng    57.2                  35.4        5.7        1.7                1.519   2825

Interpretation: The table summarizes participant responses to eight mental health-related items measured on a 4-point Likert scale ranging from “None or almost none of the time” to “All or almost all of the time.” Most respondents rarely experienced negative symptoms such as feeling depressed (fltdpr) or lonely (fltlnl): 71.1% and 61.8%, respectively, selected the lowest frequency category. The two positively worded items, which appear here in their original direction before reverse scoring, show the opposite pattern. For being happy (wrhpp), 58.1% chose “most of the time” or more, and for enjoying life (enjlf), 39.5% did so, including 9.8% who chose “all or almost all of the time.” Item means ranged from 1.35 to 2.65 on the 1-4 coding; the six negatively worded items all had means between 1.35 and 1.60, indicating that these symptoms were mostly experienced rarely or only some of the time. Response counts were consistently high across items (2,803 to 2,850), indicating good data completeness.
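A row of such a table can be produced along the following lines. This is a sketch on simulated answers for a single item, not the original tabulation code; the object names are illustrative.

```r
# Simulated responses for one item (NOT the survey data)
set.seed(3)
labels <- c("None or almost none of the time", "Some of the time",
            "Most of the time", "All or almost all of the time")
item <- factor(sample(labels, 300, replace = TRUE, prob = c(0.6, 0.3, 0.07, 0.03)),
               levels = labels)
item[sample(300, 10)] <- NA   # ten missing answers

# Percentages per category; table() drops missing values by default
pct <- round(100 * prop.table(table(item)), 1)

# Mean of the numeric codes 1-4 and number of valid responses
item_mean <- mean(as.numeric(item), na.rm = TRUE)
n_valid <- sum(!is.na(item))

print(pct)
print(round(item_mean, 3))
print(n_valid)
```

Computing the percentages on valid responses only keeps each row summing to 100, while the count column shows how many answers each row is based on.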

Predictors of Clinically Significant Depression

Binary Outcome Definition: I constructed a binary variable clin_depression to indicate whether an individual shows clinically significant depressive symptoms. The variable is based on the CES-D-8 scale, which ranges from 0 to 24 and is calculated using 8 self-report items related to mood, sleep, and emotional well-being.

Cutoff Justification: I applied a cutoff of 9 or more points on the CES-D-8 to define “clinically significant” depression. This threshold is based on the validation study by Briggs et al. (2018), which demonstrated that a CES-D-8 score of ≥ 9 provides 98% sensitivity and 83% specificity in detecting depression, compared to the full CES-D-20 (cutoff ≥ 16). This cutoff is recognized in research as a valid indicator of clinical-level symptoms.
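The construction of the binary outcome can be sketched as follows. The example runs on simulated item codes and assumes that each item is shifted from the 1-4 coding to 0-3 so that the sum spans 0 to 24; the column `cesd8` and the data frame `toy` are hypothetical names, and the original recoding may differ in detail.

```r
# Simulated item codes 1-4 (NOT the survey data); column names follow the earlier code
set.seed(4)
items <- c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")
toy <- as.data.frame(matrix(sample(1:4, 8 * 200, replace = TRUE), ncol = 8))
names(toy) <- items

# Assumption: d23 and d25 are still in their original (positive) direction here
toy$d23 <- 5 - toy$d23
toy$d25 <- 5 - toy$d25

# Shift each item from 1-4 to 0-3 and sum, giving a score from 0 to 24
toy$cesd8 <- rowSums(toy[, items] - 1)

# 1 = clinically significant symptoms (score >= 9), 0 = below the cutoff
toy$clin_depression <- as.integer(toy$cesd8 >= 9)

print(prop.table(table(toy$clin_depression)))
```

Rows with any missing item receive a missing sum and therefore a missing classification, which is one reason a classified total can be smaller than the full sample.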

Prevalence of Clinically Significant Depression (CES-D ≥ 9)
Status Count Proportion
Not Depressed 2115 0.765
Clinically Depressed 650 0.235

The prevalence table classifies 2,765 participants (2,115 + 650) out of the analytic sample of 2,865 using the 8-item CES-D index. Based on a cut-off score of 9 or higher, 23.5% of those classified met the threshold for clinically significant depressive symptoms, while 76.5% did not. This binary classification was used to simplify further analysis and identify individuals at potential risk for depression.

Frequency Distribution: In my dataset, approximately 20.1% (n = 7,930) of individuals met the threshold for clinically significant depressive symptoms, while 79.9% (n = 31,427) did not (39,357 individuals in total).

Logistic Regression: Odds Ratios, Confidence Intervals, and Significance
Term                    Odds_Ratio   CI_Lower   CI_Upper   Std_Error   z_value   p_value
(Intercept)                   1.45       1.06       2.00       0.162      2.30     0.022
gndrFemale                    1.75       1.44       2.13       0.099      5.62    < .001
edumedium                     0.41       0.33       0.51       0.107     -8.33    < .001
eduhigh                       0.32       0.24       0.43       0.153     -7.43    < .001
eatveg_groupnot_daily         1.92       1.56       2.36       0.106      6.12    < .001
sclmeet                       0.71       0.67       0.75       0.031    -10.95    < .001

Interpretation of Logistic Regression Results:
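The logistic regression model examined predictors of clinically significant depression based on gender, education level, vegetable consumption, and frequency of social contact. The results are as follows: Compared with men, women had 1.75 times the odds of clinically significant depression (95% CI 1.44 to 2.13). Relative to the low education category, the odds were 59% lower for medium education (OR = 0.41, 95% CI 0.33 to 0.51) and 68% lower for high education (OR = 0.32, 95% CI 0.24 to 0.43). Not eating vegetables daily was associated with 1.92 times the odds compared with daily consumption (95% CI 1.56 to 2.36). Each one-step increase in socializing frequency (sclmeet) was associated with 29% lower odds (OR = 0.71, 95% CI 0.67 to 0.75).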

Overall, all predictors were statistically significant (p < .001), and the model indicates that gender and social and behavioral factors, including education, diet, and interaction frequency, are significantly associated with the likelihood of experiencing clinical-level depressive symptoms.
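Odds ratios can be hard to read on their own, so the sketch below converts them into predicted probabilities for two contrasting profiles. It uses the rounded odds ratios from the table, treats the intercept as the baseline odds for a man with low education who eats vegetables daily at sclmeet = 0, and assumes sclmeet is coded 1 to 7; the results are therefore approximate, and the variable names are illustrative.

```r
# Odds ratios copied from the logistic regression table above (rounded)
odds_ratio <- c(intercept = 1.45, female = 1.75, edumedium = 0.41,
                eduhigh = 0.32, not_daily = 1.92, sclmeet = 0.71)

# Woman, low education, no daily vegetables, sclmeet = 1
odds_a <- unname(odds_ratio["intercept"] * odds_ratio["female"] *
                   odds_ratio["not_daily"] * odds_ratio["sclmeet"]^1)

# Man, high education, daily vegetables, sclmeet = 7
odds_b <- unname(odds_ratio["intercept"] * odds_ratio["eduhigh"] *
                   odds_ratio["sclmeet"]^7)

# Convert odds to probabilities: p = odds / (1 + odds)
prob_a <- odds_a / (1 + odds_a)
prob_b <- odds_b / (1 + odds_b)

print(round(c(prob_a = prob_a, prob_b = prob_b), 3))
```

Because odds ratios multiply, the effect of socializing compounds across its steps, which is why the two profiles end up so far apart.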