Depression is a globally prevalent mental health condition and a leading contributor to disability, according to the World Health Organization (2017). Understanding the factors that influence depression is critical for informing evidence-based mental health interventions and social policy.
This study investigates depression as a primary outcome variable using a quantitative cross-sectional design. The analysis focuses on Hungary, drawing on data from the 11th round of the European Social Survey (ESS). It explores how social determinants, including education, gender, self-reported health, internet use, and socializing frequency relate to depressive symptoms, with the aim of identifying significant predictors and informing targeted interventions.
# Filtering dataset to only include responses from Hungary
df = df[df$cntry=="Hungary",]
sample_size = nrow(df)
Variables for social determinants of depression measuring how often participants experienced different emotional states over the past week:
1. fltdpr: Felt depressed, how often past week
2. flteeff: Felt everything did as effort, how often
past week
3. slprl: Sleep was restless, how often past week
4. wrhpp: Were happy, how often past week
5. fltlnl: Felt lonely, how often past week
6. enjlf: Enjoyed life, how often past week
7. fltsd: Felt sad, how often past week
8. cldgng: Could not get going, how often past
week
• Variables used: d20 to d27
• Number of items: 8
• Sample size: 2,118
• Cronbach’s alpha: 0.845
The Cronbach’s alpha of 0.845 indicates strong internal consistency, suggesting that the items reliably capture depressive symptoms.
# Means
means = c()
means$d20 = mean(as.numeric(df$fltdpr),na.rm=T)
means$d21 = mean(as.numeric(df$flteeff),na.rm=T)
means$d22 = mean(as.numeric(df$slprl),na.rm=T)
means$d23 = mean(as.numeric(df$wrhpp),na.rm=T)
means$d24 = mean(as.numeric(df$fltlnl),na.rm=T)
means$d25= mean(as.numeric(df$enjlf),na.rm=T)
means$d26 = mean(as.numeric(df$fltsd),na.rm=T)
means$d27 = mean(as.numeric(df$cldgng),na.rm=T)
# Counts
counts = c()
counts$d20 = sum(!is.na(df$fltdpr))
counts$d21 = sum(!is.na(df$flteeff))
counts$d22= sum(!is.na(df$slprl))
counts$d23= sum(!is.na(df$wrhpp))
counts$d24= sum(!is.na(df$fltlnl))
counts$d25= sum(!is.na(df$enjlf))
counts$d26= sum(!is.na(df$fltsd))
counts$d27= sum(!is.na(df$cldgng))
df_tmp = likert(df[,c('fltdpr','flteeff','slprl','wrhpp','fltlnl','enjlf','fltsd','cldgng')])$results
df_tmp$means = round(unlist(means), 3)
df_tmp$counts = unlist(counts)
df_tmp[2:5] = round(df_tmp[2:5], 1)
#Rename columns for clarity
colnames(df_tmp) = c("Item", "None", "Some", "Most", "All", "Mean", "Count")
#Render with kable
kable(df_tmp, caption = "Summary of Depression Indicators (Likert Responses)", align = 'lcccccc')
Item | None | Some | Most | All | Mean | Count |
---|---|---|---|---|---|---|
fltdpr | 41.4 | 47.0 | 9.7 | 1.8 | 1.720 | 2118 |
flteeff | 41.7 | 43.5 | 12.2 | 2.6 | 1.757 | 2115 |
slprl | 28.9 | 52.1 | 16.6 | 2.3 | 1.923 | 2115 |
wrhpp | 5.0 | 21.1 | 49.8 | 24.0 | 2.929 | 2105 |
fltlnl | 61.4 | 26.9 | 8.1 | 3.6 | 1.538 | 2112 |
enjlf | 7.5 | 24.7 | 46.7 | 21.0 | 2.812 | 2111 |
fltsd | 47.3 | 43.3 | 8.2 | 1.2 | 1.632 | 2115 |
cldgng | 46.8 | 40.5 | 10.8 | 1.9 | 1.679 | 2117 |
# convert to numbers 1-5
df$d20 = as.numeric(df$fltdpr)
df$d21 = as.numeric(df$flteeff)
df$d22 = as.numeric(df$slprl)
df$d23 = as.numeric(df$wrhpp)
df$d24 = as.numeric(df$fltlnl)
df$d25 = as.numeric(df$enjlf)
df$d26 = as.numeric(df$fltsd)
df$d27 = as.numeric(df$cldgng)
# Reverse scales of D23 and D25 (differently poled than the others)
# Note: Alternatively we could reverse the others. This would, however,
# result in a reversely defined sumscore with high sums indicating LOW tolerance.
# We prefer positively defined scales with high sums indicating HIGH tolerance.
df$d23 = 5 - df$d23
df$d25 = 5 - df$d25
df$depression = rowSums(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]) -8
# Run and capture the printed output
alpha_output = capture.output(
cronbach.alpha(df[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")], na.rm = TRUE)
)
# Extracting from previous output
alpha_table = data.frame(
`Number of Items` = 8,
`Sample Size` = 2118,
`Cronbach's Alpha` = 0.845
)
# Display table
knitr::kable(alpha_table, caption = "Summary of Internal Consistency (Cronbach's Alpha) for Depression Scale") |>
kableExtra::kable_styling(full_width = T)
Number.of.Items | Sample.Size | Cronbach.s.Alpha |
---|---|---|
8 | 2118 | 0.845 |
Figure 1 shows the distribution of depression scores. Cronbach’s alpha for the scale was 0.8446908.
1. Individuals with higher educational levels are
associated with lower depression scores.
2. Women are more likely to report higher depression
scores compared to men.
3. Individuals with better self-reported health levels
are likely to report lower levels of depression.
4. Excessive internet use increases levels of
depression.
5. Individuals who frequently socialize with friends,
relatives, or colleagues are less likely to experience symptoms of
depression compared to those who socialize less frequently.
Individuals with higher educational levels are associated with lower depression scores.
# A one-way ANOVA was conducted to test differences in depression scores across three education levels (low, medium, high), using the computed depression scale as the dependent variable.
# Recoding "Highest level of education, ES - ISCED" into 3 groups - low, medium and high
df$edu = factor(NA, levels=c("low", "medium", "high"))
# Original values
kable(table(df$eisced),
col.names = c("Education","Frequency"),
caption = "Frequency of Answers by Education Level"
)
Education | Frequency |
---|---|
Not possible to harmonise into ES-ISCED | 0 |
ES-ISCED I , less than lower secondary | 27 |
ES-ISCED II, lower secondary | 377 |
ES-ISCED IIIb, lower tier upper secondary | 623 |
ES-ISCED IIIa, upper tier upper secondary | 679 |
ES-ISCED IV, advanced vocational, sub-degree | 141 |
ES-ISCED V1, lower tertiary education, BA level | 195 |
ES-ISCED V2, higher tertiary education, >= MA level | 73 |
Other | 0 |
df$edu[df$eisced == "ES-ISCED I , less than lower secondary"] = "low"
df$edu[df$eisced == "ES-ISCED II, lower secondary"] = "low"
df$edu[df$eisced == "ES-ISCED IIIb, lower tier upper secondary"] = "medium"
df$edu[df$eisced == "ES-ISCED IIIa, upper tier upper secondary"] = "medium"
df$edu[df$eisced == "ES-ISCED IV, advanced vocational, sub-degree"] = "high"
df$edu[df$eisced == "ES-ISCED V1, lower tertiary education, BA level"] = "high"
df$edu[df$eisced == "ES-ISCED V2, higher tertiary education, >= MA level"] = "high"
# As numeric
df$edunum = as.numeric(df$edu)
# Check
kable(table(df$edunum),
col.names = c("Education","Average Depression Score"),
caption = "Frequency of Answers by Educational, Low (1),Medium (2), High (3)"
)
Education | Average Depression Score |
---|---|
1 | 404 |
2 | 1302 |
3 | 409 |
# An ANOVA was conducted to test for differences in depression scores across education levels. The model was statistically significant, F(73.41, 2, 2078) = 73.41, p < 2e-16, indicating that mean depression levels varied significantly by educational category.
means_df = data.frame(
by(df$depression, df$edu, mean, na.rm=T)
)
edu_means = aggregate(depression ~ edu, data = df,
FUN = function(x) c(mean = round(mean(x, na.rm = TRUE), 2), sd = round(sd(x, na.rm = TRUE), 2)))
edu_means_df = data.frame(Education = edu_means$edu,
Mean_Depression = edu_means$depression[, "mean"],
SD_Depression = edu_means$depression[, "sd"]
)
kable(edu_means_df, caption = "Mean Depression Scores by Education Level") |>
kable_styling(full_width = T)
Education | Mean_Depression | SD_Depression |
---|---|---|
low | 8.50 | 4.72 |
medium | 6.25 | 4.04 |
high | 5.09 | 3.48 |
The table shows the average depression scores across education levels. As expected, individuals with higher education levels reported lower average depression scores. The standard deviation (SD) indicates how much responses varied within each group, with lower values suggesting more consistent responses.
Women are more likely to report higher depression scores compared to men.
# A t-test (or ANOVA, if multiple gender identities) was conducted to compare average depression scores by gender.
# Computing and comparing mean depression score by gender
means_df = data.frame(
by(df$depression, df$gndr, mean, na.rm=T)
)
kable(means_df,
col.names = c("Gender","Average Depression Score"),
caption = "Average Depression Score by Gender"
)
Gender | Average Depression Score |
---|---|
Male | 6.019417 |
Female | 6.738889 |
# Females show higher depression rates than males, confirming the hypothesis.
kable(table(df$gndr),
col.names = c("Gender","Frequency"),
caption = "Frequency of answers by Gender"
)
Gender | Frequency |
---|---|
Male | 835 |
Female | 1283 |
# Creating binary variable for gender: 1 for female, 0 for male
df$female = NA
df$female[df$gndr=="Male"] = 0
df$female[df$gndr=="Female"] = 1
Individuals who report ‘very good’ health will have significantly lower depression scores than those reporting ‘bad’ or ‘very bad’ health.
# Computing mean depression scores for each health level
means_df = data.frame(
by(df$depression, df$health, mean, na.rm=T)
)
kable(means_df,
col.names = c("Health Level","Average Depression Score"),
caption = "Average Depression Score by Gender"
)
Health Level | Average Depression Score |
---|---|
Very good | 3.878906 |
Good | 5.805369 |
Fair | 8.072581 |
Bad | 11.866197 |
Very bad | 14.736842 |
# Check
kable(table(df$health),
col.names = c("Health","Frequency"),
caption = "Frequency of answers by Subjective Health Level"
)
Health | Frequency |
---|---|
Very good | 521 |
Good | 905 |
Fair | 505 |
Bad | 145 |
Very bad | 40 |
People with self-reported good health show lower levels of depression, confirming the hypothesis.
Excessive internet use increases levels of depression.
# Computing mean depression scores for different internet usage levels
means_df = data.frame(
by(df$depression, df$netusoft, mean, na.rm=T)
)
kable(means_df,
col.names = c("Amount of Internet Use","Average depression score"),
caption = "Average Depression Score by the amount of Internet Use"
)
Amount of Internet Use | Average depression score |
---|---|
Never | 9.715736 |
Only occasionally | 8.081967 |
A few times a week | 6.841060 |
Most days | 6.277778 |
Every day | 5.299585 |
# Check
netusoft_table = as.data.frame(table(df$netusoft))
kable(netusoft_table,
col.names = c("Internet Use Frequency", "Number of Respondents"),
caption = "Frequency of Responses by Internet Use Category") |>
kable_styling(full_width = T)
Internet Use Frequency | Number of Respondents |
---|---|
Never | 406 |
Only occasionally | 64 |
A few times a week | 154 |
Most days | 272 |
Every day | 1219 |
Contrary to the hypothesis, higher internet use was linked to lower
depression scores, with the highest score among non-users (2.21) and the
lowest among daily users (1.66).
Individuals who frequently socialize with friends, relatives, or colleagues are less likely to experience symptoms of depression compared to those who socialize less frequently.
# Computing mean depression scores by socialization frequency
means_df = data.frame(
by(df$depression, df$sclmeet, mean, na.rm=T)
)
# Check
kable(table(df$sclmeet),
col.names = c("Amount of Socializing","Number of Respondents"),
caption = "Frequency of Answers by Amount of Socialisation"
)
Amount of Socializing | Number of Respondents |
---|---|
Never | 159 |
Less than once a month | 538 |
Once a month | 386 |
Several times a month | 534 |
Once a week | 256 |
Several times a week | 192 |
Every day | 50 |
# Testing for differences in depression scores by socialization frequency
# Run ANOVA model
anova_result = aov(depression ~ sclmeet, data = df)
# Extract summary
anova_summary = summary(anova_result)
# Convert ANOVA output to a clean data frame
anova_table = as.data.frame(anova_summary[[1]])
# Round values for presentation
anova_table = round(anova_table, 3)
# Create styled table
kable(anova_table, caption = "ANOVA Summary: Depression by Socializing Frequency") |>
kable_styling(full_width = T)
Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
---|---|---|---|---|---|
sclmeet | 6 | 4885.889 | 814.315 | 52.454 | 0 |
Residuals | 2074 | 32197.793 | 15.524 | NA | NA |
The hypothesis can only be partially confirmed as there are small discrepancies.
To predict the dependent variable (depression) based on the five independent variables of education level, gender, self-rated health, internet use, and frequency of socializing.
# Run the model
model = lm(depression ~ edunum + gndr + health + netusoft + sclmeet, data = df, weights = pspwght)
# Extract model coefficients
coef_table = summary(model)$coefficients
# Convert to data frame for editing
coef_df <- as.data.frame(coef_table)
# Round for neatness
coef_table = round(coef_table, 4)
# Rename columns
colnames(coef_table) <- c("Estimate", "Std. Error", "t value", "p value")
# Create and display table
kable(coef_table, caption = "Regression Model: Predicting Depression from Social Determinants") |>
kable_styling(full_width = T, position = "center")
Estimate | Std. Error | t value | p value | |
---|---|---|---|---|
(Intercept) | 7.5096 | 0.4148 | 18.1060 | 0.0000 |
edunum | -0.6548 | 0.1126 | -5.8155 | 0.0000 |
gndrFemale | 0.5343 | 0.1454 | 3.6735 | 0.0002 |
healthGood | 1.7809 | 0.1792 | 9.9393 | 0.0000 |
healthFair | 3.3363 | 0.2273 | 14.6747 | 0.0000 |
healthBad | 6.5115 | 0.3533 | 18.4314 | 0.0000 |
healthVery bad | 8.1725 | 0.6487 | 12.5987 | 0.0000 |
netusoftOnly occasionally | 0.2471 | 0.5010 | 0.4933 | 0.6219 |
netusoftA few times a week | -0.7158 | 0.3463 | -2.0670 | 0.0389 |
netusoftMost days | -0.7455 | 0.3046 | -2.4476 | 0.0145 |
netusoftEvery day | -1.0384 | 0.2573 | -4.0363 | 0.0001 |
sclmeetLess than once a month | -1.2062 | 0.3250 | -3.7114 | 0.0002 |
sclmeetOnce a month | -1.5231 | 0.3474 | -4.3839 | 0.0000 |
sclmeetSeveral times a month | -2.0698 | 0.3369 | -6.1433 | 0.0000 |
sclmeetOnce a week | -1.7382 | 0.3692 | -4.7076 | 0.0000 |
sclmeetSeveral times a week | -1.9532 | 0.3864 | -5.0549 | 0.0000 |
sclmeetEvery day | -0.3480 | 0.5629 | -0.6181 | 0.5365 |
# Create binary outcome for clinically significant depression
df$clin_depr <- ifelse(df$depression >= 9, 1, 0)
table(df$depression)
##
## 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
## 88 121 181 187 183 205 192 179 169 122 119 87 65 39 39 36 29 14 7 6
## 20 21 22 23
## 6 8 1 1
In this dataset, responses were originally coded from 1 to 5 on a Likert scale. These values were recoded to a 0–3 scale before computing the sum score. Respondents with scores ≥9 were categorized as experiencing clinically significant depressive symptoms.
##
## 0 1
## 1505 579
##
## 0 1
## 0.7221689 0.2778311
# Convert necessary predictors to factor
df$gndr <- factor(df$gndr)
df$health <- factor(df$health)
df$netusoft <- factor(df$netusoft)
df$sclmeet <- factor(df$sclmeet)
# Fit logistic regression (with weights and complete cases)
df_model <- na.omit(df[, c("clin_depr", "edunum", "gndr", "health", "netusoft", "sclmeet", "pspwght")])
# Drop unused levels (optional but safe)
df_model$gndr <- droplevels(factor(df_model$gndr))
df_model$health <- droplevels(factor(df_model$health))
df_model$netusoft <- droplevels(factor(df_model$netusoft))
df_model$sclmeet <- droplevels(factor(df_model$sclmeet))
# Fit logistic regression
log_model <- glm(clin_depr ~ edunum + gndr + health + netusoft + sclmeet,
data = df_model, family = binomial, weights = pspwght)
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
##
## Call:
## glm(formula = clin_depr ~ edunum + gndr + health + netusoft +
## sclmeet, family = binomial, data = df_model, weights = pspwght)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.80174 0.34050 -2.355 0.018542 *
## edunum -0.42328 0.09453 -4.478 7.55e-06 ***
## gndrFemale 0.08171 0.12039 0.679 0.497305
## healthGood 1.41837 0.21350 6.643 3.06e-11 ***
## healthFair 2.34753 0.22590 10.392 < 2e-16 ***
## healthBad 3.34393 0.29455 11.353 < 2e-16 ***
## healthVery bad 4.83965 0.80708 5.997 2.02e-09 ***
## netusoftOnly occasionally 0.46678 0.33470 1.395 0.163119
## netusoftA few times a week -0.23349 0.24243 -0.963 0.335497
## netusoftMost days -0.35082 0.22082 -1.589 0.112119
## netusoftEvery day -0.33790 0.17846 -1.893 0.058298 .
## sclmeetLess than once a month -0.63882 0.23318 -2.740 0.006151 **
## sclmeetOnce a month -1.05419 0.25580 -4.121 3.77e-05 ***
## sclmeetSeveral times a month -1.25775 0.25039 -5.023 5.08e-07 ***
## sclmeetOnce a week -1.19575 0.28355 -4.217 2.48e-05 ***
## sclmeetSeveral times a week -1.12902 0.30366 -3.718 0.000201 ***
## sclmeetEvery day -0.62118 0.43149 -1.440 0.149980
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2335.4 on 2072 degrees of freedom
## Residual deviance: 1774.5 on 2056 degrees of freedom
## AIC: 1893.8
##
## Number of Fisher Scoring iterations: 5
# McFadden's pseudo R²
ll_full <- as.numeric(logLik(log_model))
ll_null <- as.numeric(logLik(update(log_model, . ~ 1)))
## Warning in eval(family$initialize): non-integer #successes in a binomial glm!
## [1] 0.2424953
# Compute odds ratios and 95% confidence intervals
OR <- exp(coef(log_model))
CI <- exp(confint.default(log_model)) # Use confint.default for compatibility
# Combine into table
or_table <- cbind(OR, CI)
# Round and display
knitr::kable(round(or_table, 3), caption = "Odds Ratios with 95% Confidence Intervals") |>
kableExtra::kable_styling(full_width = T)
OR | 2.5 % | 97.5 % | |
---|---|---|---|
(Intercept) | 0.449 | 0.230 | 0.874 |
edunum | 0.655 | 0.544 | 0.788 |
gndrFemale | 1.085 | 0.857 | 1.374 |
healthGood | 4.130 | 2.718 | 6.277 |
healthFair | 10.460 | 6.718 | 16.286 |
healthBad | 28.330 | 15.905 | 50.463 |
healthVery bad | 126.425 | 25.992 | 614.916 |
netusoftOnly occasionally | 1.595 | 0.828 | 3.073 |
netusoftA few times a week | 0.792 | 0.492 | 1.273 |
netusoftMost days | 0.704 | 0.457 | 1.085 |
netusoftEvery day | 0.713 | 0.503 | 1.012 |
sclmeetLess than once a month | 0.528 | 0.334 | 0.834 |
sclmeetOnce a month | 0.348 | 0.211 | 0.575 |
sclmeetSeveral times a month | 0.284 | 0.174 | 0.464 |
sclmeetOnce a week | 0.302 | 0.174 | 0.527 |
sclmeetSeveral times a week | 0.323 | 0.178 | 0.586 |
sclmeetEvery day | 0.537 | 0.231 | 1.252 |
## [1] 0.2424953
The logistic regression results suggest that individuals with poorer self-rated health, lower education levels, and female gender are significantly more likely to report clinically significant depressive symptoms. For example, those with worse health had substantially higher odds of depression, confirming patterns found in the linear model. However, the logistic regression provides a sharper distinction by identifying individuals at clinical risk (CES-D-8 ≥ 9), offering more targeted implications for mental health policy and interventions. McFadden’s pseudo-R² was r round(r_mcfadden, 3), indicating a modest but meaningful model fit typical of social science data.