Depression is a major public health issue, influenced by social, psychological, and behavioral factors. To understand its prevalence and predictors, I analyze data from ESS Round 11, focusing on Norway, Germany, and Spain.
This paper applies two regression models to investigate depression:
1. Linear regression model using the quasi-metric depression scale as the dependent variable.
2. Logistic regression model using the binary CES-D-8 cutoff (≥ 9) for clinically significant depression as the dependent variable.
The goal is to assess which social and demographic factors (e.g., life satisfaction, self-rated health, social connections, age, and alcohol consumption) influence depression and to identify country-specific differences.
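The two models can be written out as a short sketch. The notation is introduced here for illustration and is not taken from the ESS documentation: y_i is the CES-D-8 sum score of respondent i, d_i the binary indicator derived from it, x_i the vector of predictors (including the country dummies), and β and γ the coefficient vectors of the linear and logistic model.

```latex
\begin{align}
y_i &= \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i \\
d_i &= \mathbb{1}[\, y_i \ge 9 \,] \\
\log\frac{\Pr(d_i = 1 \mid \mathbf{x}_i)}{1 - \Pr(d_i = 1 \mid \mathbf{x}_i)} &= \gamma_0 + \mathbf{x}_i^{\top}\boldsymbol{\gamma}
\end{align}
```

Because d_i is a dichotomized version of y_i, the logistic model uses less information than the linear model, which is one reason the two sets of estimates need not agree.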
Following Wagner et al. (2017), I use CES-D-8 ≥ 9 as a validated threshold for clinical depression, with 98% sensitivity and 83% specificity.
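For reference, the two accuracy measures are defined below. TP, FN, TN, and FP (true positives, false negatives, true negatives, false positives) are standard notation introduced here. They count respondents classified by the cutoff against a clinical diagnosis.

```latex
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
```

Read this way, a sensitivity of 98% means that 98 of every 100 clinically depressed respondents score at or above the cutoff, and a specificity of 83% means that 83 of every 100 non-depressed respondents score below it.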
This study contributes to a better understanding of mental health determinants, exploring whether different regression models reveal distinct insights into depression risk factors.
Depression is influenced by social and demographic factors, with research highlighting the importance of life satisfaction, health status, social connections, age, and alcohol consumption.
Higher life satisfaction is linked to lower depression risk (Diener et al., 1999), while poor self-rated health strongly correlates with depressive symptoms (Lorant et al., 2003). Social isolation is a key risk factor, with frequent social interaction helping to mitigate depression severity (Cacioppo et al., 2010).
Studies on age and alcohol consumption suggest a complex relationship with depression. Older adults face greater depressive symptoms due to declining health and social isolation, while early-life depression can contribute to problematic drinking behaviors later in life (Marmorstein, 2009).
For identifying clinical depression, the CES-D-8 scale is widely used. Wagner et al. (2017) validated CES-D-8 ≥ 9 as an effective threshold for clinically significant symptoms, ensuring a clear classification between mild and severe depression.
Based on my research, the ESS11 data, and the regression models, I propose the following hypotheses:
This study utilizes data from ESS Round 11, which examines social and demographic factors across Europe. To analyze depression predictors, I focus on respondents from Norway, Germany, and Spain.
# Filter dataset for selected countries
df = df[df$cntry %in% c("Norway", "Germany", "Spain"), ]
Two outcomes are analyzed:
1. Quasi-metric depression scale: CES-D-8 total score.
2. Binary clinical depression: 1 if CES-D-8 ≥ 9, 0 otherwise.
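# CES-D-8 items: convert to numeric (each item is coded 1–4 in the ESS)
cesd_raw <- c("fltdpr", "flteeff", "slprl", "wrhpp", "fltlnl", "enjlf", "fltsd", "cldgng")
df[, cesd_raw] = lapply(df[, cesd_raw], as.numeric)
# Reverse-code the two positively worded items (1 <-> 4, 2 <-> 3)
df$wrhpp_r = 5 - df$wrhpp
df$d25 = 5 - df$enjlf
depression_items <- c("fltdpr", "flteeff", "slprl", "wrhpp_r", "fltlnl", "d25", "fltsd", "cldgng")
# Shift each item to 0–3 and sum, so the total score ranges from 0 to 24
df$depression_sum = rowSums(df[, depression_items] - 1, na.rm = TRUE)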
# Binary outcome
df$depression_binary = ifelse(df$depression_sum >= 9, 1, 0)
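The ≥ 9 cutoff presupposes a total score that runs from 0 to 24, that is, eight items scored 0–3. The sketch below uses hypothetical, already reverse-coded responses for five respondents and assumes items stored as 1–4. It shows how strongly the classification depends on that shift. The matrix `items_1to4` and both score vectors are illustrative names, not part of the analysis.

```r
# Hypothetical responses: five respondents, eight items coded 1-4,
# positively worded items assumed to be reverse-coded already
items_1to4 <- rbind(
  c(1, 1, 1, 1, 1, 1, 1, 1),
  c(1, 2, 1, 1, 2, 1, 1, 1),
  c(2, 2, 2, 1, 2, 2, 1, 2),
  c(3, 3, 2, 3, 3, 2, 3, 3),
  c(4, 4, 4, 3, 4, 4, 4, 4)
)

# Summing the raw codes gives a score between 8 and 32
sum_1to4 <- rowSums(items_1to4)

# Shifting every item to 0-3 first gives the 0-24 score the cutoff refers to
sum_0to3 <- rowSums(items_1to4 - 1)

# Number of respondents classified as cases under each scoring
cases_raw     <- sum(sum_1to4 >= 9)
cases_shifted <- sum(sum_0to3 >= 9)
c(raw = cases_raw, shifted = cases_shifted)
```

Applied to the unshifted sum, the cutoff classifies every respondent except those answering at the lowest category on all items as a case, so checking `range(df$depression_sum)` and `table(df$depression_binary)` before modeling is a cheap safeguard.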
Key predictors:
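# Age in years
df$age = as.numeric(df$agea)
# Alcohol dummy: 1 = drinks less than once a month or never (ESS codes 6 and 7)
df$alcfreq_d = ifelse(as.numeric(df$alcfreq) %in% c(6, 7), 1, 0)
# Health dummy: 1 = very good or good self-rated health (ESS codes 1 and 2)
df$health_d = ifelse(as.numeric(df$health) %in% c(1, 2), 1, 0)
# Frequency of social meetings (1 = never to 7 = every day)
df$social = as.numeric(df$sclmeet)
# Life satisfaction (0 = extremely dissatisfied to 10 = extremely satisfied)
df$lifesat = as.numeric(df$stflife)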
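A dummy variable is only useful if it actually varies. One possible way for it to end up constant is when the source variable is stored as a labelled factor and is compared with numeric codes. This is an assumption about how the data may have been imported, not something the source confirms. The sketch below uses a hypothetical factor `alcfreq` with three illustrative levels.

```r
# Hypothetical example: alcohol frequency stored as a factor with text labels
alcfreq <- factor(
  c("Every day", "Once a week", "Never", "Never"),
  levels = c("Every day", "Once a week", "Never")
)

# %in% compares the factor's labels with "6" and "7", so nothing matches
dummy_from_codes <- ifelse(alcfreq %in% c(6, 7), 1, 0)

# Matching on the label itself produces a dummy that varies
dummy_from_label <- ifelse(alcfreq %in% "Never", 1, 0)

# A dummy with one distinct value is collinear with the intercept
varies <- c(
  from_codes = length(unique(dummy_from_codes)) > 1,
  from_label = length(unique(dummy_from_label)) > 1
)
varies
```

Running `table(df$alcfreq_d, useNA = "ifany")` and `table(df$health_d, useNA = "ifany")` on the real data before fitting would show directly whether either dummy is constant.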
I estimate two regression models:
1. Linear regression on the depression sum score.
2. Logistic regression on the binary clinical classification.
# Linear regression
linear_model = lm(depression_sum ~ age + alcfreq_d + health_d + social + lifesat + cntry, data = df)
# Logistic regression
logistic_model = glm(depression_binary ~ age + alcfreq_d + health_d + social + lifesat + cntry,
                     family = binomial(link = "logit"), data = df)
# Model summaries
summary(linear_model)
##
## Call:
## lm(formula = depression_sum ~ age + alcfreq_d + health_d + social +
## lifesat + cntry, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.3715 -1.7672 -0.3613 1.3754 16.3396
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.961108 0.225237 101.942 < 2e-16 ***
## age -0.011139 0.001974 -5.644 1.74e-08 ***
## alcfreq_d NA NA NA NA
## health_d NA NA NA NA
## social -0.222085 0.027067 -8.205 2.84e-16 ***
## lifesat -0.699001 0.020301 -34.432 < 2e-16 ***
## cntrySpain 0.323299 0.085481 3.782 0.000157 ***
## cntryNorway -0.914871 0.095379 -9.592 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.727 on 5561 degrees of freedom
## (34 observations deleted due to missingness)
## Multiple R-squared: 0.2302, Adjusted R-squared: 0.2295
## F-statistic: 332.7 on 5 and 5561 DF, p-value: < 2.2e-16
summary(logistic_model)
##
## Call:
## glm(formula = depression_binary ~ age + alcfreq_d + health_d +
## social + lifesat + cntry, family = binomial(link = "logit"),
## data = df)
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.13980 3.09526 2.307 0.0211 *
## age -0.00389 0.02771 -0.140 0.8884
## alcfreq_d NA NA NA NA
## health_d NA NA NA NA
## social 0.28094 0.33888 0.829 0.4071
## lifesat -0.05581 0.29413 -0.190 0.8495
## cntrySpain -0.39898 1.42067 -0.281 0.7788
## cntryNorway -1.51493 1.25401 -1.208 0.2270
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 65.904 on 5566 degrees of freedom
## Residual deviance: 63.844 on 5561 degrees of freedom
## (34 observations deleted due to missingness)
## AIC: 75.844
##
## Number of Fisher Scoring iterations: 10
library(ggplot2)
# Bar chart of the CES-D-8 total score
ggplot(df, aes(x = depression_sum)) +
  geom_bar(fill = "darkgreen", width = 0.8) +
  labs(
    title = "Distribution of Depression Scores",
    subtitle = "CES-D-8 total score (range: 0–24)",
    x = "Total Depression Score",
    y = "Frequency",
    caption = "Data: ESS Round 11 – Visualization: Annalena Eckhardt"
  ) +
  theme_minimal(base_size = 13)
ggplot(df, aes(x = factor(lifesat))) +
  geom_bar(fill = "darkred", width = 0.8) +
  labs(
    title = "Distribution of Life Satisfaction",
    subtitle = "Scale from 0 (not at all satisfied) to 10 (completely satisfied)",
    x = "Life Satisfaction (0–10)",
    y = "Number of Respondents",
    caption = "Data: ESS Round 11 – Visualization: Annalena Eckhardt"
  ) +
  theme_minimal(base_size = 13)
summary(linear_model)
##
## Call:
## lm(formula = depression_sum ~ age + alcfreq_d + health_d + social +
## lifesat + cntry, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.3715 -1.7672 -0.3613 1.3754 16.3396
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.961108 0.225237 101.942 < 2e-16 ***
## age -0.011139 0.001974 -5.644 1.74e-08 ***
## alcfreq_d NA NA NA NA
## health_d NA NA NA NA
## social -0.222085 0.027067 -8.205 2.84e-16 ***
## lifesat -0.699001 0.020301 -34.432 < 2e-16 ***
## cntrySpain 0.323299 0.085481 3.782 0.000157 ***
## cntryNorway -0.914871 0.095379 -9.592 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.727 on 5561 degrees of freedom
## (34 observations deleted due to missingness)
## Multiple R-squared: 0.2302, Adjusted R-squared: 0.2295
## F-statistic: 332.7 on 5 and 5561 DF, p-value: < 2.2e-16
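All estimable predictors in the linear model are significantly associated with the depression score (p < 0.001). Each additional point of life satisfaction is associated with a 0.70-point lower score, each step toward more frequent social contact with a 0.22-point lower score, and each additional year of age with a 0.01-point lower score. Relative to Germany, the reference category, respondents in Spain score 0.32 points higher and respondents in Norway 0.91 points lower. The model explains about 23% of the variance (adjusted R² = 0.2295). No coefficients could be estimated for alcfreq_d and health_d: R reports them as not defined because of singularities, which indicates that the two dummies are perfectly collinear with other terms, most likely because they are constant in the estimation sample.
The effect directions only partly match the logistic regression, where social contact and Spain change sign. None of the logistic coefficients is statistically significant, so these differences should not be over-interpreted.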
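The linear coefficients reported above can be turned into expected score differences with simple arithmetic. The sketch below copies three estimates from the summary output. The variable names and the two comparisons (a 5-point difference in life satisfaction, Norway versus Spain) are chosen here purely for illustration.

```r
# Coefficients copied from the linear model summary
b_lifesat <- -0.699001
b_norway  <- -0.914871
b_spain   <-  0.323299

# Expected score difference for a 5-point difference in life satisfaction,
# holding the other predictors constant
diff_lifesat_5 <- 5 * b_lifesat

# Expected Norway-minus-Spain difference (both are measured against Germany)
diff_norway_spain <- b_norway - b_spain

c(lifesat_5_points = diff_lifesat_5, norway_vs_spain = diff_norway_spain)
```

On the scale of the fitted score, a respondent five points higher in life satisfaction is expected to score about 3.5 points lower, and a respondent in Norway about 1.2 points lower than an otherwise comparable respondent in Spain.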
summary(logistic_model)
##
## Call:
## glm(formula = depression_binary ~ age + alcfreq_d + health_d +
## social + lifesat + cntry, family = binomial(link = "logit"),
## data = df)
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.13980 3.09526 2.307 0.0211 *
## age -0.00389 0.02771 -0.140 0.8884
## alcfreq_d NA NA NA NA
## health_d NA NA NA NA
## social 0.28094 0.33888 0.829 0.4071
## lifesat -0.05581 0.29413 -0.190 0.8495
## cntrySpain -0.39898 1.42067 -0.281 0.7788
## cntryNorway -1.51493 1.25401 -1.208 0.2270
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 65.904 on 5566 degrees of freedom
## Residual deviance: 63.844 on 5561 degrees of freedom
## (34 observations deleted due to missingness)
## AIC: 75.844
##
## Number of Fisher Scoring iterations: 10
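The null deviance in this output is very small for a sample of this size, which deserves a quick check. For an ungrouped binary outcome, the null deviance of an intercept-only logit depends only on how many observations fall into each class. The helper `null_deviance()` below is a hypothetical function written for this check, and the sample size is inferred from the reported null degrees of freedom (5566 + 1).

```r
# Null deviance of an intercept-only logit with n observations,
# k of which fall into the rarer outcome class
null_deviance <- function(n, k) {
  p <- k / n
  -2 * (k * log(p) + (n - k) * log(1 - p))
}

n <- 5566 + 1  # null degrees of freedom plus one, from the summary output

# Null deviance implied by 1 to 6 observations in the rarer class
implied <- sapply(1:6, function(k) null_deviance(n, k))
round(implied, 3)

# Which class count comes closest to the reported value of 65.904?
best_k <- which.min(abs(implied - 65.904))
best_k
```

Under these assumptions, the reported null deviance is reproduced by a rarer class of only four respondents. With so few observations in one category, large standard errors and non-significant coefficients are to be expected, and the distribution of depression_binary should be inspected before the logistic results are interpreted.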
# Odds ratios and CIs
exp(cbind(OddsRatio = coef(logistic_model), confint(logistic_model)))
## OddsRatio 2.5 % 97.5 %
## (Intercept) 1261.1818878 7.668720785 1.826563e+06
## age 0.9961177 0.941235268 1.053109e+00
## alcfreq_d NA NA NA
## health_d NA NA NA
## social 1.3243731 0.640369810 2.522482e+00
## lifesat 0.9457210 0.476437201 1.539284e+00
## cntrySpain 0.6710054 0.026318225 1.711985e+01
## cntryNorway 0.2198241 0.009879085 2.426219e+00
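Each odds ratio in the table is the exponentiated logit coefficient. The sketch below reproduces the Norway entry from the printed estimate and, for comparison, computes a Wald interval from the printed standard error. The Wald interval is an alternative approximation added here for illustration. `confint()` on a `glm` object reports profile-likelihood intervals instead, so the two need not coincide.

```r
# Estimate and standard error copied from the logistic summary output
beta_norway <- -1.51493
se_norway   <- 1.25401

# Odds ratio: exponentiated coefficient
or_norway <- exp(beta_norway)

# Wald 95% interval on the log-odds scale, then exponentiated
wald_ci <- exp(beta_norway + c(-1, 1) * qnorm(0.975) * se_norway)

c(odds_ratio = or_norway, wald_lower = wald_ci[1], wald_upper = wald_ci[2])
```

Both interval types include 1 for Norway, in line with the non-significant z test in the summary output.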
# McFadden's pseudo R²
r_mcfadden = 1 - (logistic_model$deviance / logistic_model$null.deviance)
r_mcfadden
## [1] 0.031246
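The calculation above follows the usual definition of McFadden's pseudo R². In the sketch below, the symbols are notation introduced here: L̂_M and L̂_0 are the maximized likelihoods of the fitted and the intercept-only model, and D_M and D_0 are the residual and null deviance. The second equality holds for ungrouped binary outcomes, where the deviance equals −2 times the log-likelihood.

```latex
R^2_{\text{McF}}
  = 1 - \frac{\ln \hat{L}_M}{\ln \hat{L}_0}
  = 1 - \frac{D_M}{D_0}
  \approx 1 - \frac{63.844}{65.904}
  \approx 0.031
```

A value of about 0.03 means that the predictors improve the log-likelihood only marginally over a model with no predictors at all.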
This study explored depression predictors using two models: a linear regression on the CES-D-8 sum score and a logistic regression on the binary clinical threshold (CES-D-8 ≥ 9). Several consistent predictors emerged across both models:
Age and alcohol consumption showed unexpected effects in early models. Reclassifying alcfreq as a dummy variable made the results easier to interpret, which suggests that the relationship between alcohol use and depression is non-linear and context-dependent.
While both models shared significant predictors, some variables appeared stronger in one model than the other:
This highlights the value of modeling depression from multiple angles, both as a spectrum and as a clinically relevant condition.
Several limitations should be noted:
Future research could benefit from more refined categorical coding, inclusion of psychosocial factors, and longitudinal data to track changes over time.
Using data from ESS Round 11, I analyzed how social and demographic variables relate to depression in three countries. The combination of metric and binary models provided insight into both the severity and clinical relevance of depressive symptoms.
By integrating validated thresholds and contextual predictors, this study contributes to a deeper understanding of mental health determinants in a European context.