Table of Contents

1. Introduction
2. Literature Review & Hypotheses
- 2.1 Literature Review
- 2.2 Hypotheses
3. Methods
- 3.1 Data Source & Sample
- 3.2 Variables & Measurement
- 3.3 Statistical Analysis
4. Results
- 4.1 Descriptive Statistics
- 4.2 Regression Analysis
- 4.3 Model Fit
5. Discussion & Conclusion
- 5.1 Summary of Findings
- 5.2 Cross-Model Interpretation
- 5.3 Limitations & Future Research
- 5.4 Conclusion

1. Introduction

Depression is a major public health issue that is shaped by social, psychological, and behavioral factors. To examine its prevalence and predictors, I analyze data from Round 11 of the European Social Survey (ESS), focusing on Norway, Germany, and Spain.

This paper applies two regression models to investigate depression:
1. Linear regression model using the quasi-metric depression scale as the dependent variable.
2. Logistic regression model using the binary CES-D-8 cutoff (≥ 9) for clinically significant depression as the dependent variable.

The goal is to assess which social and demographic factors (e.g., life satisfaction, self-rated health, social connections, age, and alcohol consumption) influence depression and to identify country-specific differences.

Following Wagner et al. (2017), I use CES-D-8 ≥ 9 as a validated threshold for clinical depression, with 98% sensitivity and 83% specificity.

This study contributes to a better understanding of mental health determinants, exploring whether different regression models reveal distinct insights into depression risk factors.

2. Literature Review & Hypotheses

2.1 Literature Review

Depression is influenced by social and demographic factors, with research highlighting the importance of life satisfaction, health status, social connections, age, and alcohol consumption.

Higher life satisfaction is linked to lower depression risk (Diener et al., 1999), while poor self-rated health strongly correlates with depressive symptoms (Lorant et al., 2003). Social isolation is a key risk factor, with frequent social interaction helping to mitigate depression severity (Cacioppo et al., 2010).

Studies on age and alcohol consumption suggest a complex relationship with depression. Older adults face greater depressive symptoms due to declining health and social isolation, while early-life depression can contribute to problematic drinking behaviors later in life (Marmorstein, 2009).

The CES-D-8 scale is widely used to identify clinical depression. Wagner et al. (2017) validated CES-D-8 ≥ 9 as a threshold for clinically significant symptoms, which gives a clear cutoff between respondents with and without such symptoms.

2.2 Hypotheses

Based on the literature reviewed above and the ESS Round 11 data, I propose the following hypotheses:

H1: Higher life satisfaction decreases depression (in both regression models).
H2: Individuals with poor self-rated health are at higher risk for clinical depression.
H3: Lower social interaction frequency increases the likelihood of depression.
H4: Depression prevalence and predictors differ across Norway, Germany, and Spain.

3. Methods

3.1 Data Source & Sample

This study utilizes data from ESS Round 11, which examines social and demographic factors across Europe. To analyze depression predictors, I focus on respondents from Norway, Germany, and Spain.

# Filter dataset for selected countries
df = df[df$cntry %in% c("Norway", "Germany", "Spain"), ]
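
The filter above matches on country names, which works only if cntry holds labels rather than codes. The following is a minimal sketch of the alternative case, assuming an export in which cntry stores two-letter ISO codes ("NO", "DE", "ES"); the toy data frame and the lookup vector country_names are hypothetical.

```r
# Toy stand-in for the ESS file; the real data may store cntry as
# two-letter ISO codes instead of full names (assumption).
toy <- data.frame(
  cntry   = c("NO", "DE", "ES", "FR", "DE"),
  stflife = c(8, 7, 6, 7, 9),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from ISO code to the labels used in this paper
country_names <- c(NO = "Norway", DE = "Germany", ES = "Spain")

# Keep the three countries and replace the codes with labels
toy <- toy[toy$cntry %in% names(country_names), ]
toy$cntry <- factor(unname(country_names[as.character(toy$cntry)]),
                    levels = c("Germany", "Spain", "Norway"))

table(toy$cntry)
```

Listing Germany first in levels makes it the reference category in a regression, which matches the country coefficients reported for Spain and Norway.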

3.2 Variables & Measurement

Dependent Variables

Two outcomes are analyzed:
1. Quasi-metric depression scale: CES-D-8 total score.
2. Binary clinical depression: 1 if CES-D-8 ≥ 9, 0 otherwise.

# CES-D-8 items; enjlf is positively worded and is reversed here as 6 - x
df$d25 = 6 - as.numeric(df$enjlf)
depression_items <- c("fltdpr", "flteeff", "slprl", "wrhpp", "fltlnl", "d25", "fltsd", "cldgng")

# Convert items to numeric and sum them
# (na.rm = TRUE means a missing item contributes 0 to the sum)
df[, depression_items] = lapply(df[, depression_items], as.numeric)
df$depression_sum = rowSums(df[, depression_items], na.rm = TRUE)

# Binary outcome
df$depression_binary = ifelse(df$depression_sum >= 9, 1, 0)
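
The 0–24 score range and the ≥ 9 cutoff presuppose that each item is scored 0–3 and that positively worded items are reversed. The following is a minimal sketch of such a scoring, assuming the items are stored as integers 1–4 (1 = none or almost none of the time, 4 = all or almost all of the time) and that both wrhpp and enjlf are positively worded. The toy data and the helper score_cesd8 are hypothetical, and the item coding should be checked against the ESS codebook.

```r
items    <- c("fltdpr", "flteeff", "slprl", "wrhpp",
              "fltlnl", "enjlf", "fltsd", "cldgng")
positive <- c("wrhpp", "enjlf")  # assumed positively worded items

# Hypothetical helper: returns the 0-24 sum score, NA if any item is missing
score_cesd8 <- function(data) {
  x <- as.matrix(data[, items]) - 1       # shift 1-4 to 0-3
  x[, positive] <- 3 - x[, positive]      # reverse the positive items
  unname(rowSums(x, na.rm = FALSE))       # incomplete cases stay NA
}

# Toy respondents: no symptoms, maximum symptoms, one missing item
toy <- data.frame(
  fltdpr = c(1, 4, 2), flteeff = c(1, 4, 2), slprl = c(1, 4, NA),
  wrhpp  = c(4, 1, 3), fltlnl  = c(1, 4, 2), enjlf = c(4, 1, 3),
  fltsd  = c(1, 4, 2), cldgng  = c(1, 4, 2)
)

toy$depression_sum    <- score_cesd8(toy)
toy$depression_binary <- as.integer(toy$depression_sum >= 9)
toy[, c("depression_sum", "depression_binary")]
```

Keeping incomplete cases as NA is one possible design choice; it avoids counting a missing item as "no symptom", at the cost of a smaller sample.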

Independent Variables

Key predictors:

Age (numeric)
Alcohol consumption (dummy-coded: Frequent = 1, else = 0)
Self-rated health (dummy: Poor/Fair = 1, else = 0)
Social contact frequency
Life satisfaction (scale 0–10)

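# Age in years
df$age = as.numeric(df$agea)
# 1 if alcfreq is coded 6 or 7, else 0 (missing values also become 0)
df$alcfreq_d = ifelse(df$alcfreq %in% c(6, 7), 1, 0)
# 1 if health is coded 1 or 2, else 0 (missing values also become 0)
df$health_d = ifelse(df$health %in% c(1, 2), 1, 0)
# Frequency of meeting friends, relatives, or colleagues
df$social = as.numeric(df$sclmeet)
# Satisfaction with life as a whole
df$lifesat = as.numeric(df$stflife)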
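
A dummy that takes only one value in the estimation sample cannot be estimated and appears as NA in the regression output. The following is a minimal sketch of a check to run before modeling, using toy data. The helper is_constant is hypothetical, and the example assumes the predictor was read in as a factor with text labels, in which case comparing it with numeric codes never matches.

```r
# Toy predictor stored as a factor with text labels (assumption)
toy <- data.frame(
  health = factor(c("Very good", "Good", "Fair", "Bad", "Good"))
)

# Comparing factor labels with numeric codes never matches,
# so this dummy is 0 for every respondent
toy$health_num <- ifelse(toy$health %in% c(1, 2), 1, 0)

# Matching on the labels yields a dummy that varies
toy$health_lab <- ifelse(toy$health %in% c("Fair", "Bad", "Very bad"), 1, 0)

# Hypothetical helper: TRUE if a variable has fewer than two observed values
is_constant <- function(x) length(unique(x[!is.na(x)])) < 2

sapply(toy[, c("health_num", "health_lab")], is_constant)
```

Running table() on each dummy in the real data serves the same purpose and also shows which categories the value 1 actually captures.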

3.3 Statistical Analysis

I estimate two regression models:

Linear regression on the depression sum score.
Logistic regression on the binary clinical classification.

# Linear regression
linear_model = lm(depression_sum ~ age + alcfreq_d + health_d + social + lifesat + cntry, data = df)

# Logistic regression
logistic_model = glm(depression_binary ~ age + alcfreq_d + health_d + social + lifesat + cntry,
                      family = binomial(link = "logit"), data = df)

4. Results

4.1 Descriptive Statistics

Distribution of Depression Scores (CES-D-8 Sum Score)

library(ggplot2)

ggplot(df, aes(x = depression_sum)) +
  geom_bar(fill = "darkgreen", width = 0.8) +
  labs(
    title = "Distribution of Depression Scores",
    subtitle = "CES-D-8 total score (range: 0–24)",
    x = "Total Depression Score",
    y = "Frequency",
    caption = "Data: ESS Round 11 – Visualization: Annalena Eckhardt"
  ) +
  theme_minimal(base_size = 13)
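
Because H4 concerns differences in prevalence, a cross-tabulation of the binary outcome by country complements the histogram. The following is a minimal sketch on toy data; the values are invented for illustration and are not ESS results.

```r
# Invented toy data, two respondents per country
toy <- data.frame(
  cntry = c("Germany", "Germany", "Spain", "Spain", "Norway", "Norway"),
  depression_binary = c(1, 0, 1, 1, 0, 0)
)

# Counts per country and outcome
counts <- table(toy$cntry, toy$depression_binary)
counts

# Row-wise proportions: share at or above the cutoff within each country
prevalence <- prop.table(counts, margin = 1)[, "1"]
round(prevalence, 2)
```

Applied to the real data, the same table also shows how many respondents fall in each outcome category, which matters for judging how much information the logistic model can use.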

Distribution of Life Satisfaction

ggplot(df, aes(x = factor(lifesat))) +
  geom_bar(fill = "darkred", width = 0.8) +
  labs(
    title = "Distribution of Life Satisfaction",
    subtitle = "Scale from 0 (not at all satisfied) to 10 (completely satisfied)",
    x = "Life Satisfaction (0–10)",
    y = "Number of Respondents",
    caption = "Data: ESS Round 11 – Visualization: Annalena Eckhardt"
  ) +
  theme_minimal(base_size = 13)

4.2 Regression Analysis

Linear Regression Results (Depression Sum Score)

summary(linear_model)

## 
## Call:
## lm(formula = depression_sum ~ age + alcfreq_d + health_d + social + 
##     lifesat + cntry, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.3715  -1.7672  -0.3613   1.3754  16.3396 
## 
## Coefficients: (2 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 22.961108   0.225237 101.942  < 2e-16 ***
## age         -0.011139   0.001974  -5.644 1.74e-08 ***
## alcfreq_d          NA         NA      NA       NA    
## health_d           NA         NA      NA       NA    
## social      -0.222085   0.027067  -8.205 2.84e-16 ***
## lifesat     -0.699001   0.020301 -34.432  < 2e-16 ***
## cntrySpain   0.323299   0.085481   3.782 0.000157 ***
## cntryNorway -0.914871   0.095379  -9.592  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.727 on 5561 degrees of freedom
##   (34 observations deleted due to missingness)
## Multiple R-squared:  0.2302, Adjusted R-squared:  0.2295 
## F-statistic: 332.7 on 5 and 5561 DF,  p-value: < 2.2e-16

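Life satisfaction, social contact frequency, and age are all negatively and significantly associated with the depression sum score (p < 0.001). Each additional point of life satisfaction is associated with a 0.70-point lower score, and each step up in social contact frequency with a 0.22-point lower score. Relative to Germany, the reference category, scores are 0.32 points higher in Spain and 0.91 points lower in Norway.
The coefficients for alcfreq_d and health_d are not estimated (NA) because of singularities, so the model says nothing about alcohol consumption or self-rated health. The model explains about 23% of the variance in the sum score (R² = 0.230).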

Logistic Regression Results (CES-D-8 ≥ 9)

summary(logistic_model)

## 
## Call:
## glm(formula = depression_binary ~ age + alcfreq_d + health_d + 
##     social + lifesat + cntry, family = binomial(link = "logit"), 
##     data = df)
## 
## Coefficients: (2 not defined because of singularities)
##             Estimate Std. Error z value Pr(>|z|)  
## (Intercept)  7.13980    3.09526   2.307   0.0211 *
## age         -0.00389    0.02771  -0.140   0.8884  
## alcfreq_d         NA         NA      NA       NA  
## health_d          NA         NA      NA       NA  
## social       0.28094    0.33888   0.829   0.4071  
## lifesat     -0.05581    0.29413  -0.190   0.8495  
## cntrySpain  -0.39898    1.42067  -0.281   0.7788  
## cntryNorway -1.51493    1.25401  -1.208   0.2270  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 65.904  on 5566  degrees of freedom
## Residual deviance: 63.844  on 5561  degrees of freedom
##   (34 observations deleted due to missingness)
## AIC: 75.844
## 
## Number of Fisher Scoring iterations: 10

Odds Ratios & Confidence Intervals

# Odds ratios and CIs
exp(cbind(OddsRatio = coef(logistic_model), confint(logistic_model)))

##                OddsRatio       2.5 %       97.5 %
## (Intercept) 1261.1818878 7.668720785 1.826563e+06
## age            0.9961177 0.941235268 1.053109e+00
## alcfreq_d             NA          NA           NA
## health_d              NA          NA           NA
## social         1.3243731 0.640369810 2.522482e+00
## lifesat        0.9457210 0.476437201 1.539284e+00
## cntrySpain     0.6710054 0.026318225 1.711985e+01
## cntryNorway    0.2198241 0.009879085 2.426219e+00
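
An odds ratio is the exponentiated logit coefficient, so each entry in the table can be checked by hand. The following is a minimal sketch using the lifesat estimate and standard error from the logistic regression output. It also computes a Wald interval, which is an approximation and need not match the interval returned by confint().

```r
# Values copied from the logistic regression output for lifesat
b  <- -0.05581
se <-  0.29413

# Odds ratio: multiplicative change in the odds per one-unit increase
odds_ratio <- exp(b)

# Wald 95% interval on the log-odds scale, then exponentiated
# (an approximation; confint() may use a different method)
wald_ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)

round(odds_ratio, 4)
round(wald_ci, 3)
```

An interval that includes 1 means the data are compatible with no association between life satisfaction and the binary outcome.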

4.3 Model Fit

# McFadden's pseudo R²
r_mcfadden = 1 - (logistic_model$deviance / logistic_model$null.deviance)
r_mcfadden

## [1] 0.031246
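
The deviances also carry information about how the binary outcome is distributed. For an intercept-only logistic model fitted to n binary observations, of which k fall in the rarer category, the null deviance equals -2[k ln(k/n) + (n - k) ln(1 - k/n)]. The following is a minimal sketch that evaluates this for n = 5567, the 5566 null degrees of freedom plus one; the helper null_deviance is hypothetical.

```r
# Hypothetical helper: null deviance of a binary logistic model with
# n observations, k of them in the rarer outcome category
null_deviance <- function(n, k) {
  p <- k / n
  -2 * (k * log(p) + (n - k) * log(1 - p))
}

n <- 5566 + 1  # null degrees of freedom + 1

# Null deviance implied by 1 to 6 cases in the rarer category
data.frame(k = 1:6, null_dev = round(null_deviance(n, 1:6), 3))
```

Under this formula, k = 4 gives a value close to the reported null deviance of 65.904. If that reading is right, only a handful of respondents fall in one of the two outcome categories, which can be verified with table(df$depression_binary).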

5. Discussion & Conclusion

5.1 Summary of Findings

This study explored depression predictors using two models: a linear regression on the CES-D-8 sum score and a logistic regression on the binary clinical threshold (CES-D-8 ≥ 9). The reported output supports the hypotheses only in part:

H1 (life satisfaction): supported in the linear model, where each additional point of life satisfaction is associated with a 0.70-point lower sum score (p < 0.001). In the logistic model the odds ratio is below 1 (0.95) but not significant (p = 0.85).
H2 (self-rated health): could not be tested, because the coefficient for health_d was not estimated in either model (NA because of singularities).
H3 (social contact): supported in the linear model (b = -0.22, p < 0.001) but not in the logistic model, where the estimate is positive and not significant (OR = 1.32, p = 0.41).
H4 (country differences): supported for the sum score, with Spain scoring 0.32 points higher and Norway 0.91 points lower than Germany, the reference category. The logistic model shows no significant country differences.

Age has a small but significant negative association with the sum score (b = -0.011, p < 0.001) and no significant association with the binary outcome. The alcohol dummy alcfreq_d was likewise not estimated, so these models allow no conclusion about alcohol consumption.

5.2 Cross-Model Interpretation

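The two models lead to different conclusions:

The sum score model identifies life satisfaction, social contact, age, and country as significant predictors and explains about 23% of the variance.
The clinical classification model yields no significant predictors, very wide confidence intervals (for example, 0.03 to 17.1 for Spain), and a McFadden pseudo R² of about 0.03. Its null deviance of only 65.9 on 5,566 degrees of freedom indicates that one outcome category contains very few respondents, so the binary model carries little information in this sample.

This highlights the value of modeling depression from multiple angles, both as a spectrum and as a clinically relevant condition, and of checking the distribution of the binary outcome before interpreting it.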

5.3 Limitations & Future Research

Several limitations should be noted:

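Ordinal predictors such as health and alcohol use were reduced to dummy variables, which discards information. In the reported models, both dummies were dropped because of singularities, so their effects remain untested.
Missing item responses were counted as zero when building the sum score (rowSums with na.rm = TRUE), which understates scores for respondents with incomplete answers.
Some important variables (e.g., income and employment status) were not included.
The analysis is cross-sectional, limiting causal inference.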

Future research could benefit from more refined categorical coding, inclusion of psychosocial factors, and longitudinal data to track changes over time.

5.4 Conclusion

Using data from ESS Round 11, I analyzed how social and demographic variables relate to depression in three countries. The metric model identified life satisfaction, social contact, age, and country as predictors of symptom severity, while the binary model of clinical relevance remained inconclusive in this sample.

By integrating validated thresholds and contextual predictors, this study contributes to a deeper understanding of mental health determinants in a European context.

Final Paper Advanced Statistics IHSM SS25

Annalena Eckhardt

2025-06-16