Depression is a major public health issue across Europe. This analysis explores key social and behavioral factors that may contribute to depressive symptoms in the Austrian population using data from the European Social Survey (ESS Round 11). The goal is to evaluate how depression varies based on alcohol consumption, diet, financial strain, gender perceptions, and living environment.
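The analysis involves two regression-based models: a weighted linear regression of the continuous depression score and a logistic regression of a binary depression indicator.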
We begin with the full ESS11 dataset (N = 40,156 respondents).
Hypotheses
We compute an 8-item depression index (items fltdpr, flteeff, slprl, wrhpp, fltlnl, enjlf, fltsd, cldgng; the positively worded items wrhpp and enjlf are reverse-scored) and assess its reliability.
# convert to numeric
df$d20 = as.numeric(df$fltdpr)
df$d21 = as.numeric(df$flteeff)
df$d22 = as.numeric(df$slprl)
df$d23 = as.numeric(df$wrhpp)
df$d24 = as.numeric(df$fltlnl)
df$d25 = as.numeric(df$enjlf)
df$d26 = as.numeric(df$fltsd)
df$d27 = as.numeric(df$cldgng)
# reverse scoring for the positive items
df$d23 = 5-df$d23
df$d25 = 5-df$d25
# check internal consistency (Cronbach's alpha)
library(ltm) # provides cronbach.alpha()
cronbach.alpha(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27" )], na.rm=T)
##
## Cronbach's alpha for the 'df[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]' data-set
##
## Items: 8
## Sample units: 40156
## alpha: 0.823
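The reverse scoring and the reliability coefficient can also be traced by hand. The sketch below uses simulated items, not ESS data, and the helper `cronbach_alpha_manual()` is a hypothetical name introduced for illustration. It applies the standard formula with listwise deletion, which is an assumption and may differ from how `ltm` treats missing values.

```r
# Simulated data only: 200 respondents, 4 items coded 1-4 (not ESS data)
set.seed(1)
latent <- rnorm(200)
make_item <- function(z) {
  # cut a noisy copy of the latent score into four ordered categories
  as.numeric(cut(z + rnorm(length(z)), breaks = c(-Inf, -1, 0, 1, Inf)))
}
items <- data.frame(
  a = make_item(latent),
  b = make_item(latent),
  c = make_item(latent),
  d = 5 - make_item(latent)   # positively worded item, stored in the opposite direction
)

# reverse-score the positive item so that higher always means more symptoms
items$d <- 5 - items$d

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of the sum score)
cronbach_alpha_manual <- function(x) {
  x <- x[complete.cases(x), , drop = FALSE]   # listwise deletion (assumption)
  k <- ncol(x)
  item_var <- sapply(x, var)
  total_var <- var(rowSums(x))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

alpha_hat <- cronbach_alpha_manual(items)
print(round(alpha_hat, 3))
```

On a 1 to 4 scale, `5 - x` maps 1 to 4 and 4 to 1, which is why the same expression is used for the two positive items.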
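# compute the score
# mean score: NA if any of the 8 items is missing
df$dep = rowSums(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]) / 8
# sum score: with na.rm = TRUE, missing items contribute 0 to the sum
df$dep_sum = rowSums(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")], na.rm = TRUE)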
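The two scores treat missing items differently, which a toy example makes visible. The data frame `toy` and the threshold `min_valid` are hypothetical and not part of the analysis. The last variant (mean of answered items when enough were answered) is only one possible alternative.

```r
# Toy data (not ESS): three respondents, four items coded 1-4
toy <- data.frame(
  i1 = c(1, 2, NA),
  i2 = c(2, NA, NA),
  i3 = c(1, 3, NA),
  i4 = c(2, 4, 4)
)

# rowSums()/k without na.rm: any missing item makes the whole score NA
score_strict <- rowSums(toy) / ncol(toy)

# rowSums(..., na.rm = TRUE): missing items silently count as 0
sum_naive <- rowSums(toy, na.rm = TRUE)

# hypothetical alternative: mean of the answered items, kept only when
# at least `min_valid` items were answered
min_valid <- 3
n_valid <- rowSums(!is.na(toy))
score_tolerant <- ifelse(n_valid >= min_valid, rowMeans(toy, na.rm = TRUE), NA)

print(data.frame(score_strict, sum_naive, n_valid, score_tolerant))
```

The third respondent answered one item only. The strict score is missing, while the naive sum is 4, which is below the minimum of a complete answer set.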
library(ltm)
library(likert) # create basic Likert tables and plots
library(kableExtra) # create formatted tables
vnames = c("fltdpr", "flteeff", "slprl", "wrhpp", "fltlnl", "enjlf", "fltsd", "cldgng")
likert_df = df[,vnames]
likert_table = likert(likert_df)$results
likert_numeric_df = as.data.frame(lapply((df[,vnames]), as.numeric))
likert_table$Mean = unlist(lapply((likert_numeric_df[,vnames]), mean, na.rm=T)) # ... and append new columns to the data frame
likert_table$Count = unlist(lapply((likert_numeric_df[,vnames]), function (x) sum(!is.na(x))))
likert_table$Item <- c(
fltdpr = "How much of the time … feel depressed?",
flteeff = "… everything you did feel like an effort?",
slprl = "… was your sleep restless?",
wrhpp = "… did you feel happy?",
fltlnl = "… did you feel lonely?",
enjlf = "… did you enjoy life?",
fltsd = "… did you feel sad?",
cldgng = "… did you feel you could not get going?"
)
# round all percentage values to 1 decimal digit
likert_table[,2:5] = round(likert_table[,2:5],1)
# round means to 3 decimal digits (Mean is column 6; column 7 is Count)
likert_table$Mean = round(likert_table$Mean, 3)
# create formatted table
kable_styling(kable(likert_table,
caption = "Distribution of answers regarding depression indicators (ESS round 11, all countries)"
)
)
| Item | None or almost none of the time (%) | Some of the time (%) | Most of the time (%) | All or almost all of the time (%) | Mean | Count |
|---|---|---|---|---|---|---|
| How much of the time … feel depressed? | 64.9 | 29.1 | 4.6 | 1.5 | 1.426 | 39981 |
| … everything you did feel like an effort? | 48.4 | 38.4 | 9.8 | 3.4 | 1.682 | 39983 |
| … was your sleep restless? | 43.9 | 39.9 | 11.6 | 4.6 | 1.770 | 40017 |
| … did you feel happy? | 4.0 | 23.5 | 48.9 | 23.6 | 2.920 | 39890 |
| … did you feel lonely? | 68.1 | 24.3 | 5.3 | 2.3 | 1.417 | 39983 |
| … did you enjoy life? | 5.3 | 24.8 | 44.8 | 25.0 | 2.895 | 39878 |
| … did you feel sad? | 52.5 | 41.1 | 4.9 | 1.6 | 1.555 | 39981 |
| … did you feel you could not get going? | 55.7 | 36.1 | 6.2 | 2.0 | 1.546 | 39949 |
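Each row of the table combines a percentage distribution, a mean of the numeric codes, and a count of valid answers. As a rough sketch in base R, using simulated answers rather than ESS data, the three quantities for a single item could be obtained as follows.

```r
# Simulated answers (not ESS data) on the four-point response scale
set.seed(42)
resp_levels <- c("None or almost none of the time", "Some of the time",
                 "Most of the time", "All or almost all of the time")
x <- factor(sample(resp_levels, 500, replace = TRUE, prob = c(0.6, 0.3, 0.07, 0.03)),
            levels = resp_levels)
x[sample(500, 20)] <- NA   # a few missing answers

pct <- round(100 * prop.table(table(x)), 1)   # table() drops NA by default
item_mean <- mean(as.numeric(x), na.rm = TRUE) # mean of the codes 1-4
item_n <- sum(!is.na(x))                       # number of valid answers

print(pct)
print(round(item_mean, 3))
print(item_n)
```

Because the percentages are rounded separately, a row may add up to slightly more or less than 100.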
table(df$cntry)
##
## Albania Austria Belgium Bulgaria
## 0 2354 1594 0
## Switzerland Cyprus Czechia Germany
## 1384 685 0 2420
## Denmark Estonia Spain Finland
## 0 0 1844 1563
## France United Kingdom Georgia Greece
## 1771 1684 0 2757
## Croatia Hungary Ireland Israel
## 1563 2118 2017 0
## Iceland Italy Lithuania Luxembourg
## 842 2865 1365 0
## Latvia Montenegro North Macedonia Netherlands
## 0 0 0 1695
## Norway Poland Portugal Romania
## 1337 1442 1373 0
## Serbia Russian Federation Sweden Slovenia
## 1563 0 1230 1248
## Slovakia Turkey Ukraine Kosovo
## 1442 0 0 0
unique(df$cntry)
## [1] Austria Belgium Switzerland Cyprus Germany
## [6] Spain Finland France United Kingdom Greece
## [11] Croatia Hungary Ireland Iceland Italy
## [16] Lithuania Netherlands Norway Poland Portugal
## [21] Serbia Sweden Slovenia Slovakia
## 40 Levels: Albania Austria Belgium Bulgaria Switzerland Cyprus ... Kosovo
# subset to Austria
df_Austria = df[df$cntry == "Austria", ]
nrow(df_Austria)
## [1] 2354
The Austrian sample consists of 2,354 respondents.
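The country table above lists zero counts because the factor keeps levels for countries without respondents. A small sketch with toy data (not ESS) shows how a subset can shed unused levels. Whether to drop them in the actual analysis is a design choice this report does not state.

```r
# Toy data (not ESS): a factor with a level that never occurs
toy <- data.frame(
  cntry = factor(c("Austria", "Belgium", "Austria", "Germany"),
                 levels = c("Albania", "Austria", "Belgium", "Germany")),
  dep = c(1.2, 1.5, 2.0, 1.1)
)

# which() skips rows where the comparison is NA instead of returning all-NA rows
toy_at <- toy[which(toy$cntry == "Austria"), ]

# droplevels() removes factor levels with zero observations
toy_at <- droplevels(toy_at)

print(table(toy$cntry))      # includes a zero count for Albania
print(levels(toy_at$cntry))  # only "Austria" remains
```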
# alcfreq is a factor: assign numeric codes (1 = every day to 7 = never)
df_Austria$alcfreq_num = NA
df_Austria$alcfreq_num[df_Austria$alcfreq == "Every day"] = 1
df_Austria$alcfreq_num[df_Austria$alcfreq == "Several times a week"] = 2
df_Austria$alcfreq_num[df_Austria$alcfreq == "Once a week"] = 3
df_Austria$alcfreq_num[df_Austria$alcfreq == "2-3 times a month"] = 4
df_Austria$alcfreq_num[df_Austria$alcfreq == "Once a month"] = 5
df_Austria$alcfreq_num[df_Austria$alcfreq == "Less than once a month"] = 6
df_Austria$alcfreq_num[df_Austria$alcfreq == "Never"] = 7
# reverse the coding so that higher values mean more frequent drinking
df_Austria$alcfreq_recoded = 8 - df_Austria$alcfreq_num
table(df_Austria$alcfreq_recoded)
##
## 1 2 3 4 5 6 7
## 531 231 152 375 380 511 171
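The same recoding can be written more compactly with a lookup vector. This is a sketch on toy answers, assuming the level labels are exactly those used in the assignments above. `alc_levels` is a name introduced here for illustration.

```r
# Assumed level labels, ordered from least to most frequent
alc_levels <- c("Never", "Less than once a month", "Once a month",
                "2-3 times a month", "Once a week",
                "Several times a week", "Every day")

# toy answers (not ESS data), including one missing value
alcfreq <- factor(c("Every day", "Never", "Once a week", NA, "Once a month"))

# match() returns the position of each answer in alc_levels:
# 1 = Never up to 7 = Every day; NA stays NA
alcfreq_recoded <- match(as.character(alcfreq), alc_levels)
print(alcfreq_recoded)
```

Ordering the lookup vector from "Never" to "Every day" gives the reversed scale in one step, so no separate `8 - x` line is needed.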
# group domicil
df_Austria$domicil = as.numeric(df_Austria$domicil)
table(df_Austria$domicil)
##
## 1 2 3 4 5
## 590 173 632 872 86
# group domicil levels to test the residential-setting hypothesis (H4)
# Urban = Levels 1 + 2 (A big city + Suburbs or outskirts of big city)
# Suburban = Level 3 (Town or small city)
# Rural = Levels 4 + 5 (Country village + Farm or home in countryside)
df_Austria$domicil_group = factor(NA, levels = c("Urban", "Suburban", "Rural"))
# Assign groups based on domicil levels
df_Austria$domicil_group[df_Austria$domicil %in% c(1, 2)] = "Urban"
df_Austria$domicil_group[df_Austria$domicil == 3] = "Suburban"
df_Austria$domicil_group[df_Austria$domicil %in% c(4, 5)] = "Rural"
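As an alternative sketch, the three groups can be built in one call with `cut()`. The vector below is toy data, and the break points assume the numeric codes 1 to 5 shown in the table above.

```r
# Toy domicil codes (not ESS data): 1-5 plus a missing value
domicil <- c(1, 2, 3, 4, 5, NA)

# intervals (0,2], (2,3], (3,5] map to the three groups
domicil_group <- cut(domicil,
                     breaks = c(0, 2, 3, 5),
                     labels = c("Urban", "Suburban", "Rural"))

# cross-tabulate old codes against new groups as a check
print(table(domicil, domicil_group, useNA = "ifany"))
```

The first label becomes the reference level in a regression, so "Urban" is the baseline here, as in the models of this report.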
# eatveg: 1 = "Twice a day", 0 = all other
df_Austria$eatveg_binary = ifelse(df_Austria$eatveg == "Twice a day", 1, 0)
# hincfel: 1 = "Very difficult on present income", 0 = others
df_Austria$hincfel_binary = ifelse(df_Austria$hincfel == "Very difficult on present income", 1, 0)
# wlespdm: 1 = "Always", 0 = others
df_Austria$wlespdm_binary = ifelse(df_Austria$wlespdm == "Always", 1, 0)
# Model 4: weighted linear regression of the depression score
model4 = lm(dep ~ alcfreq_recoded + eatveg + hincfel + domicil_group + wlespdm, data = df_Austria, weights = anweight)
summary(model4)
##
## Call:
## lm(formula = dep ~ alcfreq_recoded + eatveg + hincfel + domicil_group +
## wlespdm, data = df_Austria, weights = anweight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -0.83574 -0.12839 -0.02423 0.11260 2.10102
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 1.416574 0.095008
## alcfreq_recoded -0.005004 0.004508
## eatvegTwice a day -0.051553 0.054862
## eatvegOnce a day -0.034415 0.052688
## eatvegLess than once a day but at least 4 times a week -0.022375 0.054720
## eatvegLess than 4 times a week but at least once a week 0.032516 0.058839
## eatvegLess than once a week 0.221997 0.131784
## eatvegNever 0.644169 0.541707
## hincfelCoping on present income 0.052220 0.019516
## hincfelDifficult on present income 0.314644 0.031891
## hincfelVery difficult on present income 0.810762 0.058422
## domicil_groupSuburban -0.018652 0.023519
## domicil_groupRural -0.041669 0.020590
## wlespdmRarely 0.108020 0.086179
## wlespdmSometimes 0.138248 0.081121
## wlespdmOften 0.171953 0.080545
## wlespdmAlways 0.097088 0.084347
## t value Pr(>|t|)
## (Intercept) 14.910 < 2e-16 ***
## alcfreq_recoded -1.110 0.26706
## eatvegTwice a day -0.940 0.34748
## eatvegOnce a day -0.653 0.51371
## eatvegLess than once a day but at least 4 times a week -0.409 0.68266
## eatvegLess than 4 times a week but at least once a week 0.553 0.58058
## eatvegLess than once a week 1.685 0.09222 .
## eatvegNever 1.189 0.23451
## hincfelCoping on present income 2.676 0.00751 **
## hincfelDifficult on present income 9.866 < 2e-16 ***
## hincfelVery difficult on present income 13.878 < 2e-16 ***
## domicil_groupSuburban -0.793 0.42783
## domicil_groupRural -2.024 0.04311 *
## wlespdmRarely 1.253 0.21018
## wlespdmSometimes 1.704 0.08848 .
## wlespdmOften 2.135 0.03288 *
## wlespdmAlways 1.151 0.24984
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2361 on 2199 degrees of freedom
## (138 observations deleted due to missingness)
## Multiple R-squared: 0.127, Adjusted R-squared: 0.1207
## F-statistic: 20 on 16 and 2199 DF, p-value: < 2.2e-16
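To make the role of the `weights` argument concrete, the sketch below fits a weighted regression on simulated data (not ESS) and reproduces the coefficients by solving the weighted normal equations. The variable `w` is only a stand-in for a survey weight such as `anweight`.

```r
# Simulated data (not ESS): outcome, one predictor, positive weights
set.seed(123)
n <- 300
x <- rnorm(n)
w <- runif(n, 0.2, 3)          # stand-in for a survey weight
y <- 1.4 + 0.3 * x + rnorm(n, sd = 0.5)

fit <- lm(y ~ x, weights = w)

# Weighted least squares by hand: solve (X'WX) b = X'Wy
X <- cbind("(Intercept)" = 1, x = x)
b_manual <- solve(t(X) %*% (w * X), t(X) %*% (w * y))

print(cbind(lm = coef(fit), manual = drop(b_manual)))
```

The point estimates agree. Note that `lm()` treats the weights as precision weights, so its standard errors are not design-based survey standard errors.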
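Next, we fit a logistic regression model. The binary outcome dep_binary equals 1 when the sum score dep_sum is 9 or higher and 0 otherwise; because each of the 8 items is coded 1 to 4, a complete set of answers yields a sum between 8 and 32.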
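# binary outcome: 1 if the sum score is 9 or higher, 0 otherwise
df_Austria$dep_binary = ifelse(df_Austria$dep_sum >= 9, 1, 0)
# ensure categorical variables are factors and numeric covariates are numeric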
df_Austria$gndr = factor(df_Austria$gndr, labels = c("Male", "Female"))
df_Austria$eduyrs = as.numeric(df_Austria$eduyrs) # education (numeric)
df_Austria$health = as.numeric(df_Austria$health) # self-rated health
# Logistic regression
# Ensure gender is binary
levels(df_Austria$gndr) <- c("Male", "Female")
log_model_final = glm(dep_binary ~ alcfreq_recoded + eatveg_binary + hincfel_binary + domicil_group + wlespdm_binary + gndr + eduyrs + health,
data = df_Austria, family = binomial(), weights = anweight)
summary(log_model_final)
##
## Call:
## glm(formula = dep_binary ~ alcfreq_recoded + eatveg_binary +
## hincfel_binary + domicil_group + wlespdm_binary + gndr +
## eduyrs + health, family = binomial(), data = df_Austria,
## weights = anweight)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.07061 0.88749 -1.206 0.2277
## alcfreq_recoded 0.21197 0.08031 2.639 0.0083 **
## eatveg_binary 0.03677 0.38414 0.096 0.9237
## hincfel_binary 0.37054 1.22139 0.303 0.7616
## domicil_groupSuburban 0.79503 0.47552 1.672 0.0945 .
## domicil_groupRural -0.14195 0.34638 -0.410 0.6820
## wlespdm_binary 0.53965 0.63627 0.848 0.3964
## gndrFemale 0.15099 0.31181 0.484 0.6282
## eduyrs 0.05130 0.04469 1.148 0.2510
## health 1.16776 0.25072 4.658 3.2e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 374.09 on 2198 degrees of freedom
## Residual deviance: 331.23 on 2189 degrees of freedom
## (155 observations deleted due to missingness)
## AIC: 189.2
##
## Number of Fisher Scoring iterations: 6
exp(coef(log_model_final))
## (Intercept) alcfreq_recoded eatveg_binary
## 0.3428007 1.2361132 1.0374553
## hincfel_binary domicil_groupSuburban domicil_groupRural
## 1.4485233 2.2145171 0.8676669
## wlespdm_binary gndrFemale eduyrs
## 1.7154036 1.1629863 1.0526403
## health
## 3.2147675
The output above gives the adjusted odds ratios (ORs) for being classified as “clinically depressed” (dep_binary = 1).
2. Gender (gndr)
- Females have 1.16 times the odds of meeting the depression cutoff compared to males; the difference is not statistically significant (p = 0.63).
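The odds ratio for gender can be recomputed from the gndrFemale row of the regression output. The sketch below also adds a Wald 95% interval, which is not part of the original output and assumes the estimate is approximately normally distributed.

```r
# Values copied from the gndrFemale row of the logistic regression output
b  <- 0.15099   # log-odds coefficient
se <- 0.31181   # standard error

or <- exp(b)                                   # odds ratio
ci <- exp(b + c(-1, 1) * qnorm(0.975) * se)    # Wald 95% interval (assumption)

print(round(c(OR = or, lower = ci[1], upper = ci[2]), 2))
```

An interval that contains 1 is consistent with the non-significant p-value reported for this coefficient.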
library(ggplot2)
Gender distribution
This bar chart displays the gender distribution within the Austrian subsample of the ESS11 dataset. It provides a basic demographic overview, useful for understanding sample composition before deeper analysis.
ggplot(df_Austria, aes(x = gndr)) +
geom_bar(fill = "steelblue") +
labs(
title = "Gender Distribution of Austrian Sample",
x = "Gender",
y = "Count of Respondents",
caption = "ESS Round 11"
) +
theme_minimal()
This boxplot visualizes the average depression scores by residential setting (Urban, Suburban, Rural). It helps test Hypothesis 4 (H4): individuals in rural areas are hypothesized to report higher depressive symptoms. Median and interquartile ranges reveal variability in depression across different living environments.
ggplot(df_Austria, aes(x = domicil_group, y = dep, fill = domicil_group)) +
geom_boxplot(alpha = 0.7) +
labs(
title = "Depression Scores by Domicile Group",
x = "Domicile Type",
y = "Average Depression Score"
) +
theme_minimal() +
scale_fill_brewer(palette = "Set2") +
theme(legend.position = "none")
This boxplot illustrates how average depression scores vary across different levels of alcohol consumption frequency. This tests Hypothesis 1 (H1): more frequent alcohol use is associated with higher depressive symptoms. Boxplot spread and central tendency show whether more frequent drinkers exhibit elevated depression levels.
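# treat drinking frequency as a factor so that each level gets its own box
ggplot(df_Austria[!is.na(df_Austria$alcfreq_recoded), ],
aes(x = factor(alcfreq_recoded), y = dep)) +
geom_boxplot(fill = "pink", alpha = 0.7, na.rm = TRUE) +
labs(
title = "Depression Scores by Drinking Frequency",
x = "Drinking Frequency (1 = never, 7 = every day)",
y = "Average Depression Score"
) +
theme_minimal()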
This stacked bar chart shows the proportion of vegetable consumption frequencies by gender. It supports exploratory analysis for Hypothesis 2 (H2) and potential gender-related dietary patterns. The use of relative proportions (position = “fill”) makes gender-based comparison easier.
Improve colors
library(scales)
ggplot(df[!is.na(df$eatveg),], aes(gndr)) +
geom_bar(aes(fill=eatveg), position = "fill", width=.6) +
scale_y_continuous(labels = percent) +
coord_flip() +
scale_fill_manual(values = c("darkgreen", "lightgreen", "yellow", "orange", "lavender", "purple", "red")) +
labs(title ="Vegetable consumption by gender",
subtitle = "ESS round 11",
x="Gender",
y = "",
caption = "Radvile Karaleviciute") +
theme_minimal()
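Discussion: Financial hardship shows the strongest association with the continuous depression score (b = 0.81 for “Very difficult on present income”, p < 0.001), corroborating H3; in the logistic model, however, the binary hardship indicator is not significant (OR = 1.45, p = 0.76), and self-rated health is the strongest predictor (OR = 3.21, p < 0.001). Alcohol frequency is significant only in the binary model (OR = 1.24, p = 0.008), and vegetable intake is not significant at the 5% level in either model, so support for H1 is partial and H2 is not supported. Rural residents report slightly lower, not higher, depression scores than urban residents (b = -0.04, p = 0.043), contrary to H4, and the coefficients for perceived gender inequality are small and inconsistent across response categories (H5).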
Strengths include weighted analyses and a reliability assessment of the depression index. Limitations include the cross-sectional design and self-report biases. Future work could extend the analysis with multilevel models across European countries.
To expand our results, we propose a new hypothesis:
H6: The effect of alcohol consumption on depression differs by gender.
# Ensure variables are set correctly
df_Austria$gndr = factor(df_Austria$gndr, labels = c("Male", "Female"))
# Model with interaction term
interaction_model <- lm(dep ~ alcfreq_recoded * gndr + eatveg + hincfel + domicil_group + wlespdm, data = df_Austria, weights = anweight)
summary(interaction_model)
##
## Call:
## lm(formula = dep ~ alcfreq_recoded * gndr + eatveg + hincfel +
## domicil_group + wlespdm, data = df_Austria, weights = anweight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -0.75775 -0.12954 -0.02642 0.10775 2.02547
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 1.285904 0.097396
## alcfreq_recoded 0.015556 0.006604
## gndrFemale 0.204645 0.040524
## eatvegTwice a day -0.043461 0.054481
## eatvegOnce a day -0.029511 0.052297
## eatvegLess than once a day but at least 4 times a week -0.013772 0.054338
## eatvegLess than 4 times a week but at least once a week 0.042508 0.058507
## eatvegLess than once a week 0.262493 0.130961
## eatvegNever 0.637096 0.537596
## hincfelCoping on present income 0.049238 0.019387
## hincfelDifficult on present income 0.302214 0.031719
## hincfelVery difficult on present income 0.811712 0.057996
## domicil_groupSuburban -0.022810 0.023351
## domicil_groupRural -0.057885 0.020612
## wlespdmRarely 0.114160 0.085530
## wlespdmSometimes 0.139415 0.080506
## wlespdmOften 0.163870 0.079953
## wlespdmAlways 0.093553 0.083707
## alcfreq_recoded:gndrFemale -0.027810 0.009200
## t value Pr(>|t|)
## (Intercept) 13.203 < 2e-16 ***
## alcfreq_recoded 2.355 0.01859 *
## gndrFemale 5.050 4.78e-07 ***
## eatvegTwice a day -0.798 0.42511
## eatvegOnce a day -0.564 0.57261
## eatvegLess than once a day but at least 4 times a week -0.253 0.79995
## eatvegLess than 4 times a week but at least once a week 0.727 0.46758
## eatvegLess than once a week 2.004 0.04515 *
## eatvegNever 1.185 0.23611
## hincfelCoping on present income 2.540 0.01116 *
## hincfelDifficult on present income 9.528 < 2e-16 ***
## hincfelVery difficult on present income 13.996 < 2e-16 ***
## domicil_groupSuburban -0.977 0.32876
## domicil_groupRural -2.808 0.00502 **
## wlespdmRarely 1.335 0.18210
## wlespdmSometimes 1.732 0.08346 .
## wlespdmOften 2.050 0.04052 *
## wlespdmAlways 1.118 0.26385
## alcfreq_recoded:gndrFemale -3.023 0.00253 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2343 on 2197 degrees of freedom
## (138 observations deleted due to missingness)
## Multiple R-squared: 0.141, Adjusted R-squared: 0.134
## F-statistic: 20.04 on 18 and 2197 DF, p-value: < 2.2e-16
ggplot(df_Austria, aes(x = alcfreq_recoded, y = dep, color = gndr)) +
geom_smooth(method = "lm", se = TRUE) +
labs(
title = "Interaction Between Gender and Alcohol Frequency on Depression",
x = "Alcohol Frequency (higher = more frequent)",
y = "Depression Score",
color = "Gender"
) +
theme_minimal()
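Among men (the reference category), more frequent alcohol consumption is associated with higher depression scores (b = 0.016, p = 0.019). The coefficient for women (b = 0.205, p < 0.001) is the estimated gender difference when alcfreq_recoded equals 0, which lies outside the observed range of 1 to 7, so it should not be read as a gap that holds regardless of alcohol use. The negative interaction term (b = -0.028, p = 0.003) means that the alcohol slope for women is 0.016 - 0.028 = -0.012: the positive association found among men is absent, and if anything slightly reversed, among women.
The relationship between alcohol consumption and depression therefore varies by gender, which supports H6.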
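The gender-specific slopes follow directly from the interaction model coefficients. The sketch below copies the three relevant estimates from the output and derives the slope for each gender and the predicted female-minus-male difference at each alcohol level. It is plain arithmetic on the reported values and does not provide standard errors for the derived quantities.

```r
# Estimates copied from the interaction model output
b_alc    <- 0.015556    # alcfreq_recoded (slope for men, the reference group)
b_female <- 0.204645    # gndrFemale (gap at alcfreq_recoded = 0)
b_int    <- -0.027810   # alcfreq_recoded:gndrFemale

slope_men   <- b_alc
slope_women <- b_alc + b_int

# predicted female-minus-male difference at each observed alcohol level (1-7)
alc <- 1:7
gap <- b_female + b_int * alc

print(round(c(men = slope_men, women = slope_women), 4))
print(round(setNames(gap, alc), 3))
```

The predicted gap shrinks as drinking frequency rises, which is the pattern the interaction plot displays as converging lines.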