Introduction

Depression is a major public health issue across Europe. This analysis explores key social and behavioral factors that may contribute to depressive symptoms in the Austrian population using data from the European Social Survey (ESS Round 11). The goal is to evaluate how depression varies with alcohol consumption, diet, financial strain, perceptions of gender inequality, and living environment.

Two regression models are estimated:

A linear model for the metric depression scale
A logistic model for clinically significant depression (binary)

Data and Sample

We begin with the full ESS11 dataset (N = 40,156 respondents).

Hypotheses

H1 (alcfreq): Individuals who drink alcohol more frequently will exhibit higher levels of depressive symptoms.
H2 (eatveg): Less frequent vegetable consumption is associated with higher depressive symptom scores.
H3 (hincfel): Greater difficulty living on the current household income is associated with higher depressive symptoms.
H4 (domicil): People living in rural areas will exhibit higher depressive symptoms than those living in urban areas.
H5 (wlespdm): Stronger perceptions of gender inequality are associated with higher depressive symptoms.

Measures

Depression Scale Calculation and Reliability

We compute an 8-item depression index from the items fltdpr, flteeff, slprl, wrhpp, fltlnl, enjlf, fltsd and cldgng, reverse-scoring the two positively worded items (wrhpp and enjlf), and assess its reliability.

library(ltm)  # provides cronbach.alpha()

# convert the factor items to their numeric codes (1-4)
df$d20 = as.numeric(df$fltdpr)
df$d21 = as.numeric(df$flteeff)
df$d22 = as.numeric(df$slprl)
df$d23 = as.numeric(df$wrhpp)
df$d24 = as.numeric(df$fltlnl)
df$d25 = as.numeric(df$enjlf)
df$d26 = as.numeric(df$fltsd)
df$d27 = as.numeric(df$cldgng)

# reverse-score the two positively worded items (wrhpp, enjlf) so that higher values mean more depressive symptoms
df$d23 = 5-df$d23
df$d25 = 5-df$d25

# check degree of consistency (internal consistency)
cronbach.alpha(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27" )], na.rm=T)

## 
## Cronbach's alpha for the 'df[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]' data-set
## 
## Items: 8
## Sample units: 40156
## alpha: 0.823
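To make the reliability statistic concrete, here is a minimal base-R sketch of the classical Cronbach's alpha formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). It is not the ltm implementation: `cronbach_alpha_sketch` is a hypothetical helper, and dropping incomplete rows first (listwise deletion) is an assumption of this sketch.

```r
# Classical Cronbach's alpha:
#   alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
# Assumption: rows with any missing item are dropped before computing.
cronbach_alpha_sketch <- function(items) {
  items <- as.data.frame(items)
  items <- items[stats::complete.cases(items), , drop = FALSE]
  k <- ncol(items)
  item_vars <- vapply(items, stats::var, numeric(1))  # variance of each item
  total_var <- stats::var(rowSums(items))             # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Two identical items are perfectly consistent (alpha = 1);
# two uncorrelated items give alpha = 0.
perfect <- data.frame(a = 1:4, b = 1:4)
uncorrelated <- data.frame(a = c(1, 2, 1, 2), b = c(1, 1, 2, 2))
cronbach_alpha_sketch(perfect)       # 1
cronbach_alpha_sketch(uncorrelated)  # 0
```

The helper can be called on the same eight-column data frame passed to cronbach.alpha(); because of the assumed listwise deletion, the result is not guaranteed to reproduce 0.823 exactly.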

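# mean item score (NA if any of the 8 items is missing)
df$dep = rowSums(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]) / 8

# sum score; na.rm = TRUE adds up only the items that were answered
df$dep_sum = rowSums(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")], na.rm = TRUE)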
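The two scores treat missing answers differently: rowSums(...) / 8 is NA whenever one of the eight items is missing, while rowSums(..., na.rm = TRUE) sums only the answered items, so an incomplete respondent receives a lower total. With items coded 1-4, complete sums also run from 8 to 32, so the threshold dep_sum >= 9 used later for dep_binary lies one point above the minimum. If that cut-off is meant for the usual CES-D scoring with items coded 0-3 (total 0-24), which is an assumption about the intended scale, the items would need shifting by one first. A toy sketch with two made-up respondents:

```r
# Toy responses for two respondents on eight items coded 1-4
# (respondent r2 skipped the second item). Values are made up.
items <- rbind(
  r1 = c(1, 2, 3, 4, 1, 2, 3, 4),
  r2 = c(1, NA, 1, 1, 1, 1, 1, 1)
)

# Like `dep`: NA whenever any item is missing
mean_strict <- rowSums(items) / 8                 # r1 = 2.5, r2 = NA

# Like `dep_sum`: a missing item effectively counts as 0
sum_lenient <- rowSums(items, na.rm = TRUE)       # r1 = 20, r2 = 7

# Alternative (assumed 0-24 target range): shift 1-4 codes to 0-3,
# keeping incomplete respondents missing
sum_0_24 <- rowSums(items - 1)                    # r1 = 12, r2 = NA

# Alternative: average over the answered items only
mean_available <- rowMeans(items, na.rm = TRUE)   # r1 = 2.5, r2 = 1
```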

Overview of the distribution of answers on the depression indicators (ESS Round 11, all countries).

library(ltm)
library(likert)     # create basic Likert tables and plots
library(kableExtra) # create formatted tables 

vnames = c("fltdpr", "flteeff", "slprl", "wrhpp", "fltlnl", "enjlf", "fltsd", "cldgng")
likert_df = df[,vnames]

likert_table = likert(likert_df)$results
# numeric codes (1-4) of the raw items (not reverse-scored)
likert_numeric_df = as.data.frame(lapply(df[, vnames], as.numeric))
# append item means and the number of valid answers as new columns
likert_table$Mean = sapply(likert_numeric_df, mean, na.rm = TRUE)
likert_table$Count = sapply(likert_numeric_df, function(x) sum(!is.na(x)))

likert_table$Item <- c(
  fltdpr = "How much of the time … feel depressed?",
  flteeff = "… everything you did feel like an effort?",
  slprl   = "… was your sleep restless?",
  wrhpp   = "… did you feel happy?",
  fltlnl  = "… did you feel lonely?",
  enjlf   = "… did you enjoy life?",
  fltsd   = "… did you feel sad?",
  cldgng  = "… did you feel you could not get going?"
)


# round all percentage values to 1 decimal digit
likert_table[,2:5] = round(likert_table[,2:5],1)
# round means to 3 decimal digits (Mean is column 6; column 7 is Count)
likert_table$Mean = round(likert_table$Mean, 3)

# create formatted table
kable_styling(kable(likert_table,
                    caption = "Distribution of answers regarding depression indicators (ESS round 11, all countries)"
)
)

Distribution of answers regarding depression indicators (ESS round 11, all countries)
Item	None or almost none of the time	Some of the time	Most of the time	All or almost all of the time	Mean	Count
How much of the time … feel depressed?	64.9	29.1	4.6	1.5	1.425627	39981
… everything you did feel like an effort?	48.4	38.4	9.8	3.4	1.681515	39983
… was your sleep restless?	43.9	39.9	11.6	4.6	1.770123	40017
… did you feel happy?	4.0	23.5	48.9	23.6	2.920231	39890
… did you feel lonely?	68.1	24.3	5.3	2.3	1.417377	39983
… did you enjoy life?	5.3	24.8	44.8	25.0	2.895281	39878
… did you feel sad?	52.5	41.1	4.9	1.6	1.555214	39981
… did you feel you could not get going?	55.7	36.1	6.2	2.0	1.545546	39949
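The percentage columns show how answers spread over the four categories, and the Mean and Count columns use the raw numeric codes (1-4) before reverse-scoring, which is why wrhpp and enjlf have means near 2.9. A base-R sketch of the same per-item summary on a made-up item; `item_summary` is a hypothetical helper, and computing percentages among valid (non-missing) answers is assumed to match how likert() handles NA.

```r
# The four ESS answer labels, in scale order
lv <- c("None or almost none of the time", "Some of the time",
        "Most of the time", "All or almost all of the time")

# Made-up answers to one item, including a missing value
fltdpr_toy <- factor(c(lv[1], lv[1], lv[2], lv[3], NA), levels = lv)

item_summary <- function(x) {
  pct <- as.numeric(100 * prop.table(table(x)))  # table() skips NA by default
  names(pct) <- levels(x)
  c(round(pct, 1),
    Mean  = round(mean(as.numeric(x), na.rm = TRUE), 3),  # level codes 1-4
    Count = sum(!is.na(x)))
}

item_summary(fltdpr_toy)
```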

Subset data to Austria

table(df$cntry)

## 
##            Albania            Austria            Belgium           Bulgaria 
##                  0               2354               1594                  0 
##        Switzerland             Cyprus            Czechia            Germany 
##               1384                685                  0               2420 
##            Denmark            Estonia              Spain            Finland 
##                  0                  0               1844               1563 
##             France     United Kingdom            Georgia             Greece 
##               1771               1684                  0               2757 
##            Croatia            Hungary            Ireland             Israel 
##               1563               2118               2017                  0 
##            Iceland              Italy          Lithuania         Luxembourg 
##                842               2865               1365                  0 
##             Latvia         Montenegro    North Macedonia        Netherlands 
##                  0                  0                  0               1695 
##             Norway             Poland           Portugal            Romania 
##               1337               1442               1373                  0 
##             Serbia Russian Federation             Sweden           Slovenia 
##               1563                  0               1230               1248 
##           Slovakia             Turkey            Ukraine             Kosovo 
##               1442                  0                  0                  0

unique(df$cntry)

##  [1] Austria        Belgium        Switzerland    Cyprus         Germany       
##  [6] Spain          Finland        France         United Kingdom Greece        
## [11] Croatia        Hungary        Ireland        Iceland        Italy         
## [16] Lithuania      Netherlands    Norway         Poland         Portugal      
## [21] Serbia         Sweden         Slovenia       Slovakia      
## 40 Levels: Albania Austria Belgium Bulgaria Switzerland Cyprus ... Kosovo

# subset to Austria
df_Austria = df[df$cntry == "Austria", ]
nrow(df_Austria)

## [1] 2354

The Austrian sample consists of 2,354 respondents.
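Subsetting a factor keeps all of its original levels, which is why table(df$cntry) lists countries with zero respondents. A minimal sketch on a made-up country factor shows how droplevels() removes the empty levels after subsetting. If cntry contained missing values, df[df$cntry == "Austria", ] would also return all-NA rows, so subset() or which() would be the safer filter in that case.

```r
# Made-up country factor with an unused level, mimicking the 40-level ESS cntry
cntry <- factor(c("Austria", "Austria", "Belgium"),
                levels = c("Albania", "Austria", "Belgium"))
toy <- data.frame(cntry = cntry, dep = c(1.5, 2.0, 1.25))

toy_at <- toy[toy$cntry == "Austria", ]
nlevels(toy_at$cntry)   # 3: unused levels survive the subset
toy_at <- droplevels(toy_at)
nlevels(toy_at$cntry)   # 1
```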

Independent variables

Variable Recoding for ALCFREQ and DOMICIL: Multivariate Model

# alcfreq is a factor: map its labels to numeric codes 1 (every day) to 7 (never)
df_Austria$alcfreq_num = NA
df_Austria$alcfreq_num[df_Austria$alcfreq == "Every day"] = 1
df_Austria$alcfreq_num[df_Austria$alcfreq == "Several times a week"] = 2
df_Austria$alcfreq_num[df_Austria$alcfreq == "Once a week"] = 3
df_Austria$alcfreq_num[df_Austria$alcfreq == "2-3 times a month"] = 4
df_Austria$alcfreq_num[df_Austria$alcfreq == "Once a month"] = 5
df_Austria$alcfreq_num[df_Austria$alcfreq == "Less than once a month"] = 6
df_Austria$alcfreq_num[df_Austria$alcfreq == "Never"] = 7

# reverse the scale so that higher = more frequent drinking (1 = never, 7 = every day)
df_Austria$alcfreq_recoded = 8 - df_Austria$alcfreq_num
table(df_Austria$alcfreq_recoded)

## 
##   1   2   3   4   5   6   7 
## 531 231 152 375 380 511 171
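The seven assignment lines above can also be written as one lookup with base R's match(). A minimal sketch on made-up answers (`alc_levels` and `alcfreq_toy` are illustrative names); it assumes the labels are spelled exactly as in the assignments, and any other value, such as a hypothetical "Refusal", becomes NA, just as it stays NA in the original code.

```r
# Labels in the order used above (1 = "Every day" ... 7 = "Never")
alc_levels <- c("Every day", "Several times a week", "Once a week",
                "2-3 times a month", "Once a month",
                "Less than once a month", "Never")

# Made-up answers; unmatched values and missing answers become NA
alcfreq_toy <- c("Never", "Every day", "Once a week", "Refusal", NA)

alcfreq_num     <- match(alcfreq_toy, alc_levels)  # 7, 1, 3, NA, NA
alcfreq_recoded <- 8 - alcfreq_num                 # 1, 7, 5, NA, NA (7 = every day)
```

match() also accepts a factor, so the same call would work directly on df_Austria$alcfreq.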

# convert domicil to its numeric codes (1 = big city to 5 = farm or home in countryside)
df_Austria$domicil = as.numeric(df_Austria$domicil)
table(df_Austria$domicil)

## 
##   1   2   3   4   5 
## 590 173 632 872  86

# group the five domicile codes into three levels to test H4
# Urban = Level 1 + 2 (A big city + Suburbs or outskirts of big city)
# Suburban = Level 3 (Town or small city)
# Rural = Level 4 + 5 (Country village + Farm or home in countryside)
df_Austria$domicil_group = factor(NA, levels = c("Urban", "Suburban", "Rural"))

# Assign groups based on domicil levels
df_Austria$domicil_group[df_Austria$domicil %in% c(1, 2)] = "Urban"
df_Austria$domicil_group[df_Austria$domicil == 3] = "Suburban"
df_Austria$domicil_group[df_Austria$domicil %in% c(4, 5)] = "Rural"
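The same grouping can be expressed as a lookup vector indexed by the numeric domicil code; a sketch on made-up codes. Listing "Urban" first in levels makes it the reference category, which is why the regression output reports domicil_groupSuburban and domicil_groupRural relative to urban respondents.

```r
# Made-up codes: 1 big city, 2 suburbs, 3 town, 4 village, 5 farm/countryside
domicil_toy <- c(1, 2, 3, 4, 5, NA)

# Position i holds the group for code i; indexing with NA yields NA
group_of_code <- c("Urban", "Urban", "Suburban", "Rural", "Rural")
domicil_group <- factor(group_of_code[domicil_toy],
                        levels = c("Urban", "Suburban", "Rural"))
```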

Create binary indicators from eatveg, hincfel and wlespdm to keep the model output compact

# eatveg: 1 = "Twice a day", 0 = all other
df_Austria$eatveg_binary = ifelse(df_Austria$eatveg == "Twice a day", 1, 0)

# hincfel: 1 = "Very difficult on present income", 0 = others
df_Austria$hincfel_binary = ifelse(df_Austria$hincfel == "Very difficult on present income", 1, 0)

# wlespdm: 1 = "Always", 0 = others
df_Austria$wlespdm_binary = ifelse(df_Austria$wlespdm == "Always", 1, 0)
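Two properties of these indicators are worth keeping in mind: ifelse() keeps missing answers missing, and eatveg_binary is 1 only for "Twice a day", so the highest-frequency category (the reference level in the linear model below, whose label is not printed; "Three times or more a day" is used here only as a placeholder) is coded 0 together with the low-frequency categories. A sketch on made-up answers:

```r
# Made-up answers; the first label is a placeholder for the top category
eatveg_toy <- factor(c("Three times or more a day", "Twice a day", "Never", NA))

# ifelse(): 1 for "Twice a day", 0 for every other answer, NA stays NA
eatveg_binary <- ifelse(eatveg_toy == "Twice a day", 1, 0)    # 0, 1, 0, NA

# Equivalent integer indicator
eatveg_binary_int <- as.integer(eatveg_toy == "Twice a day")  # 0L, 1L, 0L, NA
```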

Linear Regression Model for Average Depression Score (Metric)

# Model 4: weighted linear regression of the mean depression score (dep)
model4 = lm(dep ~ alcfreq_recoded + eatveg + hincfel + domicil_group + wlespdm, data = df_Austria, weights = anweight)
summary(model4)

## 
## Call:
## lm(formula = dep ~ alcfreq_recoded + eatveg + hincfel + domicil_group + 
##     wlespdm, data = df_Austria, weights = anweight)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83574 -0.12839 -0.02423  0.11260  2.10102 
## 
## Coefficients:
##                                                          Estimate Std. Error
## (Intercept)                                              1.416574   0.095008
## alcfreq_recoded                                         -0.005004   0.004508
## eatvegTwice a day                                       -0.051553   0.054862
## eatvegOnce a day                                        -0.034415   0.052688
## eatvegLess than once a day but at least 4 times a week  -0.022375   0.054720
## eatvegLess than 4 times a week but at least once a week  0.032516   0.058839
## eatvegLess than once a week                              0.221997   0.131784
## eatvegNever                                              0.644169   0.541707
## hincfelCoping on present income                          0.052220   0.019516
## hincfelDifficult on present income                       0.314644   0.031891
## hincfelVery difficult on present income                  0.810762   0.058422
## domicil_groupSuburban                                   -0.018652   0.023519
## domicil_groupRural                                      -0.041669   0.020590
## wlespdmRarely                                            0.108020   0.086179
## wlespdmSometimes                                         0.138248   0.081121
## wlespdmOften                                             0.171953   0.080545
## wlespdmAlways                                            0.097088   0.084347
##                                                         t value Pr(>|t|)    
## (Intercept)                                              14.910  < 2e-16 ***
## alcfreq_recoded                                          -1.110  0.26706    
## eatvegTwice a day                                        -0.940  0.34748    
## eatvegOnce a day                                         -0.653  0.51371    
## eatvegLess than once a day but at least 4 times a week   -0.409  0.68266    
## eatvegLess than 4 times a week but at least once a week   0.553  0.58058    
## eatvegLess than once a week                               1.685  0.09222 .  
## eatvegNever                                               1.189  0.23451    
## hincfelCoping on present income                           2.676  0.00751 ** 
## hincfelDifficult on present income                        9.866  < 2e-16 ***
## hincfelVery difficult on present income                  13.878  < 2e-16 ***
## domicil_groupSuburban                                    -0.793  0.42783    
## domicil_groupRural                                       -2.024  0.04311 *  
## wlespdmRarely                                             1.253  0.21018    
## wlespdmSometimes                                          1.704  0.08848 .  
## wlespdmOften                                              2.135  0.03288 *  
## wlespdmAlways                                             1.151  0.24984    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2361 on 2199 degrees of freedom
##   (138 observations deleted due to missingness)
## Multiple R-squared:  0.127,  Adjusted R-squared:  0.1207 
## F-statistic:    20 on 16 and 2199 DF,  p-value: < 2.2e-16
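Passing weights = anweight makes lm() fit weighted least squares, minimising the sum of weight times squared residual. The sketch below checks on simulated data (the weights w are only a stand-in for anweight) that the coefficients equal the closed-form solution (X'WX)^(-1) X'Wy. Note that lm() treats weights as precision weights, so its standard errors are not design-based survey standard errors; packages such as survey (svyglm()) compute those, and they may differ from the values printed above.

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)
w <- runif(n, 0.5, 2)   # simulated stand-in for survey weights

fit <- lm(y ~ x, weights = w)

# Closed-form weighted least squares; w * X scales row i of X by w[i]
X <- cbind(1, x)
beta_wls <- solve(t(X) %*% (w * X), t(X) %*% (w * y))

cbind(lm = coef(fit), by_hand = drop(beta_wls))
```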

Logistic Regression Model for Clinically Significant Depression (Binary)

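The binary outcome dep_binary is coded 1 when the depression sum score (dep_sum) is 9 or higher and 0 otherwise. It is modelled with a weighted logistic regression that adds gender, years of education and self-rated health as controls.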

# binary outcome: 1 if the depression sum score is 9 or higher, else 0
df_Austria$dep_binary = ifelse(df_Austria$dep_sum >= 9, 1, 0)

# covariates: gender as a two-level factor (Male = reference), education and health as numeric
df_Austria$gndr = factor(df_Austria$gndr, labels = c("Male", "Female"))
df_Austria$eduyrs = as.numeric(df_Austria$eduyrs)  # years of full-time education
df_Austria$health = as.numeric(df_Austria$health)  # self-rated health (1 = very good to 5 = very bad)

# Logistic regression (weighted)

log_model_final = glm(dep_binary ~ alcfreq_recoded + eatveg_binary + hincfel_binary + domicil_group + wlespdm_binary + gndr + eduyrs + health,
                      data = df_Austria, family = binomial(), weights = anweight)
summary(log_model_final)

## 
## Call:
## glm(formula = dep_binary ~ alcfreq_recoded + eatveg_binary + 
##     hincfel_binary + domicil_group + wlespdm_binary + gndr + 
##     eduyrs + health, family = binomial(), data = df_Austria, 
##     weights = anweight)
## 
## Coefficients:
##                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -1.07061    0.88749  -1.206   0.2277    
## alcfreq_recoded        0.21197    0.08031   2.639   0.0083 ** 
## eatveg_binary          0.03677    0.38414   0.096   0.9237    
## hincfel_binary         0.37054    1.22139   0.303   0.7616    
## domicil_groupSuburban  0.79503    0.47552   1.672   0.0945 .  
## domicil_groupRural    -0.14195    0.34638  -0.410   0.6820    
## wlespdm_binary         0.53965    0.63627   0.848   0.3964    
## gndrFemale             0.15099    0.31181   0.484   0.6282    
## eduyrs                 0.05130    0.04469   1.148   0.2510    
## health                 1.16776    0.25072   4.658  3.2e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 374.09  on 2198  degrees of freedom
## Residual deviance: 331.23  on 2189  degrees of freedom
##   (155 observations deleted due to missingness)
## AIC: 189.2
## 
## Number of Fisher Scoring iterations: 6

exp(coef(log_model_final))

##           (Intercept)       alcfreq_recoded         eatveg_binary 
##             0.3428007             1.2361132             1.0374553 
##        hincfel_binary domicil_groupSuburban    domicil_groupRural 
##             1.4485233             2.2145171             0.8676669 
##        wlespdm_binary            gndrFemale                eduyrs 
##             1.7154036             1.1629863             1.0526403 
##                health 
##             3.2147675
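The odds ratios above are exp(coefficient); Wald 95% confidence intervals follow the same transformation, exp(estimate +/- 1.96 * SE). The sketch hardcodes two estimates and standard errors copied from the summary(log_model_final) output, so it runs without the data.

```r
# Estimates and standard errors copied from summary(log_model_final)
est <- c(alcfreq_recoded = 0.21197, health = 1.16776)
se  <- c(alcfreq_recoded = 0.08031, health = 0.25072)

z <- qnorm(0.975)  # about 1.96
odds_ratios <- cbind(
  OR    = exp(est),
  lower = exp(est - z * se),
  upper = exp(est + z * se)
)
round(odds_ratios, 3)
```

With the fitted model available, exp(confint.default(log_model_final)) gives the same Wald intervals for every coefficient.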

Results

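The output above displays the adjusted odds ratios (ORs) for being classified as “clinically depressed” (dep_binary = 1), obtained by exponentiating the logistic coefficients. The model includes:

- Alcohol frequency (alcfreq_recoded)
- Vegetable consumption, twice a day vs. all other answers (eatveg_binary)
- Household income, very difficult to live on vs. all other answers (hincfel_binary)
- Domicile, Suburban and Rural vs. Urban (domicil_group)
- Perceived gender inequality, “Always” vs. all other answers (wlespdm_binary)
- Gender (gndr)
- Years of education (eduyrs)
- Self-rated health (health)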

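Most important:

1. Alcohol frequency (alcfreq_recoded)
- Each one-step increase in drinking frequency (1 = never to 7 = every day) is associated with about 1.24 times the odds of meeting the depression cut-off (p = 0.008).

2. Self-rated health (health)
- Each one-step move toward poorer self-rated health (coded 1 = very good to 5 = very bad) is associated with about 3.21 times the odds (p < 0.001).

Not statistically significant:
- Vegetable consumption (eatveg_binary): respondents who eat vegetables twice a day have 1.04 times the odds of all other respondents (p = 0.92).
- Gender (gndr): females have 1.16 times the odds of males (p = 0.63).
- Among the remaining predictors, only suburban residence approaches significance (OR = 2.21, p = 0.09).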

Visualization of Variables

library(ggplot2)

Sample description

This bar chart displays the gender distribution within the Austrian subsample of the ESS11 dataset. It provides a basic demographic overview of the sample composition before the deeper analysis.

1. Gender distribution

ggplot(df_Austria, aes(x = gndr)) +
  geom_bar(fill = "steelblue") +
  labs(
    title = "Gender Distribution of Austrian Sample",
    x = "Gender",
    y = "Count of Respondents",
    caption = "ESS Round 11"
  ) +
  theme_minimal()

2. Domicile type

This boxplot visualizes the average depression scores by residential setting (Urban, Suburban, Rural). It helps test Hypothesis 4 (H4): individuals in rural areas are hypothesized to report higher depressive symptoms. Median and interquartile ranges reveal variability in depression across different living environments.

ggplot(df_Austria, aes(x = domicil_group, y = dep, fill = domicil_group)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "Depression Scores by Domicile Group",
    x = "Domicile Type",
    y = "Average Depression Score"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2") +
  theme(legend.position = "none")
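In each box the line marks the median and the box spans the 25th to 75th percentiles, so its height is the interquartile range. A base-R sketch computing those two numbers per group on simulated scores (the toy data frame is made up); on the real data the same code would take df_Austria$dep and df_Austria$domicil_group.

```r
set.seed(42)
toy <- data.frame(
  domicil_group = factor(rep(c("Urban", "Suburban", "Rural"), each = 20),
                         levels = c("Urban", "Suburban", "Rural")),
  dep = round(runif(60, 1, 4), 2)   # simulated depression scores
)

# Median (box middle) and IQR (box height) per group; split() keeps level order
box_stats <- do.call(rbind, lapply(split(toy$dep, toy$domicil_group), function(d) {
  c(median = median(d, na.rm = TRUE), IQR = IQR(d, na.rm = TRUE))
}))
box_stats
```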

3. Alcohol frequency

This boxplot illustrates how average depression scores vary across different levels of alcohol consumption frequency. This tests Hypothesis 1 (H1): more frequent alcohol use is associated with higher depressive symptoms. Boxplot spread and central tendency show whether more frequent drinkers exhibit elevated depression levels.

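# treat drinking frequency as discrete so that one box is drawn per level
ggplot(df_Austria[!is.na(df_Austria$alcfreq_recoded), ],
       aes(x = factor(alcfreq_recoded), y = dep)) +
  geom_boxplot(fill = "pink", alpha = 0.7, na.rm = TRUE) +
  labs(
    title = "Depression Scores by Drinking Frequency",
    x = "Drinking Frequency (1 = never, 7 = every day)",
    y = "Average Depression Score"
  ) +
  theme_minimal()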

Bivariate analysis

This stacked bar chart shows the share of each vegetable consumption frequency by gender, using the full ESS11 data (df, all countries) rather than the Austrian subsample. It supports exploratory analysis for Hypothesis 2 (H2) and of potential gender-related dietary patterns. Using relative proportions (position = “fill”) makes the comparison between genders easier.

Improve colors

library(scales)
ggplot(df[!is.na(df$eatveg),], aes(gndr)) + 
  geom_bar(aes(fill=eatveg), position = "fill", width=.6) +
  scale_y_continuous(labels = percent) +
  coord_flip() +
  scale_fill_manual(values = c("darkgreen", "lightgreen", "yellow", "orange", "lavender", "purple", "red")) +
  labs(title = "Vegetable consumption by gender",
       subtitle = "ESS round 11",
       x="Gender", 
       y = "",
       caption = "Radvile Karaleviciute") +
  theme_minimal()
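With position = “fill” every bar is rescaled to 100%, so each bar shows the distribution of eatveg within one gender. The same shares can be tabulated with prop.table() over rows; the vectors below are made up for illustration.

```r
gndr   <- factor(c("Male", "Male", "Male", "Female", "Female"))
eatveg <- factor(c("Once a day", "Twice a day", "Once a day", "Twice a day", "Never"))

# Row proportions: each gender's row sums to 1, as each filled bar sums to 100%
shares <- prop.table(table(gndr, eatveg), margin = 1)
round(100 * shares, 1)
```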

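Discussion: Financial hardship shows the strongest association with the continuous depression score, supporting H3: compared with respondents living comfortably on their present income, scores are 0.05 points higher among those coping, 0.31 among those finding it difficult and 0.81 among those finding it very difficult (p < 0.001). In the logistic model, however, the binary hardship indicator is not significant (OR = 1.45, p = 0.76). Alcohol frequency is significant only in the binary model (OR = 1.24, p = 0.008), so H1 is partly supported. Vegetable intake is not significant in either model (H2); the closest result is a marginal coefficient for eating vegetables less than once a week in the linear model (p = 0.09). H4 is not supported: after adjustment, rural residents report slightly lower, not higher, depression scores than urban residents (b = -0.04, p = 0.04). H5 receives at most weak support: only the “Often” category of wlespdm is significant in the linear model (b = 0.17, p = 0.03), and the binary indicator is not significant in the logistic model.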

Strengths include weighted analyses and reliability assessment. Limitations involve cross‐sectional design and self‐report biases. Future work could extend multilevel modeling across European countries.

Additional Model/Hypotheses

To expand our results, we propose a new hypothesis:

H6: The effect of alcohol consumption on depression differs by gender.

# gender as a two-level factor (Male = reference)
df_Austria$gndr = factor(df_Austria$gndr, labels = c("Male", "Female"))

# Model with interaction term
interaction_model <- lm(dep ~ alcfreq_recoded * gndr + eatveg + hincfel + domicil_group + wlespdm, data = df_Austria, weights = anweight)
summary(interaction_model)

## 
## Call:
## lm(formula = dep ~ alcfreq_recoded * gndr + eatveg + hincfel + 
##     domicil_group + wlespdm, data = df_Austria, weights = anweight)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.75775 -0.12954 -0.02642  0.10775  2.02547 
## 
## Coefficients:
##                                                          Estimate Std. Error
## (Intercept)                                              1.285904   0.097396
## alcfreq_recoded                                          0.015556   0.006604
## gndrFemale                                               0.204645   0.040524
## eatvegTwice a day                                       -0.043461   0.054481
## eatvegOnce a day                                        -0.029511   0.052297
## eatvegLess than once a day but at least 4 times a week  -0.013772   0.054338
## eatvegLess than 4 times a week but at least once a week  0.042508   0.058507
## eatvegLess than once a week                              0.262493   0.130961
## eatvegNever                                              0.637096   0.537596
## hincfelCoping on present income                          0.049238   0.019387
## hincfelDifficult on present income                       0.302214   0.031719
## hincfelVery difficult on present income                  0.811712   0.057996
## domicil_groupSuburban                                   -0.022810   0.023351
## domicil_groupRural                                      -0.057885   0.020612
## wlespdmRarely                                            0.114160   0.085530
## wlespdmSometimes                                         0.139415   0.080506
## wlespdmOften                                             0.163870   0.079953
## wlespdmAlways                                            0.093553   0.083707
## alcfreq_recoded:gndrFemale                              -0.027810   0.009200
##                                                         t value Pr(>|t|)    
## (Intercept)                                              13.203  < 2e-16 ***
## alcfreq_recoded                                           2.355  0.01859 *  
## gndrFemale                                                5.050 4.78e-07 ***
## eatvegTwice a day                                        -0.798  0.42511    
## eatvegOnce a day                                         -0.564  0.57261    
## eatvegLess than once a day but at least 4 times a week   -0.253  0.79995    
## eatvegLess than 4 times a week but at least once a week   0.727  0.46758    
## eatvegLess than once a week                               2.004  0.04515 *  
## eatvegNever                                               1.185  0.23611    
## hincfelCoping on present income                           2.540  0.01116 *  
## hincfelDifficult on present income                        9.528  < 2e-16 ***
## hincfelVery difficult on present income                  13.996  < 2e-16 ***
## domicil_groupSuburban                                    -0.977  0.32876    
## domicil_groupRural                                       -2.808  0.00502 ** 
## wlespdmRarely                                             1.335  0.18210    
## wlespdmSometimes                                          1.732  0.08346 .  
## wlespdmOften                                              2.050  0.04052 *  
## wlespdmAlways                                             1.118  0.26385    
## alcfreq_recoded:gndrFemale                               -3.023  0.00253 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2343 on 2197 degrees of freedom
##   (138 observations deleted due to missingness)
## Multiple R-squared:  0.141,  Adjusted R-squared:  0.134 
## F-statistic: 20.04 on 18 and 2197 DF,  p-value: < 2.2e-16

Visualization

ggplot(df_Austria, aes(x = alcfreq_recoded, y = dep, color = gndr)) +
  geom_smooth(method = "lm", se = TRUE) +
  labs(
    title = "Interaction Between Gender and Alcohol Frequency on Depression",
    x = "Alcohol Frequency (higher = more frequent)",
    y = "Depression Score",
    color = "Gender"
  ) +
  theme_minimal()

Interpretation

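Among men (the reference category), each one-step increase in drinking frequency is associated with a 0.016-point higher depression score (p = 0.019). The gndrFemale coefficient (0.205, p < 0.001) is the female-male gap at alcfreq_recoded = 0, a value outside the observed 1-7 range, so it does not describe a gap that holds regardless of alcohol use. The negative interaction term (-0.028, p = 0.003) means that the estimated alcohol slope for women is about 0.016 - 0.028 = -0.012, so the positive association between drinking frequency and depression found among men does not appear among women (the women's slope itself is not tested in this output). Equivalently, the gender gap in depression narrows as drinking frequency increases.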

The relationship between alcohol consumption and depression therefore differs by gender, which supports H6.
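The arithmetic behind this interpretation can be reproduced from the printed coefficients: the alcohol slope for women is the main effect plus the interaction, and the female-male gap at drinking level a is gndrFemale + interaction * a. A sketch with the estimates hardcoded from summary(interaction_model):

```r
# Coefficients copied from summary(interaction_model)
b_alc   <- 0.015556   # alcohol slope for men (reference group)
b_fem   <- 0.204645   # female-male gap when alcfreq_recoded = 0
b_inter <- -0.027810  # change in the alcohol slope for women

slope_men   <- b_alc
slope_women <- b_alc + b_inter   # -0.012254

# Female-male gap at each observed drinking level (1 = never, 7 = every day)
alc <- 1:7
gap <- b_fem + b_inter * alc     # about 0.177 at level 1, about 0.010 at level 7
round(data.frame(alcfreq_recoded = alc, gap = gap), 3)
```

To obtain a standard error and p-value for the women's slope directly, one option is to refit the model after relevel(df_Austria$gndr, ref = "Female"), so that the alcfreq_recoded coefficient becomes the slope for women.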

Final assignment

Radvile Karaleviciute, Jennifer Mandl

2025-06-20