Quantitative Research on Depression Determinants

library(foreign)  # provides read.spss()

setwd("/Users/macbook/Desktop/R")

df = read.spss("/Users/macbook/Desktop/R/ESS11.sav", to.data.frame = TRUE)

Introduction

Depression is a globally prevalent mental health condition and a leading contributor to disability, according to the World Health Organization (2017). Understanding the factors that influence depression is critical for informing evidence-based mental health interventions and social policy.

This study investigates depression as the primary outcome variable using a quantitative cross-sectional design. The analysis focuses on Hungary, drawing on data from the 11th round of the European Social Survey (ESS). It examines how social determinants (education, gender, self-reported health, internet use, and socializing frequency) relate to depressive symptoms, with the aim of identifying significant predictors and informing targeted interventions.

# Filtering dataset to only include responses from Hungary
df = df[df$cntry=="Hungary",]
sample_size = nrow(df)

The depression scale is built from eight items measuring how often participants experienced different emotional states over the past week:

1. fltdpr: Felt depressed, how often past week
2. flteeff: Felt everything did as effort, how often past week
3. slprl: Sleep was restless, how often past week
4. wrhpp: Were happy, how often past week
5. fltlnl: Felt lonely, how often past week
6. enjlf: Enjoyed life, how often past week
7. fltsd: Felt sad, how often past week
8. cldgng: Could not get going, how often past week

Method: Scale Construction and Reliability Analysis

• Variables used: d20 to d27
• Number of items: 8
• Sample size: 2,118
• Cronbach’s alpha: 0.845

A Cronbach’s alpha of 0.845 indicates strong internal consistency, suggesting that the eight items can reasonably be combined into a single depression score.
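Cronbach’s alpha compares the sum of the individual item variances with the variance of the summed scale: alpha = k / (k - 1) × (1 - sum of item variances / variance of the total score), where k is the number of items. A minimal base-R sketch of that formula follows. It is not necessarily how the reported value was obtained (the psych package, for example, offers an `alpha()` function), and it assumes listwise deletion of incomplete responses and that any positively worded items ("were happy", "enjoyed life") are already coded in the same direction as the others, so that higher values always mean more symptoms. The `toy` data are hypothetical.

```r
# Cronbach's alpha from the item variances and the variance of the total score.
# Assumes all items are numeric and coded in the same direction.
cronbach_alpha = function(items) {
  items = na.omit(as.data.frame(items))        # listwise deletion of incomplete rows
  k = ncol(items)
  item_vars = vapply(items, var, numeric(1))   # variance of each item
  total_var = var(rowSums(items))              # variance of the summed scale
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Hypothetical toy check: four identical items are perfectly consistent, so alpha is 1
toy = data.frame(a = c(1, 2, 3, 4, 2), b = c(1, 2, 3, 4, 2),
                 c = c(1, 2, 3, 4, 2), d = c(1, 2, 3, 4, 2))
cronbach_alpha(toy)

# An item coded in the opposite direction (5 - x flips a 1-4 scale) drags alpha down, here to 0
toy_unreversed = transform(toy, d = 5 - d)
cronbach_alpha(toy_unreversed)

# In the report this would be applied to the eight scale items, e.g.
# cronbach_alpha(df[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")])
```

The second toy call shows why item direction matters: a single item scored the "wrong" way round can erase the apparent consistency of an otherwise coherent scale.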

library(knitr)
library(kableExtra)

# Computing the average depression score (mean of the eight items; NA if any item is missing)
items = c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")
df$depression = rowMeans(df[, items])

# Summary statistics computed from the data rather than typed in by hand
dep = df$depression
summary_stats = data.frame(
  Statistic = c("Min", "1st Quartile", "Median", "Mean", "3rd Quartile", "Max", "Missing (NA)"),
  Value = unname(round(c(quantile(dep, c(0, 0.25, 0.5), na.rm = TRUE),
                         mean(dep, na.rm = TRUE),
                         quantile(dep, c(0.75, 1), na.rm = TRUE),
                         sum(is.na(dep))), 3))
)

# Create a styled HTML table
kable(summary_stats, format = "html", caption = "Summary Statistics for Depression Score") |>
  kable_styling(full_width = TRUE, bootstrap_options = c("striped", "hover"))

Summary Statistics for Depression Score
Statistic	Value
Min	1.000
1st Quartile	1.375
Median	1.750
Mean	1.807
3rd Quartile	2.125
Max	3.875
Missing (NA)	34.000

hist(df$depression, breaks = 8,
     main = "Distribution of Depression Scores",
     xlab = "Depression Score",
     col = "lightgray",
     border = "white")

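Figure 1 shows the distribution of depression scores. Scores cluster at the lower end of the observed range (1 to 3.875): the median is 1.75 and the mean (1.807) lies slightly above it, consistent with a right-skewed distribution in which most respondents report relatively low symptom levels and a smaller group reports much higher ones.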

Hypothesis

1. Higher educational levels are associated with lower depression scores.
2. Women report higher depression scores than men.
3. Individuals with better self-reported health report lower levels of depression.
4. Excessive internet use is associated with higher levels of depression.
5. Individuals who frequently socialize with friends, relatives, or colleagues are less likely to experience symptoms of depression than those who socialize less frequently.

Hypothesis 1

Higher educational levels are associated with lower depression scores.

# A one-way ANOVA was conducted to test differences in depression scores across three education levels (low, medium, high), using the computed depression scale as the dependent variable.

# Recoding "Highest level of education, ES - ISCED" into 3 groups - low, medium and high
df$edu = factor(NA, levels=c("low", "medium", "high"))

# Original values
kable(table(df$eisced),
      col.names = c("Education", "Frequency"),
      caption = "Frequency of Answers by Education Level")

Frequency of Answers by Education Level
Education	Frequency
Not possible to harmonise into ES-ISCED	0
ES-ISCED I , less than lower secondary	27
ES-ISCED II, lower secondary	377
ES-ISCED IIIb, lower tier upper secondary	623
ES-ISCED IIIa, upper tier upper secondary	679
ES-ISCED IV, advanced vocational, sub-degree	141
ES-ISCED V1, lower tertiary education, BA level	195
ES-ISCED V2, higher tertiary education, >= MA level	73
Other	0

df$edu[df$eisced == "ES-ISCED I , less than lower secondary"] = "low"
df$edu[df$eisced == "ES-ISCED II, lower secondary"] = "low"
df$edu[df$eisced == "ES-ISCED IIIb, lower tier upper secondary"] = "medium"
df$edu[df$eisced == "ES-ISCED IIIa, upper tier upper secondary"] = "medium"
df$edu[df$eisced == "ES-ISCED IV, advanced vocational, sub-degree"] = "high"
df$edu[df$eisced == "ES-ISCED V1, lower tertiary education, BA level"] = "high"
df$edu[df$eisced == "ES-ISCED V2, higher tertiary education, >= MA level"] = "high"

# As numeric
df$edunum = as.numeric(df$edu)

# Check
kable(table(df$edunum),
      col.names = c("Education", "Frequency"),
      caption = "Frequency of Answers by Education Level: Low (1), Medium (2), High (3)")

Frequency of Answers by Education Level: Low (1), Medium (2), High (3)
Education	Frequency
1	404
2	1302
3	409
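The seven assignments that build `edu` can also be written as a single named lookup vector, which keeps the ISCED-to-group mapping in one place. This is a sketch, not the original code; it assumes `eisced` holds the value labels shown in the education frequency table, and the short `eisced` vector below is hypothetical.

```r
# Mapping from ES-ISCED labels to three education groups
edu_map = c(
  "ES-ISCED I , less than lower secondary"             = "low",
  "ES-ISCED II, lower secondary"                        = "low",
  "ES-ISCED IIIb, lower tier upper secondary"           = "medium",
  "ES-ISCED IIIa, upper tier upper secondary"           = "medium",
  "ES-ISCED IV, advanced vocational, sub-degree"        = "high",
  "ES-ISCED V1, lower tertiary education, BA level"     = "high",
  "ES-ISCED V2, higher tertiary education, >= MA level" = "high"
)

# Hypothetical responses; labels not in the map (e.g. "Other") become NA
eisced = factor(c("ES-ISCED II, lower secondary",
                  "ES-ISCED V1, lower tertiary education, BA level",
                  "Other",
                  "ES-ISCED IIIa, upper tier upper secondary"))

# Look up each label, drop the lookup names, and fix the level order
edu = factor(unname(edu_map[as.character(eisced)]), levels = c("low", "medium", "high"))
edu

# In the report:
# df$edu = factor(unname(edu_map[as.character(df$eisced)]), levels = c("low", "medium", "high"))
```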

# An ANOVA was conducted to test for differences in depression scores across education levels. The model was statistically significant, F(2, 2078) = 73.41, p < 2e-16, indicating that mean depression levels varied significantly by educational category.

# Mean and standard deviation of the depression score for each education group
edu_means = aggregate(depression ~ edu, data = df,
                      FUN = function(x) c(mean = round(mean(x, na.rm = TRUE), 2),
                                          sd = round(sd(x, na.rm = TRUE), 2)))

edu_means_df = data.frame(Education = edu_means$edu,
                          Mean_Depression = edu_means$depression[, "mean"],
                          SD_Depression = edu_means$depression[, "sd"])

kable(edu_means_df, caption = "Mean Depression Scores by Education Level") |>
  kable_styling(full_width = TRUE)

Mean Depression Scores by Education Level
Education	Mean_Depression	SD_Depression
low	2.06	0.59
medium	1.78	0.51
high	1.64	0.44

# Mean depression scores decrease steadily from low to high education

Result

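The table shows the average depression scores across education levels. As expected, mean scores decline as education rises, from 2.06 in the low group to 1.78 in the medium group and 1.64 in the high group; together with the significant ANOVA, this supports Hypothesis 1. The standard deviation (SD) indicates how much responses varied within each group; it also declines with education (0.59, 0.51, 0.44), so depression scores are more homogeneous among the more highly educated.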
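The significance test for this hypothesis is reported in a comment but not shown as code. A minimal sketch of how a one-way ANOVA, followed by a Tukey HSD post-hoc test showing which pairs of education groups differ, could be run with base R's `aov()` and `TukeyHSD()`. The data frame `toy` is hypothetical simulated data so the block runs on its own; in the report, `data = df` would take its place, since `df` already holds `depression` and `edu`.

```r
# Hypothetical toy data standing in for df (depression score and 3-level education)
set.seed(1)
toy = data.frame(
  edu = factor(rep(c("low", "medium", "high"), each = 30),
               levels = c("low", "medium", "high")),
  depression = c(rnorm(30, 2.0, 0.5), rnorm(30, 1.8, 0.5), rnorm(30, 1.6, 0.5))
)

# One-way ANOVA: does mean depression differ across education groups?
edu_aov = aov(depression ~ edu, data = toy)
summary(edu_aov)

# Tukey HSD: pairwise group differences with family-wise adjusted p-values
edu_tukey = TukeyHSD(edu_aov, "edu")
edu_tukey
```

A significant omnibus F only shows that at least one group mean differs; the Tukey intervals indicate whether, for example, the medium and high groups also differ from each other.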

Hypothesis 2

Women report higher depression scores than men.

# A t-test (or an ANOVA, if more than two gender categories were present) was conducted to compare mean depression scores by gender.

# Computing and comparing mean depression score by gender
means_df = data.frame(
by(df$depression, df$gndr, mean, na.rm=T)
)
kable(means_df,
col.names = c("Gender","Average Depression Score"),
      caption = "Average Depression Score by Gender"
      )

Average Depression Score by Gender
Gender	Average Depression Score
Male	1.752427
Female	1.842361

Result

# Women show a higher mean depression score than men (1.84 vs. 1.75), consistent with the hypothesis.
kable(table(df$gndr),
      col.names = c("Gender","Frequency"),
      caption = "Frequency of answers by Gender"
      )

Frequency of answers by Gender
Gender	Frequency
Male	835
Female	1283

# Creating binary variable for gender: 1 for female, 0 for male
df$female = NA
df$female[df$gndr=="Male"] = 0
df$female[df$gndr=="Female"] = 1
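The gender comparison above reports group means only. A sketch of the independent-samples t-test mentioned for this hypothesis, using `t.test()` with a formula. By default R runs Welch's t-test (`var.equal = FALSE`), which does not assume equal variances; this matters most when group sizes are unequal, as here (835 men, 1,283 women). The `toy` data are hypothetical; in the report the call would be `t.test(depression ~ gndr, data = df)`.

```r
# Hypothetical toy data standing in for df (depression score by gender)
set.seed(2)
toy = data.frame(
  gndr = factor(rep(c("Male", "Female"), times = c(40, 60)),
                levels = c("Male", "Female")),
  depression = c(rnorm(40, 1.75, 0.5), rnorm(60, 1.85, 0.5))
)

# Welch two-sample t-test (var.equal = FALSE is R's default)
gender_t = t.test(depression ~ gndr, data = toy)
gender_t

# Group means and the 95% confidence interval for their difference (Male minus Female)
gender_t$estimate
gender_t$conf.int
```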

Hypothesis 3

Individuals who report ‘very good’ health will have significantly lower depression scores than those reporting ‘bad’ or ‘very bad’ health.

# Depression scores were compared across five levels of self-rated health using ANOVA.

# Computing mean depression scores for each health level
means_df = data.frame(
by(df$depression, df$health, mean, na.rm=T)
)
kable(means_df,
      col.names = c("Health Level", "Average Depression Score"),
      caption = "Average Depression Score by Self-Rated Health")

Average Depression Score by Self-Rated Health
Health Level	Average Depression Score
Very good	1.484863
Good	1.725671
Fair	2.009073
Bad	2.483275
Very bad	2.842105

# Check
kable(table(df$health),
      col.names = c("Health","Frequency"),
      caption = "Frequency of answers by Subjective Health Level"
      )

Frequency of answers by Subjective Health Level
Health	Frequency
Very good	521
Good	905
Fair	505
Bad	145
Very bad	40

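Result

Mean depression scores rise steadily as self-rated health worsens, from 1.48 among respondents in very good health to 2.48 and 2.84 among those in bad and very bad health, which supports the hypothesis. The "very bad" group is small (n = 40), so its mean is less precise than the others.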
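Because self-rated health is ordinal (very good to very bad), one way to summarize the monotonic pattern in a single statistic is a Spearman rank correlation between the health codes and the depression score, alongside the ANOVA mentioned for this hypothesis. This is a swapped-in technique, not one used in the original analysis. The toy data are hypothetical; in the report the call would use `as.numeric(df$health)`, which assumes the factor levels are stored in order from "Very good" (1) to "Very bad" (5).

```r
# Hypothetical toy data: health coded 1 (very good) to 5 (very bad)
set.seed(3)
health = sample(1:5, 200, replace = TRUE)
depression = 1.2 + 0.3 * health + rnorm(200, 0, 0.4)

# Spearman's rho; exact = FALSE because tied ranks rule out an exact p-value
health_rho = cor.test(health, depression, method = "spearman", exact = FALSE)
health_rho$estimate   # a positive rho means worse health goes with higher depression
health_rho$p.value
```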

Hypothesis 4

Excessive internet use is associated with higher levels of depression.

# ANOVA was used to assess differences in depression scores across internet use.

# Computing mean depression scores for different internet usage levels
means_df = data.frame(
by(df$depression, df$netusoft, mean, na.rm=T)
)
kable(means_df,
      col.names = c("Amount of Internet Use","Average depression score"),
      caption = "Average Depression Score by the amount of Internet Use"
      )

Average Depression Score by the amount of Internet Use
Amount of Internet Use	Average depression score
Never	2.214467
Only occasionally	2.010246
A few times a week	1.855132
Most days	1.784722
Every day	1.662448

# Check
netusoft_table = as.data.frame(table(df$netusoft))

kable(netusoft_table,
      col.names = c("Internet Use Frequency", "Number of Respondents"),
      caption = "Frequency of Responses by Internet Use Category") |>
  kable_styling(full_width = T)

Frequency of Responses by Internet Use Category
Internet Use Frequency	Number of Respondents
Never	406
Only occasionally	64
A few times a week	154
Most days	272
Every day	1219

Result

Contrary to the hypothesis, higher internet use was linked to lower depression scores, with the highest score among non-users (2.21) and the lowest among daily users (1.66). Hypothesis 4 is therefore not supported.
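The result above is descriptive. As a distribution-free complement to the ANOVA mentioned for this hypothesis, a Kruskal-Wallis test (base R's `kruskal.test()`) checks whether depression scores differ across the five internet-use groups without assuming normally distributed scores. It is a swapped-in technique, not part of the original analysis, and the `toy` data are hypothetical; in the report the call would be `kruskal.test(depression ~ netusoft, data = df)`.

```r
# Hypothetical toy data: five internet-use groups of unequal size
set.seed(4)
use_levels = c("Never", "Only occasionally", "A few times a week", "Most days", "Every day")
toy = data.frame(
  netusoft = factor(rep(use_levels, times = c(40, 10, 15, 25, 110)), levels = use_levels)
)
toy$depression = 2.2 - 0.12 * as.numeric(toy$netusoft) + rnorm(nrow(toy), 0, 0.5)

# Kruskal-Wallis rank-sum test across the five groups
net_kw = kruskal.test(depression ~ netusoft, data = toy)
net_kw

# Group medians, to see the direction of any difference
net_medians = tapply(toy$depression, toy$netusoft, median)
net_medians
```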

Hypothesis 5

Individuals who frequently socialize with friends, relatives, or colleagues are less likely to experience symptoms of depression than those who socialize less frequently.

# ANOVA was used to assess differences in depression scores across socialization frequency.

# Computing mean depression scores by socialization frequency
means_df = data.frame(
by(df$depression, df$sclmeet, mean, na.rm=T)
)

# Check
kable(table(df$sclmeet),
      col.names = c("Amount of Socializing","Number of Respondents"),
      caption = "Frequency of Answers by Amount of Socialisation"
      )

Frequency of Answers by Amount of Socialisation
Amount of Socializing	Number of Respondents
Never	159
Less than once a month	538
Once a month	386
Several times a month	534
Once a week	256
Several times a week	192
Every day	50

# Testing for differences in depression scores by socialization frequency
# Run ANOVA model
anova_result = aov(depression ~ sclmeet, data = df)

# Extract summary
anova_summary = summary(anova_result)

# Convert ANOVA output to a clean data frame
anova_table = as.data.frame(anova_summary[[1]])

# Round values for presentation; p-values below 0.0005 display as 0 (i.e. p < .001)
anova_table = round(anova_table, 3)

# Create styled table
kable(anova_table, caption = "ANOVA Summary: Depression by Socializing Frequency") |>
  kable_styling(full_width = T)

ANOVA Summary: Depression by Socializing Frequency
	Df	Sum Sq	Mean Sq	F value	Pr(>F)
sclmeet	6	76.342	12.724	52.454	0
Residuals	2074	503.091	0.243	NA	NA

## 37 observations deleted due to missingness

Result

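Socializing frequency is significantly related to depression scores (F(6, 2074) = 52.45, p < .001; the 0 in the table reflects rounding). The hypothesis can only be partially confirmed, however, because the group means show small discrepancies from the expected pattern of lower depression with more frequent socializing.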
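One way to make the "small discrepancies" concrete is to list the mean depression score for each socializing category in order and check whether each step up in frequency lowers the mean. In the sketch below, `group_means` holds illustrative values, not the survey results; in the report it would come from `tapply(df$depression, df$sclmeet, mean, na.rm = TRUE)`, assuming the `sclmeet` levels run from "Never" to "Every day".

```r
# Hypothetical (illustrative) mean depression scores, ordered from least to most socializing
group_means = c("Never" = 2.3, "Less than once a month" = 2.0, "Once a month" = 1.8,
                "Several times a month" = 1.7, "Once a week" = 1.72,
                "Several times a week" = 1.6, "Every day" = 1.65)

# Change in the mean from each category to the next more sociable one
steps = diff(group_means)

# A strictly supportive pattern would have every step negative
all(steps < 0)

# Categories whose mean is higher than that of the previous (less sociable) category
names(steps)[steps > 0]
```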

Anna Vongsaravanh | va9176@mci4me.at

2025-05-11
