Depression is a prevalent mental disorder: 4–10% of the global population experience it at some point in their lives (Chapman et al., 2022). Around 280 million people (3.8%) are currently affected worldwide (WHO, 2023), and in 2019 depression ranked among the leading contributors to the global health burden. It is therefore a highly relevant field of research.
The present paper investigates depression in a British population. Because 15–30% of individuals do not recover after two or more treatments (Chapman et al., 2022), a better understanding of potential contributing factors is crucial for improving recovery outcomes.
library(foreign)
library(ltm)
library(ggplot2)
library(likert)
library(kableExtra)
#setwd("h:/MCI/Lehre/09-AdvancedStatistics/ihsm24/data")
setwd("/Users/annarendez/Desktop/Master/Advanced Statistics/R-Data")
df = read.spss("ESS11.sav", to.data.frame = TRUE)
H1: The prevalence of depression increases with experiences of discrimination based on an individual’s sexuality (LGBQ+).
H2: The prevalence of depression increases with experiences of discrimination based on an individual’s skin colour or race.
H3: The prevalence of depression decreases with age (to be justified by the literature).
H4: The prevalence of depression is higher among females compared to males (to be justified by the literature).
The European Social Survey (ESS) is a dataset with over 50,000 questions covering wellbeing, social inequalities, immigration, and health. Within this dataset, there are direct questions on mental health status (D20–D27), which are used to calculate an individual’s depression score. The depression score ranges from 0 to 24, with scores from 0 to 8 indicating no or mild depressive symptoms and scores from 9 to 24 indicating severe clinical depression (University of Washington, n.d.).
df$d20 = as.numeric(df$fltdpr)
df$d21 = as.numeric(df$flteeff)
df$d22 = as.numeric(df$slprl)
df$d23 = as.numeric(df$wrhpp)
df$d24 = as.numeric(df$fltlnl)
df$d25 = as.numeric(df$enjlf)
df$d26 = as.numeric(df$fltsd)
df$d27 = as.numeric(df$cldgng)
# reverse scales of d23 and d25 (negative coding)
df$d23 = 5 - df$d23
df$d25 = 5 - df$d25
# lookup: existing country names in the dataframe (df)
#table(df$cntry)
# selected country: United Kingdom (UK hereafter)
# subset dataset: rows where cntry is "United Kingdom", all columns
# name it "df_uk" (dataset UK)
df_uk = df[df$cntry == "United Kingdom", ]
# check
#table(df_uk$cntry)
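The code that turns the eight items into the 0–24 depression score (`depres`) is not shown in this section. The following is a minimal sketch of one construction that matches the stated range. It assumes each item is coded 1–4, the two positively worded items are reversed, and every item is shifted to 0–3 before summing. The three toy respondents are hypothetical, and leaving the score missing when any item is missing is an assumption, not the authors' documented rule.

```r
# Toy data: three hypothetical respondents, eight items coded 1-4
items = data.frame(
  d20 = c(1, 2, 4), d21 = c(1, 3, 4), d22 = c(1, 2, 4), d23 = c(4, 3, 1),
  d24 = c(1, 1, 4), d25 = c(4, 2, 1), d26 = c(1, 2, 4), d27 = c(1, 2, 4)
)

# reverse the positively worded items (happy, enjoyed life)
items$d23 = 5 - items$d23
items$d25 = 5 - items$d25

# shift every item from 1-4 to 0-3 and sum across items: possible range 0-24
# (rowSums keeps NA if any item is missing; this is an assumption)
score = unname(rowSums(items - 1))

# dichotomise at the cut-off used in this report (9 or higher = severe)
severe = ifelse(score >= 9, 1, 0)
```

In this sketch the first respondent reports no symptoms on any item, the third reports the maximum on every item, and the second lands exactly on the cut-off.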
To assess the internal consistency of the eight depression items, we calculated Cronbach’s alpha (α = 0.84). This value indicates high internal consistency, suggesting that the items form a reliable scale for the depression score.
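The call that produced the alpha value is not shown here; the loaded `ltm` package provides `cronbach.alpha()` for this purpose. As an illustration of what the statistic measures, the sketch below computes it by hand in base R from the item variances and the variance of the sum score. The function name `cronbach_alpha_sketch` and the toy data are hypothetical.

```r
# Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of sum score)
cronbach_alpha_sketch = function(items) {
  items = items[complete.cases(items), , drop = FALSE]  # listwise deletion
  k = ncol(items)
  item_var = sum(apply(items, 2, var))
  total_var = var(rowSums(items))
  (k / (k - 1)) * (1 - item_var / total_var)
}

# toy items that move together only roughly
toy = data.frame(a = c(1, 2, 3, 4, 2), b = c(1, 2, 3, 4, 3), c = c(2, 2, 3, 4, 1))
alpha_toy = cronbach_alpha_sketch(toy)

# sanity check: perfectly consistent items give alpha = 1
identical_items = data.frame(a = 1:5, b = 1:5, c = 1:5)
alpha_identical = cronbach_alpha_sketch(identical_items)
```

The closer the items track one another, the larger the share of the sum-score variance that comes from their covariances, and the closer alpha gets to 1.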
#Gender all Data set not just U.K
#table(df$gndr)
#Visualisation
#ggplot(df, aes(x = gndr)) +
#geom_bar(fill="steelblue")+
#labs(title = "Gender Distribution",
# x = "Gender",
#y = "Count") +
# theme_minimal()
# one colour for all bars, or different colours: #ggplot(df, aes(x = gndr, fill = gndr)) + scale_fill_manual(values = c("steelblue", "pink"))+ geom_bar()
#Likert Scale
# shows the overall distribution of the depression items across all countries in the dataset
# could be used to compare depression scores: where does the U.K. lie, above or below the average?
vnames = c("fltdpr", "flteeff", "slprl","wrhpp", "fltlnl", "enjlf", "fltsd","cldgng")
likert_numeric_df = as.data.frame(lapply((df[,vnames]), as.numeric))
likert_table = likert(df[,vnames])$results
likert_table$Mean = unlist(lapply((likert_numeric_df[,vnames]), mean, na.rm=T))
# ... and append new columns to the data frame
likert_table$Count = unlist(lapply((likert_numeric_df[,vnames]), function (x) sum(!is.na(x))))
likert_table$Item = c(
d20="how much of the time during the past week you felt depressed?",
d21="…you felt that everything you did was an effort?",
d22="…your sleep was restless?",
d23="…you were happy?",
d24="…you felt lonely?",
d25="…you enjoyed life?",
d26="…you felt sad?",
d27="…you could not get going?")
#likert_table
# round all percentage values to 1 decimal digit
#likert_table[,2:5] = round(likert_table[,2:5],1)
# round means to 3 decimal digits
#likert_table[,6] = round(likert_table[,6],3)
# create formatted table
#kable_styling(kable(likert_table,
#format="html",
#caption = "Distribution of answers regarding mental health items (ESS round 11, all countries, in %))"))
# create basic plot (code also valid)
#plot(likert(summary=likert_table[,1:5])) # limit to columns 1:5 to skip mean and count
The following visualizations present the distribution of age, gender, and depression scores to provide a clearer understanding of the sociodemographic characteristics in the dataset.
library(kableExtra)
library(knitr)
# check further (frequency table)
#table(df_uk$depres)
table_dep=data.frame(table(df_uk$depres))
#kable(table_dep,
#col.names = c("Depression Score","Frequency"),
#caption = "Frequency Distribution of Depressionscores in the UK")
#kable_styling(
#kable(table_dep,
#col.names = c("Depression Score","Frequency"),
#caption = "Frequency Distribution of Depressionscores in the UK"
#)
#,full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed"))
#Demographic Data
scroll_box(
kable_styling(
kable(data.frame(table(df_uk$agea)), col.names = c("Age","Frequency"),
caption = "Distribution of Age in the Data of UK"
),full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed")),height="300px")
| Age | Frequency |
|---|---|
| 15 | 5 |
| 16 | 8 |
| 17 | 9 |
| 18 | 6 |
| 19 | 7 |
| 20 | 10 |
| 21 | 12 |
| 22 | 11 |
| 23 | 10 |
| 24 | 19 |
| 25 | 18 |
| 26 | 26 |
| 27 | 15 |
| 28 | 16 |
| 29 | 20 |
| 30 | 25 |
| 31 | 19 |
| 32 | 32 |
| 33 | 34 |
| 34 | 30 |
| 35 | 22 |
| 36 | 40 |
| 37 | 24 |
| 38 | 37 |
| 39 | 19 |
| 40 | 20 |
| 41 | 27 |
| 42 | 16 |
| 43 | 28 |
| 44 | 29 |
| 45 | 22 |
| 46 | 21 |
| 47 | 29 |
| 48 | 37 |
| 49 | 20 |
| 50 | 27 |
| 51 | 22 |
| 52 | 17 |
| 53 | 27 |
| 54 | 20 |
| 55 | 24 |
| 56 | 20 |
| 57 | 24 |
| 58 | 25 |
| 59 | 26 |
| 60 | 31 |
| 61 | 31 |
| 62 | 25 |
| 63 | 25 |
| 64 | 26 |
| 65 | 21 |
| 66 | 29 |
| 67 | 31 |
| 68 | 33 |
| 69 | 23 |
| 70 | 36 |
| 71 | 24 |
| 72 | 32 |
| 73 | 27 |
| 74 | 23 |
| 75 | 26 |
| 76 | 27 |
| 77 | 22 |
| 78 | 18 |
| 79 | 28 |
| 80 | 31 |
| 81 | 21 |
| 82 | 18 |
| 83 | 13 |
| 84 | 14 |
| 85 | 9 |
| 86 | 10 |
| 87 | 5 |
| 88 | 10 |
| 89 | 7 |
| 90 | 16 |
#Distribution of Gender
scroll_box(
kable_styling(
kable(data.frame(table(df_uk$gndr)), col.names = c("Gender","Frequency"),
caption = "Distribution of Gender in the Data of UK"
),full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed")),height="300px")
| Gender | Frequency |
|---|---|
| Male | 824 |
| Female | 860 |
#Distribution of depression score: 0-8 = non-severe, 9-24 = severe
scroll_box(
kable_styling(
kable(table_dep, col.names = c("Depression Score","Frequency"),
caption = "Frequency Distribution of Depressionscores in the UK"
),full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed")),height="300px")
| Depression Score | Frequency |
|---|---|
| 0 | 103 |
| 1 | 98 |
| 2 | 172 |
| 3 | 201 |
| 4 | 167 |
| 5 | 158 |
| 6 | 146 |
| 7 | 144 |
| 8 | 94 |
| 9 | 78 |
| 10 | 55 |
| 11 | 43 |
| 12 | 34 |
| 13 | 35 |
| 14 | 27 |
| 15 | 14 |
| 16 | 15 |
| 17 | 11 |
| 18 | 11 |
| 19 | 9 |
| 20 | 9 |
| 21 | 1 |
| 22 | 3 |
| 23 | 3 |
| 24 | 4 |
The table above presents the frequency distribution of all depression scores in the U.K. dataset. Scores from 0 to 8 are associated with no or very mild depressive symptoms, whereas scores from 9 to 24 are associated with clinically severe depression. The bar chart below shows the frequencies of these two categories: non-severe depression (0–8) and severe depression (9–24). Of the 1,635 respondents with a valid score, 1,283 scored between 0 and 8, while 352 (21.5%) fell into the category of severe depression.
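The two category totals can be reproduced directly from the frequency table. The sketch below re-enters the published frequencies by hand (it does not read the ESS file) and derives the counts, the share, and the odds of severe depression; the variable names are illustrative.

```r
# frequencies of depression scores 0-24, copied from the table in this report
freq = c(103, 98, 172, 201, 167, 158, 146, 144, 94,
         78, 55, 43, 34, 35, 27, 14, 15, 11, 11, 9, 9, 1, 3, 3, 4)
names(freq) = 0:24

n_non_severe = sum(freq[as.character(0:8)])   # scores 0-8
n_severe = sum(freq[as.character(9:24)])      # scores 9-24

share_severe = n_severe / sum(freq)           # proportion of valid scores
odds_severe = n_severe / n_non_severe         # odds, not an odds ratio
```

Note that the ratio of the two counts is an odds; an odds ratio only arises when the odds of two groups are compared.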
depression_table_uk = table(df_uk$depres)
#depression_table_uk
# flag respondents with a depression score of 9 or higher (1 = severe, 0 = non-severe)
df_uk$dep=ifelse(df_uk$depres >= 9,1,0)
#df_uk$dep
#table(df_uk$dep)
# bar chart: severe and non-severe depression
# label the categories
df_uk$dep=factor(df_uk$dep, levels = c(0,1),
labels = c("Non-severe depression", "Severe depression"))
# show the category counts in the bar chart
ggplot(df_uk, aes(x = dep)) +
geom_bar(fill = "steelblue") +
geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
labs(title = "Depression Severity",
x = "Depression category",
y = "Number of participants") +
theme_minimal()
# odds of severe depression: respondents with a score of 9-24 relative to those with a score of 0-8
# respondents with a depression score of 0-8: 1283
# respondents with a depression score of 9-24: 352
# odds: 352/1283 = 0.274 --> severe depression is less likely than non-severe depression
aModel = glm(dep ~ gndr, data=df_uk, family=binomial)
# Show summary of regression model
summary(aModel)
##
## Call:
## glm(formula = dep ~ gndr, family = binomial, data = df_uk)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.45815 0.09035 -16.139 <2e-16 ***
## gndrFemale 0.30941 0.12131 2.551 0.0108 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1703.3 on 1634 degrees of freedom
## Residual deviance: 1696.7 on 1633 degrees of freedom
## (49 observations deleted due to missingness)
## AIC: 1700.7
##
## Number of Fisher Scoring iterations: 4
coef(aModel)
## (Intercept) gndrFemale
## -1.4581529 0.3094088
# Interpretation:
#Calculating odds Ratio
exp(coef(aModel))
## (Intercept) gndrFemale
## 0.2326656 1.3626193
# Calculate Confidence Intervals for ORs
exp(confint(aModel))
## 2.5 % 97.5 %
## (Intercept) 0.194246 0.2768727
## gndrFemale 1.075026 1.7300513
#coef(aModel) gives the raw coefficients from your model (log-odds if logistic regression).
#exp(coef(aModel)) converts each coefficient into an odds ratio.
#exp(confint(aModel)) converts the interval bounds from log-odds to odds ratios.
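For a single binary predictor, the exponentiated logistic regression coefficient is the same quantity as the cross-product odds ratio of the 2×2 table. The sketch below shows this on hypothetical counts (100 men, 100 women); these are not the ESS counts, which are not broken down by gender in this section.

```r
# hypothetical data: 20 of 100 men and 30 of 100 women with severe depression
toy = data.frame(
  gndr = factor(rep(c("Male", "Female"), times = c(100, 100)),
                levels = c("Male", "Female")),
  dep = c(rep(c(0, 1), times = c(80, 20)), rep(c(0, 1), times = c(70, 30)))
)

# odds ratio from the 2x2 table: odds among women divided by odds among men
tab = table(toy$gndr, toy$dep)
or_table = (tab["Female", "1"] / tab["Female", "0"]) /
  (tab["Male", "1"] / tab["Male", "0"])

# odds ratio from the logistic regression coefficient
fit = glm(dep ~ gndr, data = toy, family = binomial)
or_glm = unname(exp(coef(fit))["gndrFemale"])
```

Both routes give the same value up to numerical tolerance, which is why the coefficient for `gndrFemale` can be read as the odds of women relative to men.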
# Multivariate logistic regression
# create age groups
# what type is age?
str(df_uk$agea)
## Factor w/ 76 levels "15","16","17",..: 1 52 74 48 42 56 76 49 19 27 ...
# --> stored as a factor (labels), not numeric
# convert age to numeric
df_uk$age <- as.numeric(as.character(df_uk$agea))
# build age groups
df_uk$age_group <- cut(
df_uk$age,
breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
labels = c("0-9","10-19","20-29","30-39","40-49",
"50-59","60-69","70-79","80-89","90+"),
right = FALSE
)
# check
table(df_uk$age_group)
##
## 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 90+
## 0 35 157 282 249 232 275 263 138 16
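With `right = FALSE`, `cut()` builds intervals that are closed on the left and open on the right, so an age of exactly 20 falls into "20-29" rather than "10-19". A small sketch with hypothetical ages illustrates the boundary behaviour of the same breaks and labels:

```r
age_labels = c("0-9", "10-19", "20-29", "30-39", "40-49",
               "50-59", "60-69", "70-79", "80-89", "90+")
ages = c(15, 19, 20, 29, 90)  # hypothetical values on and around the boundaries

age_group = cut(ages, breaks = seq(0, 100, by = 10),
                labels = age_labels, right = FALSE)
groups = as.character(age_group)

# an age of exactly 100 lies outside [90, 100) and becomes NA
# unless include.lowest = TRUE is set
at_upper_break = cut(100, breaks = seq(0, 100, by = 10),
                     labels = age_labels, right = FALSE)
```

In the U.K. data the oldest recorded age is 90, so the open upper boundary has no effect there.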
# note: without family = binomial, glm() fits a Gaussian (linear) model of the raw
# depression score; the coefficients are differences in score points relative to
# males and to the 10-19 age group, and exp() of them is not an odds ratio
aModel_multi_cat =glm(depres ~ gndr + age_group,
data = df_uk)
# coefficients, exponentiated coefficients, CI
coef(aModel_multi_cat)
## (Intercept) gndrFemale age_group20-29 age_group30-39 age_group40-49
## 4.7521123 0.6722459 1.3834450 0.5989830 0.7372801
## age_group50-59 age_group60-69 age_group70-79 age_group80-89 age_group90+
## 1.5972234 0.4029606 0.4053788 0.3252356 0.4226898
exp(coef(aModel_multi_cat))
## (Intercept) gndrFemale age_group20-29 age_group30-39 age_group40-49
## 115.828694 1.958631 3.988619 1.820267 2.090242
## age_group50-59 age_group60-69 age_group70-79 age_group80-89 age_group90+
## 4.939299 1.496248 1.499870 1.384357 1.526061
exp(confint(aModel_multi_cat))
## 2.5 % 97.5 %
## (Intercept) 26.2667919 510.769889
## gndrFemale 1.2767516 3.004685
## age_group20-29 0.7874765 20.202610
## age_group30-39 0.3839934 8.628717
## age_group40-49 0.4348195 10.048109
## age_group50-59 1.0244261 23.814968
## age_group60-69 0.3152145 7.102332
## age_group70-79 0.3140619 7.162955
## age_group80-89 0.2671143 7.174620
## age_group90+ 0.1074058 21.682829
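The call above passes no `family` argument, and `glm()` then defaults to a Gaussian model. The sketch below, on hypothetical scores, shows that in this case the coefficient of a binary predictor is simply the difference in group means, so it is read in score points rather than as a log-odds.

```r
# hypothetical depression scores for three men and three women
toy = data.frame(
  score = c(2, 4, 6, 5, 7, 9),
  gndr = factor(rep(c("Male", "Female"), each = 3), levels = c("Male", "Female"))
)

fit_default = glm(score ~ gndr, data = toy)   # no family given
family_used = family(fit_default)$family      # family actually used by glm()

coef_female = unname(coef(fit_default)["gndrFemale"])
mean_diff = mean(toy$score[toy$gndr == "Female"]) -
  mean(toy$score[toy$gndr == "Male"])
```

To obtain odds ratios for severe depression by age group, one option would be to model the dichotomised outcome with `family = binomial`; the results of such a model are not part of this report.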
# refit the model with the age groups
aModel_multi_cat = glm(depres ~ gndr + age_group ,data = df_uk)
# coefficients
coefs =coef(aModel_multi_cat)
# 95% confidence intervals
ci =confint(aModel_multi_cat)
# combine into one data frame for the forest plot
df_or =data.frame(
term = names(coefs),
OR = exp(coefs),
OR_lower = exp(ci[,1]),
OR_upper = exp(ci[,2])
)
# remove the intercept
df_or =df_or[df_or$term != "(Intercept)", ]
# optional: shorten the labels
df_or$term =gsub("age_group", "", df_or$term)
df_or$term =gsub("gndr", "", df_or$term)
df_or$term = gsub("Female", "F", df_or$term)
library(ggplot2)
ggplot(df_or, aes(x = term, y = OR)) +
geom_point(size = 3) +
geom_errorbar(aes(ymin = OR_lower, ymax = OR_upper), width = 0.2) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
coord_flip() + # horizontal layout
ylab("Odds Ratio (95% CI)") +
xlab("") +
ggtitle("Odds Ratios for Gender and Age (without Intercept)") +
theme_minimal() +
theme(axis.text.y = element_text(size = 10),
axis.text.x = element_text(size = 10),
plot.margin = margin(5, 5, 5, 10))
# Gender, Age, and Depression
In the bivariate logistic regression model, differences in high depression scores between men and women were examined. The results indicate that females have 1.36 times the odds of a high depression score compared to males (95% CI: 1.08–1.73).
Regarding age groups, individuals aged 50–59 showed the largest estimate relative to the reference group (ages 10–19). However, this model was fitted to the raw depression score without family = binomial, so its exponentiated coefficients are not odds ratios for severe depression. In addition, all age groups had wide confidence intervals, indicating that these results are not statistically robust.
The Spearman correlation coefficient between depression scores and age is -0.04 (p = 0.096), indicating a very weak and non-significant negative correlation. As age increases, depression scores tend to decrease slightly (and vice versa). This suggests that there is almost no meaningful relationship between age and depression within the U.K. dataset (see scatter plot). In this context, age appears to have little to no impact on depression scores.
# hypothesis 3: prevalence of depression decreases with age (UK)
#table(df_uk$agea) # just to check first (not meaningful): youngest 15y, oldest 90y
# convert "agea" (age) into numeric
df_uk$age = as.numeric(as.character(df_uk[,"agea"]))
# check: scatter plot (visual inspection)
#plot(df_uk$age, df_uk$depres, main = "Scatter Plot: Age, Depression" , xlab = "Age", ylab = "Depression")
# it would be nice to show percentages here instead of frequencies
# age by dscrrce
#hist(df_uk$age[df_uk$dscrrce == "Marked"], breaks = 12, main = "Histogram: Marked", xlab = "Age", col = "steelblue")
#hist(df_uk$age[df_uk$dscrrce == "Not marked"], breaks = 12, main = "Histogram: Not Marked", xlab = "Age", col = "steelblue")
# age by dscrsex
#hist(df_uk$age[df_uk$dscrsex == "Marked"], breaks = 12, main = "Histogram: Marked", xlab = "Age", col = "steelblue")
#hist(df_uk$age[df_uk$dscrsex == "Not marked"], breaks = 12, main = "Histogram: Not Marked", xlab = "Age", col = "steelblue")
library(ggplot2)
# all age values as a factor
#ages = sort(unique(df_uk$age))
#ggplot(df_uk, aes(x = factor(age), y = depres)) +
#geom_col(fill = "steelblue") +
#scale_x_discrete(
#breaks = ages[seq(1, length(ages), by = 5)] # show only every 5th age value
#) +
#labs(
# title = "Depression score by age",
# x = "Age",
# y = "Depression score"
#) +
# theme_minimal() +
#theme(axis.text.x = element_text(angle = 45, hjust = 1))
# scatter plot shows no linear pattern, so Pearson's product-moment correlation is not appropriate; assumption: no relationship between both variables.
# use Spearman correlation
# is there a statistically significant association between the two metric variables "depression" and "age"?
# and how strong is it? effect size measure: Spearman's rho
#cor(df_uk[, c("depres", "age")], method = "spearman", use = "complete.obs")
# interpretation:
# spearman's correlation coefficient between depression and age is -0.04 (very weak negative correlation).
# as age increases, depression score tends to decrease (and vice versa).
# the direction is in line with H3. However:
# correlation coefficient of -.04 is very close to 0; indicates a very weak relationship between depression and age, almost none (see also scatterplot).
# in the context of this dataset for the UK, age has little to no meaningful impact on depression scores.
# does a statistically significant relationship exist between the two variables?
# store in variable "pvalue"
pvalue = cor.test(df_uk$depres, df_uk$age, method = "spearman")
pvalue # print p-value
##
## Spearman's rank correlation rho
##
## data: df_uk$depres and df_uk$age
## S = 717728947, p-value = 0.09598
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## -0.04156594
# interpretation:
# p-value = .096 and is > .05 (set significance level)
# meaning the correlation is not statistically significant
# strength and direction of the relationship (not meaningful because not statistically significant)
# just for curiosity: rₛ (Spearman’s rho): -.04
# very low effect size, almost nonexistent
# H3 (see above) is not supported; H0 (no relationship) cannot be rejected on the basis of the sample data
# H3 done.
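Spearman's rho is the Pearson correlation computed on the ranks of the two variables, which is why it does not require a linear relationship. A short sketch on hypothetical ages and scores (not the ESS data) shows the equivalence:

```r
# hypothetical respondents, no ties
age = c(23, 35, 47, 52, 61, 70)
score = c(7, 9, 4, 6, 5, 3)

# built-in Spearman correlation
rho_builtin = cor(age, score, method = "spearman")

# same value from Pearson's correlation on the ranks
rho_ranks = cor(rank(age), rank(score))
```

Because only the ordering of the values enters the calculation, a few extreme depression scores influence rho far less than they would influence Pearson's r.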
The following chart illustrates the distribution of perceived discrimination based on sexuality within the U.K. dataset. “Marked” indicates a feeling of discrimination (n = 38), while “Not marked” indicates no perceived discrimination (n = 1,646).
# distribution of "Marked" and "Not marked" (Marked = perceived discrimination based on sexuality, Not marked = none perceived)
#table(df_uk$dscrsex)
ggplot(df_uk, aes(x = dscrsex)) +
geom_bar(fill = "steelblue") +
geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5)+
scale_x_discrete(labels = c("Marked" = "Marked", "Not marked" = "Not marked")) +
labs(title = "", x = "Perceived discrimination sexuality", y = "Count") +
theme_minimal()
# Depression Score and Sexuality
The following histogram visualizes the distribution of depression scores. Individuals who perceive discrimination based on their sexuality (“marked”) in the U.K. have an average depression score of 7.4, compared to an average score of 5.8 among individuals who do not perceive discrimination (“not marked”). However, it should be noted that the distribution of scores among the “not marked” group is wider (0–24) than that of the “marked” group (1–18), which should be taken into consideration when interpreting the data.
by(df_uk$depres, df_uk$dscrsex, mean, na.rm=T)
## df_uk$dscrsex: Not marked
## [1] 5.798998
## ------------------------------------------------------------
## df_uk$dscrsex: Marked
## [1] 7.421053
# mean depression score for two groups (Not marked, Marked)
# histogram for "not marked" group
#hist(df_uk$depres[df_uk$dscrsex == "Not marked"], breaks = 12, main = "Histogram: Not marked",
# xlab = "Depression Score",
# col = "steelblue")
# histogram for "Marked" group
#hist(df_uk$depres[df_uk$dscrsex == "Marked"], breaks = 12, main ="Histogram: Marked",
# xlab = "Depression score",
# col = "steelblue")
# histograms: probably no normal distribution of the data
# use Wilcoxon-test (rank based)
# visualisation of both groups
# base R only: no pipes, no <-, no percent_format()
library(ggplot2)
# optional: remove NAs (otherwise categories are missing in the plot)
df_sub = df_uk[!is.na(df_uk$dscrsex) & !is.na(df_uk$depres), ]
# 1) count: how many respondents per group (dscrsex) and score (depres)
counts = as.data.frame(table(dscrsex = df_sub$dscrsex,
depres = df_sub$depres))
names(counts)[names(counts) == "Freq"] = "n"
# 2) total per group
totals = aggregate(n ~ dscrsex, data = counts, FUN = sum)
names(totals)[names(totals) == "n"] = "total"
# 3) merge and compute the share within each group
df_plot = merge(counts, totals, by = "dscrsex")
df_plot$pct = df_plot$n / df_plot$total
# (optional) sort the depression scores
df_plot$depres = factor(df_plot$depres, levels = sort(unique(df_plot$depres)))
# 4) plot: one facet per group, y-axis in %
ggplot(df_plot, aes(x = depres, y = pct)) +
geom_col(width = 0.6, fill = "steelblue") +
facet_wrap(vars(dscrsex)) +
scale_y_continuous(labels = function(x) paste0(round(x * 100, 1), " %")) +
labs(title = "Depression Score by Perceived Discrimination (Sexuality)", x = "Depression score", y = "Share of group")
#table(df_uk$depres)
# Marked = perceived discrimination based on sexuality, Not marked = none perceived
library(ggplot2)
#Boxplot
#ggplot(df_uk, aes(x = dscrsex, y = depres,)) +
#geom_boxplot(fill="steelblue",alpha = 0.7) +
#scale_x_discrete(labels = c("Not marked" = "Not marked", "Marked" = "Marked")) +
#labs(title = "Depression score and perceived discrimination (sexuality)",
#x = "Perceived discrimination (sexuality)",
#y = "Depression score") +
#theme_minimal() +
#theme(legend.position = "none")
#wilcox.test(depres ~ dscrsex, data=df_uk)
by(df_uk$depres, df_uk$dscrsex, summary, na.rm=T) # meaningful for interpretation (MEDIAN). 49 NA's
## df_uk$dscrsex: Not marked
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 3.000 5.000 5.799 8.000 24.000 49
## ------------------------------------------------------------
## df_uk$dscrsex: Marked
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 4.000 7.500 7.421 9.000 18.000
# check frequency
#table(df_uk$dscrsex) # (1646 + 38) - 49 NA's: n = 1635
In our dataset, 61 individuals reported experiencing discrimination based on their race, while 1,623 individuals did not. Depression scores differed significantly between those who perceived racial discrimination (“marked,” mean = 6.93, median = 7) and those who did not (“not marked,” mean = 5.79, median = 5; Wilcoxon rank-sum test, p = 0.011). The “not marked” group exhibited a wider range of depression scores (0–24) than the “marked” group (0–19), which should be considered when interpreting the results.
by(df_uk$depres, df_uk$dscrrce, mean, na.rm=T)
## df_uk$dscrrce: Not marked
## [1] 5.794155
## ------------------------------------------------------------
## df_uk$dscrrce: Marked
## [1] 6.934426
# mean depression score for two groups (Not marked, Marked)
# Not marked (no discrimination based on skin colour or race) - Mean = 5.79
# Marked (discrimination based on skin colour or race was perceived or reported) - Mean = 6.93
# this is a difference of 1.14 points (on the 0-24 scale), which is a smaller difference than for "dscrsex" (1.62 points); whether it is significant also depends on the spread of the scores (standard deviation)
# interpretation: In the UK, participants who report experiencing discrimination (skin colour or race) have, on average, higher depression scores
# compared to participants who do not report discrimination (skin colour or race).
# check further to see if this difference is statistically significant
# which test is appropriate?
# check for normal distribution of the data
# histogram for "not marked" group
hist(df_uk$depres[df_uk$dscrrce == "Not marked"], breaks = 12, main = "Histogram: Not marked", xlab = "Depression Score", col = "steelblue")
# histogram for "Marked" group
hist(df_uk$depres[df_uk$dscrrce == "Marked"], breaks = 12, main = "Histogram: Marked", xlab = "Depression Score", col = "steelblue")
# histograms: probably no normal distribution of the data
# use Wilcoxon-test (rank based)
wilcox.test(df_uk$depres ~ df_uk$dscrrce)
##
## Wilcoxon rank sum test with continuity correction
##
## data: df_uk$depres by df_uk$dscrrce
## W = 38817, p-value = 0.0108
## alternative hypothesis: true location shift is not equal to 0
by(df_uk$depres, df_uk$dscrrce, summary, na.rm=T) # meaningful for interpretation (MEDIAN). 49 NA's
## df_uk$dscrrce: Not marked
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 3.000 5.000 5.794 8.000 24.000 49
## ------------------------------------------------------------
## df_uk$dscrrce: Marked
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 4.000 7.000 6.934 10.000 19.000
# check frequency
table(df_uk$dscrrce) # (1623 + 61) - 49 NA's: n = 1635
##
## Not marked Marked
## 1623 61
# interpretation:
# not marked: Mdn = 5; IQR = 5 (3-8)
# marked: Mdn = 7; IQR = 6 (4-10)
# p-value = .011 and is < .05 (significance level)
# there is a statistically significant difference in the depression scores between the two groups of dscrrce "Not marked" and "Marked"
# with "Marked" being higher: (7 - 5 = 2 points on the depression scale)
# the "Marked" group shows a wider interquartile range (6 vs. 5) but a narrower overall range (0-19 vs. 0-24) than the "Not marked" group
# H2 done.
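The W statistic reported by `wilcox.test()` is built from ranks only: the rank sum of the first group minus its smallest possible value. The sketch below reproduces it by hand on hypothetical scores without ties (not the ESS data), where W also equals the number of pairs in which a "not marked" score exceeds a "marked" score.

```r
# hypothetical depression scores, no ties
not_marked = c(1, 3, 4, 6, 8)
marked = c(5, 7, 9, 10)

# rank all observations together, then sum the ranks of the first group
r = rank(c(not_marked, marked))
n1 = length(not_marked)
w_hand = sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2

# statistic from the built-in test (first argument is the first group)
w_test = unname(wilcox.test(not_marked, marked)$statistic)
```

With the formula interface used in the report, the first factor level ("Not marked") plays the role of the first group.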
In a multivariate logistic regression, gender and perceived discrimination based on sexuality were significant predictors of severe depressive symptoms (depression score ≥ 9). Women had 40% higher odds than men (OR = 1.40, p = 0.006). Individuals who experienced discrimination based on sexual orientation had substantially higher odds (OR = 2.40, p = 0.011). Age was not a significant predictor (p = 0.19). These findings highlight that gender and discrimination based on sexuality are key factors associated with severe depression.
df_clean = df_uk[!is.na(df_uk$depres) &
!is.na(df_uk$age) &
!is.na(df_uk$gndr) &
!is.na(df_uk$dscrsex), ]
df_clean$dep = ifelse(df_clean$depres >= 9, 1, 0)
df_clean$dep = as.numeric(df_clean$dep)
table(df_clean$dep)
##
## 0 1
## 1257 348
model = glm(dep ~ age + gndr + dscrsex,
data = df_clean,
family = binomial)
summary(model)
##
## Call:
## glm(formula = dep ~ age + gndr + dscrsex, family = binomial,
## data = df_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.263957 0.195819 -6.455 1.08e-10 ***
## age -0.004249 0.003240 -1.312 0.1896
## gndrFemale 0.337397 0.122789 2.748 0.0060 **
## dscrsexMarked 0.877132 0.342639 2.560 0.0105 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1678.4 on 1604 degrees of freedom
## Residual deviance: 1662.6 on 1601 degrees of freedom
## AIC: 1670.6
##
## Number of Fisher Scoring iterations: 4
# compute odds ratios
exp(coef(model))
## (Intercept) age gndrFemale dscrsexMarked
## 0.2825337 0.9957596 1.4012956 2.4039944
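Odds ratios and approximate 95% confidence intervals can also be derived directly from the printed coefficient table. The sketch below re-enters the estimates and standard errors from the summary above and computes Wald intervals (estimate ± z × SE on the log-odds scale). This is an approximation: `confint()` on a glm uses profile likelihood, so its bounds can differ slightly.

```r
# estimates and standard errors copied from the model summary above
est = c(age = -0.004249, gndrFemale = 0.337397, dscrsexMarked = 0.877132)
se = c(age = 0.003240, gndrFemale = 0.122789, dscrsexMarked = 0.342639)

z = qnorm(0.975)  # about 1.96 for a 95% interval

# exponentiate the estimate and both interval bounds
or_table = data.frame(
  OR = exp(est),
  lower = exp(est - z * se),
  upper = exp(est + z * se)
)
round(or_table, 2)
```

An interval that excludes 1 corresponds to a coefficient that is significant at the 5% level, which here holds for gender and perceived discrimination but not for age.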
# visualisation
library(ggplot2)
library(broom) # for tidy()
# model 1: dep ~ gndr
model1 = glm(dep ~ gndr, family = binomial, data = df_uk)
# model 2: dep ~ age + gndr + dscrsex
model2 = glm(dep ~ age + gndr + dscrsex, family = binomial, data = df_uk)
# tidy the models (odds ratios with 95% CIs)
tidy1 = broom::tidy(model1, conf.int = TRUE, exponentiate = TRUE)
tidy2 = broom::tidy(model2, conf.int = TRUE, exponentiate = TRUE)
# add a model identifier
tidy1$model ="Model 1"
tidy2$model = "Model 2"
# combine
df_plot = rbind(tidy1, tidy2)
# forest plot without the intercepts of both models
ggplot(df_plot[df_plot$term != "(Intercept)", ], aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
geom_pointrange(position = position_dodge(width = 0.5)) +
coord_flip() +
labs(x = "", y = "Odds Ratio (95% CI)", title = "Forest Plot of Regression Models") +
theme_minimal() +
geom_hline(yintercept = 1, linetype = "dashed")
Model 1: Depression and Gender. Model 2: Depression, Gender, Age, and Discrimination (Sexuality).
A logistic regression analysis was conducted to examine factors associated with severe depression (score ≥ 9). In the bivariate model, women had significantly higher odds of severe depression compared to men (OR = 1.36; 95% CI: 1.08–1.73; p = 0.011). When age groups were included in a multivariate model, gender remained a significant predictor, whereas age showed no clear effect and its estimates had wide confidence intervals. Analyses of discrimination indicated that participants reporting experiences of racial discrimination had significantly higher depression scores (median = 7 vs. 5; p = 0.011). In a multivariate model controlling for age and gender, both female gender (OR ≈ 1.38; p = 0.008) and racial discrimination (OR ≈ 1.89; p = 0.026) were associated with increased odds of severe depression, while age remained non-significant. Conclusion: Female gender and experiences of discrimination, whether based on sexual orientation or race, are significant risk factors for severe depressive symptoms, whereas age does not appear to have a significant effect in this dataset.
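# complete cases for the race model; in the UK subset dscrrce has no missing
# values (1,623 + 61 = 1,684), so df_clean2 contains the same rows as df_clean
df_clean2 = df_uk[!is.na(df_uk$depres) &
!is.na(df_uk$age) &
!is.na(df_uk$gndr) &
!is.na(df_uk$dscrrce), ]
# the model below is fitted on df_clean, which holds the 0/1 outcome dep
df_clean$dep = ifelse(df_clean$depres >= 9, 1, 0)
df_clean$dep = as.numeric(df_clean$dep)
table(df_clean$dep)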
##
## 0 1
## 1257 348
model2 = glm(dep ~ age + gndr + dscrrce,
data = df_clean,
family = binomial)
summary(model2)
##
## Call:
## glm(formula = dep ~ age + gndr + dscrrce, family = binomial,
## data = df_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.247838 0.195064 -6.397 1.58e-10 ***
## age -0.004464 0.003233 -1.381 0.16740
## gndrFemale 0.325487 0.122477 2.658 0.00787 **
## dscrrceMarked 0.636840 0.286119 2.226 0.02603 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1678.4 on 1604 degrees of freedom
## Residual deviance: 1664.1 on 1601 degrees of freedom
## AIC: 1672.1
##
## Number of Fisher Scoring iterations: 4
# compute odds ratios
exp(coef(model2))
## (Intercept) age gndrFemale dscrrceMarked
## 0.2871250 0.9955459 1.3847047 1.8904974
# visualisation
library(ggplot2)
# compute odds ratios and 95% confidence intervals
OR =exp(coef(model2))
CI =exp(confint(model2))
# prepare the data frame for ggplot
plot_data =data.frame(
term = names(OR),
OR = OR,
lower = CI[,1],
upper = CI[,2]
)
# Forest Plot
ggplot(plot_data, aes(x = term, y = OR, ymin = lower, ymax = upper)) +
geom_pointrange(color = "steelblue", size = 1) +
geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
coord_flip() + # flip to a horizontal layout
labs(title = "Odds Ratios from Logistic Regression",
x = "",
y = "Odds Ratio (95% CI)") +
theme_minimal()
Our results are consistent with previous research. For example, individuals who perceive discrimination based on their sexuality are more likely to experience depression than those who do not feel discriminated against. The United Kingdom Survey on the Mental Health of LGBTQ+ (2024) highlighted this issue and reported that victimization, discrimination, and lack of access to affirming spaces contribute to poorer mental health outcomes. Our data are consistent with these findings.
Similarly, our results show that racial discrimination is associated with higher depression scores. This may be linked to increased rates of victimization and limited access to affirming spaces. According to Stop Hate UK, 43% of all hate crimes reported to their helpline were motivated by racism. This could reflect the historical legacy of colonialism and systemic racism in the U.K. Another contributing factor may be the lack of representation of ethnic minorities in positions of power across politics, media, and business, which can exacerbate feelings of marginalization and stress.
Our analysis of age and depression showed a very weak and non-significant correlation. Age does not appear to have a meaningful influence on depression scores in this dataset. However, the wide confidence intervals across age groups indicate that these results should be interpreted with caution and are not statistically robust.
Gender differences in depression were pronounced in our findings. Women had significantly higher odds of severe depression compared to men. This gender gap may reflect societal pressures and structural inequalities, including lower pay, greater domestic responsibilities, and caregiving burdens that disproportionately affect women.
Further research is needed to identify additional drivers of depression. According to the Mental Health Foundation UK, individuals living in the lowest socioeconomic groups are more likely to experience common mental health problems such as depression and anxiety. Loneliness is another significant factor, particularly among older adults (Sheffield Hallam University, 2025). Moreover, inequalities in access to healthcare services in the U.K. contribute to higher rates of depression (Royal College of Psychiatrists, 2025). Other potential determinants include lifestyle factors such as diet, exercise, and general health behaviors, which may also play a significant role in mental health outcomes.
In conclusion, our findings underscore that gender and experiences of discrimination, whether based on sexual orientation or race, are key risk factors for severe depression, whereas age appears to have little direct effect in this dataset. Addressing structural inequalities and providing supportive, inclusive environments may be crucial steps in mitigating the impact of these risk factors.
Chapman, L. (2022). “I want to fit in… but I don’t want to change myself”: A study on autistic teenagers’ experiences of masking, mental health and their interplay. Research in Autism Spectrum Disorders, 96, 102016. https://doi.org/10.1016/j.rasd.2022.102016
European Social Survey. (n.d.). Data portal. European Social Survey ERIC. Retrieved September 5, 2025, from https://www.europeansocialsurvey.org/data-portal
Stop Hate UK. (n.d.). Racism in the UK. Retrieved September 5, 2025, from Stop Hate UK website: https://www.stophateuk.org/about-hate-crime/racism-in-the-uk/
University of Washington. (n.d.). Patient Health Questionnaire-9 (PHQ-9). In Mental Health Screening. National HIV Curriculum. Retrieved September 4, 2025, from https://www.hiv.uw.edu/page/mental-health-screening/phq-9
World Health Organization. (2023, September 19). Depression. In Fact sheets. https://www.who.int/news-room/fact-sheets/detail/depression