1 Introduction
2 Hypothesis
3 Methods
4 Sociodemographic Data and Depression Scores within the UK Dataset
5 Distribution of Depression Scores within the UK
- 5.1 Gender
- 5.2 Age
6 Distribution of Perceived Discrimination Due to Sexuality
7 Depression and Racial Discrimination
8 Regression Model: Age, Gender, and Discrimination Based on Sexuality
9 Regression Model 2: Depression, Age, Gender, and Racial Discrimination
10 Discussion
11 References

1 Introduction

Depression is a prevalent mental disorder, experienced by 4–10% of the global population over their lifetime (Chapman et al., 2022). Currently, around 280 million people (3.8%) are affected worldwide (WHO, 2023), with depression ranked among the leading contributors to the global health burden in 2019. Consequently, it represents a highly relevant field of research.

The present paper investigates factors associated with depression in a British population. Because 15–30% of individuals do not recover after two or more treatments (Chapman et al., 2022), a greater understanding of potential contributing factors is crucial for improving recovery outcomes.

library(foreign)
library(ltm)
library(ggplot2)
library(likert) 
library(kableExtra)

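# set the working directory to the folder that contains the ESS data (machine-specific paths)
#setwd("h:/MCI/Lehre/09-AdvancedStatistics/ihsm24/data")
setwd("/Users/annarendez/Desktop/Master/Advanced Statistics/R-Data")
# read the ESS round 11 SPSS file as a data frame
df = read.spss("ESS11.sav", to.data.frame = TRUE)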

2 Hypothesis

H1: The prevalence of depression increases with experiences of discrimination based on an individual’s sexuality (LGBQ+).

H2: The prevalence of depression increases with experiences of discrimination based on an individual’s skin colour or race.

H3: The prevalence of depression decreases with age (to be justified by the literature).

H4: The prevalence of depression is higher among females compared to males (to be justified by the literature).

3 Methods

The European Social Survey (ESS) is a large cross-national survey dataset covering topics such as wellbeing, social inequalities, immigration, and health. It includes eight items on depressive symptoms during the past week (D20–D27), which are used to calculate an individual’s depression score. The depression score ranges from 0 to 24; scores from 0 to 8 are treated as no or mild depressive symptoms and scores from 9 to 24 as severe depression (University of Washington, 2025).
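
The scoring described above can be sketched on toy data. The source does not show how the total score is computed, so the steps below are an assumption: each of the eight items is taken to be coded 1 to 4, the two positively worded items (happy, enjoyed life) are reversed, and 1 is subtracted from each item so that the sum runs from 0 to 24. The names `toy` and `score_depression` are hypothetical.

```r
# Toy answers of three respondents on the eight items
# (assumed coding: 1 = none of the time ... 4 = all of the time)
toy = data.frame(
  d20 = c(1, 2, 4), d21 = c(1, 3, 4), d22 = c(1, 2, 4), d23 = c(4, 3, 1),
  d24 = c(1, 2, 4), d25 = c(4, 2, 1), d26 = c(1, 2, 4), d27 = c(1, 3, 4)
)

# Hypothetical helper: reverse the positive items, shift every item to 0-3, and sum
score_depression = function(items, reversed = c("d23", "d25")) {
  items[reversed] = 5 - items[reversed]   # 1 <-> 4, 2 <-> 3
  rowSums(items - 1)                      # NA if any item is missing
}

toy$depres = score_depression(toy)
toy$depres
```

Under these assumptions, a respondent with no symptoms on any item scores 0 and a respondent with the maximum on every item scores 24, which matches the range stated above.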

df$d20 = as.numeric(df$fltdpr)
df$d21 = as.numeric(df$flteeff)
df$d22 = as.numeric(df$slprl)
df$d23 = as.numeric(df$wrhpp)
df$d24 = as.numeric(df$fltlnl)
df$d25 = as.numeric(df$enjlf)
df$d26 = as.numeric(df$fltsd)
df$d27 = as.numeric(df$cldgng)


# reverse-code the positively worded items d23 (happy) and d25 (enjoyed life)
# so that higher values always indicate more depressive symptoms
df$d23 = 5 - df$d23
df$d25 = 5 - df$d25


# lookup: existing country names in the dataframe (df)
#table(df$cntry)
# selected country: United Kingdom (UK hereafter)
# subset dataset: rows where cntry is "United Kingdom", all columns
# name it "df_uk" (dataset UK)
df_uk = df[df$cntry == "United Kingdom", ]
# check
#table(df_uk$cntry)

To assess the internal consistency of the eight depression items, we calculated Cronbach’s alpha (α = 0.84). This indicates high internal consistency, suggesting that the items form a reliable scale for assessing depression scores.
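
Cronbach’s alpha can be illustrated with a short base R sketch. The report loads the ltm package, whose `cronbach.alpha()` function presumably produced the reported value; the helper `cronbach_alpha_sketch` below is hypothetical and only shows the textbook formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the sum score), on toy data.

```r
# Hypothetical helper: Cronbach's alpha from a data frame of item scores
cronbach_alpha_sketch = function(items) {
  items = items[complete.cases(items), ]   # listwise deletion (an assumption)
  k = ncol(items)                          # number of items
  item_var = sum(apply(items, 2, var))     # sum of the item variances
  total_var = var(rowSums(items))          # variance of the sum score
  (k / (k - 1)) * (1 - item_var / total_var)
}

# Toy data: three items answered by six respondents
toy_items = data.frame(
  a = c(0, 1, 2, 3, 1, 2),
  b = c(0, 2, 2, 3, 1, 1),
  c = c(1, 1, 3, 3, 0, 2)
)
alpha_toy = cronbach_alpha_sketch(toy_items)
alpha_toy
```

Values close to 1 mean that the items vary together; identical items would give exactly 1.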

#Gender all Data set not just U.K

#table(df$gndr)

#Visualisation

#ggplot(df, aes(x = gndr)) +
  #geom_bar(fill="steelblue")+ 
  #labs(title = "Gender Distribution", 
      # x = "Gender", 
       #y = "Count") +
 # theme_minimal()

  # one colour for all bars, or different colours:  #ggplot(df, aes(x = gndr, fill = gndr)) + scale_fill_manual(values = c("steelblue", "pink"))+ geom_bar()

#Likert Scale
# shows the overall distribution of the depression items across all countries in the dataset
# could be used to compare depression scores: where does the UK lie, above or below the average?

vnames = c("fltdpr", "flteeff", "slprl","wrhpp", "fltlnl", "enjlf", "fltsd","cldgng")
likert_numeric_df = as.data.frame(lapply((df[,vnames]), as.numeric))
likert_table = likert(df[,vnames])$results 
likert_table$Mean = unlist(lapply((likert_numeric_df[,vnames]), mean, na.rm=T)) 
# ... and append new columns to the data frame
likert_table$Count = unlist(lapply((likert_numeric_df[,vnames]), function (x) sum(!is.na(x))))
likert_table$Item = c(
  d20="how much of the time during the past week you felt depressed?",
  d21="…you felt that everything you did was an effort?",
  d22="…your sleep was restless?",
  d23="…you were happy?",
  d24="…you felt lonely?",
  d25="…you enjoyed life?",
  d26="…you felt sad?",
  d27="…you could not get going?")
#likert_table

# round all percentage values to 1 decimal digit
#likert_table[,2:5] = round(likert_table[,2:5],1)
# round means to 3 decimal digits
#likert_table[,6] = round(likert_table[,6],3)

# create formatted table
#kable_styling(kable(likert_table,
                    #format="html",
                    #caption = "Distribution of answers regarding mental health items (ESS round 11, all countries, in %))"))
# create basic plot (code also valid) 
#plot(likert(summary=likert_table[,1:5])) # limit to columns 1:6 to skip mean and count

4 Sociodemographic Data and Depression Scores within the UK Dataset

The following tables present the distribution of age, gender, and depression scores to provide a clearer understanding of the sociodemographic characteristics in the dataset.

library(kableExtra)
library(knitr)
# check further (frequency table)
#table(df_uk$depres)

table_dep=data.frame(table(df_uk$depres))


#kable(table_dep,
      #col.names = c("Depression Score","Frequency"),
      #caption = "Frequency Distribution of Depressionscores in the UK")

#kable_styling(
 #kable(table_dep,
     #col.names = c("Depression Score","Frequency"),
      #caption = "Frequency Distribution of Depressionscores in the UK"
      #)
 #,full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed"))


#Demographic Data
scroll_box(
  kable_styling(
  kable(data.frame(table(df_uk$agea)), col.names = c("Age","Frequency"),
      caption = "Distribution of Age in the Data of UK"
      ),full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed")),height="300px")

Distribution of Age in the Data of UK
Age	Frequency
15	5
16	8
17	9
18	6
19	7
20	10
21	12
22	11
23	10
24	19
25	18
26	26
27	15
28	16
29	20
30	25
31	19
32	32
33	34
34	30
35	22
36	40
37	24
38	37
39	19
40	20
41	27
42	16
43	28
44	29
45	22
46	21
47	29
48	37
49	20
50	27
51	22
52	17
53	27
54	20
55	24
56	20
57	24
58	25
59	26
60	31
61	31
62	25
63	25
64	26
65	21
66	29
67	31
68	33
69	23
70	36
71	24
72	32
73	27
74	23
75	26
76	27
77	22
78	18
79	28
80	31
81	21
82	18
83	13
84	14
85	9
86	10
87	5
88	10
89	7
90	16

#Distribution of Gender

scroll_box(
  kable_styling(
  kable(data.frame(table(df_uk$gndr)), col.names = c("Gender","Frequency"),
      caption = "Distribution of Gender in the Data of UK"
      ),full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed")),height="300px")

Distribution of Gender in the Data of UK
Gender	Frequency
Male	824
Female	860

# distribution of the depression score (0-8 = non-severe, 9-24 = severe)
scroll_box(
  kable_styling(
  kable(table_dep, col.names = c("Depression Score","Frequency"),
      caption = "Frequency Distribution of Depression Scores in the UK"
      ),full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed")),height="300px")

Frequency Distribution of Depression Scores in the UK
Depression Score	Frequency
0	103
1	98
2	172
3	201
4	167
5	158
6	146
7	144
8	94
9	78
10	55
11	43
12	34
13	35
14	27
15	14
16	15
17	11
18	11
19	9
20	9
21	1
22	3
23	3
24	4

5 Distribution of Depression Scores within the UK

The table above presents the frequency distribution of all depression scores within the U.K. dataset. Scores from 0 to 8 are treated as no or very mild depressive symptoms, whereas scores from 9 to 24 are treated as severe depression. The bar chart illustrates the frequencies of these two categories: non-severe depression (0–8) and severe depression (9–24). A total of 1,283 individuals scored between 0 and 8, while 352 individuals fell into the category of severe depression (9–24); 49 participants had no depression score.
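
The split into the two categories can be reproduced from the frequency table alone. The following sketch uses only the frequencies printed above and a cut-off of 9; the variable names are illustrative and not part of the original analysis.

```r
# Frequencies of the scores 0 to 24, copied from the frequency table above
freq = c(103, 98, 172, 201, 167, 158, 146, 144, 94,
         78, 55, 43, 34, 35, 27, 14, 15, 11, 11, 9, 9, 1, 3, 3, 4)
scores = rep(0:24, times = freq)

# Dichotomise at the cut-off used in this report (score >= 9 = severe)
severe = scores >= 9
n_severe = sum(severe)
n_non_severe = sum(!severe)

# Odds and share of severe depression among participants with a score
odds_severe = n_severe / n_non_severe
share_severe = n_severe / length(scores)
c(n_severe = n_severe, n_non_severe = n_non_severe,
  odds = round(odds_severe, 3), share = round(share_severe, 3))
```

The odds (352/1283) compare severe with non-severe cases, while the share (352/1635) relates severe cases to all participants with a score; the two quantities should not be confused.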

depression_table_uk = table(df_uk$depres)
#depression_table_uk 

# flag participants with a depression score of 9 or higher (1 = severe, 0 = non-severe)

df_uk$dep=ifelse(df_uk$depres >= 9,1,0)
#df_uk$dep

#table(df_uk$dep)

# label the two categories for the bar chart
df_uk$dep=factor(df_uk$dep, levels = c(0,1),
                    labels = c("Non-severe depression", "Severe depression"))


# bar chart of severe and non-severe depression, with the counts printed above the bars

ggplot(df_uk, aes(x = dep)) +
  geom_bar(fill = "steelblue") +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  labs(title = "Depression Severity",
       x = "Depression category",
       y = "Number of participants") +
  theme_minimal()

# odds of severe depression (score 9-24) versus non-severe depression (score 0-8)

# participants with a depression score of 0-8: 1283
# participants with a depression score of 9-24: 352
# odds: 352/1283 = 0.274 --> severe depression is less common than non-severe depression

aModel = glm(dep ~ gndr, data=df_uk, family=binomial) 
# Show summary of regression model
summary(aModel)

## 
## Call:
## glm(formula = dep ~ gndr, family = binomial, data = df_uk)
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -1.45815    0.09035 -16.139   <2e-16 ***
## gndrFemale   0.30941    0.12131   2.551   0.0108 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1703.3  on 1634  degrees of freedom
## Residual deviance: 1696.7  on 1633  degrees of freedom
##   (49 observations deleted due to missingness)
## AIC: 1700.7
## 
## Number of Fisher Scoring iterations: 4

coef(aModel)

## (Intercept)  gndrFemale 
##  -1.4581529   0.3094088

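# Interpretation: the coefficients are on the log-odds scale; the positive gndrFemale
# coefficient means higher odds of severe depression for women than for men

# convert the coefficients to odds ratios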
exp(coef(aModel))

## (Intercept)  gndrFemale 
##   0.2326656   1.3626193

# Calculate Confidence Intervals for ORs
exp(confint(aModel))

##                2.5 %    97.5 %
## (Intercept) 0.194246 0.2768727
## gndrFemale  1.075026 1.7300513

# coef(aModel) gives the raw coefficients of the model (log-odds for a logistic regression).
# exp(coef(aModel)) converts each coefficient into an odds ratio.
# exp(confint(aModel)) converts the interval bounds from log-odds to odds ratios.


# model with gender and age groups
# create age groups

# how is age stored?

str(df_uk$agea)

##  Factor w/ 76 levels "15","16","17",..: 1 52 74 48 42 56 76 49 19 27 ...

# --> a factor (text labels), not numeric

# convert age to numeric

df_uk$age <- as.numeric(as.character(df_uk$agea))

# build the age groups
df_uk$age_group <- cut(
  df_uk$age,
  breaks = c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100),
  labels = c("0-9","10-19","20-29","30-39","40-49",
             "50-59","60-69","70-79","80-89","90+"),
  right = FALSE
)

# check

table(df_uk$age_group)

## 
##   0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89   90+ 
##     0    35   157   282   249   232   275   263   138    16

# fit the model with gender and age group
# note: without family = binomial, glm() fits a Gaussian (linear) model to the raw score depres,
# so the coefficients are differences in mean depression score relative to men and to the 10-19 age group,
# and exp() of these coefficients and intervals cannot be interpreted as odds ratios
aModel_multi_cat =glm(depres ~ gndr + age_group,
                        data = df_uk)

# coefficients, exponentiated coefficients, confidence intervals
coef(aModel_multi_cat)

##    (Intercept)     gndrFemale age_group20-29 age_group30-39 age_group40-49 
##      4.7521123      0.6722459      1.3834450      0.5989830      0.7372801 
## age_group50-59 age_group60-69 age_group70-79 age_group80-89   age_group90+ 
##      1.5972234      0.4029606      0.4053788      0.3252356      0.4226898

exp(coef(aModel_multi_cat))

##    (Intercept)     gndrFemale age_group20-29 age_group30-39 age_group40-49 
##     115.828694       1.958631       3.988619       1.820267       2.090242 
## age_group50-59 age_group60-69 age_group70-79 age_group80-89   age_group90+ 
##       4.939299       1.496248       1.499870       1.384357       1.526061

exp(confint(aModel_multi_cat))

##                     2.5 %     97.5 %
## (Intercept)    26.2667919 510.769889
## gndrFemale      1.2767516   3.004685
## age_group20-29  0.7874765  20.202610
## age_group30-39  0.3839934   8.628717
## age_group40-49  0.4348195  10.048109
## age_group50-59  1.0244261  23.814968
## age_group60-69  0.3152145   7.102332
## age_group70-79  0.3140619   7.162955
## age_group80-89  0.2671143   7.174620
## age_group90+    0.1074058  21.682829

# Forest plot
# refit the age-group model as a logistic regression on the binary outcome dep,
# so that the exponentiated coefficients are odds ratios
aModel_multi_cat = glm(dep ~ gndr + age_group, data = df_uk, family = binomial)

# coefficients and 95% confidence intervals
coefs = coef(aModel_multi_cat)
ci = confint(aModel_multi_cat)

# combine into one data frame
df_or = data.frame(
  term = names(coefs),
  OR = exp(coefs),
  OR_lower = exp(ci[,1]),
  OR_upper = exp(ci[,2]))

# remove the intercept
df_or = df_or[df_or$term != "(Intercept)", ]

# optional: shorten the labels
df_or$term = gsub("age_group", "", df_or$term)
df_or$term = gsub("gndr", "", df_or$term)
df_or$term = gsub("Female", "F", df_or$term)

library(ggplot2)

ggplot(df_or, aes(x = term, y = OR)) +
  geom_point(size = 3) +
  geom_errorbar(aes(ymin = OR_lower, ymax = OR_upper), width = 0.2) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
  coord_flip() +  # horizontal layout
  ylab("Odds Ratio (95% CI)") +
  xlab("") +
  ggtitle("Odds Ratios for Gender and Age (without Intercept)") +
  theme_minimal() +
  theme(axis.text.y = element_text(size = 10),
        axis.text.x = element_text(size = 10),
        plot.margin = margin(5, 5, 5, 10))

Gender, Age, and Depression

5.1 Gender

The logistic regression model above (dep ~ gndr) examined differences in high depression scores between men and women. The results indicate that females have 1.36 times the odds of a high depression score compared to males (OR = 1.36; 95% CI: 1.08–1.73; p = 0.011).
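
The odds ratio for gender follows directly from the printed model summary. As a minimal sketch, the estimate and standard error for gndrFemale are copied from the output above and converted by hand; the Wald interval shown here is a simple approximation and can differ slightly from the interval returned by `confint()`, which is based on the profile likelihood.

```r
est = 0.30941   # gndrFemale coefficient (log-odds) from the model summary
se  = 0.12131   # its standard error

# Odds ratio: exponentiate the log-odds coefficient
odds_ratio = exp(est)

# Approximate 95% Wald confidence interval on the odds ratio scale
wald_ci = exp(est + c(-1, 1) * qnorm(0.975) * se)

# z statistic and two-sided p-value, as reported by summary()
z = est / se
p_value = 2 * pnorm(-abs(z))

round(c(OR = odds_ratio, lower = wald_ci[1], upper = wald_ci[2], z = z, p = p_value), 4)
```

Because the whole interval lies above 1, the gender difference is statistically significant at the 5% level.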

5.2 Age

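Regarding age groups, individuals aged 50–59 showed the largest estimate relative to the youngest group (10–19). Because the printed age-group model was fitted to the raw depression score without a binomial family, its exponentiated coefficients cannot be read as odds ratios. In addition, all age groups had wide confidence intervals, indicating that these results are not statistically robust.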
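
For odds ratios by age group, the outcome has to be binary and the model has to be fitted with `family = binomial`. The following sketch shows this setup on simulated data; the data frame `toy`, its coarser age groups, and the 20% prevalence are assumptions for illustration only and do not reproduce the ESS results.

```r
set.seed(1)
n = 500

# Simulated respondents (not ESS data)
toy = data.frame(
  age  = sample(15:90, n, replace = TRUE),
  gndr = factor(sample(c("Male", "Female"), n, replace = TRUE),
                levels = c("Male", "Female"))
)
toy$dep = rbinom(n, size = 1, prob = 0.2)   # 1 = severe depression (simulated)

# Age groups as half-open intervals [lower, upper)
toy$age_group = cut(toy$age, breaks = c(10, 30, 50, 70, 100), right = FALSE,
                    labels = c("10-29", "30-49", "50-69", "70+"))

# Logistic regression: binary outcome and binomial family
fit = glm(dep ~ gndr + age_group, data = toy, family = binomial)

# Odds ratios with approximate 95% Wald intervals
ci = exp(confint.default(fit))
or_table = data.frame(term = names(coef(fit)), OR = exp(coef(fit)),
                      lower = ci[, 1], upper = ci[, 2], row.names = NULL)
or_table
```

Each age-group row compares that group with the reference group (here 10-29); the intercept row is a baseline odds, not an odds ratio.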

The Spearman correlation coefficient between depression scores and age is -0.04 (p = 0.096), indicating a very weak negative correlation that is not statistically significant. Depression scores tend to decrease only marginally as age increases. This suggests that there is almost no meaningful relationship between age and depression within the U.K. dataset. In this context, age appears to have little to no impact on depression scores.
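
Spearman’s rho is the Pearson correlation of the ranks, which is why it suits relationships that are monotonic but not linear. A minimal sketch on toy values (not ESS data) illustrates the equivalence:

```r
# Toy values for seven respondents
age   = c(23, 35, 47, 51, 62, 70, 84)
score = c(9, 4, 6, 3, 7, 2, 5)

# Built-in Spearman correlation
rho_builtin = cor(age, score, method = "spearman")

# The same value as a Pearson correlation of the ranks
rho_by_hand = cor(rank(age), rank(score))

# Test of H0: rho = 0
test = cor.test(age, score, method = "spearman")
c(rho = rho_builtin, rho_by_hand = rho_by_hand, p = test$p.value)
```

In the survey data many respondents share the same score, so `cor.test()` would typically warn that an exact p-value cannot be computed with ties and fall back on an approximation.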

# hypothesis 3: prevalence of depression decreases with age (UK)
#table(df_uk$agea) # just to check first (not meaningful): youngest 15y, oldest 90y
# convert "agea" (age) into numeric
df_uk$age = as.numeric(as.character(df_uk[,"agea"]))
# check: scatter plot (visual inspection)
#plot(df_uk$age, df_uk$depres, main = "Scatter Plot: Age, Depression" , xlab = "Age", ylab = "Depression")

# it would be nice to show percentages here instead of frequencies
# age distribution by dscrrce
#hist(df_uk$age[df_uk$dscrrce == "Marked"], breaks = 12, main = "Histogram: Marked", xlab = "Age", col = "steelblue")

#hist(df_uk$age[df_uk$dscrrce == "Not marked"], breaks = 12, main = "Histogram: Not Marked", xlab = "Age", col = "steelblue")

# age distribution by dscrsex
#hist(df_uk$age[df_uk$dscrsex == "Marked"], breaks = 12, main = "Histogram: Marked", xlab = "Age", col = "steelblue")

#hist(df_uk$age[df_uk$dscrsex == "Not marked"], breaks = 12, main = "Histogram: Not Marked", xlab = "Age", col = "steelblue")


library(ggplot2)

# all age values as a factor
#ages = sort(unique(df_uk$age))

#ggplot(df_uk, aes(x = factor(age), y = depres)) +
  #geom_col(fill = "steelblue") +
  #scale_x_discrete(
    #breaks = ages[seq(1, length(ages), by = 5)]  # show only every 5th age value
 #) +
  #labs(
   # title = "Depression score by age",
   # x = "Age",
   # y = "Depression score"
  #) +
 # theme_minimal() +
  #theme(axis.text.x = element_text(angle = 45, hjust = 1))





# the scatter plot shows no linear pattern, so Pearson's product-moment correlation is not appropriate
# use Spearman's rank correlation instead
# is there a statistically significant association between the two variables "depres" and "age", and how strong is it?
#cor(df_uk[, c("depres", "age")], method = "spearman", use = "complete.obs")
# interpretation:
# Spearman's correlation coefficient between depression and age is -0.04 (very weak negative correlation).
# as age increases, the depression score tends to decrease (and vice versa).
# the direction is in line with H3. However:
# a correlation coefficient of -.04 is very close to 0 and indicates almost no relationship between depression and age (see also scatter plot).
# in the context of this dataset for the UK, age has little to no meaningful impact on depression scores.
# does a statistically significant relationship exist between the two variables?
# store the test result in the variable "pvalue"
pvalue = cor.test(df_uk$depres, df_uk$age, method = "spearman")
pvalue # print the test result, including the p-value

## 
##  Spearman's rank correlation rho
## 
## data:  df_uk$depres and df_uk$age
## S = 717728947, p-value = 0.09598
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##         rho 
## -0.04156594

# interpretation:
# p-value = .096, which is > .05 (chosen significance level)
# meaning the correlation is not statistically significant
# the strength and direction of the relationship should therefore not be interpreted
# for reference: Spearman's rho = -.04
# very small effect size, almost nonexistent
# H3 (see above) is not supported: H0 (no relationship) cannot be rejected
# H3 done.

6 Distribution of Perceived Discrimination Due to Sexuality

The following chart illustrates the distribution of perceived discrimination based on sexuality within the U.K. dataset. “Marked” indicates a feeling of discrimination (n = 38), while “Not marked” indicates no perceived discrimination (n = 1,646).
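
Because the two groups differ greatly in size, percentages are easier to read than raw counts. A short sketch using the two counts stated above:

```r
# Group sizes for perceived discrimination based on sexuality (from the text above)
counts = c("Not marked" = 1646, "Marked" = 38)

# Share of each group in percent, rounded to one decimal
pct = round(100 * prop.table(counts), 1)
pct
```

Only about 2% of the sample falls into the "Marked" group, so estimates for this group rest on few observations and are correspondingly uncertain.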

# distribution of "Marked" and "Not marked" (Marked = perceived discrimination based on sexuality, Not marked = no perceived discrimination)
#table(df_uk$dscrsex)


ggplot(df_uk, aes(x = dscrsex)) +
  geom_bar(fill = "steelblue") +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5)+
 scale_x_discrete(labels = c("Marked" = "Marked", "Not marked" = "Not marked")) +
  labs(title = "", x = "Perceived discrimination sexuality", y = "Count") +
  theme_minimal()

Depression Score and Sexuality

The following chart visualizes the distribution of depression scores in both groups. Individuals who perceive discrimination based on their sexuality (“marked”) in the U.K. have an average depression score of 7.4, compared to an average score of 5.8 among individuals who do not perceive discrimination (“not marked”). However, the range of scores in the “not marked” group is wider (0–24) than in the much smaller “marked” group (1–18), which should be taken into consideration when interpreting the data.

by(df_uk$depres, df_uk$dscrsex, mean, na.rm=T)

## df_uk$dscrsex: Not marked
## [1] 5.798998
## ------------------------------------------------------------ 
## df_uk$dscrsex: Marked
## [1] 7.421053

# mean depression score for two groups (Not marked, Marked)



# histogram for "not marked" group
#hist(df_uk$depres[df_uk$dscrsex == "Not marked"], breaks = 12, main = "Histogram: Not marked", 
    # xlab = "Depression Score", 
    # col = "steelblue")

# histogram for "Marked" group
#hist(df_uk$depres[df_uk$dscrsex == "Marked"], breaks = 12, main ="Histogram: Marked", 
    # xlab = "Depression score", 
    # col = "steelblue")
# histograms: probably no normal distribution of the data
# use Wilcoxon-test (rank based)



# visualise both groups
# base approach: ASCII only, no pipes, no <-, no percent_format()
library(ggplot2)

# optional: remove NAs (otherwise categories are missing in the plot)
df_sub = df_uk[!is.na(df_uk$dscrsex) & !is.na(df_uk$depres), ]

# 1) count how many participants fall into each group (dscrsex) and score (depres)
counts = as.data.frame(table(dscrsex = df_sub$dscrsex,
                             depres  = df_sub$depres))
names(counts)[names(counts) == "Freq"] = "n"

# 2) total per group
totals = aggregate(n ~ dscrsex, data = counts, FUN = sum)
names(totals)[names(totals) == "n"] = "total"

# 3) merge and compute the share within each group
df_plot = merge(counts, totals, by = "dscrsex")
df_plot$pct = df_plot$n / df_plot$total

# (optional) sort the depression scores
df_plot$depres = factor(df_plot$depres, levels = sort(unique(df_plot$depres)))

# 4) plot: one facet per group, y-axis in %
ggplot(df_plot, aes(x = depres, y = pct)) +
  geom_col(width = 0.6, fill = "steelblue") +
  facet_wrap(vars(dscrsex)) +
  scale_y_continuous(labels = function(x) paste0(round(x * 100, 1), " %")) +
  labs(subtitle = "Depression Score by Perceived Discrimination (Sexuality)",
       x = "Depression score",
       y = "Share of group")

#table(df_uk$depres)


# Marked = perceived discrimination based on sexuality, Not marked = no perceived discrimination

library(ggplot2)


#Boxplot
#ggplot(df_uk, aes(x = dscrsex, y = depres,)) +
  #geom_boxplot(fill="steelblue",alpha = 0.7) +
  #scale_x_discrete(labels = c("Not marked" = "Not marked", "Marked" = "Marked")) +
  #labs(title = "Depression score and preceived  Discrimination of Sexuality",
       #x = "Sexuality ",
       #y = "Depression score") +
  #theme_minimal() +
  #theme(legend.position = "none")




#wilcox.test(depres ~ dscrsex, data=df_uk)



by(df_uk$depres, df_uk$dscrsex, summary, na.rm=T) # meaningful for interpretation (MEDIAN). 49 NA's

## df_uk$dscrsex: Not marked
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   3.000   5.000   5.799   8.000  24.000      49 
## ------------------------------------------------------------ 
## df_uk$dscrsex: Marked
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   4.000   7.500   7.421   9.000  18.000

# check frequency
#table(df_uk$dscrsex) # (1646 + 38) - 49 NA's: n = 1635

7 Depression and Racial Discrimination

In our dataset, 61 individuals reported experiencing discrimination based on their race, while 1,623 individuals did not. Depression scores differed significantly between those who perceived racial discrimination (“marked”: mean = 6.93, median = 7) and those who did not (“not marked”: mean = 5.79, median = 5; Wilcoxon rank-sum test, p = 0.011). The “not marked” group exhibited a wider range of depression scores (0–24) than the “marked” group (0–19), which should be considered when interpreting the results.
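
The Wilcoxon rank-sum test compares the two groups through the ranks of their scores rather than their means, which makes it suitable for skewed data. The following sketch uses toy scores (not ESS data) and also recomputes the W statistic by hand; `exact = FALSE` is set here because the toy data contain ties.

```r
# Toy depression scores for two groups
toy = data.frame(
  score = c(2, 3, 5, 5, 8, 1, 4, 7, 9, 10, 12, 6),
  group = factor(rep(c("Not marked", "Marked"), times = c(7, 5)),
                 levels = c("Not marked", "Marked"))
)

# Rank-sum test (normal approximation because of ties)
res = wilcox.test(score ~ group, data = toy, exact = FALSE)

# W by hand: rank sum of the first group minus its smallest possible value
r = rank(toy$score)
n1 = sum(toy$group == "Not marked")
w_by_hand = sum(r[toy$group == "Not marked"]) - n1 * (n1 + 1) / 2

# Group medians for interpretation
medians = tapply(toy$score, toy$group, median)
list(W = unname(res$statistic), W_by_hand = w_by_hand, medians = medians)
```

Reporting the medians next to the p-value, as done above for the survey data, shows the direction and size of the difference that the test itself does not convey.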

by(df_uk$depres, df_uk$dscrrce, mean, na.rm=T)

## df_uk$dscrrce: Not marked
## [1] 5.794155
## ------------------------------------------------------------ 
## df_uk$dscrrce: Marked
## [1] 6.934426

# mean depression score for two groups (Not marked, Marked)
# Not marked (no discrimination based on skin colour or race) - Mean = 5.79
# Marked (discrimination based on skin colour or race was perceived or reported) - Mean = 6.93
# this is a difference of 1.14 points on the 0-24 scale, which is smaller than the difference for "dscrsex" (1.62 points) - possibly borderline significant because of the standard deviation
# interpretation: In the UK, participants who report experiencing discrimination (skin colour or race) have, on average, higher depression scores 
# compared to participants who do not report discrimination (skin colour or race).
# check further to see if this difference is statistically significant
# which test is appropriate?
# check for normal distribution of the data
# histogram for "not marked" group
hist(df_uk$depres[df_uk$dscrrce == "Not marked"], breaks = 12, main = "Histogram: Not marked", xlab = "Depression Score", col = "steelblue")

# histogram for "Marked" group
hist(df_uk$depres[df_uk$dscrrce == "Marked"], breaks = 12, main = "Histogram: Marked", xlab = "Depression Score", col = "steelblue")

# histograms: probably no normal distribution of the data
# use Wilcoxon-test (rank based)
wilcox.test(df_uk$depres ~ df_uk$dscrrce)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  df_uk$depres by df_uk$dscrrce
## W = 38817, p-value = 0.0108
## alternative hypothesis: true location shift is not equal to 0

by(df_uk$depres, df_uk$dscrrce, summary, na.rm=T) # meaningful for interpretation (MEDIAN). 49 NA's

## df_uk$dscrrce: Not marked
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   3.000   5.000   5.794   8.000  24.000      49 
## ------------------------------------------------------------ 
## df_uk$dscrrce: Marked
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   4.000   7.000   6.934  10.000  19.000

# check frequency
table(df_uk$dscrrce) # (1623 + 61) - 49 NA's: n = 1635

## 
## Not marked     Marked 
##       1623         61

# interpretation:
# not marked: Mdn = 5; IQR = 5 (3 to 8)
# marked: Mdn = 7; IQR = 6 (4 to 10)
# p-value = .011, which is < .05 (significance level)
# there is a statistically significant difference in the depression scores between the two groups of dscrrce "Not marked" and "Marked"
# with "Marked" being higher: (7 - 5 = 2 points difference in the median on the depression scale)
# individuals in the "Marked" group (those who report discrimination) show a narrower range of depression scores (0-19) than the "Not marked" group (0-24)
# H2 done.

8 Regression Model: Age, Gender, and Discrimination Based on Sexuality

In a multivariable logistic regression, gender and perceived discrimination based on sexuality were significant predictors of severe depressive symptoms (depression score ≥ 9). Women had 40% higher odds than men (OR = 1.40, p = 0.006). Individuals who perceived discrimination based on their sexuality had substantially higher odds (OR = 2.40, p = 0.011). Age was not a significant predictor (p = 0.19). These findings indicate that gender and perceived discrimination based on sexuality are associated with severe depression in this sample.
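
An odds ratio is not the same as a ratio of probabilities. To illustrate, the sketch below plugs the coefficients printed in the model summary of this section into the logistic function and compares a 40-year-old man and woman who report no discrimination; the example profiles and the helper `predict_prob` are hypothetical.

```r
# Coefficients (log-odds) copied from the summary of dep ~ age + gndr + dscrsex
b = c(intercept = -1.263957, age = -0.004249, female = 0.337397, marked = 0.877132)

# Hypothetical helper: predicted probability of severe depression
predict_prob = function(age, female, marked) {
  lp = b[["intercept"]] + b[["age"]] * age + b[["female"]] * female + b[["marked"]] * marked
  plogis(lp)   # inverse logit
}

p_man   = predict_prob(age = 40, female = 0, marked = 0)
p_woman = predict_prob(age = 40, female = 1, marked = 0)

odds_ratio = exp(b[["female"]])   # ratio of odds
risk_ratio = p_woman / p_man      # ratio of probabilities
round(c(p_man = p_man, p_woman = p_woman, OR = odds_ratio, RR = risk_ratio), 3)
```

The ratio of probabilities is smaller than the odds ratio, which is why the gender effect is described in terms of odds rather than risk.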

df_clean = df_uk[!is.na(df_uk$depres) &
                   !is.na(df_uk$age) &
                   !is.na(df_uk$gndr) &
                   !is.na(df_uk$dscrsex), ]


df_clean$dep = ifelse(df_clean$depres >= 9, 1, 0)

df_clean$dep = as.numeric(df_clean$dep)

table(df_clean$dep)

## 
##    0    1 
## 1257  348

model = glm(dep ~ age + gndr + dscrsex,
             data = df_clean,
             family = binomial)
summary(model)

## 
## Call:
## glm(formula = dep ~ age + gndr + dscrsex, family = binomial, 
##     data = df_clean)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -1.263957   0.195819  -6.455 1.08e-10 ***
## age           -0.004249   0.003240  -1.312   0.1896    
## gndrFemale     0.337397   0.122789   2.748   0.0060 ** 
## dscrsexMarked  0.877132   0.342639   2.560   0.0105 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1678.4  on 1604  degrees of freedom
## Residual deviance: 1662.6  on 1601  degrees of freedom
## AIC: 1670.6
## 
## Number of Fisher Scoring iterations: 4

# compute odds ratios

exp(coef(model))

##   (Intercept)           age    gndrFemale dscrsexMarked 
##     0.2825337     0.9957596     1.4012956     2.4039944

# visualisation



library(ggplot2)
library(broom)  # for tidy()

# model 1: dep ~ gndr
model1 = glm(dep ~ gndr, family = binomial, data = df_uk)

# model 2: dep ~ age + gndr + dscrsex
model2 = glm(dep ~ age + gndr + dscrsex, family = binomial, data = df_uk)

# tidy the models (odds ratios with confidence intervals)
tidy1 = broom::tidy(model1, conf.int = TRUE, exponentiate = TRUE)
tidy2 = broom::tidy(model2, conf.int = TRUE, exponentiate = TRUE)

# add a model ID
tidy1$model ="Model 1"
tidy2$model = "Model 2"

# combine
df_plot = rbind(tidy1, tidy2)

# forest plot (the intercepts of both models are dropped)
ggplot(df_plot[df_plot$term != "(Intercept)", ], aes(x = term, y = estimate, ymin = conf.low, ymax = conf.high, color = model)) +
  geom_pointrange(position = position_dodge(width = 0.5)) +
  coord_flip() +
  labs(x = "", y = "Odds Ratio (95% CI)", title = "Forest Plot of Regression Models") +
  theme_minimal() +
  geom_hline(yintercept = 1, linetype = "dashed")

Model 1: Depression and Gender
Model 2: Depression, Gender, Age, Discrimination/Sexuality

9 Regression Model 2: Depression, Age, Gender, and Racial Discrimination

A logistic regression analysis was conducted to examine factors associated with severe depression (score ≥ 9). In the bivariate model, women had significantly higher odds of severe depression compared to men (OR = 1.36; 95% CI: 1.08–1.73; p = 0.011). When age groups were added, gender remained a significant predictor, whereas age showed no clear effect, with wide confidence intervals for all age groups.

Analyses of discrimination indicated that participants reporting experiences of racial discrimination had significantly higher depression scores (median = 7 vs. 5; p = 0.011). In a multivariable model controlling for age and gender, both female gender (OR ≈ 1.38; p = 0.008) and racial discrimination (OR ≈ 1.89; p = 0.026) were associated with increased odds of severe depression, while age remained non-significant (p = 0.17).

Conclusion: Female gender and experiences of discrimination, whether based on sexual orientation or race, are significantly associated with severe depressive symptoms, whereas age does not appear to have a significant effect in this dataset.
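
The overall fit of the two multivariable models can be checked with the figures printed in their summaries. As a minimal sketch, the deviances, degrees of freedom, and AIC values below are copied from the output in Sections 8 and 9; the likelihood-ratio test against the null model is an illustration added here and was not part of the original analysis.

```r
# Values copied from the two model summaries
null_dev  = 1678.4                                  # null deviance (same sample in both models)
resid_dev = c(sexuality = 1662.6, race = 1664.1)    # residual deviances
df_diff   = 1604 - 1601                             # three predictors in each model

# Likelihood-ratio test: does the model fit better than the intercept-only model?
lr_stat = null_dev - resid_dev
p_lr = pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

# AIC: lower values indicate a better trade-off between fit and complexity
aic = c(sexuality = 1670.6, race = 1672.1)

rbind(lr_stat = lr_stat, p = signif(p_lr, 2), AIC = aic)
```

Both models improve on the intercept-only model, and their AIC values are close, so neither discrimination variable clearly yields the better-fitting model.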

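# rebuild the analysis sample for this model: complete cases on depression score, age, gender, and racial discrimination
# (the object is named df_clean so that it matches the data used in the model call below)
df_clean = df_uk[!is.na(df_uk$depres) &
                   !is.na(df_uk$age) &
                   !is.na(df_uk$gndr) &
                   !is.na(df_uk$dscrrce), ]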


df_clean$dep = ifelse(df_clean$depres >= 9, 1, 0)

df_clean$dep = as.numeric(df_clean$dep)

table(df_clean$dep)

## 
##    0    1 
## 1257  348

model2 = glm(dep ~ age + gndr + dscrrce,
             data = df_clean,
             family = binomial)
summary(model2)

## 
## Call:
## glm(formula = dep ~ age + gndr + dscrrce, family = binomial, 
##     data = df_clean)
## 
## Coefficients:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   -1.247838   0.195064  -6.397 1.58e-10 ***
## age           -0.004464   0.003233  -1.381  0.16740    
## gndrFemale     0.325487   0.122477   2.658  0.00787 ** 
## dscrrceMarked  0.636840   0.286119   2.226  0.02603 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1678.4  on 1604  degrees of freedom
## Residual deviance: 1664.1  on 1601  degrees of freedom
## AIC: 1672.1
## 
## Number of Fisher Scoring iterations: 4

# compute odds ratios

exp(coef(model2))

##   (Intercept)           age    gndrFemale dscrrceMarked 
##     0.2871250     0.9955459     1.3847047     1.8904974

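# visualisation

library(ggplot2)

# compute odds ratios and 95% confidence intervals
OR =exp(coef(model2))
CI =exp(confint(model2))

# prepare the data frame for ggplot
plot_data =data.frame(
  term = names(OR),
  OR = OR,
  lower = CI[,1],
  upper = CI[,2]
)

# drop the intercept, which is a baseline odds rather than an odds ratio
plot_data = plot_data[plot_data$term != "(Intercept)", ]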

# Forest Plot
ggplot(plot_data, aes(x = term, y = OR, ymin = lower, ymax = upper)) +
  geom_pointrange(color = "steelblue", size = 1) +
  geom_hline(yintercept = 1, linetype = "dashed", color = "red") +
  coord_flip() +  # rotate to a horizontal layout
  labs(title = "Odds Ratios from Logistic Regression",
       x = "",
       y = "Odds Ratio (95% CI)") +
  theme_minimal()

10 Discussion

Our results are consistent with previous research. For example, individuals who perceive discrimination based on their sexuality are more likely to experience depression than those who do not feel discriminated against. The United Kingdom Survey on the Mental Health of LGBTQ+ (2024) highlighted this issue and reported that victimization, discrimination, and lack of access to affirming spaces contribute to poorer mental health outcomes. Our data are consistent with these findings.

Similarly, our results show that racial discrimination is associated with higher depression scores. This may be linked to increased rates of victimization and limited access to affirming spaces. According to Stop Hate UK, 43% of all hate crimes reported to their helpline were motivated by racism. This could reflect the historical legacy of colonialism and systemic racism in the U.K. Another contributing factor may be the lack of representation of ethnic minorities in positions of power across politics, media, and business, which can exacerbate feelings of marginalization and stress.

Our analysis of age and depression showed a very weak and non-significant correlation. Age does not appear to have a meaningful influence on depression scores in this dataset. However, the wide confidence intervals across age groups indicate that these results should be interpreted with caution and are not statistically robust.

Gender differences in depression were pronounced in our findings. Women had significantly higher odds of severe depression compared to men. This gender gap may reflect societal pressures and structural inequalities, including lower pay, greater domestic responsibilities, and caregiving burdens that disproportionately affect women.

Further research is needed to identify additional drivers of depression. According to the Mental Health Foundation UK, individuals living in the lowest socioeconomic groups are more likely to experience common mental health problems such as depression and anxiety. Loneliness is another significant factor, particularly among older adults (Sheffield Hallam University, 2025). Moreover, inequalities in access to healthcare services in the U.K. contribute to higher rates of depression (Royal College of Psychiatrists, 2025). Other potential determinants include lifestyle factors such as diet, exercise, and general health behaviors, which may also play a significant role in mental health outcomes.

In conclusion, our findings underscore that gender and experiences of discrimination, whether based on sexual orientation or race, are key factors associated with severe depression, whereas age appears to have little direct effect in this dataset. Addressing structural inequalities and providing supportive, inclusive environments may be crucial steps in mitigating the impact of these risk factors.

11 References

Chapman, L. (2022). “I want to fit in… but I don’t want to change myself”: A study on autistic teenagers’ experiences of masking, mental health and their interplay. Research in Autism Spectrum Disorders, 96, 102016. https://doi.org/10.1016/j.rasd.2022.102016

European Social Survey. (n.d.). Data portal. European Social Survey ERIC. Retrieved September 5, 2025, from https://www.europeansocialsurvey.org/data-portal

Stop Hate UK. (n.d.). Racism in the UK. Retrieved September 5, 2025, from Stop Hate UK website: https://www.stophateuk.org/about-hate-crime/racism-in-the-uk/

University of Washington. (n.d.). Patient Health Questionnaire-9 (PHQ-9). In Mental Health Screening. National HIV Curriculum. Retrieved September 4, 2025, from https://www.hiv.uw.edu/page/mental-health-screening/phq-9

World Health Organization. (2023, September 19). Depression. In Fact sheets. https://www.who.int/news-room/fact-sheets/detail/depression


Predictors of Clinically Significant Depression

2025-09-02, Anna Rendez