Homework 1

Introduction

Depression is a prevalent mental disorder, experienced by 4-10% of the global population over their lifetime (Chapman et al., 2022). Currently, around 280 million people (3.8%) are affected globally (WHO, 2023), and depression ranked among the top contributors to the global health burden in 2019. Although adequate and cost-efficient treatments are available across low-, middle-, and high-income countries, and the stigma surrounding professional help is decreasing, treating depression remains challenging. Depression persists as a global health issue because of delayed diagnosis, treatments that are not tailored to individual needs (Ferrari et al., 2024), and inadequate responses to treatment attempts (Chapman et al., 2022). For instance, in the United Kingdom (UK), 15-30% of individuals still suffer from depression after undergoing two or more treatments (Chapman et al., 2022). Moreover, depression affects various aspects of life, including education, employment, and personal relationships (OECD & European Commission, 2024), while contributing significantly to economic and healthcare costs (Statista, 2024; Vinokur, Price, & Caplan, 1996). The research question is therefore: What are the potential drivers of depression in the UK? Understanding the social determinants of depression is essential for developing more tailored treatments, targeting risk factors, and enabling earlier diagnoses, which would reduce the health burden and bring personal, societal, and economic benefits.

library(foreign)
library(ltm)

setwd("/Users/annarendez/Desktop/Master/1.Semester/Quantitavie Forschung/R Data")
df = read.spss("ESS11.sav", to.data.frame = T)

Hypotheses

H1: The prevalence of depression increases with experienced discrimination based on an individual’s sexuality (LGBQ+).

H2: The prevalence of depression increases with experienced discrimination based on an individual’s skin colour or race.

H3: The prevalence of depression decreases with age (still to be justified by the literature).

H4: The prevalence of depression is higher among females than among males (still to be justified by the literature).

Methods

The present paper investigates depression in a British population. Because 15-30% of individuals do not recover from depression after two or more treatments (Chapman et al., 2022), a better understanding of potential contributing factors is crucial for improving recovery outcomes.

Data and Sample

For this investigation, the most recent dataset from the European Social Survey (ESS) (Round 11, European Social Survey European Research Infrastructure, 2024) was used. Statistical analyses and data processing were performed using R (version 4.4.2, 2024-10-31, ucrt for Windows) (De Vries & Meys, 2021). The foreign package was loaded to read the SPSS file and the ltm package to compute Cronbach’s alpha. The dataset was subsetted to include only participants from the United Kingdom, resulting in an initial sample of n = 1684. Because of missing data on the depression items, the final sample consisted of n = 1635 participants, aged 15 to 90 years.

Measures

Dependent variable: Depression was measured using the 8-item Centre for Epidemiological Studies Depression Scale (CES-D8). Each item asks how often a feeling occurred during the past week; an example item is “felt that everything they did was an effort”. Responses were given on a four-point Likert scale ranging from 1 = “None or almost none of the time” to 4 = “All or almost all of the time.” Items d23 (“you were happy?”) and d25 (“you enjoyed life?”) are positively worded and were therefore reverse-coded, so that higher values indicate more depressive symptoms on all items.

Independent variables: Discrimination based on the respondent’s sexuality (nominal: “marked”, “not marked”), discrimination based on colour or race (nominal: “marked”, “not marked”), age (ratio scale: 15-90), and gender (nominal: “male”, “female”) were considered as potential factors influencing depression.

Data Analysis

To assess the internal consistency of the depression scale, Cronbach’s alpha was calculated. Alpha typically ranges from 0 to 1 (Döring & Bortz, 2016), although it can sometimes be negative (Bühner, 2005). The internal consistency was good, with an alpha of 0.838, well above the recommended threshold of 0.7 (Hair, 2010), indicating that the items measure the same underlying construct of depression (Osburn, 2000). According to Döring & Bortz (2016), values between 0.7 and 0.9 indicate adequate reliability and values above 0.9 indicate high reliability; a higher alpha would therefore have been preferable, but is not necessary.

Statistical Analyses

Bivariate and multivariate statistical methods were used to assess the associations between variables. For hypotheses H1, H2, and H4, the median, interquartile range (IQR), and mean of the depression score were reported per group. No correlation analysis (e.g., Pearson’s product-moment correlation) was conducted for these hypotheses because the independent variables are nominal (dichotomous) rather than interval-scaled (Bühner, 2005). Instead, the rank-based Wilcoxon rank-sum test was applied to test the effects of discrimination (based on sexuality, skin colour or race) and gender on depression, as it does not require normally distributed data (Universität Zürich, 2023). For hypothesis H3, Spearman’s rank correlation (Spearman’s rho) was used instead of Pearson’s product-moment correlation, because visual inspection of a scatterplot indicated that neither the bivariate normality nor the linearity assumption was met (Bühner, 2005). Values of rho closer to 0 indicate a weaker association (Universität Zürich, 2023). Results with p < 0.05 were considered statistically significant.

A multivariate model was constructed to assess the combined effects of discrimination (both sexuality and skin colour or race), age, and gender on depression. Variables that did not show a statistically significant relationship with depression were excluded from the final model.
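The multivariate step described above could be sketched as follows. This is only an illustration under stated assumptions: the data are simulated, the column names mirror the variables described in this section, and the choice of an ordinary linear regression via `lm()` is an assumption, since the text does not specify the model type.

```r
# Simulated stand-in for the UK subset (all values are made up for illustration)
set.seed(1)
n <- 200
sim <- data.frame(
  dscrsex = factor(sample(c("Not marked", "Marked"), n, replace = TRUE, prob = c(0.9, 0.1)),
                   levels = c("Not marked", "Marked")),
  dscrrce = factor(sample(c("Not marked", "Marked"), n, replace = TRUE, prob = c(0.9, 0.1)),
                   levels = c("Not marked", "Marked")),
  age     = sample(15:90, n, replace = TRUE),
  gndr    = factor(sample(c("Male", "Female"), n, replace = TRUE),
                   levels = c("Male", "Female"))
)
sim$depression <- 1.6 + 0.2 * (sim$dscrsex == "Marked") + rnorm(n, sd = 0.5)

# Assumed model type: linear regression with all four predictors
model <- lm(depression ~ dscrsex + dscrrce + age + gndr, data = sim)
coef_table <- summary(model)$coefficients   # estimate, std. error, t value, p-value
coef_table
```

Predictors whose p-value in `coef_table` exceeds 0.05 would be the candidates for exclusion from the final model.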

df$d20 = as.numeric(df$fltdpr)
df$d21 = as.numeric(df$flteeff)
df$d22 = as.numeric(df$slprl)
df$d23 = as.numeric(df$wrhpp)
df$d24 = as.numeric(df$fltlnl)
df$d25 = as.numeric(df$enjlf)
df$d26 = as.numeric(df$fltsd)
df$d27 = as.numeric(df$cldgng)


# reverse-code d23 and d25 (positively worded items), so that higher values mean more depressive symptoms
df$d23 = 5 - df$d23
df$d25 = 5 - df$d25
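The reverse coding can be written as a small general function. The sketch below is an illustration only; `reverse_item` is a hypothetical helper that does not appear in the original analysis, and it assumes the items are coded as whole numbers from 1 to 4.

```r
# Hypothetical helper: reverse a score on a scale running from `lo` to `hi`
reverse_item <- function(x, lo = 1, hi = 4) {
  (lo + hi) - x   # 1 becomes 4, 2 becomes 3, and so on; NA stays NA
}

reversed <- reverse_item(c(1, 2, 3, 4, NA))
reversed
```

For a four-point scale this is the same as computing `5 - x`.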


# lookup: existing country names in the dataframe (df)
table(df$cntry)

## 
##            Albania            Austria            Belgium           Bulgaria 
##                  0               2354               1594                  0 
##        Switzerland             Cyprus            Czechia            Germany 
##               1384                685                  0               2420 
##            Denmark            Estonia              Spain            Finland 
##                  0                  0               1844               1563 
##             France     United Kingdom            Georgia             Greece 
##               1771               1684                  0               2757 
##            Croatia            Hungary            Ireland             Israel 
##               1563               2118               2017                  0 
##            Iceland              Italy          Lithuania         Luxembourg 
##                842               2865               1365                  0 
##             Latvia         Montenegro    North Macedonia        Netherlands 
##                  0                  0                  0               1695 
##             Norway             Poland           Portugal            Romania 
##               1337               1442               1373                  0 
##             Serbia Russian Federation             Sweden           Slovenia 
##               1563                  0               1230               1248 
##           Slovakia             Turkey            Ukraine             Kosovo 
##               1442                  0                  0                  0

# selected country: United Kingdom (UK hereafter)
# subset dataset: rows where cntry is "United Kingdom", all columns
# name it "df_uk" (dataset UK)
df_uk = df[df$cntry == "United Kingdom", ]
# check
table(df_uk$cntry)

## 
##            Albania            Austria            Belgium           Bulgaria 
##                  0                  0                  0                  0 
##        Switzerland             Cyprus            Czechia            Germany 
##                  0                  0                  0                  0 
##            Denmark            Estonia              Spain            Finland 
##                  0                  0                  0                  0 
##             France     United Kingdom            Georgia             Greece 
##                  0               1684                  0                  0 
##            Croatia            Hungary            Ireland             Israel 
##                  0                  0                  0                  0 
##            Iceland              Italy          Lithuania         Luxembourg 
##                  0                  0                  0                  0 
##             Latvia         Montenegro    North Macedonia        Netherlands 
##                  0                  0                  0                  0 
##             Norway             Poland           Portugal            Romania 
##                  0                  0                  0                  0 
##             Serbia Russian Federation             Sweden           Slovenia 
##                  0                  0                  0                  0 
##           Slovakia             Turkey            Ukraine             Kosovo 
##                  0                  0                  0                  0
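The check still lists every country with a count of 0 because subsetting a data frame keeps unused factor levels. A minimal sketch of how `droplevels()` removes them, using an invented toy data frame rather than the ESS file:

```r
# Toy data frame standing in for the full ESS file (values are invented)
toy <- data.frame(
  cntry = factor(c("Austria", "United Kingdom", "United Kingdom", "Spain")),
  score = c(1.5, 2.0, 1.25, 3.0)
)

toy_uk <- toy[toy$cntry == "United Kingdom", ]
table(toy_uk$cntry)            # still lists Austria and Spain with count 0

toy_uk <- droplevels(toy_uk)   # remove factor levels that no longer occur
uk_counts <- table(toy_uk$cntry)
uk_counts
```

Whether to drop the levels is a matter of taste; the analysis itself is not affected by the empty categories.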

# calculation of Cronbach's alpha (using df_uk) to check internal consistency ("reliability") of depression items
cronbach.alpha(df_uk[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")], na.rm=T)

## 
## Cronbach's alpha for the 'df_uk[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]' data-set
## 
## Items: 8
## Sample units: 1684
## alpha: 0.838

alpha_uk = cronbach.alpha(df_uk[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")], na.rm=T)

# Cronbach's alpha is 0.838 (not too high, not too low).
# to what extent are we measuring the same construct (reliability of the scale)?
# the target range of 0.7 ≤ α ≤ 0.9 has been achieved.
# an alpha between 0.8 and approximately 0.92 is considered optimal, and 0.838 falls within this range.
# the scale measures the same underlying construct (depression) - no item needs to be removed.

##Inline Expression: round(alpha_uk$alpha,2)

First, we computed Cronbach's alpha to check whether our items are related to one another. Cronbach's alpha for our data is 0.84, which indicates that the items are interrelated.
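To make clear what this coefficient measures, Cronbach's alpha can also be computed by hand from the item variances and the variance of the sum score. The sketch below uses simulated data; `cronbach_by_hand` is a hypothetical helper, and because it uses complete cases only, its result may differ slightly from the way `cronbach.alpha()` in ltm handles missing values.

```r
# Hypothetical helper: Cronbach's alpha from the complete cases of an item matrix
cronbach_by_hand <- function(items) {
  items <- items[complete.cases(items), , drop = FALSE]
  k <- ncol(items)
  item_vars <- apply(items, 2, var)      # variance of each item
  total_var <- var(rowSums(items))       # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Simulated 4-point items driven by one common factor (made-up data)
set.seed(42)
latent <- rnorm(300)
items <- sapply(1:8, function(i) {
  as.numeric(cut(latent + rnorm(300), breaks = c(-Inf, -0.5, 0.5, 1.5, Inf)))
})

alpha_sim <- cronbach_by_hand(items)
alpha_sim
```

The more the items vary together, the larger the variance of the sum score becomes relative to the summed item variances, and the closer alpha gets to 1.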

The following code block is meant to create a Likert table. For some reason, the code does not run. It is shown below, commented out:

#install.packages("likert") # only needed once, if the package is not yet installed
#library(likert)     # create basic Likert tables and plots
#library(foreign)
#library(ltm)     # required to calculate Cronbach's alpha
#library(kableExtra)

#vnames =c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")
#likert_df = df[,vnames]

#likert_df

#likert_table = likert(likert_df)$results 
#likert_numeric_df = as.data.frame(lapply((df[,vnames]), as.numeric))
#likert_table$Mean = unlist(lapply((likert_numeric_df[,vnames]), mean, na.rm=T)) 
# ... and append new columns to the data frame
#likert_table$Count = unlist(lapply((likert_numeric_df[,vnames]), function (x) sum(!is.na(x))))
#likert_table$Item = c(
  #d20="how much of the time during the past week you felt depressed?",
  #d21="…you felt that everything you did was an effort?",
  #d22="…your sleep was restless?",
  #d23="…you were happy?",
  #d24="…you felt lonely?",
  #d25="…you enjoyed life?",
  #d26="…you felt sad?",
  #d27="…you could not get going?")

# round all percentage values to 1 decimal digit
#likert_table[,2:6] = round(likert_table[,2:6],1)
# round means to 3 decimal digits
#likert_table[,7] = round(likert_table[,7],3)

# create formatted table
#kable_styling(kable(likert_table,
                    #caption = "Distribution of answers to the depression items (ESS round 11, all countries)"))
# create basic plot (code also valid)
#plot(likert(summary=likert_table[,1:6])) # limit to columns 1:6 to skip mean and count
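One plausible reason for the failure, stated here as an untested assumption, is that `likert()` expects factor columns sharing the same levels, whereas d20 to d27 were converted to numeric earlier. A comparable table can be built in base R without the likert package. The sketch below uses simulated answers and is not the original table:

```r
# Simulated answers to three 4-point items (made-up data, coded 1 to 4)
set.seed(7)
answers <- data.frame(
  d20 = sample(1:4, 100, replace = TRUE),
  d21 = sample(1:4, 100, replace = TRUE),
  d22 = sample(c(1:4, NA), 100, replace = TRUE)
)

# Percentage of each answer category per item, plus mean and number of valid answers
likert_like <- t(sapply(answers, function(x) {
  pct <- round(100 * prop.table(table(factor(x, levels = 1:4))), 1)
  c(setNames(as.numeric(pct), paste0("pct_", 1:4)),
    Mean  = round(mean(x, na.rm = TRUE), 3),
    Count = sum(!is.na(x)))
}))
likert_like
```

The resulting matrix has one row per item and can be passed to `kable()` in the same way as a data frame.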

Computing the Depression Score

# calculation of the average score (new variable named "depression")
# score = row-wise mean of the items = sum of item values / number of items
df_uk$depression = rowSums(df_uk[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]) / 8
# check that all columns (d20, d21, d22, etc.) are correctly spelled and exist in df_uk (not meaningful).
names(df_uk)

##   [1] "name"       "essround"   "edition"    "proddate"   "idno"      
##   [6] "cntry"      "dweight"    "pspwght"    "pweight"    "anweight"  
##  [11] "nwspol"     "netusoft"   "netustm"    "ppltrst"    "pplfair"   
##  [16] "pplhlp"     "polintr"    "psppsgva"   "actrolga"   "psppipla"  
##  [21] "cptppola"   "trstprl"    "trstlgl"    "trstplc"    "trstplt"   
##  [26] "trstprt"    "trstep"     "trstun"     "vote"       "prtvtdat"  
##  [31] "prtvtebe"   "prtvtchr"   "prtvtccy"   "prtvtffi"   "prtvtffr"  
##  [36] "prtvgde1"   "prtvgde2"   "prtvtegr"   "prtvthhu"   "prtvteis"  
##  [41] "prtvteie"   "prtvteit"   "prtvclt1"   "prtvclt2"   "prtvclt3"  
##  [46] "prtvtinl"   "prtvtcno"   "prtvtfpl"   "prtvtept"   "prtvtbrs"  
##  [51] "prtvtesk"   "prtvtgsi"   "prtvtges"   "prtvtdse"   "prtvthch"  
##  [56] "prtvtdgb"   "contplt"    "donprty"    "badge"      "sgnptit"   
##  [61] "pbldmna"    "bctprd"     "pstplonl"   "volunfp"    "clsprty"   
##  [66] "prtcleat"   "prtclebe"   "prtclbhr"   "prtclccy"   "prtclgfi"  
##  [71] "prtclgfr"   "prtclgde"   "prtclegr"   "prtclihu"   "prtcleis"  
##  [76] "prtclfie"   "prtclfit"   "prtclclt"   "prtclhnl"   "prtclcno"  
##  [81] "prtcljpl"   "prtclgpt"   "prtclbrs"   "prtclesk"   "prtclgsi"  
##  [86] "prtclhes"   "prtcldse"   "prtclhch"   "prtcldgb"   "prtdgcl"   
##  [91] "lrscale"    "stflife"    "stfeco"     "stfgov"     "stfdem"    
##  [96] "stfedu"     "stfhlth"    "gincdif"    "freehms"    "hmsfmlsh"  
## [101] "hmsacld"    "euftf"      "lrnobed"    "loylead"    "imsmetn"   
## [106] "imdfetn"    "impcntr"    "imbgeco"    "imueclt"    "imwbcnt"   
## [111] "happy"      "sclmeet"    "inprdsc"    "sclact"     "crmvct"    
## [116] "aesfdrk"    "health"     "hlthhmp"    "atchctr"    "atcherp"   
## [121] "rlgblg"     "rlgdnm"     "rlgdnbat"   "rlgdnacy"   "rlgdnafi"  
## [126] "rlgdnade"   "rlgdnagr"   "rlgdnhu"    "rlgdnais"   "rlgdnie"   
## [131] "rlgdnlt"    "rlgdnanl"   "rlgdnno"    "rlgdnapl"   "rlgdnapt"  
## [136] "rlgdnrs"    "rlgdnask"   "rlgdnase"   "rlgdnach"   "rlgdngb"   
## [141] "rlgblge"    "rlgdnme"    "rlgdebat"   "rlgdeacy"   "rlgdeafi"  
## [146] "rlgdeade"   "rlgdeagr"   "rlgdehu"    "rlgdeais"   "rlgdeie"   
## [151] "rlgdelt"    "rlgdeanl"   "rlgdeno"    "rlgdeapl"   "rlgdeapt"  
## [156] "rlgders"    "rlgdeask"   "rlgdease"   "rlgdeach"   "rlgdegb"   
## [161] "rlgdgr"     "rlgatnd"    "pray"       "dscrgrp"    "dscrrce"   
## [166] "dscrntn"    "dscrrlg"    "dscrlng"    "dscretn"    "dscrage"   
## [171] "dscrgnd"    "dscrsex"    "dscrdsb"    "dscroth"    "dscrdk"    
## [176] "dscrref"    "dscrnap"    "dscrna"     "ctzcntr"    "brncntr"   
## [181] "cntbrthd"   "livecnta"   "lnghom1"    "lnghom2"    "feethngr"  
## [186] "facntr"     "fbrncntc"   "mocntr"     "mbrncntc"   "ccnthum"   
## [191] "ccrdprs"    "wrclmch"    "admrclc"    "testjc34"   "testjc35"  
## [196] "testjc36"   "testjc37"   "testjc38"   "testjc39"   "testjc40"  
## [201] "testjc41"   "testjc42"   "vteurmmb"   "vteubcmb"   "ctrlife"   
## [206] "etfruit"    "eatveg"     "dosprt"     "cgtsmok"    "alcfreq"   
## [211] "alcwkdy"    "alcwknd"    "icgndra"    "alcbnge"    "height"    
## [216] "weighta"    "dshltgp"    "dshltms"    "dshltnt"    "dshltref"  
## [221] "dshltdk"    "dshltna"    "medtrun"    "medtrnp"    "medtrnt"   
## [226] "medtroc"    "medtrnl"    "medtrwl"    "medtrnaa"   "medtroth"  
## [231] "medtrnap"   "medtrref"   "medtrdk"    "medtrna"    "medtrnu"   
## [236] "hlpfmly"    "hlpfmhr"    "trhltacu"   "trhltacp"   "trhltcm"   
## [241] "trhltch"    "trhltos"    "trhltho"    "trhltht"    "trhlthy"   
## [246] "trhltmt"    "trhltpt"    "trhltre"    "trhltsh"    "trhltnt"   
## [251] "trhltref"   "trhltdk"    "trhltna"    "fltdpr"     "flteeff"   
## [256] "slprl"      "wrhpp"      "fltlnl"     "enjlf"      "fltsd"     
## [261] "cldgng"     "hltprhc"    "hltprhb"    "hltprbp"    "hltpral"   
## [266] "hltprbn"    "hltprpa"    "hltprpf"    "hltprsd"    "hltprsc"   
## [271] "hltprsh"    "hltprdi"    "hltprnt"    "hltprref"   "hltprdk"   
## [276] "hltprna"    "hltphhc"    "hltphhb"    "hltphbp"    "hltphal"   
## [281] "hltphbn"    "hltphpa"    "hltphpf"    "hltphsd"    "hltphsc"   
## [286] "hltphsh"    "hltphdi"    "hltphnt"    "hltphnap"   "hltphref"  
## [291] "hltphdk"    "hltphna"    "hltprca"    "cancfre"    "cnfpplh"   
## [296] "fnsdfml"    "jbexpvi"    "jbexpti"    "jbexpml"    "jbexpmc"   
## [301] "jbexpnt"    "jbexpnap"   "jbexpref"   "jbexpdk"    "jbexpna"   
## [306] "jbexevl"    "jbexevh"    "jbexevc"    "jbexera"    "jbexecp"   
## [311] "jbexebs"    "jbexent"    "jbexenap"   "jbexeref"   "jbexedk"   
## [316] "jbexena"    "nobingnd"   "likrisk"    "liklead"    "sothnds"   
## [321] "actcomp"    "mascfel"    "femifel"    "impbemw"    "trmedmw"   
## [326] "trwrkmw"    "trplcmw"    "trmdcnt"    "trwkcnt"    "trplcnt"   
## [331] "eqwrkbg"    "eqpolbg"    "eqmgmbg"    "eqpaybg"    "eqparep"   
## [336] "eqparlv"    "freinsw"    "fineqpy"    "wsekpwr"    "weasoff"   
## [341] "wlespdm"    "wexashr"    "wprtbym"    "wbrgwrm"    "hhmmb"     
## [346] "gndr"       "gndr2"      "gndr3"      "gndr4"      "gndr5"     
## [351] "gndr6"      "gndr7"      "gndr8"      "gndr9"      "gndr10"    
## [356] "gndr11"     "gndr12"     "yrbrn"      "agea"       "yrbrn2"    
## [361] "yrbrn3"     "yrbrn4"     "yrbrn5"     "yrbrn6"     "yrbrn7"    
## [366] "yrbrn8"     "yrbrn9"     "yrbrn10"    "yrbrn11"    "yrbrn12"   
## [371] "rshipa2"    "rshipa3"    "rshipa4"    "rshipa5"    "rshipa6"   
## [376] "rshipa7"    "rshipa8"    "rshipa9"    "rshipa10"   "rshipa11"  
## [381] "rshipa12"   "rshpsts"    "rshpsgb"    "lvgptnea"   "dvrcdeva"  
## [386] "marsts"     "marstgb"    "maritalb"   "chldhhe"    "domicil"   
## [391] "paccmoro"   "paccdwlr"   "pacclift"   "paccnbsh"   "paccocrw"  
## [396] "paccxhoc"   "paccnois"   "paccinro"   "paccnt"     "paccref"   
## [401] "paccdk"     "paccna"     "edulvlb"    "eisced"     "edlveat"   
## [406] "edlvebe"    "edlvehr"    "edlvgcy"    "edlvdfi"    "edlvdfr"   
## [411] "edudde1"    "educde2"    "edlvegr"    "edlvdahu"   "edlvdis"   
## [416] "edlvdie"    "edlvfit"    "edlvdlt"    "edlvenl"    "edlveno"   
## [421] "edlvipl"    "edlvept"    "edlvdrs"    "edlvdsk"    "edlvesi"   
## [426] "edlvies"    "edlvdse"    "edlvdch"    "educgb1"    "edubgb2"   
## [431] "edagegb"    "eduyrs"     "pdwrk"      "edctn"      "uempla"    
## [436] "uempli"     "dsbld"      "rtrd"       "cmsrv"      "hswrk"     
## [441] "dngoth"     "dngref"     "dngdk"      "dngna"      "mainact"   
## [446] "mnactic"    "crpdwk"     "pdjobev"    "pdjobyr"    "emplrel"   
## [451] "emplno"     "wrkctra"    "estsz"      "jbspv"      "njbspv"    
## [456] "wkdcorga"   "iorgact"    "wkhct"      "wkhtot"     "nacer2"    
## [461] "tporgwk"    "isco08"     "wrkac6m"    "uemp3m"     "uemp12m"   
## [466] "uemp5yr"    "mbtru"      "hincsrca"   "hinctnta"   "hincfel"   
## [471] "edulvlpb"   "eiscedp"    "edlvpfat"   "edlvpebe"   "edlvpehr"  
## [476] "edlvpgcy"   "edlvpdfi"   "edlvpdfr"   "edupdde1"   "edupcde2"  
## [481] "edlvpegr"   "edlvpdahu"  "edlvpdis"   "edlvpdie"   "edlvpfit"  
## [486] "edlvpdlt"   "edlvpenl"   "edlvpeno"   "edlvphpl"   "edlvpept"  
## [491] "edlvpdrs"   "edlvpdsk"   "edlvpesi"   "edlvphes"   "edlvpdse"  
## [496] "edlvpdch"   "edupcgb1"   "edupbgb2"   "edagepgb"   "pdwrkp"    
## [501] "edctnp"     "uemplap"    "uemplip"    "dsbldp"     "rtrdp"     
## [506] "cmsrvp"     "hswrkp"     "dngothp"    "dngdkp"     "dngnapp"   
## [511] "dngrefp"    "dngnap"     "mnactp"     "crpdwkp"    "isco08p"   
## [516] "emprelp"    "wkhtotp"    "edulvlfb"   "eiscedf"    "edlvfeat"  
## [521] "edlvfebe"   "edlvfehr"   "edlvfgcy"   "edlvfdfi"   "edlvfdfr"  
## [526] "edufcde1"   "edufbde2"   "edlvfegr"   "edlvfdahu"  "edlvfdis"  
## [531] "edlvfdie"   "edlvffit"   "edlvfdlt"   "edlvfenl"   "edlvfeno"  
## [536] "edlvfgpl"   "edlvfept"   "edlvfdrs"   "edlvfdsk"   "edlvfesi"  
## [541] "edlvfges"   "edlvfdse"   "edlvfdch"   "edufcgb1"   "edufbgb2"  
## [546] "edagefgb"   "emprf14"    "occf14b"    "edulvlmb"   "eiscedm"   
## [551] "edlvmeat"   "edlvmebe"   "edlvmehr"   "edlvmgcy"   "edlvmdfi"  
## [556] "edlvmdfr"   "edumcde1"   "edumbde2"   "edlvmegr"   "edlvmdahu" 
## [561] "edlvmdis"   "edlvmdie"   "edlvmfit"   "edlvmdlt"   "edlvmenl"  
## [566] "edlvmeno"   "edlvmgpl"   "edlvmept"   "edlvmdrs"   "edlvmdsk"  
## [571] "edlvmesi"   "edlvmges"   "edlvmdse"   "edlvmdch"   "edumcgb1"  
## [576] "edumbgb2"   "edagemgb"   "emprm14"    "occm14b"    "atncrse"   
## [581] "anctrya1"   "anctrya2"   "regunit"    "region"     "ipcrtiva"  
## [586] "impricha"   "ipeqopta"   "ipshabta"   "impsafea"   "impdiffa"  
## [591] "ipfrulea"   "ipudrsta"   "ipmodsta"   "ipgdtima"   "impfreea"  
## [596] "iphlppla"   "ipsucesa"   "ipstrgva"   "ipadvnta"   "ipbhprpa"  
## [601] "iprspota"   "iplylfra"   "impenva"    "imptrada"   "impfuna"   
## [606] "testji1"    "testji2"    "testji3"    "testji4"    "testji5"   
## [611] "testji6"    "testji7"    "testji8"    "testji9"    "respc19a"  
## [616] "symtc19"    "symtnc19"   "vacc19"     "recon"      "inwds"     
## [621] "ainws"      "ainwe"      "binwe"      "cinwe"      "dinwe"     
## [626] "einwe"      "finwe"      "hinwe"      "iinwe"      "kinwe"     
## [631] "rinwe"      "inwde"      "jinws"      "jinwe"      "inwtm"     
## [636] "mode"       "domain"     "prob"       "stratum"    "psu"       
## [641] "d20"        "d21"        "d22"        "d23"        "d24"       
## [646] "d25"        "d26"        "d27"        "depression"

# descriptive statistics of depression (to describe the variable).
summary(df_uk$depression)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.375   1.625   1.730   2.000   4.000      49
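The missing scores arise because `rowSums()` returns NA as soon as a single item is missing. The sketch below contrasts this strict rule with a more lenient alternative on a toy matrix. The threshold of at least 6 answered items is an assumption made for illustration and is not part of the original analysis.

```r
# Toy item matrix: respondent 2 skipped one item, respondent 3 skipped five
items <- rbind(
  c(1, 2, 1, 2, 1, 2, 1, 2),
  c(2, 2, NA, 2, 2, 2, 2, 2),
  c(3, NA, NA, NA, NA, NA, 3, 3)
)

# Strict rule: the score is NA if any item is missing
strict_score <- rowSums(items) / 8

# Assumed alternative: average the answered items if at least 6 of 8 are present
answered <- rowSums(!is.na(items))
lenient_score <- ifelse(answered >= 6, rowMeans(items, na.rm = TRUE), NA)

strict_score
lenient_score
```

The strict rule keeps the score comparable across respondents, while the lenient rule retains more cases; which one is preferable depends on how the missing answers came about.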

hist(df_uk$depression, breaks = 6, main = "Histogram: Depression (UK)",
xlab = "Depression (UK)", col = ("lightblue")) # show skewness

# show also with breaks = 12
hist(df_uk$depression, breaks = 12, main = "Histogram: Depression (UK)", xlab = "Depression (UK)", col = ("maroon"))

Next, we examined the frequency distribution of the depression score in the UK and determined its mean, which is 1.73.

library(kableExtra)
library(knitr)
# check further (frequency table)
table(df_uk$depression)

## 
##     1 1.125  1.25 1.375   1.5 1.625  1.75 1.875     2 2.125  2.25 2.375   2.5 
##   103    98   172   201   167   158   146   144    94    78    55    43    34 
## 2.625  2.75 2.875     3 3.125  3.25 3.375   3.5 3.625  3.75 3.875     4 
##    35    27    14    15    11    11     9     9     1     3     3     4

table_dep=data.frame(table(df_uk$depression))



#kable(table_dep,
      #col.names = c("Depression Score","Frequency"),
      #caption = "Frequency Distribution of Depressionscores in the UK")

#kable_styling(
 # kable(table_dep,
     # col.names = c("Depression Score","Frequency"),
      #caption = "Frequency Distribution of Depressionscores in the UK"
      #),full_width = F, font_size = 13, bootstrap_options = c("hover", #"condensed"))

scroll_box(
  kable_styling(
  kable(table_dep, col.names = c("Depression Score","Frequency"),
      caption = "Frequency Distribution of Depression Scores in the UK"
      ),full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed")),height="300px")

Frequency Distribution of Depression Scores in the UK
Depression Score	Frequency
1	103
1.125	98
1.25	172
1.375	201
1.5	167
1.625	158
1.75	146
1.875	144
2	94
2.125	78
2.25	55
2.375	43
2.5	34
2.625	35
2.75	27
2.875	14
3	15
3.125	11
3.25	11
3.375	9
3.5	9
3.625	1
3.75	3
3.875	3
4	4

table_dep=data.frame(table(df_uk$depression))

# frequency distribution of the new variable (depression)
# interpretation:
# min. is 1, max. is 4
# at least one individual answered all items with 1 (lowest possible depression level) and at least one individual answered all items with 4 (highest possible depression level)
# most participants report low to moderate depression:
# median (1.625) and mean (1.730) are relatively low, indicating that most participants fall in the lower part of the depression scale (1 to 2 = 1283 participants - see below)
# only a few participants report higher levels on the depression scale (> 2 to 4 = 352 participants - see below)
# the mean is higher than the median - right skew
# note the missing data (49 NA's)
### just for a further check:
# counting participants with depression scores between 1 and 2
# counting participants with depression scores > 2 (up to 4)
# store the frequency table uk - depression in a new variable
depression_table_uk = table(df_uk$depression)
depression_table_uk

## 
##     1 1.125  1.25 1.375   1.5 1.625  1.75 1.875     2 2.125  2.25 2.375   2.5 
##   103    98   172   201   167   158   146   144    94    78    55    43    34 
## 2.625  2.75 2.875     3 3.125  3.25 3.375   3.5 3.625  3.75 3.875     4 
##    35    27    14    15    11    11     9     9     1     3     3     4

# print
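# sum of participants between 1 and 2 (inclusive)
# the names of a table are character strings, so convert them to numbers before comparing
depression_scale_1_to_2 = sum(depression_table_uk[as.numeric(names(depression_table_uk)) >= 1 & as.numeric(names(depression_table_uk)) <= 2])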
depression_scale_1_to_2 # print

## [1] 1283

# sum of participants > 2 (up to 4)
depression_scale_gt2 = sum(depression_table_uk[as.numeric(names(depression_table_uk)) > 2])
depression_scale_gt2 # print

## [1] 352

# double-check
depression_scale_1_to_2 + depression_scale_gt2 # 1635 participants (uk - depression), excluding 49 NA's

## [1] 1635
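Because the names of a table object are character strings, filtering on them compares text unless the names are converted to numbers first. With scores between 1 and 4 both comparisons give the same counts, but this does not hold in general. The toy example below uses invented scores, including a value of 10, to show the difference:

```r
# Invented scores; the value 10 is included only to expose the pitfall
scores <- c(1.5, 2, 2.25, 10)
tab <- table(scores)

names(tab)                                # character labels: "1.5" "2" "2.25" "10"
text_cmp <- names(tab) > 2                # compares text, so "10" > "2" is FALSE
num_cmp  <- as.numeric(names(tab)) > 2    # compares numbers, so 10 > 2 is TRUE

text_cmp
num_cmp
```

An alternative that avoids the table altogether is to count directly on the numeric vector, for example with `sum(x > 2, na.rm = TRUE)`.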

# HYPOTHESIS BUILDING
# determinants of depression
# variables:
# dscrsex: discrimination based on respondent's sexuality
# dscrrce: discrimination based on respondent's colour or race
# agea: age of respondent
# gndr: gender of respondent (male, female)
# hypothesis (all referring to the United Kingdom)
# H1: prevalence of depression increases with experienced discrimination based on an individual's sexuality
# H2: prevalence of depression increases with experienced discrimination based on an individual's skin colour or race
# H3: prevalence of depression decreases with age 
# H4: prevalence of depression among females compared to males is higher


# BIVARIATE ANALYSIS AND OPERATIONALIZATION (check these hypotheses on a bivariate level)
# hypothesis 1: prevalence of depression increases with experienced discrimination (sexuality; UK)
by(df_uk$depression, df_uk$dscrsex, mean, na.rm=T)

## df_uk$dscrsex: Not marked
## [1] 1.724875
## ------------------------------------------------------------ 
## df_uk$dscrsex: Marked
## [1] 1.927632

by(df_uk$depression, df_uk$dscrsex, summary, na.rm=T) # also meaningful for interpretation

## df_uk$dscrsex: Not marked
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.375   1.625   1.725   2.000   4.000      49 
## ------------------------------------------------------------ 
## df_uk$dscrsex: Marked
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.125   1.500   1.938   1.928   2.125   3.250

means_df=data.frame (by(df_uk$depression, df_uk$dscrsex, mean, na.rm=T))

#means_df

#kable(means_df)

Depression Score by Reported Discrimination Based on Sexuality
Discrimination (sexuality)	Average Depression Score
Not marked	1.724875
Marked	1.927632

# mean depression score for two groups (Not marked, Marked)
# Not marked (no discrimination based on sexuality reported) - Mean = 1.72
# Marked (discrimination based on sexuality was perceived or reported) - Mean = 1.93
# this is a difference of 0.203 points (on the scale)
# interpretation: In the UK, participants who report experiencing discrimination (sexuality) have, on average, higher depression scores
# compared to participants who do not report discrimination (sexuality).
# check further to see if this difference is statistically significant
# independent samples t-test?
# check for normal distribution of the data
# histogram for "not marked" group
hist(df_uk$depression[df_uk$dscrsex == "Not marked"], breaks = 12, main = "Histogram: Not marked", xlab = "Depression Score", col = "yellow")

# histogram for "Marked" group
hist(df_uk$depression[df_uk$dscrsex == "Marked"], breaks = 12, main = "Histogram: Marked", xlab = "Depression Score", col = "lightgreen")

# visual inspection of both histograms: data do not follow a normal distribution
# t-test for independent samples not possible - use Wilcoxon-test (Mann-Whitney-U-test) instead
wilcox.test(df_uk$depression ~ df_uk$dscrsex)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  df_uk$depression by df_uk$dscrsex
## W = 23457, p-value = 0.01628
## alternative hypothesis: true location shift is not equal to 0

# interpretation:
# p-value = 0.01628, which is below the significance level of 0.05
# there is a statistically significant difference between the two groups of dscrsex ("Not marked" and "Marked")
# in the location of the depression scores; the "Marked" group scores higher (median 1.938 vs. 1.625).
# H1 is supported at the bivariate level.
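The p-value alone does not say how large the group difference is. The sketch below reports the median and interquartile range per group and derives a rank-biserial correlation from the W statistic. The data are simulated, and the choice of the rank-biserial correlation as an effect-size measure is an assumption, not part of the original analysis.

```r
# Simulated scores for two groups (made-up data, not the ESS values)
set.seed(3)
score <- c(rnorm(150, mean = 1.7, sd = 0.5), rnorm(30, mean = 1.95, sd = 0.5))
group <- factor(rep(c("Not marked", "Marked"), times = c(150, 30)),
                levels = c("Not marked", "Marked"))

# Median and interquartile range per group
group_median <- tapply(score, group, median)
group_iqr    <- tapply(score, group, IQR)

# Rank-biserial correlation derived from W (assumed effect-size measure)
wt <- wilcox.test(score ~ group)
n1 <- sum(group == "Not marked")
n2 <- sum(group == "Marked")
rank_biserial <- 2 * unname(wt$statistic) / (n1 * n2) - 1

group_median
group_iqr
rank_biserial
```

The coefficient ranges from -1 to 1. A negative value means that the first group ("Not marked") tends to have lower scores than the second group.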


# hypothesis 2: prevalence of depression increases with experienced discrimination (skin colour or race; UK)
by(df_uk$depression, df_uk$dscrrce, mean, na.rm=T)

## df_uk$dscrrce: Not marked
## [1] 1.724269
## ------------------------------------------------------------ 
## df_uk$dscrrce: Marked
## [1] 1.866803

by(df_uk$depression, df_uk$dscrrce, summary, na.rm=T) # also meaningful for interpretation

## df_uk$dscrrce: Not marked
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.375   1.625   1.724   2.000   4.000      49 
## ------------------------------------------------------------ 
## df_uk$dscrrce: Marked
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.500   1.875   1.867   2.250   3.375

# mean depression score for two groups (Not marked, Marked)
# Not marked (no discrimination based on skin colour or race reported) - Mean = 1.72
# Marked (discrimination based on skin colour or race was perceived or reported) - Mean = 1.87
# this is a difference of 0.143 points (on the scale), which is an even smaller difference than for "dscrsex" - possibly only borderline significant (because of the standard deviation)
# interpretation: In the UK, participants who report experiencing discrimination (skin colour or race) have, on average, higher depression scores
# compared to participants who do not report discrimination (skin colour or race).
# check further to see if this difference is statistically significant
# which test is appropriate?
# check for normal distribution of the data
# histogram for "not marked" group
hist(df_uk$depression[df_uk$dscrrce == "Not marked"], breaks = 12, main = "Histogram: Not marked", xlab = "Depression Score", col = "pink")

# histogram for "Marked" group
hist(df_uk$depression[df_uk$dscrrce == "Marked"], breaks = 12, main = "Histogram: Marked", xlab = "Depression Score", col = "lightblue")

# histograms: probably no normal distribution of the data
# use Wilcoxon-test (rank based)
wilcox.test(df_uk$depression ~ df_uk$dscrrce)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  df_uk$depression by df_uk$dscrrce
## W = 38817, p-value = 0.0108
## alternative hypothesis: true location shift is not equal to 0

# interpretation:
# p-value = 0.0108 is < 0.05 (significance level)
# there is a significant difference between the two groups of dscrrce "Not marked" and "Marked"
# in terms of the location (ranks) of the depression scores; the rank-based test does not compare the means directly.
# H2 done.
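
# optional sketch (not part of the original analysis): the Wilcoxon test only says whether the groups differ,
# not how strongly. One common effect size is the rank-biserial correlation, computed here from W as
# 2*W/(n1*n2) - 1. The helper name "rank_biserial" is hypothetical; a negative value means that the first
# factor level (here "Not marked") tends to have lower scores than the second.
rank_biserial = function(score, group) {
  ok = !is.na(score) & !is.na(group)            # drop incomplete pairs
  score = score[ok]
  group = droplevels(factor(group[ok]))
  stopifnot(nlevels(group) == 2)                # only defined for two groups
  x = score[group == levels(group)[1]]
  y = score[group == levels(group)[2]]
  w = suppressWarnings(wilcox.test(x, y)$statistic)  # W for the first group (tie warnings silenced)
  unname(2 * w / (length(x) * length(y)) - 1)
}
# usage (assumes df_uk from above is loaded)
if (exists("df_uk")) print(rank_biserial(df_uk$depression, df_uk$dscrrce))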


# hypothesis 3: prevalence of depression decreases with age (UK)
table(df_uk$agea) # just to check first (not meaningful): youngest 15y, oldest 90y

## 
## 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 
##  5  8  9  6  7 10 12 11 10 19 18 26 15 16 20 25 19 32 34 30 22 40 24 37 19 20 
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 
## 27 16 28 29 22 21 29 37 20 27 22 17 27 20 24 20 24 25 26 31 31 25 25 26 21 29 
## 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 
## 31 33 23 36 24 32 27 23 26 27 22 18 28 31 21 18 13 14  9 10  5 10  7 16

# convert "agea" (age) into numeric
df_uk$age = as.numeric(as.character(df_uk[,"agea"]))
# check: scatter plot (visual inspection)
plot(df_uk$age, df_uk$depression, main = "Scatter Plot: Age, Depression" , xlab = "Age", ylab = "Depression")

# scatter plot shows: no linear pattern, so the Pearson product-moment correlation is not appropriate; visually there appears to be no relationship between the two variables.
# use spearman-correlation
# is there a statistically significant association between the two metric variables "depression" and "age"?
# and how strong is it?
cor(df_uk[, c("depression", "age")], method = "spearman", use = "complete.obs")

##             depression         age
## depression  1.00000000 -0.04156594
## age        -0.04156594  1.00000000

# interpretation:
# spearman's correlation coefficient between depression and age is -0.04 (very weak negative correlation).
# as age increases, depression score tends to decrease (and vice versa).
# the sign is in the direction predicted by H3. However:
# a correlation coefficient of -0.04 is very close to 0; it indicates a very weak relationship between depression and age, almost none (see also scatterplot).
# in the context of this dataset for the UK, age has little to no meaningful association with depression scores.
# check further for statistical significance: what is the p-value?
# store the test result in variable "pvalue" (it holds the full test output, not only the p-value)
pvalue = cor.test(df_uk$depression, df_uk$age, method = "spearman")
pvalue # print test result; p-value: 0.09598

## 
##  Spearman's rank correlation rho
## 
## data:  df_uk$depression and df_uk$age
## S = 717728947, p-value = 0.09598
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##         rho 
## -0.04156594

# interpretation:
# p-value = 0.09598 is > 0.05 (significance level), so H0 (no relationship) cannot be rejected.
# H3 is not supported by the sample data.
# H3 done.
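
# optional sketch (not part of the original analysis): a percentile bootstrap interval shows how precisely
# Spearman's rho is estimated. The helper name "spearman_boot_ci" and its defaults (2000 resamples, seed 1)
# are assumptions chosen for illustration. Note that set.seed() changes the global random number state.
spearman_boot_ci = function(x, y, n_boot = 2000, level = 0.95, seed = 1) {
  ok = complete.cases(x, y)                     # keep complete pairs only
  x = x[ok]
  y = y[ok]
  set.seed(seed)                                # reproducible resampling
  rho = replicate(n_boot, {
    i = sample.int(length(x), replace = TRUE)   # resample pairs with replacement
    cor(x[i], y[i], method = "spearman")
  })
  quantile(rho, c((1 - level) / 2, 1 - (1 - level) / 2), na.rm = TRUE)
}
# usage (assumes df_uk$age was created above); an interval that includes 0 agrees with a non-significant test
if (exists("df_uk")) print(spearman_boot_ci(df_uk$depression, df_uk$age))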


# hypothesis 4: prevalence of depression among females compared to males is higher (UK)
by(df_uk$depression, df_uk$gndr, mean, na.rm=T)

## df_uk$gndr: Male
## [1] 1.686875
## ------------------------------------------------------------ 
## df_uk$gndr: Female
## [1] 1.770509

by(df_uk$depression, df_uk$gndr, summary, na.rm=T) # some NA's

## df_uk$gndr: Male
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.375   1.625   1.687   1.875   4.000      24 
## ------------------------------------------------------------ 
## df_uk$gndr: Female
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.375   1.625   1.771   2.000   4.000      25

plot(df_uk$gndr, df_uk$depression) # boxplot

# mean depression score for two groups (Male, Female) - compare group means
# male (prevalence of depression is lower in male compared to female) - Mean = 1.69
# female (prevalence of depression is higher in female compared to male) - Mean = 1.77
# this is a difference of 0.084 points (on the scale)
# interpretation: In the UK, female participants have, on average, higher depression scores 
# compared to male participants.
# check further to see if this difference in means is statistically significant
# check for normal distribution of the data
# histogram for "Male" group
hist(df_uk$depression[df_uk$gndr == "Male"], breaks = 12, main = "Histogram: Male", xlab = "Depression Score", col = "orange")

# histogram for "Female" group
hist(df_uk$depression[df_uk$gndr == "Female"], breaks = 12, main = "Histogram: Female", xlab = "Depression Score", col = "red")

# both histograms: no normal distribution of the data (right skewed)
# use Wilcoxon-test
wilcox.test(df_uk$depression ~ df_uk$gndr)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  df_uk$depression by df_uk$gndr
## W = 303649, p-value = 0.001413
## alternative hypothesis: true location shift is not equal to 0

# interpretation:
# p-value = 0.001413 and is < 0.05 (significance level)
# there is a significant difference between the two groups of "gndr", namely "Male" and "Female"
# in terms of the location (ranks) of the depression scores; the rank-based test does not compare the means directly.
# H4 done.
# bivariate statistics end here.


# now we need metric information
# USE DUMMY CODING (convert categorical data into a numerical format (binary: 0, 1)) for "dscrsex", "dscrrce", "gndr"
## discrimination, sexuality (dscrsex): 0 = not marked, 1 = marked
## discrimination, skin colour or race (dscrrce): 0 = not marked, 1 = marked
## gender (gndr): 0 = male, 1 = female

# create dummy variable: marked (discrimination based on sexuality was perceived or reported)
# frequency distribution of "dscrsex"
table(df_uk$dscrsex)

## 
## Not marked     Marked 
##       1646         38

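# use a separate variable name per dummy so that the two discrimination dummies do not overwrite each other
df_uk$marked_sex = NA # initialize variable with NA
df_uk$marked_sex[df_uk$dscrsex=="Not marked"] = 0
df_uk$marked_sex[df_uk$dscrsex=="Marked"] = 1
# check
table(df_uk$marked_sex)

## 
##    0    1 
## 1646   38

table(df_uk$dscrsex, df_uk$marked_sex)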

##             
##                 0    1
##   Not marked 1646    0
##   Marked        0   38

# create dummy variable: marked (discrimination based on skin colour or race was perceived or reported)
# frequency distribution of "dscrrce"
table(df_uk$dscrrce)

## 
## Not marked     Marked 
##       1623         61

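# use a separate variable name per dummy so that the two discrimination dummies do not overwrite each other
df_uk$marked_rce = NA # initialize variable with NA
df_uk$marked_rce[df_uk$dscrrce=="Not marked"] = 0
df_uk$marked_rce[df_uk$dscrrce=="Marked"] = 1
# check
table(df_uk$marked_rce)

## 
##    0    1 
## 1623   61

table(df_uk$dscrrce, df_uk$marked_rce)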

##             
##                 0    1
##   Not marked 1623    0
##   Marked        0   61

# create dummy variable: female
# frequency distribution of "gndr"
table(df_uk$gndr)

## 
##   Male Female 
##    824    860

df_uk$female = NA #initialize variable with NA 
df_uk$female[df_uk$gndr=="Male"] = 0
df_uk$female[df_uk$gndr=="Female"] = 1
# check
table(df_uk$female)

## 
##   0   1 
## 824 860

table(df_uk$gndr, df_uk$female)

##         
##            0   1
##   Male   824   0
##   Female   0 860
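
# optional sketch: the three dummy variables above follow the same pattern, so a small helper can do the
# recoding in one step. The helper name "to_dummy" is hypothetical; missing values stay NA.
to_dummy = function(x, positive) {
  as.integer(as.character(x) == positive)       # 1 = category of interest, 0 = any other category
}
# usage (assumes df_uk from above is loaded): should reproduce the cross table shown above
if (exists("df_uk")) print(table(df_uk$gndr, to_dummy(df_uk$gndr, "Female")))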

# MULTIVARIATE ANALYSIS
# multivariate Modelling: put everything together in a multiple regression model
# how strongly do the variables dscrsex and dscrrce influence the depression score?
# model 1: discrimination effects (both, "dscrsex" and "dscrrce")
lm(depression ~ dscrsex + dscrrce, data = df_uk)

## 
## Call:
## lm(formula = depression ~ dscrsex + dscrrce, data = df_uk)
## 
## Coefficients:
##   (Intercept)  dscrsexMarked  dscrrceMarked  
##        1.7211         0.1757         0.1169

# if both independent variables are zero (no discrimination reported on either ground), depression is estimated at 1.72 on average (reference group)
# participants who report discrimination (sexuality) score 0.1757 depression points higher on average than those who do not, if experienced discrimination (skin colour or race) remains constant.
# participants who report discrimination (skin colour or race) score 0.1169 depression points higher on average than those who do not, if experienced discrimination (sexuality) remains constant.

# save model to show extended summary (lookup: p-value and R^2)
model = lm(depression ~ dscrsex + dscrrce, data = df_uk)
summary(model)

## 
## Call:
## lm(formula = depression ~ dscrsex + dscrrce, data = df_uk)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.88873 -0.34614 -0.09614  0.27886  2.27886 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.72114    0.01387 124.102   <2e-16 ***
## dscrsexMarked  0.17574    0.09125   1.926   0.0543 .  
## dscrrceMarked  0.11685    0.07254   1.611   0.1074    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5464 on 1632 degrees of freedom
##   (49 observations deleted due to missingness)
## Multiple R-squared:  0.004699,   Adjusted R-squared:  0.003479 
## F-statistic: 3.852 on 2 and 1632 DF,  p-value: 0.02142

# experienced/perceived discrimination (sexuality) effect is borderline significant (p < 0.1)
# experienced/perceived discrimination (skin colour or race) is not significant (p = 0.1074)
# R-squared = 0.47%, i.e. we can "explain" 0.47% of the total variation of depression by these two variables
# however, variable "dscrrce" is not significant
# keep it for now; it may be removed later on (in the final model).


# model 2: Add Age and gender effect to the model (structural variables)
# reminder: for male participants, variable "female" = 0; for female participants, variable "female" = 1
lm(depression ~ dscrsex + dscrrce + age + female, data=df_uk)

## 
## Call:
## lm(formula = depression ~ dscrsex + dscrrce + age + female, data = df_uk)
## 
## Coefficients:
##   (Intercept)  dscrsexMarked  dscrrceMarked            age         female  
##     1.7238869      0.1728510      0.1310334     -0.0009033      0.0874889

# if all independent variables are zero (no discrimination reported, age = 0, male), depression is estimated at 1.72 on average (unrealistic because of age = 0)
# participants who report discrimination (sexuality) score 0.1729 depression points higher on average if all others remain constant.
# participants who report discrimination (skin colour or race) score 0.1310 depression points higher on average if all others remain constant.
# for every 1-year increase in age, the depression score decreases by 0.0009033 on average if all others remain constant.
# as a person gets older, their depression score decreases slightly, but the coefficient is very small.
# female compared to male participants show higher depression by 0.0874889 points on average if all others remain constant.

# save model to show extended summary for all independent variables
model = lm(depression ~ dscrsex + dscrrce + age + female, data=df_uk)
summary(model)

## 
## Call:
## lm(formula = depression ~ dscrsex + dscrrce + age + female, data = df_uk)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.8531 -0.3993 -0.1195  0.2365  2.3258 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.7238869  0.0442303  38.975  < 2e-16 ***
## dscrsexMarked  0.1728510  0.0918700   1.881  0.06009 .  
## dscrrceMarked  0.1310334  0.0746986   1.754  0.07959 .  
## age           -0.0009033  0.0007264  -1.243  0.21390    
## female         0.0874889  0.0273102   3.204  0.00138 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5464 on 1600 degrees of freedom
##   (79 observations deleted due to missingness)
## Multiple R-squared:  0.01268,    Adjusted R-squared:  0.01022 
## F-statistic: 5.139 on 4 and 1600 DF,  p-value: 0.0004092

# experienced/perceived discrimination (sexuality) effect is borderline significant (p < 0.1)
# experienced/perceived discrimination (skin colour or race) effect is borderline significant (p < 0.1)
# age effect is not significant (p = 0.21390)
# gender effect is significant (p < 0.01)
# R-squared = 1.268% (rounded 1.3%), i.e. we can explain 1.3% of the total variation of depression by these determinants.
# however, age is not significant - remove age effect too
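
# optional sketch (not part of the original analysis): the models are fitted on different numbers of cases
# (49 vs. 79 observations deleted), so their R-squared values are not directly comparable. One way to
# compare nested models is an F-test on the cases that are complete for the larger model.
# The helper name "compare_nested" is hypothetical; it assumes the larger formula contains every variable of the smaller one.
compare_nested = function(data, small, large) {
  vars = all.vars(large)                               # all variables used by the larger model
  cc = data[complete.cases(data[, vars]), ]            # same cases for both models
  anova(lm(small, data = cc), lm(large, data = cc))    # F-test for the added terms
}
# usage (assumes df_uk with the variables "age" and "female" created above)
if (exists("df_uk")) print(compare_nested(df_uk, depression ~ dscrsex + female,
                                          depression ~ dscrsex + dscrrce + age + female))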


# PUT EVERYTHING TOGETHER TO OBTAIN THE FINAL MODEL (remove age effect and dscrrce effect)
lm(depression ~ dscrsex + female, data=df_uk)

## 
## Call:
## lm(formula = depression ~ dscrsex + female, data = df_uk)
## 
## Coefficients:
##   (Intercept)  dscrsexMarked         female  
##       1.68041        0.21535        0.08648

# if the independent variables (dscrsex + female) are zero, depression is estimated at 1.68 on average (male participants who do not report discrimination based on sexuality)
# participants who report discrimination (sexuality) score 0.21535 depression points higher on average if the other variable remains constant.
# female compared to male participants show higher depression by 0.08648 points on average if the other variable remains constant.

# depression = 1.68041 + 0.21535*dscrsexMarked + 0.08648*female
# we receive different models (differ by their intercept):
# one for participants who reported "marked", one for "not marked" participants, related to discrimination (sexuality)
# one for female participants, one for male participants
## dscrsex Marked: depression =      1.68041 + 0.21535*1 + 0.08648*female (= 1.89576 + 0.08648*female)
## dscrsex Not marked: depression =  1.68041 + 0.21535*0 + 0.08648*female (= 1.68041 + 0.08648*female)
## gndr male: depression =           1.68041 + 0.21535*dscrsexMarked + 0.08648*0 (= 1.68041 + 0.21535*dscrsexMarked)
## gndr female: depression =         1.68041 + 0.21535*dscrsexMarked + 0.08648*1 (= 1.76689 + 0.21535*dscrsexMarked)
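
# optional sketch: the group-specific equations above can be evaluated for all four combinations of
# discrimination (sexuality) and gender. The coefficients are copied from the lm() output above;
# the object names "b" and "groups" are chosen here for illustration.
b = c(intercept = 1.68041, dscrsexMarked = 0.21535, female = 0.08648)
groups = expand.grid(dscrsexMarked = 0:1, female = 0:1)   # all four combinations
groups$predicted = b[["intercept"]] +
  b[["dscrsexMarked"]] * groups$dscrsexMarked +
  b[["female"]] * groups$female
print(groups)   # one predicted depression score per group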


# save model to show extended summary
model = lm(depression ~ dscrsex + female, data=df_uk)
summary(model)

## 
## Call:
## lm(formula = depression ~ dscrsex + female, data = df_uk)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7708 -0.3919 -0.1419  0.2331  2.3196 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    1.68041    0.01946  86.348  < 2e-16 ***
## dscrsexMarked  0.21535    0.08957   2.404  0.01631 *  
## female         0.08648    0.02700   3.203  0.00138 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5452 on 1632 degrees of freedom
##   (49 observations deleted due to missingness)
## Multiple R-squared:  0.009346,   Adjusted R-squared:  0.008132 
## F-statistic: 7.698 on 2 and 1632 DF,  p-value: 0.0004704

# experienced/perceived discrimination (sexuality) effect is significant (p < 0.05)
# gender effect is significant (p < 0.01)
# R-squared = 0.9346% (rounded 0.93%), i.e. only 0.93% of the total variation of depression is explained by discrimination (sexuality) and gender.
# this means that there must be additional factors influencing depression that are not included in this model (the model does not explain the depression outcomes sufficiently).

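# Weighted and unweighted data
# store the two models under different names so that the second does not overwrite the first
# weighted model (post-stratification weight "pspwght")
model_weighted = lm(depression ~ dscrsex + dscrrce + age + female, data=df_uk, weights=pspwght)

# unweighted model
model_unweighted = lm(depression ~ dscrsex + dscrrce + age + female, data=df_uk)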
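
# optional sketch: to see what the weighting changes, the coefficients of both fits can be put side by side.
# The helper name "compare_weighting" and the temporary column ".w" are hypothetical. The weight is copied
# into the data frame because lm() looks up "weights" in the data, not inside the helper function.
compare_weighting = function(formula, data, weight_var) {
  data$.w = data[[weight_var]]                            # temporary weight column
  unweighted = lm(formula, data = data)
  weighted = lm(formula, data = data, weights = .w)
  cbind(unweighted = coef(unweighted), weighted = coef(weighted))
}
# usage (assumes df_uk contains the weight variable "pspwght")
if (exists("df_uk")) print(compare_weighting(depression ~ dscrsex + dscrrce + age + female, df_uk, "pspwght"))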

Weighted and Unweighted Data

Weighted:

Unweighted:

Discussion

Our results are in line with previous findings, for example that LGBTQ+ people are more likely to suffer from depression than straight people. The United Kingdom Survey on the Mental Health of LGBTQ+ (2024) highlighted this problem and argued that victimization, discrimination, and lack of access to affirming spaces result in poorer mental health. Our data are consistent with those findings: participants who reported discrimination based on their sexuality had significantly higher depression scores, both in the bivariate test and in the final regression model (p < 0.05).

Our finding that participants who reported discrimination based on skin colour or race had higher depression scores than those who did not could likewise be linked to victimization and a lack of affirming spaces. This difference was significant in the bivariate test (p = 0.0108) but did not reach the 5% significance level in the regression models. According to "Stop Hate UK", a help organization against hate crime in the UK, 43% of all hate crimes reported to their helpline were motivated by racism. This could result from the historical legacy of colonialism and empire, in which racism is deeply rooted. Another possible explanation is a lack of representation: ethnic minorities are underrepresented in positions of power across politics, media, and business.

Our results on the correlation between age and depression were not statistically significant (Spearman's rho = -0.04, p = 0.096). Age does not seem to have an influence on depression scores. The slight negative correlation could be interpreted as resilience rising with age, with older people being more settled and better able to withstand depression.

The gender gap between men and women extends to depression scores: we found a significant difference in depression scores between men and women. A possible explanation is the higher strain women face in our society, including lower pay and greater responsibility for housework and parenting.

Nevertheless, our regression analysis showed that discrimination based on sexuality and gender explain little of the variation in depression (R-squared below 1%). Therefore, further research is needed to identify stronger drivers of depression. According to the Mental Health Foundation (UK), "People living in the lowest socioeconomic groups are more likely to experience common mental health problems such as depression and anxiety." Loneliness is another strong driver of depression, especially in elderly people (Sheffield Hallam University, 2025). Furthermore, a lack of access to and inequalities in health care services in the UK account for higher depression rates (Royal College of Psychiatrists, 2025). These variables, as well as exercise, diet, and lifestyle choices, could be more dominant determinants of depression. Further research is needed to verify these associations.

