INTRODUCTION

Exploring Social Determinants and Depression: A Multivariate Analysis

Depression is one of the leading causes of disability, affecting about 280 million people worldwide (Paiva et al., 2023). It is characterized by persistent sadness, loss of interest, and cognitive impairment, with significant social and economic consequences. Approximately 4.3% of the global population is affected by depression, which contributes substantially to the financial and health costs captured in the global burden of disease (Liu et al., 2024). Research highlights that social determinants of health, such as sleep patterns, social support, loneliness, financial stress, age, gender, and education, strongly influence the onset, progression, and severity of depression (Onyekachi et al., 2024). The WHO reports higher rates of depression in high-income countries, while many cases in low- and middle-income countries (LMICs) go underdiagnosed and untreated because of limited mental health resources and stigma (WHO, 2021). Given the growing complexity of mental health issues globally, examining the relationship between social determinants of health and depression is essential for targeted interventions and policies.

Problem Statement

Depression is a significant and growing health concern that affects millions of people worldwide and contributes to the global disease burden (WHO, 2021). It can lead to self-harm or suicide and is characterized by persistent sadness, apathy, guilt, low self-confidence, poor sleep, fatigue, and difficulty concentrating. Social and emotional factors such as sleeplessness, loneliness, sadness, and unhappiness correlate strongly with the onset and severity of depression. Lack of sleep, loneliness, and unhappiness increase the risk of depression by impairing emotional regulation and reducing social support (Baglioni et al., 2011). Understanding these predictors is important for improving mental health interventions and reducing the global disease burden. Depression was chosen as the dependent variable because of its rising prevalence and the need to understand the social determinants that contribute to its onset and severity.

Literature review

Prevalence and relationship between depression and social health determinants

Social Health Determinants as Explanatory Variables:

In Finland, the prevalence of depression remains a pressing issue, with approximately 5.9% of the population experiencing depressive symptoms annually (OECD, 2023). Recent studies highlight that social health determinants, such as gender, age, income, employment status, education level, and social support, play a crucial role in influencing mental health outcomes (WHO, 2022).

In this study, we explore how social determinants such as happiness, lack of sleep, sadness, and loneliness relate to levels of depression among the population of Finland.

Sleeplessness with depression

Poor sleep patterns and sleep disturbances are strong predictors of depression. Insufficient sleep intensifies depressive symptoms such as sadness, fatigue, and cognitive impairment (Lee et al., 2024). Dong et al. (2022) show that people who experience chronic sleep disturbances are 1.9 times more likely to develop depression than those with healthy sleep patterns. Studies also show that a supportive social network helps mitigate the adverse effects of stress and reduces the risk of depression (Onyekachi et al., 2024).

Loneliness and social isolation

Loneliness is a powerful social determinant of depression. Research indicates that loneliness increases the risk of depression by 26%, particularly among the elderly and individuals with limited social support (Cacioppo et al., 2015). Social isolation reduces opportunities for emotional validation and meaningful interactions, which are crucial for mental well-being.

Sadness with depression

Persistent sadness is a core symptom of depression, closely associated with chronic stress, loss, and emotional trauma. Studies highlight that sadness leads to prolonged depressive episodes, especially when compounded by poor coping mechanisms and inadequate social support (Kessler et al., 2010).

Happiness as a Protective Factor

Happiness, while less studied in the context of depression, is known to be a buffer against mental illness (Seo et al., 2018). Positive emotions and strong social connections foster resilience and promote better mental health outcomes. Individuals with higher happiness levels are more likely to engage in self-care behaviors and social activities, reducing the risk of depression (Luis et al., 2021).

Methods

The study operationalized depression using the variables d20-d27, which were combined into the CES-D8 depression scale. The independent variables (sleeplessness, happiness, feelings of depression, sadness, and loneliness) record how frequently each feeling occurred in the past week, with four response categories: none or almost none of the time, some of the time, most of the time, and all or almost all of the time. Descriptive statistics summarized sample characteristics, while correlation analysis and multivariate regression analysis explored the relationship between social determinants of health and depression. The analysis used the Finnish subset of the data.

Hypotheses:

Null hypothesis (H0): There is no relationship between the independent variables and depression.

Alternative hypothesis (H1): There is an association between the independent variables and depression.

H1: More frequent feelings of depression (fltdpr) are associated with higher levels of depression.
H1: Sleeplessness (slprl) is associated with higher levels of depression.
H1: Sadness (fltsd) is associated with higher levels of depression.
H1: Loneliness (fltlnl) is associated with higher levels of depression.
H1: Happiness (wrhpp) is associated with lower levels of depression.
H1: Not being able to get going (cldgng) is associated with higher levels of depression.
H1: The feeling that everything was an effort (flteeff) is associated with higher levels of depression.

Results

The dataset used in this analysis is drawn from ESS11, restricted to Finland. The sample consists of 1563 respondents. The variables considered were the psychosocial determinants that make up the CES-D8 depression scale (sleeplessness, loneliness, sadness, happiness, motivation, and effort), which help to describe the emotional and mental health status of the sample population.

Below are the results obtained from the analysis methods used.

We first tested the chosen variables (fltlnl, slprl, fltsd, wrhpp, cldgng, and flteeff) for reliability. Cronbach’s alpha was 0.714, which indicates acceptable internal consistency for the CES-D8 depression scale.

The table below shows the descriptive statistics summarizing the distribution of the dependent variable (the CES-D8 depression score), with an interpretation of each value.

library(foreign) # to read SPSS and other formats
library(ltm)

## Loading required package: MASS

## Loading required package: msm

## Loading required package: polycor

library(kableExtra)
library(ggplot2)
library(broom)
library(likert)

## Loading required package: xtable

library(pseudo)

## Loading required package: KMsurv

## Loading required package: geepack

library(xtable)
library(MASS)
library(msm)
library(polycor)

df = read.spss("C:/Users/User/Downloads/ssp saffie/ESS11.sav", to.data.frame = TRUE)
# knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

Statistic	Value	Interpretation
Min (Minimum)	7.00	The lowest depression score in the dataset is 7.
1st Quartile (Q1)	9.00	25% of the participants scored ≤ 9 on the depression scale.
Median (Q2)	10.00	The middle value in the dataset (50% of participants scored ≤ 10).
Mean (Average)	10.78	The average depression score is 10.78.
3rd Quartile (Q3)	12.00	75% of participants scored ≤ 12 on the depression scale.
Max (Maximum)	28.00	The highest depression score in the dataset is 28.
NA’s (Missing Values)	18	There are 18 missing values in the dataset.

#we decided to subset Finland data
DataFI = subset(df,cntry == "Finland")
#DataFI

#selected explanatory variables/predictor variables
#fltlnl felt lonely, how often past week
#slprl sleep was restless, how often past week
#fltdpr felt depressed, how often past week
#wrhpp felt happy, how often past week
#fltsd felt sad, how often past week
#cldgng could not get going, how often past week
#flteeff felt everything did was an effort, how often past week
#enjlf enjoyed life, how often past week

# Now I am going to combine the d20-d27 variables to create the CES-D8 depression scale
# First I have to reverse "wrhpp" and "enjlf", as their indication of wellbeing is exactly reversed from the other variables
# I will do this with "5 -", so the reversed items still run from 1 to 4

DataFI$wrhpp_num = as.numeric(DataFI$wrhpp)

DataFI$wrhpp_num = 5 - DataFI$wrhpp_num  # reverse-code "felt happy"

DataFI$enjlf_num = as.numeric(DataFI$enjlf)
DataFI$enjlf_num = 5 - DataFI$enjlf_num  # reverse-code "enjoyed life" (item eight of the scale)

#table(DataFI$wrhpp_num)

# Now I transform the other scales into numeric ones to calculate with it

DataFI$fltdpr_num = as.numeric(DataFI$fltdpr) 
DataFI$flteeff_num = as.numeric(DataFI$flteeff) 
DataFI$slprl_num = as.numeric(DataFI$slprl) 
DataFI$fltlnl_num = as.numeric(DataFI$fltlnl) 
DataFI$fltsd_num = as.numeric(DataFI$fltsd) 
DataFI$cldgng_num = as.numeric(DataFI$cldgng) 
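The reverse-coding step above can be checked on a tiny example. The sketch below is only an illustration: the factor `toy_happy` and its level labels are made-up stand-ins for an item such as wrhpp, and it assumes the factor levels are ordered from "none" to "all", as they appear in the frequency table of this report.

```r
# Four response categories, ordered from lowest to highest frequency (assumed order)
levels_ces <- c("None or almost none of the time", "Some of the time",
                "Most of the time", "All or almost all of the time")

# Made-up answers to a positively worded item (illustrative only, not ESS data)
toy_happy <- factor(c("Most of the time", "None or almost none of the time",
                      "All or almost all of the time", NA),
                    levels = levels_ces)

toy_num <- as.numeric(toy_happy)  # level codes 1-4; missing answers stay NA
toy_rev <- 5 - toy_num            # reverse-coded: 1 <-> 4 and 2 <-> 3

print(data.frame(response = toy_happy, coded = toy_num, reversed = toy_rev))
```

After reversal, a high value means "rarely happy", so the item points in the same direction as the negatively worded items.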

# long version
subset_vars <- DataFI[, c("fltdpr_num", "enjlf_num", "slprl_num", "wrhpp_num", "fltlnl_num", "fltsd_num", "flteeff_num", "cldgng_num")]

# Item means are computed on the numeric versions of the items;
# calling mean() on the original factor columns only returns NA with a warning
likert_means = lapply(subset_vars, mean, na.rm = TRUE)

library(likert)
kable_styling(kable(likert(DataFI[, c("fltdpr", "enjlf", "slprl", "wrhpp", "fltlnl", "fltsd", "flteeff", "cldgng")])$results))

Item	None or almost none of the time	Some of the time	Most of the time	All or almost all of the time
fltdpr	82.500000	15.00000	1.730769	0.7692308
enjlf	3.727506	21.97943	52.120823	22.1722365
slprl	42.233633	46.40565	7.894737	3.4659820
wrhpp	3.848621	23.28416	58.370750	14.4964721
fltlnl	78.745199	17.15749	2.816901	1.2804097
fltsd	69.186419	28.63549	1.729661	0.4484305
flteeff	59.858703	32.56262	5.587669	1.9910083
cldgng	50.965251	40.92664	6.177606	1.9305019

# Now I can sum the eight item scores of each respondent to obtain the CES-D8 score

DataFI$CES_D8 = rowSums(DataFI[, c("fltdpr_num","enjlf_num", "flteeff_num", "slprl_num", "wrhpp_num", "fltlnl_num", "fltsd_num", "cldgng_num")])

##summary(DataFI$CES_D8)
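One detail of the sum score is worth keeping in mind: `rowSums()` without `na.rm = TRUE` returns NA for any respondent who skipped at least one item, which is where missing values in the scale come from. A minimal sketch with made-up data (the data frame `toy` and its columns are hypothetical):

```r
# Three made-up items; the second respondent skipped item b
toy <- data.frame(a = c(1, 2, 4),
                  b = c(1, NA, 4),
                  c = c(2, 3, 4))

# Sum score: NA as soon as one item is missing
toy$score <- rowSums(toy[, c("a", "b", "c")])

print(toy)
summary(toy$score)  # the summary reports the number of NA's separately
```

Keeping incomplete respondents as NA avoids treating a skipped item as a low score.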

For the bivariate association, we used correlation analysis and regression analysis between the independent variables and depression (CES-D8).

The correlation analysis shows that the CES-D8 score is positively associated with loneliness, sadness, sleeplessness, difficulty getting going, and the feeling that everything was an effort, while happiness correlates negatively with depression. The p-value was below 2e-16 for every independent variable, indicating statistically significant associations with depression.
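As an illustration of how such a correlation and its p-value can be obtained, the sketch below runs `cor.test()` on simulated data. The variables `lonely` and `score` are invented for the example and are not the ESS variables; the positive relationship is built into the simulation.

```r
set.seed(1)
n <- 200

# Simulated four-point item and a score that rises with it (illustrative only)
lonely <- sample(1:4, n, replace = TRUE)
score  <- 8 + 2 * lonely + rnorm(n)

# Pearson correlation with a test of H0: correlation = 0
ct <- cor.test(lonely, score, method = "pearson")
print(ct$estimate)  # correlation coefficient
print(ct$p.value)   # p-value of the test
```

For ordinal four-point items, `method = "spearman"` is a common alternative that relies only on the rank order of the answers.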

The multivariate regression model

The regression analysis showed a strong relationship between the predictor variables and depression (CES-D8 scale). The intercept (1.035) is the predicted depression score when all independent variables equal zero; because the items are coded from 1 to 4, this value lies outside the observed range and serves only as a model constant. All predictor variables have significant coefficients (p < 2e-16), indicating that higher levels of these factors are associated with higher depression scores. The R-squared value (0.9468) indicates that the model explains approximately 94.68% of the variation in depression scores. Because the predictors are themselves items that are summed into the CES-D8 score, a high R-squared is expected by construction and should be read with that in mind. The F-statistic (4563, p < 2.2e-16) confirms that the model as a whole is statistically significant.
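When a sum score is regressed on the items it is built from, the fit is largely determined by construction. The sketch below illustrates this with simulated, mutually independent items (an assumption made only for the example; real CES-D8 items are correlated): including all eight items reproduces the total exactly, and leaving one out still yields a high R-squared.

```r
set.seed(42)
n <- 300

# Eight simulated items on a 1-4 scale (illustrative only, not ESS data)
items <- as.data.frame(matrix(sample(1:4, n * 8, replace = TRUE), ncol = 8))
names(items) <- paste0("item", 1:8)
items$total <- rowSums(items[, paste0("item", 1:8)])

# Total regressed on all eight of its own components: a perfect fit
fit_all <- lm(total ~ ., data = items)
r2_all  <- suppressWarnings(summary(fit_all))$r.squared

# Total regressed on seven of the eight components
fit_seven <- lm(total ~ . - item8, data = items)
r2_seven  <- summary(fit_seven)$r.squared

print(c(all_items = r2_all, seven_items = r2_seven))
```

`suppressWarnings()` is used because `summary()` warns about an essentially perfect fit in the first model.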
## Predictors of Clinically Significant Depression

In this section, we extend our analysis by examining predictors of clinically significant depressive symptoms, using a binary outcome variable derived from the CES-D8 depression scale.

Creating Binary Outcome Variable

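Following Briggs et al. (2018), we use a cut-off score of 9 on the CES-D8 scale to classify participants with clinically significant depression. Because each of the eight items is coded from 1 to 4, the summed score ranges from 8 to 32; we therefore subtract 8 to rescale it to a range of 0 to 24 before applying the cut-off.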

# Create binary outcome for clinical depression based on CES-D8 score
DataFI$depression_clinical <- ifelse((DataFI$CES_D8-8) >= 9, 1, 0)

# Frequency distribution
table(DataFI$depression_clinical)

## 
##    0    1 
## 1366  175

prop.table(table(DataFI$depression_clinical))

## 
##         0         1 
## 0.8864374 0.1135626
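The recoding above can be traced on a few made-up sum scores. This is only a sketch: the vector `toy_sum` is hypothetical, and the cut-off of 9 on the rescaled 0-24 range is the one stated in the text.

```r
# Made-up raw sum scores (possible range 8-32); NA marks an incomplete respondent
toy_sum <- c(8, 12, 16, 17, 25, NA)

toy_shifted <- toy_sum - 8                   # rescale to the 0-24 range
toy_case <- ifelse(toy_shifted >= 9, 1, 0)   # 1 = at or above the cut-off

print(data.frame(raw = toy_sum, shifted = toy_shifted, case = toy_case))
table(toy_case, useNA = "ifany")
```

A raw score of 17 is the lowest value classified as a case, and respondents with a missing score remain NA instead of being counted as non-cases.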

# Fit logistic regression model
DataFI$gndr <- as.factor(DataFI$gndr)
DataFI$agea <- as.numeric(DataFI$agea)
DataFI$eduyrs <- as.numeric(DataFI$eduyrs)
DataFI$emplrel <- as.numeric(DataFI$emplrel)

 model_logistic_full <- glm(depression_clinical ~  gndr + agea + eduyrs + emplrel,
                           data = DataFI, family = binomial)

# View model summary
summary(model_logistic_full)

## 
## Call:
## glm(formula = depression_clinical ~ gndr + agea + eduyrs + emplrel, 
##     family = binomial, data = DataFI)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -1.256910   0.487249  -2.580  0.00989 **
## gndrFemale   0.301363   0.166969   1.805  0.07109 . 
## agea        -0.007809   0.004296  -1.818  0.06912 . 
## eduyrs      -0.034508   0.020830  -1.657  0.09760 . 
## emplrel     -0.131406   0.226225  -0.581  0.56133   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1060.2  on 1516  degrees of freedom
## Residual deviance: 1051.5  on 1512  degrees of freedom
##   (46 observations deleted due to missingness)
## AIC: 1061.5
## 
## Number of Fisher Scoring iterations: 5

exp(coef(model_logistic_full))

## (Intercept)  gndrFemale        agea      eduyrs     emplrel 
##   0.2845319   1.3517003   0.9922216   0.9660806   0.8768620

# Calculate Confidence Intervals for ORs
exp(confint(model_logistic_full))

## Waiting for profiling to be done...

##                 2.5 %    97.5 %
## (Intercept) 0.1097772 0.7428004
## gndrFemale  0.9760106 1.8799047
## agea        0.9839235 1.0006547
## eduyrs      0.9269038 1.0057911
## emplrel     0.5463544 1.3335323
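Odds ratios and their confidence intervals are easier to read side by side. The sketch below builds such a table for a logistic model fitted to simulated data; the variables `female`, `age`, and `case` are invented for the example. It uses Wald intervals from `confint.default()`, which can differ slightly from the profile-likelihood intervals that `confint()` reports above.

```r
set.seed(7)
n <- 500

# Simulated predictors and binary outcome (illustrative only, not ESS data)
female <- rbinom(n, 1, 0.5)
age    <- round(runif(n, 18, 90))
case   <- rbinom(n, 1, plogis(-2 + 0.4 * female - 0.01 * age))

fit <- glm(case ~ female + age, family = binomial)

# Exponentiate coefficients and Wald confidence limits to get odds ratios
or_table <- cbind(OR = exp(coef(fit)), exp(confint.default(fit)))
print(round(or_table, 3))
```

A predictor whose interval contains 1 is not statistically significant at the 5% level.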

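# McFadden's pseudo R-squared: 1 - (residual deviance / null deviance)
r_mcfadden = with(summary(model_logistic_full), 1 - deviance/null.deviance)
# Note: the expression below is an ad hoc rescaling of McFadden's value. It is not the standard
# Nagelkerke formula, (1 - exp((D - D0)/n)) / (1 - exp(-D0/n)), where D is the residual deviance,
# D0 the null deviance, and n the number of observations used in the fit
r_nagelkerke = with(summary(model_logistic_full), r_mcfadden/(1 - (null.deviance / nrow(model_logistic_full$data)*log(2))))
r_nagelkerke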

## [1] 0.01552694
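For comparison, the textbook definitions of the McFadden, Cox-Snell, and Nagelkerke pseudo R-squared measures can be computed directly from the deviances of a fitted binary logistic model. The sketch below does so on simulated data; `x` and `y` are invented, and the formulas assume ungrouped 0/1 outcomes, where the deviance equals minus twice the log-likelihood.

```r
set.seed(11)
n <- 400

# Simulated predictor and binary outcome (illustrative only)
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 0.8 * x))
fit <- glm(y ~ x, family = binomial)

n_used <- nobs(fit)         # observations actually used in the fit
d0 <- fit$null.deviance     # deviance of the intercept-only model
d1 <- fit$deviance          # deviance of the fitted model

r2_mcfadden   <- 1 - d1 / d0
r2_coxsnell   <- 1 - exp((d1 - d0) / n_used)
r2_nagelkerke <- r2_coxsnell / (1 - exp(-d0 / n_used))  # rescaled to a maximum of 1

print(round(c(McFadden = r2_mcfadden, CoxSnell = r2_coxsnell,
              Nagelkerke = r2_nagelkerke), 4))
```

Using `nobs(fit)` instead of the number of rows in the data keeps the calculation correct when observations were dropped because of missing values.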

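Interpretation:

(Intercept): Baseline odds of clinical depression (OR = 0.28); not very interpretable on its own but included as a model constant.

gndrFemale: Females have higher estimated odds of clinical depression than males (OR = 1.35), but the 95% confidence interval (0.98 to 1.88) includes 1 and p = 0.071, so the difference is not statistically significant at the 0.05 level.

agea: No significant association with the odds of clinical depression (OR = 0.99, 95% CI 0.98 to 1.00, p = 0.069).

eduyrs: No significant association with the odds of clinical depression (OR = 0.97, 95% CI 0.93 to 1.01, p = 0.098).

emplrel: No significant association with the odds of clinical depression (OR = 0.88, 95% CI 0.55 to 1.33, p = 0.561).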

Discussion

The hypotheses stated above were tested against the conventional 0.05 significance threshold. The p-values were below 2e-16, indicating a strong relationship between the predictor variables and the dependent variable, depression.

We therefore reject the null hypothesis in favor of the alternative hypotheses: the empirical findings support the claim that sleeplessness, loneliness, sadness, and general emotional distress are significantly associated with higher levels of depression. The positive and statistically significant coefficients in the regression model indicate that more frequent feelings of sadness, loneliness, or restlessness, as well as struggles with motivation and effort, are strongly associated with higher CES-D8 depression scores. The R-squared of 0.9468 shows that the selected predictors account for 94.68% of the variation in depression levels, which supports the hypothesis that these factors are strong determinants of mental health outcomes.

Limitations and quality criteria

1. The analysis was restricted to one country, Finland, which limits the generalizability of the findings to the other countries in the dataset.

2. The study was limited to selected social determinants and did not cover all determinants that may be important in this context.

3. Cronbach’s alpha was used to assess the reliability (internal consistency) of the items; with an alpha of 0.714, the items were considered consistent enough to be combined into one scale.
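As a rough illustration of how such a reliability coefficient is obtained, the sketch below implements the standard Cronbach's alpha formula in base R on a small made-up item matrix. The data and the helper name `cronbach_alpha_sketch` are assumptions for the example only; the `ltm` package loaded in this report provides `cronbach.alpha()` for the same purpose.

```r
# Cronbach's alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
cronbach_alpha_sketch <- function(items) {
  items <- na.omit(as.matrix(items))  # keep complete rows only
  k <- ncol(items)
  item_vars <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Made-up responses on a 1-4 scale (illustrative only, not ESS data)
toy_items <- data.frame(
  item1 = c(1, 2, 3, 4, 2, 1),
  item2 = c(1, 2, 3, 4, 3, 1),
  item3 = c(2, 2, 4, 4, 2, 1)
)

alpha_toy <- cronbach_alpha_sketch(toy_items)
print(alpha_toy)
```

Alpha approaches 1 when the items move together and drops toward 0 when they are unrelated, which is why it is read as a measure of internal consistency and not of validity.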

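With seven items each scored from 1 to 4, the maximum possible value (for someone who is “very depressed”) is 28 and the minimum possible value is 7. When all eight items are summed, as in the CES_D8 computation above, the possible range is 8 to 32.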

HYPOTHESIS: for 8 independent variables with depression

H1: More frequent feelings of depression (fltdpr) are associated with higher levels of depression.
H1: Enjoying life (enjlf) is associated with lower levels of depression.
H1: Sleeplessness (slprl) is associated with higher levels of depression.
H1: Sadness (fltsd) is associated with higher levels of depression.
H1: Loneliness (fltlnl) is associated with higher levels of depression.
H1: Happiness (wrhpp) is associated with lower levels of depression.
H1: A higher frequency of feeling unable to get going (cldgng) is associated with higher levels of depression.
H1: A higher frequency of feeling that everything was an effort (flteeff) is associated with higher levels of depression.

All of them show a positive correlation with the depression symptoms apart from wrhpp, which shows a negative correlation. All of these variables use the same four-point response scale (coded 1 to 4).

BIVARIATE ANALYSIS

We shall use correlation and regression analysis techniques.

# Interaction of these variables for Finland
# Correlation matrix
correlation <- cor(subset_vars, use = "complete.obs")
correlation

Calculate how depression symptoms change when each independent variable increases by 1

modelFI = lm(CES_D8 ~ fltdpr + enjlf + slprl + wrhpp + fltlnl + fltsd + cldgng + flteeff, data = DataFI)
summary(modelFI)
# It seems that the hypotheses stated above are mostly correct

linear regression model (lm)

Regression analysis

modelFI <- lm(CES_D8 ~ fltdpr_num + enjlf_num + slprl_num + wrhpp_num + fltlnl_num + fltsd_num + cldgng_num + flteeff_num, data = DataFI)

Save model to show extended summary

model = lm(CES_D8 ~ fltdpr_num + enjlf_num + slprl_num + fltsd_num + fltlnl_num + cldgng_num + flteeff_num, data = DataFI)
summary(model)

The p-value of < 2e-16 is below the conventional 0.05 significance threshold, indicating a strong relationship between the predictor variables and the dependent variable, depression.

We therefore reject the null hypothesis and accept the alternative hypotheses.

vnames = c("fltdpr", "enjlf", "slprl", "wrhpp", "fltlnl", "fltsd", "flteeff", "cldgng")
likert_df = DataFI[, vnames] # likert() needs the original factor items from the Finnish subset

create basic frequencies

likert(likert_df)

create basic plot

plot(likert(likert_df))

Append mean

# Append mean -> convert to numeric
likert_numeric_df = as.data.frame(lapply(DataFI[, vnames], as.numeric))

get means

likert_means = lapply(likert_numeric_df[, vnames], mean, na.rm = TRUE)

Append counts -> get counts

likert_counts = lapply(likert_numeric_df[, vnames], function(x) sum(!is.na(x)))

append means and counts to table

likert_table = likert(likert_df)$results # we save the "inner" data frame of the likert structure ...
likert_table$Mean = unlist(likert_means) # ... and append new columns to the data frame
likert_table$Count = unlist(likert_counts)
likert_table # print extended table, comment out if not needed

set new item labels (take care to define in correct order!)

# Labels follow the order of vnames: fltdpr, enjlf, slprl, wrhpp, fltlnl, fltsd, flteeff, cldgng
likert_table$Item = c(
  d20 = "you felt depressed?",
  d25 = "you enjoyed life?",
  d22 = "your sleep was restless?",
  d23 = "you were happy?",
  d24 = "you felt lonely?",
  d26 = "you felt sad?",
  d21 = "you felt everything you did was an effort?",
  d27 = "you could not get going?"
)

round all percentage values to 1 decimal digit

# With four response categories, the percentages are in columns 2:5 and the mean is in column 6
likert_table[, 2:5] = round(likert_table[, 2:5], 1)
# round means to 3 decimal digits
likert_table[, 6] = round(likert_table[, 6], 3)
# create formatted table
kable_styling(kable(likert_table, caption = "Distribution of answers regarding depression scale (ESS round 11, only Finland)"))

create basic plot (code also valid)

plot(likert(summary = likert_table[, 1:5])) # limit to columns 1:5 to skip mean and count

Regression model with design weights

Apply only the design weight (dweight) when analysing a single country

lm(depression ~ agea + gndr + smkstat + eduyrs + region, data = df_fin, weights = dweight)






finial ass 4 finial

saffie abia

2025-06-04