Depression is a global public health priority. According to the World Health Organization and the World Bank depression accounts for a full 10% of the total nonfatal burden of disease worldwide (World Health Organization & World Bank, 2016). Understanding the factors that contribute to depression is critical for developing effective interventions and policies to mitigate its impact on individuals and society. This study aims to explore the relationship between depression and key variables, including education level, gender, frequency of vegetable consumption and physical activity. These variables were selected as they encompass both socio-demographic and lifestyle influences known to affect mental health. Focusing on Ireland, a country that has participated in all rounds of the European Social Survey (ESS), this study draws on data from over 2,000 respondents to provide insights into the factors associated with depression.
How do education level, gender, frequency of vegetable consumption and physical activity, influence the prevalence or severity of depression among the population of Iceland?
• Depression
• Education level
• Gender
• Frequency of vegetable consumption
• Physical activity
library(foreign)
library(ltm)
## Loading required package: MASS
## Loading required package: msm
## Loading required package: polycor
setwd("/Users/lenapape/Desktop/Uni /Quantative Research R")
df = read.spss("ESS11.sav", to.data.frame = T)
#names(df)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
In the following section we created the depression scale using the
eight factors of the CES-D8 Depression Scale, which assesses depressive
symptoms based on responses to 8 items:
1. ‘fltdpr’: how often have you felt depressed in the past week
2. ‘flteeff’: how often did everything you did in the past week feel as
an effort?
3. ‘slprl’: Sleep was restless, how often past week
4. ‘wrhpp’: Were happy, how often past week
5. ‘fltlnl’: Felt lonely, how often past week
6. ‘enjlf’: Enjoyed life, how often past week
7. ‘fltsd’: Felt sad, how often past week
8. ‘cldgng’: Could not get going, how often past week
All responses are converted into a numeric scala with values ranging from 1 to 4.
#convert to numbers 1-4
df$d20 = as.numeric(df$fltdpr)
df$d21 = as.numeric(df$flteeff)
df$d22 = as.numeric(df$slprl)
df$d23 = as.numeric(df$wrhpp)
df$d24 = as.numeric(df$fltlnl)
df$d25 = as.numeric(df$enjlf)
df$d26 = as.numeric(df$fltsd)
df$d27 = as.numeric(df$cldgng)
df$d23 = 5 - df$d23
df$d25 = 5 - df$d25
cronbach.alpha(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")], na.rm=T)
##
## Cronbach's alpha for the 'df[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]' data-set
##
## Items: 8
## Sample units: 40156
## alpha: 0.823
To show if the scale demonstartes high internal reliability we calculated the cronbachs alpha value. In our case we received an alpha value of 0.824, this shows a high reliability because it is above 0.7.
To compute an overall depression score, we averaged the values of the
8 items (d20–d27) for each respondent. This new variable
depression reflects the mean score across the items, and we
used summary() to examine its distribution.
df$depression = rowSums(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]) / 8
summary(df$depression)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.000 1.375 1.625 1.695 2.000 4.000 799
Furthermore, to show the distribution of the results of the depression scala in a histogram:
library(kableExtra)
library(knitr)
hist(df$depression,
breaks=8,
col = "lightblue",
border = "white",
main = paste("Histogramm of Depression Score"),
xlab = "Depression Score")
table_depression = as.data.frame(table(df$depression))
#kable( table_depression,
# col.names = c("Varianz","Frequency"),
#caption = "Distribution of Depression")
scroll_box(
kable_styling(
kable(table_depression,
col.names = c("Varianz","Frequency"),
caption = "Distribution of Depression"
),
font_size = 13, bootstrap_options = c("hover", "condensed")
),
height="200px")
| Varianz | Frequency |
|---|---|
| 1 | 2340 |
| 1.125 | 2595 |
| 1.25 | 4354 |
| 1.375 | 4601 |
| 1.5 | 4556 |
| 1.625 | 4000 |
| 1.75 | 3546 |
| 1.875 | 2955 |
| 2 | 2480 |
| 2.125 | 1896 |
| 2.25 | 1673 |
| 2.375 | 1144 |
| 2.5 | 815 |
| 2.625 | 601 |
| 2.75 | 480 |
| 2.875 | 351 |
| 3 | 271 |
| 3.125 | 198 |
| 3.25 | 141 |
| 3.375 | 108 |
| 3.5 | 87 |
| 3.625 | 62 |
| 3.75 | 40 |
| 3.875 | 29 |
| 4 | 34 |
In the next section we visualize which countries gathered data on the
depression scale. And with these results we selected the country of
Ireland.
# Subset the data to a single country; Ireland
# df$cntry
table(df$cntry)
##
## Albania Austria Belgium Bulgaria
## 0 2354 1594 0
## Switzerland Cyprus Czechia Germany
## 1384 685 0 2420
## Denmark Estonia Spain Finland
## 0 0 1844 1563
## France United Kingdom Georgia Greece
## 1771 1684 0 2757
## Croatia Hungary Ireland Israel
## 1563 2118 2017 0
## Iceland Italy Lithuania Luxembourg
## 842 2865 1365 0
## Latvia Montenegro North Macedonia Netherlands
## 0 0 0 1695
## Norway Poland Portugal Romania
## 1337 1442 1373 0
## Serbia Russian Federation Sweden Slovenia
## 1563 0 1230 1248
## Slovakia Turkey Ukraine Kosovo
## 1442 0 0 0
df_irl= df[df$cntry == "Ireland",]
#df_irl
table_irl = as.data.frame(table(df_irl$depression))
scroll_box(
kable_styling(
kable(table_irl,
col.names = c("Varianz","Frequency"),
caption = "Distribution of Depression in Ireland"
),
font_size = 13, bootstrap_options = c("hover", "condensed")
),
height="200px")
| Varianz | Frequency |
|---|---|
| 1 | 269 |
| 1.125 | 207 |
| 1.25 | 236 |
| 1.375 | 210 |
| 1.5 | 203 |
| 1.625 | 159 |
| 1.75 | 139 |
| 1.875 | 125 |
| 2 | 113 |
| 2.125 | 79 |
| 2.25 | 88 |
| 2.375 | 45 |
| 2.5 | 19 |
| 2.625 | 19 |
| 2.75 | 12 |
| 2.875 | 8 |
| 3 | 10 |
| 3.125 | 4 |
| 3.25 | 5 |
| 3.375 | 2 |
| 3.5 | 2 |
| 3.875 | 1 |
hist(df_irl$depression, breaks = seq(1, 4, by = 0.25),
angle = 45,
col = "lightblue",
border = "white",
main = paste("Histogramm of Depression Score in Ireland"),
xlab = "Depression Score")
table(df_irl$gndr)
##
## Male Female
## 906 1111
#head(df_irl$agea)
#summary(df_irl$agea)
df_irl$agea = as.numeric(as.character(df_irl$agea))
hist(df_irl$agea,
main = "Age Distribution",
xlab = "Age",
col = "lightblue",
border = "white")
The relationship between education levels and the prevalence of depression is well-documented, indicating significant disparities in mental health outcomes. Research consistently shows that individuals with lower educational levels have a 19,1% higher likelihood to experience major depressive disorder (MDD) between the age of 18 and 65 compared to those with higher education (Lepe et al., 2022). Moreover, a meta-analysis from 2022 shows a significant prevalence of depressive symptoms, like anxiety, among adolescents with low education and income (Kempfer et al., 2022).
H1: Hypothesis one states that a lower socioeconomic status (education level) is associated with higher levels of depression. We assume that lower education levels will be positively associated with higher CES-D8 depression scores.
# BIVARIATE ANALYSIS AND OPERATIONALIZATION
# Hypothesis 1: Socioeconomic Status: Lower socioeconomic status (education level) is associated
# with higher levels of depression
# recode education levels into 3 groups
df_irl$edu = factor(NA, levels = c("low", "medium", "high")) # variables created as factors with levels (low, medium, high)
# look up original values
table(df_irl$eisced)
##
## Not possible to harmonise into ES-ISCED
## 0
## ES-ISCED I , less than lower secondary
## 197
## ES-ISCED II, lower secondary
## 376
## ES-ISCED IIIb, lower tier upper secondary
## 95
## ES-ISCED IIIa, upper tier upper secondary
## 320
## ES-ISCED IV, advanced vocational, sub-degree
## 436
## ES-ISCED V1, lower tertiary education, BA level
## 298
## ES-ISCED V2, higher tertiary education, >= MA level
## 285
## Other
## 4
Next, we recoded education into three groups, aiming to distribute the
data as evenly as possible across the categories.
kable_styling(
kable(as.data.frame(table(df_irl$edu)),
col.names = c("Level of Education","Distribution"),
caption = "Distribution Education Level in Ireland"
),
full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed")
)
| Level of Education | Distribution |
|---|---|
| low | 668 |
| medium | 756 |
| high | 583 |
library(knitr)
library(kableExtra)
# Create the data frame
edu_table <- data.frame(
`Education Level` = c(
"Not possible to harmonise into ES-ISCED",
"ES-ISCED I , less than lower secondary",
"ES-ISCED II, lower secondary",
"ES-ISCED IIIb, lower tier upper secondary",
"ES-ISCED IIIa, upper tier upper secondary",
"ES-ISCED IV, advanced vocational, sub-degree",
"ES-ISCED V1, lower tertiary education, BA level",
"ES-ISCED V2, higher tertiary education, >= MA level",
"Other"
),
Low = c(0, 197, 376, 95, 0, 0, 0, 0, 0),
Medium = c(0, 0, 0, 0, 320, 436, 0, 0, 0),
High = c(0, 0, 0, 0, 0, 0, 298, 285, 0)
)
# Create the nicely formatted table
kable_styling(
kable(edu_table,
col.names = c("Education Level", "Low", "Medium", "High"),
caption = "Mapping of ES-ISCED Categories to Education Groups"),
full_width = FALSE,
font_size = 13,
bootstrap_options = c("hover", "condensed")
)
| Education Level | Low | Medium | High |
|---|---|---|---|
| Not possible to harmonise into ES-ISCED | 0 | 0 | 0 |
| ES-ISCED I , less than lower secondary | 197 | 0 | 0 |
| ES-ISCED II, lower secondary | 376 | 0 | 0 |
| ES-ISCED IIIb, lower tier upper secondary | 95 | 0 | 0 |
| ES-ISCED IIIa, upper tier upper secondary | 0 | 320 | 0 |
| ES-ISCED IV, advanced vocational, sub-degree | 0 | 436 | 0 |
| ES-ISCED V1, lower tertiary education, BA level | 0 | 0 | 298 |
| ES-ISCED V2, higher tertiary education, >= MA level | 0 | 0 | 285 |
| Other | 0 | 0 | 0 |
#check significance
kw_test = kruskal.test(depression ~ edu, data=df_irl)
kw_test
##
## Kruskal-Wallis rank sum test
##
## data: depression by edu
## Kruskal-Wallis chi-squared = 31.065, df = 2, p-value = 1.796e-07
p_edu = round(kw_test$p.value, 10)
The Kruskal-Wallis test for this variable yielded a p-value of
1.796^{-7}, indicating a statistically significant difference between
groups (p < 0.05).
by(df_irl$depression, df_irl$edu, mean, na.rm=T)
## df_irl$edu: low
## [1] 1.638351
## ------------------------------------------------------------
## df_irl$edu: medium
## [1] 1.567425
## ------------------------------------------------------------
## df_irl$edu: high
## [1] 1.468805
model_edu = lm(depression ~ edu, data = df_irl)
summary(model_edu)
##
## Call:
## lm(formula = depression ~ edu, data = df_irl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.63835 -0.34381 -0.09381 0.30757 2.40619
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.63835 0.01817 90.179 < 2e-16 ***
## edumedium -0.07093 0.02490 -2.849 0.00443 **
## eduhigh -0.16955 0.02660 -6.374 2.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4618 on 1944 degrees of freedom
## (70 observations deleted due to missingness)
## Multiple R-squared: 0.02054, Adjusted R-squared: 0.01954
## F-statistic: 20.39 on 2 and 1944 DF, p-value: 1.726e-09
Myrna Weissman pioneered the field of gender-based research in depression in the 1970s, highlighting a significant gender disparity. She observed that among clinical and community samples, the prevalence of depression was twice as high among women as it was among men (Weissman & Klerman, 1977). After this landmark a lot of other research was conducted in this field (e.g. Bebbington et al., 1998; Ferrari et al., 2012) However, the most observed ratio of female to male cases typically ranges from to one, indicating a clear disparity in the prevalence of the condition between the two genders.
H2: Women report higher levels of depression compared to men. → Women will score higher on the CES-D8 scale compared to men.
# variable: 'gndr'
#df_irl$gndr
# test significance
wk_test = wilcox.test(depression ~ gndr, data = df_irl)
wk_test
##
## Wilcoxon rank sum test with continuity correction
##
## data: depression by gndr
## W = 443514, p-value = 0.01777
## alternative hypothesis: true location shift is not equal to 0
p_gndr = round(wk_test$p.value, 4)
by(df_irl$depression, df_irl$gndr, mean, na.rm=T)
## df_irl$gndr: Male
## [1] 1.540575
## ------------------------------------------------------------
## df_irl$gndr: Female
## [1] 1.580316
# create female dummy
table(df_irl$gndr)
##
## Male Female
## 906 1111
df_irl$female = NA #intiialize variable with NA
df_irl$female[df_irl$gndr=="Male"] = 0 # if variable is female then gender == 0
df_irl$female[df_irl$gndr=="Female"] = 1 # if variable is male then gender == 1
# check
table(df_irl$female)
##
## 0 1
## 906 1111
table(df_irl$gndr, df_irl$female)
##
## 0 1
## Male 906 0
## Female 0 1111
lm(depression ~ female, data = df_irl)
##
## Call:
## lm(formula = depression ~ female, data = df_irl)
##
## Coefficients:
## (Intercept) female
## 1.54058 0.03974
lm(depression ~ gndr, data = df_irl)
##
## Call:
## lm(formula = depression ~ gndr, data = df_irl)
##
## Coefficients:
## (Intercept) gndrFemale
## 1.54058 0.03974
The Wilcox Test showed statisticak sugnificance becaus the p value lied
at 0.0178, which is below 0.05. The linear regression model also
indicates that depression is more prevelant among females rather than
males.
The relationship between vegetable consumption and depression is complex, with a variety of studies in this area. A national survey among the Canadian population found out that people with a high intake of daily vegetables and fruits had a lower risk of major depressive disorder (MDD) (McMartin et al., 2013). Moreover, a systematic review indicated that a higher intake of fruits and vegetables has a positive effect on women’s mental health (Guzek et al., 2022). These findings align with the recommendations for the prevention of MDD. This study identified five dietary recommendations for prevention including increased consumption of fruits, vegetables, legumes, wholegrain cereals, nuts and seeds (Opie et al., 2017).
H3: Individuals who consume vegetables less frequently have higher levels of depression compared to those who consume them regularly. → Lower frequency of vegetable consumption will correspond to higher CES-D8 scores
# variable = eatveg
#df_irl$eatveg
levels(df_irl[,"eatveg"])
## [1] "Three times or more a day"
## [2] "Twice a day"
## [3] "Once a day"
## [4] "Less than once a day but at least 4 times a week"
## [5] "Less than 4 times a week but at least once a week"
## [6] "Less than once a week"
## [7] "Never"
As with the education levels we recoded the vegetable intake levels into
three groups, aiming to distribute the data as evenly as possible across
the categories.
# change level names
df_irl$veggis = factor(NA, levels = c("multiple daily", "daily", "irregular"))
df_irl$veggis[df_irl$eatveg == "Three times or more a day"] = "multiple daily"
df_irl$veggis[df_irl$eatveg == "Twice a day"] = "multiple daily"
df_irl$veggis[df_irl$eatveg == "Once a day"] = "daily"
df_irl$veggis[df_irl$eatveg == "Less than once a day but at least 4 times a week"] = "irregular"
df_irl$veggis[df_irl$eatveg == "Less than 4 times a week but at least once a week"] = "irregular"
df_irl$veggis[df_irl$eatveg == "Less than once a week"] = "irregular"
df_irl$veggis[df_irl$eatveg == "Never"] = "irregular"
veg_table = as.data.frame(table(df_irl$veggis))
# veg_table
library(kableExtra)
library(knitr)
kable_styling(
kable(veg_table,
col.names = c("Vegetable Intake", "Frequency"),
caption = "3 levels of vegetable intake"
),
font_size = 13, bootstrap_options = c("hover", "condensed")
)
| Vegetable Intake | Frequency |
|---|---|
| multiple daily | 557 |
| daily | 1052 |
| irregular | 407 |
#check significance
kt_veg = kruskal.test(depression ~ veggis, data=df_irl)
p_veg = round(kt_veg$p.value, 6)
by(df_irl$depression, df_irl$veggis, mean, na.rm=T)
## df_irl$veggis: multiple daily
## [1] 1.561806
## ------------------------------------------------------------
## df_irl$veggis: daily
## [1] 1.527275
## ------------------------------------------------------------
## df_irl$veggis: irregular
## [1] 1.654898
The Kruskal-Wallis Test resulted in a p value of 1.39^{-4}, showing statistical significance. But the comparison of all 3 levels didin’t show irregular vegetable consumption having a clear effect. Since vegetable consumption of multiple daily had a higher value than daily consumption.
Engagement in physical activity (PA) is significantly associated with lower levels of depression, according to several studies. A recently published cross-sectional study involving 58,455 adults found a clear link between lower levels of PA and increased depressive symptoms (Queiroga et al., 2025). Moreover, a multitude of studies have demonstrated the varying levels of PA can effectively diminish the likelihood of developing depression (e.g. Schuch et al., 2018). This is also reported by the National Health and Nutrition Examination Survey from 2007 to 2018, this cross-sectional analysis indicated that participants without depressive symptoms engage in significantly more PA (10.87 hours/week) than those with depressive symptoms (8.82 hours/week) (Bopari et al., 2024).
H4: Individuals who engage in physical activity on fewer days in the last week report higher levels of depression compared to those who are more physically active. → Fewer days of physical activity will correlate with higher CES-D8 scores.
# variable = dosprt : "Do sports or other physical activity, how many of last 7 days"
#df_irl$dosprt
# test significance
anova_m = aov(depression ~ dosprt, data = df_irl)
summary(anova_m)
## Df Sum Sq Mean Sq F value Pr(>F)
## dosprt 7 22.9 3.276 15.9 <2e-16 ***
## Residuals 1945 400.6 0.206
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 64 observations deleted due to missingness
# make dosprt variable numeric
df_irl$dosprt_n = as.numeric(df_irl$dosprt) -1
# pairwise correlation
# cor(df_irl[,c("depression", "dosprt_n")], use="complete.obs")
# results show negative value --> "the more sports, the less depressed"
library(knitr)
library(kableExtra)
# Calculate correlation matrix
cor_matrix <- cor(df_irl[, c("depression", "dosprt_n")], use = "complete.obs")
# Format as a data frame for table display
cor_df <- as.data.frame(round(cor_matrix, 2))
# Optional: move row names into a column
#cor_df$Variable <- rownames(cor_df)
#cor_df <- cor_df[, c("Variable", "depression", "dosprt_n")]
# Create the nicely styled table
kable_styling(
kable(cor_df,
caption = "Correlation Between Depression and Sport Frequency"),
full_width = FALSE,
font_size = 13,
bootstrap_options = c("hover", "condensed")
)
| depression | dosprt_n | |
|---|---|---|
| depression | 1.00 | -0.23 |
| dosprt_n | -0.23 | 1.00 |
# linear regression model
# lm(depression ~ dosprt, data = df_irl, na.rm=T)
# Required packages
library(broom)
library(knitr)
library(kableExtra)
library(dplyr)
# Run linear regression model
model_sport <- lm(depression ~ dosprt, data = df_irl)
# Tidy the model output (with confidence intervals)
tidy_model <- tidy(model_sport, conf.int = TRUE)
# Round results for readability
tidy_model <- tidy_model %>%
mutate(across(where(is.numeric), ~ round(., 3)))
# Create the nicely formatted table
kable_styling(
kable(tidy_model,
col.names = c("Term", "Estimate", "Std. Error", "t value", "p value", "CI Lower", "CI Upper"),
caption = "Linear Regression: Effect of Sport Frequency on Depression Score"),
full_width = FALSE,
font_size = 13,
bootstrap_options = c("hover", "condensed")
)
| Term | Estimate | Std. Error | t value | p value | CI Lower | CI Upper |
|---|---|---|---|---|---|---|
| (Intercept) | 1.712 | 0.021 | 81.151 | 0.000 | 1.670 | 1.753 |
| dosprt1 | -0.074 | 0.050 | -1.490 | 0.136 | -0.172 | 0.023 |
| dosprt2 | -0.075 | 0.039 | -1.903 | 0.057 | -0.152 | 0.002 |
| dosprt3 | -0.168 | 0.035 | -4.774 | 0.000 | -0.237 | -0.099 |
| dosprt4 | -0.153 | 0.038 | -3.993 | 0.000 | -0.227 | -0.078 |
| dosprt5 | -0.227 | 0.037 | -6.123 | 0.000 | -0.300 | -0.154 |
| dosprt6 | -0.208 | 0.056 | -3.753 | 0.000 | -0.317 | -0.100 |
| dosprt7 | -0.295 | 0.030 | -9.751 | 0.000 | -0.354 | -0.236 |
The Test yielded the following results: more physical activity (doing
sports on 3 - 7 days in a week) is linked to lower depression but lower
levels of physical activity (doing sports on 1 - 2 days in a week) don’t
show a significant difference from doing no sports at all
# Multivariate Modelling
#######################
# MULTIVARIATE MODELLING
# When hypothesis buidling and bivariate analysis is finalized,
# put everything together in a multiple regression model
# Model
lm(depression ~ edu + gndr + veggis + dosprt, data=df_irl) # add more content to your model
##
## Call:
## lm(formula = depression ~ edu + gndr + veggis + dosprt, data = df_irl)
##
## Coefficients:
## (Intercept) edumedium eduhigh gndrFemale
## 1.74403 -0.04290 -0.12210 0.05138
## veggisdaily veggisirregular dosprt1 dosprt2
## -0.06794 0.02902 -0.04520 -0.06111
## dosprt3 dosprt4 dosprt5 dosprt6
## -0.14400 -0.12917 -0.19111 -0.17482
## dosprt7
## -0.26412
model=lm(depression ~ edu + female + veggis + dosprt, data=df_irl)
summary(model)
##
## Call:
## lm(formula = depression ~ edu + female + veggis + dosprt, data = df_irl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.82443 -0.34125 -0.07443 0.28458 2.17267
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.74403 0.03353 52.021 < 2e-16 ***
## edumedium -0.04290 0.02503 -1.714 0.086714 .
## eduhigh -0.12210 0.02734 -4.466 8.44e-06 ***
## female 0.05138 0.02077 2.473 0.013476 *
## veggisdaily -0.06794 0.02445 -2.779 0.005506 **
## veggisirregular 0.02902 0.03113 0.932 0.351332
## dosprt1 -0.04520 0.05007 -0.903 0.366689
## dosprt2 -0.06111 0.03946 -1.549 0.121627
## dosprt3 -0.14400 0.03569 -4.035 5.68e-05 ***
## dosprt4 -0.12917 0.03868 -3.340 0.000855 ***
## dosprt5 -0.19111 0.03774 -5.064 4.50e-07 ***
## dosprt6 -0.17482 0.05592 -3.126 0.001797 **
## dosprt7 -0.26412 0.03116 -8.477 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4499 on 1932 degrees of freedom
## (72 observations deleted due to missingness)
## Multiple R-squared: 0.07398, Adjusted R-squared: 0.06823
## F-statistic: 12.86 on 12 and 1932 DF, p-value: < 2.2e-16
The graph below presents the estimated effects of all independent variables on the depression score, including confidence intervals for each regression coefficient.
# visualize the model
#install.packages("ggplot2")
#install.packages("broom")
library(ggplot2)
library(broom)
tidy_model = tidy(model, conf.int = TRUE)
ggplot(tidy_model, aes(x = estimate, y = term)) +
geom_point() +
geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
labs(title = "Regression Coefficients for Depression Model", x = "Coefficient Estimate", y = "Predictor") +
theme_minimal()