Introduction

Depression is a global public health priority. According to the World Health Organization and the World Bank depression accounts for a full 10% of the total nonfatal burden of disease worldwide (World Health Organization & World Bank, 2016). Understanding the factors that contribute to depression is critical for developing effective interventions and policies to mitigate its impact on individuals and society. This study aims to explore the relationship between depression and key variables, including education level, gender, frequency of vegetable consumption and physical activity. These variables were selected as they encompass both socio-demographic and lifestyle influences known to affect mental health. Focusing on Ireland, a country that has participated in all rounds of the European Social Survey (ESS), this study draws on data from over 2,000 respondents to provide insights into the factors associated with depression.

Research Question

How do education level, gender, frequency of vegetable consumption and physical activity, influence the prevalence or severity of depression among the population of Iceland?

Dependent Variable

Depression

Independent Variables

Education level
Gender
Frequency of vegetable consumption
Physical activity

library(foreign)
library(ltm)

## Loading required package: MASS

## Loading required package: msm

## Loading required package: polycor

library(likert)

## Loading required package: ggplot2

## Loading required package: xtable

setwd("/Users/katif/Desktop/R Studios")
df = read.spss("ESS11.sav", to.data.frame = T)
#names(df)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

library(foreign)
library(ltm)
library(likert)
library(kableExtra)

vnames = c("fltdpr","flteeff","slprl","wrhpp","fltlnl","enjlf","fltsd","cldgng")
likert_df = df[,vnames]

likert_table = likert(likert_df)$results
likert_numeric_df = as.data.frame(lapply((df[,vnames]), as.numeric))
likert_table$Mean = unlist(lapply((likert_numeric_df[,vnames]), mean, na.rm=T)) # ... and append new columns to the data frame
likert_table$Count = unlist(lapply((likert_numeric_df[,vnames]), function (x) sum(!is.na(x))))
likert_table$Item = c(
  d20="How much of the time during the past week did you feel depressed?",
  d21="How much of the time during the past week you felt that everything you did was an effort?",
  d22="How much of the time during the past week your sleep was restless?",
  d23="How much of the time during the past week you were happy?",
  d24="How much of the time during the past week you felt lonely?",
  d25="How much of the time during the past week you enjoyed life?",
  d26="How much of the time during the past week you felt sad?",
  d27="How much of the time during the past week you could not get going?")

likert_table[,2:6] = round(likert_table[,2:6],1)
# round means to 3 decimal digits
likert_table[,7] = round(likert_table[,7],3)

# create formatted table
kable_styling(kable(likert_table,
                    caption = "Distribution of answers regarding depression indicators"
                    )
              )

Distribution of answers regarding depression indicators
Item	None or almost none of the time	Some of the time	Most of the time	All or almost all of the time	Mean	Count
How much of the time during the past week did you feel depressed?	64.9	29.1	4.6	1.5	1.4	39981
How much of the time during the past week you felt that everything you did was an effort?	48.4	38.4	9.8	3.4	1.7	39983
How much of the time during the past week your sleep was restless?	43.9	39.9	11.6	4.6	1.8	40017
How much of the time during the past week you were happy?	4.0	23.5	48.9	23.6	2.9	39890
How much of the time during the past week you felt lonely?	68.1	24.3	5.3	2.3	1.4	39983
How much of the time during the past week you enjoyed life?	5.3	24.8	44.8	25.0	2.9	39878
How much of the time during the past week you felt sad?	52.5	41.1	4.9	1.6	1.6	39981
How much of the time during the past week you could not get going?	55.7	36.1	6.2	2.0	1.5	39949

 # create basic plot (code also valid)
plot(likert(summary=likert_table[,1:6]))

In the following section we created the depression scale using the eight factors of the CES-D8 Depression Scale, which assesses depressive symptoms based on responses to 8 items:
1. ‘fltdpr’: how often have you felt depressed in the past week 2. ‘flteeff’: how often did everything you did in the past week feel as an effort? 3. ‘slprl’: Sleep was restless, how often past week 4. ‘wrhpp’: Were happy, how often past week 5. ‘fltlnl’: Felt lonely, how often past week 6. ‘enjlf’: Enjoyed life, how often past week 7. ‘fltsd’: Felt sad, how often past week 8. ‘cldgng’: Could not get going, how often past week

All responses are converted into a numeric scala with values ranging from 1 to 4.

#convert to numbers 1-4
df$d20 = as.numeric(df$fltdpr)
df$d21 = as.numeric(df$flteeff)
df$d22 = as.numeric(df$slprl)
df$d23 = as.numeric(df$wrhpp)
df$d24 = as.numeric(df$fltlnl)
df$d25 = as.numeric(df$enjlf)
df$d26 = as.numeric(df$fltsd)
df$d27 = as.numeric(df$cldgng)

df$d23 = 5 - df$d23
df$d25 = 5 - df$d25

cronbach.alpha(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")], na.rm=T)

## 
## Cronbach's alpha for the 'df[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]' data-set
## 
## Items: 8
## Sample units: 40156
## alpha: 0.823

#install.packages("psych")
library(psych)
alpha_result = alpha(df[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")])
alpha_rnd = round(alpha_result$total[["raw_alpha"]], 3)
alpha_rnd

## [1] 0.824

To show if the scale demonstartes high internal reliability we calculated the cronbachs alpha value. In our case we received an alpha value of 0.824, this shows a high reliability because it is above 0.7.

df$depression = rowSums(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]) / 8
summary(df$depression)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   1.000   1.375   1.625   1.695   2.000   4.000     799

Furthermore, to show the distribution of the results of the depression scala.

library(kableExtra)
library(knitr)
hist(df$depression, breaks=8)

table_depression = as.data.frame(table(df$depression))

#kable( table_depression,
 # col.names = c("Varianz","Frequency"),
  #caption = "Distribution of Depression")

scroll_box(
  kable_styling(
    kable(table_depression,
          col.names = c("Varianz","Frequency"),
          caption = "Distribution of Depression"
          ),
    font_size = 13, bootstrap_options = c("hover", "condensed")
  ),
  height="200px")

Distribution of Depression
Varianz	Frequency
1	2340
1.125	2595
1.25	4354
1.375	4601
1.5	4556
1.625	4000
1.75	3546
1.875	2955
2	2480
2.125	1896
2.25	1673
2.375	1144
2.5	815
2.625	601
2.75	480
2.875	351
3	271
3.125	198
3.25	141
3.375	108
3.5	87
3.625	62
3.75	40
3.875	29
4	34

In the next section we show which countries collected data on the depression scale. From these results we have chosen Ireland.

# Subset the data to a single country; Ireland
# df$cntry
table(df$cntry)

## 
##            Albania            Austria            Belgium           Bulgaria 
##                  0               2354               1594                  0 
##        Switzerland             Cyprus            Czechia            Germany 
##               1384                685                  0               2420 
##            Denmark            Estonia              Spain            Finland 
##                  0                  0               1844               1563 
##             France     United Kingdom            Georgia             Greece 
##               1771               1684                  0               2757 
##            Croatia            Hungary            Ireland             Israel 
##               1563               2118               2017                  0 
##            Iceland              Italy          Lithuania         Luxembourg 
##                842               2865               1365                  0 
##             Latvia         Montenegro    North Macedonia        Netherlands 
##                  0                  0                  0               1695 
##             Norway             Poland           Portugal            Romania 
##               1337               1442               1373                  0 
##             Serbia Russian Federation             Sweden           Slovenia 
##               1563                  0               1230               1248 
##           Slovakia             Turkey            Ukraine             Kosovo 
##               1442                  0                  0                  0

df_irl= df[df$cntry == "Ireland",]
#df_irl

table_irl = as.data.frame(table(df_irl$depression))

scroll_box(
  kable_styling(
    kable(table_irl,
          col.names = c("Varianz","Frequency"),
          caption = "Distribution of Depression in Ireland"
          ),
    font_size = 13, bootstrap_options = c("hover", "condensed")
  ),
  height="200px")

Distribution of Depression in Ireland
Varianz	Frequency
1	269
1.125	207
1.25	236
1.375	210
1.5	203
1.625	159
1.75	139
1.875	125
2	113
2.125	79
2.25	88
2.375	45
2.5	19
2.625	19
2.75	12
2.875	8
3	10
3.125	4
3.25	5
3.375	2
3.5	2
3.875	1

hist(df_irl$depression, breaks = seq(1, 4, by = 0.25),
     angle = 45,
     col = "blue", 
     main = paste("Histogramm of Depression Score in Ireland"),
     xlab = "Depression Score")

Statistical Description

Gender Distribution

table(df_irl$gndr)

## 
##   Male Female 
##    906   1111

Age Distribution

#head(df_irl$agea)
#summary(df_irl$agea)
df_irl$agea = as.numeric(as.character(df_irl$agea))
hist(df_irl$agea,
     main = "Age Distribution",
     xlab = "Age",
     col = "pink",
     border = "white")

Hypothesis 1 Education Level:

The relationship between education levels and the prevalence of depression is well-documented, indicating significant disparities in mental health outcomes. Research consistently shows that individuals with lower educational levels have a 19,1% higher likelihood to experience major depressive disorder (MDD) between the age of 18 and 65 compared to those with higher education (Lepe et al., 2022). Moreover, a meta-analysis from 2022 shows a significant prevalence of depressive symptoms, like anxiety, among adolescents with low education and income (Kempfer et al., 2022).

H1: Lower socioeconomic status (education level) is associated with higher levels of depression. → Lower education levels will be positively associated with higher CES-D8 depression scores.

According to the data, the distribution of education levels in Ireland is as follows:
Low education: 0 respondents
Medium education: 0 respondents
High education: 0 respondents

This corresponds to proportions of:
Low: NaN% Medium: NaN%
High: NaN%

# BIVARIATE ANALYSIS AND OPERATIONALIZATION
# Hypothesis 1: Socioeconomic Status: Lower socioeconomic status (education level) is associated 
#               with higher levels of depression

# recode education levels into 3 groups
df_irl$edu = factor(NA, levels = c("low", "medium", "high")) # variables created as factors with levels (low, medium, high)

# look up original values
table(df_irl$eisced)

## 
##             Not possible to harmonise into ES-ISCED 
##                                                   0 
##              ES-ISCED I , less than lower secondary 
##                                                 197 
##                        ES-ISCED II, lower secondary 
##                                                 376 
##           ES-ISCED IIIb, lower tier upper secondary 
##                                                  95 
##           ES-ISCED IIIa, upper tier upper secondary 
##                                                 320 
##        ES-ISCED IV, advanced vocational, sub-degree 
##                                                 436 
##     ES-ISCED V1, lower tertiary education, BA level 
##                                                 298 
## ES-ISCED V2, higher tertiary education, >= MA level 
##                                                 285 
##                                               Other 
##                                                   4

df_irl$edu[df_irl$eisced == "ES-ISCED I , less than lower secondary"] = "low"
df_irl$edu[df_irl$eisced == "ES-ISCED II, lower secondary"] = "low"
df_irl$edu[df_irl$eisced == "ES-ISCED IIIb, lower tier upper secondary"] = "low"

df_irl$edu[df_irl$eisced == "ES-ISCED IIIa, upper tier upper secondary"] = "medium"
df_irl$edu[df_irl$eisced == "ES-ISCED IV, advanced vocational, sub-degree"] = "medium"

df_irl$edu[df_irl$eisced == "ES-ISCED V1, lower tertiary education, BA level"] = "high"
df_irl$edu[df_irl$eisced == "ES-ISCED V2, higher tertiary education, >= MA level"] = "high"


as.data.frame(table(df_irl$edu))

##     Var1 Freq
## 1    low  668
## 2 medium  756
## 3   high  583

kable_styling(
    kable(as.data.frame(table(df_irl$edu)),
          col.names = c("Level of Education","Distribution"),
          caption = "Distribution Education Level in Ireland"
          ),
    full_width = F, font_size = 13, bootstrap_options = c("hover", "condensed")
  )

Distribution Education Level in Ireland
Level of Education	Distribution
low	668
medium	756
high	583

# check
table(df_irl$eisced, df_irl$edu)

##                                                      
##                                                       low medium high
##   Not possible to harmonise into ES-ISCED               0      0    0
##   ES-ISCED I , less than lower secondary              197      0    0
##   ES-ISCED II, lower secondary                        376      0    0
##   ES-ISCED IIIb, lower tier upper secondary            95      0    0
##   ES-ISCED IIIa, upper tier upper secondary             0    320    0
##   ES-ISCED IV, advanced vocational, sub-degree          0    436    0
##   ES-ISCED V1, lower tertiary education, BA level       0      0  298
##   ES-ISCED V2, higher tertiary education, >= MA level   0      0  285
##   Other                                                 0      0    0

#check significance
kruskal.test(depression ~ edu, data=df_irl)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  depression by edu
## Kruskal-Wallis chi-squared = 31.065, df = 2, p-value = 1.796e-07

by(df_irl$depression, df_irl$edu, mean, na.rm=T)

## df_irl$edu: low
## [1] 1.638351
## ------------------------------------------------------------ 
## df_irl$edu: medium
## [1] 1.567425
## ------------------------------------------------------------ 
## df_irl$edu: high
## [1] 1.468805

model_edu = lm(depression ~ edu, data = df_irl)
summary(model_edu)

## 
## Call:
## lm(formula = depression ~ edu, data = df_irl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.63835 -0.34381 -0.09381  0.30757  2.40619 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.63835    0.01817  90.179  < 2e-16 ***
## edumedium   -0.07093    0.02490  -2.849  0.00443 ** 
## eduhigh     -0.16955    0.02660  -6.374 2.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4618 on 1944 degrees of freedom
##   (70 observations deleted due to missingness)
## Multiple R-squared:  0.02054,    Adjusted R-squared:  0.01954 
## F-statistic: 20.39 on 2 and 1944 DF,  p-value: 1.726e-09

Hypothesis 2 Gender:

Myrna Weissman pioneered the field of gender-based research in depression in the 1970s, highlighting a significant gender disparity. She observed that among clinical and community samples, the prevalence of depression was twice as high among women as it was among men (Weissman & Klerman, 1977). After this landmark a lot of other research was conducted in this field (e.g. Bebbington et al., 1998; Ferrari et al., 2012) However, the most observed ratio of female to male cases typically ranges from to one, indicating a clear disparity in the prevalence of the condition between the two genders.

H2: Women report higher levels of depression compared to men. → Women will score higher on the CES-D8 scale compared to men.

# variable: 'gndr'
#df_irl$gndr

# test significance
wilcox.test(depression ~ gndr, data = df_irl)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  depression by gndr
## W = 443514, p-value = 0.01777
## alternative hypothesis: true location shift is not equal to 0

by(df_irl$depression, df_irl$gndr, mean, na.rm=T)

## df_irl$gndr: Male
## [1] 1.540575
## ------------------------------------------------------------ 
## df_irl$gndr: Female
## [1] 1.580316

# create female dummy
table(df_irl$gndr)

## 
##   Male Female 
##    906   1111

df_irl$female = NA #intiialize variable with NA 
df_irl$female[df_irl$gndr=="Male"] = 0 # if variable is female then gender == 0
df_irl$female[df_irl$gndr=="Female"] = 1 # if variable is male then gender == 1
# check
table(df_irl$female)

## 
##    0    1 
##  906 1111

table(df_irl$gndr, df_irl$female)

##         
##             0    1
##   Male    906    0
##   Female    0 1111

lm(depression ~ female, data = df_irl)

## 
## Call:
## lm(formula = depression ~ female, data = df_irl)
## 
## Coefficients:
## (Intercept)       female  
##     1.54058      0.03974

lm(depression ~ gndr, data = df_irl)

## 
## Call:
## lm(formula = depression ~ gndr, data = df_irl)
## 
## Coefficients:
## (Intercept)   gndrFemale  
##     1.54058      0.03974

Hypothesis 3 Vegetable Consumption:

The relationship between vegetable consumption and depression is complex, with a variety of studies in this area. A national survey among the Canadian population found out that people with a high intake of daily vegetables and fruits had a lower risk of major depressive disorder (MDD) (McMartin et al., 2013). Moreover, a systematic review indicated that a higher intake of fruits and vegetables has a positive effect on women’s mental health (Guzek et al., 2022). These findings align with the recommendations for the prevention of MDD. This study identified five dietary recommendations for prevention including increased consumption of fruits, vegetables, legumes, wholegrain cereals, nuts and seeds (Opie et al., 2017).

H3: Individuals who consume vegetables less frequently have higher levels of depression compared to those who consume them regularly. → Lower frequency of vegetable consumption will correspond to higher CES-D8 scores

# variable = eatveg
#df_irl$eatveg
levels(df_irl[,"eatveg"])

## [1] "Three times or more a day"                        
## [2] "Twice a day"                                      
## [3] "Once a day"                                       
## [4] "Less than once a day but at least 4 times a week" 
## [5] "Less than 4 times a week but at least once a week"
## [6] "Less than once a week"                            
## [7] "Never"

# change level names
df_irl$veggis = factor(NA, levels = c("multiple daily", "daily", "irregular"))

df_irl$veggis[df_irl$eatveg == "Three times or more a day"] = "multiple daily"
df_irl$veggis[df_irl$eatveg == "Twice a day"] = "multiple daily"

df_irl$veggis[df_irl$eatveg == "Once a day"] = "daily"

df_irl$veggis[df_irl$eatveg == "Less than once a day but at least 4 times a week"] = "irregular"
df_irl$veggis[df_irl$eatveg == "Less than 4 times a week but at least once a week"] = "irregular"
df_irl$veggis[df_irl$eatveg == "Less than once a week"] = "irregular"
df_irl$veggis[df_irl$eatveg == "Never"] = "irregular"

table(df_irl$veggis)

## 
## multiple daily          daily      irregular 
##            557           1052            407

#check significance
kruskal.test(depression ~ veggis, data=df_irl)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  depression by veggis
## Kruskal-Wallis chi-squared = 17.764, df = 2, p-value = 0.0001388

by(df_irl$depression, df_irl$veggis, mean, na.rm=T)

## df_irl$veggis: multiple daily
## [1] 1.561806
## ------------------------------------------------------------ 
## df_irl$veggis: daily
## [1] 1.527275
## ------------------------------------------------------------ 
## df_irl$veggis: irregular
## [1] 1.654898

Hypothesis 4 Physical Activity

Hypothesis: Individuals who engage in physical activity on fewer days in the last week report higher levels of depression. → Fewer days of physical activity will correlate with higher CES-D8 scores. The average depression scores across physical activity levels (0 to 7 days) are as follows:

0 days: NaN
1 day: NaN
2 days: NaN
3 days: NaN
4 days: NaN
5 days: NaN
6 days: NaN
7 days: NaN

Numeric variablec

df_irl\(depression <- as.numeric(df_irl\)depression) df_irl\(dosprt_n <- as.numeric(df_irl\)dosprt_n)

Correlation

cor_value <- cor(df_irl\(depression, df_irl\)dosprt_n, use = “complete.obs”)

Output nicely

cat(“The correlation between physical activity and depression is”, round(cor_value, 2), “”)

# variable = dosprt : "Do sports or other physical activity, how many of last 7 days"

#df_irl$dosprt

# test significance
anova_m = aov(depression ~ dosprt, data = df_irl)
summary(anova_m)

##               Df Sum Sq Mean Sq F value Pr(>F)    
## dosprt         7   22.9   3.276    15.9 <2e-16 ***
## Residuals   1945  400.6   0.206                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 64 observations deleted due to missingness

# calculate the mean of "depression" within each group
by(df_irl$depression, df_irl$dosprt, mean, na.rm=T)

## df_irl$dosprt: 0
## [1] 1.711663
## ------------------------------------------------------------ 
## df_irl$dosprt: 1
## [1] 1.637376
## ------------------------------------------------------------ 
## df_irl$dosprt: 2
## [1] 1.636968
## ------------------------------------------------------------ 
## df_irl$dosprt: 3
## [1] 1.54375
## ------------------------------------------------------------ 
## df_irl$dosprt: 4
## [1] 1.559113
## ------------------------------------------------------------ 
## df_irl$dosprt: 5
## [1] 1.484797
## ------------------------------------------------------------ 
## df_irl$dosprt: 6
## [1] 1.503205
## ------------------------------------------------------------ 
## df_irl$dosprt: 7
## [1] 1.416667

# result: mean is higher for people how do no sport or less sports and mean slowly declines for the more days in which sport is done

# make dosprt variable numeric
df_irl$dosprt_n = as.numeric(df_irl$dosprt) -1

# pairwise correlation
cor(df_irl[,c("depression", "dosprt_n")], use="complete.obs")

##            depression   dosprt_n
## depression  1.0000000 -0.2287331
## dosprt_n   -0.2287331  1.0000000

# results show negative value --> "the more sports, the less depressed"

# linear regression model 
lm(depression ~ dosprt, data = df_irl, na.rm=T)

## 
## Call:
## lm(formula = depression ~ dosprt, data = df_irl, na.rm = T)
## 
## Coefficients:
## (Intercept)      dosprt1      dosprt2      dosprt3      dosprt4      dosprt5  
##     1.71166     -0.07429     -0.07469     -0.16791     -0.15255     -0.22687  
##     dosprt6      dosprt7  
##    -0.20846     -0.29500

Multivariate Modelling

# When hypothesis buidling and bivariate analysis is finalized,
# put everything together in a multiple regression model
# Model 
lm(depression ~ edu + gndr + veggis + dosprt, data=df_irl, weights = anweight)

## 
## Call:
## lm(formula = depression ~ edu + gndr + veggis + dosprt, data = df_irl, 
##     weights = anweight)
## 
## Coefficients:
##     (Intercept)        edumedium          eduhigh       gndrFemale  
##         1.70685         -0.01282         -0.07796          0.03957  
##     veggisdaily  veggisirregular          dosprt1          dosprt2  
##        -0.08964         -0.03023         -0.06632         -0.05378  
##         dosprt3          dosprt4          dosprt5          dosprt6  
##        -0.07799         -0.07557         -0.15757         -0.16568  
##         dosprt7  
##        -0.23464

model=lm(depression ~ edu + female + veggis + dosprt, data=df_irl)

summary(model)

## 
## Call:
## lm(formula = depression ~ edu + female + veggis + dosprt, data = df_irl)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.82443 -0.34125 -0.07443  0.28458  2.17267 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1.74403    0.03353  52.021  < 2e-16 ***
## edumedium       -0.04290    0.02503  -1.714 0.086714 .  
## eduhigh         -0.12210    0.02734  -4.466 8.44e-06 ***
## female           0.05138    0.02077   2.473 0.013476 *  
## veggisdaily     -0.06794    0.02445  -2.779 0.005506 ** 
## veggisirregular  0.02902    0.03113   0.932 0.351332    
## dosprt1         -0.04520    0.05007  -0.903 0.366689    
## dosprt2         -0.06111    0.03946  -1.549 0.121627    
## dosprt3         -0.14400    0.03569  -4.035 5.68e-05 ***
## dosprt4         -0.12917    0.03868  -3.340 0.000855 ***
## dosprt5         -0.19111    0.03774  -5.064 4.50e-07 ***
## dosprt6         -0.17482    0.05592  -3.126 0.001797 ** 
## dosprt7         -0.26412    0.03116  -8.477  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4499 on 1932 degrees of freedom
##   (72 observations deleted due to missingness)
## Multiple R-squared:  0.07398,    Adjusted R-squared:  0.06823 
## F-statistic: 12.86 on 12 and 1932 DF,  p-value: < 2.2e-16

# visualize the model
#install.packages("ggplot2")
#install.packages("broom")
library(ggplot2)
library(broom)

tidy_model = tidy(model, conf.int = TRUE)

ggplot(tidy_model, aes(x = estimate, y = term)) +
  geom_point() +
  geom_errorbarh(aes(xmin = conf.low, xmax = conf.high), height = 0.2) +
  labs(title = "Regression Coefficients for Depression Model", x = "Coefficient Estimate", y = "Predictor") +
  theme_minimal()

Homework_2

Katarina

2025-05-12