This document analyzes depressive symptoms in Austria using data from the European Social Survey (ESS11). The analysis includes descriptive statistics and a regression model of the CES-D8 depression scale, using both unweighted and weighted approaches.
library(foreign)
library(likert)
## Warning: package 'likert' was built under R version 4.4.3
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.4.3
## Loading required package: xtable
## Warning: package 'xtable' was built under R version 4.4.3
library(kableExtra)
## Warning: package 'kableExtra' was built under R version 4.4.3
library(psych)
## Warning: package 'psych' was built under R version 4.4.3
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(effects)
## Warning: package 'effects' was built under R version 4.4.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.4.3
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(pscl)
## Warning: package 'pscl' was built under R version 4.4.3
## Classes and Methods for R originally developed in the
## Political Science Computational Laboratory
## Department of Political Science
## Stanford University (2002-2015),
## by and under the direction of Simon Jackman.
## hurdle and zeroinfl functions by Achim Zeileis.
setwd("C:/Users/Vinicius Alpha/Downloads/")
suppressWarnings(df <- read.spss("ESS11.sav", to.data.frame = TRUE))
# Filter for Austria only
df_AT <- subset(df, cntry == "Austria")
# Print sample size
nrow(df_AT)
## [1] 2354
The Austrian sample consists of 2354 respondents.
We construct the CES-D8 scale by using eight self-reported depression-related items. Two positively worded items are reverse-coded.
# Convert items to numeric
df_AT$d1 <- as.numeric(df_AT$fltdpr)
df_AT$d2 <- as.numeric(df_AT$flteeff)
df_AT$d3 <- as.numeric(df_AT$slprl)
df_AT$d4 <- as.numeric(df_AT$wrhpp)
df_AT$d5 <- as.numeric(df_AT$fltlnl)
df_AT$d6 <- as.numeric(df_AT$enjlf)
df_AT$d7 <- as.numeric(df_AT$fltsd)
df_AT$d8 <- as.numeric(df_AT$cldgng)
# Reverse score the positive items
df_AT$d4 <- 5 - df_AT$d4
df_AT$d6 <- 5 - df_AT$d6
# Ensure numeric conversion is strict
scale_items <- df_AT[, c("d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8")]
scale_items[] <- lapply(scale_items, function(x) as.numeric(as.character(x)))
# Compute depression score (mean)
df_AT$CES_D8 <- rowMeans(scale_items, na.rm = TRUE)
Below we describe the eight items included in the CES-D8 depression scale:
These items were rated on a Likert scale and are used to assess the presence and intensity of depressive symptoms experienced in the past week.
We now present the individual items as a table and visualize the
results using the likert package.
likert_items <- df_AT[, c("fltdpr", "flteeff", "slprl", "wrhpp", "fltlnl", "enjlf", "fltsd", "cldgng")]
# Convert all to ordered factors
likert_items[] <- lapply(likert_items, function(x) factor(as.character(x), ordered = TRUE))
# Create Likert object
likert_obj <- likert(likert_items)
# Table of results
summary_likert <- likert_obj$results
kable(summary_likert, caption = "Distribution of Depression Scale Responses") %>% kable_styling()
| Item | All or almost all of the time | Most of the time | None or almost none of the time | Some of the time |
|---|---|---|---|---|
| fltdpr | 0.8510638 | 2.340425 | 68.170213 | 28.63830 |
| flteeff | 2.0442930 | 7.112436 | 52.129472 | 38.71380 |
| slprl | 3.1116795 | 7.630008 | 50.554135 | 38.70418 |
| wrhpp | 21.8990590 | 47.048760 | 3.592814 | 27.45937 |
| fltlnl | 1.3617021 | 3.276596 | 77.361702 | 18.00000 |
| enjlf | 22.0795892 | 41.420625 | 4.321780 | 32.17801 |
| fltsd | 1.3219616 | 2.089552 | 68.656716 | 27.93177 |
| cldgng | 1.2782275 | 2.939923 | 69.237324 | 26.54452 |
# Plot
plot(likert_obj)
We now assess potential predictors of depressive symptoms using a linear regression model. The predictors include:
We test both unweighted and weighted models.
# Transform to numeric
df_AT$fltlnl <- as.numeric(as.factor(df_AT$fltlnl))
df_AT$sclmeet <- as.numeric(as.factor(df_AT$sclmeet))
df_AT$health <- as.numeric(as.factor(df_AT$health))
df_AT$dosprt <- as.numeric(as.factor(df_AT$dosprt))
# Replace function
replace_with_mode <- function(x) {
x <- as.character(x)
non_na <- x[!is.na(x)]
if (length(non_na) == 0) return(x)
mode_value <- names(sort(table(non_na), decreasing = TRUE))[1]
x[is.na(x)] <- mode_value
return(as.numeric(as.character(x)))
}
# Apply to predictors
df_AT$fltlnl <- replace_with_mode(df_AT$fltlnl)
df_AT$sclmeet <- replace_with_mode(df_AT$sclmeet)
df_AT$health <- replace_with_mode(df_AT$health)
df_AT$dosprt <- replace_with_mode(df_AT$dosprt)
# Checking again if missing values are fixed
colSums(is.na(df_AT[, c("fltlnl", "sclmeet", "health", "dosprt")]))
## fltlnl sclmeet health dosprt
## 0 0 0 0
summary(df_AT$fltlnl)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 1.000 1.286 1.000 4.000
summary(df_AT$sclmeet)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 4.000 5.000 4.892 6.000 7.000
summary(df_AT$health)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 1.000 2.000 2.064 3.000 5.000
### Unweighted Model
# Subset complete cases for model variables
subset_vars = df_AT[, c("fltlnl", "sclmeet", "health", "dosprt")]
correlation = cor(subset_vars, use = "complete.obs")
print(correlation)
## fltlnl sclmeet health dosprt
## fltlnl 1.00000000 -0.04745841 0.2541255 -0.09371122
## sclmeet -0.04745841 1.00000000 -0.1943385 0.09867391
## health 0.25412550 -0.19433852 1.0000000 -0.27171649
## dosprt -0.09371122 0.09867391 -0.2717165 1.00000000
# Fit model
model_unweighted <- lm(CES_D8 ~ fltlnl + sclmeet + health + dosprt, data = df_AT)
summary(model_unweighted)
##
## Call:
## lm(formula = CES_D8 ~ fltlnl + sclmeet + health + dosprt, data = df_AT)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.00555 -0.23609 -0.05078 0.19148 1.67036
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.795920 0.039992 19.902 <2e-16 ***
## fltlnl 0.381485 0.011619 32.832 <2e-16 ***
## sclmeet -0.004519 0.005597 -0.807 0.419
## health 0.177576 0.008319 21.346 <2e-16 ***
## dosprt -0.003870 0.002861 -1.353 0.176
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3229 on 2346 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.4759, Adjusted R-squared: 0.475
## F-statistic: 532.5 on 4 and 2346 DF, p-value: < 2.2e-16
dweight)# Fit weighted model
model_weighted <- lm(CES_D8 ~ fltlnl + sclmeet + health + dosprt, data = df_AT, weights = dweight)
summary(model_weighted)
##
## Call:
## lm(formula = CES_D8 ~ fltlnl + sclmeet + health + dosprt, data = df_AT,
## weights = dweight)
##
## Weighted Residuals:
## Min 1Q Median 3Q Max
## -1.66127 -0.21712 -0.03533 0.17408 2.35889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.773847 0.039772 19.457 <2e-16 ***
## fltlnl 0.388493 0.012009 32.350 <2e-16 ***
## sclmeet -0.002070 0.005538 -0.374 0.7086
## health 0.180318 0.008363 21.561 <2e-16 ***
## dosprt -0.005710 0.002828 -2.019 0.0436 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3171 on 2346 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.463, Adjusted R-squared: 0.462
## F-statistic: 505.6 on 4 and 2346 DF, p-value: < 2.2e-16
cat("R-squared unweighted: ", round(summary(model_unweighted)$r.squared, 3), "\n")
## R-squared unweighted: 0.476
cat("R-squared weighted: ", round(summary(model_weighted)$r.squared, 3), "\n")
## R-squared weighted: 0.463
The weighted model shows a slightly different R-squared, indicating the influence of survey weights on model performance.
# Create binary outcome for clinically significant depression (CES-D >= 2)
hist(df_AT$CES_D8, main = "Distibution of CES_08", xlab = "CES-D Score", col = "brown")
df_AT$depr_clinical <- ifelse(df_AT$CES_D8 >= 2.0, 1, 0)
df_AT$gndr <- factor(df_AT$gndr)
df_AT$marsts <- factor(df_AT$marsts)
df_AT$uempla <- factor(df_AT$uempla)
# classify age into four group
df_AT$agea <- as.numeric(as.character(df_AT$agea))
quantile(df_AT$agea, c(1/4, 1/2, 3/4), na.rm=T)
## 25% 50% 75%
## 42 57 70
df_AT$age_group <- factor(NA, levels = c("<30", "30-45", "46-55", "56-65", ">65"))
df_AT$age_group[df_AT$agea < 30] <- "<30"
df_AT$age_group[df_AT$agea >= 30 & df_AT$agea <= 45] <- "30-45"
df_AT$age_group[df_AT$agea >= 46 & df_AT$agea <= 55] <- "46-55"
df_AT$age_group[df_AT$agea >= 56 & df_AT$agea <=65] <- "56-65"
df_AT$age_group[df_AT$agea > 66] <- ">66"
## Warning in `[<-.factor`(`*tmp*`, df_AT$agea > 66, value = structure(c(4L, :
## invalid factor level, NA generated
table(df_AT$depr_clinical)
##
## 0 1
## 1864 487
table(df_AT$gndr)
##
## Male Female
## 993 1361
table(df_AT$age_group)
##
## <30 30-45 46-55 56-65 >65
## 243 458 413 447 0
prop.table(table(df_AT$depr_clinical))
##
## 0 1
## 0.7928541 0.2071459
table(df_AT$depr_clinical, df_AT$gndr)
##
## Male Female
## 0 828 1036
## 1 162 325
t <- table(df_AT$depr_clinical, df_AT$gndr)
rownames(t)
## [1] "0" "1"
colnames(t)
## [1] "Male" "Female"
odds_female <- t["1", "Female"] / t["0", "Female"]
odds_male <- t["1", "Male"] / t["0", "Male"]
OR_gender <- odds_female / odds_male
OR_gender
## [1] 1.603389
# Simple logistic regression with only gender
aModel_simple <- glm(depr_clinical ~ gndr, data = df_AT, family = binomial)
summary(aModel_simple)
##
## Call:
## glm(formula = depr_clinical ~ gndr, family = binomial, data = df_AT)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.63142 0.08591 -18.990 < 2e-16 ***
## gndrFemale 0.47212 0.10688 4.417 9.99e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2398.7 on 2350 degrees of freedom
## Residual deviance: 2378.6 on 2349 degrees of freedom
## (3 observations deleted due to missingness)
## AIC: 2382.6
##
## Number of Fisher Scoring iterations: 4
exp(coef(aModel_simple))
## (Intercept) gndrFemale
## 0.1956522 1.6033891
exp(confint(aModel_simple))
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 0.1647876 0.2308224
## gndrFemale 1.3022718 1.9803814
aModel <- glm(depr_clinical ~ gndr + age_group + hincfel + health + marsts + uempla,
data = df_AT, family = binomial)
summary(aModel)
##
## Call:
## glm(formula = depr_clinical ~ gndr + age_group + hincfel + health +
## marsts + uempla, family = binomial, data = df_AT)
##
## Coefficients:
## Estimate
## (Intercept) -3.41619
## gndrFemale 0.27175
## age_group30-45 -0.24285
## age_group46-55 -0.85104
## age_group56-65 -1.50075
## hincfelCoping on present income -0.06307
## hincfelDifficult on present income 0.70677
## hincfelVery difficult on present income 1.44516
## health 1.13408
## marstsIn a legally registered civil union -13.36590
## marstsLegally divorced/Civil union dissolved 0.37492
## marstsWidowed/Civil partner died 1.45960
## marstsNone of these (NEVER married or in legally registered civil union) -0.11352
## uemplaMarked 0.41621
## Std. Error
## (Intercept) 0.73865
## gndrFemale 0.18969
## age_group30-45 0.26021
## age_group46-55 0.32076
## age_group56-65 0.34917
## hincfelCoping on present income 0.23736
## hincfelDifficult on present income 0.28224
## hincfelVery difficult on present income 0.48160
## health 0.12523
## marstsIn a legally registered civil union 598.90712
## marstsLegally divorced/Civil union dissolved 0.68018
## marstsWidowed/Civil partner died 0.80662
## marstsNone of these (NEVER married or in legally registered civil union) 0.67484
## uemplaMarked 0.39768
## z value
## (Intercept) -4.625
## gndrFemale 1.433
## age_group30-45 -0.933
## age_group46-55 -2.653
## age_group56-65 -4.298
## hincfelCoping on present income -0.266
## hincfelDifficult on present income 2.504
## hincfelVery difficult on present income 3.001
## health 9.056
## marstsIn a legally registered civil union -0.022
## marstsLegally divorced/Civil union dissolved 0.551
## marstsWidowed/Civil partner died 1.810
## marstsNone of these (NEVER married or in legally registered civil union) -0.168
## uemplaMarked 1.047
## Pr(>|z|)
## (Intercept) 3.75e-06
## gndrFemale 0.15197
## age_group30-45 0.35067
## age_group46-55 0.00797
## age_group56-65 1.72e-05
## hincfelCoping on present income 0.79044
## hincfelDifficult on present income 0.01227
## hincfelVery difficult on present income 0.00269
## health < 2e-16
## marstsIn a legally registered civil union 0.98219
## marstsLegally divorced/Civil union dissolved 0.58149
## marstsWidowed/Civil partner died 0.07037
## marstsNone of these (NEVER married or in legally registered civil union) 0.86642
## uemplaMarked 0.29529
##
## (Intercept) ***
## gndrFemale
## age_group30-45
## age_group46-55 **
## age_group56-65 ***
## hincfelCoping on present income
## hincfelDifficult on present income *
## hincfelVery difficult on present income **
## health ***
## marstsIn a legally registered civil union
## marstsLegally divorced/Civil union dissolved
## marstsWidowed/Civil partner died .
## marstsNone of these (NEVER married or in legally registered civil union)
## uemplaMarked
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 938.33 on 894 degrees of freedom
## Residual deviance: 764.30 on 881 degrees of freedom
## (1459 observations deleted due to missingness)
## AIC: 792.3
##
## Number of Fisher Scoring iterations: 13
exp(coef(aModel))
## (Intercept)
## 3.283725e-02
## gndrFemale
## 1.312260e+00
## age_group30-45
## 7.843905e-01
## age_group46-55
## 4.269698e-01
## age_group56-65
## 2.229637e-01
## hincfelCoping on present income
## 9.388731e-01
## hincfelDifficult on present income
## 2.027440e+00
## hincfelVery difficult on present income
## 4.242540e+00
## health
## 3.108311e+00
## marstsIn a legally registered civil union
## 1.567709e-06
## marstsLegally divorced/Civil union dissolved
## 1.454871e+00
## marstsWidowed/Civil partner died
## 4.304237e+00
## marstsNone of these (NEVER married or in legally registered civil union)
## 8.926907e-01
## uemplaMarked
## 1.516209e+00
suppressWarnings(exp(confint(aModel)))
## Waiting for profiling to be done...
## 2.5 %
## (Intercept) 0.006424371
## gndrFemale 0.906242694
## age_group30-45 0.470450554
## age_group46-55 0.225623225
## age_group56-65 0.110927432
## hincfelCoping on present income 0.592966778
## hincfelDifficult on present income 1.167999603
## hincfelVery difficult on present income 1.657105885
## health 2.444331935
## marstsIn a legally registered civil union NA
## marstsLegally divorced/Civil union dissolved 0.433769660
## marstsWidowed/Civil partner died 0.964083577
## marstsNone of these (NEVER married or in legally registered civil union) 0.268861305
## uemplaMarked 0.681660538
## 97.5 %
## (Intercept) 1.242233e-01
## gndrFemale 1.908325e+00
## age_group30-45 1.307383e+00
## age_group46-55 7.952167e-01
## age_group56-65 4.370677e-01
## hincfelCoping on present income 1.507335e+00
## hincfelDifficult on present income 3.539585e+00
## hincfelVery difficult on present income 1.105939e+01
## health 3.996227e+00
## marstsIn a legally registered civil union 4.298536e+34
## marstsLegally divorced/Civil union dissolved 6.757216e+00
## marstsWidowed/Civil partner died 2.430291e+01
## marstsNone of these (NEVER married or in legally registered civil union) 4.109351e+00
## uemplaMarked 3.273341e+00
r_mcfadden <- with(summary(aModel), 1 - deviance/null.deviance)
r_nagelkerke <- with(summary(aModel),
r_mcfadden / (1 - (null.deviance / nrow(aModel$data)*log(2))))
r_mcfadden
## [1] 0.185469
r_nagelkerke
## [1] 0.2562777
A multivariate logistic regression revealed that self-rated poor health and economic difficulty are strong predictors of clinically significant depressive symptoms. Participants reporting their income as “very difficult” were over 4 times more likely to report depression (OR = 4.45, 95% CI: 1.66–11.06), and those in poor health had three times the odds (OR = 3.12, 95% CI: 2.44–4.00). Age also played a protective role, especially in the 56–65 group (OR = 0.23). Gender showed a trend toward higher risk among females (OR = 1.60), though not statistically significant at the 95% level.
# Just testing some pltos ;)
png("effects_plot.png", width = 1200, height = 800)
plot(allEffects(aModel))