Seminar Paper Wachter Krabichler

1. Introduction

This script analyzes how social determinants (income, health services, childhood conflicts, and working hours) influence depression (CES-D8 score) in Austria using ESS11 data.

2. Recoding CES-D8 Depression Scale (Dependent Variable)

First, the eight CES-D8 items were recoded from their response labels to numeric scores (0 to 3):

cesd_items <- c("fltdpr", "flteeff", "slprl", "wrhpp", "fltlnl", "enjlf", "fltsd", "cldgng")

for (item in cesd_items) {
  df_aus[[paste0(item, "_n")]] <- NA  # initialize
  df_aus[[paste0(item, "_n")]][df_aus[[item]] == "None or almost none of the time"] <- 0
  df_aus[[paste0(item, "_n")]][df_aus[[item]] == "Some of the time"] <- 1
  df_aus[[paste0(item, "_n")]][df_aus[[item]] == "Most of the time"] <- 2
  df_aus[[paste0(item, "_n")]][df_aus[[item]] == "All or almost all of the time"] <- 3
}

# Reverse-code the two positively worded items (wrhpp, enjlf) so that higher scores always indicate more symptoms
df_aus$wrhpp_n <- ifelse(!is.na(df_aus$wrhpp_n), 3 - df_aus$wrhpp_n, NA)
df_aus$enjlf_n <- ifelse(!is.na(df_aus$enjlf_n), 3 - df_aus$enjlf_n, NA)
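
The recoding and reverse-coding steps above can also be written with a named lookup vector. The following is only a minimal sketch on toy data: the data frame toy and the vector score_map are hypothetical names, not part of the original script.

```r
# Toy data standing in for df_aus; only two CES-D8 items are shown.
toy <- data.frame(
  fltdpr = c("None or almost none of the time", "Some of the time", NA),
  wrhpp  = c("All or almost all of the time", "Most of the time", "Some of the time"),
  stringsAsFactors = FALSE
)

# Named lookup vector: response label -> numeric score (0 to 3).
score_map <- c(
  "None or almost none of the time" = 0,
  "Some of the time"                = 1,
  "Most of the time"                = 2,
  "All or almost all of the time"   = 3
)

for (item in c("fltdpr", "wrhpp")) {
  # as.character() also handles factor columns; NA and unmatched labels become NA.
  toy[[paste0(item, "_n")]] <- unname(score_map[as.character(toy[[item]])])
}

# Reverse-code the positively worded item so that higher always means more symptoms.
toy$wrhpp_n <- 3 - toy$wrhpp_n
toy
```

Because 3 - NA is already NA in R, the subtraction alone preserves missing values.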



# Now compute the CES-D8 total score (used later to flag clinically significant depressive symptoms)
df_aus$CESD_TOTAL = rowSums(df_aus[, paste0(cesd_items, "_n")], na.rm = FALSE)
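
The choice of na.rm = FALSE matters for the total score. The small sketch below uses made-up item columns (a_n, b_n, c_n are hypothetical) to show the difference: with na.rm = FALSE a respondent with any missing item gets a missing total, whereas na.rm = TRUE would silently count the missing item as 0.

```r
# Three hypothetical item columns; the second respondent skipped item b_n.
items <- data.frame(a_n = c(0, 1, 2), b_n = c(3, NA, 1), c_n = c(1, 1, 0))

total_strict  <- rowSums(items, na.rm = FALSE)  # NA if any item is missing
total_lenient <- rowSums(items, na.rm = TRUE)   # missing items contribute 0

data.frame(total_strict, total_lenient)
```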


table(df_aus$CESD_TOTAL)

3. Reliability Analysis (Cronbach’s Alpha for CES-D8)

library(psych)  # provides alpha()
alpha_value = psych::alpha(df_aus[, paste0(cesd_items, "_n")])
print(alpha_value$total$raw_alpha)

## [1] 0.8033533

Cronbach's alpha can be interpreted as follows:

With α = 0.80, the CES-D8 items show good internal consistency.
By common rules of thumb, values above 0.7 are acceptable, above 0.8 good, and above 0.9 excellent.
Removing any single item does not substantially improve reliability, so all eight CES-D8 items contribute to the scale.
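
For illustration, Cronbach's alpha and the "alpha if item dropped" values can be computed by hand from the item variances and the variance of the sum score. This is a sketch on simulated data, not the psych implementation. The function cronbach_alpha and the simulated items are assumptions made for this example.

```r
set.seed(1)
# Simulated item scores (hypothetical data): a shared component plus item-specific noise.
common <- rnorm(200)
items <- sapply(1:8, function(i) common + rnorm(200))

cronbach_alpha <- function(x) {
  x <- x[stats::complete.cases(x), , drop = FALSE]  # listwise deletion
  k <- ncol(x)
  item_vars <- apply(x, 2, var)   # variance of each item
  total_var <- var(rowSums(x))    # variance of the sum score
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

alpha_all <- cronbach_alpha(items)

# Alpha recomputed with each item left out in turn.
alpha_drop <- sapply(seq_len(ncol(items)),
                     function(j) cronbach_alpha(items[, -j, drop = FALSE]))

round(alpha_all, 3)
round(alpha_drop, 3)
```

If none of the values in alpha_drop clearly exceeds alpha_all, dropping an item does not improve the scale.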

4. Descriptive Statistics and Normality Test

summary(df_aus$CESD_TOTAL)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    2.00    4.00    4.87    7.00   24.00      33

## 
##  Shapiro-Wilk normality test
## 
## data:  df_aus$CESD_TOTAL
## W = 0.9133, p-value < 2.2e-16

The results can be interpreted as follows:

The p-value is below 0.05, so the hypothesis that CESD_TOTAL is normally distributed is rejected.
Because many common statistical tests (e.g., Pearson correlation, ANOVA, t-tests) assume normality,
we use non-parametric alternatives: Spearman correlation and the Kruskal-Wallis test.
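
A Shapiro-Wilk test of this kind can be run with shapiro.test() from base R. The sketch below uses simulated right-skewed scores, since the original call is not shown. The vector x is hypothetical.

```r
set.seed(42)
# Hypothetical right-skewed scores, capped at 24 like the CES-D8 total.
x <- pmin(round(rexp(500, rate = 1 / 5)), 24)

sw <- shapiro.test(x)  # accepts between 3 and 5000 non-missing values
sw$statistic
sw$p.value

# In large samples the test flags even small departures from normality,
# so a visual check is a useful complement.
qqnorm(x)
qqline(x)
```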

5. Bivariate Analysis

Spearman's rank correlation was used to assess the association between CES-D8 and the ordinal predictors (hincfel, stfhlth, cnfpplh) because the CES-D8 distribution violates normality. Kruskal-Wallis tests examined group differences in CES-D8 across the categories of these ordinal independent variables.
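
The two tests can be sketched as follows on simulated data. The variables group and score are hypothetical stand-ins for an ordinal predictor and the CES-D8 total; the calls themselves, cor.test() and kruskal.test(), are base R.

```r
set.seed(7)
n <- 300
group <- sample(1:4, n, replace = TRUE)       # hypothetical 4-level ordinal predictor
score <- rpois(n, lambda = 8 - 1.5 * group)   # scores fall as the predictor rises

# exact = FALSE avoids the warning about ties when the p-value is computed.
sp <- cor.test(score, group, method = "spearman", exact = FALSE)

# Kruskal-Wallis treats the predictor as unordered groups (df = number of groups - 1).
kw <- kruskal.test(score ~ factor(group))

sp$estimate
kw$statistic
kw$parameter
```

Spearman's rho summarizes a monotonic trend across the ordered categories, while Kruskal-Wallis only tests whether at least one group differs, so the two results complement each other.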

Income satisfaction & depression

## 
##  Spearman's rank correlation rho
## 
## data:  df_aus$CESD_TOTAL and df_aus$hincfel_num
## S = 2562360194, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## -0.252128

## 
##  Kruskal-Wallis rank sum test
## 
## data:  CESD_TOTAL by hincfel_num
## Kruskal-Wallis chi-squared = 176.62, df = 3, p-value < 2.2e-16

H1: The results support the hypothesis that higher income satisfaction is significantly associated with lower depressive symptoms (ρ = -0.25, p < 0.001). The boxplot shows depressive symptoms decreasing as income satisfaction increases. Individuals reporting financial difficulties show higher CES-D8 scores, which is consistent with income satisfaction being a key determinant of mental health.

Perceived health services & depression

## 
##  Spearman's rank correlation rho
## 
## data:  df_aus$CESD_TOTAL and df_aus$stfhlth_num
## S = 2421778021, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.1712044

## 
##  Kruskal-Wallis rank sum test
## 
## data:  CESD_TOTAL by stfhlth_num
## Kruskal-Wallis chi-squared = 76.065, df = 10, p-value = 2.951e-12

H2: Better perceptions of health services are significantly associated with lower depressive symptoms (ρ = -0.17, p < 0.001), which supports H2.

Childhood conflicts & depression

## 
##  Spearman's rank correlation rho
## 
## data:  df_aus$CESD_TOTAL and df_aus$cnfpplh_num
## S = 2608694583, p-value < 2.2e-16
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## -0.2747699

## 
##  Kruskal-Wallis rank sum test
## 
## data:  CESD_TOTAL by cnfpplh_num
## Kruskal-Wallis chi-squared = 175.57, df = 4, p-value < 2.2e-16

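H3: More frequent childhood conflicts are significantly associated with higher depressive symptoms (ρ = -0.27, p < 0.001). The negative sign is consistent with this reading provided that higher values of cnfpplh_num denote less frequent conflict. Thus, H3 is supported.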

Working hours and depression

## 
##  Spearman's rank correlation rho
## 
## data:  df_aus$CESD_TOTAL and df_aus$wkhtot
## S = 1510067318, p-value = 0.5892
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.0118111

H4: Working hours are not significantly associated with depressive symptoms, so H4 is not supported. The scatterplot shows no systematic relationship between working hours and depressive symptoms, in line with the Spearman correlation reported above (ρ = 0.012, p = 0.589). The dispersion of points suggests that working hours do not systematically predict depression in this sample.

6. Multiple Linear Regression Model

##              CESD_TOTAL hincfel_num cnfpplh_num       wkhtot  stfhlth_num
## CESD_TOTAL   1.00000000 -0.28143950  -0.2689294 -0.011487895 -0.167037293
## hincfel_num -0.28143950  1.00000000   0.1493801  0.041298772  0.047046449
## cnfpplh_num -0.26892936  0.14938008   1.0000000 -0.010915399  0.163197732
## wkhtot      -0.01148789  0.04129877  -0.0109154  1.000000000  0.001021815
## stfhlth_num -0.16703729  0.04704645   0.1631977  0.001021815  1.000000000

## 
## Call:
## lm(formula = CESD_TOTAL ~ hincfel_num + cnfpplh_num + wkhtot + 
##     stfhlth_num, data = df_aus)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.9785 -2.1897 -0.6116  1.6286 18.3368 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 13.097456   0.519876  25.193  < 2e-16 ***
## hincfel_num -1.230848   0.103305 -11.915  < 2e-16 ***
## cnfpplh_num -0.723075   0.074471  -9.710  < 2e-16 ***
## wkhtot      -0.002867   0.006769  -0.424    0.672    
## stfhlth_num -0.183817   0.031195  -5.892 4.43e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.29 on 2068 degrees of freedom
##   (281 observations deleted due to missingness)
## Multiple R-squared:  0.1386, Adjusted R-squared:  0.1369 
## F-statistic: 83.17 on 4 and 2068 DF,  p-value: < 2.2e-16

The multiple linear regression model identified income satisfaction, childhood conflict, and healthcare perception as significant predictors of depressive symptoms. Working hours had no significant effect. The model explains 13.86% of the variance in depressive symptoms (adjusted R² = 13.69%).
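
The share of explained variance can be read directly from the fitted model object rather than copied by hand. The following sketch uses simulated data. The data frame d and its columns are hypothetical, but summary()$r.squared and summary()$adj.r.squared are the standard components of an lm summary.

```r
set.seed(3)
n <- 200
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))         # hypothetical predictors
d$y <- 5 - 1.2 * d$x1 - 0.7 * d$x2 + rnorm(n, sd = 3)

fit <- lm(y ~ x1 + x2, data = d)
s <- summary(fit)

r2_pct     <- round(100 * s$r.squared, 2)      # explained variance in percent
adj_r2_pct <- round(100 * s$adj.r.squared, 2)  # penalized for the number of predictors

# R-squared by hand: 1 - residual sum of squares / total sum of squares
r2_manual <- 1 - sum(resid(fit)^2) / sum((d$y - mean(d$y))^2)

c(r2_pct = r2_pct, adj_r2_pct = adj_r2_pct)
```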

7. Predictors of Clinically Significant Depression

CES-D8 total scores range from 0 to 24. Following prior literature, we use a cutoff of 9: scores of 9 or above are treated as clinically significant depressive symptoms.

df_aus$depression_binary = ifelse(df_aus$CESD_TOTAL >= 9, 1, 0)

Frequency distribution

table(df_aus$depression_binary)

## 
##    0    1 
## 1973  348

prop.table(table(df_aus$depression_binary))

## 
##         0         1 
## 0.8500646 0.1499354

348 respondents (~15%) have a score of 9 or above.
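
Respondents with a missing total score stay missing in the binary indicator and are left out of the table by default. The sketch below, on a hypothetical vector total, shows how the missing count can be made visible with useNA.

```r
total <- c(0, 4, 9, 12, NA, 8)      # hypothetical CES-D8 totals
flag <- ifelse(total >= 9, 1, 0)    # NA totals stay NA

table(flag, useNA = "ifany")        # counts of 0, 1, and NA
shares <- prop.table(table(flag))   # shares among non-missing cases only
shares
```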

Fit logistic regression

df_aus$agea = as.numeric(as.character(df_aus$agea))

df_aus$age_group = cut(df_aus$agea,
                        breaks = c(15, 24, 34, 44, 54, 64, Inf),
                        labels = c("15–24", "25–34", "35–44", "45–54", "55–64", "65+"),
                        right = TRUE,
                        include.lowest = TRUE)

df_aus$age_group = as.factor(df_aus$age_group)
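
To check how the boundary ages are assigned, the same breaks can be applied to a few test values. This is a small sketch: the vector ages is hypothetical, and the labels use plain hyphens here only to keep the example ASCII.

```r
ages <- c(15, 24, 25, 34, 35, 64, 65, 90)   # hypothetical boundary cases

grp <- cut(ages,
           breaks = c(15, 24, 34, 44, 54, 64, Inf),
           labels = c("15-24", "25-34", "35-44", "45-54", "55-64", "65+"),
           right = TRUE,            # intervals are closed on the right: (24, 34]
           include.lowest = TRUE)   # so that age 15 itself falls in the first group

data.frame(ages, grp)
is.factor(grp)   # cut() already returns a factor
```

With these settings each upper bound belongs to the lower group (24 is in 15-24, 25 in 25-34), and ages below 15 would become NA.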


model_logit = glm(depression_binary ~ age_group + gndr + hinctnta + health,
                   data = df_aus, family = binomial)

Summary of the model

summary(model_logit)

## 
## Call:
## glm(formula = depression_binary ~ age_group + gndr + hinctnta + 
##     health, family = binomial, data = df_aus)
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -1.6588     0.4091  -4.054 5.03e-05 ***
## age_group25–34           -0.6898     0.4040  -1.707 0.087779 .  
## age_group35–44           -0.1791     0.3662  -0.489 0.624811    
## age_group45–54           -0.9122     0.3805  -2.397 0.016521 *  
## age_group55–64           -1.2294     0.3667  -3.353 0.000800 ***
## age_group65+             -0.9304     0.3443  -2.703 0.006879 ** 
## gndrFemale                0.2224     0.1504   1.479 0.139216    
## hinctntaR - 2nd decile   -0.6551     0.2879  -2.276 0.022872 *  
## hinctntaC - 3rd decile   -0.8290     0.2965  -2.796 0.005176 ** 
## hinctntaM - 4th decile   -1.1069     0.3289  -3.365 0.000765 ***
## hinctntaF - 5th decile   -1.2155     0.3436  -3.538 0.000404 ***
## hinctntaS - 6th decile   -1.6358     0.3473  -4.710 2.48e-06 ***
## hinctntaK - 7th decile   -1.5437     0.3473  -4.444 8.81e-06 ***
## hinctntaP - 8th decile   -1.4861     0.3506  -4.239 2.25e-05 ***
## hinctntaD - 9th decile   -1.6320     0.4488  -3.636 0.000277 ***
## hinctntaH - 10th decile  -1.5001     0.5943  -2.524 0.011597 *  
## healthGood                1.2942     0.2781   4.654 3.25e-06 ***
## healthFair                2.4682     0.2906   8.493  < 2e-16 ***
## healthBad                 3.6781     0.3494  10.527  < 2e-16 ***
## healthVery bad            4.3995     0.6811   6.459 1.05e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1588.8  on 1826  degrees of freedom
## Residual deviance: 1295.7  on 1807  degrees of freedom
##   (527 observations deleted due to missingness)
## AIC: 1335.7
## 
## Number of Fisher Scoring iterations: 6

plot(model_logit)

data.frame(
  coef(model_logit))

##                         coef.model_logit.
## (Intercept)                    -1.6588010
## age_group25–34                 -0.6897643
## age_group35–44                 -0.1790995
## age_group45–54                 -0.9122226
## age_group55–64                 -1.2294410
## age_group65+                   -0.9304475
## gndrFemale                      0.2223542
## hinctntaR - 2nd decile         -0.6550695
## hinctntaC - 3rd decile         -0.8290198
## hinctntaM - 4th decile         -1.1068801
## hinctntaF - 5th decile         -1.2155314
## hinctntaS - 6th decile         -1.6357976
## hinctntaK - 7th decile         -1.5437305
## hinctntaP - 8th decile         -1.4860544
## hinctntaD - 9th decile         -1.6319786
## hinctntaH - 10th decile        -1.5000757
## healthGood                      1.2942101
## healthFair                      2.4681532
## healthBad                       3.6780830
## healthVery bad                  4.3994576

Odds ratios and confidence intervals

exp(coef(model_logit))

##             (Intercept)          age_group25–34          age_group35–44 
##               0.1903671               0.5016943               0.8360227 
##          age_group45–54          age_group55–64            age_group65+ 
##               0.4016306               0.2924560               0.3943772 
##              gndrFemale  hinctntaR - 2nd decile  hinctntaC - 3rd decile 
##               1.2490136               0.5194059               0.4364769 
##  hinctntaM - 4th decile  hinctntaF - 5th decile  hinctntaS - 6th decile 
##               0.3305888               0.2965524               0.1947969 
##  hinctntaK - 7th decile  hinctntaP - 8th decile  hinctntaD - 9th decile 
##               0.2135828               0.2262636               0.1955423 
## hinctntaH - 10th decile              healthGood              healthFair 
##               0.2231133               3.6481131              11.8006335 
##               healthBad          healthVery bad 
##              39.5704650              81.4067056

exp(confint(model_logit))

##                               2.5 %      97.5 %
## (Intercept)              0.08270793   0.4135277
## age_group25–34           0.22665238   1.1142717
## age_group35–44           0.41236447   1.7439368
## age_group45–54           0.19180456   0.8582073
## age_group55–64           0.14389645   0.6097177
## age_group65+             0.20365935   0.7906016
## gndrFemale               0.93191658   1.6812519
## hinctntaR - 2nd decile   0.29552709   0.9153103
## hinctntaC - 3rd decile   0.24402239   0.7818637
## hinctntaM - 4th decile   0.17242176   0.6276817
## hinctntaF - 5th decile   0.14964051   0.5775781
## hinctntaS - 6th decile   0.09729862   0.3812284
## hinctntaK - 7th decile   0.10676471   0.4182928
## hinctntaP - 8th decile   0.11232529   0.4457451
## hinctntaD - 9th decile   0.07735240   0.4553992
## hinctntaH - 10th decile  0.06033250   0.6522433
## healthGood               2.16477800   6.4744976
## healthFair               6.83370960  21.4520280
## healthBad               20.30954194  80.2084476
## healthVery bad          22.52545314 343.3728297
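
Odds ratios and their intervals are easier to read side by side in one table. The sketch below does this on simulated data. The data frame d and the model fit are hypothetical. It uses confint.default(), which returns Wald intervals, whereas confint() on a glm profiles the likelihood, so the two can differ slightly.

```r
set.seed(11)
n <- 500
d <- data.frame(x = rnorm(n),
                g = factor(sample(c("a", "b"), n, replace = TRUE)))
d$y <- rbinom(n, 1, plogis(-1 + 0.8 * d$x + 0.5 * (d$g == "b")))

fit <- glm(y ~ x + g, data = d, family = binomial)

# Exponentiate estimates and Wald limits together: log-odds -> odds ratios.
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(or_table, 3)
```

An interval that excludes 1 corresponds to a coefficient that differs significantly from 0 at the 5% level.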

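Interpretation: The logistic regression model estimates the odds of clinically significant depressive symptoms (CES-D8 ≥ 9) from age group, gender, income decile, and self-rated health. All odds ratios are relative to the omitted reference categories (the 15–24 age group, men, the lowest income decile, and the best self-rated health category).
- Age: compared with the 15–24 group, the 45–54 (OR = 0.40), 55–64 (OR = 0.29), and 65+ (OR = 0.39) groups have significantly lower odds; the 25–34 and 35–44 groups do not differ significantly from the reference.
- Gender: women have somewhat higher odds (OR = 1.25), but the confidence interval (0.93 to 1.68) includes 1, so the difference is not statistically significant.
- Income: every decile from the 2nd upward has significantly lower odds than the lowest decile; the odds ratios fall from 0.52 in the 2nd decile to about 0.2 in the 6th and stay at roughly that level in the higher deciles.
- Self-rated health: there is a strong, statistically significant gradient, with odds ratios rising from 3.6 (good) to 81.4 (very bad); the interval for the very bad category is wide (22.5 to 343.4), so that estimate is imprecise.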
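
As a worked example of reading a logistic coefficient, the estimate for the 55 to 64 age group can be converted into an odds ratio and a percentage change in odds. The coefficient is taken from the output above; the variable names are only illustrative.

```r
b_age_55_64 <- -1.2294410              # coefficient for age group 55 to 64 (from the model output)

or_age_55_64 <- exp(b_age_55_64)       # odds ratio relative to the reference age group
pct_change   <- 100 * (or_age_55_64 - 1)  # percentage change in odds

round(or_age_55_64, 4)
round(pct_change, 1)
```

The odds of clinically significant symptoms in this group are thus roughly 71% lower than in the reference group, holding the other predictors constant.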

Assess model fit

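# Observations actually used in the fit (rows with missing values are dropped by glm)
n_obs = nobs(model_logit)
r_mcfadden = with(summary(model_logit), 1 - deviance/null.deviance)
# Nagelkerke R2 = Cox-Snell R2 divided by its maximum attainable value
r_nagelkerke = with(summary(model_logit), (1 - exp((deviance - null.deviance)/n_obs)) / (1 - exp(-null.deviance/n_obs)))
r_mcfadden

## [1] 0.1844896

r_nagelkerke

With the deviances reported in the model summary (null 1588.8, residual 1295.7) and the 1827 observations used in the fit, the Nagelkerke R² is approximately 0.255.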
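
As a cross-check, the pseudo-R² measures can be evaluated under their standard definitions using only the figures printed in the model summary. This is a sketch under two assumptions: the outcome is ungrouped binary data, so the deviance equals minus twice the log-likelihood, and the sample size is the null degrees of freedom plus one. The variable names are illustrative.

```r
null_dev  <- 1588.8     # null deviance from the model summary
resid_dev <- 1295.7     # residual deviance from the model summary
n_used    <- 1826 + 1   # null degrees of freedom + 1 = observations used in the fit

r2_mcfadden   <- 1 - resid_dev / null_dev
r2_coxsnell   <- 1 - exp((resid_dev - null_dev) / n_used)
r2_nagelkerke <- r2_coxsnell / (1 - exp(-null_dev / n_used))  # rescaled to a maximum of 1

round(c(mcfadden = r2_mcfadden, coxsnell = r2_coxsnell, nagelkerke = r2_nagelkerke), 3)
```

Under these definitions Nagelkerke's R² comes out at about 0.255. It is always at least as large as the Cox-Snell value because it divides by a quantity below 1.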