Homework number two

Introduction

This script analyzes how social determinants (subjective household income, satisfaction with health services, childhood household conflict, and weekly working hours) influence depression (CES-D8 score) in Austria, using ESS11 data.

# Load required packages
library(foreign)     # for reading SPSS files
library(likert)      # for Likert plots and summaries
library(kableExtra)  # for nicer tables
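
The data frame df_aus used throughout is assumed to be the Austrian subset of the ESS11 file. A minimal sketch of how it could be created with foreign::read.spss (the file name and the exact country label are assumptions and may need adjusting):

# Read the ESS11 SPSS file (file name is an assumption), keeping value labels as factors
ess11 = read.spss("ESS11.sav", use.value.labels = TRUE, to.data.frame = TRUE)
# Keep Austrian respondents only (ESS codes Austria as "AT"; check the labels in your file)
df_aus = ess11[ess11$cntry == "AT", ]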

Recoding the CES-D8 Depression Scale (Dependent Variable)

First, the eight CES-D8 items were recoded from their response labels to numeric values ranging from 1 (none or almost none of the time) to 4 (all or almost all of the time):

# Convert selected items to numeric scale
cesd_items = c("fltdpr", "flteeff", "slprl", "wrhpp", "fltlnl", "enjlf", "fltsd", "cldgng")

for (item in cesd_items) {
  df_aus[[paste0(item, "_n")]] = as.numeric(NA)
  df_aus[[paste0(item, "_n")]][df_aus[[item]] == "None or almost none of the time"] = 1
  df_aus[[paste0(item, "_n")]][df_aus[[item]] == "Some of the time"] = 2
  df_aus[[paste0(item, "_n")]][df_aus[[item]] == "Most of the time"] = 3
  df_aus[[paste0(item, "_n")]][df_aus[[item]] == "All or almost all of the time"] = 4
}

# Reverse-code the two positively worded items (were happy, enjoyed life)
df_aus$wrhpp_n = 5 - df_aus$wrhpp_n
df_aus$enjlf_n = 5 - df_aus$enjlf_n
# CES-D8 total score: mean of the eight recoded items
df_aus$CESD_TOTAL = rowMeans(df_aus[, paste0(cesd_items, "_n")], na.rm = TRUE)
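
A quick sanity check of the recoding, for example by cross-tabulating one original item against its numeric version and summarising the resulting total score:

# The frequency labels should map 1:1 onto the numeric codes 1 to 4
table(df_aus$fltdpr, df_aus$fltdpr_n, useNA = "ifany")
# The CES-D8 mean score should lie between 1 and 4
summary(df_aus$CESD_TOTAL)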

Likert Scale Summary and Plot

# Extract the CES-D8 items for the Likert analysis (likert() expects factor columns)
likert_df_aus = df_aus[, cesd_items]
# Create and display Likert summary
likert_obj = likert(likert_df_aus)
summary(likert_obj)
##      Item      low neutral      high     mean        sd
## 4   wrhpp 31.05218       0 68.947819 2.872541 0.7882364
## 6   enjlf 36.49979       0 63.500214 2.812580 0.8249923
## 3   slprl 89.25831       0 10.741688 1.632992 0.7562128
## 2 flteeff 90.84327       0  9.156729 1.590716 0.7119641
## 5  fltlnl 95.36170       0  4.638298 1.286383 0.5930864
## 8  cldgng 95.78185       0  4.218151 1.362591 0.6056129
## 7   fltsd 96.58849       0  3.411514 1.360768 0.5931890
## 1  fltdpr 96.80851       0  3.191489 1.358723 0.5727590
# Plot the Likert-scale responses
plot(likert_obj)
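
kableExtra is loaded above but has not been used yet; the response percentages stored in the likert object can also be shown as a formatted table. A small sketch, assuming the percentages sit in likert_obj$results as documented by the likert package:

# Response percentages per item, formatted with kableExtra
kable_styling(kbl(likert_obj$results, digits = 1),
              bootstrap_options = "striped", full_width = FALSE)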

Mean Scores of Depression Items

# Convert the original items to numeric (factor level codes, i.e. without reverse-coding)
likert_numeric_df_aus = as.data.frame(lapply(df_aus[, cesd_items], as.numeric))
# Calculate means for each item
likert_means = sapply(likert_numeric_df_aus, mean, na.rm = TRUE)

# Print the means
likert_means
##   fltdpr  flteeff    slprl    wrhpp   fltlnl    enjlf    fltsd   cldgng 
## 1.358723 1.590716 1.632992 2.872541 1.286383 2.812580 1.360768 1.362591
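
These are means of the items in their original coding, which is why the two positively worded items (wrhpp, enjlf) show the highest values. The means of the recoded and reverse-coded items that actually enter CESD_TOTAL can be taken from the _n columns:

# Means of the recoded items used in the CES-D8 total score
sapply(df_aus[, paste0(cesd_items, "_n")], mean, na.rm = TRUE)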

Multiple Linear Regression

A multiple linear regression model was fitted, as in the previous seminar paper. The predictors hincfel_num, cnfpplh_num and stfhlth_num are taken over from that paper and are not recoded in this script.
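
As a quick check of how many respondents actually enter the model (the output below reports 256 observations dropped because of missing values), the complete cases across the outcome and the four predictors can be counted first:

# Number of complete cases across the outcome and the four predictors
model_vars = c("CESD_TOTAL", "hincfel_num", "cnfpplh_num", "wkhtot", "stfhlth_num")
sum(complete.cases(df_aus[, model_vars]))

The model itself: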

mlrm = lm(CESD_TOTAL ~ hincfel_num + cnfpplh_num + wkhtot + stfhlth_num, data = df_aus)
summary(mlrm)
## 
## Call:
## lm(formula = CESD_TOTAL ~ hincfel_num + cnfpplh_num + wkhtot + 
##     stfhlth_num, data = df_aus)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.00641 -0.27287 -0.07585  0.20838  2.29300 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.6477758  0.0639004  41.436  < 2e-16 ***
## hincfel_num -0.1532889  0.0128009 -11.975  < 2e-16 ***
## cnfpplh_num -0.0912248  0.0092222  -9.892  < 2e-16 ***
## wkhtot      -0.0003108  0.0008385  -0.371    0.711    
## stfhlth_num -0.0240280  0.0038789  -6.194 7.02e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4118 on 2093 degrees of freedom
##   (256 observations deleted due to missingness)
## Multiple R-squared:  0.1422, Adjusted R-squared:  0.1406 
## F-statistic: 86.76 on 4 and 2093 DF,  p-value: < 2.2e-16
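
Standard residual diagnostics for the unweighted model can be inspected with base R's plot method for lm objects, for example:

# Residuals vs. fitted, normal Q-Q, scale-location and residuals vs. leverage plots
par(mfrow = c(2, 2))
plot(mlrm)
par(mfrow = c(1, 1))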

Applying design weights

# Ensure the ESS design weight is numeric
df_aus$dweight = as.numeric(df_aus$dweight)

# Refit the model, weighting observations by the design weight
mlrm_weighted = lm(CESD_TOTAL ~ hincfel_num + cnfpplh_num + wkhtot + stfhlth_num, 
                   data = df_aus, 
                   weights = df_aus$dweight)
summary(mlrm_weighted)
## 
## Call:
## lm(formula = CESD_TOTAL ~ hincfel_num + cnfpplh_num + wkhtot + 
##     stfhlth_num, data = df_aus, weights = df_aus$dweight)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.31976 -0.23404 -0.04067  0.21687  2.55893 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.6461481  0.0628021  42.135  < 2e-16 ***
## hincfel_num -0.1606669  0.0124434 -12.912  < 2e-16 ***
## cnfpplh_num -0.0903881  0.0090008 -10.042  < 2e-16 ***
## wkhtot      -0.0012564  0.0007874  -1.596    0.111    
## stfhlth_num -0.0207764  0.0036831  -5.641 1.92e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3945 on 2093 degrees of freedom
##   (256 observations deleted due to missingness)
## Multiple R-squared:  0.1477, Adjusted R-squared:  0.1461 
## F-statistic: 90.68 on 4 and 2093 DF,  p-value: < 2.2e-16

Comparison of unweighted and weighted regression results

We ran two regression models to explore what influences depression levels (CES-D8) in Austria: one without survey weights and one using the design weight (dweight), which corrects for unequal selection probabilities and thus better reflects the general population.
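
To see how the estimates shift, the coefficients of both models can be placed side by side; a small sketch using the kableExtra functions loaded above:

# Side-by-side comparison of unweighted and weighted coefficients
coef_comparison = data.frame(Unweighted = coef(mlrm),
                             Weighted   = coef(mlrm_weighted))
kable_styling(kbl(coef_comparison, digits = 4), full_width = FALSE)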

In both models, the subjective financial situation of the household was the strongest predictor: the more difficult respondents found it to live on their household income, the higher their depression scores. This effect became slightly stronger once weights were applied (the coefficient moved from -0.153 to -0.161). Childhood household conflict remained a clearly significant predictor (from -0.091 to -0.090), and satisfaction with health services kept a smaller but still significant effect (from -0.024 to -0.021). Weekly working hours showed no clear association with depression in either model (coefficients of -0.0003 and -0.0013, neither statistically significant).

Overall, the weighted model fit the data slightly better: it explained a bit more variance in depression scores (adjusted R²: 0.146 vs. 0.141) and reported a lower residual standard error (0.3945 vs. 0.4118), although the latter is based on weighted residuals and is not directly comparable across the two models. The substantive conclusions are the same either way; applying the design weight mainly makes the estimates more representative of the Austrian population.