Data and Sample

We begin with the full ESS11 dataset (N = 40156).

Hypotheses

H1 (alcfreq): Individuals who consume alcohol more frequently will exhibit higher levels of depressive symptoms.
H2 (eatveg): Individuals who eat vegetables less frequently will exhibit higher depressive symptom scores.
H3 (hincfel): Individuals whose households have greater difficulty managing on their present income will exhibit higher depressive symptoms.
H4 (domicil): People living in rural areas will exhibit higher depressive symptoms than those living in urban areas.
H5 (wlespdm): Individuals who perceive more gender inequality will exhibit higher depressive symptoms.

Depression Scale Calculation and Reliability

# convert to numeric
df$d20 = as.numeric(df$fltdpr)
df$d21 = as.numeric(df$flteeff)
df$d22 = as.numeric(df$slprl)
df$d23 = as.numeric(df$wrhpp)
df$d24 = as.numeric(df$fltlnl)
df$d25 = as.numeric(df$enjlf)
df$d26 = as.numeric(df$fltsd)
df$d27 = as.numeric(df$cldgng)

# reverse scoring for the positive items
df$d23 = 5-df$d23
df$d25 = 5-df$d25
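
The reversal above assumes the items are coded 1 to 4, so that 5 - x swaps 1 with 4 and 2 with 3. A minimal sketch on made-up responses (the vector `x` is hypothetical, not ESS data):

```r
# hypothetical responses on a 1-4 scale (not ESS data)
x = c(1, 2, 3, 4, NA)
# reverse scoring: (scale minimum + scale maximum) - x, here 1 + 4 = 5
x_rev = 5 - x
x_rev   # missing answers stay missing
```

After this step, higher values on every item point in the same direction (more depressive symptoms), which is what the reliability check and the scale score require.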

# check internal consistency (Cronbach's alpha); cronbach.alpha() comes from the ltm package
library(ltm)
cronbach.alpha(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27" )], na.rm=TRUE)

## 
## Cronbach's alpha for the 'df[, c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]' data-set
## 
## Items: 8
## Sample units: 40156
## alpha: 0.823
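
For intuition, Cronbach's alpha can also be computed by hand from the item variances and the variance of the sum score. The sketch below uses a small made-up item matrix (`items` is hypothetical, not ESS data) and complete cases only, so it illustrates the formula rather than reproducing the value reported above:

```r
# toy item matrix: 6 hypothetical respondents, 3 items (not ESS data)
items = data.frame(
  i1 = c(1, 2, 3, 4, 2, 3),
  i2 = c(1, 3, 3, 4, 2, 2),
  i3 = c(2, 2, 4, 4, 1, 3)
)
k = ncol(items)                    # number of items
item_vars = sapply(items, var)     # variance of each item
total_var = var(rowSums(items))    # variance of the sum score
# alpha = k / (k - 1) * (1 - sum of item variances / variance of sum score)
alpha_manual = k / (k - 1) * (1 - sum(item_vars) / total_var)
alpha_manual
```

Alpha grows when the items covary strongly, because the variance of the sum score then exceeds the sum of the individual item variances by a wide margin.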

# compute the score
df$dep = rowSums(df[,c("d20", "d21", "d22", "d23", "d24", "d25", "d26", "d27")]) / 8
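
Note that rowSums() without na.rm returns NA for any respondent with at least one missing item, so those respondents receive no score. A small sketch on made-up data (`toy` is hypothetical) contrasts this strict rule with a more lenient alternative, which is shown only for comparison and is not what the analysis above uses:

```r
# hypothetical two-item data; the third respondent skipped item a
toy = data.frame(a = c(1, 2, NA), b = c(3, 4, 2))
score_strict  = rowSums(toy) / ncol(toy)      # NA if any item is missing
score_lenient = rowMeans(toy, na.rm = TRUE)   # mean of the answered items only
data.frame(score_strict, score_lenient)
```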

library(ltm)
library(likert)     # create basic Likert tables and plots
library(kableExtra) # create formatted tables 

vnames = c("fltdpr", "flteeff", "slprl", "wrhpp", "fltlnl", "enjlf", "fltsd", "cldgng")
likert_df = df[,vnames]

likert_table = likert(likert_df)$results # percentage of answers per category
likert_numeric_df = as.data.frame(lapply(df[,vnames], as.numeric))
# append the item means and the number of valid answers as new columns
likert_table$Mean = unlist(lapply(likert_numeric_df[,vnames], mean, na.rm=TRUE))
likert_table$Count = unlist(lapply(likert_numeric_df[,vnames], function (x) sum(!is.na(x))))

likert_table$Item <- c(
  fltdpr = "How much of the time … feel depressed?",
  flteeff = "… everything you did feel like an effort?",
  slprl   = "… was your sleep restless?",
  wrhpp   = "… did you feel happy?",
  fltlnl  = "… did you feel lonely?",
  enjlf   = "… did you enjoy life?",
  fltsd   = "… did you feel sad?",
  cldgng  = "… did you feel you could not get going?"
)


# round the percentage columns (2:5) and the mean (column 6) to 1 decimal digit
likert_table[,2:6] = round(likert_table[,2:6],1)
# column 7 is Count (whole numbers), so it needs no rounding

# create formatted table
kable_styling(kable(likert_table,
                    caption = "Distribution of answers regarding depression indicators (ESS round 11, all countries)"
                    )
              )

Distribution of answers regarding depression indicators (ESS round 11, all countries)
Item	None or almost none of the time	Some of the time	Most of the time	All or almost all of the time	Mean	Count
How much of the time … feel depressed?	64.9	29.1	4.6	1.5	1.4	39981
… everything you did feel like an effort?	48.4	38.4	9.8	3.4	1.7	39983
… was your sleep restless?	43.9	39.9	11.6	4.6	1.8	40017
… did you feel happy?	4.0	23.5	48.9	23.6	2.9	39890
… did you feel lonely?	68.1	24.3	5.3	2.3	1.4	39983
… did you enjoy life?	5.3	24.8	44.8	25.0	2.9	39878
… did you feel sad?	52.5	41.1	4.9	1.6	1.6	39981
… did you feel you could not get going?	55.7	36.1	6.2	2.0	1.5	39949

# create basic plot
plot(likert(summary=likert_table[,1:5])) # columns 1:5 = Item plus the four answer categories; skips Mean and Count

table(df$cntry)

## 
##            Albania            Austria            Belgium           Bulgaria 
##                  0               2354               1594                  0 
##        Switzerland             Cyprus            Czechia            Germany 
##               1384                685                  0               2420 
##            Denmark            Estonia              Spain            Finland 
##                  0                  0               1844               1563 
##             France     United Kingdom            Georgia             Greece 
##               1771               1684                  0               2757 
##            Croatia            Hungary            Ireland             Israel 
##               1563               2118               2017                  0 
##            Iceland              Italy          Lithuania         Luxembourg 
##                842               2865               1365                  0 
##             Latvia         Montenegro    North Macedonia        Netherlands 
##                  0                  0                  0               1695 
##             Norway             Poland           Portugal            Romania 
##               1337               1442               1373                  0 
##             Serbia Russian Federation             Sweden           Slovenia 
##               1563                  0               1230               1248 
##           Slovakia             Turkey            Ukraine             Kosovo 
##               1442                  0                  0                  0

unique(df$cntry)

##  [1] Austria        Belgium        Switzerland    Cyprus         Germany       
##  [6] Spain          Finland        France         United Kingdom Greece        
## [11] Croatia        Hungary        Ireland        Iceland        Italy         
## [16] Lithuania      Netherlands    Norway         Poland         Portugal      
## [21] Serbia         Sweden         Slovenia       Slovakia      
## 40 Levels: Albania Austria Belgium Bulgaria Switzerland Cyprus ... Kosovo
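
The zero counts in table(df$cntry) and the "40 Levels" line arise because a factor keeps every declared level, even when no row uses it. A small sketch with a made-up factor (the values are hypothetical) shows the effect and how droplevels() removes the unused levels:

```r
# hypothetical country factor with one level that never occurs
cntry = factor(c("Austria", "Belgium", "Austria"),
               levels = c("Albania", "Austria", "Belgium"))
sub = cntry[cntry == "Austria"]
table(sub)               # still lists Albania and Belgium with count 0
table(droplevels(sub))   # only the level that is actually present
```

Dropping unused levels is optional here; it only tidies tables and plots.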

# subset to Austria
df_Austria = df[df$cntry == "Austria", ]
nrow(df_Austria)

## [1] 2354

The Austrian sample consisted of 2354 respondents.

Variable Recoding for Multivariate Model

# alcfreq is a factor: map each level to a numeric code (1 = every day, 7 = never)
df_Austria$alcfreq_num = NA
df_Austria$alcfreq_num[df_Austria$alcfreq == "Every day"] = 1
df_Austria$alcfreq_num[df_Austria$alcfreq == "Several times a week"] = 2
df_Austria$alcfreq_num[df_Austria$alcfreq == "Once a week"] = 3
df_Austria$alcfreq_num[df_Austria$alcfreq == "2-3 times a month"] = 4
df_Austria$alcfreq_num[df_Austria$alcfreq == "Once a month"] = 5
df_Austria$alcfreq_num[df_Austria$alcfreq == "Less than once a month"] = 6
df_Austria$alcfreq_num[df_Austria$alcfreq == "Never"] = 7

# reverse the coding so that higher values mean more frequent drinking (7 = every day, 1 = never)
df_Austria$alcfreq_recoded = 8 - df_Austria$alcfreq_num
table(df_Austria$alcfreq_recoded)

## 
##   1   2   3   4   5   6   7 
## 531 231 152 375 380 511 171
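
The same mapping can be written more compactly with a named lookup vector. This is only a sketch of an alternative, using the level labels from the recoding above and a few made-up answers (`alc` is hypothetical, not ESS data):

```r
# hypothetical answers using the labels from the recoding above
alc = factor(c("Never", "Every day", "Once a week", NA))
# named lookup vector: label -> numeric code
codes = c("Every day" = 1, "Several times a week" = 2, "Once a week" = 3,
          "2-3 times a month" = 4, "Once a month" = 5,
          "Less than once a month" = 6, "Never" = 7)
alc_num = unname(codes[as.character(alc)])   # unmatched or missing labels become NA
alc_recoded = 8 - alc_num                    # 7 = every day, 1 = never
alc_recoded
```

Because the labels are matched by name, a misspelled label yields NA instead of silently taking a wrong code, which makes the recoding easy to check with table().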

# convert domicil to its numeric level codes before grouping
df_Austria$domicil = as.numeric(df_Austria$domicil)
table(df_Austria$domicil)

## 
##   1   2   3   4   5 
## 590 173 632 872  86

# collapse the five levels into three groups to test H4
# Urban = levels 1 + 2 (a big city + suburbs or outskirts of a big city)
# Suburban = level 3 (town or small city)
# Rural = levels 4 + 5 (country village + farm or home in the countryside)
df_Austria$domicil_group = factor(NA, levels = c("Urban", "Suburban", "Rural"))

# Assign groups based on domicil levels
df_Austria$domicil_group[df_Austria$domicil %in% c(1, 2)] = "Urban"
df_Austria$domicil_group[df_Austria$domicil == 3] = "Suburban"
df_Austria$domicil_group[df_Austria$domicil %in% c(4, 5)] = "Rural"
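
A quick way to verify such a grouping is to cross-tabulate the original codes against the new groups. The sketch below uses made-up codes (`dom` is hypothetical) and cut() as an equivalent way to build the three groups; it is an illustration, not the code used for the model:

```r
dom = c(1, 2, 3, 4, 5, NA)   # hypothetical numeric domicil codes
# intervals (0,2], (2,3], (3,5] correspond to Urban, Suburban, Rural
grp = cut(dom, breaks = c(0, 2, 3, 5),
          labels = c("Urban", "Suburban", "Rural"))
table(dom, grp, useNA = "ifany")   # each code should fall into exactly one group
levels(grp)[1]                     # first level = reference category in lm()
```

Since "Urban" is the first level, the regression coefficients for Suburban and Rural are differences relative to urban respondents.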

Multivariate Regression Model

# Model 4: linear regression of the depression score on all five predictors, weighted by anweight
model4 = lm(dep ~ alcfreq_recoded + eatveg + hincfel + domicil_group + wlespdm, data = df_Austria, weights = anweight)
summary(model4)

## 
## Call:
## lm(formula = dep ~ alcfreq_recoded + eatveg + hincfel + domicil_group + 
##     wlespdm, data = df_Austria, weights = anweight)
## 
## Weighted Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.83574 -0.12839 -0.02423  0.11260  2.10102 
## 
## Coefficients:
##                                                          Estimate Std. Error
## (Intercept)                                              1.416574   0.095008
## alcfreq_recoded                                         -0.005004   0.004508
## eatvegTwice a day                                       -0.051553   0.054862
## eatvegOnce a day                                        -0.034415   0.052688
## eatvegLess than once a day but at least 4 times a week  -0.022375   0.054720
## eatvegLess than 4 times a week but at least once a week  0.032516   0.058839
## eatvegLess than once a week                              0.221997   0.131784
## eatvegNever                                              0.644169   0.541707
## hincfelCoping on present income                          0.052220   0.019516
## hincfelDifficult on present income                       0.314644   0.031891
## hincfelVery difficult on present income                  0.810762   0.058422
## domicil_groupSuburban                                   -0.018652   0.023519
## domicil_groupRural                                      -0.041669   0.020590
## wlespdmRarely                                            0.108020   0.086179
## wlespdmSometimes                                         0.138248   0.081121
## wlespdmOften                                             0.171953   0.080545
## wlespdmAlways                                            0.097088   0.084347
##                                                         t value Pr(>|t|)    
## (Intercept)                                              14.910  < 2e-16 ***
## alcfreq_recoded                                          -1.110  0.26706    
## eatvegTwice a day                                        -0.940  0.34748    
## eatvegOnce a day                                         -0.653  0.51371    
## eatvegLess than once a day but at least 4 times a week   -0.409  0.68266    
## eatvegLess than 4 times a week but at least once a week   0.553  0.58058    
## eatvegLess than once a week                               1.685  0.09222 .  
## eatvegNever                                               1.189  0.23451    
## hincfelCoping on present income                           2.676  0.00751 ** 
## hincfelDifficult on present income                        9.866  < 2e-16 ***
## hincfelVery difficult on present income                  13.878  < 2e-16 ***
## domicil_groupSuburban                                    -0.793  0.42783    
## domicil_groupRural                                       -2.024  0.04311 *  
## wlespdmRarely                                             1.253  0.21018    
## wlespdmSometimes                                          1.704  0.08848 .  
## wlespdmOften                                              2.135  0.03288 *  
## wlespdmAlways                                             1.151  0.24984    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2361 on 2199 degrees of freedom
##   (138 observations deleted due to missingness)
## Multiple R-squared:  0.127,  Adjusted R-squared:  0.1207 
## F-statistic:    20 on 16 and 2199 DF,  p-value: < 2.2e-16
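
The summary compares each factor level with its reference category, which does not show whether a factor such as eatveg or wlespdm matters as a whole. One possible follow-up is to report confidence intervals and a joint F-test per term. The sketch below runs on simulated data only (`toy`, `x`, `g`, `w`, and `y` are hypothetical names, not ESS variables), so its numbers say nothing about the Austrian sample:

```r
set.seed(1)   # simulated data only; not the ESS sample
n = 200
toy = data.frame(
  x = rnorm(n),
  g = factor(sample(c("Urban", "Suburban", "Rural"), n, replace = TRUE),
             levels = c("Urban", "Suburban", "Rural")),
  w = runif(n, 0.5, 1.5)   # hypothetical analysis weights
)
toy$y = 1.5 + 0.2 * toy$x + rnorm(n, sd = 0.3)

fit = lm(y ~ x + g, data = toy, weights = w)
confint(fit)             # 95% confidence interval for each coefficient
drop1(fit, test = "F")   # joint F-test for each term, including the factor g
```

Applied to model4, the same two calls would give interval estimates for each coefficient and one test per predictor instead of one per dummy variable.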

Homework 2

Radvile Karaleviciute

2025-05-13
