Individual and contextual socioeconomic variables as predictors of depression in Understanding Society

Methods

library(haven)   
library(tidyverse) 
library(table1)
library(lmerTest)
library(patchwork)
library(e1071)
library(gtsummary)

This study is a cross-sectional secondary analysis of data from the UK Household Longitudinal Study, also known as Understanding Society. Understanding society is a large, nationally representative survey that has collected data on a range of social, health, and economic-related outcomes since 2009 on approximately 40,000 UK households. Data from Wave 12 (2021) was selected for analysis to be geographically linked to 2021 Lower Layer Super Output Area (LSOA) data. This was then matched with the English Indices of Deprivation (IoD2019) which produces an Index of Multiple Deprivation (IMD) for each English LSOA. As such, individual and household data was extracted from Wave 12 of Understanding Society and linked to the IoD2019 dataset.

To begin the analysis, we loaded in Individual and Household Data from Understanding Society #6614, LSOA Data from SN #9169 and Index of Multiple Deprivation (IMD) data from IoD2019.

# Load the data from Understanding Society SN#6614 Wave 12 and SN#9169, and from English IoD2019

Individual_Interview_Data  <- read_dta("l_indresp.dta")  # n = 29,721
Household_Data <-read_dta("l_hhresp.dta") #n = 16,856
LSOA_Data <- read_dta("l_lsoa21_protect.dta")
IMD_Data <- read.csv("IMD_Data.csv")

In order to organise the data, we renamed the relevant columns from four data sets before merging them to create a larger data set that we can then filter from to produce our sample.

# Rename necessary columns for ease

Individual_Interview_Data <- Individual_Interview_Data %>% rename(Participant_ID = pidp, Household_ID = l_hidp, Sex = l_sex, Age = l_dvage, GHQ_Likert = l_scghq1_dv)
Household_Data <- Household_Data %>% rename(Household_ID = l_hidp, Household_Income = l_fihhmnnet1_dv, Equivalisation_Value = l_ieqmoecd_dv)
LSOA_Data <- LSOA_Data %>% rename(Household_ID = l_hidp, LSOA = l_lsoa21)
IMD_Data <- IMD_Data %>% rename( LSOA = LSOA.code..2011., IMD_Score = Index.of.Multiple.Deprivation..IMD..Score)

# Merge data sets together to create larger data set with variables of interest

Understanding_Society <- Individual_Interview_Data %>% left_join(Household_Data, by = "Household_ID") %>% left_join(LSOA_Data, by = "Household_ID") %>% left_join(IMD_Data, by = "LSOA") %>% select(Participant_ID, Household_ID, LSOA, IMD_Score, Age, Sex, Household_Income, Equivalisation_Value, GHQ_Likert)

Variables

Variable selection, creation, and analyses were pre-registered on AsPredicted (https://aspredicted.org/s6vw7n.pdf).

Our outcome variable was depressive symptom score measured used Understanding Society’s shortened, 12-item version of the Generalised Health Questionnaire (GHQ-12) The GHQ-12 is a screening instrument extensively used as a measure of psychological distress and has been re-purposed across multiple languages and countries as a population screening tool for depressive symptoms (1). This study uses Understanding Society’s provided Likert score variable, ‘scghq1_dv’ which rescales answers to the questionnaire by recording the values of individual variables from 1 to 4 to 0 to 3 before summing them. This produces a range of scores from 0 (indicating the least amount of depressive symptoms) to 36 (indicating the greatest amount of depressive symptoms). Prior to analyses, GHQ Likert scores were square-root transformed to account for the positive skew of the distribution.

Our first predictor variable is income measured as equivalised total net household income (£/month), “‘fihhmnnet1_dv’. This is calculated by dividing income values by the OECD modified equivalence provided under ‘ieqmoecd_dv’. For brevity, we refer to equivalised total net household income as income in the following text. The second predictor variable was neighbourhood socioeconomic context measured using the IMD score provided in IoD2019. IMD scores provide a measure of the deprivation of English LSOAs through the combination of seven weighted domains: Income Deprivation (22.5%), Employment Deprivation (22.5%), Education, Skills and Training Deprivation (13.5%), Health Deprivation and Disability (13.5%), Crime (9.3%), Barriers to Housing and Services (9.3%), and Living Environment Deprivation (9.3%). Income was log-transformed prior to analysis, and both income and IMD score were mean-centred on 0. Our co-variates were age (mean-centred on 0) and sex (binary with sum contrasts).

Before constructing the analytical sample, income was equivalised using the OECD modified equivalence scale, and the sex variable was coded into a binary factor. Missing income values were removed, while negative income observations were converted to zero to allow log-transformation. The following code performs these steps:

#Equivalise income (Household Income/Equivalisation Value)

Understanding_Society$Household_Income[Understanding_Society$Household_Income == -9] <- NA # Remove missing household income codes (-9)
Understanding_Society$Equivalisation_Value[Understanding_Society$Equivalisation_Value < 0] <- NA # Remove respondents with missing equivalisation values
Understanding_Society$Equivalised_Income <- Understanding_Society$Household_Income/Understanding_Society$Equivalisation_Value # Equivalise income
Understanding_Society$Equivalised_Income[Understanding_Society$Equivalised_Income< 0] <- 0 # Converting negative values of income to 0 to allow log-transformation

# Converting sex to binary factor with sum contrasts

Understanding_Society <- Understanding_Society %>% mutate(Sex = case_when (Sex == 1  ~ "Male",Sex == 2  ~ "Female",TRUE ~ NA_character_))
Understanding_Society$Sex <- factor(Understanding_Society$Sex, levels = c("Male", "Female"))
contrasts(Understanding_Society$Sex) <- contr.sum(2)

Participants

Wave 12 of Understanding Society contains 29,271 observations. Participants were excluded if they were under 18 years of age, resided in LSOAs outside England, had LSOA codes that could not be matched to IoD2019 (some LSOA boundaries included in Wave 12 of Understanding Society were created after IoD2019 data were collected and therefore do not correspond to the IMDs), or had missing data for any of the following variables: age, sex, GHQ-12 score, income, LSOA of residence, or IMD score. This resulted in a sample of 20,251 participants.

# Filtering data

Understanding_Society$GHQ_Likert[Understanding_Society$GHQ_Likert < 0] <- NA # Remove -9 (missing) and -7 (proxy) values from GHQ data 
F_Understanding_Society <- Understanding_Society %>% filter(str_starts(LSOA, "E")) %>% filter(complete.cases(LSOA, IMD_Score,GHQ_Likert, Equivalised_Income, Age, Sex), Age >= 18 ) # Filter for complete cases
# 20,251 complete cases

Analyses

All analyses were conducted on R version 4.5.2. Two multilevel linear regression models were conducted using the lmerTest package. The first model tested for the main effects of equivalised income and neighbourhood socioeconomic context while the second tested for the interaction effect of income and neighbourhood socioeconomic context on GHQ-12 Likert sum scores. The models had a level-two random effect of household, creating random intercepts for each. As mentioned in Variables, income values were log-transformed, and both income and IMD score were mean-centred on 0. GHQ Likert sum score was square-root transformed. Accordingly, the models were run using the following code:

# Modelling the associations

# Mutate equivalised income variable to its logarithm 

F_Understanding_Society$log_Equivalised_Income <- log(F_Understanding_Society$Equivalised_Income + 1) # Adding 1 as there are values of 0 in the dataset

# Scale Age, Income, IMD 
F_Understanding_Society$Age_centred <- F_Understanding_Society$Age - mean(F_Understanding_Society$Age, na.rm=T)
F_Understanding_Society$IMD_Score_centred <- F_Understanding_Society$IMD_Score - mean(F_Understanding_Society$IMD_Score, na.rm=T) 
F_Understanding_Society$log_Income_centred <- F_Understanding_Society$log_Equivalised_Income - mean(F_Understanding_Society$log_Equivalised_Income, na.rm = T)

# Square root transform GHQ Likert Score

F_Understanding_Society$sqrt_GHQ_Likert <- sqrt(F_Understanding_Society$GHQ_Likert)

# Model 1 - Testing main effects of equivalised household income and IMD on GHQ Score

m1 <-lmer(sqrt_GHQ_Likert ~ log_Income_centred + IMD_Score_centred + Age_centred + Sex + (1|Household_ID), data=F_Understanding_Society)

# Model 2 - Testing interaction effect of equivalised household income and IMD on GHQ Score

m2 <- lmer(sqrt_GHQ_Likert ~ log_Income_centred * IMD_Score_centred + Age_centred + Sex + (1|Household_ID), data=F_Understanding_Society)

Results

Descriptive Statistics

The final sample consisted of 20,251 participants (Table 1) from 12,188 households. The mean age was 51.6 years, and 44.5% of respondents were male. Average equivalised household income was £2,130 per month, and the mean neighbourhood deprivation score (IMD) was 20.5. The average GHQ-12 Likert score was 12.

///

label(F_Understanding_Society$Age) <- "Age (years)" 
label(F_Understanding_Society$Equivalised_Income) <- "Equivalised net income (£/month)"
label(F_Understanding_Society$IMD_Score) <- "Index of Multiple Deprivation (IMD) Score"
label(F_Understanding_Society$GHQ_Likert) <- "GHQ-12 Likert Score"
label(F_Understanding_Society$Sex) <- "Sex"
Table1 <- table1(~Age + Sex + Equivalised_Income + IMD_Score+ GHQ_Likert, data = F_Understanding_Society)

Table1

	Overall (N=20251)
Age (years)
Mean (SD)	51.6 (18.0)
Median [Min, Max]	52.0 [18.0, 100]
Sex
Male	9007 (44.5%)
Female	11244 (55.5%)
Equivalised net income (£/month)
Mean (SD)	2130 (1620)
Median [Min, Max]	1880 [0, 82700]
Index of Multiple Deprivation (IMD) Score
Mean (SD)	20.5 (14.6)
Median [Min, Max]	16.7 [0.861, 92.7]
GHQ-12 Likert Score
Mean (SD)	12.0 (5.81)
Median [Min, Max]	11.0 [0, 36.0]

Distribution of GHQ Scores

#GHQ-12 Likert scores were positively skewed (skewness = 1.386). A square-root transformation was therefore applied and reduced skewness to 0.577. 

# Measure skewness
skewness(F_Understanding_Society$GHQ_Likert)

## [1] 1.385478

skewness(F_Understanding_Society$sqrt_GHQ_Likert)

## [1] 0.5773107

# Display distribution of GHQ-12 scores
Figure1a <- F_Understanding_Society %>% ggplot(aes(x = GHQ_Likert)) + geom_histogram (bins = 30, fill = "grey75", colour = "black") + theme_classic() + labs(x = "GHQ-12 Likert score", y = "Number of Participants")

# Display distribution of square-root transformed GHQ-12 scores
Figure1b <- F_Understanding_Society %>% ggplot(aes(x = sqrt_GHQ_Likert)) + geom_histogram (bins = 30, fill = "grey75", colour = "black") + theme_classic() + labs(x = "Square root transformed GHQ-12 Likert score")

Figure1a + Figure1b

Figure 1. Distribution of GHQ-12 Likert scores before (A) and after (B) square-root transformation. GHQ-12 scores were positively skewed (skewness = 1.386). A square root transformation was therefore applied and reduced skewness to 0.577.

Distribution of Income

Individual income and neighborhood IMD score were weakly negatively correlated (r = -0.174). Median log equivalised income declined from 7.72 in the highest IMD decile (most deprived) to 7.26 in the lowest IMD decile (least deprived). Income distributions, furthermore, overlapped substantially across IMD deciles.

# Calculate correlation of income and IMD score

cor(F_Understanding_Society$log_Equivalised_Income, F_Understanding_Society$IMD_Score)

## [1] -0.1774144

# Create IMD decile variable

F_Understanding_Society$IMD_Decile <- ntile(F_Understanding_Society$IMD_Score, 10) # Creating IMD deciles
F_Understanding_Society$IMD_Decile <- factor(F_Understanding_Society$IMD_Decile) # Converting IMD deciles to factors

# Create violin plot to display distribution of income across IMD deciles

Figure3 <- ggplot(F_Understanding_Society, aes(x = IMD_Decile, y = log_Equivalised_Income, fill = factor(F_Understanding_Society$IMD_Decile))) + geom_violin(alpha = 0.5, position = position_dodge(1)) + geom_boxplot(alpha=0.5, position = position_dodge(1), width = 0.1) + scale_fill_viridis_d(option = "plasma") + theme(legend.position = "none") + labs(x = "LSOA IMD Decile", y = "Log Equivalised Income") # Plotting

Figure3

Figure 3.

To further explore the alignment between income and neighbourhood socioeconomic context, the difference of income and IMD z-scores were calculated. We found that 46.5% of respondents lived in neighbourhoods where their income z-score differed from the neighbourhood IMD z-score by more than one standard deviation and 12.7% differed by more than two standard deviations.

# Create z-score variables

F_Understanding_Society$Income_z_score <- (F_Understanding_Society$log_Equivalised_Income - mean(F_Understanding_Society$log_Equivalised_Income))/ sd(F_Understanding_Society$log_Equivalised_Income) 
F_Understanding_Society$IMD_z_score <- (F_Understanding_Society$IMD_Score - mean(F_Understanding_Society$IMD_Score))/ sd(F_Understanding_Society$IMD_Score) 

# Create Income IMD z-score difference variable

F_Understanding_Society$Income_IMD_Difference <- F_Understanding_Society$Income_z_score - F_Understanding_Society$IMD_z_score

# Calculate proportion of respondents living in a neighbourhood more than 1 and 2 standard deviations 

mean(abs(F_Understanding_Society$Income_IMD_Difference) >= 1, na.rm = TRUE) * 100

## [1] 46.51622

mean(abs(F_Understanding_Society$Income_IMD_Difference) >= 2, na.rm = TRUE) * 100

## [1] 12.70061

# Display

Figure4 <- ggplot(F_Understanding_Society,
       aes(IMD_z_score, Income_z_score)) + geom_point(alpha = 0.05, size = 0.7) + geom_abline(slope = 1, intercept = 0, colour = "red", linewidth = 1) + geom_abline(slope = 1, intercept = 1, linetype = "dashed") + geom_abline(slope = 1, intercept = -1, linetype = "dashed") + geom_abline(slope = 1, intercept = -2, linetype = "dotted") + geom_abline(slope = 1, intercept = 2, linetype = "dotted") + theme_minimal() + labs( x = "IMD z-score", y = "Income z-score")

Figure4

Figure 4. Alignment between individual income and neighbourhood socioeconomic context. Red line indicates income z-score = IMD z-score. Dashed line indicates a difference of ±1 standard deviation between income z-score and IMD z-score. Dotted line indicates a difference of ±2 standard deviation between income z-score and IMD z-score.

Model

Table 2 presents results from the multilevel linear regression models predicting GHQ-12 scores. In the main effects model, higher income was associated with lower depressive symptom scores (β = −0.037, SE = 0.007, t = −5.266, p < 0.001), while neighbourhood socioeconomic context was positively associated with depressive symptoms score (β = 0.002, SE = 0.0004, t = 4.834, p < 0.001). Our interaction model showed no statistically significant interaction between income and neighbourhood socioeconomic context (β = 9.65 × 10⁻⁶, SE = 0.00049, t = 0.020, p = 0.984).

Characteristic	Main effects model			Interaction effect model
Characteristic	Beta	95% CI	p-value	Beta	95% CI	p-value
Log equivalised income	-0.037	-0.051, -0.023	<0.001	-0.037	-0.051, -0.023	<0.001
Neighbourhood deprivation (IMD)	0.002	0.001, 0.003	<0.001	0.002	0.001, 0.003	<0.001
Age	-0.004	-0.005, -0.003	<0.001	-0.004	-0.005, -0.003	<0.001
Sex	-0.101	-0.111, -0.091	<0.001	-0.101	-0.111, -0.091	<0.001
Log equivalised income * Neighbourhood deprivation (IMD)				0.000	-0.001, 0.001	0.984
Abbreviation: CI = Confidence Interval

References

Comotti A, Barnini T, Fattori A, Paladino ME, Riva MA, Bonzini M, Belingheri M. Rethinking students’ mental health assessment through GHQ-12: evidence from the IRT approach. BMC Psychol. 2024 May 29;12(1):308. doi: 10.1186/s40359-024-01808-4. PMID: 38812050; PMCID: PMC11134724.