Economics of Mental Health: Exploring the Link Between GDP and Depression
Author
Martia Eyi
Credit: Image generated by ChatGPT
Introduction
This project explores the relationship between national income levels and the prevalence of depression. I chose this topic because the term “depression” personally resonates with me. While it is most commonly associated with mental health, the word also describes a severe economic downturn, and that double meaning made me curious about how economic conditions and mental health might be connected. This curiosity led me to investigate, on a global scale, whether there is a measurable link between a country's income level and how common depression is among its population.
The dataset used in this analysis was obtained from the World Bank Open Data platform. The data was collected by the World Bank, an international financial institution that provides access to economic, social, and health statistics from member countries. Although the dataset did not include a ReadMe file or documentation describing the exact data collection methodology, it is likely that the information was gathered through national statistical offices, household health surveys, and international monitoring programs.
The dataset includes key variables such as country name, year, depression prevalence, and GDP per capita, along with population, birth rate, neonatal mortality rate, ISO country codes, region, and income classification, each stored under a short column name (for example, gdp_percap). Categorical variables include country, region, and income group, while quantitative variables include year, GDP per capita, population, and depression prevalence. This structure supports both cross-sectional analysis (comparing countries or regions in a single year) and time-series analysis (tracking changes across years), which makes it well suited to studying the global relationship between economic conditions and mental health outcomes.
To prepare the data for analysis, I removed rows with missing values in the key variables (depression prevalence, GDP per capita, region, and income). For the regional comparison, I also restricted the data to the most recent year available so that the insights would be timely and relevant. These steps keep the analysis focused on complete, up-to-date observations.
Loading libraries for analysis
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.2 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(dplyr)
library(ggthemes)
library(ggplot2)
Importing the dataset and setting the working directory
To begin the analysis, I set the working directory to the location where my dataset is stored. Then, I loaded the dataset with readr's read_csv() function and displayed the first few rows with head() to verify that the data was imported correctly. This step ensures that I can easily access and reference the dataset throughout the project.
# Setting working directory
setwd("C:/Users/MCuser/Downloads")

# Loading the dataset
depression_data <- read_csv("depression_income.csv")
Rows: 6468 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): country, iso3c, iso2c, region, income
dbl (6): year, prevalence, gdp_percap, population, birth_rate, neonat_mortal...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# View depression data
head(depression_data)
# A tibble: 6 × 11
country iso3c year prevalence iso2c gdp_percap population birth_rate
<chr> <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 Afghanistan AFG 1990 318436. AF NA 12067570 49.0
2 Afghanistan AFG 1991 329045. AF NA 12789374 48.9
3 Afghanistan AFG 1992 382545. AF NA 13745630 48.8
4 Afghanistan AFG 1993 440382. AF NA 14824371 48.8
5 Afghanistan AFG 1994 456917. AF NA 15869967 48.9
6 Afghanistan AFG 1995 471475. AF NA 16772522 49.0
# ℹ 3 more variables: neonat_mortal_rate <dbl>, region <chr>, income <chr>
Identifying missing data in the dataset
Before performing any cleaning or analysis, it is important to check whether the dataset contains missing values. By combining is.na() with colSums(), I counted the missing values in each variable. This shows which columns may need to be cleaned or handled differently to ensure accurate results.
# Checking for missing values
colSums(is.na(depression_data)) # Summarizes how many missing values exist in each variable
country iso3c year prevalence
0 980 0 0
iso2c gdp_percap population birth_rate
2243 2567 2224 2311
neonat_mortal_rate region income
2368 2218 2218
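Raw counts are easier to compare across columns when expressed as proportions of the total number of rows. A minimal base-R sketch, using a small hypothetical data frame (toy) in place of depression_data, divides the missing counts by the number of rows in one step:

```r
# Hypothetical toy data standing in for depression_data (not real values)
toy <- data.frame(
  country    = c("A", "B", "C", "D"),
  prevalence = c(100, 250, 80, 400),
  gdp_percap = c(NA, 1500, NA, 9000),
  region     = c("R1", NA, "R2", "R1")
)

# is.na() on a data frame gives a logical matrix; colMeans() then returns
# the share of missing (TRUE) entries in each column
missing_share <- colMeans(is.na(toy))

# Columns sorted from most to least missing, as percentages
round(sort(missing_share, decreasing = TRUE) * 100, 1)
```

On the real data the same call is colMeans(is.na(depression_data)). For example, using the counts above, gdp_percap is missing in about 2567 / 6468, or roughly 39.7%, of the rows.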
After identifying the columns with missing values, I removed every row that is missing any of the four key variables: prevalence, GDP per capita, region, or income. This ensures that every observation used in the analysis is complete for the variables of interest; missing values in other columns, such as birth_rate, are left in place.
# Cleaning data: removing rows with missing values in key columns
depression_data <- depression_data %>%
  filter(
    !is.na(prevalence),
    !is.na(gdp_percap),
    !is.na(region),
    !is.na(income)
  ) # Keeps rows only where all key variables are not missing
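The same "complete on the key columns only" rule can also be expressed with base R's complete.cases(), which makes it easy to report how many rows were dropped. A sketch on hypothetical toy data (the column values are invented for illustration):

```r
# Hypothetical toy data; birth_rate is deliberately NOT a key variable
toy <- data.frame(
  prevalence = c(100, NA, 80, 400, 50),
  gdp_percap = c(1200, 1500, NA, 9000, 3000),
  region     = c("R1", "R2", "R2", "R1", "R3"),
  income     = c("Low", "Low", "High", "High", "Upper middle"),
  birth_rate = c(30, 25, NA, NA, 20)
)

key_cols <- c("prevalence", "gdp_percap", "region", "income")

# TRUE for rows with no missing value in any of the key columns
keep <- complete.cases(toy[, key_cols])
cleaned <- toy[keep, ]

cat("Rows before:", nrow(toy), "| after:", nrow(cleaned),
    "| dropped:", sum(!keep), "\n")

# Missing values in non-key columns survive the filter
sum(is.na(cleaned$birth_rate))
```

This matches the behavior of the filter above: on the real data it reduced 6,468 rows to 3,901, and the summary() output that follows still reports NA's for birth_rate and neonat_mortal_rate.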
To get a general understanding of the dataset, I used the summary() function to display basic descriptive statistics. This includes the minimum, maximum, mean, and quartiles for each variable, helping identify patterns and possible outliers.
# Generating summary statistics for key variables
summary(depression_data)
country iso3c year prevalence
Length:3901 Length:3901 Min. :1990 Min. : 931
Class :character Class :character 1st Qu.:1996 1st Qu.: 70282
Mode :character Mode :character Median :2002 Median : 215531
Mean :2002 Mean : 1204673
3rd Qu.:2008 3rd Qu.: 604661
Max. :2014 Max. :54949281
iso2c gdp_percap population birth_rate
Length:3901 Min. : 239.7 Min. :4.730e+04 Min. : 7.60
Class :character 1st Qu.: 2158.3 1st Qu.:2.337e+06 1st Qu.:13.40
Mode :character Median : 6474.7 Median :7.503e+06 Median :22.15
Mean : 12649.5 Mean :3.604e+07 Mean :24.39
3rd Qu.: 17409.9 3rd Qu.:2.054e+07 3rd Qu.:34.42
Max. :141442.2 Max. :1.364e+09 Max. :55.12
NA's :38
neonat_mortal_rate region income
Min. : 1.00 Length:3901 Length:3901
1st Qu.: 6.10 Class :character Class :character
Median :15.10 Mode :character Mode :character
Mean :19.25
3rd Qu.:29.30
Max. :73.10
NA's :48
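The summary shows prevalence ranging from 931 to about 54.9 million, far outside the 0 to 100 range of a percentage. That suggests the column is a count of people with depression rather than a rate (for Afghanistan in 1990, 318,436 out of a population of 12,067,570 is about 2.6%). Assuming that interpretation is right, a sketch converting counts to a rate per 100 people, so that large and small countries can be compared on the same scale; the prevalence_pct column name is hypothetical:

```r
# Two rows copied from the head() output above (Afghanistan, 1990-1991)
toy <- data.frame(
  country    = c("Afghanistan", "Afghanistan"),
  year       = c(1990, 1991),
  prevalence = c(318436, 329045),
  population = c(12067570, 12789374)
)

# Assumption: prevalence is a count of people, so divide by population
# and multiply by 100 to get a percentage of the population
toy$prevalence_pct <- toy$prevalence / toy$population * 100

round(toy$prevalence_pct, 2)
```

On the full data the same line would be depression_data$prevalence_pct <- depression_data$prevalence / depression_data$population * 100, and the rate could then be used in place of the raw count in the regression and plots.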
To simplify the analysis, I selected only the most relevant variables from the dataset: country, year, region, prevalence, GDP per capita, and income classification. I also filtered the dataset to include only the most recent year of data available, which allows for a focused, cross-sectional analysis of global depression and income levels.
# Selecting relevant columns
depression_data_selected <- depression_data %>%
  select(country, year, region, prevalence, gdp_percap, income)

# Filtering the selected columns for the most recent year
recent_data <- depression_data_selected %>%
  filter(year == max(year))
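Filtering on year == max(year) keeps only countries that have complete data in the single latest year (2014 in the summary above); a country whose last complete record is earlier drops out entirely. If that matters, one alternative is to keep each country's most recent available row. A base-R sketch on hypothetical toy data:

```r
# Hypothetical toy panel: country "B" has no row for the latest year
toy <- data.frame(
  country    = c("A", "A", "B", "B", "C"),
  year       = c(2013, 2014, 2012, 2013, 2014),
  prevalence = c(10, 12, 7, 8, 30)
)

# Global latest year: country B is dropped completely
global_latest <- toy[toy$year == max(toy$year), ]

# Per-country latest year: ave() returns, for every row,
# the maximum year within that row's country
country_max_year <- ave(toy$year, toy$country, FUN = max)
per_country_latest <- toy[toy$year == country_max_year, ]

as.character(global_latest$country)      # keeps A and C only
as.character(per_country_latest$country) # keeps A, B and C
```

With dplyr, the same idea is group_by(country) followed by filter(year == max(year)). The trade-off is that countries are then compared across slightly different years.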
Background Research
The intersection between economic stability and mental health has been widely discussed in global public health research. According to the World Health Organization (2023), depression is a leading cause of disability worldwide and disproportionately affects populations facing economic hardship. Individuals living in low- and middle-income countries often experience barriers such as poverty, poor access to mental health services, and underfunded healthcare systems, which contribute to a higher prevalence of untreated depression.
By examining both GDP per capita and depression prevalence, this project aims to identify whether national income levels are meaningfully associated with mental health outcomes. If a relationship exists, it would suggest that economic conditions, such as poverty, unemployment, and underfunded healthcare systems, play a significant role in shaping population well-being.
This kind of analysis is important because it can inform policies that address both economic development and mental health access at the same time. In particular, low- and middle-income countries may benefit from integrated strategies that consider mental health support as part of broader efforts to improve quality of life and social stability.
References (APA Style)
World Health Organization. (2023). Depression and other common mental disorders: Global health estimates. Retrieved from https://www.who.int/publications/i/item/depression-estimates
Patel, V., Saxena, S., Lund, C., Thornicroft, G., Baingana, F., Bolton, P., … & Unützer, J. (2018). The Lancet Commission on global mental health and sustainable development. The Lancet, 392(10157), 1553–1598. https://doi.org/10.1016/S0140-6736(18)31612-X
Building and interpreting a multiple linear regression model
To explore the relationship between depression prevalence and economic variables, I fitted a multiple linear regression model with region and GDP per capita as predictors. This assesses how each factor is associated with depression prevalence across countries while holding the other constant. Below, I summarize the regression output to evaluate the model’s coefficients, their statistical significance, and the model’s explanatory power.
# Building a multiple linear regression model (I used ChatGPT to create it)
linear_model <- lm(prevalence ~ region + gdp_percap, data = depression_data)

# Summarizing the regression output
summary(linear_model)
Call:
lm(formula = prevalence ~ region + gdp_percap, data = depression_data)
Residuals:
Min 1Q Median 3Q Max
-6398277 -517459 -318199 -53321 51763799
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.272e+06 2.115e+05 15.469 < 2e-16 ***
regionEurope & Central Asia -2.487e+06 2.527e+05 -9.843 < 2e-16 ***
regionLatin America & Caribbean -2.723e+06 2.676e+05 -10.175 < 2e-16 ***
regionMiddle East & North Africa -2.725e+06 3.248e+05 -8.389 < 2e-16 ***
regionNorth America 1.909e+06 5.858e+05 3.258 0.00113 **
regionSouth Asia 3.151e+06 3.998e+05 7.882 4.16e-15 ***
regionSub-Saharan Africa -2.823e+06 2.510e+05 -11.249 < 2e-16 ***
gdp_percap -6.494e+00 5.448e+00 -1.192 0.23331
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4548000 on 3893 degrees of freedom
Multiple R-squared: 0.1044, Adjusted R-squared: 0.1028
F-statistic: 64.83 on 7 and 3893 DF, p-value: < 2.2e-16
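Because region is categorical, lm() encodes it with dummy variables and uses one region as the baseline. East Asia & Pacific does not appear in the coefficient table, so it is presumably the reference level (first alphabetically), and each region coefficient is a shift relative to it. As a worked example using the rounded estimates above, the predicted prevalence for a hypothetical Europe & Central Asia country with a GDP per capita of 10,000 is:

```latex
\begin{aligned}
\widehat{\text{prevalence}}
  &= \hat\beta_0 + \hat\beta_{\text{Europe \& Central Asia}} + \hat\beta_{\text{gdp\_percap}} \times 10{,}000 \\
  &= 3{,}272{,}000 - 2{,}487{,}000 - 6.494 \times 10{,}000 \\
  &= 785{,}000 - 64{,}940 \\
  &= 720{,}060
\end{aligned}
```

Because the coefficients are rounded, predict(linear_model, ...) would give a slightly different value. Note also that the GDP term has p = 0.233, so its -64,940 contribution is not statistically distinguishable from zero; the region dummies carry the model's explanatory power.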
Evaluating the regression model with diagnostic plots
These diagnostic plots help determine whether the assumptions of linear regression are met. They allow us to check for non-linearity, unequal variance of the residuals, departures from normality, and influential outliers.
library(ggfortify)

# Set a custom theme before plotting
theme_set(theme_minimal(base_size = 12)) # You can also try theme_bw() or theme_classic()

# Diagnostic plots for the regression model
autoplot(linear_model, which = 1:4, nrow = 2, ncol = 2, colour = "#1f77b4") # Blue color
Diagnostic plot
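I created this set of diagnostic plots to check whether my linear regression model meets the necessary assumptions. The residuals appear unevenly spread, and the Q-Q plot shows deviation from normality, suggesting that the equal-variance and normality assumptions are violated. The residual summary in the regression output points the same way: the median residual is about -318,000 while the maximum is about 51.8 million, a strongly right-skewed distribution. A few outliers and influential points are also visible in the Cook’s distance plot. If I had more time, I would explore whether removing those points or transforming variables (for example, taking logarithms) improves the model.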
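Both follow-ups mentioned above can be tried in a few lines. This sketch fits the same specification as linear_model to simulated, right-skewed data; the simulated data, the common 4/n Cook's distance cutoff, and the log transformation are assumptions chosen for illustration, not results from this project. It then (1) refits without high-influence points and (2) refits with log-transformed prevalence and GDP per capita:

```r
# Hypothetical simulated data standing in for depression_data (NOT real values)
set.seed(42)
n <- 300
sim <- data.frame(
  region     = factor(sample(c("R1", "R2", "R3"), n, replace = TRUE)),
  gdp_percap = exp(rnorm(n, mean = 8.5, sd = 1))
)
# Right-skewed, count-like outcome loosely mimicking prevalence
sim$prevalence <- exp(11 - 0.2 * log(sim$gdp_percap) + rnorm(n, sd = 1.2))

# Same specification as linear_model, fitted on the raw scale
raw_fit <- lm(prevalence ~ region + gdp_percap, data = sim)

# Idea 1: flag influential points using the 4/n Cook's distance rule of thumb
cooks <- cooks.distance(raw_fit)
keep  <- cooks <= 4 / n  # logical index stays valid even if nothing is flagged
trimmed_fit <- lm(prevalence ~ region + gdp_percap, data = sim[keep, ])

# Idea 2: log-transform the skewed variables instead of dropping observations
log_fit <- lm(log(prevalence) ~ region + log(gdp_percap), data = sim)

# R-squared values; the log-scale one measures fit to log(prevalence),
# so it is not directly comparable with the raw-scale fits
r2 <- c(
  raw     = summary(raw_fit)$r.squared,
  trimmed = summary(trimmed_fit)$r.squared,
  log     = summary(log_fit)$r.squared
)
cat("Points flagged:", sum(!keep), "\n")
print(round(r2, 3))

# Re-run the diagnostics on the log model to compare residual behavior
# plot(log_fit, which = 1:2)
```

On the real data, the same calls would use linear_model and depression_data. The log model also changes the interpretation: the log(gdp_percap) coefficient is then roughly the percentage change in prevalence associated with a 1% change in GDP per capita, and it is usually preferable to transforming rather than discarding data when the skew affects the whole distribution.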
Scatter plot: Regional averages of depression vs GDP per capita
To better understand the relationship between average depression levels and economic conditions across regions, I created an interactive scatter plot. Each point represents a region’s average GDP per capita and average depression prevalence in the most recent year of data. A regression line is added to show the overall trend across these regions.
library(plotly)
library(viridis)
Loading required package: viridisLite
# Calculate average prevalence and GDP by region
avg_prevalence_region <- recent_data %>%
  group_by(region) %>%
  summarise(
    avg_prevalence = mean(prevalence, na.rm = TRUE),
    avg_gdp_percap = mean(gdp_percap, na.rm = TRUE)
  )

# Create base plot
p <- ggplot(avg_prevalence_region,
            aes(x = avg_gdp_percap, y = avg_prevalence, color = region, text = region)) +
  geom_point(size = 4) +
  # group = 1 fits a single line across all regions; otherwise the region-based
  # mappings split the data into one-point groups and no line can be fitted
  geom_smooth(aes(group = 1), method = "lm", se = FALSE, color = "black") +
  scale_color_viridis_d(option = "D", begin = 0.2, end = 0.8) +
  labs(
    title = "Regional Averages: Depression vs GDP per Capita",
    x = "Average GDP per Capita (USD)",
    y = "Average Depression Prevalence (number of people)",
    color = "Region"
  ) +
  theme_minimal()

# Convert to interactive
ggplotly(p, tooltip = c("text", "x", "y"))
`geom_smooth()` using formula = 'y ~ x'
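The black trend line drawn by geom_smooth(method = "lm") is an ordinary least-squares fit of y ~ x through the regional averages, so it rests on only as many points as there are regions (at most seven World Bank regions here). A base-R sketch of the same two steps on hypothetical toy data, which makes that small sample size explicit:

```r
# Hypothetical toy data: three countries in each of four regions
toy <- data.frame(
  region     = rep(c("R1", "R2", "R3", "R4"), each = 3),
  gdp_percap = c(500, 800, 1200, 3000, 4500, 6000,
                 15000, 22000, 30000, 40000, 52000, 61000),
  prevalence = c(40, 55, 70, 90, 85, 110, 60, 75, 95, 120, 140, 130)
)

# Step 1: one row per region holding the mean of each variable
avg_by_region <- aggregate(cbind(prevalence, gdp_percap) ~ region,
                           data = toy, FUN = mean)

# Step 2: the line geom_smooth(method = "lm") draws is this simple regression
trend <- lm(prevalence ~ gdp_percap, data = avg_by_region)

nrow(avg_by_region)  # number of points the trend line is fitted to
coef(trend)
```

With so few points, the slope of this line is highly sensitive to any single region, so it is best read as a visual guide rather than as evidence on its own.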
Interactive line plot: Depression trends over time by region
To better understand how depression prevalence has evolved over time across different regions, I created an interactive line plot. Each line shows a region’s total depression prevalence, summed across its countries, for each year. This helps highlight rising or declining trends and compare patterns between regions.
# Summarize total prevalence by region and year
region_prevalence <- depression_data %>%
  group_by(region, year) %>%
  summarise(
    total_prevalence = sum(prevalence, na.rm = TRUE),
    .groups = "drop"
  )

# Create the line plot
line_plot <- ggplot(region_prevalence,
                    aes(x = year, y = total_prevalence, color = region, text = region)) +
  geom_line(linewidth = 1.2) + # `linewidth` replaces the deprecated `size` for lines (ggplot2 >= 3.4.0)
  scale_color_viridis_d(option = "D") +
  labs(
    title = "Depression Trends Over Time by Region",
    x = "Year",
    y = "Total Depression Prevalence (number of people)",
    color = "Region",
    caption = "Source: World Bank Open Data"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "bottom"
  )

# Convert to interactive plot
ggplotly(line_plot, tooltip = c("text", "x", "y"))
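Because rows with missing GDP per capita (or other key variables) were dropped during cleaning, the number of countries contributing to each region's total can change from year to year, so a jump in a summed line may reflect coverage rather than a real change in depression. A sketch that counts contributing countries alongside each total, on hypothetical toy data:

```r
# Hypothetical toy panel: country "B" only has complete data from 2001 onward
toy <- data.frame(
  region     = "R1",
  country    = c("A", "A", "B"),
  year       = c(2000, 2001, 2001),
  prevalence = c(100, 105, 400)
)

# Total prevalence per region-year (what the line plot shows)
totals <- aggregate(prevalence ~ region + year, data = toy, FUN = sum)

# Number of distinct countries behind each total
counts <- aggregate(country ~ region + year, data = toy,
                    FUN = function(x) length(unique(x)))
names(counts)[names(counts) == "country"] <- "n_countries"

coverage <- merge(totals, counts, by = c("region", "year"))
coverage
```

Here the 2001 total rises sharply only because country B enters the data. In the project's dplyr pipeline, the same check would be adding n_countries = n_distinct(country) inside the existing summarise() call.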
Conclusion
Through this project, I explored the connection between GDP per capita and depression prevalence using global data. The visualizations suggested that regions with lower income levels tend to report higher levels of depression, although, because prevalence in this dataset appears to be recorded as a number of people rather than a rate, part of the regional differences likely reflects population size. The interactive scatter plot and line plot made it easy to observe how this relationship varies across regions and over time. One interesting pattern was that some high-GDP regions still reported relatively high depression levels, suggesting that economic wealth alone does not guarantee better mental health outcomes.
I was surprised by how strong the regional differences were, especially in the regression model, where all six region coefficients were statistically significant. By contrast, GDP per capita was not a significant predictor once region was included (p = 0.233), and the model explained only about 10% of the variation in prevalence (R² = 0.104). If I had more time, I would have liked to explore additional variables such as healthcare spending or social support indicators. Despite that, the project helped me better understand how economic indicators can be used to study mental health on a global scale.