In this project, I explore the relationship between depression prevalence and income levels across countries. The dataset, ‘depression_income.csv’, contains depression prevalence, GDP per capita, population, birth rate, and neonatal mortality rate for countries from 1990 to 2017. The topic matters to me personally because it concerns how economic factors can affect mental health, a crucial aspect of public health. The data comes from our course's dataset collection, and I cleaned it by handling missing values and outliers before the analysis. The goal is to uncover patterns in how economic well-being relates to mental health on a global scale.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.3 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
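# Load the dataset (file name from the introduction; assumed to be in the working directory)
depression_data <- read_csv("depression_income.csv")
# View the first few rows of the dataset
head(depression_data)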
# A tibble: 6 × 11
  country     iso3c  year prevalence iso2c gdp_percap population birth_rate
  <chr>       <chr> <dbl>      <dbl> <chr>      <dbl>      <dbl>      <dbl>
1 Afghanistan AFG    1990    318436. AF            NA   12067570       49.0
2 Afghanistan AFG    1991    329045. AF            NA   12789374       48.9
3 Afghanistan AFG    1992    382545. AF            NA   13745630       48.8
4 Afghanistan AFG    1993    440382. AF            NA   14824371       48.8
5 Afghanistan AFG    1994    456917. AF            NA   15869967       48.9
6 Afghanistan AFG    1995    471475. AF            NA   16772522       49.0
# ℹ 3 more variables: neonat_mortal_rate <dbl>, region <chr>, income <chr>
# Data Cleaning
# Replace missing gdp_percap values with the mean of the observed values
depression_data <- depression_data %>%
  mutate(gdp_percap = ifelse(is.na(gdp_percap), mean(gdp_percap, na.rm = TRUE), gdp_percap))

# Exploratory Data Analysis
# Checking the structure and summary of the dataset
glimpse(depression_data)
summary(depression_data)
   country             iso3c                year        prevalence
 Length:6468        Length:6468        Min.   :1990   Min.   :      931
 Class :character   Class :character   1st Qu.:1997   1st Qu.:    73904
 Mode  :character   Mode  :character   Median :2004   Median :   277645
                                       Mean   :2004   Mean   :  4407362
                                       3rd Qu.:2010   3rd Qu.:  1313348
                                       Max.   :2017   Max.   :264455593
    iso2c             gdp_percap         population          birth_rate
 Length:6468        Min.   :   239.7   Min.   :4.397e+04   Min.   : 7.60
 Class :character   1st Qu.:  4784.7   1st Qu.:1.983e+06   1st Qu.:13.53
 Mode  :character   Median : 12649.5   Median :7.004e+06   Median :22.07
                    Mean   : 12649.5   Mean   :3.383e+07   Mean   :24.51
                    3rd Qu.: 12649.5   3rd Qu.:1.972e+07   3rd Qu.:34.55
                    Max.   :141442.2   Max.   :1.364e+09   Max.   :55.12
                                       NA's   :2224        NA's   :2311
 neonat_mortal_rate      region             income
 Min.   : 1.00        Length:6468        Length:6468
 1st Qu.: 6.40        Class :character   Class :character
 Median :15.30        Mode  :character   Mode  :character
 Mean   :19.65
 3rd Qu.:30.00
 Max.   :73.10
 NA's   :2368
ggplot(depression_data, aes(x = region, y = gdp_percap, fill = factor(income))) +
  geom_boxplot(outlier.shape = NA) +
  scale_fill_manual(values = c(
    "High income" = "#1f77b4",
    "Low income" = "#ff7f0e",
    "Lower middle income" = "#2ca02c",
    "Upper middle income" = "#d62728",
    "Other" = "#7f7f7f"
  )) +
  labs(
    title = "GDP per Capita by Region and Income Level",
    x = "Region",
    y = "GDP per Capita (log scale)",
    fill = "Income Level",
    caption = "Data source: [Course dataset]"
  ) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "bottom") +
  scale_y_log10(labels = scales::label_comma())
Mental health is a crucial component of public well-being, and its relationship with economic status is a subject of growing importance in public health research. The dataset I chose for this study, ‘depression_income.csv’, contains variables that offer insight into this relationship: depression prevalence, GDP per capita, population, birth rate, and neonatal mortality rate across countries and years. The variables fall into two categories: quantitative, such as GDP per capita and depression prevalence, which are numeric; and categorical, such as country names, regions, and income classifications. The data comes from the dataset collection provided for our course.

I was drawn to this dataset out of a personal and academic interest in the socioeconomic determinants of health. How financial stability, or the lack of it, affects mental health conditions such as depression is of scholarly interest and also carries significant implications for policy-making and social equity.

During the initial phase of the analysis, I cleaned the dataset. Missing values of GDP per capita were replaced with the mean of the observed values so that those rows did not have to be dropped. This choice has a cost that is visible in the summary above: the median, mean, and third quartile of gdp_percap all equal 12649.5, so the imputed values compress the distribution, and GDP comparisons should be read with that in mind. Outliers were also assessed and addressed so that they would not skew the analysis; in the boxplot, outlier points are suppressed with outlier.shape = NA.
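The mean imputation described above can be compared with a group-wise alternative. The following is a minimal sketch on hypothetical toy values (not rows from ‘depression_income.csv’); the column names `gdp_global` and `gdp_group` are made up for illustration, and filling with the median of each income group is only one possible alternative, not the method used in this project.

```r
library(dplyr)

# Hypothetical toy data: six country-years, half of them missing GDP per capita.
toy <- tibble(
  income     = c("Low income", "Low income", "Low income",
                 "High income", "High income", "High income"),
  gdp_percap = c(500, NA, 700, 40000, NA, NA)
)

imputed <- toy %>%
  # Global mean imputation, the approach used in the cleaning step
  mutate(gdp_global = if_else(is.na(gdp_percap),
                              mean(gdp_percap, na.rm = TRUE),
                              gdp_percap)) %>%
  group_by(income) %>%
  # Alternative (assumption): fill with the median of the same income group
  mutate(gdp_group = if_else(is.na(gdp_percap),
                             median(gdp_percap, na.rm = TRUE),
                             gdp_percap)) %>%
  ungroup()

# With many imputed rows, the median of the globally imputed column
# collapses onto the mean of the observed values.
median(imputed$gdp_global)
mean(toy$gdp_percap, na.rm = TRUE)
```

In the toy data the globally imputed median equals the observed mean, the same pattern the summary output shows for gdp_percap, while the group-wise version keeps low-income and high-income values apart.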
A review of the literature provides a foundation for these questions. Studies have consistently linked lower economic status to higher rates of depression, suggesting that poverty and financial stress may raise the risk of developing mental health problems. Conversely, people in higher economic strata often have better access to mental health resources and preventive care, which can lead to lower prevalence of depression. This pattern reflects a broader theme in public health: mental health care needs a holistic approach that takes socioeconomic factors into account.
The visualization shows the distribution of GDP per capita across regions, grouped by income level. The logarithmic y-axis makes it possible to compare groups whose GDP per capita differs by orders of magnitude, and it highlights the large disparities in economic status between regions. The boxplot shows that countries classified as ‘High income: OECD’ have a narrower interquartile range and a higher median GDP per capita than other high-income countries, which suggests a more uniform distribution of wealth among OECD countries. One limitation is that a single plot cannot represent the full complexity of the data. The diversity within regions such as Sub-Saharan Africa or Latin America and the Caribbean is vast, and a single economic indicator may not capture local variation. The analysis could also be deepened with a longitudinal perspective that examines how the economic indicators and depression prevalence have changed over time, which was outside the scope of this project.
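The reading of medians and interquartile ranges from the boxplot can be checked numerically. This is a minimal sketch on hypothetical toy values; the income labels are taken from the text and plot code, the numbers and the name `spread_by_income` are invented for illustration. Because the plot uses a log scale, the sketch also reports the IQR of log10 GDP, which is closer to the box heights seen on the chart.

```r
library(dplyr)

# Hypothetical toy values; on the real data this would be depression_data.
toy <- tibble(
  income     = rep(c("High income: OECD", "Low income"), each = 4),
  gdp_percap = c(38000, 40000, 42000, 44000, 400, 600, 900, 1500)
)

spread_by_income <- toy %>%
  group_by(income) %>%
  summarise(
    median_gdp = median(gdp_percap, na.rm = TRUE),     # centre of each box
    iqr_gdp    = IQR(gdp_percap, na.rm = TRUE),        # box height on the raw scale
    iqr_log10  = IQR(log10(gdp_percap), na.rm = TRUE), # box height on the log scale
    n          = n(),
    .groups    = "drop"
  )

spread_by_income
```

On the real data, it would probably be safer to compute these statistics before imputing gdp_percap, since imputed rows all share one value and shrink the IQR.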
In conclusion, this exploration has revealed patterns in the data that are consistent with a link between economic status and mental health, although that relationship was not tested directly here. The findings support the view that mental health advocacy should be informed by an understanding of economic disparities, and they call for further investigation into how policy interventions can address these intertwined challenges.
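One way to quantify the relationship discussed above is a rank correlation between depression cases per 100,000 people and GDP per capita. The sketch below uses hypothetical toy rows and assumes that `prevalence` is a case count (the summary values suggest this, but the source does not state it); `cases_per_100k` is a made-up helper column. Rows with missing GDP are dropped rather than imputed so that filled-in values do not influence the coefficient.

```r
library(dplyr)

# Hypothetical toy rows, only to make the sketch runnable.
toy <- tibble(
  country    = c("A", "B", "C", "D", "E"),
  prevalence = c(30000, 45000, 120000, 500000, 900000),  # assumed: number of cases
  population = c(1e6, 1.2e6, 3e6, 1e7, 2.5e7),
  gdp_percap = c(800, 2500, 9000, NA, 45000)
)

rates <- toy %>%
  # Keep only rows where both inputs are observed
  filter(!is.na(gdp_percap), !is.na(population)) %>%
  # Convert case counts to a rate so large countries do not dominate
  mutate(cases_per_100k = prevalence / population * 1e5)

# Spearman's rho is rank-based, so the skew in GDP per capita matters less
spearman <- cor(rates$cases_per_100k, rates$gdp_percap, method = "spearman")
spearman
```

A coefficient from pooled country-years would still mix differences between countries with changes over time, so it is best read as descriptive rather than causal.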