Project 2

Author

Zijin Wang

In this project, I explore the relationship between depression and income levels across countries. The dataset, ‘depression_income.csv’, contains yearly country-level data on depression prevalence, GDP per capita, population, birth rate, and neonatal mortality rate, together with each country's region and income classification. The topic matters to me personally because it examines how economic factors can affect mental health, a crucial aspect of public health. The data come from the dataset collection provided for our course, and before the analysis I imputed missing GDP per capita values. The aim is to look for patterns that show whether economic well-being is associated with mental health on a global scale.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.3     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
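# Read the CSV from the current working directory (for example the project folder)
# rather than calling setwd() with a path that exists only on one computer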
depression_data <- read_csv("depression_income.csv", show_col_types = FALSE)
# View the first few rows of the dataset
head(depression_data)
# A tibble: 6 × 11
  country     iso3c  year prevalence iso2c gdp_percap population birth_rate
  <chr>       <chr> <dbl>      <dbl> <chr>      <dbl>      <dbl>      <dbl>
1 Afghanistan AFG    1990    318436. AF            NA   12067570       49.0
2 Afghanistan AFG    1991    329045. AF            NA   12789374       48.9
3 Afghanistan AFG    1992    382545. AF            NA   13745630       48.8
4 Afghanistan AFG    1993    440382. AF            NA   14824371       48.8
5 Afghanistan AFG    1994    456917. AF            NA   15869967       48.9
6 Afghanistan AFG    1995    471475. AF            NA   16772522       49.0
# ℹ 3 more variables: neonat_mortal_rate <dbl>, region <chr>, income <chr>
# Data Cleaning
# gdp_percap has missing values (see the NA entries above); replace them with the
# mean of all observed values, pooled across every country and year
depression_data <- depression_data %>%
  mutate(gdp_percap = replace_na(gdp_percap, mean(gdp_percap, na.rm = TRUE)))
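The overall mean used above is pooled across every country and year, which is why Afghanistan, a "Low income" country, receives a GDP per capita of about 12,649.5 in the glimpse() output below. A minimal sketch of an alternative, assuming that a country's own recorded years are a better guide than the global mean, imputes within each country instead. The toy tibble is hypothetical and stands in for the real file; a country with no recorded GDP at all is left missing rather than filled.

```r
library(tidyverse)

# Hypothetical toy panel standing in for depression_income.csv
toy <- tibble(
  country    = c("A", "A", "A", "B", "B", "B", "C", "C"),
  year       = c(1990, 1991, 1992, 1990, 1991, 1992, 1990, 1991),
  gdp_percap = c(NA, 500, 700, 40000, NA, 44000, NA, NA)
)

# Fill each gap with the mean of that country's own observed years.
# Country C has no observed GDP, so its mean is NaN and its rows stay missing
# instead of borrowing a value from unrelated countries.
toy_imputed <- toy %>%
  group_by(country) %>%
  mutate(gdp_percap = coalesce(gdp_percap, mean(gdp_percap, na.rm = TRUE))) %>%
  ungroup()

toy_imputed
```

Grouping before imputing keeps each filled value on the scale of its own country; interpolating between neighbouring years would be another reasonable choice for a yearly panel.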

# Exploratory Data Analysis
# Checking the structure and summary of the dataset
glimpse(depression_data)
Rows: 6,468
Columns: 11
$ country            <chr> "Afghanistan", "Afghanistan", "Afghanistan", "Afgha…
$ iso3c              <chr> "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "AFG", "A…
$ year               <dbl> 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 199…
$ prevalence         <dbl> 318435.8, 329044.8, 382544.6, 440381.5, 456916.6, 4…
$ iso2c              <chr> "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF", "AF…
$ gdp_percap         <dbl> 12649.5475, 12649.5475, 12649.5475, 12649.5475, 126…
$ population         <dbl> 12067570, 12789374, 13745630, 14824371, 15869967, 1…
$ birth_rate         <dbl> 49.029, 48.896, 48.834, 48.839, 48.898, 48.978, 49.…
$ neonat_mortal_rate <dbl> 52.8, 51.9, 50.9, 49.9, 49.1, 48.2, 47.5, 47.0, 46.…
$ region             <chr> "South Asia", "South Asia", "South Asia", "South As…
$ income             <chr> "Low income", "Low income", "Low income", "Low inco…
summary(depression_data)
   country             iso3c                year        prevalence       
 Length:6468        Length:6468        Min.   :1990   Min.   :      931  
 Class :character   Class :character   1st Qu.:1997   1st Qu.:    73904  
 Mode  :character   Mode  :character   Median :2004   Median :   277645  
                                       Mean   :2004   Mean   :  4407362  
                                       3rd Qu.:2010   3rd Qu.:  1313348  
                                       Max.   :2017   Max.   :264455593  
                                                                         
    iso2c             gdp_percap         population          birth_rate   
 Length:6468        Min.   :   239.7   Min.   :4.397e+04   Min.   : 7.60  
 Class :character   1st Qu.:  4784.7   1st Qu.:1.983e+06   1st Qu.:13.53  
 Mode  :character   Median : 12649.5   Median :7.004e+06   Median :22.07  
                    Mean   : 12649.5   Mean   :3.383e+07   Mean   :24.51  
                    3rd Qu.: 12649.5   3rd Qu.:1.972e+07   3rd Qu.:34.55  
                    Max.   :141442.2   Max.   :1.364e+09   Max.   :55.12  
                                       NA's   :2224        NA's   :2311   
 neonat_mortal_rate    region             income         
 Min.   : 1.00      Length:6468        Length:6468       
 1st Qu.: 6.40      Class :character   Class :character  
 Median :15.30      Mode  :character   Mode  :character  
 Mean   :19.65                                           
 3rd Qu.:30.00                                           
 Max.   :73.10                                           
 NA's   :2368                                            

Filtering and summarizing data

Filter data for the most recent year available (2017, the maximum year in the summary above)

depression_data_latest <- depression_data %>%
  filter(year == max(year))
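The head() output suggests that prevalence is a count of people rather than a rate: for Afghanistan in 1990 it is 318,436 against a population of 12,067,570, about 2.6%. A minimal sketch of a summarizing step, assuming that reading, divides prevalence by population and summarises the share by income group. The function name summarise_prevalence_by_income() and the toy rows are hypothetical; rows with missing prevalence, population, or income are dropped because the share cannot be computed for them.

```r
library(tidyverse)

# Hypothetical helper: prevalence as a share of population, summarised by income group
summarise_prevalence_by_income <- function(df) {
  df %>%
    filter(!is.na(prevalence), !is.na(population), population > 0, !is.na(income)) %>%
    mutate(prev_share = prevalence / population) %>%
    group_by(income) %>%
    summarise(
      n_countries       = n_distinct(country),
      median_prev_share = median(prev_share),
      .groups = "drop"
    )
}

# Toy rows standing in for one year of the real data
toy_year <- tibble(
  country    = c("A", "B", "C", "D"),
  income     = c("Low income", "Low income", "High income", "High income"),
  prevalence = c(300, 500, 4000, 6000),
  population = c(10000, 10000, 100000, 100000)
)

summarise_prevalence_by_income(toy_year)
```

On the real data the function would be applied to the single-year subset from the filtering step, so that each country contributes one value to its group.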

Visualization with ggplot

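# Boxplots of GDP per capita (all years pooled) by region, filled by income group
ggplot(depression_data, aes(x = region, y = gdp_percap, fill = factor(income))) +
  # Hide outlier points so the boxes stay readable; the rows remain in the data
  geom_boxplot(outlier.shape = NA) +
  # Named colours; income labels not listed here are drawn in na.value (grey by default)
  scale_fill_manual(values = c("High income" = "#1f77b4", "Low income" = "#ff7f0e",
                               "Lower middle income" = "#2ca02c", "Upper middle income" = "#d62728",
                               "Other" = "#7f7f7f")) +
  labs(title = "GDP per Capita by Region and Income Level",
       x = "Region",
       y = "GDP per Capita (log scale)",
       fill = "Income Level",
       caption = "Data source: course dataset (depression_income.csv)") +
  theme_classic() +
  # Angle the region labels so they do not overlap
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom") +
  # Log10 axis; the boxplot statistics are computed on the transformed values
  scale_y_log10(labels = scales::label_comma())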

Mental health is a crucial component of public well-being, and its relationship with economic status is an increasingly important topic in public health research. The dataset I chose, ‘depression_income.csv’, contains depression prevalence, GDP per capita, population, birth rate, and neonatal mortality rate for many countries and years. The variables fall into two categories: quantitative variables, such as GDP per capita and depression prevalence, which are numerical and measurable; and categorical variables, such as country names, regions, and income classifications. The data come from the dataset collection provided for our course.

During the initial phase of analysis, I replaced missing GDP per capita values with the mean of the observed values. This keeps every row in the dataset, but it is not free of bias: the mean is pooled across all countries and years, so low-income countries such as Afghanistan receive a GDP per capita of about 12,650, and in the summary output the median and third quartile of gdp_percap both equal that mean, a sign that a large share of rows carry the imputed value. Outliers were not removed from the data; in the boxplot their points are hidden (outlier.shape = NA) so that the boxes stay readable.

I was drawn to this dataset out of a personal and academic interest in the socioeconomic determinants of health. The question of how financial stability, or its absence, affects mental health conditions such as depression is of scholarly interest and also has significant implications for policy-making and social equity.
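Mean imputation is not a neutral choice. A small worked example with made-up numbers (not from the dataset) shows why: filling gaps with the mean leaves the mean unchanged, but it shrinks the standard deviation and pulls the median toward the mean, the same pattern visible in the gdp_percap row of the summary output.

```r
# Made-up GDP per capita values with two gaps (not from the dataset)
gdp <- c(800, 2500, 9000, 45000, NA, NA)

observed <- gdp[!is.na(gdp)]
imputed  <- ifelse(is.na(gdp), mean(gdp, na.rm = TRUE), gdp)

# The mean is unchanged: both equal 14325
c(mean(observed), mean(imputed))

# The standard deviation shrinks: the filled values add no spread,
# while the sample size grows from 4 to 6
c(sd(observed), sd(imputed))

# The median moves toward the mean: 5750 before, 11662.5 after
c(median(observed), median(imputed))
```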

A review of the literature provides a foundation for the question. Studies have consistently shown a link between lower economic status and higher rates of depression, suggesting that poverty and financial stress may increase the risk of developing mental health problems. Conversely, people with higher incomes often have better access to mental health services and preventive care, which may contribute to lower rates of depression. This pattern reflects a broader theme in public health: mental health care calls for a holistic approach that takes socioeconomic factors into account.

The visualization shows the distribution of GDP per capita across regions, split by income level. The logarithmic y-axis compresses the long right tail of GDP per capita, so that countries with a few hundred dollars per person and those with more than 100,000 can be compared on one plot; equal vertical distances correspond to equal ratios rather than equal differences. The boxplot suggests that the ‘High income: OECD’ group has a narrower interquartile range and a higher median GDP per capita than other high-income countries. This indicates that average income levels vary less across OECD countries; because GDP per capita is a national average, the figure says nothing about how wealth is distributed within each country.

The visualization has several limitations. A single economic indicator cannot capture the full complexity of the data: within regions such as Sub-Saharan Africa or Latin America and the Caribbean, local conditions vary widely. Missing GDP values were filled with one pooled mean, so the boxes for groups with many missing values are pulled toward about 12,650. The figure also shows GDP per capita only, not depression prevalence, so on its own it cannot show how the two are related. Finally, a longitudinal analysis of how economic indicators and depression prevalence have changed over time would deepen the picture, but it was outside the scope of this project.
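The boxplot reading for the ‘High income: OECD’ group can also be checked numerically. A minimal sketch, with a hypothetical helper gdp_spread_by_income() and toy data, computes each income group's median GDP per capita and its interquartile range on the log10 scale, the scale the plot uses. On that scale the IQR behaves like a ratio: an IQR of 0.3 means the upper quartile is about twice the lower quartile, since 10^0.3 ≈ 2. On the real data it would be most informative before mean imputation, because the imputed rows all sit at the same value.

```r
library(tidyverse)

# Hypothetical helper: median GDP and IQR of log10 GDP per capita by income group
gdp_spread_by_income <- function(df) {
  df %>%
    filter(!is.na(gdp_percap), gdp_percap > 0) %>%
    group_by(income) %>%
    summarise(
      median_gdp = median(gdp_percap),
      iqr_log10  = IQR(log10(gdp_percap)),
      .groups = "drop"
    ) %>%
    arrange(iqr_log10)  # narrowest spread first
}

# Toy groups: X is tightly clustered, Y spans more than an order of magnitude
toy <- tibble(
  income     = rep(c("Group X", "Group Y"), each = 4),
  gdp_percap = c(30000, 35000, 40000, 45000, 1000, 5000, 20000, 60000)
)

gdp_spread_by_income(toy)
```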

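In conclusion, this exploration has revealed large economic disparities across regions and income groups which, together with the literature, point to a link between economic status and mental health. No correlation or significance test was computed here, so that link remains to be confirmed with a formal test, for example a correlation between GDP per capita and depression prevalence. The findings support the view that mental health advocacy should be informed by an understanding of economic disparities, and they prompt further investigation into how policy interventions can address these intertwined challenges.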
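A natural next step is to test the relationship between GDP per capita and depression directly. A minimal sketch, assuming prevalence is a count that should be divided by population, uses Spearman's rank correlation via cor.test(); the rank-based method is a choice made here because GDP per capita is strongly right-skewed. The helper gdp_prevalence_cor() and the toy values are hypothetical and chosen only so the result is easy to check. On the real data, mean-imputed GDP values should be excluded, and pooling all country-years repeats each country many times, so a single year is a safer starting point for the p-value. A correlation alone would not establish cause.

```r
library(tidyverse)

# Hypothetical helper: rank correlation between GDP per capita and prevalence share
gdp_prevalence_cor <- function(df) {
  d <- df %>%
    filter(!is.na(gdp_percap), !is.na(prevalence), !is.na(population), population > 0) %>%
    mutate(prev_share = prevalence / population)
  cor.test(d$gdp_percap, d$prev_share, method = "spearman", exact = FALSE)
}

# Toy values (not real data): the share mostly falls as GDP rises
toy <- tibble(
  gdp_percap = c(500, 1500, 4000, 12000, 30000, 55000),
  prevalence = c(480, 430, 440, 350, 300, 260),
  population = rep(10000, 6)
)

result <- gdp_prevalence_cor(toy)
result$estimate  # Spearman's rho, -33/35 (about -0.943) for these toy values
result$p.value
```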