This project explores the relationship between income levels and economic depression. I chose the topic because the word “depression” resonates with me. The word usually comes up in the context of mental health, but I was curious how it applies to income levels and what economic depression means on a global scale. I wanted to find out whether there is a significant link between economic downturns and a country’s income level.
The dataset comes from World Bank Open Data. It has 11 columns; the ones used here are country, year, region, income level, prevalence of economic depression, and GDP per capita. The variables are a mix of categorical (country, region, and income level) and quantitative (year, prevalence, and GDP per capita) data. The country variable identifies the country each row belongs to, and year gives the year of the observation. The region variable groups countries into broader geographical areas. The income variable classifies countries by income level, such as low, middle, or high income. The prevalence variable measures how widespread depression is in each country and year; the values in the summary below run from 1,107 to about 54.9 million, so they read as counts rather than percentages. GDP per capita measures economic output per person.
To protect the dataset’s integrity, I removed every row with a missing (NA) value, which reduced the data from 6,468 rows to 3,793. I then restricted the regression and the scatter plot to the most recent year left in the cleaned data (2014), so that those results reflect recent rather than outdated conditions.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
library(readr)
library(viridis)
Loading required package: viridisLite
# Setting the working directory
setwd("C:/Users/akais/OneDrive/Documents/Project2 Dataset")
# Loading the dataset
depression_data <- read_csv("depression_income.csv")
Rows: 6468 Columns: 11
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): country, iso3c, iso2c, region, income
dbl (6): year, prevalence, gdp_percap, population, birth_rate, neonat_mortal...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# View the first rows of the data
head(depression_data)
# A tibble: 6 × 11
  country     iso3c  year prevalence iso2c gdp_percap population birth_rate
  <chr>       <chr> <dbl>      <dbl> <chr>      <dbl>      <dbl>      <dbl>
1 Afghanistan AFG    1990    318436. AF            NA   12067570       49.0
2 Afghanistan AFG    1991    329045. AF            NA   12789374       48.9
3 Afghanistan AFG    1992    382545. AF            NA   13745630       48.8
4 Afghanistan AFG    1993    440382. AF            NA   14824371       48.8
5 Afghanistan AFG    1994    456917. AF            NA   15869967       48.9
6 Afghanistan AFG    1995    471475. AF            NA   16772522       49.0
# ℹ 3 more variables: neonat_mortal_rate <dbl>, region <chr>, income <chr>
# Checking for missing values: count of NAs in each column
colSums(is.na(depression_data))
           country              iso3c               year         prevalence
                 0                980                  0                  0
             iso2c         gdp_percap         population         birth_rate
              2243               2567               2224               2311
neonat_mortal_rate             region             income
              2368               2218               2218
# Removing rows with a missing value in any column
depression_data <- depression_data %>%
  drop_na()
# View summary statistics
summary(depression_data)
   country             iso3c                year        prevalence
 Length:3793        Length:3793        Min.   :1990   Min.   :    1107
 Class :character   Class :character   1st Qu.:1996   1st Qu.:   75775
 Mode  :character   Mode  :character   Median :2002   Median :  230360
                                       Mean   :2002   Mean   : 1237921
                                       3rd Qu.:2008   3rd Qu.:  632828
                                       Max.   :2014   Max.   :54949281
    iso2c             gdp_percap         population          birth_rate
 Length:3793        Min.   :   239.7   Min.   :5.140e+04   Min.   : 7.60
 Class :character   1st Qu.:  2092.8   1st Qu.:2.662e+06   1st Qu.:13.41
 Mode  :character   Median :  6376.3   Median :7.934e+06   Median :22.24
                    Mean   : 12480.9   Mean   :3.703e+07   Mean   :24.46
                    3rd Qu.: 17029.1   3rd Qu.:2.139e+07   3rd Qu.:34.51
                    Max.   :141442.2   Max.   :1.364e+09   Max.   :55.12
 neonat_mortal_rate      region             income
 Min.   : 1.00        Length:3793        Length:3793
 1st Qu.: 6.00        Class :character   Class :character
 Median :14.90        Mode  :character   Mode  :character
 Mean   :19.28
 3rd Qu.:29.50
 Max.   :73.10
# Filtering for the most recent year for analysis
recent_data <- depression_data %>%
  filter(year == max(year))
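The cleaning step above calls drop_na() with no arguments, which discards a row if any of its columns is missing, including columns the later model never uses. As a minimal sketch of the alternative, assuming only prevalence, gdp_percap, and region matter for the analysis, the check can be limited to named columns. The toy data frame below is hypothetical and only stands in for depression_income.csv.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data; column names mirror those in the report's dataset
toy <- data.frame(
  country    = c("A", "B", "C", "D"),
  year       = c(2013, 2014, 2014, 2014),
  prevalence = c(1000, 2000, 3000, NA),
  gdp_percap = c(500, NA, 1500, 2500),
  region     = c("R1", "R1", "R2", "R2"),
  iso2c      = c("AA", "BB", NA, "DD")  # not used by the model
)

# No arguments: a row is dropped if ANY column is NA
all_cols <- toy %>% drop_na()

# Named columns: only NAs in these columns cause a row to be dropped
model_cols <- toy %>% drop_na(prevalence, gdp_percap, region)

nrow(all_cols)    # rows that are complete in every column
nrow(model_cols)  # rows that are complete in the model's columns
```

In this toy case country C survives the second call because its only missing value is in iso2c. Whether keeping such rows is appropriate for the real data is a judgment call that depends on which columns the analysis relies on.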
Background Research
Economic depression refers to a prolonged period of economic downturn characterized by a significant decrease in economic activity, rising unemployment, and declining GDP. While often associated with periods of financial crises or recessions, it can also be seen in countries with slower or stagnating growth. According to a report by the International Monetary Fund (IMF), economic depression can lead to income inequality and social instability, as well as increased rates of poverty and reduced access to essential services like healthcare and education.
Economic depression also affects countries differently depending on their income level. Low-income countries are often more vulnerable to its effects because they have limited financial resources and weaker economic infrastructure. High-income countries, on the other hand, may be more resilient thanks to stronger economies and financial systems. According to the World Bank, countries experiencing economic
References:
International Monetary Fund. “Recession: When Bad Times Prevail.” www.imf.org
World Bank. www.worldbank.org
# Creating the linear model
linear_model <- lm(prevalence ~ region + gdp_percap, data = recent_data)
# Checking the linear model summary
summary(linear_model)
Call:
lm(formula = prevalence ~ region + gdp_percap, data = recent_data)
Residuals:
Min 1Q Median 3Q Max
-6979345 -607120 -380528 -72896 50910757
Coefficients:
                                   Estimate Std. Error t value Pr(>|t|)
(Intercept)                       4.102e+06  1.351e+06   3.035  0.00285 **
regionEurope & Central Asia      -3.291e+06  1.559e+06  -2.110  0.03654 *
regionLatin America & Caribbean  -3.418e+06  1.694e+06  -2.017  0.04548 *
regionMiddle East & North Africa -3.406e+06  1.983e+06  -1.718  0.08799 .
regionNorth America               4.333e+06  4.216e+06   1.028  0.30579
regionSouth Asia                  2.942e+06  2.359e+06   1.247  0.21438
regionSub-Saharan Africa         -3.491e+06  1.573e+06  -2.220  0.02796 *
gdp_percap                       -4.772e+00  2.612e+01  -0.183  0.85532
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 5555000 on 146 degrees of freedom
Multiple R-squared: 0.1093, Adjusted R-squared: 0.06656
F-statistic: 2.559 on 7 and 146 DF, p-value: 0.01628
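# Diagnostic plots for the linear model
# autoplot() for lm objects comes from ggfortify (assumed to be installed; its library() call is not shown earlier)
library(ggfortify)
autoplot(linear_model, which = 1:4, nrow = 2, ncol = 2)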
Linear Regression Model
I created a linear regression model to assess the impact of region and GDP per capita on the prevalence of economic depression.
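Prevalence = 4101771 − 3290763 × (Europe & Central Asia) − 3417591 × (Latin America & Caribbean) − 3406000 × (Middle East & North Africa) + 4333000 × (North America) + 2942000 × (South Asia) − 3491000 × (Sub-Saharan Africa) − 4.772 × GDP per capita
Each region term is an indicator equal to 1 for countries in that region and 0 otherwise. East Asia & Pacific is the reference region, so its countries get only the intercept and the GDP term. The last four region coefficients are rounded as in the summary output.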
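To make the fitted equation concrete, here is a small worked prediction. It is a sketch only: the intercept and the Europe & Central Asia coefficient are the values quoted in this report, the GDP coefficient is taken from the coefficient table, and the GDP per capita of 20,000 is a hypothetical input, not a value from the dataset.

```r
# Coefficients copied from this report's model output
intercept      <- 4101771   # baseline region: East Asia & Pacific
b_europe_casia <- -3290763  # Europe & Central Asia indicator
b_gdp_percap   <- -4.772    # change in prevalence per unit of GDP per capita

# Hypothetical input, not taken from the dataset
gdp_example <- 20000

# Predicted prevalence for a Europe & Central Asia country (indicator = 1)
predicted <- intercept + b_europe_casia * 1 + b_gdp_percap * gdp_example
predicted
```

The result is about 715,568. Almost all of it comes from the intercept and the region term; the GDP term contributes only about −95,440, which shows how small the GDP effect is next to the regional differences.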
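The estimated coefficient on GDP per capita is negative (about −4.77), which would mean that higher GDP per capita goes with lower depression prevalence once region is accounted for. However, the estimate is far from statistically significant (p = 0.855), so this model does not provide evidence of such a relationship. The region terms carry most of the signal: Europe & Central Asia, Latin America & Caribbean, and Sub-Saharan Africa show significantly lower prevalence than the East Asia & Pacific baseline at the 5% level. Overall, the model explains little of the variation in prevalence (adjusted R-squared of 0.067).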
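One possible reason the GDP effect is hard to detect is that the prevalence values look like head counts, so large countries dominate regardless of income. This reading of the column is an assumption based on the summary statistics, not something the source states. Under that assumption, a possible refinement is to rescale prevalence by population before fitting the same model form. The toy data below is hypothetical; only the column names follow the report's dataset.

```r
# Hypothetical toy data; column names mirror those in the report's dataset
toy <- data.frame(
  country    = c("A", "B", "C", "D", "E", "F"),
  region     = c("R1", "R1", "R1", "R2", "R2", "R2"),
  prevalence = c(40000, 90000, 30000, 500000, 45000, 120000),
  population = c(1e6, 2e6, 1e6, 1e7, 1e6, 3e6),
  gdp_percap = c(1000, 5000, 20000, 2000, 8000, 30000)
)

# Share of the population affected, in percent
# (assumes prevalence is a head count, which the source does not confirm)
toy$prevalence_pct <- toy$prevalence / toy$population * 100

# Same model form as in the report, with the rescaled outcome
rate_model <- lm(prevalence_pct ~ region + gdp_percap, data = toy)
coef(rate_model)
```

With the outcome expressed as a share, the coefficients would read as percentage-point differences rather than differences in numbers of people, which makes countries of different sizes comparable.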
Prevalence vs. GDP per capita by Region
I created a scatter plot to examine the average prevalence of depression across regions in relation to GDP per capita:
# Calculate average prevalence and average GDP per capita by region
avg_prevalence_region <- recent_data %>%
  group_by(region) %>%
  summarise(
    avg_prevalence = mean(prevalence, na.rm = TRUE),  # Average prevalence
    avg_gdp_percap = mean(gdp_percap, na.rm = TRUE)   # Average GDP per capita
  )

# Creating a scatter plot of avg prevalence vs avg GDP per capita
ggplot(avg_prevalence_region, aes(x = avg_gdp_percap, y = avg_prevalence, color = region)) +
  geom_point(size = 4) +                                   # Scatter plot
  geom_smooth(method = "lm", se = FALSE, col = "black") +  # Adding regression line
  labs(title = "Avg Prevalence vs. Avg GDP per Capita by Region",
       x = "Avg GDP per Capita",
       y = "Avg Prevalence") +
  theme_minimal() +
  theme(legend.position = "right")                         # Adjusting the legend position
`geom_smooth()` using formula = 'y ~ x'
Prevalence by Region Over Time
I created an interactive stacked area plot to visualize how the prevalence of depression has changed over time across different regions:
# Summarizing prevalence by region over time
region_prevalence <- depression_data %>%
  group_by(region, year) %>%
  summarise(total_prevalence = sum(prevalence, na.rm = TRUE), .groups = "drop")

# Creating the stacked area plot
interactive_area_plot <- ggplot(region_prevalence, aes(x = year, y = total_prevalence, fill = region)) +
  geom_area(color = "white", alpha = 0.7) +   # Semi-transparent areas with white borders
  scale_fill_viridis_d(option = "cividis") +  # Non-default palette
  labs(
    title = "Prevalence by Region Over Time",
    x = "Year",
    y = "Total Depression Prevalence",        # Summed values, not percentages
    fill = "Region",
    caption = "Source: World Bank Open Database"
  ) +
  theme_classic(base_size = 14) +             # Using a different theme for a better look
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 16),  # Centered and bold title
    axis.title = element_text(size = 12),
    legend.position = "bottom",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )

# Converting to an interactive plot using plotly
interactive_area_plotly <- ggplotly(interactive_area_plot, tooltip = c("x", "y", "fill"))

# Display the interactive plot
interactive_area_plotly
Conclusion
This analysis explored the relationship between economic depression and income levels globally. In the regression, the coefficient on GDP per capita is negative, which points toward lower GDP per capita going with higher depression prevalence, but the estimate is not statistically significant (p = 0.855) and the model explains little of the variation (adjusted R-squared of 0.067), so the data do not confirm that link. The direction is at least consistent with the idea that poorer economies face greater challenges in addressing the mental health needs of their populations. What surprised me was that the Middle East & North Africa region had the lowest depression prevalence, while East Asia & Pacific had the highest. I did not expect this, given the diverse economic conditions in these regions. The visualizations showed clearly how prevalence varies by region and with GDP per capita. I also considered adding the birth rate to see whether it correlates with depression prevalence, which could be an interesting avenue for future work. Overall, the findings provide a basis for further research into economic policies and mental health interventions.