Introduction

This report explores how economic standing, measured by each respondent's within-economy household income quintile, relates to education level and age, using respondent-level data from the World Bank dataset for the Dominican Republic. The analysis focuses on two key questions:

  1. How does economic standing vary by education level?
  2. How does economic standing vary by age group?

# Loading the necessary libraries. tidyverse also attaches ggplot2, dplyr, readr, and tidyr, among others.

library(tidyverse)

Loading the Dataset

The data for this analysis is read from the provided World Bank file, micro_dom_varlabel.csv.

# Loading the World Bank economic data for the Dominican Republic.

data <- read_csv("micro_dom_varlabel.csv")

# Displaying the first few rows of the dataset.

head(data)
## # A tibble: 6 × 103
##   Economy    `Economy Code` Gallup World Poll id…¹ Weight `Respondent is female`
##   <chr>      <chr>                           <dbl>  <dbl> <chr>                 
## 1 Dominican… DOM                         124778896  0.323 Female                
## 2 Dominican… DOM                         205085400  0.371 Male                  
## 3 Dominican… DOM                         172115461  0.320 Female                
## 4 Dominican… DOM                         160018113  1.03  Female                
## 5 Dominican… DOM                         141033751  1.44  Female                
## 6 Dominican… DOM                         143983792  0.495 Female                
## # ℹ abbreviated name: ¹​`Gallup World Poll identifier`
## # ℹ 98 more variables: `Respondent age` <dbl>,
## #   `Respondent education level` <chr>,
## #   `Within-economy household income quintile` <chr>,
## #   `Respondent is in the workforce` <chr>, `Has a debit card` <chr>,
## #   `If has debit card: card in own name` <chr>,
## #   `If has debit card: used card in past 12 months` <chr>, …
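`read_csv()` keeps column headers such as `Respondent age` exactly as written, which is why the renaming step below has to wrap them in backticks. The sketch below is a minimal base-R illustration of that behaviour. It uses a hypothetical two-row inline CSV instead of the real file and shows that base `read.csv()` would replace the spaces with dots unless `check.names = FALSE` is set.

```r
# Hypothetical inline CSV standing in for micro_dom_varlabel.csv (not real data).
csv_text <- 'Economy,Respondent age,Respondent education level
Dominican Republic,34,secondary
Dominican Republic,61,completed primary or less'

# Default: read.csv() runs make.names(), so spaces in headers become dots.
mangled <- read.csv(text = csv_text)
print(names(mangled))

# check.names = FALSE keeps the headers verbatim, as readr::read_csv() does.
raw <- read.csv(text = csv_text, check.names = FALSE)
print(names(raw))

# Names containing spaces must be quoted with backticks when used as symbols.
ages <- raw$`Respondent age`
print(ages)
```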

Renaming Columns for Clarity

We rename three key columns to short names that are easier to type in the code that follows.

# Renaming key columns for clarity.

clean_data <- data %>%
  rename(age = `Respondent age`,
         education = `Respondent education level`,
         income_quintile = `Within-economy household income quintile`)

# Displaying the renamed data (no rows have been removed yet).

head(clean_data)
## # A tibble: 6 × 103
##   Economy    `Economy Code` Gallup World Poll id…¹ Weight `Respondent is female`
##   <chr>      <chr>                           <dbl>  <dbl> <chr>                 
## 1 Dominican… DOM                         124778896  0.323 Female                
## 2 Dominican… DOM                         205085400  0.371 Male                  
## 3 Dominican… DOM                         172115461  0.320 Female                
## 4 Dominican… DOM                         160018113  1.03  Female                
## 5 Dominican… DOM                         141033751  1.44  Female                
## 6 Dominican… DOM                         143983792  0.495 Female                
## # ℹ abbreviated name: ¹​`Gallup World Poll identifier`
## # ℹ 98 more variables: age <dbl>, education <chr>, income_quintile <chr>,
## #   `Respondent is in the workforce` <chr>, `Has a debit card` <chr>,
## #   `If has debit card: card in own name` <chr>,
## #   `If has debit card: used card in past 12 months` <chr>,
## #   `Used mobile phone or internet to access FI account` <chr>,
## #   `Used mobile phone or internet to check account balance` <chr>, …
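`rename()` takes `new_name = old_name` pairs and leaves every other column unchanged. For comparison, a minimal base-R sketch of the same step on a hypothetical one-row data frame uses a named lookup vector; the vector `new_names` is an illustrative helper, not part of the original analysis.

```r
# Hypothetical one-row data frame carrying the original survey headers.
df <- data.frame(
  `Respondent age` = 34,
  `Respondent education level` = "secondary",
  `Within-economy household income quintile` = "Middle 20%",
  Weight = 0.5,
  check.names = FALSE
)

# Lookup vector: names are the old headers, values are the new short names.
new_names <- c(
  "Respondent age" = "age",
  "Respondent education level" = "education",
  "Within-economy household income quintile" = "income_quintile"
)

# Rename only the headers found in the lookup; other columns keep their names.
hit <- names(df) %in% names(new_names)
names(df)[hit] <- unname(new_names[names(df)[hit]])
print(names(df))
```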

Filtering Out Invalid Education Values and Removing Missing Data

We remove rows where the education level is coded (dk) (don't know) or (rf) (refused), and rows with missing values (NA) in the key columns age, education, and income_quintile.

# Filtering out invalid education values and removing rows with NA in key columns.

clean_data <- clean_data %>%
  filter(!(education %in% c("(dk)", "(rf)"))) %>%
  drop_na(age, education, income_quintile)

# Viewing the cleaned data.

head(clean_data)
## # A tibble: 6 × 103
##   Economy    `Economy Code` Gallup World Poll id…¹ Weight `Respondent is female`
##   <chr>      <chr>                           <dbl>  <dbl> <chr>                 
## 1 Dominican… DOM                         124778896  0.323 Female                
## 2 Dominican… DOM                         205085400  0.371 Male                  
## 3 Dominican… DOM                         172115461  0.320 Female                
## 4 Dominican… DOM                         160018113  1.03  Female                
## 5 Dominican… DOM                         141033751  1.44  Female                
## 6 Dominican… DOM                         143983792  0.495 Female                
## # ℹ abbreviated name: ¹​`Gallup World Poll identifier`
## # ℹ 98 more variables: age <dbl>, education <chr>, income_quintile <chr>,
## #   `Respondent is in the workforce` <chr>, `Has a debit card` <chr>,
## #   `If has debit card: card in own name` <chr>,
## #   `If has debit card: used card in past 12 months` <chr>,
## #   `Used mobile phone or internet to access FI account` <chr>,
## #   `Used mobile phone or internet to check account balance` <chr>, …
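A minimal base-R sketch of the same filtering rule on a hypothetical five-row data frame follows. It mirrors `filter()` followed by `drop_na()`. Note that `%in%` returns `FALSE` (not `NA`) for a missing education value, so an `NA` row survives the first step and is removed only by the missing-value check.

```r
# Hypothetical respondents (not real survey data).
toy <- data.frame(
  age = c(23, 41, NA, 67, 35),
  education = c("secondary", "(dk)", "secondary", NA, "(rf)"),
  income_quintile = c("Middle 20%", "Poorest 20%", "Richest 20%", "Second 20%", "Fourth 20%")
)

# Step 1: drop the don't-know / refused codes. NA %in% x is FALSE, so NA rows stay.
step1 <- toy[!(toy$education %in% c("(dk)", "(rf)")), ]
print(nrow(step1))

# Step 2: keep only rows complete in the three key columns (like tidyr::drop_na()).
key_cols <- c("age", "education", "income_quintile")
step2 <- step1[complete.cases(step1[key_cols]), ]
print(step2)
```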

Mean Income Quintile by Age Group and Education Level

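We calculate the mean income quintile for each combination of age group and education level as a single summary of economic standing. Each quintile is first coded from 1 (poorest 20%) to 5 (richest 20%). Because these codes are ordinal, the mean treats adjacent quintiles as equally spaced and is best read as a rough indicator of relative position.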
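The pipeline in this section turns each quintile label into a number by converting it to a factor and then calling `as.numeric()`, which returns each value's position among the factor levels (Poorest 20% becomes 1, Richest 20% becomes 5). The minimal sketch below uses hypothetical responses and also shows a pitfall: a label that does not exactly match one of the listed levels silently becomes `NA` and is then skipped by `mean(..., na.rm = TRUE)`. Counting `NA`s after the conversion is a cheap check.

```r
quintile_levels <- c("Poorest 20%", "Second 20%", "Middle 20%", "Fourth 20%", "Richest 20%")

# Hypothetical responses; the last one has a stray space and matches no level.
responses <- c("Richest 20%", "Poorest 20%", "Middle 20%", "Middle 20%", "Richest 20 %")

quintile <- factor(responses, levels = quintile_levels)
codes <- as.numeric(quintile)  # position in quintile_levels: 1 = poorest, 5 = richest
print(codes)

# Unmatched labels become NA; count them before averaging.
print(sum(is.na(codes)))

# Mean of the ordinal codes (treats adjacent quintiles as equally spaced).
print(mean(codes, na.rm = TRUE))
```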

# Converting income_quintile to a factor for proper ordering.

clean_data <- clean_data %>%
  mutate(income_quintile = factor(income_quintile,
                                  levels = c("Poorest 20%", "Second 20%", "Middle 20%",
                                             "Fourth 20%", "Richest 20%")))

# Converting the factor to its level codes (1 = Poorest 20%, 5 = Richest 20%) for the mean calculation.

clean_data <- clean_data %>%
  mutate(income_numeric = as.numeric(income_quintile))

# Creating age groups based on age, then converting it to a factor to order it correctly.

clean_data <- clean_data %>%
  mutate(age_group = case_when(
    age < 25 ~ "Under 25",
    age >= 25 & age < 45 ~ "25-44",
    age >= 45 & age < 65 ~ "45-64",
    age >= 65 ~ "65+"
  )) %>%
  mutate(age_group = factor(age_group, levels = c("Under 25", "25-44", "45-64", "65+")))

# Calculating the mean income quintile by age group and education level.

mean_income <- clean_data %>%
  group_by(age_group, education) %>%
  summarize(mean_income_quintile = mean(income_numeric, na.rm = TRUE))

# Viewing the summarized data.

mean_income
## # A tibble: 9 × 3
## # Groups:   age_group [4]
##   age_group education                  mean_income_quintile
##   <fct>     <chr>                                     <dbl>
## 1 Under 25  completed primary or less                  2.60
## 2 Under 25  secondary                                  3.04
## 3 25-44     completed primary or less                  2.85
## 4 25-44     secondary                                  3.35
## 5 45-64     completed primary or less                  2.95
## 6 45-64     completed tertiary or more                 5   
## 7 45-64     secondary                                  3.91
## 8 65+       completed primary or less                  2.94
## 9 65+       secondary                                  4.46
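The summary table above does not show how many respondents fall in each cell, and its means ignore the survey's `Weight` column. The minimal base-R sketch below uses hypothetical data to show one way of adding both: a count per age group and education cell, and a weighted mean computed with `weighted.mean()`. The toy column `weight` stands in for `Weight`. Whether and how these weights should be applied is an assumption to check against the dataset's documentation. The sketch also uses `cut()` with `right = FALSE` as a compact equivalent of the `case_when()` age bands (Under 25, 25-44, 45-64, 65+).

```r
# Hypothetical respondents: age, education, income code (1-5), and a survey weight.
toy <- data.frame(
  age = c(19, 22, 50, 30, 35, 40),
  education = c("secondary", "secondary", "completed tertiary or more",
                "completed primary or less", "completed primary or less",
                "completed primary or less"),
  income_numeric = c(2, 4, 5, 1, 2, 3),
  weight = c(3, 1, 0.5, 1, 1, 2)
)

# Same bands as the case_when() step: below 25, [25, 45), [45, 65), 65 and over.
age_breaks <- c(-Inf, 25, 45, 65, Inf)
age_labels <- c("Under 25", "25-44", "45-64", "65+")
toy$age_group <- cut(toy$age, breaks = age_breaks, labels = age_labels, right = FALSE)

# Boundary check: ages 25, 45 and 65 fall into the upper band, as in case_when().
boundary <- cut(c(24, 25, 44, 45, 64, 65), breaks = age_breaks,
                labels = age_labels, right = FALSE)
print(boundary)

# One data frame per non-empty age group x education cell.
cells <- split(toy, list(toy$age_group, toy$education), drop = TRUE)

# Cell size, unweighted mean, and weighted mean for each cell.
summary_tbl <- do.call(rbind, lapply(cells, function(d) {
  data.frame(
    age_group = as.character(d$age_group[1]),
    education = d$education[1],
    n = nrow(d),
    mean_unweighted = mean(d$income_numeric),
    mean_weighted = weighted.mean(d$income_numeric, w = d$weight)
  )
}))
rownames(summary_tbl) <- NULL
print(summary_tbl)
```

In the toy secondary cell, the respondent with weight 3 pulls the weighted mean (2.5) below the unweighted mean (3). A single-respondent cell such as the toy tertiary one has n = 1, so its mean is just that respondent's quintile.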

Scatterplot of Mean Income Quintile by Age and Education

We plot the mean income quintile (y-axis) against age group (x-axis), with color indicating education level.

# Scatterplot of mean income quintile by age group, colored by education level.

ggplot(mean_income, aes(x = age_group, y = mean_income_quintile, color = education)) +
  geom_point(size = 4, alpha = 0.8) +
  theme_minimal() +
  labs(title = "Mean Income Quintile by Age and Education Level",
       x = "Age Group",
       y = "Mean Income Quintile",
       color = "Education Level")

Conclusion

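From the summary table, we observe the following patterns. These are unweighted descriptive means with no significance testing, so they describe this sample rather than establish causes:

  1. Tertiary education appears in the cleaned data only in the 45-64 age group, and that cell has the highest possible mean income quintile (5), meaning every respondent in it is in the richest 20%. The table does not report how many respondents the cell contains, so this result may rest on very few observations, and the data cannot show how income varies with age among tertiary-educated respondents.
  2. In every age group, respondents with secondary education have a higher mean income quintile than those with completed primary or less, and the gap widens with age: about 0.4 quintiles for those under 25 (3.04 vs 2.60), 0.5 for ages 25-44, 1.0 for ages 45-64, and 1.5 for those aged 65+ (4.46 vs 2.94).
  3. Mean income quintile rises steadily with age for respondents with secondary education (3.04, 3.35, 3.91, 4.46) but stays roughly flat for those with completed primary or less (2.60 to 2.95). Education therefore appears more closely associated with economic standing than age alone, although this descriptive analysis cannot separate the role of education from other factors.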