This report explores the relationship between economic standing (measured by household income quintile), education level, and age, using World Bank survey microdata for the Dominican Republic. The analysis focuses on two key questions:
# Loading the necessary libraries (tidyverse also attaches readr, dplyr, tidyr, and ggplot2).
library(tidyverse)
The data for this analysis is loaded from the provided World Bank dataset.
# Loading the World Bank microdata for the Dominican Republic.
data <- read_csv("micro_dom_varlabel.csv")
# Displaying the first few rows of the dataset.
head(data)
## # A tibble: 6 × 103
## Economy `Economy Code` Gallup World Poll id…¹ Weight `Respondent is female`
## <chr> <chr> <dbl> <dbl> <chr>
## 1 Dominican… DOM 124778896 0.323 Female
## 2 Dominican… DOM 205085400 0.371 Male
## 3 Dominican… DOM 172115461 0.320 Female
## 4 Dominican… DOM 160018113 1.03 Female
## 5 Dominican… DOM 141033751 1.44 Female
## 6 Dominican… DOM 143983792 0.495 Female
## # ℹ abbreviated name: ¹`Gallup World Poll identifier`
## # ℹ 98 more variables: `Respondent age` <dbl>,
## # `Respondent education level` <chr>,
## # `Within-economy household income quintile` <chr>,
## # `Respondent is in the workforce` <chr>, `Has a debit card` <chr>,
## # `If has debit card: card in own name` <chr>,
## # `If has debit card: used card in past 12 months` <chr>, …
We rename three key columns to shorter names that are easier to work with.
# Renaming key columns for clarity.
clean_data <- data %>%
  rename(age = `Respondent age`,
         education = `Respondent education level`,
         income_quintile = `Within-economy household income quintile`)
# Displaying the renamed data.
head(clean_data)
## # A tibble: 6 × 103
## Economy `Economy Code` Gallup World Poll id…¹ Weight `Respondent is female`
## <chr> <chr> <dbl> <dbl> <chr>
## 1 Dominican… DOM 124778896 0.323 Female
## 2 Dominican… DOM 205085400 0.371 Male
## 3 Dominican… DOM 172115461 0.320 Female
## 4 Dominican… DOM 160018113 1.03 Female
## 5 Dominican… DOM 141033751 1.44 Female
## 6 Dominican… DOM 143983792 0.495 Female
## # ℹ abbreviated name: ¹`Gallup World Poll identifier`
## # ℹ 98 more variables: age <dbl>, education <chr>, income_quintile <chr>,
## # `Respondent is in the workforce` <chr>, `Has a debit card` <chr>,
## # `If has debit card: card in own name` <chr>,
## # `If has debit card: used card in past 12 months` <chr>,
## # `Used mobile phone or internet to access FI account` <chr>,
## # `Used mobile phone or internet to check account balance` <chr>, …
Next, we remove rows where the education level is coded "(dk)" (don't know) or "(rf)" (refused), as well as rows with missing values (NA) in the three key columns: age, education, and income quintile.
# Filtering out invalid education values and removing rows with NA in key columns.
clean_data <- clean_data %>%
  filter(!(education %in% c("(dk)", "(rf)"))) %>%
  drop_na(age, education, income_quintile)
# Viewing the cleaned data.
head(clean_data)
## # A tibble: 6 × 103
## Economy `Economy Code` Gallup World Poll id…¹ Weight `Respondent is female`
## <chr> <chr> <dbl> <dbl> <chr>
## 1 Dominican… DOM 124778896 0.323 Female
## 2 Dominican… DOM 205085400 0.371 Male
## 3 Dominican… DOM 172115461 0.320 Female
## 4 Dominican… DOM 160018113 1.03 Female
## 5 Dominican… DOM 141033751 1.44 Female
## 6 Dominican… DOM 143983792 0.495 Female
## # ℹ abbreviated name: ¹`Gallup World Poll identifier`
## # ℹ 98 more variables: age <dbl>, education <chr>, income_quintile <chr>,
## # `Respondent is in the workforce` <chr>, `Has a debit card` <chr>,
## # `If has debit card: card in own name` <chr>,
## # `If has debit card: used card in past 12 months` <chr>,
## # `Used mobile phone or internet to access FI account` <chr>,
## # `Used mobile phone or internet to check account balance` <chr>, …
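The first `filter()` step on its own does not remove missing education values: `%in%` never returns `NA`, so a missing value is simply "not in the set", `!(education %in% ...)` evaluates to `TRUE`, and the row is kept. It is `drop_na()` that removes it. This base-R sketch illustrates the two steps on a small hypothetical vector (the values are made up for illustration, not taken from the dataset).

```r
# Hypothetical education values: two survey codes and one missing value
education <- c("secondary", "(dk)", NA, "(rf)", "completed primary or less")

# %in% returns FALSE (never NA) for the missing value
in_codes <- education %in% c("(dk)", "(rf)")
print(in_codes)
# FALSE TRUE FALSE TRUE FALSE

# Step 1: drop the "(dk)" and "(rf)" codes; the NA entry survives this step
after_filter <- education[!in_codes]
print(after_filter)

# Step 2: drop missing values, which is what drop_na() does for this column
after_drop_na <- after_filter[!is.na(after_filter)]
print(after_drop_na)
```

Both steps are therefore needed: the filter handles the explicit survey codes, and `drop_na()` handles true missing values.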
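To summarize economic standing on a single scale, we code the income quintiles from 1 (poorest 20%) to 5 (richest 20%) and calculate the mean code for each combination of age group and education level. This treats the ordinal quintiles as equally spaced, so a mean of about 3 corresponds roughly to the middle quintile.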
# Converting income_quintile to a factor for proper ordering.
clean_data <- clean_data %>%
  mutate(income_quintile = factor(income_quintile,
                                  levels = c("Poorest 20%", "Second 20%", "Middle 20%",
                                             "Fourth 20%", "Richest 20%")))
# Converting income_quintile to numeric for mean calculation.
clean_data <- clean_data %>%
  mutate(income_numeric = as.numeric(income_quintile))
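`factor()` silently turns any label that is not listed in `levels` into `NA`, and the later `mean(..., na.rm = TRUE)` would drop those rows without any warning. A quick check, sketched here in base R on a hypothetical vector (the label "Riches 20%" is an invented typo used only for illustration), lists labels that failed to match and shows the 1 to 5 codes that `as.numeric()` produces.

```r
quintile_levels <- c("Poorest 20%", "Second 20%", "Middle 20%",
                     "Fourth 20%", "Richest 20%")

# Hypothetical raw labels; "Riches 20%" stands in for an unexpected label
raw <- c("Poorest 20%", "Middle 20%", "Richest 20%", "Riches 20%")

quintile <- factor(raw, levels = quintile_levels)

# Labels that were present in the data but became NA after factor()
unmatched <- unique(raw[!is.na(raw) & is.na(quintile)])
print(unmatched)

# as.numeric() on a factor returns the level position: 1 = poorest, 5 = richest
codes <- as.numeric(quintile)
print(codes)
```

On the real data, an empty `unmatched` result would confirm that no respondents were lost when the quintile labels were converted.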
# Creating age groups based on age, then converting it to a factor to order it correctly.
clean_data <- clean_data %>%
  mutate(age_group = case_when(
    age < 25 ~ "Under 25",
    age >= 25 & age < 45 ~ "25-44",
    age >= 45 & age < 65 ~ "45-64",
    age >= 65 ~ "65+"
  )) %>%
  mutate(age_group = factor(age_group, levels = c("Under 25", "25-44", "45-64", "65+")))
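The `case_when()` conditions define left-closed bins: an age of exactly 25 falls in "25-44", and exactly 45 falls in "45-64". Base R's `cut()` with `right = FALSE` builds the same bins, which offers a compact way to double-check the boundaries. The ages below are hypothetical test values chosen to sit on each boundary.

```r
# Hypothetical test ages placed on and around each bin boundary
ages <- c(18, 24, 25, 44, 45, 64, 65, 90)

age_group <- cut(ages,
                 breaks = c(-Inf, 25, 45, 65, Inf),
                 labels = c("Under 25", "25-44", "45-64", "65+"),
                 right = FALSE)  # intervals are [a, b): the lower bound is included

print(data.frame(age = ages, age_group = age_group))
```

Because `cut()` returns a factor whose levels follow `labels`, it also gives the same ordering as the explicit `factor(..., levels = ...)` call above.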
# Calculating the mean income quintile by age group and education level.
mean_income <- clean_data %>%
  group_by(age_group, education) %>%
  summarize(mean_income_quintile = mean(income_numeric, na.rm = TRUE))
# Viewing the summarized data.
mean_income
## # A tibble: 9 × 3
## # Groups: age_group [4]
## age_group education mean_income_quintile
## <fct> <chr> <dbl>
## 1 Under 25 completed primary or less 2.60
## 2 Under 25 secondary 3.04
## 3 25-44 completed primary or less 2.85
## 4 25-44 secondary 3.35
## 5 45-64 completed primary or less 2.95
## 6 45-64 completed tertiary or more 5
## 7 45-64 secondary 3.91
## 8 65+ completed primary or less 2.94
## 9 65+ secondary 4.46
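Two caveats apply when reading this table. First, it does not show how many respondents fall in each cell, so a value such as exactly 5 for "completed tertiary or more" in the 45-64 group may rest on very few people. Second, the means are unweighted even though the data include a `Weight` column. Assuming `Weight` is a survey sampling weight (the report does not state this), the base-R sketch below computes a count, an unweighted mean, and a weighted mean per cell; the small data frame is hypothetical.

```r
# Hypothetical respondents: age group, income code (1-5), and survey weight
toy <- data.frame(
  age_group      = c("Under 25", "Under 25", "25-44", "25-44", "25-44"),
  income_numeric = c(2, 4, 3, 5, 1),
  Weight         = c(0.5, 1.5, 1.0, 1.0, 2.0)
)

# Split rows by age group, then summarise each cell
cells <- split(toy, toy$age_group)
summary_tbl <- do.call(rbind, lapply(names(cells), function(g) {
  d <- cells[[g]]
  data.frame(
    age_group       = g,
    n               = nrow(d),                                  # respondents in the cell
    mean_unweighted = mean(d$income_numeric),
    mean_weighted   = weighted.mean(d$income_numeric, d$Weight)  # assumes Weight is a sampling weight
  )
}))
print(summary_tbl)
```

In the report's own pipeline, the same idea would amount to adding `n = n()` and `weighted.mean(income_numeric, Weight)` inside `summarize()`.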
We plot the mean income quintile (y-axis) against age group (x-axis), using color to distinguish education levels.
# Scatterplot of mean income quintile by age group, colored by education level.
ggplot(mean_income, aes(x = age_group, y = mean_income_quintile, color = education)) +
  geom_point(size = 4, alpha = 0.8) +
  theme_minimal() +
  labs(title = "Mean Income Quintile by Age and Education Level",
       x = "Age Group",
       y = "Mean Income Quintile",
       color = "Education Level")
From the analysis, we can observe the following key insights: