Introduction

The BCG vaccine, which protects against tuberculosis, is an essential tool in maintaining global health. Understanding how its coverage varies with socioeconomic status and gender can show where immunization programmes succeed and where they could improve. This analysis examines how economic status, education, place of residence, and gender relate to BCG immunization coverage, using data from the WHO Health Inequality Data Repository.

The dataset is available from the WHO Health Inequality Data Repository.

Pre-requisites & Data Loading

Before diving into the data, let’s set up our environment and load the necessary libraries and data.

Data Loading & Initial Exploration

Let’s load our dataset from the WHO Health Inequality Data Repository and take an initial look at its structure.

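library(dplyr)  # %>%, mutate(), filter(), select(), group_by(), summarise()
library(tidyr)  # pivot_wider()
data <- read.csv("./data.csv")
str(data)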
## 'data.frame':    7473 obs. of  24 variables:
##  $ setting             : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ date                : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ source              : chr  "MICS" "MICS" "MICS" "MICS" ...
##  $ indicator_abbr      : chr  "bcg" "bcg" "bcg" "bcg" ...
##  $ indicator_name      : chr  "BCG immunization coverage among one-year-olds (%)" "BCG immunization coverage among one-year-olds (%)" "BCG immunization coverage among one-year-olds (%)" "BCG immunization coverage among one-year-olds (%)" ...
##  $ dimension           : chr  "Economic status (wealth quintile)" "Economic status (wealth quintile)" "Economic status (wealth quintile)" "Economic status (wealth quintile)" ...
##  $ subgroup            : chr  "Quintile 1 (poorest)" "Quintile 2" "Quintile 3" "Quintile 4" ...
##  $ estimate            : int  54 62 58 65 78 61 76 86 60 77 ...
##  $ se                  : int  4 3 3 4 2 2 4 3 2 2 ...
##  $ ci_lb               : int  45 56 51 57 73 57 68 78 56 73 ...
##  $ ci_ub               : int  62 68 65 72 82 65 83 91 64 81 ...
##  $ population          : int  532 549 495 473 447 2267 122 108 2060 436 ...
##  $ flag                : chr  "" "" "" "" ...
##  $ setting_average     : int  63 63 63 63 63 63 63 63 63 63 ...
##  $ iso3                : chr  "AFG" "AFG" "AFG" "AFG" ...
##  $ favourable_indicator: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ indicator_scale     : int  100 100 100 100 100 100 100 100 100 100 ...
##  $ ordered_dimension   : int  1 1 1 1 1 1 1 1 0 0 ...
##  $ subgroup_order      : int  1 2 3 4 5 1 2 3 0 0 ...
##  $ reference_subgroup  : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ whoreg6             : chr  "Eastern Mediterranean" "Eastern Mediterranean" "Eastern Mediterranean" "Eastern Mediterranean" ...
##  $ wbincome2022        : chr  "Low income" "Low income" "Low income" "Low income" ...
##  $ dataset_id          : chr  "rep_tb" "rep_tb" "rep_tb" "rep_tb" ...
##  $ update              : chr  "06 December 2021" "06 December 2021" "06 December 2021" "06 December 2021" ...

Data Cleaning & Pre-processing

Standardizing Dataset Columns

The dataset can contain multiple indicators. We’ll keep only the rows for BCG vaccination, rename ‘setting’ and ‘date’ to ‘country’ and ‘year’ for clarity, and drop two columns we don’t need.

data_1 <- data %>%
  filter(indicator_abbr == "bcg") %>%
  rename(country = setting, year = date) %>%
  select(-flag, -reference_subgroup)

# Count rows per World Bank income group (wbincome2022)
income <- table(data_1$wbincome2022)

Addressing Data Discrepancies

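Before we proceed further, it’s crucial to address inconsistencies in the country names. Some contain mis-encoded characters (the \xfc in "T\xfcrkiye" is a raw byte standing for "ü"), and others use a spelling that differs from the one used in this analysis.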

# Correcting country names
misspelled_countries <- c("T\xfcrkiye", "C\xf4te d'Ivoire", "Viet Nam")
correct_countries <- c("Turkey", "Côte d'Ivoire", "Vietnam")
data_1$country <- as.character(data_1$country) # Ensure it's a character vector
for (i in seq_along(misspelled_countries)) {
  data_1$country[data_1$country == misspelled_countries[i]] <- correct_countries[i]
}
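
The \x escapes in the loop above suggest that the names arrived as Latin-1 bytes. As a hedged alternative, the sketch below first converts a toy vector to UTF-8 with iconv() and then maps variant names through two parallel vectors with match(). The toy vector and the names `from` and `to` are assumptions made for illustration; they are not part of the original analysis.

```r
# Toy stand-in for data_1$country; the \x escapes are raw Latin-1 bytes.
country <- c("T\xfcrkiye", "C\xf4te d'Ivoire", "Viet Nam", "Angola")

# Step 1: declare the bytes as Latin-1 and convert them to UTF-8.
# This alone repairs names such as Cote d'Ivoire with its circumflex.
country <- iconv(country, from = "latin1", to = "UTF-8")

# Step 2: map variant names to the spellings used in this analysis.
# `from` and `to` are hypothetical parallel lookup vectors.
from <- c("T\u00fcrkiye", "Viet Nam")
to   <- c("Turkey", "Vietnam")
idx  <- match(country, from)            # NA where no replacement applies
country[!is.na(idx)] <- to[idx[!is.na(idx)]]

print(country)
```

Separating the encoding repair from the renaming keeps the lookup readable, since the replacement table no longer needs raw byte escapes.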

data <- data_1

Structuring Data based on Dimensions

The ‘subgroup’ column mixes categories from four dimensions: economic status, education, place of residence, and gender. The ‘dimension’ column records which of these each row belongs to. For ease of analysis, we split the data into one table per dimension, each with its own subgroup column.

# Separating data based on dimensions
economic_status_data <- data %>%
  filter(dimension == 'Economic status (wealth quintile)') %>%
  pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'economic_status_') %>%
  select(-contains("indicator")) %>%
  mutate(econ_quintile = `economic_status_Economic status (wealth quintile)`)


education_data <- data %>%
  filter(dimension == 'Education (3 groups)') %>%
  pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'education_') %>%
  select(-contains("indicator"))


place_of_residence_data <- data %>%
  filter(dimension == 'Place of residence') %>%
  pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'place_of_residence_') %>%
  select(-contains("indicator")) %>%
  mutate(residence_type = `place_of_residence_Place of residence`)

sex_data <- data %>%
  filter(dimension == 'Sex') %>%
  pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'sex_') %>%
  mutate(sex = sex_Sex) %>%
  select(-contains("indicator"))
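
Because each block above filters to a single dimension before pivoting, the split can also be pictured as cutting one table into pieces by ‘dimension’. The sketch below shows that idea in base R on a toy data frame; the values are illustrative only, and `toy`, `by_dimension`, and `sex_toy` are hypothetical names rather than objects from the analysis.

```r
# Toy table with the same key columns as the cleaned data (illustrative values).
toy <- data.frame(
  country   = c("Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan"),
  dimension = c("Sex", "Sex", "Place of residence", "Place of residence"),
  subgroup  = c("Female", "Male", "Rural", "Urban"),
  estimate  = c(67, 68, 65, 79)
)

# split() returns a named list with one data frame per dimension.
by_dimension <- split(toy, toy$dimension)

# Pull out one piece and give its subgroup column a descriptive name.
sex_toy <- by_dimension[["Sex"]]
names(sex_toy)[names(sex_toy) == "subgroup"] <- "sex"
sex_toy$dimension <- NULL   # constant within the piece, so it can be dropped

print(names(by_dimension))
print(sex_toy)
```

The result has the same shape as the pivoted tables: one row per country and subgroup, with the subgroup stored under a dimension-specific column name.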

Grouping and Summarizing

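To identify trends, we’ll average the coverage estimates for each country and subgroup within each dimension. Each mean pools all of a country’s available rows for that subgroup (for example, several survey years) and skips missing estimates.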

a <- economic_status_data %>%
  group_by(country, econ_quintile) %>%
  filter(!is.na(estimate)) %>%
  summarise(mean(estimate))
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
head(a)
## # A tibble: 6 × 3
## # Groups:   country [2]
##   country     econ_quintile        `mean(estimate)`
##   <chr>       <chr>                           <dbl>
## 1 Afghanistan Quintile 1 (poorest)             59.5
## 2 Afghanistan Quintile 2                       65  
## 3 Afghanistan Quintile 3                       65  
## 4 Afghanistan Quintile 4                       72  
## 5 Afghanistan Quintile 5 (richest)             81  
## 6 Algeria     Quintile 1 (poorest)             97
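
To make the grouped mean concrete, the sketch below reproduces the same kind of calculation in base R with aggregate() on a toy table. The countries "A" and "B" and all values are hypothetical, chosen only to show that rows are pooled across years and that missing estimates are skipped.

```r
# Toy table: two survey years per country and quintile, one missing estimate.
toy <- data.frame(
  country       = c("A", "A", "A", "A", "B", "B"),
  econ_quintile = c("Quintile 1 (poorest)", "Quintile 1 (poorest)",
                    "Quintile 5 (richest)", "Quintile 5 (richest)",
                    "Quintile 1 (poorest)", "Quintile 1 (poorest)"),
  year          = c(2010, 2015, 2010, 2015, 2012, 2018),
  estimate      = c(54, 65, 78, 84, 97, NA)
)

# The formula interface drops rows with a missing estimate by default,
# which mirrors filter(!is.na(estimate)) in the pipeline above.
means <- aggregate(estimate ~ country + econ_quintile, data = toy, FUN = mean)

print(means)
```

Here country "A" averages (54 + 65) / 2 = 59.5 in the poorest quintile, while country "B" keeps its single non-missing value of 97.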
b <- education_data %>%
  group_by(country, `education_Education (3 groups)`) %>%
  filter(!is.na(estimate)) %>%
  summarise(mean(estimate))
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
head(b)
## # A tibble: 6 × 3
## # Groups:   country [2]
##   country     `education_Education (3 groups)` `mean(estimate)`
##   <chr>       <chr>                                       <dbl>
## 1 Afghanistan No education                                 66  
## 2 Afghanistan Primary education                            81  
## 3 Afghanistan Secondary or higher education                87  
## 4 Algeria     No education                                 97  
## 5 Algeria     Primary education                            98.5
## 6 Algeria     Secondary or higher education                98
c <- place_of_residence_data %>%
  group_by(country, residence_type) %>%
  filter(!is.na(estimate)) %>%
  summarise(mean(estimate))
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
head(c)
## # A tibble: 6 × 3
## # Groups:   country [3]
##   country     residence_type `mean(estimate)`
##   <chr>       <chr>                     <dbl>
## 1 Afghanistan Rural                      65.5
## 2 Afghanistan Urban                      79.5
## 3 Algeria     Rural                      98  
## 4 Algeria     Urban                      98.5
## 5 Angola      Rural                      53  
## 6 Angola      Urban                      84
d <- sex_data %>%
  group_by(country, sex) %>%
  filter(!is.na(estimate)) %>%
  summarise(mean(estimate))
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
head(d)
## # A tibble: 6 × 3
## # Groups:   country [3]
##   country     sex    `mean(estimate)`
##   <chr>       <chr>             <dbl>
## 1 Afghanistan Female             67.5
## 2 Afghanistan Male               68.5
## 3 Algeria     Female             97.5
## 4 Algeria     Male               98  
## 5 Angola      Female             72  
## 6 Angola      Male               72
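
One way to turn these averages into a single inequality figure is the gap between the richest and poorest quintiles. The sketch below does this for the Afghanistan means printed above; the data frame `afg` and the column name `mean_estimate` are assumptions introduced for illustration, and the gap measure itself is a suggestion rather than a step of the original analysis.

```r
# Afghanistan's quintile means, copied from the head(a) output above.
afg <- data.frame(
  country       = "Afghanistan",
  econ_quintile = c("Quintile 1 (poorest)", "Quintile 2", "Quintile 3",
                    "Quintile 4", "Quintile 5 (richest)"),
  mean_estimate = c(59.5, 65, 65, 72, 81)
)

poorest <- afg$mean_estimate[afg$econ_quintile == "Quintile 1 (poorest)"]
richest <- afg$mean_estimate[afg$econ_quintile == "Quintile 5 (richest)"]

gap   <- richest - poorest   # absolute gap in percentage points: 21.5
ratio <- richest / poorest   # relative gap: about 1.36

cat("Gap:", gap, "percentage points; ratio:", round(ratio, 2), "\n")
```

An absolute gap is easy to read in percentage points, while the ratio allows comparison between countries with very different overall coverage.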

Conclusion

The BCG immunization coverage data from the WHO Health Inequality Data Repository provides rich insights into the socioeconomic and demographic factors associated with vaccination rates. A complex interplay of economic status, education, place of residence, and gender accompanies disparities in BCG immunization coverage across 95 countries. In the summaries above, for example, Afghanistan’s average coverage rises from 59.5% in the poorest quintile to 81% in the richest, and Angola’s rural average (53%) trails its urban average (84%), while the female and male averages shown differ by a point or less. To ensure that immunization targets are met, policymakers and healthcare providers must recognize these dynamics and develop tailored strategies. Closing these gaps would move us closer to achieving global immunization goals and fostering a healthier future for all.