Socioeconomic and Gender Dynamics in BCG Immunization Coverage: An Analysis Across 95 Countries

Introduction

BCG vaccination is an essential tool in maintaining global health. Understanding the socioeconomic and gender dynamics of its distribution can reveal both areas of success and areas needing improvement. This analysis examines how economic status (household wealth quintile), education, place of residence, and gender relate to BCG vaccination rates, using data from the WHO Health Inequality Data Repository.

The dataset can be found here.

Pre-requisites & Data Loading

Before diving into the data, let’s set up our environment and load the necessary libraries and data.

Data Loading & Initial Exploration

Let’s load our dataset from the WHO Health Inequality Data Repository and take an initial look at its structure.

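# Packages used throughout: dplyr (%>%, mutate, filter, select, group_by, summarise)
# and tidyr (pivot_wider); plotly is loaded later for the plots
library(dplyr)
library(tidyr)

data <- read.csv("./data.csv")
str(data)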

## 'data.frame':    7473 obs. of  24 variables:
##  $ setting             : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ date                : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ source              : chr  "MICS" "MICS" "MICS" "MICS" ...
##  $ indicator_abbr      : chr  "bcg" "bcg" "bcg" "bcg" ...
##  $ indicator_name      : chr  "BCG immunization coverage among one-year-olds (%)" "BCG immunization coverage among one-year-olds (%)" "BCG immunization coverage among one-year-olds (%)" "BCG immunization coverage among one-year-olds (%)" ...
##  $ dimension           : chr  "Economic status (wealth quintile)" "Economic status (wealth quintile)" "Economic status (wealth quintile)" "Economic status (wealth quintile)" ...
##  $ subgroup            : chr  "Quintile 1 (poorest)" "Quintile 2" "Quintile 3" "Quintile 4" ...
##  $ estimate            : int  54 62 58 65 78 61 76 86 60 77 ...
##  $ se                  : int  4 3 3 4 2 2 4 3 2 2 ...
##  $ ci_lb               : int  45 56 51 57 73 57 68 78 56 73 ...
##  $ ci_ub               : int  62 68 65 72 82 65 83 91 64 81 ...
##  $ population          : int  532 549 495 473 447 2267 122 108 2060 436 ...
##  $ flag                : chr  "" "" "" "" ...
##  $ setting_average     : int  63 63 63 63 63 63 63 63 63 63 ...
##  $ iso3                : chr  "AFG" "AFG" "AFG" "AFG" ...
##  $ favourable_indicator: int  1 1 1 1 1 1 1 1 1 1 ...
##  $ indicator_scale     : int  100 100 100 100 100 100 100 100 100 100 ...
##  $ ordered_dimension   : int  1 1 1 1 1 1 1 1 0 0 ...
##  $ subgroup_order      : int  1 2 3 4 5 1 2 3 0 0 ...
##  $ reference_subgroup  : int  0 0 0 0 0 0 0 0 0 1 ...
##  $ whoreg6             : chr  "Eastern Mediterranean" "Eastern Mediterranean" "Eastern Mediterranean" "Eastern Mediterranean" ...
##  $ wbincome2022        : chr  "Low income" "Low income" "Low income" "Low income" ...
##  $ dataset_id          : chr  "rep_tb" "rep_tb" "rep_tb" "rep_tb" ...
##  $ update              : chr  "06 December 2021" "06 December 2021" "06 December 2021" "06 December 2021" ...

Data Cleaning & Pre-processing

Standardizing Dataset Columns

The dataset contains multiple indicators. We'll keep only the rows for BCG immunization, rename the ‘setting’ and ‘date’ columns to the clearer ‘country’ and ‘year’, and drop columns we won't use.

data_1 <- data %>%
  filter(indicator_abbr == "bcg") %>%          # keep BCG coverage rows only
  rename(country = setting, year = date) %>%    # clearer column names
  select(-flag, -reference_subgroup)            # drop columns not used below

# Income categories
income <- table(data_1$wbincome2022)

Addressing Data Discrepancies

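Before we proceed further, it's crucial to address data inconsistencies. Some country names containing accented characters were read with the wrong character encoding (for example, "T\xfcrkiye" and "C\xf4te d'Ivoire"), so we replace them with readable names. We also standardize "Viet Nam" to "Vietnam".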

# Correcting country names
misspelled_countries <- c("T\xfcrkiye", "C\xf4te d'Ivoire", "Viet Nam")
correct_countries <- c("Turkey", "Côte d'Ivoire", "Vietnam")
data_1$country <- as.character(data_1$country) # Ensure it's a character vector
for (i in seq_along(misspelled_countries)) {
  data_1$country[data_1$country == misspelled_countries[i]] <- correct_countries[i]
}

data <- data_1
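The "\xfc" and "\xf4" in those names are single Latin-1 bytes (ü and ô), which suggests the file was read without the right encoding hint. A minimal sketch of repairing such strings with base R's iconv(), assuming the source bytes are Latin-1; passing fileEncoding = "latin1" to read.csv() may avoid the problem at load time, although the file's actual encoding has not been verified here.

```r
# Mis-encoded names as they appear after read.csv(): each accented letter is a
# single Latin-1 byte (0xFC = ü, 0xF4 = ô) rather than valid UTF-8
raw_names <- c("T\xfcrkiye", "C\xf4te d'Ivoire")

# Assumption: the bytes are Latin-1, so convert them to UTF-8
fixed_names <- iconv(raw_names, from = "latin1", to = "UTF-8")

# The repaired strings now compare equal to properly encoded literals
print(fixed_names == c("T\u00fcrkiye", "C\u00f4te d'Ivoire"))
```

With the names repaired this way, the lookup step only needs to handle genuine naming choices such as "Viet Nam" versus "Vietnam".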

Structuring Data based on Dimensions

The ‘subgroup’ column holds the category for whichever dimension a row belongs to: economic status, education, place of residence, or sex. For ease of analysis, we split the data into one data frame per dimension and give each one a descriptively named subgroup column.

# Separating data based on dimensions
economic_status_data <- data %>%
  filter(dimension == 'Economic status (wealth quintile)') %>%
  pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'economic_status_') %>%
  select(-contains("indicator")) %>%
  mutate(econ_quintile = `economic_status_Economic status (wealth quintile)`)


education_data <- data %>%
  filter(dimension == 'Education (3 groups)') %>%
  pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'education_')%>%
  select(-contains("indicator"))


place_of_residence_data <- data %>%
  filter(dimension == 'Place of residence') %>%
  pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'place_of_residence_') %>%
  select(-contains("indicator")) %>%
  mutate(residence_type = `place_of_residence_Place of residence`)

sex_data <- data %>%
  filter(dimension == 'Sex') %>%
  pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'sex_') %>%
  mutate(sex = sex_Sex) %>%
  select(-contains("indicator"))
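Because each filtered data frame contains a single dimension, the pivot_wider() call effectively just renames ‘subgroup’ to a dimension-specific column. A base-R sketch of the same idea, using split() on a small hypothetical data frame (the values are illustrative only, not taken from the dataset):

```r
# Hypothetical rows mimicking the structure of `data` (illustrative values)
toy <- data.frame(
  country   = "Afghanistan",
  dimension = c("Economic status (wealth quintile)", "Economic status (wealth quintile)",
                "Sex", "Sex"),
  subgroup  = c("Quintile 1 (poorest)", "Quintile 5 (richest)", "Female", "Male"),
  estimate  = c(54, 78, 67, 68)
)

# One data frame per dimension, named by the dimension value
by_dimension <- split(toy, toy$dimension)
print(names(by_dimension))

# Within one dimension, renaming 'subgroup' plays the role of the
# pivot_wider() + mutate() steps that produce sex_data$sex
sex_toy <- by_dimension[["Sex"]]
names(sex_toy)[names(sex_toy) == "subgroup"] <- "sex"
print(sex_toy)
```

The list returned by split() keeps all dimensions together, which can be convenient if the same summary is later applied to each one with lapply().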

Grouping and Summarizing

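To identify trends, we average the estimates for each country and subgroup within each dimension. Some countries have more than one survey in the data, so each mean pools that country's estimates across survey years.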

a <- economic_status_data %>%
  group_by(country, econ_quintile) %>%
  filter(!is.na(estimate)) %>%
  summarise(mean(estimate))

## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.

head(a)

## # A tibble: 6 × 3
## # Groups:   country [2]
##   country     econ_quintile        `mean(estimate)`
##   <chr>       <chr>                           <dbl>
## 1 Afghanistan Quintile 1 (poorest)             59.5
## 2 Afghanistan Quintile 2                       65  
## 3 Afghanistan Quintile 3                       65  
## 4 Afghanistan Quintile 4                       72  
## 5 Afghanistan Quintile 5 (richest)             81  
## 6 Algeria     Quintile 1 (poorest)             97

b <- education_data %>%
  group_by(country, `education_Education (3 groups)`) %>%
  filter(!is.na(estimate)) %>%
  summarise(mean(estimate))

## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.

head(b)

## # A tibble: 6 × 3
## # Groups:   country [2]
##   country     `education_Education (3 groups)` `mean(estimate)`
##   <chr>       <chr>                                       <dbl>
## 1 Afghanistan No education                                 66  
## 2 Afghanistan Primary education                            81  
## 3 Afghanistan Secondary or higher education                87  
## 4 Algeria     No education                                 97  
## 5 Algeria     Primary education                            98.5
## 6 Algeria     Secondary or higher education                98

c <- place_of_residence_data %>%
  group_by(country, residence_type) %>%
  filter(!is.na(estimate)) %>%
  summarise(mean(estimate))

## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.

head(c)

## # A tibble: 6 × 3
## # Groups:   country [3]
##   country     residence_type `mean(estimate)`
##   <chr>       <chr>                     <dbl>
## 1 Afghanistan Rural                      65.5
## 2 Afghanistan Urban                      79.5
## 3 Algeria     Rural                      98  
## 4 Algeria     Urban                      98.5
## 5 Angola      Rural                      53  
## 6 Angola      Urban                      84

d <- sex_data %>%
  group_by(country, sex) %>%
  filter(!is.na(estimate)) %>%
  summarise(mean(estimate))

## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.

head(d)

## # A tibble: 6 × 3
## # Groups:   country [3]
##   country     sex    `mean(estimate)`
##   <chr>       <chr>             <dbl>
## 1 Afghanistan Female             67.5
## 2 Afghanistan Male               68.5
## 3 Algeria     Female             97.5
## 4 Algeria     Male               98  
## 5 Angola      Female             72  
## 6 Angola      Male               72
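To put numbers on the urban-rural and female-male differences, a small helper can compute the per-country gap between any two subgroups of one dimension. The function below is hypothetical (not part of the original analysis), and the sample values are copied from the head(c) and head(d) output above. On the real tibbles you would presumably pass value_col = "mean(estimate)", though that has not been tested here.

```r
# Hypothetical helper: per-country gap (group_a minus group_b) between two
# subgroups of one dimension, sorted from largest to smallest
subgroup_gap <- function(df, group_col, group_a, group_b, value_col = "mean_estimate") {
  rows_a <- df[df[[group_col]] == group_a, c("country", value_col)]
  rows_b <- df[df[[group_col]] == group_b, c("country", value_col)]
  # Join the two subgroups on country; suffixes tell the value columns apart
  out <- merge(rows_a, rows_b, by = "country", suffixes = c("_a", "_b"))
  out$gap <- out[[paste0(value_col, "_a")]] - out[[paste0(value_col, "_b")]]
  out[order(-out$gap), c("country", "gap")]
}

# Values copied from head(c) and head(d) above
c_toy <- data.frame(
  country        = rep(c("Afghanistan", "Algeria", "Angola"), each = 2),
  residence_type = rep(c("Rural", "Urban"), times = 3),
  mean_estimate  = c(65.5, 79.5, 98, 98.5, 53, 84)
)
d_toy <- data.frame(
  country       = rep(c("Afghanistan", "Algeria", "Angola"), each = 2),
  sex           = rep(c("Female", "Male"), times = 3),
  mean_estimate = c(67.5, 68.5, 97.5, 98, 72, 72)
)

urban_gap <- subgroup_gap(c_toy, "residence_type", group_a = "Urban", group_b = "Rural")
sex_gap   <- subgroup_gap(d_toy, "sex", group_a = "Male", group_b = "Female")
print(urban_gap)
print(sex_gap)
```

For these three countries the urban-rural gap (up to 31 points in Angola) is much larger than the male-female gap (at most 1 point).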

Visualizing Trends Among Countries

To get a better understanding of the immunization trends across countries and their socio-economic and demographic groups, let’s create scatter plots.

Economic status-based Immunization Trends

# Scatter plot of mean coverage by country and wealth quintile
library(plotly)

# Hover text: country, wealth quintile, and the mean estimate
a$hover_text <- paste0(a$country, " (", a$econ_quintile, "): ",
                       round(a$`mean(estimate)`, 2), "%")

plot_A <- plot_ly(data = a, 
                  x = ~country, 
                  y = ~`mean(estimate)`, 
                  type = "scatter", 
                  mode = "markers", 
                  color = ~econ_quintile, 
                  text = ~hover_text,
                  marker = list(size = 10),
                  hoverinfo = "text",
                  width = 900, height = 500) %>%  # size set here; layout() width/height is deprecated
  layout(title = "Mean Estimate by Country and Economic Status",
         xaxis = list(showticklabels = FALSE, title = ""), # Hiding x-axis labels
         yaxis = list(title = "Mean Estimate"),
         showlegend = TRUE,
         hovermode = 'x'  # Ensure "Compare Data on Hover" is auto-selected
  )

plot_A

From the plot, we can see large differences in vaccination coverage between the poorest and richest quintiles in some countries, notably Angola, Chad, Nigeria, Papua New Guinea, and Yemen.

Angola: A 50-percentage-point gap, with coverage of 45% in the poorest quintile and 95% in the richest.

Chad: Coverage ranges from 28.75% in the poorest quintile to 71.5% in the richest, a 42.75-point difference.

Nigeria: The largest gap of the five, 66.86 points, between the poorest (23.43%) and the richest (90.29%) quintiles.

Papua New Guinea: Coverage rises from 43% in the poorest quintile to 92% in the richest, a 49-point difference.

Yemen: Coverage rises from 51.5% in the poorest quintile to 93.5% in the richest, a 42-point difference.

Key insights:

Economic gradient: In all five countries, coverage in the richest quintile is far higher than in the poorest. Nigeria's gap is the steepest.
Relative baselines: Despite these gaps, the poorest quintiles in Angola (45%) and Yemen (51.5%) start from considerably higher coverage than Nigeria's poorest quintile (23.43%).
Middle quintiles: Coverage in the intermediate quintiles also tends to rise with wealth, reinforcing the link between economic status and vaccination coverage.
Need for targeted interventions: These disparities suggest barriers faced by economically disadvantaged households, warranting focused efforts to raise coverage in the poorer quintiles.
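The percentage-point gaps quoted above can be reproduced from the quintile means. A base-R sketch using reshape(), with the poorest and richest values copied from the text; the quintile labels are shortened to Q1 and Q5 for readable column names (the real data uses labels such as "Quintile 1 (poorest)").

```r
# Poorest (Q1) and richest (Q5) quintile means quoted in the text above
a_toy <- data.frame(
  country       = rep(c("Angola", "Chad", "Nigeria", "Papua New Guinea", "Yemen"), each = 2),
  econ_quintile = rep(c("Q1", "Q5"), times = 5),
  mean_estimate = c(45, 95, 28.75, 71.5, 23.43, 90.29, 43, 92, 51.5, 93.5)
)

# One row per country, one column per quintile (mean_estimate.Q1, mean_estimate.Q5)
wide <- reshape(a_toy, idvar = "country", timevar = "econ_quintile", direction = "wide")

# Richest-minus-poorest gap in percentage points, largest first
wide$gap <- wide$mean_estimate.Q5 - wide$mean_estimate.Q1
wide <- wide[order(-wide$gap), c("country", "mean_estimate.Q1", "mean_estimate.Q5", "gap")]
print(wide)
```

Sorting by the gap puts Nigeria first (66.86 points), followed by Angola, Papua New Guinea, Chad, and Yemen.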

Education-based Immunization Trends

# Scatter plot of mean coverage by country and education group, with the 90% GVAP target
plot_B <- plot_ly(data = b, 
                  x = ~country, 
                  y = ~`mean(estimate)`, 
                  type = "scatter", 
                  mode = "markers", 
                  color = ~`education_Education (3 groups)`, 
                  text = ~paste0(country, " (", `education_Education (3 groups)`, "): ",
                                 round(`mean(estimate)`, 2), "%"),
                  marker = list(size = 10),
                  hoverinfo = "text",
                  width = 900, height = 500) %>%  # size set here; layout() width/height is deprecated
  layout(title = "Mean Estimate by Country and Education Group",
         xaxis = list(showticklabels = FALSE, title = ""), # Hiding x-axis labels
         yaxis = list(title = "Mean Estimate"),
         showlegend = TRUE,
         hovermode = 'x',
         shapes = list(
           # Dashed horizontal line at the 90% target
           list(type = "line",
                y0 = 90, y1 = 90,
                x0 = 0, x1 = 1,
                xref = "paper", yref = "y",
                line = list(color = "grey", dash = "dash", width = 2))
         ),
         annotations = list(
           # Footnote explaining the dashed line
           list(xref = 'paper', 
                yref = 'paper', 
                x = 0, 
                y = -0.1, 
                text = "* Dashed line: GVAP target of 90% national coverage for vaccines initiated during the first year of life",
                showarrow = FALSE)
         ),
         margin = list(b = 60)  # Extra bottom margin for the footnote
  )

plot_B
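The horizontal reference line marks the 90% GVAP coverage target. A base-R sketch that counts, for each country, how many education groups fall below it; the values are copied from the head(b) output above, so this covers only Afghanistan and Algeria.

```r
# Education-group means copied from head(b) above
b_toy <- data.frame(
  country       = rep(c("Afghanistan", "Algeria"), each = 3),
  education     = rep(c("No education", "Primary education",
                        "Secondary or higher education"), times = 2),
  mean_estimate = c(66, 81, 87, 97, 98.5, 98)
)

gvap_target <- 90  # GVAP national coverage target (%), as drawn in the plot

# Flag groups below the target, then count flagged groups per country
b_toy$below_target <- b_toy$mean_estimate < gvap_target
below <- aggregate(below_target ~ country, data = b_toy, FUN = sum)
print(below)
```

In this sample, all three Afghan education groups fall short of the target, while every Algerian group exceeds it.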

Residence-based Immunization Trends

# Scatter plot of mean coverage by country and place of residence
# Hover text: country, residence type, and the mean estimate
c$hover_text <- paste0(c$country, " (", c$residence_type, "): ",
                       round(c$`mean(estimate)`, 2), "%")

plot_C <- plot_ly(data = c, 
                  x = ~country, 
                  y = ~`mean(estimate)`, 
                  type = "scatter", 
                  mode = "markers", 
                  color = ~residence_type, 
                  colors = c("#66C2A5", "#FC8D62"),  # two Set2 colours; avoids the brewer.pal n < 3 warning
                  text = ~hover_text,
                  marker = list(size = 10),
                  hoverinfo = "text",
                  width = 700, height = 500) %>%  # size set here; layout() width/height is deprecated
  layout(title = "Mean Estimate by Country and Place of Residence",
         xaxis = list(showticklabels = FALSE, title = ""), # Hiding x-axis labels
         yaxis = list(title = "Mean Estimate"),
         showlegend = TRUE,
         hovermode = 'x'  # Ensure "Compare Data on Hover" is auto-selected
         )

plot_C

Gender-based Immunization Trends

# Scatter plot of mean coverage by country and sex
# Hover text: country, sex, and the mean estimate
d$hover_text <- paste0(d$country, " (", d$sex, "): ",
                       round(d$`mean(estimate)`, 2), "%")

plot_D <- plot_ly(data = d, 
                  x = ~country, 
                  y = ~`mean(estimate)`, 
                  type = "scatter", 
                  mode = "markers", 
                  color = ~sex, 
                  colors = c("#66C2A5", "#FC8D62"),  # two Set2 colours; avoids the brewer.pal n < 3 warning
                  text = ~hover_text,
                  marker = list(size = 10),
                  hoverinfo = "text",
                  width = 700, height = 500) %>%  # size set here; layout() width/height is deprecated
  layout(title = "Mean Estimate by Country and Sex",
         xaxis = list(showticklabels = FALSE, title = ""), # Hiding x-axis labels
         yaxis = list(title = "Mean Estimate"),
         showlegend = TRUE,
         hovermode = 'x'  # Ensure "Compare Data on Hover" is auto-selected
         )

plot_D

Conclusion

The BCG immunization coverage data from the WHO Health Inequality Data Repository offers useful insight into the socioeconomic and demographic factors associated with vaccination rates. Across the 95 countries, coverage varies with household wealth, education, and place of residence. The largest gaps discussed in this analysis appear between the poorest and richest wealth quintiles (up to 66.86 percentage points in Nigeria), while differences between female and male children are small in the country summaries shown above. To ensure that immunization targets are met, policymakers and healthcare providers need to recognize these dynamics and develop tailored strategies. By addressing the gaps identified here, for example through interventions aimed at the poorest quintiles, we can move closer to achieving global immunization goals and fostering a healthier future for all.