BCG vaccination is an essential tool for maintaining global health. Understanding the socioeconomic and gender dynamics of its distribution can show where programs succeed and where they could improve. This analysis assesses how income, education, place of residence, and gender relate to BCG vaccination rates, using data from the WHO Health Inequality Data Repository.
The dataset can be found here.
Before diving into the data, let’s set up our environment and load the necessary libraries and data.
Let’s load our dataset from the WHO Health Inequality Data Repository and take an initial look at its structure.
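library(dplyr) # %>%, mutate(), filter(), select(), group_by(), summarise()
library(tidyr) # pivot_wider()
data <- read.csv("./data.csv")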
str(data)
## 'data.frame': 7473 obs. of 24 variables:
## $ setting : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
## $ date : int 2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
## $ source : chr "MICS" "MICS" "MICS" "MICS" ...
## $ indicator_abbr : chr "bcg" "bcg" "bcg" "bcg" ...
## $ indicator_name : chr "BCG immunization coverage among one-year-olds (%)" "BCG immunization coverage among one-year-olds (%)" "BCG immunization coverage among one-year-olds (%)" "BCG immunization coverage among one-year-olds (%)" ...
## $ dimension : chr "Economic status (wealth quintile)" "Economic status (wealth quintile)" "Economic status (wealth quintile)" "Economic status (wealth quintile)" ...
## $ subgroup : chr "Quintile 1 (poorest)" "Quintile 2" "Quintile 3" "Quintile 4" ...
## $ estimate : int 54 62 58 65 78 61 76 86 60 77 ...
## $ se : int 4 3 3 4 2 2 4 3 2 2 ...
## $ ci_lb : int 45 56 51 57 73 57 68 78 56 73 ...
## $ ci_ub : int 62 68 65 72 82 65 83 91 64 81 ...
## $ population : int 532 549 495 473 447 2267 122 108 2060 436 ...
## $ flag : chr "" "" "" "" ...
## $ setting_average : int 63 63 63 63 63 63 63 63 63 63 ...
## $ iso3 : chr "AFG" "AFG" "AFG" "AFG" ...
## $ favourable_indicator: int 1 1 1 1 1 1 1 1 1 1 ...
## $ indicator_scale : int 100 100 100 100 100 100 100 100 100 100 ...
## $ ordered_dimension : int 1 1 1 1 1 1 1 1 0 0 ...
## $ subgroup_order : int 1 2 3 4 5 1 2 3 0 0 ...
## $ reference_subgroup : int 0 0 0 0 0 0 0 0 0 1 ...
## $ whoreg6 : chr "Eastern Mediterranean" "Eastern Mediterranean" "Eastern Mediterranean" "Eastern Mediterranean" ...
## $ wbincome2022 : chr "Low income" "Low income" "Low income" "Low income" ...
## $ dataset_id : chr "rep_tb" "rep_tb" "rep_tb" "rep_tb" ...
## $ update : chr "06 December 2021" "06 December 2021" "06 December 2021" "06 December 2021" ...
The dataset contains multiple indicators. We’ll keep only the rows for BCG vaccination and rename a few columns for clarity.
data_1 <- data %>%
mutate(country = setting, year = date) %>%
filter(indicator_abbr == "bcg") %>%
select(-setting, -date, -flag, -reference_subgroup)
# Income categories
income <- table(data_1$wbincome2022)
Before we proceed further, we need to address data inconsistencies. For instance, some country names contain mis-encoded characters or use spellings that differ from the ones we want, so we standardize them.
# Correcting country names
misspelled_countries <- c("T\xfcrkiye", "C\xf4te d'Ivoire", "Viet Nam")
correct_countries <- c("Turkey", "Côte d'Ivoire", "Vietnam")
data_1$country <- as.character(data_1$country) # Ensure it's a character vector
for (i in seq_along(misspelled_countries)) {
data_1$country[data_1$country == misspelled_countries[i]] <- correct_countries[i]
}
data <- data_1
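The byte escapes in the code above (\xfc, \xf4) suggest that the names were read as Latin-1 bytes rather than misspelled at the source. As a minimal sketch under that assumption, the names could be converted with base R's iconv() and then standardized with a named lookup vector instead of a loop. The vectors raw_names and recode_map are hypothetical and exist only for this illustration.

```r
# Hypothetical sample of country names stored as Latin-1 bytes (assumed encoding).
raw_names <- c("T\xfcrkiye", "C\xf4te d'Ivoire", "Viet Nam", "Algeria")

# Convert the Latin-1 bytes to UTF-8 so accented characters are valid text.
utf8_names <- iconv(raw_names, from = "latin1", to = "UTF-8")

# Lookup vector: names are the spellings found in the data,
# values are the spellings we want to use in the analysis.
recode_map <- setNames(c("Turkey", "Vietnam"),
                       c("T\u00fcrkiye", "Viet Nam"))

# Replace only the entries present in the lookup; leave the rest untouched.
hit <- utf8_names %in% names(recode_map)
clean_names <- utf8_names
clean_names[hit] <- unname(recode_map[utf8_names[hit]])

print(clean_names)
```

If the file really is Latin-1, passing fileEncoding = "latin1" to read.csv() may make the conversion step unnecessary, but that depends on how the file was exported.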
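The ‘subgroup’ column holds categories for four different dimensions (economic status, education, place of residence, and sex), and the ‘dimension’ column says which one each row belongs to. For ease of analysis, we split the data into one table per dimension, each with its own subgroup column.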
# Separating data based on dimensions
economic_status_data <- data %>%
filter(dimension == 'Economic status (wealth quintile)') %>%
pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'economic_status_') %>%
select(-contains("indicator")) %>%
mutate(econ_quintile = `economic_status_Economic status (wealth quintile)`)
education_data <- data %>%
filter(dimension == 'Education (3 groups)') %>%
pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'education_')%>%
select(-contains("indicator"))
place_of_residence_data <- data %>%
filter(dimension == 'Place of residence') %>%
pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'place_of_residence_') %>%
select(-contains("indicator")) %>%
mutate(residence_type = `place_of_residence_Place of residence`)
sex_data <- data %>%
filter(dimension == 'Sex') %>%
pivot_wider(names_from = 'dimension', values_from = 'subgroup', names_prefix = 'sex_') %>%
mutate(sex = sex_Sex) %>%
select(-contains("indicator"))
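The four pipelines above follow the same pattern: keep one dimension and store its subgroup under a descriptive name. A minimal base R sketch of the same idea, using a small hypothetical data frame (toy) with illustrative values instead of the real data, is to split() the rows by dimension so that each dimension becomes its own table.

```r
# Hypothetical toy data mimicking the layout of the WHO file (values are illustrative).
toy <- data.frame(
  country   = c("A", "A", "A", "A", "B", "B"),
  dimension = c("Sex", "Sex", "Place of residence", "Place of residence", "Sex", "Sex"),
  subgroup  = c("Female", "Male", "Rural", "Urban", "Female", "Male"),
  estimate  = c(70, 72, 60, 85, 90, 91),
  stringsAsFactors = FALSE
)

# One data frame per dimension, returned as a named list.
by_dimension <- split(toy, toy$dimension)

# Pull out a single dimension and give its subgroup column a descriptive name.
sex_toy <- by_dimension[["Sex"]]
names(sex_toy)[names(sex_toy) == "subgroup"] <- "sex"

print(names(by_dimension))
print(sex_toy)
```

This avoids repeating the filter step for every dimension, at the cost of renaming the subgroup column separately for each table.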
To identify trends, we’ll aggregate data by various categories within each dimension.
a <- economic_status_data %>%
group_by(country, econ_quintile) %>%
filter(!is.na(estimate)) %>%
summarise(mean(estimate))
## `summarise()` has grouped output by 'country'. You can override using the
## `.groups` argument.
head(a)
## # A tibble: 6 × 3
## # Groups: country [2]
## country econ_quintile `mean(estimate)`
## <chr> <chr> <dbl>
## 1 Afghanistan Quintile 1 (poorest) 59.5
## 2 Afghanistan Quintile 2 65
## 3 Afghanistan Quintile 3 65
## 4 Afghanistan Quintile 4 72
## 5 Afghanistan Quintile 5 (richest) 81
## 6 Algeria Quintile 1 (poorest) 97
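The message printed by summarise() is informational: the result is still grouped by country. A minimal sketch, assuming dplyr 1.0 or later and a small hypothetical data frame (toy_econ) with illustrative values, shows how naming the summary column and setting .groups = "drop" gives an ungrouped result and a cleaner column name than `mean(estimate)`. The name mean_coverage is an illustrative choice, not one used in the analysis above.

```r
library(dplyr)

# Hypothetical toy data: up to two survey years per country and quintile.
toy_econ <- data.frame(
  country       = c("A", "A", "A", "A", "B", "B"),
  econ_quintile = c("Quintile 1 (poorest)", "Quintile 1 (poorest)",
                    "Quintile 5 (richest)", "Quintile 5 (richest)",
                    "Quintile 1 (poorest)", "Quintile 5 (richest)"),
  estimate      = c(50, 60, 80, 90, NA, 95)
)

toy_means <- toy_econ %>%
  filter(!is.na(estimate)) %>%               # drop rows without an estimate
  group_by(country, econ_quintile) %>%
  summarise(mean_coverage = mean(estimate),  # explicit column name
            .groups = "drop")                # ungrouped result, no message

print(toy_means)
```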
b <- education_data %>%
group_by(country, `education_Education (3 groups)`) %>%
filter(!is.na(estimate)) %>%
summarise(mean(estimate))
head(b)
## # A tibble: 6 × 3
## # Groups: country [2]
## country `education_Education (3 groups)` `mean(estimate)`
## <chr> <chr> <dbl>
## 1 Afghanistan No education 66
## 2 Afghanistan Primary education 81
## 3 Afghanistan Secondary or higher education 87
## 4 Algeria No education 97
## 5 Algeria Primary education 98.5
## 6 Algeria Secondary or higher education 98
c <- place_of_residence_data %>%
group_by(country, residence_type) %>%
filter(!is.na(estimate)) %>%
summarise(mean(estimate))
head(c)
## # A tibble: 6 × 3
## # Groups: country [3]
## country residence_type `mean(estimate)`
## <chr> <chr> <dbl>
## 1 Afghanistan Rural 65.5
## 2 Afghanistan Urban 79.5
## 3 Algeria Rural 98
## 4 Algeria Urban 98.5
## 5 Angola Rural 53
## 6 Angola Urban 84
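One way to summarize these means is the urban-rural gap per country. The sketch below re-enters the six values shown above by hand, so it runs without the full dataset, and uses base R's tapply() to arrange them as a country-by-residence matrix. The object names are illustrative.

```r
# Values copied from the head(c) output above.
residence_means <- data.frame(
  country        = c("Afghanistan", "Afghanistan", "Algeria", "Algeria",
                     "Angola", "Angola"),
  residence_type = c("Rural", "Urban", "Rural", "Urban", "Rural", "Urban"),
  mean_estimate  = c(65.5, 79.5, 98, 98.5, 53, 84)
)

# Arrange as a matrix: one row per country, one column per residence type.
wide <- tapply(residence_means$mean_estimate,
               list(residence_means$country, residence_means$residence_type),
               mean)

# Urban minus rural coverage, in percentage points.
urban_rural_gap <- wide[, "Urban"] - wide[, "Rural"]
print(sort(urban_rural_gap, decreasing = TRUE))
```

Among these three countries, Angola has the widest gap (31 percentage points) and Algeria the narrowest (0.5).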
d <- sex_data %>%
group_by(country, sex) %>%
filter(!is.na(estimate)) %>%
summarise(mean(estimate))
head(d)
## # A tibble: 6 × 3
## # Groups: country [3]
## country sex `mean(estimate)`
## <chr> <chr> <dbl>
## 1 Afghanistan Female 67.5
## 2 Afghanistan Male 68.5
## 3 Algeria Female 97.5
## 4 Algeria Male 98
## 5 Angola Female 72
## 6 Angola Male 72
To get a better understanding of the immunization trends across countries and their socio-economic and demographic groups, let’s create scatter plots.
# Scatter plots to view trends among countries
library(plotly)
# Build the hover text shown for each point: "<country>: <mean coverage>%"
a$hover_text <- paste0(a$country, ": ", round(a$`mean(estimate)`, 2), "%")
# Scatter plot for Dataset A using the hover text
plot_A <- plot_ly(data = a,
  x = ~country,
  y = ~`mean(estimate)`,
  type = "scatter",
  mode = "markers",
  color = ~econ_quintile,
  text = ~hover_text,
  marker = list(size = 10),
  hoverinfo = "text",
  # Figure size is set in plot_ly(); passing it to layout() is deprecated
  width = 900, height = 500) %>%
  layout(title = "Mean Estimate by Country and Economic Status",
    xaxis = list(showticklabels = FALSE, title = ""), # Hiding x-axis labels
    yaxis = list(title = "Mean Estimate"),
    showlegend = TRUE,
    hovermode = 'x' # Ensure "Compare Data on Hover" is auto-selected
  )
plot_A
Our plot shows large differences in vaccination coverage between the poorest and the richest quintiles in some countries, namely Angola, Chad, Nigeria, Papua New Guinea, and Yemen.
Angola: A notable 50 percentage point gap exists, with the poorest quintile at 45% and the richest at 95% coverage.
Chad: Vaccination coverage varies from 28.75% in the poorest quintile to 71.5% in the richest, a 42.75 percentage point difference.
Nigeria: The largest gap is observed here, with a 66.86 percentage point difference between the poorest (23.43%) and the richest (90.29%) quintiles.
Papua New Guinea: Coverage begins at 43% for the poorest quintile and reaches 92% for the richest, a 49 percentage point difference.
Yemen: Coverage rises from 51.5% in the poorest quintile to 93.5% in the richest, a 42 percentage point disparity.
Key insights:
Consistent Economic Gradient: All five countries display higher BCG vaccination rates in the richest quintile than in the poorest. Nigeria’s steep rise is particularly noteworthy.
Relative Baselines: Despite disparities, the poorest quintiles in Angola and Yemen have higher baseline rates compared to Nigeria.
Middle Quintile Trends: The progression in the intermediate quintiles underscores the persistent link between economic status and vaccination coverage.
Need for Targeted Interventions: These disparities hint at potential barriers faced by the economically disadvantaged, warranting focused efforts to enhance vaccination in the poorer quintiles.
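The gaps quoted above are simple differences between the richest and poorest quintile means. As a quick check of that arithmetic, the sketch below re-enters the ten figures from the text and also computes the richest-to-poorest ratio, which is an additional illustrative measure not used in the analysis above. The object and column names are illustrative.

```r
# Quintile means as quoted in the text above (percent coverage).
quintile_gaps <- data.frame(
  country = c("Angola", "Chad", "Nigeria", "Papua New Guinea", "Yemen"),
  poorest = c(45, 28.75, 23.43, 43, 51.5),
  richest = c(95, 71.5, 90.29, 92, 93.5)
)

# Absolute gap in percentage points and relative gap as a ratio.
quintile_gaps$gap_points <- quintile_gaps$richest - quintile_gaps$poorest
quintile_gaps$ratio <- round(quintile_gaps$richest / quintile_gaps$poorest, 2)

# Sort from largest to smallest absolute gap.
quintile_gaps <- quintile_gaps[order(-quintile_gaps$gap_points), ]
print(quintile_gaps, row.names = FALSE)
```

Sorting by the absolute gap puts Nigeria first, followed by Angola, Papua New Guinea, Chad, and Yemen.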
# Scatter plot for Education Types and Coverage by Country
plot_B <- plot_ly(data = b,
  x = ~country,
  y = ~`mean(estimate)`,
  type = "scatter",
  mode = "markers",
  color = ~`education_Education (3 groups)`,
  text = ~paste0(country, ": ", round(`mean(estimate)`, 2), "%"),
  marker = list(size = 10),
  hoverinfo = "text",
  # Figure size is set in plot_ly(); passing it to layout() is deprecated
  width = 900, height = 500) %>%
  layout(title = "Mean Estimate by Country and Education Group",
    xaxis = list(showticklabels = FALSE, title = ""), # Hiding x-axis labels
    yaxis = list(title = "Mean Estimate"),
    showlegend = TRUE,
    hovermode = 'x',
    shapes = list(
      # Dotted reference line at the 90% target
      list(type = "line",
        y0 = 90, y1 = 90,
        x0 = 0, x1 = 1,
        xref = "paper", yref = "y",
        line = list(color = "grey", dash = "dot", width = 2))
    ),
    annotations = list(
      # Footnote explaining the dotted line
      list(xref = 'paper',
        yref = 'paper',
        x = 0,
        y = -0.1,
        text = "* Dotted line: GVAP target national vaccination coverage for vaccines initiated during the first year of life",
        showarrow = FALSE)
    ),
    margin = list(b = 60) # Increase bottom margin to accommodate the footnote
  )
plot_B
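The 90% reference line can also be applied numerically. The following sketch re-enters the six education-group means shown in the head(b) output earlier and counts, per country, how many groups fall below the target. The target variable and column names are illustrative, and because the GVAP target is defined for national coverage, applying it to subgroups is only an informal benchmark.

```r
# Values copied from the head(b) output earlier in this post.
education_means <- data.frame(
  country       = rep(c("Afghanistan", "Algeria"), each = 3),
  education     = rep(c("No education", "Primary education",
                        "Secondary or higher education"), times = 2),
  mean_estimate = c(66, 81, 87, 97, 98.5, 98)
)

target <- 90  # GVAP national coverage target, in percent

# Flag groups below the target, then count them per country.
education_means$below_target <- education_means$mean_estimate < target
groups_below <- tapply(education_means$below_target,
                       education_means$country, sum)
print(groups_below)
```

In this small extract, all three education groups in Afghanistan fall below 90%, while none do in Algeria.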
# Scatter plot for Residence Types and Coverage by Country
# Build the hover text for each combination of country and residence type
c$hover_text <- paste0(c$country, ": ", round(c$`mean(estimate)`, 2), "%")
# Scatter plot for Dataset C using the hover text
plot_C <- plot_ly(data = c,
  x = ~country,
  y = ~`mean(estimate)`,
  type = "scatter",
  mode = "markers",
  color = ~residence_type,
  text = ~hover_text,
  marker = list(size = 10),
  hoverinfo = "text",
  # Figure size is set in plot_ly(); passing it to layout() is deprecated
  width = 700, height = 500) %>%
  layout(title = "Mean Estimate by Country and Place of Residence",
    xaxis = list(showticklabels = FALSE, title = ""), # Hiding x-axis labels
    yaxis = list(title = "Mean Estimate"),
    showlegend = TRUE,
    hovermode = 'x' # Ensure "Compare Data on Hover" is auto-selected
  )
plot_C
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
# Scatter plot for Sex and Coverage by Country
# Build the hover text for each combination of country and sex
d$hover_text <- paste0(d$country, ": ", round(d$`mean(estimate)`, 2), "%")
# Scatter plot for Dataset D using the hover text
plot_D <- plot_ly(data = d,
  x = ~country,
  y = ~`mean(estimate)`,
  type = "scatter",
  mode = "markers",
  color = ~sex,
  text = ~hover_text,
  marker = list(size = 10),
  hoverinfo = "text",
  # Figure size is set in plot_ly(); passing it to layout() is deprecated
  width = 700, height = 500) %>%
  layout(title = "Mean Estimate by Country and Sex",
    xaxis = list(showticklabels = FALSE, title = ""), # Hiding x-axis labels
    yaxis = list(title = "Mean Estimate"),
    showlegend = TRUE,
    hovermode = 'x' # Ensure "Compare Data on Hover" is auto-selected
  )
plot_D
## Warning in RColorBrewer::brewer.pal(N, "Set2"): minimal value for n is 3, returning requested palette with 3 different levels
The BCG immunization coverage data from the WHO Health Inequality Data Repository provides rich insights into the socio-economic and demographic factors affecting vaccination rates. A complex interplay of income, education, place of residence, and gender contributes to significant disparities in BCG immunization coverage across 95 countries. To ensure that immunization targets are met, policymakers and healthcare providers must recognize these dynamics and develop tailored strategies. By addressing the identified gaps and implementing the recommendations, we can move closer to achieving global immunization goals and fostering a healthier future for all.