Human papillomavirus (HPV) is a common viral infection that can lead to various health complications, including several types of cancer. Over the past decades, vaccines have been developed and introduced as a preventive measure against HPV. The vaccines have demonstrated efficacy in preventing infections and consequently reducing the incidence of HPV-related diseases. However, their implementation is subjected to cost considerations, especially in countries with limited healthcare budgets.
This project aims to explore the economic considerations related to the implementation of HPV vaccination programs globally. One of the key measures to assess the value of health interventions is the Incremental Cost-Effectiveness Ratio (ICER). The ICER is a statistic used in cost-effectiveness analysis to summarize the cost-effectiveness of a health care intervention. It is defined as the difference in cost between two possible interventions, divided by the difference in their effect. Lower ICERs suggest more cost-effective interventions.
The World Health Organization (WHO) often uses a general reference of one to three times a country’s Gross Domestic Product (GDP) per capita for the cost-effectiveness threshold. Therefore, this analysis will primarily focus on examining the factors influencing the predicted ICERs for the HPV vaccine in various countries, especially in the context of the country’s GDP per capita. The objective is to understand which factors are significant in determining the cost-effectiveness of the HPV vaccine and how these factors interact. By understanding these determinants, we can provide better recommendations for countries planning to implement or optimize HPV vaccination programs, ensuring efficient utilization of resources for maximum public health benefit.
The dataset can be accessed here.
The main objectives of this project are:
Investigate Cost-Effectiveness of HPV Vaccination: Analyze and understand the cost-effectiveness of HPV vaccination in different countries worldwide as represented by the incremental cost-effectiveness ratios (ICERs).
Explore the Influence of Key Variables on Predicted ICERs: Understand the impact of key variables such as the logarithm of GDP per capita, GAVI eligibility, the logarithm of the disease burden, and the logarithm of the vaccine cost on the predicted ICERs of HPV vaccination.
Identify Outliers in Cost-Effectiveness: Identify countries with significant economic challenges in implementing HPV vaccination, as indicated by high predicted ICERs relative to their GDP per capita.
Assess Variability Across Regions: Examine the variability in predicted ICERs across different regions, which reflects differing economic and health conditions.
Build and Evaluate a Predictive Model: Develop a linear regression model to predict ICERs based on selected variables and evaluate its performance and assumptions.
The understanding and insights gained from achieving these objectives could provide valuable input for policy makers and health organizations, aiding in the planning and implementation of HPV vaccination programs.
Here we load the dataset and check its structure.
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(knitr)
library(car)
## Loading required package: carData
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
library(broom)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
# Read the dataset
hpv <- read.csv("hpv.csv")
# Glimpse of the dataset
str(hpv) #total 203 rows, 41 columns
## 'data.frame': 203 obs. of 41 variables:
## $ country : chr "Canada" "Greenland" "USA" "Australia" ...
## $ location_id : int 101 349 102 71 72 66 67 69 68 74 ...
## $ ihme_loc_id : chr "CAN" "GRL" "USA" "AUS" ...
## $ region_id : int 100 100 100 70 70 65 65 65 65 73 ...
## $ region_name : chr "High-income North America" "High-income North America" "High-income North America" "Australasia" ...
## $ super_region_id : int 64 64 64 64 64 64 64 64 64 64 ...
## $ super_region_name : chr "High-income" "High-income" "High-income" "High-income" ...
## $ gavi_eligible : int 0 0 0 0 0 0 0 0 0 0 ...
## $ log_GDP_2017usd_per_cap : num 11 10.9 11 11.2 10.7 ...
## $ log_burden_variable : num -6.56 -5.53 -6.49 -6.83 -6.83 ...
## $ log_vaccine_cost_2017usd : num 5.06 5.06 6.02 5.06 5.06 ...
## $ payer : int 1 1 1 1 1 1 1 1 1 1 ...
## $ intercept : int 1 1 1 1 1 1 1 1 1 1 ...
## $ burden_disc_rate : int 3 3 3 3 3 3 3 3 3 3 ...
## $ cost_disc_rate : int 3 3 3 3 3 3 3 3 3 3 ...
## $ coverage : int 70 70 70 70 70 70 70 70 70 70 ...
## $ not_lifetime : int 0 0 0 0 0 0 0 0 0 0 ...
## $ screen_comparator : int 0 0 0 0 0 0 0 0 0 0 ...
## $ access_to_care_100 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ quadrivalent : int 0 0 0 0 0 0 0 0 0 0 ...
## $ qalys : int 0 0 0 0 0 0 0 0 0 0 ...
## $ both_sex : int 0 0 0 0 0 0 0 0 0 0 ...
## $ new_spline_cov : num 0.182 0.169 0.184 0.215 0.146 ...
## $ predicted_icer_usd : num 10150 6747 27600 11493 11100 ...
## $ predicted_icer_usd_median : num 7834 5198 21271 8876 8582 ...
## $ predicted_icer_usd_lower : num 1849 1236 5041 2089 2021 ...
## $ predicted_icer_usd_upper : num 32426 21231 88339 36622 35297 ...
## $ predicted_log_icer_usd : num 8.95 8.54 9.95 9.07 9.03 ...
## $ predicted_log_icer_usd_lower : num 7.52 7.12 8.53 7.64 7.61 ...
## $ predicted_log_icer_usd_upper : num 10.39 9.96 11.39 10.51 10.47 ...
## $ ratio_of_upper_to_lower_prediction : num 17.5 17.2 17.5 17.5 17.5 ...
## $ predicted_icer_usd_over_gdp_pc : num 0.173 0.126 0.463 0.152 0.248 ...
## $ predicted_icer_usd_median_over_gdp_pc: num 0.133 0.097 0.357 0.118 0.192 ...
## $ pred_prob_usd : num 4.15e-06 1.96e-05 3.50e-07 3.44e-06 2.04e-06 1.49e-05 4.59e-06 2.24e-06 1.56e-06 2.38e-06 ...
## $ pred_val_usd : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ adj_ICER_usd : num 10150 6747 27600 11493 11100 ...
## $ adj_ICER_usd_lower : num 1849 1236 5041 2089 2021 ...
## $ adj_ICER_usd_upper : num 32426 21231 88339 36622 35297 ...
## $ GDP_usd_per_cap : num 58777 53570 59613 75506 44792 ...
## $ GDP_usd_category : chr "< 0.5" "< 0.5" "< 0.5" "< 0.5" ...
## $ X : logi NA NA NA NA NA NA ...
We see that the dataset consists of 203 rows and 41 columns.
unique(hpv$country)
## [1] "Canada"
## [2] "Greenland"
## [3] "USA"
## [4] "Australia"
## [5] "New Zealand"
## [6] "Brunei"
## [7] "Japan"
## [8] "Singapore"
## [9] "South Korea"
## [10] "Andorra"
## [11] "Austria"
## [12] "Belgium"
## [13] "Cyprus"
## [14] "Denmark"
## [15] "Finland"
## [16] "France"
## [17] "Germany"
## [18] "Greece"
## [19] "Iceland"
## [20] "Ireland"
## [21] "Israel"
## [22] "Italy"
## [23] "Luxembourg"
## [24] "Malta"
## [25] "Netherlands"
## [26] "Norway"
## [27] "Portugal"
## [28] "Spain"
## [29] "Sweden"
## [30] "Switzerland"
## [31] "Argentina"
## [32] "Chile"
## [33] "Uruguay"
## [34] "Belarus"
## [35] "Estonia"
## [36] "Latvia"
## [37] "Lithuania"
## [38] "Moldova"
## [39] "Russian Federation"
## [40] "Ukraine"
## [41] "Albania"
## [42] "Bosnia and Herzegovina"
## [43] "Bulgaria"
## [44] "Croatia"
## [45] "Czech Republic"
## [46] "Hungary"
## [47] "Macedonia"
## [48] "Montenegro"
## [49] "Poland"
## [50] "Romania"
## [51] "Serbia"
## [52] "Slovakia"
## [53] "Slovenia"
## [54] "Armenia"
## [55] "Azerbaijan"
## [56] "Georgia"
## [57] "Kazakhstan"
## [58] "Kyrgyzstan"
## [59] "Mongolia"
## [60] "Tajikistan"
## [61] "Turkmenistan"
## [62] "Uzbekistan"
## [63] "Colombia"
## [64] "Costa Rica"
## [65] "El Salvador"
## [66] "Guatemala"
## [67] "Honduras"
## [68] "Mexico"
## [69] "Nicaragua"
## [70] "Panama"
## [71] "Venezuela"
## [72] "Bolivia"
## [73] "Ecuador"
## [74] "Peru"
## [75] "Antigua and Barbuda"
## [76] "The Bahamas"
## [77] "Barbados"
## [78] "Belize"
## [79] "Bermuda"
## [80] "Cuba"
## [81] "Dominica"
## [82] "Dominican Republic"
## [83] "Grenada"
## [84] "Guyana"
## [85] "Haiti"
## [86] "Jamaica"
## [87] "Puerto Rico"
## [88] "Saint Lucia"
## [89] "Saint Vincent and the Grenadines"
## [90] "Suriname"
## [91] "Trinidad and Tobago"
## [92] "Virgin Islands"
## [93] "Brazil"
## [94] "Paraguay"
## [95] "China"
## [96] "North Korea"
## [97] "Taiwan (Province of China)"
## [98] "Cambodia"
## [99] "Indonesia"
## [100] "Laos"
## [101] "Malaysia"
## [102] "Maldives"
## [103] "Mauritius"
## [104] "Myanmar"
## [105] "Philippines"
## [106] "Sri Lanka"
## [107] "Seychelles"
## [108] "Thailand"
## [109] "Timor-Leste"
## [110] "Vietnam"
## [111] "American Samoa"
## [112] "Federated States of Micronesia"
## [113] "Fiji"
## [114] "Guam"
## [115] "Kiribati"
## [116] "Marshall Islands"
## [117] "Northern Mariana Islands"
## [118] "Papua New Guinea"
## [119] "Samoa"
## [120] "Solomon Islands"
## [121] "Tonga"
## [122] "Vanuatu"
## [123] "Afghanistan"
## [124] "Algeria"
## [125] "Bahrain"
## [126] "Egypt"
## [127] "Iran"
## [128] "Iraq"
## [129] "Jordan"
## [130] "Kuwait"
## [131] "Lebanon"
## [132] "Libya"
## [133] "Morocco"
## [134] "Palestine"
## [135] "Oman"
## [136] "Qatar"
## [137] "Saudi Arabia"
## [138] "Sudan"
## [139] "Syria"
## [140] "Tunisia"
## [141] "Turkey"
## [142] "United Arab Emirates"
## [143] "Yemen"
## [144] "Bangladesh"
## [145] "Bhutan"
## [146] "India"
## [147] "Nepal"
## [148] "Pakistan"
## [149] "Botswana"
## [150] "Lesotho"
## [151] "Namibia"
## [152] "South Africa"
## [153] "Swaziland"
## [154] "Zimbabwe"
## [155] "Benin"
## [156] "Burkina Faso"
## [157] "Cameroon"
## [158] "Cape Verde"
## [159] "Chad"
## [160] "Cote d'Ivoire"
## [161] "The Gambia"
## [162] "Ghana"
## [163] "Guinea"
## [164] "Guinea-Bissau"
## [165] "Liberia"
## [166] "Mali"
## [167] "Mauritania"
## [168] "Niger"
## [169] "Nigeria"
## [170] "Sao Tome and Principe"
## [171] "Senegal"
## [172] "Sierra Leone"
## [173] "Togo"
## [174] "Burundi"
## [175] "Comoros"
## [176] "Djibouti"
## [177] "Eritrea"
## [178] "Ethiopia"
## [179] "Kenya"
## [180] "Madagascar"
## [181] "Malawi"
## [182] "Mozambique"
## [183] "Rwanda"
## [184] "Somalia"
## [185] "South Sudan"
## [186] "Tanzania"
## [187] "Uganda"
## [188] "Zambia"
## [189] "Angola"
## [190] "Central African Republic"
## [191] "Congo (Brazzaville)"
## [192] "DR Congo"
## [193] "Equatorial Guinea"
## [194] "Gabon"
## [195] "United Kingdom"
## [196] "Global"
## [197] "Southeast Asia, East Asia, and Oceania"
## [198] "Central Europe, Eastern Europe, and Central Asia"
## [199] "High-income"
## [200] "Latin America and Caribbean"
## [201] "North Africa and Middle East"
## [202] "South Asia"
## [203] "Sub-Saharan Africa"
We can see that there are 203 unique entries under ‘country’. However, only 195 are actually unique countries. The remaining 8 are listed as: “Global”, “Southeast Asia, East Asia, and Oceania”, “Central Europe, Eastern Europe, and Central Asia”, “High-income”, “Latin America and Caribbean”, “North Africa and Middle East”, “South Asia”, and “Sub-Saharan Africa”.
Let’s filter these so that we only have country-level data:
# Cleaning data to remove aggregated entries and retain only country-specific data
filtered_hpv <- hpv %>% filter(!country %in% c("Global", "Southeast Asia, East Asia, and Oceania", "Central Europe, Eastern Europe, and Central Asia", "High-income", "Latin America and Caribbean", "North Africa and Middle East", "South Asia", "Sub-Saharan Africa"))
#Create a histogram for Log GDP per Capita
filtered_hpv %>%
ggplot(aes(x = log_GDP_2017usd_per_cap)) +
geom_histogram(binwidth = 0.5, fill = 'blue', alpha = 0.5) +
labs(title = "Figure 1. Distribution of Log GDP per Capita", x = "Log GDP per Capita", y = "Count",
plot.title = element_text(hjust = 0.5))
# Create a scatter plot for Log GDP per Capita and Predicted ICER (we've labelled the plot by country as well)
scatter_plot <- ggplot(filtered_hpv, aes(x = log_GDP_2017usd_per_cap, y = predicted_icer_usd_over_gdp_pc, text = country)) +
geom_point(alpha = 0.5) +
labs(title = "Figure 2. Scatter Plot of Log GDP per Capita and Predicted ICER", x = "Log GDP per Capita", y = "Predicted ICER", plot.title.position = "plot")
# Convert ggplot to a plotly object
ggplotly(scatter_plot, tooltip = c("text"))
From the plot, we can see several outliers whose predicted ICER is beyond x3 their GDP (the upper limit of WHO’s cost-effectiveness threshold) - Somalia, Egypt, Palestine. This may indicate that implementing the HPV vaccine program in these countries could be economically challenging, given their low GDP and potentially high costs relative to expected health benefits.
# Create a boxplot for Predicted ICER for each Region
filtered_hpv %>%
ggplot(aes(x = region_name, y = predicted_icer_usd_over_gdp_pc)) +
geom_boxplot() +
labs(title = "Figure 3. Boxplot of Predicted ICER for each Region", x = "Region Name", y = "Predicted ICER",
plot.title = element_text(hjust = 0.5)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
We can see that some regions have little variability in their predicted ICERs, while others display high variability.
We construct a multiple linear regression model with the dependent variable being the predicted ICER and the independent variables being GDP per capita, GAVI eligibility, the burden of disease, and vaccine cost.
lm_model <- lm(predicted_icer_usd_over_gdp_pc ~ log_GDP_2017usd_per_cap + gavi_eligible + log_burden_variable +
log_vaccine_cost_2017usd,
data = filtered_hpv)
# Summary of the model to check the statistical details
summary(lm_model)
##
## Call:
## lm(formula = predicted_icer_usd_over_gdp_pc ~ log_GDP_2017usd_per_cap +
## gavi_eligible + log_burden_variable + log_vaccine_cost_2017usd,
## data = filtered_hpv)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.55014 -0.14967 -0.06400 0.08391 2.02278
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.19116 0.23625 9.275 < 2e-16 ***
## log_GDP_2017usd_per_cap -0.59671 0.02509 -23.784 < 2e-16 ***
## gavi_eligible -0.41570 0.09561 -4.348 2.24e-05 ***
## log_burden_variable -0.31879 0.03038 -10.492 < 2e-16 ***
## log_vaccine_cost_2017usd 0.45201 0.04355 10.380 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2762 on 190 degrees of freedom
## Multiple R-squared: 0.7586, Adjusted R-squared: 0.7535
## F-statistic: 149.2 on 4 and 190 DF, p-value: < 2.2e-16
The regression analysis revealed significant associations between the predicted ICER of the HPV vaccine and the considered variables:
1. Intercept
The model shows a statistically significant positive intercept of approximately 2.19. This means that when all the predictor variables are zero (or their baseline values), the predicted ICER is expected to be around 2.19 times the GDP per capita.
2. Log of GDP per Capita The log of GDP per capita has a negative coefficient of approximately -0.60. This means that as the GDP per capita of a country increases, the predicted ICER tends to decrease. In simpler terms, wealthier countries are expected to have a lower predicted ICER for the HPV vaccine.
3. GAVI Eligibility Being eligible for support from GAVI has a negative coefficient of approximately -0.42. This suggests that countries eligible for GAVI support tend to have a lower predicted ICER compared to non-eligible countries. GAVI support might help in reducing the cost of vaccination and improve cost-effectiveness.
4. Burden of the Disease The burden of the disease, represented by the log of a disease-related variable, has a negative coefficient of approximately -0.32. This indicates that as the burden of the disease increases, the predicted ICER tends to decrease. In other words, countries facing a higher disease burden might have a more cost-effective HPV vaccination program.
5. Log of Vaccine Costs The log of vaccine costs has a positive coefficient of approximately 0.45. This suggests that as the vaccine costs increase, the predicted ICER also tends to increase. Higher vaccine costs might lead to a higher predicted ICER for the HPV vaccine.
Model Performance The model’s performance is evaluated using the R-squared value, which indicates that about 75.86% of the variation in the predicted ICER can be explained by the variation in the predictor variables. The model appears to be a good fit for the data, given the significant R-squared value.
Overall, the model shows that GDP per capita, GAVI eligibility, disease burden, and vaccine costs are important factors influencing the predicted cost-effectiveness of the HPV vaccine. By understanding these relationships, policymakers and healthcare stakeholders can make informed decisions to improve the cost-effectiveness of HPV vaccination programs in different countries.
library(car)
# calculate VIF
vif_model <- vif(lm_model)
print(vif_model)
## log_GDP_2017usd_per_cap gavi_eligible log_burden_variable
## 3.492501 4.835314 1.532657
## log_vaccine_cost_2017usd
## 5.626995
A few countries, specifically Somalia, Egypt, and Palestine, have predicted ICERs that exceed three times their GDP per capita, suggesting significant economic challenges in implementing HPV vaccination programs.
Regions show varying levels of variability in their predicted ICERs, indicative of differing economic and health conditions.
The log of GDP per capita, GAVI eligibility, the log of burden variable, and the log of vaccine cost are significant predictors of ICERs, with varying degrees of influence.
Based on the Variance Inflation Factor (VIF) results, there is no strong evidence of multicollinearity among the predictors.
Our model only includes a selected set of variables and does not account for potentially significant factors such as country-specific health policies and societal attitudes towards vaccination.
Even though our predictors do not show strong multicollinearity, there might still be complex relationships between the variables not captured by VIF.
The data’s quality and consistency may differ as it combines various studies and registries.
The findings may not be applicable to all local contexts because of the unique health, economic, and societal conditions in different countries.
Countries with high ICERs relative to GDP per capita may need to explore alternative strategies or seek additional support for HPV vaccination programs.
For regions with high variability in ICERs, further investigations are recommended to understand the specific conditions that lead to this high variation.
Policies related to GAVI eligibility should be taken into consideration when designing HPV vaccination programs.
Attempts to increase cost-effectiveness could focus on improving economic conditions and reducing vaccine costs.
Future analyses could benefit from a more comprehensive approach that considers a wider range of factors.
Interpretation and application of the findings should always consider specific national or regional conditions.
Efforts to improve the quality and consistency of the data should be prioritized in future studies.
Institute for Health Metrics and Evaluation (IHME). HPV Vaccination Cost Effectiveness Estimates. Seattle, United States of America: Institute for Health Metrics and Evaluation (IHME), 2021. https://doi.org/10.6069/MGZ2-6413
Marseille E, Larson B, Kazi DS, Kahn JG, Rosen S. Thresholds for the cost-effectiveness of interventions: alternative approaches. Bull World Health Organ. 2015;93(2):118-124. doi:10.2471/BLT.14.138206