Studying how changes in funding relate to the growth of HPV-related cancer research is critical to improving women’s health globally. Human Papillomavirus (HPV) is one of the most common sexually transmitted infections and a leading cause of cervical cancer, a significant contributor to mortality among women (OASH, 2022). Worldwide, high-risk HPVs cause about 5% of all cancers. In the United States, high-risk HPVs cause 3% of all cancers in women and 2% of all cancers in men (NIH, 2023).
To address this urgent problem, substantial funding has been directed to this research area. In the United States, the National Institutes of Health (NIH) provides the majority of federal funding for research on HPV-related cancer. From 2012 to 2022, NIH spent over $448 million on HPV and/or Cervical Cancer Vaccines, and in 2022 alone it allocated $61 million, a 32.6% increase over 2021 (NIH, 2023).
Key Focus Areas
Each visualization includes detailed observations about what the data reveals, why these insights are meaningful, and key takeaways.
This dataset integrates information from three primary sources to analyze NIH funding, HPV vaccine coverage, and cancer mortality across the United States between 2012 and 2021. I first preprocessed the data by filtering project abstracts for the keywords “HPV” or “Human Papillomavirus” together with “Cancer” to identify relevant research projects. I then cleaned, linked, and visualized the data and performed text analysis. These preprocessing steps help ensure accuracy, consistency, and comparability across datasets.
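The keyword filter and the basic cleaning steps (restricting years to the study period, converting them to integers, dropping incomplete rows) can be sketched as follows. This is a minimal sketch, not the code actually used here: it assumes the rule means (“HPV” OR “Human Papillomavirus”) AND “Cancer”, matched case-insensitively, and that each record is a dict with a `year` field. The names `is_relevant` and `clean_rows` are hypothetical.

```python
import re

# Assumed rule: ("HPV" OR "Human Papillomavirus") AND "cancer", case-insensitive.
# \b keeps "HPV" from matching inside longer tokens.
HPV_PATTERN = re.compile(r"\bHPV\b|human papillomavirus", re.IGNORECASE)
CANCER_PATTERN = re.compile(r"cancer", re.IGNORECASE)  # also matches "cancers"


def is_relevant(abstract: str) -> bool:
    """Return True if an abstract satisfies the assumed HPV/cancer keyword rule."""
    return bool(HPV_PATTERN.search(abstract)) and bool(CANCER_PATTERN.search(abstract))


def clean_rows(rows: list[dict], start: int = 2012, end: int = 2021) -> list[dict]:
    """Convert years to int, keep start..end, and drop rows with missing or invalid values."""
    cleaned = []
    for row in rows:
        try:
            year = int(row["year"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric year
        if not start <= year <= end:
            continue  # outside the study period
        if any(value is None or value == "" for value in row.values()):
            continue  # some other field is missing
        cleaned.append({**row, "year": year})
    return cleaned


# Usage with made-up records: only the first project survives both steps.
projects = [
    {"abstract": "HPV-driven oropharyngeal cancer outcomes.", "state": "CA", "year": "2015"},
    {"abstract": "Human papillomavirus vaccine uptake in teens.", "state": "TX", "year": "2016"},
    {"abstract": "Cervical cancer screening with HPV testing.", "state": "NY", "year": "2011"},
]
relevant = [p for p in projects if is_relevant(p["abstract"])]
print(clean_rows(relevant))
```

Word boundaries around “HPV” avoid false matches inside unrelated tokens, while the plain “cancer” pattern deliberately also catches plurals such as “cancers”.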
Across datasets, I restricted records to the study period (2012–2021), converted years to integers, and removed rows with missing or invalid data. The vaccination data cover adolescents aged 13–17 and include HPV vaccination rates.
Chart Description: This scatter plot illustrates the relationship between NIH funding (in U.S. dollars, log scale) and cancer deaths (log scale) across all states and years (2012–2021). Each point represents a state-year observation, and color shading indicates the year.
Why it’s interesting: The log-log scale makes highly skewed data easier to read, revealing subtle trends and clusters that a linear-scale plot would hide. This chart explores whether greater NIH funding corresponds to lower cancer mortality across U.S. states over time.
Key insights:
No clear inverse correlation: Although intuitively more funding might be expected to reduce mortality, the scatter plot does not exhibit a strong negative trend. Cancer deaths remain high even with substantial NIH investment in many cases.
High funding, high mortality clusters: Numerous observations fall in the upper right quadrant (i.e., high NIH funding and high cancer deaths), likely corresponding to larger states (e.g., California, Texas, New York) where both funding needs and disease burden are greater because of population size.
Lower funding does not guarantee fewer deaths: Some points with low NIH funding still show high cancer deaths, possibly reflecting underinvestment in high-burden areas, late-stage disease detection, or disparities in healthcare access.
Temporal patterning is subtle: Year-based color gradients are not strongly separated, suggesting no sharp year-to-year changes in the relationship between funding and mortality. However, slightly lighter blue shades (later years) appear more concentrated in the mid-to-high funding range, indicating a possible realignment of NIH investments.
Policy implication: NIH funding alone may not directly reduce cancer deaths at the state level unless paired with effective implementation strategies—such as early screening programs, equitable care access, and HPV vaccination coverage.
Chart Description: This scatter plot shows the relationship between HPV vaccine coverage (%) and cancer deaths across U.S. states and years (2012–2021). Each dot represents a state-year observation, colored by year. A linear regression line is overlaid to indicate the overall trend between these two variables.
Why it’s interesting: This visualization examines whether increasing HPV vaccination coverage at the state level is associated with a decline in total cancer deaths, particularly deaths from HPV-related cancers such as cervical cancer. While vaccines are effective at the individual level, this chart investigates whether that impact is observable at the population level across a decade.
Key insights:
No strong downward trend observed: The regression line is nearly flat, suggesting no clear inverse correlation between HPV vaccination rates and total cancer deaths in the aggregated data.
Wide spread at every level: Across all HPV coverage levels (from <25% to >75%), cancer deaths vary significantly—from fewer than 500 to more than 3000—indicating that many other factors influence overall cancer mortality (e.g., age distribution, screening rates, access to care).
Persistent high-death clusters: Densely grouped points around 3000 deaths likely represent large states with higher baseline mortality due to population size rather than low vaccination.
Low-coverage, low-death cases: Some states with low HPV coverage still report low cancer deaths, possibly due to small populations or fewer HPV-related cancer cases.
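The “wide spread at every level” point can be checked directly by grouping observations into coverage bins and reporting the smallest and largest death count in each. A minimal sketch; the 25-point bin edges and the helper names are hypothetical choices, not taken from the original chart.

```python
from collections import defaultdict


def coverage_bin(pct: float) -> str:
    """Assign a coverage percentage to a 25-point bin (hypothetical edges)."""
    if pct < 25:
        return "0-25%"
    if pct < 50:
        return "25-50%"
    if pct < 75:
        return "50-75%"
    return "75-100%"


def death_spread_by_bin(coverage: list[float], deaths: list[int]) -> dict[str, tuple[int, int, int]]:
    """Return {bin: (min deaths, max deaths, count)} across state-year observations."""
    groups: dict[str, list[int]] = defaultdict(list)
    for pct, d in zip(coverage, deaths):
        groups[coverage_bin(pct)].append(d)
    return {b: (min(v), max(v), len(v)) for b, v in sorted(groups.items())}


# Usage with made-up state-year values.
spread = death_spread_by_bin([20, 30, 60, 65, 80], [400, 3100, 450, 3200, 500])
print(spread)
```

A bin whose minimum and maximum differ several-fold shows variation that a single near-flat regression line cannot convey.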
Chart Description: This scatter plot displays the relationship between NIH funding (in dollars, on a log scale) and HPV vaccine coverage (%) across all U.S. states from 2012 to 2021. Each dot represents a state-year pair, with color intensity indicating the year. A regression line is added to highlight the overall trend.
Why it’s interesting: This chart investigates whether increased federal research investment (via NIH) is associated with greater public health action—in this case, HPV vaccination rates. It offers a view into the indirect translation of biomedical funding into population-level prevention.
Key insights:
Positive correlation present: The upward slope of the regression line suggests a moderate positive association—states receiving more NIH funding tend to report higher HPV vaccine coverage.
Heavy clustering in mid-funding, mid-coverage range: Most observations cluster between $100,000 and $10 million in NIH funding and between 50% and 70% vaccine coverage, indicating typical performance ranges.
Wide variance at all funding levels: Despite the overall trend, vaccine coverage varies substantially even among states with similar NIH support—suggesting other factors (e.g., state policies, public trust, outreach) heavily influence vaccine uptake.
Temporal trends: Later years (lighter blue dots) dominate higher funding and coverage zones, indicating national progress in both investment and immunization between 2012 and 2020.
Policy insight: While NIH funding likely supports awareness and infrastructure, increasing HPV vaccine coverage appears to require more than just funding—including state-level mandates, education campaigns, and provider engagement.
Chart Description: This heatmap visualizes HPV vaccine coverage (%) by U.S. state and year (2012–2021). Each horizontal row represents a state, and each vertical tile represents a year. The color gradient—from dark purple to yellow—indicates the percentage of adolescent vaccine coverage in that state-year.
Why it’s interesting: This chart provides a clear, comparative view of how HPV vaccine adoption has evolved across states and over time. The use of color allows quick identification of leading and lagging states, trends in policy effectiveness, and the presence of data gaps.
Key insights:
Overall upward trend: Many states move from purple to orange/yellow tones over the years, indicating consistent improvements in HPV vaccine uptake across the U.S.
Wide state-level disparity: Some jurisdictions (e.g., RI, MA, DC) consistently report high coverage rates (75–85%), while others (e.g., MS, WY, UT) show persistently low coverage or missing data, suggesting disparities in public health outreach, mandates, or cultural acceptance.
Gaps in reporting: White tiles indicate missing data, which occur more often in early years and in certain states. This may reflect lower data collection compliance or reporting delays.
Notable late adopters: A few states show slower initial uptake but accelerated progress around 2016–2019, potentially reflecting policy changes or targeted campaigns.
Regional consistency: Neighboring states sometimes share similar coverage patterns, hinting at regional policy alignment (e.g., Northeastern states consistently outperform Southern and Mountain West states).
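The heatmap is essentially a pivot of (state, year, coverage) records into a state-by-year grid in which any state-year without a record becomes a white tile. A minimal sketch that also counts missing cells per year, one way to check whether gaps cluster in early years; the record layout and function names are assumptions.

```python
def build_grid(records, states, years):
    """Pivot (state, year, coverage) records into {state: [coverage per year]}.

    A state-year with no record is stored as None, i.e. a white tile.
    """
    lookup = {(state, year): cov for state, year, cov in records}
    return {state: [lookup.get((state, year)) for year in years] for state in states}


def missing_by_year(grid, years):
    """Count the missing (None) cells in each year column."""
    return {year: sum(row[i] is None for row in grid.values()) for i, year in enumerate(years)}


# Usage with made-up values.
records = [("RI", 2012, 60.0), ("RI", 2013, 70.0), ("MS", 2013, 30.0)]
years = [2012, 2013]
grid = build_grid(records, ["RI", "MS"], years)
print(missing_by_year(grid, years))
```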
Chart Description: This pairwise regression matrix displays the relationships between NIH funding (log-transformed), HPV vaccine coverage, and cancer deaths (log-transformed) across all U.S. states from 2012 to 2021. Each cell shows either a correlation coefficient, a density distribution, or a regression trend—colored by year to highlight temporal patterns.
Why it’s interesting: This chart provides a comprehensive multivariate view of how these three core public health variables interact. The year-based coloring allows us to track how these relationships evolve over time and whether stronger associations emerge in more recent years. The matrix design simplifies simultaneous comparison of correlation strength, linear fit, and distributional behavior.
Key insights:
Weak direct relationship between funding and cancer deaths: Across all years, there is little to no correlation between NIH funding and cancer mortality. This suggests that simply increasing funding may not directly reduce cancer deaths without targeted strategies and long-term follow-up.
Positive link between funding and vaccine coverage: While still moderate, the correlation between NIH funding and HPV vaccine coverage shows a consistent upward trend. States receiving more funding tend to show better vaccine uptake—highlighting the indirect role of research investment in promoting prevention.
Emerging negative correlation between vaccine coverage and deaths: In recent years (2020 and 2021), there is a visibly stronger negative correlation between HPV vaccine coverage and cancer deaths (e.g., −0.240 in 2021***), suggesting that public health benefits of HPV vaccination may be starting to manifest at the population level.
Temporal and regional variations: The density plots and color-separated regression lines reveal that state-level behaviors and outcomes vary significantly over time. Earlier years show more scattered patterns, while later years show emerging structure.
Visual layering adds context: The diagonal density plots show the skewed nature of funding and mortality data, while the colored regression fits reveal subtle trends that might be missed in simple scatterplots or summary statistics.
Chart Description: This line chart shows total NIH funding across all U.S. states for each year from 2012 to 2021. The y-axis represents the total dollar amount of NIH grants awarded in each year, displayed in a scaled currency format.
Why it’s interesting: This visualization provides a high-level overview of how federal investment in biomedical and public health research—specifically related to HPV and cancer—has evolved over time. Identifying peaks, troughs, and patterns helps contextualize policy shifts, health crises, and funding priorities.
Key insights:
Early decline (2012–2016): A gradual drop in total NIH funding occurred between 2012 and 2016, possibly reflecting post-recession budget constraints or restructuring in federal research priorities.
Funding rebound (2017–2018): Funding rose sharply in 2018, suggesting a significant reinvestment in cancer and vaccine-related research. This could be attributed to renewed federal initiatives or expansion of HPV-related studies.
Late decline (2019–2021): Funding fell sharply from 2019 to 2021. The COVID-19 pandemic, which began in 2020, likely contributed to the 2020–2021 portion through budget reallocations toward emergency response, clinical trial delays, or postponed research cycles. However, the drop beginning in 2019 predates the pandemic, so other factors are likely also at play.
Long-term volatility: The jagged trendline highlights volatility in NIH support year-over-year. This could be due to changes in political administrations, grant program cycles, or shifting national priorities in biomedical research.
Policy implication: Fluctuating funding levels can impact long-term research planning, especially in public health areas requiring consistent investment (e.g., HPV prevention, vaccine innovation, community outreach).
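The year-over-year changes discussed above (like the 32.6% increase cited in the introduction) follow the formula (current − previous) / previous × 100 applied to annual totals. A minimal sketch, assuming grants arrive as (year, amount) pairs; the helper names are hypothetical.

```python
from collections import defaultdict


def annual_totals(grants: list[tuple[int, float]]) -> dict[int, float]:
    """Sum award amounts by year."""
    totals: dict[int, float] = defaultdict(float)
    for year, amount in grants:
        totals[year] += amount
    return dict(sorted(totals.items()))


def yoy_change(totals: dict[int, float]) -> dict[int, float]:
    """Percent change versus the previous year present in `totals`.

    Formula: (current - previous) / previous * 100; a zero previous total is skipped.
    """
    years = sorted(totals)
    return {
        cur: (totals[cur] - totals[prev]) / totals[prev] * 100
        for prev, cur in zip(years, years[1:])
        if totals[prev]
    }


# Usage with made-up award amounts (in millions of dollars).
totals = annual_totals([(2019, 80.0), (2020, 30.0), (2020, 20.0), (2021, 40.0)])
print(yoy_change(totals))
```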
Chart Description: This line chart displays average HPV vaccine coverage by U.S. Census region (excluding missing/undefined entries) between 2012 and 2021.
Why it’s interesting: It allows clear comparison of regional adoption rates of the HPV vaccine, reflecting differences in public health investment, school mandates, and outreach.
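One way to build this chart is to map each state to its Census region and average the available coverage values per region and year, skipping missing entries. A minimal sketch: the mapping follows the four U.S. Census Bureau regions, while the row layout and helper names are assumptions. It computes an unweighted mean of state values, so a small state counts as much as a large one; a population-weighted average could differ.

```python
from collections import defaultdict

# The four U.S. Census Bureau regions (50 states plus DC), by postal abbreviation.
CENSUS_REGIONS = {
    "Northeast": "CT ME MA NH RI VT NJ NY PA",
    "Midwest": "IL IN MI OH WI IA KS MN MO NE ND SD",
    "South": "DE FL GA MD NC SC VA DC WV AL KY MS TN AR LA OK TX",
    "West": "AZ CO ID MT NV NM UT WY AK CA HI OR WA",
}
STATE_TO_REGION = {
    state: region for region, states in CENSUS_REGIONS.items() for state in states.split()
}


def regional_average(rows):
    """Unweighted mean coverage per (region, year).

    Rows are (state, year, coverage); rows with missing coverage or a code outside
    the mapping (e.g. a territory) are skipped as missing/undefined entries.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (region, year) -> [sum, count]
    for state, year, coverage in rows:
        region = STATE_TO_REGION.get(state)
        if region is None or coverage is None:
            continue
        acc = sums[(region, year)]
        acc[0] += coverage
        acc[1] += 1
    return {key: total / n for key, (total, n) in sorted(sums.items())}


# Usage with made-up values; PR and the missing TX value are excluded.
rows = [("MA", 2020, 80.0), ("RI", 2020, 90.0), ("MS", 2020, 40.0), ("PR", 2020, 50.0), ("TX", 2020, None)]
print(regional_average(rows))
```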
Key insights:
Chart Description: This horizontal bar chart presents the total number of cancer deaths in each U.S. state from 2012 to 2021. States are ranked in descending order, making it easy to identify those with the highest and lowest cumulative cancer mortality.
Why it’s interesting: This chart offers a clear visual summary of which states bear the heaviest absolute cancer burden over the past decade. Unlike rates or per capita measures, this view emphasizes where the largest raw numbers of cancer deaths occur, often reflecting population size, healthcare access, and disease detection systems.
Key insights:
Top contributors: California, New York, Texas, and Florida clearly stand out, with California reporting substantially more deaths than any other state, likely due to its large population and extensive cancer tracking systems.
Regional distribution: Many Northeastern and Midwestern states (e.g., Pennsylvania, Ohio, Illinois) also fall in the top half, reflecting their large populations and possibly high cancer incidence.
Long tail of low totals: A steep drop-off occurs after the top 10 states, with a long tail of states reporting far fewer deaths—these include low-population states like Wyoming, Vermont, and North Dakota.
Policy relevance: While this chart doesn’t normalize for population size, it provides a foundation for evaluating where absolute impact is greatest—and where large-scale prevention and care programs may have the biggest effect.
Next step: To refine analysis, consider comparing this with per capita cancer death rates or plotting cancer deaths vs. NIH funding to assess resource alignment.
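The per capita comparison suggested above can be sketched as deaths per 100,000 person-years: if deaths are summed over 2012–2021, the denominator should be population summed over the same years, which makes the result an average annual rate. A minimal sketch; the input dictionaries and helper names are hypothetical, and the example values are made up.

```python
def rate_per_100k(deaths: dict[str, float], person_years: dict[str, float]) -> dict[str, float]:
    """Deaths per 100,000 person-years for states present in both inputs.

    `person_years` is assumed to be the state's population summed over the same
    years as `deaths`, which makes the result an average annual rate.
    """
    return {
        state: deaths[state] / person_years[state] * 100_000
        for state in deaths
        if person_years.get(state)
    }


def rank_desc(values: dict[str, float]) -> list[str]:
    """State keys ordered from largest to smallest value."""
    return sorted(values, key=values.get, reverse=True)


# Made-up example: a large state and a small one.
deaths = {"Large": 5_000, "Small": 1_500}
person_years = {"Large": 10_000_000, "Small": 1_000_000}
rates = rate_per_100k(deaths, person_years)
print(rank_desc(deaths), rank_desc(rates))
```

The example shows how the ranking can flip once population is accounted for: the state with more total deaths has the lower rate.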
Chart Description: This choropleth map visualizes total cancer mortality counts across U.S. states from 2012 to 2021. The color gradient—ranging from dark purple (low) to bright yellow (high)—indicates the total number of cancer-related deaths reported in each state over the 10-year period.
Why it’s interesting: This map offers an intuitive geographic overview of where the cancer burden has been highest in absolute terms. It allows public health officials, researchers, and policymakers to quickly identify the most affected areas and consider the distribution of federal and state-level resources.
Key insights:
High-death states: California, Texas, New York, and Florida exhibit the highest total cancer deaths. These states have both large populations and major urban centers, contributing to their high totals.
Northeast concentration: States like Pennsylvania, New Jersey, and Massachusetts also show notably high mortality, reflecting both population density and possibly higher diagnostic/reporting completeness.
Low-total states: Several states in the Midwest, Mountain West, and Great Plains regions appear in very dark colors, indicating lower total deaths. Because these states have small populations, low totals can still mask a high per capita burden.
Data anomalies or gaps: Grey areas (e.g., Alaska, Wyoming, Vermont) may represent missing or incomplete data, or aggregation issues during the merging process. These should be flagged and cleaned before comparative interpretation.
Policy implications: This map can be used to guide regional prioritization of screening, education, and prevention initiatives—especially when combined with funding and vaccination data layers.
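One way to flag the grey areas before interpretation is to normalize state keys, join the sources, and list every expected state that ends up without a value. A minimal sketch, assuming each source is a {state: value} dict keyed by postal abbreviation; inconsistent key spellings are only one possible cause of gaps (full names versus abbreviations would need a lookup table), and the helper names are hypothetical.

```python
def normalize(source: dict[str, float]) -> dict[str, float]:
    """Strip whitespace and upper-case state keys so 'vt ' and 'VT' match."""
    return {key.strip().upper(): value for key, value in source.items()}


def merge_on_state(left: dict[str, float], right: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Inner join of two sources on normalized state keys."""
    a, b = normalize(left), normalize(right)
    return {state: (a[state], b[state]) for state in a.keys() & b.keys()}


def flag_missing(expected: list[str], merged: dict) -> list[str]:
    """Expected states with no merged value, i.e. candidates for grey areas."""
    return sorted(state for state in expected if state not in merged)


# Usage with made-up values and messy keys.
deaths = {"CA": 100.0, "vt ": 5.0, "AK": 7.0}
funding = {"CA": 1.0, "VT": 0.5}
merged = merge_on_state(deaths, funding)
print(flag_missing(["AK", "CA", "VT", "WY"], merged))
```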