Background

The 20th century saw great reductions in the incidence of diseases such as polio through the use of vaccines as a preventive measure.

Unfortunately, in the 21st century we are now seeing a reemergence of vaccine-preventable diseases as the anti-vaccination movement gains more traction in certain segments of society.

This is a major problem for public health.

In this project, I will use data on the incidence of vaccine-preventable diseases (including diphtheria, pertussis, etc.) to document how the incidence of these diseases has changed over roughly the past two decades (2001-2017).

I will also place these case counts in a geographic context, which may highlight where vaccine-preventable diseases are especially prevalent (and, possibly, where anti-vaccination sentiment is more common).

The analysis will be restricted to the state of California, which publishes a dataset documenting vaccine-preventable disease cases over a long time span and in great detail.

Data

Main data

This data is available from the following link:

https://data.chhs.ca.gov/dataset/vaccine-preventable-disease-cases-by-county-and-year

From the website:

“These data contain counts of Immunization Branch-related disease cases among California residents by county, disease, and year.

The California Department of Public Health (CDPH) maintains a mandatory, passive reporting system for a list of communicable disease cases and outbreaks. The CDPH Immunization Branch conducts surveillance for vaccine preventable diseases. Health care providers and laboratories are mandated to report cases or suspected cases of these communicable diseases to their local health department (LHD). LHDs are also mandated to report these cases to CDPH.”

This dataset contains information on all 58 counties in California, plus a statewide total across all counties.

Diseases include: Diphtheria; Hepatitis A; Hepatitis B, acute; Hepatitis C, acute; Invasive Meningococcal Disease; Measles; Mumps; Pertussis; Rubella; Tetanus; and Varicella Hospitalizations.

The data span 2001-2017 overall. However, mumps, rubella, and varicella hospitalizations were not recorded in this dataset until 2010, and none of the three forms of hepatitis was recorded until 2011.

Supplementary data

I may need to supplement this data with population totals per county per year from 2001-2017, since adjusting case counts by county population would give a fairer comparison between large and small counties. I have not located this supplementary data yet, but I expect it to be readily available.
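To make the population adjustment concrete, here is a minimal sketch of converting a raw case count into a rate per 100,000 residents. The function name, the dictionary layout, and all of the numbers are my own placeholders for illustration; they are not taken from the CDPH dataset or from any population source.

```python
def rate_per_100k(cases: int, population: int) -> float:
    """Return the number of cases per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population


# Made-up numbers for illustration only (not real CDPH or census figures).
# Keys are (county, year) pairs; "County X" and "County Y" are placeholders.
example_counts = {("County X", 2015): 12, ("County Y", 2015): 12}
example_population = {("County X", 2015): 50_000, ("County Y", 2015): 1_000_000}

# Same raw count in both counties, but very different per-capita rates.
example_rates = {
    key: rate_per_100k(example_counts[key], example_population[key])
    for key in example_counts
}
```

With these placeholder values, the two counties report the same number of cases, yet the smaller county has a rate twenty times higher, which is the kind of difference a map of raw counts would hide.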

Analysis

With three variables here (disease/county/year), there are many possible visualizations.

One obvious choice would be a simple map, colored by relative prevalence of vaccine-preventable diseases. This could be either across diseases/timepoints (e.g. cumulative from 2001-2017), or specific to a given disease and/or year.

This map could also be generated based on user input for a given disease and/or year, in a Shiny or Dash app.

If we wanted to focus on the change over time (e.g. to show an increase in cases over time), we could also make a simple line plot with time on the x-axis and cases on the y-axis. Again, this plot could be across diseases and counties, or specific to a particular disease and/or county. We may also want to compare two counties, or a given county to the statewide values. All of these possibilities could be rendered either as static plots or interactively, based on user input of disease and/or county.
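The data preparation behind such a line plot could look roughly like the sketch below. It assumes each record has the fields "disease", "county", "year", and "count", and that the statewide total is stored as a row whose county is "California"; both the field names and that label are my assumptions and would need to be checked against the actual file. The sample rows are made up.

```python
from collections import defaultdict


def yearly_cases(rows, disease=None, county=None, statewide_label="California"):
    """Sum case counts per year, optionally filtered by disease and/or county.

    When no county is given, the statewide total row is skipped so that
    cases are not counted twice (once per county and once in the total).
    """
    totals = defaultdict(int)
    for row in rows:
        if disease is not None and row["disease"] != disease:
            continue
        if county is None:
            if row["county"] == statewide_label:
                continue
        elif row["county"] != county:
            continue
        totals[row["year"]] += row["count"]
    years = sorted(totals)
    return years, [totals[year] for year in years]


# Made-up rows for illustration only (hypothetical field names and values).
sample_rows = [
    {"disease": "Pertussis", "county": "County X", "year": 2010, "count": 3},
    {"disease": "Pertussis", "county": "County Y", "year": 2010, "count": 5},
    {"disease": "Pertussis", "county": "California", "year": 2010, "count": 8},
    {"disease": "Pertussis", "county": "County X", "year": 2011, "count": 4},
    {"disease": "Measles", "county": "County X", "year": 2011, "count": 1},
]

# x-values (years) and y-values (cases) for one disease across all counties.
years, cases = yearly_cases(sample_rows, disease="Pertussis")
```

The returned pair of lists can be passed directly to a plotting function as the x and y values. Calling the function once per county (or once with the statewide label) gives the series needed to compare two counties, or a county against the state, on the same axes.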

A final possibility would be to compare geographic patterns over time. For example, perhaps we have a hypothesis that those with anti-vaccine sentiments now tend to settle more in county X, whereas before they used to settle more in county Y, and that this has led to more vaccine-preventable disease cases in county X, while cases in county Y have stayed the same or decreased. For this, a pair of side-by-side maps from the earlier vs. later year would be helpful. Again, this could be either a static output, or created based on a user-specified disease and/or pair of years.
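As a complement to the side-by-side maps, the change between the two years could also be computed per county, as in the sketch below. The field names ("disease", "county", "year", "count") are again my assumptions about the file layout, and the rows are made up. Counties lacking a record for either year are dropped, which matters here because some diseases were not recorded until 2010 or 2011.

```python
def change_by_county(rows, disease, early_year, late_year):
    """Return {county: late-year count minus early-year count} for one disease.

    Only counties with a record in both years are included.
    """
    counts = {}
    for row in rows:
        if row["disease"] != disease:
            continue
        if row["year"] not in (early_year, late_year):
            continue
        by_year = counts.setdefault(row["county"], {})
        by_year[row["year"]] = by_year.get(row["year"], 0) + row["count"]
    return {
        county: by_year[late_year] - by_year[early_year]
        for county, by_year in counts.items()
        if early_year in by_year and late_year in by_year
    }


# Made-up rows for illustration only (hypothetical field names and values).
sample_rows = [
    {"disease": "Measles", "county": "County X", "year": 2012, "count": 1},
    {"disease": "Measles", "county": "County X", "year": 2017, "count": 6},
    {"disease": "Measles", "county": "County Y", "year": 2012, "count": 4},
    {"disease": "Measles", "county": "County Y", "year": 2017, "count": 4},
    {"disease": "Measles", "county": "County Z", "year": 2017, "count": 2},
]

example_change = change_by_county(sample_rows, "Measles", 2012, 2017)
```

A single map colored by this difference would be an alternative to the pair of maps, at the cost of hiding the absolute levels in each year.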