Project Milestone 6
Problem Statement
In our research on COVID-19 vaccinations and flu rates in California, we specifically investigate the correlation between the two within the Hispanic population. Focusing on Hispanics is essential because of their known health disparities, and understanding their vaccination and flu trends is crucial to addressing these inequities. Given the disproportionate impact of COVID-19 on minority communities, including Hispanics, our project contributes to public health equity by focusing on a demographic that requires targeted interventions for comprehensive healthcare access. The project aims to analyze and visualize the correlation between COVID-19 vaccination rates and flu infection risk across counties in California.
Methods
We utilized three primary datasets: Los Angeles Flu Data, California Flu Data, and COVID-19 Vaccine Data across all counties in California. First, the two provided flu datasets (LA County flu data and California flu data) were processed by handling missing values, recoding variables, and combining them. Specifically, we recoded the values of the variables “sex” and “race_ethnicity” to match across both datasets (the latter was key for later evaluating the Hispanic subpopulation), generated a “county” variable for the LA dataset, and renamed variables, including the “population” variable, to match across both sources. We also removed the word “County” from all values of the variable “county”, as this variable would later be used for merging with the vaccination data (which included only the county name, without the word “County”). Because both flu datasets had been cleaned to share common column names, the “bind_rows” command was used to stack the two datasets; the variables selected for further analysis and for generating additional variables were county, age_cat, sex, race_ethnicity, dt_diagnosis, pop, and new_infections. Once the datasets were combined and the variables of interest selected, a new variable called “quarter” was generated from the “dt_diagnosis” variable, based on the date cutoffs in the vaccination dataset, to ease later merging of the datasets. Finally, a flu rate per 100,000 population was generated for each combination of the demographic variable of interest (in this case, race_ethnicity), county, and quarter: we grouped by these variables, divided “new_infections” by “pop”, and multiplied by 100,000 to generate a variable named “flu_risk”.
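The flu-data steps above can be sketched as follows. This is a minimal illustration in Python rather than the project's own code (the mention of “bind_rows” suggests the analysis itself was done in R with dplyr). The quarter start dates, the function names, and the way “pop” is aggregated (each age/sex stratum counted once per group, on the assumption that “pop” repeats across diagnosis dates) are assumptions and are not stated in the report.

```python
from collections import defaultdict
from datetime import date

# ASSUMPTION: quarterly cutoffs; the report only gives the range 2022-07-01 to 2023-04-01.
QUARTER_STARTS = [date(2022, 7, 1), date(2022, 10, 1), date(2023, 1, 1), date(2023, 4, 1)]


def clean_county(name):
    """Drop a trailing ' County' so names match the vaccination data."""
    name = name.strip()
    suffix = " County"
    return name[: -len(suffix)] if name.endswith(suffix) else name


def assign_quarter(dt_diagnosis):
    """Return the latest quarter start on or before the date, or None if it is too early."""
    starts = [q for q in QUARTER_STARTS if q <= dt_diagnosis]
    return max(starts) if starts else None


def flu_risk_by_group(rows):
    """Compute flu risk per 100,000 for each (county, race_ethnicity, quarter).

    ASSUMPTION: 'pop' is constant for each age/sex stratum, so it is counted
    once per stratum, while 'new_infections' is summed over all diagnosis dates.
    """
    cases = defaultdict(int)
    strata_pop = defaultdict(dict)
    for r in rows:
        quarter = assign_quarter(r["dt_diagnosis"])
        if quarter is None:
            continue  # diagnosis date falls before the first quarter
        key = (clean_county(r["county"]), r["race_ethnicity"], quarter)
        cases[key] += r["new_infections"]
        strata_pop[key][(r["age_cat"], r["sex"])] = r["pop"]
    return {
        key: cases[key] / sum(strata_pop[key].values()) * 100_000
        for key in cases
    }
```

Each input row is a dictionary using the column names listed above (county, age_cat, sex, race_ethnicity, dt_diagnosis, pop, new_infections), with dt_diagnosis already parsed as a date.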
Similarly, the vaccination data was cleaned by addressing missing and erroneous values and ensuring compatibility with the flu data. This included recoding values of the demographic_value variable for race_ethnicity (by generating a new variable called “demographic_value_new”), converting the “quarter” variable to date format, and filtering the dataset to the time frame that matched the flu data (the vaccination data contained one more quarter beyond 2023-04-01 that was not in the flu data). This resulted in a dataset containing vaccination data ranging from 2022-07-01 to 2023-04-01. A COVID vaccination rate (“vax_rate”) was then calculated for each unique demographic category in each quarter and county by first grouping by these variables and then dividing “fully_vaccinated” by “estimated_pop”. Finally, the data was pivoted from long to wide in order to generate a column entitled “race_ethnicity”, which would later facilitate merging with the flu data, which contained the same variable. We filtered out any rows for which the value in the “fully_vaccinated” column was “NA”.
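The filtering and rate calculation described above can be illustrated with a short Python sketch. It is not the project's code: it assumes one row per county, quarter, and demographic category, that “quarter” arrives as an ISO-formatted string (YYYY-MM-DD), and that “NA” is represented as None. The function name add_vax_rate is hypothetical.

```python
from datetime import date

# Time frame shared with the flu data, as stated in the report.
FIRST_QUARTER = date(2022, 7, 1)
LAST_QUARTER = date(2023, 4, 1)


def add_vax_rate(rows):
    """Keep usable rows inside the shared time frame and add a 'vax_rate' column.

    ASSUMPTIONS: 'quarter' is an ISO date string, missing values are None,
    and each row is already one (county, quarter, demographic) group.
    """
    cleaned = []
    for r in rows:
        quarter = date.fromisoformat(r["quarter"])  # convert text to a date
        if not (FIRST_QUARTER <= quarter <= LAST_QUARTER):
            continue  # quarter not covered by the flu data
        if r["fully_vaccinated"] is None:
            continue  # drop rows where fully_vaccinated is NA
        if not r["estimated_pop"]:
            continue  # a rate cannot be computed without a population
        cleaned.append({
            **r,
            "quarter": quarter,
            "vax_rate": r["fully_vaccinated"] / r["estimated_pop"],
        })
    return cleaned
```

Here vax_rate is a proportion between 0 and 1, so the 70% coverage threshold used later corresponds to 0.70.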
Finally, the cleaned datasets described above were joined using an inner join on the variables “county”, “quarter”, and “race_ethnicity”; the variables county, quarter, estimated_pop, fully_vaccinated, vax_rate, race_ethnicity, pop, new_infections, and flu_risk were included in the final dataset. When patterns specific to the Hispanic subpopulation in the quarter starting on 1/1/2023 began to emerge, we filtered for this quarter and race_ethnicity to explore them in more depth. The final project results are presented using visualizations, including interactive tables and plots, that explore patterns and correlations between COVID vaccination rates and flu rates derived from these merged datasets in this subpopulation.
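As a rough illustration of the merge, the Python sketch below performs an inner join on the three key variables and then filters to one subgroup. It assumes that each key combination appears at most once in the vaccination data; the function names are hypothetical and this is not the project's own code.

```python
JOIN_KEYS = ("county", "quarter", "race_ethnicity")


def inner_join(flu_rows, vax_rows, keys=JOIN_KEYS):
    """Keep only flu rows whose key also appears in the vaccination rows.

    ASSUMPTION: each key combination is unique within vax_rows.
    """
    vax_index = {tuple(v[k] for k in keys): v for v in vax_rows}
    joined = []
    for f in flu_rows:
        match = vax_index.get(tuple(f[k] for k in keys))
        if match is not None:
            joined.append({**match, **f})  # merge columns from both sources
    return joined


def filter_subgroup(rows, race_ethnicity, quarter):
    """Select one race/ethnicity category in one quarter."""
    return [
        r for r in rows
        if r["race_ethnicity"] == race_ethnicity and r["quarter"] == quarter
    ]
```

Because the join is an inner join, any county, quarter, and race/ethnicity combination that is missing from either source drops out of the final dataset, which is why the county names and quarter values had to be harmonized first.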
Results
Table 1 presents an interactive display showcasing flu and vaccination data across all counties in California, categorized by different levels of race/ethnicity and by quarter. To enhance visibility, we have highlighted groups with less than 70% COVID-19 vaccination coverage in yellow. The table is sorted based on the flu risk per 100,000 population, and an additional column indicates whether the infection risk for a specific race/ethnicity category in a given quarter exceeds the average infection risk for that group across all quarters. Notably, the Hispanic (any race) and Multiracial categories consistently exhibit vaccination rates below 70%, coupled with elevated flu risks per 100,000 in various quarters.
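The two indicators in Table 1 can be sketched in Python as follows. This is an illustration only: it assumes that vax_rate is stored as a proportion, that the group average is taken over all rows sharing a race/ethnicity category, and that the table is sorted from highest to lowest flu risk (the report does not state the direction). The function and column names low_coverage and above_group_mean are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def flag_rows(rows, coverage_threshold=0.70):
    """Add the highlight and above-average indicators, then sort by flu risk.

    ASSUMPTIONS: the average is computed per race/ethnicity category across
    all of its rows, and sorting is in descending order of flu risk.
    """
    risks_by_group = defaultdict(list)
    for r in rows:
        risks_by_group[r["race_ethnicity"]].append(r["flu_risk"])
    group_mean = {g: mean(v) for g, v in risks_by_group.items()}

    flagged = [
        {
            **r,
            # True for rows that would be highlighted in yellow
            "low_coverage": r["vax_rate"] < coverage_threshold,
            # True when this row exceeds its group's average flu risk
            "above_group_mean": r["flu_risk"] > group_mean[r["race_ethnicity"]],
        }
        for r in rows
    ]
    return sorted(flagged, key=lambda r: r["flu_risk"], reverse=True)
```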
The bar graph (Plot 1) illustrates the flu infection rate per 100,000 population across all counties, grouped by race and quarter. While certain race/ethnicities, such as Alaska Native, White (Non-Hispanic), and Hispanic (any race), consistently demonstrate higher infection rates throughout all quarters, the data reveals a notable peak in infections during the quarter starting on 1/1/2023, as indicated by the medium blue segment in each bar.
Considering the trends observed in Table 1 and the consistent flu rates across races, with a peak in the quarter starting on 1/1/2023 (as shown in Plot 1), Plot 2 focuses specifically on the Hispanic (any race) population. This visualization aims to explore potential correlations between COVID vaccination rates and flu within this specific group during the quarter with the highest infection rates. The plot reveals variations in flu rates, ranging from 32.37K per 100,000 in San Mateo County to 34.92K per 100,000 in Calaveras County for the Hispanic (any race) population.
Plot 3 investigates the relationship between the flu rate by county for the Hispanic (any race) population in the quarter starting on 1/1/2023 and the proportion of that population that is fully vaccinated against COVID-19. To investigate this potential correlation, we plotted the proportion of the population vaccinated against the rate of flu infection. Within Hispanic populations in the quarter starting on 1/1/2023, there may be a pattern of lower flu infection rates per 100,000 among populations with higher proportions fully vaccinated against COVID-19, as shown by the blue descending line of best fit.
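A line of best fit of the kind shown in the plot can be computed with ordinary least squares, and the strength of the linear association can be summarized with the Pearson correlation coefficient. The Python sketch below shows one way to do this. The report does not state how its line was fitted, so the choice of ordinary least squares is an assumption, and fit_line is a hypothetical helper.

```python
from math import sqrt


def fit_line(x, y):
    """Return (slope, intercept, r) for a least-squares line y = slope * x + intercept.

    x: vaccination proportions by county; y: flu rates per 100,000.
    r is the Pearson correlation coefficient; a negative r matches a
    descending line of best fit.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary to fit a line")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r
```

A negative slope and a negative r would both correspond to the descending pattern described above; r alone describes only the strength of a linear association across counties.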
Discussion
The findings of our research offer insights into the correlation between COVID-19 vaccination rates and flu infection risks within the Hispanic population in California. The interactive table (Table 1) highlights disparities in vaccination coverage, particularly among the Hispanic (any race) and Multiracial categories, where less than 70% of the population is vaccinated. These groups also exhibit elevated flu risks per 100,000 population in various quarters, emphasizing the vulnerability of certain demographic segments. Plot 1 illustrates a marked increase in flu infections per 100,000 across all races in the quarter starting on 1/1/2023. Examining the Hispanic (any race) population in Plot 2 reveals varying flu rates by county in the same quarter. Plot 3 suggests a potential inverse correlation between COVID-19 vaccination rates and flu infections in Hispanic populations, indicating that higher vaccination rates may be associated with lower flu risks, which may point to a protective association with vaccination.