PROBLEM STATEMENT:

The California Department of Public Health has been tasked with overseeing and managing a simulated outbreak of a novel infectious respiratory disease affecting every county in California except Los Angeles County. The goal is to understand the course of the outbreak and its impact across geographic regions and demographic groups, and to identify any disproportionate effects on specific populations.

To tackle this challenge, it is crucial to track the outbreak’s progression over time, pinpoint high-risk groups based on severity and morbidity data, and analyze infection rates by demographic and geographic factors. These insights will shape targeted prevention and treatment efforts, ensuring that resources are allocated to the geographic regions and demographic groups where they are most needed. By integrating three data sources (covering case counts, disease severity, population demographics, health officer regions, and 2023 population estimates), this analysis aims to guide the implementation of effective public health interventions and minimize the outbreak’s overall impact.

METHODS:

Source: sim_novelid_CA.csv, sim_novelid_LACounty.csv, ca_pop_2023.csv

Years and Dates of Data: Weekly data for 2023 and population estimates for 2023.

Description of Cleaning and Creating New Variables: Missing values were identified and handled as appropriate, and inconsistencies in how certain variables (e.g., race/ethnicity) were recorded across datasets were harmonized before joining. Duplicates were removed, as were rows with incomplete demographic or geographic information. A variable was created to calculate cumulative cases by demographic category and county. New infection rates per 100,000 population and severe infection rates were calculated.
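The cleaning steps above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the code used in this analysis: the toy data frame, its column names (`county`, `week`, `race_ethnicity`, `sex`, `new_infections`), and the `race_lookup` mapping are all hypothetical. Labels are harmonized before deduplication so that rows differing only in label spelling can be caught as duplicates, and cumulative cases are computed within each county, race/ethnicity, and sex group in week order.

```r
# Hypothetical toy data; column names and labels are illustrative assumptions
raw <- data.frame(
  county = c("Fresno", "Fresno", "Fresno", "Kern", NA),
  week = c(1, 1, 2, 1, 1),
  race_ethnicity = c("Hispanic", "Hispanic", "Latino", "White", "White"),
  sex = c("Female", "Female", "Female", "Male", "Male"),
  new_infections = c(10, 10, 5, 7, 3),
  stringsAsFactors = FALSE
)

# Hypothetical lookup mapping each source label to one harmonized label;
# labels missing from the lookup become NA and are dropped below
race_lookup <- c("Latino" = "Hispanic", "Hispanic" = "Hispanic", "White" = "White")

clean <- raw
clean$race_ethnicity <- unname(race_lookup[clean$race_ethnicity])

# Remove exact duplicate rows (after harmonizing labels)
clean <- clean[!duplicated(clean), ]

# Drop rows with incomplete geographic or demographic information
clean <- clean[complete.cases(clean[, c("county", "race_ethnicity", "sex")]), ]

# Sort by group and week, then compute cumulative cases within each group
clean <- clean[order(clean$county, clean$race_ethnicity, clean$sex, clean$week), ]
clean$cumulative_infections <- ave(clean$new_infections,
                                   clean$county, clean$race_ethnicity, clean$sex,
                                   FUN = cumsum)
print(clean)
```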

We aggregated total cases and severity counts by age, race/ethnicity, and sex for health officer regions and counties, then calculated incidence and severity rates to compare the outbreak’s impact across counties and demographic groups.
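A rate is the summed case count divided by the summed population, multiplied by a scaling constant (10,000 or 100,000). The base R sketch below shows this with made-up regions and counts (`region`, `new_infections`, `pop` are hypothetical names). It assumes each row's `pop` is a distinct population slice; if the same population were repeated on several weekly rows, summing `pop` would overcount the denominator.

```r
# Hypothetical aggregated counts; region names and numbers are made up
counts <- data.frame(
  region = c("A", "A", "B"),
  new_infections = c(50, 30, 20),
  pop = c(4000, 6000, 10000),
  stringsAsFactors = FALSE
)

# Cases per `per` population (default: per 100,000)
rate_per <- function(cases, population, per = 1e5) cases / population * per

# Sum numerators and denominators within each region before dividing
by_region <- aggregate(cbind(new_infections, pop) ~ region, data = counts, FUN = sum)
by_region$rate_per_10k <- rate_per(by_region$new_infections, by_region$pop, per = 1e4)
by_region$rate_per_100k <- rate_per(by_region$new_infections, by_region$pop, per = 1e5)
print(by_region)
```

The same counts give rates ten times larger on the per-100,000 scale, so the scale should always be stated next to each rate.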

VISUALIZATIONS:

Case Rate for New and Severe Cases of Novel Infectious Disease by Health Officer Region

library(dplyr)
library(DT)

# Aggregate new and severe cases by health officer region and compute
# rates per 10,000 population
case_rate_region <- final_dataset %>%
  group_by(health_officer_region) %>%
  summarize(new_cases = sum(total_new_infections),
            total_pop = sum(pop),
            new_case_rate = round((new_cases/total_pop)*10000, 3),
            severe_cases = sum(total_new_severe),
            severe_case_rate = round((severe_cases/total_pop)*10000, 3)) %>%
  ungroup() %>%
  select(health_officer_region, new_cases, severe_cases, total_pop, new_case_rate, severe_case_rate)

# Interactive table sorted by new case rate, then severe case rate.
# DT column indices are 0-based; with rownames = FALSE, column 4 is
# new_case_rate and column 5 is severe_case_rate.
datatable(
  case_rate_region,
  options = list(
    order = list(list(4, 'desc'), list(5, 'desc')),
    columnDefs = list(
      list(className = 'dt-center', targets = 1:5)
    ),
    dom = 'ti'
  ),
  rownames = FALSE,
  colnames = c("Health Officer Region", "Number of New Cases", "Number of Severe Cases", "Total Population", "New Case Rate per 10,000", "Severe Case Rate per 10,000")
)

Interpretation: This table shows that Central California has the highest rates of both new and severe cases, indicating that it is disproportionately affected by the novel infectious disease.
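One way to put a number on "disproportionately affected" is a rate ratio: each region's rate divided by the rate for all regions combined, where values above 1 indicate a higher-than-overall burden. The sketch below uses hypothetical region labels and counts, not results from this analysis.

```r
# Hypothetical regional totals (not results from this analysis)
regions <- data.frame(
  region = c("Region A", "Region B", "Region C"),
  cases = c(300, 100, 200),
  pop = c(1000, 1000, 2000),
  stringsAsFactors = FALSE
)

# Overall rate pools cases and population across all regions
overall_rate <- sum(regions$cases) / sum(regions$pop)

# Rate ratio > 1 means the region's rate exceeds the overall rate
regions$rate_ratio <- (regions$cases / regions$pop) / overall_rate
print(regions)
```

Because the scaling constant cancels in the ratio, the result is the same whether rates are expressed per 10,000 or per 100,000.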

Infection by Race and Sex

library(dplyr)
library(ggplot2)
library(stringr)

# Unweighted mean of row-level infection rates for each race/ethnicity and sex group
Infection_by_race_sex <- final_dataset %>%
  group_by(race_ethnicity, sex) %>%
  summarize(mean_infection_rate = mean(infection_rate_per_100k, na.rm = TRUE),
            .groups = "drop")

# Wrap long race/ethnicity labels so they fit on the x-axis
Infection_by_race_sex$wrapped_race_ethnicity <- str_wrap(Infection_by_race_sex$race_ethnicity, width = 25)

# Grouped bar chart: one bar per sex within each race/ethnicity group
infection_by_race_sex_plot <- Infection_by_race_sex %>%
  ggplot(aes(x = wrapped_race_ethnicity, y = mean_infection_rate, fill = sex)) +
  geom_col(position = "dodge", alpha = 0.8) +
  labs(title = "Mean Infection Rate by Race/Ethnicity and Sex",
       x = "Race/Ethnicity", y = "Mean Infection Rate per 100,000") +
  # Colors are assigned to the levels of `sex` in sorted order
  scale_fill_manual(values = c("lightblue", "lightpink")) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(infection_by_race_sex_plot)
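The plot above averages the row-level `infection_rate_per_100k` values, which gives every row equal weight regardless of its population. A pooled rate (total cases divided by total population) weights rows by population instead, and the two can differ when small and large populations are mixed. The sketch below uses two hypothetical rows to show the difference; it is an illustration of the alternative, not a recalculation of the results.

```r
# Two hypothetical rows (e.g., a small and a large county) for one group
rows <- data.frame(cases = c(1, 90), pop = c(100, 10000))

# Row-level rates per 100,000: 1000 and 900
row_rates <- rows$cases / rows$pop * 1e5

# Unweighted mean of row rates, as in the plot: 950
unweighted_mean <- mean(row_rates)

# Pooled rate weights each row by its population: about 901
pooled_rate <- sum(rows$cases) / sum(rows$pop) * 1e5

print(c(unweighted_mean = unweighted_mean, pooled_rate = pooled_rate))
```

Here the small row pulls the unweighted mean upward, so the pooled rate is usually the safer basis for comparing groups of very different sizes.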

Average Infection Rate per 100,000 by Race/Ethnicity

# Interactive counterpart to the plot above, showing race/ethnicity only (not split by sex).
# We were unsure which format is more useful for these cases and would welcome feedback.
# Assumes agg_data was created earlier in the analysis (not shown) with one row per
# race/ethnicity and columns agg_data_wrapped and average_infection_rate.
library(plotly)

plot_ly(
  agg_data,
  x = ~agg_data_wrapped,
  y = ~average_infection_rate,
  type = "bar"
) %>%
  layout(title = "Average Infection Rate by Race/Ethnicity",
         yaxis = list(title = "Average Infection Rate per 100,000"),
         xaxis = list(title = "Race/Ethnicity")
  )

RESULTS:

Central California has the highest rate of new cases of the novel infectious disease (1815.191 per 10,000 population) as well as the highest severe case rate (45 per 10,000 population), indicating a relatively higher burden of severe cases compared with other regions.
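The regional table reports rates per 10,000, while `infection_rate_per_100k` and the race/ethnicity plots use a per-100,000 scale. Converting between the two is a factor of 10, and dividing a per-10,000 rate by 100 gives a percentage of the population. The helper names below are hypothetical; the input values are the Central California figures reported above.

```r
# Hypothetical helpers for converting rates reported per 10,000
per_10k_to_per_100k <- function(rate_per_10k) rate_per_10k * 10
per_10k_to_percent <- function(rate_per_10k) rate_per_10k / 100

central_new_per_10k <- 1815.191   # new case rate reported above
central_severe_per_10k <- 45      # severe case rate reported above

per_10k_to_per_100k(central_new_per_10k)     # 18151.91 per 100,000
per_10k_to_percent(central_new_per_10k)      # about 18.15% of the population
per_10k_to_per_100k(central_severe_per_10k)  # 450 per 100,000
per_10k_to_percent(central_severe_per_10k)   # 0.45% of the population
```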

Overall, Non-Hispanic populations are more affected by severe novel infectious disease than other race/ethnicity groups in this dataset. Native Hawaiian or Pacific Islander females show disproportionately high mean infection rates. However, within each race/ethnicity group, mean infection rates appear similar for males and females.

Overall, males and females appear to be similarly affected by the novel infectious disease, regardless of region.

DISCUSSION:

Using California data from a simulated outbreak of a novel infectious respiratory disease, we analyzed the outbreak’s impact across geographic regions and demographic groups. Our analysis shows that Central California is the region most severely affected by the outbreak, with the highest rates of both new and severe cases per 10,000 people. This highlights the need for targeted public health interventions in this area to address the outbreak’s intensity. The data also show that Non-Hispanic populations are disproportionately affected by the disease, especially with respect to severe cases. Among the racial/ethnic groups in the data, Native Hawaiian or Pacific Islander females exhibited disproportionately high mean infection rates. Addressing these health disparities will require culturally sensitive interventions, including community engagement and efforts to reduce language barriers, to tackle the unique challenges faced by these populations. Because no notable differences in infection rates were observed between males and females across most racial/ethnic groups, sex does not appear to play a major role in infection rates. Public health efforts should therefore focus on targeted intervention, resource allocation, and equitable access to education, healthcare, and preventive services such as testing and vaccines, which will help mitigate the outbreak’s impact, particularly in vulnerable populations.

It is important to consider confounding factors, such as socioeconomic status and access to healthcare, that could have influenced these results. Because this analysis is based only on data from 2023, future studies should include data from 2024 to identify emerging trends. Further exploration of the outbreak data for Los Angeles County will also be important: given the county’s diverse population, stratifying its data by neighborhood and specific demographic groups could reveal localized trends. Additionally, while sex did not show notable variation in infection rates, further investigation into sex- and gender-specific factors may offer valuable insights. Finally, new infection rates should be monitored continuously and the integrity of data collection ensured to best guide future public health strategies.

CONCLUSION:

This analysis provides a comprehensive overview of the disease outbreak across California (excluding Los Angeles County), offering critical information to help public health officials implement interventions. By identifying Central California as the highest-risk region and the specific racial/ethnic groups that are disproportionately affected, particularly Native Hawaiian or Pacific Islander females, public health officials can more effectively tailor interventions to the needs of vulnerable populations. Targeted interventions, together with community-based collaborations, will be essential for reducing health disparities and improving outbreak outcomes across all communities in California. By focusing on these groups and ensuring equitable access to healthcare, education, and preventive services, we can mitigate the impact of the disease and strengthen overall public health efforts.