PROBLEM STATEMENT:

The California Department of Public Health has been tasked with overseeing and managing a simulated outbreak of a novel infectious respiratory disease affecting every county in California except Los Angeles County. The goal is to understand the course of the outbreak and its impact across various geographic regions and demographic groups, to identify any disproportionate effects on specific populations.

To tackle this challenge, it is crucial to track the outbreak’s progression over time, pinpoint high-risk groups based on severity and morbidity data, and analyze infection rates by demographic and geographic factors. These insights will shape targeted prevention and treatment efforts, ensuring resources are allocated to the geographic regions or demographic groups where they are most needed. By integrating three data sources (covering case counts, disease severity, population demographics, health officer regions, and 2023 population estimates), this analysis will generate evidence to guide the implementation of effective public health interventions and minimize the outbreak’s overall impact.

METHODS:

Data Sources: sim_novelid_CA.csv, sim_novelid_LACounty.csv, ca_pop_2023.csv

Years and Dates of Data: Weekly data for 2023 and population estimates for 2023

Description of Cleaning and Creating New Variables: Missing values were identified and adjusted as appropriate. Discrepancies in how certain variables (e.g., race/ethnicity) were recorded were made consistent across datasets before joining. Duplicates were removed, as were rows with incomplete demographic or geographic information. A variable was created to calculate cumulative cases by demographic category and county. New infection rates and severe infection rates per 100,000 population were calculated.
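
The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code used for this report. The column names (county, week, race_ethnicity, sex, new_cases), the label map, and the function names are all assumptions made for the example; the actual variable names and category codes in the source files are not shown in this report.

```python
from collections import defaultdict

# Hypothetical label map: the real category codes in the source files are not shown here.
RACE_LABELS = {
    "1": "White, Non-Hispanic",
    "white, non-hispanic": "White, Non-Hispanic",
    "2": "Black, Non-Hispanic",
    "black, non-hispanic": "Black, Non-Hispanic",
}

REQUIRED = ("county", "week", "race_ethnicity", "sex", "new_cases")


def harmonize_race(label):
    """Map a raw race/ethnicity value to one shared label (None if unrecognized)."""
    if label is None:
        return None
    return RACE_LABELS.get(str(label).strip().lower())


def clean_rows(rows, required=REQUIRED):
    """Harmonize labels, drop incomplete rows, and drop exact duplicates."""
    seen, cleaned = set(), []
    for row in rows:
        row = dict(row)  # copy so the caller's data is not modified
        row["race_ethnicity"] = harmonize_race(row.get("race_ethnicity"))
        # Rows with a missing or unrecognized required value are treated as incomplete.
        if any(row.get(col) in (None, "") for col in required):
            continue
        # Two rows count as duplicates when every required column matches.
        key = tuple(row[col] for col in required)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned


def cumulative_cases(rows, group_cols=("county", "race_ethnicity", "sex")):
    """Add a running total of new cases within each group, ordered by week number."""
    totals = defaultdict(int)
    out = []
    for row in sorted(rows, key=lambda r: int(r["week"])):
        key = tuple(row[col] for col in group_cols)
        totals[key] += int(row["new_cases"])
        out.append({**row, "cumulative_cases": totals[key]})
    return out


def rate_per_100k(cases, population):
    """Cases per 100,000 population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases * 100_000 / population
```

In this sketch, unrecognized race/ethnicity values are dropped along with other incomplete rows; a real pipeline might instead log them for review before deciding how to handle them.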

Total cases and severity counts were aggregated by age, race, and sex for health officer regions and counties. Incidence and severity rates were computed to assess the outbreak’s geographic and demographic impact.
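
As an illustration of the aggregation step, the sketch below sums case counts and population within each group and then computes incidence and severity rates per 100,000. It is a hedged example under stated assumptions: the function name aggregate_rates and the column names (region, new_cases, severe_cases, population) are hypothetical and may not match the variables in the actual datasets.

```python
from collections import defaultdict


def aggregate_rates(case_rows, pop_rows, by=("region",)):
    """Sum cases and population within each group, then compute rates per 100,000.

    Column names are assumptions for this example, not the datasets' real names.
    """
    cases = defaultdict(lambda: {"new_cases": 0, "severe_cases": 0})
    population = defaultdict(int)

    for row in case_rows:
        key = tuple(row[col] for col in by)
        cases[key]["new_cases"] += int(row["new_cases"])
        cases[key]["severe_cases"] += int(row["severe_cases"])

    for row in pop_rows:
        key = tuple(row[col] for col in by)
        population[key] += int(row["population"])

    result = {}
    for key, totals in cases.items():
        pop = population.get(key, 0)
        if pop <= 0:
            continue  # no denominator available, so no rate can be computed
        result[key] = {
            **totals,
            "population": pop,
            "case_rate": totals["new_cases"] * 100_000 / pop,
            "severe_rate": totals["severe_cases"] * 100_000 / pop,
        }
    return result
```

Passing by=("region", "sex") or by=("county", "race_ethnicity") would give the corresponding demographic breakdowns, provided both inputs carry those columns. Summing counts and populations before dividing gives a population-weighted rate, which can differ from the mean of subgroup rates when group sizes are unequal.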

RESULTS & VISUALIZATIONS:

Central California has the highest rate of new cases of the novel infectious disease (1815.191 per 100,000), as well as the highest severe case rate (45 per 100,000), indicating a higher burden of severe cases than in other regions (Table 1). Overall, males and females appeared to be equally affected (Figure 1). Non-Hispanic populations were most affected by severe novel infectious disease (Figure 2). Native Hawaiian or Pacific Islander females were disproportionately affected based on mean infection rates. However, within each race/ethnicity group, mean infection rates appear to be similar between males and females (Figure 2).

Table 1: Case Rate for New and Severe Cases of Novel Infectious Disease by Health Officer Region

Interpretation: This table shows that Central California is disproportionately affected by both new cases and severe cases of the novel infectious disease.

Figure 1: Infection by Race and Sex

Interpretation: Figure 1 shows that, overall, males and females appear to be equally affected by the novel infectious disease, regardless of region.

Figure 2: Average Infection Rate per 100k per Race/Ethnicity

Interpretation: Overall, Non-Hispanic populations are the most affected by severe novel infectious disease compared to other race/ethnicity groups in this dataset (Figure 2). Native Hawaiian or Pacific Islander females are disproportionately affected based on mean infection rates. However, within each race/ethnicity group, mean infection rates appear to be similar between males and females.

DISCUSSION:

Using data from an outbreak of a novel infectious respiratory disease in California, we evaluated the outbreak’s impact across geographic regions and demographic groups. Our analysis shows that Central California is the most severely affected region, with the highest rates of both new and severe cases per 100,000 people. This highlights the need for targeted public health interventions in this area. Additionally, we found that Non-Hispanic populations are disproportionately affected by the disease, particularly with respect to severe cases. Among the race/ethnicity groups for which data were available, Native Hawaiian or Pacific Islander females exhibited disproportionately higher mean infection rates. To address these health disparities, culturally sensitive interventions are necessary, including community engagement and efforts to reduce language barriers. These strategies will be essential to tackle the unique challenges faced by these populations. Because infection rates did not differ notably between males and females across most racial/ethnic groups, sex does not appear to play a major role in infection rates. Public health efforts should therefore focus on targeted intervention, resource allocation, and equitable access to education, healthcare, and preventive services such as testing and vaccines, which will help mitigate the outbreak’s impact, particularly in vulnerable populations.

It is important to consider potential confounding factors, such as socioeconomic status and access to healthcare, that could have influenced these results. Because this analysis is based only on data from 2023, future studies should include data from 2024 to identify emerging trends. Further exploration of the outbreak data for Los Angeles County will also be important. Given the diverse population of Los Angeles County, stratifying these data by neighborhood and specific demographic group could reveal localized trends. Additionally, while infection rates did not vary notably by sex, further investigation into gender/sex-specific factors may offer valuable insights. When working with infectious disease data, it is important to continuously monitor new infection rates and ensure the integrity of data collection to best guide future public health strategies.

CONCLUSION:

This analysis provides a comprehensive overview of the disease outbreak across California, offering critical information to help public health officials implement interventions. Identifying the high-risk region of Central California and the racial/ethnic groups disproportionately affected, particularly Native Hawaiian or Pacific Islander females, allows public health interventions to be tailored more effectively to the needs of vulnerable populations. Targeted interventions, along with community-based collaborations, will be essential in reducing health disparities and improving outcomes across all communities in California. By focusing on these groups and ensuring equitable access to healthcare, education, and preventive services, we can mitigate the impact of the disease and enhance overall public health efforts.