Problem Statement

The California Department of Public Health (CDPH) faces a critical challenge in addressing an ongoing infectious disease outbreak. To respond effectively, we must analyze current outbreak data to map its trajectory, identify disproportionately affected populations, and understand how population density and distribution influence disease spread. This analysis will help us pinpoint vulnerable groups requiring targeted interventions and determine which communities experience higher attack rates or face greater barriers to care. By gaining these insights, we can make informed decisions about the allocation of limited prevention and treatment resources, ensuring our response is both equitable and effective in mitigating the outbreak’s impact across California’s diverse communities.

Methods

Dataset 1: novel_id_ca

This dataset contains weekly data on a novel respiratory disease outbreak in California from May 29 to December 25, 2023. Provided information includes case numbers and severity by demographic categories (age, race, sex) and geography for all counties except Los Angeles, which is contained in a separate dataset (Dataset 2).

To analyze the outbreak, it was necessary to combine all available datasets into one master dataset. This required cleaning and recoding existing variables for consistency across datasets, as well as creating new variables for further analysis. Specifically, the “dt_diagnosis” column was reclassified to a date, and the “race_ethnicity” column was reclassified from numeric to character.
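
The two recoding steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for this report: the numeric codes in RACE_LABELS are hypothetical (the codebook is not reproduced here), and dates are assumed to arrive as ISO strings.

```python
from datetime import date, datetime

# Hypothetical code-to-label mapping; the real codebook is not given in this report.
RACE_LABELS = {
    1: "White, Non-Hispanic",
    2: "Black, Non-Hispanic",
}

def clean_row(row: dict) -> dict:
    """Return a copy of one record with a typed date and a labelled race/ethnicity."""
    cleaned = dict(row)
    # Assumption: dates are stored as ISO strings (YYYY-MM-DD).
    cleaned["dt_diagnosis"] = datetime.strptime(row["dt_diagnosis"], "%Y-%m-%d").date()
    # Unrecognized codes are kept visible as "Unknown" rather than silently dropped.
    cleaned["race_ethnicity"] = RACE_LABELS.get(int(row["race_ethnicity"]), "Unknown")
    return cleaned

example = clean_row(
    {"county": "Imperial County", "dt_diagnosis": "2023-05-29", "race_ethnicity": "1"}
)
```

Recoding to labels before merging matters because the join with the population data matches on the race/ethnicity value itself.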

Dataset 2: novel_id_la

This dataset contains weekly outbreak morbidity and population data for Los Angeles County from May 29 to December 25, 2023, categorized by demographic and geographic factors.

To prepare for merging with the other datasets, a “county” column was created to incorporate Los Angeles with other county data, and all columns were renamed to match those in Dataset 1. The “DT_REPORT” column was removed as it would not be used for analysis, and the “dt_diagnosis” column was reclassified as a date. After combining with Dataset 1, the “time_int” column was also reclassified as a date to accurately reflect the epidemiological week (“epi week”).
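
A minimal sketch of this harmonization step is shown below. Only "DT_REPORT", "dt_diagnosis", and "county" come from the description above; the original Los Angeles column names in RENAME, the "new_infections" column, and the sample rows are hypothetical and serve only to illustrate the pattern.

```python
# Hypothetical Los Angeles column names mapped to the Dataset 1 names.
RENAME = {
    "DT_DX": "dt_diagnosis",
    "RACE_ETH": "race_ethnicity",
    "NEW_CASES": "new_infections",
}

def harmonize_la(row: dict) -> dict:
    """Rename columns, drop DT_REPORT, and tag the record as Los Angeles County."""
    out = {RENAME.get(key, key): value for key, value in row.items() if key != "DT_REPORT"}
    out["county"] = "Los Angeles County"
    return out

# Made-up sample rows, one per source.
ca_rows = [{"county": "Imperial County", "dt_diagnosis": "2023-05-29",
            "race_ethnicity": "Hispanic (any race)", "new_infections": 4}]
la_rows = [{"DT_DX": "2023-05-29", "DT_REPORT": "2023-06-02",
            "RACE_ETH": "Hispanic (any race)", "NEW_CASES": 9}]

# Stack the two sources once their columns agree.
combined = ca_rows + [harmonize_la(r) for r in la_rows]
```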

Dataset 3: ca_pop

This dataset contains 2023 population estimates for California counties, categorized by age, race, and sex.

To create a master dataset, the “race_ethnicity” column was reclassified from numeric to character, matching Dataset 1. Data were then grouped by county and race_ethnicity, with two new columns added: “pop_race_county” for population by race and county, and “total_county_pop” for total county population. This dataset was then merged with the morbidity data from Datasets 1 and 2 to produce a comprehensive dataset for analysis.
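
The grouping and merge can be sketched as follows. The column names "pop_race_county" and "total_county_pop" are taken from the description above; the "pop", "age_cat", and "new_infections" names and all values are made up for illustration.

```python
from collections import defaultdict

# Made-up population rows, one per county, race/ethnicity, age, and sex stratum.
pop_rows = [
    {"county": "A", "race_ethnicity": "X", "age_cat": "0-17", "sex": "FEMALE", "pop": 100},
    {"county": "A", "race_ethnicity": "X", "age_cat": "0-17", "sex": "MALE", "pop": 150},
    {"county": "A", "race_ethnicity": "Y", "age_cat": "0-17", "sex": "FEMALE", "pop": 250},
]

pop_race_county = defaultdict(int)   # keyed by (county, race_ethnicity)
total_county_pop = defaultdict(int)  # keyed by county
for r in pop_rows:
    pop_race_county[(r["county"], r["race_ethnicity"])] += r["pop"]
    total_county_pop[r["county"]] += r["pop"]

def add_denominators(case_row: dict) -> dict:
    """Attach both population denominators to one morbidity record."""
    key = (case_row["county"], case_row["race_ethnicity"])
    return {
        **case_row,
        # .get() returns None for strata with no population estimate.
        "pop_race_county": pop_race_county.get(key),
        "total_county_pop": total_county_pop.get(case_row["county"]),
    }

merged = add_denominators({"county": "A", "race_ethnicity": "X", "new_infections": 5})
```

Collapsing the age and sex strata before the merge keeps one denominator per county and race/ethnicity, so each morbidity record matches at most one population row.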

Analytic Methods

After combining all datasets into a master dataset, analysis was conducted to evaluate the outbreak’s impact on different demographic and geographic populations and to inform resource allocation for prevention and treatment. The total numbers of non-severe and severe infections were examined by racial/ethnic group and county of residence. The dataset was grouped by county and race/ethnicity, and the proportion of individuals in each group who had contracted the disease was calculated separately for non-severe and severe infections. These results were used to compile Table 1 and to produce box-and-whisker plots with the plotly package (Plot 1, Plot 2). To analyze the course of the outbreak, the total number of new infections and the total population in each county were grouped by race/ethnicity, allowing incidence to be calculated by epi week. An interactive epi week plot was then created with the plotly package (Plot 3).
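
The percentage calculation and the ranking behind Table 1 can be illustrated with the sketch below. It assumes weekly rows carrying an "infections" count (a hypothetical column name) and a population lookup keyed by county and race/ethnicity; the values are made up.

```python
from collections import defaultdict

def percent_infected(case_rows, population):
    """Cumulative infections as a percentage of each (county, race/ethnicity) population."""
    totals = defaultdict(int)
    for r in case_rows:
        totals[(r["county"], r["race_ethnicity"])] += r["infections"]
    # Groups without a positive population estimate are skipped.
    return {
        key: round(100 * n / population[key], 2)
        for key, n in totals.items()
        if population.get(key)
    }

def top_n(rates, n=10):
    """The n groups with the highest percentage infected, highest first."""
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]

# Made-up weekly rows and populations.
cases = [
    {"county": "A", "race_ethnicity": "X", "infections": 10},
    {"county": "A", "race_ethnicity": "X", "infections": 15},
    {"county": "B", "race_ethnicity": "X", "infections": 5},
]
population = {("A", "X"): 100, ("B", "X"): 200}

rates = percent_infected(cases, population)
ranked = top_n(rates, n=1)
```

The same two functions apply to severe infections by passing the severe case counts instead.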

Results

Geographic and Demographic Impact of the Outbreak

Table 1 presents the ten populations, defined by race/ethnicity and county, with the highest rates of non-severe and severe disease. For non-severe infections, Imperial County is the most affected, accounting for seven of the ten highest infection rates. Rates are highest among Black residents of Imperial County (46.40%), while the other racial/ethnic groups in the county show rates between 39.89% and 43.19%. Inyo County is the only other county on this list, with high infection rates among three racial/ethnic groups, particularly Native Hawaiian or Pacific Islander residents (40.00%). Plot 1 displays the distribution of non-severe cases. Overall, the data indicate a disproportionate impact on minority communities, particularly in Imperial County.

For severe disease, defined as cases requiring hospitalization, Asian residents of Plumas County have the highest infection rate (1.89%), followed closely by Native Hawaiian or Pacific Islander residents of Madera County (1.74%). Imperial County appears twice on this list, with White (1.67%) and Multiracial (1.61%) populations showing relatively high rates of severe infection, reinforcing the challenges faced by this county during the outbreak. Several rural counties, including Plumas, Madera, Tehama, Colusa, Inyo, and Sutter, are represented in the severe disease list, suggesting that rural areas may be disproportionately affected. Notably, Native Hawaiian or Pacific Islander populations appear three times among the top ten for severe infections, indicating a higher risk for this group. Plot 2 displays the distribution of severe cases. In summary, while overall rates of severe infection are low, the data highlight potential disparities among racial and ethnic groups across counties that warrant close surveillance.

Course of the Outbreak

As depicted in Plot 3, the outbreak was first reported in epi week 22 and the last cases were reported in epi week 50. Peak incidence occurred throughout epi weeks 35-39.
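
For reference, a date can be mapped to a week number as sketched below. The report does not state which week-numbering convention was used; this sketch assumes ISO 8601 weeks, which start on Monday. CDC MMWR weeks start on Sunday, so the two conventions can differ for some dates.

```python
from datetime import date

def epi_week(d: date) -> int:
    """ISO 8601 week number (weeks start on Monday); an assumed convention."""
    return d.isocalendar()[1]

# First and last weekly dates of the reporting period.
first_week = epi_week(date(2023, 5, 29))
last_week = epi_week(date(2023, 12, 25))
```

Under this assumption, May 29, 2023 falls in week 22 and December 25, 2023 in week 52, which matches the week range discussed in this report.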

Table 1: Impact of Infectious Disease Outbreak in California by Race/Ethnicity and County, May-December 2023

Total Infections

| County | Race/Ethnicity | Population Infected (%) |
| --- | --- | --- |
| Imperial County | Black, Non-Hispanic | 46.40 |
| Imperial County | Asian, Non-Hispanic | 43.19 |
| Imperial County | American Indian or Alaska Native, Non-Hispanic | 43.17 |
| Imperial County | Native Hawaiian or Pacific Islander, Non-Hispanic | 43.00 |
| Imperial County | White, Non-Hispanic | 42.27 |
| Imperial County | Multiracial (two or more of above races), Non-Hispanic | 40.40 |
| Inyo County | Native Hawaiian or Pacific Islander, Non-Hispanic | 40.00 |
| Imperial County | Hispanic (any race) | 39.89 |
| Inyo County | Black, Non-Hispanic | 33.00 |
| Inyo County | Asian, Non-Hispanic | 32.07 |

Severe Infections

| County | Race/Ethnicity | Population Infected (%) |
| --- | --- | --- |
| Plumas County | Asian, Non-Hispanic | 1.89 |
| Madera County | Native Hawaiian or Pacific Islander, Non-Hispanic | 1.74 |
| Imperial County | White, Non-Hispanic | 1.67 |
| Imperial County | Multiracial (two or more of above races), Non-Hispanic | 1.61 |
| Tehama County | Native Hawaiian or Pacific Islander, Non-Hispanic | 1.49 |
| Colusa County | Asian, Non-Hispanic | 1.39 |
| Inyo County | White, Non-Hispanic | 1.38 |
| Sutter County | Native Hawaiian or Pacific Islander, Non-Hispanic | 1.31 |
| Inyo County | Asian, Non-Hispanic | 1.27 |
| Colusa County | American Indian or Alaska Native, Non-Hispanic | 1.24 |

*Source: CDPH
This table displays: 1) the 10 populations most affected by the outbreak, as calculated by the percentage of total infections per racial/ethnic group and California county, and 2) the 10 populations most affected by severe cases within the disease outbreak, as calculated by the percentage of total severe infections per racial/ethnic group and California county. For non-severe infections, Imperial and Inyo counties appear to be disproportionately affected geographically, with the Black, Asian, and Native Hawaiian/Pacific Islander populations most affected within those counties. For severe disease, the Asian population in Plumas County has been most affected, though only 1.89% of this population contracted severe disease. These data therefore suggest that the rate of severe infection remains low and has not significantly impacted a particular geographic and/or demographic population.
Plots 1 and 2 display the distribution of cases, calculated as a percentage of the total population in each county, for each racial/ethnic group. Distributions are shown for non-severe cases (Plot 1) and severe cases (Plot 2). For all racial/ethnic groups, the IQR of percent infected lies between 8% and 20%, with a median of ~10% and a right-skewed distribution, suggesting a fairly similar distribution of general infection apart from the counties and demographics of concern identified in Table 1. In contrast, the distribution of severe infection varies slightly: while Asian and Pacific Islander populations are most impacted by severe disease in high-impact counties, the IQR and median of severe infection are highest among White populations statewide. As noted above, the rate of severe infection is low overall and does not suggest that action is needed beyond standard surveillance.
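
The box-plot summary statistics discussed above (quartiles, median, IQR) can be computed for one racial/ethnic group as sketched below. This is an illustration only: the input values are made up, and the quartile rule used by the plotting package may differ slightly from the "inclusive" method assumed here.

```python
from statistics import quantiles

def box_summary(percent_infected):
    """Quartiles and IQR for one group's county-level percentages."""
    # method="inclusive" treats the values as the full set of counties;
    # plotting libraries may apply a slightly different quartile rule.
    q1, median, q3 = quantiles(percent_infected, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

# Made-up county-level percentages for a single group.
summary = box_summary([5, 8, 10, 20, 40])
```

A median that sits closer to the first quartile than to the third, as in this made-up example, is the pattern a right-skewed distribution produces.
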
Plot 3 depicts the number of new cases (incidence) per 1,000 people by epi week of the outbreak. The outbreak started in epi week 22, and incidence peaked in weeks 35-39. No new cases were reported in weeks 49, 51, or 52. Data are additionally stratified by race/ethnicity, depicting new cases by epi week for each racial/ethnic group. During the peak of the outbreak, incidence was disproportionately higher among the American Indian or Alaska Native, Non-Hispanic population than among other racial/ethnic groups, followed by the Native Hawaiian or Pacific Islander, Non-Hispanic and White, Non-Hispanic populations.
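
Weekly incidence per 1,000 people, stratified by race/ethnicity, can be computed as in the sketch below. The column names "epi_week" and "new_infections" and all values are hypothetical; the sketch only shows the arithmetic (new cases in the week divided by the group's population, multiplied by 1,000).

```python
from collections import defaultdict

def weekly_incidence_per_1000(case_rows, population_by_race):
    """New cases per 1,000 people for each (epi week, race/ethnicity) pair."""
    new_cases = defaultdict(int)
    for r in case_rows:
        new_cases[(r["epi_week"], r["race_ethnicity"])] += r["new_infections"]
    return {
        (week, race): 1000 * n / population_by_race[race]
        for (week, race), n in new_cases.items()
    }

# Made-up rows: two counties reporting for the same group in week 35.
rows = [
    {"epi_week": 35, "race_ethnicity": "X", "new_infections": 30},
    {"epi_week": 35, "race_ethnicity": "X", "new_infections": 20},
    {"epi_week": 36, "race_ethnicity": "X", "new_infections": 10},
]
incidence = weekly_incidence_per_1000(rows, {"X": 10_000})
```

Dividing by each group's own population, rather than comparing raw counts, is what allows incidence to be compared across groups of very different sizes.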

Discussion

As public health professionals, we must assess the environmental/exposure events that led to the outbreak in order to prevent future outbreaks of this disease. Additionally, understanding the factors associated with the decline of reported incidence beginning in epi week 44 is vital for appropriate resource allocation.

At the population level, the above results indicate that certain demographic and geographic populations are disproportionately affected by the infectious disease outbreak. Imperial County exhibits the highest overall infection rates, particularly among minority populations such as Black, Native Hawaiian or Pacific Islander, and Asian residents, highlighting a significant outbreak in this area. Additionally, several rural counties demonstrate disproportionately high rates of severe infection. To address these disparities, it is essential to prioritize resources for Imperial County and develop targeted interventions for vulnerable groups. Rural counties with elevated severe case rates must also receive adequate medical support. Finally, implementing culturally appropriate prevention strategies and community outreach programs tailored to the most affected populations is crucial for effective resource allocation and management of the outbreak across California’s diverse communities.

Appendix

Descriptive Statistics

Table 2. Counties with the highest percent of each racial group

| Race/Ethnicity | County | Percent of County |
| --- | --- | --- |
| American Indian or Alaska Native, Non-Hispanic | Alpine County | 18.80 |
| Asian, Non-Hispanic | Santa Clara County | 32.85 |
| Black, Non-Hispanic | Solano County | 14.19 |
| Hispanic (any race) | Imperial County | 82.27 |
| Multiracial (two or more of above races), Non-Hispanic | Solano County | 5.52 |
| Native Hawaiian or Pacific Islander, Non-Hispanic | San Mateo County | 1.44 |
| White, Non-Hispanic | Sierra County | 86.30 |

Table 3. Age Distribution in California

| Age | Count (n) | Percent |
| --- | --- | --- |
| 0-17 | 8,838,544 | 21.90 |
| 18-49 | 17,084,891 | 42.34 |
| 50-64 | 7,407,769 | 18.36 |
| 65+ | 7,023,013 | 17.40 |

Table 4. Gender Distribution in California

| Sex | Count (n) | Percent |
| --- | --- | --- |
| FEMALE | 20,203,669 | 50.07 |
| MALE | 20,150,548 | 49.93 |

Table 5. Summary of New Infections

| Statistic | Value |
| --- | --- |
| Min. | 0.00 |
| 1st Qu. | 0.00 |
| Median | 1.00 |
| Mean | 45.18 |
| 3rd Qu. | 9.00 |
| Max. | 12110.00 |

Table 6. Summary of Severe Infections

| Statistic | Value |
| --- | --- |
| Min. | 0.00 |
| 1st Qu. | 0.00 |
| Median | 1.00 |
| Mean | 45.18 |
| 3rd Qu. | 9.00 |
| Max. | 12110.00 |