2024-12-02

Problem Statement

The California Department of Public Health (CDPH) is facing a critical challenge in addressing an ongoing infectious disease outbreak across the state. As public health professionals, we must understand the trajectory of this outbreak, identify populations that are being disproportionately affected, and determine the most effective allocation of public health resources. Therefore, our task is to analyze the current outbreak data to map its course, paying particular attention to how it may be affecting different demographic groups and geographic areas within California. This analysis is crucial for several reasons: it will help us identify vulnerable populations that may require targeted interventions, understand how population density and distribution are influencing disease spread, and determine if certain communities are experiencing higher attack rates or facing greater barriers to care. By gaining these insights, we can make informed decisions about where to allocate limited prevention and treatment resources, ensuring that our response is both equitable and effective in mitigating the outbreak’s impact across all of California’s diverse communities.

Methods

Dataset 1: novel_id_ca

This dataset contains weekly data available to CDPH about a novel infectious respiratory disease outbreak from May 29, 2023 to December 25, 2023. It provides the number of cases and case severity by demographic category (age category, race, sex) and geographic category (county) for all California counties except Los Angeles County. Because of that county's large size, its data are held in a separate dataset (Dataset 2).

To properly analyze the course and impact of the outbreak, it was necessary to combine all available datasets into one complete master dataset. This required cleaning and recoding existing variables for consistency across datasets, as well as creating new variables for further analysis. Within this dataset, the “dt_diagnosis” column was converted to a date class, and the “race_ethnicity” column was recoded from numeric codes to character labels.
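The two cleaning steps described above can be sketched as follows. This is a minimal illustration in Python (not necessarily the tool used for the original analysis). The numeric codes in RACE_LABELS and the ISO date format are assumptions, since the source does not list them; only the column names “dt_diagnosis” and “race_ethnicity” and the label text come from the report.

```python
from datetime import date

# Hypothetical code-to-label mapping: the numeric codes are placeholders,
# the labels are the ones that appear in the report's tables.
RACE_LABELS = {
    1: "White, Non-Hispanic",
    2: "Black, Non-Hispanic",
    3: "American Indian or Alaska Native, Non-Hispanic",
    4: "Asian, Non-Hispanic",
    5: "Native Hawaiian or Pacific Islander, Non-Hispanic",
    6: "Multiracial (two or more of above races), Non-Hispanic",
    7: "Hispanic (any race)",
}


def clean_row(row):
    """Return a copy of one record with a parsed date and a labelled race/ethnicity."""
    cleaned = dict(row)
    # Assumes dates are stored as ISO strings (YYYY-MM-DD).
    cleaned["dt_diagnosis"] = date.fromisoformat(row["dt_diagnosis"])
    cleaned["race_ethnicity"] = RACE_LABELS[int(row["race_ethnicity"])]
    return cleaned


# Toy record, not real outbreak data.
raw = {"county": "Alameda County", "dt_diagnosis": "2023-05-29",
       "race_ethnicity": "2", "new_infections": 4}
print(clean_row(raw))
```

An unknown code raises a KeyError here, which surfaces unexpected values early instead of silently passing them through.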

Dataset 2: novel_id_la

This dataset contains weekly outbreak morbidity data and population data by demographic and geographic categories available to CDPH for Los Angeles County from May 29, 2023 to December 25, 2023.

To prepare this dataset for merging with the other available datasets, a “county” column was created so that Los Angeles County could be incorporated with the other county data. All columns in this dataset were renamed to match those in Dataset 1 (novel_id_ca), and the column “DT_REPORT” was removed because it would not be used for analysis. The column “dt_diagnosis” was converted to a date class, after which Dataset 1 (novel_id_ca) and Dataset 2 (novel_id_la) were ready to be joined. After these datasets were combined, the “time_int” column was converted to a date class in order to appropriately reflect epi week.
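The harmonization of the Los Angeles data can be sketched as below, again in Python for illustration only. The source names only “DT_REPORT” among the original Los Angeles columns, so the other original column names in LA_TO_CA_NAMES are hypothetical, as is the exact county label.

```python
# Hypothetical original LA column names mapped to the Dataset 1 names.
# Only "DT_REPORT" is named in the report; the rest are placeholders.
LA_TO_CA_NAMES = {
    "DT_DX": "dt_diagnosis",
    "AGE_CATEGORY": "age_cat",
    "RACE_ETH": "race_ethnicity",
    "DX_NEW": "new_infections",
}


def harmonize_la_row(row):
    """Add a county column, rename columns, and drop DT_REPORT."""
    out = {"county": "Los Angeles County"}  # assumed label format
    for old_name, value in row.items():
        if old_name == "DT_REPORT":
            continue  # not used for analysis
        out[LA_TO_CA_NAMES.get(old_name, old_name.lower())] = value
    return out


def combine(ca_rows, la_rows):
    """Stack the statewide rows and the harmonized LA rows."""
    return list(ca_rows) + [harmonize_la_row(r) for r in la_rows]


# Toy records, not real outbreak data.
ca = [{"county": "Imperial County", "dt_diagnosis": "2023-05-29",
       "race_ethnicity": 3, "new_infections": 7}]
la = [{"DT_DX": "2023-05-29", "DT_REPORT": "2023-06-02",
       "RACE_ETH": 3, "DX_NEW": 12}]
print(combine(ca, la))
```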

Dataset 3: ca_pop

This dataset contains population estimates available to CDPH for 2023 by California county and demographic categories (age category, race, sex).

To create a master dataset, the “race_ethnicity” column in this dataset was recoded from numeric to character, as was done in Dataset 1. Data were then grouped by county and race_ethnicity (the strata of interest for this investigation), and two new columns were created to provide the population by race and county (pop_race_county) and the total county population (total_county_pop). Finally, this dataset was joined with the morbidity data (Datasets 1 and 2) to produce a master dataset.
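A minimal sketch of how the two population columns could be derived is shown below. It assumes each population row carries a count in a column called “pop” (a hypothetical name) and that rows are finer-grained than county and race (for example, split by age category and sex), so they must be summed.

```python
from collections import defaultdict


def add_population_columns(pop_rows):
    """Aggregate population rows to one row per county and race/ethnicity."""
    pop_race_county = defaultdict(int)
    total_county_pop = defaultdict(int)
    for r in pop_rows:
        pop_race_county[(r["county"], r["race_ethnicity"])] += r["pop"]
        total_county_pop[r["county"]] += r["pop"]
    return [
        {
            "county": county,
            "race_ethnicity": race,
            "pop_race_county": n,
            "total_county_pop": total_county_pop[county],
        }
        for (county, race), n in pop_race_county.items()
    ]


# Toy population rows, not real estimates.
pop_rows = [
    {"county": "County A", "race_ethnicity": "Group X", "sex": "FEMALE", "pop": 120},
    {"county": "County A", "race_ethnicity": "Group X", "sex": "MALE", "pop": 80},
    {"county": "County A", "race_ethnicity": "Group Y", "sex": "FEMALE", "pop": 300},
]
for row in add_population_columns(pop_rows):
    print(row)
```

The resulting rows can then be joined to the morbidity data on county and race_ethnicity.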

Analytic Methods

Once all datasets were combined, analysis was performed using this master dataset. To evaluate the course of the outbreak…

To evaluate the effect of the outbreak on demographic and geographic populations, and to inform the allocation of prevention and treatment resources, the total number of infections and the total number of severe infections were examined by racial/ethnic group and county of residence. For total infections, the dataset was first grouped by county and race/ethnicity. The sum of new infections within each group was then divided by that group's population (population by race and county), and the result was multiplied by 100 to obtain the percentage of individuals in each group who had contracted the disease. This identified the demographic and geographic populations most affected by the outbreak and thus potentially in greatest need of prevention resources.
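The percentage calculation described above can be sketched as follows. This is an illustrative Python version under the assumption that each row of the master dataset carries “county”, “race_ethnicity”, “new_infections”, and “pop_race_county”; the function name and toy values are hypothetical.

```python
from collections import defaultdict


def percent_infected(rows, count_col="new_infections"):
    """Percent of each county/race group infected over the whole period."""
    cases = defaultdict(int)
    pop = {}
    for r in rows:
        key = (r["county"], r["race_ethnicity"])
        cases[key] += r[count_col]       # sum weekly counts within the group
        pop[key] = r["pop_race_county"]  # constant within a group
    # Skip groups with zero population to avoid division by zero.
    return {k: round(100 * cases[k] / pop[k], 2) for k in cases if pop[k] > 0}


# Toy weekly rows, not real outbreak data.
rows = [
    {"county": "County A", "race_ethnicity": "Group X",
     "new_infections": 30, "pop_race_county": 200},
    {"county": "County A", "race_ethnicity": "Group X",
     "new_infections": 20, "pop_race_county": 200},
    {"county": "County B", "race_ethnicity": "Group X",
     "new_infections": 5, "pop_race_county": 1000},
]
print(percent_infected(rows))
```

For the first group, (30 + 20) / 200 × 100 gives 25%. Passing a different count_col allows the same function to be reused for other case counts.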

Next, a nearly identical analysis was performed for severe infections. Again grouping by county and race/ethnicity, the sum of new severe infections was divided by the total population by race and county. The resulting values were then multiplied by 100 to obtain the percentage of individuals within each group that had contracted severe disease. This, in turn, identified the demographic and geographic populations most affected by severe manifestations of the outbreak and thus potentially in greatest need of treatment resources.
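Once percentages are available for every county and racial/ethnic group, the "top ten" lists reported in the Results amount to a descending sort. A minimal sketch, assuming the percentages are held in a dictionary keyed by (county, race/ethnicity); the function name and values are illustrative only.

```python
def top_n(percent_by_group, n=10):
    """Return the n groups with the highest percentage, highest first."""
    ranked = sorted(percent_by_group.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


# Toy percentages, not real outbreak data.
severe_pct = {
    ("County A", "Group X"): 1.2,
    ("County A", "Group Y"): 0.4,
    ("County B", "Group X"): 1.9,
}
for (county, race), pct in top_n(severe_pct, n=2):
    print(county, race, pct)
```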

Results

Geographic and Demographic Impact of the Outbreak

Table 1 depicts the top ten populations infected with disease by race/ethnicity and county. Imperial County appears to be the most severely affected, with 7 out of the 10 highest infection rates. The highest infection rate is among Black residents of Imperial County at 46.4%. Other racial/ethnic groups in Imperial County also show high infection rates, ranging from 39.89% to 43.19%. Inyo County is the only other county represented in this top ten list, with three racial/ethnic groups showing high infection rates, particularly among Native Hawaiian or Pacific Islander residents at 40%.

The data suggest a disproportionate impact on minority communities, particularly in Imperial County.

Table 1: Top 10 Populations Infected with Disease by Race/Ethnicity and County
County | Race/Ethnicity | Population Infected (%)
Imperial County | Black, Non-Hispanic | 46.40
Imperial County | Asian, Non-Hispanic | 43.19
Imperial County | American Indian or Alaska Native, Non-Hispanic | 43.17
Imperial County | Native Hawaiian or Pacific Islander, Non-Hispanic | 43.00
Imperial County | White, Non-Hispanic | 42.27
Imperial County | Multiracial (two or more of above races), Non-Hispanic | 40.40
Inyo County | Native Hawaiian or Pacific Islander, Non-Hispanic | 40.00
Imperial County | Hispanic (any race) | 39.89
Inyo County | Black, Non-Hispanic | 33.00
Inyo County | Asian, Non-Hispanic | 32.07
*Source: CDPH

Interpretation: This table displays the 10 populations most affected by the outbreak, measured as the percentage of each racial/ethnic group's county population that was infected. Imperial and Inyo counties appear to be disproportionately affected from a geographic standpoint, with the Black, Asian, and Native Hawaiian/Pacific Islander populations most affected within those counties.

Table 2: California Population Infected with Disease by Race/Ethnicity and County

Geographic and Demographic Impact of Severe Disease

Table 3 depicts the top ten populations infected with severe disease by race/ethnicity and county. The highest rate of severe infection is among Asian residents in Plumas County at 1.89%. This is followed closely by Native Hawaiian or Pacific Islander residents in Madera County at 1.74%. Imperial County appears twice in the top ten, with White (1.67%) and Multiracial (1.61%) populations showing relatively high rates of severe infection. This aligns with data presented above indicating that Imperial County has faced significant challenges with the outbreak.

Several rural counties are represented in this list, including Plumas, Madera, Tehama, Colusa, Inyo, and Sutter. This suggests that some rural areas may be disproportionately affected by severe cases of the disease. Furthermore, Native Hawaiian or Pacific Islander populations appear three times in the top ten, indicating that this group may be at higher risk for severe infection. While overall rates of severe infection remain low, the data suggest potential disparities in severe infection rates among racial and ethnic groups across counties that warrant close surveillance.

Table 3: Top 10 Populations Infected with Severe Disease by Race/Ethnicity and County
County | Race/Ethnicity | Population with Severe Infection (%)
Plumas County | Asian, Non-Hispanic | 1.89
Madera County | Native Hawaiian or Pacific Islander, Non-Hispanic | 1.74
Imperial County | White, Non-Hispanic | 1.67
Imperial County | Multiracial (two or more of above races), Non-Hispanic | 1.61
Tehama County | Native Hawaiian or Pacific Islander, Non-Hispanic | 1.49
Colusa County | Asian, Non-Hispanic | 1.39
Inyo County | White, Non-Hispanic | 1.38
Sutter County | Native Hawaiian or Pacific Islander, Non-Hispanic | 1.31
Inyo County | Asian, Non-Hispanic | 1.27
Colusa County | American Indian or Alaska Native, Non-Hispanic | 1.24
*Source: CDPH

Interpretation: This table displays the 10 populations most affected by severe cases in the outbreak, measured as the percentage of each racial/ethnic group's county population that contracted severe disease. The Asian population in Plumas County has been most affected, though only 1.89% of this population contracted severe disease. Therefore, these data suggest that the rate of severe infection remains low and has not significantly impacted a particular geographic and/or demographic population.

Table 4: California Population Infected with Severe Disease by Race/Ethnicity and County

Plots of California Population Infected with Disease and Severe Disease by Race/Ethnicity and County

The plots display the distribution of cases as a percentage of the population of each racial/ethnic group in each county. Distributions are shown for normal cases and for severe cases (defined as cases requiring hospitalization). For all racial groups, the interquartile range (IQR) of percent infected lies between roughly 8% and 20%, with a median of ~10% and a right-skewed distribution. This suggests a fairly similar distribution of general infection across groups, apart from the counties and demographics of concern identified in Table 1.
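The median and IQR quoted for these distributions can be computed from the per-county percentages of a given racial/ethnic group. The sketch below uses Python's statistics module with the "inclusive" quantile method; which quantile definition the original plots used is not stated, so that choice is an assumption, and the values are toy numbers.

```python
import statistics


def summarize(percentages):
    """Return quartiles and IQR for a list of per-county percentages."""
    q1, median, q3 = statistics.quantiles(percentages, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Toy per-county percentages for one group, not real outbreak data.
pct_by_county = [8, 9, 10, 14, 20]
print(summarize(pct_by_county))
```

In a right-skewed distribution such as this toy example, the median sits closer to the first quartile than to the third.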

In contrast, the distribution of severe infection varies slightly: although Asian and Pacific Islander populations are most impacted by severe disease in certain counties, the IQR and median of severe infection are highest among White populations. As noted above, the rate of severe infection is low overall and does not suggest that action beyond standard surveillance is needed.

This plot depicts the total number of new cases (incidence) per 1,000 people by epi week of the outbreak. The outbreak started in epi week 22, and incidence peaked in weeks 35-39. No new cases were reported in weeks 49, 51, and 52. Data are additionally stratified by race, depicting the total number of new cases by epi week for each racial/ethnic group. During the peak of the outbreak, incidence was disproportionately higher among the population identifying as American Indian or Alaska Native, Non-Hispanic than among other racial/ethnic groups.
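Weekly incidence per 1,000 people can be sketched as below. For simplicity this illustration assumes “time_int” holds the epi week as an integer and that a single population denominator applies to all rows passed in; both are assumptions, and the numbers are toy values. Stratifying by race would mean calling the function once per group with that group's rows and population.

```python
from collections import defaultdict


def weekly_incidence_per_1000(rows, population):
    """New cases per 1,000 people for each epi week, in week order."""
    weekly_cases = defaultdict(int)
    for r in rows:
        weekly_cases[r["time_int"]] += r["new_infections"]
    return {week: 1000 * n / population
            for week, n in sorted(weekly_cases.items())}


# Toy rows, not real outbreak data.
rows = [
    {"time_int": 22, "new_infections": 10},
    {"time_int": 22, "new_infections": 5},
    {"time_int": 35, "new_infections": 100},
]
incidence = weekly_incidence_per_1000(rows, population=5000)
peak_week = max(incidence, key=incidence.get)
print(incidence, peak_week)
```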

Discussion

The results detailed above indicate that certain demographic and geographic populations are disproportionately affected by the disease outbreak. Notably, Imperial County has the highest overall infection rates, particularly among certain minority populations (Black, Native Hawaiian or Pacific Islander, and Asian). This suggests a significant outbreak in this area, while rural counties such as Plumas, Madera, Tehama, and Colusa also show relatively high rates of severe infection. To address these disparities, resources should be prioritized for Imperial County and targeted interventions developed for vulnerable groups. Additionally, rural counties exhibiting high severe case rates should receive adequate medical support. Finally, culturally appropriate prevention strategies and community outreach programs tailored to the most affected populations are essential for effective resource allocation and management of the outbreak across California’s diverse communities.

Appendix

Descriptive Statistics

Table 5. Counties with the highest percent of each racial group
Race/Ethnicity | County | Percent of County
American Indian or Alaska Native, Non-Hispanic | Alpine County | 18.80
Asian, Non-Hispanic | Santa Clara County | 32.85
Black, Non-Hispanic | Solano County | 14.19
Hispanic (any race) | Imperial County | 82.27
Multiracial (two or more of above races), Non-Hispanic | Solano County | 5.52
Native Hawaiian or Pacific Islander, Non-Hispanic | San Mateo County | 1.44
White, Non-Hispanic | Sierra County | 86.30

Table 6. Age Distribution in California
Age | Count (n) | Percent
0-17 | 8838544 | 21.90
18-49 | 17084891 | 42.34
50-64 | 7407769 | 18.36
65+ | 7023013 | 17.40

Table 7. Gender Distribution in California
Sex | Count (n) | Percent
FEMALE | 20203669 | 50.07
MALE | 20150548 | 49.93

Table 8. Summary of New Infections
Min. | 0.00
1st Qu. | 0.00
Median | 1.00
Mean | 45.18
3rd Qu. | 9.00
Max. | 12110.00

Table 9. Summary of Severe Infections
Min. | 0.00
1st Qu. | 0.00
Median | 1.00
Mean | 45.18
3rd Qu. | 9.00
Max. | 12110.00

Data Dictionary

Variable | Type | Description
age_cat | character | Age group (age is categorized for simplicity)
time_int | double | Epi week; allows for standardization of data
new_infections | integer | Newly reported cases of the infectious disease per epi week
pop_race_county | integer | Population of each racial/ethnic group within a county
total_county_pop | integer | Total population of a county in the state of CA