Problem Statement

The novel infectious respiratory disease outbreak in California (CA) presents an urgent public health challenge that requires robust surveillance and resource allocation strategies. As public health professionals at the California Department of Public Health, we are tasked with analyzing outbreak data to characterize the outbreak's progression, identify disproportionately affected demographic and geographic populations, and provide actionable recommendations for resource allocation. Using three datasets, this analysis focuses on the cumulative incidence of new and severe infections across demographic groups (age, race/ethnicity, sex) and geographic locations (counties) to support an equitable and efficient response. The goal is to inform evidence-based strategies that prioritize prevention and treatment resources for the populations and regions most affected by the outbreak.

Methods

Three datasets from the CA Department of Public Health were used in this analysis: weekly case counts and case severity of a respiratory disease by demographic and geographic categories for CA counties excluding Los Angeles (LA); case data by demographic and geographic categories for LA; and 2023 population estimates by demographic group and county.

Data cleaning

To clean the data, the structure of the variables in the three datasets was first evaluated to ensure each variable had the appropriate data type. Variable names were standardized across all datasets to the naming convention established by the group (all lowercase, with underscores in place of spaces), and variables in the CA cases, LA cases, and CA population datasets were renamed to match one another. Categorical variables (sex, age, race/ethnicity) were converted to factors with appropriate reference levels so they could be used correctly in statistical tools. Date variables (date of diagnosis, date of report, respiratory season week/year) were converted to date formats, and date of diagnosis was recreated to match the definition stated in the codebook. Improperly labeled values of categorical variables (e.g., race/ethnicity) were recoded and standardized across datasets. Because the LA cases dataset had no county variable, one was created so the datasets could be combined. After the datasets were harmonized, a single morbidity case dataset was created by appending the LA cases dataset to the CA cases dataset, matching columns by variable name.
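The renaming, recoding, and combining steps described above can be sketched in Python as follows. This is an illustrative sketch only: the report does not state which software was used, and every column name, rename rule, and race/ethnicity recode below (`LA_RENAMES`, `RACE_RECODE`, `dx_date`, and so on) is hypothetical. Conversion of categorical variables to factors with reference levels is not shown.

```python
import re
from datetime import date

def to_snake_case(name: str) -> str:
    """Lowercase a column name and collapse spaces/punctuation into single underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

# Hypothetical renames (applied after snake-casing) so the LA file matches the statewide schema.
LA_RENAMES = {"dx_date": "date_of_diagnosis", "race": "race_ethnicity"}

# Hypothetical recode table for inconsistent race/ethnicity labels.
RACE_RECODE = {
    "Black": "Black, Non-Hispanic",
    "AI/AN": "American Indian/Alaska Native, Non-Hispanic",
}

def clean_rows(rows, renames=None, county=None):
    """Standardize names, recode race/ethnicity, parse ISO dates, and optionally
    add a constant county value (used for the LA file, which lacks one)."""
    renames = renames or {}
    cleaned = []
    for row in rows:
        new = {}
        for key, value in row.items():
            snake = to_snake_case(key)
            new[renames.get(snake, snake)] = value
        race = new["race_ethnicity"]
        new["race_ethnicity"] = RACE_RECODE.get(race, race)
        new["date_of_diagnosis"] = date.fromisoformat(new["date_of_diagnosis"])
        if county is not None:
            new["county"] = county
        cleaned.append(new)
    return cleaned

def combine_case_files(ca_rows, la_rows):
    """Append cleaned LA records to cleaned statewide records once the column sets agree."""
    ca = clean_rows(ca_rows)
    la = clean_rows(la_rows, renames=LA_RENAMES, county="Los Angeles")
    if ca and la and set(ca[0]) != set(la[0]):
        raise ValueError(f"column mismatch: {sorted(set(ca[0]) ^ set(la[0]))}")
    return ca + la

# Example with made-up records.
ca_raw = [{"County": "Imperial", "Sex": "Female", "Race Ethnicity": "Black",
           "Date of Diagnosis": "2023-01-05", "New Infections": 3}]
la_raw = [{"Sex": "Male", "Race": "AI/AN", "Dx Date": "2023-01-06", "New Infections": 2}]
combined = combine_case_files(ca_raw, la_raw)
```

Checking that the column sets agree before appending catches a missed rename early, rather than silently producing rows with missing fields.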

Morbidity data

To address the research question, demographic (age, race/ethnicity, sex) and geographic (county) strata were selected. Total new infections and new severe infections were summed within each combination of county, sex, race/ethnicity, and age category to identify at-risk groups. These summary statistics were combined into one dataset for analysis.
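Summing cases within strata amounts to a group-by-and-sum. A minimal Python sketch, assuming hypothetical column names (`county`, `sex`, `race_ethnicity`, `age_category`, `new_infections`, `severe_infections`) and made-up weekly records:

```python
from collections import defaultdict

# Hypothetical column names for the stratifying variables and outcome counts.
STRATA = ("county", "sex", "race_ethnicity", "age_category")
OUTCOMES = ("new_infections", "severe_infections")

def aggregate_cases(records, strata=STRATA, outcomes=OUTCOMES):
    """Sum each outcome within every observed combination of the strata."""
    totals = defaultdict(lambda: dict.fromkeys(outcomes, 0))
    for rec in records:
        key = tuple(rec[s] for s in strata)
        for outcome in outcomes:
            totals[key][outcome] += rec[outcome]
    return dict(totals)

# Made-up weekly rows: two weeks for one stratum, one week for another.
weekly = [
    {"county": "Kern", "sex": "Female", "race_ethnicity": "White, Non-Hispanic",
     "age_category": "65+", "new_infections": 4, "severe_infections": 1},
    {"county": "Kern", "sex": "Female", "race_ethnicity": "White, Non-Hispanic",
     "age_category": "65+", "new_infections": 6, "severe_infections": 0},
    {"county": "Yolo", "sex": "Male", "race_ethnicity": "Asian, Non-Hispanic",
     "age_category": "18-49", "new_infections": 2, "severe_infections": 0},
]
case_totals = aggregate_cases(weekly)
```

The same function, with a population count as the only outcome, would also cover the aggregation of the population data.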

Population data

Total population counts were aggregated by county, sex, race/ethnicity, and age category and combined into one dataset.

Final data and analytical method

The aggregated morbidity data and aggregated population data were merged into one dataset by county, sex, race/ethnicity, and age category. The cumulative incidence of new infections and of new severe infections was then calculated by dividing the total number of cases in each stratum by the corresponding population total and expressing the result as a percentage.
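The merge-and-divide step can be written as cumulative incidence (%) = 100 × (cases in the stratum) / (population of the stratum). A minimal Python sketch, assuming case totals and population denominators are both keyed by the same hypothetical (county, sex, race/ethnicity, age category) tuple; the counts are made up:

```python
def cumulative_incidence(case_totals, population, scale=100.0):
    """Match case totals to population denominators on the shared stratum key and
    return cumulative incidence (cases / population * scale) for each outcome.
    Strata with no matching or zero population are dropped, like an inner join."""
    results = {}
    for key, counts in case_totals.items():
        pop = population.get(key)
        if not pop:
            continue
        results[key] = {outcome: scale * n / pop for outcome, n in counts.items()}
    return results

key = ("Kern", "Female", "White, Non-Hispanic", "65+")
case_totals = {
    key: {"new_infections": 30, "severe_infections": 2},
    ("Yolo", "Male", "Asian, Non-Hispanic", "18-49"): {"new_infections": 5, "severe_infections": 0},
}
population = {key: 200}  # made-up denominator; the Yolo stratum has none
ci = cumulative_incidence(case_totals, population)
print(ci[key])  # {'new_infections': 15.0, 'severe_infections': 1.0}
```

Dropping unmatched strata avoids division by zero, but in a real analysis it is worth listing any strata that fail to match, since they usually point to a labeling mismatch left over from cleaning.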

Analysis and Visualizations

Figure 1: Cumulative Incidence Rate of Novel Infectious Respiratory Disease in California by County in 2023

County    New Infection (%)    Severe Infection (%)
Alameda 9.00 0.25
Alpine 10.99 0.34
Amador 8.80 0.35
Butte 10.12 0.33
Calaveras 10.48 0.45
Colusa 21.12 0.62
Contra Costa 8.99 0.28
Del Norte 10.36 0.31
El Dorado 10.10 0.38
Fresno 8.49 0.22
Glenn 9.05 0.30
Humboldt 9.51 0.31
Imperial 43.77 1.09
Inyo 26.89 1.11
Kern 24.19 0.56
Kings 22.84 0.53
Lake 10.42 0.40
Lassen 10.54 0.36
Los Angeles 9.02 0.26
Madera 10.69 0.28
Marin 9.55 0.36
Mariposa 11.27 0.46
Mendocino 10.18 0.38
Merced 22.62 0.54
Modoc 11.97 0.43
Mono 9.37 0.31
Monterey 10.11 0.30
Napa 9.42 0.30
Nevada 9.84 0.40
Orange 8.83 0.26
Placer 8.93 0.29
Plumas 12.76 0.50
Riverside 8.91 0.26
Sacramento 19.43 0.53
San Benito 8.13 0.25
San Bernardino 22.48 0.55
San Diego 8.71 0.25
San Francisco 10.04 0.28
San Joaquin 19.15 0.49
San Luis Obispo 12.59 0.45
San Mateo 9.08 0.29
Santa Barbara 9.07 0.26
Santa Clara 9.05 0.26
Santa Cruz 10.01 0.32
Shasta 9.06 0.33
Sierra 12.82 0.67
Siskiyou 11.04 0.44
Solano 18.88 0.55
Sonoma 9.65 0.34
Stanislaus 19.95 0.53
Sutter 19.15 0.53
Tehama 19.55 0.63
Trinity 7.83 0.33
Tulare 23.69 0.56
Tuolumne 10.20 0.39
Ventura 9.01 0.27
Yolo 10.20 0.26
Yuba 17.31 0.44
Legend:
Red = cumulative incidence greater than 15% for new infections and greater than 0.5% for severe infections.
Orange = cumulative incidence greater than 10% for new infections.
Data Source
Case data were obtained from the California Department of Public Health. Population denominators are 2023 estimates by demographic category and county in California.
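The legend colors do not survive in this text version of Figure 1. One reading of the legend rules is sketched below in Python, assuming the classification applies to a whole county row and that the red rule is checked before the orange rule; the original figure may instead have colored individual cells. The values are a few rows copied from the table above.

```python
def legend_color(new_pct, severe_pct):
    """Assign a highlight color using the Figure 1 legend thresholds.
    Assumption: row-level classification, red checked first."""
    if new_pct > 15 and severe_pct > 0.5:
        return "red"
    if new_pct > 10:
        return "orange"
    return "none"

# (new infection %, severe infection %) for selected counties from Figure 1.
rows = {
    "Imperial": (43.77, 1.09),
    "San Joaquin": (19.15, 0.49),
    "Yuba": (17.31, 0.44),
    "Plumas": (12.76, 0.50),
    "Trinity": (7.83, 0.33),
}
colors = {county: legend_color(*vals) for county, vals in rows.items()}
```

Under this reading, San Joaquin and Yuba fall into orange rather than red because their severe infection incidence does not exceed 0.5%, even though their new infection incidence is above 15%.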

Figure 2: Cumulative Incidence of Novel Respiratory Infections by Age in California, 2023

Figure 3: Cumulative Incidence of Novel Respiratory Infections by Race/Ethnicity in California, 2023

Results

The main variables of interest for guiding the allocation of public health resources to address the novel respiratory disease were age, county, and race/ethnicity.

The cumulative incidence rate of respiratory infections in California is highest among individuals aged 65 and older (19.1%) and lowest among those aged 0–17 (4.1%).

Counties with a cumulative incidence rate of new infections greater than 17% were Colusa, Imperial, Inyo, Kern, Kings, Merced, Sacramento, San Bernardino, San Joaquin, Solano, Stanislaus, Sutter, Tehama, Tulare, and Yuba, with Imperial County having the highest cumulative incidence at 43.77%. Counties with among the lowest cumulative incidence rates of new infections, approximately 8–9%, were San Benito, Fresno, San Diego, Amador, Orange, Riverside, Placer, and Contra Costa. Trinity County had the lowest cumulative incidence of new infections at 7.83%.
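The county lists above are threshold filters on the Figure 1 table. A Python sketch that re-derives them from the table values, transcribed as shown (rounded to two decimals), assuming the thresholds are applied to those rounded values:

```python
TABLE = """\
Alameda 9.00 0.25
Alpine 10.99 0.34
Amador 8.80 0.35
Butte 10.12 0.33
Calaveras 10.48 0.45
Colusa 21.12 0.62
Contra Costa 8.99 0.28
Del Norte 10.36 0.31
El Dorado 10.10 0.38
Fresno 8.49 0.22
Glenn 9.05 0.30
Humboldt 9.51 0.31
Imperial 43.77 1.09
Inyo 26.89 1.11
Kern 24.19 0.56
Kings 22.84 0.53
Lake 10.42 0.40
Lassen 10.54 0.36
Los Angeles 9.02 0.26
Madera 10.69 0.28
Marin 9.55 0.36
Mariposa 11.27 0.46
Mendocino 10.18 0.38
Merced 22.62 0.54
Modoc 11.97 0.43
Mono 9.37 0.31
Monterey 10.11 0.30
Napa 9.42 0.30
Nevada 9.84 0.40
Orange 8.83 0.26
Placer 8.93 0.29
Plumas 12.76 0.50
Riverside 8.91 0.26
Sacramento 19.43 0.53
San Benito 8.13 0.25
San Bernardino 22.48 0.55
San Diego 8.71 0.25
San Francisco 10.04 0.28
San Joaquin 19.15 0.49
San Luis Obispo 12.59 0.45
San Mateo 9.08 0.29
Santa Barbara 9.07 0.26
Santa Clara 9.05 0.26
Santa Cruz 10.01 0.32
Shasta 9.06 0.33
Sierra 12.82 0.67
Siskiyou 11.04 0.44
Solano 18.88 0.55
Sonoma 9.65 0.34
Stanislaus 19.95 0.53
Sutter 19.15 0.53
Tehama 19.55 0.63
Trinity 7.83 0.33
Tulare 23.69 0.56
Tuolumne 10.20 0.39
Ventura 9.01 0.27
Yolo 10.20 0.26
Yuba 17.31 0.44
"""

def parse_table(text):
    """Split each 'County new severe' line from the right so multi-word names stay intact."""
    rows = {}
    for line in text.strip().splitlines():
        county, new, severe = line.rsplit(maxsplit=2)
        rows[county] = (float(new), float(severe))
    return rows

ci = parse_table(TABLE)
above_17 = sorted(c for c, (new, _) in ci.items() if new > 17)
below_9 = sorted(c for c, (new, _) in ci.items() if new < 9)
highest = max(ci, key=lambda c: ci[c][0])
lowest = min(ci, key=lambda c: ci[c][0])
print(len(above_17), highest, lowest)  # 15 Imperial Trinity
```

Splitting from the right is what lets names such as "San Luis Obispo" parse correctly from the space-separated table.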

The cumulative incidence rate of respiratory infections in California is highest among non-Hispanic American Indian/Alaska Native individuals (14%), followed by non-Hispanic White individuals (12.8%) and non-Hispanic Black individuals (12.3%), and lowest among non-Hispanic multiracial individuals (7.2%) and non-Hispanic Asian individuals (8.7%).

Discussion

Based on these analyses, the California Department of Public Health should consider directing additional treatment resources and prevention efforts to the following groups: individuals aged 65 years and older, residents of counties with a cumulative incidence of new infections greater than 17%, and non-Hispanic American Indian/Alaska Native individuals.

Adults aged 65 years and older are more susceptible to viral disease because immune function weakens with age, and they are more likely to have one or more comorbidities that increase the severity of infection.

Colusa, Imperial, Inyo, Kern, Kings, Merced, Sacramento, San Bernardino, San Joaquin, Solano, Stanislaus, Sutter, Tehama, Tulare, and Yuba Counties had high rates of new infection, and resources should be allocated to them accordingly. Several factors may have contributed to the high rates of infection in these counties. First, San Joaquin, Sacramento, Solano, Stanislaus, Tulare, Kern, Kings, and Merced Counties are located in the Central Valley, where agriculture is the main industry. These counties have large agricultural workforces that include many undocumented workers, who may be unable to stay home when sick, leading to more spread. Colusa, Sutter, Tehama, and Yuba Counties also have agriculture-based economies, although on a smaller scale than the counties listed above. Inyo, San Bernardino, and Imperial Counties lie along California's eastern and southern borders and may have limited healthcare infrastructure and resources. Imperial County sits on the U.S.–Mexico border adjacent to Mexicali and may have experienced higher transmission because of cross-border travel. These counties may also have had a higher percentage of essential workers who could not transition to remote work. Overall, these counties are located away from California's coast and may have less healthcare infrastructure to absorb surges in cases and more limited access to care; they typically have fewer healthcare professionals per capita than the coastal and more affluent counties of California.

The cumulative incidence rate of respiratory infections in California reveals notable disparities across racial and ethnic groups, highlighting critical areas for public health intervention. Non-Hispanic American Indian/Alaska Native individuals experienced the highest incidence rate, followed by non-Hispanic White and non-Hispanic Black individuals. These elevated rates may reflect underlying systemic inequities, including differential access to healthcare, higher prevalence of comorbid conditions, and social determinants of health such as income, housing stability, and occupational exposure. Understanding these disparities is essential for tailoring evidence-based prevention and treatment efforts, particularly for populations with heightened vulnerability. Focused resources and culturally appropriate interventions can help mitigate these inequities and reduce the overall burden of respiratory infections.