Problem Statement

The novel infectious respiratory disease outbreak in California (CA) presents an urgent public health challenge that requires robust surveillance and resource allocation strategies. As public health professionals at the California Department of Public Health (CDPH), our task is to analyze outbreak data to track the outbreak's progression, identify disproportionately affected groups and areas, and recommend targeted interventions. Using three datasets, we examine morbidity incidence by age, race/ethnicity, sex, and location to support an equitable and efficient response and to guide evidence-based strategies that prioritize prevention and treatment resources for the most affected populations and regions.

Methods

Three datasets from the CDPH were used: (1) weekly cases and case severity of a respiratory disease by demographic and geographic categories for CA counties excluding Los Angeles (LA); (2) case data by demographic and geographic categories for LA; and (3) population estimates by demographics for CA counties in 2023.

Data cleaning

The variables in all datasets were assigned the proper data types. Variable names were first standardized to a common naming convention across all datasets: all lowercase, with underscores in place of spaces. Variables in the CA cases, LA cases, and CA population datasets were renamed to match. Categorical variables (sex, age, race/ethnicity) were converted to factors with appropriate reference levels to enable proper use of statistical tools. Date variables (date of diagnosis, date of report, respiratory season week/year) were converted to date types. Date of diagnosis was re-derived to match the definition stated in the codebook. Improperly labeled values of categorical variables (race, ethnicity) were recoded and standardized across datasets. To enable joining of the datasets, a county variable was created in the LA cases dataset. After the datasets were harmonized, a single morbidity case dataset was created by combining the CA cases and LA cases datasets on their matched variable names.
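The cleaning steps above can be illustrated with a minimal sketch. It is not the code used for this report: the column names, rename maps, and example rows are hypothetical placeholders, and only the general pattern (standardize names, add a county to the LA rows, then stack the two case datasets) follows the description.

```python
import re
from typing import Optional

def to_snake_case(name: str) -> str:
    """Lowercase a column name and join its words with underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")

def standardize_row(row: dict, rename: dict, county: Optional[str] = None) -> dict:
    """Standardize column names, apply a rename map, optionally add a county."""
    out = {}
    for key, value in row.items():
        clean = to_snake_case(key)
        out[rename.get(clean, clean)] = value
    if county is not None:
        out.setdefault("county", county)
    return out

# Hypothetical input rows; the real column names are not given in the report.
ca_rows = [{"County": "Alameda", "Race Ethnicity": "Hispanic", "New Infections": 12}]
la_rows = [{"RACE_ETH": "Hispanic", "Dx New": 30}]

# Hypothetical rename map that brings the LA columns in line with the CA columns.
LA_RENAME = {"race_eth": "race_ethnicity", "dx_new": "new_infections"}

# Stack the two case datasets once their columns match.
cases = [standardize_row(r, {}) for r in ca_rows]
cases += [standardize_row(r, LA_RENAME, county="Los Angeles") for r in la_rows]
```

After this step every row carries the same set of columns, which is what allows the CA and LA case data to be combined into a single morbidity dataset.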

Morbidity data

Demographic (age, race/ethnicity, sex) and geographic (county) strata were selected for analysis. Total new infections and new severe infections were calculated by county, sex, and race/ethnicity to identify at-risk groups. These summary statistics were combined into one dataset for analysis.
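As a rough sketch of this aggregation step, weekly records can be summed within each stratum. The field names and the counts below are made up for illustration and are not taken from the CDPH data.

```python
from collections import defaultdict

def aggregate_cases(rows, keys=("county", "sex", "race_ethnicity")):
    """Sum new and severe infections within each stratum defined by `keys`."""
    totals = defaultdict(lambda: {"new_infections": 0, "new_severe": 0})
    for row in rows:
        stratum = tuple(row[k] for k in keys)
        totals[stratum]["new_infections"] += row["new_infections"]
        totals[stratum]["new_severe"] += row["new_severe"]
    return dict(totals)

# Illustrative weekly records (hypothetical values).
weekly = [
    {"county": "Kern", "sex": "Female", "race_ethnicity": "Hispanic",
     "new_infections": 10, "new_severe": 1},
    {"county": "Kern", "sex": "Female", "race_ethnicity": "Hispanic",
     "new_infections": 7, "new_severe": 0},
    {"county": "Kern", "sex": "Male", "race_ethnicity": "Hispanic",
     "new_infections": 5, "new_severe": 2},
]

summary = aggregate_cases(weekly)
```

Passing a different `keys` tuple, for example one that includes an age category, gives the same totals at a finer or coarser level of stratification.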

Population data

Total population data were aggregated by county, sex, race/ethnicity, and age and combined into one dataset.

Final data and analytical method

Aggregated morbidity data and aggregated population data were merged into one dataset by county, sex, race/ethnicity, and age category. Cumulative incidence of new infections and of new severe infections was calculated by dividing the sum of cases in each stratum by the corresponding aggregated population.
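The merge and the division can be sketched as follows. The strata, counts, and populations are hypothetical. Keeping every population stratum and treating strata without cases as zero is an assumption of this sketch, not a documented choice of the analysis.

```python
def cumulative_incidence(cases: dict, population: dict) -> dict:
    """Divide case counts by population for every stratum in `population`."""
    result = {}
    for stratum, pop in population.items():
        if pop <= 0:
            continue  # skip strata with no usable denominator
        # Assumption: a stratum with no recorded cases counts as zero cases.
        counts = cases.get(stratum, {"new_infections": 0, "new_severe": 0})
        result[stratum] = {
            "ci_new": counts["new_infections"] / pop,
            "ci_severe": counts["new_severe"] / pop,
        }
    return result

# Hypothetical aggregated inputs keyed by (county, sex).
case_totals = {("Kern", "Female"): {"new_infections": 242, "new_severe": 6}}
pop_totals = {("Kern", "Female"): 1000, ("Kern", "Male"): 500}

rates = cumulative_incidence(case_totals, pop_totals)
```

Here `rates[("Kern", "Female")]["ci_new"]` is 242 / 1000, and the stratum with no recorded cases is kept with a cumulative incidence of zero rather than being dropped from the merged data.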

Analysis and Visualizations

Figure 1: Cumulative Incidence Rate of Novel Infectious Respiratory Disease in California by County in 2023

Columns: County, cumulative incidence of new infection, cumulative incidence of severe infection
Alameda 0.090 0.003
Alpine 0.110 0.003
Amador 0.088 0.003
Butte 0.101 0.003
Calaveras 0.105 0.004
Colusa 0.211 0.006
Contra Costa 0.090 0.003
Del Norte 0.104 0.003
El Dorado 0.101 0.004
Fresno 0.085 0.002
Glenn 0.091 0.003
Humboldt 0.095 0.003
Imperial 0.438 0.011
Inyo 0.269 0.011
Kern 0.242 0.006
Kings 0.228 0.005
Lake 0.104 0.004
Lassen 0.105 0.004
Los Angeles 0.090 0.003
Madera 0.107 0.003
Marin 0.095 0.004
Mariposa 0.113 0.005
Mendocino 0.102 0.004
Merced 0.226 0.005
Modoc 0.120 0.004
Mono 0.094 0.003
Monterey 0.101 0.003
Napa 0.094 0.003
Nevada 0.098 0.004
Orange 0.088 0.003
Placer 0.089 0.003
Plumas 0.128 0.005
Riverside 0.089 0.003
Sacramento 0.194 0.005
San Benito 0.081 0.002
San Bernardino 0.225 0.006
San Diego 0.087 0.003
San Francisco 0.100 0.003
San Joaquin 0.192 0.005
San Luis Obispo 0.126 0.004
San Mateo 0.091 0.003
Santa Barbara 0.091 0.003
Santa Clara 0.091 0.003
Santa Cruz 0.100 0.003
Shasta 0.091 0.003
Sierra 0.128 0.007
Siskiyou 0.110 0.004
Solano 0.189 0.006
Sonoma 0.097 0.003
Stanislaus 0.200 0.005
Sutter 0.191 0.005
Tehama 0.196 0.006
Trinity 0.078 0.003
Tulare 0.237 0.006
Tuolumne 0.102 0.004
Ventura 0.090 0.003
Yolo 0.102 0.003
Yuba 0.173 0.004
Legend:
Red = Risk greater than 0.15 for new infections and greater than 0.005 for severe infections.
Orange = Risk greater than 0.10 for new infections.
Data source: Case data were from the California Department of Public Health. Population denominators were retrieved by demographic category and county in California for 2023.
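The legend's color rules can be written as a small function. This sketch assumes that "and" in the red rule means both thresholds must be exceeded, and it is applied here to the rounded values shown in the table, so counties sitting exactly on a threshold (for example a displayed severe value of 0.005) may be colored differently in the original figure.

```python
def risk_flag(new_ci: float, severe_ci: float) -> str:
    """Return the legend color for one county, or "none" if no rule applies."""
    # Red: both thresholds exceeded (assumed reading of the legend).
    if new_ci > 0.15 and severe_ci > 0.005:
        return "red"
    # Orange: elevated new-infection risk only.
    if new_ci > 0.10:
        return "orange"
    return "none"

# Values taken from the Figure 1 table.
imperial = risk_flag(0.438, 0.011)
yuba = risk_flag(0.173, 0.004)
alameda = risk_flag(0.090, 0.003)
```

With the displayed values, Imperial is flagged red, Yuba is flagged orange because its severe-infection risk does not exceed 0.005, and Alameda receives no flag.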

Figure 2: Cumulative Incidence of Novel Respiratory Infections by Age in California, 2023

Figure 3: Cumulative Incidence of Novel Respiratory Infections by Race/Ethnicity in California, 2023

Results

The main variables of interest for directing public health resources toward the novel respiratory disease were age, county, and race/ethnicity.

The cumulative incidence rate of respiratory infections in California is highest among individuals aged 65 and older (0.19) and lowest among those aged 0–17 (0.04).

Counties with a cumulative incidence rate of new infections greater than 0.17 were Colusa, Imperial, Inyo, Kern, Kings, Merced, Sacramento, San Bernardino, San Joaquin, Solano, Stanislaus, Sutter, Tehama, Tulare, and Yuba, with Imperial County having the highest cumulative incidence at 0.438. Counties with among the lowest cumulative incidence rates of new infections included San Diego, San Benito, Riverside, Contra Costa, Amador, Orange, and Placer, at approximately 0.08 to 0.09. Trinity County had the lowest risk of new infection at 0.078.
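The county threshold can be checked directly against the new-infection column of the Figure 1 table. The sketch below copies those displayed (rounded) values into a dictionary and applies the 0.17 cutoff. It is a verification aid, not the original analysis code.

```python
# Cumulative incidence of new infections by county, copied from Figure 1.
NEW_CI = {
    "Alameda": 0.090, "Alpine": 0.110, "Amador": 0.088, "Butte": 0.101,
    "Calaveras": 0.105, "Colusa": 0.211, "Contra Costa": 0.090,
    "Del Norte": 0.104, "El Dorado": 0.101, "Fresno": 0.085, "Glenn": 0.091,
    "Humboldt": 0.095, "Imperial": 0.438, "Inyo": 0.269, "Kern": 0.242,
    "Kings": 0.228, "Lake": 0.104, "Lassen": 0.105, "Los Angeles": 0.090,
    "Madera": 0.107, "Marin": 0.095, "Mariposa": 0.113, "Mendocino": 0.102,
    "Merced": 0.226, "Modoc": 0.120, "Mono": 0.094, "Monterey": 0.101,
    "Napa": 0.094, "Nevada": 0.098, "Orange": 0.088, "Placer": 0.089,
    "Plumas": 0.128, "Riverside": 0.089, "Sacramento": 0.194,
    "San Benito": 0.081, "San Bernardino": 0.225, "San Diego": 0.087,
    "San Francisco": 0.100, "San Joaquin": 0.192, "San Luis Obispo": 0.126,
    "San Mateo": 0.091, "Santa Barbara": 0.091, "Santa Clara": 0.091,
    "Santa Cruz": 0.100, "Shasta": 0.091, "Sierra": 0.128, "Siskiyou": 0.110,
    "Solano": 0.189, "Sonoma": 0.097, "Stanislaus": 0.200, "Sutter": 0.191,
    "Tehama": 0.196, "Trinity": 0.078, "Tulare": 0.237, "Tuolumne": 0.102,
    "Ventura": 0.090, "Yolo": 0.102, "Yuba": 0.173,
}

THRESHOLD = 0.17  # cutoff used in the Results and Discussion

# Counties above the cutoff, in alphabetical order.
high_counties = sorted(c for c, ci in NEW_CI.items() if ci > THRESHOLD)

# Counties with the highest and lowest cumulative incidence.
highest = max(NEW_CI, key=NEW_CI.get)
lowest = min(NEW_CI, key=NEW_CI.get)
```

Applied to the table, the cutoff selects 15 of the 58 counties, with Imperial highest and Trinity lowest.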

The cumulative incidence rate of respiratory infections in California was highest among non-Hispanic American Indian/Alaska Native individuals (0.14), followed by non-Hispanic White individuals (0.13) and non-Hispanic Black individuals (0.12), and lowest among non-Hispanic multiracial individuals (0.07) and non-Hispanic Asian individuals (0.09).

Discussion

Based on these analyses, the California Department of Public Health should consider directing additional treatment resources and prevention efforts to the following groups: individuals aged 65 years and older, counties with a cumulative incidence rate greater than 0.17, and non-Hispanic American Indian/Alaska Native individuals.

Adults aged 65 years and older are more susceptible to viral disease because immune function weakens with age, and they are more likely to have one or more comorbidities that increase the severity of infection.

Colusa, Imperial, Inyo, Kern, Kings, Merced, Sacramento, San Bernardino, San Joaquin, Solano, Stanislaus, Sutter, Tehama, Tulare, and Yuba Counties had high rates of new infection, so resources should be allocated to them. Several factors may have contributed to the high rates of infection in these counties. First, San Joaquin, Sacramento, Solano, Stanislaus, Tulare, Kern, Kings, and Merced Counties are located in the Central Valley, where agriculture is a large industry. These counties have more agricultural workers, some of whom may be undocumented and many of whom may have to work while sick, which could lead to more spread. Colusa, Tehama, and Yuba Counties also have agricultural industries. In contrast, Inyo, San Bernardino, and Imperial Counties lie along the eastern and southern borders of California and may have limited healthcare infrastructure and resources. Imperial County borders Mexicali, Mexico, and may have experienced higher transmission because of cross-border travel. These counties may also have had a higher percentage of essential workers who could not transition to remote work.

Overall, these counties are located away from the coastal areas of California and may have less healthcare infrastructure to support surges, more limited healthcare access, and fewer healthcare professionals relative to their populations.

The cumulative incidence of respiratory infections in California shows notable racial and ethnic disparities, with non-Hispanic American Indian/Alaska Native individuals having the highest rates, followed by non-Hispanic White and non-Hispanic Black populations. These elevated rates may reflect underlying systemic inequities, including differential access to healthcare, higher prevalence of comorbid conditions, and social determinants of health such as income, housing stability, and occupational exposures. Addressing these gaps is crucial for developing targeted, culturally appropriate interventions to reduce vulnerability and the overall burden of this novel respiratory infection outbreak.