Problem Statement

The California Department of Public Health detected an outbreak of a novel infectious respiratory disease in California between May and December 2023 and collected case counts, case severity, and demographic information on infected individuals. This report examines the course of the outbreak to determine whether it disproportionately affected specific demographic or geographic populations. In particular, race/ethnicity, county, and age are examined to identify populations who may benefit most from prevention and treatment resources.

Methods

Dataset 1: Simulated Novel Infectious Disease case reporting for California

This simulated dataset contains weekly infectious disease case counts for each county in California, as reported by public health agencies and organizations such as county health departments, from late May 2023 through the end of December 2023. The infection data are linked with demographic information such as age group, binary gender, and race/ethnicity.

Cleaning

  1. Column names converted to snake case and renamed.
  2. Diagnosis dates parsed in day-month-year (dmy) format.
  3. Race/ethnicity categories recoded with human-readable values to match the LA County dataset.
  4. Removed the word “county” from each county name.
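
The cleaning steps above can be sketched in a few small helpers. This is a minimal illustration using only the Python standard library, not the code used to produce this report. The example column headers and the `dd/mm/YYYY` date layout are assumptions; the source data may use different names or separators.

```python
import re
from datetime import date, datetime


def to_snake_case(name: str) -> str:
    """Convert a column header such as 'Dx Date' or 'raceEthnicity' to snake case."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())  # split camelCase
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)  # other separators become "_"
    return name.strip("_").lower()


def parse_dmy(text: str) -> date:
    """Parse a day-month-year string. The '/' separator is an assumption."""
    return datetime.strptime(text.strip(), "%d/%m/%Y").date()


def strip_county(name: str) -> str:
    """Remove a trailing 'County' (any capitalization) from a county name."""
    return re.sub(r"\s+county$", "", name.strip(), flags=re.IGNORECASE)
```

Applying `strip_county` to every county name means the statewide file uses the same bare county names, such as "Los Angeles", that the LA County file is given in its own cleaning step.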

Dataset 2: Simulated Novel Infectious Disease case reporting for Los Angeles County

This simulated dataset contains weekly reported cases of the disease in Los Angeles County, categorized by diagnosis date and patient demographics, with cumulative totals for infected, unrecovered, and severe cases. It is representative of data collected by public health agencies across LA County. The data span late May 2023 to late December 2023 and include age group, binary gender, and race/ethnicity.

Cleaning

  1. Column names converted to snake case and renamed.
  2. Diagnosis dates parsed in day-month-year (dmy) format.
  3. Created a “county” column and populated it with “Los Angeles”.

Dataset 3: California estimated population for 2023

This dataset contains simulated population data for the State of California. It includes 2023 population estimates from the CA Dept of Finance by county and demographic category (age, race, and sex).

Cleaning

  1. Column names converted to snake case and renamed to match the other sources.
  2. Race/ethnicity categories recoded to match the LA County dataset.
  3. First three age categories (0-4, 5-11, and 12-17) combined into a single 0-17 category to match the other two datasets.
  4. Removed health officer region data.
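
Combining the three youngest age categories amounts to recoding their labels and then summing population within each resulting stratum. The sketch below illustrates the idea with the Python standard library; the field names (`county`, `age_cat`, `race_ethnicity`, `pop`) are hypothetical and are not taken from the source files.

```python
from collections import defaultdict

# Age labels from the population file mapped onto the 0-17 group used by
# the two infection datasets. Other age labels pass through unchanged.
AGE_RECODE = {"0-4": "0-17", "5-11": "0-17", "12-17": "0-17"}


def collapse_age_groups(rows):
    """Recode the youngest age groups to 0-17 and re-sum population per stratum."""
    totals = defaultdict(int)
    for row in rows:
        age = AGE_RECODE.get(row["age_cat"], row["age_cat"])
        totals[(row["county"], age, row["race_ethnicity"])] += row["pop"]
    return [
        {"county": c, "age_cat": a, "race_ethnicity": r, "pop": p}
        for (c, a, r), p in totals.items()
    ]
```

Summing after recoding matters: relabeling alone would leave three separate 0-17 rows per stratum, which would later duplicate case rows when the population table is joined to the infection data.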

Analytic methods

First, the two infection datasets (Datasets 1 and 2) were combined by binding rows to produce a single infection dataset covering all counties in California. Next, the data were stratified by county and then by race/ethnicity to show the distribution of infections across geographic and racial/ethnic categories. Within each stratum, new infections and new severe infections were each summed over the entire data collection period (May to December 2023) to obtain cumulative case counts.
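
As an illustration of the row binding and stratified summation described above, the following sketch treats each dataset as a list of dictionaries. It is a simplified stand-in for the actual workflow, and the field names `new_infections` and `new_severe` are assumed for the example.

```python
from collections import defaultdict


def stratify_cases(*datasets):
    """Bind rows from all datasets, then sum counts per (county, race_ethnicity)."""
    totals = defaultdict(lambda: {"new_infections": 0, "new_severe": 0})
    for rows in datasets:  # row binding: read the datasets as one long table
        for row in rows:
            key = (row["county"], row["race_ethnicity"])
            totals[key]["new_infections"] += row["new_infections"]
            totals[key]["new_severe"] += row["new_severe"]
    return dict(totals)
```

Row binding only works cleanly because the cleaning steps gave both infection datasets the same column names, county labels, and race/ethnicity categories.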

Stratified cumulative new and severe case counts were left joined with the California population dataset, using county and race_ethnicity categories as keys. The resulting table contains the counts of cumulative new infections, new severe infections, and total population count for each stratum.

To calculate the cumulative incidence of new and severe cases per 100 individuals in each stratified demographic group, the count of new or severe cases was divided by the population count for that group and multiplied by 100.
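
The left join and the per-100 calculation can be sketched as follows. This is a hedged example rather than the report's actual code: both inputs are assumed to be dictionaries keyed by `(county, race_ethnicity)`, and the output field names are hypothetical.

```python
def cumulative_incidence(case_totals, population, per=100):
    """Left join case totals to population and compute incidence per `per` people.

    Strata with no matching (or zero) population keep their counts but get
    None for the rates, mirroring the unmatched rows of a left join.
    """
    result = {}
    for key, counts in case_totals.items():
        pop = population.get(key)
        has_pop = bool(pop)
        result[key] = {
            "new_infections": counts["new_infections"],
            "new_severe": counts["new_severe"],
            "population": pop,
            "new_per_100": per * counts["new_infections"] / pop if has_pop else None,
            "severe_per_100": per * counts["new_severe"] / pop if has_pop else None,
        }
    return result
```

For example, a stratum with 50 cumulative new infections in a population of 200 has a cumulative incidence of 100 × 50 / 200 = 25 cases per 100 people.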

To obtain weekly incidence rates, the joined infection data (Datasets 1 and 2) were grouped by age category and weekly diagnosis date. The weekly new case and severe case counts were then summed to give a total count for each week per age group. The population dataset (Dataset 3) was likewise grouped by age and left joined with the stratified weekly case data. Weekly cases per 1000 individuals in each age group were calculated by dividing the weekly case count of each age group by the population of that age group and multiplying by 1000.
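
The weekly rate calculation follows the same pattern with different grouping keys. The sketch below is illustrative only and covers new cases; severe cases would be handled the same way. The field names `dt_diagnosis`, `age_cat`, `new_infections`, and `pop` are assumptions.

```python
from collections import defaultdict


def weekly_rates(case_rows, pop_rows, per=1000):
    """Weekly new cases per `per` people for each (week, age group)."""
    # Statewide population per age group, summed over all other strata.
    pop_by_age = defaultdict(int)
    for row in pop_rows:
        pop_by_age[row["age_cat"]] += row["pop"]

    # Weekly case totals per (diagnosis week, age group).
    weekly = defaultdict(int)
    for row in case_rows:
        weekly[(row["dt_diagnosis"], row["age_cat"])] += row["new_infections"]

    rates = {}
    for (week, age), count in weekly.items():
        pop = pop_by_age.get(age)
        # Left join: keep the week even if no population matches the age group.
        rates[(week, age)] = per * count / pop if pop else None
    return rates
```

The population is summed across counties, races, and sexes before joining because the weekly strata are defined by age group alone.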

Results

Map of cumulative new infections by county

Cumulative incidence of new cases in each county was mapped. The map shows that Imperial County has the highest cumulative incidence of all the counties. In the granular data for Imperial County, the highest incidence was in the Black, Non-Hispanic population, with a cumulative incidence of 66.64 cases per 100 people. The White, Non-Hispanic population in this county was close behind, with 65.93 cases per 100 people. Coastal counties generally have lower cumulative incidence of new infections.

Table of Cumulative New Infections per 100 People per Race and Ethnic Group in each County

American Indian or Alaska Native (Non-Hispanic) and White (Non-Hispanic) race/ethnicity groups have the highest cumulative incidence of new infections per 100 people in each county in California. Specifically, the highest rates of new infections for these two groups are in Imperial County.

Discussion

The collected infection data provide insight into populations most at risk from this novel respiratory infection and most likely to benefit from targeted prevention and mitigation strategies.

Geographically, inland counties experienced higher risk than coastal counties, with Imperial County showing the highest incidence. Among race/ethnicity groups, Black (Non-Hispanic), American Indian or Alaska Native (Non-Hispanic), and White (Non-Hispanic) populations experienced the highest incidence of new infections.

The higher infection risk in inland counties may be linked to a higher proportion of agricultural occupations, which could elevate the risk of respiratory infection. Disparities by race/ethnicity may also contribute. Specifically, inland counties with larger minority populations may face disproportionate exposure risk, suggesting that more prevention resources should be directed to these populations.

Adults aged 65 and older and adults aged 18–49 appear to be at highest risk of infection. The higher risk among those aged 18–49 compared with those aged 50–64 may reflect social factors such as more time spent in public spaces like school and work, which increases this age group's exposure to infection. However, severe disease occurs disproportionately among adults aged 65 and older. Given that the burden of severe disease is likely linked to infection-related mortality, investing in protections for this group may be especially important for reducing overall deaths.

Lastly, weekly infection rates were highest from August through late October 2023. This timing suggests that time-dependent external factors, such as weather patterns, may influence individual behaviours that increase risk. Prevention strategies and preparation of testing and treatment capacity should be prioritized ahead of these peak months.