Milestone_6
A Novel Respiratory Infectious Disease Outbreak in California
Problem Statement
The California Department of Public Health is monitoring a simulated outbreak of a novel respiratory infectious disease among California residents during 2023–2024. Weekly surveillance data include counts of new infections and disease severity, stratified by age group, sex, race/ethnicity, and geographic area, and summarized by health officer region. Characterizing how the outbreak unfolded across regions and identifying populations that experienced a higher burden of disease are essential for informing timely public health response.
The objective of this analysis is to describe the course of the outbreak from June 2023 through January 2024 and to assess whether incidence and severity differed across demographic and geographic groups. By examining cumulative incidence rates and severe disease burden across health officer regions, this report aims to highlight populations that may benefit from targeted prevention, treatment, or resource allocation. These findings are intended to support surveillance planning and equitable distribution of public health resources during an active outbreak.
Methods
Three datasets were used in this analysis. Datasets one (sim_novelid_CA) and two (sim_novelid_LACounty) are from the California Department of Public Health and contain weekly surveillance data on cases and case severity from May 29, 2023 to December 25, 2023. Both datasets stratify case data by age, race/ethnicity, and sex. Dataset one covers all California counties except Los Angeles County, and dataset two covers Los Angeles County only. Dataset three (ca_pop_2023) is from the California Department of Finance and provides 2023 population estimates for California by county and several demographic variables.
We first combined datasets one and two to create a complete outbreak morbidity dataset for the entire state of California. Cleaning involved converting column names to snake case and standardizing them across datasets, converting dates from character values to date values, standardizing county names, and recoding race/ethnicity values from numeric to character values so that both datasets used the same categories. We then cleaned dataset three so that it could be joined to the morbidity dataset by the strata of interest. This included renaming race/ethnicity values and recoding age categories to match the morbidity dataset.
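The cleaning steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the report: the column names, the ISO date format, the absence of a county column in the Los Angeles file, and the numeric race/ethnicity codebook (`RACE_ETH_LABELS`) are all hypothetical.

```python
import re
from datetime import datetime

# Hypothetical numeric-to-label codebook; the real coding is not given in the report.
RACE_ETH_LABELS = {1: "White, Non-Hispanic", 2: "Black, Non-Hispanic"}


def to_snake_case(name):
    """Convert names such as 'New Infections' or 'newInfections' to snake_case."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()


def clean_row(row, county=None):
    """Standardize one surveillance record (a dict) to a shared schema."""
    out = {to_snake_case(key): value for key, value in row.items()}
    # Character dates become date objects (ISO format assumed).
    out["week_start"] = datetime.strptime(out["week_start"], "%Y-%m-%d").date()
    # Standardize county names: drop a trailing "county" and use title case.
    county_name = county if county is not None else out["county"]
    county_name = re.sub(r"\s+county$", "", county_name.strip(), flags=re.IGNORECASE)
    out["county"] = county_name.title()
    # Recode numeric race/ethnicity values to character labels.
    if isinstance(out["race_ethnicity"], int):
        out["race_ethnicity"] = RACE_ETH_LABELS[out["race_ethnicity"]]
    return out


# Made-up example records standing in for the two morbidity datasets.
ca_rows = [{"County": "alameda county", "Week Start": "2023-05-29",
            "Race_Ethnicity": "White, Non-Hispanic", "New Infections": 12}]
la_rows = [{"weekStart": "2023-05-29", "raceEthnicity": 2, "newInfections": 30}]

# Stack the cleaned records into one statewide dataset.
statewide = [clean_row(r) for r in ca_rows]
statewide += [clean_row(r, county="Los Angeles") for r in la_rows]
```

Once both sources share the same column names and value coding, stacking them is a simple concatenation, and later joins to the population data can rely on consistent keys.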
To identify the regions of California most affected by the outbreak, we calculated weekly cumulative incidence and subset the data by health officer region.
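As a rough sketch of this step, the example below sums new infections by health officer region and week, converts each total to a rate per 100,000, and then subsets to one region. The county names, region names, populations, and counts are placeholders invented for illustration; the actual county-to-region assignment is not reproduced here.

```python
from collections import defaultdict

# Placeholder lookup tables, invented for illustration only.
COUNTY_TO_REGION = {"County A": "Region X", "County B": "Region X", "County C": "Region Y"}
REGION_POPULATION = {"Region X": 200_000, "Region Y": 150_000}


def weekly_incidence_by_region(rows):
    """Sum new infections per (region, week) and express them per 100,000 population."""
    counts = defaultdict(int)
    for row in rows:
        region = COUNTY_TO_REGION[row["county"]]
        counts[(region, row["week_start"])] += row["new_infections"]
    return {
        (region, week): round(100_000 * total / REGION_POPULATION[region], 1)
        for (region, week), total in counts.items()
    }


rows = [
    {"county": "County A", "week_start": "2023-08-28", "new_infections": 400},
    {"county": "County B", "week_start": "2023-08-28", "new_infections": 600},
    {"county": "County C", "week_start": "2023-08-28", "new_infections": 300},
]

rates = weekly_incidence_by_region(rows)

# Subset to a single health officer region.
region_x = {key: rate for key, rate in rates.items() if key[0] == "Region X"}
```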
| Health Officer Region | Race/Ethnicity | Age Category | Total New Infections | Population (2023) | Cumulative incidence per 100,000 |
|---|---|---|---|---|---|
| Central California | White, Non-Hispanic | 65+ | 100740 | 8812928 | 1143.1 |
| Central California | American Indian or Alaska Native, Non-Hispanic | 65+ | 1526 | 145142 | 1051.4 |
| Central California | Black, Non-Hispanic | 65+ | 6725 | 639685 | 1051.3 |
| Greater Sierra Sacramento | Black, Non-Hispanic | 65+ | 6804 | 688510 | 988.2 |
| Central California | White, Non-Hispanic | 18-49 | 134456 | 14216879 | 945.7 |
| Central California | Asian, Non-Hispanic | 65+ | 13570 | 1545722 | 877.9 |
| Central California | Hispanic (any race) | 65+ | 51568 | 5957704 | 865.6 |
| Greater Sierra Sacramento | American Indian or Alaska Native, Non-Hispanic | 65+ | 954 | 110639 | 862.3 |
| Central California | Black, Non-Hispanic | 18-49 | 22417 | 2661071 | 842.4 |
| Greater Sierra Sacramento | Black, Non-Hispanic | 18-49 | 19463 | 2358294 | 825.3 |
| Greater Sierra Sacramento | White, Non-Hispanic | 65+ | 86617 | 10774453 | 803.9 |
| Central California | American Indian or Alaska Native, Non-Hispanic | 18-49 | 2682 | 333715 | 803.7 |
| Central California | Native Hawaiian or Pacific Islander, Non-Hispanic | 65+ | 489 | 63240 | 773.2 |
| Greater Sierra Sacramento | Asian, Non-Hispanic | 65+ | 13641 | 1791025 | 761.6 |
| Central California | Hispanic (any race) | 18-49 | 260337 | 35672599 | 729.8 |
| Greater Sierra Sacramento | American Indian or Alaska Native, Non-Hispanic | 18-49 | 1675 | 233802 | 716.4 |
| Central California | Native Hawaiian or Pacific Islander, Non-Hispanic | 18-49 | 1274 | 180265 | 706.7 |
| Greater Sierra Sacramento | Hispanic (any race) | 65+ | 13617 | 1943948 | 700.5 |
| Central California | Multiracial (two or more of above races), Non-Hispanic | 65+ | 2481 | 356159 | 696.6 |
| Greater Sierra Sacramento | White, Non-Hispanic | 18-49 | 115459 | 17128244 | 674.1 |
| Greater Sierra Sacramento | Native Hawaiian or Pacific Islander, Non-Hispanic | 65+ | 732 | 113739 | 643.6 |
| Central California | Multiracial (two or more of above races), Non-Hispanic | 18-49 | 10807 | 1704287 | 634.1 |
| Greater Sierra Sacramento | Hispanic (any race) | 18-49 | 65496 | 10830098 | 604.8 |
| Central California | Asian, Non-Hispanic | 18-49 | 30292 | 5072375 | 597.2 |
| Greater Sierra Sacramento | Native Hawaiian or Pacific Islander, Non-Hispanic | 18-49 | 2017 | 347975 | 579.6 |
| Greater Sierra Sacramento | Asian, Non-Hispanic | 18-49 | 31796 | 5753476 | 552.6 |
| Greater Sierra Sacramento | Multiracial (two or more of above races), Non-Hispanic | 65+ | 2158 | 407743 | 529.3 |
| Greater Sierra Sacramento | Multiracial (two or more of above races), Non-Hispanic | 18-49 | 12055 | 2410250 | 500.2 |
| Central California | White, Non-Hispanic | 50-64 | 36439 | 7719775 | 472.0 |
| Central California | American Indian or Alaska Native, Non-Hispanic | 50-64 | 699 | 156488 | 446.7 |
| Central California | Black, Non-Hispanic | 50-64 | 4204 | 953839 | 440.7 |
| Greater Sierra Sacramento | Black, Non-Hispanic | 50-64 | 4024 | 956970 | 420.5 |
| Greater Sierra Sacramento | American Indian or Alaska Native, Non-Hispanic | 50-64 | 453 | 118110 | 383.5 |
| Central California | Hispanic (any race) | 50-64 | 38714 | 10169984 | 380.7 |
| Central California | Asian, Non-Hispanic | 50-64 | 6354 | 1877453 | 338.4 |
| Greater Sierra Sacramento | White, Non-Hispanic | 50-64 | 30750 | 9283601 | 331.2 |
| Central California | Multiracial (two or more of above races), Non-Hispanic | 50-64 | 1378 | 421011 | 327.3 |
| Greater Sierra Sacramento | Hispanic (any race) | 50-64 | 9957 | 3164604 | 314.6 |
| Greater Sierra Sacramento | Asian, Non-Hispanic | 50-64 | 6568 | 2142906 | 306.5 |
| Greater Sierra Sacramento | Native Hawaiian or Pacific Islander, Non-Hispanic | 50-64 | 445 | 146816 | 303.1 |
| Central California | Native Hawaiian or Pacific Islander, Non-Hispanic | 50-64 | 255 | 86056 | 296.3 |
| Central California | White, Non-Hispanic | 0-17 | 21120 | 7513625 | 281.1 |
| Central California | American Indian or Alaska Native, Non-Hispanic | 0-17 | 379 | 147374 | 257.2 |
| Central California | Black, Non-Hispanic | 0-17 | 3442 | 1447390 | 237.8 |
| Greater Sierra Sacramento | Multiracial (two or more of above races), Non-Hispanic | 50-64 | 1338 | 574120 | 233.1 |
| Greater Sierra Sacramento | Black, Non-Hispanic | 0-17 | 2630 | 1173381 | 224.1 |
| Greater Sierra Sacramento | Native Hawaiian or Pacific Islander, Non-Hispanic | 0-17 | 253 | 131347 | 192.6 |
| Greater Sierra Sacramento | White, Non-Hispanic | 0-17 | 14838 | 7920934 | 187.3 |
| Greater Sierra Sacramento | American Indian or Alaska Native, Non-Hispanic | 0-17 | 196 | 105152 | 186.4 |
| Central California | Hispanic (any race) | 0-17 | 45211 | 25151478 | 179.8 |
| Central California | Asian, Non-Hispanic | 0-17 | 4422 | 2606573 | 169.6 |
| Greater Sierra Sacramento | Asian, Non-Hispanic | 0-17 | 4036 | 2545100 | 158.6 |
| Central California | Native Hawaiian or Pacific Islander, Non-Hispanic | 0-17 | 140 | 89683 | 156.1 |
| Greater Sierra Sacramento | Hispanic (any race) | 0-17 | 10120 | 6918921 | 146.3 |
| Central California | Multiracial (two or more of above races), Non-Hispanic | 0-17 | 2396 | 1693654 | 141.5 |
| Greater Sierra Sacramento | Multiracial (two or more of above races), Non-Hispanic | 0-17 | 2298 | 2095352 | 109.7 |

Note: Rates are calculated as total new infections divided by the 2023 population for each health officer region, race/ethnicity, and age group, multiplied by 100,000. Cells highlighted in red indicate values greater than the overall mean cumulative incidence rate across all groups.
Results
Weekly trends in cumulative incidence for this outbreak were similar across all health officer regions in California, with synchronous increases and declines over the course of the outbreak. Across regions, cumulative incidence peaked between August 28 and September 18, 2023. The mean weekly cumulative incidence rate during the outbreak period was 402.45 cases per 100,000 population. From August 21 through October 23, 2023, weekly cumulative incidence rates in all health officer regions exceeded this mean. Despite these shared patterns, two regions experienced a disproportionately higher burden of disease: Central California and Greater Sierra Sacramento. These regions consistently exhibited higher cumulative incidence rates than other health officer regions in the state.
To identify priority populations for focused intervention, we examined cumulative incidence within these two regions by race/ethnicity and age group. The highest cumulative incidence rates were observed among adults aged 65 years and older in Central California, with White, non-Hispanic individuals most affected, followed by American Indian or Alaska Native, non-Hispanic individuals, and then Black, non-Hispanic individuals. The fourth-highest cumulative incidence rate was observed among Black, non-Hispanic adults aged 65 years and older in the Greater Sierra Sacramento region. Overall, 32 race/ethnicity- and age-specific subgroups within Central California and Greater Sierra Sacramento had cumulative incidence rates exceeding the statewide mean.
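The comparison against the statewide mean can be illustrated with three rows copied from the cumulative incidence table. This is a minimal sketch rather than the analysis code itself: the function and variable names are hypothetical, and only the counts, populations, and the 402.45 mean come from this report.

```python
STATEWIDE_MEAN = 402.45  # mean rate per 100,000 reported above

# (region, race/ethnicity, age group, total new infections, 2023 population),
# copied from three rows of the cumulative incidence table.
subgroups = [
    ("Central California", "White, Non-Hispanic", "65+", 100740, 8812928),
    ("Greater Sierra Sacramento", "Black, Non-Hispanic", "50-64", 4024, 956970),
    ("Greater Sierra Sacramento",
     "American Indian or Alaska Native, Non-Hispanic", "50-64", 453, 118110),
]


def incidence_per_100k(cases, population):
    """Cumulative incidence: cases divided by population, scaled to 100,000."""
    return 100_000 * cases / population


# Keep only the subgroups whose rate exceeds the statewide mean.
flagged = []
for region, race_eth, age, cases, population in subgroups:
    rate = incidence_per_100k(cases, population)
    if rate > STATEWIDE_MEAN:
        flagged.append((region, race_eth, age, round(rate, 1)))
```

Of these three rows, the first two exceed the mean and the third (383.5 per 100,000 in the table) does not, which matches where the table crosses the 402.45 threshold.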
Discussion
Overall, the weekly cumulative incidence trends shown in Visualization 1 indicate that the outbreak followed a similar temporal pattern across California health officer regions, with rapid increases during the summer of 2023, peaks in late August through mid-September, and steady declines into the fall. While the timing of the outbreak was consistent statewide, disease burden was not evenly distributed. Central California and Greater Sierra Sacramento experienced consistently higher cumulative incidence rates than other regions, indicating a disproportionate share of infections during the outbreak period.
The cumulative incidence table further demonstrates that within these higher-burden regions, incidence varied by age group and race/ethnicity. Adults aged 65 years and older experienced the highest cumulative incidence, particularly in Central California, with notable differences across racial and ethnic groups. Similar patterns were observed in Greater Sierra Sacramento, where older adult populations also exhibited elevated incidence. Together, these findings highlight meaningful geographic and demographic variation in disease burden and underscore the importance of region-specific surveillance and targeted public health planning rather than reliance on statewide averages alone.