Footnote: Case rates are per 100,000 population within each age group, 2023.

Problem Statement

Our team is tasked with understanding the progression of a disease outbreak in California, identifying which populations are most affected, and determining how best to allocate prevention and treatment resources. The dataset provides demographic data, including age, sex, and race, as well as geographic information about California counties. Combined with health-related metrics such as diagnoses, illness severity, and epidemiological week, these data offer a comprehensive view of the outbreak’s impact across populations.

The goal is to determine if certain demographic or geographic groups are disproportionately affected by the disease over time. Insights from this analysis will help prioritize and target public health interventions, ensuring that prevention and treatment efforts are directed where they are needed most. This approach aims to mitigate the outbreak’s impact and address health inequities effectively.

Methods

This analysis used simulated morbidity data for California counties, including demographic detail, together with California population data for 2023. The primary objective was to analyze disparities in infection rates by county and age group to inform resource allocation. The methods below incorporate feedback from earlier milestones: the deliverables were narrowed to one table and one visualization, and redundant columns and variables were excluded. Morbidity data came from the simulated datasets sim_novelid_CA and sim_novelid_LA, and population data came from ca_pop_2023. These datasets were joined to calculate case rates per 100,000 population and to analyze demographic and geographic disparities in infection rates. Before merging, inconsistent column names, formats, and values were reconciled so that the datasets could be combined accurately.
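As a worked sketch of the rate calculation, write c for county, a for age group, w for epiweek, I_{c,a,w} for the new infections reported in that week, and P_{c,a} for the 2023 population of that county and age group. This notation is ours, not the codebook's. The rate per 100,000 population is then:

```latex
R_{c,a} = \frac{\sum_{w} I_{c,a,w}}{P_{c,a}} \times 100{,}000
```

For example, with hypothetical figures, 250 new cases in an age group of 50,000 residents gives 250 / 50,000 × 100,000 = 500 cases per 100,000.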

Column names across datasets were converted to snake case for consistency. Categories for sex, age_category, and race_ethnicity were aligned across datasets, with race values recoded into standardized names such as “White, Non-Hispanic” and “Hispanic.” Date columns were converted to date formats, and date_report was shifted to the last day of the epiweek, as described in the codebook. Columns not required for the analysis, such as new_unrecovered and cumulative_unrecovered, were removed. County names were harmonized so that the morbidity datasets aligned with the population dataset, and the datasets were then merged. The combined dataset was grouped by county and age_category to calculate total new infections, total population, and infection rates. Infection rates were expressed per 100,000 population to allow comparisons across strata.
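The cleaning steps above can be sketched in plain Python. This is a minimal illustration, not the code used in this analysis: the raw column names, the raw race labels in RACE_RECODE, and the county-name rule in normalize_county are hypothetical, and epiweek_end assumes Sunday-to-Saturday epiweeks (the CDC MMWR convention) because the codebook's exact definition is not reproduced here.

```python
import re
from datetime import date, timedelta

# Hypothetical raw labels (left) mapped to the standardized names used in
# the report (right); the real source labels may differ.
RACE_RECODE = {
    "White": "White, Non-Hispanic",
    "Latino": "Hispanic",
}

# Columns not needed for the analysis.
DROP_COLUMNS = {"new_unrecovered", "cumulative_unrecovered"}

def to_snake_case(name: str) -> str:
    """'Date Report' or 'dateReport' -> 'date_report'."""
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name.strip())
    name = re.sub(r"[^0-9A-Za-z]+", "_", name)
    return name.strip("_").lower()

def epiweek_end(d: date) -> date:
    """Saturday closing the Sunday-to-Saturday week containing d (assumed convention)."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    return d + timedelta(days=(5 - d.weekday()) % 7)

def normalize_county(county: str) -> str:
    """Drop a trailing ' County' and normalize capitalization (assumed rule)."""
    county = county.strip()
    if county.lower().endswith(" county"):
        county = county[: -len(" county")]
    return county.title()

def clean_row(raw: dict) -> dict:
    """Apply the standardization steps to one raw morbidity record."""
    row = {to_snake_case(k): v for k, v in raw.items()}
    row = {k: v for k, v in row.items() if k not in DROP_COLUMNS}
    race = row["race_ethnicity"]
    row["race_ethnicity"] = RACE_RECODE.get(race, race)
    row["date_report"] = epiweek_end(date.fromisoformat(row["date_report"]))
    row["county"] = normalize_county(row["county"])
    return row

print(clean_row({
    "Date Report": "2023-01-03", "County": "Imperial County",
    "Age Category": "65+", "Race/Ethnicity": "Latino",
    "New Infections": 12, "New Unrecovered": 3,
}))
```

In this example, a record dated Tuesday, 3 January 2023 is moved to Saturday, 7 January 2023, the county becomes "Imperial", the race label is recoded to "Hispanic", and new_unrecovered is dropped.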

In response to feedback, the sex column was dropped from the table because it added little meaningful variation and was redundant for this analysis; county and age_category became the primary variables of interest. Age group-specific rates were calculated and ranked, and counties were listed in descending order of infection rate. The resulting summary table reports infection rates, total new cases, and population counts by county and age group, and includes only the counties with the highest rates for clarity. A bar plot of infection rates by county and age group further illustrates these disparities; it features the counties with the highest rates across age groups, with age categories color-coded for interpretability.
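The grouping, rate calculation, and ranking behind the summary table can be sketched as follows, assuming cleaned case rows with county, age_category, and new_infections fields (one row per epiweek and demographic cell) and population rows that may be split further by sex or race. The function names and the toy counties are hypothetical. Cases are summed per county and age group before the population is attached, so each stratum's denominator is counted once rather than once per week.

```python
from collections import defaultdict

def population_by_stratum(pop_rows):
    """Collapse population rows (e.g. split by sex or race) into one total
    per (county, age_category)."""
    pop = defaultdict(int)
    for row in pop_rows:
        pop[(row["county"], row["age_category"])] += row["population"]
    return dict(pop)

def rate_table(case_rows, population, top_n=5):
    """Return the top_n (county, age_category) strata ranked by new
    infections per 100,000 population, highest first."""
    # Sum weekly new infections within each county and age group.
    totals = defaultdict(int)
    for row in case_rows:
        totals[(row["county"], row["age_category"])] += row["new_infections"]

    table = []
    for (county, age), new_cases in totals.items():
        pop = population.get((county, age))
        if not pop:  # no denominator for this stratum; skip it
            continue
        table.append({
            "county": county,
            "age_category": age,
            "total_new_infections": new_cases,
            "total_population": pop,
            # Multiply first so integer inputs incur only one rounding step.
            "rate_per_100k": new_cases * 100_000 / pop,
        })

    # Highest rates first, as in the summary table.
    table.sort(key=lambda r: r["rate_per_100k"], reverse=True)
    return table[:top_n]

# Toy data, invented for illustration only.
cases = [
    {"county": "County A", "age_category": "65+", "new_infections": 30},
    {"county": "County A", "age_category": "65+", "new_infections": 20},
    {"county": "County B", "age_category": "18-49", "new_infections": 10},
]
pops = population_by_stratum([
    {"county": "County A", "age_category": "65+", "population": 600},
    {"county": "County A", "age_category": "65+", "population": 400},
    {"county": "County B", "age_category": "18-49", "population": 5000},
])
for r in rate_table(cases, pops):
    print(r["county"], r["age_category"], r["rate_per_100k"])
```

Here County A's 50 cases over a population of 1,000 give 5,000 per 100,000 and rank first. The top_n cutoff stands in for the choice to show only the highest-rate counties; the exact threshold used for the table is not stated, so the default of 5 is arbitrary.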

These revisions reduced the deliverables to one table and one plot, each focused on a key insight, and the interpretation of the bar plot was revised to highlight differences in infection rates by age group and county. Together, they keep the deliverables clear, concise, and aligned with the objective of identifying the populations and regions most affected by the outbreak, so that the analysis points to where targeted interventions and resources are most needed.

Results

The table lists California counties with the total number of new cases in 2023, the total population of each age group, and the infection rate per 100,000 population for each age group in each county. Rows are sorted in descending order of infection rate. Imperial County has the highest infection rate, 66,593 cases per 100,000 population among residents aged 65+, followed by 58,216 cases per 100,000 population among those aged 18–49 in the same county. Among the 65+ population, Kings, Tulare, and Kern counties also report high infection rates, though lower than Imperial County's.

The graph plots infection rates per 100,000 population (y-axis) for each California county (x-axis), grouped by age group: 65+, 50–64, 18–49, and 0–17. It confirms visually that Imperial County has the highest new infection rates across all age groups.

Discussion

The table and graph show new infection rates in California counties by age group for 2023. The findings suggest that older adults (65+) tend to experience the highest infection rates, with Imperial County leading at 66,593 cases per 100,000 population. This pattern may reflect the greater vulnerability of older populations to infectious disease, for example because of weaker immune responses or pre-existing conditions.

Younger adults (18–49) in Imperial County also show a notably high infection rate of 58,216 cases per 100,000 population, well above the rates observed in other counties. This could indicate increased transmission within this group due to factors such as greater mobility, more social interaction, or barriers to vaccination and healthcare access.

Intervention and Resource Allocation

The elevated infection rates in the 65+ age group underscore the importance of interventions such as vaccination drives, improved access to healthcare, and preventive education tailored to older adults. The high infection rates among 18–49-year-olds in Imperial County suggest a need to investigate vaccination rates, vaccine hesitancy, and barriers to healthcare in this group; stronger public health messaging and outreach could help address these challenges. Given its high infection rates across all age groups, Imperial County warrants targeted public health efforts, including expanded testing, awareness campaigns, and community engagement to reduce further transmission.

In conclusion, the results highlight the need for age-specific interventions, with a focus on Imperial County, to manage and reduce infection rates effectively.