R for Public Health: Milestone 4, Scenario 1

Group 2: Suzanne Michele and David Burke

November 26th, 2025




Identify Clean 2023 Data Sets and Join for Visualization and Analysis


Identify clean Los Angeles County data set

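# packages used throughout this report
library(dplyr)      # glimpse(), filter(), group_by(), summarise(), %>%
library(expss)      # apply_labels()
library(gtsummary)  # tbl_summary(), bold_labels(), modify_caption()
library(plotly)     # plot_ly(), layout()

glimpse(los_angeles_02)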
## Rows: 1,736
## Columns: 12
## $ dt_diagnosis           <date> 2023-05-29, 2023-06-05, 2023-06-12, 2023-06-19…
## $ age_cat                <chr> "0-17", "0-17", "0-17", "0-17", "0-17", "0-17",…
## $ sex                    <chr> "FEMALE", "FEMALE", "FEMALE", "FEMALE", "FEMALE…
## $ race_ethnicity         <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ new_infections         <dbl> 15, 17, 23, 51, 67, 75, 106, 83, 91, 173, 162, …
## $ cumulative_infected    <dbl> 15, 32, 55, 106, 173, 248, 354, 437, 528, 701, …
## $ new_unrecovered        <dbl> 0, 0, 0, 0, 1, 1, 4, 3, 1, 3, 2, 4, 6, 5, 9, 7,…
## $ cumulative_unrecovered <dbl> 0, 0, 0, 0, 1, 2, 6, 9, 10, 13, 15, 19, 25, 30,…
## $ new_severe             <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ cumulative_severe      <dbl> 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ county                 <chr> "Los Angeles County", "Los Angeles County", "Lo…
## $ time_int               <dbl> 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,…



Identify clean California State data set without Los Angeles County data

glimpse(california_02)
## Rows: 98,952
## Columns: 12
## $ county                 <chr> "Alameda County", "Alameda County", "Alameda Co…
## $ age_cat                <chr> "0-17", "0-17", "0-17", "0-17", "0-17", "0-17",…
## $ sex                    <chr> "FEMALE", "FEMALE", "FEMALE", "FEMALE", "FEMALE…
## $ race_ethnicity         <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1…
## $ dt_diagnosis           <date> 2023-05-29, 2023-06-05, 2023-06-12, 2023-06-19…
## $ time_int               <dbl> 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,…
## $ new_infections         <dbl> 6, 1, 2, 10, 19, 25, 23, 18, 22, 35, 29, 43, 69…
## $ cumulative_infected    <dbl> 6, 7, 9, 19, 38, 63, 86, 104, 126, 161, 190, 23…
## $ new_unrecovered        <dbl> 0, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 3, 2,…
## $ cumulative_unrecovered <dbl> 0, 1, 1, 1, 1, 2, 2, 3, 4, 5, 6, 6, 7, 7, 10, 1…
## $ new_severe             <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ cumulative_severe      <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…



Identify clean California State population data set

glimpse(ca_pop_03)
## Rows: 222
## Columns: 9
## $ county                <chr> "Los Angeles", "Los Angeles", "Los Angeles", "Lo…
## $ health_officer_region <chr> "Los Angeles", "Los Angeles", "Los Angeles", "Lo…
## $ age_cat               <chr> "0-4", "0-4", "0-4", "0-4", "0-4", "5-11", "5-11…
## $ sex                   <chr> "FEMALE", "FEMALE", "FEMALE", "FEMALE", "FEMALE"…
## $ race7                 <chr> "Hispanic", "Hispanic", "Hispanic", "Hispanic", …
## $ pop                   <dbl> 24761, 25113, 24034, 27174, 27752, 28015, 29124,…
## $ race_ethnicity        <chr> "7", "7", "7", "7", "7", "7", "7", "7", "7", "7"…
## $ age_cat2              <chr> "0-17", "0-17", "0-17", "0-17", "0-17", "0-17", …
## $ population            <dbl> 2097487, 2097487, 2097487, 2097487, 2097487, 209…
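
The population table labels the county as "Los Angeles", while the case data sets use "Los Angeles County", and the two also carry different age bands (age_cat versus age_cat2). If the tables are ever joined on county, the key would need to be harmonized first. The following is a minimal sketch on made-up data; pop_toy and its values are hypothetical, and whether this step was applied in the actual analysis is not shown here.

```r
library(dplyr)

# Hypothetical stand-in for the population table (values are made up)
pop_toy <- tibble(
  county = c("Los Angeles", "Alameda"),
  pop    = c(100, 50)
)

# Append " County" only where it is missing so the key matches the case data
pop_toy <- pop_toy %>%
  mutate(
    county = if_else(grepl(" County$", county), county, paste0(county, " County"))
  )

pop_toy$county
```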



Combine Los Angeles County data set with California data set to create comprehensive California state data set

# append the Los Angeles County rows to the other California counties; rbind() matches data frame columns by name, so the differing column order is not a problem
california_all_counties <- rbind(california_02, los_angeles_02)
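
Because the two inputs list the same 12 columns in a different order, it can be worth confirming that the combined table has the expected shape (98,952 + 1,736 rows in the real data). The sketch below uses small made-up tables, ca_toy and la_toy, as hypothetical stand-ins, and uses dplyr's bind_rows() as an alternative to rbind() that also matches columns by name.

```r
library(dplyr)

# Hypothetical stand-ins for california_02 and los_angeles_02 (values are made up)
ca_toy <- tibble(
  county         = c("Alameda County", "Alameda County"),
  time_int       = c(22, 23),
  new_infections = c(6, 1)
)
la_toy <- tibble(
  time_int       = c(22, 23),
  new_infections = c(15, 17),
  county         = c("Los Angeles County", "Los Angeles County")
)

# Both tables must carry the same column names, in any order
stopifnot(setequal(names(ca_toy), names(la_toy)))

# bind_rows() matches columns by name and keeps the first table's column order
all_toy <- bind_rows(ca_toy, la_toy)

# The combined row count should equal the sum of the inputs
stopifnot(nrow(all_toy) == nrow(ca_toy) + nrow(la_toy))

# Rows per county after combining
count(all_toy, county)
```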



Create filtered data set for Hispanic patients in Los Angeles County from comprehensive California state data set

# filter to specify county, race/ethnicity
la_hispanic <- california_all_counties %>% 
  filter(
    county == "Los Angeles County",
    race_ethnicity == "7"
  )

glimpse(la_hispanic)
## Rows: 248
## Columns: 12
## $ county                 <chr> "Los Angeles County", "Los Angeles County", "Lo…
## $ age_cat                <chr> "0-17", "0-17", "0-17", "0-17", "0-17", "0-17",…
## $ sex                    <chr> "FEMALE", "FEMALE", "FEMALE", "FEMALE", "FEMALE…
## $ race_ethnicity         <chr> "7", "7", "7", "7", "7", "7", "7", "7", "7", "7…
## $ dt_diagnosis           <date> 2023-05-29, 2023-06-05, 2023-06-12, 2023-06-19…
## $ time_int               <dbl> 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,…
## $ new_infections         <dbl> 41, 42, 42, 142, 189, 239, 312, 239, 255, 431, …
## $ cumulative_infected    <dbl> 41, 83, 125, 267, 456, 695, 1007, 1246, 1501, 1…
## $ new_unrecovered        <dbl> 0, 2, 0, 1, 1, 4, 10, 5, 9, 4, 6, 8, 13, 15, 23…
## $ cumulative_unrecovered <dbl> 0, 2, 2, 3, 4, 8, 18, 23, 32, 36, 42, 50, 63, 7…
## $ new_severe             <dbl> 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0, 3,…
## $ cumulative_severe      <dbl> 0, 0, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 7,…
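
The 248 rows would be consistent with 31 epi weeks × 2 sexes × 4 age categories, although the number of age categories is not shown in this output. One way to check that a filtered table holds exactly one row per stratum is to count rows per key combination, as sketched below on made-up data (strata_toy is hypothetical).

```r
library(dplyr)

# Hypothetical stand-in for la_hispanic: 2 weeks x 2 sexes x 2 age groups
strata_toy <- tibble(
  time_int = rep(c(22, 23), each = 4),
  sex      = rep(c("FEMALE", "FEMALE", "MALE", "MALE"), times = 2),
  age_cat  = rep(c("0-17", "18-49"), times = 4)
)

# Any key combination appearing more than once would indicate duplicated rows
dupes <- strata_toy %>%
  count(time_int, sex, age_cat) %>%
  filter(n > 1)

# Number of distinct strata should equal the number of rows
n_strata <- strata_toy %>%
  distinct(time_int, sex, age_cat) %>%
  nrow()

stopifnot(nrow(dupes) == 0, n_strata == nrow(strata_toy))
```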



Create population totals by sex to use as the denominator for the severe infection rate

# sum the population across age groups within each sex
ca_pop_sex_totals <- ca_pop_03 %>%
  group_by(sex) %>%
  summarise(population = sum(pop, na.rm = TRUE), .groups = "drop")
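
The table la_hisp_severe_by_sex used in the following sections is not constructed in the code shown. A minimal sketch of how such a table could be built is given below, assuming weekly counts are summed across age groups within each sex, joined to the population totals by sex, and converted to a rate per 100,000. The toy inputs and their values are made up, and the actual derivation may differ.

```r
library(dplyr)

# Hypothetical stand-in for la_hispanic (one epi week, made-up counts)
la_toy <- tibble(
  time_int       = c(22, 22, 22, 22),
  sex            = c("FEMALE", "FEMALE", "MALE", "MALE"),
  age_cat        = c("0-17", "18-49", "0-17", "18-49"),
  new_infections = c(40, 60, 30, 50),
  new_severe     = c(1, 3, 0, 2)
)

# Hypothetical stand-in for ca_pop_sex_totals (made-up populations)
pop_toy <- tibble(
  sex        = c("FEMALE", "MALE"),
  population = c(200000, 100000)
)

severe_by_sex_toy <- la_toy %>%
  # collapse age groups: one row per epi week and sex
  group_by(time_int, sex) %>%
  summarise(
    new_infections = sum(new_infections),
    new_severe     = sum(new_severe),
    .groups = "drop"
  ) %>%
  # attach the population denominator for each sex
  left_join(pop_toy, by = "sex") %>%
  # rate per 100,000 = weekly severe count / population * 100,000
  mutate(new_severe_rate_per_100k = new_severe / population * 1e5)

severe_by_sex_toy
```

Dividing by a sex-specific population puts the two groups on a common scale, which is what allows the rate chart to be compared across sexes.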



Summary of 2023 Los Angeles County Infections

Aggregate Data by Sex

# apply variable labels
la_hispanic_summary <- apply_labels(
  la_hisp_severe_by_sex,
  sex                      = "Sex",
  new_infections           = "New Infections",
  new_severe               = "New Severe Infections",
  new_severe_rate_per_100k = "New Severe Rate of Infections per 100k",
  time_int                 = "Epi Week"
)


summary_table1 <- tbl_summary(
  la_hispanic_summary,
  by = sex,
  include = c(new_infections, new_severe, new_severe_rate_per_100k),
  statistic = list(
    all_continuous() ~ "{mean} ({sd})"  # show mean and SD
  )
) %>%
  bold_labels() %>%
  modify_caption(
    "**Table 1. Summary of 2023 Los Angeles County Infections by Epi Week for Hispanic Population (N = number of Epi Weeks)**"
  )

summary_table1
Table 1. Summary of 2023 Los Angeles County Infections by Epi Week for Hispanic Population (N = number of Epi Weeks)

| Characteristic | FEMALE, N = 31 [1] | MALE, N = 31 [1] |
|---|---|---|
| New Infections | 6,946 (6,661) | 6,898 (6,619) |
| New Severe Infections | 164 (154) | 142 (135) |
| New Severe Rate of Infections per 100k | 7.8 (7.3) | 7.1 (6.8) |

[1] Mean (SD)



Summary table description

In 2023, among the Hispanic population in Los Angeles County, weekly counts were similar for both sexes across the 31 epi weeks. Females averaged slightly more new infections per epi week (mean 6,946, SD 6,661) and more new severe infections (mean 164, SD 154) than males (mean 6,898, SD 6,619 new infections; mean 142, SD 135 new severe infections). The mean new severe infection rate was also slightly higher for females (7.8 per 100,000) than for males (7.1 per 100,000).



Create 2023 New Severe Infection Bar Charts

Data aggregated by Sex

plot_ly(
  la_hisp_severe_by_sex,
  x = ~time_int,
  y = ~new_severe,
  color = ~sex,
  type = "bar"
) %>%
  layout(
    title = list(
      text = "Figure 1.<br>2023: New Severe Infections Among Hispanics in Los Angeles County, CA<br>by Sex",
      x = 0.5,             
      xanchor = "center",  
      yanchor = "top",     
      pad = list(t = 20)   
    ),
    xaxis = list(title = "Epi Week"),
    yaxis = list(title = "New Severe Infections"),
    margin = list(t = 120) # increase top margin to fit wrapped title
  )


Figure 1 description

In 2023, among the Hispanic population in Los Angeles County, females consistently had slightly more new severe infections than males across epi weeks.
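
A claim of a consistent difference can be checked week by week rather than read off the chart. The sketch below counts the epi weeks in which one sex had more new severe infections than the other; sex_toy is a hypothetical table shaped like la_hisp_severe_by_sex, with made-up values.

```r
library(dplyr)

# Hypothetical weekly totals by sex (values are made up)
sex_toy <- tibble(
  time_int   = rep(c(22, 23, 24), each = 2),
  sex        = rep(c("FEMALE", "MALE"), times = 3),
  new_severe = c(5, 3, 8, 8, 2, 4)
)

# One row per epi week holding the female-minus-male difference
weekly_diff <- sex_toy %>%
  group_by(time_int) %>%
  summarise(
    diff = sum(new_severe[sex == "FEMALE"]) - sum(new_severe[sex == "MALE"]),
    .groups = "drop"
  )

# Tally the weeks in which females were higher, tied, or lower
week_counts <- weekly_diff %>%
  summarise(
    female_higher = sum(diff > 0),
    tied          = sum(diff == 0),
    male_higher   = sum(diff < 0)
  )

week_counts
```

The same tally could be run on new_severe_rate_per_100k to check the corresponding statement about rates.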



Visualize bar chart for new severe infection rate by sex

plot_ly(
  la_hisp_severe_by_sex,
  x = ~time_int,
  y = ~new_severe_rate_per_100k,
  color = ~sex,
  type = "bar"
) %>%
  layout(
    title = list(
      text = "Figure 2.<br>2023: New Severe Infection Rate Among Hispanics in Los Angeles County, CA<br>by Sex",
      x = 0.5,             
      xanchor = "center",  
      yanchor = "top",     
      pad = list(t = 20)   
    ),
    xaxis = list(title = "Epi Week"),
    yaxis = list(title = "New Severe Infection Rate per 100k"),
    margin = list(t = 120) # increase top margin to fit wrapped title
  )


Figure 2 description

In 2023, among the Hispanic population in Los Angeles County, the rate of new severe infections per 100,000 was consistently slightly higher for females than for males. Expressing the weekly counts as a rate per 100,000 accounts for the difference in population size between the two sexes.



Data Dictionary for data set columns

| Variable Name | Data Type | Description |
|---|---|---|
| time_int | number | Epi week (2023) |
| sex | character | Sex categorization as defined by California Department of Finance |
| new_severe_by_week_sex | number | Total new severe infections in each epi week, grouped by sex, for the demographic and geographic stratum of interest (Los Angeles County, Hispanic (any race)) |
| population | number | Sum of the Hispanic population in Los Angeles County, grouped by sex (Source: California Department of Finance 2023 data) |
| new_severe_rate_per_100k | number | Rate of new severe infections per 100,000 people in the demographic stratum of interest |