Project Milestone #4

Authors

Angela Bartolo

Lucy Lu

Tyana Michelle Perera

The joined dataset

The final joined dataset

str(df_joined)
tibble [406 × 8] (S3: tbl_df/tbl/data.frame)
 $ health_officer_region: chr [1:406] "Bay Area" "Bay Area" "Bay Area" "Bay Area" ...
 $ county               : chr [1:406] "Alameda County" "Alameda County" "Alameda County" "Alameda County" ...
 $ race_ethnicity       : chr [1:406] "American Indian or Alaska Native, Non-Hispanic" "Asian, Non-Hispanic" "Black, Non-Hispanic" "Hispanic (any race)" ...
 $ total_population     : num [1:406] 4569 568612 157817 335452 88937 ...
 $ total_infections     : num [1:406] 444 39069 17161 33568 5719 ...
 $ total_severe         : num [1:406] 12 1108 510 661 100 ...
 $ infection_rate       : num [1:406] 9.72 6.87 10.87 10.01 6.43 ...
 $ severe_infection_rate: num [1:406] 2.63 1.95 3.23 1.97 1.12 2.11 3.54 4.05 2.19 2.6 ...

Data Dictionary

Data Dictionary
Variable               Type       Description
health_officer_region  character  California Health Officer Region
county                 character  County of residence of novel ID cases
race_ethnicity         character  Race-ethnicity categorization as defined by CA Department of Finance
total_population       numeric    Total population estimates from the CA Department of Finance for 2023
total_infections       numeric    Number of newly diagnosed individuals
total_severe           numeric    Number of newly identified individuals having severe disease requiring hospitalization
infection_rate         numeric    Rate of newly diagnosed individuals per 100 people
severe_infection_rate  numeric    Rate of newly diagnosed individuals having severe disease requiring hospitalization per 1000 people
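
The two rate definitions above can be checked against the first rows of the str(df_joined) output. The following is a minimal base R sketch using only the first two Alameda County strata shown there; the helper name rate_per is hypothetical and is not part of the project code.

```r
# Hypothetical helper (not from the project code): events per `per` people.
rate_per <- function(events, population, per) {
  events / population * per
}

# First two strata from the str(df_joined) output (Alameda County).
total_population <- c(4569, 568612)
total_infections <- c(444, 39069)
total_severe     <- c(12, 1108)

# Infection rate is expressed per 100 people, severe infection rate per 1000.
infection_rate        <- rate_per(total_infections, total_population, per = 100)
severe_infection_rate <- rate_per(total_severe, total_population, per = 1000)

round(infection_rate, 2)         # 9.72 6.87
round(severe_infection_rate, 2)  # 2.63 1.95
```

The rounded values agree with the infection_rate and severe_infection_rate columns printed by str(), which is consistent with the denominators stated in the data dictionary.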

Descriptive statistics

Table 1. Descriptive Statistics for Infection Rate (per 100 people)
n    population  mean   sd    median  IQR   min   max
406  39109070    12.55  8.43  9.89    7.24  0.00  66.64
Rates calculated per 100 people. N = 406 strata.

Interpretation: Across the 406 county-by-race/ethnicity strata, which together cover California's total population of 39,109,070 people, the mean infection rate is 12.55 cases per 100 people, with a standard deviation of 8.43. The median is 9.89 cases per 100 people, with an interquartile range of 7.24, and stratum-level rates range from 0 to 66.64 cases per 100 people. These are summaries of stratum-level rates, not a pooled statewide rate.
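
As an illustration of how the summary columns in Table 1 could be produced, here is a minimal base R sketch. The helper describe_rate is hypothetical, and the input is only the five infection rates visible in the str(df_joined) output, so the result will not reproduce Table 1, which summarizes all 406 strata. The population column of Table 1 would additionally need the sum of total_population.

```r
# Hypothetical helper: the summary columns reported in Table 1 for one rate vector.
describe_rate <- function(x) {
  data.frame(
    n      = length(x),
    mean   = mean(x),
    sd     = sd(x),
    median = median(x),
    IQR    = IQR(x),
    min    = min(x),
    max    = max(x)
  )
}

# Only the first five infection rates from the str(df_joined) output,
# so the result will NOT match Table 1, which uses all 406 strata.
infection_rate <- c(9.72, 6.87, 10.87, 10.01, 6.43)

stats <- describe_rate(infection_rate)
round(stats, 2)
```

Applied to df_joined$infection_rate and df_joined$severe_infection_rate, the same helper would give one row per table.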

Table 2. Descriptive Statistics for Severe Infection Rate (per 1000 people)
n    population  mean  sd    median  IQR   min   max
406  39109070    3.35  3.17  2.60    2.94  0.00  25.98
Rates calculated per 1000 people. N = 406 strata.

Interpretation: Across the 406 county-by-race/ethnicity strata, which together cover California's total population of 39,109,070 people, the mean severe infection rate is 3.35 cases per 1,000 people, with a standard deviation of 3.17. The median is 2.60 cases per 1,000 people, with an interquartile range of 2.94, and stratum-level rates range from 0 to 25.98 cases per 1,000 people. These are summaries of stratum-level rates, not a pooled statewide rate.

Visualization: Infection data grouped by region

California Infection Data Grouped by Region
Region                     Total Population  Total Infections  Infection Rate  Total Severe Infections  Severe Infection Rate
Central California         4432134           804517            18.15           19897                    4.49
Greater Sierra Sacramento  2973210           460390            15.48           13134                    4.42
Southern California        12802429          1503964           11.75           41650                    3.25
Rural North                683715            72896             10.66           2559                     3.74
Bay Area                   8391874           821660            9.79            24577                    2.93
Los Angeles                9825708           886156            9.02            25109                    2.56
Infection Rate is per 100 people. Severe Infection Rate is per 1000 people.

Interpretation: Regional infection rates and severe infection rates do not simply track population size or raw infection counts. Southern California has the largest population and the most infections but only the third-highest infection rate (11.75 per 100 people), whereas Central California and Greater Sierra Sacramento have the highest infection rates (18.15 and 15.48 per 100 people) and severe infection rates (4.49 and 4.42 per 1,000 people). The Bay Area and Los Angeles have the lowest rates on both measures.
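
The regional rates can be recomputed from the totals in the table. The sketch below assumes the rates were obtained by pooling counts within each region (total events divided by total population); the source does not show the aggregation code, so this is an assumption. The variable names regions and statewide_rate are illustrative only.

```r
# Regional totals copied from the table above.
regions <- data.frame(
  region = c("Central California", "Greater Sierra Sacramento",
             "Southern California", "Rural North", "Bay Area", "Los Angeles"),
  total_population = c(4432134, 2973210, 12802429, 683715, 8391874, 9825708),
  total_infections = c(804517, 460390, 1503964, 72896, 821660, 886156),
  total_severe     = c(19897, 13134, 41650, 2559, 24577, 25109)
)

# Pooled regional rates: total events divided by total population.
regions$infection_rate        <- regions$total_infections / regions$total_population * 100
regions$severe_infection_rate <- regions$total_severe / regions$total_population * 1000

# Sort from highest to lowest infection rate, as in the table.
regions <- regions[order(-regions$infection_rate), ]

# Statewide pooled infection rate per 100 people.
statewide_rate <- sum(regions$total_infections) / sum(regions$total_population) * 100
```

Rounded to two decimals, the recomputed rates match the table, and the regional populations sum to 39,109,070. The pooled statewide infection rate comes out at about 11.63 per 100 people, lower than the mean of 12.55 in Table 1, because Table 1 averages stratum-level rates rather than pooling counts.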

Visualization: Infection data grouped by demographics

California Infection Data Grouped by Demographics
Race/Ethnicity                                          Total Population  Total Infections  Infection Rate  Total Severe Infections  Severe Infection Rate
American Indian or Alaska Native, Non-Hispanic          158672            22195             13.99           671                      4.23
White, Non-Hispanic                                     13848282          1778774           12.84           63448                    4.58
Black, Non-Hispanic                                     2211518           271836            12.29           7118                     3.22
Hispanic (any race)                                     14829946          1796696           12.12           36484                    2.46
Native Hawaiian or Pacific Islander, Non-Hispanic       153729            16921             11.01           410                      2.67
Asian, Non-Hispanic                                     6295420           546770            8.69            16568                    2.63
Multiracial (two or more of above races), Non-Hispanic  1611503           116391            7.22            2227                     1.38
Infection Rate is per 100 people. Severe Infection Rate is per 1000 people.

Interpretation: Infection rates and severe infection rates do not simply track population size or raw infection counts within a racial/ethnic category. The American Indian or Alaska Native, Non-Hispanic and White, Non-Hispanic groups have the highest infection rates (13.99 and 12.84 per 100 people) and severe infection rates (4.23 and 4.58 per 1,000 people). The Multiracial, Non-Hispanic group has the lowest rate on both measures (7.22 per 100 and 1.38 per 1,000 people), and the Asian, Non-Hispanic group has the second-lowest infection rate (8.69 per 100 people).
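
To compare how the groups rank on the two measures, the rates can be recomputed from the totals in the table and ranked. This is a minimal base R sketch under the assumption that group rates are pooled (total events divided by total population); the data frame name demog and the rank columns are illustrative, not taken from the project code.

```r
# Totals copied from the demographic table above.
demog <- data.frame(
  race_ethnicity = c(
    "American Indian or Alaska Native, Non-Hispanic",
    "White, Non-Hispanic",
    "Black, Non-Hispanic",
    "Hispanic (any race)",
    "Native Hawaiian or Pacific Islander, Non-Hispanic",
    "Asian, Non-Hispanic",
    "Multiracial (two or more of above races), Non-Hispanic"
  ),
  total_population = c(158672, 13848282, 2211518, 14829946, 153729, 6295420, 1611503),
  total_infections = c(22195, 1778774, 271836, 1796696, 16921, 546770, 116391),
  total_severe     = c(671, 63448, 7118, 36484, 410, 16568, 2227)
)

# Pooled rates: per 100 people for infections, per 1000 for severe infections.
demog$infection_rate        <- demog$total_infections / demog$total_population * 100
demog$severe_infection_rate <- demog$total_severe / demog$total_population * 1000

# Rank 1 = highest rate on each measure.
demog$rank_infection <- rank(-demog$infection_rate)
demog$rank_severe    <- rank(-demog$severe_infection_rate)

demog[, c("race_ethnicity", "rank_infection", "rank_severe")]
```

The two rankings agree at the top and bottom but differ in the middle: for example, the Hispanic (any race) group ranks fourth on infection rate but sixth on severe infection rate.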

Visualization: Racial Comparison

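library(tidyverse)  # provides ggplot2, dplyr (%>%, mutate), and forcats (fct_reorder)

ggplot(
  # Order groups by infection rate so the bars appear sorted
  df_joined_demog %>%
    mutate(race_ethnicity = fct_reorder(race_ethnicity, infection_rate_demog)),
  aes(x = infection_rate_demog, y = race_ethnicity)
) +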
  geom_col() +
  labs(
    title = "Infection Rate by Race/Ethnicity in California",
    subtitle = "Rates calculated per 100 people",
    x = "Infection Rate (per 100 people)",
    y = "Race/Ethnicity",
    caption = "Source: CA Department of Public Health. Rates per 100 people."
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.y = element_text(size = 10)
  )

Interpretation: Infection rates vary by race/ethnicity. The American Indian or Alaska Native and White (non-Hispanic) groups show the highest infection rates per 100 people, while the Multiracial and Asian (non-Hispanic) groups show the lowest, suggesting racial/ethnic disparities in infection burden.

Visualization: Course of Pandemic

ggplot(
  df_joined_time_race, 
  aes(
    x = dt_diagnosis, 
    y = severe_infection_rate, 
    color = race_ethnicity, 
    group = race_ethnicity 
  )
) +
  geom_line(linewidth = 1) +
  geom_point(size = 1.5) +
  labs(
    title = "Time Trend of Severe Infection Rate by Race/Ethnicity",
    subtitle = "Severe infection rate defined as new severe infections per 1000 people",
    x = "Diagnosis Week",
    y = "Severe Infection Rate (per 1000 people)", 
    color = "Race/Ethnicity" 
  ) +
  theme_minimal(base_size = 16) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  ) +
  guides(color = guide_legend(ncol = 2, byrow = TRUE))

Interpretation: Across all race and ethnicity groups, the severe infection rate increased from May 2023 to October 2023 and then decreased from October 2023 to December 2023. Consistent with the previous visualizations, rates differ substantially across race and ethnicity groups, with the White, Non-Hispanic and American Indian or Alaska Native, Non-Hispanic groups having the highest severe infection rates over the study period.