tibble [406 × 8] (S3: tbl_df/tbl/data.frame)
$ health_officer_region: chr [1:406] "Bay Area" "Bay Area" "Bay Area" "Bay Area" ...
$ county : chr [1:406] "Alameda County" "Alameda County" "Alameda County" "Alameda County" ...
$ race_ethnicity : chr [1:406] "American Indian or Alaska Native, Non-Hispanic" "Asian, Non-Hispanic" "Black, Non-Hispanic" "Hispanic (any race)" ...
$ total_population : num [1:406] 4569 568612 157817 335452 88937 ...
$ total_infections : num [1:406] 444 39069 17161 33568 5719 ...
$ total_severe : num [1:406] 12 1108 510 661 100 ...
$ infection_rate : num [1:406] 9.72 6.87 10.87 10.01 6.43 ...
$ severe_infection_rate: num [1:406] 2.63 1.95 3.23 1.97 1.12 2.11 3.54 4.05 2.19 2.6 ...
Data Dictionary
| Variable | Type | Description |
|---|---|---|
| health_officer_region | character | California Health Officer Region |
| county | character | County of residence of novel ID cases |
| race_ethnicity | character | Race-ethnicity categorization as defined by CA Department of Finance |
| total_population | numeric | Total population estimates from the CA Department of Finance for 2023 |
| total_infections | numeric | Number of newly diagnosed individuals |
| total_severe | numeric | Number of newly identified individuals having severe disease requiring hospitalization |
| infection_rate | numeric | Rate of newly diagnosed individuals per 100 people |
| severe_infection_rate | numeric | Rate of newly diagnosed individuals having severe disease requiring hospitalization per 1000 people |
Descriptive statistics
Table 1. Descriptive Statistics for Infection Rate (per 100 people)
| n | population | mean | sd | median | IQR | min | max |
|---|---|---|---|---|---|---|---|
| 406 | 39109070 | 12.55 | 8.43 | 9.89 | 7.24 | 0.00 | 66.64 |
Rates calculated per 100 people. N = 406 strata.
Interpretation: Across the 406 strata (county by race/ethnicity), which together cover California's total population of 39,109,070 people, the mean infection rate is 12.55 cases per 100 people, with a standard deviation of 8.43 cases per 100 people. The median is 9.89 cases per 100 people, with an interquartile range of 7.24 cases per 100 people. Stratum-level rates range from a minimum of 0 to a maximum of 66.64 cases per 100 people.
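As a rough illustration of how the columns in Table 1 could be produced, the sketch below applies base R summary functions to the first five Alameda County strata shown in the `str()` output. The names `strata` and `describe_rate` are hypothetical and are not taken from the original analysis, and the original code may well use a different approach (for example, dplyr's `summarise()`). The sketch also shows that the unweighted mean of stratum rates is not the same quantity as the pooled rate (total infections divided by total population).

```r
# Illustrative only: the first five Alameda County strata from the str() output.
strata <- data.frame(
  total_population = c(4569, 568612, 157817, 335452, 88937),
  total_infections = c(444, 39069, 17161, 33568, 5719)
)

# Infection rate per 100 people, as defined in the data dictionary.
strata$infection_rate <- strata$total_infections / strata$total_population * 100

# Hypothetical helper: one-row summary with the same columns as Table 1.
describe_rate <- function(rate, population) {
  data.frame(
    n = length(rate),
    population = sum(population),
    mean = mean(rate),      # unweighted mean of stratum-level rates
    sd = sd(rate),
    median = median(rate),
    IQR = IQR(rate),
    min = min(rate),
    max = max(rate)
  )
}

summary_tbl <- describe_rate(strata$infection_rate, strata$total_population)

# Pooled rate: every person counts equally, so large strata dominate.
pooled_rate <- sum(strata$total_infections) / sum(strata$total_population) * 100

print(round(summary_tbl, 2))
print(round(pooled_rate, 2))
```

Because the mean treats every stratum equally regardless of its size, small strata with extreme rates can pull it away from the pooled rate.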
Table 2. Descriptive Statistics for Severe Infection Rate (per 1000 people)
| n | population | mean | sd | median | IQR | min | max |
|---|---|---|---|---|---|---|---|
| 406 | 39109070 | 3.35 | 3.17 | 2.60 | 2.94 | 0.00 | 25.98 |
Rates calculated per 1000 people. N = 406 strata.
Interpretation: Across the 406 strata (county by race/ethnicity), which together cover California's total population of 39,109,070 people, the mean severe infection rate is 3.35 cases per 1,000 people, with a standard deviation of 3.17 cases per 1,000 people. The median is 2.60 cases per 1,000 people, with an interquartile range of 2.94 cases per 1,000 people. Stratum-level rates range from a minimum of 0 to a maximum of 25.98 cases per 1,000 people.
Visualization: Infection data grouped by region
California Infection Data Grouped by Region
| Region | Total Population | Total Infections | Infection Rate | Total Severe Infections | Severe Infection Rate |
|---|---|---|---|---|---|
| Central California | 4432134 | 804517 | 18.15 | 19897 | 4.49 |
| Greater Sierra Sacramento | 2973210 | 460390 | 15.48 | 13134 | 4.42 |
| Southern California | 12802429 | 1503964 | 11.75 | 41650 | 3.25 |
| Rural North | 683715 | 72896 | 10.66 | 2559 | 3.74 |
| Bay Area | 8391874 | 821660 | 9.79 | 24577 | 2.93 |
| Los Angeles | 9825708 | 886156 | 9.02 | 25109 | 2.56 |
Infection Rate is per 100 people. Severe Infection Rate is per 1000 people.
Interpretation: Infection rates and severe infection rates do not track a region's total population or total number of infections. Central California and Greater Sierra Sacramento have the highest infection rates (18.15 and 15.48 per 100 people) and severe infection rates (4.49 and 4.42 per 1,000 people), whereas the Bay Area and Los Angeles have the lowest (9.79 and 9.02 per 100 people; 2.93 and 2.56 per 1,000 people), even though Southern California and Los Angeles have the largest populations.
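The regional rates can be checked directly from the totals in the table above. The following is a minimal sketch in base R, not the original aggregation code; the data frame name `region` and the variable `statewide_rate` are hypothetical. It recomputes both rates from the regional totals and also derives a pooled statewide infection rate for comparison.

```r
# Regional totals copied from the table above.
region <- data.frame(
  region = c("Central California", "Greater Sierra Sacramento",
             "Southern California", "Rural North", "Bay Area", "Los Angeles"),
  total_population = c(4432134, 2973210, 12802429, 683715, 8391874, 9825708),
  total_infections = c(804517, 460390, 1503964, 72896, 821660, 886156),
  total_severe = c(19897, 13134, 41650, 2559, 24577, 25109)
)

# Infection rate per 100 people; severe infection rate per 1000 people.
region$infection_rate <-
  round(region$total_infections / region$total_population * 100, 2)
region$severe_infection_rate <-
  round(region$total_severe / region$total_population * 1000, 2)

# Sort from highest to lowest infection rate, as in the table.
region <- region[order(-region$infection_rate), ]

# Pooled statewide infection rate per 100 people (all regions combined).
statewide_rate <-
  sum(region$total_infections) / sum(region$total_population) * 100

print(region)
print(round(statewide_rate, 2))
```

Summing totals first and dividing once keeps each regional rate weighted by population, which is why the pooled statewide figure need not equal the unweighted mean of stratum rates reported in Table 1.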
Visualization: Infection data grouped by demographics
California Infection Data Grouped by Demographics
| Race/Ethnicity | Total Population | Total Infections | Infection Rate | Total Severe Infections | Severe Infection Rate |
|---|---|---|---|---|---|
| American Indian or Alaska Native, Non-Hispanic | 158672 | 22195 | 13.99 | 671 | 4.23 |
| White, Non-Hispanic | 13848282 | 1778774 | 12.84 | 63448 | 4.58 |
| Black, Non-Hispanic | 2211518 | 271836 | 12.29 | 7118 | 3.22 |
| Hispanic (any race) | 14829946 | 1796696 | 12.12 | 36484 | 2.46 |
| Native Hawaiian or Pacific Islander, Non-Hispanic | 153729 | 16921 | 11.01 | 410 | 2.67 |
| Asian, Non-Hispanic | 6295420 | 546770 | 8.69 | 16568 | 2.63 |
| Multiracial (two or more of above races), Non-Hispanic | 1611503 | 116391 | 7.22 | 2227 | 1.38 |
Infection Rate is per 100 people. Severe Infection Rate is per 1000 people.
Interpretation: Infection rates and severe infection rates do not track a racial/ethnic group's total population or total number of infections. American Indian or Alaska Native, Non-Hispanic and White, Non-Hispanic residents have the highest infection rates (13.99 and 12.84 per 100 people) and severe infection rates (4.23 and 4.58 per 1,000 people). Asian, Non-Hispanic and Multiracial, Non-Hispanic residents have the lowest infection rates (8.69 and 7.22 per 100 people) and comparatively low severe infection rates (2.63 and 1.38 per 1,000 people).
Visualization: Racial Comparison
ggplot(
  df_joined_demog %>%
    mutate(race_ethnicity = fct_reorder(race_ethnicity, infection_rate_demog)),
  aes(x = infection_rate_demog, y = race_ethnicity)
) +
  geom_col() +
  labs(
    title = "Infection Rate by Race/Ethnicity in California",
    subtitle = "Rates calculated per 100 people",
    x = "Infection Rate (per 100 people)",
    y = "Race/Ethnicity",
    caption = "Source: CA Department of Public Health. Rates per 100 people."
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold"),
    axis.text.y = element_text(size = 10)
  )
Interpretation: Infection rates varied by race/ethnicity. American Indian or Alaska Native (non-Hispanic) and White (non-Hispanic) groups had the highest infection rates (13.99 and 12.84 per 100 people), while Multiracial and Asian (non-Hispanic) groups had the lowest (7.22 and 8.69 per 100 people), indicating racial/ethnic disparities in infection burden.
Visualization: Course of Pandemic
ggplot(
  df_joined_time_race,
  aes(
    x = dt_diagnosis,
    y = severe_infection_rate,
    color = race_ethnicity,
    group = race_ethnicity
  )
) +
  geom_line(linewidth = 1) +
  geom_point(size = 1.5) +
  labs(
    title = "Time Trend of Severe Infection Rate by Race/Ethnicity",
    subtitle = "Severe infection rate defined as new severe infections per 1000 people",
    x = "Diagnosis Week",
    y = "Severe Infection Rate",
    color = "Race/Ethnicity"
  ) +
  theme_minimal(base_size = 16) +
  theme(
    plot.title = element_text(face = "bold"),
    legend.position = "bottom"
  ) +
  guides(color = guide_legend(ncol = 2, byrow = TRUE))
Interpretation: Across all race/ethnicity groups, the severe infection rate increased from May 2023 to October 2023 and then decreased from October 2023 to December 2023. Consistent with the previous visualizations, the rate varied substantially across race/ethnicity groups, with White, Non-Hispanic and American Indian or Alaska Native, Non-Hispanic having the highest severe infection rates over the study period.