(update 2020-07-10 22:12:38)

Based on final reconciled data from Ishaan and YJ.

Understanding availability of COVID data by age AND race across US states

*This is part of a working paper, "Racial disparities in case fatality ratio widen after controlling for age: A call for race-specific age distributions in State COVID-19 data" (Ishaan Pathak, Yoonjoung Choi, Dazhi Jiao, Diana Yeung, Li Liu).

Methods

  • The objective of this assessment is to see whether data are publicly available that let users/researchers assess racial disparity in the case fatality ratio (CFR), adjusted for differences in the age distribution of confirmed cases. Accessibility is not factored in: data count as available as long as we could find them (even in a painful and tedious way).
  • Public data sources in each state were reviewed. Information in a dashboard/report (figures and tables) and, if available, downloadable source data were reviewed.
  • The assessment was conducted separately for confirmed cases and COVID-19 deaths.
  • There are five binary items for age-disaggregated data and five binary items for race-disaggregated data. The total score ranges from 0 to 10, separately for cases and for deaths.

Five items for age-disaggregated data
* Age 1: Age-disaggregated data (either count or percent share of total) exist.
* Age 2: For ages 50 and above, data are disaggregated by 10-year age group or in finer detail.
* Age 3: The oldest, open-age group starts at 80 or above.
* Age 4: Data are disaggregated by 5-year age group, as in typical demographic research.
* Age 5: An exact magnitude of observations with unknown age is available, enabling sensitivity analyses with and without observations with unknown age.

NOTE: We developed two items (2 and 3) for adequate disaggregation in older ages, given significantly higher mortality in older age groups.

Five items for race-disaggregated data
* Race 1: Race-disaggregated data (either count or percent share of total) exist.
* Race 2: Ethnicity-disaggregated data (i.e., Hispanic origin) exist.
* Race 3: Data for Hispanic origin can be mutually exclusively distinguished from other race-ethnicity combinations. That is, data are presented across race-ethnicity categories that are mutually exclusive (e.g., Hispanic origin any race, Non-Hispanic White, Non-Hispanic Black, Non-Hispanic Asian, and other).
* Race 4: Data are disaggregated by race and ethnicity combination (e.g., Hispanic Black, Non-Hispanic Black, Hispanic White, Non-Hispanic White, etc.), as in census data.
* Race 5: An exact magnitude of observations with unknown race is available, enabling sensitivity analyses with and without observations with unknown race.

  • Two co-authors independently assigned scores, discussed clarification questions, and again independently revised scores. Two sets of revised scores were compared and reconciled for differences.
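
As a minimal sketch of the scoring described above (not the authors' actual code), each state's ten binary items can be summed into an age score, a race score, and a total. The item keys (age1 ... race5), the function names, and the example values are hypothetical; the second helper only lists the items on which two raters differ, which is the starting point for reconciliation.

```python
# Item keys are hypothetical labels for the ten binary items listed above.
AGE_ITEMS = ("age1", "age2", "age3", "age4", "age5")
RACE_ITEMS = ("race1", "race2", "race3", "race4", "race5")


def availability_score(items):
    """Return (age score, race score, total) from a dict of ten 0/1 items."""
    age = sum(items[k] for k in AGE_ITEMS)
    race = sum(items[k] for k in RACE_ITEMS)
    return age, race, age + race


def disagreements(rater_a, rater_b):
    """List the items that two raters scored differently for the same state."""
    return [k for k in AGE_ITEMS + RACE_ITEMS if rater_a[k] != rater_b[k]]


# Hypothetical scores for one state's case data (not real results).
rater_a = {"age1": 1, "age2": 1, "age3": 1, "age4": 0, "age5": 1,
           "race1": 1, "race2": 1, "race3": 0, "race4": 0, "race5": 1}
rater_b = dict(rater_a, race3=1)  # second rater differs on Race 3 only

print(availability_score(rater_a))      # (4, 3, 7)
print(disagreements(rater_a, rater_b))  # ['race3']
```

The same function would be applied once to the case items and once to the death items, since the two are scored separately.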

Results

  • No state got full scores. Figure 1 shows data availability scores for disaggregated data on confirmed cases and deaths. States are listed in the order of the total scores. Blue and red shaded boxes are for age- and race-disaggregated data points, respectively.

  • Among 50 states, availability scores for case data:
    – Average total score is 6.4 (SD=2).
    – Average age-disaggregated data score is 3.3 (SD=1.1).
    – Average race-disaggregated data score is 3.1 (SD=1.2).

  • Among 50 states, availability scores for death data:
    – Average total score is 6 (SD=2.2).
    – Average age-disaggregated data score is 3 (SD=1.5).
    – Average race-disaggregated data score is 3.1 (SD=1.2).
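
For readers who want to reproduce summaries like the ones above from a table of state scores, a minimal sketch follows. It assumes the SD is the sample standard deviation (the notes do not say which was used), and the input list is a toy example, not the actual state scores.

```python
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, sample SD) rounded to one decimal place."""
    return round(mean(scores), 1), round(stdev(scores), 1)


# Toy total scores for four hypothetical states (not real results).
toy_totals = [4, 6, 8, 10]
avg, sd = summarize(toy_totals)
print(f"Average total score is {avg} (SD={sd})")
```

If the population SD is wanted instead, `statistics.pstdev` can be swapped in for `stdev`.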

  • Only 30 states have data that are appropriate to study the steep age-pattern of CFR (i.e., 10-year interval for 50 or above, AND open age interval starting at 80 or above - for both confirmed cases and deaths).
  • Only 19 states have data that are appropriate to study disparity in CFR by race-ethnicity in the US (i.e., data for Hispanic origin can be mutually exclusively distinguished from other race-ethnicity combinations - for both confirmed cases and deaths).
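
The two criteria above can be read as simple filters on the binary items: Age 2 AND Age 3 must hold for both cases and deaths, and Race 3 must hold for both cases and deaths. A minimal sketch, with a hypothetical data layout and made-up states:

```python
OUTCOMES = ("cases", "deaths")


def age_pattern_ok(state):
    """Age 2 and Age 3 are both met, for confirmed cases and for deaths."""
    return all(state[o]["age2"] and state[o]["age3"] for o in OUTCOMES)


def race_ethnicity_ok(state):
    """Race 3 is met, for confirmed cases and for deaths."""
    return all(state[o]["race3"] for o in OUTCOMES)


# Hypothetical item scores (not real results); only the items used here.
states = {
    "State A": {"cases": {"age2": 1, "age3": 1, "race3": 1},
                "deaths": {"age2": 1, "age3": 1, "race3": 0}},
    "State B": {"cases": {"age2": 1, "age3": 0, "race3": 1},
                "deaths": {"age2": 1, "age3": 1, "race3": 1}},
}

age_ready = [name for name, s in states.items() if age_pattern_ok(s)]
race_ready = [name for name, s in states.items() if race_ethnicity_ok(s)]
print(age_ready)   # ['State A']
print(race_ready)  # ['State B']
```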

  • Most importantly, only three states have age-race-disaggregated data that are required to assess age-adjusted racial disparity: California, Illinois, and Ohio.
  • In California, however, the age groups for ages 50 and above (50-64, 65-79, and 80+) are too coarse to adjust well for mortality gradients in older ages.
  • In Illinois and Ohio, age data are disaggregated by 10-year age group (i.e., 50-59, 60-69, 70-79, and 80+) for 50 and above within each race/ethnicity group.
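
To illustrate why age-by-race data matter, here is a minimal sketch of one common way to adjust CFR for age: direct standardization, where each group's age-specific CFRs are weighted by a shared standard age distribution. This is an illustration only and may differ from the method in the working paper. All numbers are made up, the groups "X" and "Y" are hypothetical, and using the pooled confirmed cases as the standard is an assumption.

```python
def crude_cfr(cases, deaths):
    """Deaths divided by confirmed cases, ignoring age."""
    return sum(deaths) / sum(cases)


def standard_weights(*case_lists):
    """Share of pooled cases in each age group (assumed standard population)."""
    totals = [sum(col) for col in zip(*case_lists)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def standardized_cfr(cases, deaths, weights):
    """Weighted sum of age-specific CFRs; assumes no age group has zero cases."""
    return sum(w * d / c for w, d, c in zip(weights, deaths, cases))


# Made-up counts by age group 50-59, 60-69, 70-79, 80+ (not real data).
cases_x, deaths_x = [400, 300, 200, 100], [8, 12, 24, 30]   # younger cases
cases_y, deaths_y = [100, 200, 300, 400], [1, 4, 24, 100]   # older cases

weights = standard_weights(cases_x, cases_y)
crude_ratio = crude_cfr(cases_x, deaths_x) / crude_cfr(cases_y, deaths_y)
adjusted_ratio = (standardized_cfr(cases_x, deaths_x, weights)
                  / standardized_cfr(cases_y, deaths_y, weights))
print(f"crude CFR ratio X/Y:    {crude_ratio:.2f}")
print(f"adjusted CFR ratio X/Y: {adjusted_ratio:.2f}")
```

In this toy example group X looks better off on the crude CFR only because its confirmed cases are younger; once both groups are weighted to the same age distribution, X has the higher CFR. Without counts by age within each race/ethnicity group, this adjustment cannot be computed.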

Discussion regarding data availability

  • All states have some COVID-19 data available to the public, which is a great thing. Some states have in-depth information.
  • But in only three states were we able to assess racial disparity adjusted for differences in the age distribution of confirmed cases.

  • The race-ethnicity criteria may be less relevant in some states. For example, in Hawaii, detailed race/ethnicity categories are used (e.g., Japanese, Filipino), though there is no Hispanic origin category.
  • In states with a small number of deaths (e.g., Montana & Alaska), it is reasonable not to present the data by detailed category yet.
  • Some local governments, such as New York City, have already presented age-adjusted data across race/ethnicity groups (though their age-disaggregation is not adequate).
  • States with no or minimal race-disaggregated data have predominantly white populations (XX% in Montana), so it’s understandable. Nevertheless, even a small minority group can be disproportionately affected by COVID. It is important to collect and report data.
  • Florida even publishes de-identified, individual-level data with background characteristics, type of transmission (e.g., community acquired), etc., but without race/ethnicity. Bummer.

  • There is huge variability in terms of accessibility. That is understandable, depending on who the intended target audience is. Still, it would be great to improve accessibility for researchers. Ishaan can write about this a lot…
  • Some states that do not have detailed data on demographic background have detailed data of other types (e.g., health systems capacity, co-morbidity). So, it’s not that they suck overall.
  • Oklahoma has a weekly report. Most states have daily updates.