Objective
To visualize group-specific excess deaths in the State of Texas (and United States).
Data
I called in the United States COVID-19 data directly into my work environment from the CDC website using URL link. I also restricted the data to year 2020 only with \(subset()\) function, being the year of interest. The merits of this approach include the ease of access to most recent data from the central database and the tons of space it allows user to save on storage disk.
url_us <- "https://data.cdc.gov/api/views/qfhf-uhaa/rows.csv?accessType=DOWNLOAD"
usdat0 <- read_csv(url_us)
# usdat0<-subset(usdat0, Outcome=='All Cause' & Type=='Unweighted' &
# MMWRYear==2020)
usdat0 <- subset(usdat0, Outcome == "All Cause" & Type == "Predicted (weighted)" &
MMWRYear == 2020 & MMWRWeek <= 39)The \(gsub()\) command is used to format the column names; such as removing trailing spaces and special characters, or replacing them with choice formatting. In this case, I removed the spaces and hyphens in-between column names. I also added two new columns- PriorNumberofDeaths and MMWRMonth to the dataframe:
# names(usdat0) <- gsub(x = names(usdat0), pattern = ('-'), replacement = '')
names(usdat0) <- gsub(x = names(usdat0), pattern = (" |-|/"), replacement = "")
usdat0$PriorNumberofDeaths <- 0
usdat0$MMWRMonth <- 0
dim(usdat0)[1] 12636 17
Using \(dim()\) function, it is observed that the file contains 1 rows (or cases) and 17 columns (or data attributes). The \(names()\) function is applied to display the dataframe column names:
[1] "Jurisdiction" "WeekEndingDate"
[3] "StateAbbreviation" "MMWRYear"
[5] "MMWRWeek" "RaceEthnicity"
[7] "TimePeriod" "Suppress"
[9] "Note" "Outcome"
[11] "NumberofDeaths" "AverageNumberofDeathsinTimePeriod"
[13] "Differencefrom20152019to2020" "PercentDifferencefrom20152019to2020"
[15] "Type" "PriorNumberofDeaths"
[17] "MMWRMonth"
Next step is to select the needed columns using a combination of \(subset()\) and \(select()\) functions, and rename the columns using \(transmute()\) function which automatically discards the original columns:
usdat <- subset(usdat0, select = c(Jurisdiction, Outcome, WeekEndingDate, MMWRYear,
MMWRMonth, MMWRWeek, RaceEthnicity, PriorNumberofDeaths, NumberofDeaths, AverageNumberofDeathsinTimePeriod,
Differencefrom20152019to2020, PercentDifferencefrom20152019to2020, Type))
usdat <- usdat %>% transmute(jurisd = Jurisdiction, outcome = Outcome, estimatetype = Type,
wkendate = WeekEndingDate, mmwryr = MMWRYear, mmwrmo = MMWRMonth, mmwrwk = MMWRWeek,
racethn = RaceEthnicity, projected_deaths = PriorNumberofDeaths, observed_deaths = NumberofDeaths,
avrg_observed_deaths = AverageNumberofDeathsinTimePeriod, excess_deaths = Differencefrom20152019to2020,
perc_excess_deaths = PercentDifferencefrom20152019to2020)
rm(usdat0)Then, I validate the transmuted column headers:
[1] "jurisd" "outcome" "estimatetype"
[4] "wkendate" "mmwryr" "mmwrmo"
[7] "mmwrwk" "racethn" "projected_deaths"
[10] "observed_deaths" "avrg_observed_deaths" "excess_deaths"
[13] "perc_excess_deaths"
Lastly, I check the the structure of the data structure using the \(glimpse()\) function:
Rows: 12,636
Columns: 13
$ jurisd <chr> "Alabama", "Alabama", "Alabama", "Alabama", "A...
$ outcome <chr> "All Cause", "All Cause", "All Cause", "All Ca...
$ estimatetype <chr> "Predicted (weighted)", "Predicted (weighted)"...
$ wkendate <chr> "01/04/2020", "01/11/2020", "01/18/2020", "01/...
$ mmwryr <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020...
$ mmwrmo <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ mmwrwk <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,...
$ racethn <chr> "Hispanic", "Hispanic", "Hispanic", "Hispanic"...
$ projected_deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ observed_deaths <dbl> NA, NA, NA, NA, 10, 11, NA, 12, NA, NA, NA, 10...
$ avrg_observed_deaths <dbl> NA, NA, NA, NA, 10, 11, NA, 12, NA, NA, NA, 10...
$ excess_deaths <dbl> NA, NA, NA, NA, 2, 4, NA, 5, NA, NA, NA, 6, NA...
$ perc_excess_deaths <dbl> NA, NA, NA, NA, 25.0, 57.1, NA, 71.4, NA, NA, ...
- Notes
- Outcome: Deaths from all cause
- Completeness: Data in recent weeks are incomplete. Only 60% of death records are submitted to NCHS within 10 days of the date of death, and completeness varies by jurisdiction.
Create new period and cummulative case report columns
In the following steps, I generated values for \(mmwrmo()\) column based on the \(wkendate()\) column to specify the months of data reporting.
- I convert \(wkendate()\) column to \(POSIXlt()\) class of data:
[1] "01/04/2020" "01/11/2020" "01/18/2020" "01/25/2020" "02/01/2020"
[1] "08/29/2020" "09/05/2020" "09/12/2020" "09/19/2020" "09/26/2020"
[1] "POSIXlt" "POSIXt"
[1] "2020-01-04 CST" "2020-01-11 CST" "2020-01-18 CST" "2020-01-25 CST"
[5] "2020-02-01 CST"
[1] "2020-08-29 CDT" "2020-09-05 CDT" "2020-09-12 CDT" "2020-09-19 CDT"
[5] "2020-09-26 CDT"
- I extract only the mmwrmos of reporting from the \(wkendate()\) column:
Frequencies
usdat$mmwrmo
Type: Character
Freq %
----------- ------- --------
01 1296 10.26
02 1620 12.82
03 1296 10.26
04 1296 10.26
05 1620 12.82
06 1296 10.26
07 1296 10.26
08 1620 12.82
09 1296 10.26
Total 12636 100.00
- I convert the extracted \(mmwrmos()\) to a \(factor()\) class of data with descriptive mmwrmo labels. Then, I reorder the columns such that \(mmwryr()\), \(mmwrwk()\), \(mmwrmo()\) and \(racethn()\) will shift from columns 4, 5, 11 and 6 to occupy columns 6, 4, 5 and 11, respectively:
usdat$mmwrmo <- factor(usdat$mmwrmo, levels = c("01", "02", "03", "04", "05", "06",
"07", "08", "09", "10"), labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct"))
freq(usdat$mmwrmo, report.nas = F, cumul = F)Frequencies
usdat$mmwrmo
Type: Factor
Freq %
----------- ------- --------
Jan 1296 10.26
Feb 1620 12.82
Mar 1296 10.26
Apr 1296 10.26
May 1620 12.82
Jun 1296 10.26
Jul 1296 10.26
Aug 1620 12.82
Sep 1296 10.26
Oct 0 0.00
Total 12636 100.00
- I check the column names in the dataframe:
[1] "jurisd" "outcome" "estimatetype"
[4] "wkendate" "mmwryr" "mmwrmo"
[7] "mmwrwk" "racethn" "projected_deaths"
[10] "observed_deaths" "avrg_observed_deaths" "excess_deaths"
[13] "perc_excess_deaths"
Recode race/ethnicity labels
Here, I simply recode race/ethnicity to shorter labels and replace NAs with zeros:
Frequencies
usdat$racethn
Type: Character
Freq %
--------------------------------------------------- ------- --------
Hispanic 2106 16.67
Non-Hispanic American Indian or Alaska Native 2106 16.67
Non-Hispanic Asian 2106 16.67
Non-Hispanic Black 2106 16.67
Non-Hispanic White 2106 16.67
Other 2106 16.67
Total 12636 100.00
usdat$racethn <- recode(usdat$racethn, Hispanic = "Hispa", `Non-Hispanic American Indian or Alaska Native` = "IA/AN",
`Non-Hispanic Asian` = "Asian", `Non-Hispanic Black` = "Black", `Non-Hispanic White` = "White")
freq(usdat$racethn, report.nas = F, cumul = F)Frequencies
usdat$racethn
Type: Character
Freq %
----------- ------- --------
Asian 2106 16.67
Black 2106 16.67
Hispa 2106 16.67
IA/AN 2106 16.67
Other 2106 16.67
White 2106 16.67
Total 12636 100.00
usdat <- usdat %>% mutate(projected_deaths = coalesce(projected_deaths, 0), observed_deaths = coalesce(observed_deaths,
0), avrg_observed_deaths = coalesce(avrg_observed_deaths, 0), excess_deaths = coalesce(excess_deaths,
0), perc_excess_deaths = coalesce(perc_excess_deaths, 0))I generated values for \(projected_deaths\) column representing the difference between \(observed_deaths\) and \(excess_deaths\), then check the summary statistics for number of excess death in 2020:
usdat$projected_deaths_ <- as.numeric(usdat$observed_deaths - usdat$excess_deaths)
summary(usdat$excess_deaths) Min. 1st Qu. Median Mean 3rd Qu. Max.
-1352.0 0.0 0.0 55.4 18.0 11505.0
This implies estimated excess deaths as low as -1352 and as high as 11505 across the United States in year 2020 relative to similar time period (in months) between 2015 and 2019.
The \(View()\) function allows for viewing the dataframe:
Split data by race/ethnicity
Here, I attempt to create separate files by race/ethnicity and combine them into a wide-format data sets for the United States and State of Texas.
United States
usdat_ <- subset(usdat, racethn == "Hispa", select = c(jurisd, outcome, estimatetype,
wkendate, mmwryr, mmwrmo, mmwrwk))
dim(usdat_)[1] 2106 7
usdat_ia <- subset(usdat, racethn == "IA/AN")
usdat_ia <- usdat_ia %>% mutate(projected_deaths_ia = projected_deaths_, observed_deaths_ia = observed_deaths,
avrg_observed_deaths_ia = avrg_observed_deaths, excess_deaths_ia = excess_deaths,
perc_excess_deaths_ia = perc_excess_deaths, racethn_ia = racethn)
usdat_ia <- subset(usdat_ia, select = c(racethn_ia, projected_deaths_ia, observed_deaths_ia,
avrg_observed_deaths_ia, excess_deaths_ia, perc_excess_deaths_ia))
dim(usdat_ia)[1] 2106 6
usdat_as <- subset(usdat, racethn == "Asian")
usdat_as <- usdat_as %>% mutate(projected_deaths_as = projected_deaths_, observed_deaths_as = observed_deaths,
avrg_observed_deaths_as = avrg_observed_deaths, excess_deaths_as = excess_deaths,
perc_excess_deaths_as = perc_excess_deaths, racethn_as = racethn)
usdat_as <- subset(usdat_as, select = c(racethn_as, projected_deaths_as, observed_deaths_as,
avrg_observed_deaths_as, excess_deaths_as, perc_excess_deaths_as))
dim(usdat_as)[1] 2106 6
usdat_bl <- subset(usdat, racethn == "Black")
usdat_bl <- usdat_bl %>% mutate(projected_deaths_bl = projected_deaths_, observed_deaths_bl = observed_deaths,
avrg_observed_deaths_bl = avrg_observed_deaths, excess_deaths_bl = excess_deaths,
perc_excess_deaths_bl = perc_excess_deaths, racethn_bl = racethn)
usdat_bl <- subset(usdat_bl, select = c(racethn_bl, projected_deaths_bl, observed_deaths_bl,
avrg_observed_deaths_bl, excess_deaths_bl, perc_excess_deaths_bl))
dim(usdat_bl)[1] 2106 6
usdat_hi <- subset(usdat, racethn == "Hispa")
usdat_hi <- usdat_hi %>% mutate(projected_deaths_hi = projected_deaths_, observed_deaths_hi = observed_deaths,
avrg_observed_deaths_hi = avrg_observed_deaths, excess_deaths_hi = excess_deaths,
perc_excess_deaths_hi = perc_excess_deaths, racethn_hi = racethn)
usdat_hi <- subset(usdat_hi, select = c(racethn_hi, projected_deaths_hi, observed_deaths_hi,
avrg_observed_deaths_hi, excess_deaths_hi, perc_excess_deaths_hi))
dim(usdat_hi)[1] 2106 6
usdat_ot <- subset(usdat, racethn == "Other")
usdat_ot <- usdat_ot %>% mutate(projected_deaths_ot = projected_deaths_, observed_deaths_ot = observed_deaths,
avrg_observed_deaths_ot = avrg_observed_deaths, excess_deaths_ot = excess_deaths,
perc_excess_deaths_ot = perc_excess_deaths, racethn_ot = racethn)
usdat_ot <- subset(usdat_ot, select = c(racethn_ot, projected_deaths_ot, observed_deaths_ot,
avrg_observed_deaths_ot, excess_deaths_ot, perc_excess_deaths_ot))
dim(usdat_ot)[1] 2106 6
usdat_wh <- subset(usdat, racethn == "White")
usdat_wh <- usdat_wh %>% mutate(projected_deaths_wh = projected_deaths_, observed_deaths_wh = observed_deaths,
avrg_observed_deaths_wh = avrg_observed_deaths, excess_deaths_wh = excess_deaths,
perc_excess_deaths_wh = perc_excess_deaths, racethn_wh = racethn)
usdat_wh <- subset(usdat_wh, select = c(racethn_wh, projected_deaths_wh, observed_deaths_wh,
avrg_observed_deaths_wh, excess_deaths_wh, perc_excess_deaths_wh))
dim(usdat_wh)[1] 2106 6
usdat2 <- as.data.frame(cbind(usdat_, usdat_wh, usdat_hi, usdat_bl, usdat_as, usdat_ia,
usdat_ot))
dim(usdat2)[1] 2106 43
[1] 39
[1] "jurisd" "outcome"
[3] "estimatetype" "wkendate"
[5] "mmwryr" "mmwrmo"
[7] "mmwrwk" "racethn_wh"
[9] "projected_deaths_wh" "observed_deaths_wh"
[11] "avrg_observed_deaths_wh" "excess_deaths_wh"
[13] "perc_excess_deaths_wh" "racethn_hi"
[15] "projected_deaths_hi" "observed_deaths_hi"
[17] "avrg_observed_deaths_hi" "excess_deaths_hi"
[19] "perc_excess_deaths_hi" "racethn_bl"
[21] "projected_deaths_bl" "observed_deaths_bl"
[23] "avrg_observed_deaths_bl" "excess_deaths_bl"
[25] "perc_excess_deaths_bl" "racethn_as"
[27] "projected_deaths_as" "observed_deaths_as"
[29] "avrg_observed_deaths_as" "excess_deaths_as"
[31] "perc_excess_deaths_as" "racethn_ia"
[33] "projected_deaths_ia" "observed_deaths_ia"
[35] "avrg_observed_deaths_ia" "excess_deaths_ia"
[37] "perc_excess_deaths_ia" "racethn_ot"
[39] "projected_deaths_ot" "observed_deaths_ot"
[41] "avrg_observed_deaths_ot" "excess_deaths_ot"
[43] "perc_excess_deaths_ot"
State of Texas
[1] 234 14
txdat_ <- subset(txdat, racethn == "Hispa", select = c(jurisd, outcome, estimatetype,
wkendate, mmwryr, mmwrmo, mmwrwk))
dim(txdat_)[1] 39 7
txdat_ia <- subset(txdat, racethn == "IA/AN")
txdat_ia <- txdat_ia %>% mutate(projected_deaths_ia = projected_deaths_, observed_deaths_ia = observed_deaths,
avrg_observed_deaths_ia = avrg_observed_deaths, excess_deaths_ia = excess_deaths,
perc_excess_deaths_ia = perc_excess_deaths, racethn_ia = racethn)
txdat_ia <- subset(txdat_ia, select = c(racethn_ia, projected_deaths_ia, observed_deaths_ia,
avrg_observed_deaths_ia, excess_deaths_ia, perc_excess_deaths_ia))
dim(txdat_ia)[1] 39 6
txdat_as <- subset(txdat, racethn == "Asian")
txdat_as <- txdat_as %>% mutate(projected_deaths_as = projected_deaths_, observed_deaths_as = observed_deaths,
avrg_observed_deaths_as = avrg_observed_deaths, excess_deaths_as = excess_deaths,
perc_excess_deaths_as = perc_excess_deaths, racethn_as = racethn)
txdat_as <- subset(txdat_as, select = c(racethn_as, projected_deaths_as, observed_deaths_as,
avrg_observed_deaths_as, excess_deaths_as, perc_excess_deaths_as))
dim(txdat_as)[1] 39 6
txdat_bl <- subset(txdat, racethn == "Black")
txdat_bl <- txdat_bl %>% mutate(projected_deaths_bl = projected_deaths_, observed_deaths_bl = observed_deaths,
avrg_observed_deaths_bl = avrg_observed_deaths, excess_deaths_bl = excess_deaths,
perc_excess_deaths_bl = perc_excess_deaths, racethn_bl = racethn)
txdat_bl <- subset(txdat_bl, select = c(racethn_bl, projected_deaths_bl, observed_deaths_bl,
avrg_observed_deaths_bl, excess_deaths_bl, perc_excess_deaths_bl))
dim(txdat_bl)[1] 39 6
txdat_hi <- subset(txdat, racethn == "Hispa")
txdat_hi <- txdat_hi %>% mutate(projected_deaths_hi = projected_deaths_, observed_deaths_hi = observed_deaths,
avrg_observed_deaths_hi = avrg_observed_deaths, excess_deaths_hi = excess_deaths,
perc_excess_deaths_hi = perc_excess_deaths, racethn_hi = racethn)
txdat_hi <- subset(txdat_hi, select = c(racethn_hi, projected_deaths_hi, observed_deaths_hi,
avrg_observed_deaths_hi, excess_deaths_hi, perc_excess_deaths_hi))
dim(txdat_hi)[1] 39 6
txdat_ot <- subset(txdat, racethn == "Other")
txdat_ot <- txdat_ot %>% mutate(projected_deaths_ot = projected_deaths_, observed_deaths_ot = observed_deaths,
avrg_observed_deaths_ot = avrg_observed_deaths, excess_deaths_ot = excess_deaths,
perc_excess_deaths_ot = perc_excess_deaths, racethn_ot = racethn)
txdat_ot <- subset(txdat_ot, select = c(racethn_ot, projected_deaths_ot, observed_deaths_ot,
avrg_observed_deaths_ot, excess_deaths_ot, perc_excess_deaths_ot))
dim(txdat_ot)[1] 39 6
txdat_wh <- subset(txdat, racethn == "White")
txdat_wh <- txdat_wh %>% mutate(projected_deaths_wh = projected_deaths_, observed_deaths_wh = observed_deaths,
avrg_observed_deaths_wh = avrg_observed_deaths, excess_deaths_wh = excess_deaths,
perc_excess_deaths_wh = perc_excess_deaths, racethn_wh = racethn)
txdat_wh <- subset(txdat_wh, select = c(racethn_wh, projected_deaths_wh, observed_deaths_wh,
avrg_observed_deaths_wh, excess_deaths_wh, perc_excess_deaths_wh))
dim(txdat_wh)[1] 39 6
txdat2 <- as.data.frame(cbind(txdat_, txdat_wh, txdat_hi, txdat_bl, txdat_as, txdat_ia,
txdat_ot))
dim(txdat2)[1] 39 43
[1] 39
[1] "jurisd" "outcome"
[3] "estimatetype" "wkendate"
[5] "mmwryr" "mmwrmo"
[7] "mmwrwk" "racethn_wh"
[9] "projected_deaths_wh" "observed_deaths_wh"
[11] "avrg_observed_deaths_wh" "excess_deaths_wh"
[13] "perc_excess_deaths_wh" "racethn_hi"
[15] "projected_deaths_hi" "observed_deaths_hi"
[17] "avrg_observed_deaths_hi" "excess_deaths_hi"
[19] "perc_excess_deaths_hi" "racethn_bl"
[21] "projected_deaths_bl" "observed_deaths_bl"
[23] "avrg_observed_deaths_bl" "excess_deaths_bl"
[25] "perc_excess_deaths_bl" "racethn_as"
[27] "projected_deaths_as" "observed_deaths_as"
[29] "avrg_observed_deaths_as" "excess_deaths_as"
[31] "perc_excess_deaths_as" "racethn_ia"
[33] "projected_deaths_ia" "observed_deaths_ia"
[35] "avrg_observed_deaths_ia" "excess_deaths_ia"
[37] "perc_excess_deaths_ia" "racethn_ot"
[39] "projected_deaths_ot" "observed_deaths_ot"
[41] "avrg_observed_deaths_ot" "excess_deaths_ot"
[43] "perc_excess_deaths_ot"
Export data
Since I work in R environment, I prefer and recommended saving data in base R format using .rda or .Rdata extension with compress= option. The function \(load()\) is used to call .rda files into R session. Saving data in this format compresses the file and saves significant space on disk. The \(save()\) command is very flexible and can save any file object.
save(usdat, file = "../ExcessDeaths/data/usdat_long.rda", compress = "xz")
save(usdat2, file = "../ExcessDeaths/data/usdat_wide.rda", compress = "xz")
save(txdat, file = "../ExcessDeaths/data/txdat_long.rda", compress = "xz")
save(txdat2, file = "../ExcessDeaths/data/txdat_wide.rda", compress = "xz")
save(txdat_wh, file = "../ExcessDeaths/data/txdat_white.rda", compress = "xz")
save(txdat_hi, file = "../ExcessDeaths/data/txdat_hispc.rda", compress = "xz")
save(txdat_bl, file = "../ExcessDeaths/data/txdat_black.rda", compress = "xz")
save(txdat_as, file = "../ExcessDeaths/data/txdat_asian.rda", compress = "xz")
save(txdat_ia, file = "../ExcessDeaths/data/txdat_indal.rda", compress = "xz")
save(txdat_ot, file = "../ExcessDeaths/data/txdat_other.rda", compress = "xz")write_csv(usdat, path = "../ExcessDeaths/data/usdat_long.csv")
write_csv(usdat2, path = "../ExcessDeaths/data/usdat_wide.csv")
write_csv(txdat, path = "../ExcessDeaths/data/txdat_long.csv")
write_csv(txdat2, path = "../ExcessDeaths/data/txdat_wide.csv")
write_csv(txdat_wh, path = "../ExcessDeaths/data/txdat_white.csv")
write_csv(txdat_hi, path = "../ExcessDeaths/data/txdat_hispc.csv")
write_csv(txdat_bl, path = "../ExcessDeaths/data/txdat_black.csv")
write_csv(txdat_as, path = "../ExcessDeaths/data/txdat_asian.csv")
write_csv(txdat_ia, path = "../ExcessDeaths/data/txdat_indal.csv")
write_csv(txdat_ot, path = "../ExcessDeaths/data/txdat_other.csv")Attributions
Corey Sparks, Ph.D. lab materials.
Ben Baumer and R. Jordan Crouser lab materials.
CDC/NCHS’s Main Database with major reference to data on Provisional COVID-19 Death Counts by Sex, Age, and State, United States COVID-19 Cases and Deaths by State over Time, Provisional COVID-19 Death Counts by Week Ending Date and State, and Provisional COVID-19 Death Counts in the United States by County.
R package was used for data wrangling and analysis.