Demography of COVID-19

lab01: Introduction to R and RStudio Utilities

Elhakim Ibrahim

Graduate Assistant, Department of Demography, The University of Texas at San Antonio, TX, USA



Lab objectives

By the end of this lab:

  • We would have learned the basics of R and RStudio, and their common concepts as applicable to this class;
  • We would develop appreciation of the flexibility and applicability of the R tools in conducting demographic analysis


Why R and RStudio?

  • Flexible analytical software
  • Free; open-source; vast community-of-practice
  • Reproducible analysis


R and Data analysis: the confluence

R stores data as dataframe object which contains a collection of vectors representing rows/observations and columns/attributes in the dataframe. In this class, for instance, we have used a number of data files downloaded from the US CDC website to calculate rates and ratios; each of those data files represents an object in R, while the attributes such as jurisdiction, race/ethinicity, fatality status represent the vectors.

Setting up RStudio workspace

The YAML interface

  • Meaning:
    • Before: used to stand for Yet Another Markup Language
    • Now: stands for YAML Aint Markup Language
  • A standalone programming language that allows you to design the general architecture/template of your R Markdown document:
    • Common of R Markdown documents are HTML (such as one we are using for this lab), CSS, PDF and Word documents.
    • All about customizing the overall look of your document.

Example of default YAML:

---
title: "lab01: Introduction to R and RStudio Utilities"
author: "Elhakim Ibrahim"
date: "10/20/2020"
output: html_document
---

Check below links for more details:

Chunk settings

R functions work on codes specified in dedicated block called chunk. R allows for additional list of workspace customization options by specifying a global setting options for the entire R session using opts_chunk$set() function:

remove() 
rm(list=ls())

knitr::opts_chunk$set(
comment=NULL, message=F, warning=F, error=F, tidy=T, 
prompt=F, cache=F, fig.align='center', fig.ext='png')

options(prompt="e>",digits=5,scipen=10,max.print=100000)


R and RStudio: the tools

Instaling R packages

  • The first step in using R
  • Achieved with instal.packages("nameofpackage") function

Loading required libraries

  • Achieved with library(nameofpackage) function; this time without quotes around the name:

Possibilities in RStudio

R has lots of functions embedded in the libraries which enable users to download different data types (e.g. haven()), perform data restructuring, creation, subsetting, recoding, renaming, to mention but a few, as well as basic and complex tabulations. We shall go through some of these basic functions with the US COVID-19 data downloaded from the CDC website.



Example with CDC COVID-19 data

The data

I called in the United States COVID-19 data (object) directly into my work environment from the CDC website using URL link. This is one of the classical flexibility allowed by R. I also restricted the data to year 2020 only withsubset() function, being the year of interest. The merits of this approach include the ease of access to most recent data from the central database and the tons of space it allows user to save on storage disk.

url_us <- "https://data.cdc.gov/api/views/qfhf-uhaa/rows.csv?accessType=DOWNLOAD"
usdat0 <- read_csv(url_us)
## usdat0<-subset(usdat0, Outcome=='All Cause' & Type=='Unweighted' &
## MMWRYear==2020)
usdat0 <- subset(usdat0, Outcome == "All Cause" & Type == "Predicted (weighted)" & 
    MMWRYear == 2020 & MMWRWeek <= 39)

Cleaning column names

In some instances, column names could have space or special character in-between which could be a torn in the flesh. Imagine having to manually edit an excel dataset/object with 100 columns with such “disturbing characters” that need be clean before working on the data.

Good news!!! We can acheive this with one line of codes using the gsub() command. The command allows us to format the column names; such as removing trailing spaces and special characters, or replacing them with choice formatting. In this case, I removed the spaces and hyphens in-between column names. I also added two new columns- Rowcount, PriorNumberofDeaths and MMWRMonth to the dataframe:

names(usdat0) <- gsub(x = names(usdat0), pattern = (" |-|/"), replacement = "")

usdat0$Rowcount <- c(1:nrow(usdat0))
usdat0$PriorNumberofDeaths <- 0
usdat0$MMWRMonth <- 0

Checking data contents

Using dim() function, it is observed that the file contains 12636 rows (or cases) and 18 columns (or data attributes).

dim(usdat0)
[1] 12636    18

Meanwhile, the names() function is applied to display the dataframe column names:

names(usdat0)
 [1] "Jurisdiction"                        "WeekEndingDate"                     
 [3] "StateAbbreviation"                   "MMWRYear"                           
 [5] "MMWRWeek"                            "RaceEthnicity"                      
 [7] "TimePeriod"                          "Suppress"                           
 [9] "Note"                                "Outcome"                            
[11] "NumberofDeaths"                      "AverageNumberofDeathsinTimePeriod"  
[13] "Differencefrom20152019to2020"        "PercentDifferencefrom20152019to2020"
[15] "Type"                                "Rowcount"                           
[17] "PriorNumberofDeaths"                 "MMWRMonth"                          

Subsetting and renaming

Next step is to select the needed columns using a combination of subset() and select() functions, and rename the columns using transmute() function which automatically discards the original columns:

usdat <- subset(usdat0, select = c(Jurisdiction, Outcome, WeekEndingDate, MMWRYear, 
    MMWRMonth, MMWRWeek, RaceEthnicity, PriorNumberofDeaths, NumberofDeaths, AverageNumberofDeathsinTimePeriod, 
    Differencefrom20152019to2020, PercentDifferencefrom20152019to2020, Type))

usdat <- usdat %>% transmute(jurisd = Jurisdiction, outcome = Outcome, estimatetype = Type, 
    wkendate = WeekEndingDate, mmwryr = MMWRYear, mmwrmo = MMWRMonth, mmwrwk = MMWRWeek, 
    racethn = RaceEthnicity, projected_deaths = PriorNumberofDeaths, observed_deaths = NumberofDeaths, 
    avrg_observed_deaths = AverageNumberofDeathsinTimePeriod, excess_deaths = Differencefrom20152019to2020, 
    perc_excess_deaths = PercentDifferencefrom20152019to2020)
rm(usdat0)

Then, I validate the transmuted column headers:

names(usdat)
 [1] "jurisd"               "outcome"              "estimatetype"        
 [4] "wkendate"             "mmwryr"               "mmwrmo"              
 [7] "mmwrwk"               "racethn"              "projected_deaths"    
[10] "observed_deaths"      "avrg_observed_deaths" "excess_deaths"       
[13] "perc_excess_deaths"  

Lastly, I check the the structure of the data structure using theglimpse() function:

glimpse(usdat)
Rows: 12,636
Columns: 13
$ jurisd               <chr> "Alabama", "Alabama", "Alabama", "Alabama", "A...
$ outcome              <chr> "All Cause", "All Cause", "All Cause", "All Ca...
$ estimatetype         <chr> "Predicted (weighted)", "Predicted (weighted)"...
$ wkendate             <chr> "01/04/2020", "01/11/2020", "01/18/2020", "01/...
$ mmwryr               <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020...
$ mmwrmo               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ mmwrwk               <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,...
$ racethn              <chr> "Hispanic", "Hispanic", "Hispanic", "Hispanic"...
$ projected_deaths     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0...
$ observed_deaths      <dbl> NA, NA, NA, NA, 10, 11, NA, 12, NA, NA, NA, 10...
$ avrg_observed_deaths <dbl> NA, NA, NA, NA, 10, 11, NA, 12, NA, NA, NA, 10...
$ excess_deaths        <dbl> NA, NA, NA, NA, 2, 4, NA, 5, NA, NA, NA, 6, NA...
$ perc_excess_deaths   <dbl> NA, NA, NA, NA, 25.0, 57.1, NA, 71.4, NA, NA, ...

Notes

  • Outcome: Deaths from all cause
  • Completeness: Data in recent weeks are incomplete. Only 60% of death records are submitted to NCHS within 10 days of the date of death, and completeness varies by jurisdiction.

Creating month from date column

In the following steps, I generated values formmwrmo column based on thewkendate column to specify the months of data reporting.

  1. I convertwkendate column toPOSIXlt class of data:
head(usdat$wkendate, 5)
[1] "01/04/2020" "01/11/2020" "01/18/2020" "01/25/2020" "02/01/2020"
tail(usdat$wkendate, 5)
[1] "08/29/2020" "09/05/2020" "09/12/2020" "09/19/2020" "09/26/2020"
usdat$mmwrmo <- strptime(usdat$wkendate, "%m/%d/%Y")
class(usdat$mmwrmo)
[1] "POSIXlt" "POSIXt" 
head(usdat$mmwrmo, 5)
[1] "2020-01-04 CST" "2020-01-11 CST" "2020-01-18 CST" "2020-01-25 CST"
[5] "2020-02-01 CST"
tail(usdat$mmwrmo, 5)
[1] "2020-08-29 CDT" "2020-09-05 CDT" "2020-09-12 CDT" "2020-09-19 CDT"
[5] "2020-09-26 CDT"
  1. I extract only the mmwrmos of reporting from the wkendate column:
usdat$mmwrmo <- format(usdat$mmwrmo, "%m")
freq(usdat$mmwrmo, report.nas = F, cumul = F)
Frequencies  
usdat$mmwrmo  
Type: Character  

               Freq        %
----------- ------- --------
         01    1296    10.26
         02    1620    12.82
         03    1296    10.26
         04    1296    10.26
         05    1620    12.82
         06    1296    10.26
         07    1296    10.26
         08    1620    12.82
         09    1296    10.26
      Total   12636   100.00
  1. I convert the extracted mmwrmos to a factor class of data with descriptive mmwrmo labels. Then, I reorder the columns such that mmwryr, mmwrwk,mmwrmo and racethn will shift from columns 4, 5, 11 and 6 to occupy columns 6, 4, 5 and 11, respectively:
usdat$mmwrmo <- factor(usdat$mmwrmo, levels = c("01", "02", "03", "04", "05", "06", 
    "07", "08", "09", "10"), labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", 
    "Jul", "Aug", "Sep", "Oct"))
freq(usdat$mmwrmo, report.nas = F, cumul = F)
Frequencies  
usdat$mmwrmo  
Type: Factor  

               Freq        %
----------- ------- --------
        Jan    1296    10.26
        Feb    1620    12.82
        Mar    1296    10.26
        Apr    1296    10.26
        May    1620    12.82
        Jun    1296    10.26
        Jul    1296    10.26
        Aug    1620    12.82
        Sep    1296    10.26
        Oct       0     0.00
      Total   12636   100.00
  1. I check the column names in the dataframe:
names(usdat)
 [1] "jurisd"               "outcome"              "estimatetype"        
 [4] "wkendate"             "mmwryr"               "mmwrmo"              
 [7] "mmwrwk"               "racethn"              "projected_deaths"    
[10] "observed_deaths"      "avrg_observed_deaths" "excess_deaths"       
[13] "perc_excess_deaths"  


Recoding labels

Here, I simply recode race/ethnicity to shorter labels and replace NAs with zeros:

freq(usdat$racethn, report.nas = F, cumul = F)
Frequencies  
usdat$racethn  
Type: Character  

                                                       Freq        %
--------------------------------------------------- ------- --------
                                           Hispanic    2106    16.67
      Non-Hispanic American Indian or Alaska Native    2106    16.67
                                 Non-Hispanic Asian    2106    16.67
                                 Non-Hispanic Black    2106    16.67
                                 Non-Hispanic White    2106    16.67
                                              Other    2106    16.67
                                              Total   12636   100.00
usdat$racethn <- recode(usdat$racethn, Hispanic = "Hispa", `Non-Hispanic American Indian or Alaska Native` = "IA/AN", 
    `Non-Hispanic Asian` = "Asian", `Non-Hispanic Black` = "Black", `Non-Hispanic White` = "White")
freq(usdat$racethn, report.nas = F, cumul = F)
Frequencies  
usdat$racethn  
Type: Character  

               Freq        %
----------- ------- --------
      Asian    2106    16.67
      Black    2106    16.67
      Hispa    2106    16.67
      IA/AN    2106    16.67
      Other    2106    16.67
      White    2106    16.67
      Total   12636   100.00
usdat <- usdat %>% mutate(projected_deaths = coalesce(projected_deaths, 0), observed_deaths = coalesce(observed_deaths, 
    0), avrg_observed_deaths = coalesce(avrg_observed_deaths, 0), excess_deaths = coalesce(excess_deaths, 
    0), perc_excess_deaths = coalesce(perc_excess_deaths, 0))

I generated values forprojected_.deaths column representing the difference between observed_.deaths and excess_deaths, then check the summary statistics for number of excess death in 2020:

usdat$projected_deaths_ <- as.numeric(usdat$observed_deaths - usdat$excess_deaths)

summary(usdat$excess_deaths)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-1352.0     0.0     0.0    55.4    18.0 11505.0 

This implies estimated excess deaths as low as -1352 and as high as 11505 across the United States in year 2020 relative to similar time period (in months) between 2015 and 2019.

The View() function allows for viewing the dataframe:

View(usdat)


Restructuring and splitting data

Here, I attempt to create separate files by race/ethnicity and combine them into a wide-format data sets for the United States and State of Texas.

Creating final United States data
usdat_ <- subset(usdat, racethn == "Hispa", select = c(jurisd, outcome, estimatetype, 
    wkendate, mmwryr, mmwrmo, mmwrwk))
dim(usdat_)
[1] 2106    7
usdat_ia <- subset(usdat, racethn == "IA/AN")
usdat_ia <- usdat_ia %>% mutate(projected_deaths_ia = projected_deaths_, observed_deaths_ia = observed_deaths, 
    avrg_observed_deaths_ia = avrg_observed_deaths, excess_deaths_ia = excess_deaths, 
    perc_excess_deaths_ia = perc_excess_deaths, racethn_ia = racethn)
usdat_ia <- subset(usdat_ia, select = c(racethn_ia, projected_deaths_ia, observed_deaths_ia, 
    avrg_observed_deaths_ia, excess_deaths_ia, perc_excess_deaths_ia))
dim(usdat_ia)
[1] 2106    6
usdat_as <- subset(usdat, racethn == "Asian")
usdat_as <- usdat_as %>% mutate(projected_deaths_as = projected_deaths_, observed_deaths_as = observed_deaths, 
    avrg_observed_deaths_as = avrg_observed_deaths, excess_deaths_as = excess_deaths, 
    perc_excess_deaths_as = perc_excess_deaths, racethn_as = racethn)
usdat_as <- subset(usdat_as, select = c(racethn_as, projected_deaths_as, observed_deaths_as, 
    avrg_observed_deaths_as, excess_deaths_as, perc_excess_deaths_as))
dim(usdat_as)
[1] 2106    6
usdat_bl <- subset(usdat, racethn == "Black")
usdat_bl <- usdat_bl %>% mutate(projected_deaths_bl = projected_deaths_, observed_deaths_bl = observed_deaths, 
    avrg_observed_deaths_bl = avrg_observed_deaths, excess_deaths_bl = excess_deaths, 
    perc_excess_deaths_bl = perc_excess_deaths, racethn_bl = racethn)
usdat_bl <- subset(usdat_bl, select = c(racethn_bl, projected_deaths_bl, observed_deaths_bl, 
    avrg_observed_deaths_bl, excess_deaths_bl, perc_excess_deaths_bl))
dim(usdat_bl)
[1] 2106    6
usdat_hi <- subset(usdat, racethn == "Hispa")
usdat_hi <- usdat_hi %>% mutate(projected_deaths_hi = projected_deaths_, observed_deaths_hi = observed_deaths, 
    avrg_observed_deaths_hi = avrg_observed_deaths, excess_deaths_hi = excess_deaths, 
    perc_excess_deaths_hi = perc_excess_deaths, racethn_hi = racethn)
usdat_hi <- subset(usdat_hi, select = c(racethn_hi, projected_deaths_hi, observed_deaths_hi, 
    avrg_observed_deaths_hi, excess_deaths_hi, perc_excess_deaths_hi))
dim(usdat_hi)
[1] 2106    6
usdat_ot <- subset(usdat, racethn == "Other")
usdat_ot <- usdat_ot %>% mutate(projected_deaths_ot = projected_deaths_, observed_deaths_ot = observed_deaths, 
    avrg_observed_deaths_ot = avrg_observed_deaths, excess_deaths_ot = excess_deaths, 
    perc_excess_deaths_ot = perc_excess_deaths, racethn_ot = racethn)
usdat_ot <- subset(usdat_ot, select = c(racethn_ot, projected_deaths_ot, observed_deaths_ot, 
    avrg_observed_deaths_ot, excess_deaths_ot, perc_excess_deaths_ot))
dim(usdat_ot)
[1] 2106    6
usdat_wh <- subset(usdat, racethn == "White")
usdat_wh <- usdat_wh %>% mutate(projected_deaths_wh = projected_deaths_, observed_deaths_wh = observed_deaths, 
    avrg_observed_deaths_wh = avrg_observed_deaths, excess_deaths_wh = excess_deaths, 
    perc_excess_deaths_wh = perc_excess_deaths, racethn_wh = racethn)
usdat_wh <- subset(usdat_wh, select = c(racethn_wh, projected_deaths_wh, observed_deaths_wh, 
    avrg_observed_deaths_wh, excess_deaths_wh, perc_excess_deaths_wh))
dim(usdat_wh)
[1] 2106    6
usdat2 <- as.data.frame(cbind(usdat_, usdat_wh, usdat_hi, usdat_bl, usdat_as, usdat_ia, 
    usdat_ot))
dim(usdat2)
[1] 2106   43
max(usdat2$mmwrwk)
[1] 39
names(usdat2)
 [1] "jurisd"                  "outcome"                
 [3] "estimatetype"            "wkendate"               
 [5] "mmwryr"                  "mmwrmo"                 
 [7] "mmwrwk"                  "racethn_wh"             
 [9] "projected_deaths_wh"     "observed_deaths_wh"     
[11] "avrg_observed_deaths_wh" "excess_deaths_wh"       
[13] "perc_excess_deaths_wh"   "racethn_hi"             
[15] "projected_deaths_hi"     "observed_deaths_hi"     
[17] "avrg_observed_deaths_hi" "excess_deaths_hi"       
[19] "perc_excess_deaths_hi"   "racethn_bl"             
[21] "projected_deaths_bl"     "observed_deaths_bl"     
[23] "avrg_observed_deaths_bl" "excess_deaths_bl"       
[25] "perc_excess_deaths_bl"   "racethn_as"             
[27] "projected_deaths_as"     "observed_deaths_as"     
[29] "avrg_observed_deaths_as" "excess_deaths_as"       
[31] "perc_excess_deaths_as"   "racethn_ia"             
[33] "projected_deaths_ia"     "observed_deaths_ia"     
[35] "avrg_observed_deaths_ia" "excess_deaths_ia"       
[37] "perc_excess_deaths_ia"   "racethn_ot"             
[39] "projected_deaths_ot"     "observed_deaths_ot"     
[41] "avrg_observed_deaths_ot" "excess_deaths_ot"       
[43] "perc_excess_deaths_ot"  
View(usdat2)
Creating State of Texas data
txdat <- subset(usdat, jurisd == "Texas")
dim(txdat)
[1] 234  14
txdat_ <- subset(txdat, racethn == "Hispa", select = c(jurisd, outcome, estimatetype, 
    wkendate, mmwryr, mmwrmo, mmwrwk))
dim(txdat_)
[1] 39  7
txdat_ia <- subset(txdat, racethn == "IA/AN")
txdat_ia <- txdat_ia %>% mutate(projected_deaths_ia = projected_deaths_, observed_deaths_ia = observed_deaths, 
    avrg_observed_deaths_ia = avrg_observed_deaths, excess_deaths_ia = excess_deaths, 
    perc_excess_deaths_ia = perc_excess_deaths, racethn_ia = racethn)
txdat_ia <- subset(txdat_ia, select = c(racethn_ia, projected_deaths_ia, observed_deaths_ia, 
    avrg_observed_deaths_ia, excess_deaths_ia, perc_excess_deaths_ia))
dim(txdat_ia)
[1] 39  6
txdat_as <- subset(txdat, racethn == "Asian")
txdat_as <- txdat_as %>% mutate(projected_deaths_as = projected_deaths_, observed_deaths_as = observed_deaths, 
    avrg_observed_deaths_as = avrg_observed_deaths, excess_deaths_as = excess_deaths, 
    perc_excess_deaths_as = perc_excess_deaths, racethn_as = racethn)
txdat_as <- subset(txdat_as, select = c(racethn_as, projected_deaths_as, observed_deaths_as, 
    avrg_observed_deaths_as, excess_deaths_as, perc_excess_deaths_as))
dim(txdat_as)
[1] 39  6
txdat_bl <- subset(txdat, racethn == "Black")
txdat_bl <- txdat_bl %>% mutate(projected_deaths_bl = projected_deaths_, observed_deaths_bl = observed_deaths, 
    avrg_observed_deaths_bl = avrg_observed_deaths, excess_deaths_bl = excess_deaths, 
    perc_excess_deaths_bl = perc_excess_deaths, racethn_bl = racethn)
txdat_bl <- subset(txdat_bl, select = c(racethn_bl, projected_deaths_bl, observed_deaths_bl, 
    avrg_observed_deaths_bl, excess_deaths_bl, perc_excess_deaths_bl))
dim(txdat_bl)
[1] 39  6
txdat_hi <- subset(txdat, racethn == "Hispa")
txdat_hi <- txdat_hi %>% mutate(projected_deaths_hi = projected_deaths_, observed_deaths_hi = observed_deaths, 
    avrg_observed_deaths_hi = avrg_observed_deaths, excess_deaths_hi = excess_deaths, 
    perc_excess_deaths_hi = perc_excess_deaths, racethn_hi = racethn)
txdat_hi <- subset(txdat_hi, select = c(racethn_hi, projected_deaths_hi, observed_deaths_hi, 
    avrg_observed_deaths_hi, excess_deaths_hi, perc_excess_deaths_hi))
dim(txdat_hi)
[1] 39  6
txdat_ot <- subset(txdat, racethn == "Other")
txdat_ot <- txdat_ot %>% mutate(projected_deaths_ot = projected_deaths_, observed_deaths_ot = observed_deaths, 
    avrg_observed_deaths_ot = avrg_observed_deaths, excess_deaths_ot = excess_deaths, 
    perc_excess_deaths_ot = perc_excess_deaths, racethn_ot = racethn)
txdat_ot <- subset(txdat_ot, select = c(racethn_ot, projected_deaths_ot, observed_deaths_ot, 
    avrg_observed_deaths_ot, excess_deaths_ot, perc_excess_deaths_ot))
dim(txdat_ot)
[1] 39  6
txdat_wh <- subset(txdat, racethn == "White")
txdat_wh <- txdat_wh %>% mutate(projected_deaths_wh = projected_deaths_, observed_deaths_wh = observed_deaths, 
    avrg_observed_deaths_wh = avrg_observed_deaths, excess_deaths_wh = excess_deaths, 
    perc_excess_deaths_wh = perc_excess_deaths, racethn_wh = racethn)
txdat_wh <- subset(txdat_wh, select = c(racethn_wh, projected_deaths_wh, observed_deaths_wh, 
    avrg_observed_deaths_wh, excess_deaths_wh, perc_excess_deaths_wh))
dim(txdat_wh)
[1] 39  6
txdat2 <- as.data.frame(cbind(txdat_, txdat_wh, txdat_hi, txdat_bl, txdat_as, txdat_ia, 
    txdat_ot))
dim(txdat2)
[1] 39 43
max(txdat2$mmwrwk)
[1] 39
names(txdat2)
 [1] "jurisd"                  "outcome"                
 [3] "estimatetype"            "wkendate"               
 [5] "mmwryr"                  "mmwrmo"                 
 [7] "mmwrwk"                  "racethn_wh"             
 [9] "projected_deaths_wh"     "observed_deaths_wh"     
[11] "avrg_observed_deaths_wh" "excess_deaths_wh"       
[13] "perc_excess_deaths_wh"   "racethn_hi"             
[15] "projected_deaths_hi"     "observed_deaths_hi"     
[17] "avrg_observed_deaths_hi" "excess_deaths_hi"       
[19] "perc_excess_deaths_hi"   "racethn_bl"             
[21] "projected_deaths_bl"     "observed_deaths_bl"     
[23] "avrg_observed_deaths_bl" "excess_deaths_bl"       
[25] "perc_excess_deaths_bl"   "racethn_as"             
[27] "projected_deaths_as"     "observed_deaths_as"     
[29] "avrg_observed_deaths_as" "excess_deaths_as"       
[31] "perc_excess_deaths_as"   "racethn_ia"             
[33] "projected_deaths_ia"     "observed_deaths_ia"     
[35] "avrg_observed_deaths_ia" "excess_deaths_ia"       
[37] "perc_excess_deaths_ia"   "racethn_ot"             
[39] "projected_deaths_ot"     "observed_deaths_ot"     
[41] "avrg_observed_deaths_ot" "excess_deaths_ot"       
[43] "perc_excess_deaths_ot"  
View(txdat2)


Exporting data

Since I work in R environment, I prefer and recommended saving data in base R format using .rda or .Rdata extension with compress= option. The function load() is used to call .rda files into R session. Saving data in this format compresses the file and saves significant space on disk. The save() command is very flexible and can save any file object.

save(usdat, file = "../ExcessDeaths/data/usdat_long.rda", compress = "xz")
save(usdat2, file = "../ExcessDeaths/data/usdat_wide.rda", compress = "xz")
save(txdat, file = "../ExcessDeaths/data/txdat_long.rda", compress = "xz")
save(txdat2, file = "../ExcessDeaths/data/txdat_wide.rda", compress = "xz")
save(txdat_wh, file = "../ExcessDeaths/data/txdat_white.rda", compress = "xz")
save(txdat_hi, file = "../ExcessDeaths/data/txdat_hispc.rda", compress = "xz")
save(txdat_bl, file = "../ExcessDeaths/data/txdat_black.rda", compress = "xz")
save(txdat_as, file = "../ExcessDeaths/data/txdat_asian.rda", compress = "xz")
save(txdat_ia, file = "../ExcessDeaths/data/txdat_indal.rda", compress = "xz")
save(txdat_ot, file = "../ExcessDeaths/data/txdat_other.rda", compress = "xz")
write_csv(usdat, path = "../ExcessDeaths/data/usdat_long.csv")
write_csv(usdat2, path = "../ExcessDeaths/data/usdat_wide.csv")
write_csv(txdat, path = "../ExcessDeaths/data/txdat_long.csv")
write_csv(txdat2, path = "../ExcessDeaths/data/txdat_wide.csv")
write_csv(txdat_wh, path = "../ExcessDeaths/data/txdat_white.csv")
write_csv(txdat_hi, path = "../ExcessDeaths/data/txdat_hispc.csv")
write_csv(txdat_bl, path = "../ExcessDeaths/data/txdat_black.csv")
write_csv(txdat_as, path = "../ExcessDeaths/data/txdat_asian.csv")
write_csv(txdat_ia, path = "../ExcessDeaths/data/txdat_indal.csv")
write_csv(txdat_ot, path = "../ExcessDeaths/data/txdat_other.csv")


Questions

Any question or suggestion?

I recommend to you all to explore below resources for detailed explanation and easy to adapt examples.



Resources & Attributions

  1. Corey Sparks, Ph.D. lab materials.

  2. UCLA’s R Markdown Basics - IDRE Stats

  3. Ben Baumer and R. Jordan Crouser lab materials.

  4. CDC/NCHS’s Main Database with major reference to data on Provisional COVID-19 Death Counts by Sex, Age, and State, United States COVID-19 Cases and Deaths by State over Time, Provisional COVID-19 Death Counts by Week Ending Date and State, and Provisional COVID-19 Death Counts in the United States by County.

  5. R package was used for data wrangling and analysis.



Created on October 20, 2020