It is a common practice to load necessary R libraries at the beginning of the document

The following are the R code for making the Table 2 in the New England J. of Medicine paper

# Please change to your own directory
Dig_dat = read.csv("C:/Users/mindy/Dropbox/SHI/Course Materials/Mindy/Basic Biostatistics/Lecture 1/DIG/dig_demo.csv")

Create an empty data table as the placeholder for the Table in the New England J. of Medicine paper

Death_Table = data.frame(matrix(nrow = 7, ncol= 9))
colnames(Death_Table) = c("DIGOXIN", "DIGOXINPCT", "PLACEBO","PLACEBOPCT",  "ABSOLUTEDIFFRENCE", "RISKRATIO","LowerCI", "UpperCI", "PVALUE")
rownames(Death_Table) = c("All", "Cardiovascular", "WorseningHeartFailure", "OtherCardiac", "OtherVascular",
                                       "Unknown", "NoncardiacAndNovascular")

In order to be accurate when we create new variables with the reason for death, print out all the exact texts for the reasons

table(Dig_dat$REASON)
## 
##                         Non Cardiac Nonvascular           Other Cardiac 
##                    4425                     355                     952 
##          Other Vascular                 Unknown Worsening Heart Failure 
##                      95                     130                     843

Create new variables to be used for further data processing with the %>% operator from the dplyr package

Dig_dat = Dig_dat %>%
  mutate(All = DEATH ==1) %>%
  mutate(Cardiovascular = REASON == "Worsening Heart Failure"|REASON == "Other Cardiac"|REASON == "Other Vascular"|REASON == "Unknown") %>%
  mutate(WorseningHeartFailure = REASON == "Worsening Heart Failure") %>%
  mutate(OtherCardiac = REASON == "Other Cardiac") %>%
  mutate(OtherVascular = REASON == "Other Vascular") %>%
  mutate(Unknown = REASON == "Unknown") %>%
  mutate(NoncardiacAndNovascular = REASON == "Non Cardiac Nonvascular")

Calcuate the numbers for the columns “Digoxin” and “Placebo” in our summary table using the %>% operator from the dplyr package

part1 = Dig_dat %>% filter(DEATH == 1) %>% 
  group_by(TRTMT) %>%
  summarise_each(sum, All, Cardiovascular, WorseningHeartFailure, OtherCardiac, OtherVascular, Unknown, NoncardiacAndNovascular) %>%
  select(-TRTMT)  #remove the treatment label from the output table since we only need the numbers

dim(part1)
## [1] 2 7

Paste the result for the columns “Digoxin” and “Placebo” to our summary table

# transpose the result by the t() function
Death_Table[,c("DIGOXIN", "PLACEBO")] = t(part1)

Calculate the percentages for the Digoxin group

Death_Table$DIGOXINPCT = round(Death_Table$DIGOXIN/sum(Dig_dat$TRTMT=="Digoxin"),3)*100

Calculate the percentages for the Placebo group

Death_Table$PLACEBOPCT = round(Death_Table$PLACEBO/sum(Dig_dat$TRTMT=="Placebo"),3)*100

Calculate the abosulte difference

Death_Table$ABSOLUTEDIFFRENCE = round((Death_Table$DIGOXINPCT - Death_Table$PLACEBOPCT),3)

Calculate the relative risks of death between the TRTMT groups for all of the death reasons

As an example, let us first start with “all” death reasons

In order to calculate the relative risk, we need to make a 2x2 table:

table2by2_all = table(subset(Dig_dat, select = c(TRTMT, All)))

Then we use the riskratio() function from the epitools package to calculate the risk ratios

rr_all = riskratio(table2by2_all)

We can see from the above output, “Digoxin” was used as the reference group when calculating the risk ratio. This is because “Digoxin” goes before “Placebo” in the dictionary order. In our case, we would like to use the “Placebo” group as the reference group. So we create a new variable for the treatment group labels

Dig_dat = Dig_dat %>%
  mutate(trtgrp = TRTMT == "Digoxin")

We repeat the riskratio model with the new group labels

table2by2_all = table(subset(Dig_dat, select = c(trtgrp, All)))
rr_all = riskratio(table2by2_all)

It seems that in the New England J. of Medicine, the author had reported the p value from the Fisher’s exact test, instead of the chi squared test, which is more recommended. We will reproduce the table as the author.

Death_Table["All", c("RISKRATIO", "LowerCI", "UpperCI")] = round(rr_all$measure[2,],2)
Death_Table["All", c("PVALUE")] = round(rr_all$p.value[2,2],2)

We can repeat the same for the rest of the rows

Now let’s move on to Figire 2

Our outcome endpoint is time to death due to worsening heart failure.

dig_survfit = survfit(Surv(DEATHDAY/30, WorseningHeartFailure)~TRTMT, data = Dig_dat)
plot(dig_survfit)
ggsurvplot(dig_survfit, risk.table = T, fun = "event")

To check the hazard ratio between TRTMT groups and the p value:

dig_coxph = coxph(Surv(DEATHDAY/30, WorseningHeartFailure)~TRTMT, data = Dig_dat)
dig_coxph = coxph(Surv(DEATHDAY/30, WorseningHeartFailure)~trtgrp, data = Dig_dat)