The following are the R code for making the Table 2 in the New England J. of Medicine paper
# Please change to your own directory
Dig_dat = read.csv("C:/Users/mindy/Dropbox/SHI/Course Materials/Mindy/Basic Biostatistics/Lecture 1/DIG/dig_demo.csv")
Create an empty data table as the placeholder for the Table in the New England J. of Medicine paper
Death_Table = data.frame(matrix(nrow = 7, ncol= 9))
colnames(Death_Table) = c("DIGOXIN", "DIGOXINPCT", "PLACEBO","PLACEBOPCT", "ABSOLUTEDIFFRENCE", "RISKRATIO","LowerCI", "UpperCI", "PVALUE")
rownames(Death_Table) = c("All", "Cardiovascular", "WorseningHeartFailure", "OtherCardiac", "OtherVascular",
"Unknown", "NoncardiacAndNovascular")
In order to be accurate when we create new variables with the reason for death, print out all the exact texts for the reasons
table(Dig_dat$REASON)
##
## Non Cardiac Nonvascular Other Cardiac
## 4425 355 952
## Other Vascular Unknown Worsening Heart Failure
## 95 130 843
Create new variables to be used for further data processing with the %>% operator from the dplyr package
Dig_dat = Dig_dat %>%
mutate(All = DEATH ==1) %>%
mutate(Cardiovascular = REASON == "Worsening Heart Failure"|REASON == "Other Cardiac"|REASON == "Other Vascular"|REASON == "Unknown") %>%
mutate(WorseningHeartFailure = REASON == "Worsening Heart Failure") %>%
mutate(OtherCardiac = REASON == "Other Cardiac") %>%
mutate(OtherVascular = REASON == "Other Vascular") %>%
mutate(Unknown = REASON == "Unknown") %>%
mutate(NoncardiacAndNovascular = REASON == "Non Cardiac Nonvascular")
Calcuate the numbers for the columns “Digoxin” and “Placebo” in our summary table using the %>% operator from the dplyr package
part1 = Dig_dat %>% filter(DEATH == 1) %>%
group_by(TRTMT) %>%
summarise_each(sum, All, Cardiovascular, WorseningHeartFailure, OtherCardiac, OtherVascular, Unknown, NoncardiacAndNovascular) %>%
select(-TRTMT) #remove the treatment label from the output table since we only need the numbers
dim(part1)
## [1] 2 7
Paste the result for the columns “Digoxin” and “Placebo” to our summary table
# transpose the result by the t() function
Death_Table[,c("DIGOXIN", "PLACEBO")] = t(part1)
Calculate the percentages for the Digoxin group
Death_Table$DIGOXINPCT = round(Death_Table$DIGOXIN/sum(Dig_dat$TRTMT=="Digoxin"),3)*100
Calculate the percentages for the Placebo group
Death_Table$PLACEBOPCT = round(Death_Table$PLACEBO/sum(Dig_dat$TRTMT=="Placebo"),3)*100
Calculate the abosulte difference
Death_Table$ABSOLUTEDIFFRENCE = round((Death_Table$DIGOXINPCT - Death_Table$PLACEBOPCT),3)
Calculate the relative risks of death between the TRTMT groups for all of the death reasons
As an example, let us first start with “all” death reasons
In order to calculate the relative risk, we need to make a 2x2 table:
table2by2_all = table(subset(Dig_dat, select = c(TRTMT, All)))
We can see from the above output, “Digoxin” was used as the reference group when calculating the risk ratio. This is because “Digoxin” goes before “Placebo” in the dictionary order. In our case, we would like to use the “Placebo” group as the reference group. So we create a new variable for the treatment group labels
Dig_dat = Dig_dat %>%
mutate(trtgrp = TRTMT == "Digoxin")
We repeat the riskratio model with the new group labels
table2by2_all = table(subset(Dig_dat, select = c(trtgrp, All)))
rr_all = riskratio(table2by2_all)
It seems that in the New England J. of Medicine, the author had reported the p value from the Fisher’s exact test, instead of the chi squared test, which is more recommended. We will reproduce the table as the author.
Death_Table["All", c("RISKRATIO", "LowerCI", "UpperCI")] = round(rr_all$measure[2,],2)
Death_Table["All", c("PVALUE")] = round(rr_all$p.value[2,2],2)
We can repeat the same for the rest of the rows
Now let’s move on to Figire 2
Our outcome endpoint is time to death due to worsening heart failure.
dig_survfit = survfit(Surv(DEATHDAY/30, WorseningHeartFailure)~TRTMT, data = Dig_dat)
plot(dig_survfit)
ggsurvplot(dig_survfit, risk.table = T, fun = "event")


ggsurvplot(dig_survfit, risk.table = T, fun = "event", risk.table.height = 0.3, size=0.5, censor = F, pval = TRUE, pval.coord = c(36, 0.03), ylim=c(0,0.18), xlim=c(0,52))
## Warning: Removed 65 rows containing missing values (geom_path).
## Warning: Removed 65 rows containing missing values (geom_path).

To check the hazard ratio between TRTMT groups and the p value:
dig_coxph = coxph(Surv(DEATHDAY/30, WorseningHeartFailure)~TRTMT, data = Dig_dat)
dig_coxph = coxph(Surv(DEATHDAY/30, WorseningHeartFailure)~trtgrp, data = Dig_dat)