07 Statistical modeling and reporting

The following R code reproduces Table 2 of the New England Journal of Medicine paper.

# Change the path below to the location of dig_demo.csv on your computer
Dig_dat = read.csv("C:/Users/mindy/Dropbox/SHI/Course Materials/Mindy/Basic Biostatistics/Lecture 1/DIG/dig_demo.csv")

Create an empty data frame as a placeholder for Table 2, with one row per cause of death and one column per summary statistic:

Death_Table = data.frame(matrix(nrow = 7, ncol= 9))
colnames(Death_Table) = c("DIGOXIN", "DIGOXINPCT", "PLACEBO","PLACEBOPCT",  "ABSOLUTEDIFFRENCE", "RISKRATIO","LowerCI", "UpperCI", "PVALUE")
rownames(Death_Table) = c("All", "Cardiovascular", "WorseningHeartFailure", "OtherCardiac", "OtherVascular",
                                       "Unknown", "NoncardiacAndNovascular")

To make sure the new variables match the reason-for-death labels exactly, first print all the distinct values of REASON:

table(Dig_dat$REASON)

## 
##                         Non Cardiac Nonvascular           Other Cardiac 
##                    4425                     355                     952 
##          Other Vascular                 Unknown Worsening Heart Failure 
##                      95                     130                     843

Create a TRUE/FALSE indicator variable for each cause of death, using the %>% operator and mutate() from the dplyr package:

library(dplyr)  # provides %>%, mutate(), filter(), group_by(), summarise() and select()
Dig_dat = Dig_dat %>%
  mutate(All = DEATH == 1) %>%
  # "Cardiovascular" combines the four cardiovascular categories, including "Unknown"
  mutate(Cardiovascular = REASON %in% c("Worsening Heart Failure", "Other Cardiac", "Other Vascular", "Unknown")) %>%
  mutate(WorseningHeartFailure = REASON == "Worsening Heart Failure") %>%
  mutate(OtherCardiac = REASON == "Other Cardiac") %>%
  mutate(OtherVascular = REASON == "Other Vascular") %>%
  mutate(Unknown = REASON == "Unknown") %>%
  mutate(NoncardiacAndNovascular = REASON == "Non Cardiac Nonvascular")
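Each death has exactly one REASON, so you can sanity-check the new indicators: the cardiovascular indicator should equal the sum of its four subcategories. The counts printed above already agree with the final table (843 + 952 + 95 + 130 = 2020 = 1016 + 1004). The minimal base-R sketch below runs the same kind of check on a short hypothetical vector of reasons, not the DIG data. The names `reason`, `cv_levels` and `n_sub` are illustrative only.

```r
# Hypothetical causes of death for six patients ("" = alive), mimicking REASON
reason <- c("", "Worsening Heart Failure", "Other Cardiac", "Unknown",
            "Non Cardiac Nonvascular", "Other Vascular")
death <- reason != ""

# The four categories grouped as cardiovascular in the code above
cv_levels <- c("Worsening Heart Failure", "Other Cardiac", "Other Vascular", "Unknown")
cardiovascular <- reason %in% cv_levels
noncardiac <- reason == "Non Cardiac Nonvascular"

# Check 1: cardiovascular deaths equal the sum of the four subcategories
n_sub <- sum(sapply(cv_levels, function(lvl) sum(reason == lvl)))
stopifnot(sum(cardiovascular) == n_sub)

# Check 2: every death is cardiovascular or noncardiac/nonvascular, never both
stopifnot(all(cardiovascular[death] | noncardiac[death]),
          !any(cardiovascular & noncardiac))
```

To run the same checks on the real data, replace `reason` with `Dig_dat$REASON`.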

Calculate the numbers of deaths for the “Digoxin” and “Placebo” columns of the summary table, again using dplyr:

part1 = Dig_dat %>% filter(DEATH == 1) %>% 
  group_by(TRTMT) %>%
  # across() (dplyr >= 1.0.0) replaces the deprecated summarise_each()
  summarise(across(c(All, Cardiovascular, WorseningHeartFailure, OtherCardiac, OtherVascular, Unknown, NoncardiacAndNovascular), sum)) %>%
  select(-TRTMT)  # remove the treatment label and keep only the counts; row 1 = Digoxin, row 2 = Placebo

dim(part1)

## [1] 2 7

Copy these counts into the “DIGOXIN” and “PLACEBO” columns of the summary table:

# t() transposes part1 (2 rows x 7 columns) so that each cause of death becomes a row
Death_Table[,c("DIGOXIN", "PLACEBO")] = t(part1)
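The transpose is needed because part1 and Death_Table have opposite layouts. part1 has one row per treatment group and one column per cause; Death_Table has one row per cause. Here is a tiny illustration with hypothetical counts. `part_toy` and `counts` are illustrative names, not objects from the analysis.

```r
# Hypothetical death counts: rows = treatment groups, columns = causes of death
part_toy <- data.frame(All = c(12, 15), Cardiovascular = c(9, 11), Unknown = c(1, 2))

# t() turns the 2 x 3 data frame into a 3 x 2 matrix:
# one row per cause, one column per treatment group
counts <- t(part_toy)
colnames(counts) <- c("DIGOXIN", "PLACEBO")
counts
```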

Calculate the percentages for the Digoxin group, using the number of patients assigned to digoxin as the denominator:

Death_Table$DIGOXINPCT = round(Death_Table$DIGOXIN/sum(Dig_dat$TRTMT=="Digoxin"),3)*100

Calculate the percentages for the Placebo group, using the number of patients assigned to placebo as the denominator:

Death_Table$PLACEBOPCT = round(Death_Table$PLACEBO/sum(Dig_dat$TRTMT=="Placebo"),3)*100

Calculate the absolute difference (Digoxin minus Placebo, in percentage points):

Death_Table$ABSOLUTEDIFFRENCE = round((Death_Table$DIGOXINPCT - Death_Table$PLACEBOPCT),3)
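In formula form, the three columns computed above are shown below. Here x is the number of deaths from a given cause in an arm and n is the number of patients in that arm. The percentages are rounded to one decimal place. For all-cause deaths, for example, the rendered table gives 34.8 - 35.1 = -0.3 percentage points.

```latex
\text{DIGOXINPCT} = 100 \times \frac{x_{\text{Digoxin}}}{n_{\text{Digoxin}}}, \qquad
\text{PLACEBOPCT} = 100 \times \frac{x_{\text{Placebo}}}{n_{\text{Placebo}}}, \qquad
\text{ABSOLUTEDIFFRENCE} = \text{DIGOXINPCT} - \text{PLACEBOPCT}
```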

Next, calculate the relative risk (risk ratio) of death between the two treatment groups (TRTMT) for each cause of death.

As an example, let us start with deaths from all causes.

To calculate the risk ratio, we first need a 2x2 table of treatment group by outcome:
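For reference, here are the standard risk ratio formulas for a 2x2 table laid out with treatment groups in rows (reference group first) and outcome in columns (FALSE, i.e. no death, first). To our understanding this matches the layout riskratio() expects and its default Wald method, which builds the confidence interval on the log scale. Treat this as a sketch and check the epitools documentation for the exact definition.

```latex
\begin{array}{l|cc}
 & \text{Alive (FALSE)} & \text{Died (TRUE)} \\ \hline
\text{Row 1: reference group} & d & c \\
\text{Row 2: comparison group} & b & a
\end{array}
\qquad
\widehat{RR} = \frac{a/(a+b)}{c/(c+d)}

SE\!\left[\log \widehat{RR}\right] = \sqrt{\frac{1}{a} - \frac{1}{a+b} + \frac{1}{c} - \frac{1}{c+d}}, \qquad
95\%\ \text{CI} = \exp\!\left(\log \widehat{RR} \pm 1.96 \times SE\!\left[\log \widehat{RR}\right]\right)
```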

table2by2_all = table(subset(Dig_dat, select = c(TRTMT, All)))

Then we use the riskratio() function from the epitools package to calculate the risk ratio and its 95% confidence interval:

library(epitools)  # riskratio()
rr_all = riskratio(table2by2_all)

The output shows that “Digoxin” was used as the reference group when calculating the risk ratio. This happens because table() sorts the treatment labels alphabetically, so “Digoxin” (which comes before “Placebo”) forms the first row of the 2x2 table, and riskratio() treats the first row as the reference. We would like “Placebo” to be the reference group instead, so we create a new logical treatment variable, trtgrp, which is TRUE for Digoxin and FALSE for Placebo:

Dig_dat = Dig_dat %>%
  mutate(trtgrp = TRTMT == "Digoxin")  # FALSE = Placebo (sorts first, so it becomes the reference row), TRUE = Digoxin
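The recoding works because of how table() orders row levels. A character variable is sorted alphabetically, while a logical variable puts FALSE before TRUE. The base-R sketch below illustrates this on a hypothetical treatment vector. It also shows an alternative: setting the factor levels explicitly so that Placebo comes first. `trt` and `trt_f` are illustrative names.

```r
# Hypothetical treatment labels for five patients
trt <- c("Placebo", "Digoxin", "Digoxin", "Placebo", "Placebo")

# Character variable: levels sorted alphabetically, so "Digoxin" is the first row
names(table(trt))                 # "Digoxin" "Placebo"

# Logical recoding: FALSE (Placebo) sorts before TRUE (Digoxin)
names(table(trt == "Digoxin"))    # "FALSE" "TRUE"

# Alternative: fix the level order explicitly so Placebo is the first (reference) row
trt_f <- factor(trt, levels = c("Placebo", "Digoxin"))
names(table(trt_f))               # "Placebo" "Digoxin"
```

The epitools documentation also describes a `rev` argument to riskratio() (for example `rev = "rows"`) that reverses the row order. This is a third option, but verify its behaviour on your installed version.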

We repeat the risk ratio calculation with the new group labels:

table2by2_all = table(subset(Dig_dat, select = c(trtgrp, All)))
rr_all = riskratio(table2by2_all)

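The New England Journal of Medicine paper appears to report p-values from Fisher's exact test rather than from the chi-squared test, although the author considers the chi-squared test the more recommended choice. To reproduce the published table, we use the Fisher's exact p-value. rr_all$p.value has three columns (midp.exact, fisher.exact and chi.square), so we take the second one: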

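# row 2 of measure holds the estimate and the lower and upper 95% CI for Digoxin vs Placebo
Death_Table["All", c("RISKRATIO", "LowerCI", "UpperCI")] = round(rr_all$measure[2,],2)
# column 2 of p.value is the Fisher's exact test p-value
Death_Table["All", c("PVALUE")] = round(rr_all$p.value[2,2],2)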

The same steps can be repeated for the remaining rows (causes of death) of the table.
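One way to fill the remaining rows is with a loop over the causes of death. The sketch below keeps the example self-contained and runnable without epitools. It uses a small hypothetical data set (`toy`, not the DIG data) and a hypothetical helper, `rr_wald()`, which recomputes by hand the Wald risk ratio, its 95% CI on the log scale, and the Fisher's exact p-value. In the real analysis, riskratio() would take the helper's place inside the same loop.

```r
# Hypothetical toy data (NOT the DIG trial): 100 patients per arm.
# trtgrp = FALSE for Placebo (reference), TRUE for Digoxin.
toy <- data.frame(
  trtgrp = rep(c(FALSE, TRUE), each = 100),
  All    = c(rep(c(TRUE, FALSE), c(30, 70)), rep(c(TRUE, FALSE), c(20, 80))),
  # cardiovascular deaths are a subset of all deaths in each arm
  Cardiovascular = c(rep(c(TRUE, FALSE), c(25, 75)), rep(c(TRUE, FALSE), c(15, 85)))
)

# Hypothetical helper: risk ratio of row 2 vs row 1, Wald CI on the log scale,
# and Fisher's exact p-value, for a 2x2 table with outcome columns FALSE/TRUE
rr_wald <- function(tab, conf.level = 0.95) {
  died1 <- tab[2, 2]; alive1 <- tab[2, 1]   # comparison group (row 2)
  died0 <- tab[1, 2]; alive0 <- tab[1, 1]   # reference group (row 1)
  est <- (died1 / (died1 + alive1)) / (died0 / (died0 + alive0))
  se  <- sqrt(1 / died1 - 1 / (died1 + alive1) + 1 / died0 - 1 / (died0 + alive0))
  z   <- qnorm(0.5 * (1 + conf.level))
  c(RISKRATIO = est,
    LowerCI   = exp(log(est) - z * se),
    UpperCI   = exp(log(est) + z * se),
    PVALUE    = fisher.test(tab)$p.value)
}

causes <- c("All", "Cardiovascular")
results <- data.frame(matrix(nrow = length(causes), ncol = 4,
                             dimnames = list(causes, c("RISKRATIO", "LowerCI", "UpperCI", "PVALUE"))))

for (cause in causes) {
  # factor() with fixed levels guarantees a 2x2 table even if a level is absent
  tab <- table(factor(toy$trtgrp, levels = c(FALSE, TRUE)),
               factor(toy[[cause]], levels = c(FALSE, TRUE)))
  results[cause, ] <- round(rr_wald(tab), 2)
}
results
```

For the DIG data, the loop would do the following for each row name of Death_Table:

- build `tab` from `Dig_dat$trtgrp` and `Dig_dat[[cause]]`;
- call `riskratio(tab)`;
- copy `round(rr$measure[2, ], 2)` and `round(rr$p.value[2, 2], 2)` into that row, as was done for "All".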

For a nicely formatted version of the table, render it with kable() from the knitr package and kable_styling() from the kableExtra package. More styling options are described at: https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html

library(knitr)       # kable()
library(kableExtra)  # kable_styling()
Death_Table %>% 
  kable() %>%
  kable_styling()

	DIGOXIN	DIGOXINPCT	PLACEBO	PLACEBOPCT	ABSOLUTEDIFFRENCE	RISKRATIO	LowerCI	UpperCI	PVALUE
All	1181	34.8	1194	35.1	-0.3	0.99	0.93	1.06	0.8
Cardiovascular	1016	29.9	1004	29.5	0.4	NA	NA	NA	NA
WorseningHeartFailure	394	11.6	449	13.2	-1.6	NA	NA	NA	NA
OtherCardiac	508	15.0	444	13.0	2.0	NA	NA	NA	NA
OtherVascular	50	1.5	45	1.3	0.2	NA	NA	NA	NA
Unknown	64	1.9	66	1.9	0.0	NA	NA	NA	NA
NoncardiacAndNovascular	165	4.9	190	5.6	-0.7	NA	NA	NA	NA

Now let’s move on to Figure 2.

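The endpoint for Figure 2 is the time to death from worsening heart failure. DEATHDAY is divided by 30 to express follow-up time in approximate months. Patients who died of other causes or were alive at last contact are treated as censored, because WorseningHeartFailure is FALSE for them.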

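library(survival)   # Surv(), survfit(), coxph()
library(survminer)  # ggsurvplot()
dig_survfit = survfit(Surv(DEATHDAY/30, WorseningHeartFailure)~TRTMT, data = Dig_dat)
plot(dig_survfit)
# fun = "event" plots the cumulative event probability, 1 - S(t), instead of S(t)
ggsurvplot(dig_survfit, risk.table = TRUE, fun = "event")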
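With fun = "event", the curve shows the cumulative probability of the event, 1 - S(t), where S(t) is the Kaplan-Meier survival estimate. The sketch below computes both quantities by hand with the survival package. It uses five hypothetical observations (`toy` is illustrative, not DIG data): status TRUE means death from the cause of interest and FALSE means censored.

```r
library(survival)

# Hypothetical follow-up times (months) and event indicators (TRUE = event)
toy <- data.frame(time   = c(1, 2, 3, 4, 5),
                  status = c(TRUE, FALSE, TRUE, TRUE, FALSE))

km <- survfit(Surv(time, status) ~ 1, data = toy)

# Kaplan-Meier survival S(t) and the cumulative event probability 1 - S(t)
# S(t) drops to 4/5 at t = 1, to 8/15 at t = 3 and to 4/15 at t = 4
data.frame(time = km$time, surv = km$surv, event = 1 - km$surv)
```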

More options for ggsurvplot() are documented at https://rpkgs.datanovia.com/survminer/index.html and http://www.sthda.com/english/wiki/survminer-r-package-survival-data-analysis-and-visualization

ggsurvplot(dig_survfit, risk.table = TRUE, fun = "event", risk.table.height = 0.3, size = 0.5,
           censor = FALSE, pval = TRUE, pval.coord = c(36, 0.03), ylim = c(0, 0.18), xlim = c(0, 52))

## Warning: Removed 65 rows containing missing values (geom_path).

## Warning: Removed 65 rows containing missing values (geom_path).

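To estimate the hazard ratio between the treatment groups and its p-value, fit a Cox proportional hazards model with coxph(). With TRTMT the reference group is Digoxin; with trtgrp it is Placebo, as in the risk ratio analysis: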

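dig_coxph = coxph(Surv(DEATHDAY/30, WorseningHeartFailure)~TRTMT, data = Dig_dat)   # reference group: Digoxin
dig_coxph = coxph(Surv(DEATHDAY/30, WorseningHeartFailure)~trtgrp, data = Dig_dat)  # reference group: Placebo
summary(dig_coxph)  # exp(coef) is the hazard ratio, reported with its 95% CI and p-values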
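Switching the reference group of a binary covariate only flips the sign of the Cox coefficient. The hazard ratio from the trtgrp model should therefore be the reciprocal of the one from the TRTMT model, with the same p-value. The sketch below checks this property on the lung data set that ships with the survival package, used here purely as a stand-in for the DIG data. `lung2`, `sex_m_ref` and `sex_f_ref` are illustrative names.

```r
library(survival)

# lung$sex: 1 = male, 2 = female; build two codings with opposite reference levels
lung2 <- within(lung, {
  sex_m_ref <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))
  sex_f_ref <- factor(sex, levels = c(2, 1), labels = c("Female", "Male"))
})

fit_m <- coxph(Surv(time, status) ~ sex_m_ref, data = lung2)  # reference: Male
fit_f <- coxph(Surv(time, status) ~ sex_f_ref, data = lung2)  # reference: Female

exp(coef(fit_m))   # hazard ratio, Female vs Male
exp(coef(fit_f))   # hazard ratio, Male vs Female (the reciprocal)
summary(fit_m)$coefficients  # coef, exp(coef), se(coef), z and Pr(>|z|)
```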


Mindy Fang

2020/1/8

It is common practice to load the necessary R libraries at the beginning of the document.
