Appendix A

Recalculation of the Ashoush et al. trial

The paper by Ashoush et al. was published in 2017 (1). Here, we systematically go over the tables and figures, and recalculate some of the published data.

Figure 1

This is the CONSORT flow chart in the paper.

n_alloc <- c(prog=106, plac=106)
n_ana  <- c(prog=96,  plac=91)

Table 1

This is table 1 in the paper

Below, we recalculate the p-values and recalculate the proportions stated in the paper:

#To store results in
p_tbl1 <- data.frame(characteristic = c("Maternal age","Parity","Previous preterm delivery","Previous PPROM","Gestational age at previous deliveries","Mean birthweight at previous deliveries"),
                     pval_paper  = c(0.65, 0.07, 0.69, 0.36, 0.72, 0.11),
                     pval_recal  = NA,
                     cum_prop_tx = NA,
                     cum_prop_c  = NA)

#Total N per group ("analysed" group, from figure 1)
n <- n_ana
  
#Maternal age
mat_age_mu <- c(29.3, 29.5)
mat_age_sd <- c(4.5, 3.5)

#t.test
ttest_matage <- tsum.test(mean.x = mat_age_mu[1], s.x = mat_age_sd[1], n.x = n[1],
                          mean.y = mat_age_mu[2], s.y = mat_age_sd[2], n.y = n[2])
p_tbl1[p_tbl1$characteristic=="Maternal age","pval_recal"] <- ttest_matage$p.value
  
#Parity
parity_n   <- c(37,40,23,6,
                37,49,20,0)
tbl_parity <- matrix(parity_n, ncol = 2, byrow = FALSE)

chisq_parity <- fisher.test(tbl_parity) # because expected frequency in one cell <5
p_tbl1[p_tbl1$characteristic=="Parity","pval_recal"]  <- chisq_parity$p.value
p_tbl1[p_tbl1$characteristic=="Parity","cum_prop_tx"] <- sum(tbl_parity[,1]/n[1])
p_tbl1[p_tbl1$characteristic=="Parity","cum_prop_c"]  <- sum(tbl_parity[,2]/n[1])

#Previous preterm delivery (ppd)
ppd_n   <- c(46,51,9,
             41, 53,12)
tbl_ppd <- matrix(ppd_n, ncol = 2, byrow = FALSE)

chisq_ppd <- chisq.test(tbl_ppd) 
p_tbl1[p_tbl1$characteristic=="Previous preterm delivery","pval_recal"]  <- chisq_ppd$p.value
p_tbl1[p_tbl1$characteristic=="Previous preterm delivery","cum_prop_tx"] <- sum(tbl_ppd[,1]/n[1])
p_tbl1[p_tbl1$characteristic=="Previous preterm delivery","cum_prop_c"]  <- sum(tbl_ppd[,2]/n[1])

#Previous PPROM (pprom)
pprom_n <- c(27, n[1]-27,
             33, n[2]-33)
tbl_pprom <- matrix(pprom_n, ncol=2, byrow = FALSE)

chisq_pprom <- chisq.test(tbl_pprom) 
p_tbl1[p_tbl1$characteristic=="Previous PPROM","pval_recal"]  <- chisq_pprom$p.value
p_tbl1[p_tbl1$characteristic=="Previous PPROM","cum_prop_tx"] <- NA
p_tbl1[p_tbl1$characteristic=="Previous PPROM","cum_prop_c"]  <- NA

#Gestational age at previous deliveries
g_age_p_mu <- c(31.9, 32.1)
g_age_p_sd <- c(2.1, 2)

#t.test
ttest_gage <- tsum.test(mean.x = g_age_p_mu[1], s.x = g_age_p_sd[1], n.x = n[1], 
                        mean.y = g_age_p_mu[2], s.y = g_age_p_sd[2], n.y = n[2])
p_tbl1[p_tbl1$characteristic=="Gestational age at previous deliveries","pval_recal"] <- ttest_gage$p.value

#Mean birthweight at previous deliveries
bw_p_mu <- c(1493, 1607)
bw_p_sd <- c(48, 54)

#t.test
ttest_bwp <- tsum.test(mean.x = bw_p_mu[1], s.x = bw_p_sd[1], n.x = n[1], 
                       mean.y = bw_p_mu[2], s.y = bw_p_sd[2], n.y = n[2])
p_tbl1[p_tbl1$characteristic=="Mean birthweight at previous deliveries","pval_recal"] <- ttest_bwp$p.value

#Show table
p_tbl1$pval_recal <- round(p_tbl1$pval_recal, digits = 2)
kable(p_tbl1)

characteristic	pval_paper	pval_recal	cum_prop_tx	cum_prop_c
Maternal age	0.65	0.73	NA	NA
Parity	0.07	0.06	1.104167	1.104167
Previous preterm delivery	0.69	0.69	1.104167	1.104167
Previous PPROM	0.36	0.30	NA	NA
Gestational age at previous deliveries	0.72	0.51	NA	NA
Mean birthweight at previous deliveries	0.11	0.00	NA	NA

This table is not correct. But we can recalculate the table with n = 106, the group before patients were lost to follow-up.

characteristic	pval_paper	pval_recal	cum_prop_tx	cum_prop_c
Maternal age	0.65	0.72	NA	NA
Parity	0.07	0.06	1	1
Previous preterm delivery	0.69	0.69	1	1
Previous PPROM	0.36	0.45	NA	NA
Gestational age at previous deliveries	0.72	0.48	NA	NA
Mean birthweight at previous deliveries	0.11	0.00	NA	NA

Now, the proportions are correct. However, the p-values remain incorrect. Especially the last row of the table remain striking: especially for the mean birthweight at previous delivery was significantly lower in the progesterone group (-114 (95% CI: -128 - -100)). Of course, this is a table 1, and inferences does not really apply. However, the standard deviations of these important confounding factors are so low, and the difference is definitely relevant, to raise some concern.

Table 2

This is table 2 in the paper

Below, we again recalculate the p-values and recalculate the proportions stated in the paper:

#To store results in
p_tbl2 <- data.frame(characteristic = c("Progesterone levels at 20 weeks",
                                        "Progesterone levels at 28 weeks",
                                        "Cervical length",
                                        "Total cervical cerclage",
                                        "Elective cervical cerclage",
                                        "Rescue cerclage at 20 weeks",
                                        "Rescue cerclage beyond 20 weeks"),
                     pval_paper  = c("<0.001", "<0.001", 0.17, 0.25, 0.46, 0.95, 0.37),
                     pval_recal  = NA)

#Total N per group ("analysed" group, from figure 1)
n <- n_ana
  
#Progesterone levels at 20 weeks (plev20)
plev20_mu <- c(30.7, 15.7)
plev20_sd <- c(3.4, 1.4)

#t.test
ttest_plev20 <- tsum.test(mean.x = plev20_mu[1], s.x = plev20_sd[1], n.x = n[1],
                          mean.y = plev20_mu[2], s.y = plev20_sd[2], n.y = n[2])
p_tbl2[p_tbl2$characteristic=="Progesterone levels at 20 weeks","pval_recal"] <- ttest_plev20$p.value
  
#Progesterone levels at 28 weeks (plev28)
plev28_mu <- c(34.5, 17.6)
plev28_sd <- c(3.8, 3.3)

#t.test
ttest_plev28 <- tsum.test(mean.x = plev28_mu[1], s.x = plev28_sd[1], n.x = n[1],
                          mean.y = plev28_mu[2], s.y = plev28_sd[2], n.y = n[2])
p_tbl2[p_tbl2$characteristic=="Progesterone levels at 28 weeks","pval_recal"] <- ttest_plev28$p.value
  
#Cervical length (cerl)
cerl_mu <- c(25.7, 23.9)
cerl_sd <- c(8.3, 9.7)

#t.test
ttest_cerl <- tsum.test(mean.x = cerl_mu[1], s.x = cerl_sd[1], n.x = n[1],
                        mean.y = cerl_mu[2], s.y = cerl_sd[2], n.y = n[2])
p_tbl2[p_tbl2$characteristic=="Cervical length","pval_recal"] <- ttest_cerl$p.value

#Total cervical cerclage (tcer)
tcer_n <- c(70, n[1]-70,
            73, n[2]-73)
tbl_tcer <- matrix(tcer_n, ncol=2, byrow = FALSE)

chisq_tcer <- chisq.test(tbl_tcer) 
p_tbl2[p_tbl2$characteristic=="Total cervical cerclage","pval_recal"]  <- chisq_tcer$p.value

#Elective cervical cerclage (ecer)
ecer_n <- c(55, n[1]-55,
            57, n[2]-57)
tbl_ecer <- matrix(ecer_n, ncol=2, byrow = FALSE)

chisq_ecer <- chisq.test(tbl_ecer) 
p_tbl2[p_tbl2$characteristic=="Elective cervical cerclage","pval_recal"]  <- chisq_ecer$p.value

#Rescue cerclage at 20 weeks (rcera20)
rcera20_n <- c(13, n[1]-13,
            12, n[2]-12)
tbl_rcera20 <- matrix(rcera20_n, ncol=2, byrow = FALSE)

chisq_rcera20 <- chisq.test(tbl_rcera20) 
p_tbl2[p_tbl2$characteristic=="Rescue cerclage at 20 weeks","pval_recal"]  <- chisq_rcera20$p.value

#Rescue cerclage beyond 20 weeks
rcerb20_n <- c(2, n[1]-4,
               2, n[2]-4)
tbl_rcerb20 <- matrix(rcerb20_n, ncol=2, byrow = FALSE)

chisq_rcerb20 <- fisher.test(tbl_rcerb20) # because expected frequency <5
p_tbl2[p_tbl2$characteristic=="Rescue cerclage beyond 20 weeks","pval_recal"]  <- chisq_rcerb20$p.value

#Show table
p_tbl2$pval_recal <- round(p_tbl2$pval_recal, digits = 2)
kable(p_tbl2)

characteristic	pval_paper	pval_recal
Progesterone levels at 20 weeks	<0.001	0.00
Progesterone levels at 28 weeks	<0.001	0.00
Cervical length	0.17	0.18
Total cervical cerclage	0.25	0.32
Elective cervical cerclage	0.46	0.55
Rescue cerclage at 20 weeks	0.95	1.00
Rescue cerclage beyond 20 weeks	0.37	1.00

The p-values are similar, but not quite right. Let’s calculate the table again with n=106 (before lost to follow-up).

characteristic	pval_paper	pval_recal
Progesterone levels at 20 weeks	<0.001	0.00
Progesterone levels at 28 weeks	<0.001	0.00
Cervical length	0.17	0.15
Total cervical cerclage	0.25	0.77
Elective cervical cerclage	0.46	0.89
Rescue cerclage at 20 weeks	0.95	1.00
Rescue cerclage beyond 20 weeks	0.37	1.00

This does not change much about the discrepancies.

Table 3

This is table 3 in the paper

Below, we again recalculate the p-values stated in the paper:

#To store results in
p_tbl3 <- data.frame(characteristic = c("GA at delivery",
                                        "Mid-trimester miscarriages",
                                        "Admission for tocolysis",
                                        "Tocolysis-to-delivery interval",
                                        "PPROM",
                                        "Preterm delivery",
                                        "Cesarean delivery",
                                        "Instrumental delivery",
                                        "Chorioamnionitis",
                                        "Postpartum hemorrhage",
                                        "Postpartum sepsis"),
                     pval_paper  = c(0.01, 0.46, 0.03, "<0.001", 0.27, 0.01, 0.05, 0.93, 0.55, 0.4,0.13),
                     pval_recal  = NA)

#Total N per group ("analysed" group, from figure 1)
n <- n_ana
  
#GA at delivery (GAd)
GAd_mu <- c(35.4, 33.9)
GAd_sd <- c(2.7, 2.9)

#t.test
ttest_GAd <- tsum.test(mean.x = GAd_mu[1], s.x = GAd_sd[1], n.x = n[1],
                       mean.y = GAd_mu[2], s.y = GAd_sd[2], n.y = n[2])
p_tbl3[p_tbl3$characteristic=="GA at delivery","pval_recal"] <- ttest_GAd$p.value

#Mid-trimester miscarriages (mtm)
mtm_n <- c(7, n[1]-7,
           11, n[2]-11)
tbl_mtm <- matrix(mtm_n, ncol=2, byrow = FALSE)

chisq_mtm <- chisq.test(tbl_mtm) 
p_tbl3[p_tbl3$characteristic=="Mid-trimester miscarriages","pval_recal"]  <- chisq_mtm$p.value

#Admission for tocolysis (ato)
ato_n <- c(12, n[1]-12,
           23, n[2]-23)
tbl_ato <- matrix(ato_n, ncol=2, byrow = FALSE)

chisq_ato <- chisq.test(tbl_ato) 
p_tbl3[p_tbl3$characteristic=="Admission for tocolysis","pval_recal"]  <- chisq_ato$p.value

#Tocolysis-to-delivery interval (ttdi)
ttdi_mu <- c(87, 36)
ttdi_sd <- c(45.5, 14.2)

#t.test
ttest_ttdi <- tsum.test(mean.x = ttdi_mu[1], s.x = ttdi_sd[1], n.x = n[1],
                        mean.y = ttdi_mu[2], s.y = ttdi_sd[2], n.y = n[2])
p_tbl3[p_tbl3$characteristic=="Tocolysis-to-delivery interval","pval_recal"] <- ttest_ttdi$p.value

#PPROM
PPROM_n <- c(36, n[1]-36,
             40, n[2]-40)
tbl_PPROM <- matrix(PPROM_n, ncol=2, byrow = FALSE)

chisq_PPROM <- chisq.test(tbl_PPROM) 
p_tbl3[p_tbl3$characteristic=="PPROM","pval_recal"]  <- chisq_PPROM$p.value

#Preterm delivery (ptd)
ptd_n <- c(43, n[1]-43,
           58, n[2]-58)
tbl_ptd <- matrix(ptd_n, ncol=2, byrow = FALSE)

chisq_ptd <- chisq.test(tbl_ptd) 
p_tbl3[p_tbl3$characteristic=="Preterm delivery","pval_recal"]  <- chisq_ptd$p.value

#Cesarean delivery (csd)
csd_n <- c(69, n[1]-69,
           77, n[2]-77)
tbl_csd <- matrix(csd_n, ncol=2, byrow = FALSE)

chisq_csd <- chisq.test(tbl_csd) 
p_tbl3[p_tbl3$characteristic=="Cesarean delivery","pval_recal"]  <- chisq_csd$p.value

#Instrumental delivery (ind)
ind_n <- c(8, n[1]-8,
           7, n[2]-7)
tbl_ind <- matrix(ind_n, ncol=2, byrow = FALSE)

chisq_ind <- chisq.test(tbl_ind) 
p_tbl3[p_tbl3$characteristic=="Instrumental delivery","pval_recal"]  <- chisq_ind$p.value

#Chorioamnionitis (choam)
choam_n <- c(9,  n[1]-9,
             12, n[2]-12)
tbl_choam <- matrix(choam_n, ncol=2, byrow = FALSE)

chisq_choam <- chisq.test(tbl_choam) 
p_tbl3[p_tbl3$characteristic=="Chorioamnionitis","pval_recal"]  <- chisq_choam$p.value

#Postpartum hemorrhage (pph)
pph_n <- c(7, n[1]-7,
           11, n[2]-11)
tbl_pph <- matrix(pph_n, ncol=2, byrow = FALSE)

chisq_pph <- chisq.test(tbl_pph) 
p_tbl3[p_tbl3$characteristic=="Postpartum hemorrhage","pval_recal"]  <- chisq_pph$p.value

#Postpartum sepsis (pps)
pps_n <- c(4, n[1]-4,
           10, n[2]-10)
tbl_pps <- matrix(pps_n, ncol=2, byrow = FALSE)

chisq_pps <- chisq.test(tbl_pps) 
p_tbl3[p_tbl3$characteristic=="Postpartum sepsis","pval_recal"]  <- chisq_pps$p.value

#Show table
p_tbl3$pval_recal <- round(p_tbl3$pval_recal, digits = 2)
kable(p_tbl3)

characteristic	pval_paper	pval_recal
GA at delivery	0.01	0.00
Mid-trimester miscarriages	0.46	0.39
Admission for tocolysis	0.03	0.04
Tocolysis-to-delivery interval	<0.001	0.00
PPROM	0.27	0.45
Preterm delivery	0.01	0.01
Cesarean delivery	0.05	0.05
Instrumental delivery	0.93	1.00
Chorioamnionitis	0.55	0.55
Postpartum hemorrhage	0.4	0.39
Postpartum sepsis	0.13	0.14

All recalculated p-values are in similar range of what was reported in the paper, and no relevant difference (crossing the boundary of statistical significance) was observed.

Table 4

This is table 4 in the paper.

Below, we again recalculate the p-values stated in the paper.

#To store results in
p_tbl4 <- data.frame(characteristic = c("Birthweight",
                                        "LBW",
                                        "NICU",
                                        "LOS in NICU",
                                        "NMR",
                                        "RDS",
                                        "ICH",
                                        "NEC"),
                     pval_paper  = c(0.03, 0.003, "<0.001", 0.009, "<0.001",0.004,0.55,0.36),
                     pval_recal  = NA)

#Total N per group ("analysed" group, from figure 1)
n <- n_ana
  
# Birthweight (bw)
bw_mu <- c(2312, 1878)
bw_sd <- c(77, 74)

#t.test
ttest_bw <- tsum.test(mean.x = bw_mu[1], s.x = bw_sd[1], n.x = n[1],
                      mean.y = bw_mu[2], s.y = bw_sd[2], n.y = n[2])
p_tbl4[p_tbl4$characteristic=="Birthweight","pval_recal"] <- ttest_bw$p.value

# LBW
LBW_n <- c(29, n[1]-29,
           48, n[2]-48)
tbl_LBW <- matrix(LBW_n, ncol=2, byrow = FALSE)

chisq_LBW <- chisq.test(tbl_LBW) 
p_tbl4[p_tbl4$characteristic=="LBW","pval_recal"]  <- chisq_LBW$p.value

# NICU
NICU_n <- c(22, n[1]-22,
            42, n[2]-42)
tbl_NICU <- matrix(NICU_n, ncol=2, byrow = FALSE)

chisq_NICU <- chisq.test(tbl_NICU) 
p_tbl4[p_tbl4$characteristic=="NICU","pval_recal"]  <- chisq_NICU$p.value

# LOS in NICU (LOSNICU)
LOSNICU_mu <- c(15.4, 19.5)
LOSNICU_sd <- c(5.5, 5.8)

#t.test
ttest_LOSNICU <- tsum.test(mean.x = LOSNICU_mu[1], s.x = LOSNICU_sd[1], n.x = tbl_NICU[1,1], #Only patients who went to the NICU
                           mean.y = LOSNICU_mu[2], s.y = LOSNICU_sd[2], n.y = tbl_NICU[2,1])
p_tbl4[p_tbl4$characteristic=="LOS in NICU","pval_recal"] <- ttest_LOSNICU$p.value

# NMR
NMR_n <- c(7, n[1]-7,
           23, n[2]-23)
tbl_NMR <- matrix(NMR_n, ncol=2, byrow = FALSE)

chisq_NMR <- chisq.test(tbl_NMR) 
p_tbl4[p_tbl4$characteristic=="NMR","pval_recal"]  <- chisq_NMR$p.value

# RDS
RDS_n <- c(21, n[1]-21,
           39, n[2]-39)
tbl_RDS <- matrix(RDS_n, ncol=2, byrow = FALSE)

chisq_RDS <- chisq.test(tbl_RDS) 
p_tbl4[p_tbl4$characteristic=="RDS","pval_recal"]  <- chisq_RDS$p.value

# ICH
ICH_n <- c(8, n[1]-8,
           11, n[2]-11)
tbl_ICH <- matrix(ICH_n, ncol=2, byrow = FALSE)

chisq_ICH <- chisq.test(tbl_ICH) 
p_tbl4[p_tbl4$characteristic=="ICH","pval_recal"]  <- chisq_ICH$p.value

# NEC
NEC_n <- c(5, n[1]-5,
           9, n[2]-9)
tbl_NEC <- matrix(NEC_n, ncol=2, byrow = FALSE)

chisq_NEC <- chisq.test(tbl_NEC) 
p_tbl4[p_tbl4$characteristic=="NEC","pval_recal"]  <- chisq_NEC$p.value

#Show table
p_tbl4$pval_recal <- round(p_tbl4$pval_recal, digits = 3)
kable(p_tbl4)

characteristic	pval_paper	pval_recal
Birthweight	0.03	0.000
LBW	0.003	0.003
NICU	<0.001	0.001
LOS in NICU	0.009	0.004
NMR	<0.001	0.002
RDS	0.004	0.004
ICH	0.55	0.544
NEC	0.36	0.348

Overall, the p-values are similar to what was stated in the paper. However, it is striking how big of an effect is presented in the table here. Below, the risk differences and 95% confidence interval are shown for the most relevant outcomes.

#Calculate risk differences
rd_nmr  <- riskdifference(a = NMR_n[1],N1 = n[1], 
                          b = NMR_n[3],N0 = n[2])

##                  Cases People at risk         Risk
## Exposed     7.00000000    96.00000000   0.07291667
## Unexposed  23.00000000    91.00000000   0.25274725
## Total      30.00000000   187.00000000   0.16042781

rd_NICU <- riskdifference(a = NICU_n[1],N1 = n[1], 
                          b = NICU_n[3],N0 = n[2])

##                 Cases People at risk        Risk
## Exposed    22.0000000     96.0000000   0.2291667
## Unexposed  42.0000000     91.0000000   0.4615385
## Total      64.0000000    187.0000000   0.3422460

df <- data.frame('ARR in %, 95% CI'= c('Neonatal mortality rate' = 
                              paste(round(rd_nmr$estimate*100,digits = 1),
                                    " (",
                                    round(rd_nmr$conf.int[1]*100, digits = 1),
                                    " - ",
                                    round(rd_nmr$conf.int[2]*100, digits = 1),
                                    ")",
                                    sep = ""),
                              'NICU admission rate' = 
                              paste(round(rd_NICU$estimate*100,digits = 1),
                                    " (",
                                    round(rd_NICU$conf.int[1]*100, digits = 1),
                                    " - ",
                                    round(rd_NICU$conf.int[2]*100, digits = 1),
                                    ")",
                                    sep = "")
                              ))

kable(df)

	ARR.in….95..CI
Neonatal mortality rate	-18 (-28.3 - -7.6)
NICU admission rate	-23.2 (-36.5 - -10)

These risk differences, between 18 and 23%, are huge benefits. This would imply that progesterone would save a child every 6 treatments, or would prevent a NICU admission in 1 every 4 treatments. Those are big numbers, and warrent critical appraisal.

Table 5

The final table is table 5:

See below the recalculation of the p-values.

#To store results in
p_tbl5 <- data.frame(characteristic = c("Dizziness",
                                        "Constipation",
                                        "Somnolence",
                                        "Vaginal dryness"),
                     pval_paper  = c(0.002, 0.25, 0.002, 0.03),
                     pval_recal  = NA)

#Total N per group ("analysed" group, from figure 1)
n <- n_ana


#Dizziness (dizz)
dizz_n <- c(28, n[1]-28,
            9, n[2]-9)
tbl_dizz <- matrix(dizz_n, ncol=2, byrow = FALSE)

chisq_dizz <- chisq.test(tbl_dizz) 
p_tbl5[p_tbl5$characteristic=="Dizziness","pval_recal"]  <- chisq_dizz$p.value

#Constipation (cons)
cons_n <- c(21, n[1]-21,
            13, n[2]-13)
tbl_cons <- matrix(cons_n, ncol=2, byrow = FALSE)

chisq_cons <- chisq.test(tbl_cons) 
p_tbl5[p_tbl5$characteristic=="Constipation","pval_recal"]  <- chisq_cons$p.value

#Somnolence (somn)
somn_n <- c(40, n[1]-40,
            18, n[2]-18)
tbl_somn <- matrix(somn_n, ncol=2, byrow = FALSE)

chisq_somn <- chisq.test(tbl_somn) 
p_tbl5[p_tbl5$characteristic=="Somnolence","pval_recal"]  <- chisq_somn$p.value

#Vaginal dryness (vdry)
vdry_n <- c(20, n[1]-20,
            8, n[2]-8)
tbl_vdry <- matrix(vdry_n, ncol=2, byrow = FALSE)

chisq_vdry <- chisq.test(tbl_vdry) 
p_tbl5[p_tbl5$characteristic=="Vaginal dryness","pval_recal"]  <- chisq_vdry$p.value

#Show table
p_tbl5$pval_recal <- round(p_tbl5$pval_recal, digits = 3)
kable(p_tbl5)

characteristic	pval_paper	pval_recal
Dizziness	0.002	0.002
Constipation	0.250	0.248
Somnolence	0.002	0.002
Vaginal dryness	0.030	0.036

Most p-values again seem similar to what was reported in the paper.

Figure 2

This is figure 2 in the paper.

Below is a reproduction of the analysis with the data provided. Unfortunately, the follow-up time is not given for censored patients (patients who were drop-out, or had a miscarriage). Therefore, in the reproduction of the analysis, I will assume that these events took place at 20 weeks (halfway through pregnancy, which is a strong assumption).

#Import data
dt <- read_xlsx("Data/Copy of Copy of Ashoush_DB_3.43.xlsx")

#Recode variables
dt$arm <- factor(dt$`Groups (A= progesterone and B=placebo)`)
levels(dt$arm) <- c("Progesterone","Placebo")

#Follow-up time
dt$futime <- as.numeric(dt$`G.A at delivery (wks)`)

## Warning: NAs introduced by coercion

dt$futime[dt$`G.A at delivery (wks)`%in%c("miscarriage", "dropping out")] <- 20

dt$birth <- factor(as.numeric(dt$`G.A at delivery (wks)`%in%c("miscarriage", "dropping out"))) #Anyone who did not drop out or have had a miscarriage, has delivered
levels(dt$birth) <- c("Yes","No")

#Do Kaplan Meijer analysis
fit_km   <- survfit(formula = Surv(time=futime, event=birth=="Yes")~arm, data=dt)
ggsurvplot(fit_km, data = dt, pval = TRUE,pval.coord=c(21,0.25),  xlim=c(20,40))

We can repeat the analysis, however, by assuming that censoring took place by a uniform distribution, between 20 and 35 weeks gestational age. See below how such a random distribution might look like, and the Kaplan Meijer analysis with this assumption.

#Set seed, for reproducibility
set.seed(123)

#Follow-up time
dt$futime <- as.numeric(dt$`G.A at delivery (wks)`)

## Warning: NAs introduced by coercion

dt$futime[dt$`G.A at delivery (wks)`%in%c("miscarriage", "dropping out")] <- runif(n = dt$`G.A at delivery (wks)`%in%c("miscarriage", "dropping out"),
                                                                                   min = 20, 
                                                                                   max=35)

## Warning in dt$futime[dt$`G.A at delivery (wks)` %in% c("miscarriage", "dropping
## out")] <- runif(n = dt$`G.A at delivery (wks)` %in% : number of items to replace
## is not a multiple of replacement length

dt$birth <- factor(as.numeric(dt$`G.A at delivery (wks)`%in%c("miscarriage", "dropping out"))) #Anyone who did not drop out or have had a miscarriage, has delivered
levels(dt$birth) <- c("Yes","No")

# Show assumed distribution of censoring
ggplot(dt[dt$birth=="No",], aes(x=futime))+geom_histogram()+theme_bw()+labs(x="Assumed gestational age when censored (weeks)")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

#Do Kaplan Meijer analysis
fit_km   <- survfit(formula = Surv(time=futime, event=birth=="Yes")~arm, data=dt)
ggsurvplot(fit_km, data = dt, pval = TRUE,pval.coord=c(21,0.25), xlim=c(20,40))

Both show a significant difference between the groups (Log-rank test), similar to the analysis of the authors who did not take into account censoring. However, we did not take into account whether censoring was associated with the outcome.

Summary of observation

When recalculating the results of the trial by Ashoush et al. (1), there are some calculations that are odd, some are wrong but matter little, and some results seem too good to be true.

The odd calculations are mostly in the baseline characteristics table, where the p-values differ stated in the paper differ too much from what I can calculate based on the data. Most intriguingly, the mean birthweight at previous deliveries was lower in the progesterone group, while also their cervical length was longer, giving them an advantage over the control group. Both would result in an overestimated treatment effect.

The wrong calculation that matters little, is in the Kaplan Meijer analysis. They do not adjust for censoring, but assuming some random distribution of censoring doesn’t change the significance in the difference in trajectory. This does assume that censoring was independent of the outcome. Because the primary reason for censoring was miscarriage, this does not necessarily need to hold. Moreover, it is strange that the the weeks until miscarriage was not registered in this study.

##                  Cases People at risk         Risk
## Exposed    653.0000000   4963.0000000    0.1315736
## Unexposed  762.0000000   4870.0000000    0.1564682
## Total     1415.0000000   9833.0000000    0.1439032

The results that seem too good to be true, are the absolute risk reductions in neonatal outcome. For example, the antenatal corticosteroids (a widely implemented intervention to reduce neonatal mortality) is expected to reduce neonatal mortality by -2.5 (-3.9 - -1.1) (2). This found risk difference for progesterone in this trial is 7 times larger, which is quite astounding.