Carlisle recently published a paper examining the distribution of baseline characteristics in a large number of trials published in eight journals. In an RCT, groups are allocated at random, so p-values calculated for the between-group difference in each baseline patient characteristic should follow the Uniform(0,1) distribution (i.e., the null distribution). Using p-values reconstructed from the published baseline characteristic tables, Carlisle found an excess of very small and very large p-values compared with the reference Uniform(0,1) distribution.
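As a quick, informal illustration of this null expectation (a simulation I am adding here, not part of Carlisle's analysis): if a continuous baseline variable is repeatedly split into two groups purely at random and compared with a t-test, the resulting p-values should look approximately uniform on (0,1).
## Simulated p-values for a baseline variable under true random allocation
set.seed(1)
sim_p <- replicate(10000, {
    x <- rnorm(60)                    ## baseline variable in 60 patients
    g <- sample(rep(1:2, each = 30))  ## random 1:1 allocation
    t.test(x[g == 1], x[g == 2])$p.value
})
hist(sim_p, breaks = 20, main = "Simulated baseline p-values", xlab = "p-value")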
Carlisle kindly made the table extracts available as appendix S2, so I will take a brief look at them here. Please note this is not intended to be a rigorous re-analysis.
library(tidyverse)
## Journal names
journals <- c("Anaesthesia",
"Anesthesia and Analgesia",
"Anesthesiology",
"BJA",
"CJA",
"EJA",
"JAMA",
"NEJM")
## Load into one dataset
data1 <- map(1:8, function(i) {
    ## Read the i-th sheet (one journal per sheet) of the appendix S2 workbook
    df <- readxl::read_excel("./anae13938-sup-0002-AppendixS2.xlsx", sheet = i) %>%
        rename(n = `number in group`) %>%
        mutate(journal = journals[i]) %>%
        select(journal, everything())
    names(df) <- tolower(names(df))
    df
}) %>% bind_rows
cat("### Imported data\n")
## ### Imported data
data1
## # A tibble: 72,236 × 9
## journal trial measure group n mean sd decm decsd
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Anaesthesia 1 1 1 15 63 13 0 0
## 2 Anaesthesia 1 1 2 17 68 12 0 0
## 3 Anaesthesia 1 2 1 15 165 7 0 0
## 4 Anaesthesia 1 2 2 17 167 7 0 0
## 5 Anaesthesia 1 3 1 15 359 100 0 0
## 6 Anaesthesia 1 3 2 17 350 60 0 0
## 7 Anaesthesia 1 4 1 15 31 5 0 0
## 8 Anaesthesia 1 4 2 17 35 7 0 0
## 9 Anaesthesia 1 5 1 15 100 18 0 0
## 10 Anaesthesia 1 5 2 17 110 23 0 0
## # ... with 72,226 more rows
cat("### Summary\n")
## ### Summary
summary(data1)
## journal trial measure group n mean
## Length:72236 Min. : 1.0 Min. : 1.000 Min. : 1.000 Min. : 2.0 Min. : 0
## Class :character 1st Qu.: 159.0 1st Qu.: 2.000 1st Qu.: 1.000 1st Qu.: 20.0 1st Qu.: 17
## Mode :character Median : 322.0 Median : 4.000 Median : 2.000 Median : 35.0 Median : 51
## Mean : 381.4 Mean : 5.781 Mean : 1.851 Mean : 262.2 Mean : 463
## 3rd Qu.: 520.0 3rd Qu.: 7.000 3rd Qu.: 2.000 3rd Qu.: 106.0 3rd Qu.: 84
## Max. :1286.0 Max. :76.000 Max. :12.000 Max. :34644.0 Max. :6400000
## sd decm decsd
## Min. : -1 Min. : 0.0000 Min. :0.0000
## 1st Qu.: 4 1st Qu.: 0.0000 1st Qu.:0.0000
## Median : 9 Median : 1.0000 Median :1.0000
## Mean : 362 Mean : 0.6832 Mean :0.7092
## 3rd Qu.: 14 3rd Qu.: 1.0000 3rd Qu.:1.0000
## Max. :6400000 Max. :11.0000 Max. :4.0000
Here I calculate the absolute standardized mean difference (SMD) between the first two groups, using a sample-size-weighted average of the two groups' variances as the pooled variance.
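Written out (this simply restates what the code below computes), with group sizes \(n_1, n_2\), means \(m_1, m_2\), and standard deviations \(s_1, s_2\):

\[ \mathrm{SMD} = \frac{|m_1 - m_2|}{\sqrt{\frac{n_1}{n_1 + n_2} s_1^2 + \frac{n_2}{n_1 + n_2} s_2^2}} \]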
data_smd <- data1 %>%
    filter(group %in% c(1, 2)) %>%
    group_by(journal, trial, measure) %>%
    summarize(nrow  = n(),
              decm  = mean(c(decm[1], decm[2])),
              decsd = mean(c(decsd[1], decsd[2])),
              n_total = n[1] + n[2],
              absolute_mean_diff = abs(mean[1] - mean[2]),
              ## weighted mean of the two groups' variances
              pooled_var = (n[1]/n_total * (sd[1]^2)) + (n[2]/n_total * (sd[2]^2)),
              pooled_sd = sqrt(pooled_var),
              smd = absolute_mean_diff / pooled_sd)
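As a quick spot check against the first variable shown in the imported data above (Anaesthesia trial 1, measure 1: n = 15/17, means 63/68, SDs 13/12), the same calculation done by hand:
## Spot check of the SMD calculation for one variable
pooled_var <- 15/32 * 13^2 + 17/32 * 12^2   ## ~155.7
abs(63 - 68) / sqrt(pooled_var)             ## ~0.40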
The absolute SMD of each variable is plotted against the trial's total sample size (log10 scale).
## Overall full Y scale
ggplot(data = data_smd, mapping = aes(x = n_total, y = smd, label = trial)) +
    geom_text() +
    scale_y_continuous() +
    scale_x_log10(breaks = c(30, 100, 1000, 5000)) +
    facet_wrap(~ journal) +
    theme_bw() + theme(legend.key = element_blank())
## Warning: Removed 23 rows containing missing values (geom_text).
Some trials have large outliers. As the sample size increases, the group balance measured by the absolute SMD improves, as expected. Some trials, such as JAMA #267, JAMA #114, and NEJM #349, may be worth a closer look given their sample sizes and SMD magnitudes.
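As a rough benchmark (my own back-of-the-envelope calculation, not from the paper): under randomization the difference in group means is approximately normal with variance \(\sigma^2 (1/n_1 + 1/n_2)\), so for two equally sized groups the expected absolute SMD is about \(\sqrt{2/\pi} \cdot 2/\sqrt{n} \approx 1.6/\sqrt{n}\), i.e., roughly 0.23 at \(n = 50\) and 0.05 at \(n = 1000\). Points sitting far above that level at large \(n\) are what stand out in the plot.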
## Zoomed
data_smd %>%
    mutate(decm = round(decm)) %>%
    filter(decm <= 3) %>%
    ggplot(data = ., mapping = aes(x = n_total, y = smd, label = trial, color = factor(decm))) +
    geom_text() +
    scale_y_continuous(limits = c(0, 0.1)) +
    scale_x_log10(limits = c(min(data_smd$n_total), 100)) +
    facet_grid(factor(decm) ~ journal) +
    theme_bw() + theme(legend.key = element_blank())
## Warning: Removed 25378 rows containing missing values (geom_text).
Looking more closely at small studies (n \(\le\) 100), many variables sit at exactly SMD = 0, which may be a little suspicious. Reporting means to too few decimal places (the dataset records the number of decimals) could erroneously set the SMD to 0, but that is less likely to be the explanation in very small studies.
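A toy example of that rounding effect (hypothetical numbers, purely illustrative): with means reported to zero decimal places, a small genuine imbalance can be flattened to an SMD of exactly 0.
## Hypothetical illustration: rounding the reported means can force the SMD to 0
m1 <- 63.4; m2 <- 62.6; s <- 12
abs(m1 - m2) / s                    ## true SMD ~0.067
abs(round(m1) - round(m2)) / s      ## 0 once both means are reported as 63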
For the moderate-to-large trials mentioned above, the variables with SMD > 1 are listed below. JAMA #267 variable 6 appears quite discordant.
data1 %>%
    filter((journal == "JAMA" & trial %in% c(267, 114)) |
           (journal == "NEJM" & trial %in% c(349))) %>%
    left_join(., select(data_smd, journal, trial, measure, pooled_sd, smd)) %>%
    filter(smd > 1) %>%
    print(n = Inf)
## # A tibble: 14 × 11
## journal trial measure group n mean sd decm decsd pooled_sd smd
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 JAMA 114 6 1 230 37.2 0.09 1 2 0.1476106 6.774581
## 2 JAMA 114 6 2 220 38.2 0.19 1 2 0.1476106 6.774581
## 3 JAMA 267 6 1 85 126.4 6.10 1 1 5.5771857 9.108537
## 4 JAMA 267 6 2 85 177.2 5.00 1 1 5.5771857 9.108537
## 5 NEJM 349 1 1 2365 63.0 0.20 1 1 0.2000000 2.500000
## 6 NEJM 349 1 2 2366 62.5 0.20 1 1 0.2000000 2.500000
## 7 NEJM 349 2 1 2365 138.9 0.40 1 1 0.4000000 1.250000
## 8 NEJM 349 2 2 2366 138.4 0.40 1 1 0.4000000 1.250000
## 9 NEJM 349 3 1 2365 82.0 0.20 1 1 0.2000000 3.000000
## 10 NEJM 349 3 2 2366 81.4 0.20 1 1 0.2000000 3.000000
## 11 NEJM 349 4 1 2365 27.5 0.10 1 1 0.1000000 1.000000
## 12 NEJM 349 4 2 2366 27.4 0.10 1 1 0.1000000 1.000000
## 13 NEJM 349 5 1 2365 87.1 1.00 1 1 1.0000000 2.800000
## 14 NEJM 349 5 2 2366 84.3 1.00 1 1 1.0000000 2.800000
According to appendix S1: “P. blocks of 8. Perhaps authors’ error in Var6, either wrong mean or authors’ SD was SEM (perhaps mean should have been 176.4, not 126.4 as other mean 177.2)”.
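A quick sanity check of those two possibilities, using the values printed above (my own arithmetic, not from the appendix): if the group 1 mean were 176.4 rather than 126.4 the SMD becomes unremarkable, whereas treating the reported SDs as SEMs still leaves an SMD of about 1.
## If the group 1 mean were 176.4 instead of 126.4
abs(176.4 - 177.2) / 5.5771857                          ## ~0.14
## If the reported SDs (6.10 and 5.00) were actually SEMs with n = 85 per group
sd1 <- 6.10 * sqrt(85); sd2 <- 5.00 * sqrt(85)
abs(126.4 - 177.2) / sqrt(0.5 * sd1^2 + 0.5 * sd2^2)    ## ~0.99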