Background

Carlisle recently published a paper in which he examined the distribution of baseline characteristics in many published paper in 8 journals. In an RCT, groups are randomly allocated, thus, p-values calculated for the group difference in each baseline patient characteristic should follow the Uniform(0,1) distribution (i.e., the null distribution). Using p-values reconstructed from published baseline patient characteristic tables, Carlisle found that there was an excess of p-values that were very small or very large compared to the reference Uniform(0,1) distribution.

Carlisle kindly made the table extracts available as appendix S2. I’ll take a brief look here. Please note this is not intended to be a rigorous re-analysis.

References

Load packages

library(tidyverse)

Load data

## Journal names
journals <- c("Anaesthesia",
              "Anesthesia and Analgesia",
              "Anesthesiology",
              "BJA",
              "CJA",
              "EJA",
              "JAMA",
              "NEJM")

## Load into one dataset
data1 <- map(c(1:8), function(i){

    df <- readxl::read_excel("./anae13938-sup-0002-AppendixS2.xlsx", sheet = i) %>%
        rename(n = `number in group`) %>%
        mutate(journal = journals[i]) %>%
        select(journal,
               everything())
    names(df) <- tolower(names(df))

    df

}) %>% bind_rows


cat("###  Imported data\n")
## ###  Imported data
data1
## # A tibble: 72,236 × 9
##        journal trial measure group     n  mean    sd  decm decsd
##          <chr> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1  Anaesthesia     1       1     1    15    63    13     0     0
## 2  Anaesthesia     1       1     2    17    68    12     0     0
## 3  Anaesthesia     1       2     1    15   165     7     0     0
## 4  Anaesthesia     1       2     2    17   167     7     0     0
## 5  Anaesthesia     1       3     1    15   359   100     0     0
## 6  Anaesthesia     1       3     2    17   350    60     0     0
## 7  Anaesthesia     1       4     1    15    31     5     0     0
## 8  Anaesthesia     1       4     2    17    35     7     0     0
## 9  Anaesthesia     1       5     1    15   100    18     0     0
## 10 Anaesthesia     1       5     2    17   110    23     0     0
## # ... with 72,226 more rows
cat("###  Summary\n")
## ###  Summary
summary(data1)
##    journal              trial           measure           group              n                mean        
##  Length:72236       Min.   :   1.0   Min.   : 1.000   Min.   : 1.000   Min.   :    2.0   Min.   :      0  
##  Class :character   1st Qu.: 159.0   1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.:   20.0   1st Qu.:     17  
##  Mode  :character   Median : 322.0   Median : 4.000   Median : 2.000   Median :   35.0   Median :     51  
##                     Mean   : 381.4   Mean   : 5.781   Mean   : 1.851   Mean   :  262.2   Mean   :    463  
##                     3rd Qu.: 520.0   3rd Qu.: 7.000   3rd Qu.: 2.000   3rd Qu.:  106.0   3rd Qu.:     84  
##                     Max.   :1286.0   Max.   :76.000   Max.   :12.000   Max.   :34644.0   Max.   :6400000  
##        sd               decm             decsd       
##  Min.   :     -1   Min.   : 0.0000   Min.   :0.0000  
##  1st Qu.:      4   1st Qu.: 0.0000   1st Qu.:0.0000  
##  Median :      9   Median : 1.0000   Median :1.0000  
##  Mean   :    362   Mean   : 0.6832   Mean   :0.7092  
##  3rd Qu.:     14   3rd Qu.: 1.0000   3rd Qu.:1.0000  
##  Max.   :6400000   Max.   :11.0000   Max.   :4.0000

Transform data

Here I calculated the absolute standardized mean distance using the weighted average of variance between the first two groups.

##
data_smd <- data1 %>%
    filter(group %in% c(1,2)) %>%
    group_by(journal, trial, measure) %>%
    summarize(nrow = n(),
              decm = mean(c(decm[1], decm[2])),
              decsd = mean(c(decsd[1], decsd[2])),
              n_total = n[1] + n[2],
              absolute_mean_diff = abs(mean[1] - mean[2]),
              ## weighted mean of variance
              pooled_var = (n[1]/n_total * (sd[1]^2)) + (n[2]/n_total * (sd[2]^2)),
              pooled_sd = sqrt(pooled_var),
              smd = absolute_mean_diff / pooled_sd)

Plot data

The absolute SMD of each variable is plotted against the trial’s sample size (log10 scale).

## Overall full Y scale
ggplot(data = data_smd, mapping = aes(x = n_total, y = smd, label = trial)) +
    geom_text() +
    scale_y_continuous() +
    scale_x_log10(breaks = c(30,100,1000,5000)) +
    facet_wrap(~ journal) +
    theme_bw() + theme(legend.key = element_blank())
## Warning: Removed 23 rows containing missing values (geom_text).

Some trials have large outliers. As the sample size increases, the group balance measured in absolute SMD improves as expected. Some trials such as JAMA #267, JAMA #114, and NEJM #349 may be worth closer look considering their sample sizes and SMD magnitude.

## Zoomed
data_smd %>%
    mutate(decm = round(decm)) %>%
    filter(decm <= 3) %>%
    ##
    ggplot(data = ., mapping = aes(x = n_total, y = smd, label = trial, color = factor(decm))) +
    geom_text() +
    scale_y_continuous(limits = c(0,0.1)) +
    scale_x_log10(limits = c(min(data_smd$n_total), 100)) +
    facet_grid(factor(decm) ~ journal) +
    theme_bw() + theme(legend.key = element_blank())
## Warning: Removed 25378 rows containing missing values (geom_text).

At a closer look of small studies (n \(\le\) 100), many studies are at SMD = 0, which may be a little suspicious. Insufficient reporting of decimal points (the dataset contains this information) may be erroneously setting SMD to 0, but it is less likely to be the case in very small studies.

In the above mentioned moderate to large trials, the variables that had SMD > 1 are the following. JAMA #267 variable 6 appears quite discordant.

data1 %>%
    filter((journal == "JAMA" & trial %in% c(267,114)) |
           (journal == "NEJM" & trial %in% c(349))) %>%
    left_join(., select(data_smd, journal, trial, measure, pooled_sd, smd)) %>%
    filter(smd > 1) %>%
    print(n = Inf)
## # A tibble: 14 × 11
##    journal trial measure group     n  mean    sd  decm decsd pooled_sd      smd
##      <chr> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
## 1     JAMA   114       6     1   230  37.2  0.09     1     2 0.1476106 6.774581
## 2     JAMA   114       6     2   220  38.2  0.19     1     2 0.1476106 6.774581
## 3     JAMA   267       6     1    85 126.4  6.10     1     1 5.5771857 9.108537
## 4     JAMA   267       6     2    85 177.2  5.00     1     1 5.5771857 9.108537
## 5     NEJM   349       1     1  2365  63.0  0.20     1     1 0.2000000 2.500000
## 6     NEJM   349       1     2  2366  62.5  0.20     1     1 0.2000000 2.500000
## 7     NEJM   349       2     1  2365 138.9  0.40     1     1 0.4000000 1.250000
## 8     NEJM   349       2     2  2366 138.4  0.40     1     1 0.4000000 1.250000
## 9     NEJM   349       3     1  2365  82.0  0.20     1     1 0.2000000 3.000000
## 10    NEJM   349       3     2  2366  81.4  0.20     1     1 0.2000000 3.000000
## 11    NEJM   349       4     1  2365  27.5  0.10     1     1 0.1000000 1.000000
## 12    NEJM   349       4     2  2366  27.4  0.10     1     1 0.1000000 1.000000
## 13    NEJM   349       5     1  2365  87.1  1.00     1     1 1.0000000 2.800000
## 14    NEJM   349       5     2  2366  84.3  1.00     1     1 1.0000000 2.800000

According to appendix S1 “P. blocks of 8. Perhaps authors’ error in Var6, either wrong mean or authors’ SD was SEM (perhaps mean should have been 176.4, not 126.4 as other mean 177.2)”.