library(pacman); p_load(metafor, ggplot2, dplyr, tidyverse, scales, rockchalk)
# CRITR: the smallest correlation that reaches two-tailed significance at a given sample size
CRITR <- function(n, alpha = .05) {
df <- n - 2; CRITT <- qt(alpha/2, df, lower.tail = F) # critical t value for the two-tailed test
CRITR <- sqrt((CRITT^2)/((CRITT^2) + df)) # convert the critical t into a critical r
return(CRITR)}
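For a sense of scale, here is that threshold at a few sample sizes (the values in the comment are approximate):
sapply(c(50, 100, 1000), CRITR) # roughly .28, .20, and .06, respectively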
my_theme <- function(){
theme(
legend.background = element_rect(colour = "black", linewidth = 0.5),
legend.key.width = unit(0.5, "cm"),
legend.key.height = unit(0.4, "cm"),
panel.background = element_rect(fill = "white"),
plot.background = element_rect(fill = "white"),
panel.grid.major.x = element_blank(),
panel.grid.minor.x = element_blank(),
panel.grid.major.y = element_line(colour ="gray80",linewidth = 1,linetype="dashed"),
panel.grid.minor.y = element_blank(),
axis.text.x = element_text(colour ="black",size=rel(1.5),hjust=0.5),
axis.text.y = element_text(colour ="black",size=rel(1.5),vjust=0.5),
axis.title.x = element_text(colour ="black",size=rel(1.2)),
axis.title.y = element_text(colour ="black",size=rel(1.2)),
plot.margin=unit(c(1,0.5,0.5,0.5), "cm"),
plot.title=element_text(hjust=0.5,vjust=5,size=rel(1.5),face="bold"),
plot.caption=element_text(hjust=0,size=rel(1.2)),
axis.line.x=element_line(color="gray80",linewidth = 1,linetype="dashed"),
axis.line.y=element_line(color="gray80",linewidth = 1,linetype="dashed"),
axis.ticks.x = element_blank(),
axis.ticks.y = element_blank())}
I have discussed the relationship between power and effects before (https://rpubs.com/JLLJ/MSRY).
Small-scale pilot studies are often undertaken to justify further study. The problem with these is that they increase the rates of Type I and II errors, potentially discouraging good research and encouraging bad research far more than a few larger studies would. And, critically, when they are used as the basis for the power analyses behind additional studies, they are likely to produce biased results because of how much estimated effects vary at small sample sizes.
One proposal to deal with this is to run many pilot studies. After all, replication is taken as proof that effects are real. If there were no publication bias and people presented all of their results, leaving the file drawer empty, that would tend to work. But we live in a world where people often do not publish nonsignificant or unfavorable results.
So below, I have provided plots that show the imprecision of correlations estimated at small sample sizes, and then shown how that imprecision can be accentuated by a bias towards significant results (“publication bias”). Finally, I compared how often the 95% CIs from meta-analyses of ten small studies versus single large studies contained a true “small” effect (d = .10).
This code is largely borrowed from: https://www.r-bloggers.com/2020/01/on-the-relationship-of-the-sample-size-and-the-correlation/.
In this first scenario, essentially every effect at any sample size will be significant. You can see this if you add the line
line_data <- data.frame(x = 3:5000, y = sapply(3:5000, CRITR))
before graphing, and
+ geom_line(data = line_data, aes(x = x, y = y), linewidth = 1, alpha = .6) + coord_cartesian(ylim = c(0.4, 0.8))
after my_theme(). In this case, small samples still over- and underestimate the effect size, but the consequences are relatively muted because virtually all of the effects you’ll arrive at are fairly large. If you performed a power analysis based on a pilot study in a relatively small sample with a true effect this large, you wouldn’t be too greatly misled in quantitative terms, though at small sample sizes you could still easily pick an effect size for subsequent power analyses that misleads you to a considerable degree.
Since the picture isn’t very strongly affected by the number of simulated samples, we’ll leave this at 1,000 samples.
set.seed(123)
myCov=lazyCov(Rho=0.6, Sd=15, d=2) # define covariance matrix for two variables with sd = 15 and correlation = .6
myData=data.frame(mvrnorm(n=1000000, mu=c(100,100), Sigma=myCov)) # create two variables with specified covariance matrix and mean = 100, with population = 1,000,000
pop_cor=cor(myData[,1], myData[,2]) # calculate population correlation
rez=data.frame() # result data frame
for (i in (1:1000)){ # iterate through samples
sampleData=myData[sample(nrow(myData),i*5),] # select random sample from myData with size i*5
q=cor(sampleData$X1,sampleData$X2, method = "pearson") # calculate correlation of the sample
rez[i,1]=i*5 # sample size - V1
rez[i,2]=q # sample correlation - V2
rez[i,3]=abs(q-pop_cor)} # absolute deviation from the population correlation - V3
ggplot(data = rez, aes(x = V1, y = V2)) +
geom_point() +
labs(title = "Simulated Correlations at Different Sample Sizes", x = "Sample Size", y = "Correlation")+
annotate("text", x = 4000, y = 0.675, size = 9, label = paste("Population r = ", round(pop_cor, 2))) +
my_theme() +
scale_x_continuous(expand = c(0, 0), limits = c(0, 5050))
If the true effect size is small, however, then significance filtering is likely to take place, and this could result in major biases if the effects arrived at with pilot studies are used to inform power analyses for subsequent confirmatory studies. I’ve shown the effect sizes that are not significant at different sample sizes in the purple-shaded areas below.
set.seed(123)
myCov=lazyCov(Rho=0.1, Sd=15, d=2) # define covariance matrix for two variables with sd = 15 and correlation = .1
myData=data.frame(mvrnorm(n=1000000, mu=c(100,100), Sigma=myCov))
pop_cor=cor(myData[,1], myData[,2])
rez=data.frame()
for (i in (1:1000)){
sampleData=myData[sample(nrow(myData),i*5),]
q=cor(sampleData$X1,sampleData$X2, method = "pearson")
rez[i,1]=i*5
rez[i,2]=q
rez[i,3]=abs(q-pop_cor)}
line_data <- data.frame(x = 3:5000, y = sapply(3:5000, CRITR), yn = -sapply(3:5000, CRITR)) # the minimum significant r at each sample size, in the positive (y) and negative (yn) directions
ggplot(data = rez, aes(x = V1, y = V2)) +
geom_point() +
labs(title = "Simulated Correlations at Different Sample Sizes", x = "Sample Size", y = "Correlation")+
annotate("text", x = 4000, y = 0.25, size = 9, label = paste("Population r = ", round(pop_cor, 2))) +
my_theme() +
geom_line(data = line_data, aes(x = x, y = y), linewidth = 1, alpha = .6) +
geom_line(data = line_data, aes(x = x, y = yn), linewidth = 1, alpha = .6) +
coord_cartesian(ylim = c(-0.1, 0.4)) +
geom_ribbon(data = line_data, aes(x = x, ymin = 0, ymax = y), inherit.aes = FALSE, fill = "#67238a", alpha = 0.6) +
geom_ribbon(data = line_data, aes(x = x, ymin = yn, ymax = 0), inherit.aes = FALSE, fill = "#67238a", alpha = 0.6) +
scale_x_continuous(expand = c(0, 0), limits = c(0, 5050))
Now we need to know how bad the problem is for this data.
# Join the rez data with the line_data data frame to get the corresponding y and yn values
joined_data_all <- merge(rez, line_data, by.x = "V1", by.y = "x")
# Calculate the proportion of results that were within the area between y and yn for each N value
proportions_by_n_all <- data.frame(
N = c(5000, 1000, 500, 250, 100, 50),
proportion_within_area = c(
nrow(joined_data_all[joined_data_all$V1 <= 5000 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,]) / nrow(joined_data_all[joined_data_all$V1 <= 5000,]),
nrow(joined_data_all[joined_data_all$V1 <= 1000 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,]) / nrow(joined_data_all[joined_data_all$V1 <= 1000,]),
nrow(joined_data_all[joined_data_all$V1 <= 500 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,]) / nrow(joined_data_all[joined_data_all$V1 <= 500,]),
nrow(joined_data_all[joined_data_all$V1 <= 250 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,]) / nrow(joined_data_all[joined_data_all$V1 <= 250,]),
nrow(joined_data_all[joined_data_all$V1 <= 100 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,]) / nrow(joined_data_all[joined_data_all$V1 <= 100,]),
nrow(joined_data_all[joined_data_all$V1 <= 50 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,]) / nrow(joined_data_all[joined_data_all$V1 <= 50,])))
# Display the results
proportions_by_n_all
| N    | proportion_within_area |
|------|------------------------|
| 5000 | 0.096                  |
| 1000 | 0.455                  |
| 500  | 0.670                  |
| 250  | 0.680                  |
| 100  | 0.850                  |
| 50   | 1.000                  |
So, overall, a bit under 10% of results for all sample sizes below 5000 were not significant and thus might have been filtered. Below 1000, 46% of results were not significant. Below 500, this was 67%, and that increased to 68% below 250, 85% below 100, and 100% below 50. The effect of significance filtering is probably less important than the effect of high variance at smaller sample sizes. But we can check explicitly by looking at the average correlations in various sample size bins, with and without significance filtering.
# Calculate the average correlations below the specified N values
average_correlations <- data.frame(
N = c(5000, 1000, 500, 250, 100, 50),
avg_correlation = c(
mean(joined_data_all[joined_data_all$V1 <= 5000,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 1000,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 500,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 250,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 100,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 50,"V2"])))
# Calculate the average of the nonsignificant correlations (those at risk of being filtered out) below the specified N values
average_correlations_filtered <- data.frame(
N = c(5000, 1000, 500, 250, 100, 50),
avg_correlation_filtered = c(
mean(joined_data_all[joined_data_all$V1 <= 5000 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 1000 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 500 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 250 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 100 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,"V2"]),
mean(joined_data_all[joined_data_all$V1 <= 50 & joined_data_all$V2 >= joined_data_all$yn & joined_data_all$V2 <= joined_data_all$y,"V2"])))
# Filter the data to exclude values between y and yn (i.e., keep only the significant results)
filtered_data <- joined_data_all[joined_data_all$V2 < joined_data_all$yn | joined_data_all$V2 > joined_data_all$y,]
# Calculate the average correlations below the specified N values for the significant (surviving) results
average_correlations_significant <- data.frame(
N = c(5000, 1000, 500, 250, 100, 50),
avg_correlation_significant = c(
mean(filtered_data[filtered_data$V1 <= 5000,"V2"]),
mean(filtered_data[filtered_data$V1 <= 1000,"V2"]),
mean(filtered_data[filtered_data$V1 <= 500,"V2"]),
mean(filtered_data[filtered_data$V1 <= 250,"V2"]),
mean(filtered_data[filtered_data$V1 <= 100,"V2"]),
mean(filtered_data[filtered_data$V1 <= 50,"V2"])))
# Display the results
average_correlations
| N    | avg_correlation |
|------|-----------------|
| 5000 | 0.09945828      |
| 1000 | 0.10274273      |
| 500  | 0.10976281      |
| 250  | 0.12700895      |
| 100  | 0.13526906      |
| 50   | 0.13158448      |
average_correlations_filtered
| N    | avg_correlation_filtered |
|------|--------------------------|
| 5000 | 0.06984582               |
| 1000 | 0.07181495               |
| 500  | 0.07916928               |
| 250  | 0.09006573               |
| 100  | 0.10912859               |
| 50   | 0.13158448               |
average_correlations_significant
| N    | avg_correlation_significant |
|------|-----------------------------|
| 5000 | 0.1026030                   |
| 1000 | 0.1285632                   |
| 500  | 0.1718770                   |
| 250  | 0.2055133                   |
| 100  | 0.2833984                   |
| 50   | NaN                         |
The critical correlation for an n of 50 is
CRITR(50)
## [1] 0.2787106
so it makes sense that none of the samples of 50 or fewer produced a significant result, even with no additional biasing factors or post hoc controls in play. A significant effect with n = 50 and a true effect of r = 0.10 would have to be extraordinarily inflated, and one generally won’t appear without a source of bias, a search for a significant specification, or measurement unreliability.
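As a quick check on that claim, here is a small added sketch (using the r = .10 population simulated above and an arbitrary seed of my choosing): draw many samples of n = 50 and see how often they clear the threshold and how large they are when they do.
set.seed(456) # arbitrary seed for this illustrative check
r50 <- replicate(10000, {
s <- myData[sample(nrow(myData), 50), ] # a fresh sample of 50 from the r = .10 population
cor(s$X1, s$X2)})
mean(abs(r50) >= CRITR(50)) # share of n = 50 samples that reach significance (only around 10%)
mean(r50[abs(r50) >= CRITR(50)]) # the average significant correlation, which sits far above the true .10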
Regardless, this result made clear that, for bins with enough significant results to produce a mean estimate, there was a lot of inflation as sample sizes got smaller. Averaging through a sample size of 5000, inflation was minimal; with or without filtering for significance, the achieved correlation was basically the true r of 0.10. Dropping the cutoff to 1000, the mean of the significant results was 129% of the desired effect and 125% of the empirical mean of all results. The problem worsened as the cutoff shrank: at 500, the significant-only mean was 172% as large as the desired effect and 157% as large as the unfiltered empirical one; at 250, it was 206% and 162%; at 100, it was 283% and 210%. At 50, there were no significant results left to average, but the unfiltered mean was still 132% as large as the desired effect (135% at 100), so even though the simulation left effects equally likely to be under- or overestimated, the estimates at small sample sizes ended up inflated.
So there we have it: there’s lots of variability in effect size estimates at small sample sizes, and this can be accentuated by publication bias. In the real world, publication bias is generally not total, so the problem in real life could be more muted. But the real world also has other factors that compound the problem, like wandering down the garden of forking paths, using unreliable measurements, psychometric bias, and perhaps even taking forking paths to estimates that come with completely different estimands than the ones researchers sought to test.
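To put rough numbers on what that inflation does to planning, here is an added sketch using the standard Fisher z approximation for the sample size needed to detect a correlation with 80% power at two-sided alpha = .05 (n_for_r is just an illustrative helper, not part of the analysis above):
n_for_r <- function(r, alpha = .05, power = .80) {
ceiling(((qnorm(1 - alpha/2) + qnorm(power)) / atanh(r))^2 + 3)} # approximate n for a correlation test via Fisher's z
n_for_r(0.10) # close to 800 participants are needed to reliably detect the true effect
n_for_r(0.28) # planning around an inflated pilot estimate of .28 suggests only about 100
A confirmatory study planned around the inflated pilot estimate would therefore be badly underpowered against the true effect.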
There’s a bit more power for mean comparisons than for correlations (Yarkoni, 2009), so let’s be charitable to meta-analysis and use mean comparisons to contrast meta-analyses of small studies with single analyses done in larger samples.
set.seed(123)
# Define the number of simulations
n_simulations <- 1000
# Define the sample sizes for the small and large studies
n_small <- 25
n_large <- 250
# Define the true effect size
d_true <- 0.10
# Define a function to generate data from a given sample size and effect size
generate_data <- function(n, d) {
# Generate data for the control group
control <- rnorm(n)
# Generate data for the treatment group
treatment <- rnorm(n, mean = d)
# Return the data as a data frame
data.frame(control = control, treatment = treatment)}
# Define a function to calculate the effect size, its standard error, and a p-value for a given dataset
calculate_effect_size <- function(data) {
# Calculate the mean difference
mean_diff <- mean(data$treatment) - mean(data$control)
# Calculate the pooled standard deviation
sd_pooled <- sqrt((sd(data$control)^2 + sd(data$treatment)^2) / 2)
# Calculate Cohen's d
d <- mean_diff / sd_pooled
# Calculate the standard error of d
n <- nrow(data)
se_d <- sqrt((n / (n - 3)) * (4 / n))
# Calculate a p-value for the group difference (needed for the significance filtering further below)
p_value <- t.test(data$treatment, data$control)$p.value
# Return the effect size, its standard error, and the p-value
list(d = d, se_d = se_d, p_value = p_value)}
# Initialize vectors to store the results of the simulations
large_sample_results <- numeric(n_simulations)
meta_analysis_results <- numeric(n_simulations)
# Initialize vectors to store whether the CI contains the true effect size for each simulation
large_sample_ci_contains_true_effect <- logical(n_simulations)
meta_analysis_ci_contains_true_effect <- logical(n_simulations)
# Run the simulations
for (i in seq_len(n_simulations)) {
# Generate data for the large sample study
large_sample_data <- generate_data(n_large, d_true)
# Calculate the effect size for the large sample study
large_sample_results[i] <- calculate_effect_size(large_sample_data)$d
# Check if the CI contains the true effect size for the large sample study
large_sample_ci_contains_true_effect[i] <-
abs(large_sample_results[i] - d_true) <= qt(0.975, df = n_large - 1) * sqrt(2 / n_large)
# Initialize vectors to store the effect sizes and their standard errors for the small studies
small_study_effect_sizes <- numeric(10)
small_study_se_d <- numeric(10)
# Generate data for the small studies, calculate their effect sizes and their standard errors
for (j in seq_len(10)) {
small_study_data <- generate_data(n_small, d_true)
small_study_results <- calculate_effect_size(small_study_data)
small_study_effect_sizes[j] <- small_study_results$d
small_study_se_d[j] <- small_study_results$se_d}
# Perform a meta-analysis of the small studies and extract its results
meta_analysis_result <- rma.uni(small_study_effect_sizes, sei = small_study_se_d)
# Store the estimated effect size from the meta-analysis
meta_analysis_results[i] <- meta_analysis_result$b[1]
# Check if the CI contains the true effect size for the meta-analysis result
ci_lower_bound <- meta_analysis_result$ci.lb[1]
ci_upper_bound <- meta_analysis_result$ci.ub[1]
meta_analysis_ci_contains_true_effect[i] <-
ci_lower_bound <= d_true && d_true <= ci_upper_bound}
# Calculate the proportion of results where the CI contains the true effect size
large_sample_proportion <-
mean(large_sample_ci_contains_true_effect)
meta_analysis_proportionUB <-
mean(meta_analysis_ci_contains_true_effect)
# Output the large sample proportions and meta-analytic proportions
large_sample_proportion
## [1] 0.954
meta_analysis_proportionUB
## [1] 0.995
There we have it! The large studies achieved accurate results and the unbiased meta-analyses did too! Now, I’m going to rerun this simulation to obtain what the meta-analytic result would be if people refused to publish nonsignificant results.
set.seed(123)
# Initialize vectors to store the results of the simulations
large_sample_results <- numeric(n_simulations)
meta_analysis_results <- numeric(n_simulations)
# Initialize vectors to store whether the CI contains the true effect size for each simulation
large_sample_ci_contains_true_effect <- logical(n_simulations)
meta_analysis_ci_contains_true_effect <- logical(n_simulations)
# Run the simulations
for (i in seq_len(n_simulations)) {
# Generate data for the large sample study
large_sample_data <- generate_data(n_large, d_true)
# Calculate the effect size for the large sample study
large_sample_results[i] <-
calculate_effect_size(large_sample_data)$d
# Check if the CI contains the true effect size for the large sample study
large_sample_ci_contains_true_effect[i] <-
abs(large_sample_results[i] - d_true) <= qt(0.975, df = n_large - 1) * sqrt(2 / n_large)
# Initialize vectors to store the effect sizes and their standard errors for the small studies
small_study_effect_sizes <- numeric(10)
small_study_se_d <- numeric(10)
# Generate data for the small studies, calculate their effect sizes and their standard errors,
# and check if their p-values are significant at alpha = .05 level.
j <- 1
while (j <= 10) {
small_study_data <- generate_data(n_small, d_true)
small_study_results <-
calculate_effect_size(small_study_data)
if (small_study_results$p_value <= .05) {
small_study_effect_sizes[j] <-
small_study_results$d
small_study_se_d[j] <-
small_study_results$se_d
j <- j + 1}}
if (all(small_study_effect_sizes == 0)) {
meta_analysis_ci_contains_true_effect[i] <-
FALSE
next}
# Perform a meta-analysis of the significant small studies and extract its results.
meta_analysis_result <-
rma.uni(small_study_effect_sizes[small_study_effect_sizes != 0], sei = small_study_se_d[small_study_se_d != 0])
# Store the estimated effect size from the meta-analysis.
meta_analysis_results[i] <-
meta_analysis_result$b[1]
# Check if the CI contains the true effect size for the meta-analysis result.
ci_lower_bound <-
meta_analysis_result$ci.lb[1]
ci_upper_bound <-
meta_analysis_result$ci.ub[1]
meta_analysis_ci_contains_true_effect[i] <-
ci_lower_bound <= d_true && d_true <= ci_upper_bound}
# Calculate the proportion of results where the CI contains the true effect size.
meta_analysis_proportionB <-
mean(meta_analysis_ci_contains_true_effect)
meta_analysis_proportionB
## [1] 0.223
Publication bias reduced the viability of meta-analytic results. Since nonsignificant results are expected to occur frequently when true effects are small, a meta-analysis of small studies ought to be seriously biased upwards. We can compare the correctness of these approaches graphically:
data.frame(
# The X and Y prefixes just keep the bars in this order so the relabeling further below lines up
Study = c("Large Sample", "X Meta-Analysis", "Y Meta-Analysis"),
Proportion = c(large_sample_proportion, meta_analysis_proportionUB, meta_analysis_proportionB)) %>%
ggplot(aes(x = Study, y = Proportion)) +
scale_y_continuous(expand = c(0, 0), limits = c(0, 1), labels = scales::percent) +
geom_col(fill = c("#67238a", "#1874cd", "#9B870C")) +
labs(y = "Proportion of Results Where CI Contains True Effect Size",
x = NULL) +
my_theme() +
scale_x_discrete(labels = c("Large Sample Results", "Meta-Analytic Results
Without Publication Bias", "Meta-Analytic Results
With Publication Bias"))
As these simple simulations show, it’s unwise to base subsequent power analyses on pilot studies, and large studies outperform meta-analyses of small studies in the presence of publication bias.
Yarkoni, T. (2009). Big Correlations in Little Studies: Inflated fMRI Correlations Reflect Low Statistical Power—Commentary on Vul et al. (2009). Perspectives on Psychological Science, 4(3), 294–298. https://doi.org/10.1111/j.1745-6924.2009.01127.x
sessionInfo()
## R version 4.3.1 (2023-06-16 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19045)
##
## Matrix products: default
##
##
## locale:
## [1] LC_COLLATE=English_United States.utf8
## [2] LC_CTYPE=English_United States.utf8
## [3] LC_MONETARY=English_United States.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.utf8
##
## time zone: America/Chicago
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rockchalk_1.8.157 scales_1.2.1 lubridate_1.9.2
## [4] forcats_1.0.0 stringr_1.5.0 purrr_1.0.1
## [7] readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
## [10] tidyverse_2.0.0 dplyr_1.1.2 ggplot2_3.4.2
## [13] metafor_4.2-0 numDeriv_2016.8-1.1 metadat_1.2-0
## [16] Matrix_1.5-4.1 pacman_0.5.1
##
## loaded via a namespace (and not attached):
## [1] gtable_0.3.3 xfun_0.39 bslib_0.5.0 lattice_0.21-8
## [5] mathjaxr_1.6-0 tzdb_0.4.0 vctrs_0.6.3 tools_4.3.1
## [9] generics_0.1.3 fansi_1.0.4 highr_0.10 pkgconfig_2.0.3
## [13] lifecycle_1.0.3 farver_2.1.1 compiler_4.3.1 munsell_0.5.0
## [17] kutils_1.72 carData_3.0-5 htmltools_0.5.5 sass_0.4.7
## [21] yaml_2.3.7 pillar_1.9.0 nloptr_2.0.3 jquerylib_0.1.4
## [25] MASS_7.3-60 cachem_1.0.8 boot_1.3-28.1 nlme_3.1-162
## [29] tidyselect_1.2.0 zip_2.3.0 digest_0.6.33 stringi_1.7.12
## [33] labeling_0.4.2 splines_4.3.1 fastmap_1.1.1 grid_4.3.1
## [37] colorspace_2.1-0 cli_3.6.1 magrittr_2.0.3 utf8_1.2.3
## [41] foreign_0.8-84 withr_2.5.0 timechange_0.2.0 rmarkdown_2.23
## [45] lme4_1.1-34 hms_1.1.3 openxlsx_4.2.5.2 evaluate_0.21
## [49] knitr_1.43 rlang_1.1.1 Rcpp_1.0.11 xtable_1.8-4
## [53] glue_1.6.2 rstudioapi_0.15.0 minqa_1.2.5 jsonlite_1.8.7
## [57] R6_2.5.1 plyr_1.8.8
We can plot out the study-level effect sizes for the large studies and for the studies that went into the meta-analytic results. For the purposes of being realistic, we need to reduce the number of studies per category to a much smaller number, like 80. This is arbitrary and unimportant, so pick whatever number you feel is right. We can also check what happens with 100% publication bias and 50% publication bias: 100% publication bias would entail not publishing any nonsignificant or negative results, while 50% publication bias would entail omitting half of those results.
# Define the number of simulations
n_simulations <- 80
# Define the sample sizes for the small and large studies
n_small <- 25
n_large <- 250
# Define the true effect size
d_true <- 0.10
# Define a function to generate data from a given sample size and effect size
generate_data <- function(n, d) {
# Generate data for the control group
control <- rnorm(n)
# Generate data for the treatment group
treatment <- rnorm(n, mean = d)
# Return the data as a data frame
data.frame(control = control, treatment = treatment)}
# Define a function to calculate the effect size for a given dataset
calculate_effect_size <- function(data) {
# Calculate the mean difference
mean_diff <- mean(data$treatment) - mean(data$control)
# Calculate the pooled standard deviation
sd_pooled <- sqrt((sd(data$control)^2 + sd(data$treatment)^2) / 2)
# Calculate Cohen's d
d <- mean_diff / sd_pooled
# Return the effect size
d}
set.seed(123)
# Initialize vectors to store the results of the simulations
large_sample_results <- numeric(n_simulations)
meta_analysis_results <- numeric(n_simulations)
censored_50_meta_analysis_results <- numeric(n_simulations)
censored_100_meta_analysis_results <- numeric(n_simulations)
# Initialize a data frame to store the effect sizes for all studies
effect_sizes_df <- data.frame(
Study = character(0),
EffectSize = numeric(0))
# Run the simulations
for (i in seq_len(n_simulations)) {
# Generate data for the large sample study
large_sample_data <- generate_data(n_large, d_true)
# Calculate the effect size for the large sample study
large_sample_results[i] <- calculate_effect_size(large_sample_data)
# Add the effect size to the data frame
effect_sizes_df[nrow(effect_sizes_df) + 1, ] <- list("Large Study", large_sample_results[i])
# Generate data for one small study and calculate its effect size
small_study_data <- generate_data(n_small, d_true)
small_study_effect_size <- calculate_effect_size(small_study_data)
# Add small study effect size to all categories
effect_sizes_df[nrow(effect_sizes_df) + 1, ] <- list("Meta-Analysis Without Censoring", small_study_effect_size)
# A significant, positive small study is "published" even under 100% censoring of negative and nonsignificant results
if (!(t.test(small_study_data$treatment, small_study_data$control)$p.value > 0.05 || small_study_effect_size < 0)) {
effect_sizes_df[nrow(effect_sizes_df) + 1, ] <- list("Meta-Analysis With 100% Censoring", small_study_effect_size)}
# A nonsignificant or negative small study is retained at random half of the time under 50% censoring
if (t.test(small_study_data$treatment, small_study_data$control)$p.value > 0.05 || small_study_effect_size < 0) {
if (runif(1) < .5) {
effect_sizes_df[nrow(effect_sizes_df) + 1, ] <- list("Meta-Analysis With 50% Censoring", small_study_effect_size)}}
censored_50_meta_analysis_results[i] = mean(effect_sizes_df$EffectSize[effect_sizes_df$Study == "Meta-Analysis With 50% Censoring"])
censored_100_meta_analysis_results[i] = mean(effect_sizes_df$EffectSize[effect_sizes_df$Study == "Meta-Analysis With 100% Censoring"])}
set.seed(100)
# Create a jitter plot of the effect sizes.
effect_sizes_df %>%
ggplot(aes(x = Study, y = EffectSize)) +
geom_jitter(alpha = 0.3, size = 2) +
stat_summary(fun.data = mean_cl_normal, geom = "errorbar", width = .2, color = "#9B870C", linewidth = 1) +
stat_summary(fun.data = mean_cl_normal, geom = "point", color = "#1874cd", size = 3) +
my_theme() +
labs(y = "Effect Sizes",
x = NULL) +
scale_x_discrete(
labels = c("Large Study Effects", "Meta-Analysis
100% Censoring", "Meta-Analysis
50% Censoring", "Meta-Analysis
No Censoring")) +
geom_hline(yintercept = 0.1, linetype = "dashed", color = "gray80", linewidth = 1)
With censoring applied to all negative and nonsignificant results, the studies in our meta-analyses would be severely biased. With 50% of the negative and nonsignificant results censored at random, the meta-analysis would not be that bad! Dropping the possible number of studies to, say, 20, there’s still not much bias with this method, because the removal is random within the set of estimates that were nonsignificant or negative; what happens is mainly that the CI becomes wider, and as the number of studies shrinks, the likelihood of a biased mean increases and the CIs widen further. The uncensored results converge with the large-study estimates, just with wider CIs.
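If you would rather read the category means and interval widths as numbers than off the plot, a quick added summary along the same lines as the normal-theory intervals plotted above would be:
effect_sizes_df %>%
group_by(Study) %>%
summarise(k = n(), # number of "published" studies in each category
mean_d = mean(EffectSize), # mean effect size per category
ci_half_width = qt(.975, k - 1) * sd(EffectSize) / sqrt(k)) # half-width of a 95% CI around that mean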