In this week’s forum threads for the Principles and Practice of Clinical Research course, one of the posts raises the question:

Why do we underestimate variance for sample size calculation?

The colleagues who posted before me all found this question hard to answer, and, reading their answers, I agreed: I couldn’t come up with verifiable reasons why this would happen systematically. The opening post claims:

An article by Livingston 2005 (1) shows that the SD used for the sample size calculation was smaller than the SD actually observed in the trial in 80% of the studied trials.

Well, if this were due to chance alone, one would expect the figure to be closer to 50%: half of the trials would observe an SD smaller than the one used in the calculation, and half would observe a larger one. I thought it might be a good idea to run a hypothesis test on these data to see whether the difference is statistically significant, but I couldn’t find the original claim in the article, so I let it be.
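For what it’s worth, the test itself would have been simple had the counts been available. Here is a purely hypothetical sketch, with invented numbers for illustration (say, 80 of 100 trials with an underestimated SD), using an exact binomial test against the 50% expected by chance:

# Hypothetical counts for illustration only; the article's actual numbers are unknown
binom.test(x = 80, n = 100, p = 0.5)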

Either way, the question remained, and I felt I lacked the mathematical background to answer it. I had a feeling, though, that because many trials use previous pilot studies to determine the values that go into the sample size calculation, two things might play a part here:

  1. As the sampling is done by convenience, a smaller sample size might also lead to a more homogeneous sample than the one that gets enrolled in the larger, subsequent trial.

  2. The sample standard deviation might be systematically lower than the population standard deviation, and, as the sample size increases, the sample SD approaches the population standard deviation, getting larger and larger (see the sketch below).

These are just gut feelings, and I have no compelling evidence that they are correct, but I thought I might be able to test them empirically with R.
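Point 2, at least, has a known theoretical basis: for normally distributed data, the sample SD is a biased estimator of the population SD even with Bessel’s correction, with E[s] = c4(n) · σ and c4(n) < 1. A minimal sketch of what that predicts for the σ = 100 I will use below (c4 is a helper function I am defining here for illustration):

# Known result for normal data: E[s] = c4(n) * sigma, with c4(n) < 1, so the
# sample SD underestimates sigma on average, even with Bessel's correction
c4 <- function(n) sqrt(2 / (n - 1)) * gamma(n / 2) / gamma((n - 1) / 2)
round(100 * c4(c(2, 3, 5, 10, 24)), 1) # expected sample SDs when sigma = 100

At n = 2 the expected sample SD is only about 80, climbing to roughly 99 by n = 24.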

The first step was to simulate a large population from a normal distribution with a predetermined SD.

library(tidyverse) # loading required libraries
set.seed(1234) # to ensure reproducibility

# One million draws from a normal distribution with mean 0 and SD 100
normal <- rnorm(1000000, mean = 0, sd = 100)
as.data.frame(normal) %>% ggplot(aes(x = normal)) + geom_density()

The next step is to repeatedly sample from this distribution and build a data frame that records each sample’s size and its standard deviation.

# Create an empty data frame
df <- data.frame(n = numeric(), sd = numeric())

# Loop through each sample size (n)
for (n in 2:24) {
        # Create 100 samples for each sample size
        for (i in 1:100) {
                # Generate a sample of size n from the "normal" variable
                sample_data <- sample(normal, n)

                # Calculate the standard deviation of the sample
                sample_sd <- sd(sample_data)

                # Append the sample size (n) and sample standard deviation (sd) to the data frame
                df <- rbind(df, data.frame(n = n, sd = sample_sd))
        }
}
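As an aside, the same data frame can be built without growing it with rbind inside a loop. A sketch of a more idiomatic tidyverse version (the individual draws will differ because the random numbers are consumed in a different order, but the logic is identical):

# One row per (sample size, repetition) pair, then one draw per row
df <- expand.grid(n = 2:24, rep = 1:100) %>%
        rowwise() %>%
        mutate(sd = sd(sample(normal, n))) %>%
        ungroup() %>%
        select(n, sd)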

First, we can plot the average standard deviation for each sample size. The horizontal line shows the population standard deviation.

df %>% group_by(n) %>%
        summarize(average.sd = mean(sd)) %>%
        ggplot(aes(x = n, y = average.sd)) +
        geom_point() +
        geom_hline(yintercept = 100)
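Since the c4 sketch above predicts exactly where these points should fall, we can also overlay the theoretical expectation 100 · c4(n) on the same plot (reusing the illustrative c4() helper defined earlier):

df %>% group_by(n) %>%
        summarize(average.sd = mean(sd)) %>%
        mutate(expected.sd = 100 * c4(n)) %>% # c4() from the sketch above
        ggplot(aes(x = n)) +
        geom_point(aes(y = average.sd)) +
        geom_line(aes(y = expected.sd)) +
        geom_hline(yintercept = 100)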

Next, we can test whether the average SD for a given sample size is statistically different from the population SD of 100.

library(knitr)

# Run a one-sample t-test for each sample size and collect the results
df2 <- data.frame(n = numeric(), p.value = numeric(), significant = logical())
for (i in 2:24) {
        # Does the mean sample SD at size i differ from the population SD of 100?
        ttest <- df %>% filter(n == i) %>% pull(sd) %>% t.test(mu = 100)
        sig <- ttest$p.value < 0.05
        df2 <- rbind(df2, data.frame(n = i, p.value = ttest$p.value, significant = sig))
}
kable(df2)
|  n |   p.value | significant |
|---:|----------:|:------------|
|  2 | 0.0833216 | FALSE       |
|  3 | 0.0132271 | TRUE        |
|  4 | 0.0112484 | TRUE        |
|  5 | 0.0125649 | TRUE        |
|  6 | 0.1247991 | FALSE       |
|  7 | 0.0666966 | FALSE       |
|  8 | 0.0074822 | TRUE        |
|  9 | 0.2523369 | FALSE       |
| 10 | 0.3442947 | FALSE       |
| 11 | 0.0678563 | FALSE       |
| 12 | 0.0563202 | FALSE       |
| 13 | 0.3115111 | FALSE       |
| 14 | 0.5915422 | FALSE       |
| 15 | 0.0013134 | TRUE        |
| 16 | 0.1019738 | FALSE       |
| 17 | 0.4076397 | FALSE       |
| 18 | 0.4318861 | FALSE       |
| 19 | 0.0541384 | FALSE       |
| 20 | 0.5566075 | FALSE       |
| 21 | 0.1187056 | FALSE       |
| 22 | 0.4587887 | FALSE       |
| 23 | 0.1820971 | FALSE       |
| 24 | 0.4685764 | FALSE       |

These results lead me to think that small sample sizes may indeed produce systematic underestimates of the true population SD; notably, most of the statistically significant differences in the table occur at the smaller sample sizes. Therefore, when researchers base their calculations on small pilot studies, they may be underestimating the true SD.
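A natural refinement, sketched below on the same simulated data: since the suspicion is specifically that the SD is underestimated, a one-sided test against mu = 100 asks the more pointed question.

# One-sided version for the smallest sample size: is the average sample SD below 100?
df %>% filter(n == 2) %>% pull(sd) %>% t.test(mu = 100, alternative = "less")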