The success-failure condition is used to check whether a sample has enough observations for the sampling distribution of the sample proportion to be approximated by a normal distribution.
To demonstrate this, the four sampling distributions below were each simulated from 1000 binomial samples with a success probability of 0.5. They differ only in sample size: n = 10, 20, 100, and 150. The graphs show that as n increases, the distribution of the sample proportion looks closer to normal, appears less discrete, and has less variability.
library(ggplot2)
set.seed(6)

# Draw 1000 binomial samples for each sample size and convert counts to proportions
temp = NULL
for (n in c(10, 20, 100, 150)) {
  temp = c(temp, rbinom(n = 1000, size = n, prob = 0.5) / n)
}

# Label each block of 1000 proportions with the sample size that produced it
sizeEx = data.frame(identifier = rep(c("A", "B", "C", "D"), each = 1000),
                    data = temp)

# One bar chart per sample size, with a free y-axis for each panel
graph1 = ggplot(sizeEx, aes(x = data)) +
  geom_bar(fill = "navy") +
  facet_wrap(~identifier,
             labeller = labeller(identifier = c(A = "n = 10", B = "n = 20",
                                                C = "n = 100", D = "n = 150")),
             scales = "free_y") +
  scale_x_continuous(breaks = seq(0, 1, 0.1), limits = c(0, 1)) +
  labs(title = "Sampling Distributions of the Sample Proportion",
       x = "", y = "Frequency of the Proportion")
graph1
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_bar()`).
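
The success-failure condition can also be checked numerically for the four sample sizes simulated above. The sketch below assumes the common rule of thumb that both the expected number of successes, n * p, and the expected number of failures, n * (1 - p), should be at least 10; some texts use 5 instead, so the `threshold` value is an assumption rather than part of the original analysis. The names `check` and `threshold` are illustrative. The table also includes the theoretical standard error of the sample proportion, sqrt(p * (1 - p) / n), to show how variability shrinks as n grows.

```r
# Success-failure check for the sample sizes used in the simulation.
# Assumption: a threshold of 10 expected successes and failures;
# some texts use 5 instead.
p = 0.5
threshold = 10
n = c(10, 20, 100, 150)

check = data.frame(
  n = n,
  expected_successes = n * p,
  expected_failures = n * (1 - p),
  # Theoretical standard error of the sample proportion
  std_error = sqrt(p * (1 - p) / n)
)

# The condition holds only when both expected counts reach the threshold
check$condition_met = check$expected_successes >= threshold &
  check$expected_failures >= threshold
check
```

Under this assumed threshold, only n = 10 fails the condition (5 expected successes and 5 expected failures), while n = 20 just meets it. The standard error falls from about 0.158 at n = 10 to about 0.041 at n = 150, which matches the narrowing seen in the graphs.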
