DATA 621 Blog 5: Sampling and Survey Bias
Sampling
Sampling is the process of selecting a subset of observations from a larger population. Because it is often impractical to measure entire populations, sampling plays a critical role in data collection.
However, poor sampling methods can lead to biased results. This is especially common in surveys, where participants self-select or certain groups are underrepresented, so the sample no longer resembles the population it is meant to describe.
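To make the self-selection idea concrete, here is a minimal sketch of a survey in which people with higher scores are more likely to respond. The `scores` data and the logistic response model (`response_prob`) are assumptions made up for illustration, not taken from a real survey.

```r
set.seed(122)  # seed chosen only for reproducibility

# Hypothetical population of satisfaction scores (illustrative data)
scores <- rnorm(10000, mean = 50, sd = 10)

# Assumed response model: the chance of answering rises with the score
response_prob <- plogis((scores - 50) / 10)

# Each person decides independently whether to respond
responded <- rbinom(length(scores), size = 1, prob = response_prob) == 1

mean(scores)             # mean over the whole population
mean(scores[responded])  # mean over self-selected respondents only
mean(responded)          # overall response rate
```

Under this assumed model nobody is excluded outright, yet the respondents' mean still sits above the population mean because high scorers are overrepresented among those who answer.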
R Example
set.seed(122)
# Population data
population <- rnorm(10000, mean = 50, sd = 10)
# Biased sample: the first 200 population values above 55
biased_sample <- population[population > 55][1:200]
mean(population)
## [1] 49.92574
mean(biased_sample)
## [1] 61.44295
Visualization
hist(population, breaks = 40, col = "gray",
main = "Population vs Biased Sample",
xlab = "Value")
hist(biased_sample, breaks = 20, col = "red", add = TRUE)
Interpretation
The biased sample mean (61.44) overestimates the population mean (49.93) by about 11.5 points because the sample excludes every value at or below 55. This demonstrates how the selection rule alone, before any analysis is run, can determine the conclusion.
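For contrast, a simple random sample of the same size can be drawn with base R's `sample()`. The sketch below rebuilds the population with the same seed so it runs on its own. The name `random_sample` is introduced here for illustration and does not appear in the original example.

```r
set.seed(122)
population <- rnorm(10000, mean = 50, sd = 10)

# Simple random sample: every observation has the same chance of selection
random_sample <- sample(population, size = 200)

# Same biased rule as in the example above, repeated so this block is self-contained
biased_sample <- population[population > 55][1:200]

# Distance of each sample mean from the population mean
abs(mean(random_sample) - mean(population))
abs(mean(biased_sample) - mean(population))
```

The random sample's error should be on the order of the standard error, 10 / sqrt(200), or roughly 0.7. The biased sample's error is far larger and would not shrink with a bigger sample, because it comes from the selection rule rather than from chance.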
Conclusion
Sampling methods are just as important as analysis techniques. Understanding and avoiding bias is essential for drawing valid conclusions from data.