DATA 621 Blog 5: Sampling and Survey Bias

Sampling

Sampling is the process of selecting a subset of observations from a larger population. Because it is often impractical to measure entire populations, sampling plays a critical role in data collection.

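However, a poorly chosen sampling method can bias the results: if some members of the population are more likely to be selected than others, the sample no longer represents the population. This is common in surveys, where participants self-select or certain groups are underrepresented.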

R Example

set.seed(122)

# Population data
population <- rnorm(10000, mean = 50, sd = 10)

# Biased sample: the first 200 values above 55 (everything at or below 55 is excluded)
biased_sample <- population[population > 55][1:200]

mean(population)
## [1] 49.92574
mean(biased_sample)
## [1] 61.44295
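
For contrast, here is a minimal sketch of a simple random sample of the same size, drawn with base R's sample(). The name random_sample is introduced only for this illustration. If the sampling is unbiased, the sample mean should differ from the population mean by an amount on the order of the standard error (roughly 0.7 for n = 200 and sd = 10), not by the large gap seen in the biased sample.

```r
# Recreate the population from the example above so this block runs on its own
set.seed(122)
population <- rnorm(10000, mean = 50, sd = 10)

# Simple random sample: every observation has the same chance of selection
random_sample <- sample(population, size = 200)

# Compare the sample mean with the population mean
mean(population)
mean(random_sample)

# Approximate standard error of the mean for n = 200
sd(population) / sqrt(200)
```

Running this with different seeds should give sample means that scatter around the population mean instead of sitting consistently above it.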

Visualization

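# Plot densities rather than counts so the 200-point sample is visible
# next to the 10,000-point population
hist(population, breaks = 40, col = "gray", freq = FALSE, ylim = c(0, 0.2),
     main = "Population vs Biased Sample",
     xlab = "Value")
hist(biased_sample, breaks = 20, col = rgb(1, 0, 0, 0.5), freq = FALSE, add = TRUE)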

Interpretation

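The biased sample mean (61.44) is about 11.5 points higher than the population mean (49.93) because the sample contains only values above 55 and excludes everything below that cutoff. Nothing in the analysis step can recover the missing lower values, which demonstrates how sampling decisions directly affect conclusions.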
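
The hard cutoff at 55 is an extreme case. Self-selection in surveys is usually softer: everyone can respond, but some are more likely to. The sketch below illustrates this under an assumed response model, in which the chance of responding rises with the measured value along a logistic curve. The model and the names respond_prob, responded, and self_selected are hypothetical and chosen only for this illustration.

```r
# Recreate the population so this block runs on its own
set.seed(122)
population <- rnorm(10000, mean = 50, sd = 10)

# Assumed response model (hypothetical): the chance of responding rises
# with the value being measured, via a logistic curve centered at 50
respond_prob <- plogis((population - 50) / 10)
responded <- runif(length(population)) < respond_prob

# Keep only the observations that "chose" to respond
self_selected <- population[responded]

length(self_selected)   # number of respondents
mean(population)
mean(self_selected)     # tends to sit above the population mean
```

Even though no value is excluded outright, the respondents' mean drifts upward, and collecting more responses under the same mechanism would not remove the shift.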

Conclusion

Sampling methods are just as important as analysis techniques. Understanding and avoiding bias is essential for drawing valid conclusions from data.