2025-09-17

What does Sampling Bias mean?

Sampling bias occurs when a sample does not accurately represent the true population, so that the expected value of an estimate differs from the true parameter: \[ E[\hat{\theta}] \neq \theta \]
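One way to see this is to compare the average of an estimate over many repeated samples with the true value. The following is a minimal sketch in base R, not a definitive implementation: the two-group population, the 4-to-1 selection odds, and the names `population`, `prob_biased`, and `draw_mean` are all assumptions made up for illustration.

```r
set.seed(1)

# Hypothetical population: two equally sized groups with different means.
population <- data.frame(
  group = rep(c("A", "B"), each = 5000),
  x     = c(rnorm(5000, mean = 10), rnorm(5000, mean = 20))
)
theta <- mean(population$x)  # true population mean

# Assumed biased scheme: rows in group A are four times as likely
# to be selected as rows in group B.
prob_biased <- ifelse(population$group == "A", 4, 1)

# Draw one sample of 100 rows and return its mean.
# prob = NULL gives a simple random sample.
draw_mean <- function(prob = NULL) {
  idx <- sample(nrow(population), size = 100, prob = prob)
  mean(population$x[idx])
}

est_srs    <- replicate(500, draw_mean())
est_biased <- replicate(500, draw_mean(prob_biased))

# Estimated bias: average estimate minus the true value.
bias_srs    <- mean(est_srs) - theta
bias_biased <- mean(est_biased) - theta
```

Under these assumptions `bias_srs` should be close to zero, while `bias_biased` should be clearly negative because the lower-valued group A is over-represented.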

Why does Sampling Bias matter?

-Bias can lead to inaccurate results, which is especially misleading when the data is used to train a model

-Models trained on biased data can produce unfair outcomes

-Data bias can cause people to question the reliability of these models or AI systems

Example of Biased Sampling

-These proportions do not match reality

Population vs. Sample Bias

Correcting Bias with Weights

Weighted mean, with each observation weighted by the inverse of its inclusion probability \(\pi_i\):

\[ \hat{\mu}_w \;=\; \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad w_i \;=\; \frac{1}{\pi_i} \]
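The weighted mean can be sketched on a small example in base R. The data here are hypothetical: group A is assumed to make up half of the population but 80 of the 100 sampled rows, and the relative inclusion probabilities in `pi_by_group` are taken as sample share divided by population share. The names `sample_df` and `pi_by_group` are illustrative only.

```r
# Hypothetical biased sample: 80 rows from group A (x = 10),
# 20 rows from group B (x = 20).
sample_df <- data.frame(
  group = rep(c("A", "B"), times = c(80, 20)),
  x     = rep(c(10, 20),   times = c(80, 20))
)

# Assumed relative inclusion probabilities: sample share / population share.
pi_by_group <- c(A = 0.8 / 0.5, B = 0.2 / 0.5)

# Inverse-probability weights, w_i = 1 / pi_i.
sample_df$w <- unname(1 / pi_by_group[as.character(sample_df$group)])

# Unweighted mean is pulled toward the over-represented group.
unweighted <- mean(sample_df$x)

# Weighted mean, written out as in the formula above.
weighted <- sum(sample_df$w * sample_df$x) / sum(sample_df$w)

# Base R's weighted.mean() computes the same quantity.
weighted_builtin <- weighted.mean(sample_df$x, sample_df$w)
```

With these numbers the unweighted mean is 12, while the weighted mean recovers 15, the mean of a population split evenly between the two groups. Only the ratio of the weights matters, since the formula divides by their sum.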

Population vs. Biased Sample

-This compares the true population with a biased sample drawn from only part of that population

Example R Code

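library(dplyr)

# Stratified sample: draw 50 rows at random from each gender group.
# Assumes `population` is a data frame with a `gender` column
# and at least 50 rows per group.
population %>%
  group_by(gender) %>%
  slice_sample(n = 50) %>%  # replaces the superseded sample_n()
  ungroup()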