Understanding Biased Samples

From How to Read Numbers – Chapter 4

Genyang Li

What Is a Biased Sample?

A biased sample occurs when the group you study isn’t representative of the larger population you’re trying to understand.

  • Imagine measuring height only at a basketball convention → you’ll overestimate average height.
  • In statistics, bias ≠ opinion—it’s a systematic error in how data is collected.

Even large samples can be misleading if they’re biased.
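
To make that concrete, here is a minimal sketch with illustrative numbers (not from the book): if you only measure height at the basketball convention, your estimate settles on the convention's average, not the population's, no matter how many people you measure.

```python
# Illustrative numbers only (not from the book): a population that is
# 5% "very tall" (mean height 200 cm) and 95% "typical" (mean 170 cm).
tall_share, typical_share = 0.05, 0.95
tall_mean, typical_mean = 200.0, 170.0

# True population average height.
true_mean = tall_share * tall_mean + typical_share * typical_mean
print(f"True population mean:         {true_mean:.1f} cm")        # 171.5 cm

# At the basketball convention, suppose 80% of attendees are "very tall".
# A sample taken there converges to THIS value as it grows --
# collecting more people does not remove the bias.
convention_mean = 0.80 * tall_mean + 0.20 * typical_mean
print(f"Expected convention estimate: {convention_mean:.1f} cm")  # 194.0 cm
```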

Real-World Example: Lockdown Snacks

“Cheese on toast is Britain’s favourite lockdown snack!”
The Sun, April 2020

  • Based on a poll of 2,000 people by an online bank.
  • Sounds reliable… but who responded?
    • Likely: internet users who like filling out surveys.
    • Not necessarily: all UK adults.

👉 Key question: Does this sample reflect the whole population?

The Problem with Convenience Samples

Many polls use convenience sampling:

  • Twitter polls
  • Online surveys
  • Street interviews

But these groups are not random:

  • Twitter users skew younger, more urban, more politically engaged.
  • Landline phone polls miss mobile-only households.

More data ≠ better accuracy if the sample is biased.
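
A small simulation (hypothetical numbers, not the book's) shows why: a huge convenience sample drawn from a skewed subgroup lands farther from the truth than a modest random sample of the whole population.

```python
import random

random.seed(0)

# Hypothetical electorate: 40% of people overall hold opinion "A".
TRUE_SHARE = 0.40

def holds_opinion_a(is_online_enthusiast: bool) -> bool:
    # Assume online-poll enthusiasts (10% of the population) hold opinion "A"
    # at 60%, everyone else at ~37.8%, so the overall rate is still 40%.
    return random.random() < (0.60 if is_online_enthusiast else 0.3778)

# Convenience sample: 100,000 responses, but only from online enthusiasts.
convenience = [holds_opinion_a(True) for _ in range(100_000)]

# Random sample: just 1,000 responses drawn from the whole population.
random_sample = [holds_opinion_a(random.random() < 0.10) for _ in range(1_000)]

print(f"True share:            {TRUE_SHARE:.3f}")
print(f"Huge convenience poll: {sum(convenience) / len(convenience):.3f}")      # ~0.60
print(f"Small random poll:     {sum(random_sample) / len(random_sample):.3f}")  # ~0.40
```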

Case Study: The Corbyn vs. Johnson Debate Poll

After a 2019 UK TV debate:

  • YouGov (landline + online): Johnson won 48%–46%
  • Twitter polls: Corbyn won by wide margins

Why the difference?

  • Twitter ≠ UK electorate
  • Younger, more Labour-leaning users dominate Twitter

Sampling bias, not media bias.

How Pollsters Try to Fix Bias: Weighting

When samples aren’t perfect, statisticians weight responses:

Example:
  • Your survey has 40% women, but the population is 50% women.
  • So each woman’s response gets 1.25× the weight (0.50 ÷ 0.40 = 1.25), and each man’s response is weighted down correspondingly.
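
Sketched as a quick calculation (hypothetical approval figures; real pollsters weight on several variables at once, such as age, region, and past vote):

```python
# Hypothetical raw survey: 40% women / 60% men, while the population
# (e.g., from census data) is 50/50.
sample_share     = {"women": 0.40, "men": 0.60}
population_share = {"women": 0.50, "men": 0.50}

# Weight for each group = population share / sample share.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}
print(weights)  # {'women': 1.25, 'men': 0.833...}

# Hypothetical approval rates for some question, by group.
approval = {"women": 0.70, "men": 0.50}

# The unweighted average over-counts men; the weighted one matches the 50/50 mix.
unweighted = sum(sample_share[g] * approval[g] for g in sample_share)               # 0.58
weighted   = sum(sample_share[g] * weights[g] * approval[g] for g in sample_share)  # 0.60
print(f"Unweighted: {unweighted:.2f}  Weighted: {weighted:.2f}")
```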

This works only if you know the true population characteristics (e.g., from census data).

But weighting can’t fix everything—especially if key groups are missing entirely.

Leading Questions = Another Form of Bias

Bias isn’t just who you ask—it’s how you ask.

“200 people will be saved” → 72% choose this option
“400 people will die” → only 22% choose the very same option

Same outcome (200 of the 600 people at risk survive), different framing.

This is the framing effect—a cognitive bias that distorts responses.

Historical Lesson: The 1936 Literary Digest Poll

Predicted Alf Landon would beat FDR in the US presidential election.

  • Surveyed 2.4 million people!
  • Used phone directories and car registrations.

But in 1936, phone and car owners were disproportionately wealthy → Republican-leaning.

Result? FDR won in a landslide.

Meanwhile, George Gallup surveyed just 50,000 people—but used a representative sample and got it right.

Key Takeaways

  1. Sample size isn’t everything—representativeness matters more.
  2. Online polls, social media surveys, and convenience samples are often biased.
  3. Weighting helps—but can’t fix missing voices.
  4. Always ask: Who was included? Who was left out?

“A biased sample gives you precise answers to the wrong question.”

Final Thought

“All models are wrong, but some are useful.”
— George Box

Similarly:
> All samples are imperfect—but only unbiased ones are trustworthy.

Be skeptical. Ask about methods. Demand transparency.