P-hacking

2025-11-16

An Introduction to P-values

A p-value is a statistical measure used to test hypotheses
It is the probability of getting data at least as extreme as what was observed, assuming the null hypothesis is true
Low p-values (conventionally < 0.05) are often treated as “significant”
For a regression coefficient, it is calculated from the sample's t-value:

t-value formula

\[ t = \frac{\hat{\beta} - \beta_0}{SE(\hat{\beta})} \]

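Here \(\hat{\beta}\) is the estimated coefficient, \(\beta_0\) is its value under the null hypothesis (usually 0), and \(SE(\hat{\beta})\) is its standard error.
The p-value is obtained by comparing this t-value to a t-distribution with the model's residual degrees of freedom. For a two-sided test, it is the probability of a value at least as extreme as \(|t|\) in either tail.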
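The t-value formula and the t-distribution lookup can be reproduced by hand in R. This is a minimal sketch on simulated data (the data and variable names are illustrative assumptions), using base R's `lm()` and `pt()`:

```r
# Illustrative simulated data (hypothetical, not the dataset used later)
set.seed(1)
x <- rnorm(40)
y <- 0.3 * x + rnorm(40)

fit   <- lm(y ~ x)
coefs <- summary(fit)$coefficients

beta_hat <- coefs["x", "Estimate"]    # estimated slope
se_beta  <- coefs["x", "Std. Error"]  # its standard error
beta_0   <- 0                         # value under the null hypothesis

# t-value from the formula: (estimate - null value) / standard error
t_manual <- (beta_hat - beta_0) / se_beta

# Two-sided p-value: area in both tails of a t-distribution
# with the model's residual degrees of freedom (n - 2 here)
p_manual <- 2 * pt(-abs(t_manual), df = fit$df.residual)

# Compare with the values lm() reports
c(t_manual = t_manual, t_lm = coefs["x", "t value"])
c(p_manual = p_manual, p_lm = coefs["x", "Pr(>|t|)"])
```

The hand-computed values should agree with the `t value` and `Pr(>|t|)` columns that `summary()` prints for the slope.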

What is P-Hacking?

Manipulating analyses to get a “significant” p-value (< 0.05)
Often done unintentionally: the researcher keeps testing until something “works”
Inflates the false-positive rate well above the nominal 5%
Makes results look meaningful even when they’re just random noise
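To see how quickly false positives accumulate, suppose a researcher runs \(k\) independent tests, each at \(\alpha = 0.05\), and every null hypothesis is true. The chance that at least one test comes out “significant” is:

```latex
P(\text{at least one } p < \alpha) = 1 - (1 - \alpha)^{k},
\qquad
k = 20:\quad 1 - 0.95^{20} \approx 0.64
```

So with 20 unrelated tests, a “significant” result is more likely than not, even though nothing real is there.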

Common Forms of P-Hacking

Trying many variables and reporting only the significant one
Filtering or excluding data until p < 0.05 appears
Running the same test multiple ways and cherry-picking the best outcome
Stopping data collection early when results look good
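The first form, trying many variables and reporting only the significant one, can be simulated directly. This sketch assumes a hypothetical setup: 1,000 simulated studies, each testing 20 pure-noise predictors against a pure-noise outcome with base R's `cor.test()`, and it records how often at least one test comes out “significant”:

```r
set.seed(42)

n_sims <- 1000  # number of simulated "studies"
n_obs  <- 50    # participants per study
n_vars <- 20    # unrelated predictors tried in each study

any_significant <- replicate(n_sims, {
  y <- rnorm(n_obs)                                  # outcome: pure noise
  x <- matrix(rnorm(n_obs * n_vars), ncol = n_vars)  # predictors: pure noise
  p <- apply(x, 2, function(col) cor.test(col, y)$p.value)
  any(p < 0.05)  # would a p-hacker find something to report?
})

# Share of studies with at least one "significant" predictor
rate <- mean(any_significant)
rate
```

With 20 independent tests at 0.05, the expected rate is 1 - 0.95^20, or about 0.64, so the simulated rate should land near that value, far above the nominal 5%.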

A Simple Example: Height vs. Movie Theater Attendance

To demonstrate p-hacking, I compare height to movie theater attendance (number of visits in the last 3 months)
First, I show the full dataset, which has no meaningful correlation
Then, I show a cherry-picked subset that does have a “statistically significant” correlation

Height vs. Movie Theater Attendance (Full Dataset)

The correlation is r = -0.017 with a p-value of 0.907 (not significant).

Height vs. Movie Theater Attendance (Heights 165 to 185 cm Only)

The correlation is r = 0.441 with a p-value of 0.021 (< 0.05, significant).
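Filtering like this does not need any real relationship in the data. The sketch below uses hypothetical, independently generated data (not the presentation's dataset) and searches every height window at least 10 cm wide for the smallest p-value; the sample size, distributions, and window grid are all assumptions for illustration:

```r
set.seed(2025)

# Hypothetical data: height and movie visits generated independently,
# so any correlation between them is noise by construction
n <- 60
height_cm    <- round(rnorm(n, mean = 172, sd = 9))
movie_visits <- rpois(n, lambda = 3)

# Honest analysis: one test on all the data
full_p <- cor.test(height_cm, movie_visits)$p.value

# P-hacked analysis: try every window [lo, hi] at least 10 cm wide
windows <- expand.grid(lo = 150:180, hi = 160:195)
windows <- windows[windows$hi - windows$lo >= 10, ]

windows$p <- mapply(function(lo, hi) {
  keep <- height_cm >= lo & height_cm <= hi
  if (sum(keep) < 10) return(NA_real_)  # too few points to test
  cor.test(height_cm[keep], movie_visits[keep])$p.value
}, windows$lo, windows$hi)

best <- windows[which.min(windows$p), ]
full_p  # p-value using all the data
best    # the window a p-hacker would report
```

Every extra window is another chance for noise to cross 0.05, which is why a subset chosen after looking at the results says little about the full population.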

R Code Chunk

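# Load ggplot2 for plotting
library(ggplot2)

# Create dataset (height_cm and movie_visits are numeric vectors
# of equal length, assumed to be defined earlier; not shown)
df <- data.frame(height_cm, movie_visits)

# Filtered dataset for p-hacking demonstration:
# keep only participants between 165 and 185 cm tall
df_filtered <- subset(df, height_cm >= 165 & height_cm <= 185)

# Full-data scatterplot with linear regression line
ggplot(df, aes(height_cm, movie_visits)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Filtered scatterplot with linear regression line
ggplot(df_filtered, aes(height_cm, movie_visits)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)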
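The plotting chunk does not show how the reported correlations and p-values were computed. One way to get them, assuming Pearson's correlation via base R's `cor.test()`, is sketched below. It uses randomly generated stand-in data, so its numbers will not match the slides, and `summarise_cor()` is a hypothetical helper:

```r
# Hypothetical stand-in data (NOT the presentation's dataset):
# height and visits are generated independently of each other
set.seed(7)
n <- 40
height_cm    <- round(rnorm(n, mean = 172, sd = 9))
movie_visits <- rpois(n, lambda = 3)

df <- data.frame(height_cm, movie_visits)
df_filtered <- subset(df, height_cm >= 165 & height_cm <= 185)

# Hypothetical helper: sample size, Pearson r, its 95% CI, and p-value
summarise_cor <- function(d) {
  ct <- cor.test(d$height_cm, d$movie_visits, method = "pearson")
  data.frame(
    n       = nrow(d),
    r       = unname(ct$estimate),
    ci_low  = ct$conf.int[1],
    ci_high = ct$conf.int[2],
    p_value = ct$p.value
  )
}

results <- rbind(full = summarise_cor(df), filtered = summarise_cor(df_filtered))
results
```

Reporting n and the confidence interval next to r makes it visible when a “significant” result comes from a smaller, hand-picked subset.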

How to Avoid P-Hacking in Your Own Research

Pre-register your hypotheses and analysis plan
Decide on data cleaning rules before looking at results
Avoid running many tests “just to see what works”
Report all analyses, not just the significant ones
Focus on effect sizes and confidence intervals, not only p-values
Be transparent about filtering, exclusions, and model choices
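When several tests are genuinely needed, a standard safeguard is to adjust the whole family of p-values for multiple comparisons. A minimal sketch using base R's `p.adjust()` on simulated pure-noise data (the scenario of 20 unrelated predictors is hypothetical):

```r
set.seed(99)

# Hypothetical scenario: 20 unrelated predictors tested against
# one outcome, so every null hypothesis is true by construction
n_obs  <- 50
n_vars <- 20
y <- rnorm(n_obs)
x <- matrix(rnorm(n_obs * n_vars), ncol = n_vars)

p_raw <- apply(x, 2, function(col) cor.test(col, y)$p.value)

# Adjust all 20 p-values, not just the "interesting" one
p_holm <- p.adjust(p_raw, method = "holm")
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Number of "significant" results before and after adjustment
c(raw        = sum(p_raw  < 0.05),
  holm       = sum(p_holm < 0.05),
  bonferroni = sum(p_bonf < 0.05))
```

Holm's method controls the same family-wise error rate as Bonferroni but is never less powerful, which is why it is often preferred.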

Conclusion

P-hacking can make random noise look like a real effect
Small decisions (filtering, variable choices, repeated tests) can create “significance”
My example showed how restricting the data to a subset turned a non-significant correlation (r = -0.017, p = 0.907) into a “significant” one (r = 0.441, p = 0.021)
Good research requires transparency, consistency, and skepticism
P-values are tools, but they are only meaningful when used responsibly