Introduction to P-Values


  • P-values are widely used in statistical hypothesis testing.
  • They measure the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true.
  • A smaller P-value suggests stronger evidence against the null hypothesis.

Mathematical Definition of P-Values


  • For an upper-tailed test, the P-value is defined as:

\[ P = P(T \geq t \mid H_0) \]

where:

  • \(T\) is the test statistic.
  • \(t\) is the observed value of the test statistic.
  • \(H_0\) is the null hypothesis.


  • For a continuous test statistic, the same probability can be written as an integral:

\[ P = \int_{t}^{\infty} f(x) \, dx \]

where \(f(x)\) is the probability density function of \(T\) under \(H_0\).
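The two representations above can be checked against each other numerically. The following is a minimal sketch, assuming that \(T\) follows a standard normal distribution under \(H_0\) and using a hypothetical observed value of 1.8; neither assumption comes from the slides.

```r
# Hypothetical observed test statistic (illustrative value only).
t_obs <- 1.8

# Upper-tail probability P(T >= t | H0), assuming T ~ N(0, 1) under H0.
p_cdf <- pnorm(t_obs, lower.tail = FALSE)

# The same quantity as the integral of the density from t to infinity.
p_int <- integrate(dnorm, lower = t_obs, upper = Inf)$value

# Both entries should agree up to numerical integration error.
c(cdf = p_cdf, integral = p_int)
```

For a different null distribution, the same pattern would apply with the matching distribution and density functions, for example `pt()` and `dt()` for a t-distributed statistic.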

Importance of P-Values


  • Used in scientific research to determine statistical significance.
  • Common threshold: results with \(P < 0.05\) are conventionally called statistically significant.
  • Helps decide whether to reject the null hypothesis.

Example: Does Sleep Improve Memory?


Experiment Setup:

  • Null Hypothesis (\(H_0\)): Sleep has no effect on memory recall.
  • Alternative Hypothesis (\(H_A\)): Sleep improves memory recall.
  • A study tests participants’ recall ability before and after a full night’s sleep.

Simulating the Experiment



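set.seed(42)  # fix the random seed so the simulation is reproducible
# Simulated recall scores for 30 participants (illustrative data, not real measurements)
before_sleep <- rnorm(30, mean = 65, sd = 10)
after_sleep <- rnorm(30, mean = 72, sd = 10)

# Paired t-test: element i of each vector is treated as the same participant.
# The default alternative is two-sided.
t_test_result <- t.test(before_sleep, after_sleep, paired = TRUE)
t_test_result$p.value  # Extract the P-value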
## [1] 0.1304872
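To show what the paired test computes, the sketch below rebuilds the P-value by hand from the paired differences and compares it with `t.test()`. It regenerates the same simulated data so that it runs on its own; the names `d`, `t_stat`, `p_manual`, and `p_builtin` are introduced here for illustration.

```r
set.seed(42)
before_sleep <- rnorm(30, mean = 65, sd = 10)
after_sleep <- rnorm(30, mean = 72, sd = 10)

# Paired differences, one per participant.
d <- before_sleep - after_sleep
n <- length(d)

# t statistic: mean difference divided by its standard error.
t_stat <- mean(d) / (sd(d) / sqrt(n))

# Two-sided P-value from the t distribution with n - 1 degrees of freedom.
p_manual <- 2 * pt(-abs(t_stat), df = n - 1)

# Built-in result for comparison.
p_builtin <- t.test(before_sleep, after_sleep, paired = TRUE)$p.value

all.equal(p_manual, p_builtin)
```

Because \(H_A\) on the setup slide is directional (sleep improves recall), a one-sided test via the `alternative` argument of `t.test()` would be another reasonable choice; the slides use the two-sided default.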

Boxplot: Comparing Memory Scores
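The figure for this slide is not reproduced here. A minimal base-R sketch that would draw a comparable boxplot, assuming the same simulated data as in the experiment code (regenerated so the block is self-contained); the data frame name `scores`, the labels, and the title are illustrative choices, not taken from the original figure:

```r
set.seed(42)
before_sleep <- rnorm(30, mean = 65, sd = 10)
after_sleep <- rnorm(30, mean = 72, sd = 10)

# Long-format data frame: one row per score, labelled by condition.
scores <- data.frame(
  condition = factor(rep(c("Before sleep", "After sleep"), each = 30),
                     levels = c("Before sleep", "After sleep")),
  recall = c(before_sleep, after_sleep)
)

# Side-by-side boxplots of recall scores for the two conditions.
boxplot(recall ~ condition, data = scores,
        xlab = "", ylab = "Memory recall score",
        main = "Memory scores before and after sleep")
```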



Density Plot: Memory Recall Distribution
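The figure for this slide is not reproduced here. One way to draw a comparable plot in base R is sketched below, assuming the same simulated data as in the experiment code (regenerated so the block is self-contained); the line types, labels, and title are illustrative choices.

```r
set.seed(42)
before_sleep <- rnorm(30, mean = 65, sd = 10)
after_sleep <- rnorm(30, mean = 72, sd = 10)

# Kernel density estimates for each condition.
d_before <- density(before_sleep)
d_after <- density(after_sleep)

# Plot both curves on shared axes so they can be compared directly.
plot(d_before,
     xlim = range(d_before$x, d_after$x),
     ylim = c(0, max(d_before$y, d_after$y)),
     xlab = "Memory recall score",
     main = "Distribution of memory recall scores")
lines(d_after, lty = 2)
legend("topright", legend = c("Before sleep", "After sleep"), lty = c(1, 2))
```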



What is P-Hacking?


  • P-hacking occurs when researchers manipulate analyses to obtain significant results.
  • Common practices:
    • Running multiple tests and reporting only significant ones.
    • Checking results repeatedly and stopping data collection as soon as \(P < 0.05\).
    • Selecting variables after seeing the data.
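The first practice in the list can be illustrated with a small simulation. The sketch below assumes a hypothetical researcher who runs 20 independent two-sample t-tests on pure noise (so every null hypothesis is true) and reports a finding whenever at least one test gives \(P < 0.05\). The number of tests, the sample sizes, and the seed are illustrative choices.

```r
set.seed(1)
n_experiments <- 1000  # simulated "studies"
n_tests <- 20          # tests run within each study
n_per_group <- 30      # observations per group in each test

# For each study, run n_tests t-tests on data with no true effect and
# record whether at least one of them comes out "significant".
any_significant <- replicate(n_experiments, {
  p_values <- replicate(n_tests, {
    x <- rnorm(n_per_group)
    y <- rnorm(n_per_group)
    t.test(x, y)$p.value
  })
  any(p_values < 0.05)
})

# Share of studies with at least one false positive, next to the
# theoretical value for independent tests: 1 - 0.95^n_tests.
false_positive_rate <- mean(any_significant)
expected_rate <- 1 - 0.95^n_tests
c(simulated = false_positive_rate, expected = expected_rate)
```

Although each individual test keeps its 5% false-positive rate, the chance that at least one of 20 independent tests crosses the threshold is far higher, which is why reporting only the significant ones is misleading.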

Avoiding P-Hacking


  • Pre-register your hypothesis before analyzing data.
  • Apply multiple-comparison corrections (e.g., Bonferroni or Benjamini-Hochberg) when running several tests.
  • Report effect sizes and confidence intervals, not just P-values.
  • Conduct replication studies to validate findings.
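As one example of a statistical adjustment, base R provides `p.adjust()` for multiple-comparison corrections. The sketch below applies the Bonferroni and Benjamini-Hochberg methods to a vector of raw P-values; the values in `p_raw` are hypothetical and chosen only for illustration.

```r
# Hypothetical raw P-values from five tests (illustrative values only).
p_raw <- c(0.003, 0.012, 0.030, 0.047, 0.210)

# Bonferroni: multiply each P-value by the number of tests (capped at 1).
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Benjamini-Hochberg: controls the false discovery rate, less conservative.
p_bh <- p.adjust(p_raw, method = "BH")

# Compare raw and adjusted values side by side.
data.frame(p_raw, p_bonf, p_bh)
```

In this made-up example, four raw values fall below 0.05, but fewer remain below it after adjustment, and Bonferroni is the stricter of the two methods.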

Annotated 3D Visualization of P-Values
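The original 3D figure is not reproduced here. The sketch below draws a surface of the same general kind, assuming the plot shows a density over two standardized axes with Z as the height of the surface. The bivariate standard normal density, the grid, and the viewing angles are assumptions made for illustration, not details recovered from the original figure.

```r
# Grid over two standardized axes (step 0.1, includes 0 at the centre).
x <- seq(-3, 3, length.out = 61)
y <- seq(-3, 3, length.out = 61)

# Height of the surface: density of two independent standard normals.
z <- outer(x, y, function(a, b) dnorm(a) * dnorm(b))

# Perspective plot: the peak sits at the centre, the tails at the edges.
persp(x, y, z, theta = 30, phi = 25, expand = 0.6,
      xlab = "X", ylab = "Y", zlab = "Z (density)",
      main = "Density surface under H0")
```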



Importance of the plot


  • Values near the peak (high Z, where the surface is tallest) are likely outcomes under \(H_0\).
  • Values in the tails (low Z) are unlikely outcomes under \(H_0\), leading to small P-values.
  • This visualization helps explain why a small P-value counts as evidence against \(H_0\): the observed result lies where \(H_0\) predicts very few outcomes.

Conclusion


  • P-values help determine statistical significance.
  • A small P-value suggests strong evidence against the null hypothesis.
  • Be cautious of P-hacking and always follow rigorous statistical practices.