Brief Overview of P-value in Statistics

2023-09-14

Definition and Distribution

-The p-value is the probability under the null hypothesis of obtaining a real-valued test statistic at least as extreme as the one obtained.

-The p-value is used in the context of null hypothesis testing in order to quantify the statistical significance of a result, the result being the observed value of the chosen statistic T.

-When a null hypothesis of the form \(H_0\) : \(\theta\) = \(\theta_0\) is true, and the underlying random variable is continuous, then the probability distribution of the p-value is uniform on the interval [0,1].

-The distribution of p-values may be skewed toward 0 or 1 depending on how the true parameter \(\theta\) relates to the hypothesized value(s) \(\theta_0\) that define \(H_0\); by design, p-values concentrate near 0 when the alternative hypothesis is true.
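
The two distributional claims above can be checked by simulation. The following is a minimal sketch, assuming a one-sample t-test of \(H_0\): \(\mu\) = 0 on normally distributed data. The helper name simulate_p, the sample size, the number of replications, and the effect size of 0.5 are illustrative choices, not values from the source.

```r
set.seed(42)

n_sims <- 5000   # number of simulated experiments (illustrative)
n_obs  <- 30     # observations per experiment (illustrative)

# Hypothetical helper: p-value of a one-sample t-test of H0: mu = 0
# on n_obs normal draws whose true mean is true_mean.
simulate_p <- function(true_mean) {
  t.test(rnorm(n_obs, mean = true_mean), mu = 0)$p.value
}

p_null <- replicate(n_sims, simulate_p(0))    # H0 is true
p_alt  <- replicate(n_sims, simulate_p(0.5))  # H0 is false (assumed effect)

# Share of p-values below 0.05: close to 0.05 when H0 is true,
# much larger when the alternative is true.
mean(p_null < 0.05)
mean(p_alt < 0.05)
```

Plotting hist(p_null) and hist(p_alt) side by side should show a roughly flat histogram for the first and a spike near 0 for the second.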

Probability of P-value

The Null Hypothesis

The Alternative Hypothesis

Interpreting P-Values

P-values measure evidence against the null hypothesis (\(H_0\)).
A small p-value (\(p < \alpha\)) indicates strong evidence against \(H_0\).
A large p-value (\(p \geq \alpha\)) indicates weak evidence against \(H_0\); it is not evidence that \(H_0\) is true.

Usage and Calculation of P-value

The p-value equation for a right-tailed test: \[ p(x) = \int_{t}^{+\infty} f(u) \, du \]
Here, \(p(x)\) is the p-value for the observed data \(x\), \(f\) is the probability density function of the test statistic under \(H_0\), \(t\) is the observed test statistic, and \(u\) is the variable of integration.
It quantifies the probability of observing a test statistic at least as extreme as the one observed, assuming \(H_0\) is true.
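
As a rough numerical check of the tail integral, the sketch below assumes the test statistic follows a Student t distribution under \(H_0\); the observed value 1.8 and the 20 degrees of freedom are made-up inputs for illustration only. It compares direct numerical integration of the density with R's built-in upper-tail probability.

```r
t_obs <- 1.8   # observed test statistic (illustrative value)
df    <- 20    # degrees of freedom (illustrative value)

# Right-tail p-value: integrate the t density from t_obs to infinity.
p_integral <- integrate(dt, lower = t_obs, upper = Inf, df = df)$value

# The same quantity from the built-in distribution function.
p_builtin <- pt(t_obs, df = df, lower.tail = FALSE)

# The two results should agree up to numerical integration error.
all.equal(p_integral, p_builtin, tolerance = 1e-4)
```

In practice pt() is the usual choice; the explicit integral is shown only to mirror the formula.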

Scenario: a new drug is tested for its effectiveness in reducing blood pressure.

Null Hypothesis (\(H_0\)): The drug has no effect on blood pressure (mean blood pressure change = 0).
Alternative Hypothesis (\(H_a\)): The drug has an effect on blood pressure (mean blood pressure change ≠ 0).
Patient data are sampled and used to compute the test statistic, \(t = 2.5\).

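Because the alternative hypothesis is two-sided, both tails count as extreme. Assuming the null distribution of the statistic is symmetric about zero, the p-value is calculated as shown: \[ p(x) = 2 \int_{2.5}^{+\infty} f(u) \, du \]
If the calculated p-value is less than the significance level (\(\alpha\)), in this case 0.05, then the null hypothesis is rejected, and the data are taken as statistically significant evidence that the drug affects blood pressure. Otherwise, the null hypothesis is not rejected.
In this scenario, the p-value quantifies the probability of observing a test statistic at least as extreme as 2.5 in absolute value, given that the drug has no effect.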
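
The scenario can be worked through numerically. This is a sketch under stated assumptions: the source gives only \(t = 2.5\) and \(\alpha = 0.05\), so the choice of a Student t reference distribution and the 29 degrees of freedom (a hypothetical sample of 30 patients) are assumptions made here for illustration.

```r
t_obs <- 2.5    # test statistic from the scenario
alpha <- 0.05   # significance level from the scenario
df    <- 29     # ASSUMPTION: 30 patients, so n - 1 = 29 degrees of freedom

# Two-sided p-value: probability that |T| >= |t_obs| under H0,
# obtained by doubling the upper-tail probability of the t distribution.
p_two_sided <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)

p_two_sided
p_two_sided < alpha   # TRUE means H0 is rejected at level alpha
```

With a different sample size the degrees of freedom, and therefore the p-value, would change, so the decision depends on more than the value of the statistic alone.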

Code to create a histogram of p-values

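library(ggplot2)  # provides ggplot(), geom_histogram(), labs()

set.seed(123)  # make the random draws reproducible
n_points <- 100
# Under a true null hypothesis (continuous statistic), p-values are Uniform(0, 1)
data <- data.frame(p_value = runif(n_points))

# Histogram with bin edges aligned to 0, 0.05, ..., 1
ggplot(data, aes(x = p_value)) +
  geom_histogram(binwidth = 0.05, boundary = 0, fill = "hotpink", color = "black") +
  labs(title = "P-Values under the Null Hypothesis",
       x = "P-Value",
       y = "Frequency")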

Work Cited

Wikimedia Foundation. (2023, September 8). P-value. Wikipedia. https://en.wikipedia.org/wiki/P-value