Statistics: P-Value

2024-02-04

What is a P-value

The P-value is a key quantity in hypothesis testing. It is calculated from a set of data and is the probability of observing a result at least as extreme as the one obtained, assuming the null hypothesis is true. A large P-value means the observed effect could plausibly be due to random chance; a small P-value is evidence against the null hypothesis.

Steps to calculate P-value

Determine the hypothesis to be tested.
Find the relevant variables, including the assumed population proportion, sample proportion, and sample size, and use them to compute the Z-value.
Calculate the P-value from the Z-value.
Alternatively, the P-value can be calculated using software.
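
As a rough illustration of the software route, the sketch below uses R's built-in `prop.test`. The counts (60 successes out of 100) and the null proportion of 0.5 are made-up values for illustration, not taken from the heart data.

```r
# Hypothetical data: 60 "successes" in a sample of 100 (illustrative only)
successes <- 60
n <- 100
p0 <- 0.5  # assumed population proportion under the null hypothesis

# One-sample test of a proportion; correct = FALSE turns off the
# continuity correction so the result matches the plain Z formula
result <- prop.test(successes, n, p = p0, correct = FALSE)

result$statistic  # chi-squared statistic, equal to Z^2 for this test
result$p.value    # two-sided P-value
```

With the continuity correction left on (the default), the P-value would be slightly larger than the one from the uncorrected Z formula.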

How test statistic Z is calculated

\[Z = {\hat{p} - p_0 \over \sqrt{p_0 (1-p_0) \over n}}\]

\(\hat{p}\) = sample proportion
\(p_0\) = assumed population proportion in the null hypothesis
\(n\) = sample size
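
A minimal R sketch of the test statistic, using the standard error \(\sqrt{p_0(1-p_0)/n}\). The function name `z_statistic` and the input values are hypothetical, chosen only to make the arithmetic easy to follow.

```r
# Z statistic for a one-sample test of a proportion
z_statistic <- function(p_hat, p0, n) {
  standard_error <- sqrt(p0 * (1 - p0) / n)  # n sits inside the square root
  (p_hat - p0) / standard_error
}

# Hypothetical inputs (illustrative only):
# (0.6 - 0.5) / sqrt(0.5 * 0.5 / 100) = 0.1 / 0.05, so Z is about 2
z <- z_statistic(p_hat = 0.6, p0 = 0.5, n = 100)
z
```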

How P-Value is calculated

\[Z = {\hat{p} - p_0 \over \sqrt{p_0 (1-p_0) \over n}}\]
Once Z has been found, we can look it up in a Z table to find the corresponding P-value.
The P-value will always be between 0 and 1.
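
In R, the Z-table lookup can be done with `pnorm`, the standard normal cumulative distribution function. The sketch below assumes a hypothetical Z value of 2 and shows the three usual choices of tail; which one applies depends on the alternative hypothesis.

```r
z <- 2  # hypothetical test statistic (illustrative only)

# pnorm() is the standard normal cumulative distribution function
p_left  <- pnorm(z)                      # P(Z <= z), left-tailed test
p_right <- pnorm(z, lower.tail = FALSE)  # P(Z >= z), right-tailed test
p_two   <- 2 * pnorm(-abs(z))            # two-sided test

c(left = p_left, right = p_right, two_sided = p_two)
```

Each value is a probability, so it always falls between 0 and 1.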

When we need a P-value

The P-value calculation described here is useful when working with data that follow a normal distribution, such as the following: the ages of participants in a study investigating risk factors for heart disease. We can see that the data form a “bell curve.”

This shows the same plot, with the bell shape made more apparent.

The highlighted portion is the “tail” of the curve, which is the region of interest: its area is the P-value.
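
The idea of a shaded tail can be sketched with base R graphics on a standard normal curve. This is not the original plot of the heart data; the cutoff `z_cut` is a hypothetical observed Z value chosen for illustration.

```r
z_cut <- 1.96  # hypothetical observed Z value (illustrative only)

# Draw the standard normal "bell curve"
curve(dnorm(x), from = -4, to = 4, xlab = "Z", ylab = "Density")

# Shade the right tail beyond z_cut
tail_x <- seq(z_cut, 4, length.out = 200)
polygon(c(z_cut, tail_x, 4), c(0, dnorm(tail_x), 0), col = "red", border = NA)

# The shaded area is the right-tailed P-value (about 0.025 here)
tail_area <- pnorm(z_cut, lower.tail = FALSE)
tail_area
```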

Example Calculating P-value

Let’s say we hypothesize that the mean (\(\mu\)) of our entire population is 56. First, we pick our sample size; for this example we will use 10. We also pick an \(\alpha\) value of .05.

Here is a sample of 10 ages, whose average we will calculate. We also count the rows of our data to get our assumed population size.

head(heart_data$age, 10)

 [1] 63 37 41 56 57 57 56 44 52 57

nrow(heart_data)

[1] 303

Example continued

So from the data, we have
n = 10
\(\bar{x}\) = 52
N = 303 (population size)
Computing the test statistic from these values and converting it to a P-value gives
p-value = .997
Since this is greater than \(\alpha\) = .05, we fail to reject our hypothesis.
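
Because this example concerns a mean rather than a proportion, one common alternative is a one-sample t-test, which estimates the spread from the sample itself. The sketch below applies R's built-in `t.test` to the ten ages printed earlier. It is a swapped-in technique, not necessarily the calculation behind the reported figure, so the resulting P-value need not match it.

```r
# The ten ages shown in the head() output
ages <- c(63, 37, 41, 56, 57, 57, 56, 44, 52, 57)

mean(ages)  # sample mean, 52

# Two-sided one-sample t-test of H0: mu = 56
result <- t.test(ages, mu = 56)
result$statistic  # t statistic, about -1.5
result$p.value    # P-value to compare against alpha

# Decision at alpha = .05
alpha <- 0.05
result$p.value > alpha  # TRUE means we fail to reject H0
```

Under this test the P-value is also above \(\alpha\), so the decision is the same: we fail to reject the hypothesis that the mean is 56.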