2024-09-21

What is the P-value

The p-value, or probability value, is the probability of seeing a result at least as extreme as the one we observed if nothing unusual is going on (that is, if the null hypothesis is true). This can be applied in a number of scenarios, such as the probability of students passing a test, or we can use the p-value to verify whether an experiment or treatment works. Probability can help us find the likelihood of a student getting an A on the next test based on their previous scores. It can also help us determine whether a treatment applied to a subject is working. By using the probability of an observation, we can determine its likelihood of happening in the future, and also its likelihood under different scenarios and conditions, provided we can obtain the new study's standard deviation and mean.

Basic Probability

Finding the probability of an event can be simple. For example, if we roll 2 dice and want the probability of a certain sum, we can list the sum for every combination of the two dice:

##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    2    3    4    5    6    7
## [2,]    3    4    5    6    7    8
## [3,]    4    5    6    7    8    9
## [4,]    5    6    7    8    9   10
## [5,]    6    7    8    9   10   11
## [6,]    7    8    9   10   11   12

P(8) = probability of getting 8 = 5/36

P(4) = probability of getting 4 = 3/36

P(4 or 8) = probability of 4 or 8 = (5/36) + (3/36) = 8/36

P(4 \(\cap\) !8) = probability of 4 on one roll and not 8 on a second, independent roll = 3/36 * (1 - 5/36). On a single roll a sum of 4 can never also be 8, so there the probability of 4 and not 8 is simply 3/36.

While this is useful, we will explore how the p-value is used with larger samples. For samples of fewer than 30 observations the t-test is recommended, for larger samples the z-test is more useful, and the F-test can also be applied.
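The dice table and the probabilities above can be reproduced with a few lines of R. This is a minimal sketch, not necessarily the code that produced the matrix; the names `sums` and `p_sum` are introduced here only for illustration.

```r
# All 36 sums of two fair six-sided dice (rows: first die, columns: second die)
sums <- outer(1:6, 1:6, "+")

# Hypothetical helper: probability that the two dice sum to `value`
p_sum <- function(value) mean(sums == value)

p8 <- p_sum(8)                           # 5 of the 36 cells
p4 <- p_sum(4)                           # 3 of the 36 cells
p4_or_8 <- mean(sums == 4 | sums == 8)   # the two events never overlap
p4_then_not8 <- p4 * (1 - p8)            # 4 on one roll, not 8 on a second roll

print(c(p8 = p8, p4 = p4, p4_or_8 = p4_or_8, p4_then_not8 = p4_then_not8))
```

Counting cells with `mean()` works because every cell of the matrix is equally likely.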

Representation of Dice values

If we look at the values in the dice matrix we can see the mean is 7, and the other values appear less frequently as we move away from the mean. In a similar fashion, the normal distribution curve has its mean at 7, with the frequency decreasing as we reach the ends.
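The shape described here can be checked by counting how often each sum appears. The following is a small sketch under the same fair-dice assumption; the variable names are illustrative.

```r
# Sums of two fair dice, flattened into a vector of 36 equally likely outcomes
sums <- as.vector(outer(1:6, 1:6, "+"))

# How often each sum appears among the 36 outcomes
freq <- table(sums)
print(freq)

# The mean of the sums and the most frequent sum
mean_sum <- mean(sums)
most_frequent <- as.integer(names(freq)[which.max(freq)])
print(c(mean = mean_sum, most_frequent = most_frequent))
```

The counts rise from 1 (for a sum of 2) to a peak at 7 and fall back to 1 (for a sum of 12), which is the symmetric, bell-like shape the paragraph compares to the normal curve.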

P-value in biology

In biology, the p-value can be used to determine whether an experiment works. If the probability is lower than the cutoff we choose, we conclude that what is happening is not normal: it is not something that would usually happen by chance.

In this case we will use the built-in data frame ‘iris’, taking the values for the species “setosa” as our control group and the value of 5.8 as the height of a flower that is part of a study testing whether a new fertilizer affects growth.
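The control-group statistics can be pulled from `iris` as sketched below. The text does not say which column was used; `Sepal.Length` is assumed here because its setosa mean and standard deviation match the 5.006 and .352 used in the tests that follow.

```r
# `iris` ships with base R (datasets package), so no extra packages are needed.
# Assumption: the measurement is Sepal.Length (the text does not name the column).
control <- iris$Sepal.Length[iris$Species == "setosa"]

control_mean <- mean(control)   # about 5.006
control_sd   <- sd(control)     # about 0.352
treated      <- 5.8             # the single fertilized flower from the text

print(c(n = length(control), mean = control_mean, sd = control_sd))
```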

How to get the P-value

There are various ways to get the p-value. Some include:

Z-test: as the name suggests, there is a chart called a z-table that gives the probability for a given z-value. z-value: \(z = \frac{x - \mu}{\sigma}\)

T-test: like the z-test, there is a t-table with probabilities for a given t-value. t-value: \(t = \frac{x - \mu}{\frac{\sigma}{\sqrt{n}}}\)

F-test: there is also an F-table for given f-values. The f-value is the ratio of the two variances. f-value: \(f = \frac{\sigma_1^2}{\sigma_2^2}\)

T-test

For the t-test and the rest of the tests we will use a standard deviation of growth of .352 and a mean growth of 5.006 for the control set, and a growth of 5.8 for the sample. This gives us a variance of \(.352^2\). For our confidence level we use 95 %, making our significance level .05.

t-value: \(t = \frac{5.8 - 5.006}{.352}\). In this case we use n = 1 as the sample size, since we are only testing one sample, so the denominator \(\frac{\sigma}{\sqrt{n}}\) reduces to .352. t-value = \(\frac{.794}{.352}\) = 2.256. In the t-table row for 1 degree of freedom, this t-value lies between the critical values for p-values of .25 and .1, so the p-value is between .1 and .25. Because this is greater than .05, a height of 5.8 counts as a normal occurrence, and by this test we cannot conclude that the fertilizer works.
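The table lookup can be replaced by R's `pt()` function, as in the sketch below. It assumes 1 degree of freedom, since that is the t-table row consistent with the reading given in the text, and a one-tailed p-value; both choices are assumptions rather than something the source states.

```r
x     <- 5.8     # observed value
mu    <- 5.006   # control mean
sigma <- 0.352   # control standard deviation
n     <- 1       # a single observation, as in the text

t_value <- (x - mu) / (sigma / sqrt(n))

# Assumption: 1 degree of freedom, one-tailed (area to the right of t)
df <- 1
p_one_tailed <- pt(t_value, df = df, lower.tail = FALSE)

alpha  <- 0.05
reject <- p_one_tailed < alpha   # TRUE would mean "unusual at the 5 % level"

print(c(t = t_value, p = p_one_tailed))
print(reject)
```

With these inputs the p-value falls between .1 and .25, so `reject` is `FALSE`, in line with the conclusion drawn from the t-table.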

Graph for the t-test

In the previous graph we can see the solid line on the right side of the curve. If we take this line as the cutoff at the 95 % confidence level, the t-test shows that our value is not outside that level. The probability of the value occurring is therefore within the range considered normal, so by this test the fertilizer is not shown to be working.

Z-test with p-value

The z-test, in the same manner, lets us choose the confidence level we want, which dictates the lowest probability we allow before we consider the result too unlikely to happen naturally.

z-value: \(z = \frac{5.8 - 5.006}{.352}\) = 2.256. The z-table gives the area to the left of this z-value, .9881, so subtracting it from 1 gives the probability of a value at least this large occurring normally: \(1 - .9881\) = .012. Since we want to cover the possibility of both a shorter and a taller result, we multiply by 2, making this a 2-tailed test. This makes the p-value .024 according to the z-table.

Since the probability is less than .05, we can conclude that the result would not normally happen, so the fertilizer is making a change.
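The same calculation can be done without a z-table using `pnorm()`, which returns the area to the left of a z-value under the standard normal curve. This is a minimal sketch using the numbers from the text; the variable names are illustrative.

```r
z_value <- (5.8 - 5.006) / 0.352

# pnorm() gives the area to the left of z, like the z-table
area_left    <- pnorm(z_value)
p_one_tailed <- 1 - area_left
p_two_tailed <- 2 * p_one_tailed   # cover both shorter and taller results

alpha <- 0.05
print(round(c(z = z_value, left = area_left, p_two_tailed = p_two_tailed), 4))
print(p_two_tailed < alpha)
```

Because `pnorm()` uses the unrounded z-value, its result can differ from the table value in the last digit, but the two-tailed p-value still rounds to .024.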

Graph for the z-test

The graph above shows two blue arrows which point towards the outsides of the curve. The probability of .012 we obtained is the area under the curve to the right of the solid red line. Since we did a two-tailed test there are 2 such areas, which is why we multiplied by 2: this gives the combined area on the right and left sides, covering the case where the value is shorter than the mean. The result is .024, which is less than .05, so by the z-test our sample can be considered outside the normal range and the fertilizer is working.

Conclusion

Overall, the p-value can be used in many situations. It is mainly useful when we have different confidence levels to test: the same p-value can be compared against each significance level, so we don't have to keep looking up new critical t, z, or f-values for every confidence level.

It is also useful to be able to choose between the different tests depending on the sample size. The F-test can be used in a similar fashion, and it serves a double purpose: it can verify that appropriate samples are used in an experiment. Testing the difference in variance between samples helps make sure the samples are similar and that the outcome is not overly influenced by the variability within a sample.
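The variance check described here could look like the following sketch. The text does not work an F-test example, so comparing the setosa and versicolor sepal lengths from `iris` is purely an illustrative assumption.

```r
# Assumption: two samples taken from the built-in iris data for illustration
setosa     <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

# F statistic as the ratio of the two sample variances
f_value <- var(setosa) / var(versicolor)

# var.test() from the stats package runs the F-test and reports a p-value
result <- var.test(setosa, versicolor)

print(f_value)
print(result$p.value)
```

A small p-value from `var.test()` would suggest the two samples differ in variability, which is the situation the paragraph warns about.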

As seen previously, we can use the p-value to test whether a treatment works. This is a form of hypothesis testing that can be used, for example, to confirm whether a weight loss program works, or to estimate the likelihood of future events.