Module 5: P-values and Hypothesis Testing

Learning Objectives:

A statistical test always involves two hypotheses, a p-value and a conclusion.

We decide whether or not to reject the null hypothesis based on a statistic called the “p-value”.

A p-value is the probability of observing a result at least as extreme as the one in our data if the null hypothesis of the study question were true. Thus, a low p-value (<0.05) means that an association between two variables as strong as the one we observed would rarely arise by chance alone, so we reject the null hypothesis in favor of the alternative hypothesis.
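To make the definition concrete, here is a minimal sketch that recomputes a p-value by hand. The two groups (`control`, `treated`) are simulated, hypothetical data and are not part of the course files. The sketch assumes a two-sided, equal-variance t-test.

```r
set.seed(1)

# Hypothetical data: 10 measurements per group (not from the course files)
control <- rnorm(10, mean = 5, sd = 1)
treated <- rnorm(10, mean = 6.5, sd = 1)

# Two-sided t-test assuming equal variances
res <- t.test(treated, control, var.equal = TRUE)

# Recompute the p-value from the t statistic and degrees of freedom:
# the probability of a t value at least this extreme (in either tail)
# if the null hypothesis of "no difference" were true
t_stat <- unname(res$statistic)
df <- unname(res$parameter)
p_manual <- 2 * pt(-abs(t_stat), df)

print(res$p.value)
print(p_manual)
```

Both printed values should agree, because `t.test()` computes its two-sided p-value from the same tail probability.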

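For example, let us consider a linear model as follows. First, we need to state the null and alternative hypotheses. Here, the null hypothesis is that treatment has no effect on concentration, and the alternative hypothesis is that it does.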

# Read the data (ExampleData.csv must be in your working directory)
ExampleData <- read.csv("ExampleData.csv")
# Fit a linear model of concentration as a function of treatment
mod1 <- lm(concentration ~ treatment, data = ExampleData)
# Print the coefficient table, including the p-values
summary(mod1)

Let us look at the p-values of the model. For the treatment term, we see a p-value of 3.34e-08, which is much lower than the conventional significance threshold of 0.05. This means that we have statistically significant evidence to reject the null hypothesis in favor of the alternative hypothesis.
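If you want to pull a p-value out of a model instead of reading it off the printed summary, something like the following sketch may help. It uses simulated data as a stand-in for ExampleData; the names `sim_data`, `sim_mod`, and the treatment levels "control" and "treated" are assumptions made for illustration only.

```r
set.seed(42)

# Hypothetical stand-in for ExampleData: two treatment levels, 15 rows each
sim_data <- data.frame(
  treatment = factor(rep(c("control", "treated"), each = 15)),
  concentration = c(rnorm(15, mean = 10, sd = 2),
                    rnorm(15, mean = 14, sd = 2))
)

sim_mod <- lm(concentration ~ treatment, data = sim_data)

# The coefficient table has one row per model term and a "Pr(>|t|)" column
coef_table <- summary(sim_mod)$coefficients
print(coef_table)

# p-value for the treatment term (row name = variable name + level name)
p_treatment <- coef_table["treatmenttreated", "Pr(>|t|)"]

# TRUE if the result is significant at the 0.05 threshold
print(p_treatment < 0.05)
```

The row name depends on the factor levels in your own data, so check `rownames(coef_table)` before indexing.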

Activity I:

Now you try! Use the First_Day file that you have been working on to find out whether raffinose has a significant effect on the concentration of bacteria being tested. (Remember, you must have converted the file to long format to proceed!)

#Enter your code here!

Answer the following questions:

Making Errors:

In theory, the null hypothesis is either true or false, but a statistical test cannot tell us which with certainty. The p-value only tells us how likely data like ours would be if the null hypothesis were true; it is not the probability that the null hypothesis is true. This means that we can make wrong inferences from statistical tests. There may be times that we reject the null hypothesis when it is actually true, or fail to reject it when it is actually false. These errors are called Type I and Type II errors.

Type I Error:

This is incorrectly rejecting the null hypothesis, i.e., the null hypothesis is actually true, but the statistical test led us to believe that it is false. This situation is analogous to getting a false positive on a test.
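A small simulation can show how often a Type I error happens. In this sketch, both groups are drawn from the same distribution, so the null hypothesis is true by construction, and every "significant" result is a false positive. The group size (20), the number of simulations (2000), and the 0.05 threshold are assumptions chosen for illustration.

```r
set.seed(123)

n_sims <- 2000   # number of simulated experiments (arbitrary choice)
alpha <- 0.05    # significance threshold

# Both groups come from the same normal distribution,
# so the null hypothesis of "no difference" is true in every experiment
p_values <- replicate(n_sims, {
  a <- rnorm(20)
  b <- rnorm(20)
  t.test(a, b)$p.value
})

# Fraction of experiments where we would wrongly reject the null
type1_rate <- mean(p_values < alpha)
print(type1_rate)
```

The printed rate should land close to the threshold itself, which is why the threshold is often described as the Type I error rate we are willing to tolerate.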

Type II Error:

This is incorrectly failing to reject the null hypothesis, i.e., the alternative hypothesis is actually true, but the statistical test has not picked up on the difference. This situation is analogous to getting a false negative on a test, and it is more likely when the sample size is small.

The only way to reduce both errors at the same time is to increase the sample size!
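To see the effect of sample size on the Type II error, here is a rough sketch using `power.t.test()` from base R's stats package. Power is the probability of correctly rejecting a false null hypothesis, so the Type II error rate is 1 minus the power. The effect size (`delta = 1`), standard deviation (`sd = 1`), and the sample sizes are hypothetical values chosen for illustration.

```r
# Sample sizes per group to compare (arbitrary choices)
sample_sizes <- c(5, 10, 20, 50)

# Power of a two-sample t-test at each sample size,
# holding the Type I error rate fixed at 0.05
power_by_n <- sapply(sample_sizes, function(n) {
  power.t.test(n = n, delta = 1, sd = 1, sig.level = 0.05)$power
})

# Type II error rate = 1 - power
type2_by_n <- 1 - power_by_n

print(data.frame(
  n = sample_sizes,
  power = round(power_by_n, 3),
  type2_error = round(type2_by_n, 3)
))
```

With the Type I error rate held fixed, the Type II error rate falls as the sample size per group grows.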

Activity II: