Module 5: P-values and Hypothesis Testing

Learning Objectives:

Understand what a p-value is and how it can be useful.
Understand how to state relevant hypotheses in experiments.
Understand the difference between Type I and Type II Errors.

A statistical test always involves two hypotheses, a p-value and a conclusion.

Null Hypothesis (H0): This is the default hypothesis. It states that the explanatory variable has no influence on the response variable.
Alternative Hypothesis (Ha): This hypothesis states that the explanatory variable does influence the response variable.

We decide whether or not to reject the null hypothesis based on a statistic called the “p-value”.

A p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is true. Thus, a low p-value (<0.05) means that a relationship as strong as the one we observed between two variables would rarely arise by chance alone if the null hypothesis were true. In that case, we reject the null hypothesis in favor of the alternative hypothesis.
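To make the definition concrete, here is a minimal sketch that computes a two-sided p-value by hand and compares it with R's built-in t-test. The data are simulated purely for illustration (the group names, sample sizes, and means are assumptions, not values from this course's data).

```r
# Simulated data (hypothetical): two groups of 10 measurements each.
set.seed(1)
control <- rnorm(10, mean = 5, sd = 1)
treated <- rnorm(10, mean = 6, sd = 1)

# Pooled-variance two-sample t statistic, computed by hand.
n1 <- length(control)
n2 <- length(treated)
pooled_var <- ((n1 - 1) * var(control) + (n2 - 1) * var(treated)) / (n1 + n2 - 2)
t_stat <- (mean(treated) - mean(control)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
df <- n1 + n2 - 2

# Two-sided p-value: the probability, if H0 is true, of a t statistic
# at least this far from zero in either direction.
p_manual <- 2 * pt(-abs(t_stat), df)

# The built-in equal-variance t-test should report the same p-value.
p_builtin <- t.test(treated, control, var.equal = TRUE)$p.value
print(c(manual = p_manual, builtin = p_builtin))
```

The two numbers should agree, which shows that the p-value is simply a tail probability of the test statistic under the null hypothesis.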

For example, let us fit a linear model to test whether an ascorbic acid treatment affects concentration. First, we need to state the null and alternative hypotheses.

H0: Ascorbic acid treatment has no effect on concentration.
Ha: Ascorbic acid treatment has an effect on concentration.

# Read the example data (ExampleData.csv must be in your working directory)
ExampleData <- read.csv("ExampleData.csv")

# Fit a linear model of concentration as a function of treatment
mod1 <- lm(concentration ~ treatment, data = ExampleData)

# Show the coefficient table, including the p-value for each term
summary(mod1)

Let us look at the p-values of the model. For the treatment term, we see a p-value of 3.34e-08, which is much lower than the conventional significance level of 0.05. This means that we have statistically significant evidence to reject the null hypothesis in favor of the alternative hypothesis.
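If you would rather pull the p-value out of the model than read it off the printed summary, one possible approach is sketched below. Because ExampleData.csv may not be available to you, this sketch uses a simulated data frame; the name SimData, the treatment labels, and the group means are all hypothetical stand-ins.

```r
# Simulated stand-in for the example data (hypothetical values).
set.seed(42)
SimData <- data.frame(
  treatment = factor(rep(c("control", "ascorbic_acid"), each = 15),
                     levels = c("control", "ascorbic_acid")),
  concentration = c(rnorm(15, mean = 10, sd = 1),
                    rnorm(15, mean = 13, sd = 1))
)

# Fit the same kind of model as above.
sim_mod <- lm(concentration ~ treatment, data = SimData)

# coef(summary(...)) returns the coefficient table as a matrix.
coef_table <- coef(summary(sim_mod))
print(coef_table)

# The row name is the variable name followed by the non-reference level;
# the p-value column is named "Pr(>|t|)".
p_treatment <- coef_table["treatmentascorbic_acid", "Pr(>|t|)"]

# Compare the p-value with the 0.05 significance level.
print(p_treatment < 0.05)
```

With your own data, the row name will depend on how the treatment levels are labeled, so print the coefficient table first to check it.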

Activity I:

Now you try! Use the First_Day file that you have been working on to find out whether or not raffinose has a significant effect on the concentration of bacteria being tested. (Remember, you must first convert the file to long format to proceed!)

#Enter your code here!

Answer the following questions:

How do you know whether raffinose has an influence on bacterial growth?
What are the null and alternative hypotheses in this case?
Does lactose have a significant effect on bacterial growth?

Making Errors:

In reality, the null hypothesis is either true or false, but a statistical test cannot tell us which with certainty. The p-value only tells us how likely data like ours would be if the null hypothesis were true. This means that we can make wrong inferences from statistical tests. There may be times when we reject the null hypothesis even though it is actually true, or fail to reject it even though it is actually false. These errors are called Type I and Type II Errors.

Type I Error:

This is incorrectly rejecting the null hypothesis, i.e., the null hypothesis is actually true, but the statistical test led us to believe that it is false. This situation is analogous to getting a false positive on a test.
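A small simulation can illustrate how often a Type I error happens. In the sketch below, both groups are drawn from the same distribution, so the null hypothesis is true by construction; the group size of 20 and the 2000 repetitions are arbitrary choices for illustration.

```r
# Simulate many experiments in which H0 is true (both groups identical).
set.seed(123)
n_sims <- 2000
p_values <- replicate(n_sims, {
  a <- rnorm(20)  # group A: mean 0, sd 1
  b <- rnorm(20)  # group B: same distribution as group A
  t.test(a, b)$p.value
})

# Fraction of experiments that (wrongly) reject H0 at the 0.05 level.
type1_rate <- mean(p_values < 0.05)
print(type1_rate)
```

The printed fraction should land close to 0.05: when the null hypothesis is true, the significance level is the long-run rate of Type I errors.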

Type II Error:

This is incorrectly failing to reject the null hypothesis, i.e., the alternative hypothesis is actually true, but the statistical test has not picked up on the difference. This situation is analogous to getting a false negative on a test. This error is more likely when sample sizes are small.

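For a fixed sample size, making a Type I error less likely (by using a stricter significance level) makes a Type II error more likely. The most reliable way to reduce both errors is to increase the sample size!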
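The effect of sample size on Type II errors can be sketched with the power.t.test function from base R's stats package. The true difference of 1 and the standard deviation of 1 are assumed values chosen only for illustration.

```r
# Sample sizes per group to compare (arbitrary illustrative choices).
sample_sizes <- c(5, 10, 20, 50)

# Power = probability of correctly rejecting H0 when the true
# difference between groups is 1 (assumed) with sd = 1 (assumed).
power <- sapply(sample_sizes, function(n) {
  power.t.test(n = n, delta = 1, sd = 1, sig.level = 0.05)$power
})

# The Type II error rate is 1 minus the power.
type2_rate <- 1 - power

print(data.frame(n_per_group = sample_sizes,
                 power = round(power, 3),
                 type2_rate = round(type2_rate, 3)))
```

The Type II error rate falls as the number of samples per group grows, while the Type I error rate stays fixed at the chosen significance level of 0.05.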

Activity II:

Refer to your previous activity to answer the following questions:
Did you end up rejecting the null hypothesis?
If your conclusion were wrong, what type of error would you be making?
If you suddenly found out that raffinose is a life-threatening substance that is going to be used in medical treatments, which error would you rather make: Type I or Type II?