Markdown Author: Audrey Salerno, 2022 Edited by: Jessie Bell, 2023

Practice Lab: Chapter 9 resources from Analysing Biological Data for t-tests

Running a Null Hypothesis Significance Test (NHST)

What we are doing when we run a t-test is null hypothesis significance testing.

  1. Look at the data. (It is informative to visually examine data prior to running an NHST.) Is it normal? Was the sample random?

  2. State the statistical hypotheses

    • Null hypothesis (depends on type of t-test)
    • Alternative (all other possibilities)

  3. Determine the degrees of freedom

    • df=n−1 for 1-sample t-test
    • df=n1+n2−2 for 2-sample t-test (n1 and n2 are the two sample sizes; this assumes equal variances)
    • df=Number of pairs - 1 for a paired t-test
  4. Choose criterion to reject/fail to reject H0

    For instance, set α=0.05. If p≤0.05 then reject H0, else fail to reject H0

  5. Calculate t

    The test statistic for all t-tests is t_calc (or t_stat). However, we calculate this value slightly differently for one-sample, paired, and two-sample t-tests.

  6. Make a decision about H0

    • Two options: ‘reject H0’ or ‘fail to reject H0’

    • ‘If the p is low, the null must go’

  7. Write a conclusion sentence

    Compose a summary sentence that makes a conclusion responding to the initial question. Read the following example: The mean length of salmon differed between Chuckanut and Squalicum Creek (two-sample t-test, \(t_{(2),36}=3.12\), \(p=0.004\)).

Format is \(t_{(\text{tails}),\text{df}}\) = t-test statistic, \(p\) = calculated p-value
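Steps 3 to 6 above can be sketched by hand for a one-sample test and then checked against t.test(). The data, the null value of 30, and the variable names are made up for illustration; this is a sketch, not part of the original lab.

```r
# Hypothetical data: 8 measurements, testing H0: mu = 30 (two-sided)
x     <- c(31.2, 29.8, 33.1, 30.5, 32.4, 28.9, 31.7, 32.0)
mu0   <- 30
alpha <- 0.05

# Step 3: degrees of freedom for a 1-sample t-test
n  <- length(x)
df <- n - 1

# Step 5: t = (sample mean - hypothesised mean) / standard error
t_calc <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Two-sided p-value from the t-distribution with df degrees of freedom
p <- 2 * pt(abs(t_calc), df = df, lower.tail = FALSE)

# Step 6: decision using the criterion from step 4
decision <- if (p <= alpha) "reject H0" else "fail to reject H0"

# t.test() should give the same t, df, and p
check <- t.test(x, mu = mu0)
c(t_by_hand = t_calc, t_from_t.test = unname(check$statistic))
c(p_by_hand = p, p_from_t.test = check$p.value)
decision
```

The values of df, t_calc, and p are what go into the conclusion sentence in step 7.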

t.test() arguments

Using the ‘arguments’ in t.test() you can run any kind of t-test: a one-sided 1-sample t-test, a two-sided paired t-test, a two-sided 2-sample t-test, etc. Recall that many arguments have default values. Read about the t.test() arguments on the help page. Examine ‘x,’ ‘y,’ ‘mu,’ ‘paired,’ and ‘var.equal.’ Following are some specific directions for each type of test.

t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE)

1-sample t-test

  • The one-sample t-test compares the mean of a random sample from a normal population with the population mean proposed in a null hypothesis.

  • The first information to enter into the function is the vector of data (this is x)

  • If the hypothetical μ=0, no other information is needed

  • If the hypothetical μ is another value like 8, specify as ‘mu = 8’.
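As a rough sketch of the bullets above, here is a one-sample call with and without ‘mu’. The vector x is made-up data, and the object names are only for illustration.

```r
# Hypothetical sample of 10 measurements
x <- c(7.9, 8.4, 9.1, 8.8, 7.6, 9.3, 8.7, 8.2, 9.0, 8.5)

# Default: tests H0: mu = 0 because mu = 0 is the default value
res_default <- t.test(x)

# Hypothetical mean of 8: tests H0: mu = 8
res_mu8 <- t.test(x, mu = 8)

res_mu8$statistic   # t_calc
res_mu8$parameter   # degrees of freedom (n - 1)
res_mu8$p.value     # two-sided p-value by default
```

Printing res_mu8 on its own shows the full summary, including the confidence interval for the mean.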

paired t-test

  • In this test we are asking whether the mean of the differences within pairs is 0.

  • The two vectors of data are added first, separated by a comma (one is ‘x’ and the other is ‘y’). R will pair them in order when ‘paired = TRUE’

  • If the hypothetical mean difference is 0, no other information is needed (‘mu’ is the proposed mean of the paired differences). If it is another value like 8, specify as ‘mu = 8’.
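A minimal sketch with made-up ‘before’ and ‘after’ measurements on the same six individuals. It also shows that a paired t-test gives the same answer as a one-sample t-test on the differences, which is a handy way to remember where df = number of pairs - 1 comes from.

```r
# Hypothetical paired data: element i of each vector is the same individual
before <- c(12.1, 11.4, 13.0, 12.7, 10.9, 11.8)
after  <- c(12.9, 11.9, 13.1, 13.5, 11.6, 12.0)

# Paired t-test: R pairs the vectors in order
res_paired <- t.test(before, after, paired = TRUE)

# Same test written as a one-sample t-test on the differences
res_diff <- t.test(before - after, mu = 0)

res_paired$parameter                                  # number of pairs - 1
c(paired = res_paired$p.value, one_sample = res_diff$p.value)
```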

2-sample t-test

  • The two-sample t-test compares the means of a numerical variable between two populations.

  • The two vectors of data are added first, separated by a comma (one is ‘x’ and the other is ‘y’). R will treat them as two different samples when ‘paired = FALSE’

  • Specify ‘var.equal = TRUE’ to run the pooled-variance test. With the default ‘var.equal = FALSE’, R applies the Welch correction for unequal variances.

  • If the hypothetical difference between the means is 0, no other information is needed (‘mu’ is the proposed difference between the means). If it is another value like 8, specify as ‘mu = 8’.
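A short sketch of the two-sample call using made-up data for two groups, run once with ‘var.equal = TRUE’ and once with the default. The group names and values are hypothetical.

```r
# Hypothetical measurements from two independent groups
group_a <- c(24.1, 26.3, 25.0, 27.2, 23.8, 26.9, 25.5)
group_b <- c(21.7, 23.9, 22.4, 20.8, 24.6, 22.0)

# Pooled-variance two-sample t-test
pooled <- t.test(group_a, group_b, var.equal = TRUE)

# Default (var.equal = FALSE): Welch correction for unequal variances
welch <- t.test(group_a, group_b)

pooled$parameter   # df = n1 + n2 - 2
welch$parameter    # adjusted df, usually not a whole number
c(pooled = pooled$p.value, welch = welch$p.value)
```

With ‘var.equal = TRUE’ the degrees of freedom match the n1 + n2 - 2 rule; the Welch version reports a smaller or equal df.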

two- vs one-tailed test

This argument is ‘alternative’ and it has three options:

  • alternative = "two.sided" # two-sided test H0: \(\mu\) = 0, Ha: \(\mu\) ≠ 0
  • alternative = "less" # one-sided test H0: \(\mu\ge\) 0, Ha: \(\mu\) < 0
  • alternative = "greater" # one-sided test H0: \(\mu\le\) 0, Ha: \(\mu\) > 0
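To see how the three options relate, here is a sketch that runs the same one-sample test three ways. The data and the null value of 8 are made up for illustration.

```r
# Hypothetical sample, testing against mu = 8
x <- c(7.9, 8.4, 9.1, 8.8, 7.6, 9.3, 8.7, 8.2, 9.0, 8.5)

two_sided <- t.test(x, mu = 8, alternative = "two.sided")
greater   <- t.test(x, mu = 8, alternative = "greater")
less      <- t.test(x, mu = 8, alternative = "less")

# t_calc is identical in all three; only the p-value changes
c(two_sided = two_sided$p.value,
  greater   = greater$p.value,
  less      = less$p.value)

# The two one-sided p-values add to 1, and the two-sided p-value
# is twice the smaller one-sided p-value
greater$p.value + less$p.value
2 * min(greater$p.value, less$p.value)
```

This is why the direction of the alternative should be chosen before looking at the data: picking the side afterwards halves the p-value.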

Distributions

What is a null distribution?

The null distribution is the probability distribution of the test statistic when the null hypothesis is true. This is different for each question and each set of hypotheses you have.

Here is the information straight from Lab #4… (thanks Robin!)

This example about eating pizza comes from the web (thanks Stephanie Glen, a contributing writer at DataScienceCentral.com). Let's imagine you were opening a pizza restaurant in Bellingham, and you wanted to know if offering all-you-can-eat pizza would be cost effective. Before you jump in, you do a bit of research. You get data from a national chain that suggests that on average, people visiting their restaurants eat 4 slices per visit. So your null hypothesis is that people in Bellingham that go out for pizza eat 4 slices per person per visit.

You go back and get the full set of data used to calculate the mean of 4 (a dataset on pizza eating with an n=1000 observations across the country) and you calculate a standard deviation of 1. To plot the null distribution, you can use dnorm().

4 slices is the most likely number of slices, but do you trust this estimate to be true for Bellingham? We are a college town and college students love all-you-can-eat pizza. You decide to get some real data from Bellingham to compare against the national average. To do this you go to Fiamma Pizza in downtown Bellingham and take some data over the course of a week. Your data suggest that the average number of slices eaten per person at this pizza restaurant in Bellingham is actually 5.6. How different is this from the expected value of 4 slices/person? Is this difference meaningful?

So we have two hypotheses. Hnull: People in Bellingham that go out for pizza eat 4 slices per person per visit. Halt: People in Bellingham that go out for pizza eat more than 4 slices per person per visit.

Because our national data indicate a mean of 4 and an sd of 1, those are the values we use to create the null distribution: if the null is true, this is the curve we would expect.

curve(dnorm(x,4,1),0,8)
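One way to get a feel for "how different is 5.6 from 4" is to mark the observed value on the null curve and read off the area in the upper tail. This sketch treats the null curve exactly as drawn above (mean 4, sd 1) and ignores the sample size of the Bellingham data, so it only illustrates reading a tail area; it is not a full test.

```r
null_mean <- 4
null_sd   <- 1
observed  <- 5.6

# Null distribution with the observed value marked
curve(dnorm(x, null_mean, null_sd), 0, 8, ylab = "density")
abline(v = observed, lty = 2)

# Area under the null curve at or above the observed value
# (upper tail, because Halt says "more than 4")
upper_tail <- pnorm(observed, mean = null_mean, sd = null_sd,
                    lower.tail = FALSE)
upper_tail   # about 0.055
```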

What is a normal distribution?

What are characteristics of a normal distribution? (You should be able to answer this) (Our null distribution above is an example of a normal distribution)

Here are just some examples of normal distributions. Note how changes in the mean and sd change the location and shape of the curve. They are all still normal even though the x-limits are not symmetric around the mean.

curve(dnorm(x,17,1),-5,25, col='green')
curve(dnorm(x,10,7),-5,25, add=TRUE)
curve(dnorm(x,5,3),-5,25, col='red', add=TRUE)
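One property the three curves share, whatever their mean and sd, is the proportion of the area that falls within 1 or 2 standard deviations of the mean. A quick check with pnorm(), using the same three mean and sd pairs as the plot (the object names are just for this sketch):

```r
# The three (mean, sd) pairs plotted above
means <- c(17, 10, 5)
sds   <- c(1, 7, 3)

# Probability of landing within 1 sd and within 2 sd of the mean
within_1sd <- pnorm(means + sds, means, sds) -
              pnorm(means - sds, means, sds)
within_2sd <- pnorm(means + 2 * sds, means, sds) -
              pnorm(means - 2 * sds, means, sds)

round(within_1sd, 3)   # the same value for all three curves (about 0.683)
round(within_2sd, 3)   # again the same for all three (about 0.954)
```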

Next question…

What is a standard normal distribution??? What is the mean? Standard deviation?

curve(dnorm(x,0,1),-3,3)
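Any normal distribution can be moved onto the standard normal scale by subtracting the mean and dividing by the sd (a z-score). A small sketch using the pizza null distribution (mean 4, sd 1); the slice counts chosen here are arbitrary examples.

```r
null_mean <- 4
null_sd   <- 1

# Convert slice counts to z-scores
slices <- c(2, 4, 5.6)
z <- (slices - null_mean) / null_sd
z   # -2.0, 0.0, 1.6

# The tail area is the same on either scale
p_raw <- pnorm(5.6, mean = null_mean, sd = null_sd, lower.tail = FALSE)
p_z   <- pnorm(1.6, lower.tail = FALSE)   # standard normal: mean 0, sd 1
all.equal(p_raw, p_z)
```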

T Distribution

T distributions are continuous. They help in estimating the mean of a normally distributed population when the sample size is small and the population’s standard deviation is unknown.

How is it different from a Z-distribution? It is a more conservative form of the standard normal distribution (the standard normal is also known as the z-distribution).

A t-distribution gives a lower probability to the center and a higher probability to the tails than the standard normal distribution.

#create density plots
curve(dt(x, df=6), from=-4, to=4, col='steelblue') 
curve(dt(x, df=10), from=-4, to=4, col='pink', add=TRUE)
curve(dt(x, df=30), from=-4, to=4, col='red', add=TRUE)

#add legend
legend(-4, .3, legend=c("df=6", "df=10", "df=30"),
       col=c("steelblue", "pink", "red"), lty=1, cex=1.2)

The degrees of freedom change the shape of the t-distribution because the population variance is being estimated from the sample, and with fewer degrees of freedom that estimate is less certain, so the tails are heavier. As you increase df, the curve gets closer and closer to a standard normal distribution.
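The same pattern shows up in numbers. This sketch compares the two-sided critical value at α=0.05 for several df with the standard normal value, and the height of each curve at the center; the df values beyond those plotted above are added only for illustration.

```r
dfs <- c(6, 10, 30, 1000)

# Critical values that cut off 2.5% in each tail
t_crit <- qt(0.975, df = dfs)
z_crit <- qnorm(0.975)

data.frame(df = dfs, t_crit = round(t_crit, 3))
round(z_crit, 3)   # 1.96

# Height at the center: lower for t, approaching the normal as df grows
center_t <- dt(0, df = dfs)
center_z <- dnorm(0)
round(center_t, 4)
round(center_z, 4)
```

A larger critical value at small df means you need a larger t_calc to reject H0, which is what "more conservative" means here.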

Reflection

I was going to give you some practice examples, but I am going to start you all off with this. What I’ve made for you is something I would make for myself as a guide if I was studying for this exam. For all the other information that is necessary for the exam, I recommend creating your own guide for understanding the material that you are still confused about. Reach out when you need help and read your book and review the lectures.

Jessie’s textbook notes