Biological Sampling and Interpretation

August 20, 2019

Good morning!

Week 6

Midsemester test nightmare

What we will do today

A little simulation of why we divide by \(n-1\), not \(n\) when calculating the standard deviation
Quick recap of the box1-box2 example
What is a t-test?
What different t-tests exist?
What is a t-distribution?
What is one-tailed vs. two-tailed testing?
How to do a t-test in R
What assumptions need to be met?

Why we divide by \(n-1\), not \(n\) when calculating the standard deviation

pop = rnorm(100000) #base population

sd1 = NULL #initial sample 1 (with nothing in it)
sd2 = NULL #initial sample 2 (with nothing in it)

for (i in 1:1000) { #take 1000 samples, append them to sd1 and sd2
  s1 = sample(pop, 5) #take a sample of 5 from the population
  sd1 = append(sd1, sqrt(sum((s1 - mean(s1))^2)/(length(s1) - 1))) #sd with n-1
  sd2 = append(sd2, sqrt(sum((s1 - mean(s1))^2)/length(s1))) #sd with n
}

par(mfrow = c(1, 2)) #change parameter settings to a 1x2 plotting area
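hist(sd1, xlim = c(0, 2)) #draw a histogram of the sd estimates using n-1
abline(v = 1, col = 'red') #true sd of the population
abline(v = mean(sd1), col = 'green') #mean of the estimates using n-1
hist(sd2, xlim = c(0, 2)) #draw a histogram of the sd estimates using n
abline(v = 1, col = 'red') #true sd of the population
abline(v = mean(sd2), col = 'green') #mean of the estimates using n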
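
The same idea can be checked numerically instead of by eye. The following is a minimal sketch, not part of the original simulation: the seed, the number of samples (10000) and the names samples, ss, var_n1 and var_n are assumptions chosen for illustration. It compares the two divisors on the variance scale, where the effect is cleanest (on the standard deviation scale, even the n-1 version stays slightly below 1 on average, but it is closer than the n version).

```r
set.seed(1) # fixed seed (an assumption) so that the run is reproducible
pop <- rnorm(100000) # base population with a true variance (and sd) of about 1

# draw 10000 samples of size 5; each column of 'samples' is one sample
samples <- replicate(10000, sample(pop, 5))

# sum of squared deviations from the sample mean, one value per sample
ss <- apply(samples, 2, function(s) sum((s - mean(s))^2))

var_n1 <- ss / (5 - 1) # variance estimate dividing by n - 1
var_n  <- ss / 5       # variance estimate dividing by n

mean(var_n1) # close to 1, the true variance
mean(var_n)  # close to 0.8, i.e. systematically too small
```

Dividing by n underestimates the variance on average because the deviations are measured from the sample mean, which is itself fitted to the sample.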

Recap week 5

pnorm(q = 160, mean = 164, sd = 6)
[1] 0.2524925

Is a value that we'd get 25% of the time by chance rare?

The story with the two boxes

The simplest experiment

Only one response variable, one predictor
The predictor variable is binary (two levels, e.g. treated, non-treated)
For example, we can ask if the movie 'Scream 2' is scarier than the original 'Scream'.
We could measure heart rates (which indicate anxiety) during both films and compare them.

This situation can be analysed with a t-test:

scream1 = c(180, 165, 122, 156, 170) #max heart rates scream1
scream2 = c(190, 145, 100, 138, 166) #max heart rates scream2
t.test(scream1, scream2)

The t-test

Independent t-test (or simply t-test)
- Compares two means based on independent data.
- E.g. data from different groups of people
Paired (or dependent) t-test
- Compares two means based on related data.
- E.g. data from the same people measured at different times.
- Data from ‘matched’ samples (e.g. before - after)
One-tailed vs. two-tailed testing

Rationale for the t-test (1)

Two samples are collected and the sample means calculated. These means might differ by either a little or a lot. Our null hypothesis is: There is no difference between the means of the populations that the samples come from
If the samples come from the same population, then we expect their means to be roughly equal (give or take a little due to chance)
We compare the difference between the sample means that we collected to the difference between the sample means that we would expect to obtain if there were no effect (i.e. if the null hypothesis were true). We use the standard deviation(s) as a gauge of the variability in our samples. Why does the spread in the two samples matter?

Rationale for the t-test (2)

If the difference between the samples we have collected is unusually large then we can assume one of two things:
- There is no 'effect' (difference between samples) but sample means in our population fluctuate a lot and we have, by chance, collected at least one or two atypical samples.
- OR: the two samples come from different populations and are typical of their respective parent population. In this scenario, the difference between samples represents a genuine difference between the populations (and so the null hypothesis is incorrect).

The larger the observed difference between the sample means and the smaller the spread around the means, the more confident we become that the second scenario (above) is correct (i.e. that the null hypothesis should be rejected).

We need an objective metric that takes into account both the difference between the samples AND their standard deviations.

Rationale for the t-test (in brief)

We need a metric (a test statistic!) that puts the difference between the samples into perspective with

the difference between the samples that we would expect by chance, and
the standard deviations of the two samples

This is called the t-statistic:

\[t = \frac{\text{observed difference} - \text{expected difference}}{\text{standard error of the difference (estimated from the standard deviations)}}\]

In most cases, the expected difference under the null hypothesis is zero (this is the case in the following examples)

The t-statistic, the test statistic for a t-test

\[ t = \frac{\bar{X}_1-\bar{X}_2}{\sqrt{\frac{s^2_p}{n_1} + \frac{s^2_p}{n_2}}} \]

\[ s^2_p = \frac{(n_1 - 1)s^2_1 + (n_2 - 1)s^2_2}{n_1 + n_2 -2} \]
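
To make the two formulas concrete, here is a small sketch that computes the pooled variance and the t-statistic by hand and compares the result with t.test(). It uses the anxiety scores from the arachnophobia example in this lecture; the names n1, n2, s2p, t_manual and t_r are chosen here for illustration only.

```r
# anxiety scores from the arachnophobia example
realspider <- c(3, 5, 3, 7, 8, 5)
spiderpicture <- c(5, 6, 3, 8, 7, 8)

n1 <- length(realspider)
n2 <- length(spiderpicture)

# pooled variance: the two sample variances weighted by their degrees of freedom
s2p <- ((n1 - 1) * var(realspider) + (n2 - 1) * var(spiderpicture)) / (n1 + n2 - 2)

# t-statistic: difference between the means divided by its standard error
t_manual <- (mean(realspider) - mean(spiderpicture)) / sqrt(s2p / n1 + s2p / n2)
t_manual

# t.test() only uses the pooled variance when var.equal = TRUE
t_r <- t.test(realspider, spiderpicture, var.equal = TRUE)$statistic
t_r
```

Both values should agree. Note that var.equal = TRUE is not the default in t.test(), so it has to be set explicitly to reproduce the pooled formula.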

The t-distribution

The t-distribution has its own equivalents of rnorm(), pnorm(), and qnorm(): rt(), pt(), and qt(), which take the degrees of freedom (df) as an argument:

rt(100, df = 10); pt(q = 0, df = 10); qt(p = .025, df = 10)
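
A short sketch of how the t-distribution relates to the normal distribution: with few degrees of freedom the tails are heavier, so the 2.5% cut-off lies further from zero, and it approaches the normal cut-off as df grows. The chosen df values and the names dfs, crit_t and crit_z are assumptions for illustration.

```r
dfs <- c(2, 5, 10, 30, 1000) # a few example degrees of freedom

crit_t <- qt(p = 0.025, df = dfs) # lower 2.5% cut-off of each t-distribution
crit_z <- qnorm(p = 0.025)        # the same cut-off for the standard normal

# side-by-side comparison: crit_t moves towards crit_z as df increases
data.frame(df = dfs, crit_t = crit_t, crit_z = crit_z)

# like the normal distribution, the t-distribution is symmetric around 0
pt(q = 0, df = 10)
```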

Arachnophobia example

Is arachnophobia (fear of spiders) specific to real spiders or is a picture enough?
Participants
- 12 arachnophobic individuals
Manipulation
- 6 participants were exposed to a real spider
- 6 were exposed to a picture of the same spider
Response variable: anxiety (using an imaginary anxiety meter…)
Our null hypothesis is: There is no difference in anxiety between seeing a real spider or a picture of a spider

realspider = c(3, 5, 3, 7, 8, 5)
spiderpicture = c(5, 6, 3, 8, 7, 8)

Arachnophobia example

You can now organise your data in two ways, the so-called 'wide' or 'long' format:

d_wide <- data.frame(realspider, spiderpicture)
d_long <- data.frame(treat = c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1),
                     anxiety = c(realspider, spiderpicture))
head(d_wide, 3)
  realspider spiderpicture
1          3             5
2          5             6
3          3             3
head(d_long, 3)
  treat anxiety
1     0       3
2     0       5
3     0       3
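
If the data already exist in the wide format, they do not have to be typed in again. One possible way (a sketch, there are others) is the base R function stack(). The name d_long2 and the renaming step are choices made here for illustration; unlike d_long above, the treatment column then holds the original column names rather than 0 and 1.

```r
realspider <- c(3, 5, 3, 7, 8, 5)
spiderpicture <- c(5, 6, 3, 8, 7, 8)
d_wide <- data.frame(realspider, spiderpicture)

# stack() puts all values into one column ('values') and records the
# name of the column they came from in a second column ('ind')
d_long2 <- stack(d_wide)

# rename the columns so that they match the long format used above
names(d_long2) <- c("anxiety", "treat")

head(d_long2, 3)
table(d_long2$treat) # six observations per treatment
```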

The t-test in R

To do a t-test we use the function t.test()
Depending on the format of your data, you can use this function in two ways:

t.test(d_wide$realspider, d_wide$spiderpicture)
t.test(d_long$anxiety ~ d_long$treat) #or:
t.test(anxiety ~ treat, data = d_long)

What does the dollar sign do again…?

I recommend the long format: it is more versatile and easier to use when you have a lot of data!

The result however looks the same:

The t-test in R

t.test(d_long$anxiety ~ d_long$treat)

    Welch Two Sample t-test

data:  d_long$anxiety by d_long$treat
t = -0.86966, df = 9.9746, p-value = 0.4049
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -3.562974  1.562974
sample estimates:
mean in group 0 mean in group 1 
       5.166667        6.166667

How do we interpret this output?

Comparing the t-value against a random t-distribution

Apart from a restatement of the data, the group means, and a confidence interval for the difference between groups, we obtain a t-value, the degrees of freedom, and a p-value.
We now compare our obtained t-value against a random t-distribution: how rare is our t-value, which reflects the difference between the groups and their standard deviations?

pt(q = -0.87, df = 10)
[1] 0.20235

This is half of our p-value: because the test is two-tailed, we multiply it by 2 to account for both tails (2 × 0.202 ≈ 0.405, which matches the p-value of 0.4049 reported by t.test()).
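
The two-tailed p-value can also be reconstructed from the exact t-value and degrees of freedom stored in the result of t.test(), rather than from rounded numbers. This is a sketch; the names res, t_val, df_val and p_two are chosen here for illustration.

```r
realspider <- c(3, 5, 3, 7, 8, 5)
spiderpicture <- c(5, 6, 3, 8, 7, 8)

res <- t.test(realspider, spiderpicture) # Welch t-test, as in the output above

t_val <- unname(res$statistic)  # the t-value
df_val <- unname(res$parameter) # the (Welch-corrected) degrees of freedom

# area in the lower tail beyond -|t|, doubled to cover both tails
p_two <- 2 * pt(q = -abs(t_val), df = df_val)

p_two
res$p.value # the p-value reported by t.test(), for comparison
```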

The t-test in R

qt(p = .025, df = 10)
[1] -2.228139
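
The value returned by qt() can be read as a critical value: in a two-tailed test at the 5% level, a t-value counts as rare only if it lies further from zero than this cut-off. A small sketch using the t-value and degrees of freedom from the t.test() output above (the names t_val, df_val and crit are chosen here for illustration):

```r
t_val <- -0.86966 # t-value from the t.test() output
df_val <- 9.9746  # degrees of freedom from the t.test() output

crit <- qt(p = 0.025, df = df_val) # lower 2.5% cut-off
crit

# is our t-value more extreme than the cut-off?
abs(t_val) > abs(crit) # FALSE: the t-value is not rare at the 5% level
```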

Arachnophobia example: our conclusion

We fail to reject our null hypothesis, but we cannot accept it either

We cannot state that there is no difference in anxiety between seeing a real spider or a picture of a spider

We can only say that we don't have enough evidence to reject the null hypothesis, as we could be committing a type II error, and we don't know the probability for such an error (\(\beta\), see slides on power test!)

How to report the results of a t-test

On average, participants did not experience greater anxiety from real spiders (5.2 \(\pm\) 0.83 s.e.) than from pictures of spiders (6.2 \(\pm\) 0.79 s.e.; t-test, p = 0.4).

or, if it were significant:

On average, participants did experience greater anxiety from real spiders (xy \(\pm\) xy s.e.) than from pictures of spiders (xy \(\pm\) x s.e.; t-test, p < xyz).
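
The means and standard errors needed for such a sentence can be computed directly from the data. This is a minimal sketch; se() is a small helper function defined here (base R has no function of that name), and the standard error is taken as the standard deviation divided by the square root of the sample size.

```r
realspider <- c(3, 5, 3, 7, 8, 5)
spiderpicture <- c(5, 6, 3, 8, 7, 8)

# helper (defined here for illustration): standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# mean and standard error per group, rounded for reporting
round(c(mean = mean(realspider), se = se(realspider)), 2)
round(c(mean = mean(spiderpicture), se = se(spiderpicture)), 2)
```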

Paired t-test: example

Does plant transpiration respond to treatment with carbon dioxide?
Null hypothesis: plant transpiration is not affected by carbon dioxide
Measure transpiration in 12 leaves before and after treatment with carbon dioxide
The measurements are paired: the 12 leaves are the same before and after the treatment
Could you create a made-up data frame for this example?

Paired t-test: example

d1 = data.frame(transpiration = c(2, 4, 3, 4, 3, 5, 5, 4, 3, 6, 5, 4, 1, 2, 1, 4, 2, 3, 4, 3, 3, 2, 1, 4),
                co2 = rep(c('before', 'after'), each = 12))
head(d1)
  transpiration    co2
1             2 before
2             4 before
3             3 before
4             4 before
5             3 before
6             5 before

Paired t-test: example

t.test(d1$transpiration ~ d1$co2, paired = TRUE) #set argument paired to TRUE

    Paired t-test

data:  d1$transpiration by d1$co2
t = -3.7607, df = 11, p-value = 0.003151
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -2.3778894 -0.6221106
sample estimates:
mean of the differences 
                   -1.5 
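
As far as I am aware, R versions from 4.4.0 onwards no longer accept the paired argument in the formula interface, so the call above may stop with an error on a recent installation. A sketch that should work regardless of the version passes the two sets of measurements as separate vectors. The names before, after, res_paired and res_diff are chosen here for illustration; the values are those of d1. The order 'after, before' is used so that the difference is after minus before, as in the output above.

```r
# the same made-up transpiration values as in d1, one vector per time point
before <- c(2, 4, 3, 4, 3, 5, 5, 4, 3, 6, 5, 4)
after  <- c(1, 2, 1, 4, 2, 3, 4, 3, 3, 2, 1, 4)

# paired t-test: leaf i in 'after' is matched with leaf i in 'before'
res_paired <- t.test(after, before, paired = TRUE)
res_paired

# a paired t-test is a one-sample t-test on the differences (tested against 0)
res_diff <- t.test(after - before)
res_diff$statistic
```

This also shows why the order of the rows matters in a paired design: the test works on the leaf-by-leaf differences, so the i-th value in both vectors must belong to the same leaf.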

One- vs. two-tailed tests

Depending on whether we expect differences between groups to occur in both directions or only in one direction, we use 1- or 2-tailed t-tests
In the spider example, differences could occur in both directions (those shown a picture could be more afraid): 2-tailed
If you test whether carrying backpacks makes people shorter (paired, before and after): 1-tailed (carrying backpacks can’t make you taller!)

In R, choose alternative = 'two.sided', 'greater' or 'less' (by default the argument is set to 'two.sided'):

t.test(d1$transpiration ~ d1$co2, paired = T, alternative = 'greater')
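
Which of 'greater' and 'less' is the right choice depends on the order in which the two groups enter the test, so it is worth checking the direction explicitly. The sketch below uses the same made-up transpiration values as d1, passed as two vectors (the names before, after, p_two, p_less and p_greater are chosen here for illustration). With the order 'after, before', the tested difference is after minus before.

```r
before <- c(2, 4, 3, 4, 3, 5, 5, 4, 3, 6, 5, 4)
after  <- c(1, 2, 1, 4, 2, 3, 4, 3, 3, 2, 1, 4)

# two-tailed: is the mean difference (after - before) different from 0?
p_two <- t.test(after, before, paired = TRUE)$p.value

# one-tailed: is the mean difference (after - before) smaller than 0?
p_less <- t.test(after, before, paired = TRUE, alternative = "less")$p.value

# one-tailed in the other direction: is the mean difference larger than 0?
p_greater <- t.test(after, before, paired = TRUE, alternative = "greater")$p.value

c(two.sided = p_two, less = p_less, greater = p_greater)
```

When the observed difference points in the hypothesised direction, the one-tailed p-value is half the two-tailed one; in the opposite direction it is close to 1.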

Assumptions of a t-test

Both the independent t-test and the paired t-test are parametric tests based on the normal distribution. Therefore, they assume:
- The sampling distribution is normally distributed. In the paired t-test this means that the sampling distribution of the differences between scores should be normal, not the scores themselves (assumption of normality).
- Variances in the two samples are roughly equal (assumption of homogeneity of variance). This assumption can be relaxed if variance heterogeneity is accounted for, and by default R does that by applying Welch's correction (hence 'Welch Two Sample t-test' in the output)!
- Scores are independent, except within pairs in a paired t-test (assumption of independence)
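
A sketch of how these assumptions could be looked at in R, using the arachnophobia data. shapiro.test() and var.test() are base R functions for a Shapiro-Wilk normality test and an F test comparing two variances; using them here is a suggestion rather than a fixed recipe, and with only six values per group they have little power, so plots of the data are at least as informative. The names welch and pooled are chosen for illustration.

```r
realspider <- c(3, 5, 3, 7, 8, 5)
spiderpicture <- c(5, 6, 3, 8, 7, 8)

# normality: Shapiro-Wilk test per group (small p-values suggest non-normality)
shapiro.test(realspider)$p.value
shapiro.test(spiderpicture)$p.value

# homogeneity of variance: F test comparing the two sample variances
var.test(realspider, spiderpicture)$p.value

# the default t.test() does not assume equal variances (Welch correction);
# var.equal = TRUE switches to the classic pooled-variance version
welch <- t.test(realspider, spiderpicture)
pooled <- t.test(realspider, spiderpicture, var.equal = TRUE)

# the Welch correction shows up as slightly reduced degrees of freedom
c(welch = unname(welch$parameter), pooled = unname(pooled$parameter))
```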

Example

Research question: Does the body height of people in the rear half of the lecture theatre differ from that of people in the front half?
Think about how we should sample and how many samples we need
Set up a data frame, once in the wide, once in the long format
Formulate a null hypothesis
Think of what kind of t-test we should use
Think of the assumptions. Are they met?
Think about type I and type II errors!

Summary: the t-test soup

Ingredients:

One continuous response variable
One binary predictor variable
The means and standard deviations of both groups

Method:

Mix in the two groups with their means and standard deviations
The outcome will be a t-statistic (your test statistic)
Compare against a random t-distribution with the same degrees of freedom
Decide whether your t-value is rare, medium-rare, or not rare at all by looking at the p-value…

What will we have learnt in Week 6?

How to compute a t-test
What independent and paired t-tests are and how they are used
- how to formulate a null hypothesis for a t-test
- how to run a t-test in R
- how to interpret the output of a t-test
How the argument ‘alternative’ is used (one- and two-tailed tests)
What the t-distribution is, how it relates to the normal distribution
What it means to commit a type I or type II error in a t-test
What assumptions need to be met in a t-test

Glossary Week 6

t-statistic
t-distribution
t-test
independent t-test
paired t-test
one-tailed t-test
two-tailed t-test
p-value