Lecture 7: Is A different from B? (t-tests etc.)

Eamonn Mallon
20/09/2020

Occam's razor applied to statistical models

# Log price modelled on log carat, colour, cut and clarity
mod_diamond2 <- lm(lprice ~ lcarat + color + cut + clarity, data = diamonds2)

No point carrying out an analysis that is more complicated than it has to be
The classical tests, which we will look at in the next two sessions, cover some of the most common types of analysis
e.g. men's height vs. women's height, or height vs. weight
FYI, the R code above is an example of a linear model; more on those in BS1070/MB1080

Today's tests

t-test (comparing two sample means when the residuals are normally distributed)
Wilcoxon test (comparing two samples when the residuals are not normally distributed)
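The choice between the two tests hinges on whether the residuals (each observation minus its own group mean) look normal. Below is a minimal sketch of one way to check this. It assumes base R's built-in ToothGrowth data (tooth length `len` by supplement `supp`, the same data used in the Wilcoxon example later) as a stand-in. It draws a normal Q-Q plot and runs a Shapiro-Wilk test via `shapiro.test()`. The Shapiro-Wilk step is an addition for illustration, not part of the lecture's stated workflow.

```r
data(ToothGrowth)  # built-in: tooth length (len) by supplement type (supp)

# Residuals for a two-group comparison: each value minus the mean of its own group
group_means <- ave(ToothGrowth$len, ToothGrowth$supp)  # group mean repeated on every row
resids <- ToothGrowth$len - group_means

# Visual check: points should lie close to the line if the residuals are normal
qqnorm(resids)
qqline(resids)

# Formal check (an assumption of this sketch): a small p-value suggests non-normal residuals
sw <- shapiro.test(resids)
print(sw$p.value)
```

These residuals are the same ones `residuals(lm(len ~ supp, data = ToothGrowth))` would give, which links this check to the linear models mentioned above.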

Student's t-test and Guinness

Guinness
Student was the pseudonym of W.S. Gosset (1876 - 1937)
As Head Experimental Brewer at Guinness, he ran small-sample, stratified and repeated balanced experiments on barley to identify the best-yielding varieties
Gosset was a friend of both Pearson and Fisher, a noteworthy achievement, for each had a massive ego and a loathing for the other. He was a modest man who once cut short an admirer with this comment: “Fisher would have discovered it all anyway.”
Other awesome Guinness ads

The t-test


How likely is it that the two sample means were drawn from populations with the same average?
Calculate a test statistic
How likely are we to obtain a test statistic this big or bigger if the null hypothesis is true?
- compare the calculated test statistic to the critical value, which is calculated on the assumption that the null hypothesis is true
Quick test: what is the null hypothesis when comparing two means?
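The decision steps above can be sketched with base R's t-distribution functions. As an illustration, this plugs in the t and df values from the darwin output later in this lecture. `qt()` gives the two-tailed critical value at an assumed 5% significance level. `pt()` gives the probability of a statistic at least this extreme if the null hypothesis is true.

```r
# Values copied from the darwin Welch t-test output in this lecture
t_obs <- 2.4371
df <- 22.164
alpha <- 0.05  # assumed significance level

# Critical value: only alpha/2 of the null t distribution lies beyond it in each tail
t_crit <- qt(1 - alpha / 2, df = df)

# Two-tailed p-value: probability of |t| this big or bigger if H0 is true
p_value <- 2 * pt(-abs(t_obs), df = df)

# Reject H0 if the observed statistic exceeds the critical value (equivalently p < alpha)
reject_h0 <- abs(t_obs) > t_crit
print(c(t_crit = t_crit, p_value = p_value))
print(reject_h0)
```

Up to rounding of the inputs, `p_value` should agree with the p-value printed by `t.test()`. Comparing t with the critical value and comparing p with alpha are two views of the same decision.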

The t-test

t = \( \frac{difference\, between\, two\, means}{standard\, error\, of\, the\, difference} \)
t = \( \frac{\bar{y}_A-\bar{y}_B}{S.E.D} \)
- Lecture 3 explains the standard error of the mean (an estimate of how far the sample mean is likely to be from the population mean)
- For two independent variables, the variance of a difference is the sum of the separate variances
- \( S.E.M =\sqrt{\frac{s^2}{n}} \)
- \( S.E.D =\sqrt{\frac{s_A^2}{n_A}+\frac{s_B^2}{n_B}} \)
t = \( \frac{\bar{y}_A-\bar{y}_B}{\sqrt{\frac{s_A^2}{n_A}+\frac{s_B^2}{n_B}}} \)
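The formulas above translate directly into R. This minimal sketch uses base R's ToothGrowth data (tooth length by supplement type) as a stand-in because it needs no extra packages. It computes each group's S.E.M. and combines them into the S.E.D. by adding the squared standard errors, i.e. the variances of the two means. It then checks that the resulting t matches `t.test()`, whose default Welch test uses this same S.E.D.

```r
data(ToothGrowth)
# Split tooth length by supplement: group A = OJ, group B = VC
len_A <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
len_B <- ToothGrowth$len[ToothGrowth$supp == "VC"]

# Standard error of each mean: sqrt(s^2 / n)
sem_A <- sqrt(var(len_A) / length(len_A))
sem_B <- sqrt(var(len_B) / length(len_B))

# Variance of a difference = sum of the variances: square, add, then square-root
sed <- sqrt(sem_A^2 + sem_B^2)

# t = difference between means / standard error of the difference
t_manual <- (mean(len_A) - mean(len_B)) / sed

# Compare with R's built-in t-test (Welch, var.equal = FALSE, is the default)
t_builtin <- unname(t.test(len_A, len_B)$statistic)
print(c(manual = t_manual, builtin = t_builtin))
```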

R code for a t-test

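library(SMPracticals)  # the darwin data are in this package
t.test(formula = height ~ type,  # formula: response ~ grouping variable
       data = darwin)            # data frame containing the variables
# t.test() runs a Welch test by default (var.equal = FALSE)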


    Welch Two Sample t-test

data:  height by type
t = 2.4371, df = 22.164, p-value = 0.02328
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 0.3909566 4.8423767
sample estimates:
mean in group Cross  mean in group Self 
           20.19167            17.57500

Outcrossed plants (mean height 20.19) are taller than selfed plants (mean height 17.58) (Welch two-sample t-test: t = 2.4371, df = 22.164, p = 0.02328; 95% confidence interval for the difference in means: 0.39 to 4.84)
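Rather than copying numbers by hand, you can pull them from the object that `t.test()` returns. This is an `htest` list with components including `estimate`, `conf.int`, `statistic`, `parameter` and `p.value`. In this hedged sketch, ToothGrowth stands in for darwin so that no extra package is needed. The wording of the sentence is only an example.

```r
data(ToothGrowth)
res <- t.test(len ~ supp, data = ToothGrowth)  # Welch two-sample t-test

est <- res$estimate  # the two group means (OJ first, then VC)
ci <- res$conf.int   # 95% CI for the difference in means (OJ minus VC)

# Build a results sentence from the stored values (illustrative wording)
sentence <- sprintf(
  "Mean length: OJ %.2f, VC %.2f (Welch t-test: t = %.2f, df = %.2f, p = %.3f; 95%% CI for the difference: %.2f to %.2f)",
  est[1], est[2], res$statistic, res$parameter, res$p.value, ci[1], ci[2]
)
cat(sentence, "\n")
```

Note that `conf.int` is an interval for the difference between the means, not a margin around either group mean.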

Wilcoxon test


Used when the residuals are non-normal
Also known as the Mann-Whitney U test
Rank all the data together
Add up the ranks for each treatment
Compare the smaller value to a critical value
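The ranking steps above can be sketched in base R using the ToothGrowth data from the example that follows. Tied values share the average of the ranks they span, which is `rank()`'s default. Each rank sum is converted to a Mann-Whitney U by subtracting its smallest possible value, n(n + 1)/2. R's reported W is the U for the first group, a convention that differs from "the smaller value" above, so the sketch computes both.

```r
data(ToothGrowth)
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]  # first group (first factor level)
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n_x <- length(x)
n_y <- length(y)

# 1. Rank all the data together (tied values share their average rank)
r <- rank(c(x, y))

# 2. Add up the ranks for each treatment
rank_sum_x <- sum(r[seq_len(n_x)])
rank_sum_y <- sum(r[n_x + seq_len(n_y)])

# 3. Convert rank sums to U statistics by subtracting the smallest possible rank sum
U_x <- rank_sum_x - n_x * (n_x + 1) / 2
U_y <- rank_sum_y - n_y * (n_y + 1) / 2

# R's W is U for the first group; classical tables use the smaller of the two U values
W_builtin <- unname(wilcox.test(x, y, exact = FALSE)$statistic)
print(c(U_x = U_x, U_y = U_y, smaller_U = min(U_x, U_y), W_builtin = W_builtin))
```

The two U values always add up to n_x * n_y, so knowing one gives the other.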

R code for a Wilcoxon test

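wilcox.test(formula = len ~ supp,  # formula: response ~ grouping variable
            data = ToothGrowth,    # data frame containing the variables
            exact = FALSE)         # normal approximation; exact p-values are not available with tied values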


    Wilcoxon rank sum test with continuity correction

data:  len by supp
W = 575.5, p-value = 0.06449
alternative hypothesis: true location shift is not equal to 0

Tooth growth did not differ significantly between supplement types (Wilcoxon rank-sum test: W = 575.5, n = 60, p = 0.06449)
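With `exact = FALSE`, R does not look W up in a table. It uses a normal approximation with a continuity correction (hence "with continuity correction" in the output) and adjusts the variance for tied ranks. The sketch below follows the standard large-sample formulas under that assumption and compares the result with the p-value `wilcox.test()` reports.

```r
data(ToothGrowth)
x <- ToothGrowth$len[ToothGrowth$supp == "OJ"]
y <- ToothGrowth$len[ToothGrowth$supp == "VC"]
n_x <- length(x)
n_y <- length(y)
N <- n_x + n_y

# W: rank sum of the first group minus its smallest possible value
r <- rank(c(x, y))
W <- sum(r[seq_len(n_x)]) - n_x * (n_x + 1) / 2

# Under H0, W has mean n_x * n_y / 2; tied ranks shrink its variance
ties <- table(r)  # how many observations share each rank
sigma <- sqrt((n_x * n_y / 12) *
                ((N + 1) - sum(ties^3 - ties) / (N * (N - 1))))

# Continuity correction: move W half a unit towards its mean before standardising
dev <- W - n_x * n_y / 2
z <- (dev - sign(dev) * 0.5) / sigma

# Two-tailed p-value from the standard normal distribution
p_manual <- 2 * min(pnorm(z), pnorm(z, lower.tail = FALSE))
p_builtin <- wilcox.test(x, y, exact = FALSE)$p.value
print(c(W = W, z = z, p_manual = p_manual, p_builtin = p_builtin))
```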