Now we are going to talk about t-tests. Why use a t-test instead of a z-test? First, z-tests assume that you know the population standard deviation, which we almost never do. Second, the t-distribution has heavier tails, which build in the extra uncertainty that comes from estimating the standard deviation from the sample, so it behaves better in small samples. Also, after about 100 people, the t and z distributions are approximately the same. T-tests use degrees of freedom, which are the number of values that are free to vary. For example, if you have an average of 10 across three numbers, the three numbers must sum to 30; so if two of them are 0 and 20, you know the last number has to be 10. So for every sample statistic that you estimate you lose a degree of freedom (the bookkeeping gets more involved in complex cases, but this is the general principle).
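As a quick sketch of both points (the t/z convergence and the degrees-of-freedom bookkeeping), here is a small check in base R; this is my addition, not part of the original notes.
# Critical values for a two-sided 95% test: t approaches z as the df grow
qnorm(0.975)                       # z critical value, about 1.96
qt(0.975, df = c(2, 10, 30, 100))  # t critical values shrink toward 1.96
# Degrees of freedom example: a mean of 10 across three numbers means they sum to 30,
# so fixing two of them (0 and 20) forces the third to be 10
3 * 10 - sum(c(0, 20))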
Let’s generate some data to work with:
set.seed(123)
# Simulated fidelity scores: roughly normal, mean 50, sd 2
fideility = rnorm(100, 50, 2)
# A deliberately non-normal version: a mix of two very different groups
fideilityNonNorm = c(rnorm(50, 90, 2), rnorm(50, 40, 20))
# Simulated PHQ-9-style scores, treated later as pre and post measurements
PHQ9Pre = round(rnorm(100, 50, 4), 0)
PHQ9Post = round(rnorm(100, 40, 4), 0)
# A binary gender variable sampled with equal probability
genderSamp = c(1, 0)
gender = as.factor(sample(genderSamp, 100, prob = c(.5, .5), replace = TRUE))
What are the assumptions of a t-test?
- Unless paired, the two samples are independent
- Variances of the two groups are equal (can use Welch's if not)
- Variables are normally distributed
- Data are collected from a simple random sample (no fancy sampling schemes)
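Here is a rough, informal way to eyeball the normality and equal-spread assumptions for the simulated fidelity data generated above (a sketch I added, not part of the original analysis):
# Informal normality checks on the simulated fidelity scores
hist(fideility)                       # roughly bell-shaped?
qqnorm(fideility); qqline(fideility)  # points should hug the line
shapiro.test(fideility)               # small p-values suggest non-normality
# Compare the spread of fidelity across the two gender groups
tapply(fideility, gender, var)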
One-sample t-test equation: t = (x̄ − μ₀) / (s / √n), with df = n − 1, where x̄ is the sample mean, μ₀ is the hypothesized mean, s is the sample standard deviation, and n is the sample size.
One-sample t-test. Let us assume that a fidelity score of 50 or higher means the person is exhibiting “fidelity”. If we only care about higher scores, should this be a one-tailed or two-tailed test, and why?
t.test(fideility, alternative = "greater", mu = 50)
##
## One Sample t-test
##
## data: fideility
## t = 0.99041, df = 99, p-value = 0.1622
## alternative hypothesis: true mean is greater than 50
## 95 percent confidence interval:
## 49.87769 Inf
## sample estimates:
## mean of x
## 50.18081
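To connect this output to the formula above, here is the same t statistic computed by hand from the fideility vector (a quick check I added; it should reproduce the t, df, and one-sided p-value reported by t.test()):
# One-sample t statistic by hand: t = (xbar - mu0) / (s / sqrt(n))
xbar = mean(fideility)
s = sd(fideility)
n = length(fideility)
tStat = (xbar - 50) / (s / sqrt(n))
tStat                                      # matches the t from t.test() above
pt(tStat, df = n - 1, lower.tail = FALSE)  # one-sided ("greater") p-value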
So now let’s try a two-sample test. What’s wrong with this test?
t.test(PHQ9Post, PHQ9Pre)
##
## Welch Two Sample t-test
##
## data: PHQ9Post and PHQ9Pre
## t = -18.761, df = 196.36, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.736356 -9.503644
## sample estimates:
## mean of x mean of y
## 39.87 50.49
Yes, these are pre- and post-scores (PHQ9Pre and PHQ9Post) from the same people, so we need to account for the dependence between the two measurements by using a paired test.
t.test(PHQ9Post, PHQ9Pre, paired = TRUE)
##
## Paired t-test
##
## data: PHQ9Post and PHQ9Pre
## t = -18.247, df = 99, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.774847 -9.465153
## sample estimates:
## mean of the differences
## -10.62
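One way to see what the paired test is doing: it is equivalent to a one-sample t-test on the within-person differences. The quick check below (my addition, using the vectors already defined) should reproduce the t, df, and p-value above.
# A paired t-test is just a one-sample t-test on the differences
diffs = PHQ9Post - PHQ9Pre
t.test(diffs, mu = 0)  # same t, df, and p-value as the paired test above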
t.test(PHQ9Post, PHQ9Pre, paired = TRUE, alternative = "less")
##
## Paired t-test
##
## data: PHQ9Post and PHQ9Pre
## t = -18.247, df = 99, p-value < 2.2e-16
## alternative hypothesis: true difference in means is less than 0
## 95 percent confidence interval:
## -Inf -9.653625
## sample estimates:
## mean of the differences
## -10.62
Let us go back to the fideility data. Maybe we are interested in understanding whether gender is affecting fidelity.
t.test(fideility~gender)
##
## Welch Two Sample t-test
##
## data: fideility by gender
## t = 0.68943, df = 92.756, p-value = 0.4923
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.4780308 0.9864554
## sample estimates:
## mean in group 0 mean in group 1
## 50.31300 50.05879
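Note that t.test() defaults to Welch's version, which does not assume equal variances. If we are comfortable assuming equal variances, we can ask for the classic pooled (Student's) test instead; a sketch:
# Does the equal-variance assumption look reasonable? (var.test is sensitive to non-normality)
var.test(fideility ~ gender)
# If so, request the pooled (Student's) two-sample t-test instead of Welch's
t.test(fideility ~ gender, var.equal = TRUE)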
The non-parametric version of the t-test: the Wilcoxon Rank Sum Test
Assumptions:
- Observations are independent (unless paired)
- Data are at least ordinal (not binary)
- The two groups have similarly shaped distributions (if we want to interpret the result as a difference in medians)
Example: let us assume that we want to understand differences in fidelity by gender. Maybe we have a lot of people who are really good and then a lot who are just okay, so the distribution is the ugly, bimodal one shown below.
hist(fideilityNonNorm)
The t-test will not work well for this distribution, so we turn to the Wilcoxon rank sum test. In the fidelity example, this means pooling all the values from both groups and ranking them. For example, if one male has a score of 100 and that is the highest score, that person receives a rank of one; the person with the second-highest score, regardless of gender, receives a rank of two, and so on. If there is a tie, the tied values each get the average of the ranks they would have occupied. The ranks are then summed within each group, and we evaluate whether the rank sums differ by more than we would expect by chance.
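Here is a toy illustration of the ranking step (made-up numbers, not the course data). Note that R's rank() gives the smallest value rank 1, the opposite direction from the description above, but the test works out the same either way, and ties receive the average rank.
# Toy example: rank the pooled scores from two small groups
groupA = c(100, 85, 70)
groupB = c(85, 60, 55)
scores = c(groupA, groupB)
rank(scores)            # the tied 85s share the average rank (4.5)
sum(rank(scores)[1:3])  # rank sum for group A, which the test compares across groups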
Drawbacks: these tests are generally less powerful, because ranks discard information. When we use the actual values instead of the ranks we retain more information, and therefore have more certainty.
See here for more details: http://sphweb.bumc.bu.edu/otlt/mph-modules/bs/bs704_nonparametric/BS704_Nonparametric4.html
The syntax mirrors t.test(): we give it fidelity by gender as a formula, and the paired and alternative arguments work the same way as before.
wilcox.test(fideility~gender)
##
## Wilcoxon rank sum test with continuity correction
##
## data: fideility by gender
## W = 1393, p-value = 0.3188
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(PHQ9Post, PHQ9Pre, paired = TRUE)
##
## Wilcoxon signed rank test with continuity correction
##
## data: PHQ9Post and PHQ9Pre
## V = 25, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
wilcox.test(PHQ9Post, PHQ9Pre, paired = TRUE, alternative = "less")
##
## Wilcoxon signed rank test with continuity correction
##
## data: PHQ9Post and PHQ9Pre
## V = 25, p-value < 2.2e-16
## alternative hypothesis: true location shift is less than 0
Next week: correlation, correlation with binary and ordinal variables, and maybe bivariate regression.