One of the things you have to do here is decide how you are going to collect the data. How many subjects are you going to have? What groups are there going to be what are you going to do differently for each group—these are called different treatments).
For us, this step is superfluous because we are handed a data set. But typically before you collect data you also decide how you are going to analyze it. What tests will you use? What hypotheses will you test? Often you make these decisions before you collect data. Again, because we are handed a data set, I am going to put these decisions off until after we have seen the data, which is not a best practice in statistics, but it is a little more tutorial and it is fine for our purposes.
sleep <- read.csv("sleep.csv")
sleep
## improve group
## 1 25.2 Unrest
## 2 14.5 Unrest
## 3 -7.0 Unrest
## 4 12.6 Unrest
## 5 34.5 Unrest
## 6 45.6 Unrest
## 7 11.6 Unrest
## 8 18.6 Unrest
## 9 12.1 Unrest
## 10 30.5 Unrest
## 11 -10.7 Deprived
## 12 4.5 Deprived
## 13 2.2 Deprived
## 14 21.3 Deprived
## 15 -14.7 Deprived
## 16 -10.7 Deprived
## 17 9.6 Deprived
## 18 2.4 Deprived
## 19 21.8 Deprived
## 20 7.2 Deprived
## 21 10.0 Deprived
We have a data set with 21 rows and 2 columns. The rows represent the subjects of the study (college students) and the columns represent variables. The first variable is “improve” the improvement of scores on a cognitive task over time. If a subject gets worse over time, “improve” is negative. The second column “group” is a binary categorical variable placing each subject in to one of two different groups: unrest and deprived. Subjects in the unrest group were allowed to enjoy unrestricted sleep for the course of the study. Subjects in the deprived were deprived of sleep, then allowed to try to catch up on sleep. There were 10 subjects in the unrest group and 11 subjects in the deprived group.
The purpose of the study is to assess the evidence that subjects can catch up on sleep, and perform equally well on a cognitive task with those not recently deprived of sleep, or alternatively, that sleep deprivation impairs subjects beyond a period of catching up.
It is always a good idea to visualize your data, as soon as you load it. Technically this should be done after you design the study, but as mentioned above, we are going to switch the order around.
Load graphics library, then plot:
library(ggplot2)
ggplot(data=sleep, mapping=aes(x=group, y=improve)) + geom_point()
The plot suggests that the unrest group may improve more than the deprived group. However, there is a lot of intersubject variability, and the sample sizes are pretty low, so we don’t know, without doing a test of significance whether or not the pattern we see could be due to chance, and that there no difference between the groups that would hold up when we look at the populations as a whole (e.g. all college students, and not just samples of 10 or 11). If this were the case, then the difference we see in the plot would be due bad luck as we drew the samples. We need to back this plot up with a p-value: the probability, if there is actually no difference between groups, of seeing data as or more suggestive of a difference between groups than seen in the samples shown. If this probability is low, we would interpret that result as evidence against the statement that there actually is no difference between groups. The plot suggests an answer to our question. The p-value backs that suggestion up with solid evidence.
This should be a statement in terms of the parameters of the populations. First of all identify what the populations are. These are the groups from which the samples are drawn from. For example, if the samples are drawn from college students, then the unrest populations is all college students that could be allowed to enjoy unrestricted sleep, and the deprived population is all college students that could be given the other treatment. The samples are drawn from these populations. There are perhaps millions of students in the populations, whereas there are only 10 or 11 in the samples. Whereas, we usually only think of one of each possible population, the samples are one of many that could conceivably be drawn from the populations. A parameter is a number that describes a population, a statistic is a number that describes a sample. Can you think of the numbers we are interested in that describe the populations? How about population means? We say the null hypothesis is that the population mean for the unrest group is equal to the population mean of the deprived group. Typically this is expressed as the difference in means is zero.
This is the statement that the population means are different (unequal, less than, or greater than). The best choice here, since we have seen the data is the most conservative one: that they are unequal. This is always a good choice, and some statisticians recommend that you always make this choice. This is the default choice in R.
The choices here are t-test and proportions test. T test is for testing hypotheses about population means of a quantitative variable. Proportion tests are for testing hypotheses about population proportions of categorical variable, for example if the improve variable were categorial: “yes” or “no”. The correct choice is t-test.
The correct choice is two sample—we have a sample for unrest and a sample for deprived.
For the t-test, the main assumption is that the data lie close enough to a Normal (bell shaped) distribution. How close does it have to be? It depends on the sample size, the greater the sample size the more robust the t-test is to non-Normality. Actually even for small sample sizes (10 or 11) it is fairly robust, so unless there is strong skewness or substantial outliers we will be OK.
The best way of judging this is with a qq-plot.
ggplot(data=sleep) + geom_qq(mapping=aes(sample=improve, color=group))
If the data are Normal, they will lie on line. This graphs shows that it is probably close enough. If it were significantly less Normal, you should note this, and continue.
It is always a safe bet to use the traditional level of significance 0.05.
t.test(formula=improve~group, data=sleep)
##
## Welch Two Sample t-test
##
## data: improve by group
## t = -2.6851, df = 17.557, p-value = 0.01535
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -28.398779 -3.441221
## sample estimates:
## mean in group Deprived mean in group Unrest
## 3.90 19.82
Since the p-value is less than the level of significance, we REJECT the null hypothesis that the means are equal.
The confidence interval is the range of plausible values for the difference in means. Zero is not in this interval. Therefore 0 is not a plausible value for the difference in means, so it is not plausible that the means are the same. The result of STEP 15 is consistent with the result of STEP 14.
We have concluded that the means are not equal, but we really want to know: is it better to be sleep deprived or enjoy unrestricted sleep? Knowing that the means are unequal we can answer this question by looking the sample estimates unrest improved more than deprived.
We have evidence that sleep deprived students are still impaired on cognitive tasks relative to students who enjoy unrestricted sleep, even after they have a chance to try to catch up on sleep.