Sleep Data Set Analysis

knitr::opts_chunk$set(echo = TRUE)

First, we load the data.

sleep <- read.csv("Sleep.csv")
sleep

##    improve    group
## 1     25.2   Unrest
## 2     14.5   Unrest
## 3     -7.0   Unrest
## 4     12.6   Unrest
## 5     34.5   Unrest
## 6     45.6   Unrest
## 7     11.6   Unrest
## 8     18.6   Unrest
## 9     12.1   Unrest
## 10    30.5   Unrest
## 11   -10.7 Deprived
## 12     4.5 Deprived
## 13     2.2 Deprived
## 14    21.3 Deprived
## 15   -14.7 Deprived
## 16   -10.7 Deprived
## 17     9.6 Deprived
## 18     2.4 Deprived
## 19    21.8 Deprived
## 20     7.2 Deprived
## 21    10.0 Deprived

We see that the data come in 21 rows and 2 columns. Each row represents one subject in the experiment, and each column represents a variable. The first column, “improve,” is the subject’s improvement score on a cognitive task over time. The second column is “group.” Subjects were randomly assigned to one of two groups: “unrest” and “deprived.” The unrest group enjoyed unrestricted sleep throughout the study. The deprived group was deprived of sleep and then allowed to try to catch up.
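The description above can be checked directly in R. The following is a minimal sketch that rebuilds the data frame from the printed output (an assumption made so that it runs without Sleep.csv) and confirms the dimensions and group sizes. The name `group_counts` is introduced here for illustration only.

```r
# Rebuild the data from the printed output above (assumption: values copied as shown).
sleep <- data.frame(
  improve = c(25.2, 14.5, -7.0, 12.6, 34.5, 45.6, 11.6, 18.6, 12.1, 30.5,
              -10.7, 4.5, 2.2, 21.3, -14.7, -10.7, 9.6, 2.4, 21.8, 7.2, 10.0),
  group = rep(c("Unrest", "Deprived"), times = c(10, 11))
)

dim(sleep)   # rows, then columns
str(sleep)   # type of each variable

# Number of subjects in each group
group_counts <- table(sleep$group)
group_counts
```

Note that the groups are not the same size: 10 subjects are labeled Unrest and 11 are labeled Deprived.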

Does the “unrest” group improve more on the cognitive test on average than the “deprived” group?

Basically, we want to know whether you can catch up on your sleep, or if you remain impaired.

Step 1: Design the experiment and collect data.

But we already have the data, so we can start by visualizing it.

library(ggplot2)
ggplot(data = sleep, mapping = aes(x = group, y = improve)) +
  geom_point()

Our data suggest that it is better to get unrestricted sleep, but do we have evidence that this result was not simply due to chance in sampling?
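To put a number on what the plot suggests, we can compare the sample means of the two groups. This is a minimal sketch that rebuilds the data from the printed output (an assumption, so that it runs without Sleep.csv). The names `group_means` and `diff_means` are introduced here for illustration.

```r
# Rebuild the data from the printed output above (assumption: values copied as shown).
sleep <- data.frame(
  improve = c(25.2, 14.5, -7.0, 12.6, 34.5, 45.6, 11.6, 18.6, 12.1, 30.5,
              -10.7, 4.5, 2.2, 21.3, -14.7, -10.7, 9.6, 2.4, 21.8, 7.2, 10.0),
  group = rep(c("Unrest", "Deprived"), times = c(10, 11))
)

# Mean improvement score within each group
group_means <- tapply(sleep$improve, sleep$group, mean)
group_means

# Observed difference in sample means (Unrest minus Deprived)
diff_means <- group_means[["Unrest"]] - group_means[["Deprived"]]
diff_means
```

For these values, the sample means are 19.82 for the unrest group and 3.90 for the deprived group, a difference of 15.92. The question is whether a difference this large could plausibly arise by chance.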

Let’s frame a null hypothesis.

Null Hypothesis: The mean improvement of students in the unrest group (population of all students) is the same as the mean improvement of students in the deprived group (population of all students). Often, this statement is written as: the difference in means is zero.

Alternative Hypothesis: There are three choices: the two means are not equal, the unrest mean is greater, or the deprived mean is greater. The best choice is “not equal” (the two-sided alternative) because it is more conservative.

We have two different samples, so this will be a two-sample test.
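The text does not say which two-sample test to use. As one possible sketch, assuming a Welch two-sample t-test is acceptable here, base R's `t.test()` tests the null hypothesis of zero difference in means against the two-sided alternative. The data frame is rebuilt from the printed output (an assumption, so that the sketch runs without Sleep.csv), and the name `welch_result` is introduced for illustration.

```r
# Rebuild the data from the printed output above (assumption: values copied as shown).
sleep <- data.frame(
  improve = c(25.2, 14.5, -7.0, 12.6, 34.5, 45.6, 11.6, 18.6, 12.1, 30.5,
              -10.7, 4.5, 2.2, 21.3, -14.7, -10.7, 9.6, 2.4, 21.8, 7.2, 10.0),
  group = rep(c("Unrest", "Deprived"), times = c(10, 11))
)

# Welch two-sample t-test (assumption: the source does not name a specific test).
# H0: difference in means is zero; alternative: means are not equal (two-sided).
welch_result <- t.test(improve ~ group, data = sleep,
                       alternative = "two.sided", mu = 0)
welch_result

# The p-value and the two group means can also be pulled out directly.
welch_result$p.value
welch_result$estimate
```

A small p-value would count as evidence against the null hypothesis of equal means. A randomization (permutation) test would be another reasonable choice for a randomized experiment like this one.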

Alexis Portnoy

11/29/2017