Each lab in this course will have multiple components. First, there will be a piece like the document below, which includes instructions, tutorials, and problems to be addressed in your write-up. Any part of the document below marked with a star, \(\star\), is a problem for your write-up. Second, there will be an R script where you will do all of the computations required for the lab. And third, you will complete a write-up to be turned in on the Friday that you did your lab.
If you are comfortable doing so, I strongly suggest using RMarkdown to type your lab write-up. However, if you are new to R, you may handwrite your write-up (I’m also happy to work with you to learn RMarkdown!). All of your computational work will be done in RStudio Cloud, and both your lab write-up and your R script will be considered when grading your work.
In this lab, we will conduct hypothesis tests using R, including built-in functions that R has for this purpose, and practice justifying the assumptions that go into hypothesis testing. We will do this using real historical medical data sets.
In this part of the lab, we will work with the data set normtemp. The head of the data frame is shown below.
| temperature | gender | hr |
|---|---|---|
| 96.3 | 1 | 70 |
| 96.7 | 1 | 71 |
| 96.9 | 1 | 74 |
| 97.0 | 1 | 80 |
| 97.1 | 1 | 73 |
| 97.1 | 1 | 75 |
This data set contains measurements of body temperature and heart rate for 130 subjects, along with the gender of the subject, and was compiled by Allen L. Shoemaker for the 1996 article “What’s Normal? - Temperature, Gender, and Heart Rate” in the Journal of Statistics Education. The three variables in the data set are body temperature in degrees Fahrenheit (temperature), gender coded as 1 for male and 2 for female (gender), and heart rate in beats per minute (hr).
We will use this data set to investigate the oft-claimed statistic that the mean body temperature of a healthy adult is 98.6 degrees Fahrenheit.
(\(\star\)) State appropriate null and alternative hypotheses that allow a hypothesis test to answer the question “Is the mean body temperature of a healthy adult not equal to 98.6 degrees Fahrenheit?”
In your R script, compute the p-value for the test you defined in problem 1. Do this by having R compute each necessary component of the p-value formula, but do not use a built-in R function to carry out the test.
(\(\star\)) Report the p-value in your write-up and justify your calculation in problem 2. Use exploratory visualizations to help explain why your assumptions are valid.
(\(\star\)) State appropriate null and alternative hypotheses that allow a hypothesis test to answer the question “Is the mean body temperature of a healthy adult different for men and women?”
(\(\star\)) In your R script, carry out the test you defined in problem 4, for \(\alpha = 0.01\). Do this by having R compute each necessary component of the relevant formula, but do not use a built-in R function to carry out the test. Report the result in your write-up and interpret your conclusion in the context of the question.
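As a rough illustration of what "compute each necessary component" can look like, the sketch below works through a two-sided one-sample t statistic and a two-sample statistic on simulated data. Everything here is an assumption made for illustration: the vectors `x` and `y`, the hypothesized mean `mu0`, and the choice of the Welch (unequal variance) form for the two-sample case are not taken from the lab, and the version of the test covered in class may differ.

```r
# Simulated data for illustration only; these are NOT the normtemp measurements.
set.seed(1)
x   <- rnorm(40, mean = 10.3, sd = 2)   # hypothetical first sample
y   <- rnorm(35, mean = 11.0, sd = 2.5) # hypothetical second sample
mu0 <- 10                               # hypothetical mean under the null

# --- One-sample, two-sided t test, piece by piece ---
n     <- length(x)
se    <- sd(x) / sqrt(n)                # standard error of the sample mean
t_one <- (mean(x) - mu0) / se           # test statistic
# Two-sided p-value: both tails of a t distribution with n - 1 df
p_one <- 2 * pt(abs(t_one), df = n - 1, lower.tail = FALSE)

# --- Two independent samples (Welch form, an assumption) ---
vx <- var(x) / length(x)                # squared standard error, sample 1
vy <- var(y) / length(y)                # squared standard error, sample 2
t_two <- (mean(x) - mean(y)) / sqrt(vx + vy)
# Welch-Satterthwaite approximation to the degrees of freedom
df_w  <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_two <- 2 * pt(abs(t_two), df = df_w, lower.tail = FALSE)

# Decision at a chosen significance level
alpha <- 0.01
reject_one <- p_one < alpha
reject_two <- p_two < alpha
```

Each intermediate quantity (standard error, statistic, degrees of freedom, p-value) is stored under its own name so that it can be printed and discussed separately in a write-up.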
We have another data set, loaded as sleep, that records the effect of two different sleep treatment drugs on 10 different patients. The entire data set is displayed below.
| extra | group | ID |
|---|---|---|
| 0.7 | 1 | 1 |
| -1.6 | 1 | 2 |
| -0.2 | 1 | 3 |
| -1.2 | 1 | 4 |
| -0.1 | 1 | 5 |
| 3.4 | 1 | 6 |
| 3.7 | 1 | 7 |
| 0.8 | 1 | 8 |
| 0.0 | 1 | 9 |
| 2.0 | 1 | 10 |
| 1.9 | 2 | 1 |
| 0.8 | 2 | 2 |
| 1.1 | 2 | 3 |
| 0.1 | 2 | 4 |
| -0.1 | 2 | 5 |
| 4.4 | 2 | 6 |
| 5.5 | 2 | 7 |
| 1.6 | 2 | 8 |
| 4.6 | 2 | 9 |
| 3.4 | 2 | 10 |
The variables correspond to the increase in hours of sleep (extra), the drug given (group), and the patient ID (ID), and this data was collected and published in 1905 by Cushny and Peebles, “The action of optical isomers: II hyoscines” in The Journal of Physiology. Student (the pseudonym of William Sealy Gosset) used this data set as an example in his 1908 Biometrika paper “The probable error of the mean”, in which he introduced the t distribution.
In order to make this data set a little easier to deal with, we will convert it to a wide data frame with two variables.
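library(dplyr)
library(tidyr)
data(sleep)
# Reshape to one row per patient, with one column of extra sleep per drug group
sleep.wide <- sleep %>%
  spread("group", "extra") %>%
  dplyr::select(-ID) %>%
  rename(Y1 = 1, Y2 = 2)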
| Y1 | Y2 |
|---|---|
| 0.7 | 1.9 |
| -1.6 | 0.8 |
| -0.2 | 1.1 |
| -1.2 | 0.1 |
| -0.1 | -0.1 |
| 3.4 | 4.4 |
| 3.7 | 5.5 |
| 0.8 | 1.6 |
| 0.0 | 4.6 |
| 2.0 | 3.4 |
In this wide data frame, we have variables Y1 and Y2, which represent the extra amount of sleep corresponding to each drug. The ID column has been removed, as each row of the data frame corresponds to an individual subject.
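Because each row is meant to hold one patient's response to both drugs, it can be worth confirming that the rows really line up by patient. The following is a minimal base-R sketch of such a check; the names `g1`, `g2`, `same_order`, and `check.wide` are hypothetical and are not used elsewhere in the lab.

```r
data(sleep)  # built-in data set from the datasets package

# Split the long data frame by drug group
g1 <- sleep[sleep$group == 1, ]
g2 <- sleep[sleep$group == 2, ]

# Confirm that both groups list the same patient IDs in the same order
same_order <- identical(as.character(g1$ID), as.character(g2$ID))

# Rebuild the wide layout in base R for comparison with sleep.wide
check.wide <- data.frame(Y1 = g1$extra, Y2 = g2$extra)

# Row-by-row difference in extra sleep between the two drugs
check.wide$diff <- check.wide$Y2 - check.wide$Y1
summary(check.wide$diff)
```

If `same_order` is `TRUE`, the columns `Y1` and `Y2` of `check.wide` should match the table shown above row for row.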
The R function t.test (which you used back in Lab 3 on Confidence Intervals) will run a hypothesis test on one or two data sets. Read the documentation of the t.test function (?t.test in the console) to refamiliarize yourself with the arguments of this function.
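As a quick refresher on the main arguments, here is a small sketch that calls t.test in several configurations on simulated data. The vectors `before` and `after` and the value passed to `mu` are hypothetical, chosen only to show the syntax; which configuration fits a given problem is a separate question that depends on how the data were collected.

```r
# Simulated measurements for illustration only
set.seed(42)
before <- rnorm(12, mean = 50, sd = 5)
after  <- before + rnorm(12, mean = 2, sd = 3)

# One sample against a hypothesized mean (mu), two-sided alternative
one_sample <- t.test(before, mu = 48, alternative = "two.sided")

# Two samples, default settings: paired = FALSE, var.equal = FALSE (Welch)
welch <- t.test(before, after)

# Two samples, pooled variance
pooled <- t.test(before, after, var.equal = TRUE)

# Two samples measured on the same units, with a 99% confidence interval
paired <- t.test(before, after, paired = TRUE, conf.level = 0.99)

# The result is a list; individual pieces can be pulled out by name
paired$statistic   # t statistic
paired$parameter   # degrees of freedom
paired$p.value     # p-value
paired$conf.int    # confidence interval for the mean difference
```

Note that `conf.level` only changes the reported confidence interval. The significance level is applied separately, by comparing `p.value` with \(\alpha\).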
(\(\star\)) If we want to determine if there is a difference in the mean extra hours of sleep for one drug over the other, state appropriate null and alternative hypotheses.
In your R script, use the t.test function to determine if the difference is statistically significant at the level \(\alpha = 0.01\).
(\(\star\)) In your write-up, justify your choices of arguments in t.test by explaining what type of test is appropriate in this case. Interpret the result of your hypothesis test.