Overview: In this lab exercise, you will explore associations between variables when data come from a paired design.
Objectives: At the end of this lab you will be able to:
Create a subdirectory named Lab 11 in the
PUBHBIO 2210 Labs directory you created in your OneDrive
folder in Lab 1.
Download the four lab files from Carmen while in the RStudio server:
lab-11-paired-blank.html
lab-11-paired-blank.Rmd
lab-11-paired-worksheet-blank.docx
yoga.csv
If you have not downloaded all of these files, do so now.
Save the four downloaded files in the
PUBHBIO 2210 Labs/Lab 11 directory you just created. When working on
labs, it is important to keep all related files in the same directory.
Change the author and date information in the lab header.
We will load a dataset from the yoga.csv file into
R, using the read.csv() function.
The yoga.csv dataset is from a study of the
potential stress-reduction benefits of hatha yoga. This study enrolled
50 women, and each participant came into the research center three times
– once to participate in a 75-minute yoga session, once to participate
in a movement control session (walking on a treadmill), and once to
participate in a stationary control session (watching a video). For this
lab we will only use data from the yoga session and the movement control
session.
There are three outcome measures in this yoga.csv
dataset: positive affect (which is roughly like
happiness), the blood cytokine IL-6 (a marker of
stress; higher is more stress), and hours of sleep.
Both positive affect and IL-6 are continuous variables, and the sleep
measure is binary, indicating whether or not the subject got at least 8
hours of sleep (yes/no).
In the code chunk below, read the dataset from the
yoga.csv file into RStudio and store it in an object called
yoga. Then, convert the yoga object to a
tibble and print it. See labs 1 and 10 for help.
# Enter code here
yoga <- read.csv("yoga.csv")
yoga <- tibble::as_tibble(yoga)
yoga
## # A tibble: 50 × 7
## Subject Positive.Affect.after.Yoga Positive.Affect.afte…¹
## <int> <int> <int>
## 1 7002 22 19
## 2 7004 33 32
## 3 7005 34 29
## 4 7007 30 28
## 5 7010 35 29
## 6 7011 17 26
## 7 7013 31 25
## 8 7014 29 24
## 9 7015 30 21
## 10 7016 30 28
## # ℹ 40 more rows
## # ℹ abbreviated name: ¹Positive.Affect.after.Movement
## # ℹ 4 more variables: IL6.after.Yoga <dbl>,
## # IL6.after.Movement <dbl>, hours.sleep.after.Yoga <chr>,
## # hours.sleep.after.Movement <chr>
Each outcome measure was obtained twice for each subject – once after
the yoga session, and once after the movement session. Obtain summaries
for all six outcome measurements by reporting the mean, SD, median, and
range for the continuous outcomes and the frequencies and percents for
the binary outcomes. You can use the inspect() command (or
favstats() and tally() commands) to print
these summaries. See labs 1 and 10 for help.
# Enter code here
# I couldn't get the original variable names to work, so rename the
# columns; the code below uses these shorter names
names(yoga) <- c("Subject",
                 "Positive_Affect_Yoga",
                 "Positive_Affect_Movement",
                 "IL6_Yoga",
                 "IL6_Movement",
                 "Sleep_Yoga",
                 "Sleep_Movement")
favstats(~ Positive_Affect_Yoga, data = yoga)
## min Q1 median Q3 max mean sd n missing
## 17 29.25 32 36 44 32.22 5.838961 50 0
favstats(~ Positive_Affect_Movement, data = yoga)
## min Q1 median Q3 max mean sd n missing
## 12 23 26.5 30.75 44 26.54 6.679881 50 0
favstats(~ IL6_Yoga, data = yoga)
## min Q1 median Q3 max mean sd n
## 0.339 0.9805 1.941 3.552 8.097 2.44988 1.866505 50
## missing
## 0
favstats(~ IL6_Movement, data = yoga)
## min Q1 median Q3 max mean sd n
## 0.292 1.114 1.5865 2.8785 6.703 2.21982 1.582499 50
## missing
## 0
tally(~ Sleep_Yoga, data = yoga, format = "percent")
## Sleep_Yoga
## No Yes
## 82 18
tally(~ Sleep_Movement, data = yoga, format = "percent")
## Sleep_Movement
## No Yes
## 76 24
You will start by testing whether the mean positive affect was the same after the yoga session as after the movement session. Because the same women were measured twice (once after yoga and once after the movement control), we do not have two independent groups of subjects, so we cannot use a regular two-sample t-test to compare the session means. These data are paired, so we have to use a paired t-test.
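To see why pairing matters, here is a minimal sketch using made-up numbers. The vectors session_a and session_b are hypothetical and are not taken from yoga.csv. Each subject's two scores are close together while the subjects differ widely from one another, so the paired test picks up a shift that the two-sample test does not.

```r
# Hypothetical scores for 5 subjects measured twice (NOT from yoga.csv)
session_a <- c(10, 20, 30, 40, 50)
session_b <- session_a + c(1, 2, 1, 2, 1)  # each subject scores slightly higher

# Two-sample (Welch) t-test: treats the two sessions as independent groups
two_sample <- t.test(session_b, session_a)

# Paired t-test: works with each subject's own difference
paired <- t.test(session_b, session_a, paired = TRUE)

# Compare the p-values from the two approaches
c(two_sample = two_sample$p.value, paired = paired$p.value)
```

In this toy example the between-subject spread swamps the small within-subject change, which is the situation a paired design is meant to handle.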
To perform the paired t-test comparing
Positive.Affect.after.Yoga to
Positive.Affect.after.Movement, we will use the
t.test() function with the argument paired = TRUE.
This cannot be done using a formula, so we need to
use an alternative form of the command. The with(yoga, ...)
wrapper tells t.test() which dataset to use, and we give
t.test() the two variables that make up each
pair.
For example,
# Not evaluated
with(mydata,
  t.test(Performance.after.Sitting, Performance.after.Running, paired = TRUE)
)
will perform a paired t-test to compare
Performance.after.Sitting to
Performance.after.Running using the dataset
mydata.
In the code chunk below, perform the appropriate hypothesis test to
compare Positive.Affect.after.Yoga to
Positive.Affect.after.Movement using the yoga
dataset.
# Enter code here
with(yoga,
  t.test(Positive_Affect_Yoga, Positive_Affect_Movement, paired = TRUE)
)
##
## Paired t-test
##
## data: Positive_Affect_Yoga and Positive_Affect_Movement
## t = 6.651, df = 49, p-value = 2.305e-08
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## 3.963796 7.396204
## sample estimates:
## mean difference
## 5.68
The output reports the test statistic, degrees of freedom, and p-value for the paired t-test, along with the mean of the differences and its corresponding 95% confidence interval.
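If you want to reuse these numbers rather than read them off the printout, the object returned by t.test() can be stored and its pieces pulled out by name. The sketch below uses a small made-up data frame; the values in mydata are hypothetical and only illustrate the idea.

```r
# Hypothetical data frame (NOT from yoga.csv): 6 subjects, two conditions
mydata <- data.frame(
  Performance.after.Sitting = c(12, 15, 11, 18, 14, 16),
  Performance.after.Running = c(10, 14, 12, 15, 11, 13)
)

# Store the test result instead of only printing it
result <- with(mydata,
  t.test(Performance.after.Sitting, Performance.after.Running, paired = TRUE)
)

result$estimate   # mean of the differences
result$conf.int   # 95% confidence interval for the mean difference
result$statistic  # t statistic
result$parameter  # degrees of freedom
result$p.value    # two-sided p-value
```

The same components are available for the yoga analysis if you assign the output of with(yoga, t.test(...)) to an object.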
Next you will compare the mean IL-6 after the yoga session to the
mean after the movement session in the yoga dataset. In the
code chunk below, follow the same steps you used above for positive
affect, this time using the variables IL6.after.Yoga and
IL6.after.Movement.
# Enter code here
# Paired t-test for IL-6
with(yoga,
  t.test(IL6_Yoga, IL6_Movement, paired = TRUE)
)
##
## Paired t-test
##
## data: IL6_Yoga and IL6_Movement
## t = 0.71599, df = 49, p-value = 0.4774
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.415648 0.875768
## sample estimates:
## mean difference
## 0.23006
We know that the paired t-test is really just a one-sample t-test on the differences. You will confirm this by creating a new variable that contains the differences and then performing a one-sample t-test of whether the mean difference is equal to zero.
We can create a “difference” variable using the mutate()
function. For example,
# Not evaluated
mydata <- mydata %>%
  mutate(Performance.difference =
           Performance.after.Sitting - Performance.after.Running)
inspect(mydata %>%
          select(Performance.after.Sitting, Performance.after.Running,
                 Performance.difference))
creates a new variable Performance.difference, which
equals Performance.after.Sitting minus
Performance.after.Running, in the mydata
dataset. It then summarizes the variables
Performance.after.Sitting,
Performance.after.Running, and
Performance.difference using the inspect() and
select() commands.
In the code chunk below, create a new variable
IL6.difference which equals IL6.after.Yoga
minus IL6.after.Movement in the yoga dataset.
In addition, summarize the three variables (IL6.after.Yoga,
IL6.after.Movement, and IL6.difference) using
the inspect() and select() functions.
# Enter code here
yoga <- yoga %>%
  dplyr::mutate(IL6_difference = IL6_Yoga - IL6_Movement)
inspect(yoga %>% dplyr::select(IL6_Yoga, IL6_Movement, IL6_difference))
##
## quantitative variables:
## name class min Q1 median Q3
## 1 IL6_Yoga numeric 0.339 0.98050 1.9410 3.55200
## 2 IL6_Movement numeric 0.292 1.11400 1.5865 2.87850
## 3 IL6_difference numeric -5.970 -1.00775 -0.1390 1.31475
## max mean sd n missing
## 1 8.097 2.44988 1.866505 50 0
## 2 6.703 2.21982 1.582499 50 0
## 3 6.413 0.23006 2.272045 50 0
Perform a two-sided one-sample t-test comparing the mean of
IL6.difference to zero. See lab 5 for help.
# Enter code here
t.test(yoga$IL6_difference, mu = 0)
##
## One Sample t-test
##
## data: yoga$IL6_difference
## t = 0.71599, df = 49, p-value = 0.4774
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.415648 0.875768
## sample estimates:
## mean of x
## 0.23006
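The one-sample and paired outputs match because both rest on the same arithmetic: the mean difference divided by its standard error. As a rough check, the sketch below redoes that calculation by hand from the summary statistics for IL6_difference printed by inspect() above. The object names (mean_diff, sd_diff, and so on) are chosen here for illustration.

```r
# Summary statistics for IL6_difference, copied from the inspect() output above
mean_diff <- 0.23006
sd_diff   <- 2.272045
n         <- 50

se     <- sd_diff / sqrt(n)          # standard error of the mean difference
t_stat <- mean_diff / se             # test statistic for H0: mean difference = 0
df     <- n - 1                      # degrees of freedom
p_val  <- 2 * pt(-abs(t_stat), df)   # two-sided p-value
ci     <- mean_diff + c(-1, 1) * qt(0.975, df) * se  # 95% confidence interval

round(c(t = t_stat, df = df, p = p_val), 4)
round(ci, 4)
```

Up to rounding, these values should agree with the t, df, p-value, and confidence interval reported by both t.test() calls.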
Finally, look at the distribution of each of the IL-6 measurements
(IL6.after.Yoga and IL6.after.Movement) and
the distribution of the difference (IL6.difference)
using histograms. In the gf_histogram() function,
specify bins = 8 for all three histograms.
See labs 4 and 5 for how to generate histograms.
# Enter code here
gf_histogram(~ IL6_Yoga, data = yoga, bins = 8)
gf_histogram(~ IL6_Movement, data = yoga, bins = 8)
gf_histogram(~ IL6_difference, data = yoga, bins = 8)
The last measurement you will look at is the variable that records whether or not a subject got at least 8 hours of sleep the night after the session.
We would like to compare the proportion of people who got at least 8 hours of sleep after the yoga session to the proportion who did so after the movement session. Because the same group of people was measured both times, we again have paired data.
In the code chunk below, create a contingency table of
hours.sleep.after.Yoga by
hours.sleep.after.Movement named sleep.table
using the tally() function. Use
hours.sleep.after.Yoga as the outcome variable (y-variable)
and hours.sleep.after.Movement as the predictor variable
(x-variable). See lab 6 for help. When using the tally()
function, be sure to specify the parameter margins = FALSE
(or omit the margins parameter altogether) so that the
contingency table does not include totals.
# Enter code here
sleep.table <- tally(~ Sleep_Yoga + Sleep_Movement, data = yoga, margins = FALSE)
sleep.table
## Sleep_Movement
## Sleep_Yoga No Yes
## No 34 7
## Yes 4 5
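The two proportions being compared are the margins of this table. As a small sketch in base R, the counts from the output above are typed into a matrix (named sleep_counts here for illustration) and the row and column totals are converted to percentages.

```r
# Counts copied from the sleep.table output above
# (rows = sleep after yoga, columns = sleep after movement)
sleep_counts <- matrix(
  c(34, 4, 7, 5), nrow = 2,
  dimnames = list(Sleep_Yoga = c("No", "Yes"),
                  Sleep_Movement = c("No", "Yes"))
)

# Row totals describe the yoga session; column totals describe the movement session
pct_yoga     <- 100 * prop.table(margin.table(sleep_counts, 1))
pct_movement <- 100 * prop.table(margin.table(sleep_counts, 2))

pct_yoga
pct_movement
```

These percentages should match the tally() summaries reported earlier for each session: 18% "Yes" after yoga and 24% "Yes" after movement.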
To compare these paired proportions we need McNemar’s test. Using the
contingency table you just created (i.e., sleep.table),
perform McNemar’s test using mcnemar.test(sleep.table) in
the code chunk below.
# Enter code here
mcnemar.test(sleep.table)
##
## McNemar's Chi-squared test with continuity correction
##
## data: sleep.table
## McNemar's chi-squared = 0.36364, df = 1, p-value =
## 0.5465
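McNemar's test uses only the discordant pairs, that is, the subjects whose answer changed between sessions. The sketch below recomputes the continuity-corrected statistic by hand from the counts in sleep.table above and compares it with mcnemar.test(). The names sleep_counts, n_no_yes, and n_yes_no are chosen here for illustration.

```r
# Counts copied from the sleep.table output above
# (rows = sleep after yoga, columns = sleep after movement)
sleep_counts <- matrix(
  c(34, 4, 7, 5), nrow = 2,
  dimnames = list(Sleep_Yoga = c("No", "Yes"),
                  Sleep_Movement = c("No", "Yes"))
)

# Discordant pairs: subjects whose answer differs between the two sessions
n_no_yes <- sleep_counts["No", "Yes"]   # No after yoga, Yes after movement
n_yes_no <- sleep_counts["Yes", "No"]   # Yes after yoga, No after movement

# Continuity-corrected McNemar statistic and its chi-squared p-value (df = 1)
chi_sq <- (abs(n_no_yes - n_yes_no) - 1)^2 / (n_no_yes + n_yes_no)
p_val  <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# Compare with the built-in test
built_in <- mcnemar.test(sleep_counts)
c(by_hand = chi_sq, built_in = unname(built_in$statistic))
c(by_hand = p_val, built_in = built_in$p.value)
```

The 39 subjects who gave the same answer both times (34 "No"/"No" and 5 "Yes"/"Yes") do not enter the statistic at all.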
Please turn in your completed worksheet (DOCX, i.e., Word document), your RMD file, and your updated HTML file to Carmen by the due date. Be sure to upload all three files before you click the “Submit Assignment” tab to complete your submission.