Lab 11

Introduction

Overview: In this lab exercise, you will explore associations between variables when data come from a paired design.

Objectives: At the end of this lab you will be able to:

Perform a paired t-test to compare paired means;
Perform McNemar’s test to compare paired proportions.

Part 0: Preparing for the lab

Create a subdirectory named Lab 11 in the PUBHBIO 2210 Labs directory you created in your OneDrive folder in Lab 1.
Download the four lab files from Carmen while in the RStudio server:
1. lab-11-paired-blank.html
2. lab-11-paired-blank.Rmd
3. lab-11-paired-worksheet-blank.docx
4. yoga.csv
If you have not downloaded all of these files, do so now.
Save the four downloaded files in the PUBHBIO 2210 Labs/Lab 11 directory you just created. When working on labs, it is important to keep all related files in the same directory.
Change the author and date information in the lab header.
We will load a dataset from the yoga.csv file into R, using the read.csv() function.
The yoga.csv dataset is from a study of the potential stress-reduction benefits of hatha yoga. This study enrolled 50 women, and each participant came into the research center three times: once to participate in a 75-minute yoga session, once to participate in a movement control session (walking on a treadmill), and once to participate in a stationary control session (watching a video). For this lab we will only use data from the yoga session and the movement control session.
There are three outcome measures in this yoga.csv dataset: positive affect (which is roughly like happiness), the blood cytokine IL-6 (a marker of stress; higher is more stress), and hours of sleep. Both positive affect and IL-6 are continuous variables, and the sleep measure is binary, indicating whether or not the subject got at least 8 hours of sleep (yes/no).

In the code chunk below, read the dataset from the yoga.csv file into RStudio and store it in an object called yoga. Then, convert the yoga object to a tibble and print it. See labs 1 and 10 for help.

# Enter code here
yoga <- read.csv("yoga.csv")

yoga <- tibble::as_tibble(yoga)

yoga

## # A tibble: 50 × 7
##    Subject Positive.Affect.after.Yoga Positive.Affect.afte…¹
##      <int>                      <int>                  <int>
##  1    7002                         22                     19
##  2    7004                         33                     32
##  3    7005                         34                     29
##  4    7007                         30                     28
##  5    7010                         35                     29
##  6    7011                         17                     26
##  7    7013                         31                     25
##  8    7014                         29                     24
##  9    7015                         30                     21
## 10    7016                         30                     28
## # ℹ 40 more rows
## # ℹ abbreviated name: ¹Positive.Affect.after.Movement
## # ℹ 4 more variables: IL6.after.Yoga <dbl>,
## #   IL6.after.Movement <dbl>, hours.sleep.after.Yoga <chr>,
## #   hours.sleep.after.Movement <chr>

Part 1: Summarizing your data

Each outcome measure was obtained twice for each subject – once after the yoga session, and once after the movement session. Obtain summaries for all six outcome measurements by reporting the mean, SD, median, and range for the continuous outcomes and the frequencies and percents for the binary outcomes. You can use the inspect() command (or favstats() and tally() commands) to print these summaries. See labs 1 and 10 for help.

# Enter code here
# Rename the columns to shorter names; the code in the rest of this lab uses these names
names(yoga) <- c("Subject",
                 "Positive_Affect_Yoga",
                 "Positive_Affect_Movement",
                 "IL6_Yoga",
                 "IL6_Movement",
                 "Sleep_Yoga",
                 "Sleep_Movement")

favstats(~ Positive_Affect_Yoga, data = yoga)

##  min    Q1 median Q3 max  mean       sd  n missing
##   17 29.25     32 36  44 32.22 5.838961 50       0

favstats(~ Positive_Affect_Movement, data = yoga)

##  min Q1 median    Q3 max  mean       sd  n missing
##   12 23   26.5 30.75  44 26.54 6.679881 50       0

favstats(~ IL6_Yoga, data = yoga)

##    min     Q1 median    Q3   max    mean       sd  n
##  0.339 0.9805  1.941 3.552 8.097 2.44988 1.866505 50
##  missing
##        0

favstats(~ IL6_Movement, data = yoga)

##    min    Q1 median     Q3   max    mean       sd  n
##  0.292 1.114 1.5865 2.8785 6.703 2.21982 1.582499 50
##  missing
##        0

tally(~ Sleep_Yoga, data = yoga, format = "percent")

## Sleep_Yoga
##  No Yes 
##  82  18

tally(~ Sleep_Movement, data = yoga, format = "percent")

## Sleep_Movement
##  No Yes 
##  76  24

STOP! Answer Question 1 now.

Part 2: Paired t-test

You will start by testing whether the mean positive affect was the same after the yoga session as after the movement session. Because the same women were measured twice (once after yoga and once after the movement control), we do not have two independent groups of subjects, so a regular two-sample t-test is not appropriate. These data are paired, so we have to use a paired t-test.

To perform the paired t-test to compare Positive.Affect.after.Yoga to Positive.Affect.after.Movement we will use the t.test() function with the argument paired = TRUE. Because the two measurements are stored in separate columns, we cannot use the formula interface here, so we need an alternative form of the command. The with(yoga, ...) wrapper tells t.test() which dataset to use, and we give t.test() the two variables that constitute a pair.

For example,

# Not evaluated
with(mydata,
     t.test(Performance.after.Sitting, Performance.after.Running, paired = TRUE)
)

will perform a paired t-test to compare Performance.after.Sitting to Performance.after.Running using the dataset mydata.
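
To see why the pairing matters, here is a minimal sketch using made-up scores for six hypothetical subjects (the vectors before and after are illustrative only and are not part of yoga.csv). The subjects differ a lot from one another, but each one improves by a small amount. The paired test works on the within-subject differences, while the default two-sample test treats the two columns as independent groups and is swamped by the between-subject variation.

```r
# Made-up scores for six hypothetical subjects (NOT from yoga.csv).
# Subjects differ widely from each other, but each improves by 1 or 2 points.
before <- c(10, 20, 30, 40, 50, 60)
after  <- before + c(1, 2, 1, 2, 1, 2)

# Paired analysis: based on the six within-subject differences.
paired_result <- t.test(after, before, paired = TRUE)

# Two-sample (Welch) analysis: wrongly treats the columns as independent groups.
unpaired_result <- t.test(after, before)

# Compare the two p-values.
paired_result$p.value
unpaired_result$p.value
```

With these made-up numbers, the paired p-value is small while the two-sample p-value is large, even though both tests see exactly the same values.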

STOP! Answer Question 2 now.

In the code chunk below, perform the appropriate hypothesis test to compare Positive.Affect.after.Yoga to Positive.Affect.after.Movement using the yoga dataset.

# Enter code here
with(yoga,
     t.test(Positive_Affect_Yoga, Positive_Affect_Movement, paired = TRUE)
)

## 
##  Paired t-test
## 
## data:  Positive_Affect_Yoga and Positive_Affect_Movement
## t = 6.651, df = 49, p-value = 2.305e-08
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  3.963796 7.396204
## sample estimates:
## mean difference 
##            5.68

The output reports the test statistic (t), degrees of freedom, and p-value for the paired t-test, along with the mean of the differences and its corresponding 95% confidence interval.

STOP! Answer Questions 3–4 now.

Next you will compare the mean IL-6 after the yoga session to the mean after the movement session in the yoga dataset. In the code chunk below, follow the same steps you used above for positive affect, this time using the variables IL6.after.Yoga and IL6.after.Movement.

STOP! Answer Question 5 now.

# Enter code here
# Paired t-test for IL-6
with(yoga,
     t.test(IL6_Yoga, IL6_Movement, paired = TRUE)
)

## 
##  Paired t-test
## 
## data:  IL6_Yoga and IL6_Movement
## t = 0.71599, df = 49, p-value = 0.4774
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.415648  0.875768
## sample estimates:
## mean difference 
##         0.23006

STOP! Answer Question 6 now.

We know that the paired t-test is really just a one-sample t-test on the differences. You will confirm this by creating a new variable that contains the differences and performing a one-sample t-test of whether the mean difference is equal to zero.

We can create a “difference” variable using the mutate() function. For example,

# Not evaluated
mydata <- mydata %>% mutate(Performance.difference =
                            Performance.after.Sitting - Performance.after.Running) 
inspect(mydata %>% select(Performance.after.Sitting, Performance.after.Running, Performance.difference))

creates a new variable, Performance.difference, which equals Performance.after.Sitting minus Performance.after.Running in the mydata dataset. It then summarizes the variables Performance.after.Sitting, Performance.after.Running, and Performance.difference using the inspect() and select() commands.

In the code chunk below, create a new variable IL6.difference which equals IL6.after.Yoga minus IL6.after.Movement in the yoga dataset. In addition, summarize the three variables (IL6.after.Yoga, IL6.after.Movement, and IL6.difference) using the inspect() and select() functions.

# Enter code here
yoga <- yoga %>%
  dplyr::mutate(IL6_difference = IL6_Yoga - IL6_Movement)

inspect(yoga %>% dplyr::select(IL6_Yoga, IL6_Movement, IL6_difference))

## 
## quantitative variables:  
##             name   class    min       Q1  median      Q3
## 1       IL6_Yoga numeric  0.339  0.98050  1.9410 3.55200
## 2   IL6_Movement numeric  0.292  1.11400  1.5865 2.87850
## 3 IL6_difference numeric -5.970 -1.00775 -0.1390 1.31475
##     max    mean       sd  n missing
## 1 8.097 2.44988 1.866505 50       0
## 2 6.703 2.21982 1.582499 50       0
## 3 6.413 0.23006 2.272045 50       0

Perform a two-sided one-sample t-test comparing the mean of IL6.difference to zero. See lab 5 for help.

# Enter code here
t.test(yoga$IL6_difference, mu = 0)

## 
##  One Sample t-test
## 
## data:  yoga$IL6_difference
## t = 0.71599, df = 49, p-value = 0.4774
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##  -0.415648  0.875768
## sample estimates:
## mean of x 
##   0.23006
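
As a check on where these numbers come from, the sketch below recomputes the test statistic, p-value, and confidence interval by hand from the summary values printed by inspect() for IL6_difference. The variable names (mean_diff, sd_diff, and so on) are chosen here for illustration; because the inputs are the rounded values from the output above, the results should agree with t.test() only up to rounding.

```r
# Summary values copied from the inspect() output for IL6_difference.
mean_diff <- 0.23006
sd_diff   <- 2.272045
n         <- 50

# Standard error of the mean difference.
se_diff <- sd_diff / sqrt(n)

# Test statistic for H0: mean difference = 0, with n - 1 degrees of freedom.
t_stat <- mean_diff / se_diff

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# 95% confidence interval for the mean difference.
ci <- mean_diff + c(-1, 1) * qt(0.975, df = n - 1) * se_diff

t_stat
p_value
ci
```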

STOP! Answer Questions 7–8 now.

Finally, look at the distribution of each of the IL-6 measurements (IL6.after.Yoga and IL6.after.Movement) and also the distribution of the difference (IL6.difference) using histograms. In the gf_histogram() function, specify the parameter bins = 8 for all three histograms. See labs 4 and 5 for how to generate histograms.

# Enter code here
gf_histogram(~ IL6_Yoga, data = yoga, bins = 8)

gf_histogram(~ IL6_Movement, data = yoga, bins = 8)

gf_histogram(~ IL6_difference, data = yoga, bins = 8)

STOP! Answer Question 9 now.

Part 3: McNemar’s test

The last measurement you will look at is the variable that records whether or not a subject got at least 8 hours of sleep the night after the session.

We would like to compare the proportion of people who got 8 hours of sleep after the yoga session to the proportion of people who got 8 hours of sleep after the movement session. But it was the same group of people measured both times. So again we have paired data.

In the code chunk below, use the tally() function to create a contingency table of hours.sleep.after.Yoga by hours.sleep.after.Movement, and name it sleep.table. Use hours.sleep.after.Yoga as the outcome variable (y-variable) and hours.sleep.after.Movement as the predictor variable (x-variable). See lab 6 for help. When calling tally(), be sure to specify the parameter margins = FALSE (or leave the margins parameter out entirely) so that the contingency table does not include totals.

# Enter code here
sleep.table <- tally(~ Sleep_Yoga + Sleep_Movement, data = yoga, margins = FALSE)
sleep.table

##           Sleep_Movement
## Sleep_Yoga No Yes
##        No  34   7
##        Yes  4   5

STOP! Answer Question 10 now.

To compare these paired proportions we need McNemar’s test. Using the contingency table you just created (i.e., sleep.table), perform McNemar’s test using mcnemar.test(sleep.table) in the code chunk below.

# Enter code here
mcnemar.test(sleep.table)

## 
##  McNemar's Chi-squared test with continuity correction
## 
## data:  sleep.table
## McNemar's chi-squared = 0.36364, df = 1, p-value =
## 0.5465
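
The statistic above depends only on the discordant pairs, that is, the subjects whose sleep outcome differed between the two sessions. The sketch below rebuilds the table from the printed counts and recomputes the continuity-corrected statistic by hand. The object names (sleep_counts, n_move_only, n_yoga_only) are chosen here for illustration.

```r
# Rebuild the 2 x 2 table of paired sleep outcomes from the printed counts.
# matrix() fills column by column: Movement "No" first, then Movement "Yes".
sleep_counts <- matrix(c(34, 4, 7, 5), nrow = 2,
                       dimnames = list(Sleep_Yoga = c("No", "Yes"),
                                       Sleep_Movement = c("No", "Yes")))

# Marginal proportions with 8+ hours of sleep after each session.
prop_yoga     <- sum(sleep_counts["Yes", ]) / sum(sleep_counts)  # 9 / 50
prop_movement <- sum(sleep_counts[, "Yes"]) / sum(sleep_counts)  # 12 / 50

# Discordant pairs: outcome differed between the two sessions.
n_move_only <- sleep_counts["No", "Yes"]   # 8+ hours after movement only
n_yoga_only <- sleep_counts["Yes", "No"]   # 8+ hours after yoga only

# Continuity-corrected McNemar statistic and its p-value (1 degree of freedom).
chi_sq  <- (abs(n_move_only - n_yoga_only) - 1)^2 / (n_move_only + n_yoga_only)
p_value <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

chi_sq
p_value

# The built-in test on the rebuilt table should give the same statistic.
mcnemar.test(sleep_counts)
```

The 34 subjects who answered No both times and the 5 who answered Yes both times do not enter the statistic; only the 7 and 4 discordant subjects do.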

STOP! Answer Questions 11–13 now.

Please turn in your completed worksheet (DOCX, i.e., Word document), your RMD file, and your updated HTML file to Carmen by the due date. Be sure to upload all three files before you click on the “Submit Assignment” tab to complete your submission.