KIN 6005 Assignment 3: Independent and Paired T-Tests

Complete all Exercises, and write out your interpretation in a summary report as outlined in the Summary section at the end of this page.

The goal of this assignment is to test for differences in exercise capacity measures between males and females (independent t-tests) and for a difference in running distance between two sports-drink conditions (paired t-test).

RStudio

Your RStudio window has four panels.

Your R Markdown file (this document) is in the upper left panel.

The panel on the lower left is where the action happens. It’s called the console. Every time you launch RStudio, it will have the same text at the top of the console telling you the version of R that you’re running. Below that information is the prompt. As its name suggests, this prompt is a request for a command. Initially, interacting with R is all about typing commands and interpreting the output. These commands and their syntax have evolved over decades (literally) and now provide what many users feel is a fairly natural way to access data and organize, describe, and invoke statistical computations.

The panel in the upper right contains your workspace as well as a history of the commands that you’ve previously entered.

Any plots that you generate will show up in the panel in the lower right corner. This is also where you can browse your files, access help, manage packages, etc.

R Packages

R is an open-source programming language, meaning that users can contribute packages that make our lives easier, and we can use them for free. For this lab, and many others in the future, we will use the following R packages that are included in the tidyverse set:

dplyr: for data wrangling
ggplot2: for data visualization

The easiest way to get these packages is to install the whole tidyverse with install.packages("tidyverse"), or to run this line of code, which installs it only if it cannot already be loaded:

if(!require(tidyverse)){install.packages("tidyverse")}

Next, you need to load the packages in your working environment. We do this with the library function. Note that you only need to install packages once, but you need to load them each time you relaunch RStudio.

library(tidyverse)

To run this code, you can

click on the green arrow at the top of the code chunk in the R Markdown (Rmd) file, or
highlight these lines, and hit the Run button on the upper right corner of the pane, or
type the code in the console.

Going forward you will be asked to load any relevant packages at the beginning of each lab.

Working Directory

The first thing that you should do is set your working directory, which is the folder that holds your data files and R Markdown file for this assignment, so its path will be unique to your computer. You can easily do this in RStudio by selecting the directory in the Files tab in the lower right panel and clicking More -> Set As Working Directory.
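The same step can also be done from the console. The lines below are a minimal sketch: the folder is a stand-in (tempdir() is used only so the code runs on any computer), and you would replace it with the path to your own assignment folder.

```r
# Remember where R is currently looking for files
old_wd <- getwd()
print(old_wd)

# Stand-in folder (an assumption for this sketch); replace with your own path,
# for example "~/Documents/KIN6005"
assignment_dir <- tempdir()
setwd(assignment_dir)

# Confirm that the working directory changed
print(getwd())

# Switch back so this sketch leaves your session as it found it
setwd(old_wd)
```

If setwd() reports that it cannot change the working directory, the path is usually misspelled or points to a folder that does not exist.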

Dataset: Exercise Capacity Data

If you haven’t already, please download the data file from Canvas into your working directory. To get started with this assignment, run the following command to load the data:

VO2Data <- read_csv("VO2maxData (1).csv")

If RStudio reports an error, most likely the working directory was not set properly or the filename you entered does not exactly match the actual filename (R is case sensitive!). If RStudio didn’t indicate any errors in loading the data, then you’re good to go!
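If you do get an error, a quick check like the following can help narrow it down. This is a sketch using base R only; it assumes the filename shown in the command above and simply reports whether R can see that file from the current working directory.

```r
# Filename as used in this assignment; edit it if your download is named differently
data_file <- "VO2maxData (1).csv"

# TRUE if the file is in the current working directory, FALSE otherwise
file_found <- file.exists(data_file)

if (file_found) {
  message("Found '", data_file, "' in ", getwd())
} else {
  message("Could not find '", data_file, "' in ", getwd())
  # List the CSV files R can see here, to spot spelling or case differences
  csv_here <- list.files(pattern = "\\.csv$", ignore.case = TRUE)
  print(csv_here)
}
```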

Variable Names

To view the names of the variables, type the command:

names(VO2Data)

## [1] "ID"      "Age"     "Weight"  "HR"      "Sex"     "VO2max"  "RPE"    
## [8] "nVO2max"

# assign data to variables
age <- VO2Data$Age
sex <- VO2Data$Sex
weight <- VO2Data$Weight
HR <- VO2Data$HR
VO2max <- VO2Data$VO2max
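Before comparing groups, it can help to check how many participants fall in each level of the grouping variable and to look at the group means and standard deviations. The sketch below uses a small made-up data frame called toy (these are not the course data) so that it runs on its own. With the real data you would use VO2Data in its place.

```r
# Made-up example values, for illustration only (not the Canvas dataset)
toy <- data.frame(
  Sex    = c(0, 0, 0, 1, 1, 1, 1),
  Weight = c(62, 70, 68, 80, 91, 85, 78)
)

# Number of participants in each group
group_sizes <- table(toy$Sex)
print(group_sizes)

# Mean and standard deviation of weight within each group
group_means <- tapply(toy$Weight, toy$Sex, mean)
group_sds   <- tapply(toy$Weight, toy$Sex, sd)
print(group_means)
print(group_sds)
```

The t-test output labels the groups by the values stored in Sex (0 and 1), so check the data documentation to confirm which value corresponds to which sex before writing your conclusions.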

Independent t test

Exercise: Perform independent t-tests on the dependent variables to determine differences between males and females.

Edit the code below by adding t-tests for the remaining variables of interest. The tests for the first two variables, weight and heart rate (HR), have already been written.

# perform independent t-tests on these two dependent variables between sexes
t.test(weight~sex)

## 
##  Welch Two Sample t-test
## 
## data:  weight by sex
## t = -5.3612, df = 92.068, p-value = 6.096e-07
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -19.158206  -8.800704
## sample estimates:
## mean in group 0 mean in group 1 
##        70.85324        84.83270
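To read this output, it helps to know that the confidence interval is for the mean in group 0 minus the mean in group 1. The short check below re-enters the numbers printed above (nothing new is computed from the data) and shows that the difference in sample means sits at the midpoint of the interval.

```r
# Values copied from the weight output above
mean_group0 <- 70.85324
mean_group1 <- 84.83270
ci_lower    <- -19.158206
ci_upper    <- -8.800704

# Estimated difference: group 0 minus group 1
diff_means <- mean_group0 - mean_group1
print(diff_means)

# A t-based interval is symmetric around the estimate
ci_midpoint <- (ci_lower + ci_upper) / 2
print(ci_midpoint)

# If the interval excludes 0, the difference is significant at the 5% level
excludes_zero <- ci_lower > 0 || ci_upper < 0
print(excludes_zero)
```

A negative difference here means that the group labelled 1 had the higher mean weight.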

t.test(HR~sex)

## 
##  Welch Two Sample t-test
## 
## data:  HR by sex
## t = -1.0559, df = 71.928, p-value = 0.2945
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -11.723951   3.604689
## sample estimates:
## mean in group 0 mean in group 1 
##        139.4324        143.4921

# simply copy the t.test code, paste, and change the dependent variable to VO2max
t.test(VO2max~sex)

## 
##  Welch Two Sample t-test
## 
## data:  VO2max by sex
## t = -4.7754, df = 82.629, p-value = 7.668e-06
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -10.609258  -4.369944
## sample estimates:
## mean in group 0 mean in group 1 
##        38.91135        46.40095

t.test(age~sex)

## 
##  Welch Two Sample t-test
## 
## data:  age by sex
## t = 0.40774, df = 61.426, p-value = 0.6849
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
##  -3.231933  4.887874
## sample estimates:
## mean in group 0 mean in group 1 
##        31.62162        30.79365
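For the summary report, you may find it easier to store a test result and pull out individual numbers instead of reading them off the printed output. The sketch below uses simulated data (the toy data frame and its values are invented for illustration) and also recomputes the Welch t statistic by hand, so you can see what t.test() reports.

```r
# Simulated example data, for illustration only (not the course data)
set.seed(6005)
toy <- data.frame(
  Sex    = rep(c(0, 1), each = 6),
  Weight = c(rnorm(6, mean = 70, sd = 8), rnorm(6, mean = 85, sd = 10))
)

# Same formula interface as above, with the data frame passed via data =
res <- t.test(Weight ~ Sex, data = toy)

# Pieces of the result that usually go into a report
print(res$statistic)  # t value
print(res$parameter)  # degrees of freedom (Welch-adjusted)
print(res$p.value)    # two-sided p-value
print(res$conf.int)   # 95% confidence interval for the difference in means
print(res$estimate)   # the two group means

# Welch t by hand: difference in means divided by its standard error
g0 <- toy$Weight[toy$Sex == 0]
g1 <- toy$Weight[toy$Sex == 1]
se_diff  <- sqrt(var(g0) / length(g0) + var(g1) / length(g1))
t_manual <- (mean(g0) - mean(g1)) / se_diff
print(t_manual)
```

By default, t.test() runs the Welch version of the test, which does not assume equal variances in the two groups. This is why the degrees of freedom in the output above are not whole numbers.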

Dataset: Carbohydrate vs. Carb+Protein Consumption

Now let’s perform a dependent (paired) t-test comparing running distance after a carbohydrate-only drink with running distance after a carbohydrate + protein drink. In this study, researchers would like to know whether a new carbohydrate-protein drink leads to a difference in performance compared to the carbohydrate-only sports drink. To do this, the researchers recruited 20 participants who each performed two trials in which they had to run as far as possible in 2 hours on a treadmill. In one of the trials they drank the carbohydrate-only drink and in the other trial they drank the carbohydrate-protein drink. The order of the trials was counterbalanced and the distance they ran in both trials was recorded.

Paired t test

Verify that the carb_vs_carbprotein.csv file is in your working directory. Import the dataset, if needed.

# paired t-test on macronutrient data
MacroData <- read_csv("carb_vs_carbprotein.csv")

Exercise: Assign data to a variable called carb_prot by copying the code for carb and pasting it under the carbohydrate + protein comment, and changing the variable after the “$” sign.

# carbohydrate
carb <- MacroData$carb
# carbohydrate + protein
carb_prot <- MacroData$carb_protein

Exercise: Get summary statistics of the running distances for both carb and carb_protein conditions.

# mean +/- sd running distance for carb consumption
summarise(MacroData, carbo = mean(carb), sd = sd(carb))

# mean +/- sd running distance for carb + protein consumption
summarise(MacroData, carbo_protein = mean(carb_protein), sd = sd(carb_protein))
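Because the summary report asks you to tabulate both conditions, it may be convenient to collect the two means and standard deviations in a single table. The following is a sketch in base R using invented distances (the toy data frame is not the study data). With the real data you would use MacroData in place of toy.

```r
# Invented running distances, for illustration only (not the study data)
toy <- data.frame(
  carb         = c(10.1, 11.3, 9.8, 10.6, 12.0, 10.9),
  carb_protein = c(10.3, 11.4, 10.0, 10.6, 12.3, 11.0)
)

# One row per condition, with its mean and standard deviation
summary_table <- data.frame(
  condition = names(toy),
  mean      = unname(sapply(toy, mean)),
  sd        = unname(sapply(toy, sd))
)
print(summary_table)
```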

Now let’s determine whether the mean difference in running distance between the two macronutrient consumption conditions is statistically significant.
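A paired t-test works on each participant's difference between the two conditions. It is the same as a one-sample t-test asking whether the mean of those differences is 0. The sketch below illustrates this with invented distances (carb_toy and carb_prot_toy are made-up names and values, not the study data).

```r
# Invented distances for six hypothetical participants
carb_toy      <- c(10.1, 11.3, 9.8, 10.6, 12.0, 10.9)
carb_prot_toy <- c(10.3, 11.4, 10.0, 10.6, 12.3, 11.0)

# Within-participant differences: carb minus carb + protein
diffs <- carb_toy - carb_prot_toy

# Paired t-test on the two columns
paired_res <- t.test(carb_toy, carb_prot_toy, paired = TRUE)

# One-sample t-test on the differences against a mean of 0
one_sample_res <- t.test(diffs, mu = 0)

# The same t statistic by hand: mean difference over its standard error
t_manual <- mean(diffs) / (sd(diffs) / sqrt(length(diffs)))

print(paired_res$statistic)
print(one_sample_res$statistic)
print(t_manual)
```

All three values agree, and the degrees of freedom are the number of pairs minus 1.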

Exercise: Perform a paired t-test on the carb vs. carb + protein data by deleting the “#” before the t.test function and then clicking Knit.

# paired test
t.test(carb, carb_prot, paired = TRUE)

## 
##  Paired t-test
## 
## data:  carb and carb_prot
## t = -6.3524, df = 19, p-value = 4.283e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.18014508 -0.09085492
## sample estimates:
## mean difference 
##         -0.1355
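The pieces of this output fit together in a simple way, which can help when you interpret it. The check below re-enters the printed t value, degrees of freedom, and mean difference, then rebuilds the standard error, the 95% confidence interval, and the p-value from them. Small discrepancies from the printed output are expected because the inputs are rounded.

```r
# Values copied from the paired t-test output above
mean_diff <- -0.1355
t_stat    <- -6.3524
df        <- 19

# t = mean difference / standard error, so the standard error is their ratio
se_diff <- mean_diff / t_stat
print(se_diff)

# 95% confidence interval: estimate +/- critical t value * standard error
t_crit <- qt(0.975, df)
ci <- mean_diff + c(-1, 1) * t_crit * se_diff
print(ci)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)
print(p_value)
```

Because the test was called as t.test(carb, carb_prot, paired=TRUE), the mean difference is carb minus carb_prot. A negative value means participants ran farther, on average, in the carbohydrate + protein trial.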

Summary

Now press Knit and view the resulting page, which will display the results of your analysis above and the information needed to complete this assignment in an easy-to-read format.

In your summary report, you should be able to:

identify the research variables (independent, dependent), research questions, and null hypotheses in both scenarios
make conclusions about the differences in the dependent variables between male and female participants in the exercise capacity sample
tabulate the mean +/- SD running distances for carbohydrate only and carbohydrate + protein consumption
make a conclusion about the difference in distance run between carbohydrate vs. carbohydrate + protein consumption

Resources for learning R and working in RStudio

That was a short introduction to R and RStudio, but we will provide you with more functions and a more complete sense of the language as the course progresses. You might find the following tips and resources helpful.

In this course we will be using dplyr (for data wrangling) and ggplot2 (for data visualization) extensively. If you are googling for R code, make sure to also include these package names in your search query. For example, instead of googling “scatterplot in R”, google “scatterplot in R with ggplot2”.
The following cheatsheets may come in handy throughout the course. Note that some of the code on these cheatsheets may be too advanced for this course; however, the majority of it will become useful as you progress through the course material.
While you will get plenty of exercise working with these packages in the labs of this course, if you would like further opportunities to practice we recommend checking out the relevant courses at DataCamp.

This is a derivative of an OpenIntro lab, and is released under an Attribution-NonCommercial-ShareAlike 3.0 United States license.