Complete all Exercises, and write out your interpretation in a summary report as outlined in the Summary section at the end of this page.
The goal of this assignment is to test for differences in exercise capacity measures between males and females, and for a difference in running distance between two sports-drink conditions.
Your RStudio window has four panels.
Your R Markdown file (this document) is in the upper left panel.
The panel on the lower left is where the action happens. It’s called the console. Every time you launch RStudio, it will have the same text at the top of the console telling you the version of R that you’re running. Below that information is the prompt. As its name suggests, this prompt is really a request, a request for a command. Initially, interacting with R is all about typing commands and interpreting the output. These commands and their syntax have evolved over decades (literally) and now provide what many users feel is a fairly natural way to access data and organize, describe, and invoke statistical computations.
The panel in the upper right contains your workspace as well as a history of the commands that you’ve previously entered.
Any plots that you generate will show up in the panel in the lower right corner. This is also where you can browse your files, access help, manage packages, etc.
R is an open-source programming language, meaning that users can
contribute packages that make our lives easier, and we can use them for
free. For this lab, and many others in the future, we will use the
following R packages that are included in the tidyverse
set:
* dplyr: for data wrangling
* ggplot2: for data visualization
The easiest way to get these packages is to install the whole
tidyverse with install.packages("tidyverse"), or to run this line
of code:
if(!require(tidyverse)){install.packages("tidyverse")}
Next, you need to load the packages in your working environment. We
do this with the library function. Note that you only need
to install packages once, but you need to
load them each time you relaunch RStudio.
library(tidyverse)
Going forward you will be asked to load any relevant packages at the beginning of each lab.
The first thing that you should do is set your working directory, which is the folder where you will keep your data files and run your R code, so it will be specific to your computer. You can do this in RStudio by navigating to the folder in the Files tab in the lower right panel and clicking More -> Set As Working Directory.
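The working directory can also be checked and set from the console. The lines below are a minimal sketch using the base R functions getwd() and setwd(); tempdir() is used only as a stand-in folder so that the example runs on any computer, and you would replace it with the path to your own course folder.

```r
# Remember the folder R is currently working in
original_dir <- getwd()
print(original_dir)

# Switch to another folder.
# tempdir() is a placeholder; use the path to your own course folder instead.
setwd(tempdir())
print(getwd())

# Switch back to the folder we started in
setwd(original_dir)
```

If getwd() does not print the folder that holds your data file, read_csv will not find the file by name alone.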
If you haven’t already, please download the data file from Canvas into your working directory. To get started with this assignment, run the following command to load the data:
VO2Data <- read_csv("VO2maxData (1).csv")
If RStudio reported an error, most likely the working directory was not set properly or the filename you entered does not exactly match the actual filename (R is case sensitive!). If RStudio didn’t report any errors when loading the data, then you’re good to go!
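If the file will not load, it can help to ask R directly whether it can see the file. The helper below, check_data_file, is a hypothetical function written for this illustration (it is not part of any package). It uses the base R functions file.exists() and list.files() to report what is actually in the working directory.

```r
# File name copied from the read_csv command above
data_file <- "VO2maxData (1).csv"

# Hypothetical helper: returns TRUE if the file is in the working directory,
# otherwise prints the folder and the CSV files R can see there
check_data_file <- function(path) {
  if (file.exists(path)) {
    message("Found: ", path)
    return(TRUE)
  }
  csv_files <- list.files(pattern = "\\.csv$")
  message("Not found in ", getwd(), ": ", path)
  message("CSV files in this folder: ", paste(csv_files, collapse = ", "))
  FALSE
}

found <- check_data_file(data_file)
```

Comparing the listed file names with the name in your read_csv command, character by character, usually reveals the mismatch.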
Variable Names
To view the names of the variables, type the command:
names(VO2Data)
## [1] "ID" "Age" "Weight" "HR" "Sex" "VO2max" "RPE"
## [8] "nVO2max"
# assign data to variables
age <- VO2Data$Age
sex <- VO2Data$Sex
weight <- VO2Data$Weight
HR <- VO2Data$HR
VO2max <- VO2Data$VO2max
Independent t test
Exercise: Perform independent t-tests on the dependent variables to determine differences between males and females.
Edit the code below by adding the remaining variables of interest.
The first two dependent variables, weight and heart rate (HR),
have already been set up.
# perform independent t-tests on these two dependent variables between sexes
t.test(weight~sex)
##
## Welch Two Sample t-test
##
## data: weight by sex
## t = -5.3612, df = 92.068, p-value = 6.096e-07
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -19.158206 -8.800704
## sample estimates:
## mean in group 0 mean in group 1
## 70.85324 84.83270
t.test(HR~sex)
##
## Welch Two Sample t-test
##
## data: HR by sex
## t = -1.0559, df = 71.928, p-value = 0.2945
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -11.723951 3.604689
## sample estimates:
## mean in group 0 mean in group 1
## 139.4324 143.4921
# simply copy the t.test code, paste, and change the dependent variable to VO2max
t.test(VO2max~sex)
##
## Welch Two Sample t-test
##
## data: VO2max by sex
## t = -4.7754, df = 82.629, p-value = 7.668e-06
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -10.609258 -4.369944
## sample estimates:
## mean in group 0 mean in group 1
## 38.91135 46.40095
t.test(age~sex)
##
## Welch Two Sample t-test
##
## data: age by sex
## t = 0.40774, df = 61.426, p-value = 0.6849
## alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
## 95 percent confidence interval:
## -3.231933 4.887874
## sample estimates:
## mean in group 0 mean in group 1
## 31.62162 30.79365
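For your report you may want to pull individual numbers out of a t-test result rather than reading them off the printed output. The sketch below uses simulated stand-in data (the data frame toy and its values are made up, not the course data) to show the formula interface with a data argument and the pieces stored in the result. It also shows why the output above says "Welch": t.test does not assume equal variances unless you set var.equal = TRUE.

```r
set.seed(1)

# Simulated stand-in data, NOT the course data set
toy <- data.frame(
  Sex    = rep(c(0, 1), each = 30),
  Weight = c(rnorm(30, mean = 70, sd = 10),
             rnorm(30, mean = 85, sd = 12))
)

# Formula interface: dependent variable ~ grouping variable
# Default is Welch's test (var.equal = FALSE)
welch <- t.test(Weight ~ Sex, data = toy)

welch$estimate   # the two group means
welch$conf.int   # 95% CI for (mean of group 0 - mean of group 1)
welch$p.value    # two-sided p-value
welch$parameter  # Welch degrees of freedom (usually not a whole number)

# Student's t-test, which assumes equal variances in the two groups
pooled <- t.test(Weight ~ Sex, data = toy, var.equal = TRUE)
pooled$parameter # df = n1 + n2 - 2 = 58
```

Note that the sign of t and of the confidence interval depends on group order: the difference is taken as group 0 minus group 1, so check how Sex is coded in your data before describing the direction of a difference.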
Now let’s perform a dependent (paired) t test comparing running distance after a carbohydrate-only drink with running distance after a carbohydrate + protein drink. In this study, researchers wanted to know whether a new carbohydrate-protein drink leads to a difference in performance compared with a carbohydrate-only sports drink. To do this, they recruited 20 participants, each of whom performed two trials in which they ran as far as possible in 2 hours on a treadmill. In one trial they drank the carbohydrate-only drink, and in the other they drank the carbohydrate-protein drink. The order of the trials was counterbalanced, and the distance run in each trial was recorded.
Paired t test
Verify that the carb_vs_carbprotein.csv file is in your working directory. Import the dataset, if needed.
# paired t-test on macronutrient data
MacroData <- read_csv("carb_vs_carbprotein.csv")
Exercise: Assign data to a variable called
carb_prot by copying the code for carb and
pasting it under the carbohydrate + protein comment, and changing the
variable after the “$” sign.
# carbohydrate
carb <- MacroData$carb
# carbohydrate + protein
carb_prot <- MacroData$carb_protein
Exercise: Get summary statistics of the running distances for both carb and carb_protein conditions.
# mean +/- sd running distance for carb consumption
summarise(MacroData, carbo = mean(carb), sd = sd(carb))
# mean +/- sd running distance for carb + protein consumption
summarise(MacroData, carbo_protein = mean(carb_protein), sd = sd(carb_protein))
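Both conditions can also be summarised in a single call, which gives one row that is easy to copy into a table. This is a sketch on a tiny made-up data frame (toy, with invented distances), assuming only that the columns are named carb and carb_protein as in the code above; the output column names are arbitrary choices for this example.

```r
library(dplyr)

# Made-up running distances for four imaginary participants
toy <- data.frame(
  carb         = c(10.1, 9.8, 10.4, 10.0),
  carb_protein = c(10.3, 9.9, 10.5, 10.2)
)

# One row with the mean and SD of each condition
stats <- summarise(
  toy,
  carb_mean         = mean(carb),
  carb_sd           = sd(carb),
  carb_protein_mean = mean(carb_protein),
  carb_protein_sd   = sd(carb_protein)
)

stats
```

Giving each statistic its own name avoids ending up with two columns that are both called sd.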
Now let’s determine whether the mean difference in running distance between the two macronutrient consumption conditions is statistically significant.
Exercise: Perform a paired t test on the
carb vs. carb + protein data by deleting the “#” before the
t.test function and then clicking Knit.
# paired test
t.test(carb, carb_prot, paired=TRUE)
##
## Paired t-test
##
## data: carb and carb_prot
## t = -6.3524, df = 19, p-value = 4.283e-06
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.18014508 -0.09085492
## sample estimates:
## mean difference
## -0.1355
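A paired t test is a one-sample t test on the within-participant differences: t = mean(d) / (sd(d) / sqrt(n)), with n - 1 degrees of freedom. That is why the output above reports df = 19 for 20 participants. Working backwards from that output, the SD of the differences is roughly 0.1355 × sqrt(20) / 6.3524 ≈ 0.095. The sketch below checks the equivalence on simulated data; the vectors carb_toy and carb_prot_toy are made up for illustration and are not the study data.

```r
set.seed(42)
n <- 20

# Simulated paired measurements, NOT the study data
carb_toy      <- rnorm(n, mean = 10, sd = 1)
carb_prot_toy <- carb_toy + rnorm(n, mean = 0.1, sd = 0.1)

# Within-participant differences, in the same order t.test uses (x - y)
d <- carb_toy - carb_prot_toy

# t statistic computed by hand from the differences
manual_t <- mean(d) / (sd(d) / sqrt(n))

# The same test two ways: paired test, and one-sample test on the differences
paired     <- t.test(carb_toy, carb_prot_toy, paired = TRUE)
one_sample <- t.test(d, mu = 0)

manual_t
paired$statistic
one_sample$statistic
```

All three values should agree, and the mean difference is negative whenever the second condition has the larger mean.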
Now press Knit and view the resulting page, which will display the results of your analysis above and the information needed to complete this assignment in an easy-to-read format.
In your summary report, you should be able to:
* identify the research variables (independent, dependent), research questions, and null hypotheses in both scenarios
* make conclusions about the differences in the dependent variables between male and female participants in the exercise capacity sample
* tabulate the mean +/- SD running distances for the carbohydrate-only and carbohydrate + protein conditions
* make a conclusion about the difference in distance run between carbohydrate vs. carbohydrate + protein consumption
That was a short introduction to R and RStudio, but we will provide you with more functions and a more complete sense of the language as the course progresses. You might find the following tips and resources helpful.
In this course we will be using the dplyr (for data
wrangling) and ggplot2 (for data visualization) packages
extensively. If you are googling for R code, make sure to also include
these package names in your search query. For example, instead of
googling “scatterplot in R”, google “scatterplot in R with
ggplot2”.
The following cheatsheets may come in handy throughout the course. Note that some of the code on these cheatsheets may be too advanced for this course; however, the majority of it will become useful as you progress through the course material.
While you will get plenty of exercise working with these packages in the labs of this course, if you would like further opportunities to practice we recommend checking out the relevant courses at DataCamp.
This is a derivative of an OpenIntro lab, and is released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States license.