Data analysis performed for Charlotte Lemgart’s physiology of exercise lab report
Research has shown mixed relationships between anaerobic and aerobic capacity. In this study hockey players participating in the NHL combine were tested both anaerobically and aerobically. The goal of this analysis is to test whether there is a relationship between their peak aerobic power and anaerobic power. Associations will be tested between the max aerobic test variables (both VO2 max in ml/kg/min and peak aerobic test watts), Wingate peak watts, and vertical jump average watts (Lewis equation, jump with a pause).
compare Wingate peak watts to VO2 max watts and VO2 max ml kg min to peak wingate watt kg compare lewis average pause watts to the peak watts and to VO2 max watts to see if there is any relationships and if jump capacity has an impact
Underlying this analysis is the belief that balanced and similiar training between the athletes would result in a positive relationship between these variables. However, it is possible that factors such as types of training varying between players, or personal characteristics may result in a weak or no relationship between these variables.
# Load necessary R packages
library(dplyr) #For data manipulation
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2) #For data manipulation related to plotting
library(ggplot2) #For plotting
library(pastecs) #For descriptive statistics
## Loading required package: boot
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
# Set working directory - mac file system
setwd("~/physExLab/NHLcombinedata")
# Load data
nhldata<-read.csv("data.csv")
#Create weight in kg variable
nhldata<-mutate(nhldata, Weight_kg = Weight_lb/2.2)
#Create aerobic test peak watts / kg variable
nhldata<-mutate(nhldata, VO2max_watts_kg= VO2max_watts/Weight_kg)
#Create Lewis average watts / kg
nhldata<-mutate(nhldata, LewisAvg_.Pause_watts_kg= LewisAvg_.Pause_watts/Weight_kg)
#Create dataframe with only the variables we are interested in
nhldata<-select(nhldata,Weight_kg, WingatePeak_Watts, PeakWingate_W_kg, VO2max_l_min, VO2max_ml_kg_min, VO2max_watts, VO2max_watts_kg, LewisAvg_.Pause_watts, LewisAvg_.Pause_watts_kg)
Here is a summary of the resulting dataframe. As you can see there are missing values in most variables.
str(nhldata)
## 'data.frame': 100 obs. of 9 variables:
## $ Weight_kg : num 95.1 80 87.4 82.1 81.2 ...
## $ WingatePeak_Watts : int 1136 1017 1171 933 1089 1223 1210 917 1047 1191 ...
## $ PeakWingate_W_kg : num 12 12.7 13.4 11.4 13.4 14.1 13 12.8 13.9 14 ...
## $ VO2max_l_min : num 4.75 3.99 4.94 5.64 4.92 4.41 5.19 4.17 4.39 5.45 ...
## $ VO2max_ml_kg_min : num 50 49.9 56.5 68.7 60.6 50.6 55.8 57.9 58 63.8 ...
## $ VO2max_watts : int 520 400 520 520 520 520 520 440 480 520 ...
## $ VO2max_watts_kg : num 5.47 5 5.95 6.33 6.41 ...
## $ LewisAvg_.Pause_watts : num 1390 1330 1372 1357 1370 ...
## $ LewisAvg_.Pause_watts_kg: num 14.6 16.6 15.7 16.5 16.9 ...
summary(nhldata)
## Weight_kg WingatePeak_Watts PeakWingate_W_kg VO2max_l_min
## Min. : 69.68 Min. : 784 Min. : 9.00 Min. :3.640
## 1st Qu.: 80.83 1st Qu.:1057 1st Qu.:12.80 1st Qu.:4.383
## Median : 85.30 Median :1142 Median :13.40 Median :4.725
## Mean : 84.99 Mean :1135 Mean :13.42 Mean :4.773
## 3rd Qu.: 89.15 3rd Qu.:1195 3rd Qu.:14.00 3rd Qu.:5.185
## Max. :109.41 Max. :1502 Max. :15.90 Max. :6.280
## NA's :4 NA's :4 NA's :6
## VO2max_ml_kg_min VO2max_watts VO2max_watts_kg LewisAvg_.Pause_watts
## Min. :44.90 Min. :400.0 Min. :4.396 Min. :1125
## 1st Qu.:51.77 1st Qu.:480.0 1st Qu.:5.649 1st Qu.:1302
## Median :56.30 Median :520.0 Median :6.064 Median :1416
## Mean :56.41 Mean :505.5 Mean :5.981 Mean :1414
## 3rd Qu.:60.58 3rd Qu.:550.0 3rd Qu.:6.334 3rd Qu.:1508
## Max. :68.70 Max. :640.0 Max. :7.556 Max. :1865
## NA's :6 NA's :6 NA's :6 NA's :4
## LewisAvg_.Pause_watts_kg
## Min. :12.85
## 1st Qu.:15.94
## Median :16.62
## Mean :16.66
## 3rd Qu.:17.29
## Max. :20.63
## NA's :4
But we are really only interested in cases that have complete data - so need to eliminate cases with missing data (note - if we had plans for a publication we would be much more cautious about doing this)
As you can see we have gone from 100 subjects to 94 subjects - so 6 subjects had incomplete data in the variables of interest.
nhldata <- nhldata[complete.cases(nhldata), ]
str(nhldata)
## 'data.frame': 94 obs. of 9 variables:
## $ Weight_kg : num 95.1 80 87.4 82.1 81.2 ...
## $ WingatePeak_Watts : int 1136 1017 1171 933 1089 1223 1210 917 1047 1191 ...
## $ PeakWingate_W_kg : num 12 12.7 13.4 11.4 13.4 14.1 13 12.8 13.9 14 ...
## $ VO2max_l_min : num 4.75 3.99 4.94 5.64 4.92 4.41 5.19 4.17 4.39 5.45 ...
## $ VO2max_ml_kg_min : num 50 49.9 56.5 68.7 60.6 50.6 55.8 57.9 58 63.8 ...
## $ VO2max_watts : int 520 400 520 520 520 520 520 440 480 520 ...
## $ VO2max_watts_kg : num 5.47 5 5.95 6.33 6.41 ...
## $ LewisAvg_.Pause_watts : num 1390 1330 1372 1357 1370 ...
## $ LewisAvg_.Pause_watts_kg: num 14.6 16.6 15.7 16.5 16.9 ...
summary(nhldata)
## Weight_kg WingatePeak_Watts PeakWingate_W_kg VO2max_l_min
## Min. : 69.68 Min. : 784 Min. : 9.00 Min. :3.640
## 1st Qu.: 80.49 1st Qu.:1050 1st Qu.:12.80 1st Qu.:4.383
## Median : 84.95 Median :1142 Median :13.40 Median :4.725
## Mean : 84.73 Mean :1134 Mean :13.41 Mean :4.773
## 3rd Qu.: 89.07 3rd Qu.:1195 3rd Qu.:14.00 3rd Qu.:5.185
## Max. :109.41 Max. :1502 Max. :15.90 Max. :6.280
## VO2max_ml_kg_min VO2max_watts VO2max_watts_kg LewisAvg_.Pause_watts
## Min. :44.90 Min. :400.0 Min. :4.396 Min. :1125
## 1st Qu.:51.77 1st Qu.:480.0 1st Qu.:5.649 1st Qu.:1301
## Median :56.30 Median :520.0 Median :6.064 Median :1413
## Mean :56.41 Mean :505.5 Mean :5.981 Mean :1409
## 3rd Qu.:60.58 3rd Qu.:550.0 3rd Qu.:6.334 3rd Qu.:1500
## Max. :68.70 Max. :640.0 Max. :7.556 Max. :1865
## LewisAvg_.Pause_watts_kg
## Min. :12.85
## 1st Qu.:15.90
## Median :16.62
## Mean :16.63
## 3rd Qu.:17.17
## Max. :20.63
Descriptive summary of the variables of interest:
options(scipen=100) #supresses scientific notation of output
options(digits=2) #limits output to 2 significant digits
stat.desc(nhldata, basic=F) #runs basic stats
## Weight_kg WingatePeak_Watts PeakWingate_W_kg VO2max_l_min
## median 84.955 1142.50 13.400 4.725
## mean 84.725 1133.59 13.415 4.773
## SE.mean 0.647 13.15 0.119 0.056
## CI.mean.0.95 1.285 26.10 0.236 0.111
## var 39.387 16243.43 1.330 0.292
## std.dev 6.276 127.45 1.153 0.540
## coef.var 0.074 0.11 0.086 0.113
## VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median 56.300 520.00 6.064
## mean 56.412 505.53 5.981
## SE.mean 0.557 5.70 0.065
## CI.mean.0.95 1.106 11.32 0.128
## var 29.174 3057.24 0.393
## std.dev 5.401 55.29 0.627
## coef.var 0.096 0.11 0.105
## LewisAvg_.Pause_watts LewisAvg_.Pause_watts_kg
## median 1413.0 16.624
## mean 1409.1 16.628
## SE.mean 15.1 0.119
## CI.mean.0.95 30.0 0.236
## var 21482.0 1.329
## std.dev 146.6 1.153
## coef.var 0.1 0.069
A look at the univariate distributions. These are separated out to account for the different Y axis scaling - there are other ways to deal with scaling but for now this seemed the best way (comparing apples to apples on the Y axes of all of these graphs)
Decided to use Boxplots - The box plot (a.k.a. box and whisker diagram) is a standardized way of displaying the distribution of data based on the five number summary: minimum, first quartile, median, third quartile, and maximum. Outliers are plotted as individual points.
Data with the common Y axis of Watts:
watts<-select(nhldata,LewisAvg_.Pause_watts, WingatePeak_Watts,VO2max_watts)
ggplot(melt(watts), aes(x = variable, y = value)) + geom_boxplot()
## No id variables; using all as measure variables
Data with the common Y axis of Watts/kg:
watts_kg<-select(nhldata,LewisAvg_.Pause_watts_kg, PeakWingate_W_kg, VO2max_watts_kg)
ggplot(melt(watts_kg), aes(x = variable, y = value)) + geom_boxplot()
## No id variables; using all as measure variables
Data with the common Y axis of oxygen consumption (one for L/min and one for L/kg/min):
vo2l<-select(nhldata,VO2max_l_min)
vo2lkg<-select(nhldata,VO2max_ml_kg_min)
ggplot(melt(vo2l), aes(x = variable, y = value)) + geom_boxplot()
## No id variables; using all as measure variables
ggplot(melt(vo2lkg), aes(x = variable, y = value)) + geom_boxplot()
## No id variables; using all as measure variables
Next we can compare bivariate distributions (i.e. scatter plots) to look at associations. This first draft will only include the scatter plots without assumptions of the best approach to fitting the data with a function (i.e. linear model).
There is clearly going to be a strong linear relationship between VO2max in L/min and in L/kg/min:
qplot(VO2max_l_min,VO2max_ml_kg_min,data=nhldata,geom=c("point"))
To help visualize what is happenning (why there is variability) we can weight the points based on the players weight - so that larger points indicate a heavier player. As you can see - heavier players may have larger VO2 max in L/min, but they have lower VO2 max in ml/kg/min - which makes sense.
qplot(VO2max_l_min,VO2max_ml_kg_min,data=nhldata,geom=c("point"),size = Weight_kg)
VO2 L/min and VO2 watts - here I have watts on the X axis because technically the power being produced is a determinant of the oxygen need (and therefore consumed). One thing to note is that VO2 max watts seems to be a factor variable (not continuous), this is because of the mode of testing the players. They use a Monark bike and each stage of the max test is a pre determined power. So the max watts achieved is based on the state. Therefore, variation in actual watts due to variations in pedal RPMs from the RPMs required in the test would not be captured.
qplot(VO2max_watts, VO2max_l_min,data=nhldata,geom=c("point"))
Here we look at the same plot as above but with the points weighted based on player weight. There is a much less clear pattern than the plot of VO2 L/min and VO2 ml/kg/min when weighting based on weight.
qplot(VO2max_watts, VO2max_l_min,data=nhldata,geom=c("point"),size = Weight_kg)
Wingate power - comparision between Watts and Watts/Kg:
qplot(WingatePeak_Watts, PeakWingate_W_kg,data=nhldata,geom=c("point"))
Wingate power - comparision between Watts and Watts/Kg, when weighting the points based on weight, just like in VO2, it is clear that those that have more mass have greater Watts, but not necessarily greater Watts/Kg.
qplot(WingatePeak_Watts, PeakWingate_W_kg,data=nhldata,geom=c("point"),size = Weight_kg)
One last set of “getting familiar with the data” analysis within each major variable. Comparing vertical jump average power watts to watts/kg. Expect the same as above - but a useful exercise in getting familiar with the data. As above - first without weight based on weight, and then with such weighting.
qplot(LewisAvg_.Pause_watts, LewisAvg_.Pause_watts_kg,data=nhldata,geom=c("point"))
qplot(LewisAvg_.Pause_watts, LewisAvg_.Pause_watts_kg,data=nhldata,geom=c("point"),size = Weight_kg)
Now it is time to get to the tasks: 1. Compare Wingate peak watts to VO2 max watts
qplot(WingatePeak_Watts, VO2max_watts,data=nhldata,geom=c("point"))
qplot(PeakWingate_W_kg, VO2max_ml_kg_min,data=nhldata,geom=c("point"))
qplot(LewisAvg_.Pause_watts, WingatePeak_Watts,data=nhldata,geom=c("point"),size = Weight_kg)
lewis average pause watts to the peak aerobic watts -with visualization of the impact of weight
qplot(LewisAvg_.Pause_watts, VO2max_watts,data=nhldata,geom=c("point"),size = Weight_kg)
Next we want to see if there is a pattern in the distribution of vertical jump power across the bivariate distribution of wingate and aerobic power
qplot(WingatePeak_Watts, VO2max_watts,data=nhldata,geom=c("point"),size = LewisAvg_.Pause_watts)
Finally we want to see if there is a pattern in the distribution of vertical jump power and body mass across the bivariate distribution of wingate and aerobic power - this may be too much in one graphic but it is a start:
qplot(WingatePeak_Watts, VO2max_watts,data=nhldata,geom=c("point"),size = LewisAvg_.Pause_watts, color = Weight_kg)
There could be more to come. Let’s start with the above and see where it takes the report. It may be interesting to put some of these into linear models and see what we get - but I want to do so cautiously, not just throw a model together to see what comes out of it. Once models are built, then the fit lines can be added to any / all of the graphics above.