Lemgart Analysis - NHL draft combine data

Data analysis performed for Charlotte Lemgart’s physiology of exercise lab report

Research has shown mixed relationships between anaerobic and aerobic capacity. In this study hockey players participating in the NHL combine were tested both anaerobically and aerobically. The goal of this analysis is to test whether there is a relationship between their peak aerobic power and anaerobic power. Associations will be tested between the max aerobic test variables (both VO2 max in ml/kg/min and peak aerobic test watts), Wingate peak watts, and vertical jump average watts (Lewis equation, jump with a pause).

compare Wingate peak watts to VO2 max watts and VO2 max ml kg min to peak wingate watt kg compare lewis average pause watts to the peak watts and to VO2 max watts to see if there is any relationships and if jump capacity has an impact

Underlying this analysis is the belief that balanced and similiar training between the athletes would result in a positive relationship between these variables. However, it is possible that factors such as types of training varying between players, or personal characteristics may result in a weak or no relationship between these variables.

# Load necessary R packages
library(dplyr) #For data manipulation

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(reshape2) #For data manipulation related to plotting
library(ggplot2) #For plotting
library(pastecs) #For descriptive statistics

## Loading required package: boot
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last

# Set working directory - mac file system
setwd("~/physExLab/NHLcombinedata")

# Load data
nhldata<-read.csv("data.csv")

#Create weight in kg variable
nhldata<-mutate(nhldata, Weight_kg = Weight_lb/2.2)

#Create aerobic test peak watts / kg variable
nhldata<-mutate(nhldata, VO2max_watts_kg= VO2max_watts/Weight_kg)

#Create Lewis average watts / kg
nhldata<-mutate(nhldata, LewisAvg_.Pause_watts_kg= LewisAvg_.Pause_watts/Weight_kg)

#Create dataframe with only the variables we are interested in
nhldata<-select(nhldata,Weight_kg, WingatePeak_Watts, PeakWingate_W_kg, VO2max_l_min, VO2max_ml_kg_min, VO2max_watts, VO2max_watts_kg, LewisAvg_.Pause_watts, LewisAvg_.Pause_watts_kg)

Here is a summary of the resulting dataframe. As you can see there are missing values in most variables.

str(nhldata)

## 'data.frame':    100 obs. of  9 variables:
##  $ Weight_kg               : num  95.1 80 87.4 82.1 81.2 ...
##  $ WingatePeak_Watts       : int  1136 1017 1171 933 1089 1223 1210 917 1047 1191 ...
##  $ PeakWingate_W_kg        : num  12 12.7 13.4 11.4 13.4 14.1 13 12.8 13.9 14 ...
##  $ VO2max_l_min            : num  4.75 3.99 4.94 5.64 4.92 4.41 5.19 4.17 4.39 5.45 ...
##  $ VO2max_ml_kg_min        : num  50 49.9 56.5 68.7 60.6 50.6 55.8 57.9 58 63.8 ...
##  $ VO2max_watts            : int  520 400 520 520 520 520 520 440 480 520 ...
##  $ VO2max_watts_kg         : num  5.47 5 5.95 6.33 6.41 ...
##  $ LewisAvg_.Pause_watts   : num  1390 1330 1372 1357 1370 ...
##  $ LewisAvg_.Pause_watts_kg: num  14.6 16.6 15.7 16.5 16.9 ...

summary(nhldata)

##    Weight_kg      WingatePeak_Watts PeakWingate_W_kg  VO2max_l_min  
##  Min.   : 69.68   Min.   : 784      Min.   : 9.00    Min.   :3.640  
##  1st Qu.: 80.83   1st Qu.:1057      1st Qu.:12.80    1st Qu.:4.383  
##  Median : 85.30   Median :1142      Median :13.40    Median :4.725  
##  Mean   : 84.99   Mean   :1135      Mean   :13.42    Mean   :4.773  
##  3rd Qu.: 89.15   3rd Qu.:1195      3rd Qu.:14.00    3rd Qu.:5.185  
##  Max.   :109.41   Max.   :1502      Max.   :15.90    Max.   :6.280  
##                   NA's   :4         NA's   :4        NA's   :6      
##  VO2max_ml_kg_min  VO2max_watts   VO2max_watts_kg LewisAvg_.Pause_watts
##  Min.   :44.90    Min.   :400.0   Min.   :4.396   Min.   :1125         
##  1st Qu.:51.77    1st Qu.:480.0   1st Qu.:5.649   1st Qu.:1302         
##  Median :56.30    Median :520.0   Median :6.064   Median :1416         
##  Mean   :56.41    Mean   :505.5   Mean   :5.981   Mean   :1414         
##  3rd Qu.:60.58    3rd Qu.:550.0   3rd Qu.:6.334   3rd Qu.:1508         
##  Max.   :68.70    Max.   :640.0   Max.   :7.556   Max.   :1865         
##  NA's   :6        NA's   :6       NA's   :6       NA's   :4            
##  LewisAvg_.Pause_watts_kg
##  Min.   :12.85           
##  1st Qu.:15.94           
##  Median :16.62           
##  Mean   :16.66           
##  3rd Qu.:17.29           
##  Max.   :20.63           
##  NA's   :4

But we are really only interested in cases that have complete data - so need to eliminate cases with missing data (note - if we had plans for a publication we would be much more cautious about doing this)

As you can see we have gone from 100 subjects to 94 subjects - so 6 subjects had incomplete data in the variables of interest.

nhldata <- nhldata[complete.cases(nhldata), ]
str(nhldata)

## 'data.frame':    94 obs. of  9 variables:
##  $ Weight_kg               : num  95.1 80 87.4 82.1 81.2 ...
##  $ WingatePeak_Watts       : int  1136 1017 1171 933 1089 1223 1210 917 1047 1191 ...
##  $ PeakWingate_W_kg        : num  12 12.7 13.4 11.4 13.4 14.1 13 12.8 13.9 14 ...
##  $ VO2max_l_min            : num  4.75 3.99 4.94 5.64 4.92 4.41 5.19 4.17 4.39 5.45 ...
##  $ VO2max_ml_kg_min        : num  50 49.9 56.5 68.7 60.6 50.6 55.8 57.9 58 63.8 ...
##  $ VO2max_watts            : int  520 400 520 520 520 520 520 440 480 520 ...
##  $ VO2max_watts_kg         : num  5.47 5 5.95 6.33 6.41 ...
##  $ LewisAvg_.Pause_watts   : num  1390 1330 1372 1357 1370 ...
##  $ LewisAvg_.Pause_watts_kg: num  14.6 16.6 15.7 16.5 16.9 ...

summary(nhldata)

##    Weight_kg      WingatePeak_Watts PeakWingate_W_kg  VO2max_l_min  
##  Min.   : 69.68   Min.   : 784      Min.   : 9.00    Min.   :3.640  
##  1st Qu.: 80.49   1st Qu.:1050      1st Qu.:12.80    1st Qu.:4.383  
##  Median : 84.95   Median :1142      Median :13.40    Median :4.725  
##  Mean   : 84.73   Mean   :1134      Mean   :13.41    Mean   :4.773  
##  3rd Qu.: 89.07   3rd Qu.:1195      3rd Qu.:14.00    3rd Qu.:5.185  
##  Max.   :109.41   Max.   :1502      Max.   :15.90    Max.   :6.280  
##  VO2max_ml_kg_min  VO2max_watts   VO2max_watts_kg LewisAvg_.Pause_watts
##  Min.   :44.90    Min.   :400.0   Min.   :4.396   Min.   :1125         
##  1st Qu.:51.77    1st Qu.:480.0   1st Qu.:5.649   1st Qu.:1301         
##  Median :56.30    Median :520.0   Median :6.064   Median :1413         
##  Mean   :56.41    Mean   :505.5   Mean   :5.981   Mean   :1409         
##  3rd Qu.:60.58    3rd Qu.:550.0   3rd Qu.:6.334   3rd Qu.:1500         
##  Max.   :68.70    Max.   :640.0   Max.   :7.556   Max.   :1865         
##  LewisAvg_.Pause_watts_kg
##  Min.   :12.85           
##  1st Qu.:15.90           
##  Median :16.62           
##  Mean   :16.63           
##  3rd Qu.:17.17           
##  Max.   :20.63

Descriptive summary of the variables of interest:

options(scipen=100) #supresses scientific notation of output
options(digits=2) #limits output to 2 significant digits
stat.desc(nhldata, basic=F) #runs basic stats

##              Weight_kg WingatePeak_Watts PeakWingate_W_kg VO2max_l_min
## median          84.955           1142.50           13.400        4.725
## mean            84.725           1133.59           13.415        4.773
## SE.mean          0.647             13.15            0.119        0.056
## CI.mean.0.95     1.285             26.10            0.236        0.111
## var             39.387          16243.43            1.330        0.292
## std.dev          6.276            127.45            1.153        0.540
## coef.var         0.074              0.11            0.086        0.113
##              VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median                 56.300       520.00           6.064
## mean                   56.412       505.53           5.981
## SE.mean                 0.557         5.70           0.065
## CI.mean.0.95            1.106        11.32           0.128
## var                    29.174      3057.24           0.393
## std.dev                 5.401        55.29           0.627
## coef.var                0.096         0.11           0.105
##              LewisAvg_.Pause_watts LewisAvg_.Pause_watts_kg
## median                      1413.0                   16.624
## mean                        1409.1                   16.628
## SE.mean                       15.1                    0.119
## CI.mean.0.95                  30.0                    0.236
## var                        21482.0                    1.329
## std.dev                      146.6                    1.153
## coef.var                       0.1                    0.069

A look at the univariate distributions. These are separated out to account for the different Y axis scaling - there are other ways to deal with scaling but for now this seemed the best way (comparing apples to apples on the Y axes of all of these graphs)

Decided to use Boxplots - The box plot (a.k.a. box and whisker diagram) is a standardized way of displaying the distribution of data based on the five number summary: minimum, first quartile, median, third quartile, and maximum. Outliers are plotted as individual points.

Data with the common Y axis of Watts:

watts<-select(nhldata,LewisAvg_.Pause_watts, WingatePeak_Watts,VO2max_watts)
ggplot(melt(watts), aes(x = variable, y = value)) + geom_boxplot()

## No id variables; using all as measure variables

Data with the common Y axis of Watts/kg:

watts_kg<-select(nhldata,LewisAvg_.Pause_watts_kg, PeakWingate_W_kg, VO2max_watts_kg)
ggplot(melt(watts_kg), aes(x = variable, y = value)) + geom_boxplot()

## No id variables; using all as measure variables

Data with the common Y axis of oxygen consumption (one for L/min and one for L/kg/min):

vo2l<-select(nhldata,VO2max_l_min)
vo2lkg<-select(nhldata,VO2max_ml_kg_min)
ggplot(melt(vo2l), aes(x = variable, y = value)) + geom_boxplot()

## No id variables; using all as measure variables

ggplot(melt(vo2lkg), aes(x = variable, y = value)) + geom_boxplot()

## No id variables; using all as measure variables

Next we can compare bivariate distributions (i.e. scatter plots) to look at associations. This first draft will only include the scatter plots without assumptions of the best approach to fitting the data with a function (i.e. linear model).

There is clearly going to be a strong linear relationship between VO2max in L/min and in L/kg/min:

qplot(VO2max_l_min,VO2max_ml_kg_min,data=nhldata,geom=c("point"))

To help visualize what is happenning (why there is variability) we can weight the points based on the players weight - so that larger points indicate a heavier player. As you can see - heavier players may have larger VO2 max in L/min, but they have lower VO2 max in ml/kg/min - which makes sense.

qplot(VO2max_l_min,VO2max_ml_kg_min,data=nhldata,geom=c("point"),size = Weight_kg)

VO2 L/min and VO2 watts - here I have watts on the X axis because technically the power being produced is a determinant of the oxygen need (and therefore consumed). One thing to note is that VO2 max watts seems to be a factor variable (not continuous), this is because of the mode of testing the players. They use a Monark bike and each stage of the max test is a pre determined power. So the max watts achieved is based on the state. Therefore, variation in actual watts due to variations in pedal RPMs from the RPMs required in the test would not be captured.

qplot(VO2max_watts, VO2max_l_min,data=nhldata,geom=c("point"))

Here we look at the same plot as above but with the points weighted based on player weight. There is a much less clear pattern than the plot of VO2 L/min and VO2 ml/kg/min when weighting based on weight.

qplot(VO2max_watts, VO2max_l_min,data=nhldata,geom=c("point"),size = Weight_kg)

Wingate power - comparision between Watts and Watts/Kg:

qplot(WingatePeak_Watts, PeakWingate_W_kg,data=nhldata,geom=c("point"))

Wingate power - comparision between Watts and Watts/Kg, when weighting the points based on weight, just like in VO2, it is clear that those that have more mass have greater Watts, but not necessarily greater Watts/Kg.

qplot(WingatePeak_Watts, PeakWingate_W_kg,data=nhldata,geom=c("point"),size = Weight_kg)

One last set of “getting familiar with the data” analysis within each major variable. Comparing vertical jump average power watts to watts/kg. Expect the same as above - but a useful exercise in getting familiar with the data. As above - first without weight based on weight, and then with such weighting.

qplot(LewisAvg_.Pause_watts, LewisAvg_.Pause_watts_kg,data=nhldata,geom=c("point"))

qplot(LewisAvg_.Pause_watts, LewisAvg_.Pause_watts_kg,data=nhldata,geom=c("point"),size = Weight_kg)

Now it is time to get to the tasks: 1. Compare Wingate peak watts to VO2 max watts

qplot(WingatePeak_Watts, VO2max_watts,data=nhldata,geom=c("point"))

Compare VO2 max ml kg min to peak wingate watt kg

qplot(PeakWingate_W_kg, VO2max_ml_kg_min,data=nhldata,geom=c("point"))

Compare lewis average pause watts to the peak watts and to VO2 max watts to see if there is any relationships and if jump capacity has an impact First with straight forward scatter plots for each comparision lewis average pause watts to the peak wingate watts - with visualization of the impact of weight

qplot(LewisAvg_.Pause_watts, WingatePeak_Watts,data=nhldata,geom=c("point"),size = Weight_kg)

lewis average pause watts to the peak aerobic watts -with visualization of the impact of weight

qplot(LewisAvg_.Pause_watts, VO2max_watts,data=nhldata,geom=c("point"),size = Weight_kg)

Next we want to see if there is a pattern in the distribution of vertical jump power across the bivariate distribution of wingate and aerobic power

qplot(WingatePeak_Watts, VO2max_watts,data=nhldata,geom=c("point"),size = LewisAvg_.Pause_watts)

Finally we want to see if there is a pattern in the distribution of vertical jump power and body mass across the bivariate distribution of wingate and aerobic power - this may be too much in one graphic but it is a start:

qplot(WingatePeak_Watts, VO2max_watts,data=nhldata,geom=c("point"),size = LewisAvg_.Pause_watts, color = Weight_kg)

There could be more to come. Let’s start with the above and see where it takes the report. It may be interesting to put some of these into linear models and see what we get - but I want to do so cautiously, not just throw a model together to see what comes out of it. Once models are built, then the fit lines can be added to any / all of the graphics above.

Lemgart Analysis - NHL draft combine data

Sean Collins

December 2, 2015