Almonte - NHL draft combine data

Data analysis performed for Almonte’s physiology of exercise lab report

Question: Do skating players (forward and defense) entering the NHL combine scouting tests have different testing results in key indices of aerobic and anaerobic fitness?

Expected: Forwards will have a higher aerobic fitness, defense will have a higher peak power, but also a higher fatigue index percent (they will fatigue faster).

Request: Compare averages– mean, standard deviation, etc.– of both groups: please add histograms - I went with boxplots thinking they would provide a better approach to comparing the groups; tables -all the data you need for tables is included but not in a presentable way so you can select what you want and add to a table Aerobic fitness with VO2 max (ml/kg/min) Anaerobic fitness with peak power Anaerobic fatigue index

Other variables you might be interested have been added - you can disregard; if it is disruptive to have them included here they can be removed

Note: goalies are not included in the analysis

Analysis

# Load necessary R packages
library(dplyr) #For data manipulation

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(reshape2) #For data manipulation related to plotting
library(ggplot2) #For plotting
library(pastecs) #For descriptive statistics

## Loading required package: boot
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last

library(psych) #For descriptive statistics

## 
## Attaching package: 'psych'
## 
## The following object is masked from 'package:boot':
## 
##     logit
## 
## The following object is masked from 'package:ggplot2':
## 
##     %+%

# Set working directory - mac file system
setwd("~/physExLab/NHLcombinedata")

# Load data
nhldata<-read.csv("data_position.csv")

#Create weight in kg variable
nhldata<-mutate(nhldata, Weight_kg = Weight_lb/2.2)

#Create aerobic test peak watts / kg variable
nhldata<-mutate(nhldata, VO2max_watts_kg= VO2max_watts/Weight_kg)

#Create Lewis average watts / kg
nhldata<-mutate(nhldata, LewisAvg_.Pause_watts_kg= LewisAvg_.Pause_watts/Weight_kg)

#Create dataframe with only the variables we are interested in
nhldata<-select(nhldata,PosNumeric, Position, Weight_kg, WingatePeak_Watts, PeakWingate_W_kg, WingatePercentDrop, VO2max_l_min, VO2max_ml_kg_min, VO2max_watts, VO2max_watts_kg, LewisAvg_.Pause_watts, LewisAvg_.Pause_watts_kg)

Here is a summary of the resulting dataframe. As you can see there are missing values in most variables.

str(nhldata)

## 'data.frame':    90 obs. of  12 variables:
##  $ PosNumeric              : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Position                : Factor w/ 2 levels "Defense","Forward": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Weight_kg               : num  95.1 80 87.4 82.1 93 ...
##  $ WingatePeak_Watts       : int  1136 1017 1171 933 1210 1047 1191 1234 975 1099 ...
##  $ PeakWingate_W_kg        : num  12 12.7 13.4 11.4 13 ...
##  $ WingatePercentDrop      : num  54.5 52.5 36.7 39.5 54.4 57.4 49.3 54.1 33.7 46.8 ...
##  $ VO2max_l_min            : num  4.75 3.99 4.94 5.64 5.19 4.39 5.45 4.96 4.44 4.64 ...
##  $ VO2max_ml_kg_min        : num  50 49.9 56.5 68.7 55.8 58 63.8 58.6 57.6 52 ...
##  $ VO2max_watts            : int  520 400 520 520 520 480 520 400 480 480 ...
##  $ VO2max_watts_kg         : num  5.47 5 5.95 6.33 5.59 ...
##  $ LewisAvg_.Pause_watts   : num  1390 1330 1372 1357 1650 ...
##  $ LewisAvg_.Pause_watts_kg: num  14.6 16.6 15.7 16.5 17.7 ...

summary(nhldata)

##    PosNumeric       Position    Weight_kg      WingatePeak_Watts
##  Min.   :1.000   Defense:28   Min.   : 69.68   Min.   : 784     
##  1st Qu.:1.000   Forward:62   1st Qu.: 80.67   1st Qu.:1062     
##  Median :1.000                Median : 85.09   Median :1148     
##  Mean   :1.311                Mean   : 84.91   Mean   :1136     
##  3rd Qu.:2.000                3rd Qu.: 89.09   3rd Qu.:1194     
##  Max.   :2.000                Max.   :109.41   Max.   :1502     
##                                                NA's   :3        
##  PeakWingate_W_kg WingatePercentDrop  VO2max_l_min   VO2max_ml_kg_min
##  Min.   : 9.03    Min.   :33.70      Min.   :3.640   Min.   :44.90   
##  1st Qu.:12.85    1st Qu.:45.20      1st Qu.:4.350   1st Qu.:51.77   
##  Median :13.44    Median :50.00      Median :4.725   Median :56.20   
##  Mean   :13.44    Mean   :49.60      Mean   :4.766   Mean   :56.35   
##  3rd Qu.:13.98    3rd Qu.:54.15      3rd Qu.:5.185   3rd Qu.:60.58   
##  Max.   :15.93    Max.   :77.60      Max.   :6.280   Max.   :68.70   
##  NA's   :3        NA's   :3          NA's   :4       NA's   :4       
##   VO2max_watts   VO2max_watts_kg LewisAvg_.Pause_watts
##  Min.   :400.0   Min.   :4.396   Min.   :1125         
##  1st Qu.:480.0   1st Qu.:5.649   1st Qu.:1306         
##  Median :520.0   Median :6.083   Median :1414         
##  Mean   :505.1   Mean   :5.977   Mean   :1416         
##  3rd Qu.:550.0   3rd Qu.:6.334   3rd Qu.:1509         
##  Max.   :640.0   Max.   :7.556   Max.   :1865         
##  NA's   :4       NA's   :4       NA's   :3            
##  LewisAvg_.Pause_watts_kg
##  Min.   :14.32           
##  1st Qu.:15.92           
##  Median :16.62           
##  Mean   :16.71           
##  3rd Qu.:17.35           
##  Max.   :20.63           
##  NA's   :3

But we are really only interested in cases that have complete data - so need to eliminate cases with missing data (note - if we had plans for a publication we would be much more cautious about doing this)

As you can see we have gone from 90 subjects to 86 subjects - so 4 subjects had incomplete data in the variables of interest.

nhldata <- nhldata[complete.cases(nhldata), ]
str(nhldata)

## 'data.frame':    86 obs. of  12 variables:
##  $ PosNumeric              : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Position                : Factor w/ 2 levels "Defense","Forward": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Weight_kg               : num  95.1 80 87.4 82.1 93 ...
##  $ WingatePeak_Watts       : int  1136 1017 1171 933 1210 1047 1191 1234 975 1099 ...
##  $ PeakWingate_W_kg        : num  12 12.7 13.4 11.4 13 ...
##  $ WingatePercentDrop      : num  54.5 52.5 36.7 39.5 54.4 57.4 49.3 54.1 33.7 46.8 ...
##  $ VO2max_l_min            : num  4.75 3.99 4.94 5.64 5.19 4.39 5.45 4.96 4.44 4.64 ...
##  $ VO2max_ml_kg_min        : num  50 49.9 56.5 68.7 55.8 58 63.8 58.6 57.6 52 ...
##  $ VO2max_watts            : int  520 400 520 520 520 480 520 400 480 480 ...
##  $ VO2max_watts_kg         : num  5.47 5 5.95 6.33 5.59 ...
##  $ LewisAvg_.Pause_watts   : num  1390 1330 1372 1357 1650 ...
##  $ LewisAvg_.Pause_watts_kg: num  14.6 16.6 15.7 16.5 17.7 ...

summary(nhldata)

##    PosNumeric       Position    Weight_kg      WingatePeak_Watts
##  Min.   :1.000   Defense:27   Min.   : 69.68   Min.   : 784     
##  1st Qu.:1.000   Forward:59   1st Qu.: 80.49   1st Qu.:1061     
##  Median :1.000                Median : 84.86   Median :1150     
##  Mean   :1.314                Mean   : 84.70   Mean   :1136     
##  3rd Qu.:2.000                3rd Qu.: 89.00   3rd Qu.:1195     
##  Max.   :2.000                Max.   :109.41   Max.   :1502     
##  PeakWingate_W_kg WingatePercentDrop  VO2max_l_min   VO2max_ml_kg_min
##  Min.   : 9.03    Min.   :33.70      Min.   :3.640   Min.   :44.90   
##  1st Qu.:12.84    1st Qu.:45.20      1st Qu.:4.350   1st Qu.:51.77   
##  Median :13.45    Median :49.90      Median :4.725   Median :56.20   
##  Mean   :13.45    Mean   :49.54      Mean   :4.766   Mean   :56.35   
##  3rd Qu.:13.99    3rd Qu.:54.17      3rd Qu.:5.185   3rd Qu.:60.58   
##  Max.   :15.93    Max.   :77.60      Max.   :6.280   Max.   :68.70   
##   VO2max_watts   VO2max_watts_kg LewisAvg_.Pause_watts
##  Min.   :400.0   Min.   :4.396   Min.   :1125         
##  1st Qu.:480.0   1st Qu.:5.649   1st Qu.:1304         
##  Median :520.0   Median :6.083   Median :1413         
##  Mean   :505.1   Mean   :5.977   Mean   :1414         
##  3rd Qu.:550.0   3rd Qu.:6.334   3rd Qu.:1505         
##  Max.   :640.0   Max.   :7.556   Max.   :1865         
##  LewisAvg_.Pause_watts_kg
##  Min.   :14.32           
##  1st Qu.:15.90           
##  Median :16.62           
##  Mean   :16.69           
##  3rd Qu.:17.29           
##  Max.   :20.63

Descriptive summary of the (potential) variables of interest (all subjects):

options(scipen=100) #supresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(nhldata, basic=F) #runs basic stats

##              PosNumeric Position Weight_kg WingatePeak_Watts
## median             1.00       NA    84.864           1149.50
## mean               1.31       NA    84.700           1136.30
## SE.mean            0.05       NA     0.678             13.65
## CI.mean.0.95       0.10       NA     1.349             27.14
## var                0.22       NA    39.580          16019.67
## std.dev            0.47       NA     6.291            126.57
## coef.var           0.36       NA     0.074              0.11
##              PeakWingate_W_kg WingatePercentDrop VO2max_l_min
## median                 13.445              49.90        4.725
## mean                   13.450              49.54        4.766
## SE.mean                 0.125               0.74        0.059
## CI.mean.0.95            0.249               1.48        0.118
## var                     1.352              47.70        0.302
## std.dev                 1.163               6.91        0.550
## coef.var                0.086               0.14        0.115
##              VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median                 56.200       520.00           6.083
## mean                   56.346       505.12           5.977
## SE.mean                 0.593         6.03           0.068
## CI.mean.0.95            1.179        11.99           0.136
## var                    30.222      3126.46           0.403
## std.dev                 5.497        55.91           0.635
## coef.var                0.098         0.11           0.106
##              LewisAvg_.Pause_watts LewisAvg_.Pause_watts_kg
## median                      1413.0                   16.625
## mean                        1414.0                   16.687
## SE.mean                       15.9                    0.121
## CI.mean.0.95                  31.7                    0.241
## var                        21843.3                    1.261
## std.dev                      147.8                    1.123
## coef.var                       0.1                    0.067

Descriptive summary of the (potential) variables of interest for forwards:

forwards<-filter(nhldata,PosNumeric==1)
options(scipen=100) #supresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(forwards, basic=F) #runs basic stats

##              PosNumeric Position Weight_kg WingatePeak_Watts
## median                1       NA    84.591           1140.00
## mean                  1       NA    84.271           1128.83
## SE.mean               0       NA     0.754             15.45
## CI.mean.0.95          0       NA     1.510             30.92
## var                   0       NA    33.564          14078.28
## std.dev               0       NA     5.793            118.65
## coef.var              0       NA     0.069              0.11
##              PeakWingate_W_kg WingatePercentDrop VO2max_l_min
## median                 13.370              50.00        4.730
## mean                   13.431              49.41        4.735
## SE.mean                 0.150               0.99        0.065
## CI.mean.0.95            0.300               1.98        0.131
## var                     1.328              57.90        0.252
## std.dev                 1.152               7.61        0.502
## coef.var                0.086               0.15        0.106
##              VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median                 56.100      520.000           6.000
## mean                   56.306      498.983           5.938
## SE.mean                 0.719        6.440           0.081
## CI.mean.0.95            1.439       12.892           0.161
## var                    30.508     2447.224           0.383
## std.dev                 5.523       49.469           0.619
## coef.var                0.098        0.099           0.104
##              LewisAvg_.Pause_watts LewisAvg_.Pause_watts_kg
## median                    1406.000                   16.540
## mean                      1398.710                   16.599
## SE.mean                     17.034                    0.139
## CI.mean.0.95                34.098                    0.278
## var                      17119.680                    1.134
## std.dev                    130.842                    1.065
## coef.var                     0.094                    0.064

Descriptive summary of the (potential) variables of interest for defenders:

defense<-filter(nhldata,PosNumeric==2)
options(scipen=100) #supresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(defense, basic=F) #runs basic stats

##              PosNumeric Position Weight_kg WingatePeak_Watts
## median                2       NA    86.318           1151.00
## mean                  2       NA    85.636           1152.63
## SE.mean               0       NA     1.404             27.60
## CI.mean.0.95          0       NA     2.885             56.73
## var                   0       NA    53.195          20563.09
## std.dev               0       NA     7.293            143.40
## coef.var              0       NA     0.085              0.12
##              PeakWingate_W_kg WingatePercentDrop VO2max_l_min
## median                 13.460              49.40         4.68
## mean                   13.490              49.84         4.83
## SE.mean                 0.232               0.99         0.12
## CI.mean.0.95            0.477               2.04         0.26
## var                     1.455              26.66         0.42
## std.dev                 1.206               5.16         0.65
## coef.var                0.089               0.10         0.13
##              VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median                 56.480       520.00            6.19
## mean                   56.431       518.52            6.06
## SE.mean                 1.067        12.90            0.13
## CI.mean.0.95            2.193        26.51            0.27
## var                    30.737      4490.03            0.45
## std.dev                 5.544        67.01            0.67
## coef.var                0.098         0.13            0.11
##              LewisAvg_.Pause_watts LewisAvg_.Pause_watts_kg
## median                     1435.00                   16.712
## mean                       1447.38                   16.881
## SE.mean                      34.17                    0.238
## CI.mean.0.95                 70.25                    0.490
## var                       31533.21                    1.536
## std.dev                     177.58                    1.239
## coef.var                      0.12                    0.073

Decided to use Boxplots - The box plot (a.k.a. box and whisker diagram) is a standardized way of displaying the distribution of data based on the five number summary: minimum, first quartile, median, third quartile, and maximum. Outliers are plotted as individual points. They are easier to look at for comparing two factors of a variable (forward, defense)

ggplot(nhldata, aes(factor(Position), WingatePeak_Watts)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), PeakWingate_W_kg)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), WingatePercentDrop)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), VO2max_l_min)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), VO2max_ml_kg_min)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), VO2max_watts_kg)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), LewisAvg_.Pause_watts)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), LewisAvg_.Pause_watts_kg)) + geom_boxplot()

The fact that there is not much difference between the absolute and relative versions of these data is really due to the fact that there is not much of a difference between the groups in body mass:

ggplot(nhldata, aes(factor(Position), Weight_kg)) + geom_boxplot()

Just for kicks - I also used a t-test to see whether there were differences between the positions (the graphics suggest not - but wanted to see if the null was accepted in the t-test (i.e. p > 0.05 if we accept an alpha of 0.05))

t.test(nhldata$Weight_kg~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$Weight_kg by nhldata$Position
## t = 0.9, df = 40, p-value = 0.4
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.9  4.6
## sample estimates:
## mean in group Defense mean in group Forward 
##                    86                    84

t.test(nhldata$WingatePeak_Watts~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$WingatePeak_Watts by nhldata$Position
## t = 0.8, df = 40, p-value = 0.5
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -40  88
## sample estimates:
## mean in group Defense mean in group Forward 
##                  1153                  1129

t.test(nhldata$PeakWingate_W_kg~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$PeakWingate_W_kg by nhldata$Position
## t = 0.2, df = 50, p-value = 0.8
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.50  0.62
## sample estimates:
## mean in group Defense mean in group Forward 
##                    13                    13

t.test(nhldata$WingatePercentDrop~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$WingatePercentDrop by nhldata$Position
## t = 0.3, df = 70, p-value = 0.8
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.4  3.2
## sample estimates:
## mean in group Defense mean in group Forward 
##                    50                    49

t.test(nhldata$VO2max_l_min~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$VO2max_l_min by nhldata$Position
## t = 0.7, df = 40, p-value = 0.5
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.19  0.38
## sample estimates:
## mean in group Defense mean in group Forward 
##                   4.8                   4.7

t.test(nhldata$VO2max_ml_kg_min~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$VO2max_ml_kg_min by nhldata$Position
## t = 0.1, df = 50, p-value = 0.9
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.5  2.7
## sample estimates:
## mean in group Defense mean in group Forward 
##                    56                    56

t.test(nhldata$VO2max_watts~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$VO2max_watts by nhldata$Position
## t = 1, df = 40, p-value = 0.2
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -9.6 48.7
## sample estimates:
## mean in group Defense mean in group Forward 
##                   519                   499

t.test(nhldata$VO2max_watts_kg~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$VO2max_watts_kg by nhldata$Position
## t = 0.8, df = 50, p-value = 0.4
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.18  0.43
## sample estimates:
## mean in group Defense mean in group Forward 
##                   6.1                   5.9

t.test(nhldata$LewisAvg_.Pause_watts~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$LewisAvg_.Pause_watts by nhldata$Position
## t = 1, df = 40, p-value = 0.2
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -29 126
## sample estimates:
## mean in group Defense mean in group Forward 
##                  1447                  1399

t.test(nhldata$LewisAvg_.Pause_watts_kg~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$LewisAvg_.Pause_watts_kg by nhldata$Position
## t = 1, df = 40, p-value = 0.3
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.27  0.84
## sample estimates:
## mean in group Defense mean in group Forward 
##                    17                    17

Clearly there are no significant differences in this data set between these players. Makes for an interesting report though - particularly if there are papers out there showing a difference in older players. Raises the question of whether the players have not been playing long enough to be trained into their position where they partially adapt based on their role.

Almonte - NHL draft combine data

Sean Collins

December 2, 2015