D’Amour - NHL draft combine data

Data analysis performed for D’Amour’s physiology of exercise lab report

“Bench, pushups, and push power would be the best way to represent the ‘hitting strength’ for the players. I'd compare the two numbers for bench and pushups to factor in body weight too, so it's not just looking at whether heavier people are stronger as well.”

Question: Do skating players (forwards and defense) entering the NHL combine scouting tests differ in test results that index “hitting strength”?

Operationalize “hitting strength” as: bench, pushups, and push power; consider both absolute values and values normalized for body mass

Expected: Defense will have a higher “hitting strength”

Note: goalies are not included in the analysis

Approach:

Load and clean data (take complete cases and variables of interest)
Descriptive statistics for each group
Boxplots to compare groups
t-tests to compare group differences (statistically test the null hypothesis that the groups are not different)

All analyses performed in R (version 3.2.0 (www.r-project.org)) using RStudio (www.rstudio.com)

Variables

BenchPress_reps150 BenchPress_lbperlbBodyWeight PushUps_max PushUpsxBody.Weight Push.Strength..lb. PushStrength_lbperlbBodyWeight

Note: since the data already include body-mass-normalized variables, this analysis does not recalculate them; the normalizations are therefore based on pounds (lb), not kilograms (kg), since that is how the NHL processed the data
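As a rough check on what the pre-computed variables mean, the sketch below rebuilds them for the first three players shown in the str() output later in this report. The formulas (reps x 150 lb / body weight, pushups x body weight, push strength / body weight) are an assumption inferred from the variable names, not something documented by the NHL, and the weights are the rounded values printed by str(), so small mismatches are expected.

```r
# First three rows as printed by str(nhldata) in this report.
# Weight_lb is displayed rounded there, so results will be slightly off.
weight_lb  <- c(209, 176, 192)
bench_reps <- c(12, 10, 5)
pushups    <- c(24, 21, 18)
push_lb    <- c(277, 247, 205)

# Assumed formulas (inferred from the variable names, not documented)
bench_rel       <- bench_reps * 150 / weight_lb  # lb lifted per lb of body weight
pushup_x_weight <- pushups * weight_lb           # pushups multiplied by body weight
push_rel        <- push_lb / weight_lb           # lb of push strength per lb of body weight

# Compare with the reported values: 8.60 8.52 3.90 / 5021 3696 3461 / 1.32 1.40 1.07
print(round(bench_rel, 2))
print(round(pushup_x_weight))
print(round(push_rel, 2))
```

The recomputed values land close to the reported ones, which suggests (but does not prove) that this is how the normalized columns were built.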

1. Loading R packages, getting data ready

# Load necessary R packages
library(dplyr) #For data manipulation

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(reshape2) #For data manipulation related to plotting
library(ggplot2) #For plotting
library(pastecs) #For descriptive statistics

## Loading required package: boot
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last

library(psych) #For descriptive statistics

## 
## Attaching package: 'psych'
## 
## The following object is masked from 'package:boot':
## 
##     logit
## 
## The following object is masked from 'package:ggplot2':
## 
##     %+%

# Set working directory - mac file system
setwd("~/physExLab/NHLcombinedata")

# Load data
nhldata<-read.csv("data_position.csv")

#Create dataframe with only the variables we are interested in
nhldata<-select(nhldata,PosNumeric, Position, Weight_lb,BenchPress_reps150, BenchPress_lbperlbBodyWeight, PushUps_max, PushUpsxBody.Weight, Push.Strength..lb., PushStrength_lbperlbBodyWeight)

Loaded data with all forwards and defense (including subjects with missing values)

As you can see there are missing values in most variables.

str(nhldata)

## 'data.frame':    90 obs. of  9 variables:
##  $ PosNumeric                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Position                      : Factor w/ 2 levels "Defense","Forward": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Weight_lb                     : num  209 176 192 181 205 ...
##  $ BenchPress_reps150            : int  12 10 5 5 10 7 6 9 4 3 ...
##  $ BenchPress_lbperlbBodyWeight  : num  8.6 8.52 3.9 4.15 7.33 6.31 4.79 7.25 3.54 2.29 ...
##  $ PushUps_max                   : int  24 21 18 26 34 28 25 24 28 14 ...
##  $ PushUpsxBody.Weight           : num  5021 3696 3461 4696 6956 ...
##  $ Push.Strength..lb.            : num  277 247 205 239 294 188 255 275 191 176 ...
##  $ PushStrength_lbperlbBodyWeight: num  1.32 1.4 1.07 1.32 1.44 1.13 1.36 1.48 1.13 0.89 ...

summary(nhldata)

##    PosNumeric       Position    Weight_lb     BenchPress_reps150
##  Min.   :1.000   Defense:28   Min.   :153.3   Min.   : 0.000    
##  1st Qu.:1.000   Forward:62   1st Qu.:177.5   1st Qu.: 4.000    
##  Median :1.000                Median :187.2   Median : 6.500    
##  Mean   :1.311                Mean   :186.8   Mean   : 6.756    
##  3rd Qu.:2.000                3rd Qu.:196.0   3rd Qu.: 9.000    
##  Max.   :2.000                Max.   :240.7   Max.   :13.000    
##                                               NA's   :8         
##  BenchPress_lbperlbBodyWeight  PushUps_max    PushUpsxBody.Weight
##  Min.   : 0.000               Min.   :12.00   Min.   :1760       
##  1st Qu.: 3.550               1st Qu.:20.75   1st Qu.:3695       
##  Median : 5.465               Median :24.00   Median :4605       
##  Mean   : 5.446               Mean   :24.64   Mean   :4588       
##  3rd Qu.: 7.190               3rd Qu.:28.00   3rd Qu.:5104       
##  Max.   :13.130               Max.   :42.00   Max.   :8085       
##  NA's   :8                    NA's   :6       NA's   :6          
##  Push.Strength..lb. PushStrength_lbperlbBodyWeight
##  Min.   :142.0      Min.   :0.730                 
##  1st Qu.:198.0      1st Qu.:1.080                 
##  Median :222.8      Median :1.220                 
##  Mean   :228.2      Mean   :1.239                 
##  3rd Qu.:260.0      3rd Qu.:1.387                 
##  Max.   :366.0      Max.   :2.580                 
##  NA's   :4          NA's   :4

But we are really only interested in cases that have complete data, so we need to eliminate cases with missing data (note: if we had plans for a publication we would be much more cautious about doing this)

Cleaning loaded data to extract only “complete cases”

A complete case has all of the variables of interest

After removing incomplete cases we go from 90 subjects to 82, so 8 subjects had incomplete data in the variables of interest (going by the Position counts in the two summaries, 7 forwards and 1 defender).

nhldata <- nhldata[complete.cases(nhldata), ]
str(nhldata)

## 'data.frame':    82 obs. of  9 variables:
##  $ PosNumeric                    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Position                      : Factor w/ 2 levels "Defense","Forward": 2 2 2 2 2 2 2 2 2 2 ...
##  $ Weight_lb                     : num  209 176 192 181 205 ...
##  $ BenchPress_reps150            : int  12 10 5 5 10 7 6 9 4 3 ...
##  $ BenchPress_lbperlbBodyWeight  : num  8.6 8.52 3.9 4.15 7.33 6.31 4.79 7.25 3.54 2.29 ...
##  $ PushUps_max                   : int  24 21 18 26 34 28 25 24 28 14 ...
##  $ PushUpsxBody.Weight           : num  5021 3696 3461 4696 6956 ...
##  $ Push.Strength..lb.            : num  277 247 205 239 294 188 255 275 191 176 ...
##  $ PushStrength_lbperlbBodyWeight: num  1.32 1.4 1.07 1.32 1.44 1.13 1.36 1.48 1.13 0.89 ...

summary(nhldata)

##    PosNumeric       Position    Weight_lb     BenchPress_reps150
##  Min.   :1.000   Defense:27   Min.   :153.3   Min.   : 0.000    
##  1st Qu.:1.000   Forward:55   1st Qu.:177.5   1st Qu.: 4.000    
##  Median :1.000                Median :187.2   Median : 6.500    
##  Mean   :1.329                Mean   :186.8   Mean   : 6.756    
##  3rd Qu.:2.000                3rd Qu.:195.9   3rd Qu.: 9.000    
##  Max.   :2.000                Max.   :240.7   Max.   :13.000    
##  BenchPress_lbperlbBodyWeight  PushUps_max    PushUpsxBody.Weight
##  Min.   : 0.000               Min.   :12.00   Min.   :1760       
##  1st Qu.: 3.550               1st Qu.:20.25   1st Qu.:3693       
##  Median : 5.465               Median :24.00   Median :4549       
##  Mean   : 5.446               Mean   :24.57   Mean   :4573       
##  3rd Qu.: 7.190               3rd Qu.:28.00   3rd Qu.:5073       
##  Max.   :13.130               Max.   :42.00   Max.   :8085       
##  Push.Strength..lb. PushStrength_lbperlbBodyWeight
##  Min.   :142.0      Min.   :0.730                 
##  1st Qu.:198.5      1st Qu.:1.093                 
##  Median :228.0      Median :1.240                 
##  Mean   :229.6      Mean   :1.246                 
##  3rd Qu.:262.2      3rd Qu.:1.397                 
##  Max.   :366.0      Max.   :2.580
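Before dropping rows, it can help to see which variables carry the missing values and whether one group loses more players than the other (in this report the forwards go from 62 to 55 and the defense from 28 to 27). The sketch below shows one way to check this; the `toy` data frame is made up for illustration and only borrows column names from the real data.

```r
# Hypothetical toy data with a few missing values (not the combine data)
toy <- data.frame(
  Position           = c("Forward", "Forward", "Defense", "Defense", "Forward"),
  BenchPress_reps150 = c(12, NA, 5, 7, 10),
  PushUps_max        = c(24, 21, NA, 26, 34)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Logical flag: TRUE for rows with no missing values
keep <- complete.cases(toy)

# Which positions lose players when incomplete rows are dropped
dropped_by_position <- table(toy$Position[!keep])
print(dropped_by_position)

# Same filtering step as used in this report
toy_complete <- toy[keep, ]
print(nrow(toy_complete))
```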

2. Descriptive statistics

Descriptive summary (both groups):

options(scipen=100) #suppresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(nhldata, basic=FALSE) #descriptive stats only (basic=FALSE omits counts, min, max, range, sum)

##              PosNumeric Position Weight_lb BenchPress_reps150
## median            1.000       NA   187.200               6.50
## mean              1.329       NA   186.779               6.76
## SE.mean           0.052       NA     1.531               0.36
## CI.mean.0.95      0.104       NA     3.047               0.71
## var               0.224       NA   192.248              10.58
## std.dev           0.473       NA    13.865               3.25
## coef.var          0.356       NA     0.074               0.48
##              BenchPress_lbperlbBodyWeight PushUps_max PushUpsxBody.Weight
## median                               5.46       24.00             4548.80
## mean                                 5.45       24.57             4573.33
## SE.mean                              0.29        0.67              141.06
## CI.mean.0.95                         0.58        1.33              280.67
## var                                  6.86       36.57          1631632.73
## std.dev                              2.62        6.05             1277.35
## coef.var                             0.48        0.25                0.28
##              Push.Strength..lb. PushStrength_lbperlbBodyWeight
## median                   228.00                          1.240
## mean                     229.61                          1.246
## SE.mean                    4.77                          0.028
## CI.mean.0.95               9.49                          0.056
## var                     1867.38                          0.066
## std.dev                   43.21                          0.256
## coef.var                   0.19                          0.206

Descriptive summary for forwards:

forwards<-filter(nhldata,PosNumeric==1)
options(scipen=100) #suppresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(forwards, basic=FALSE) #descriptive stats only (basic=FALSE omits counts, min, max, range, sum)

##              PosNumeric Position Weight_lb BenchPress_reps150
## median                1       NA   186.300               7.00
## mean                  1       NA   185.984               6.89
## SE.mean               0       NA     1.719               0.44
## CI.mean.0.95          0       NA     3.446               0.87
## var                   0       NA   162.450              10.43
## std.dev               0       NA    12.746               3.23
## coef.var              0       NA     0.069               0.47
##              BenchPress_lbperlbBodyWeight PushUps_max PushUpsxBody.Weight
## median                               5.67       24.00             4656.40
## mean                                 5.63       24.75             4568.66
## SE.mean                              0.36        0.83              172.16
## CI.mean.0.95                         0.73        1.66              345.15
## var                                  7.29       37.90          1630086.24
## std.dev                              2.70        6.16             1276.75
## coef.var                             0.48        0.25                0.28
##              Push.Strength..lb. PushStrength_lbperlbBodyWeight
## median                   230.00                          1.270
## mean                     232.45                          1.276
## SE.mean                    5.69                          0.037
## CI.mean.0.95              11.41                          0.075
## var                     1780.02                          0.076
## std.dev                   42.19                          0.276
## coef.var                   0.18                          0.216

Descriptive summary for defenders:

defense<-filter(nhldata,PosNumeric==2)
options(scipen=100) #suppresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(defense, basic=FALSE) #descriptive stats only (basic=FALSE omits counts, min, max, range, sum)

##              PosNumeric Position Weight_lb BenchPress_reps150
## median                2       NA   189.900               5.00
## mean                  2       NA   188.400               6.48
## SE.mean               0       NA     3.088               0.64
## CI.mean.0.95          0       NA     6.347               1.32
## var                   0       NA   257.464              11.18
## std.dev               0       NA    16.046               3.34
## coef.var              0       NA     0.085               0.52
##              BenchPress_lbperlbBodyWeight PushUps_max PushUpsxBody.Weight
## median                               4.48       24.00             4465.00
## mean                                 5.07       24.22             4582.83
## SE.mean                              0.47        1.14              250.74
## CI.mean.0.95                         0.97        2.34              515.40
## var                                  6.01       35.03          1697459.95
## std.dev                              2.45        5.92             1302.87
## coef.var                             0.48        0.24                0.28
##              Push.Strength..lb. PushStrength_lbperlbBodyWeight
## median                    212.0                          1.180
## mean                      223.8                          1.186
## SE.mean                     8.8                          0.039
## CI.mean.0.95               18.0                          0.080
## var                      2068.9                          0.041
## std.dev                    45.5                          0.202
## coef.var                    0.2                          0.171
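The per-group tables above can also be lined up side by side, which makes the comparison easier to read. The sketch below uses base R on a made-up `toy` data frame (hypothetical values, real column names); `summarise_one` is a helper introduced here for illustration. The standard error is computed as sd / sqrt(n), which agrees with the SE.mean rows above (for example, forwards' Weight_lb: 12.746 / sqrt(55) is about 1.72).

```r
# Hypothetical toy data (not the combine data)
toy <- data.frame(
  Position    = c("Forward", "Forward", "Forward", "Defense", "Defense", "Defense"),
  PushUps_max = c(24, 21, 30, 26, 18, 28)
)

# Illustrative helper: n, mean, standard deviation and standard error of the mean
summarise_one <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), se = sd(x) / sqrt(length(x)))
}

# One column per group, one row per statistic
by_position <- sapply(split(toy$PushUps_max, toy$Position), summarise_one)
print(round(by_position, 2))
```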

3. Boxplots to compare groups

The box plot (a.k.a. box-and-whisker diagram) is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum. In ggplot2's geom_boxplot() the whiskers extend to the most extreme values within 1.5 times the interquartile range of the box, and points beyond that are plotted individually as outliers. Box plots make it easy to compare a variable across the levels of a factor (here forward vs. defense).
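To make the pieces of a box plot concrete, the sketch below computes the quartiles, the 1.5 x IQR fences, the whisker ends, and the outliers by hand. The vector `x` is made up for illustration (it is not the combine data), and the quartiles use R's default quantile() method.

```r
# Hypothetical push-strength-like values, including one extreme value
x <- c(142, 188, 198, 205, 222, 228, 239, 255, 262, 275, 366)

# Quartiles that define the box (R's default quantile method)
q   <- unname(quantile(x, probs = c(0.25, 0.50, 0.75)))
iqr <- q[3] - q[1]

# Fences: points outside these limits are drawn individually as outliers
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[3] + 1.5 * iqr
outliers    <- x[x < lower_fence | x > upper_fence]

# Whiskers end at the most extreme values that are still inside the fences
inside       <- x[x >= lower_fence & x <= upper_fence]
whisker_ends <- range(inside)

print(q)             # box: 201.5 228.0 258.5
print(whisker_ends)  # whiskers: 142 275
print(outliers)      # outlier: 366
```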

ggplot(nhldata, aes(factor(Position), BenchPress_reps150)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), BenchPress_lbperlbBodyWeight)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), PushUps_max)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), PushUpsxBody.Weight)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), Push.Strength..lb.)) + geom_boxplot()

ggplot(nhldata, aes(factor(Position), PushStrength_lbperlbBodyWeight)) + geom_boxplot()

The absolute and relative versions of these data look much alike, most likely because there is not much of a difference between the groups in body mass:

ggplot(nhldata, aes(factor(Position), Weight_lb)) + geom_boxplot()
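The separate plots above could also be drawn as one faceted figure by first reshaping the data to long format with reshape2 (loaded at the top of this report). This is only a sketch of that alternative, not how the figures here were made; the `toy` data frame is simulated, and the names `test` and `score` are chosen here for illustration.

```r
library(reshape2)  # melt() for wide-to-long reshaping
library(ggplot2)   # plotting

# Simulated toy data (not the combine data)
set.seed(1)
toy <- data.frame(
  Position           = rep(c("Forward", "Defense"), times = c(12, 8)),
  PushUps_max        = round(rnorm(20, mean = 24, sd = 6)),
  Push.Strength..lb. = round(rnorm(20, mean = 230, sd = 43))
)

# Long format: one row per player per test
long <- melt(toy, id.vars = "Position",
             variable.name = "test", value.name = "score")

# One panel per test; free y scales because the tests use different units
p <- ggplot(long, aes(Position, score)) +
  geom_boxplot() +
  facet_wrap(~ test, scales = "free_y")
print(p)
```

With the real data, the same pattern would apply to nhldata with Position (and any other non-test columns) listed in id.vars.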

4. T-tests

To compare group differences (statistically test the null hypothesis that the groups are not different). The graphics suggest no difference, but we wanted to see whether the t-tests also fail to reject the null (i.e. p > 0.05 if we accept an alpha of 0.05). Note that failing to reject the null is not the same as showing that the groups are equal.

t.test(nhldata$Weight_lb~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$Weight_lb by nhldata$Position
## t = 0.7, df = 40, p-value = 0.5
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.7  9.5
## sample estimates:
## mean in group Defense mean in group Forward 
##                   188                   186
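To show what the Welch test is doing, the sketch below reproduces the body weight comparison by hand from the summary statistics reported earlier (group sizes, means, and variances for defense and forwards). Because those inputs are themselves rounded, the results should match the output above only approximately.

```r
# Summary statistics copied from the descriptive tables in this report (Weight_lb)
n_def <- 27; mean_def <- 188.400; var_def <- 257.464
n_fwd <- 55; mean_fwd <- 185.984; var_fwd <- 162.450

# Squared standard error of each group mean, and SE of the difference
se2_def <- var_def / n_def
se2_fwd <- var_fwd / n_fwd
se_diff <- sqrt(se2_def + se2_fwd)

# Welch t statistic (defense minus forward, matching the order in the output)
t_stat <- (mean_def - mean_fwd) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (se2_def + se2_fwd)^2 /
  (se2_def^2 / (n_def - 1) + se2_fwd^2 / (n_fwd - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df = welch_df)
ci <- (mean_def - mean_fwd) + c(-1, 1) * qt(0.975, df = welch_df) * se_diff

print(round(c(t = t_stat, df = welch_df, p = p_value), 3))  # roughly t = 0.68, df = 42.6, p = 0.50
print(round(ci, 1))                                         # roughly -4.7 to 9.5
```

Unlike the classic Student t-test, this version does not assume equal variances in the two groups, which is why the degrees of freedom are not simply 82 - 2 = 80.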

t.test(nhldata$BenchPress_reps150~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$BenchPress_reps150 by nhldata$Position
## t = -0.5, df = 50, p-value = 0.6
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -2.0  1.2
## sample estimates:
## mean in group Defense mean in group Forward 
##                   6.5                   6.9

t.test(nhldata$BenchPress_lbperlbBodyWeight~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$BenchPress_lbperlbBodyWeight by nhldata$Position
## t = -0.9, df = 60, p-value = 0.3
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.76  0.63
## sample estimates:
## mean in group Defense mean in group Forward 
##                   5.1                   5.6

t.test(nhldata$PushUps_max~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$PushUps_max by nhldata$Position
## t = -0.4, df = 50, p-value = 0.7
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -3.3  2.3
## sample estimates:
## mean in group Defense mean in group Forward 
##                    24                    25

t.test(nhldata$PushUpsxBody.Weight~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$PushUpsxBody.Weight by nhldata$Position
## t = 0.05, df = 50, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -596  625
## sample estimates:
## mean in group Defense mean in group Forward 
##                  4583                  4569

t.test(nhldata$Push.Strength..lb.~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$Push.Strength..lb. by nhldata$Position
## t = -0.8, df = 50, p-value = 0.4
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -30  12
## sample estimates:
## mean in group Defense mean in group Forward 
##                   224                   232

t.test(nhldata$PushStrength_lbperlbBodyWeight~nhldata$Position)

## 
##  Welch Two Sample t-test
## 
## data:  nhldata$PushStrength_lbperlbBodyWeight by nhldata$Position
## t = -2, df = 70, p-value = 0.1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.197  0.018
## sample estimates:
## mean in group Defense mean in group Forward 
##                   1.2                   1.3
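The printed t-test results above show only one significant digit (for example `p-value = 0.1`), most likely a side effect of the earlier options(digits=2) call. The object returned by t.test() still holds the unrounded values, so they can be pulled out directly. The sketch below does this for several variables at once on simulated `toy` data (not the combine data); the group sizes mirror the 27 defense and 55 forwards used here.

```r
# Simulated toy data with the same group sizes as the cleaned data set
set.seed(42)
toy <- data.frame(
  Position           = factor(rep(c("Defense", "Forward"), times = c(27, 55))),
  PushUps_max        = rnorm(82, mean = 24, sd = 6),
  Push.Strength..lb. = rnorm(82, mean = 230, sd = 43)
)

vars <- c("PushUps_max", "Push.Strength..lb.")

# Run a Welch t-test per variable and keep the unrounded numbers
results <- t(sapply(vars, function(v) {
  res <- t.test(toy[[v]] ~ toy$Position)
  c(t       = unname(res$statistic),
    df      = unname(res$parameter),
    p       = res$p.value,
    ci_low  = res$conf.int[1],
    ci_high = res$conf.int[2])
}))

# Round only for display, with more digits than the default print gives here
print(signif(results, 4))
```

Applied to nhldata, the same loop over the six strength variables would give one table with exact p-values instead of seven separate printouts.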

Thoughts

There are no statistically significant differences between forwards and defense in this data set (all p > 0.05), so the expectation that defense would show higher “hitting strength” is not supported. You should see if there have been any other analyses looking at this question in NHL players, particularly older players. These players are just entering their potential professional career. After being drafted, many of them spend years in college or in other leagues while waiting to actually get to the NHL. If older NHL players show a difference, perhaps it comes about from the rigours of training as they continue to develop as players. The players tested here may not have been playing long enough to be trained into their position, where they would partially adapt based on their role.