Data analysis performed for D’Amour’s physiology of exercise lab report
“bench, pushups, and push power would be the best way to represent the ‘hitting strength’ for the players. I'd compare the two numbers for bench and pushups to factor in body weight too so it's not looking at if heavier people are stronger as well”
Question: Do skaters (forwards and defensemen) entering the NHL combine scouting tests differ on test indices of “hitting strength”?
Operationalize “hitting strength” as bench, pushups, and push power; consider both absolute values and values normalized for body mass
Expected: Defense will have a higher “hitting strength”
Note: goalies are not included in the analysis
All analyses were performed in R version 3.2.0 (www.r-project.org) using RStudio (www.rstudio.com)
Variables of interest: BenchPress_reps150, BenchPress_lbperlbBodyWeight, PushUps_max, PushUpsxBody.Weight, Push.Strength..lb., PushStrength_lbperlbBodyWeight
Note: the data already include body-mass-normalized variables, so this analysis does not recalculate them. The normalizations are therefore based on pounds (lb), not kilograms (kg), because that is how the NHL processed the data
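The normalized columns can be sanity-checked against the raw ones. The sketch below uses the first five players as printed in the str() output further down (the weights shown there are rounded). It assumes, from the column names alone and not from any NHL documentation, that relative bench press is reps × 150 lb divided by body weight and that relative push strength is push strength divided by body weight. All object names are illustrative.

```r
# First five players, values copied from the str() output (weights are rounded)
weight_lb  <- c(209, 176, 192, 181, 205)
bench_reps <- c(12, 10, 5, 5, 10)
push_lb    <- c(277, 247, 205, 239, 294)

# Normalized values as reported in the data file
bench_rel_reported <- c(8.60, 8.52, 3.90, 4.15, 7.33)
push_rel_reported  <- c(1.32, 1.40, 1.07, 1.32, 1.44)

# Assumed formulas (inferred from the column names, not confirmed by the NHL)
bench_rel <- bench_reps * 150 / weight_lb
push_rel  <- push_lb / weight_lb

# Side-by-side comparison; small gaps are expected from the rounded weights
check <- data.frame(
  bench_rel = round(bench_rel, 2), bench_rel_reported,
  push_rel  = round(push_rel, 2),  push_rel_reported
)
print(check)
```

If the assumed formulas are right, the recomputed and reported columns should agree to within a few hundredths, which supports using the NHL's pound-based values as they are.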
# Load necessary R packages
library(dplyr) #For data manipulation
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2) #For data manipulation related to plotting
library(ggplot2) #For plotting
library(pastecs) #For descriptive statistics
## Loading required package: boot
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
library(psych) #For descriptive statistics
##
## Attaching package: 'psych'
##
## The following object is masked from 'package:boot':
##
## logit
##
## The following object is masked from 'package:ggplot2':
##
## %+%
# Set working directory - mac file system
setwd("~/physExLab/NHLcombinedata")
# Load data
nhldata<-read.csv("data_position.csv")
#Create dataframe with only the variables we are interested in
nhldata<-select(nhldata,PosNumeric, Position, Weight_lb,BenchPress_reps150, BenchPress_lbperlbBodyWeight, PushUps_max, PushUpsxBody.Weight, Push.Strength..lb., PushStrength_lbperlbBodyWeight)
As the output below shows, most variables have missing values (NA's).
str(nhldata)
## 'data.frame': 90 obs. of 9 variables:
## $ PosNumeric : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Position : Factor w/ 2 levels "Defense","Forward": 2 2 2 2 2 2 2 2 2 2 ...
## $ Weight_lb : num 209 176 192 181 205 ...
## $ BenchPress_reps150 : int 12 10 5 5 10 7 6 9 4 3 ...
## $ BenchPress_lbperlbBodyWeight : num 8.6 8.52 3.9 4.15 7.33 6.31 4.79 7.25 3.54 2.29 ...
## $ PushUps_max : int 24 21 18 26 34 28 25 24 28 14 ...
## $ PushUpsxBody.Weight : num 5021 3696 3461 4696 6956 ...
## $ Push.Strength..lb. : num 277 247 205 239 294 188 255 275 191 176 ...
## $ PushStrength_lbperlbBodyWeight: num 1.32 1.4 1.07 1.32 1.44 1.13 1.36 1.48 1.13 0.89 ...
summary(nhldata)
## PosNumeric Position Weight_lb BenchPress_reps150
## Min. :1.000 Defense:28 Min. :153.3 Min. : 0.000
## 1st Qu.:1.000 Forward:62 1st Qu.:177.5 1st Qu.: 4.000
## Median :1.000 Median :187.2 Median : 6.500
## Mean :1.311 Mean :186.8 Mean : 6.756
## 3rd Qu.:2.000 3rd Qu.:196.0 3rd Qu.: 9.000
## Max. :2.000 Max. :240.7 Max. :13.000
## NA's :8
## BenchPress_lbperlbBodyWeight PushUps_max PushUpsxBody.Weight
## Min. : 0.000 Min. :12.00 Min. :1760
## 1st Qu.: 3.550 1st Qu.:20.75 1st Qu.:3695
## Median : 5.465 Median :24.00 Median :4605
## Mean : 5.446 Mean :24.64 Mean :4588
## 3rd Qu.: 7.190 3rd Qu.:28.00 3rd Qu.:5104
## Max. :13.130 Max. :42.00 Max. :8085
## NA's :8 NA's :6 NA's :6
## Push.Strength..lb. PushStrength_lbperlbBodyWeight
## Min. :142.0 Min. :0.730
## 1st Qu.:198.0 1st Qu.:1.080
## Median :222.8 Median :1.220
## Mean :228.2 Mean :1.239
## 3rd Qu.:260.0 3rd Qu.:1.387
## Max. :366.0 Max. :2.580
## NA's :4 NA's :4
We are only interested in cases that have complete data, so we need to eliminate cases with missing data (note: if we had plans for a publication, we would be much more cautious about doing this).
A complete case has values for all of the variables of interest.
Removing incomplete cases takes us from 90 subjects to 82, so 8 subjects had incomplete data in the variables of interest.
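As a small illustration of what complete-case filtering does, the sketch below builds a toy data frame (the values are made up, only the column names echo the real data), counts the missing values per column, and keeps the rows with no missing values.

```r
# Hypothetical toy data: two of the five rows have a missing value
toy <- data.frame(
  Position           = c("Forward", "Forward", "Defense", "Defense", "Forward"),
  Weight_lb          = c(180, 192, 201, 188, 175),
  PushUps_max        = c(24, NA, 30, 22, 27),
  BenchPress_reps150 = c(8, 5, NA, 6, 7)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

# TRUE for rows that have no missing value in any column
keep <- complete.cases(toy)

toy_complete <- toy[keep, ]
n_dropped <- nrow(toy) - nrow(toy_complete)

print(na_per_column)
print(n_dropped)
```

A row is dropped once no matter how many of its variables are missing, which is why the number of dropped cases can be smaller than the sum of the NA counts across columns.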
nhldata <- nhldata[complete.cases(nhldata), ]
str(nhldata)
## 'data.frame': 82 obs. of 9 variables:
## $ PosNumeric : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Position : Factor w/ 2 levels "Defense","Forward": 2 2 2 2 2 2 2 2 2 2 ...
## $ Weight_lb : num 209 176 192 181 205 ...
## $ BenchPress_reps150 : int 12 10 5 5 10 7 6 9 4 3 ...
## $ BenchPress_lbperlbBodyWeight : num 8.6 8.52 3.9 4.15 7.33 6.31 4.79 7.25 3.54 2.29 ...
## $ PushUps_max : int 24 21 18 26 34 28 25 24 28 14 ...
## $ PushUpsxBody.Weight : num 5021 3696 3461 4696 6956 ...
## $ Push.Strength..lb. : num 277 247 205 239 294 188 255 275 191 176 ...
## $ PushStrength_lbperlbBodyWeight: num 1.32 1.4 1.07 1.32 1.44 1.13 1.36 1.48 1.13 0.89 ...
summary(nhldata)
## PosNumeric Position Weight_lb BenchPress_reps150
## Min. :1.000 Defense:27 Min. :153.3 Min. : 0.000
## 1st Qu.:1.000 Forward:55 1st Qu.:177.5 1st Qu.: 4.000
## Median :1.000 Median :187.2 Median : 6.500
## Mean :1.329 Mean :186.8 Mean : 6.756
## 3rd Qu.:2.000 3rd Qu.:195.9 3rd Qu.: 9.000
## Max. :2.000 Max. :240.7 Max. :13.000
## BenchPress_lbperlbBodyWeight PushUps_max PushUpsxBody.Weight
## Min. : 0.000 Min. :12.00 Min. :1760
## 1st Qu.: 3.550 1st Qu.:20.25 1st Qu.:3693
## Median : 5.465 Median :24.00 Median :4549
## Mean : 5.446 Mean :24.57 Mean :4573
## 3rd Qu.: 7.190 3rd Qu.:28.00 3rd Qu.:5073
## Max. :13.130 Max. :42.00 Max. :8085
## Push.Strength..lb. PushStrength_lbperlbBodyWeight
## Min. :142.0 Min. :0.730
## 1st Qu.:198.5 1st Qu.:1.093
## Median :228.0 Median :1.240
## Mean :229.6 Mean :1.246
## 3rd Qu.:262.2 3rd Qu.:1.397
## Max. :366.0 Max. :2.580
options(scipen=100) #suppresses scientific notation in output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(nhldata, basic=FALSE) #descriptive stats (median, mean, SE, CI, var, SD, CV); basic=FALSE omits counts, min, max, range, sum
## PosNumeric Position Weight_lb BenchPress_reps150
## median 1.000 NA 187.200 6.50
## mean 1.329 NA 186.779 6.76
## SE.mean 0.052 NA 1.531 0.36
## CI.mean.0.95 0.104 NA 3.047 0.71
## var 0.224 NA 192.248 10.58
## std.dev 0.473 NA 13.865 3.25
## coef.var 0.356 NA 0.074 0.48
## BenchPress_lbperlbBodyWeight PushUps_max PushUpsxBody.Weight
## median 5.46 24.00 4548.80
## mean 5.45 24.57 4573.33
## SE.mean 0.29 0.67 141.06
## CI.mean.0.95 0.58 1.33 280.67
## var 6.86 36.57 1631632.73
## std.dev 2.62 6.05 1277.35
## coef.var 0.48 0.25 0.28
## Push.Strength..lb. PushStrength_lbperlbBodyWeight
## median 228.00 1.240
## mean 229.61 1.246
## SE.mean 4.77 0.028
## CI.mean.0.95 9.49 0.056
## var 1867.38 0.066
## std.dev 43.21 0.256
## coef.var 0.19 0.206
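Several rows of this table can be derived from the others, which helps when reading it. The sketch below recomputes SE.mean, CI.mean.0.95, and coef.var for Weight_lb from the reported mean, standard deviation, and sample size, assuming the usual formulas (SE = SD/√n, a t-based half-width for the confidence interval, and CV = SD/mean). Variable names are illustrative.

```r
# Values copied from the stat.desc() output for Weight_lb (82 complete cases)
n       <- 82
mean_wt <- 186.779
sd_wt   <- 13.865

se_mean  <- sd_wt / sqrt(n)                  # standard error of the mean
ci_half  <- qt(0.975, df = n - 1) * se_mean  # half-width of the 95% CI
coef_var <- sd_wt / mean_wt                  # coefficient of variation

round(c(SE.mean = se_mean, CI.mean.0.95 = ci_half, coef.var = coef_var), 3)
```

Note that CI.mean.0.95 is a half-width, so the 95% confidence interval for the mean is the mean plus or minus that value.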
forwards<-filter(nhldata,PosNumeric==1)
options(scipen=100) #suppresses scientific notation in output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(forwards, basic=FALSE) #descriptive stats for forwards only
## PosNumeric Position Weight_lb BenchPress_reps150
## median 1 NA 186.300 7.00
## mean 1 NA 185.984 6.89
## SE.mean 0 NA 1.719 0.44
## CI.mean.0.95 0 NA 3.446 0.87
## var 0 NA 162.450 10.43
## std.dev 0 NA 12.746 3.23
## coef.var 0 NA 0.069 0.47
## BenchPress_lbperlbBodyWeight PushUps_max PushUpsxBody.Weight
## median 5.67 24.00 4656.40
## mean 5.63 24.75 4568.66
## SE.mean 0.36 0.83 172.16
## CI.mean.0.95 0.73 1.66 345.15
## var 7.29 37.90 1630086.24
## std.dev 2.70 6.16 1276.75
## coef.var 0.48 0.25 0.28
## Push.Strength..lb. PushStrength_lbperlbBodyWeight
## median 230.00 1.270
## mean 232.45 1.276
## SE.mean 5.69 0.037
## CI.mean.0.95 11.41 0.075
## var 1780.02 0.076
## std.dev 42.19 0.276
## coef.var 0.18 0.216
defense<-filter(nhldata,PosNumeric==2)
options(scipen=100) #suppresses scientific notation in output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(defense, basic=FALSE) #descriptive stats for defense only
## PosNumeric Position Weight_lb BenchPress_reps150
## median 2 NA 189.900 5.00
## mean 2 NA 188.400 6.48
## SE.mean 0 NA 3.088 0.64
## CI.mean.0.95 0 NA 6.347 1.32
## var 0 NA 257.464 11.18
## std.dev 0 NA 16.046 3.34
## coef.var 0 NA 0.085 0.52
## BenchPress_lbperlbBodyWeight PushUps_max PushUpsxBody.Weight
## median 4.48 24.00 4465.00
## mean 5.07 24.22 4582.83
## SE.mean 0.47 1.14 250.74
## CI.mean.0.95 0.97 2.34 515.40
## var 6.01 35.03 1697459.95
## std.dev 2.45 5.92 1302.87
## coef.var 0.48 0.24 0.28
## Push.Strength..lb. PushStrength_lbperlbBodyWeight
## median 212.0 1.180
## mean 223.8 1.186
## SE.mean 8.8 0.039
## CI.mean.0.95 18.0 0.080
## var 2068.9 0.041
## std.dev 45.5 0.202
## coef.var 0.2 0.171
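A quick arithmetic check confirms that the two subsets together account for the whole sample. The sketch below combines the group means for Weight_lb, weighted by the group sizes reported by summary(), and compares the result with the overall mean. Variable names are illustrative.

```r
# Group sizes from summary(nhldata) after removing incomplete cases;
# group means from the stat.desc() output for each position
n_forward <- 55
mean_wt_forward <- 185.984
n_defense <- 27
mean_wt_defense <- 188.400

# The size-weighted average of the group means should match the overall mean
pooled_mean <- (n_forward * mean_wt_forward + n_defense * mean_wt_defense) /
  (n_forward + n_defense)

# Agrees with the overall mean of 186.779 up to rounding of the inputs
round(pooled_mean, 2)
```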
The box plot (a.k.a. box-and-whisker diagram) is a standardized way of displaying the distribution of data based on the five-number summary: minimum, first quartile, median, third quartile, and maximum. In the ggplot2 version used here, the whiskers extend only to the most extreme values within 1.5 times the interquartile range of the box, and anything beyond that is plotted as an individual outlier point. Box plots make it easy to compare the two levels of a factor (forward, defense).
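To make the five-number summary concrete, the sketch below applies base R's fivenum() to the first ten bench press rep counts from the str() output, then adds one made-up extreme value (40) to show how boxplot.stats() flags an outlier. These base R functions use hinges, which can differ slightly from the quartiles ggplot2 computes, so treat this as an illustration rather than an exact reproduction of the plots.

```r
# First ten bench press rep counts, copied from the str() output
reps <- c(12, 10, 5, 5, 10, 7, 6, 9, 4, 3)

# Five-number summary: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(reps)
print(five)  # 3.0 5.0 6.5 10.0 12.0

# Add one hypothetical extreme value to show how an outlier is flagged
reps_with_outlier <- c(reps, 40)
flagged <- boxplot.stats(reps_with_outlier)$out
print(flagged)  # 40
```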
ggplot(nhldata, aes(factor(Position), BenchPress_reps150)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), BenchPress_lbperlbBodyWeight)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), PushUps_max)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), PushUpsxBody.Weight)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), Push.Strength..lb.)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), PushStrength_lbperlbBodyWeight)) + geom_boxplot()
The absolute and relative versions of these data look much the same, largely because the groups differ little in body mass:
ggplot(nhldata, aes(factor(Position), Weight_lb)) + geom_boxplot()
To compare the groups statistically, Welch two-sample t-tests test the null hypothesis that the group means are not different. The graphics suggest no difference, but we wanted to check whether the t-tests fail to reject the null (i.e. p > 0.05 if we accept an alpha of 0.05).
t.test(nhldata$Weight_lb~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$Weight_lb by nhldata$Position
## t = 0.7, df = 40, p-value = 0.5
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -4.7 9.5
## sample estimates:
## mean in group Defense mean in group Forward
## 188 186
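The Welch test does not assume equal variances in the two groups, so its standard error and degrees of freedom come from each group's own variance and size. The sketch below recomputes the Weight_lb test from the group summary statistics reported earlier, using the standard Welch-Satterthwaite formulas. Variable names are illustrative.

```r
# Summary statistics for Weight_lb, copied from the stat.desc() output
m_def <- 188.400; s_def <- 16.046; n_def <- 27  # defense
m_fwd <- 185.984; s_fwd <- 12.746; n_fwd <- 55  # forwards

# Squared standard error of each group mean
v_def <- s_def^2 / n_def
v_fwd <- s_fwd^2 / n_fwd

se_diff <- sqrt(v_def + v_fwd)
t_stat  <- (m_def - m_fwd) / se_diff

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v_def + v_fwd)^2 /
  (v_def^2 / (n_def - 1) + v_fwd^2 / (n_fwd - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
ci <- (m_def - m_fwd) + c(-1, 1) * qt(0.975, df = df_welch) * se_diff

round(c(t = t_stat, df = df_welch, p = p_value, lower = ci[1], upper = ci[2]), 2)
```

The recomputed t, p-value, and confidence interval agree with the printed test after rounding. The degrees of freedom come out near 42.6; the printed value of 40 is presumably the same number shown with fewer significant digits because of options(digits=2).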
t.test(nhldata$BenchPress_reps150~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$BenchPress_reps150 by nhldata$Position
## t = -0.5, df = 50, p-value = 0.6
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.0 1.2
## sample estimates:
## mean in group Defense mean in group Forward
## 6.5 6.9
t.test(nhldata$BenchPress_lbperlbBodyWeight~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$BenchPress_lbperlbBodyWeight by nhldata$Position
## t = -0.9, df = 60, p-value = 0.3
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.76 0.63
## sample estimates:
## mean in group Defense mean in group Forward
## 5.1 5.6
t.test(nhldata$PushUps_max~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$PushUps_max by nhldata$Position
## t = -0.4, df = 50, p-value = 0.7
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -3.3 2.3
## sample estimates:
## mean in group Defense mean in group Forward
## 24 25
t.test(nhldata$PushUpsxBody.Weight~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$PushUpsxBody.Weight by nhldata$Position
## t = 0.05, df = 50, p-value = 1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -596 625
## sample estimates:
## mean in group Defense mean in group Forward
## 4583 4569
t.test(nhldata$Push.Strength..lb.~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$Push.Strength..lb. by nhldata$Position
## t = -0.8, df = 50, p-value = 0.4
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -30 12
## sample estimates:
## mean in group Defense mean in group Forward
## 224 232
t.test(nhldata$PushStrength_lbperlbBodyWeight~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$PushStrength_lbperlbBodyWeight by nhldata$Position
## t = -2, df = 70, p-value = 0.1
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.197 0.018
## sample estimates:
## mean in group Defense mean in group Forward
## 1.2 1.3
There are no statistically significant differences between forwards and defensemen in this data set (all p > 0.05), so the expectation that defense would show higher “hitting strength” is not supported. You should see if there have been any other analyses looking at this question in NHL players, particularly older players. These players are just entering their potential professional careers. After being drafted, many of them spend years in college or playing in other leagues before actually getting to the NHL. If older NHL players show a difference, perhaps it comes about from the rigours of training as they continue to develop as players. These players may not have been playing long enough to be trained into their position, where they partially adapt based on their role.