Data analysis performed for Almonte’s physiology of exercise lab report
Question: Do skating players (forward and defense) entering the NHL combine scouting tests have different testing results in key indices of aerobic and anaerobic fitness?
Expected: Forwards will have a higher aerobic fitness, defense will have a higher peak power, but also a higher fatigue index percent (they will fatigue faster).
Request: Compare averages– mean, standard deviation, etc.– of both groups: please add histograms - I went with boxplots thinking they would provide a better approach to comparing the groups; tables -all the data you need for tables is included but not in a presentable way so you can select what you want and add to a table Aerobic fitness with VO2 max (ml/kg/min) Anaerobic fitness with peak power Anaerobic fatigue index
Other variables you might be interested have been added - you can disregard; if it is disruptive to have them included here they can be removed
Note: goalies are not included in the analysis
Analysis
# Load necessary R packages
library(dplyr) #For data manipulation
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2) #For data manipulation related to plotting
library(ggplot2) #For plotting
library(pastecs) #For descriptive statistics
## Loading required package: boot
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
library(psych) #For descriptive statistics
##
## Attaching package: 'psych'
##
## The following object is masked from 'package:boot':
##
## logit
##
## The following object is masked from 'package:ggplot2':
##
## %+%
# Set working directory - mac file system
setwd("~/physExLab/NHLcombinedata")
# Load data
nhldata<-read.csv("data_position.csv")
#Create weight in kg variable
nhldata<-mutate(nhldata, Weight_kg = Weight_lb/2.2)
#Create aerobic test peak watts / kg variable
nhldata<-mutate(nhldata, VO2max_watts_kg= VO2max_watts/Weight_kg)
#Create Lewis average watts / kg
nhldata<-mutate(nhldata, LewisAvg_.Pause_watts_kg= LewisAvg_.Pause_watts/Weight_kg)
#Create dataframe with only the variables we are interested in
nhldata<-select(nhldata,PosNumeric, Position, Weight_kg, WingatePeak_Watts, PeakWingate_W_kg, WingatePercentDrop, VO2max_l_min, VO2max_ml_kg_min, VO2max_watts, VO2max_watts_kg, LewisAvg_.Pause_watts, LewisAvg_.Pause_watts_kg)
Here is a summary of the resulting dataframe. As you can see there are missing values in most variables.
str(nhldata)
## 'data.frame': 90 obs. of 12 variables:
## $ PosNumeric : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Position : Factor w/ 2 levels "Defense","Forward": 2 2 2 2 2 2 2 2 2 2 ...
## $ Weight_kg : num 95.1 80 87.4 82.1 93 ...
## $ WingatePeak_Watts : int 1136 1017 1171 933 1210 1047 1191 1234 975 1099 ...
## $ PeakWingate_W_kg : num 12 12.7 13.4 11.4 13 ...
## $ WingatePercentDrop : num 54.5 52.5 36.7 39.5 54.4 57.4 49.3 54.1 33.7 46.8 ...
## $ VO2max_l_min : num 4.75 3.99 4.94 5.64 5.19 4.39 5.45 4.96 4.44 4.64 ...
## $ VO2max_ml_kg_min : num 50 49.9 56.5 68.7 55.8 58 63.8 58.6 57.6 52 ...
## $ VO2max_watts : int 520 400 520 520 520 480 520 400 480 480 ...
## $ VO2max_watts_kg : num 5.47 5 5.95 6.33 5.59 ...
## $ LewisAvg_.Pause_watts : num 1390 1330 1372 1357 1650 ...
## $ LewisAvg_.Pause_watts_kg: num 14.6 16.6 15.7 16.5 17.7 ...
summary(nhldata)
## PosNumeric Position Weight_kg WingatePeak_Watts
## Min. :1.000 Defense:28 Min. : 69.68 Min. : 784
## 1st Qu.:1.000 Forward:62 1st Qu.: 80.67 1st Qu.:1062
## Median :1.000 Median : 85.09 Median :1148
## Mean :1.311 Mean : 84.91 Mean :1136
## 3rd Qu.:2.000 3rd Qu.: 89.09 3rd Qu.:1194
## Max. :2.000 Max. :109.41 Max. :1502
## NA's :3
## PeakWingate_W_kg WingatePercentDrop VO2max_l_min VO2max_ml_kg_min
## Min. : 9.03 Min. :33.70 Min. :3.640 Min. :44.90
## 1st Qu.:12.85 1st Qu.:45.20 1st Qu.:4.350 1st Qu.:51.77
## Median :13.44 Median :50.00 Median :4.725 Median :56.20
## Mean :13.44 Mean :49.60 Mean :4.766 Mean :56.35
## 3rd Qu.:13.98 3rd Qu.:54.15 3rd Qu.:5.185 3rd Qu.:60.58
## Max. :15.93 Max. :77.60 Max. :6.280 Max. :68.70
## NA's :3 NA's :3 NA's :4 NA's :4
## VO2max_watts VO2max_watts_kg LewisAvg_.Pause_watts
## Min. :400.0 Min. :4.396 Min. :1125
## 1st Qu.:480.0 1st Qu.:5.649 1st Qu.:1306
## Median :520.0 Median :6.083 Median :1414
## Mean :505.1 Mean :5.977 Mean :1416
## 3rd Qu.:550.0 3rd Qu.:6.334 3rd Qu.:1509
## Max. :640.0 Max. :7.556 Max. :1865
## NA's :4 NA's :4 NA's :3
## LewisAvg_.Pause_watts_kg
## Min. :14.32
## 1st Qu.:15.92
## Median :16.62
## Mean :16.71
## 3rd Qu.:17.35
## Max. :20.63
## NA's :3
But we are really only interested in cases that have complete data - so need to eliminate cases with missing data (note - if we had plans for a publication we would be much more cautious about doing this)
As you can see we have gone from 90 subjects to 86 subjects - so 4 subjects had incomplete data in the variables of interest.
nhldata <- nhldata[complete.cases(nhldata), ]
str(nhldata)
## 'data.frame': 86 obs. of 12 variables:
## $ PosNumeric : int 1 1 1 1 1 1 1 1 1 1 ...
## $ Position : Factor w/ 2 levels "Defense","Forward": 2 2 2 2 2 2 2 2 2 2 ...
## $ Weight_kg : num 95.1 80 87.4 82.1 93 ...
## $ WingatePeak_Watts : int 1136 1017 1171 933 1210 1047 1191 1234 975 1099 ...
## $ PeakWingate_W_kg : num 12 12.7 13.4 11.4 13 ...
## $ WingatePercentDrop : num 54.5 52.5 36.7 39.5 54.4 57.4 49.3 54.1 33.7 46.8 ...
## $ VO2max_l_min : num 4.75 3.99 4.94 5.64 5.19 4.39 5.45 4.96 4.44 4.64 ...
## $ VO2max_ml_kg_min : num 50 49.9 56.5 68.7 55.8 58 63.8 58.6 57.6 52 ...
## $ VO2max_watts : int 520 400 520 520 520 480 520 400 480 480 ...
## $ VO2max_watts_kg : num 5.47 5 5.95 6.33 5.59 ...
## $ LewisAvg_.Pause_watts : num 1390 1330 1372 1357 1650 ...
## $ LewisAvg_.Pause_watts_kg: num 14.6 16.6 15.7 16.5 17.7 ...
summary(nhldata)
## PosNumeric Position Weight_kg WingatePeak_Watts
## Min. :1.000 Defense:27 Min. : 69.68 Min. : 784
## 1st Qu.:1.000 Forward:59 1st Qu.: 80.49 1st Qu.:1061
## Median :1.000 Median : 84.86 Median :1150
## Mean :1.314 Mean : 84.70 Mean :1136
## 3rd Qu.:2.000 3rd Qu.: 89.00 3rd Qu.:1195
## Max. :2.000 Max. :109.41 Max. :1502
## PeakWingate_W_kg WingatePercentDrop VO2max_l_min VO2max_ml_kg_min
## Min. : 9.03 Min. :33.70 Min. :3.640 Min. :44.90
## 1st Qu.:12.84 1st Qu.:45.20 1st Qu.:4.350 1st Qu.:51.77
## Median :13.45 Median :49.90 Median :4.725 Median :56.20
## Mean :13.45 Mean :49.54 Mean :4.766 Mean :56.35
## 3rd Qu.:13.99 3rd Qu.:54.17 3rd Qu.:5.185 3rd Qu.:60.58
## Max. :15.93 Max. :77.60 Max. :6.280 Max. :68.70
## VO2max_watts VO2max_watts_kg LewisAvg_.Pause_watts
## Min. :400.0 Min. :4.396 Min. :1125
## 1st Qu.:480.0 1st Qu.:5.649 1st Qu.:1304
## Median :520.0 Median :6.083 Median :1413
## Mean :505.1 Mean :5.977 Mean :1414
## 3rd Qu.:550.0 3rd Qu.:6.334 3rd Qu.:1505
## Max. :640.0 Max. :7.556 Max. :1865
## LewisAvg_.Pause_watts_kg
## Min. :14.32
## 1st Qu.:15.90
## Median :16.62
## Mean :16.69
## 3rd Qu.:17.29
## Max. :20.63
Descriptive summary of the (potential) variables of interest (all subjects):
options(scipen=100) #supresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(nhldata, basic=F) #runs basic stats
## PosNumeric Position Weight_kg WingatePeak_Watts
## median 1.00 NA 84.864 1149.50
## mean 1.31 NA 84.700 1136.30
## SE.mean 0.05 NA 0.678 13.65
## CI.mean.0.95 0.10 NA 1.349 27.14
## var 0.22 NA 39.580 16019.67
## std.dev 0.47 NA 6.291 126.57
## coef.var 0.36 NA 0.074 0.11
## PeakWingate_W_kg WingatePercentDrop VO2max_l_min
## median 13.445 49.90 4.725
## mean 13.450 49.54 4.766
## SE.mean 0.125 0.74 0.059
## CI.mean.0.95 0.249 1.48 0.118
## var 1.352 47.70 0.302
## std.dev 1.163 6.91 0.550
## coef.var 0.086 0.14 0.115
## VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median 56.200 520.00 6.083
## mean 56.346 505.12 5.977
## SE.mean 0.593 6.03 0.068
## CI.mean.0.95 1.179 11.99 0.136
## var 30.222 3126.46 0.403
## std.dev 5.497 55.91 0.635
## coef.var 0.098 0.11 0.106
## LewisAvg_.Pause_watts LewisAvg_.Pause_watts_kg
## median 1413.0 16.625
## mean 1414.0 16.687
## SE.mean 15.9 0.121
## CI.mean.0.95 31.7 0.241
## var 21843.3 1.261
## std.dev 147.8 1.123
## coef.var 0.1 0.067
Descriptive summary of the (potential) variables of interest for forwards:
forwards<-filter(nhldata,PosNumeric==1)
options(scipen=100) #supresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(forwards, basic=F) #runs basic stats
## PosNumeric Position Weight_kg WingatePeak_Watts
## median 1 NA 84.591 1140.00
## mean 1 NA 84.271 1128.83
## SE.mean 0 NA 0.754 15.45
## CI.mean.0.95 0 NA 1.510 30.92
## var 0 NA 33.564 14078.28
## std.dev 0 NA 5.793 118.65
## coef.var 0 NA 0.069 0.11
## PeakWingate_W_kg WingatePercentDrop VO2max_l_min
## median 13.370 50.00 4.730
## mean 13.431 49.41 4.735
## SE.mean 0.150 0.99 0.065
## CI.mean.0.95 0.300 1.98 0.131
## var 1.328 57.90 0.252
## std.dev 1.152 7.61 0.502
## coef.var 0.086 0.15 0.106
## VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median 56.100 520.000 6.000
## mean 56.306 498.983 5.938
## SE.mean 0.719 6.440 0.081
## CI.mean.0.95 1.439 12.892 0.161
## var 30.508 2447.224 0.383
## std.dev 5.523 49.469 0.619
## coef.var 0.098 0.099 0.104
## LewisAvg_.Pause_watts LewisAvg_.Pause_watts_kg
## median 1406.000 16.540
## mean 1398.710 16.599
## SE.mean 17.034 0.139
## CI.mean.0.95 34.098 0.278
## var 17119.680 1.134
## std.dev 130.842 1.065
## coef.var 0.094 0.064
Descriptive summary of the (potential) variables of interest for defenders:
defense<-filter(nhldata,PosNumeric==2)
options(scipen=100) #supresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(defense, basic=F) #runs basic stats
## PosNumeric Position Weight_kg WingatePeak_Watts
## median 2 NA 86.318 1151.00
## mean 2 NA 85.636 1152.63
## SE.mean 0 NA 1.404 27.60
## CI.mean.0.95 0 NA 2.885 56.73
## var 0 NA 53.195 20563.09
## std.dev 0 NA 7.293 143.40
## coef.var 0 NA 0.085 0.12
## PeakWingate_W_kg WingatePercentDrop VO2max_l_min
## median 13.460 49.40 4.68
## mean 13.490 49.84 4.83
## SE.mean 0.232 0.99 0.12
## CI.mean.0.95 0.477 2.04 0.26
## var 1.455 26.66 0.42
## std.dev 1.206 5.16 0.65
## coef.var 0.089 0.10 0.13
## VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median 56.480 520.00 6.19
## mean 56.431 518.52 6.06
## SE.mean 1.067 12.90 0.13
## CI.mean.0.95 2.193 26.51 0.27
## var 30.737 4490.03 0.45
## std.dev 5.544 67.01 0.67
## coef.var 0.098 0.13 0.11
## LewisAvg_.Pause_watts LewisAvg_.Pause_watts_kg
## median 1435.00 16.712
## mean 1447.38 16.881
## SE.mean 34.17 0.238
## CI.mean.0.95 70.25 0.490
## var 31533.21 1.536
## std.dev 177.58 1.239
## coef.var 0.12 0.073
Decided to use Boxplots - The box plot (a.k.a. box and whisker diagram) is a standardized way of displaying the distribution of data based on the five number summary: minimum, first quartile, median, third quartile, and maximum. Outliers are plotted as individual points. They are easier to look at for comparing two factors of a variable (forward, defense)
ggplot(nhldata, aes(factor(Position), WingatePeak_Watts)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), PeakWingate_W_kg)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), WingatePercentDrop)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), VO2max_l_min)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), VO2max_ml_kg_min)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), VO2max_watts_kg)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), LewisAvg_.Pause_watts)) + geom_boxplot()
ggplot(nhldata, aes(factor(Position), LewisAvg_.Pause_watts_kg)) + geom_boxplot()
The fact that there is not much difference between the absolute and relative versions of these data is really due to the fact that there is not much of a difference between the groups in body mass:
ggplot(nhldata, aes(factor(Position), Weight_kg)) + geom_boxplot()
Just for kicks - I also used a t-test to see whether there were differences between the positions (the graphics suggest not - but wanted to see if the null was accepted in the t-test (i.e. p > 0.05 if we accept an alpha of 0.05))
t.test(nhldata$Weight_kg~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$Weight_kg by nhldata$Position
## t = 0.9, df = 40, p-value = 0.4
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1.9 4.6
## sample estimates:
## mean in group Defense mean in group Forward
## 86 84
t.test(nhldata$WingatePeak_Watts~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$WingatePeak_Watts by nhldata$Position
## t = 0.8, df = 40, p-value = 0.5
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -40 88
## sample estimates:
## mean in group Defense mean in group Forward
## 1153 1129
t.test(nhldata$PeakWingate_W_kg~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$PeakWingate_W_kg by nhldata$Position
## t = 0.2, df = 50, p-value = 0.8
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.50 0.62
## sample estimates:
## mean in group Defense mean in group Forward
## 13 13
t.test(nhldata$WingatePercentDrop~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$WingatePercentDrop by nhldata$Position
## t = 0.3, df = 70, p-value = 0.8
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.4 3.2
## sample estimates:
## mean in group Defense mean in group Forward
## 50 49
t.test(nhldata$VO2max_l_min~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$VO2max_l_min by nhldata$Position
## t = 0.7, df = 40, p-value = 0.5
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.19 0.38
## sample estimates:
## mean in group Defense mean in group Forward
## 4.8 4.7
t.test(nhldata$VO2max_ml_kg_min~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$VO2max_ml_kg_min by nhldata$Position
## t = 0.1, df = 50, p-value = 0.9
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -2.5 2.7
## sample estimates:
## mean in group Defense mean in group Forward
## 56 56
t.test(nhldata$VO2max_watts~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$VO2max_watts by nhldata$Position
## t = 1, df = 40, p-value = 0.2
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -9.6 48.7
## sample estimates:
## mean in group Defense mean in group Forward
## 519 499
t.test(nhldata$VO2max_watts_kg~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$VO2max_watts_kg by nhldata$Position
## t = 0.8, df = 50, p-value = 0.4
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.18 0.43
## sample estimates:
## mean in group Defense mean in group Forward
## 6.1 5.9
t.test(nhldata$LewisAvg_.Pause_watts~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$LewisAvg_.Pause_watts by nhldata$Position
## t = 1, df = 40, p-value = 0.2
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -29 126
## sample estimates:
## mean in group Defense mean in group Forward
## 1447 1399
t.test(nhldata$LewisAvg_.Pause_watts_kg~nhldata$Position)
##
## Welch Two Sample t-test
##
## data: nhldata$LewisAvg_.Pause_watts_kg by nhldata$Position
## t = 1, df = 40, p-value = 0.3
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.27 0.84
## sample estimates:
## mean in group Defense mean in group Forward
## 17 17
Clearly there are no significant differences in this data set between these players. Makes for an interesting report though - particularly if there are papers out there showing a difference in older players. Raises the question of whether the players have not been playing long enough to be trained into their position where they partially adapt based on their role.