Analysis
Preliminary Analysis
The first block of output below is produced by loading the packages. It is not part of the analysis, but I have not found a way to suppress it.
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
##
## Loading required package: boot
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
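One possible way to hide the attach messages above, offered as a sketch rather than the setup used for this report: wrap the `library()` calls in `suppressPackageStartupMessages()`, or set `message=FALSE` on the R Markdown chunk that loads the packages. The package names are taken from the output above; the `requireNamespace()` guard is an addition so the snippet still runs where a package is not installed.

```r
# Packages named in the startup output above
pkgs <- c("dplyr", "pastecs")

# Attach each package quietly; skip any that are not installed
loaded <- vapply(pkgs, function(p) {
  if (!requireNamespace(p, quietly = TRUE)) return(FALSE)
  suppressPackageStartupMessages(library(p, character.only = TRUE))
  TRUE
}, logical(1))

loaded  # named logical vector: which packages were attached
```

The masking notes and the "Loading required package" line are all issued as package startup messages, so this wrapper should silence them without hiding genuine warnings or errors.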
Summary of the data of interest in this analysis. It is clear that some values are missing (see the NA's rows).
## 'data.frame': 100 obs. of 9 variables:
## $ Weight_kg : num 95.1 80 87.4 82.1 81.2 ...
## $ WingatePeak_Watts : int 1136 1017 1171 933 1089 1223 1210 917 1047 1191 ...
## $ PeakWingate_W_kg : num 12 12.7 13.4 11.4 13.4 14.1 13 12.8 13.9 14 ...
## $ VO2max_l_min : num 4.75 3.99 4.94 5.64 4.92 4.41 5.19 4.17 4.39 5.45 ...
## $ VO2max_ml_kg_min : num 50 49.9 56.5 68.7 60.6 50.6 55.8 57.9 58 63.8 ...
## $ VO2max_watts : int 520 400 520 520 520 520 520 440 480 520 ...
## $ VO2max_watts_kg : num 5.47 5 5.95 6.33 6.41 ...
## $ SayersPeak_NoPause_watts : int 5185 5229 5413 5367 5569 4979 6322 4946 4994 5673 ...
## $ SayersPeak_NoPause_watts_kg: num 54.5 65.4 61.9 65.4 68.6 ...
## Weight_kg WingatePeak_Watts PeakWingate_W_kg VO2max_l_min
## Min. : 69.68 Min. : 784 Min. : 9.00 Min. :3.640
## 1st Qu.: 80.83 1st Qu.:1057 1st Qu.:12.80 1st Qu.:4.383
## Median : 85.30 Median :1142 Median :13.40 Median :4.725
## Mean : 84.99 Mean :1135 Mean :13.42 Mean :4.773
## 3rd Qu.: 89.15 3rd Qu.:1195 3rd Qu.:14.00 3rd Qu.:5.185
## Max. :109.41 Max. :1502 Max. :15.90 Max. :6.280
## NA's :4 NA's :4 NA's :6
## VO2max_ml_kg_min VO2max_watts VO2max_watts_kg SayersPeak_NoPause_watts
## Min. :44.90 Min. :400.0 Min. :4.396 Min. :4480
## 1st Qu.:51.77 1st Qu.:480.0 1st Qu.:5.649 1st Qu.:5188
## Median :56.30 Median :520.0 Median :6.064 Median :5598
## Mean :56.41 Mean :505.5 Mean :5.981 Mean :5620
## 3rd Qu.:60.58 3rd Qu.:550.0 3rd Qu.:6.334 3rd Qu.:6042
## Max. :68.70 Max. :640.0 Max. :7.556 Max. :6714
## NA's :6 NA's :6 NA's :6 NA's :4
## SayersPeak_NoPause_watts_kg
## Min. :54.53
## 1st Qu.:62.99
## Median :66.26
## Mean :66.34
## 3rd Qu.:69.81
## Max. :80.08
## NA's :4
We are only interested in cases that have complete data, so cases with any missing values need to be removed.
As you can see, we have gone from 100 subjects to 94 subjects, so 6 subjects had incomplete data in the variables of interest.
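A minimal sketch of how incomplete cases can be dropped in base R. The data frame `toy` is a hypothetical stand-in for the player data (only the column names come from the output above), and `complete.cases()` is one option among several (`na.omit()` would give the same rows).

```r
# Hypothetical toy data standing in for the player data
toy <- data.frame(
  Weight_kg         = c(95.1, 80.0, 87.4, 82.1),
  VO2max_l_min      = c(4.75, NA, 4.94, 5.64),
  WingatePeak_Watts = c(1136, 1017, NA, 933)
)

# TRUE for rows with no missing value in any column
keep <- complete.cases(toy)

# Keep only the complete rows
toy_complete <- toy[keep, ]

nrow(toy)           # rows before
nrow(toy_complete)  # rows after
sum(!keep)          # rows dropped
```

Comparing `nrow()` before and after is the same check that gives the 100 to 94 count reported here.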
Summary of data with cases that have missing values removed
## 'data.frame': 94 obs. of 9 variables:
## $ Weight_kg : num 95.1 80 87.4 82.1 81.2 ...
## $ WingatePeak_Watts : int 1136 1017 1171 933 1089 1223 1210 917 1047 1191 ...
## $ PeakWingate_W_kg : num 12 12.7 13.4 11.4 13.4 14.1 13 12.8 13.9 14 ...
## $ VO2max_l_min : num 4.75 3.99 4.94 5.64 4.92 4.41 5.19 4.17 4.39 5.45 ...
## $ VO2max_ml_kg_min : num 50 49.9 56.5 68.7 60.6 50.6 55.8 57.9 58 63.8 ...
## $ VO2max_watts : int 520 400 520 520 520 520 520 440 480 520 ...
## $ VO2max_watts_kg : num 5.47 5 5.95 6.33 6.41 ...
## $ SayersPeak_NoPause_watts : int 5185 5229 5413 5367 5569 4979 6322 4946 4994 5673 ...
## $ SayersPeak_NoPause_watts_kg: num 54.5 65.4 61.9 65.4 68.6 ...
## Weight_kg WingatePeak_Watts PeakWingate_W_kg VO2max_l_min
## Min. : 69.68 Min. : 784 Min. : 9.00 Min. :3.640
## 1st Qu.: 80.49 1st Qu.:1050 1st Qu.:12.80 1st Qu.:4.383
## Median : 84.95 Median :1142 Median :13.40 Median :4.725
## Mean : 84.73 Mean :1134 Mean :13.41 Mean :4.773
## 3rd Qu.: 89.07 3rd Qu.:1195 3rd Qu.:14.00 3rd Qu.:5.185
## Max. :109.41 Max. :1502 Max. :15.90 Max. :6.280
## VO2max_ml_kg_min VO2max_watts VO2max_watts_kg SayersPeak_NoPause_watts
## Min. :44.90 Min. :400.0 Min. :4.396 Min. :4480
## 1st Qu.:51.77 1st Qu.:480.0 1st Qu.:5.649 1st Qu.:5186
## Median :56.30 Median :520.0 Median :6.064 Median :5582
## Mean :56.41 Mean :505.5 Mean :5.981 Mean :5603
## 3rd Qu.:60.58 3rd Qu.:550.0 3rd Qu.:6.334 3rd Qu.:6026
## Max. :68.70 Max. :640.0 Max. :7.556 Max. :6714
## SayersPeak_NoPause_watts_kg
## Min. :54.53
## 1st Qu.:62.77
## Median :66.10
## Mean :66.23
## 3rd Qu.:69.09
## Max. :80.08
Descriptive summary of the variables of interest:
## Weight_kg WingatePeak_Watts PeakWingate_W_kg VO2max_l_min
## median 84.955 1142.50 13.400 4.725
## mean 84.725 1133.59 13.415 4.773
## SE.mean 0.647 13.15 0.119 0.056
## CI.mean.0.95 1.285 26.10 0.236 0.111
## var 39.387 16243.43 1.330 0.292
## std.dev 6.276 127.45 1.153 0.540
## coef.var 0.074 0.11 0.086 0.113
## VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median 56.300 520.00 6.064
## mean 56.412 505.53 5.981
## SE.mean 0.557 5.70 0.065
## CI.mean.0.95 1.106 11.32 0.128
## var 29.174 3057.24 0.393
## std.dev 5.401 55.29 0.627
## coef.var 0.096 0.11 0.105
## SayersPeak_NoPause_watts SayersPeak_NoPause_watts_kg
## median 5582.000 66.104
## mean 5603.117 66.226
## SE.mean 55.425 0.555
## CI.mean.0.95 110.062 1.102
## var 288758.492 28.973
## std.dev 537.363 5.383
## coef.var 0.096 0.081
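The rows in the table above are related by simple formulas, and it can help to verify one column by hand. The sketch below recomputes SE.mean, CI.mean.0.95 and coef.var for Weight_kg from the reported mean, standard deviation and n = 94. It assumes that CI.mean.0.95 is the half-width of a t-based 95% confidence interval, which is consistent with the reported numbers.

```r
# Values reported above for Weight_kg (94 complete cases)
n      <- 94
mean_w <- 84.725
sd_w   <- 6.276

se_mean  <- sd_w / sqrt(n)                   # standard error of the mean
ci_half  <- qt(0.975, df = n - 1) * se_mean  # half-width of the 95% CI
coef_var <- sd_w / mean_w                    # coefficient of variation

# Compare with the SE.mean, CI.mean.0.95 and coef.var rows for Weight_kg
round(c(SE.mean = se_mean, CI.mean.0.95 = ci_half, coef.var = coef_var), 3)
```

So the 95% confidence interval for mean body mass is roughly the mean plus or minus the CI.mean.0.95 value.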
Univariate distributions
These are separated into groups of variables that have similar Y axis scaling.
Boxplots - The box plot (a.k.a. box and whisker diagram) is a standardized way of displaying the distribution of data based on the five number summary: minimum, first quartile, median, third quartile, and maximum. Outliers are plotted as individual points.
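To make the five number summary and the outlier rule concrete, here is a small sketch using only the first ten WingatePeak_Watts values shown in the structure output earlier, not the full data. It relies on R's default rule, where points more than 1.5 times the interquartile range beyond the hinges are flagged as outliers.

```r
# First ten WingatePeak_Watts values shown in the str() output
watts <- c(1136, 1017, 1171, 933, 1089, 1223, 1210, 917, 1047, 1191)

# Minimum, lower hinge, median, upper hinge, maximum
fivenum(watts)

# Points beyond 1.5 * IQR from the hinges (none in this subset)
boxplot.stats(watts)$out

# Adding the reported maximum (1502 W) to this subset flags it as an outlier
boxplot.stats(c(watts, 1502))$out
```

With this rule the whiskers stop at the most extreme points inside the fences, so they only reach the minimum and maximum when no outliers are present.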
Data with the common Y axis of Watts:
## No id variables; using all as measure variables

Data with the common Y axis of Watts/kg:
## No id variables; using all as measure variables

Data with the common Y axis of oxygen consumption (one for L/min and one for ml/kg/min):
## No id variables; using all as measure variables

## No id variables; using all as measure variables

Bivariate distributions (scatter plots to look at associations between two variables)
Next we can compare bivariate distributions (i.e. scatter plots) to look at associations. This first draft will only include the scatter plots without assumptions of the best approach to fitting the data with a function (i.e. linear model).
VO2max in L/min and in ml/kg/min:

To help visualize what is happening (why there is variability), we can size the points by the player's weight, so that larger points indicate a heavier player. As you can see, heavier players may have a larger VO2 max in L/min, but they have a lower VO2 max in ml/kg/min, which makes sense because the relative value is divided by body mass.
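The link between the two VO2 max measures can be checked with the first five players from the structure output earlier (values rounded as displayed). This is a sketch in base R. The point-size scaling is an arbitrary choice for illustration and not necessarily the one used for the figures in this report.

```r
# First five players from the str() output (rounded as displayed)
weight_kg <- c(95.1, 80.0, 87.4, 82.1, 81.2)
vo2_l_min <- c(4.75, 3.99, 4.94, 5.64, 4.92)
vo2_ml_kg <- c(50.0, 49.9, 56.5, 68.7, 60.6)  # reported VO2max_ml_kg_min

# Relative VO2 max: convert L to ml, then divide by body mass
vo2_rel <- vo2_l_min * 1000 / weight_kg
round(vo2_rel, 1)

# Scale point size by body mass (heavier player = larger point)
point_size <- 1 + 2 * (weight_kg - min(weight_kg)) / diff(range(weight_kg))
plot(vo2_l_min, vo2_rel, cex = point_size,
     xlab = "VO2 max (L/min)", ylab = "VO2 max (ml/kg/min)")
```

The recomputed values agree with the reported column to within rounding, which is why two players with similar absolute VO2 max can sit far apart on the relative axis when their body mass differs.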

VO2 L/min and VO2 watts
Watts are on the X axis because, technically, the power being produced determines the oxygen needed (and therefore consumed). One thing to note is that VO2 max watts looks like a discrete (factor-like) variable rather than a continuous one. This is because of the mode of testing: the players use a Monark bike, and each stage of the max test has a predetermined power, so the max watts achieved is based on the stage reached. Therefore, variation in actual watts caused by pedal RPMs deviating from the RPMs required in the test would not be captured.
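A quick way to see the discreteness is to tabulate the distinct values. The sketch below uses only the ten VO2max_watts values shown in the structure output earlier, so it illustrates the check and says nothing definitive about the test protocol.

```r
# First ten VO2max_watts values shown in the str() output
vo2_watts <- c(520, 400, 520, 520, 520, 520, 520, 440, 480, 520)

# Count how often each distinct value occurs
table(vo2_watts)

# Distinct values and the gaps between them
stages <- sort(unique(vo2_watts))
stages
diff(stages)
```

Only a handful of distinct values appear, evenly spaced in this subset, which is what a staged protocol with fixed power steps would produce.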

VO2 L/min and VO2 watts (weighted for body mass)
Here we look at the same plot as above, but with the points sized by player weight. The pattern is much less clear than in the weighted plot of VO2 L/min against VO2 ml/kg/min.

Wingate power - comparison between Watts and Watts/kg:

Wingate power - comparison between Watts and Watts/kg (weighted for body mass)
When the points are sized by weight, just as with VO2, it is clear that players with more mass produce greater Watts, but not necessarily greater Watts/kg.

Vertical jump average power watts to watts/kg
We expect the same as above, but this is a useful exercise in getting familiar with the data.
First without weighting based on body mass:

Next with weighting based on body mass:

Primary analysis questions:
- Compare Wingate peak watts to VO2 max watts

- Compare VO2 max ml/kg/min to peak Wingate watts/kg

- Compare Sayers peak no pause watts to the Wingate peak watts and to VO2 max watts to see if there are any relationships and if jump capacity has an impact
Expanded and improved upon
First with straightforward scatter plots for each comparison
Sayers peak no pause watts to the peak Wingate watts - with visualization of the impact of weight
Linear model
fit <- lm(WingatePeak_Watts ~ SayersPeak_NoPause_watts, data=nhldata) #fit model
summary(fit) # show results
##
## Call:
## lm(formula = WingatePeak_Watts ~ SayersPeak_NoPause_watts, data = nhldata)
##
## Residuals:
## Min 1Q Median 3Q Max
## -441.8 -55.2 0.9 65.1 250.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 512.3344 123.0347 4.16 0.000070 ***
## SayersPeak_NoPause_watts 0.1109 0.0219 5.07 0.000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 113 on 92 degrees of freedom
## Multiple R-squared: 0.219, Adjusted R-squared: 0.21
## F-statistic: 25.7 on 1 and 92 DF, p-value: 0.00000203
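As a reading aid for the model summary above, the sketch below plugs the printed coefficients back into the fitted line. It uses only numbers reported in this document (the coefficients, R-squared and the sample means) and relies on the fact that a least-squares line passes through the point of means.

```r
# Coefficients as printed in the summary above
intercept <- 512.3344
slope     <- 0.1109

# Predicted Wingate peak power at the mean Sayers peak power (5603.117 W);
# this should land close to the mean Wingate peak power (1133.59 W)
pred_at_mean <- intercept + slope * 5603.117
pred_at_mean

# Expected change in Wingate peak watts per 100 W of Sayers peak power
slope * 100

# Correlation implied by R-squared (positive because the slope is positive)
r <- sqrt(0.219)
r
```

In words: players who differ by 100 W in estimated jump power differ by about 11 W in Wingate peak power on average, and jump power explains roughly 22% of the variance in Wingate peak power.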
Sayers peak no pause watts to the peak aerobic watts - with visualization of the impact of weight
Next we want to see if there is a pattern in the distribution of vertical jump power across the bivariate distribution of Wingate and aerobic power
Finally, we want to see if there is a pattern in the distribution of vertical jump power and body mass across the bivariate distribution of Wingate and aerobic power. This may be too much in one graphic, but it is a start:
There could be more to come. Let’s start with the above and see where it takes the report. It may be interesting to put more of these comparisons into linear models and see what we get, but I want to do so cautiously, not just throw a model together to see what comes out of it. Once models are built, the fit lines can be added to any or all of the graphics above.