Data analysis performed for Hunt’s physiology of exercise lab report

Updated on December 4, 2015: set echo to FALSE in the R code chunks to hide the code from the report, and added additional labels and explanation.

Question:

Looking at the NHL combine data from 2010, how does the participants' musculoskeletal fitness, specifically their peak vertical leg power (Sayers), relate to their VO2max (specifically ml/kg/min) and their peak power output (specifically watts/kg)?

VO2max = Aerobic, stationary bike with weighted increments

Peak power output = Anaerobic, Wingate

Analysis

Preliminary Analysis

This first output is an unavoidable side effect of loading the packages. It is not part of the analysis, but I cannot get it to go away.

## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Loading required package: boot
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
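
For a future revision, messages like the ones above can usually be silenced. This is a minimal sketch, assuming the packages are loaded inside an R Markdown chunk: either set the chunk options `message=FALSE` and `warning=FALSE`, or wrap each `library()` call in `suppressPackageStartupMessages()`. The package list is inferred from the messages above, and the `requireNamespace()` guard is only there so the sketch still runs where a package is not installed.

```r
# Packages inferred from the attach messages in this report (an assumption).
pkgs <- c("dplyr", "pastecs")

for (p in pkgs) {
  # Skip packages that are not installed instead of stopping with an error.
  if (requireNamespace(p, quietly = TRUE)) {
    # Attach the package without printing its start-up messages.
    suppressPackageStartupMessages(library(p, character.only = TRUE))
  }
}
```
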
Summary of the data of interest in this analysis: it is clear that some values are missing (see the NA's rows)
## 'data.frame':    100 obs. of  9 variables:
##  $ Weight_kg                  : num  95.1 80 87.4 82.1 81.2 ...
##  $ WingatePeak_Watts          : int  1136 1017 1171 933 1089 1223 1210 917 1047 1191 ...
##  $ PeakWingate_W_kg           : num  12 12.7 13.4 11.4 13.4 14.1 13 12.8 13.9 14 ...
##  $ VO2max_l_min               : num  4.75 3.99 4.94 5.64 4.92 4.41 5.19 4.17 4.39 5.45 ...
##  $ VO2max_ml_kg_min           : num  50 49.9 56.5 68.7 60.6 50.6 55.8 57.9 58 63.8 ...
##  $ VO2max_watts               : int  520 400 520 520 520 520 520 440 480 520 ...
##  $ VO2max_watts_kg            : num  5.47 5 5.95 6.33 6.41 ...
##  $ SayersPeak_NoPause_watts   : int  5185 5229 5413 5367 5569 4979 6322 4946 4994 5673 ...
##  $ SayersPeak_NoPause_watts_kg: num  54.5 65.4 61.9 65.4 68.6 ...
##    Weight_kg      WingatePeak_Watts PeakWingate_W_kg  VO2max_l_min  
##  Min.   : 69.68   Min.   : 784      Min.   : 9.00    Min.   :3.640  
##  1st Qu.: 80.83   1st Qu.:1057      1st Qu.:12.80    1st Qu.:4.383  
##  Median : 85.30   Median :1142      Median :13.40    Median :4.725  
##  Mean   : 84.99   Mean   :1135      Mean   :13.42    Mean   :4.773  
##  3rd Qu.: 89.15   3rd Qu.:1195      3rd Qu.:14.00    3rd Qu.:5.185  
##  Max.   :109.41   Max.   :1502      Max.   :15.90    Max.   :6.280  
##                   NA's   :4         NA's   :4        NA's   :6      
##  VO2max_ml_kg_min  VO2max_watts   VO2max_watts_kg SayersPeak_NoPause_watts
##  Min.   :44.90    Min.   :400.0   Min.   :4.396   Min.   :4480            
##  1st Qu.:51.77    1st Qu.:480.0   1st Qu.:5.649   1st Qu.:5188            
##  Median :56.30    Median :520.0   Median :6.064   Median :5598            
##  Mean   :56.41    Mean   :505.5   Mean   :5.981   Mean   :5620            
##  3rd Qu.:60.58    3rd Qu.:550.0   3rd Qu.:6.334   3rd Qu.:6042            
##  Max.   :68.70    Max.   :640.0   Max.   :7.556   Max.   :6714            
##  NA's   :6        NA's   :6       NA's   :6       NA's   :4               
##  SayersPeak_NoPause_watts_kg
##  Min.   :54.53              
##  1st Qu.:62.99              
##  Median :66.26              
##  Mean   :66.34              
##  3rd Qu.:69.81              
##  Max.   :80.08              
##  NA's   :4

Since we are only interested in cases that have complete data, we need to eliminate cases with missing data.

As you can see, we have gone from 100 subjects to 94 subjects, so 6 subjects had incomplete data in the variables of interest.
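
Since the code is hidden in this report, here is a hedged sketch of one common way to drop incomplete cases in base R. The data frame `toy` is made up for illustration: its non-missing values are taken from the first rows printed above, and the `NA`s are inserted by hand. It is not the code used for this report.

```r
# Toy data: real column names, but the NA values are invented for illustration.
toy <- data.frame(
  Weight_kg         = c(95.1, 80.0, 87.4, 82.1),
  WingatePeak_Watts = c(1136, NA, 1171, 933),
  VO2max_ml_kg_min  = c(50.0, 49.9, NA, 68.7)
)

# Keep only rows with no missing value in any column.
toy_complete <- toy[complete.cases(toy), ]

# Number of rows dropped because of missing data.
n_dropped <- nrow(toy) - nrow(toy_complete)
n_dropped
```

`na.omit(toy)` would give the same rows. Either way, a row is removed if any one of its variables is missing.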

Summary of data with cases that have missing values removed
## 'data.frame':    94 obs. of  9 variables:
##  $ Weight_kg                  : num  95.1 80 87.4 82.1 81.2 ...
##  $ WingatePeak_Watts          : int  1136 1017 1171 933 1089 1223 1210 917 1047 1191 ...
##  $ PeakWingate_W_kg           : num  12 12.7 13.4 11.4 13.4 14.1 13 12.8 13.9 14 ...
##  $ VO2max_l_min               : num  4.75 3.99 4.94 5.64 4.92 4.41 5.19 4.17 4.39 5.45 ...
##  $ VO2max_ml_kg_min           : num  50 49.9 56.5 68.7 60.6 50.6 55.8 57.9 58 63.8 ...
##  $ VO2max_watts               : int  520 400 520 520 520 520 520 440 480 520 ...
##  $ VO2max_watts_kg            : num  5.47 5 5.95 6.33 6.41 ...
##  $ SayersPeak_NoPause_watts   : int  5185 5229 5413 5367 5569 4979 6322 4946 4994 5673 ...
##  $ SayersPeak_NoPause_watts_kg: num  54.5 65.4 61.9 65.4 68.6 ...
##    Weight_kg      WingatePeak_Watts PeakWingate_W_kg  VO2max_l_min  
##  Min.   : 69.68   Min.   : 784      Min.   : 9.00    Min.   :3.640  
##  1st Qu.: 80.49   1st Qu.:1050      1st Qu.:12.80    1st Qu.:4.383  
##  Median : 84.95   Median :1142      Median :13.40    Median :4.725  
##  Mean   : 84.73   Mean   :1134      Mean   :13.41    Mean   :4.773  
##  3rd Qu.: 89.07   3rd Qu.:1195      3rd Qu.:14.00    3rd Qu.:5.185  
##  Max.   :109.41   Max.   :1502      Max.   :15.90    Max.   :6.280  
##  VO2max_ml_kg_min  VO2max_watts   VO2max_watts_kg SayersPeak_NoPause_watts
##  Min.   :44.90    Min.   :400.0   Min.   :4.396   Min.   :4480            
##  1st Qu.:51.77    1st Qu.:480.0   1st Qu.:5.649   1st Qu.:5186            
##  Median :56.30    Median :520.0   Median :6.064   Median :5582            
##  Mean   :56.41    Mean   :505.5   Mean   :5.981   Mean   :5603            
##  3rd Qu.:60.58    3rd Qu.:550.0   3rd Qu.:6.334   3rd Qu.:6026            
##  Max.   :68.70    Max.   :640.0   Max.   :7.556   Max.   :6714            
##  SayersPeak_NoPause_watts_kg
##  Min.   :54.53              
##  1st Qu.:62.77              
##  Median :66.10              
##  Mean   :66.23              
##  3rd Qu.:69.09              
##  Max.   :80.08
Descriptive summary of the variables of interest:
##              Weight_kg WingatePeak_Watts PeakWingate_W_kg VO2max_l_min
## median          84.955           1142.50           13.400        4.725
## mean            84.725           1133.59           13.415        4.773
## SE.mean          0.647             13.15            0.119        0.056
## CI.mean.0.95     1.285             26.10            0.236        0.111
## var             39.387          16243.43            1.330        0.292
## std.dev          6.276            127.45            1.153        0.540
## coef.var         0.074              0.11            0.086        0.113
##              VO2max_ml_kg_min VO2max_watts VO2max_watts_kg
## median                 56.300       520.00           6.064
## mean                   56.412       505.53           5.981
## SE.mean                 0.557         5.70           0.065
## CI.mean.0.95            1.106        11.32           0.128
## var                    29.174      3057.24           0.393
## std.dev                 5.401        55.29           0.627
## coef.var                0.096         0.11           0.105
##              SayersPeak_NoPause_watts SayersPeak_NoPause_watts_kg
## median                       5582.000                      66.104
## mean                         5603.117                      66.226
## SE.mean                        55.425                       0.555
## CI.mean.0.95                  110.062                       1.102
## var                        288758.492                      28.973
## std.dev                       537.363                       5.383
## coef.var                        0.096                       0.081
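
The derived rows in this table follow from the mean, the standard deviation and the sample size. As a check, the sketch below recomputes them for Weight_kg from the rounded values printed above (n = 94). It assumes the usual definitions (standard error = sd / sqrt(n), a t-based 95% half-width, and coefficient of variation = sd / mean), which appear to match the output.

```r
# Values copied from the table above (rounded, so results are approximate).
n      <- 94
mean_w <- 84.725
sd_w   <- 6.276

se_mean <- sd_w / sqrt(n)                    # standard error of the mean
ci_half <- qt(0.975, df = n - 1) * se_mean   # half-width of the 95% CI
coef_var <- sd_w / mean_w                    # coefficient of variation

# Compare with SE.mean = 0.647, CI.mean.0.95 = 1.285, coef.var = 0.074
round(c(SE.mean = se_mean, CI.mean.0.95 = ci_half, coef.var = coef_var), 3)
```

Note that CI.mean.0.95 is a half-width, so the 95% confidence interval for mean weight is roughly 84.7 plus or minus 1.3 kg.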

Univariate distributions

These are separated into groups of variables that have similar Y axis scaling.

Boxplots - The box plot (a.k.a. box and whisker diagram) is a standardized way of displaying the distribution of data based on the five number summary: minimum, first quartile, median, third quartile, and maximum. Outliers are plotted as individual points.
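
To make the five number summary concrete, the sketch below computes it for the first ten PeakWingate_W_kg values printed earlier. Only those ten values are used, so the numbers will differ from the full-sample box plot. Note also that `fivenum()` uses hinges, which can differ slightly from the quartiles reported by `summary()`.

```r
# First ten PeakWingate_W_kg values shown in the data summary above.
x <- c(12, 12.7, 13.4, 11.4, 13.4, 14.1, 13, 12.8, 13.9, 14)

# Minimum, lower hinge, median, upper hinge, maximum.
five <- fivenum(x)
five

# boxplot.stats() also reports points beyond 1.5 * IQR from the box,
# which are the ones drawn as individual outlier points.
bs <- boxplot.stats(x)
bs$out
```
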

Data with the common Y axis of Watts:

## No id variables; using all as measure variables

Data with the common Y axis of Watts/kg:
## No id variables; using all as measure variables

Data with the common Y axis of oxygen consumption (one for L/min and one for ml/kg/min):
## No id variables; using all as measure variables

## No id variables; using all as measure variables

Bivariate distributions (scatter plots to look at associations between two variables)

Next we can compare bivariate distributions (i.e. scatter plots) to look at associations. This first draft will only include the scatter plots, without assuming a best approach to fitting the data with a function (e.g. a linear model).

VO2max in L/min and in ml/kg/min:

To help visualize what is happening (why there is variability), we can size the points based on the player's weight, so that larger points indicate a heavier player. As you can see, heavier players may have a larger VO2 max in L/min, but they have a lower VO2 max in ml/kg/min, which makes sense because the relative value divides by body mass.
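
The relative value is the absolute value scaled by body mass: ml/kg/min = (L/min × 1000) / kg. The sketch below checks this on the first five players printed earlier and shows one way to size points by weight in base graphics. The weights are the rounded values from the printout, so the derived numbers only match the reported ones to within rounding, and the plotting call is an illustration rather than the code behind the figures in this report.

```r
# First five rows as printed in the data summary (weights are rounded).
vo2 <- data.frame(
  Weight_kg        = c(95.1, 80.0, 87.4, 82.1, 81.2),
  VO2max_l_min     = c(4.75, 3.99, 4.94, 5.64, 4.92),
  VO2max_ml_kg_min = c(50.0, 49.9, 56.5, 68.7, 60.6)
)

# Convert L/min to ml/min, then divide by body mass.
vo2$derived <- vo2$VO2max_l_min * 1000 / vo2$Weight_kg
max_diff <- max(abs(vo2$derived - vo2$VO2max_ml_kg_min))
max_diff

# Scatter plot with point size proportional to body weight
# (only drawn in an interactive session).
if (interactive()) {
  plot(vo2$VO2max_l_min, vo2$VO2max_ml_kg_min,
       cex = vo2$Weight_kg / mean(vo2$Weight_kg) * 1.5,
       xlab = "VO2max (L/min)", ylab = "VO2max (ml/kg/min)")
}
```
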

VO2 L/min and VO2 watts

Watts is on the X axis because, technically, the power being produced determines the oxygen needed (and therefore consumed). One thing to note is that VO2 max watts looks like a factor variable (not continuous); this is because of the mode of testing the players. They use a Monark bike, and each stage of the max test is at a predetermined power, so the max watts achieved is based on the stage reached. Therefore, variation in actual watts due to deviations in pedal RPM from the RPM required in the test would not be captured.
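
The stepped nature of VO2 max watts can be seen directly by tabulating it. The sketch below uses only the first ten values printed in the data summary, so the 40 W spacing it shows is an observation about this excerpt, not a statement of the test protocol.

```r
# First ten VO2max_watts values shown in the data summary above.
watts <- c(520, 400, 520, 520, 520, 520, 520, 440, 480, 520)

# How many players finished at each power level.
table(watts)

# Spacing between the distinct power levels in this excerpt.
steps <- diff(sort(unique(watts)))
steps
```
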

VO2 L/min and VO2 watts (weighted for body mass)

Here we look at the same plot as above, but with the points sized by player weight. The pattern is much less clear than in the weight-sized plot of VO2 L/min against VO2 ml/kg/min.

Wingate power - comparison between Watts and Watts/Kg:

When sizing the points by weight, just like with VO2, it is clear that players with more mass produce greater Watts, but not necessarily greater Watts/Kg.
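
Two players from the printed data illustrate the point. The sketch below compares the first and fifth players listed in the data summary, using their rounded weights, so the Watts/Kg values are approximate (the report lists 12.0 and 13.4 for them).

```r
# Players 1 and 5 from the data summary above (weights are rounded).
players <- data.frame(
  Weight_kg         = c(95.1, 81.2),
  WingatePeak_Watts = c(1136, 1089)
)

# Relative power: absolute Wingate peak power divided by body mass.
players$W_per_kg <- players$WingatePeak_Watts / players$Weight_kg
players
```

The heavier player has the higher absolute peak power but the lower power per kilogram.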

Vertical jump average power watts to watts/kg

Expect the same as above - but a useful exercise in getting familiar with the data.
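
For context, the Sayers estimate of peak jump power is commonly written as 60.7 × jump height (cm) + 45.3 × body mass (kg) − 2055. The sketch below assumes the combine values were computed this way, which this report does not confirm. Jump height is not among the variables shown here, so the 60 cm and 85 kg in the example are hypothetical inputs.

```r
# Sayers peak power estimate in watts (assumed formula, see text above).
sayers_peak_watts <- function(jump_height_cm, body_mass_kg) {
  60.7 * jump_height_cm + 45.3 * body_mass_kg - 2055
}

# Hypothetical player: 60 cm jump at 85 kg.
example_watts <- sayers_peak_watts(60, 85)
example_watts

# Relative peak power in watts/kg for the same hypothetical player.
example_watts / 85
```

Because body mass enters the formula directly, heavier players get higher absolute watts for the same jump height, which is one reason to also look at watts/kg.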

First, without sizing the points by body mass:

Next, with the points sized by body mass:

Primary analysis questions:

  1. Compare Wingate peak watts to VO2 max watts

  2. Compare VO2 max ml/kg/min to peak Wingate watts/kg

  3. Compare Sayers peak no pause watts to the Wingate peak watts and to VO2 max watts to see if there are any relationships and if jump capacity has an impact

Expanded and improved upon

First with straightforward scatter plots for each comparison

Sayers peak no pause watts to the peak Wingate watts - with visualization of the impact of weight

Linear model

fit <- lm(WingatePeak_Watts ~ SayersPeak_NoPause_watts, data=nhldata) #fit model
summary(fit) # show results 
## 
## Call:
## lm(formula = WingatePeak_Watts ~ SayersPeak_NoPause_watts, data = nhldata)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -441.8  -55.2    0.9   65.1  250.8 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              512.3344   123.0347    4.16 0.000070 ***
## SayersPeak_NoPause_watts   0.1109     0.0219    5.07 0.000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 113 on 92 degrees of freedom
## Multiple R-squared:  0.219,  Adjusted R-squared:  0.21 
## F-statistic: 25.7 on 1 and 92 DF,  p-value: 0.00000203
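
A few quick checks help in reading this output. The sketch below uses only numbers printed in this report (the coefficients above and the means from the descriptive summary). It confirms that the fitted line passes close to the point of means, and converts R-squared to a correlation. With a single predictor, |r| equals the square root of R-squared, and the F statistic equals the squared t value of the slope.

```r
# Values copied from the model output and the descriptive summary.
intercept    <- 512.3344
slope        <- 0.1109     # Wingate watts per Sayers watt
r_squared    <- 0.219
mean_sayers  <- 5603.117   # mean SayersPeak_NoPause_watts
mean_wingate <- 1133.59    # mean WingatePeak_Watts

# Predicted Wingate peak power for a player with average Sayers power.
pred_at_mean <- intercept + slope * mean_sayers

# Correlation implied by R-squared (positive because the slope is positive).
r <- sqrt(r_squared)

# F statistic implied by the slope's t value.
f_from_t <- 5.07^2

round(c(pred_at_mean = pred_at_mean, r = r, F = f_from_t), 2)
```

A correlation of about 0.47 means jump power explains roughly 22% of the variance in Wingate peak watts, so the association is statistically clear but leaves most of the variation unexplained.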

Sayers peak no pause watts to the peak aerobic watts - with visualization of the impact of weight

  1. Next we want to see if there is a pattern in the distribution of vertical jump power across the bivariate distribution of wingate and aerobic power

  2. Finally we want to see if there is a pattern in the distribution of vertical jump power and body mass across the bivariate distribution of wingate and aerobic power - this may be too much in one graphic but it is a start:

There could be more to come. Let’s start with the above and see where it takes the report. It may be interesting to put some of these into linear models and see what we get - but I want to do so cautiously, not just throw a model together to see what comes out of it. Once models are built, then the fit lines can be added to any / all of the graphics above.