Data analysis performed for Albert’s physiology of exercise lab report

Based on the Astrand et al, 1964 data set

Hypotheses

  1. Inverse relationship between vo2max and resting heart rate
  2. Submaximal heart rate and max heart rate relate to max vo2

Analysis strategy

The only variable that needs modification is vo2, currently in absolute units (L/min), makes more sense to change units to ml/kg/min

Hypothesis 1

Scatter plot and linear regression model between resting heart rate and vo2max

Hypothesis 2

Scatter plot and linear regression model between maximal heart rate and vo2max

Scatter plot and linear regression model between submaximal heart (stage 2 of the exercise test) and vo2max. Stage 2 was chosen because all subjects were performing the same rate of work and none of the subjects had achieved their vo2max yet (i.e. some had reached max at the rate of work associated with stage 3 of the protocol)

Analysis steps

  1. Load R packages, load data, organize data for analysis
  2. Descriptive statistics of relevant variables
  3. Univariate distributions with histograms (no need to include all of these)
  4. Bivariate distributions with scatterplots
  5. Linear regression

For the sake of presentation 4 & 5 will be combined so that output releated to the linear regression model will follow the scatterplot of the same variables.

Load and clean data for analysis

library(dplyr) #For data manipulation
## 
## Attaching package: 'dplyr'
## 
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2) #For plotting
library(pastecs) #For descriptive statistics
## Loading required package: boot
## 
## Attaching package: 'pastecs'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(psych) #For descriptive statistics
## 
## Attaching package: 'psych'
## 
## The following object is masked from 'package:boot':
## 
##     logit
## 
## The following object is masked from 'package:ggplot2':
## 
##     %+%
setwd("~/physexlab/Astrand_1964")
data <- read.csv("data.csv") #load the data into R
data <- mutate(data,VO2_mlkg = (VO2*1000)/weight) #create VO2 in ml/kg/min
data<-select(data, subject, age, height, weight, stage, rest, max, HR, VO2_mlkg) #select variables of interest
#need to arrange data so that each row is one subject and contains the resting, stage 2 and max HR and VO2, basically going from a vertical data base with >1 rows per subject; to a horizonal data base with 1 row per subject
#Achieve this by cutting the data into separate dataframes for each condition (rest, stage 2, max)
rest<-filter(data,rest==1) #resting data
rest<-select(rest,subject, age, height, weight,HR, VO2_mlkg)
rest<- rename(rest, restHR = HR, restVO2_mlkg = VO2_mlkg)
max<-filter(data,max==1) #max data
max<-select(max,subject, HR, VO2_mlkg)
max<- rename(max, maxHR = HR, maxVO2_mlkg = VO2_mlkg)
stage2<-filter(data,stage==2) #stage 2 data
stage2<-select(stage2,subject, HR, VO2_mlkg)
stage2<- rename(stage2, stage2HR = HR, stage2VO2_mlkg = VO2_mlkg)
#now join the separate data frames back together into one dataframe
data<-full_join(rest,max,by="subject")
data<-full_join(data, stage2, by="subject")

Analysis

Descriptive statistics of relevant variables

options(scipen=100) #supresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(data, basic=F) #runs basic stats
##              subject   age  height weight restHR restVO2_mlkg   maxHR
## median         12.00 22.00 171.000  65.00  74.00         4.62 191.000
## mean           12.00 22.43 174.609  68.96  73.61         4.46 189.435
## SE.mean         1.41  0.58   1.649   1.90   2.12         0.14   2.022
## CI.mean.0.95    2.93  1.20   3.419   3.94   4.40         0.29   4.194
## var            46.00  7.71  62.522  83.04 103.52         0.44  94.075
## std.dev         6.78  2.78   7.907   9.11  10.17         0.66   9.699
## coef.var        0.57  0.12   0.045   0.13   0.14         0.15   0.051
##              maxVO2_mlkg stage2HR stage2VO2_mlkg
## median             47.50   132.00          24.00
## mean               48.02   133.91          24.04
## SE.mean             1.61     5.13           0.78
## CI.mean.0.95        3.34    10.65           1.62
## var                59.64   605.99          14.06
## std.dev             7.72    24.62           3.75
## coef.var            0.16     0.18           0.16

Univariate distributions with histograms

qplot(age, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(restHR, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(maxHR, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(maxVO2_mlkg, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(stage2HR, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

qplot(stage2VO2_mlkg, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.

Bivariate distributions with scatterplots and Linear regression

Is there an inverse relationship between vo2max and resting heart rate
qplot(y=maxVO2_mlkg, x=restHR, data=data) + stat_smooth(method=lm, formula=y~x)

restHR_VO2maxFit <- lm(maxVO2_mlkg ~ restHR, data=data) #fit model
summary(restHR_VO2maxFit) # show results 
## 
## Call:
## lm(formula = maxVO2_mlkg ~ restHR, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9.81  -6.70  -1.09   5.08  18.37 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   57.593     12.121    4.75  0.00011 ***
## restHR        -0.130      0.163   -0.80  0.43425    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.8 on 21 degrees of freedom
## Multiple R-squared:  0.0294, Adjusted R-squared:  -0.0168 
## F-statistic: 0.636 on 1 and 21 DF,  p-value: 0.434
Thoughts

Interestingly there is a remarkably weak relationship in this sample between resting HR and max VO2. You should think about the reasons for this lack of a relatioonship for what is usually considered a well known and understood relationship. Do you think it could have anything to do with the study methods and when they took this apparant “resting heart rate”? Could subjects have been anxious prior to the exercise test - increasing their HR in such a way that it was no longer a true resting heart rate, and not predictive of what their true resting heart may be?

Is there an inverse relationship between submaximal heart rate and max vo2?

qplot(y=maxVO2_mlkg, x=stage2HR, data=data) + stat_smooth(method=lm, formula=y~x)

stage2HR_VO2maxFit <- lm(maxVO2_mlkg ~ stage2HR, data=data) #fit model
summary(stage2HR_VO2maxFit) # show results 
## 
## Call:
## lm(formula = maxVO2_mlkg ~ stage2HR, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -9.89  -4.04  -1.31   3.17  16.93 
## 
## Coefficients:
##             Estimate Std. Error t value     Pr(>|t|)    
## (Intercept)  71.4227     7.7351    9.23 0.0000000077 ***
## stage2HR     -0.1748     0.0569   -3.07       0.0058 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.6 on 21 degrees of freedom
## Multiple R-squared:  0.31,   Adjusted R-squared:  0.278 
## F-statistic: 9.45 on 1 and 21 DF,  p-value: 0.00575
Thoughts

Here we see a much stronger relationship between stage 2 HR (submaximal workload that all subjects were able to achieve) and the VO2 max. You should discuss why it might be the case that resting HR was not strongly associated, but that submaximal HR is more strongly associated with VO2 max in this sample, again, considering the factors that could impact resting HR. Is it reasonable to suspect the pre test nervous HR response would be less of a factor once the person is in stage2 of the actual test, so that their physiological capacity has a stronger impact on their HR than nervousness?

#### Is there a relationship between maximal heart rate and max vo2?

qplot(y=maxVO2_mlkg, x=maxHR, data=data) + stat_smooth(method=lm, formula=y~x)

maxHR_VO2maxFit <- lm(maxVO2_mlkg ~ maxHR, data=data) #fit model
summary(maxHR_VO2maxFit) # show results 
## 
## Call:
## lm(formula = maxVO2_mlkg ~ maxHR, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -10.689  -6.664  -0.977   4.512  20.456 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   67.707     32.674    2.07    0.051 .
## maxHR         -0.104      0.172   -0.60    0.553  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.8 on 21 degrees of freedom
## Multiple R-squared:  0.017,  Adjusted R-squared:  -0.0298 
## F-statistic: 0.364 on 1 and 21 DF,  p-value: 0.553
Thoughts

As expected there is not a relationship between max HR and max VO2 - this coheres with our understanding. The primary determinant of max HR is age; and for any given age there is a lot of variability in VO2 max. Also - there is very little variance in this samples age range.

For kicks - Max HR and age in this sample

qplot(y=maxHR, x=age, data=data) + stat_smooth(method=lm, formula=y~x)

age_maxHRFit <- lm(maxHR ~ age, data=data) #fit model
summary(age_maxHRFit) # show results 
## 
## Call:
## lm(formula = maxHR ~ age, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -14.817  -6.617   0.183   4.283  16.783 
## 
## Coefficients:
##             Estimate Std. Error t value         Pr(>|t|)    
## (Intercept)  229.814     14.761   15.57 0.00000000000052 ***
## age           -1.800      0.653   -2.76            0.012 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.5 on 21 degrees of freedom
## Multiple R-squared:  0.266,  Adjusted R-squared:  0.231 
## F-statistic: 7.59 on 1 and 21 DF,  p-value: 0.0119
Thoughts

Here we see that there is a moderate association between age and maxHR in this sample - which is pretty amazing to me given the really small age range of the subjects. It seems the two 30 year olds are driving the relationship with their lower max HRs. Interesting.y, their max HRs are even lower than the standard (220 - age) equation would predict. This is not uncommon on the bike ergometer where LE fatigue tends to limit people due to the isolated muscle demand in the quadriceps. In fact - as you can see - the entire sample has a slightly lower max HR than the (220-age) equation would predict.