Based on the Astrand et al, 1964 data set
The only variable that needs modification is vo2, currently in absolute units (L/min), makes more sense to change units to ml/kg/min
Scatter plot and linear regression model between resting heart rate and vo2max
Scatter plot and linear regression model between maximal heart rate and vo2max
Scatter plot and linear regression model between submaximal heart (stage 2 of the exercise test) and vo2max. Stage 2 was chosen because all subjects were performing the same rate of work and none of the subjects had achieved their vo2max yet (i.e. some had reached max at the rate of work associated with stage 3 of the protocol)
For the sake of presentation 4 & 5 will be combined so that output releated to the linear regression model will follow the scatterplot of the same variables.
library(dplyr) #For data manipulation
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2) #For plotting
library(pastecs) #For descriptive statistics
## Loading required package: boot
##
## Attaching package: 'pastecs'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
library(psych) #For descriptive statistics
##
## Attaching package: 'psych'
##
## The following object is masked from 'package:boot':
##
## logit
##
## The following object is masked from 'package:ggplot2':
##
## %+%
setwd("~/physexlab/Astrand_1964")
data <- read.csv("data.csv") #load the data into R
data <- mutate(data,VO2_mlkg = (VO2*1000)/weight) #create VO2 in ml/kg/min
data<-select(data, subject, age, height, weight, stage, rest, max, HR, VO2_mlkg) #select variables of interest
#need to arrange data so that each row is one subject and contains the resting, stage 2 and max HR and VO2, basically going from a vertical data base with >1 rows per subject; to a horizonal data base with 1 row per subject
#Achieve this by cutting the data into separate dataframes for each condition (rest, stage 2, max)
rest<-filter(data,rest==1) #resting data
rest<-select(rest,subject, age, height, weight,HR, VO2_mlkg)
rest<- rename(rest, restHR = HR, restVO2_mlkg = VO2_mlkg)
max<-filter(data,max==1) #max data
max<-select(max,subject, HR, VO2_mlkg)
max<- rename(max, maxHR = HR, maxVO2_mlkg = VO2_mlkg)
stage2<-filter(data,stage==2) #stage 2 data
stage2<-select(stage2,subject, HR, VO2_mlkg)
stage2<- rename(stage2, stage2HR = HR, stage2VO2_mlkg = VO2_mlkg)
#now join the separate data frames back together into one dataframe
data<-full_join(rest,max,by="subject")
data<-full_join(data, stage2, by="subject")
options(scipen=100) #supresses scientific notation of output
options(digits=2) #suggests a limit of 2 significant digits - but it is only a suggestion
stat.desc(data, basic=F) #runs basic stats
## subject age height weight restHR restVO2_mlkg maxHR
## median 12.00 22.00 171.000 65.00 74.00 4.62 191.000
## mean 12.00 22.43 174.609 68.96 73.61 4.46 189.435
## SE.mean 1.41 0.58 1.649 1.90 2.12 0.14 2.022
## CI.mean.0.95 2.93 1.20 3.419 3.94 4.40 0.29 4.194
## var 46.00 7.71 62.522 83.04 103.52 0.44 94.075
## std.dev 6.78 2.78 7.907 9.11 10.17 0.66 9.699
## coef.var 0.57 0.12 0.045 0.13 0.14 0.15 0.051
## maxVO2_mlkg stage2HR stage2VO2_mlkg
## median 47.50 132.00 24.00
## mean 48.02 133.91 24.04
## SE.mean 1.61 5.13 0.78
## CI.mean.0.95 3.34 10.65 1.62
## var 59.64 605.99 14.06
## std.dev 7.72 24.62 3.75
## coef.var 0.16 0.18 0.16
qplot(age, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(restHR, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(maxHR, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(maxVO2_mlkg, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(stage2HR, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(stage2VO2_mlkg, data=data, geom="histogram")
## stat_bin: binwidth defaulted to range/30. Use 'binwidth = x' to adjust this.
qplot(y=maxVO2_mlkg, x=restHR, data=data) + stat_smooth(method=lm, formula=y~x)
restHR_VO2maxFit <- lm(maxVO2_mlkg ~ restHR, data=data) #fit model
summary(restHR_VO2maxFit) # show results
##
## Call:
## lm(formula = maxVO2_mlkg ~ restHR, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.81 -6.70 -1.09 5.08 18.37
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 57.593 12.121 4.75 0.00011 ***
## restHR -0.130 0.163 -0.80 0.43425
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.8 on 21 degrees of freedom
## Multiple R-squared: 0.0294, Adjusted R-squared: -0.0168
## F-statistic: 0.636 on 1 and 21 DF, p-value: 0.434
Interestingly there is a remarkably weak relationship in this sample between resting HR and max VO2. You should think about the reasons for this lack of a relatioonship for what is usually considered a well known and understood relationship. Do you think it could have anything to do with the study methods and when they took this apparant “resting heart rate”? Could subjects have been anxious prior to the exercise test - increasing their HR in such a way that it was no longer a true resting heart rate, and not predictive of what their true resting heart may be?
qplot(y=maxVO2_mlkg, x=stage2HR, data=data) + stat_smooth(method=lm, formula=y~x)
stage2HR_VO2maxFit <- lm(maxVO2_mlkg ~ stage2HR, data=data) #fit model
summary(stage2HR_VO2maxFit) # show results
##
## Call:
## lm(formula = maxVO2_mlkg ~ stage2HR, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.89 -4.04 -1.31 3.17 16.93
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 71.4227 7.7351 9.23 0.0000000077 ***
## stage2HR -0.1748 0.0569 -3.07 0.0058 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.6 on 21 degrees of freedom
## Multiple R-squared: 0.31, Adjusted R-squared: 0.278
## F-statistic: 9.45 on 1 and 21 DF, p-value: 0.00575
Here we see a much stronger relationship between stage 2 HR (submaximal workload that all subjects were able to achieve) and the VO2 max. You should discuss why it might be the case that resting HR was not strongly associated, but that submaximal HR is more strongly associated with VO2 max in this sample, again, considering the factors that could impact resting HR. Is it reasonable to suspect the pre test nervous HR response would be less of a factor once the person is in stage2 of the actual test, so that their physiological capacity has a stronger impact on their HR than nervousness?
qplot(y=maxVO2_mlkg, x=maxHR, data=data) + stat_smooth(method=lm, formula=y~x)
maxHR_VO2maxFit <- lm(maxVO2_mlkg ~ maxHR, data=data) #fit model
summary(maxHR_VO2maxFit) # show results
##
## Call:
## lm(formula = maxVO2_mlkg ~ maxHR, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.689 -6.664 -0.977 4.512 20.456
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 67.707 32.674 2.07 0.051 .
## maxHR -0.104 0.172 -0.60 0.553
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.8 on 21 degrees of freedom
## Multiple R-squared: 0.017, Adjusted R-squared: -0.0298
## F-statistic: 0.364 on 1 and 21 DF, p-value: 0.553
As expected there is not a relationship between max HR and max VO2 - this coheres with our understanding. The primary determinant of max HR is age; and for any given age there is a lot of variability in VO2 max. Also - there is very little variance in this samples age range.
qplot(y=maxHR, x=age, data=data) + stat_smooth(method=lm, formula=y~x)
age_maxHRFit <- lm(maxHR ~ age, data=data) #fit model
summary(age_maxHRFit) # show results
##
## Call:
## lm(formula = maxHR ~ age, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.817 -6.617 0.183 4.283 16.783
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 229.814 14.761 15.57 0.00000000000052 ***
## age -1.800 0.653 -2.76 0.012 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.5 on 21 degrees of freedom
## Multiple R-squared: 0.266, Adjusted R-squared: 0.231
## F-statistic: 7.59 on 1 and 21 DF, p-value: 0.0119
Here we see that there is a moderate association between age and maxHR in this sample - which is pretty amazing to me given the really small age range of the subjects. It seems the two 30 year olds are driving the relationship with their lower max HRs. Interesting.y, their max HRs are even lower than the standard (220 - age) equation would predict. This is not uncommon on the bike ergometer where LE fatigue tends to limit people due to the isolated muscle demand in the quadriceps. In fact - as you can see - the entire sample has a slightly lower max HR than the (220-age) equation would predict.