This R script imports data recorded by a GPS-enabled running watch with a heart rate monitor. First, non-standard packages are installed. Next, the data is reshaped into a form that is easier to work with. The result is a set of charts communicating key metrics: speed, elevation climbed, heart rate, and the GPS-mapped route.
This script includes sanity checks throughout that have been commented out, including head(), summary(), and class().
Here we require packages the script calls.
# install.packages("XML")
# install.packages("ggmap")
# install.packages("reshape")
# install.packages("gpairs")
require(XML)
## Loading required package: XML
require(ggmap)
## Loading required package: ggmap
## Loading required package: ggplot2
require(reshape)
## Loading required package: reshape
require(gpairs)
## Loading required package: gpairs
require(ggplot2)
Import data for map generation and jog statistics. First, import the raw TCX file:
mapData <- xmlParse("fog.tcx")
#Import raw TCX to dataframe. This will be used to generate map data
mapRawData <- xmlToDataFrame(nodes <- getNodeSet(mapData, "//ns:Trackpoint", "ns"))
head(mapRawData)
## Time Position
## 1 2014-09-04T23:03:37.000Z 37.78069546446204-122.47062250971794
## 2 2014-09-04T23:03:40.000Z 37.78062807396054-122.47062703594565
## 3 2014-09-04T23:03:41.000Z 37.78058834373951-122.47063893824816
## 4 2014-09-04T23:03:42.000Z 37.78055163100362-122.47064262628555
## 5 2014-09-04T23:03:43.000Z 37.780513912439346-122.4706432968378
## 6 2014-09-04T23:03:44.000Z 37.78048072010279-122.47064765542746
## AltitudeMeters DistanceMeters HeartRateBpm Extensions
## 1 52.0 1.4900000095367432 87 1.315000057220459
## 2 52.0 8.979999542236328 92 1.5609999895095827
## 3 52.0 13.369999885559082 96 1.8930000066757202
## 4 52.20000076293945 17.459999084472656 99 2.1040000915527344
## 5 52.400001525878906 21.6299991607666 103 2.316999912261963
## 6 52.400001525878906 25.34000015258789 106 2.5250000953674316
We see that the columns are all factors, which are tricky to graph in a meaningful way because of the complications from levels. Instead of converting each column with functions like as.numeric or type.convert, storing the results as vectors, and then building another dataframe, Golden Cheetah is used to produce a CSV with manageable classes from the TCX file.
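For reference, the in-R conversion that was skipped might look like the sketch below. The `raw` dataframe is a hypothetical stand-in for two columns of mapRawData, and `to_numeric` is a helper name introduced here for illustration. The detour through as.character matters because as.numeric applied directly to a factor returns the level codes rather than the recorded values.

```r
# Hypothetical stand-in for two columns of mapRawData, stored as factors
raw <- data.frame(
  AltitudeMeters = factor(c("52.0", "52.0", "52.20000076293945")),
  HeartRateBpm   = factor(c("87", "92", "96"))
)

# as.numeric() on a factor returns the internal level codes, not the values,
# so each column is converted to character first
to_numeric <- function(x) as.numeric(as.character(x))
clean <- as.data.frame(lapply(raw, to_numeric))

str(clean)
```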
Import jog statistics from CSV
jogRawData <- read.csv("gold.csv")
# Uncomment sanity checks as needed
#class(jogRawData)
#head(jogRawData)
#str(jogRawData)
# Let's take a look at the content of this dataframe
head(jogRawData)
## Minutes Torq..N.m. Km.h Watts Km Cadence Hrate ID Altitude..m.
## 1 0.01667 0 5.029 0 0.003987 0 88.67 1 52.0
## 2 0.03333 0 5.324 0 0.006483 0 90.33 1 52.0
## 3 0.05000 0 5.620 0 0.008980 0 92.00 1 52.0
## 4 0.06667 0 6.815 0 0.013370 0 96.00 1 52.0
## 5 0.08333 0 7.574 0 0.017460 0 99.00 1 52.2
## 6 0.10000 0 8.341 0 0.021630 0 103.00 1 52.4
gpairs(jogRawData)
## Warning: 3 columns with less than two distinct values eliminated
We can remove the Torque column, since this was not a bike ride, and the ID column, since it appears to operate as a mile marker.
So let’s pull out the columns we need as subsets:
jogChartsTemp <- data.frame(jogRawData[c("Minutes",
"Hrate")])
jogMetricKmh <- jogRawData$Km.h
jogMetricKm <- jogRawData$Km
jogMetricAlt <- jogRawData$Altitude..m.
# Check jogChartsTemp
head(jogChartsTemp)
## Minutes Hrate
## 1 0.01667 88.67
## 2 0.03333 90.33
## 3 0.05000 92.00
## 4 0.06667 96.00
## 5 0.08333 99.00
## 6 0.10000 103.00
# Check jogMetricKmh & jogMetricKm & jogMetricAlt
head(jogMetricKmh)
## [1] 5.029 5.324 5.620 6.815 7.574 8.341
head(jogMetricKm)
## [1] 0.003987 0.006483 0.008980 0.013370 0.017460 0.021630
head(jogMetricAlt)
## [1] 52.0 52.0 52.0 52.0 52.2 52.4
I tend to think in US customary units (miles and feet), so let’s convert from metric:
jogStdMph <- jogMetricKmh/1.609
jogStdMi <- jogMetricKm/1.609
jogStdAlt <- jogMetricAlt*3.28
# Check conversion
head(jogStdMph)
## [1] 3.126 3.309 3.493 4.235 4.708 5.184
head(jogStdMi)
## [1] 0.002478 0.004029 0.005581 0.008310 0.010851 0.013443
head(jogStdAlt)
## [1] 170.6 170.6 170.6 170.6 171.2 171.9
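The factors 1.609 and 3.28 used above are rounded. As a sketch, the same conversions can be written with the exact definitions (1 mi = 1.609344 km, 1 ft = 0.3048 m). The constant and function names below are introduced here for illustration and are not part of the original script.

```r
# Exact definitions: 1 mi = 1.609344 km, 1 ft = 0.3048 m
KM_PER_MI <- 1.609344
M_PER_FT  <- 0.3048

km_to_mi <- function(km) km / KM_PER_MI
m_to_ft  <- function(m) m / M_PER_FT

# First speed and altitude samples from the output above
km_to_mi(5.029)   # speed in mph
m_to_ft(52.0)     # altitude in feet
```

For a run of this length the difference from the rounded factors is small, so this is mostly a matter of making the units explicit.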
Finally, I prefer pace to speed, so we need to convert the data accordingly:
jogMpm <- jogStdMph/60
jogPace <- 1/jogMpm
head(jogPace)
## [1] 19.20 18.13 17.18 14.17 12.75 11.57
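Pace is the reciprocal of speed, so any sample recorded at 0 mph becomes Inf. One possible way to guard against that is sketched below. The helper name `mph_to_pace`, the choice to map non-moving samples to NA, and the `speeds` values are all assumptions made for illustration.

```r
# Convert speed in mph to pace in min/mi; zero or negative speeds become NA
# (an assumption: the original script leaves them as Inf)
mph_to_pace <- function(mph) {
  pace <- 60 / mph
  pace[!is.finite(pace) | mph <= 0] <- NA_real_
  pace
}

speeds <- c(3.126, 0, 6.0)   # illustrative values; 0 mph = standing still
pace <- mph_to_pace(speeds)
pace
median(pace, na.rm = TRUE)
```

With NA in place of Inf, summary statistics such as the mean stay finite as long as na.rm = TRUE is passed.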
Now we can create a final dataframe for data manipulation:
jogCharts <- data.frame(jogChartsTemp, jogPace, jogStdMi, jogStdAlt)
names(jogCharts) <- c("Time", "Bpm", "Pace", "Dist", "Alt")
head(jogCharts)
## Time Bpm Pace Dist Alt
## 1 0.01667 88.67 19.20 0.002478 170.6
## 2 0.03333 90.33 18.13 0.004029 170.6
## 3 0.05000 92.00 17.18 0.005581 170.6
## 4 0.06667 96.00 14.17 0.008310 170.6
## 5 0.08333 99.00 12.75 0.010851 171.2
## 6 0.10000 103.00 11.57 0.013443 171.9
Now we want to create a subset of GPS data using the reshape package:
jogPos <- colsplit(mapRawData$Position, split = "-", names = c("lat",
"lon"))
head(jogPos)
## lat lon
## 1 37.78 122.5
## 2 37.78 122.5
## 3 37.78 122.5
## 4 37.78 122.5
## 5 37.78 122.5
## 6 37.78 122.5
Because the longitude is negative, its minus sign is the only thing separating latitude from longitude in the captured data, so the column was split on the “-” character. The split drops that sign, so we must restore the negative value on the longitude coordinates to produce an accurate GPS route.
jogPos$lon <- -jogPos$lon
head(jogPos)
## lat lon
## 1 37.78 -122.5
## 2 37.78 -122.5
## 3 37.78 -122.5
## 4 37.78 -122.5
## 5 37.78 -122.5
## 6 37.78 -122.5
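An alternative that keeps the sign attached to the longitude is to extract the two numbers with a regular expression instead of splitting on “-”. This is a base-R sketch, not the method used in this script. The `position` vector copies the first two rows of the mapRawData output, and `parts` and `pos` are names introduced here. Like the split above, it relies on the longitude being negative, since a positive longitude would leave no separator between the two numbers.

```r
# Position strings as they appear in mapRawData (first two rows of the output)
position <- c("37.78069546446204-122.47062250971794",
              "37.78062807396054-122.47062703594565")

# Pull out every signed decimal number; the minus sign stays with the longitude
parts <- regmatches(position, gregexpr("-?[0-9]+\\.[0-9]+", position))

pos <- data.frame(
  lat = as.numeric(vapply(parts, `[`, character(1), 1)),
  lon = as.numeric(vapply(parts, `[`, character(1), 2))
)
pos
```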
Let’s first get an idea of what the data looks like:
gpairs(jogCharts)
summary(jogCharts)
## Time Bpm Pace Dist
## Min. : 0.02 Min. : 88.7 Min. : 6 Min. :0.002
## 1st Qu.: 8.91 1st Qu.:174.9 1st Qu.: 9 1st Qu.:0.997
## Median :17.80 Median :177.7 Median : 10 Median :1.901
## Mean :17.80 Mean :176.3 Mean :Inf Mean :1.871
## 3rd Qu.:26.69 3rd Qu.:180.0 3rd Qu.: 11 3rd Qu.:2.753
## Max. :35.58 Max. :195.0 Max. :Inf Max. :3.641
## Alt
## Min. :171
## 1st Qu.:195
## Median :207
## Mean :209
## 3rd Qu.:224
## Max. :236
For a more detailed analysis, let’s find min, median, and max values for plotting later
Time.max.long <- max(jogCharts$Time)
Time.max <- round(Time.max.long, 2)
Pace.min <- min(jogCharts$Pace)
Pace.med.long <- median(jogCharts$Pace)
Pace.med <- round(Pace.med.long, 2)
Pace.max <- max(jogCharts$Pace)
Pace.plot <- qplot(Dist, Pace,
data = jogCharts,
geom = "line")
pace <- Pace.plot +
geom_hline(aes(yintercept=Pace.med),
color="darkgreen",
linetype="dashed") +
labs(title="Pace [min/mi] vs Dist [mi]") +
ylim(5,12) +
annotate("text",
x = 0.2,
y = Pace.med+0.25,
color = "darkgreen",
label = "Median pace") +
annotate("text",
x = 0.2,
y = Pace.med-0.25,
color = "darkgreen",
label = Pace.med)
print(pace)
## Warning: Removed 5 rows containing missing values (geom_path).
Dist.max1 <- max(jogCharts$Dist)
Dist.max <- round(Dist.max1, 2)
Alt.min <- min(jogCharts$Alt)
Alt.med <- median(jogCharts$Alt)
Alt.max <- max(jogCharts$Alt)
Alt.range <- (Alt.max - Alt.min)
Alt.plot <- qplot(Dist,
Alt,
data = jogCharts,
geom = "point")
alt <- Alt.plot +
geom_hline(aes(yintercept=Alt.max),
color="darkblue",
linetype="dashed") +
labs(title="Alt [ft] vs Dist [mi]") +
geom_area() +
ylim (0, 300) +
xlim (0, Dist.max) +
annotate("text",
x = 0.5,
y = Alt.max+5,
color = "darkblue",
label = "Maximum Altitude") +
annotate("text",
x = 0.5,
y = Alt.max-5,
color = "darkblue",
label = Alt.max)
print(alt)
## Warning: Removed 2 rows containing missing values (position_stack).
## Warning: Removed 2 rows containing missing values (geom_point).
bpm.Min <- min(jogCharts$Bpm)
bpm.Med1 <- median(jogCharts$Bpm)
bpm.Med <- round(bpm.Med1,0)
bpm.Max <- max(jogCharts$Bpm)
bpm.plot <- qplot(Dist,
Bpm,
data = jogCharts,
geom = "point")
bpm <- bpm.plot +
geom_hline(aes(yintercept=bpm.Med),
color="red",
linetype="dashed") +
labs(title="BPM vs Dist [mi]") +
annotate("text",
x = 0.25,
y = bpm.Med+2,
label = "Median BPM",
color = "red") +
annotate("text",
x = 0.25,
y = bpm.Med-2,
label = bpm.Med,
color = "red")
print(bpm)
Most jogging software packages do not provide this chart, but it is interesting to see how pace relates to heart rate, excluding other factors.
perf <- qplot(Pace,
Bpm,
data = jogCharts,
geom = "point") +
xlim(0,20) +
labs(title = "Heartrate [BPM] as function of Pace [min/mi]") +
geom_hline(aes(yintercept=bpm.Med),
color="red",
linetype="dashed") +
geom_vline(aes(xintercept=Pace.med),
color="blue",
linetype="dashed") +
geom_density2d(color = "white") +
annotate("text",
x = Pace.med+1,
y = 10,
label = "Median pace",
color = "blue") +
annotate("text",
x = Pace.med+1,
y = 1,
label = Pace.med,
color = "blue") +
annotate("text",
x = 1.5,
y = bpm.Med+5,
label = "Median BPM",
color = "red") +
annotate("text",
x = 1.5,
y = bpm.Med-5,
label = bpm.Med,
color = "red")
print(perf)
## Warning: Removed 6 rows containing non-finite values (stat_density2d).
## Warning: Removed 5 rows containing missing values (geom_point).
The clustering indicates the body fatiguing during the run. It also shows that while the runner maintained some variability in their pace, heart rate tended to stay within +/-25 bpm. As you might expect, the clustering centers nicely on the median values for each variable.
Now we check if the GPS data maps the way we expect
qplot(lon, lat, data = jogPos)
It does, so let’s overlay it on a Google map using the ggmap package. First, we want to grab a map from Google centered on the median longitude and latitude values, set to an appropriate zoom scale:
mapImageData <- get_googlemap(center = c(lon = median(jogPos$lon), lat = median(jogPos$lat)),
zoom = 15, maptype = c("roadmap"))
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=37.777859,-122.465516&zoom=15&size=%20640x640&maptype=roadmap&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms
Now we overlay the longitude and latitude coordinates from the jogPos dataframe:
map <- ggmap(mapImageData,
extent = "device") + # takes out axes, etc.
geom_point(aes(x = lon, y = lat), data = jogPos, colour = "darkblue", size = 2, pch = 16)
print(map)
SUCCESS!!
print ("Total time [min]")
## [1] "Total time [min]"
print (Time.max)
## [1] 35.58
print ("Total distance of run [mi]")
## [1] "Total distance of run [mi]"
print (Dist.max)
## [1] 3.64
print ("Median pace [min/mi]")
## [1] "Median pace [min/mi]"
print (Pace.med)
## [1] 9.9
print ("Max heart rate [BPM]")
## [1] "Max heart rate [BPM]"
print (bpm.Max)
## [1] 195
print ("Total climb [ft]")
## [1] "Total climb [ft]"
print (Alt.range)
## [1] 65.6
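Note that Alt.range is the difference between the highest and lowest points of the route, which is not the same as the cumulative climb. If the sum of all uphill segments is wanted instead, a sketch like the following could be applied to jogCharts$Alt. The `alt_ft` values here are illustrative and not taken from the run, and `total_gain` is a name introduced for this example.

```r
# Illustrative altitude trace in feet (not taken from the run data)
alt_ft <- c(170, 175, 172, 180, 178)

alt_range  <- max(alt_ft) - min(alt_ft)      # what Alt.range measures
total_gain <- sum(pmax(diff(alt_ft), 0))     # sum of uphill steps only

alt_range
total_gain
```

On noisy GPS altitude data the cumulative figure can be inflated by small sample-to-sample jitter, so some smoothing beforehand may be worthwhile.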