Introduction

This R script imports data recorded by a runner's GPS-enabled jogging watch with a heart rate monitor. First, the non-standard packages are installed. Next, the data is reshaped so that later steps can work with it more easily. The result is a set of charts communicating key metrics: speed, elevation climbed, heart rate, and the GPS-mapped route.

This script includes sanity checks throughout that have been commented out, including head(), summary(), and class().

Package management

Here we require packages the script calls.

# install.packages("XML")
# install.packages("ggmap")
# install.packages("reshape")
# install.packages("gpairs")
require(XML)
## Loading required package: XML
require(ggmap)
## Loading required package: ggmap
## Loading required package: ggplot2
require(reshape)
## Loading required package: reshape
require(gpairs)
## Loading required package: gpairs
require(ggplot2)

Data Extraction, Transformation, and Loading (ETL)

Import the data for map generation and jog statistics. First, parse the raw TCX file:

mapData <- xmlParse("fog.tcx")

# Import raw TCX to dataframe. This will be used to generate map data
mapRawData <- xmlToDataFrame(nodes = getNodeSet(mapData, "//ns:Trackpoint", "ns"))
head(mapRawData)
##                       Time                             Position
## 1 2014-09-04T23:03:37.000Z 37.78069546446204-122.47062250971794
## 2 2014-09-04T23:03:40.000Z 37.78062807396054-122.47062703594565
## 3 2014-09-04T23:03:41.000Z 37.78058834373951-122.47063893824816
## 4 2014-09-04T23:03:42.000Z 37.78055163100362-122.47064262628555
## 5 2014-09-04T23:03:43.000Z 37.780513912439346-122.4706432968378
## 6 2014-09-04T23:03:44.000Z 37.78048072010279-122.47064765542746
##       AltitudeMeters     DistanceMeters HeartRateBpm         Extensions
## 1               52.0 1.4900000095367432           87  1.315000057220459
## 2               52.0  8.979999542236328           92 1.5609999895095827
## 3               52.0 13.369999885559082           96 1.8930000066757202
## 4  52.20000076293945 17.459999084472656           99 2.1040000915527344
## 5 52.400001525878906   21.6299991607666          103  2.316999912261963
## 6 52.400001525878906  25.34000015258789          106 2.5250000953674316

The columns are all factors, which can be tricky to graph in a meaningful way because of the complications from levels. Instead of converting each column with functions like as.numeric or type.convert, storing the results as vectors, and then creating another dataframe, Golden Cheetah is used to efficiently produce a CSV with manageable classes from the TCX file.
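
The factor pitfall mentioned above can be sketched on a toy data frame. This is only an illustration of the alternative route that the script avoids, not part of the script itself; the data frame `toy` and the name `toyNumeric` are hypothetical, and the values are copied from the shape of the head(mapRawData) output.

```r
# Toy stand-in for two mapRawData columns (values are illustrative only)
toy <- data.frame(
  AltitudeMeters = factor(c("52.0", "52.0", "52.20000076293945")),
  HeartRateBpm   = factor(c("87", "92", "96"))
)

# as.numeric() on a factor returns the internal level codes, not the values
wrong <- as.numeric(toy$HeartRateBpm)

# Converting through character first recovers the recorded numbers
toyNumeric <- data.frame(
  lapply(toy, function(col) as.numeric(as.character(col)))
)
str(toyNumeric)
```

Doing this for every column is the bookkeeping that the CSV export sidesteps.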

Import jog statistics from CSV

jogRawData <- read.csv("gold.csv")

# Uncomment sanity checks as needed
#class(jogRawData)
#head(jogRawData)
#str(jogRawData)
# Let's take a look at the content of this dataframe
head(jogRawData)
##   Minutes Torq..N.m.  Km.h Watts       Km Cadence  Hrate ID Altitude..m.
## 1 0.01667          0 5.029     0 0.003987       0  88.67  1         52.0
## 2 0.03333          0 5.324     0 0.006483       0  90.33  1         52.0
## 3 0.05000          0 5.620     0 0.008980       0  92.00  1         52.0
## 4 0.06667          0 6.815     0 0.013370       0  96.00  1         52.0
## 5 0.08333          0 7.574     0 0.017460       0  99.00  1         52.2
## 6 0.10000          0 8.341     0 0.021630       0 103.00  1         52.4
gpairs(jogRawData)
## Warning: 3 columns with less than two distinct values eliminated

[Plot: pairs plot of jogRawData]

We can remove the columns for Torque, since this was not a bike ride, and ID, since it appears to operate as a mile marker.

So let’s make three subsets:

  1. Useful data to graph from jogRawData
  2. Metric pace, distance, and altitude data to be converted to standard units and added to 1
  3. Position data to later be divided into longitude and latitude from mapRawData
jogChartsTemp <- data.frame(jogRawData[c("Minutes",
                                     "Hrate")])
jogMetricKmh <- jogRawData$Km.h
jogMetricKm <- jogRawData$Km
jogMetricAlt <- jogRawData$Altitude..m.

# Check jogChartsTemp
head(jogChartsTemp)
##   Minutes  Hrate
## 1 0.01667  88.67
## 2 0.03333  90.33
## 3 0.05000  92.00
## 4 0.06667  96.00
## 5 0.08333  99.00
## 6 0.10000 103.00
# Check jogMetricKmh & jogMetricKm & jogMetricAlt
head(jogMetricKmh)
## [1] 5.029 5.324 5.620 6.815 7.574 8.341
head(jogMetricKm)
## [1] 0.003987 0.006483 0.008980 0.013370 0.017460 0.021630
head(jogMetricAlt)
## [1] 52.0 52.0 52.0 52.0 52.2 52.4

I tend to think in terms of standard (US customary) units, so let’s convert the data from metric:

jogStdMph <- jogMetricKmh/1.609
jogStdMi <- jogMetricKm/1.609
jogStdAlt <- jogMetricAlt*3.28

# Check conversion
head(jogStdMph)
## [1] 3.126 3.309 3.493 4.235 4.708 5.184
head(jogStdMi)
## [1] 0.002478 0.004029 0.005581 0.008310 0.010851 0.013443
head(jogStdAlt)
## [1] 170.6 170.6 170.6 170.6 171.2 171.9
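
The same conversions can be wrapped in small helper functions. This is a sketch rather than the script's own code: the names `kmToMi`, `mToFt`, `KM_PER_MILE`, and `FT_PER_METER` are hypothetical, and the constants carry more digits than the 1.609 and 3.28 used above, so results can differ slightly in the later decimal places.

```r
# Conversion factors with more digits than the rounded values used above
KM_PER_MILE  <- 1.609344   # kilometers in one statute mile
FT_PER_METER <- 3.28084    # feet in one meter (rounded)

# Vectorized helpers: they accept a single number or a whole column
kmToMi <- function(km) km / KM_PER_MILE
mToFt  <- function(m) m * FT_PER_METER

# Works for speed (km/h to mph) as well as distance (km to mi)
kmToMi(c(5.029, 5.324))
mToFt(c(52.0, 52.2))
```

Naming the constants keeps the conversion factors in one place if more precision is wanted later.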

Finally, I prefer pace to speed, so we need to convert the data accordingly:

jogMpm <- jogStdMph/60
jogPace <- 1/jogMpm
head(jogPace)
## [1] 19.20 18.13 17.18 14.17 12.75 11.57
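
Pace in minutes per mile is 60 divided by speed in miles per hour, so any sample recorded at 0 mph yields a pace of Inf. A minimal sketch of one way to guard against that, assuming it is acceptable to treat stationary samples as missing; the function name `speedToPace` and the sample speeds are hypothetical.

```r
# Convert speed [mph] to pace [min/mi], marking non-finite results as NA
speedToPace <- function(mph) {
  pace <- 60 / mph                  # same arithmetic as 1 / (mph / 60)
  pace[!is.finite(pace)] <- NA      # 0 mph gives Inf; treat it as missing
  pace
}

paceSketch <- speedToPace(c(3.126, 0, 6.0))
paceSketch

# Summary statistics then need na.rm = TRUE to skip the stationary samples
median(paceSketch, na.rm = TRUE)
mean(paceSketch, na.rm = TRUE)
```

With this guard the mean pace stays finite instead of being dragged to Inf by a single stationary sample.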

Now we can create a final dataframe for data manipulation:

jogCharts <- data.frame(jogChartsTemp, jogPace, jogStdMi, jogStdAlt)
names(jogCharts) <- c("Time", "Bpm", "Pace", "Dist", "Alt")
head(jogCharts)
##      Time    Bpm  Pace     Dist   Alt
## 1 0.01667  88.67 19.20 0.002478 170.6
## 2 0.03333  90.33 18.13 0.004029 170.6
## 3 0.05000  92.00 17.18 0.005581 170.6
## 4 0.06667  96.00 14.17 0.008310 170.6
## 5 0.08333  99.00 12.75 0.010851 171.2
## 6 0.10000 103.00 11.57 0.013443 171.9

Now we want to create a subset of GPS data using the reshape package:

jogPos <- colsplit(mapRawData$Position, split = "-", names = c("lat", 
                                                               "lon"))
head(jogPos)
##     lat   lon
## 1 37.78 122.5
## 2 37.78 122.5
## 3 37.78 122.5
## 4 37.78 122.5
## 5 37.78 122.5
## 6 37.78 122.5

The Position field holds latitude and longitude run together, and the only “-” in it is the negative sign of the longitude, so splitting the column on the “-” character drops that sign. We must return the negative value to the longitudinal coordinates to produce an accurate GPS route.

jogPos$lon <- -jogPos$lon
head(jogPos)
##     lat    lon
## 1 37.78 -122.5
## 2 37.78 -122.5
## 3 37.78 -122.5
## 4 37.78 -122.5
## 5 37.78 -122.5
## 6 37.78 -122.5
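
An alternative that keeps the sign in one step is to pull both numbers out with a regular expression. This is a sketch under the assumption that every Position value looks like the ones in the head(mapRawData) output (two decimal numbers run together, with the longitude carrying a leading minus sign); `pos`, `pattern`, and `jogPosSketch` are hypothetical names, and the two sample strings are copied from that output.

```r
# Two Position strings as they appear in head(mapRawData)
pos <- c("37.78069546446204-122.47062250971794",
         "37.78062807396054-122.47062703594565")

# Group 1: latitude; group 2: longitude including its sign
pattern <- "^(-?[0-9]+\\.[0-9]+)(-?[0-9]+\\.[0-9]+)$"
parts <- regmatches(pos, regexec(pattern, pos))

# Each element of parts is c(full match, latitude, longitude)
jogPosSketch <- data.frame(
  lat = as.numeric(sapply(parts, `[`, 2)),
  lon = as.numeric(sapply(parts, `[`, 3))
)
jogPosSketch
```

This would not work unchanged for positive longitudes, where nothing separates the two numbers.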

Statistical Analysis

Let’s first get an idea of what the data looks like:

gpairs(jogCharts)

[Plot: pairs plot of jogCharts]

summary(jogCharts)
##       Time            Bpm             Pace          Dist      
##  Min.   : 0.02   Min.   : 88.7   Min.   :  6   Min.   :0.002  
##  1st Qu.: 8.91   1st Qu.:174.9   1st Qu.:  9   1st Qu.:0.997  
##  Median :17.80   Median :177.7   Median : 10   Median :1.901  
##  Mean   :17.80   Mean   :176.3   Mean   :Inf   Mean   :1.871  
##  3rd Qu.:26.69   3rd Qu.:180.0   3rd Qu.: 11   3rd Qu.:2.753  
##  Max.   :35.58   Max.   :195.0   Max.   :Inf   Max.   :3.641  
##       Alt     
##  Min.   :171  
##  1st Qu.:195  
##  Median :207  
##  Mean   :209  
##  3rd Qu.:224  
##  Max.   :236

For a more detailed analysis, let’s find the min, median, and max values for plotting later.

Time analysis

Time.max.long <- max(jogCharts$Time)
Time.max <- round(Time.max.long, 2)

Speed analysis

Pace.min <- min(jogCharts$Pace)
Pace.med.long <- median(jogCharts$Pace)
Pace.med <- round(Pace.med.long, 2)
Pace.max <- max(jogCharts$Pace)
Pace.plot <- qplot(Dist, Pace, 
                   data = jogCharts, 
                   geom = "line")
pace <- Pace.plot +
  geom_hline(aes(yintercept=Pace.med),
             color="darkgreen", 
             linetype="dashed") +
  labs(title="Pace [min/mi] vs Dist [mi]") +
  ylim(5,12) +
  annotate("text", 
           x = 0.2, 
           y = Pace.med+0.25, 
           color = "darkgreen", 
           label = "Median pace") +
  annotate("text", 
           x = 0.2, 
           y = Pace.med-0.25, 
           color = "darkgreen", 
           label = Pace.med)

print(pace)
## Warning: Removed 5 rows containing missing values (geom_path).

[Plot: pace (min/mi) vs. distance (mi), with the median pace marked]

Distance analysis

Dist.max1 <- max(jogCharts$Dist)
Dist.max <- round(Dist.max1, 2)

Elevation analysis

Alt.min <- min(jogCharts$Alt)
Alt.med <- median(jogCharts$Alt)
Alt.max <- max(jogCharts$Alt)
Alt.range <- (Alt.max - Alt.min)
Alt.plot <- qplot(Dist, 
                  Alt, 
                  data = jogCharts, 
                  geom = "point")
alt <- Alt.plot + 
  geom_hline(aes(yintercept=Alt.max), 
             color="darkblue", 
             linetype="dashed") +
  labs(title="Alt [ft] vs Dist [mi]") +
  geom_area() +
  ylim(0, 300) +
  xlim(0, Dist.max) +
  annotate("text", 
           x = 0.5, 
           y = Alt.max+5, 
           color = "darkblue",
           label = "Maximum Altitude") +
  annotate("text", 
           x = 0.5, 
           y = Alt.max-5, 
           color = "darkblue",
           label = Alt.max)
print(alt)
## Warning: Removed 2 rows containing missing values (position_stack).
## Warning: Removed 2 rows containing missing values (geom_point).

[Plot: altitude (ft) vs. distance (mi), with the maximum altitude marked]
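
Note that Alt.range is the difference between the highest and lowest points, which is not the same as the cumulative ascent over the run. A minimal sketch of the distinction, using a made-up altitude vector; `altFt`, `steps`, `totalClimb`, and `altRange` are hypothetical names and are not used elsewhere in the script.

```r
# Made-up altitude samples [ft] with one small dip in the middle
altFt <- c(170.6, 170.6, 171.2, 171.9, 171.2, 172.5)

# Change in altitude between consecutive samples
steps <- diff(altFt)

# Cumulative ascent: add up only the upward steps
totalClimb <- sum(steps[steps > 0])

# Elevation range: highest point minus lowest point
altRange <- max(altFt) - min(altFt)

c(totalClimb = totalClimb, altRange = altRange)
```

On real watch data, sample-to-sample noise in the altitude readings would probably inflate the cumulative figure, so some smoothing may be worth considering before summing.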

Heartrate analysis

bpm.Min <- min(jogCharts$Bpm)
bpm.Med1 <- median(jogCharts$Bpm)
bpm.Med <- round(bpm.Med1,0)
bpm.Max <- max(jogCharts$Bpm)
bpm.plot <- qplot(Dist, 
                  Bpm, 
                  data = jogCharts, 
                  geom = "point")
bpm <- bpm.plot + 
  geom_hline(aes(yintercept=bpm.Med), 
             color="red", 
             linetype="dashed") + 
  labs(title="BPM vs Dist [mi]") +
  annotate("text", 
           x = 0.25, 
           y = bpm.Med+2, 
           label = "Median BPM", 
           color = "red") +
  annotate("text", 
           x = 0.25, 
           y = bpm.Med-2, 
           label = bpm.Med, 
           color = "red")

print(bpm)

[Plot: heart rate (BPM) vs. distance (mi), with the median BPM marked]

Performance analysis

This is not typically provided by most jogging software packages, but it is interesting to see how pace affects heart rate, excluding other factors.

perf <- qplot(Pace, 
              Bpm, 
              data = jogCharts, 
              geom = "point") +
  xlim(0,20) + 
  labs(title = "Heartrate [BPM] as function of Pace [min/mi]") + 
  geom_hline(aes(yintercept=bpm.Med), 
             color="red", 
             linetype="dashed") + 
  geom_vline(aes(xintercept=Pace.med), 
             color="blue", 
             linetype="dashed") +
  geom_density2d(color = "white") +
  annotate("text", 
           x = Pace.med+1, 
           y = 10,
           label = "Median pace", 
           color = "blue") +
  annotate("text", 
           x = Pace.med+1, 
           y = 1, 
           label = Pace.med, 
           color = "blue") +
  annotate("text", 
           x = 1.5, 
           y = bpm.Med+5, 
           label = "Median BPM", 
           color = "red") +
  annotate("text", 
           x = 1.5, 
           y = bpm.Med-5, 
           label = bpm.Med, 
           color = "red")
  
print(perf)
## Warning: Removed 6 rows containing non-finite values (stat_density2d).
## Warning: Removed 5 rows containing missing values (geom_point).

[Plot: heart rate (BPM) as a function of pace (min/mi), with density contours and median lines]

The clustering suggests the body fatiguing during the run: while the runner maintained some variability in pace, heart rate tended to stay within about +/-25 bpm. As you would expect, the clustering centers nicely on the median values for each variable.
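
One way to put a number on the relationship between pace and heart rate is a rank correlation over the samples where the runner was actually moving. This is only a sketch on made-up values, not a result from this run; `toyRun`, `moving`, and `paceBpmCor` are hypothetical names, and Spearman's method is an assumption chosen because it does not require a linear relationship.

```r
# Made-up samples; the first row mimics a stationary reading (pace = Inf)
toyRun <- data.frame(
  Pace = c(Inf, 12.5, 10.2, 9.8, 9.5, 10.1),
  Bpm  = c(88, 150, 172, 178, 181, 176)
)

# Drop rows with a non-finite pace before computing statistics
moving <- toyRun[is.finite(toyRun$Pace), ]

# Rank correlation between pace and heart rate
paceBpmCor <- cor(moving$Pace, moving$Bpm, method = "spearman")
paceBpmCor
```

A negative value would mean that faster running (a lower pace number) goes with a higher heart rate. Applied to the real data, the same filter could be run on jogCharts.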

GPS Map Generation

Now we check whether the GPS data maps the way we expect:

qplot(lon, lat, data = jogPos)

[Plot: latitude vs. longitude of the recorded GPS points]

It does, so let’s overlay it on a Google map using the ggmap package. First, we want to grab a map from Google centered on the median longitude and latitude values, set to an appropriate zoom level:

mapImageData <- get_googlemap(center = c(lon = median(jogPos$lon), lat = median(jogPos$lat)), 
                              zoom = 15, maptype = c("roadmap"))
## Map from URL : http://maps.googleapis.com/maps/api/staticmap?center=37.777859,-122.465516&zoom=15&size=%20640x640&maptype=roadmap&sensor=false
## Google Maps API Terms of Service : http://developers.google.com/maps/terms

Now we overlay the longitude and latitude coordinates from the jogPos dataframe:

map <- ggmap(mapImageData,
      extent = "device") + # takes out axes, etc.
  geom_point(aes(x = lon, y = lat), data = jogPos, colour = "darkblue", size = 2, pch = 16)
print(map)

[Plot: GPS route overlaid on the Google roadmap]

SUCCESS!!

Final stats

print ("Total time [min]")
## [1] "Total time [min]"
print (Time.max)
## [1] 35.58
print ("Total distance of run [mi]")
## [1] "Total distance of run [mi]"
print (Dist.max)
## [1] 3.64
print ("Median pace [min/mi]")
## [1] "Median pace [min/mi]"
print (Pace.med)
## [1] 9.9
print ("Max heart rate [BPM]")
## [1] "Max heart rate [BPM]"
print (bpm.Max)
## [1] 195
print ("Elevation range [ft]")
## [1] "Elevation range [ft]"
print (Alt.range)
## [1] 65.6