1 Motivation

I’ve been running regularly since last July with a new watch (TomTom). I have a script that retrieves the watch's binary files (ttbin) and converts them whenever the watch is connected over USB. The goal is to take these data and build my own visualizations instead of using the TomTom website.

1.1 Data

When the watch is set up and connected, it creates directories like:

/Users/jonathanbouchet/TomTom\ MySports/<date>/

where the ttbin files are stored.
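
As a rough illustration of this layout, the sketch below rebuilds a toy version of it in a temporary directory and lists the ttbin files in R. It does not parse the binary format, and the file name `run.ttbin` and the two dates are placeholders chosen for the example, not names produced by the watch.

```r
# Hypothetical layout: <root>/<date>/<name>.ttbin, rebuilt here in a temp dir.
root <- file.path(tempdir(), "TomTom MySports")
for (d in c("2016-07-16", "2016-07-17")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(root, d, "run.ttbin"))  # empty placeholder file
}

# Find every ttbin file below the root, whatever the directory depth.
files <- list.files(root, pattern = "\\.ttbin$", recursive = TRUE,
                    full.names = TRUE)

# One row per file; the parent directory name is taken as the run date.
runs <- data.frame(date = basename(dirname(files)), file = files,
                   stringsAsFactors = FALSE)
print(runs)
```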

I then run a Python script that loops over all directories and summarizes each run into a CSV file. The column names and a summary of the data are shown below:

library(ggplot2)
library(dplyr)
library(plotly)
df<-read.csv('foo.csv',sep=',')
head(df)
##          date      start        end duration latitude longitude distance
## 1 2016-07-16   06:14:19   07:20:37        66 41.12115 -81.46048 7.026884
## 2 2016-07-17   06:18:00   07:28:28        70 41.12118 -81.46054 7.227120
## 3 2016-07-18   05:58:01   06:47:16        49 41.12115 -81.46054 5.103436
## 4 2016-07-20   15:33:06   16:03:41        30  0.00000   0.00000 2.019636
## 5 2016-07-21   06:13:15   06:53:28        40 41.12118 -81.46052 4.103205
## 6 2016-07-22   06:04:43   06:52:30        47 41.12110 -81.46052 5.008434
##         type
## 1    running
## 2    running
## 3    running
## 4  treadmill
## 5    running
## 6    running
summary(df)
##           date            start             end         duration     
##  2016-10-02 :  3    06:05:36 :  3    06:47:16 :  2   Min.   : 10.00  
##  2016-07-24 :  2    05:58:53 :  2    06:49:44 :  2   1st Qu.: 48.00  
##  2016-08-11 :  2    05:59:35 :  2    06:50:57 :  2   Median : 52.00  
##  2016-08-14 :  2    06:01:57 :  2    06:51:02 :  2   Mean   : 52.72  
##  2016-08-21 :  2    06:01:58 :  2    06:52:25 :  2   3rd Qu.: 61.00  
##  2016-08-22 :  2    06:02:43 :  2    06:52:30 :  2   Max.   :103.00  
##  (Other)    :224   (Other)   :224   (Other)   :225                   
##     latitude       longitude         distance              type    
##  Min.   : 0.00   Min.   :-81.46   Min.   : 1.020    running  : 66  
##  1st Qu.: 0.00   1st Qu.:-81.46   1st Qu.: 4.780    treadmill:171  
##  Median : 0.00   Median :  0.00   Median : 5.250                   
##  Mean   :11.45   Mean   :-22.69   Mean   : 5.337                   
##  3rd Qu.:41.12   3rd Qu.:  0.00   3rd Qu.: 6.150                   
##  Max.   :41.12   Max.   :  0.00   Max.   :10.677                   
## 
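
The summary above shows latitude and longitude equal to 0 for the treadmill runs, which pulls the median and mean of both columns toward zero. One possible way to handle this, assuming that a (0, 0) position simply means that no GPS fix was recorded, is to treat those coordinates as missing. The sketch uses three rows taken from the `head(df)` output and a toy data frame named `runs`.

```r
# Toy rows copied from the head(df) output above.
runs <- data.frame(
  latitude  = c(41.12115, 0, 41.12118),
  longitude = c(-81.46048, 0, -81.46052),
  type      = c("running", "treadmill", "running"),
  stringsAsFactors = FALSE
)

# Assumption: a (0, 0) position means "no GPS fix", not a real location.
no_fix <- runs$latitude == 0 & runs$longitude == 0
runs$latitude[no_fix]  <- NA
runs$longitude[no_fix] <- NA

# Summaries now describe only the runs with a recorded position.
print(mean(runs$latitude, na.rm = TRUE))
```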

The next step is to add a few features that make the plots more useful:

tmp<-as.Date(df$date, format="%Y-%m-%d")
#create columns for month (name, numeric) and hour
df$month<-as.numeric(format(tmp,'%m'))
df$month_name<-month.abb[df$month]
df$hour<-as.numeric(format(as.POSIXct(df$start,format="%H:%M:%S"),"%H"))

getAM<-function(x){
    if(x<12){return('AM')}
    else {return('PM')}
}
#create a new column for morning/evening runs
df$TimeInDay<-sapply(df$hour,getAM)
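
The same AM/PM column can also be built without a helper function, since `ifelse()` works on a whole vector at once. A minimal sketch on three toy start times (the vector names are illustrative, and `tz = "UTC"` is added here only to keep the parsing independent of the local time zone):

```r
# Toy start times in the same "%H:%M:%S" format as df$start.
start <- c("06:14:19", "15:33:06", "12:00:00")

# Parse as date-times and keep only the hour as a number.
hour <- as.numeric(format(as.POSIXct(start, format = "%H:%M:%S", tz = "UTC"),
                          "%H"))

# ifelse() is vectorized, so no sapply() loop is needed.
time_in_day <- ifelse(hour < 12, "AM", "PM")
print(time_in_day)
```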

1.2 Plots

pl <- ggplot(data=df,aes(x=distance,y=duration)) +
  geom_point(aes(color=type,shape=TimeInDay),size=5,alpha=.75) + 
  xlab('Distance [miles]') + ylab('Time [min]') + xlim(0,12) + ylim(0,110) + 
  ggtitle(' Time vs. Distance')
pl<- pl + geom_smooth(aes(group=1),method='lm',formula=y~x,color='black',size=.5) +
  scale_colour_manual(values=c("#E2D200","#46ACC8"))
print(pl)

1.3 Comments

  • There is (clearly) a correlation between my running time and the distance.
  • Longer runs happen more often outside (type=running), because running on the treadmill is boring.
  • Most of the runs occur in the morning.

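
To put a number on the first comment, the correlation and the slope of the fitted line can be computed directly. The sketch below uses only the six runs shown in the `head(df)` output, so the values it prints are illustrative rather than a result for the full data set; `runs`, `r` and `pace` are names chosen for the example.

```r
# Six runs copied from the head(df) output above (miles, minutes).
runs <- data.frame(
  distance = c(7.026884, 7.227120, 5.103436, 2.019636, 4.103205, 5.008434),
  duration = c(66, 70, 49, 30, 40, 47)
)

# Pearson correlation between distance and time.
r <- cor(runs$distance, runs$duration)

# Same linear model as geom_smooth(method = 'lm', formula = y ~ x).
fit <- lm(duration ~ distance, data = runs)
pace <- unname(coef(fit)["distance"])  # slope, in minutes per mile

print(c(correlation = r, min_per_mile = pace))
```

The slope can be read as an average pace, under the assumption that the relationship is roughly linear over the distances covered.
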
gl <- ggplot(df, aes(x=date, y=distance)) + 
  geom_bar(aes(fill=type),stat="identity") + 
  theme(axis.text.x = element_text(size=4,angle=90, hjust=1)) +
  scale_fill_manual(values=c("#E2D200","#46ACC8"))
print(gl)

1.4 Comments

  • Time series analysis? There is clearly a seasonality in my data, since I do long runs mostly on Saturdays and Sundays and then take the next day off.
  • As a function of time, we also see that starting in October most of the runs are of type=treadmill (because it is getting cold in Ohio).
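
A quick way to check the weekend pattern is to derive the day of the week from the date and compare mean distances. This is a sketch on the six runs from the `head(df)` output only, with a toy data frame named `runs`; the real check would use the full `df`.

```r
# Six runs copied from the head(df) output above.
runs <- data.frame(
  date     = as.Date(c("2016-07-16", "2016-07-17", "2016-07-18",
                       "2016-07-20", "2016-07-21", "2016-07-22")),
  distance = c(7.026884, 7.227120, 5.103436, 2.019636, 4.103205, 5.008434)
)

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday),
# independent of the locale.
wday <- as.POSIXlt(runs$date)$wday
runs$weekend <- wday %in% c(0, 6)

# Mean distance on weekdays (FALSE) versus weekends (TRUE).
by_weekend <- tapply(runs$distance, runs$weekend, mean)
print(by_weekend)
```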

2 Summary

The next step for the summary is to aggregate the data per month:

df2 <- as.data.frame(
  df %>%
    group_by(month_name, type) %>%
    select(duration, distance, type) %>%
    summarise(totDistance = sum(distance), totTime = sum(duration))
)
## Adding missing grouping variables: `month_name`
lev<-c("Jul","Aug","Sep","Oct","Nov","Dec","Jan","Feb","Mar")
df2$MONTH <- factor(df2$month_name, levels = lev)
ggplot(data=df2,aes(x=MONTH,y=totDistance,fill=type)) + 
  geom_bar(stat='identity') + ylab('total distance [miles]') +
  scale_fill_manual(values=c("#E2D200","#46ACC8"))
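
The month levels above are typed by hand so that the bars appear in chronological order rather than alphabetical order. As an alternative sketch, the order can be derived from the dates themselves. This assumes the data cover less than twelve months, so that no month name appears in two different years; the toy rows and distances below are made up for the example.

```r
# Toy rows: dates deliberately out of order, spanning a year boundary.
# The distances are made-up values, not taken from the data.
runs <- data.frame(
  date     = as.Date(c("2016-12-03", "2016-07-16", "2017-01-08",
                       "2016-07-17")),
  distance = c(5.2, 7.0, 4.8, 7.2)
)
runs$month_name <- month.abb[as.numeric(format(runs$date, "%m"))]

# Levels in order of first appearance once the rows are sorted by date.
lev <- unique(runs$month_name[order(runs$date)])
runs$MONTH <- factor(runs$month_name, levels = lev)
print(levels(runs$MONTH))
```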

2.1 Comments

An interesting plot, since it clearly shows that winter is coming: I now run more on the treadmill than outside.