I use the pre-installed iPhone Health App to track my running+walking distance, and I regularly check how the total distance I’ve covered in a given month compares to previous months. I recently updated my iPhone to iOS 9.3, and one change I noticed was in the Health App: it no longer lets me make this month-to-month comparison of total distance.
After the iOS 9.3 update, instead of summarizing each month by total running+walking distance, the app now reports the average running+walking distance per day for that month. I prefer the total for several reasons, so I thought I would export the data from my phone to my computer and quickly generate a plot summarizing it the way I wanted. It took more time than I expected the first time around, and since I expect to want these graphs again in the future, I set out to streamline the following process.
The process is:
1) Export Health data from within the Health App
2) Send via email attachment
3) Download and unzip file
4) Parse XML file and extract most relevant data
5) Summarize and visualize data
While #1 and #2 are easy enough to do by hand, #3-5 are more time-consuming. By using the ‘gmailr’ package to fetch the attachment, I automated steps 3-5, so the script ends by producing a visualization of my data the way I like it!
Load all packages needed.
#load packages
require(XML)
## Loading required package: XML
require(dplyr)
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
require(ggplot2)
## Loading required package: ggplot2
require(tidyr)
## Loading required package: tidyr
require(gmailr)
## Loading required package: gmailr
##
## Attaching package: 'gmailr'
## The following object is masked from 'package:dplyr':
##
## id
## The following object is masked from 'package:utils':
##
## history
## The following objects are masked from 'package:base':
##
## body, date, labels, message
At this point, you’ll need Google API credentials to use the ‘gmailr’ package. We’ll pass them to the gmail_auth function.
#google auth
gmail_auth(id="your-id",secret="your-secret")
Once authenticated, use the threads function in ‘gmailr’ to search for emails matching the string “Health Data”, which is the default subject line generated when exporting from the Health App. Limit the results to 1, since we only want the most recent email matching our search term.
#store message id of latest email with subject
#line 'Health Data' to 'msg' variable
msg<-as.character(unlist(threads(search="Health Data",num_results=1))[1])
## Auto-refreshing stale OAuth token.
#save email attachment
save_attachments(message(msg))
unzip('export.zip',files="export.xml")
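The unzip call above assumes that export.xml sits at the top level of the archive. That may not hold for every export, since an archive can nest the file inside a folder, so here is a minimal sketch that looks the entry up first. The helper name pick_export_entry is hypothetical (it is not part of any package), and the file name export.zip is taken from the code above.

```r
# Hypothetical helper: choose the archive entry whose file name is export.xml,
# whether it sits at the top level or inside a folder.
pick_export_entry <- function(entry_names) {
  hits <- grep("(^|/)export\\.xml$", entry_names, value = TRUE)
  if (length(hits) == 0) stop("no export.xml entry found in archive")
  hits[1]
}

# Usage with a real archive (only runs if the file exists):
if (file.exists("export.zip")) {
  # list = TRUE returns a data frame of entries without extracting anything
  entries <- unzip("export.zip", list = TRUE)$Name
  # junkpaths = TRUE drops any folder prefix, so the file lands in the working directory
  unzip("export.zip", files = pick_export_entry(entries), junkpaths = TRUE)
}
```

With junkpaths=TRUE the extracted file ends up as export.xml in the working directory either way, so the parsing step that follows should not need to change.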
At this point, we’ve downloaded the attachment associated with our email of interest. The next step is to extract the most relevant data and convert the raw xml file into a format more amenable to analysis.
#parse XML file & convert to list
data<-xmlParse("export.xml")
xml_data<-xmlToList(data)
#Example of a record entry
xml_data[30000]
## $Record
## type
## "HKQuantityTypeIdentifierDistanceWalkingRunning"
## sourceName
## "AP iPhone"
## sourceVersion
## "9.2.1"
## device
## "<<HKDevice: 0x13fa0db70>, name:iPhone, manufacturer:Apple, model:iPhone, hardware:iPhone7,2, software:9.2.1>"
## unit
## "mi"
## creationDate
## "2016-04-07 13:17:25 -0700"
## startDate
## "2016-04-07 12:38:38 -0700"
## endDate
## "2016-04-07 12:43:41 -0700"
## value
## "0.0424831"
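Since each Record keeps its fields as XML attributes, one alternative to converting the whole document to a list is to pull the attributes straight from the parsed document with XPath. The following is only a sketch on a two-record toy document: the first record copies values from the entry printed above, while the root element name and the second record are made up for illustration. It uses xpathSApply and xmlGetAttr from the XML package and has not been benchmarked against the approach used in this post.

```r
library(XML)

# Toy document; the second record and the root element name are assumptions
xml_text <- '<HealthData>
  <Record type="HKQuantityTypeIdentifierDistanceWalkingRunning" unit="mi"
          creationDate="2016-04-07 13:17:25 -0700" value="0.0424831"/>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          creationDate="2016-04-07 13:17:25 -0700" value="96"/>
</HealthData>'

doc <- xmlParse(xml_text, asText = TRUE)

# Return one attribute for every Record node, using NA where it is absent
get_attr <- function(attr_name) {
  xpathSApply(doc, "//Record", xmlGetAttr, attr_name, default = NA_character_)
}

records <- data.frame(
  creationDate = get_attr("creationDate"),
  type = get_attr("type"),
  unit = get_attr("unit"),
  value = as.numeric(get_attr("value")),
  stringsAsFactors = FALSE
)
records
```

Because every column is built from the same set of Record nodes, with NA as the default for a missing attribute, the columns stay the same length and row-aligned even if some record lacks a field.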
#unlist xml_data to access individual elements by name
#Is there a more efficient way to do this?
xml_unlist<-unlist(xml_data)
#extract data of interest: type, unit, value, date
#is there a cleaner way to obtain the same result?
Record.type<-as.vector(xml_unlist[grep("Record.type|Record..attrs.type",names(xml_unlist))])
Record.unit<-as.vector(xml_unlist[grep("Record.unit|Record..attrs.unit",names(xml_unlist))])
Record.value<-as.vector(xml_unlist[grep("Record.value|Record..attrs.value",names(xml_unlist))])
Record.creationDate<-as.vector(xml_unlist[grep("Record.creationDate|Record..attrs.creationDate",names(xml_unlist))])
#combine data of interest into a single data.frame
healthData<-data.frame(cbind(Record.creationDate,Record.type,Record.unit,Record.value),stringsAsFactors=F)
#change value column data type to numeric
healthData$Record.value<-as.numeric(healthData$Record.value)
#separate date column into its constituents, delimited by spaces
healthData<-healthData %>% separate(Record.creationDate,c('date','time','misc'),sep=" ")
#further separate date into year,month,day columns
healthData<-healthData %>% separate(date,c('year','month','day'),sep="-")
#make Record.type column more readable
healthData$Record.type<-gsub('HKQuantityTypeIdentifier',"",healthData$Record.type)
#summarise values by monthly totals
healthData_summary<-healthData %>%
  group_by(Record.type,year,month) %>%
  summarise(sum=sum(Record.value)) %>%
  arrange(desc(year)) %>%
  data.frame()
#view structure of data.frame
str(subset(healthData,Record.type=='DistanceWalkingRunning'))
## 'data.frame': 19731 obs. of 8 variables:
## $ year : chr "2015" "2015" "2015" "2015" ...
## $ month : chr "07" "08" "08" "08" ...
## $ day : chr "25" "19" "19" "19" ...
## $ time : chr "04:51:11" "13:46:01" "13:46:01" "13:46:01" ...
## $ misc : chr "-0700" "-0700" "-0700" "-0700" ...
## $ Record.type : chr "DistanceWalkingRunning" "DistanceWalkingRunning" "DistanceWalkingRunning" "DistanceWalkingRunning" ...
## $ Record.unit : chr "mi" "mi" "mi" "mi" ...
## $ Record.value: num 0.010911 0.024333 0.000292 0.059969 0.00087 ...
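Because year and month end up in separate character columns, months from different years are only distinguishable when both columns are used together. As a small base-R sketch of an alternative, assuming toy records with made-up values, you could take the first seven characters of the timestamp as a combined year-month key and total by that key.

```r
# Toy records mimicking the creationDate and value fields shown earlier
# (the values here are made up for illustration).
records <- data.frame(
  creationDate = c("2015-12-30 08:00:00 -0700",
                   "2015-12-31 09:30:00 -0700",
                   "2016-01-02 18:15:00 -0700"),
  value = c(1.5, 2.0, 0.5),
  stringsAsFactors = FALSE
)

# The first 7 characters of the timestamp are "YYYY-MM"
records$year_month <- substr(records$creationDate, 1, 7)

# Sum the values within each year-month key
monthly_totals <- aggregate(value ~ year_month, data = records, FUN = sum)
monthly_totals
```

A "YYYY-MM" key sorts alphabetically in chronological order, which is convenient when the data spans a year boundary.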
Next, we’ll plot the total number of miles covered per month over the course of the last year.
ggplot(subset(healthData_summary,Record.type=="DistanceWalkingRunning"),
       aes(x=Record.type,y=sum,fill=year))+
  geom_bar(stat="identity")+
  facet_grid(.~month)+
  geom_text(aes(label=round(sum,1)),position=position_dodge(width=0.9),vjust=-0.25)+
  theme(axis.text.x=element_blank())+
  #pass labels as named arguments rather than as a single list
  labs(title="Total Running+Walking Distance per Month for 2015-2016",x="Walking+Running",y="Total Distance (mi)")
ggsave("HealthDataSummary.jpg")
## Saving 7 x 5 in image
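Faceting by month lays the panels out from 01 to 12 regardless of year, so the panels will not be in chronological order when the data crosses a year boundary. If chronological order matters, one option is to put a combined year-month key on the x axis instead. The sketch below assumes a small data frame of made-up totals; the column names year_month and total are my own and do not come from the code above.

```r
library(ggplot2)

# Made-up monthly totals for illustration only
monthly <- data.frame(
  year_month = c("2015-11", "2015-12", "2016-01"),
  total = c(40.2, 35.7, 51.3),
  stringsAsFactors = FALSE
)

# One bar per year-month; character keys in "YYYY-MM" form sort chronologically
p <- ggplot(monthly, aes(x = year_month, y = total)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(total, 1)), vjust = -0.25) +
  labs(title = "Total Running+Walking Distance per Month",
       x = "Month", y = "Total Distance (mi)")
```

Printing p draws the chart, and ggsave can write it to a file the same way as above.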
Finally, send an email with the image as an attachment.
#compose new message
mime() %>%
to("anthonygpena@gmail.com") %>%
from("anthonygpena@gmail.com") -> html_msg
#insert subject & attach image
html_msg %>%
subject("Export-analyzed") %>%
attach_file("HealthDataSummary.jpg",type="image/jpeg") -> file_attachment
#send
send_message(file_attachment)
## Id: 15557d892c3bafd6
## To:
## From:
## Date:
## Subject:
##