class: center, middle, inverse, title-slide

# Accessing Apple Health Data
## Visualising data measured from activity watches
### Heidi Thornton, PhD
### 2020/05/15

---
background-image: url("Images/Watch.png")
background-size: 30%
background-position: 50% 95%

# Alternative Tracking Method

#### As sports scientists, during this COVID-19 shutdown we have limited ways of determining adherence to, or completion of, training programs (which of course are completed within Government guidelines)!

--

Activity watches can provide a **guide** to the training completed during this period

--

Whilst information on the reliability and validity of activity watches is not widely available, they measure basic metrics such as total distance, duration, HR and energy burnt

--

Here, I will run through how to export this data without an API, do some basic analysis and manipulation, and demonstrate a few plots using ***ggplot2***

---
background-image: url("Images/Steps.png")
background-size: 60%
background-position: 50% 95%

# Accessing Apple Health Data

**Step 1** - Open the Apple Health app on your phone. On the Summary page, tap the circle with your first initial in the top right (figure on left)

--

**Step 2** - Scroll down to ‘Export All Health Data’ (middle figure)

--

**Step 3** - The export can take a few minutes, after which a page pops up to email or message the file. Simply email it to yourself and download it

---
background-image: url("Images/Heart.png")
background-size: 15%
background-position: 95% 3%

# Opening Apple Health Data

You can open the zip folder directly using R; however, for this project I will extract the file that sits within the folder 'apple_health_export'.

--

Inside the 'apple_health_export' folder there will be 2 files. For this purpose, you will only need the ‘export’ file

--

I have made a new folder that houses all files. I will set my working directory to this location and load the packages.
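--

As an optional extra (not part of the original workflow), a minimal base-R sketch that lists which of the packages used here are not yet installed. `missing_pkgs()` is a hypothetical helper name; it relies on `system.file(package = ...)` returning an empty string for packages R cannot find:

```r
# Packages used in this presentation
pkgs <- c("XML", "tidyverse", "lubridate", "ggplot2", "dplyr")

# Hypothetical helper: return the names of packages that are not installed.
# system.file(package = p) returns "" when package p cannot be found.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!installed]
}

to_install <- missing_pkgs(pkgs)
to_install # character(0) means everything is already installed
```

Anything listed in `to_install` can then be passed straight to `install.packages(to_install)`.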
If you don't have these packages installed, use install.packages("packagename")

```r
setwd("C:/Users/Heidi.Thornton/Desktop/Apple")
```

```r
library("XML")
library("methods")
library("tidyverse")
library("lubridate")
library("ggplot2")
library("dplyr")
```

---
background-image: url("Images/R.png")
background-size: 20%
background-position: 50% 90%

# Why do this in R?

### R is reproducible - Microsoft Excel is not...

--

#### We could simply export the raw data into Excel and manipulate and visualise it there

--

#### But... R isn't that difficult, and there are lots of resources out there to help you learn it

--

### Have the **end game** in mind

---

# Import the Data into R

The file format (XML; Extensible Markup Language) is quite easy to work with in R using the **XML** package

--

I have created a folder where I have put my XML file. If you have multiple files you can use a loop to access them all; I will only use one file though

--

First, we need to parse the file into an XML object and view its contents using summary(xml)

--

```r
xml <- xmlParse("xml/Thornton, Heidi.xml")
summary(xml)
```

```
## $nameCounts
## 
##        Record       Workout MetadataEntry    ExportDate    HealthData
##        200264           202            70             1             1
##            Me
##             1
## 
## $numNodes
## [1] 200539
```

---

# View the Data

I'm interested in the workout data, which we can convert to a data frame using the unexported (hence `:::`) **xmlAttrsToDataFrame** function

--

```r
df_workout <- XML:::xmlAttrsToDataFrame(xml["//Workout"])[c(1:2, 4, 6, 12)]
head(df_workout, n = 5) # View the data
```

```
##            workoutActivityType          duration    totalDistance
## 1 HKWorkoutActivityTypeRunning 42.44751790364583 8.01347998046875
## 2 HKWorkoutActivityTypeCycling 45.69051513671875    11.1736796875
## 3   HKWorkoutActivityTypeOther 45.29886474609375 3.60589990234375
## 4   HKWorkoutActivityTypeOther 58.97060139973959  1.7677099609375
## 5 HKWorkoutActivityTypeRunning 39.55806477864584 4.17927978515625
##   totalEnergyBurned                   endDate
## 1          2041.792 2019-11-23 06:40:35 +1000
## 2          1096.208 2019-11-24 06:56:26 +1000
## 3             836.8 2019-11-24 15:27:41 +1000
## 4           736.384 2019-11-25 05:27:23 +1000
## 5 669.4400000000001 2019-11-25 16:42:42 +1000
```

--

Here we have the session type and the corresponding data for each session, which need some cleaning up (numbers are stored as text and distance is in km)

---

# Plotting Data

Let's start with my running sessions, which we need to filter

--

We will use %>% (pipes) from the **tidyverse**, which chain each step together rather than making new data frames, and will plot the data using **ggplot2**

--

```r
df_workout %>%
  # Change data types (i.e. numeric, distance in m not km)
  mutate(workoutActivityType = as.character(workoutActivityType),
         totalDistance = as.numeric(as.character(totalDistance)) * 1000, # convert to m not km
         duration = as.numeric(as.character(duration)), # numeric (as.character avoids factor codes)
         endDate = as.Date(endDate)) %>% # change date from factor format
  # Only running sessions - depending on watch the name may differ
  filter(workoutActivityType == "HKWorkoutActivityTypeRunning") %>%
  filter(endDate >= as.Date("2020-03-23")) %>% # only after shutdown
  # Create ggplot
  ggplot(aes(x = endDate, y = totalDistance)) +
  geom_bar(stat = "identity", fill = '#5ab4ac') +
  labs(title = "Running Sessions") + # title
  ylab(bquote(bold("Total distance (m)"))) + # y axis title
  xlab(bquote(bold("Date"))) + # x axis title
  scale_x_date(date_breaks = "7 days") +
  theme_minimal() # remove some theme formatting
```

---

# Daily Running Sessions

Now we have our first plot, showing running volume (m) by day from the start of the COVID shutdown. Not exactly periodised, but it's better than nothing....
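--

Two things are hidden in this plot: two runs on the same day are stacked into one bar, and days without a run have no bar at all. A minimal base-R sketch with made-up distances (not my data) that sums runs per day and makes rest days explicit as 0 m:

```r
# Made-up example: distance (m) per running session, after the km -> m conversion
runs <- data.frame(
  endDate = as.Date(c("2020-03-23", "2020-03-23", "2020-03-25")),
  totalDistance = c(3000, 2000, 8000)
)

# Sum all sessions that fall on the same day
daily <- aggregate(totalDistance ~ endDate, data = runs, FUN = sum)

# Join onto a full calendar so rest days appear explicitly
all_days <- data.frame(endDate = seq(as.Date("2020-03-23"), as.Date("2020-03-26"), by = "day"))
daily <- merge(all_days, daily, by = "endDate", all.x = TRUE)
daily$totalDistance[is.na(daily$totalDistance)] <- 0 # no run recorded = 0 m
daily
```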
--

<img src="Presentation_files/figure-html/Distance Graph ouput-1.png" style="display: block; margin: auto;" />

---

# Weekly Running Volume

We need to manipulate the data a bit more to get the weekly running volume

--

```r
df_workout %>%
  mutate(workoutActivityType = as.character(workoutActivityType),
         totalDistance = as.numeric(as.character(totalDistance)) * 1000,
         endDate = as.Date(endDate),
         week = isoweek(ymd(endDate))) %>% # Add ISO week column (weeks start on Monday)
  filter(workoutActivityType == "HKWorkoutActivityTypeRunning") %>%
  filter(endDate >= as.Date("2020-03-23")) %>%
  group_by(week) %>%
  summarise(totalDistance = sum(totalDistance)) %>% # total distance per week
  ggplot(aes(x = week, y = totalDistance)) +
  geom_bar(stat = "identity", fill = '#5ab4ac') +
  labs(title = "Running Sessions") +
  ylab(bquote(bold("Total distance (m)"))) +
  xlab(bquote(bold("Week"))) +
  theme_minimal()
```

<img src="Presentation_files/figure-html/Weekly volume-1.png" style="display: block; margin: auto;" />

---

# Energy Consumption

I want to know energy consumption by **activity type**, so we first need to shorten the activity type names

--

```r
df_workout %>%
  mutate(workoutActivityType = as.character(workoutActivityType),
         totalEnergyBurned = as.numeric(as.character(totalEnergyBurned)),
         endDate = as.Date(endDate),
         week = isoweek(ymd(endDate)), # Add ISO week column
         Type = str_sub(workoutActivityType, 22)) %>% # new column: text from the 22nd character
  filter(endDate >= as.Date("2020-03-23") & Type %in% c('Running', 'Other')) %>% # Filter out cycling
  group_by(week, Type) %>%
  summarise(totalEnergyBurned = sum(totalEnergyBurned), .groups = "drop") %>% # total per week and type
  ggplot(aes(x = week, y = totalEnergyBurned, fill = Type)) +
  geom_bar(stat = "identity") +
  facet_wrap(~Type) +
  labs(title = "Energy Consumption by Session Type") +
  ylab(bquote(bold("Energy Consumption (kJ)"))) +
  xlab(bquote(bold("Week"))) +
  scale_x_continuous(breaks = seq(13, 19, by = 1)) +
  theme_minimal() +
  theme(legend.position = "none")
```

---

# Energy Consumption Plot

Now we can see weekly energy consumption by **activity type**.
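--

Weekly totals per activity type can also be cross-checked as a plain table. A minimal base-R sketch with made-up sessions (not my data): `format(x, "%V")` gives the ISO 8601 week number, where weeks start on Monday, the same convention as `lubridate::isoweek()`:

```r
# Made-up example sessions (energy in kJ), not real export data
sessions <- data.frame(
  endDate = as.Date(c("2020-03-23", "2020-03-26", "2020-03-27", "2020-03-31")),
  Type = c("Running", "Running", "Other", "Running"),
  totalEnergyBurned = c(1500, 1200, 800, 1000)
)

# ISO 8601 week number; 2020-03-23 is the Monday that starts week 13
sessions$week <- as.integer(format(sessions$endDate, "%V"))

# One row per week and activity type, summing every session in that week
weekly <- aggregate(totalEnergyBurned ~ week + Type, data = sessions, FUN = sum)
weekly
```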
'Other' sessions include weights or walking

--

<img src="Presentation_files/figure-html/energy graph-1.png" style="display: block; margin: auto;" />

---

# General Activity Data

Let's move on from workout data and import the 'Record' data. This one takes a fair while to load, as there are over 200,000 records

--

```r
df_record <- XML:::xmlAttrsToDataFrame(xml["//Record"])[c(1, 6, 8)]
```

--

```r
# See data types available in record
df_record %>%
  mutate(Type = str_sub(type, 25)) %>% # Keep text from the 25th character
  select(Type) %>%
  distinct()
```

```
##                      Type
## 1            DietaryWater
## 2                  Height
## 3                BodyMass
## 4               HeartRate
## 5               StepCount
## 6  DistanceWalkingRunning
## 7       BasalEnergyBurned
## 8      ActiveEnergyBurned
## 9          FlightsClimbed
## 10       RestingHeartRate
## 11 HeadphoneAudioExposure
## 12          SleepAnalysis
```

--

To view the first few rows of the data, you can use **head(df_record)**

---

# Steps Data Manipulation

This one isn't especially useful for athletes; it's more for my own interest in my activity (or lack of it) during the COVID shutdown

I am replicating a plot created [online](https://taraskaduk.com/2019/03/23/apple-health/), demonstrating step count

--

```r
df_record %>%
  mutate(Type = str_remove(type, "HKQuantityTypeIdentifier"), # Rename type
         value = as.numeric(as.character(value)),
         Date = as.Date(startDate),
         weekday = wday(Date), # Day of week (1 = Sunday)
         hour = hour(startDate)) %>% # Use the original date-time (Date has no time of day)
  filter(Type == 'StepCount' & Date >= as.Date("2020-03-23")) %>%
  group_by(Date, weekday, hour) %>% # Summarise by date, weekday and hour
  summarise(value = sum(value)) %>% # Sum steps over ^^
  group_by(weekday, hour) %>% # Now summarise by weekday and hour
  summarise(value = mean(value)) %>% # Take mean steps over ^^
  filter(between(hour, 6, 21)) %>% # Keep hours 6 to 21, i.e. 6 AM until 10 PM
  ggplot(aes(x = hour, y = weekday, fill = value)) +
  geom_tile(col = 'grey40') +
  scale_fill_continuous(labels = scales::comma, low = 'grey95', high = '#008FD5') +
  scale_x_continuous(breaks = c(6, 9, 12, 15, 18),
                     labels = c("6 AM", "9 AM", "Midday", "3 PM", "6 PM")) +
  scale_y_reverse(breaks = c(1, 2, 3, 4, 5, 6, 7),
                  labels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")) +
  labs(title = "Step Count Heatmap") +
  ylab(bquote(bold("Weekday"))) +
  xlab(bquote(bold("Hour"))) +
  guides(fill = "none") +
  coord_equal() +
  theme_minimal()
```

---

# Steps by Hour by Day Heatmap

Not a lot of activity at the moment....

<img src="Presentation_files/figure-html/Step count plot by hour-1.png" style="display: block; margin: auto;" />

---

# Heart Rate

One last one: we will plot HR across 2 days. I have added a colour scale from low (green) to high (red) HR

--

.pull-left[
```r
df_record %>%
  mutate(Type = str_remove(type, "HKQuantityTypeIdentifier"),
         value = as.numeric(as.character(value)),
         Date = as.Date(startDate), # local date, taken before the UTC conversion below
         startDate = as_datetime(startDate)) %>%
  filter(Type == 'HeartRate') %>%
  filter(Date >= as.Date("2020-04-03") & Date <= as.Date("2020-04-04")) %>%
  ggplot(aes(x = startDate, y = value, colour = value)) +
  geom_line(size = 1) +
  scale_color_gradient(low = "green", high = "red") +
  labs(title = "Heart Rate") +
  ylab(bquote(bold("Heart Rate (bpm)"))) +
  xlab(bquote(bold("Date/Time"))) +
  theme_minimal()
```
]

.pull-right[
<!-- -->
]

---

# Access More Apple Health Info

There are other data types available from Apple Health:

```r
df_record <- XML:::xmlAttrsToDataFrame(xml["//Record"])
df_activity <- XML:::xmlAttrsToDataFrame(xml["//ActivitySummary"])
df_workout <- XML:::xmlAttrsToDataFrame(xml["//Workout"])
df_clinical <- XML:::xmlAttrsToDataFrame(xml["//ClinicalRecord"])
df_location <- XML:::xmlAttrsToDataFrame(xml["//Location"])
```

For more information on analysing Apple Health data, check out:

#### ["Analyze and visualize your iPhone's Health app data in R"](https://taraskaduk.com/2019/03/23/apple-health/)

#### ["Explore your Apple Watch heart rate data in R"](https://jeffjjohnston.github.io/rstudio/rmarkdown/2016/04/28/explore-your-apple-watch-heart-rate-data.html)

---

# Thanks for looking!
😊

#### If you want to learn more about R, there is some awesome work out there from fellow Aussies! Click on the names to view their work

#### [Alice Sweeting](http://sportstatisticsrsweet.rbind.io/)<br><br>
#### [Mitch Henderson](https://www.mitchhenderson.org/)<br><br>
#### [Jacquie Tran](https://www.jacquietran.com/)<br><br>

<a href="mailto:heidi.thornton@goldcoastfc.com.au"> .white[<i class="fas fa-envelope "></i> @heidithornton09] </a>