Overview


Quantified Self Project

The Apple Watch has been on the market for several years, and I got mine three years ago. While I have been wearing it, the watch has collected a large amount of data about me. Beyond the charts and analysis provided by Apple, I am curious about what the data says about my health and sports performance. The quantified self is a way to learn more from this data. It motivates me to raise questions about myself, collect data about myself, analyze the data myself, and generate results for myself. This project follows exactly that procedure to help me understand my data better.

After careful exploratory data analysis (EDA), the following questions emerged from the data.

  1. When the first version of the Apple Watch was released, I was surprised by its health tracking function. I bought the Apple Watch Series 2 to track my health data and encourage myself to lead a healthier life. Did the Apple Watch fulfill its responsibility?

  2. Most of my health data comes from the winter season. Is there a reason for it? Across all dates, the most active energy (calories) I burned falls in the winter season. Is this one of the reasons?

  3. Did I do well in daily exercise (not sports)? How did I spend my weekends from the perspective of health data?

  4. My basal energy burned seems pretty stable over the years, but my body mass (weight) has dropped since mid-2017. I assume that the lighter my body is, the less basal energy I burn. Is this true?

  5. I have been using Slopes to track my skiing and snowboarding since 2018. I love the feeling of skiing first tracks and always try to get them. How did I do? I also love to catch the last chair. Did I make it? What are my skiing and snowboarding habits?

  6. How did my BMI change over time? Is there a relationship between BMI and the workouts I did?

#data overview====
names(df)
 [1] "type"          "sourceName"    "sourceVersion" "unit"         
 [5] "creationDate"  "startDate"     "endDate"       "value"        
 [9] "device"        "year"          "month"         "monthn"       
[13] "date"          "dayofweek"     "hour"         
summary(df)
                                             type       
 HKQuantityTypeIdentifierActiveEnergyBurned    :751368  
 HKQuantityTypeIdentifierBasalEnergyBurned     :448116  
 HKQuantityTypeIdentifierHeartRate             :304091  
 HKQuantityTypeIdentifierStepCount             :101020  
 HKQuantityTypeIdentifierDistanceWalkingRunning: 69428  
 HKQuantityTypeIdentifierAppleExerciseTime     : 45725  
 (Other)                                       : 13156  
       sourceName      sourceVersion           unit        
 Watch      :1591875   5.1.1  :429999   kcal     :1199484  
 iPhone     : 109961   4.3.1  :192609   count/min: 304648  
 Mi Fit     :  29260   4.3    :172284   count    : 111656  
 Slopes     :   1564   4.2    :168005   km       :  69429  
 WaterMinder:    231   5.1.2  :119276   min      :  45725  
 SwimIO     :      8   3.2.2  :118051   mi       :    720  
 (Other)    :      5   (Other):532680   (Other)  :   1242  
  creationDate                   startDate                  
 Min.   :2015-09-28 18:56:23   Min.   :2015-09-28 18:24:49  
 1st Qu.:2017-09-20 13:01:02   1st Qu.:2017-09-16 11:34:38  
 Median :2018-05-24 06:03:41   Median :2018-05-19 14:21:28  
 Mean   :2018-03-25 23:18:14   Mean   :2018-03-23 10:25:30  
 3rd Qu.:2018-12-07 16:45:45   3rd Qu.:2018-11-25 16:17:46  
 Max.   :2019-02-22 18:47:53   Max.   :2019-02-22 18:45:50  
                                                            
    endDate                        value         
 Min.   :2015-09-28 18:26:28   Min.   :   0.000  
 1st Qu.:2017-09-16 11:37:33   1st Qu.:   0.071  
 Median :2018-05-19 14:29:29   Median :   0.228  
 Mean   :2018-03-23 10:27:16   Mean   :  24.193  
 3rd Qu.:2018-11-25 16:17:57   3rd Qu.:  16.750  
 Max.   :2019-02-22 18:46:51   Max.   :4246.990  
                                                 
                                                                                                             device       
 <<HKDevice: 0x282016c10>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch2,4, software:5.1.1>:     77  
 <<HKDevice: 0x282015db0>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch2,4, software:5.1.1>:     73  
 <<HKDevice: 0x28204c370>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch2,4, software:5.1.1>:     73  
 <<HKDevice: 0x2820cb250>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch2,4, software:5.1.1>:     73  
 <<HKDevice: 0x282015220>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch2,4, software:5.1.1>:     72  
 (Other)                                                                                                        :1690136  
 NA's                                                                                                           :  42400  
   year             month            monthn           date          
 2015:  13237   12     :415495   Min.   : 1.000   Length:1732904    
 2016: 103653   01     :290076   1st Qu.: 3.000   Class :character  
 2017: 458630   11     :193528   Median : 7.000   Mode  :character  
 2018:1020503   04     :136254   Mean   : 7.088                     
 2019: 136881   07     :127272   3rd Qu.:11.000                     
                09     :107467   Max.   :12.000                     
                (Other):462812                                      
     dayofweek           hour       
 Sunday   :433899   13     :186089  
 Monday   :159299   14     :182793  
 Tuesday  :148606   12     :177990  
 Wednesday:134339   15     :177614  
 Thursday :171969   16     :153210  
 Friday   :182854   11     :147750  
 Saturday :501938   (Other):707458  
str(df)
'data.frame':   1732904 obs. of  15 variables:
 $ type         : Factor w/ 18 levels "HKQuantityTypeIdentifierActiveEnergyBurned",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ sourceName   : Factor w/ 8 levels "Health","iPhone",..: 8 8 8 8 8 8 8 8 8 8 ...
 $ sourceVersion: Factor w/ 122 levels "1.8.7.1","10.0.1",..: 1 1 1 1 1 1 1 1 1 72 ...
 $ unit         : Factor w/ 11 levels "count","count/min",..: 9 9 9 9 9 9 9 9 9 9 ...
 $ creationDate : POSIXct, format: "2016-01-22 17:09:41" "2016-01-25 17:00:56" ...
 $ startDate    : POSIXct, format: "2016-01-22 17:09:40" "2016-01-25 17:00:56" ...
 $ endDate      : POSIXct, format: "2016-01-22 17:09:40" "2016-01-25 17:00:56" ...
 $ value        : num  350 350 350 350 350 350 350 350 500 250 ...
 $ device       : Factor w/ 275084 levels "<<HKDevice: 0x282001180>, name:Apple Watch, manufacturer:Apple, model:Watch, hardware:Watch2,4, software:3.1.1>",..: NA NA NA NA NA NA NA NA NA NA ...
 $ year         : Factor w/ 5 levels "2015","2016",..: 2 2 2 2 2 2 2 2 2 2 ...
 $ month        : Factor w/ 12 levels "01","02","03",..: 1 1 1 1 1 1 2 2 2 11 ...
 $ monthn       : num  1 1 1 1 1 1 2 2 2 11 ...
 $ date         : chr  "2016-01-22" "2016-01-25" "2016-01-26" "2016-01-28" ...
 $ dayofweek    : Ord.factor w/ 7 levels "Sunday"<"Monday"<..: 6 2 3 5 5 7 2 3 2 4 ...
 $ hour         : Factor w/ 24 levels "00","01","02",..: 18 18 13 16 24 17 22 18 18 19 ...

Q1: The Apple Watch


Did the Apple Watch fulfill its responsibility?

When the first version of the Apple Watch was released, I was surprised by its health tracking function. I bought the Apple Watch Series 2 to track my health data and encourage myself to lead a healthier life. Did the Apple Watch fulfill its responsibility?

From the charts I can tell that the watch did a pretty good job. First, it generated several times as many data records as my iPhone. Second, the bar chart of Apple exercise time by month and year shows that, in each year, only four months have less exercise time than the same months of the previous year. That is, in two thirds of the year I did better than the year before. Third, the point plot of step count by date and the box plot of step count by month and year show that the values and the mean tend to be higher in more recent years, so I am comfortable saying that my step count has increased over the past three years. Lastly, the body mass and BMI data make it clear that I lost weight after mid-2017. In conclusion, the data indicates that the watch did a good job.

Row

chart 1

chart 2

chart 3

Row

chart 4

chart 5

chart 6

Q2: The winter season


Most of my health data comes from the winter season. Is there a reason for it? Across all dates, the most active energy (calories) I burned falls in the winter season. Is this one of the reasons?

From the monthly health data chart, I found that the winter season, which runs from November to April in Boston, MA, accounts for most of the data records. I wondered whether there is a reason for this, and the chart of active energy burned by month suggests one. When I exercise or play a sport, I burn more calories and the device records more data. Since active energy burned is high in the winter season, especially in December, I believe I did more sports in winter, which increased the number of records generated in those months. The chart of Apple exercise time by month and year supports this idea because it shows more exercise time in the winter season. Specifically, I ski and snowboard more in winter, and the chart of downhill snow sports distance supports this: the data for this category comes only from the winter season.

Row

chart 1

chart 2

Row

chart 3

chart 4

Q3: Exercise and weekends


Did I do well in daily exercise (not sports)? How did I spend my weekends from the perspective of health data?

To keep fit, one should not only do sports on weekends but also work out on weekdays. The data for weekdays (Monday to Friday) is therefore important for answering the first question. First, I look at active energy burned by day of week. It shows that, in total, I burned only around half as much energy on weekdays as on weekends. Second, the heat map of active energy burned by day of week and hour shows the same pattern. Next, the heat map of active energy burned by month and day of week also shows that I didn’t do well in weekday exercise. It also answers the second question: in the summer, I didn’t do well in weekend exercise either. In addition, the Apple exercise time tells the same story. These charts depict me as a guy who only works out on winter weekends (yeah, that’s when I am enjoying the snow season at ski resorts).

Row

chart 1

chart 2

chart 3

Row

chart 4

chart 5

Q4: Basal Energy burned and body mass


My basal energy burned seems pretty stable over the years, but my body mass (weight) has dropped since mid-2017. I assume that the lighter my body is, the less basal energy I burn. Is this true?

The first four graphs show basal energy burned by date, a box plot of daily basal energy burned, monthly basal energy burned, and a heat map of basal energy burned by day of week and hour. These graphs suggest that basal energy burned is pretty stable, although the heat map shows it is higher on weekends, when I was exercising, and in the morning around 8, when I was getting up and hurrying to school or work.

The next two graphs show my weight, which has been dropping since mid-2017. I calculated the daily mean of body mass and merged the body mass and basal energy burned data on the date field that both datasets share, hoping to find a relationship between the two variables.

The linear model shows a slope of only about 0.012, with an R-squared of about 0.20. The relationship is statistically significant (p < 0.001) but weak, and the small size of the slope partly reflects the different scales of the two variables. The plot with a loess curve tells the same story: basal energy burned is positively related to body mass, but only to a small extent. Still, it is always good to keep fit and control weight.

Row

chart 1

chart 2

chart 3

Row

chart 4

chart 5

chart 6

Row

chart 7

chart 8

column

chart 9


Call:
lm(formula = bodymassvalue ~ basalvalue, data = relation)

Coefficients:
(Intercept)   basalvalue  
   55.17786      0.01216  

Call:
lm(formula = bodymassvalue ~ basalvalue, data = relation)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.4783 -2.3349 -0.5618  2.3131  8.9916 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 55.177858   4.971958  11.098  < 2e-16 ***
basalvalue   0.012158   0.002737   4.442 2.79e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.068 on 81 degrees of freedom
Multiple R-squared:  0.1959,    Adjusted R-squared:  0.186 
F-statistic: 19.74 on 1 and 81 DF,  p-value: 2.79e-05

Q5: Skiing and snowboarding


I’ve been using Slopes to track my skiing and snowboarding since 2018. I love the feeling of skiing first tracks and always try to get them. How did I do? I also love to catch the last chair. Did I make it? What are my skiing and snowboarding habits?

Usually a ski resort starts operating at 8 am on weekends, and the last chair runs at 4 pm.

From the chart we can see that I skied only a few miles at 8 am. On weekends, the heat map reveals that I am on the mountain earlier on Saturday than on Sunday. Therefore, to get first tracks, I should get up earlier, especially on Sunday.

Since some of the resorts I visited offer night skiing, and some are in the Rockies, in a different time zone, it is hard to tell whether I caught the last chair. However, the chart shows that I was still on the mountain after 4 pm.

From the interactive 3D plot, we can see that I spent most of my days on the mountain on weekends; on only a few days did I ski Monday to Friday. My snow season starts around November and ends in May. I usually cover more distance per day in the first half of the season. Normally I ski at least five miles in a day. However, one outlier shows that I skied only half a mile in one day. I remember I was teaching snowboarding that day, so I did only one run.

Row

chart 1

chart 2

column

chart 3

Q6: BMI and work out


How did my BMI change over time? Is there a relationship between BMI and the workouts I did?

The first chart shows that my BMI went above the upper limit of the normal range and fell back afterwards. Based on the linear regression models, I would say there is no strong relationship between BMI and exercise time, active energy burned, or step count. However, this might be because part of the data is missing and the sample I have is not large enough.

Row

chart 1

column

chart 2

column

chart 3

column

chart 4

---
title: "ANLY 512 Final Project"
author: "Li Li 236090"
date: "February 22, 2011"
output:
  flexdashboard::flex_dashboard:
    social: menu
    source_code: embed
    layout: column
    vertical_layout: scroll
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo=FALSE)
#load packages====
library(plotly)
library(XML)
library(dplyr)
library(ggplot2)
library(ggthemes)
library(lubridate)
library(pander)
library(magrittr)
library(xml2)




#load data====
#load apple health export.xml file
data=file.choose() #choose Export.xml file
xml=xmlParse(data)
#transform xml file to data frame - select the Record rows from the xml file
df=XML:::xmlAttrsToDataFrame(xml["//Record"])




#data overview====
names(df)
summary(df)
str(df)




#data manipulation====
#make value variable numeric
df$value=as.numeric(as.character(df$value))
#make creationDate, startDate, and endDate in a date time variable POSIXct in eastern time zone
df$creationDate=ymd_hms(df$creationDate,tz="America/New_York")
df$startDate=ymd_hms(df$startDate,tz="America/New_York")
df$endDate=ymd_hms(df$endDate,tz="America/New_York")
summary(df$endDate)
##add new variables: year month date dayofweek hour
df$year=format(df$endDate,"%Y")
df$year=as.factor(df$year)
df$month=format(df$endDate,"%m")
df$month=as.factor(df$month)
df$monthn=format(df$endDate,"%m")
df$monthn=as.numeric(as.character(df$monthn))
df$date=format(df$endDate,"%Y-%m-%d")
df$dayofweek=wday(df$endDate,label=TRUE,abbr=FALSE)
df$hour=format(df$endDate,"%H")
df$hour=as.factor(df$hour)
str(df)
#The data ranges from 2015/9/28 to 2019/2/22, so the first and last years are incomplete and would distort monthly and annual comparisons.
#Therefore, I only use the three full years of data from 2016/1/1 to 2018/12/31.
df2=subset(df,year %in% c("2016","2017","2018"))
summary(df2$endDate)
df2016=subset(df,year==2016)
df2017=subset(df,year==2017)
df2018=subset(df,year==2018)
```

Overview
==========================================================================

***
Quantified Self Project

The Apple Watch has been on the market for several years, and I got mine three years ago. While I have been wearing it, the watch has collected a large amount of data about me. Beyond the charts and analysis provided by Apple, I am curious about what the data says about my health and sports performance. The quantified self is a way to learn more from this data. It motivates me to raise questions about myself, collect data about myself, analyze the data myself, and generate results for myself. This project follows exactly that procedure to help me understand my data better.

After careful exploratory data analysis (EDA), the following questions emerged from the data.

a. When the first version of the Apple Watch was released, I was surprised by its health tracking function. I bought the Apple Watch Series 2 to track my health data and encourage myself to lead a healthier life. Did the Apple Watch fulfill its responsibility?

b. Most of my health data comes from the winter season. Is there a reason for it? Across all dates, the most active energy (calories) I burned falls in the winter season. Is this one of the reasons?

c. Did I do well in daily exercise (not sports)? How did I spend my weekends from the perspective of health data?

d. My basal energy burned seems pretty stable over the years, but my body mass (weight) has dropped since mid-2017. I assume that the lighter my body is, the less basal energy I burn. Is this true?

e. I have been using Slopes to track my skiing and snowboarding since 2018. I love the feeling of skiing first tracks and always try to get them. How did I do? I also love to catch the last chair. Did I make it? What are my skiing and snowboarding habits?

f. How did my BMI change over time? Is there a relationship between BMI and the workouts I did?

```{r,echo=TRUE}
#data overview====
names(df)
summary(df)
str(df)
```




Q1: The Apple Watch
==========================================================================

***
Did the Apple Watch fulfill its responsibility?

When the first version of the Apple Watch was released, I was surprised by its health tracking function. I bought the Apple Watch Series 2 to track my health data and encourage myself to lead a healthier life. Did the Apple Watch fulfill its responsibility?

From the charts I can tell that the watch did a pretty good job. First, it generated several times as many data records as my iPhone. Second, the bar chart of Apple exercise time by month and year shows that, in each year, only four months have less exercise time than the same months of the previous year. That is, in two thirds of the year I did better than the year before. Third, the point plot of step count by date and the box plot of step count by month and year show that the values and the mean tend to be higher in more recent years, so I am comfortable saying that my step count has increased over the past three years. Lastly, the body mass and BMI data make it clear that I lost weight after mid-2017. In conclusion, the data indicates that the watch did a good job.
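
The year-over-year comparison of monthly exercise time can be sketched in a few lines of base R. This is only an illustration on made-up monthly totals, not the real export; the names `monthly`, `wide`, `declined_months`, and `share_improved` are hypothetical.

```r
# Synthetic monthly exercise minutes (made-up numbers, NOT the real export).
monthly <- data.frame(
  year    = rep(c(2017, 2018), each = 12),
  month   = rep(1:12, times = 2),
  minutes = c(300, 280, 250, 200, 120, 100,  90,  95, 110, 150, 260, 400,  # 2017
              350, 300, 240, 230, 130,  90, 100, 120, 105, 170, 300, 380)  # 2018
)

# Reshape to one row per month with one column per year
# (columns become minutes.2017 and minutes.2018).
wide <- reshape(monthly, idvar = "month", timevar = "year", direction = "wide")

# Flag the months in which the later year fell short of the earlier year.
wide$declined   <- wide$minutes.2018 < wide$minutes.2017
declined_months <- wide$month[wide$declined]

# Share of months that did not decline.
share_improved  <- mean(!wide$declined)

print(declined_months)
print(share_improved)
```

With these made-up numbers four months decline, so two thirds of the months did not, which mirrors the pattern read off the real chart.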

Row {data-height=600}
-------------------------------------
### chart 1
```{r, echo=FALSE}
source=summary(df2$sourceName)
topsource=head(sort(source,decreasing=TRUE),4)
par(mar=c(10,10,3,3))
sourceplot=barplot(topsource,main="Source of Health Data",xlab="",ylab="",las=2,space=1,names.arg="")
text(sourceplot[,1], -3.7, srt = 90, adj= 1, xpd = TRUE, labels = names(topsource) , cex=1.2)
```

### chart 2
```{r, echo=FALSE}
#bar chart apple exercise time by month by year
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierAppleExerciseTime') %>%
  group_by(year,month) %>%
  summarize(appleexercisetime=sum(value)) %>%
  ggplot(aes(x=month, y=appleexercisetime, fill=year)) + 
  geom_bar(position='dodge', stat='identity') +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_brewer() +
  theme_bw() +  
  theme(panel.grid.major = element_blank())
```


### chart 3
```{r,echo=FALSE}
#step count by date (daily totals)
stepcount=subset(df2,df2$type=="HKQuantityTypeIdentifierStepCount")
#sum the records of each day; date is already a "YYYY-MM-DD" string
stepcount1=aggregate(stepcount$value,by=list(stepcount$date),sum)
names(stepcount1)=c("date","value")
stepcount1$date=as.factor(stepcount1$date)
plot(stepcount1$date,stepcount1$value,ylim=c(0,50000))
#hard-coded horizontal reference line at 8,716 steps
abline(h=8716,col="blue")
```


Row {data-height=600}
-------------------------------------
### chart 4
```{r,echo=FALSE}
#boxplot step count by month by year
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierStepCount') %>%
  group_by(date,month,year) %>%
  summarize(stepcount=sum(value)) %>%
  ggplot(aes(x=month, y=stepcount)) + 
  geom_boxplot(aes(fill=(year))) + 
  scale_fill_brewer() +
  theme_bw() +  
  theme(panel.grid.major = element_blank())
```

### chart 5
```{r,echo=FALSE}
#17 HKQuantityTypeIdentifierBodyMass====
bodymass=subset(df2,df2$type=="HKQuantityTypeIdentifierBodyMass")
plot(bodymass$endDate,bodymass$value)
lines(loess.smooth(bodymass$endDate,bodymass$value),col="blue")
```


### chart 6
```{r,echo=FALSE}
#18 HKQuantityTypeIdentifierBodyMassIndex====
bmi=subset(df2,df2$type=="HKQuantityTypeIdentifierBodyMassIndex")
plot(bmi$endDate,bmi$value,ylim=c(18,27))
lines(loess.smooth(bmi$endDate,bmi$value),col="blue")
abline(h=18.5,col="red")
abline(h=23.9,col="red")
```



Q2: The winter season
==========================================================================

***
Most of my health data comes from the winter season. Is there a reason for it? Across all dates, the most active energy (calories) I burned falls in the winter season. Is this one of the reasons?

From the monthly health data chart, I found that the winter season, which runs from November to April in Boston, MA, accounts for most of the data records. I wondered whether there is a reason for this, and the chart of active energy burned by month suggests one. When I exercise or play a sport, I burn more calories and the device records more data. Since active energy burned is high in the winter season, especially in December, I believe I did more sports in winter, which increased the number of records generated in those months. The chart of Apple exercise time by month and year supports this idea because it shows more exercise time in the winter season. Specifically, I ski and snowboard more in winter, and the chart of downhill snow sports distance supports this: the data for this category comes only from the winter season.
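
One way to check this reasoning is to put the number of records and the active energy side by side for each month. The following is a minimal sketch on made-up records. The data frame `records`, its simplified `type` labels, and the vector `winter_months` are hypothetical and stand in for the real HealthKit export.

```r
# Synthetic record-level data (made up for illustration only).
records <- data.frame(
  month = c("01", "01", "01", "02", "04", "07", "11", "12", "12", "12"),
  type  = c("active", "active", "heart", "active", "active",
            "active", "heart", "active", "active", "active"),
  value = c(12, 8, 70, 5, 3, 2, 65, 15, 20, 10)
)

# Winter season as defined in the text: November to April.
winter_months <- c("11", "12", "01", "02", "03", "04")

# Share of all records that fall in the winter months.
winter_share <- mean(records$month %in% winter_months)

# Number of records per month, and active energy per month.
counts <- table(records$month)
active <- subset(records, type == "active")
energy <- tapply(active$value, active$month, sum)

print(winter_share)
print(counts)
print(energy)
```

If the months with the most records are also the months with the most active energy, that is consistent with the explanation that exercise drives the record count, although it does not prove it.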

Row {data-height=600}
-------------------------------------
### chart 1
```{r, echo=FALSE}
#03 month====
month=summary(df2$month)
par(mfrow=c(1,2))
barplot(month,main="Month of Health Data",xlab="Month",ylab="",space=1)
pie(month,main="Month of Health Data")
```


### chart 2
```{r, echo=FALSE}
#bar chart active energy burned by month
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierActiveEnergyBurned') %>%
  group_by(month) %>%
  summarize(activeenergyburned=sum(value)) %>%
  ggplot(aes(x=month, y=activeenergyburned, fill=month)) + 
  geom_bar(position='dodge', stat='identity') +
  scale_y_continuous(labels = scales::comma) +
  theme_bw() +  
  theme(panel.grid.major = element_blank())
```

Row {data-height=600}
-------------------------------------
### chart 3
```{r, echo=FALSE}
#bar chart apple exercise time by month by year
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierAppleExerciseTime') %>%
  group_by(year,month) %>%
  summarize(appleexercisetime=sum(value)) %>%
  ggplot(aes(x=month, y=appleexercisetime, fill=year)) + 
  geom_bar(position='dodge', stat='identity') +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_brewer() +
  theme_bw() +  
  theme(panel.grid.major = element_blank())
```


### chart 4
```{r, echo=FALSE}
#bar chart distance downhill snow sports by month by year
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierDistanceDownhillSnowSports') %>%
  group_by(year,month) %>%
  summarize(distancedownhillsnowsports=sum(value)) %>%
  ggplot(aes(x=month, y=distancedownhillsnowsports, fill=year)) + 
  geom_bar(position='dodge', stat='identity') +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_brewer() +
  theme_bw() +  
  theme(panel.grid.major = element_blank())
```




Q3: Exercise and weekends
==========================================================================

***
Did I do well in daily exercise (not sports)? How did I spend my weekends from the perspective of health data?

To keep fit, one should not only do sports on weekends but also work out on weekdays. The data for weekdays (Monday to Friday) is therefore important for answering the first question. First, I look at active energy burned by day of week. It shows that, in total, I burned only around half as much energy on weekdays as on weekends. Second, the heat map of active energy burned by day of week and hour shows the same pattern. Next, the heat map of active energy burned by month and day of week also shows that I didn't do well in weekday exercise. It also answers the second question: in the summer, I didn't do well in weekend exercise either. In addition, the Apple exercise time tells the same story. These charts depict me as a guy who only works out on winter weekends (yeah, that's when I am enjoying the snow season at ski resorts).
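
When comparing weekdays with weekends, it helps to separate totals from per-day averages, because a week has five weekdays but only two weekend days. Below is a minimal sketch on made-up daily totals; the data frame `daily` and its numbers are hypothetical, not taken from the export.

```r
# Synthetic daily totals of active energy in kcal (made-up numbers).
daily <- data.frame(
  date = as.Date("2018-01-01") + 0:13,   # two full weeks, starting on a Monday
  kcal = c(300, 320, 310, 290, 330, 900, 850,
           310, 300, 305, 295, 340, 950, 800)
)

# POSIXlt weekday numbers: 0 = Sunday, ..., 6 = Saturday.
wday <- as.POSIXlt(daily$date)$wday
daily$weekend <- wday %in% c(0, 6)

# Total and mean energy for weekdays (FALSE) and weekend days (TRUE).
totals  <- tapply(daily$kcal, daily$weekend, sum)
per_day <- tapply(daily$kcal, daily$weekend, mean)

print(totals)
print(per_day)
```

In this made-up example the weekend total is only slightly higher than the weekday total, but the per-day average on weekends is almost three times the weekday average. A chart of totals by group can therefore understate how different a single weekend day is from a single weekday.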

Row {data-height=600}
-------------------------------------
### chart 1
```{r, echo=FALSE}
#barchart active energy burned by day of week
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierActiveEnergyBurned') %>%
  group_by(dayofweek) %>%
  summarize(activeenergyburned=sum(value)) %>%
  ggplot(aes(x=dayofweek, y=activeenergyburned)) + 
  geom_bar(position='dodge', stat='identity') +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_brewer() +
  theme_bw() +  
  theme(panel.grid.major = element_blank())
```


### chart 2
```{r, echo=FALSE}
#heatmap active energy burned by day of week by hour of day
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierActiveEnergyBurned') %>%
  group_by(date,dayofweek,hour) %>% 
  summarize(activeenergyburned=sum(value)) %>% 
  group_by(hour,dayofweek) %>% 
  summarize(activeenergyburned=sum(activeenergyburned)) %>% 
  arrange(desc(activeenergyburned)) %>%
  ggplot(aes(x=dayofweek, y=hour, fill=activeenergyburned)) + 
  geom_tile() + 
  scale_fill_continuous(labels = scales::comma, low = 'white', high = 'red') +
  theme_bw() + 
  theme(panel.grid.major = element_blank())
```


### chart 3
```{r, echo=FALSE}
#heatmap active energy burned by day of week by month
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierActiveEnergyBurned') %>%
  group_by(date,dayofweek,month) %>% 
  summarize(activeenergyburned=sum(value)) %>% 
  group_by(month,dayofweek) %>% 
  summarize(activeenergyburned=sum(activeenergyburned)) %>% 
  arrange(desc(activeenergyburned)) %>%
  ggplot(aes(x=month, y=dayofweek, fill=activeenergyburned)) + 
  geom_tile() + 
  scale_fill_continuous(labels = scales::comma, low = 'white', high = 'red') +
  theme_bw() + 
  theme(panel.grid.major = element_blank())
```


Row {data-height=600}
-------------------------------------
### chart 4
```{r, echo=FALSE}
#heatmap apple exercise time by day of week by hour of day
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierAppleExerciseTime') %>%
  group_by(date,dayofweek,hour) %>% 
  summarize(appleexercisetime=sum(value)) %>% 
  group_by(hour,dayofweek) %>% 
  summarize(appleexercisetime=sum(appleexercisetime)) %>% 
  arrange(desc(appleexercisetime)) %>%
  ggplot(aes(x=dayofweek, y=hour, fill=appleexercisetime)) + 
  geom_tile() + 
  scale_fill_continuous(labels = scales::comma, low = 'white', high = 'red') +
  theme_bw() + 
  theme(panel.grid.major = element_blank())
```


### chart 5
```{r, echo=FALSE}
#heatmap apple exercise time by day of week by month
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierAppleExerciseTime') %>%
  group_by(date,dayofweek,month) %>% 
  summarize(appleexercisetime=sum(value)) %>% 
  group_by(month,dayofweek) %>% 
  summarize(appleexercisetime=sum(appleexercisetime)) %>% 
  arrange(desc(appleexercisetime)) %>%
  ggplot(aes(x=month, y=dayofweek, fill=appleexercisetime)) + 
  geom_tile() + 
  scale_fill_continuous(labels = scales::comma, low = 'white', high = 'red') +
  theme_bw() + 
  theme(panel.grid.major = element_blank())
```





Q4: Basal Energy burned and body mass
==========================================================================

***
My basal energy burned seems pretty stable over the years, but my body mass (weight) has dropped since mid-2017. I assume that the lighter my body is, the less basal energy I burn. Is this true?

The first four graphs show basal energy burned by date, a box plot of daily basal energy burned, monthly basal energy burned, and a heat map of basal energy burned by day of week and hour. These graphs suggest that basal energy burned is pretty stable, although the heat map shows it is higher on weekends, when I was exercising, and in the morning around 8, when I was getting up and hurrying to school or work.

The next two graphs show my weight, which has been dropping since mid-2017. I calculated the daily mean of body mass and merged the body mass and basal energy burned data on the date field that both datasets share, hoping to find a relationship between the two variables.

The linear model shows a slope of only about 0.012, with an R-squared of about 0.20. The relationship is statistically significant (p < 0.001) but weak, and the small size of the slope partly reflects the different scales of the two variables. The plot with a loess curve tells the same story: basal energy burned is positively related to body mass, but only to a small extent. Still, it is always good to keep fit and control weight.
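
A regression slope depends on the units of the variables, while R-squared does not, so the strength of a relationship is better judged from R-squared or the correlation than from the size of the coefficient. The sketch below illustrates this on simulated data. The variables `basal_kcal` and `mass`, and the numbers used to generate them, are assumptions chosen for illustration and are not the author's measurements.

```r
set.seed(1)

# Simulated data (NOT the real export): daily basal energy and body mass.
basal_kcal <- rnorm(80, mean = 1850, sd = 120)
mass       <- 55 + 0.012 * basal_kcal + rnorm(80, sd = 3)

# Fit the same model twice; only the unit of the predictor changes.
fit_kcal      <- lm(mass ~ basal_kcal)
basal_100kcal <- basal_kcal / 100
fit_100       <- lm(mass ~ basal_100kcal)

slope_kcal <- unname(coef(fit_kcal)[2])
slope_100  <- unname(coef(fit_100)[2])
r2_kcal    <- summary(fit_kcal)$r.squared
r2_100     <- summary(fit_100)$r.squared

# The slope scales with the unit; R-squared stays the same and equals
# the squared correlation in a one-predictor model.
print(c(slope_kcal = slope_kcal, slope_100 = slope_100))
print(c(r2_kcal = r2_kcal, r2_100 = r2_100, cor_sq = cor(mass, basal_kcal)^2))
```

Rescaling the predictor by 100 multiplies the slope by 100 and leaves R-squared unchanged, which is why a slope near 0.01 on its own says little about how strong the relationship is.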

Row {data-height=600}
-------------------------------------
### chart 1
```{r, echo=FALSE}
#basal energy burned by date (daily totals)
basalenergy=subset(df2,df2$type=="HKQuantityTypeIdentifierBasalEnergyBurned")
#sum the records of each day; date is already a "YYYY-MM-DD" string
basalenergy1=aggregate(basalenergy$value,by=list(basalenergy$date),sum)
names(basalenergy1)=c("date","value")
basalenergy1$date=as.factor(basalenergy1$date)
plot(basalenergy1$date,basalenergy1$value,ylim=c(1400,2400))
#hard-coded horizontal reference line at 1,849 kcal
abline(h=1849,col="blue")
```


### chart 2
```{r, echo=FALSE}
boxplot(basalenergy1$value,ylim=c(1400,2400))
```


### chart 3
```{r, echo=FALSE}
#bar chart basal energy burned by month by year
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierBasalEnergyBurned') %>%
  group_by(year,month) %>%
  summarize(basalenergyburned=sum(value)) %>%
  ggplot(aes(x=month, y=basalenergyburned, fill=year)) + 
  geom_bar(position='dodge', stat='identity') +
  scale_y_continuous(labels = scales::comma) +
  scale_fill_brewer() +
  theme_bw() +  
  theme(panel.grid.major = element_blank())
```


Row {data-height=600}
-------------------------------------
### chart 4
```{r, echo=FALSE}
#heatmap basal energy burned by day of week by hour of day
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierBasalEnergyBurned') %>%
  group_by(date,dayofweek,hour) %>% 
  summarize(basalenergyburned=sum(value)) %>% 
  group_by(hour,dayofweek) %>% 
  summarize(basalenergyburned=sum(basalenergyburned)) %>% 
  arrange(desc(basalenergyburned)) %>%
  ggplot(aes(x=dayofweek, y=hour, fill=basalenergyburned)) + 
  geom_tile() + 
  scale_fill_continuous(labels = scales::comma, low = 'white', high = 'red') +
  theme_bw() + 
  theme(panel.grid.major = element_blank())
```


### chart 5
```{r, echo=FALSE}
#17 HKQuantityTypeIdentifierBodyMass====
bodymass=subset(df2,df2$type=="HKQuantityTypeIdentifierBodyMass")
par(mar=c(3,3,3,3))
boxplot(bodymass$value)
```


### chart 6
```{r,echo=FALSE}
plot(bodymass$endDate,bodymass$value)
lines(loess.smooth(bodymass$endDate,bodymass$value),col="blue")
```

Row {data-height=600}
-------------------------------------
### chart 7
```{r, echo=FALSE}
#daily mean body mass, used below to relate body mass to basal energy burned
bodymass1=aggregate(bodymass$value,by=list(as.character(bodymass$date)),mean) 
names(bodymass1)=c("date","bodymassvalue")
bodymass1$date=as.factor(bodymass1$date)
plot(bodymass1$date,bodymass1$bodymassvalue)
```


### chart 8
```{r, echo=FALSE}
basalenergy=subset(df2,df2$type=="HKQuantityTypeIdentifierBasalEnergyBurned")
#daily total basal energy burned
basalenergy2=aggregate(basalenergy$value,by=list(as.character(basalenergy$date)),sum)
names(basalenergy2)=c("date","basalvalue")
basalenergy2$date=as.factor(basalenergy2$date)
plot(basalenergy2$date,basalenergy2$basalvalue,ylim=c(1400,2400))
```


column {data-width=1000}
-------------------------------------
### chart 9
```{r, echo=FALSE}
#inner join of daily body mass and daily basal energy on date
relation=merge(bodymass1, basalenergy2, by = "date",all = FALSE)
relation$date=as.POSIXct(as.character(relation$date))
relation=relation[order(relation$date),]

#linear model; named lm1 so it does not mask the lm() function
lm1=lm(bodymassvalue~basalvalue,data=relation)
lm1
summary(lm1)
plot(relation$bodymassvalue,relation$basalvalue)
lines(loess.smooth(relation$bodymassvalue,relation$basalvalue),col="blue")
```



Q5: Skiing and snowboarding
==========================================================================

***
I've been using Slopes to track my skiing and snowboarding since 2018. I love the feeling of skiing first tracks, and I always try to get them. How did I do? I also love to catch the last chair. Did I make it? What are my skiing and snowboarding habits?

A ski resort usually opens at 8 am on weekends, and the last chair runs at 4 pm.

The chart shows that I skied only a few miles in the 8 am hour. On weekends, the heat map reveals that I am on the mountain earlier on Saturday than on Sunday. To get first tracks, I should therefore get up earlier, especially on Sunday.

Some of the resorts I visited offer night skiing, and some are in the Rockies, in a different time zone, so it is hard to tell whether I caught the last chair. However, the chart shows that I was still on the mountain after 4 pm.

The interactive 3D plot shows that I spent most of my days on the mountain on weekends; I skied Monday to Friday on only a few days. My snow season starts around November and ends in May. I usually cover more distance per day in the first half of the season. Normally I ski at least five miles in a day. However, one outlier shows only half a mile in one day. I remember I was teaching snowboarding that day, so I did only one run.
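
The first-track and last-chair questions could also be checked day by day instead of being read off the heat map. The following is a minimal sketch on made-up records, assuming a table with `date`, `hour`, and `value` columns shaped like `df2`. The data frame `ski_demo` and the cutoffs (first track means riding in the 8 am hour or earlier, last chair means riding in the 4 pm hour or later) are assumptions for illustration, and the time zone caveat above still applies.

```r
# Made-up hourly ski records (illustration only)
ski_demo <- data.frame(
  date  = c("2019-01-05", "2019-01-05", "2019-01-05", "2019-01-06", "2019-01-06"),
  hour  = c(8, 11, 15, 10, 16),
  value = c(0.8, 2.5, 1.9, 3.1, 0.6)
)

# Earliest and latest hour with recorded distance on each day
first_hour <- aggregate(hour ~ date, data = ski_demo, FUN = min)
last_hour  <- aggregate(hour ~ date, data = ski_demo, FUN = max)
daily <- merge(first_hour, last_hour, by = "date", suffixes = c("_first", "_last"))

# Assumed cutoffs: lifts open at 8 am, last chair at 4 pm (hour 16)
daily$first_track <- daily$hour_first <= 8
daily$last_chair  <- daily$hour_last >= 16
daily
```

Averaging `first_track` and `last_chair` over all days would give the share of ski days on which I made each one.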

Row {data-height=600}
-------------------------------------
### chart 1
```{r, echo=FALSE}
#bar chart distance downhill snow sports by hour
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierDistanceDownhillSnowSports') %>%
  group_by(hour) %>%
  summarize(distancedownhillsnowsports=sum(value)) %>%
  ggplot(aes(x=hour, y=distancedownhillsnowsports,fill=hour)) + 
  geom_bar(position='dodge', stat='identity') +
  scale_y_continuous(labels = scales::comma) +
  theme_bw() +  
  theme(panel.grid.major = element_blank())
```


### chart 2
```{r, echo=FALSE}
#heatmap distance downhill snow sports by day of week by hour of day
df2 %>%
  filter(type == 'HKQuantityTypeIdentifierDistanceDownhillSnowSports') %>%
  group_by(date,dayofweek,hour) %>% 
  summarize(distancedownhillsnowsports=sum(value)) %>% 
  group_by(hour,dayofweek) %>% 
  summarize(distancedownhillsnowsports=sum(distancedownhillsnowsports)) %>% 
  arrange(desc(distancedownhillsnowsports)) %>%
  ggplot(aes(x=dayofweek, y=hour, fill=distancedownhillsnowsports)) + 
  geom_tile() + 
  scale_fill_continuous(labels = scales::comma, low = 'white', high = 'red') +
  theme_bw() + 
  theme(panel.grid.major = element_blank())
```


column {data-width=600}
-------------------------------------
### chart 3
```{r, echo=FALSE}
#3D plot of daily distance downhill snow sports by month by day of week
distancedownhillsnowsports=subset(df2,df2$type=="HKQuantityTypeIdentifierDistanceDownhillSnowSports")
#total distance per day
distancedownhillsnowsports2=aggregate(distancedownhillsnowsports$value,
                                      by=list(as.character(distancedownhillsnowsports$date)),sum) 
names(distancedownhillsnowsports2)=c("date","value")
distancedownhillsnowsports2$date=as.POSIXct(distancedownhillsnowsports2$date)
#numeric month (1-12) and numeric day of week (1 = Sunday with lubridate's default week start)
distancedownhillsnowsports2$month=as.numeric(format(distancedownhillsnowsports2$date,"%m"))
distancedownhillsnowsports2$dayofweek=wday(distancedownhillsnowsports2$date)
plot_ly(distancedownhillsnowsports2, x = ~month, y = ~dayofweek, z = ~value,
        marker = list(color = ~value, colorscale = c('#FFE1A1', '#683531'), showscale = TRUE)) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'month'),
                      yaxis = list(title = 'day of week'),
                      zaxis = list(title = 'distance')),
         annotations = list(
           x = 1.13,
           y = 1.05,
           text = 'Distance',
           showarrow = FALSE
         ))
```



Q6: BMI and work out
==========================================================================

***
How was my BMI performance? Is there a relationship between BMI and the workout I did?

The first chart shows that my BMI rose above the upper limit of the normal range and fell back afterwards. Based on the linear regression models, I would say there is no strong relationship between BMI and exercise time, active energy burned, or step count. However, this might be because part of the data is missing and the sample I have is too small.
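
One way to gauge how much data the regressions actually rest on is to count the days that survive the inner join. This is a minimal sketch on made-up tables: `bmi_demo`, `steps_demo`, and their values are hypothetical and only mirror the shape of the daily tables built in the charts.

```r
# Made-up daily tables (illustration only)
bmi_demo <- data.frame(date = c("2018-03-01", "2018-03-08", "2018-03-15"),
                       bmivalue = c(24.1, 23.8, 23.5))
steps_demo <- data.frame(date = c("2018-03-01", "2018-03-02", "2018-03-15", "2018-03-16"),
                         stepcountvalue = c(8200, 10400, 6900, 12100))

# Inner join keeps only days that appear in both tables
matched <- merge(bmi_demo, steps_demo, by = "date", all = FALSE)

# Days available in each table versus days left for the regression
overlap <- c(bmi_days = nrow(bmi_demo),
             step_days = nrow(steps_demo),
             matched_days = nrow(matched))
overlap
```

If `matched_days` is small compared with the other two counts, the fitted line rests on few observations and should be read with caution.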

Row {data-height=600}
-------------------------------------
### chart 1
```{r echo=FALSE}
bmi=subset(df2,df2$type=="HKQuantityTypeIdentifierBodyMassIndex")
plot(bmi$endDate,bmi$value,ylim=c(18,27))
lines(loess.smooth(bmi$endDate,bmi$value),col="blue")
abline(h=18.5,col="red")
abline(h=23.9,col="red")
```


column {data-width=800 data-height=800}
-------------------------------------
### chart 2
```{r echo=FALSE}
#relationship between BMI and Apple exercise time
#daily mean BMI
bmi1=aggregate(bmi$value,by=list(as.character(bmi$date)),mean) 
names(bmi1)=c("date","bmivalue")
bmi1$date=as.factor(bmi1$date)
#daily total exercise time
appleexercise=subset(df2,df2$type=="HKQuantityTypeIdentifierAppleExerciseTime")
appleexercise2=aggregate(appleexercise$value,by=list(as.character(appleexercise$date)),sum) 
names(appleexercise2)=c("date","appleexercisetimevalue")
appleexercise2$date=as.factor(appleexercise2$date)
relation2=merge(bmi1, appleexercise2, by = "date",all = FALSE)
relation2$date=as.POSIXct(as.character(relation2$date))
relation2=relation2[order(relation2$date),]
#regress BMI (y axis) on exercise time (x axis) so that abline() matches the plot
lm2=lm(bmivalue~appleexercisetimevalue,data=relation2)
plot(relation2$appleexercisetimevalue,relation2$bmivalue)
abline(lm2)
lines(loess.smooth(relation2$appleexercisetimevalue,relation2$bmivalue),col="blue")
```


column {data-width=800 data-height=800}
-------------------------------------
### chart 3
```{r echo=FALSE}
#relationship between BMI and active energy burned
#daily mean BMI
bmi1=aggregate(bmi$value,by=list(as.character(bmi$date)),mean) 
names(bmi1)=c("date","bmivalue")
bmi1$date=as.factor(bmi1$date)
#daily total active energy burned
activeenergy=subset(df2,df2$type=="HKQuantityTypeIdentifierActiveEnergyBurned")
activeenergy2=aggregate(activeenergy$value,by=list(as.character(activeenergy$date)),sum) 
names(activeenergy2)=c("date","activeenergyvalue")
activeenergy2$date=as.factor(activeenergy2$date)
relation3=merge(bmi1, activeenergy2, by = "date",all = FALSE)
relation3$date=as.POSIXct(as.character(relation3$date))
relation3=relation3[order(relation3$date),]
#regress BMI (y axis) on active energy (x axis) so that abline() matches the plot
lm3=lm(bmivalue~activeenergyvalue,data=relation3)
plot(relation3$activeenergyvalue,relation3$bmivalue)
abline(lm3)
lines(loess.smooth(relation3$activeenergyvalue,relation3$bmivalue),col="blue")
```

column {data-width=800 data-height=800}
-------------------------------------
### chart 4
```{r echo=FALSE}
#relationship between BMI and step count
#daily mean BMI
bmi1=aggregate(bmi$value,by=list(as.character(bmi$date)),mean) 
names(bmi1)=c("date","bmivalue")
bmi1$date=as.factor(bmi1$date)
#daily total step count
stepcount=subset(df2,df2$type=="HKQuantityTypeIdentifierStepCount")
stepcount2=aggregate(stepcount$value,by=list(as.character(stepcount$date)),sum) 
names(stepcount2)=c("date","stepcountvalue")
stepcount2$date=as.factor(stepcount2$date)
relation4=merge(bmi1, stepcount2, by = "date",all = FALSE)
relation4$date=as.POSIXct(as.character(relation4$date))
relation4=relation4[order(relation4$date),]
#regress BMI (y axis) on step count (x axis) so that abline() matches the plot
lm4=lm(bmivalue~stepcountvalue,data=relation4)
plot(relation4$stepcountvalue,relation4$bmivalue)
abline(lm4)
lines(loess.smooth(relation4$stepcountvalue,relation4$bmivalue),col="blue")
```