Summary of data collection
I used a Fitbit to collect the majority of the data for this final project. The data include steps, calories burnt, exercise category, heart rate, calories from food intake, money spent on each meal, and stress level. For calorie-related data, I used MyFitnessPal to count the calories in each meal. The dataset covers one week of my schedule, recorded each day from 8 am to 9 pm. My goal is to better understand how stress level affects my day, alongside exercise and meals. I would like to answer the following five questions:
Question 1: How many steps did I take each day?
Question 2: How many calories did I burn each day?
Question 3: Which hours of the day, across the 7 days, burned the most calories?
Question 4: What is the correlation between stress level and calories burnt?
Question 5: How do stress level and calories burnt relate on each day?
Method: I compare totals across days, hours, and stress levels, and look for trends in the observations.
Tool: I mainly used ‘ggplot2’, a widely used plotting package in R.
Load libraries
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.4.4
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
library(magrittr)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.4.4
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:lubridate':
##
## intersect, setdiff, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(data.table)
## Warning: package 'data.table' was built under R version 3.4.4
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday,
## week, yday, year
library(plyr)
## -------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## -------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:lubridate':
##
## here
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.4.4
Load dataset
dataset <- read.csv('dataset.csv')
#dataset$Steps <- as.numeric(dataset$Steps)
#dataset$DATE <- as.Date(dataset$DATE, format='%m/%d/%Y')
#x <- dataset$DATE
#dataset$DATE <- x + hours(dataset$HOUR)
How many steps do I usually take in a typical work week?
I started collecting the data for this project on Sunday, which is Day 1. My step count was higher on Sunday because that is the only full day off I have each week, and I usually try to get a good workout done then. Day 7 is Saturday, when I still logged a good number of steps, since I usually go grocery shopping after work on Saturday. As the graph indicates, Monday, Wednesday, and Friday are low in steps because I have class on those evenings and those three days are also more packed with work tasks.
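The steps-per-day totals described above could be computed and plotted along the following lines. This is a minimal sketch, not the code used for the original graph: it assumes the dataset has a numeric `Steps` column and a `Day` column, and `steps_per_day` and the `example` rows are hypothetical names and placeholder values introduced only for illustration.

```r
library(ggplot2)

# Sum the steps recorded within each day.
# `Steps` and `Day` are assumed column names; `Steps` is assumed to be numeric.
steps_per_day <- function(df) {
  aggregate(Steps ~ Day, data = df, FUN = sum)
}

# Placeholder rows for illustration only; these are not measurements from the project.
example <- data.frame(
  Day   = c(1, 1, 2, 2),
  Steps = c(1200, 800, 300, 500)
)

daily_steps <- steps_per_day(example)

# Line chart of total steps per day, in the same style as the other plots
ggplot(data = daily_steps, aes(x = Day, y = Steps)) +
  geom_line(color = 'blue')
```

With the real data, the same function would be called as `steps_per_day(dataset)`, provided `Steps` was read in as a number rather than as text.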
How many calories are burnt each day?
As the graph indicates, the calories burnt correspond with the steps taken: on days when I work out more, the calories burnt increase.
dataset3 <- dataset[,c('Day','Calories.burnt')]
dt3 <- data.table(dataset3)
dt3 <- ddply(dataset3, c("Day"), summarize, Calories = sum(Calories.burnt))
#dataset2 %>%
#group_by(Day) %>%
#summarise(total = sum(Steps))
ggplot(data = dt3, aes(x = Day, y = Calories)) +
geom_line(color = 'red')
The hours within 7 days that burnt the most calories
I summed the calories burnt across the seven days to see at which hour of the day I usually burn the most calories. Since I usually work out in the afternoon, the calories burnt peak, on average, at around 4 pm.
dataset4 <- dataset[,c('Hour','Calories.burnt')]
dt4 <- data.table(dataset4)
dt4 <- ddply(dataset4, c("Hour"), summarize, Calories = sum(Calories.burnt))
#dataset2 %>%
#group_by(Day) %>%
#summarise(total = sum(Steps))
ggplot(data = dt4, aes(x = Hour, y = Calories)) +
geom_line(color = 'green')
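The chunk above plots totals per hour, while the text speaks of the hour with the highest calories "on average". If every hour is recorded once per day the two views have the same shape, but a per-hour mean can be sketched as follows. The helper `mean_calories_by_hour` and the `example` rows are hypothetical and use placeholder values; only the column names `Hour` and `Calories.burnt` come from the code above.

```r
# Average (rather than total) calories burnt for each hour of the day.
mean_calories_by_hour <- function(df) {
  aggregate(Calories.burnt ~ Hour, data = df, FUN = mean)
}

# Placeholder rows for illustration only; these are not measurements from the project.
example <- data.frame(
  Hour           = c(8, 8, 16, 16),
  Calories.burnt = c(100, 140, 400, 500)
)

hourly_mean <- mean_calories_by_hour(example)

# Hour of the day with the highest average calories burnt
peak_hour <- hourly_mean$Hour[which.max(hourly_mean$Calories.burnt)]
```

With the real data, `mean_calories_by_hour(dataset4)` would give the averages directly, and `peak_hour` would identify the hour to compare against the 4 pm peak noted above.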
The correlation between stress level and calories burnt
I rated my stress level from one to five on a typical workday. From the graph, we can see that Monday is usually the most stressful day. Also, when I feel stressed at work, I tend to stay still and not exercise as much. For stress levels 3, 4, and 5, I burnt fewer than 500 calories on average.
dataset5 <- dataset[,c('Stress.Level..I.V.','Calories.burnt')]
dt5 <- data.table(dataset5)
dt5 <- ddply(dataset5, c("Stress.Level..I.V."), summarize, Calories = sum(Calories.burnt))
#dataset2 %>%
#group_by(Day) %>%
#summarise(total = sum(Steps))
ggplot(data = dt5, aes(x = Stress.Level..I.V., y = Calories)) +
geom_line(color = 'purple')
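The plot above shows total calories at each stress level; it does not report a correlation coefficient. One way to put a number on the relationship is a Spearman rank correlation, which suits an ordinal 1 to 5 scale. This is a sketch under the assumption that `Stress.Level..I.V.` is stored as numbers; `stress_calorie_correlation` and the `example` rows are hypothetical and use placeholder values.

```r
# Spearman rank correlation between stress level and calories burnt.
# Assumes the stress column is numeric (1 to 5); rows with missing values are dropped.
stress_calorie_correlation <- function(df) {
  cor(df$Stress.Level..I.V., df$Calories.burnt,
      method = "spearman", use = "complete.obs")
}

# Placeholder rows for illustration only; these are not measurements from the project.
example <- data.frame(
  Stress.Level..I.V. = c(1, 2, 3, 4, 5),
  Calories.burnt     = c(900, 700, 400, 300, 200)
)

rho <- stress_calorie_correlation(example)
```

A value of `rho` near -1 would support the observation that calories burnt fall as stress rises, while a value near 0 would suggest little monotonic relationship. With the real data the call would be `stress_calorie_correlation(dataset)`.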
Stress level and calories burnt in each day
This interactive graph shows the relationship between stress level and calories burnt on each day. Monday is usually the most stressful day, with more work tasks. You can see that the calories burnt go down at higher stress levels.
ggplot(data = dataset, aes(x = Calories.burnt, y = Day)) +
  # Colour each point by stress level; factor() makes the colour scale discrete
  geom_jitter(aes(colour = factor(Stress.Level..I.V.))) +
  theme_few() +
  theme(legend.title = element_blank()) +
  labs(x = "Calories burnt", y = "Days", title = "Stress level vs. calories") +
  # Tick marks every 500 calories from 0 to 6000
  scale_x_continuous(breaks = seq(0, 6000, by = 500))