Date | Coffee | Sleeping | Watching_TV | Steps | Walking_and_Running | Workout | phone_hr | reading_hr | weekday |
---|---|---|---|---|---|---|---|---|---|
2019-05-17 | 1 | 8.00 | 1.258621 | 5177 | 2.20 | 0 | 4.107692 | 0.046875 | Friday |
2019-05-18 | 1 | 7.15 | 1.258621 | 8781 | 3.40 | 0 | 4.107692 | 0.000000 | Saturday |
2019-05-19 | 1 | 7.15 | 2.000000 | 3961 | 1.60 | 0 | 4.107692 | 0.000000 | Sunday |
2019-05-20 | 1 | 7.15 | 2.000000 | 1896 | 0.82 | 0 | 4.107692 | 0.000000 | Monday |
2019-05-21 | 0 | 7.00 | 1.000000 | 2842 | 1.00 | 0 | 4.107692 | 0.000000 | Tuesday |
2019-05-22 | 2 | 8.00 | 1.258621 | 3617 | 1.40 | 0 | 4.107692 | 0.500000 | Wednesday |
2019-05-23 | 1 | 6.00 | 1.258621 | 4108 | 1.40 | 1 | 4.107692 | 0.000000 | Thursday |
2019-05-24 | 1 | 7.00 | 0.500000 | 9617 | 3.80 | 0 | 4.107692 | 0.000000 | Friday |
2019-05-25 | 1 | 6.50 | 0.500000 | 9594 | 3.60 | 0 | 4.107692 | 0.000000 | Saturday |
2019-05-26 | 1 | 7.00 | 1.000000 | 73 | 0.03 | 0 | 3.866667 | 0.000000 | Sunday |
2019-05-27 | 1 | 7.00 | 1.000000 | 102 | 0.04 | 0 | 5.133333 | 0.000000 | Monday |
2019-05-28 | 1 | 7.00 | 1.000000 | 3182 | 1.20 | 0 | 4.300000 | 0.000000 | Tuesday |
2019-05-29 | 1 | 8.00 | 2.000000 | 42 | 0.01 | 0 | 3.616667 | 0.000000 | Wednesday |
2019-05-30 | 1 | 8.00 | 3.000000 | 940 | 0.37 | 0 | 4.050000 | 0.000000 | Thursday |
2019-05-31 | 2 | 8.00 | 2.000000 | 706 | 0.28 | 0 | 2.916667 | 0.000000 | Friday |
2019-06-01 | 2 | 6.00 | 2.000000 | 6827 | 2.80 | 0 | 4.107692 | 0.000000 | Saturday |
2019-06-02 | 1 | 7.00 | 1.000000 | 2623 | 1.00 | 0 | 4.107692 | 0.000000 | Sunday |
2019-06-03 | 2 | 8.00 | 2.000000 | 15949 | 6.70 | 0 | 4.107692 | 0.000000 | Monday |
2019-06-04 | 1 | 7.00 | 3.000000 | 7138 | 3.00 | 2 | 4.107692 | 0.000000 | Tuesday |
2019-06-05 | 2 | 7.00 | 1.500000 | 4802 | 1.80 | 0 | 4.107692 | 0.000000 | Wednesday |
2019-06-06 | 1 | 7.00 | 2.000000 | 69 | 0.02 | 0 | 4.107692 | 0.000000 | Thursday |
2019-06-07 | 1 | 6.00 | 1.000000 | 380 | 0.16 | 0 | 4.107692 | 0.000000 | Friday |
2019-06-08 | 2 | 7.00 | 1.000000 | 5238 | 1.90 | 0 | 4.107692 | 0.000000 | Saturday |
2019-06-09 | 1 | 8.00 | 2.000000 | 12997 | 5.20 | 0 | 4.107692 | 0.000000 | Sunday |
2019-06-10 | 1 | 7.00 | 0.000000 | 7377 | 3.10 | 0 | 4.107692 | 0.000000 | Monday |
2019-06-11 | 2 | 7.00 | 0.000000 | 12339 | 5.10 | 0 | 4.107692 | 0.000000 | Tuesday |
2019-06-12 | 1 | 6.50 | 0.000000 | 17754 | 7.00 | 0 | 3.766667 | 0.000000 | Wednesday |
2019-06-13 | 1 | 7.00 | 0.000000 | 14290 | 5.70 | 0 | 4.300000 | 0.000000 | Thursday |
2019-06-14 | 1 | 8.00 | 0.000000 | 9672 | 4.10 | 2 | 2.966667 | 0.000000 | Friday |
2019-06-15 | 1 | 8.00 | 0.000000 | 18365 | 8.00 | 0 | 3.733333 | 1.000000 | Saturday |
2019-06-16 | 1 | 6.00 | 0.000000 | 7701 | 3.10 | 0 | 6.733333 | 0.000000 | Sunday |
2019-06-17 | 0 | 7.00 | 3.000000 | 85 | 0.03 | 0 | 4.783333 | 0.000000 | Monday |
2019-06-18 | 2 | 7.50 | 2.000000 | 2944 | 1.10 | 0 | 3.233333 | 0.000000 | Tuesday |
---
title: "Quantified Self Project"
author: "Yueying Zhang"
date: "June 20 2019"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    vertical_layout: fill
    social: menu
    source: embed
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
# Load libraries (tidyverse already attaches ggplot2)
library(readxl)
library(tidyverse)
library(ggthemes)
library(plotly)
library(dygraphs)
library(xts)
library(latticeExtra)
#import the dataset
self = read_xlsx("ANLY512 quantified self.xlsx")
#take a look at the dataset and see their data types
glimpse(self)
## Pre-processing
# Convert some variables to the correct data type
self$Date = as.Date(self$Date)
self$`total_phone_use(mins)` = as.numeric(self$`total_phone_use(mins)`)
self$`Sleeping (hrs)`= as.numeric(self$`Sleeping (hrs)`)
glimpse(self)
# Rename the variables (new_name = old_name) in a single call
self = rename(self,
              Phone_Use = `total_phone_use(mins)`,
              Coffee = `Coffee(cup)`,
              Sleeping = `Sleeping (hrs)`,
              Watching_TV = `Watching TV (hrs)`,
              Reading = `Reading (mins)`,
              Walking_and_Running = `walking + running distance(mile)`,
              Workout = `Gym (hrs)`)
#Convert phone_use and reading variables into hrs
self = self %>% mutate(phone_hr = Phone_Use/60,reading_hr = Reading/60)
#Replace the missing value with the mean of each variable
self$Sleeping[is.na(self$Sleeping)]=mean(self$Sleeping, na.rm = TRUE)
self$Watching_TV[is.na(self$Watching_TV)]=mean(self$Watching_TV, na.rm = TRUE)
self$phone_hr[is.na(self$phone_hr)]=mean(self$phone_hr, na.rm = TRUE)
self$reading_hr[is.na(self$reading_hr)]=mean(self$reading_hr, na.rm = TRUE)
# Add a weekday variable to the dataset
self$weekday = weekdays(self$Date)
```
Introduction
=====================================
Row
-----------------------------------------------------------------------
### Summary
- This is a Quantified Self project aimed at assessing my daily wellness. Subjects studied here include:
sleeping hours, coffee use habits, steps and walking distance, and time use on various activities.
- Data collection: Data were collected both manually and digitally. Some variables, such as Steps, Walking + Running
Distance, and Phone Use time, were collected on my personal iPhone, while other variables, such as Coffee Use, Sleeping
Hour, Reading, and Workout, were collected and recorded manually. All data were recorded in an Excel sheet on my laptop, so
I used read_xlsx() from the readxl package to read the dataset into RStudio.
- Goal: The goal of this project is to assess my daily wellness. I chose my rest, my coffee use, and my activities to represent
wellness here. Based on the data I collected, 5 questions are asked in this analysis.
1. What is the pattern of my walking distance and steps during a week?
2. What is my coffee consumption habit?
3. What is my sleeping pattern?
4. Is my coffee use related to my sleeping time?
5. How is my time distributed a day? Is it healthy?
- Preprocessing: Data preprocessing is a must for every analysis since no dataset is clean and perfect. The data preparation here includes: making sure the data type of each variable is correct, renaming the variables for ease of use, converting the variables recorded in minutes to hours, and replacing missing values with the mean of that variable. The dataset below is the cleaned
dataset.
- Analysis Method: Visualization analysis is used for this project. Packages used here include: ggplot2, ggthemes, plotly, dygraphs, xts, latticeExtra.
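
The two preprocessing steps that change values (minutes to hours, and mean imputation) can be sketched on a toy data frame. The column names and numbers below are hypothetical and are not taken from the project's spreadsheet; `impute_mean()` is an illustrative helper, not part of any package.

```r
# Toy data with hypothetical values (not the project's spreadsheet)
toy <- data.frame(
  phone_mins = c(240, NA, 300),
  sleep_hrs  = c(7, NA, 8)
)

# Convert minutes to hours
toy$phone_hr <- toy$phone_mins / 60

# Illustrative helper: replace missing values with the mean of the observed values
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

toy$phone_hr  <- impute_mean(toy$phone_hr)
toy$sleep_hrs <- impute_mean(toy$sleep_hrs)
print(toy)
```

Mean imputation keeps the column mean unchanged but makes every filled-in day identical, which is why the same phone_hr value repeats on many rows of the cleaned dataset.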
Row
-----------------------------------------------------------------------
### Dataset with selected variables
```{r}
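# Select the displayed columns by name rather than by position
knitr::kable(self[, c("Date", "Coffee", "Sleeping", "Watching_TV", "Steps",
                      "Walking_and_Running", "Workout", "phone_hr", "reading_hr", "weekday")])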
```
MOVEMENT
=====================================
Row
-------------------------------------
### Steps and Walking + Running Distance by Date
```{r}
#Steps and distance by date
steps = xyplot(Steps ~ Date, self, type = "l" , lwd=2)
distance = xyplot(Walking_and_Running ~ Date, self, type = "l" , lwd=2)
doubleYScale(steps, distance, text = c("Steps", "Distance(miles)") , add.ylab2 = TRUE)
```
### Steps by Weekday
```{r}
#Steps by weekdays
ggplot(self) + geom_boxplot(aes(weekday, Steps)) +
ggtitle('Steps by Weekday') + labs(x = "Weekdays") +
theme_classic()
```
### Walking and Running Distance by Weekday
```{r}
#Walking and running distance by weekday
ggplot(self) + geom_boxplot(aes(weekday, Walking_and_Running)) +
ggtitle("Walking and Running Distance by Weekday") + labs(x = "Weekdays", y = "Walking and Running Distance") +
theme_classic()
```
Row {data-height=200}
-------------------------------------
### Summary
To see how many steps I take and how far I move each day, I created two line plots with the xyplot function and combined them into one plot with two y-axes. The doubleYScale function is used because Steps and Walking and Running distance have different units and scales. I am also interested in how my movement changes across weekdays, so boxplots are used as well.
From the plots above we can see:
- Steps and Walking and Running distance are closely related and show the same pattern.
- There is no clear evidence that I walked more or less over time during the 33 days recorded.
- Saturday is my most active day, with the highest average steps and walking distance, while I tend to move less on Monday and Thursday.
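
As a rough numeric check of the weekday comparison, the sketch below recomputes the mean and median steps per weekday from the dates and step counts shown in the cleaned dataset table. It is only an illustration, not the code behind the dashboard plots. The weekday names are built from `as.POSIXlt()$wday` so that the result does not depend on the system locale, and the factor levels are ordered Monday to Sunday (an ordering chosen here for readability).

```r
# Dates and step counts copied from the cleaned dataset table
dates <- seq(as.Date("2019-05-17"), as.Date("2019-06-18"), by = "day")
steps <- c(5177, 8781, 3961, 1896, 2842, 3617, 4108, 9617, 9594, 73, 102,
           3182, 42, 940, 706, 6827, 2623, 15949, 7138, 4802, 69, 380,
           5238, 12997, 7377, 12339, 17754, 14290, 9672, 18365, 7701, 85, 2944)

# as.POSIXlt()$wday is 0 for Sunday through 6 for Saturday, regardless of locale
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
weekday <- factor(day_names[as.POSIXlt(dates)$wday + 1],
                  levels = c(day_names[2:7], day_names[1]))  # Monday first

# Mean and median steps per weekday
mean_steps   <- tapply(steps, weekday, mean)
median_steps <- tapply(steps, weekday, median)
print(round(rbind(mean = mean_steps, median = median_steps)))
```

With only four or five observations per weekday, a single very active or very inactive day can move these averages a lot, so the ranking should be read as a tendency rather than a firm result.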
COFFEE USE
=====================================
Row
-------------------------------------
### Coffee Use Distribution
```{r}
# pie() is base graphics, so the title is passed via `main` instead of being added with `+`
pie(table(self$Coffee), main = "Coffee Use Distribution")
```
### Coffee use by Date
```{r}
# geom_jitter() already draws the points; an extra geom_point() layer would plot each day twice
ggplot(self, aes(Date, Coffee)) +
  geom_jitter(col = "brown") +
  geom_smooth(col = "red", fill = "pink") +
  labs(title = "Coffee use by Date") +
  theme_classic()
```
### Coffee Use by Weekday
```{r}
ggplot(self) + geom_boxplot(aes(weekday, Coffee)) +
ggtitle("Coffee Use by Weekday") +
labs(y = "Coffee (Cup)") +
theme_classic()
```
Row {data-height=200}
-------------------------------------
### Summary
I want to understand my coffee habit. First, I created a pie chart to see how often I drink 0, 1, or 2 cups of coffee. Then, to see how much coffee I drink over time, I used a jittered ggplot2 scatter plot with a smoothed trend line. A boxplot is used again to see whether my coffee use changes across weekdays.
Here are some insights:
- On about 70% of the days (23 of 33) I drank one cup of coffee.
- Wednesday is the day I drink the most coffee on average.
- I drank more coffee from the end of May until the beginning of June, then fell back to one cup per day.
- I still drink coffee on weekends, which means I've built a coffee habit even when I am not at work.
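
The shares behind the pie chart can be computed directly. This is a small sketch using the Coffee column as it appears in the cleaned dataset table; it is not the dashboard's own code.

```r
# Cups of coffee per day, copied from the cleaned dataset table
coffee <- c(1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 1,
            1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 1,
            2, 1, 1, 2, 1, 1, 1, 1, 1, 0, 2)

# Number of days per cup count, and the same as a percentage of all days
counts <- table(coffee)
shares <- round(100 * prop.table(counts), 1)
print(counts)
print(shares)
```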
SLEEPING
=====================================
Row
-------------------------------------
### Sleeping Hour Distribution
```{r}
# pie() is base graphics, so the title is passed via `main` instead of being added with `+`
pie(table(self$Sleeping), main = "Sleeping Hour Distribution")
```
### Sleeping Hour Per Day
```{r}
ggplot(self) + geom_point(aes(Date, Sleeping, color = weekday))+
theme_classic() +
labs(title = "Sleeping Hour Per Day", y = "Sleeping Hour")
```
### Sleeping Hour Per Weekday
```{r}
ggplot(self) + geom_boxplot(aes(weekday, Sleeping))+
theme_classic() +
labs(title = "Sleeping Hour Per Weekday", y = "Sleeping Hour")
```
Row {data-height=200}
------------------------------------------
### Summary
A pie chart is used first to see the distribution of sleeping hours over the 33 days recorded. To see whether the weekday affects my sleeping hours, I used a ggplot2 scatter plot (geom_point) colored by the weekday variable, as well as a boxplot.
- On more than 75% of the days I slept at least 7 hours, but there were still several nights when I slept only 6 hours.
- There is no noticeable change in my sleeping hours over the 33 days.
- Friday is usually the day I sleep the most, followed by Wednesday.
- Saturday is usually the day I sleep the least.
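
A pie chart of raw sleeping hours gets one slice per distinct value, including the mean-imputed 7.15. One possible alternative, sketched below with the Sleeping column from the cleaned dataset table, is to group the hours into bins first. The bin edges and labels are a choice made for this illustration only.

```r
# Sleeping hours per day, copied from the cleaned dataset table
sleeping <- c(8, 7.15, 7.15, 7.15, 7, 8, 6, 7, 6.5, 7, 7,
              7, 8, 8, 8, 6, 7, 8, 7, 7, 7, 6,
              7, 8, 7, 7, 6.5, 7, 8, 8, 6, 7, 7.5)

# Group into left-closed bins: [0, 7), [7, 8), [8, Inf)
bins <- cut(sleeping, breaks = c(0, 7, 8, Inf), right = FALSE,
            labels = c("under 7 h", "7 to under 8 h", "8 h or more"))
print(table(bins))

# Share of days with at least 7 hours, and number of 6-hour nights
share_7_plus <- mean(sleeping >= 7)
nights_at_6  <- sum(sleeping == 6)
print(round(100 * share_7_plus, 1))
print(nights_at_6)
```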
SLEEPING VS COFFEE
=====================================
Row
-------------------------------------
### Sleeping Hour vs Coffee Use
```{r}
# geom_jitter() already draws the points; an extra geom_point() layer would plot each day twice
ggplotly(ggplot(self, aes(x = Sleeping, y = Coffee)) +
  geom_jitter(col = "brown") +
  geom_smooth(col = "red", fill = "pink") +
  theme_classic() +
  ggtitle("Sleeping Hour vs Coffee Use") +
  labs(x = "Sleeping Hours", y = "Coffee Use (Cup)"))
```
### Coffee Use vs Sleeping Hour
```{r}
coffee = xyplot(Coffee ~ Date, self, type = "l" , lwd=2)
sleeping = xyplot(Sleeping ~ Date, self, type = "l" , lwd=2)
doubleYScale(coffee, sleeping, text = c("Coffee Use", "Sleeping Hour") , add.ylab2 = TRUE)
```
Row {data-height=200}
------------------------------------------
### Summary
Excessive coffee use might cause difficulty sleeping, so I want to see whether coffee use influences my sleeping hours. A ggplot2 scatter plot with a geom_smooth trend line shows the relationship between my coffee use and sleep. I also use the doubleYScale function to create a plot with two y-axes so I can see the patterns of coffee use and sleeping hours at the same time.
From the graphs above:
- My sleeping hours and coffee use show no linear relationship and no other clear pattern.
- I tend to drink more coffee when I sleep 7 to 7.5 hours, and less after sleeping 7.5 hours or more.
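
To put a number next to the plots, the sketch below compares mean sleeping hours across cup counts and computes a Spearman rank correlation, using the Coffee and Sleeping columns from the cleaned dataset table. The rank correlation is a technique added here for illustration and is not part of the original analysis. It pairs the coffee and sleep values recorded on the same row, and the data do not say whether the sleep came before or after the coffee.

```r
# Coffee (cups) and sleeping hours per day, copied from the cleaned dataset table
coffee <- c(1, 1, 1, 1, 0, 2, 1, 1, 1, 1, 1,
            1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 1,
            2, 1, 1, 2, 1, 1, 1, 1, 1, 0, 2)
sleeping <- c(8, 7.15, 7.15, 7.15, 7, 8, 6, 7, 6.5, 7, 7,
              7, 8, 8, 8, 6, 7, 8, 7, 7, 7, 6,
              7, 8, 7, 7, 6.5, 7, 8, 8, 6, 7, 7.5)

# Mean sleeping hours for days with 0, 1, and 2 cups
mean_sleep <- tapply(sleeping, coffee, mean)
print(round(mean_sleep, 2))

# Spearman rank correlation (handles the many tied values better than Pearson)
rho <- cor(coffee, sleeping, method = "spearman")
print(round(rho, 2))
```

With only two 0-cup days and eight 2-cup days, differences between the group means are within what a few unusual nights could produce.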
TIME USE
=====================================
Row
-------------------------------------
### Time Use Per Day
```{r}
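# "other" is whatever remains of the 24 hours after the tracked activities
timeuse = self %>% mutate(other = 24 - Sleeping - Watching_TV - phone_hr - reading_hr - Workout)
# Select the six activity columns by name rather than by position
timeuse = xts(timeuse[, c("Sleeping", "Watching_TV", "phone_hr", "Workout", "reading_hr", "other")], order.by = timeuse$Date)
dygraph(timeuse, main = "Time Use By Date") %>% dyAxis("y", label = "Hour") %>% dyOptions(stackedGraph = TRUE)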
```
Row {data-height=200}
------------------------------------------
### Summary
I want to see how I spent my days over the past month, so a dygraph is used here. I created a new variable called "Other" to represent the remaining time in each day that is not covered by the tracked activities. A stacked graph shows the time spent per day and how it changes.
From the graph we can see that:
- Six activities are taken into account in this analysis: Sleeping, Watching TV, Phone Use, Workout, Reading, and Other.
- I spent the least time on reading and workouts in general, and the most time on Other.
- Surprisingly, I spent around 4 hours on my phone every day, which is a huge amount of time.
- TV watching is another problem, taking up around 1.3 hours per day on average.
- Time use on all activities was fairly stable over the 33 days, which indicates this has been my lifestyle.
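
The "Other" calculation can be sketched on three rows taken from the cleaned dataset table (2019-06-04, 2019-06-14, and 2019-06-15). The check at the end is an addition for illustration: it confirms that the stacked hours add up to 24 and that "Other" never goes negative.

```r
# Three days copied from the cleaned dataset table (all values in hours)
day <- data.frame(
  Date        = as.Date(c("2019-06-04", "2019-06-14", "2019-06-15")),
  Sleeping    = c(7, 8, 8),
  Watching_TV = c(3, 0, 0),
  phone_hr    = c(4.107692, 2.966667, 3.733333),
  reading_hr  = c(0, 0, 1),
  Workout     = c(2, 2, 0)
)

# "other" is the part of the day not covered by the tracked activities
tracked <- c("Sleeping", "Watching_TV", "phone_hr", "reading_hr", "Workout")
day$other <- 24 - rowSums(day[tracked])

# Sanity checks: every day sums to 24 hours and "other" is never negative
total <- rowSums(day[c(tracked, "other")])
print(day)
print(total)
print(all(day$other >= 0))
```

This assumes the tracked activities do not overlap. If, for example, phone use happens while watching TV, the tracked total is overstated and "Other" is understated.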
Conclusion
=====================================
Row
------------------------------
### Conclusion
At the beginning of this project I mentioned 5 research questions. Here are the answers:
1. What is the pattern of my walking distance and steps during a week?
- Saturday is when I walk the farthest and take the most steps, while Monday and Thursday are my laziest days.
2. What is my coffee consumption habit?
- I drink coffee almost every day, usually about 1 cup. Wednesday is the day I drink the most.
3. What is my sleeping pattern?
- On most days I sleep at least 7 hours. I tend to sleep more on Friday and Wednesday, and less on Saturday.
4. Is my coffee use related to my sleeping time?
- There is no clear relationship between my coffee use and sleeping hours.
5. How is my time distributed a day? Is it healthy?
- My time spent on each activity is fairly stable over time. Other activities take up most of the time, while reading and working out take up the least.
Row
------------------------------
### Takeaway
I have to admit I was a little shocked when I saw the results of this project. The Quantified Self data revealed some lifestyle and wellness issues of mine: nights with only 6 hours of sleep, not enough time spent on reading and working out, too much phone use and entertainment, etc. But knowing the problem is the first step to fixing it, so I'd love to make some changes to my lifestyle.
1. Set a sleep alarm and record my sleeping hours every day. Make sure I get at least 7 hours of sleep.
2. Set a phone use limit on my iPhone and reduce my daily phone use to 2-3 hours.
3. Set a TV watching limit; don't be a couch potato.
4. Make a workout plan and encourage myself to go to the gym at least 2 times per week.
5. Set up a reading time and try to do some reading every day, with a physical book instead of staring at screens.
6. Record my other activities each day and live a self-aware life.