---
title: "The Quantified Self"
output:
  flexdashboard::flex_dashboard:
    orientation: rows
    social: menu
    source: embed
    vertical_layout: fill
  html_document:
    df_print: paged
---
Overview
=====================================
**Overview**

The Quantified Self (QS) is a movement that aims to leverage the synergy of wearables, analytics, and “Big Data”. It exploits the ease and convenience of data acquisition through the internet of things (IoT) to feed a growing obsession with personal informatics and quotidian data. The website http://quantifiedself.com/ is a great place to start learning more about the QS movement. The value of QS for our class is that its core mandate is to visualize, and generate questions and insights about, a topic of immense importance to most people: themselves. It also produces a wealth of data in a variety of forms. Designing this project around the QS movement therefore makes sense, because it lets you be the data and question provider, the data analyst, the vis designer, and the end user. You will be in the unique position of being able to provide feedback and direction at every point along the data visualization/analysis life cycle.

**Objective**

Develop a visualization dashboard based on a series of data about your own life. The data can range from daily sleep regimes, TV shows watched, types of food eaten, spending habits, commute times to work, and travel habits to blood pressure and nutrient intake. The amount of data you collect will depend on your specified objectives.

Ultimately, the project must meet these key objectives:

1. You must provide a written summary of your data collection, analysis, and visualization methods, including why you chose your methods and what tools you used.
2. Your summary must outline at least 5 questions that can be evaluated using a data-driven approach. These questions should be more than just “How many miles did I run?”, although a couple of your questions could be stated that way.
3. You must collect, manage, and store the data necessary for this visualization.
4. You must design and create an appropriate set of visualizations (try not to use just one type of visualization) within a dashboard/storyboard that provides insight into your specified questions, with at least one interactive graphical element.

**Methods**

*Getting data*

Over the course of one month, I collected data from the Health app on my iPhone to track my daily steps and sleep. The steps data include the number of steps taken each day, while the sleep data include the start and end times of each sleep session and the total sleep time. I analyzed and visualized these data in R, using base R, dplyr, and ggplot2 inside a flexdashboard, to better understand my daily physical activity and sleep patterns.

Alternatively, and more professionally, many R packages let you pull data from online sources directly within R. See here for a great primer on accessing NOAA data with R; it is also a good introduction to API keys and their use.
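The dashboard code below reads a file named health_data.csv and relies on a date column in month/day/year text, a steps column, and two sleep columns, total_sleep_time and total_sleep_time_mins. The export steps that produced this file are not described here, so the following is only a sketch, assuming that layout, of checking such a file before plotting. The helper name check_health_data and the two inline rows are hypothetical, not real measurements.

```r
# Hypothetical helper: verify the columns the dashboard uses and parse dates
check_health_data <- function(df) {
  required <- c("date", "steps", "total_sleep_time", "total_sleep_time_mins")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Same date format the dashboard chunks assume
  df$date <- as.Date(df$date, format = "%m/%d/%Y")
  if (anyNA(df$date)) stop("Some dates are not in month/day/year format")
  df
}

# Two made-up rows in the assumed layout (illustration only)
example_csv <- "date,steps,total_sleep_time,total_sleep_time_mins
03/01/2023,3120,410,410
03/02/2023,2875,385,385"
health_example <- check_health_data(read.csv(text = example_csv))
str(health_example)
```

With the real file, check_health_data(read.csv("health_data.csv")) would stop early with a readable message if a column is missing or a date fails to parse, instead of producing empty or misleading plots.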

**Questions**

1. What is the average number of steps taken per day over the course of the month? How does this vary from day to day?
2. Are there any patterns in the times of day when the most steps are taken? Are there certain days of the week where more steps are taken than others?
3. What is the average duration of sleep per night over the course of the month? How does this vary from night to night?
4. Are there any patterns in the times of day when the most sleep is obtained? Are there certain days of the week where more sleep is obtained than others?
5. Is there a correlation between the number of steps taken in a day and the amount of sleep obtained that night?

Daily Steps
=====================================
### **Summary**

The code converts the date column to Date format, totals the steps recorded on each date with aggregate(), and takes the mean of those daily totals. It then uses ggplot2 to draw a line graph of daily steps over the month, with a red linear trend line.

The daily average over the month is approximately 3000 steps, with large fluctuations from day to day.
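"Average steps per day" is the mean of the daily totals, which is why the code sums steps by date before averaging. If the export contains more than one step record per date, averaging the raw records would give a per-record mean instead. A small sketch with hypothetical records (two readings on the first day, one on the second) shows the difference:

```r
# Hypothetical step records: two readings on 03/01, one on 03/02
records <- data.frame(date  = c("03/01/2023", "03/01/2023", "03/02/2023"),
                      steps = c(1000, 2000, 4000))
records$date <- as.Date(records$date, format = "%m/%d/%Y")

# Sum the readings for each date, then average the daily totals
daily <- aggregate(steps ~ date, data = records, FUN = sum)
mean_of_daily_totals <- mean(daily$steps)   # (3000 + 4000) / 2 = 3500

# Averaging the raw records instead gives a per-reading mean
mean_of_records <- mean(records$steps)      # 7000 / 3, about 2333
```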

###

```{r}
# Load required libraries
library(ggplot2)

# Read the data from a csv file
health_data <- read.csv("health_data.csv")

# Convert date column to date format
health_data$date <- as.Date(health_data$date, format = "%m/%d/%Y")

# Total the steps recorded on each date
daily_steps <- aggregate(steps ~ date, data = health_data, FUN = sum)

# Average of the daily totals over the month
avg_daily_steps <- mean(daily_steps$steps)

# Plot daily steps with a linear trend line; the mean is shown in the subtitle
ggplot(daily_steps, aes(x = date, y = steps)) +
  geom_line(color = "blue") +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red") +
  labs(title = "Daily Steps Trend",
       subtitle = paste("Mean:", round(avg_daily_steps), "steps per day"),
       x = "Date",
       y = "Steps") +
  theme_minimal()
```


Day of the Week Pattern
===================================================

### **Summary**

The code adds a day_of_week column holding the weekday name of each date, averages the steps for each weekday with aggregate(), and draws a base R bar chart of those weekday averages.

Based on the data collected, the average number of steps taken per day varied throughout the week. Fridays recorded the highest average, while Wednesdays had the lowest.
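weekdays() returns names, so a bar chart sorts them alphabetically by default (Friday, Monday, Saturday, ...). A sketch of ordering them Monday to Sunday without hard-coding English names, using 2024-01-01 (a Monday) as a reference date; the two weeks of daily totals are hypothetical values for illustration only:

```r
# Weekday names in the current locale, Monday first (2024-01-01 was a Monday)
week_order <- weekdays(as.Date("2024-01-01") + 0:6)

# Hypothetical daily totals for two weeks starting Friday 2024-03-01
daily <- data.frame(date  = as.Date("2024-03-01") + 0:13,
                    steps = seq(100, 1400, by = 100))

# Factor with Monday-to-Sunday levels, then the mean daily total per weekday
daily$day_of_week <- factor(weekdays(daily$date), levels = week_order)
steps_by_day <- tapply(daily$steps, daily$day_of_week, mean)
steps_by_day
```

Because the levels come from weekdays() itself, the ordering still works if R runs in a non-English locale.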

###

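```{r}
# Weekday of each daily total (daily_steps comes from the Daily Steps chunk)
daily_steps$day_of_week <- weekdays(daily_steps$date)

# Average steps per day for each day of the week
steps_by_day <- aggregate(steps ~ day_of_week, data = daily_steps, FUN = mean)

# Order bars Monday to Sunday (2024-01-01 was a Monday; names follow the locale)
week_order <- weekdays(as.Date("2024-01-01") + 0:6)
steps_by_day <- steps_by_day[order(match(steps_by_day$day_of_week, week_order)), ]

# Create bar plot
barplot(steps_by_day$steps, names.arg = steps_by_day$day_of_week,
        xlab = "Day of the Week", ylab = "Average Steps",
        main = "Average Steps by Day of the Week")
```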


Average sleep time per night
=====================================

### **Summary**

The code calculates the average sleep duration across all nights, adds a night_of_week column, groups the data by night of the week with dplyr, and computes the average sleep duration for each night. It then draws a bar chart with the night of the week on the x-axis and the average sleep duration in minutes on the y-axis.

On average, I slept for about 400 minutes (roughly 6.7 hours) per night.
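Question 3 also asks how sleep varies from night to night, which a mean alone does not show. A minimal sketch of a spread summary; the helper name sleep_spread is hypothetical, and the input is toy data rather than the collected values:

```r
# Hypothetical helper: centre and spread of nightly sleep in minutes
sleep_spread <- function(minutes) {
  minutes <- minutes[!is.na(minutes)]  # skip nights with no recorded sleep
  c(mean = mean(minutes), sd = sd(minutes),
    min = min(minutes), max = max(minutes))
}

# Toy input for illustration only
toy_summary <- sleep_spread(c(360, 400, NA, 440))
toy_summary
# mean 400, sd 40, min 360, max 440
```

Applied to health_data$total_sleep_time_mins, the same call would report the night-to-night variation alongside the average.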

###

```{r}
library(dplyr)
library(ggplot2)

# Load the data from the file and parse the dates
health_data <- read.csv("health_data.csv")
health_data$date <- as.Date(health_data$date, format = "%m/%d/%Y")

# Calculate the average duration of sleep per night
avg_sleep_duration <- mean(health_data$total_sleep_time_mins, na.rm = TRUE)

# Night of the week as a factor ordered Monday to Sunday
# (2024-01-01 was a Monday; weekday names follow the system locale)
week_order <- weekdays(as.Date("2024-01-01") + 0:6)
health_data$night_of_week <- factor(weekdays(health_data$date), levels = week_order)

# Group the data by night of the week and calculate the average sleep duration
avg_sleep_by_night <- health_data %>%
  group_by(night_of_week) %>%
  summarize(avg_sleep = mean(total_sleep_time_mins, na.rm = TRUE))

# Bar chart of average sleep by night; the overall mean is shown in the subtitle
ggplot(avg_sleep_by_night, aes(x = night_of_week, y = avg_sleep)) +
  geom_col(fill = "steelblue") +
  labs(title = "Average Sleep Duration by Night of the Week",
       subtitle = paste("Overall mean:", round(avg_sleep_duration), "minutes per night"),
       x = "Night of the Week",
       y = "Average Sleep Duration (minutes)")
```


Sleep Duration by Night of the Week
=====================================

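### **Summary**

This plot shows the distribution of sleep duration for each night of the week as a box-and-whisker plot. The box spans the middle 50% of the data (first to third quartile), with a line at the median. The whiskers extend to the most extreme values that lie within 1.5 times the interquartile range (IQR) of the box, and points beyond the whiskers are drawn individually as outliers.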
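The whisker rule can be made concrete with a small worked example. This sketch follows the 1.5 times IQR rule that ggplot2 documents for geom_boxplot(), with quartiles from quantile()'s default method; the eight sleep durations are hypothetical:

```r
# Hypothetical nightly sleep durations in minutes (toy data)
x <- c(300, 350, 380, 400, 410, 420, 450, 600)

# Box edges: first and third quartiles (quantile() default, type 7)
q <- quantile(x, c(0.25, 0.75), names = FALSE)   # 372.5 and 427.5
iqr <- q[2] - q[1]                                # 55

# Fences 1.5 IQR beyond the box
lower_fence <- q[1] - 1.5 * iqr                   # 290
upper_fence <- q[2] + 1.5 * iqr                   # 510

# Whiskers stop at the most extreme points inside the fences
whisker_low  <- min(x[x >= lower_fence])          # 300
whisker_high <- max(x[x <= upper_fence])          # 450

# Points beyond the fences are drawn individually as outliers
outliers <- x[x < lower_fence | x > upper_fence]  # 600
```

So a single unusually long night (600 minutes here) appears as a lone point rather than stretching the whisker.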

###
```{r}
library(ggplot2)

# Load the data from the file
health_data <- read.csv("health_data.csv")

# Convert total_sleep_time from character to integer (plotted as minutes)
health_data$total_sleep_time <- as.integer(health_data$total_sleep_time)

health_data$date <- as.Date(health_data$date, format = "%m/%d/%Y")

# Night of the week as a factor ordered Monday to Sunday
# (2024-01-01 was a Monday; weekday names follow the system locale)
week_order <- weekdays(as.Date("2024-01-01") + 0:6)
health_data$night_of_week <- factor(weekdays(health_data$date), levels = week_order)

# Create a box plot of sleep duration by night of the week
ggplot(health_data, aes(x = night_of_week, y = total_sleep_time)) +
  geom_boxplot() +
  labs(title = "Sleep Duration by Night of the Week",
       x = "Night of the Week",
       y = "Sleep Duration (minutes)")
```



Relationship between sleep duration and steps taken
=====================================

### **Summary**

The code converts the total_sleep_time column from character to integer and then uses ggplot2 to draw a scatter plot with steps on the x-axis, total_sleep_time on the y-axis, and each point colored by its date. The title and axis labels are added with the labs() function.

###
```{r}
library(ggplot2)

# Load the data from the file
health_data <- read.csv("health_data.csv")

# Convert total_sleep_time from character to integer (plotted as minutes)
health_data$total_sleep_time <- as.integer(health_data$total_sleep_time)

# Parse dates so the color scale runs continuously through the month
health_data$date <- as.Date(health_data$date, format = "%m/%d/%Y")

# Create a scatter plot of sleep duration vs. steps taken with color based on the date
ggplot(health_data, aes(x = steps, y = total_sleep_time, color = date)) +
  geom_point() +
  labs(title = "Sleep Duration vs. Steps Taken",
       x = "Steps Taken",
       y = "Sleep Duration (minutes)",
       color = "Date")
```
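Objective 4 calls for at least one interactive graphical element, and the plots in this dashboard are static. One way to add interactivity, assuming the plotly package is installed, is to pass a ggplot object to plotly's ggplotly(), which returns an htmlwidget with hover tooltips, zoom, and pan that flexdashboard can display. A minimal sketch; example_data is hypothetical stand-in data, not the collected data:

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in for health_data, used only to show the pattern
example_data <- data.frame(
  date = as.Date("2024-03-01") + 0:4,
  steps = c(2500, 4200, 3100, 6800, 5000),
  total_sleep_time = c(380, 420, 395, 450, 410)
)

# Static ggplot2 scatter plot
p <- ggplot(example_data, aes(x = steps, y = total_sleep_time)) +
  geom_point() +
  labs(title = "Sleep Duration vs. Steps Taken",
       x = "Steps Taken", y = "Sleep Duration (minutes)")

# Convert to an interactive htmlwidget (hover tooltips, zoom, pan)
interactive_plot <- ggplotly(p)
```

In the dashboard, the same call could wrap the existing scatter plot, for example by assigning the ggplot() result to p and ending the chunk with ggplotly(p); how plotly displays the date color scale may differ from the static version.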


Conclusions
=====================================
1. What is the average number of steps taken per day over the course of the month? How does this vary from day to day?

The daily average over the month is approximately 3000 steps, with large fluctuations from day to day.

2. Are there any patterns in the times of day when the most steps are taken? Are there certain days of the week where more steps are taken than others?

Based on the data collected, the average number of steps taken per day varied throughout the week: Fridays recorded the highest average and Wednesdays the lowest. The analysis works with dates only, without time of day, so the time-of-day part of this question is not addressed.

3. What is the average duration of sleep per night over the course of the month? How does this vary from night to night?

On average, I slept for about 400 minutes (roughly 6.7 hours) per night.

4. Are there any patterns in the times of day when the most sleep is obtained? Are there certain days of the week where more sleep is obtained than others?

Based on the collected data, I slept the most on Thursday nights.

5. Is there a correlation between the number of steps taken in a day and the amount of sleep obtained that night?

The scatter plot shows sleep duration against steps taken. There is no clear linear relationship between the two variables, which suggests they are at most weakly correlated. Points cluster around 4000-5000 steps and around 6000-8000 steps, and both clusters fall at roughly 400-500 minutes of sleep; similar sleep across different step counts is consistent with little or no correlation rather than evidence for one. Because the points are colored by date, the plot can also be used to look for changes in this relationship over the month.
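The visual reading above could be checked with a correlation coefficient. A minimal sketch, assuming the steps and total_sleep_time columns used in the scatter plot; the helper name steps_sleep_correlation and the toy data frame are hypothetical:

```r
# Hypothetical helper: Pearson (linear) and Spearman (rank) correlation
# between steps and sleep, ignoring rows where either value is missing
steps_sleep_correlation <- function(df, x = "steps", y = "total_sleep_time") {
  ok <- complete.cases(df[[x]], df[[y]])
  c(pearson  = cor(df[[x]][ok], df[[y]][ok], method = "pearson"),
    spearman = cor(df[[x]][ok], df[[y]][ok], method = "spearman"))
}

# Toy input for illustration only: sleep rises exactly linearly with steps,
# and the last row is dropped because steps is missing
toy <- data.frame(steps = c(1000, 2000, 3000, 4000, NA),
                  total_sleep_time = c(300, 350, 400, 450, 420))
toy_r <- steps_sleep_correlation(toy)
toy_r
```

Values near 0 would support the "not strongly correlated" reading. On the real data, cor.test(health_data$steps, health_data$total_sleep_time) would also report a p-value and a confidence interval for the Pearson coefficient.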