Agenda

Row

Overview

The Quantified Self Final Project demonstrates how wearables, analytics, and “Big Data” can support personal informatics and the quantified self. The project uses personal data to answer five questions about personal health, wellness, and productivity:

1. How does the duration of workouts vary across different workout types?
2. How does my monthly spending change over the course of a year?
3. What is the correlation between sleep hours and productivity?
4. What are the average daily nutrient intakes over the past month?
5. How does my commute time to work differ between weekdays and weekends?

To answer these questions, the project uses several tools to gather, analyze, and visualize personal data. The data for this project was randomly generated for demonstration purposes and stored in an Excel sheet, then imported using the read_excel function.

The project uses a dashboard visualization to demonstrate the various data trends. The dashboard includes bar charts, scatter plots, and heat maps to provide insights into the data. These visualizations are used to identify patterns and trends in personal health, wellness, and productivity. Overall, the project highlights the potential of data analytics and wearables to improve personal health and productivity.

Source of data

Row

Data Source:

The data used in this project was collected from two sources:

1. Apple Watch data exported from the Health app
2. Manually saved data from an Intex Fitrist smart band, which I collected a while back

Row

Case 1: Apple Data

The data for the first Case, which involves Apple data, was gathered with an Apple Watch, which automatically tracks physical activity data such as calories burned, exercise time, and standing time. This data is stored in the Apple Health app, which can export it as an XML file; I then converted that file to CSV. This data is used for Q1, Q2, and Q3.
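The XML-to-CSV conversion itself is not shown in this dashboard. The following is a minimal sketch of one way it could be done, assuming the xml2 package and an export in which each workout is a Workout element carrying workoutActivityType and duration attributes. The inline sample and the output file name are made up for illustration.

```r
library(xml2)

# Tiny stand-in for Apple Health's export file (hypothetical values)
sample_xml <- '<HealthData>
  <Workout workoutActivityType="HKWorkoutActivityTypeRunning" duration="31.5" durationUnit="min"/>
  <Workout workoutActivityType="HKWorkoutActivityTypeWalking" duration="48.2" durationUnit="min"/>
</HealthData>'

# For a real export, pass the path to the XML file instead of the string
doc <- read_xml(sample_xml)
workouts <- xml_find_all(doc, ".//Workout")

# Pull the attributes of interest into a data frame
workout_data <- data.frame(
  workoutActivityType = xml_attr(workouts, "workoutActivityType"),
  duration = as.numeric(xml_attr(workouts, "duration")),
  stringsAsFactors = FALSE
)

# Save as CSV (hypothetical file name)
out_file <- file.path(tempdir(), "workouts.csv")
write.csv(workout_data, out_file, row.names = FALSE)
```

The same pattern should apply to other element types in the export by changing the XPath expression and the attribute names.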

Row

Case 2: Heart Rate Fluctuation

The data for this Case was collected by manually measuring heart rate with a smart band (Intex Fitrist) over a period of one year (from January 1st, 2022 to December 31st, 2022). The data was entered into a spreadsheet and saved in a CSV file named “Q4_heart_rate.csv”. I reused this older data set for the project because it was accurate.

Row

Case 3: Monthly & Weekday Miles

The data for this Case was collected by manually tracking daily miles walked and run for each month and day of the week over a period of one year (from January 1st, 2022 to December 31st, 2022) using the smart band. The data was entered into spreadsheets for monthly and weekly data and saved in CSV files named “Q5_monthly_miles.csv” and “Q5_weekday_miles.csv”, respectively.
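The import step for these CSV files is not shown. A minimal sketch, assuming each file has a header row and that the monthly file contains the Month and Distance columns used in the chart code further down. The small file written here is only a stand-in with made-up values so that the snippet runs on its own.

```r
# Stand-in file with made-up values so the example is self-contained
csv_path <- file.path(tempdir(), "Q5_monthly_miles.csv")
writeLines(c("Month,Distance", "January,42.1", "February,38.7"), csv_path)

# With the real file, pass its actual path to read.csv() instead
df_q5_monthly <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick check of column names and types before plotting
str(df_q5_monthly)
```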

Workouts

Row

Q1: How does the duration of workouts vary across different workout types?

To answer this question, we can create a bar plot showing the average workout duration for each workout type. The plot is built with Plotly, so the user can hover over each bar to see the exact duration value, which makes it easy to compare average durations across workout types.

The plot shows that “Swimming” has the longest average workout duration, followed by “Walking” and “Cycling”. “Running” and “Elliptical” have the shortest average durations.

This information helps individuals optimize their workout routines by choosing workouts that fit their schedule and fitness goals. For example, someone with limited time for exercise may focus on the workout types that tend to be shorter, such as running or elliptical, while someone with more time may choose those that tend to be longer, such as swimming.

Row

Chart 1

library(dplyr)    # provides %>%, group_by(), summarize(), mutate(), recode()
library(plotly)
library(ggplot2)

# define a color palette
colors <- c("#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3", "#fdb462", "#b3de69", "#fccde5", "#d9d9d9")

activity_type_map <- c(
  "HKWorkoutActivityTypeCoreTraining" = "Core Training",
  "HKWorkoutActivityTypeCycling" = "Cycling",
  "HKWorkoutActivityTypeElliptical" = "Elliptical",
  "HKWorkoutActivityTypeFunctionalStrengthTraining" = "Functional Strength Training",
  "HKWorkoutActivityTypeRunning" = "Running",
  "HKWorkoutActivityTypeSwimming" = "Swimming",
  "HKWorkoutActivityTypeTraditionalStrengthTraining" = "Traditional Strength Training",
  "HKWorkoutActivityTypeWalking" = "Walking"
)


# calculate the average workout duration for each workout type,
# then replace the raw HealthKit identifiers with readable labels
workout_summary <- workout_data %>%
  group_by(workoutActivityType) %>%
  summarize(avg_duration = mean(duration)) %>%
  mutate(workoutActivityType = recode(workoutActivityType, !!!activity_type_map))

# create a bar plot with plotly
plot_ly(workout_summary, x = ~workoutActivityType, y = ~avg_duration, type = "bar", 
        hoverinfo = "text", text = ~paste0(round(avg_duration, 1), " min"),
        marker = list(color = colors)) %>%
  layout(xaxis = list(title = "Workout Type"), yaxis = list(title = "Average Duration (min)"),
         title = "Average Workout Duration by Type", 
         plot_bgcolor = "#f5f5f5", paper_bgcolor = "#f5f5f5")

Daily Physical Activity

Row

Q2: How has my daily physical activity level changed over time, and have I been meeting my daily activity goals?

The plot displays my daily physical activity level over time, with active energy burned and exercise time on the y-axis and dates on the x-axis. The dashed lines represent the goals for active energy burned and exercise time, respectively. The plot indicates that my activity level has varied over time, with peaks and troughs in both active energy burned and exercise time. While I have generally met my daily activity goals, there have been days when I fell short, particularly during periods of low activity. The interactive nature of the plot allows for zooming in and out and hovering over specific data points for more information.
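The claim about meeting goals can also be checked numerically. A minimal sketch, assuming a table with the same column names as the chart code (dateComponents, activeEnergyBurned, activeEnergyBurnedGoal, appleExerciseTime, appleExerciseTimeGoal). The sample values below are made up.

```r
# Made-up sample using the column names from the chart code
activity_data <- data.frame(
  dateComponents = as.Date("2022-01-01") + 0:3,
  activeEnergyBurned = c(520, 310, 480, 610),
  activeEnergyBurnedGoal = c(450, 450, 450, 450),
  appleExerciseTime = c(35, 12, 30, 44),
  appleExerciseTimeGoal = c(30, 30, 30, 30)
)

# TRUE on days where the goal was reached
met_energy <- activity_data$activeEnergyBurned >= activity_data$activeEnergyBurnedGoal
met_exercise <- activity_data$appleExerciseTime >= activity_data$appleExerciseTimeGoal

# Share of days on which each goal was met
share_energy <- mean(met_energy)
share_exercise <- mean(met_exercise)

# Dates on which at least one goal was missed
missed_days <- activity_data$dateComponents[!(met_energy & met_exercise)]
```

Taking the mean of a logical vector gives the fraction of TRUE values, so no explicit counting is needed.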

Row

Chart 1

library(plotly)
plot <- ggplot(activity_data, aes(x = dateComponents)) +
  geom_line(aes(y = activeEnergyBurned, color = "Active Energy Burned")) +
  geom_line(aes(y = activeEnergyBurnedGoal, color = "Active Energy Burned Goal"), linetype = "dashed") +
  # exercise time (minutes) is multiplied by 10 so it is visible on the same axis as calories
  geom_line(aes(y = appleExerciseTime * 10, color = "Exercise Time")) +
  geom_line(aes(y = appleExerciseTimeGoal * 10, color = "Exercise Time Goal"), linetype = "dashed") +
  labs(title = "Daily Physical Activity Level Over Time",
       x = "Date",
       y = "Calories Burned / Exercise Time (min x 10)") +
  scale_color_manual(name = "Activity Type",
                     values = c("Active Energy Burned" = "red",
                                "Active Energy Burned Goal" = "blue",
                                "Exercise Time" = "orange",
                                "Exercise Time Goal" = "green")) +
  theme(plot.title = element_text(size = 20, face = "bold"),
        axis.title = element_text(size = 14),
        legend.title = element_text(size = 14),
        legend.text = element_text(size = 12))

ggplotly(plot)

Daily water intake

Row

Q3: How has my daily water intake varied over time?

The resulting plot shows the total amount of water consumed each day over time. There are some days on which no water was recorded, but in general the amount consumed each day is relatively consistent. There are also some spikes in the data, which mark the days my girlfriend yelled at me for not drinking water. The most recent period is empty because I stopped tracking my water consumption once I got busy with my job.
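One caveat: grouping records by date and summing yields rows only for days with at least one record, so untracked days are absent from the table rather than zero. A sketch of filling those gaps, assuming a daily_water table with date and total_water columns as in the chart code. The sample values are made up.

```r
# Made-up daily totals with a gap on 2022-03-02 and 2022-03-03
daily_water <- data.frame(
  date = as.Date(c("2022-03-01", "2022-03-04")),
  total_water = c(1500, 2000)
)

# One row per calendar day in the observed range
all_days <- data.frame(
  date = seq(min(daily_water$date), max(daily_water$date), by = "day")
)

# Left join: days without records get NA, which is then set to 0
daily_water_full <- merge(all_days, daily_water, by = "date", all.x = TRUE)
daily_water_full$total_water[is.na(daily_water_full$total_water)] <- 0
```

Whether an untracked day should count as 0 or stay NA is a judgment call. Keeping NA leaves a visible gap in the line instead of a drop to zero.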

Row

Chart 1

# Convert dates to POSIXct format
health_data$startDate <- as.POSIXct(health_data$startDate, format = "%Y-%m-%d %H:%M:%S %z")
health_data$endDate <- as.POSIXct(health_data$endDate, format = "%Y-%m-%d %H:%M:%S %z")
health_data$creationDate <- as.POSIXct(health_data$creationDate, format = "%Y-%m-%d %H:%M:%S %z")

# Group data by day and calculate total water consumed each day
daily_water <- health_data %>%
  mutate(date = as.Date(startDate)) %>%
  group_by(date) %>%
  summarize(total_water = sum(value))

# Create interactive line plot with Plotly
# (width is left unset so the chart sizes itself to its container)
plot_ly(daily_water, x = ~date, y = ~total_water, type = "scatter", mode = "lines+markers") %>%
  layout(xaxis = list(title = "Date"), yaxis = list(title = "Total Water Consumed (mL)"), 
         title = "Daily Water Consumption over Time")

Heart rate fluctuation

Column

Question

Q4: How did my heart rate fluctuate over the last year?

Explanation

The visualization shows the fluctuation in mean heart rate over time, with each point on the line graph representing the average heart rate for a given month. The line chart indicates a gradual increase in mean heart rate from January to July, followed by a sharp spike in August, and a subsequent decrease in the following months. The trend suggests that there may have been a significant change in the individual’s activity or health during the month of August that caused the heart rate to increase, but further investigation would be needed to confirm this. The x-axis labels have been rotated to make them more readable using the ggpubr package.

Column

Chart 1

# install.packages("ggpubr")

library(ggpubr)
library(lubridate)
df_q4$date <- as.Date(df_q4$date, format = "%Y-%m-%d")

df_q4$year_month <- floor_date(df_q4$date, unit = "month")


# Group by year and month and calculate the mean heart rate for each group
df_q4_grouped <- df_q4 %>%
  group_by(year_month) %>%
  summarize(mean_heart_rate = mean(heart_rate))

# Plot the mean heart rate by year and month using a line chart
heart_rate_chart <- ggplot(df_q4_grouped, aes(x = year_month, y = mean_heart_rate)) +
  geom_line(color = "firebrick", size = 1) +
  geom_point(color = "firebrick", size = 2) +
  labs(title = "Heart Rate Fluctuation", x = "Year-Month", y = "Mean Heart Rate") +
  theme(panel.background = element_rect(fill = "white"))

# Use ggpubr to rotate the x-axis labels for better readability
ggpar(heart_rate_chart, x.text.angle = 45)

Monthly & Weekday Miles

Column

Q5: What’s my monthly and weekly active miles trend?

As the two bar charts below show, I walked and ran more miles during June and July, because it was summer and I traveled more than in other months. I also seem to walk less on weekends, probably because I usually stayed at home or partied with friends then.

Row {.tabset .tabset-fade}

Chart 2

## Monthly Miles
monthly_miles <- df_q5_monthly
monthly_miles_chart <- ggplot(monthly_miles, aes(x = Month, y = Distance)) +
  geom_bar(stat = "identity", fill = "navy") +
  labs(title = "Monthly Miles", x = "Month", y = "Distance (miles)") +
  theme(panel.background = element_rect(fill = "white"))

monthly_miles_chart

Chart 3

## Weekday Miles
weekday_miles <- df_q5_weekly
weekday_miles_chart <- ggplot(weekday_miles, aes(x = Weekday, y = Distance)) +
  geom_bar(stat = "identity", fill = "darkgreen") +
  labs(title = "Weekday Miles", x = "Weekday", y = "Distance (miles)") +
  theme(panel.background = element_rect(fill = "white"))

weekday_miles_chart
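If Month and Weekday are stored as text, ggplot2 orders the bars alphabetically rather than by calendar. A sketch of forcing calendar order, assuming the Month column holds full English month names (the actual CSV contents are not shown, so this is an assumption). The sample values are made up.

```r
library(ggplot2)

# Made-up sample; assumes full English month names
monthly_miles <- data.frame(
  Month = c("March", "January", "February"),
  Distance = c(51.3, 42.1, 38.7)
)

# month.name is R's built-in vector of month names in calendar order
monthly_miles$Month <- factor(monthly_miles$Month, levels = month.name)

# Bars now follow the factor levels instead of alphabetical order
p <- ggplot(monthly_miles, aes(x = Month, y = Distance)) +
  geom_col(fill = "navy") +
  labs(title = "Monthly Miles", x = "Month", y = "Distance (miles)")
p
```

The same approach should work for Weekday by passing an explicit vector of day names as the factor levels.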