Bellabeat is a data-centered wellness femtech company located in San Francisco, founded by Urška Sršen and Sandro Mur in 2013. It designs and engineers fashionable wellness trackers that help users manage their health and wellness while fitting women’s lifestyles on any occasion.
While the company’s growth has extended internationally, Sršen, the CEO, is looking for other growth opportunities. She’s particularly interested in analyzing usage data from non-Bellabeat users, such as Fitbit users. She believes that the insights gained from these analyses could reveal trends that point to growth opportunities and help Bellabeat shape its marketing plan. To this end, I have been asked to serve as a data analyst for Bellabeat’s Marketing team (fictional) and to gather insights by analyzing device usage data from Fitbit consumers.
The analysis will follow Google Data Analytics’ framework, namely the six phases of data analysis: Ask, Prepare, Process, Analyze, Share, and Act.
In this phase of the analysis, the focus will be on identifying the business task and stakeholders, and determining how the insights gained could help to drive business decisions forward.
The task entails utilizing insights gained from the analysis to provide answers to the following questions:
This stage begins by examining the raw data used for this analysis for integrity, security, and credibility. The suggested data set is the Fitbit Fitness Tracker Data by Mobius, which is publicly available (CC0: Public Domain) on Kaggle. According to the website, the datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk over a period of two months, from 2016-03-12 to 2016-05-12. Thirty eligible Fitbit users consented to the submission of personal tracker data, including daily, hourly, and minute-level output for physical activity, heart rate (per second), and sleep monitoring, in a total of 18 .csv files arranged in long format.
Reliability: Low. Data were collected from only 30 individuals of unknown gender, age, and ethnicity over a period of only about 60 days, so it is unclear whether the sample is representative in either population or duration of usage. For this analysis, the participants are assumed to be comparable to Bellabeat’s target users, who are women, although this cannot be verified because the participants are Fitbit users whose gender was not recorded. Gender data would become more relevant if Bellabeat planned to extend its offerings to men.
Original: Medium. Although the data were compiled by a third party, they were provided directly by actual Fitbit users, albeit via Amazon Mechanical Turk, and participation was voluntary.
Comprehensive: Medium to high. The data contain the information needed for the analysis, but some tables do not have entries for all 30 individuals, and in some cases an entire column is blank.
Current: No. The data were collected in 2016, nearly seven years before this analysis (2023-02), so both the data and the device capabilities are outdated.
Cited: Yes, the source is well-documented and contains metadata.
The tracker data were recorded at daily, hourly, and minute levels. For this study, daily-level activity data, including calories burned, give a higher-level view. Although the weightLogInfo_merged.csv file contains tracker data for only eight users, it is worth examining how this group chose the intensity and duration of their activity, and the resulting calories burned. BMI (kg/m²) = weight / height² can be used as a screening tool to identify potential weight problems in an individual, so it would also be interesting to find out whether those who logged weight data were underweight (BMI < 18.5), normal weight (BMI 18.5-24.9), overweight (BMI 25-29.9), or obese (BMI ≥ 30), according to the CDC. The files chosen for this study are as follows:
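As a rough illustration of the BMI screening described above, the sketch below labels BMI values with the CDC adult categories. The helper names bmi_from_weight_height() and bmi_category() are hypothetical and are not part of the original analysis. The cut-offs treat a BMI of 30 and above as obese, and the six sample values are the first BMI entries of weightLogInfo_merged.csv.

```r
# Hypothetical helpers for illustration only (not part of the original analysis).

# BMI = weight (kg) / height (m)^2
bmi_from_weight_height <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Label BMI values with the CDC adult categories.
bmi_category <- function(bmi) {
  cut(
    bmi,
    breaks = c(-Inf, 18.5, 25, 30, Inf),
    labels = c("underweight", "normal weight", "overweight", "obese"),
    right = FALSE  # intervals are [lower, upper): 25 is "overweight", 30 is "obese"
  )
}

# First six BMI entries of weightLogInfo_merged.csv
bmi_values <- c(22.65, 22.65, 47.54, 21.45, 21.69, 27.45)
categories <- bmi_category(bmi_values)
table(categories)
```

Of these six log entries, four fall in the normal-weight range, one is overweight, and one is obese. Several entries belong to the same user, so counts per user would require grouping by id first.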
library(magrittr)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ tidyr::extract() masks magrittr::extract()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ purrr::set_names() masks magrittr::set_names()
library(dplyr)
library(lubridate)
## Loading required package: timechange
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(readr)
library(ggplot2)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(skimr)
library(hms)
##
## Attaching package: 'hms'
##
## The following object is masked from 'package:lubridate':
##
## hms
library(knitr)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
daily_activity <- read.csv("~/Downloads/dailyActivity_merged.csv")
sleep_day <- read.csv("~/Downloads/sleepDay_merged.csv")
weight_log_info <- read.csv("~/Downloads/weightLogInfo_merged.csv")
head(daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
glimpse(daily_activity)
## Rows: 940
## Columns: 15
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/…
## $ TotalSteps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
head(sleep_day)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
glimpse(sleep_day)
## Rows: 413
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150…
## $ SleepDay <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "…
## $ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2…
## $ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3…
head(weight_log_info)
## Id Date WeightKg WeightPounds Fat BMI IsManualReport
## 1 1503960366 2016-05-02 52.6 115.9631 22 22.65 TRUE
## 2 1503960366 2016-05-03 52.6 115.9631 NA 22.65 TRUE
## 3 1927972279 2016-04-13 133.5 294.3171 NA 47.54 FALSE
## 4 2873212765 2016-04-21 56.7 125.0021 NA 21.45 TRUE
## 5 2873212765 2016-05-12 57.3 126.3249 NA 21.69 TRUE
## 6 4319703577 2016-04-17 72.4 159.6147 25 27.45 TRUE
## LogId
## 1 1.46223e+12
## 2 1.46232e+12
## 3 1.46051e+12
## 4 1.46128e+12
## 5 1.46310e+12
## 6 1.46094e+12
glimpse(weight_log_info)
## Rows: 67
## Columns: 8
## $ Id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212…
## $ Date <chr> "2016-05-02", "2016-05-03", "2016-04-13", "2016-04-21",…
## $ WeightKg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, …
## $ WeightPounds <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159.6…
## $ Fat <int> 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BMI <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25,…
## $ IsManualReport <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
## $ LogId <dbl> 1.46223e+12, 1.46232e+12, 1.46051e+12, 1.46128e+12, 1.4…
n_distinct(daily_activity$Id)
## [1] 33
n_distinct(sleep_day$Id)
## [1] 24
n_distinct(weight_log_info$Id)
## [1] 8
sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(sleep_day))
## [1] 3
sum(duplicated(weight_log_info))
## [1] 0
daily_activity <- daily_activity %>%
  distinct() %>%
  drop_na()
sleep_day <- sleep_day %>%
  distinct() %>%
  drop_na()
sum(duplicated(sleep_day))
## [1] 0
daily_activity <- daily_activity %>% clean_names()
### Check that the column names are now 'clean', i.e., all in a consistent format
colnames(daily_activity)
## [1] "id" "activity_date"
## [3] "total_steps" "total_distance"
## [5] "tracker_distance" "logged_activities_distance"
## [7] "very_active_distance" "moderately_active_distance"
## [9] "light_active_distance" "sedentary_active_distance"
## [11] "very_active_minutes" "fairly_active_minutes"
## [13] "lightly_active_minutes" "sedentary_minutes"
## [15] "calories"
glimpse(daily_activity)
## Rows: 940
## Columns: 15
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960…
## $ activity_date <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/1…
## $ total_steps <int> 13162, 10735, 10460, 9762, 12669, 9705, 130…
## $ total_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ tracker_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3…
## $ moderately_active_distance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1…
## $ light_active_distance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5…
## $ sedentary_active_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_minutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66,…
## $ fairly_active_minutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, …
## $ lightly_active_minutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205…
## $ sedentary_minutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 8…
## $ calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2…
### Clean the sleep_day data frame
sleep_day <- sleep_day %>% clean_names()
### Check if column names are now clean
colnames(sleep_day)
## [1] "id" "sleep_day" "total_sleep_records"
## [4] "total_minutes_asleep" "total_time_in_bed"
glimpse(sleep_day)
## Rows: 410
## Columns: 5
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1…
## $ sleep_day <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",…
## $ total_sleep_records <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ total_minutes_asleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430,…
## $ total_time_in_bed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449,…
### Clean the weight_log_info data frame
weight_log_info <- weight_log_info %>% clean_names()
### Check if column names are now 'clean'
colnames(weight_log_info)
## [1] "id" "date" "weight_kg" "weight_pounds"
## [5] "fat" "bmi" "is_manual_report" "log_id"
glimpse(weight_log_info)
## Rows: 67
## Columns: 8
## $ id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 28732…
## $ date <chr> "2016-05-02", "2016-05-03", "2016-04-13", "2016-04-21…
## $ weight_kg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3…
## $ weight_pounds <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159…
## $ fat <int> 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, N…
## $ bmi <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.2…
## $ is_manual_report <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
## $ log_id <dbl> 1.46223e+12, 1.46232e+12, 1.46051e+12, 1.46128e+12, 1…
daily_activity <- daily_activity %>%
  rename(Date = activity_date) %>%
  mutate(Date = as_date(Date, format = "%m/%d/%Y"))
### Check to see if the data type of the dates has been changed
head(daily_activity)
## id Date total_steps total_distance tracker_distance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-14 10460 6.74 6.74
## 4 1503960366 2016-04-15 9762 6.28 6.28
## 5 1503960366 2016-04-16 12669 8.16 8.16
## 6 1503960366 2016-04-17 9705 6.48 6.48
## logged_activities_distance very_active_distance moderately_active_distance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## light_active_distance sedentary_active_distance very_active_minutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
glimpse(daily_activity)
## Rows: 940
## Columns: 15
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-0…
## $ total_steps <int> 13162, 10735, 10460, 9762, 12669, 9705, 130…
## $ total_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ tracker_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3…
## $ moderately_active_distance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1…
## $ light_active_distance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5…
## $ sedentary_active_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_minutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66,…
## $ fairly_active_minutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, …
## $ lightly_active_minutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205…
## $ sedentary_minutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 8…
## $ calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2…
sleep_day <- sleep_day %>%
  rename(Date = sleep_day) %>%
  mutate(Date = as_date(Date, format = "%m/%d/%Y %I:%M:%S %p"))
### Check to see if the data type of the dates has been changed
head(sleep_day)
## id Date total_sleep_records total_minutes_asleep
## 1 1503960366 2016-04-12 1 327
## 2 1503960366 2016-04-13 2 384
## 3 1503960366 2016-04-15 1 412
## 4 1503960366 2016-04-16 2 340
## 5 1503960366 2016-04-17 1 700
## 6 1503960366 2016-04-19 1 304
## total_time_in_bed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
colnames(sleep_day)
## [1] "id" "Date" "total_sleep_records"
## [4] "total_minutes_asleep" "total_time_in_bed"
glimpse(sleep_day)
## Rows: 410
## Columns: 5
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-16, …
## $ total_sleep_records <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ total_minutes_asleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430,…
## $ total_time_in_bed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449,…
### The dates in the weight_log_info data frame were already converted to ISO format
### in Excel, so only the column needs renaming, from "date" to "Date".
weight_log_info <- weight_log_info %>%
  rename("Date" = "date")
### Inspect the result; note that Date is still stored as character (<chr>) here
### and would need as_date() before any date arithmetic
head(weight_log_info)
## id Date weight_kg weight_pounds fat bmi is_manual_report
## 1 1503960366 2016-05-02 52.6 115.9631 22 22.65 TRUE
## 2 1503960366 2016-05-03 52.6 115.9631 NA 22.65 TRUE
## 3 1927972279 2016-04-13 133.5 294.3171 NA 47.54 FALSE
## 4 2873212765 2016-04-21 56.7 125.0021 NA 21.45 TRUE
## 5 2873212765 2016-05-12 57.3 126.3249 NA 21.69 TRUE
## 6 4319703577 2016-04-17 72.4 159.6147 25 27.45 TRUE
## log_id
## 1 1.46223e+12
## 2 1.46232e+12
## 3 1.46051e+12
## 4 1.46128e+12
## 5 1.46310e+12
## 6 1.46094e+12
glimpse(weight_log_info)
## Rows: 67
## Columns: 8
## $ id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 28732…
## $ Date <chr> "2016-05-02", "2016-05-03", "2016-04-13", "2016-04-21…
## $ weight_kg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3…
## $ weight_pounds <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159…
## $ fat <int> 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, N…
## $ bmi <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.2…
## $ is_manual_report <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE…
## $ log_id <dbl> 1.46223e+12, 1.46232e+12, 1.46051e+12, 1.46128e+12, 1…
daily_activity %>%
  select(total_steps, total_distance, sedentary_minutes) %>%
  summary()
## total_steps total_distance sedentary_minutes
## Min. : 0 Min. : 0.000 Min. : 0.0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8
## Median : 7406 Median : 5.245 Median :1057.5
## Mean : 7638 Mean : 5.490 Mean : 991.2
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5
## Max. :36019 Max. :28.030 Max. :1440.0
sleep_day %>%
  select(total_sleep_records,
         total_minutes_asleep,
         total_time_in_bed) %>%
  summary()
## total_sleep_records total_minutes_asleep total_time_in_bed
## Min. :1.00 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.00 1st Qu.:361.0 1st Qu.:403.8
## Median :1.00 Median :432.5 Median :463.0
## Mean :1.12 Mean :419.2 Mean :458.5
## 3rd Qu.:1.00 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.00 Max. :796.0 Max. :961.0
First, I will examine the relationship between steps taken in a day and sedentary minutes. This might suggest ways to encourage more consumers to walk more.
ggplot(data = daily_activity, aes(x = total_steps, y = sedentary_minutes, color = calories)) +
  geom_point() +
  labs(title = "Figure 1: How Sedentary Time Varies with Step Counts") +
  theme(plot.title = element_text(face = "bold"))
At first glance, it doesn’t seem to make sense that there are two clusters of points in the same plot (sedentary_minutes vs. total_steps). One clear observation is that, within each cluster, sedentary minutes decrease as step counts increase. This makes sense: the more time one spends walking or running, the less time is left for sedentary activities.
The two clusters with similar trends could be due to the intensity of the activity (steps) one engages in. Light activity, e.g., slow walking, burns calories at a slower rate. Although one can still accumulate a high step count this way, it takes more time to reach the same step count than a fast stride does, leaving less time for sedentary activities. The opposite is true for fast walking (very_active_minutes), which leaves users with more time for sedentary activities. Under this interpretation, the more intense activities (e.g., fast walking) would form the upper cluster of points and the less intense activities (e.g., slow walking) the lower cluster.
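One simple way to probe how sedentary minutes are distributed is to check how much of each 1,440-minute day the four minute columns account for. The sketch below does this for the first six rows of daily_activity, copied from the head() output earlier in this document. The column names tracked_minutes and untracked_minutes are introduced here for illustration only, and six rows from a single user are a sanity check, not a conclusion.

```r
# First six rows of daily_activity, copied from the head() output shown earlier
activity <- data.frame(
  date = as.Date("2016-04-12") + 0:5,
  very_active_minutes    = c(25, 21, 30, 29, 36, 38),
  fairly_active_minutes  = c(13, 19, 11, 34, 10, 20),
  lightly_active_minutes = c(328, 217, 181, 209, 221, 164),
  sedentary_minutes      = c(728, 776, 1218, 726, 773, 539)
)

# Minutes accounted for by the four intensity columns (hypothetical column name)
activity$tracked_minutes <- with(
  activity,
  very_active_minutes + fairly_active_minutes +
    lightly_active_minutes + sedentary_minutes
)

# Minutes of the day not covered by any intensity column (hypothetical column name)
activity$untracked_minutes <- 1440 - activity$tracked_minutes

activity[, c("date", "sedentary_minutes", "tracked_minutes", "untracked_minutes")]
```

In these six rows, 2016-04-14 is the only day whose minutes sum to the full 1,440, and it is also the only one of these days with no entry in sleep_day. That hints that sedentary minutes may include sleep time on days without a sleep record, which would be worth checking on the full data set as another possible contributor to the two clusters.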
ggplot(data = sleep_day, aes(x = total_minutes_asleep, y = total_time_in_bed, color = total_time_in_bed)) +
  geom_point() +
  labs(title = "Figure 2: Relation between Time in Bed and Time Asleep") +
  theme(plot.title = element_text(face = "bold"))
In this plot, there are two groups of points parallel to each other, with the smaller group above the larger one between ~180 minutes (3 hours) and ~400 minutes (6.7 hours) of total_minutes_asleep. On average, users spend ~39 minutes awake in bed (the mean total_time_in_bed of 458.5 minutes minus the mean total_minutes_asleep of 419.2 minutes). This could be indicative of either poor sleepers or users doing other things in bed, such as reading a book or checking their smartphone.
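The time spent awake in bed can be made explicit as the difference between the two sleep columns. The sketch below computes it for the first six rows of sleep_day, copied from the head() output earlier in this document, and then repeats the subtraction on the column means reported by summary(). The column name minutes_awake_in_bed is introduced here for illustration only.

```r
# First six rows of sleep_day, copied from the head() output shown earlier
sleep <- data.frame(
  total_minutes_asleep = c(327, 384, 412, 340, 700, 304),
  total_time_in_bed    = c(346, 407, 442, 367, 712, 320)
)

# Minutes in bed but not asleep (hypothetical column name)
sleep$minutes_awake_in_bed <- sleep$total_time_in_bed - sleep$total_minutes_asleep

# Average over these six nights only
mean_awake_six_nights <- mean(sleep$minutes_awake_in_bed)

# Same subtraction on the full-data means from summary(sleep_day)
mean_awake_all_nights <- 458.5 - 419.2

mean_awake_six_nights
mean_awake_all_nights
```

The six nights shown belong to a single user and average about 21 minutes awake in bed, noticeably less than the ~39 minutes across all users, which suggests the gap varies considerably from person to person.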
merged_data <- merge(sleep_day, daily_activity, by = c("id", "Date"))
head(merged_data)
## id Date total_sleep_records total_minutes_asleep
## 1 1503960366 2016-04-12 1 327
## 2 1503960366 2016-04-13 2 384
## 3 1503960366 2016-04-15 1 412
## 4 1503960366 2016-04-16 2 340
## 5 1503960366 2016-04-17 1 700
## 6 1503960366 2016-04-19 1 304
## total_time_in_bed total_steps total_distance tracker_distance
## 1 346 13162 8.50 8.50
## 2 407 10735 6.97 6.97
## 3 442 9762 6.28 6.28
## 4 367 12669 8.16 8.16
## 5 712 9705 6.48 6.48
## 6 320 15506 9.88 9.88
## logged_activities_distance very_active_distance moderately_active_distance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.14 1.26
## 4 0 2.71 0.41
## 5 0 3.19 0.78
## 6 0 3.53 1.32
## light_active_distance sedentary_active_distance very_active_minutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 2.83 0 29
## 4 5.04 0 36
## 5 2.51 0 38
## 6 5.03 0 50
## fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 34 209 726 1745
## 4 10 221 773 1863
## 5 20 164 539 1728
## 6 31 264 775 2035
glimpse(merged_data)
## Rows: 410
## Columns: 18
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-15, 2016-0…
## $ total_sleep_records <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ total_minutes_asleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361…
## $ total_time_in_bed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384…
## $ total_steps <int> 13162, 10735, 9762, 12669, 9705, 15506, 105…
## $ total_distance <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6…
## $ tracker_distance <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6…
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance <dbl> 1.88, 1.57, 2.14, 2.71, 3.19, 3.53, 1.96, 1…
## $ moderately_active_distance <dbl> 0.55, 0.69, 1.26, 0.41, 0.78, 1.32, 0.48, 0…
## $ light_active_distance <dbl> 6.06, 4.71, 2.83, 5.04, 2.51, 5.03, 4.24, 4…
## $ sedentary_active_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_minutes <int> 25, 21, 29, 36, 38, 50, 28, 19, 41, 39, 73,…
## $ fairly_active_minutes <int> 13, 19, 34, 10, 20, 31, 12, 8, 21, 5, 14, 2…
## $ lightly_active_minutes <int> 328, 217, 209, 221, 164, 264, 205, 211, 262…
## $ sedentary_minutes <int> 728, 776, 726, 773, 539, 775, 818, 838, 732…
## $ calories <int> 1985, 1797, 1745, 1863, 1728, 2035, 1786, 1…
n_distinct(merged_data$id)
## [1] 24
combined_data <- merge(daily_activity, sleep_day, by = c("id", "Date"), all = TRUE)
head(combined_data)
## id Date total_steps total_distance tracker_distance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-14 10460 6.74 6.74
## 4 1503960366 2016-04-15 9762 6.28 6.28
## 5 1503960366 2016-04-16 12669 8.16 8.16
## 6 1503960366 2016-04-17 9705 6.48 6.48
## logged_activities_distance very_active_distance moderately_active_distance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## light_active_distance sedentary_active_distance very_active_minutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
## total_sleep_records total_minutes_asleep total_time_in_bed
## 1 1 327 346
## 2 2 384 407
## 3 NA NA NA
## 4 1 412 442
## 5 2 340 367
## 6 1 700 712
colnames(combined_data)
## [1] "id" "Date"
## [3] "total_steps" "total_distance"
## [5] "tracker_distance" "logged_activities_distance"
## [7] "very_active_distance" "moderately_active_distance"
## [9] "light_active_distance" "sedentary_active_distance"
## [11] "very_active_minutes" "fairly_active_minutes"
## [13] "lightly_active_minutes" "sedentary_minutes"
## [15] "calories" "total_sleep_records"
## [17] "total_minutes_asleep" "total_time_in_bed"
glimpse(combined_data)
## Rows: 940
## Columns: 18
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-0…
## $ total_steps <int> 13162, 10735, 10460, 9762, 12669, 9705, 130…
## $ total_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ tracker_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3…
## $ moderately_active_distance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1…
## $ light_active_distance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5…
## $ sedentary_active_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_minutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66,…
## $ fairly_active_minutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, …
## $ lightly_active_minutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205…
## $ sedentary_minutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 8…
## $ calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2…
## $ total_sleep_records <int> 1, 2, NA, 1, 2, 1, NA, 1, 1, 1, NA, 1, 1, 1…
## $ total_minutes_asleep <int> 327, 384, NA, 412, 340, 700, NA, 304, 360, …
## $ total_time_in_bed <int> 346, 407, NA, 442, 367, 712, NA, 320, 377, …
n_distinct(combined_data$id)
## [1] 33
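The two merges above differ only in the all argument: the default keeps only id/Date pairs present in both tables (410 rows, 24 users), while all = TRUE keeps every pair from either table and fills the gaps with NA (940 rows, 33 users). The sketch below reproduces that behaviour on a cut-down copy of the first six rows of each table; only one value column per table is kept for brevity.

```r
# Cut-down copies of the first six rows of each table (values from the head() outputs)
activity <- data.frame(
  id = 1503960366,
  Date = as.Date("2016-04-12") + 0:5,
  total_steps = c(13162, 10735, 10460, 9762, 12669, 9705)
)
sleep <- data.frame(
  id = 1503960366,
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-15",
                   "2016-04-16", "2016-04-17", "2016-04-19")),
  total_minutes_asleep = c(327, 384, 412, 340, 700, 304)
)

# Inner join: only days present in BOTH tables
inner <- merge(sleep, activity, by = c("id", "Date"))

# Full outer join: every day from EITHER table, with NA where one side is missing
outer <- merge(activity, sleep, by = c("id", "Date"), all = TRUE)

nrow(inner)        # days with both activity and sleep data
nrow(outer)        # days with activity or sleep data
sum(is.na(outer))  # cells filled with NA by the outer join
```

In this toy copy, 2016-04-19 lacks a step count only because just six activity rows were copied. In the full data set the outer join has the same 940 rows as daily_activity.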
sum(is.na(combined_data))
## [1] 1590
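### Replace NA with 0 in all numeric columns. Note that a 0 in the sleep columns then
### means "no sleep record for that day", not zero minutes of sleep.
combined_data <- combined_data %>%
  mutate(across(where(is.numeric), ~ replace(., is.na(.), 0)))
### Check for NA again to make sure they are all gone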
sum(is.na(combined_data))
## [1] 0
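Replacing NA with 0 is convenient for the activity columns, but it changes any average later computed on the sleep columns. The sketch below shows the effect on the first four total_minutes_asleep values of combined_data (327, 384, NA, 412) and is meant only as a caution about how to read later sleep statistics.

```r
# First four total_minutes_asleep values of combined_data before the replacement
sleep_minutes <- c(327, 384, NA, 412)

# Mean over the nights that actually have a sleep record
mean_ignoring_na <- mean(sleep_minutes, na.rm = TRUE)

# Mean after replacing the missing night with 0, as done above
zero_filled <- replace(sleep_minutes, is.na(sleep_minutes), 0)
mean_zero_filled <- mean(zero_filled)

mean_ignoring_na
mean_zero_filled
```

Here the zero-filled mean is roughly 281 minutes, against roughly 374 minutes when the missing night is ignored, so sleep averages are safer to compute from sleep_day or merged_data, or after filtering out the zero rows.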
combined_data$weekday <- wday(combined_data$Date, label=TRUE, abbr=FALSE)
head(combined_data)
## id Date total_steps total_distance tracker_distance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-14 10460 6.74 6.74
## 4 1503960366 2016-04-15 9762 6.28 6.28
## 5 1503960366 2016-04-16 12669 8.16 8.16
## 6 1503960366 2016-04-17 9705 6.48 6.48
## logged_activities_distance very_active_distance moderately_active_distance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## light_active_distance sedentary_active_distance very_active_minutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
## total_sleep_records total_minutes_asleep total_time_in_bed weekday
## 1 1 327 346 Tuesday
## 2 2 384 407 Wednesday
## 3 0 0 0 Thursday
## 4 1 412 442 Friday
## 5 2 340 367 Saturday
## 6 1 700 712 Sunday
glimpse(combined_data)
## Rows: 940
## Columns: 19
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-0…
## $ total_steps <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 130…
## $ total_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ tracker_distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9…
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3…
## $ moderately_active_distance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1…
## $ light_active_distance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5…
## $ sedentary_active_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_minutes <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66,…
## $ fairly_active_minutes <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, …
## $ lightly_active_minutes <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205…
## $ sedentary_minutes <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 8…
## $ calories <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2…
## $ total_sleep_records <dbl> 1, 2, 0, 1, 2, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1…
## $ total_minutes_asleep <dbl> 327, 384, 0, 412, 340, 700, 0, 304, 360, 32…
## $ total_time_in_bed <dbl> 346, 407, 0, 442, 367, 712, 0, 320, 377, 36…
## $ weekday <ord> Tuesday, Wednesday, Thursday, Friday, Satur…
combined_data %>%
  select(total_steps, total_distance, very_active_distance,
         moderately_active_distance, light_active_distance,
         sedentary_active_distance, sedentary_minutes,
         very_active_minutes, fairly_active_minutes,
         lightly_active_minutes, calories) %>%
  summary()
## total_steps total_distance very_active_distance
## Min. : 0 Min. : 0.000 Min. : 0.000
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 0.000
## Median : 7406 Median : 5.245 Median : 0.210
## Mean : 7638 Mean : 5.490 Mean : 1.503
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.: 2.053
## Max. :36019 Max. :28.030 Max. :21.920
## moderately_active_distance light_active_distance sedentary_active_distance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.945 1st Qu.:0.000000
## Median :0.2400 Median : 3.365 Median :0.000000
## Mean :0.5675 Mean : 3.341 Mean :0.001606
## 3rd Qu.:0.8000 3rd Qu.: 4.782 3rd Qu.:0.000000
## Max. :6.4800 Max. :10.710 Max. :0.110000
## sedentary_minutes very_active_minutes fairly_active_minutes
## Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 729.8 1st Qu.: 0.00 1st Qu.: 0.00
## Median :1057.5 Median : 4.00 Median : 6.00
## Mean : 991.2 Mean : 21.16 Mean : 13.56
## 3rd Qu.:1229.5 3rd Qu.: 32.00 3rd Qu.: 19.00
## Max. :1440.0 Max. :210.00 Max. :143.00
## lightly_active_minutes calories
## Min. : 0.0 Min. : 0
## 1st Qu.:127.0 1st Qu.:1828
## Median :199.0 Median :2134
## Mean :192.8 Mean :2304
## 3rd Qu.:264.0 3rd Qu.:2793
## Max. :518.0 Max. :4900
activity_summary <- combined_data %>%
group_by(weekday) %>%
summarise(avg_daily_steps = mean(total_steps),
avg_sedentary_hours = (mean(sedentary_minutes)/60),
avg_daily_calories = mean(calories),
avg_weekday_active_hr = (mean(very_active_minutes) +
mean(fairly_active_minutes) +
mean(lightly_active_minutes))/60,
avg_very_active_hr = mean(very_active_minutes)/60,
percent_activity_hr = (mean(very_active_minutes)/(mean(very_active_minutes) +
mean(fairly_active_minutes) +
mean(lightly_active_minutes)))*100)
colnames(activity_summary)
## [1] "weekday" "avg_daily_steps" "avg_sedentary_hours"
## [4] "avg_daily_calories" "avg_weekday_active_hr" "avg_very_active_hr"
## [7] "percent_activity_hr"
head(activity_summary)
## # A tibble: 6 × 7
## weekday avg_daily_steps avg_sedentary_hours avg_da…¹ avg_w…² avg_v…³ perce…⁴
## <ord> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Sunday 6933. 16.5 2263 3.47 0.333 9.58
## 2 Monday 7781. 17.1 2324. 3.82 0.385 10.1
## 3 Tuesday 8125. 16.8 2356. 3.91 0.383 9.78
## 4 Wednesday 7559. 16.5 2303. 3.73 0.346 9.29
## 5 Thursday 7406. 16.0 2200. 3.61 0.323 8.95
## 6 Friday 7448. 16.7 2332. 3.94 0.334 8.48
## # … with abbreviated variable names ¹avg_daily_calories,
## # ²avg_weekday_active_hr, ³avg_very_active_hr, ⁴percent_activity_hr
glimpse(activity_summary)
## Rows: 7
## Columns: 7
## $ weekday <ord> Sunday, Monday, Tuesday, Wednesday, Thursday, Fr…
## $ avg_daily_steps <dbl> 6933.231, 7780.867, 8125.007, 7559.373, 7405.837…
## $ avg_sedentary_hours <dbl> 16.50427, 17.13236, 16.78936, 16.49133, 16.03322…
## $ avg_daily_calories <dbl> 2263.000, 2324.208, 2356.013, 2302.620, 2199.571…
## $ avg_weekday_active_hr <dbl> 3.474793, 3.819444, 3.910526, 3.728889, 3.613152…
## $ avg_very_active_hr <dbl> 0.3330579, 0.3851389, 0.3825658, 0.3463333, 0.32…
## $ percent_activity_hr <dbl> 9.584968, 10.083636, 9.782974, 9.287843, 8.95255…
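Note that percent_activity_hr above is a ratio of weekday means, not a mean of per-day percentages, and the two generally differ. The sketch below illustrates the difference with the six rows printed by head(combined_data). Those rows all belong to one user on different weekdays, so the numbers are for illustration only, and the variable names are hypothetical.

```r
# Minute counts copied from the head(combined_data) output above
very_active    <- c(25, 21, 30, 29, 36, 38)
fairly_active  <- c(13, 19, 11, 34, 10, 20)
lightly_active <- c(328, 217, 181, 209, 221, 164)

# Total active minutes per day
total_active <- very_active + fairly_active + lightly_active

# Ratio of means: the form used for percent_activity_hr
ratio_of_means <- mean(very_active) / mean(total_active) * 100

# Mean of per-day ratios: an alternative that weights every day equally
mean_of_ratios <- mean(very_active / total_active) * 100

print(c(ratio_of_means = ratio_of_means, mean_of_ratios = mean_of_ratios))
```

Either definition is defensible; the ratio of means gives more weight to days with more active time.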
### Average step counts on each day of the week
ggplot(data=activity_summary, mapping=aes(x= weekday,y=avg_daily_steps, fill=weekday))+
geom_bar(stat="identity")+
labs(title="Figure 3: Average Step Counts on Each Weekday") +
theme(plot.title=element_text(face="bold"))
This plot shows that users are most active on Tuesdays and Saturdays, averaging more than 8,000 steps, which falls within the recommended 7,500 to 10,000 steps per day (the exact target depends on age and sex). Sunday is the least active day, with an average of ~7,000 steps. One possible explanation is that some users spend Sundays attending church services and relaxing before the start of another work week.
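As a quick check of the step-count claim, the weekday averages can be compared against the lower bound of the recommended range. This is a minimal sketch that uses only the six averages printed in the activity_summary output (Saturday is not shown there, so it is omitted, and the Friday value is rounded as printed). The 7,500-step threshold comes from the text, and the variable names are hypothetical.

```r
# Weekday averages copied from the activity_summary output (Saturday not printed)
avg_daily_steps <- c(Sunday = 6933.231, Monday = 7780.867, Tuesday = 8125.007,
                     Wednesday = 7559.373, Thursday = 7405.837, Friday = 7448)

# Lower bound of the recommended daily range quoted in the text
recommended_min <- 7500

# Days whose average falls below the lower bound
below_min <- names(avg_daily_steps)[avg_daily_steps < recommended_min]

# Day with the highest average among the six shown
most_active <- names(which.max(avg_daily_steps))

print(below_min)
print(most_active)
```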
### Calories burnt on each weekday through the week
ggplot(data=activity_summary, mapping=aes(x= weekday, avg_daily_calories, fill=weekday))+
geom_bar(stat="identity")+
labs(title="Figure 4: Average Calories Burnt Each Weekday") +
theme(plot.title=element_text(face="bold"))
This plot shows that users burn the most calories, ~2,400, on Tuesdays and Saturdays, the same days on which they take more than 8,000 steps. However, from Wednesday to Friday the number of calories burnt does not track the step counts closely. This might indicate that the intensity of activity (very_active, fairly_active, and lightly_active), together with the total time users spend at each intensity, plays a major role in how many calories are burnt. We will therefore examine other activity factors that might affect calorie expenditure.
As discussed for Figure 1 above, one needs to spend more time in lightly_active activities to achieve the same step counts and calories burnt that very_active activities attain in a shorter time. In general, calories burnt trend upward as step counts increase, but each type of activity burns calories at a different rate. Consequently, while a plot of overall calories vs. total step counts would show a similar positive correlation, its points would be widely scattered because the data aggregate all three activity intensities. Next, we’ll examine how the time users spend in overall activity varies through the week.
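The relationship between steps and calories discussed above can be quantified with a correlation coefficient. The sketch below is a rough check rather than part of the original analysis: it uses only the six weekday averages visible in the activity_summary output (Saturday is not printed there, and the Friday values are rounded), and with so few points the result should be read with caution.

```r
# Weekday averages (Sunday to Friday) copied from the activity_summary output
avg_daily_steps    <- c(6933.231, 7780.867, 8125.007, 7559.373, 7405.837, 7448)
avg_daily_calories <- c(2263.000, 2324.208, 2356.013, 2302.620, 2199.571, 2332)

# Pearson correlation (the default method of cor())
steps_calories_cor <- cor(avg_daily_steps, avg_daily_calories)

print(round(steps_calories_cor, 2))
```

Running the same calculation on the daily rows of combined_data, instead of on weekday averages, would give a better sense of how scattered the underlying points are.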
### Total activity hours (very_active + fairly_active + lightly_active) through the week
ggplot(data=activity_summary, mapping=aes(x= weekday, avg_weekday_active_hr, fill=weekday))+
geom_bar(stat="identity")+
labs(title="Figure 5: Average Weekday Overall Activity Hours") +
theme(plot.title=element_text(face="bold"))
This plot shows that the time spent in overall activity correlates very closely with calories burnt (Figure 4) and somewhat with step counts (Figure 3) through the week. Since very intense (very_active) activities are expected to burn calories at higher rates, we’ll next examine how the time spent on them changes through the week.
### Activity Hours users spent on very_active activities each weekday.
ggplot(data=activity_summary, mapping=aes(x= weekday, avg_very_active_hr, fill=weekday))+
geom_bar(stat="identity")+
labs(title="Figure 6a: Very_active Activity Hours on Each Weekday") +
theme(plot.title=element_text(face="bold"))
### Percent Activity Hours users spent on very_active activities each weekday.
ggplot(data=activity_summary, mapping=aes(x= weekday, percent_activity_hr, fill=weekday))+
geom_bar(stat="identity")+
labs(title="Figure 6b: Percent Very_active Activity Hours") +
theme(plot.title=element_text(face="bold"))
Figures 6a and 6b show a small decrease in the time spent on very intense activities on Sundays and from Wednesdays through Fridays, but the number of calories burnt does not decrease accordingly. This suggests that other factors account for the discrepancy. Otherwise, these plots agree fairly well with Figure 3 (avg_daily_steps vs. weekday) and Figure 5 (avg_weekday_active_hr vs. weekday), and they follow a weekday trend similar to that of Figure 4 (avg_daily_calories vs. weekday), apart from the small decrease on Sundays and Wednesdays through Fridays.
However, it’s unclear how the Bellabeat App measures activity intensity and sedentary time. Does it consider only step-accrued activities? Figure 1 indicates that sedentary time is at its maximum of 24 and ~17 hours for the upper and lower clusters, respectively, when the step count is 0. When the step count reaches ~1,500, sedentary time decreases to ~17 and ~8.5 hours for the upper and lower clusters.
If intensity and sedentary time were measured only from accrued steps, as Figure 1 seems to imply, the tracker might miss stationary high-intensity activities such as weightlifting and record them as sedentary time. That would inflate the sedentary hours shown in Figure 7, even on days when less time was spent on overall and very_active activities. Since sitting, standing while doing chores, studying, and working at a desk or computer burn calories at different rates, Bellabeat will need to clarify how sedentary time and activity intensity are measured. This is especially important since users average ~16 or more sedentary hours a day.
### How much average sedentary time do users spend each weekday
ggplot(data=activity_summary, mapping=aes(x= weekday, avg_sedentary_hours, fill=weekday))+
geom_bar(stat="identity")+
labs(title="Figure 7: Average Weekday Sedentary Hours") +
theme(plot.title=element_text(face="bold"))
It’s striking that this plot trends closely with Figure 4 (calories burnt through the week), except on Saturdays, when sedentary time is slightly lower than one would expect, similar to that of Fridays. As discussed after Figure 6 above, if sedentary time means that no steps were accumulated, then an intense activity like weightlifting would be counted as sedentary even though it burns many calories without accumulating steps. Further, for participants who did not record sleep, sleep time appears to be counted as sedentary time. Also, if sedentary time decreases on Saturdays, does it mean that users have more time to engage in step-accrued activities, e.g. running?
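One way to probe how sleep and sedentary time are counted is to check whether the tracked minutes add up to a full day (1,440 minutes). The sketch below uses the six rows printed by head(combined_data); the data frame name day_check and the columns tracked_minutes and full_day are hypothetical.

```r
# Minute columns copied from the head(combined_data) output
day_check <- data.frame(
  very_active_minutes    = c(25, 21, 30, 29, 36, 38),
  fairly_active_minutes  = c(13, 19, 11, 34, 10, 20),
  lightly_active_minutes = c(328, 217, 181, 209, 221, 164),
  sedentary_minutes      = c(728, 776, 1218, 726, 773, 539),
  total_time_in_bed      = c(346, 407, 0, 442, 367, 712)
)

# Sum of active, sedentary, and in-bed minutes for each day
day_check$tracked_minutes <- rowSums(day_check)

# TRUE when the tracked minutes cover exactly 24 hours
day_check$full_day <- day_check$tracked_minutes == 1440

print(day_check[, c("sedentary_minutes", "total_time_in_bed",
                    "tracked_minutes", "full_day")])
```

In this small sample, the third row has no sleep record, and its active plus sedentary minutes alone account for all 1,440 minutes, which is consistent with sleep being counted as sedentary time on that day. The last two rows do not sum to 1,440, so the bookkeeping is not uniform across days and would need to be checked on the full dataset.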
‘fat’ and ‘is_manual_report’ columns: Most users did not record these values, so they will not be used in the analysis.
‘log_id’ column: It contains only a log identifier and no activity data, so it will be removed prior to data analysis.
In the absence of ‘fat’ information, the analysis will be based on BMI data (BMI in kg/m² = weight in kg / (height in m)²), since BMI is a health screening tool, especially for potential weight problems.
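The BMI formula above, and the kilogram-to-pound conversion behind the weight_pounds column, can be sketched as follows. The function names and the conversion factor of 2.2046226 lb/kg are assumptions for illustration and are not taken from the dataset documentation. The dataset does not report height, so the height here is back-calculated from one logged weight and BMI pair (52.6 kg, BMI 22.65).

```r
# BMI (kg/m^2) from weight in kilograms and height in metres
bmi_from <- function(weight_kg, height_m) weight_kg / height_m^2

# Height in metres implied by a weight and a BMI (inverse of the formula above)
height_from <- function(weight_kg, bmi) sqrt(weight_kg / bmi)

# Kilograms to pounds, assuming 1 kg = 2.2046226 lb
kg_to_pounds <- function(weight_kg) weight_kg * 2.2046226

# One record from the weight log: 52.6 kg with a BMI of 22.65
implied_height_m <- height_from(52.6, 22.65)

print(round(implied_height_m, 2))          # implied height in metres
print(bmi_from(52.6, implied_height_m))    # recovers the logged BMI
print(round(kg_to_pounds(52.6), 4))        # compare with weight_pounds
```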
weight_log_info_new = weight_log_info[c("id","Date", "weight_kg", "weight_pounds", "bmi")]
colnames(weight_log_info_new)
## [1] "id" "Date" "weight_kg" "weight_pounds"
## [5] "bmi"
head(weight_log_info_new)
## id Date weight_kg weight_pounds bmi
## 1 1503960366 2016-05-02 52.6 115.9631 22.65
## 2 1503960366 2016-05-03 52.6 115.9631 22.65
## 3 1927972279 2016-04-13 133.5 294.3171 47.54
## 4 2873212765 2016-04-21 56.7 125.0021 21.45
## 5 2873212765 2016-05-12 57.3 126.3249 21.69
## 6 4319703577 2016-04-17 72.4 159.6147 27.45
glimpse(weight_log_info_new)
## Rows: 67
## Columns: 5
## $ id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 28732127…
## $ Date <chr> "2016-05-02", "2016-05-03", "2016-04-13", "2016-04-21", …
## $ weight_kg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, 6…
## $ weight_pounds <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159.61…
## $ bmi <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25, …
Three data frames are created, one for each BMI category: bmi_df1 (normal weight, BMI = 18.5-24.9), bmi_df2 (overweight, BMI = 25-29.9), and bmi_df3 (obese, BMI ≥ 30). Since the weight_log_info dataset contains no BMI values below 18.5, all analyses will be performed for the normal weight, overweight, and obese categories only.
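The categories described above can also be assigned in a single step. The following is a minimal sketch with base R's cut(), assuming cut-offs of 18.5, 25, and 30 and half-open intervals so that a value such as 24.95 is not left unclassified. The label names and the vector bmi_example are hypothetical; the values are six BMI records from the weight log.

```r
# Six BMI values taken from the weight log
bmi_example <- c(22.65, 22.65, 47.54, 21.45, 21.69, 27.45)

# Assign each value to a category; right = FALSE makes intervals [lower, upper)
bmi_category <- cut(
  bmi_example,
  breaks = c(-Inf, 18.5, 25, 30, Inf),
  labels = c("underweight", "normal_weight", "overweight", "obese"),
  right  = FALSE
)

# Count of records per category
print(table(bmi_category))
```

Splitting a data frame by such a factor, for example with split(), would give the same subsets as bmi_df1, bmi_df2, and bmi_df3 as long as no BMI value falls in the gaps between the closed ranges (24.9 to 25 and 29.9 to 30).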
### Create the normal weight bmi data frame
bmi_df1 <- subset(weight_log_info_new, bmi>=18.5 & bmi <= 24.9)
head(bmi_df1)
## id Date weight_kg weight_pounds bmi
## 1 1503960366 2016-05-02 52.6 115.9631 22.65
## 2 1503960366 2016-05-03 52.6 115.9631 22.65
## 4 2873212765 2016-04-21 56.7 125.0021 21.45
## 5 2873212765 2016-05-12 57.3 126.3249 21.69
## 14 6962181067 2016-04-12 62.5 137.7889 24.39
## 15 6962181067 2016-04-13 62.1 136.9071 24.24
glimpse(bmi_df1)
## Rows: 34
## Columns: 5
## $ id <dbl> 1503960366, 1503960366, 2873212765, 2873212765, 69621810…
## $ Date <chr> "2016-05-02", "2016-05-03", "2016-04-21", "2016-05-12", …
## $ weight_kg <dbl> 52.6, 52.6, 56.7, 57.3, 62.5, 62.1, 61.7, 61.5, 62.0, 61…
## $ weight_pounds <dbl> 115.9631, 115.9631, 125.0021, 126.3249, 137.7889, 136.90…
## $ bmi <dbl> 22.65, 22.65, 21.45, 21.69, 24.39, 24.24, 24.10, 24.00, …
### Rename the bmi column of bmi_df1 to bmi_normal_weight, following the CDC definition
bmi_df1 <- bmi_df1 %>%
rename(bmi_normal_weight = bmi)
head(bmi_df1)
## id Date weight_kg weight_pounds bmi_normal_weight
## 1 1503960366 2016-05-02 52.6 115.9631 22.65
## 2 1503960366 2016-05-03 52.6 115.9631 22.65
## 4 2873212765 2016-04-21 56.7 125.0021 21.45
## 5 2873212765 2016-05-12 57.3 126.3249 21.69
## 14 6962181067 2016-04-12 62.5 137.7889 24.39
## 15 6962181067 2016-04-13 62.1 136.9071 24.24
glimpse(bmi_df1)
## Rows: 34
## Columns: 5
## $ id <dbl> 1503960366, 1503960366, 2873212765, 2873212765, 6962…
## $ Date <chr> "2016-05-02", "2016-05-03", "2016-04-21", "2016-05-1…
## $ weight_kg <dbl> 52.6, 52.6, 56.7, 57.3, 62.5, 62.1, 61.7, 61.5, 62.0…
## $ weight_pounds <dbl> 115.9631, 115.9631, 125.0021, 126.3249, 137.7889, 13…
## $ bmi_normal_weight <dbl> 22.65, 22.65, 21.45, 21.69, 24.39, 24.24, 24.10, 24.…
### Data frame for the overweight bmi
bmi_df2 <- subset(weight_log_info_new, bmi>=25 & bmi <= 29.9)
head(bmi_df2)
## id Date weight_kg weight_pounds bmi
## 6 4319703577 2016-04-17 72.4 159.6147 27.45
## 7 4319703577 2016-05-04 72.3 159.3942 27.38
## 8 4558609924 2016-04-18 69.7 153.6622 27.25
## 9 4558609924 2016-04-25 70.3 154.9850 27.46
## 10 4558609924 2016-05-01 69.9 154.1031 27.32
## 11 4558609924 2016-05-02 69.2 152.5599 27.04
glimpse(bmi_df2)
## Rows: 32
## Columns: 5
## $ id <dbl> 4319703577, 4319703577, 4558609924, 4558609924, 45586099…
## $ Date <chr> "2016-04-17", "2016-05-04", "2016-04-18", "2016-04-25", …
## $ weight_kg <dbl> 72.4, 72.3, 69.7, 70.3, 69.9, 69.2, 69.1, 90.7, 85.8, 84…
## $ weight_pounds <dbl> 159.6147, 159.3942, 153.6622, 154.9850, 154.1031, 152.55…
## $ bmi <dbl> 27.45, 27.38, 27.25, 27.46, 27.32, 27.04, 27.00, 28.00, …
### Rename the bmi column of bmi_df2 to bmi_overweight
bmi_df2 <- bmi_df2 %>%
rename(bmi_overweight = bmi)
head(bmi_df2)
## id Date weight_kg weight_pounds bmi_overweight
## 6 4319703577 2016-04-17 72.4 159.6147 27.45
## 7 4319703577 2016-05-04 72.3 159.3942 27.38
## 8 4558609924 2016-04-18 69.7 153.6622 27.25
## 9 4558609924 2016-04-25 70.3 154.9850 27.46
## 10 4558609924 2016-05-01 69.9 154.1031 27.32
## 11 4558609924 2016-05-02 69.2 152.5599 27.04
glimpse(bmi_df2)
## Rows: 32
## Columns: 5
## $ id <dbl> 4319703577, 4319703577, 4558609924, 4558609924, 4558609…
## $ Date <chr> "2016-04-17", "2016-05-04", "2016-04-18", "2016-04-25",…
## $ weight_kg <dbl> 72.4, 72.3, 69.7, 70.3, 69.9, 69.2, 69.1, 90.7, 85.8, 8…
## $ weight_pounds <dbl> 159.6147, 159.3942, 153.6622, 154.9850, 154.1031, 152.5…
## $ bmi_overweight <dbl> 27.45, 27.38, 27.25, 27.46, 27.32, 27.04, 27.00, 28.00,…
bmi_df3 <- subset(weight_log_info_new, bmi>=30.0)
head(bmi_df3)
## id Date weight_kg weight_pounds bmi
## 3 1927972279 2016-04-13 133.5 294.3171 47.54
glimpse(bmi_df3)
## Rows: 1
## Columns: 5
## $ id <dbl> 1927972279
## $ Date <chr> "2016-04-13"
## $ weight_kg <dbl> 133.5
## $ weight_pounds <dbl> 294.3171
## $ bmi <dbl> 47.54
### Rename the bmi column of bmi_df3 to bmi_obese
bmi_df3 <- bmi_df3 %>%
rename(bmi_obese=bmi)
### Check to see if the bmi column has been renamed
head(bmi_df3)
## id Date weight_kg weight_pounds bmi_obese
## 3 1927972279 2016-04-13 133.5 294.3171 47.54
glimpse(bmi_df3)
## Rows: 1
## Columns: 5
## $ id <dbl> 1927972279
## $ Date <chr> "2016-04-13"
## $ weight_kg <dbl> 133.5
## $ weight_pounds <dbl> 294.3171
## $ bmi_obese <dbl> 47.54
```r
### Examine some summary statistics of bmi_df1, bmi_df2 and bmi_df3
bmi_df1 %>%
select(weight_kg, weight_pounds, bmi_normal_weight) %>%
summary()
## weight_kg weight_pounds bmi_normal_weight
## Min. :52.60 Min. :116.0 Min. :21.45
## 1st Qu.:61.20 1st Qu.:134.9 1st Qu.:23.89
## Median :61.40 Median :135.4 Median :23.96
## Mean :60.76 Mean :134.0 Mean :23.80
## 3rd Qu.:61.70 3rd Qu.:136.0 3rd Qu.:24.10
## Max. :62.50 Max. :137.8 Max. :24.39
bmi_df2 %>%
select(weight_kg, weight_pounds,bmi_overweight) %>%
summary()
## weight_kg weight_pounds bmi_overweight
## Min. :69.10 Min. :152.3 Min. :25.14
## 1st Qu.:84.30 1st Qu.:185.8 1st Qu.:25.43
## Median :85.05 Median :187.5 Median :25.56
## Mean :82.10 Mean :181.0 Mean :25.96
## 3rd Qu.:85.42 3rd Qu.:188.3 3rd Qu.:26.01
## Max. :90.70 Max. :200.0 Max. :28.00
bmi_df3 %>%
select(weight_kg, weight_pounds, bmi_obese) %>%
summary()
## weight_kg weight_pounds bmi_obese
## Min. :133.5 Min. :294.3 Min. :47.54
## 1st Qu.:133.5 1st Qu.:294.3 1st Qu.:47.54
## Median :133.5 Median :294.3 Median :47.54
## Mean :133.5 Mean :294.3 Mean :47.54
## 3rd Qu.:133.5 3rd Qu.:294.3 3rd Qu.:47.54
## Max. :133.5 Max. :294.3 Max. :47.54
### Merge bmi_df1, bmi_df2 and bmi_df3 with selected columns of the daily_activity data frame
bmi_daily1 <- merge(daily_activity, bmi_df1, by = c ("id", "Date"))
colnames(bmi_daily1)
## [1] "id" "Date"
## [3] "total_steps" "total_distance"
## [5] "tracker_distance" "logged_activities_distance"
## [7] "very_active_distance" "moderately_active_distance"
## [9] "light_active_distance" "sedentary_active_distance"
## [11] "very_active_minutes" "fairly_active_minutes"
## [13] "lightly_active_minutes" "sedentary_minutes"
## [15] "calories" "weight_kg"
## [17] "weight_pounds" "bmi_normal_weight"
head(bmi_daily1)
## id Date total_steps total_distance tracker_distance
## 1 1503960366 2016-05-02 14727 9.71 9.71
## 2 1503960366 2016-05-03 15103 9.66 9.66
## 3 2873212765 2016-04-21 8859 5.98 5.98
## 4 2873212765 2016-05-12 7566 5.11 5.11
## 5 6962181067 2016-04-12 10199 6.74 6.74
## 6 6962181067 2016-04-13 5652 3.74 3.74
## logged_activities_distance very_active_distance moderately_active_distance
## 1 0 3.21 0.57
## 2 0 3.73 1.05
## 3 0 0.13 0.37
## 4 0 0.00 0.00
## 5 0 3.40 0.83
## 6 0 0.57 1.21
## light_active_distance sedentary_active_distance very_active_minutes
## 1 5.92 0.00 41
## 2 4.88 0.00 50
## 3 5.47 0.01 2
## 4 5.11 0.00 0
## 5 2.51 0.00 50
## 6 1.96 0.00 8
## fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1 15 277 798 2004
## 2 24 254 816 1990
## 3 10 371 1057 1970
## 4 0 268 720 1431
## 5 14 189 796 1994
## 6 24 142 548 1718
## weight_kg weight_pounds bmi_normal_weight
## 1 52.6 115.9631 22.65
## 2 52.6 115.9631 22.65
## 3 56.7 125.0021 21.45
## 4 57.3 126.3249 21.69
## 5 62.5 137.7889 24.39
## 6 62.1 136.9071 24.24
glimpse(bmi_daily1)
## Rows: 34
## Columns: 18
## $ id <dbl> 1503960366, 1503960366, 2873212765, 2873212…
## $ Date <date> 2016-05-02, 2016-05-03, 2016-04-21, 2016-0…
## $ total_steps <int> 14727, 15103, 8859, 7566, 10199, 5652, 1551…
## $ total_distance <dbl> 9.71, 9.66, 5.98, 5.11, 6.74, 3.74, 1.03, 3…
## $ tracker_distance <dbl> 9.71, 9.66, 5.98, 5.11, 6.74, 3.74, 1.03, 3…
## $ logged_activities_distance <dbl> 0.000000, 0.000000, 0.000000, 0.000000, 0.0…
## $ very_active_distance <dbl> 3.21, 3.73, 0.13, 0.00, 3.40, 0.57, 0.00, 0…
## $ moderately_active_distance <dbl> 0.57, 1.05, 0.37, 0.00, 0.83, 1.21, 0.00, 0…
## $ light_active_distance <dbl> 5.92, 4.88, 5.47, 5.11, 2.51, 1.96, 1.03, 3…
## $ sedentary_active_distance <dbl> 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0.00, 0…
## $ very_active_minutes <int> 41, 50, 2, 0, 50, 8, 0, 0, 50, 5, 13, 35, 4…
## $ fairly_active_minutes <int> 15, 24, 10, 0, 14, 24, 0, 0, 3, 13, 42, 41,…
## $ lightly_active_minutes <int> 277, 254, 371, 268, 189, 142, 86, 217, 280,…
## $ sedentary_minutes <int> 798, 816, 1057, 720, 796, 548, 862, 837, 74…
## $ calories <int> 2004, 1990, 1970, 1431, 1994, 1718, 1466, 1…
## $ weight_kg <dbl> 52.6, 52.6, 56.7, 57.3, 62.5, 62.1, 61.7, 6…
## $ weight_pounds <dbl> 115.9631, 115.9631, 125.0021, 126.3249, 137…
## $ bmi_normal_weight <dbl> 22.65, 22.65, 21.45, 21.69, 24.39, 24.24, 2…
bmi_daily2 <- merge(daily_activity, bmi_df2, by = c ("id", "Date"))
colnames(bmi_daily2)
## [1] "id" "Date"
## [3] "total_steps" "total_distance"
## [5] "tracker_distance" "logged_activities_distance"
## [7] "very_active_distance" "moderately_active_distance"
## [9] "light_active_distance" "sedentary_active_distance"
## [11] "very_active_minutes" "fairly_active_minutes"
## [13] "lightly_active_minutes" "sedentary_minutes"
## [15] "calories" "weight_kg"
## [17] "weight_pounds" "bmi_overweight"
head(bmi_daily2)
## id Date total_steps total_distance tracker_distance
## 1 4319703577 2016-04-17 29 0.02 0.02
## 2 4319703577 2016-05-04 10429 7.02 7.02
## 3 4558609924 2016-04-18 8940 5.91 5.91
## 4 4558609924 2016-04-25 8095 5.35 5.35
## 5 4558609924 2016-05-01 3428 2.27 2.27
## 6 4558609924 2016-05-02 7891 5.22 5.22
## logged_activities_distance very_active_distance moderately_active_distance
## 1 0 0.00 0.00
## 2 0 0.59 0.58
## 3 0 0.98 0.93
## 4 0 0.59 0.25
## 5 0 0.00 0.00
## 6 0 0.00 0.00
## light_active_distance sedentary_active_distance very_active_minutes
## 1 0.02 0 0
## 2 5.85 0 8
## 3 4.00 0 14
## 4 4.51 0 18
## 5 2.27 0 0
## 6 5.22 0 0
## fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1 0 3 1363 1464
## 2 13 313 1106 2282
## 3 15 331 1080 2116
## 4 10 340 993 2225
## 5 0 190 1121 1692
## 6 0 383 1057 2066
## weight_kg weight_pounds bmi_overweight
## 1 72.4 159.6147 27.45
## 2 72.3 159.3942 27.38
## 3 69.7 153.6622 27.25
## 4 70.3 154.9850 27.46
## 5 69.9 154.1031 27.32
## 6 69.2 152.5599 27.04
glimpse(bmi_daily2)
## Rows: 32
## Columns: 18
## $ id <dbl> 4319703577, 4319703577, 4558609924, 4558609…
## $ Date <date> 2016-04-17, 2016-05-04, 2016-04-18, 2016-0…
## $ total_steps <int> 29, 10429, 8940, 8095, 3428, 7891, 11451, 1…
## $ total_distance <dbl> 0.02, 7.02, 5.91, 5.35, 2.27, 5.22, 7.57, 9…
## $ tracker_distance <dbl> 0.02, 7.02, 5.91, 5.35, 2.27, 5.22, 7.57, 9…
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance <dbl> 0.00, 0.59, 0.98, 0.59, 0.00, 0.00, 0.43, 5…
## $ moderately_active_distance <dbl> 0.00, 0.58, 0.93, 0.25, 0.00, 0.00, 1.62, 0…
## $ light_active_distance <dbl> 0.02, 5.85, 4.00, 4.51, 2.27, 5.22, 5.52, 2…
## $ sedentary_active_distance <dbl> 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0…
## $ very_active_minutes <int> 0, 8, 14, 18, 0, 0, 6, 200, 85, 108, 68, 94…
## $ fairly_active_minutes <int> 0, 13, 15, 10, 0, 0, 30, 37, 7, 18, 13, 29,…
## $ lightly_active_minutes <int> 3, 313, 331, 340, 190, 383, 339, 159, 312, …
## $ sedentary_minutes <int> 1363, 1106, 1080, 993, 1121, 1057, 1065, 52…
## $ calories <int> 1464, 2282, 2116, 2225, 1692, 2066, 2223, 4…
## $ weight_kg <dbl> 72.4, 72.3, 69.7, 70.3, 69.9, 69.2, 69.1, 9…
## $ weight_pounds <dbl> 159.6147, 159.3942, 153.6622, 154.9850, 154…
## $ bmi_overweight <dbl> 27.45, 27.38, 27.25, 27.46, 27.32, 27.04, 2…
bmi_daily3 <- merge(daily_activity, bmi_df3, by = c ("id", "Date"))
colnames(bmi_daily3)
## [1] "id" "Date"
## [3] "total_steps" "total_distance"
## [5] "tracker_distance" "logged_activities_distance"
## [7] "very_active_distance" "moderately_active_distance"
## [9] "light_active_distance" "sedentary_active_distance"
## [11] "very_active_minutes" "fairly_active_minutes"
## [13] "lightly_active_minutes" "sedentary_minutes"
## [15] "calories" "weight_kg"
## [17] "weight_pounds" "bmi_obese"
head(bmi_daily3)
## id Date total_steps total_distance tracker_distance
## 1 1927972279 2016-04-13 356 0.25 0.25
## logged_activities_distance very_active_distance moderately_active_distance
## 1 0 0 0
## light_active_distance sedentary_active_distance very_active_minutes
## 1 0.25 0 0
## fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1 0 32 986 2151
## weight_kg weight_pounds bmi_obese
## 1 133.5 294.3171 47.54
glimpse(bmi_daily3)
## Rows: 1
## Columns: 18
## $ id <dbl> 1927972279
## $ Date <date> 2016-04-13
## $ total_steps <int> 356
## $ total_distance <dbl> 0.25
## $ tracker_distance <dbl> 0.25
## $ logged_activities_distance <dbl> 0
## $ very_active_distance <dbl> 0
## $ moderately_active_distance <dbl> 0
## $ light_active_distance <dbl> 0.25
## $ sedentary_active_distance <dbl> 0
## $ very_active_minutes <int> 0
## $ fairly_active_minutes <int> 0
## $ lightly_active_minutes <int> 32
## $ sedentary_minutes <int> 986
## $ calories <int> 2151
## $ weight_kg <dbl> 133.5
## $ weight_pounds <dbl> 294.3171
## $ bmi_obese <dbl> 47.54
n_distinct(bmi_daily1$id)
## [1] 3
n_distinct(bmi_daily2$id)
## [1] 4
n_distinct(bmi_daily3$id)
## [1] 1
### Take a look at some summary statistics of bmi_daily1, bmi_daily2 and bmi_daily3
bmi_daily1 %>%
select(bmi_normal_weight, weight_kg, weight_pounds, bmi_normal_weight, total_steps, total_distance, very_active_minutes, fairly_active_minutes, lightly_active_minutes, sedentary_minutes, calories) %>%
summary()
## bmi_normal_weight weight_kg weight_pounds total_steps
## Min. :21.45 Min. :52.60 Min. :116.0 Min. : 1551
## 1st Qu.:23.89 1st Qu.:61.20 1st Qu.:134.9 1st Qu.: 6745
## Median :23.96 Median :61.40 Median :135.4 Median :10422
## Mean :23.80 Mean :60.76 Mean :134.0 Mean : 9984
## 3rd Qu.:24.10 3rd Qu.:61.70 3rd Qu.:136.0 3rd Qu.:12556
## Max. :24.39 Max. :62.50 Max. :137.8 Max. :20031
## total_distance very_active_minutes fairly_active_minutes
## Min. : 1.030 Min. : 0.00 Min. : 0.00
## 1st Qu.: 4.455 1st Qu.: 0.50 1st Qu.: 4.75
## Median : 6.890 Median :16.50 Median :15.00
## Mean : 6.698 Mean :22.47 Mean :18.12
## 3rd Qu.: 8.675 3rd Qu.:40.25 3rd Qu.:32.50
## Max. :13.240 Max. :62.00 Max. :42.00
## lightly_active_minutes sedentary_minutes calories
## Min. : 86.0 Min. : 127.0 Min. : 928
## 1st Qu.:214.2 1st Qu.: 637.5 1st Qu.:1851
## Median :267.5 Median : 693.0 Median :2030
## Mean :251.1 Mean : 683.6 Mean :1965
## 3rd Qu.:293.2 3rd Qu.: 728.8 3rd Qu.:2148
## Max. :371.0 Max. :1057.0 Max. :2571
bmi_daily2 %>%
select(bmi_overweight, weight_kg, weight_pounds, bmi_overweight, total_steps, total_distance, very_active_minutes, fairly_active_minutes, lightly_active_minutes, sedentary_minutes, calories) %>%
summary()
## bmi_overweight weight_kg weight_pounds total_steps
## Min. :25.14 Min. :69.10 Min. :152.3 Min. : 29
## 1st Qu.:25.43 1st Qu.:84.30 1st Qu.:185.8 1st Qu.:10622
## Median :25.56 Median :85.05 Median :187.5 Median :12608
## Mean :25.96 Mean :82.10 Mean :181.0 Mean :14720
## 3rd Qu.:26.01 3rd Qu.:85.42 3rd Qu.:188.3 3rd Qu.:20018
## Max. :28.00 Max. :90.70 Max. :200.0 Max. :29326
## total_distance very_active_minutes fairly_active_minutes
## Min. : 0.02 Min. : 0.00 Min. : 0.00
## 1st Qu.: 7.42 1st Qu.: 18.00 1st Qu.: 4.00
## Median : 8.94 Median : 65.00 Median : 8.00
## Mean :12.16 Mean : 58.72 Mean :10.66
## 3rd Qu.:18.14 3rd Qu.: 89.25 3rd Qu.:13.50
## Max. :26.72 Max. :200.00 Max. :37.00
## lightly_active_minutes sedentary_minutes calories
## Min. : 3.0 Min. : 525 Min. :1464
## 1st Qu.:212.8 1st Qu.:1064 1st Qu.:2594
## Median :226.5 Median :1112 Median :3466
## Mean :241.3 Mean :1088 Mean :3173
## 3rd Qu.:298.5 3rd Qu.:1150 3rd Qu.:3804
## Max. :429.0 Max. :1363 Max. :4552
bmi_daily3 %>%
select(bmi_obese, weight_kg, weight_pounds, bmi_obese, total_steps, total_distance, very_active_minutes, fairly_active_minutes, lightly_active_minutes, sedentary_minutes, calories) %>%
summary()
## bmi_obese weight_kg weight_pounds total_steps total_distance
## Min. :47.54 Min. :133.5 Min. :294.3 Min. :356 Min. :0.25
## 1st Qu.:47.54 1st Qu.:133.5 1st Qu.:294.3 1st Qu.:356 1st Qu.:0.25
## Median :47.54 Median :133.5 Median :294.3 Median :356 Median :0.25
## Mean :47.54 Mean :133.5 Mean :294.3 Mean :356 Mean :0.25
## 3rd Qu.:47.54 3rd Qu.:133.5 3rd Qu.:294.3 3rd Qu.:356 3rd Qu.:0.25
## Max. :47.54 Max. :133.5 Max. :294.3 Max. :356 Max. :0.25
## very_active_minutes fairly_active_minutes lightly_active_minutes
## Min. :0 Min. :0 Min. :32
## 1st Qu.:0 1st Qu.:0 1st Qu.:32
## Median :0 Median :0 Median :32
## Mean :0 Mean :0 Mean :32
## 3rd Qu.:0 3rd Qu.:0 3rd Qu.:32
## Max. :0 Max. :0 Max. :32
## sedentary_minutes calories
## Min. :986 Min. :2151
## 1st Qu.:986 1st Qu.:2151
## Median :986 Median :2151
## Mean :986 Mean :2151
## 3rd Qu.:986 3rd Qu.:2151
## Max. :986 Max. :2151
daily_bmi_activity <- merge(daily_activity, weight_log_info_new, by=c ('id', 'Date'), all = TRUE) %>%
drop_na() %>%
select(-tracker_distance)
options(repr.plot.width=30)
head(daily_bmi_activity)
## id Date total_steps total_distance logged_activities_distance
## 1 1503960366 2016-05-02 14727 9.71 0
## 2 1503960366 2016-05-03 15103 9.66 0
## 3 1927972279 2016-04-13 356 0.25 0
## 4 2873212765 2016-04-21 8859 5.98 0
## 5 2873212765 2016-05-12 7566 5.11 0
## 6 4319703577 2016-04-17 29 0.02 0
## very_active_distance moderately_active_distance light_active_distance
## 1 3.21 0.57 5.92
## 2 3.73 1.05 4.88
## 3 0.00 0.00 0.25
## 4 0.13 0.37 5.47
## 5 0.00 0.00 5.11
## 6 0.00 0.00 0.02
## sedentary_active_distance very_active_minutes fairly_active_minutes
## 1 0.00 41 15
## 2 0.00 50 24
## 3 0.00 0 0
## 4 0.01 2 10
## 5 0.00 0 0
## 6 0.00 0 0
## lightly_active_minutes sedentary_minutes calories weight_kg weight_pounds
## 1 277 798 2004 52.6 115.9631
## 2 254 816 1990 52.6 115.9631
## 3 32 986 2151 133.5 294.3171
## 4 371 1057 1970 56.7 125.0021
## 5 268 720 1431 57.3 126.3249
## 6 3 1363 1464 72.4 159.6147
## bmi
## 1 22.65
## 2 22.65
## 3 47.54
## 4 21.45
## 5 21.69
## 6 27.45
n_distinct(daily_bmi_activity$id)
## [1] 8
### Check to see if all the NAs are removed
sum(is.na(daily_bmi_activity))
## [1] 0
summary(daily_bmi_activity)
## id Date total_steps total_distance
## Min. :1.504e+09 Min. :2016-04-12 Min. : 29 Min. : 0.020
## 1st Qu.:6.962e+09 1st Qu.:2016-04-19 1st Qu.: 8477 1st Qu.: 5.945
## Median :6.962e+09 Median :2016-04-27 Median :11101 Median : 8.110
## Mean :7.009e+09 Mean :2016-04-26 Mean :12102 Mean : 9.211
## 3rd Qu.:8.878e+09 3rd Qu.:2016-05-04 3rd Qu.:14996 3rd Qu.: 9.710
## Max. :8.878e+09 Max. :2016-05-12 Max. :29326 Max. :26.720
## logged_activities_distance very_active_distance moderately_active_distance
## Min. :0.0000 Min. : 0.000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.: 0.450 1st Qu.:0.115
## Median :0.0000 Median : 1.770 Median :0.380
## Mean :0.1498 Mean : 3.758 Mean :0.651
## 3rd Qu.:0.0000 3rd Qu.: 4.095 3rd Qu.:0.990
## Max. :4.0817 Max. :21.660 Max. :2.390
## light_active_distance sedentary_active_distance very_active_minutes
## Min. : 0.020 Min. :0.000000 Min. : 0.00
## 1st Qu.: 3.725 1st Qu.:0.000000 1st Qu.: 7.50
## Median : 4.890 Median :0.000000 Median : 29.00
## Mean : 4.782 Mean :0.004776 Mean : 39.45
## 3rd Qu.: 5.870 3rd Qu.:0.000000 3rd Qu.: 63.00
## Max. :10.710 Max. :0.110000 Max. :200.00
## fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## Min. : 0.00 Min. : 3.0 Min. : 127.0 Min. : 928
## 1st Qu.: 4.00 1st Qu.:212.5 1st Qu.: 686.0 1st Qu.:1998
## Median :12.00 Median :235.0 Median : 837.0 Median :2174
## Mean :14.28 Mean :243.1 Mean : 881.4 Mean :2545
## 3rd Qu.:21.50 3rd Qu.:296.0 3rd Qu.:1105.0 3rd Qu.:3258
## Max. :42.00 Max. :429.0 Max. :1363.0 Max. :4552
## weight_kg weight_pounds bmi
## Min. : 52.60 Min. :116.0 Min. :21.45
## 1st Qu.: 61.40 1st Qu.:135.4 1st Qu.:23.96
## Median : 62.50 Median :137.8 Median :24.39
## Mean : 72.04 Mean :158.8 Mean :25.19
## 3rd Qu.: 85.05 3rd Qu.:187.5 3rd Qu.:25.56
## Max. :133.50 Max. :294.3 Max. :47.54
daily_bmi_activity <- daily_bmi_activity %>%
mutate(total_daily_active_hours=(very_active_minutes + fairly_active_minutes + lightly_active_minutes)/60)
colnames(daily_bmi_activity)
## [1] "id" "Date"
## [3] "total_steps" "total_distance"
## [5] "logged_activities_distance" "very_active_distance"
## [7] "moderately_active_distance" "light_active_distance"
## [9] "sedentary_active_distance" "very_active_minutes"
## [11] "fairly_active_minutes" "lightly_active_minutes"
## [13] "sedentary_minutes" "calories"
## [15] "weight_kg" "weight_pounds"
## [17] "bmi" "total_daily_active_hours"
head(daily_bmi_activity)
## id Date total_steps total_distance logged_activities_distance
## 1 1503960366 2016-05-02 14727 9.71 0
## 2 1503960366 2016-05-03 15103 9.66 0
## 3 1927972279 2016-04-13 356 0.25 0
## 4 2873212765 2016-04-21 8859 5.98 0
## 5 2873212765 2016-05-12 7566 5.11 0
## 6 4319703577 2016-04-17 29 0.02 0
## very_active_distance moderately_active_distance light_active_distance
## 1 3.21 0.57 5.92
## 2 3.73 1.05 4.88
## 3 0.00 0.00 0.25
## 4 0.13 0.37 5.47
## 5 0.00 0.00 5.11
## 6 0.00 0.00 0.02
## sedentary_active_distance very_active_minutes fairly_active_minutes
## 1 0.00 41 15
## 2 0.00 50 24
## 3 0.00 0 0
## 4 0.01 2 10
## 5 0.00 0 0
## 6 0.00 0 0
## lightly_active_minutes sedentary_minutes calories weight_kg weight_pounds
## 1 277 798 2004 52.6 115.9631
## 2 254 816 1990 52.6 115.9631
## 3 32 986 2151 133.5 294.3171
## 4 371 1057 1970 56.7 125.0021
## 5 268 720 1431 57.3 126.3249
## 6 3 1363 1464 72.4 159.6147
## bmi total_daily_active_hours
## 1 22.65 5.5500000
## 2 22.65 5.4666667
## 3 47.54 0.5333333
## 4 21.45 6.3833333
## 5 21.69 4.4666667
## 6 27.45 0.0500000
glimpse(daily_bmi_activity)
## Rows: 67
## Columns: 18
## $ id <dbl> 1503960366, 1503960366, 1927972279, 2873212…
## $ Date <date> 2016-05-02, 2016-05-03, 2016-04-13, 2016-0…
## $ total_steps <int> 14727, 15103, 356, 8859, 7566, 29, 10429, 8…
## $ total_distance <dbl> 9.71, 9.66, 0.25, 5.98, 5.11, 0.02, 7.02, 5…
## $ logged_activities_distance <dbl> 0.000000, 0.000000, 0.000000, 0.000000, 0.0…
## $ very_active_distance <dbl> 3.21, 3.73, 0.00, 0.13, 0.00, 0.00, 0.59, 0…
## $ moderately_active_distance <dbl> 0.57, 1.05, 0.00, 0.37, 0.00, 0.00, 0.58, 0…
## $ light_active_distance <dbl> 5.92, 4.88, 0.25, 5.47, 5.11, 0.02, 5.85, 4…
## $ sedentary_active_distance <dbl> 0.00, 0.00, 0.00, 0.01, 0.00, 0.00, 0.00, 0…
## $ very_active_minutes <int> 41, 50, 0, 2, 0, 0, 8, 14, 18, 0, 0, 6, 200…
## $ fairly_active_minutes <int> 15, 24, 0, 10, 0, 0, 13, 15, 10, 0, 0, 30, …
## $ lightly_active_minutes <int> 277, 254, 32, 371, 268, 3, 313, 331, 340, 1…
## $ sedentary_minutes <int> 798, 816, 986, 1057, 720, 1363, 1106, 1080,…
## $ calories <int> 2004, 1990, 2151, 1970, 1431, 1464, 2282, 2…
## $ weight_kg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, …
## $ weight_pounds <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126…
## $ bmi <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 2…
## $ total_daily_active_hours <dbl> 5.5500000, 5.4666667, 0.5333333, 6.3833333,…
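As a quick sanity check on the new total_daily_active_hours column, two of the rows printed by head() can be recomputed by hand. This is a minimal base-R sketch; the helper name `active_hours` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: convert the three active-minute columns to total hours.
active_hours <- function(very, fairly, lightly) {
  (very + fairly + lightly) / 60
}

# Minute values copied from rows 1 and 3 of head(daily_bmi_activity).
row1 <- active_hours(very = 41, fairly = 15, lightly = 277)
row3 <- active_hours(very = 0, fairly = 0, lightly = 32)

print(c(row1 = row1, row3 = row3))
```

Both results should agree with the total_daily_active_hours values shown for those rows.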
data_bmi_type <- daily_bmi_activity %>%
  # transmute() keeps one row per user-day and only the listed columns;
  # `.group = id` stores the user id in a column named `.group`
  transmute(
    bmi_type = factor(
      case_when(
        bmi >= 18.5 & bmi < 25 ~ 'normal_weight_bmi',
        bmi >= 25 & bmi < 30 ~ 'overweight_bmi',
        bmi >= 30 ~ 'obese_bmi'),
      levels = c('normal_weight_bmi', 'overweight_bmi', 'obese_bmi')),
    .group = id, calories, total_steps, very_active_minutes, fairly_active_minutes,
    lightly_active_minutes, sedentary_minutes, total_daily_active_hours) %>%
  # rows with a BMI below 18.5 get no category and are dropped here
  drop_na()
colnames(data_bmi_type)
## [1] "bmi_type" ".group"
## [3] "calories" "total_steps"
## [5] "very_active_minutes" "fairly_active_minutes"
## [7] "lightly_active_minutes" "sedentary_minutes"
## [9] "total_daily_active_hours"
head(data_bmi_type)
## bmi_type .group calories total_steps very_active_minutes
## 1 normal_weight_bmi 1503960366 2004 14727 41
## 2 normal_weight_bmi 1503960366 1990 15103 50
## 3 obese_bmi 1927972279 2151 356 0
## 4 normal_weight_bmi 2873212765 1970 8859 2
## 5 normal_weight_bmi 2873212765 1431 7566 0
## 6 overweight_bmi 4319703577 1464 29 0
## fairly_active_minutes lightly_active_minutes sedentary_minutes
## 1 15 277 798
## 2 24 254 816
## 3 0 32 986
## 4 10 371 1057
## 5 0 268 720
## 6 0 3 1363
## total_daily_active_hours
## 1 5.5500000
## 2 5.4666667
## 3 0.5333333
## 4 6.3833333
## 5 4.4666667
## 6 0.0500000
glimpse(data_bmi_type)
## Rows: 67
## Columns: 9
## $ bmi_type <fct> normal_weight_bmi, normal_weight_bmi, obese_b…
## $ .group <dbl> 1503960366, 1503960366, 1927972279, 287321276…
## $ calories <int> 2004, 1990, 2151, 1970, 1431, 1464, 2282, 211…
## $ total_steps <int> 14727, 15103, 356, 8859, 7566, 29, 10429, 894…
## $ very_active_minutes <int> 41, 50, 0, 2, 0, 0, 8, 14, 18, 0, 0, 6, 200, …
## $ fairly_active_minutes <int> 15, 24, 0, 10, 0, 0, 13, 15, 10, 0, 0, 30, 37…
## $ lightly_active_minutes <int> 277, 254, 32, 371, 268, 3, 313, 331, 340, 190…
## $ sedentary_minutes <int> 798, 816, 986, 1057, 720, 1363, 1106, 1080, 9…
## $ total_daily_active_hours <dbl> 5.5500000, 5.4666667, 0.5333333, 6.3833333, 4…
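BMI bands are easy to leave gaps in: if one band ends at 24.9 and the next starts at 25, a value such as 24.95 matches neither. The sketch below shows one way to avoid that with half-open intervals in base R. The function name `classify_bmi` and the 24.95 edge case are assumptions added for illustration; the other BMI values come from head(daily_bmi_activity).

```r
# Half-open bands: [18.5, 25) normal, [25, 30) overweight, [30, Inf) obese.
classify_bmi <- function(bmi) {
  cut(
    bmi,
    breaks = c(18.5, 25, 30, Inf),
    labels = c("normal_weight_bmi", "overweight_bmi", "obese_bmi"),
    right = FALSE
  )
}

# Four BMI values from the data, plus 24.95 as a made-up edge case.
bmi_values <- c(22.65, 47.54, 21.45, 27.45, 24.95)
bmi_type <- classify_bmi(bmi_values)

print(data.frame(bmi = bmi_values, bmi_type = bmi_type))
```

Values below 18.5 fall outside every band and come back as NA, so they would still need separate handling if underweight users were present.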
### In Boxplots
ggplot(data_bmi_type, aes(bmi_type, total_steps, fill= bmi_type )) +
geom_boxplot() +
theme(legend.position="none") +
labs(title="Figure 8: Daily Step Counts Accrued by Each BMI Type", x=NULL) +
theme(legend.position="none", text = element_text(size = 11), plot.title = element_text(hjust = 0.5))
ggplot(data_bmi_type, aes(bmi_type , calories, fill= bmi_type )) +
geom_boxplot() +
theme(legend.position="none") +
labs(title="Figure 9: Daily Calories Burnt by Each BMI Type", x=NULL) +
theme(legend.position="none", text = element_text(size = 11), plot.title = element_text(hjust = 0.5))
In these two plots, both the normal_weight_bmi and overweight_bmi users take about 10,000 or more steps per day, as suggested by the WHO. For the normal_weight_bmi users, the distributions of calories and total steps are similar: <50% of normal_weight_bmi users take >10,000 steps and burn >2,000 calories.
However, for the overweight_bmi users, the distributions of step counts and calories burnt contradict each other: >50% of overweight_bmi users take >11,000 steps a day, but <50% of them burn >3,500 calories. One would expect the two distributions to be similar, since accumulating more steps burns more calories. This raises the question of whether the amount of activity time, the intensity of the activity, or both have more effect on the number of calories burnt. Next, we will take a closer look at this by creating a boxplot of the total_daily_active_hours spent by each BMI type.
### Total daily hours spent on overall activities for each BMI type
ggplot(data_bmi_type, aes(bmi_type, total_daily_active_hours, fill= bmi_type )) +
geom_boxplot() +
theme(legend.position="none") +
labs(title="Figure 10: Daily Overall Activity Hours Spent by Each BMI Type", x=NULL) +
theme(legend.position="none", text = element_text(size = 11), plot.title = element_text(hjust = 0.5))
In this boxplot, the normal_weight_bmi users show consistent distributions of calories, total_steps, and total_daily_active_hours, with >50% of them burning <2,000 calories (Figure 9), taking <10,000 total_steps (Figure 8), and spending <5.5 total_daily_active_hours (Figure 10).
For the overweight_bmi users, the distribution of total_daily_active_hours is split about evenly above and below 5.5 hours. These observations do not account for the reverse distributions of total steps (Figure 8) and calories burnt (Figure 9). So, we’ll use bar plots to examine the average step counts, calories burnt, and overall activity hours.
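Statements such as ">50% of users take more than N steps" can be checked numerically rather than read off a boxplot. The sketch below uses made-up data (not the Fitbit dataset) and an assumed 10,000-step threshold to show the two quantities involved: the group median, which is the line inside each box, and the share of user-days above a threshold.

```r
# Made-up example data: one row per user-day.
toy <- data.frame(
  bmi_type = c("normal", "normal", "normal", "normal",
               "overweight", "overweight", "overweight", "overweight"),
  total_steps = c(7000, 8500, 9500, 15000, 3000, 11000, 12000, 16000)
)

# Median steps per group: the horizontal line inside each box.
medians <- tapply(toy$total_steps, toy$bmi_type, median)

# Share of user-days above the assumed 10,000-step threshold.
share_above <- tapply(toy$total_steps > 10000, toy$bmi_type, mean)

print(medians)
print(share_above)
```

The same two calls could be applied to data_bmi_type with calories or total_daily_active_hours in place of total_steps.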
### Daily step counts accrued by Each bmi type in Bar_plots
data_bmi_type %>%
group_by(bmi_type) %>%
summarise(avg_total_steps = mean(total_steps)) %>%
ggplot(aes(bmi_type, y=avg_total_steps, fill= bmi_type)) +
geom_col()+
theme(legend.position="none") +
labs(title="Figure 11: Daily Steps Accrued by each BMI Type", x=NULL) +
theme(legend.position="none", text = element_text(size = 11),plot.title = element_text(hjust = 0.5))
### Daily calories burnt by Each bmi type in Bar_plots
data_bmi_type %>%
group_by(bmi_type) %>%
summarise(avg_calories = mean(calories)) %>%
ggplot(aes(bmi_type, y=avg_calories, fill= bmi_type)) +
geom_col()+
theme(legend.position="none") +
labs(title="Figure 12: Daily Calories Burnt by each BMI Type", x=NULL) +
theme(legend.position="none", text = element_text(size = 11),plot.title = element_text(hjust = 0.5))
### Daily Hours Spent on Overall Activities by each bmi type in Bar_plots
data_bmi_type %>%
group_by(bmi_type) %>%
summarise(avg_total_daily_active_hours = mean(total_daily_active_hours)) %>%
ggplot(aes(bmi_type,y=avg_total_daily_active_hours, fill=bmi_type)) +
geom_col() +
theme(legend.position="none") +
labs(title="Figure 13: Daily Overall Activity Hours by Each BMI Type", x=NULL) +
theme(legend.position="none", text = element_text(size = 11),plot.title = element_text(hjust = 0.5))
Figure 11 clearly shows that the overweight_bmi users accrued ~30% more steps than the normal_weight_bmi users, up to a maximum of ~15,000 daily steps, which is above the recommended 7,000 to 10,000 steps, depending on age and sex. This could be one of the reasons that the overweight_bmi users burnt ~30% more calories than the normal_weight_bmi users (Figure 12), reaching >3,000 calories/day. While this amount is also above the recommended 1,600-2,400 calories per day for adult women and 2,000-3,000 for adult men, it still doesn’t explain the reverse distributions of users in step counts and calories burnt observed in Figures 8 and 9, respectively.
Figure 13 shows that the overweight_bmi users spent <30 min more in activity hours than the normal_weight_bmi users, but this still does not explain the reverse user distributions in Figures 8 and 9. So, we’ll do a more detailed analysis using grouped bar plots.
One striking observation from Figures 11-13 is that the obese_bmi user spent a total of only ~32 minutes on lightly_active activities but still burnt ~2,000 calories, whereas the normal_weight_bmi users had to spend ~5 activity hours to burn the same amount. One likely explanation is that the obese_bmi user carries a lot more weight. The extra weight carried by the obese_bmi user compared to the normal_weight_bmi and overweight_bmi users is 160 lb/73 kg and 113 lb/51 kg, respectively. This is similar to carrying an extra load of that size while performing any type of activity, and thus burning calories at a higher rate even during lightly_active activities.
The same might be partly true when comparing the overweight_bmi users with the normal_weight_bmi users, depending on whether both types of users performed exactly the same type of activities. Therefore, without taking into account a user’s weight and the time spent on each type of activity, the data might at first glance give the impression that lightly active activities are effective at burning calories.
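The effect of weight and activity time on the comparison can be made concrete by normalising the daily calorie totals in two different ways. The sketch below uses rows 1 and 3 of head(daily_bmi_activity); the column names `kcal_per_active_hour` and `kcal_per_kg` are hypothetical, and the daily calories figure is assumed to be a whole-day total, so the per-hour ratio is only a rough comparison.

```r
# Row 1 (normal_weight_bmi) and row 3 (obese_bmi) from head(daily_bmi_activity).
users <- data.frame(
  bmi_type     = c("normal_weight_bmi", "obese_bmi"),
  calories     = c(2004, 2151),
  active_hours = c(5.55, 32 / 60),
  weight_kg    = c(52.6, 133.5)
)

# Two rough normalisations of the same daily calorie totals.
users$kcal_per_active_hour <- users$calories / users$active_hours
users$kcal_per_kg          <- users$calories / users$weight_kg

print(users)
```

Per active hour the obese_bmi user appears to burn far more, while per kilogram of body weight the same user burns less than half as much, which is why weight needs to be considered alongside activity time.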
### Calculate the average daily hours spent on each type of activities
data_bmi_type_long <- data_bmi_type %>%
group_by(bmi_type) %>%
summarise(very_active = mean(very_active_minutes)/60, fairly_active = mean(fairly_active_minutes)/60, lightly_active = mean(lightly_active_minutes)/60, sedentary = mean(sedentary_minutes)/60) %>%
pivot_longer(cols = very_active:lightly_active, names_to = "Activity_Type", values_to = "Hours")
head(data_bmi_type_long)
## # A tibble: 6 × 4
## bmi_type sedentary Activity_Type Hours
## <fct> <dbl> <chr> <dbl>
## 1 normal_weight_bmi 11.4 very_active 0.375
## 2 normal_weight_bmi 11.4 fairly_active 0.302
## 3 normal_weight_bmi 11.4 lightly_active 4.18
## 4 overweight_bmi 18.1 very_active 0.979
## 5 overweight_bmi 18.1 fairly_active 0.178
## 6 overweight_bmi 18.1 lightly_active 4.02
glimpse(data_bmi_type_long)
## Rows: 9
## Columns: 4
## $ bmi_type <fct> normal_weight_bmi, normal_weight_bmi, normal_weight_bmi,…
## $ sedentary <dbl> 11.39363, 11.39363, 11.39363, 18.13958, 18.13958, 18.139…
## $ Activity_Type <chr> "very_active", "fairly_active", "lightly_active", "very_…
## $ Hours <dbl> 0.3745098, 0.3019608, 4.1843137, 0.9786458, 0.1776042, 4…
colnames(data_bmi_type_long)
## [1] "bmi_type" "sedentary" "Activity_Type" "Hours"
data_bmi_type_long
## # A tibble: 9 × 4
## bmi_type sedentary Activity_Type Hours
## <fct> <dbl> <chr> <dbl>
## 1 normal_weight_bmi 11.4 very_active 0.375
## 2 normal_weight_bmi 11.4 fairly_active 0.302
## 3 normal_weight_bmi 11.4 lightly_active 4.18
## 4 overweight_bmi 18.1 very_active 0.979
## 5 overweight_bmi 18.1 fairly_active 0.178
## 6 overweight_bmi 18.1 lightly_active 4.02
## 7 obese_bmi 16.4 very_active 0
## 8 obese_bmi 16.4 fairly_active 0
## 9 obese_bmi 16.4 lightly_active 0.533
### Compare the daily hours each type of BMI user spends on each type of activity.
ggplot(data_bmi_type_long, aes(x = bmi_type, y = Hours, fill = Activity_Type)) +
geom_col(position = "dodge") +
labs(title= "Figure 14: Comparing Daily Very_, Fairly_ and Lightly_ Active Hours")
### Create another data frame to include time spent on sedentary activities
data_bmi_type_long_all <- data_bmi_type %>%
group_by(bmi_type) %>%
summarise(very_active = mean(very_active_minutes)/60, fairly_active = mean(fairly_active_minutes)/60, lightly_active = mean(lightly_active_minutes)/60, sedentary = mean(sedentary_minutes)/60) %>%
pivot_longer(cols = very_active:sedentary, names_to = "Activity_Type", values_to = "Hours")
head(data_bmi_type_long_all)
## # A tibble: 6 × 3
## bmi_type Activity_Type Hours
## <fct> <chr> <dbl>
## 1 normal_weight_bmi very_active 0.375
## 2 normal_weight_bmi fairly_active 0.302
## 3 normal_weight_bmi lightly_active 4.18
## 4 normal_weight_bmi sedentary 11.4
## 5 overweight_bmi very_active 0.979
## 6 overweight_bmi fairly_active 0.178
glimpse(data_bmi_type_long_all)
## Rows: 12
## Columns: 3
## $ bmi_type <fct> normal_weight_bmi, normal_weight_bmi, normal_weight_bmi,…
## $ Activity_Type <chr> "very_active", "fairly_active", "lightly_active", "seden…
## $ Hours <dbl> 0.3745098, 0.3019608, 4.1843137, 11.3936275, 0.9786458, …
colnames(data_bmi_type_long_all)
## [1] "bmi_type" "Activity_Type" "Hours"
data_bmi_type_long_all
## # A tibble: 12 × 3
## bmi_type Activity_Type Hours
## <fct> <chr> <dbl>
## 1 normal_weight_bmi very_active 0.375
## 2 normal_weight_bmi fairly_active 0.302
## 3 normal_weight_bmi lightly_active 4.18
## 4 normal_weight_bmi sedentary 11.4
## 5 overweight_bmi very_active 0.979
## 6 overweight_bmi fairly_active 0.178
## 7 overweight_bmi lightly_active 4.02
## 8 overweight_bmi sedentary 18.1
## 9 obese_bmi very_active 0
## 10 obese_bmi fairly_active 0
## 11 obese_bmi lightly_active 0.533
## 12 obese_bmi sedentary 16.4
### How each type of BMI users spend their daily hours on each activity type, and sedentariness
ggplot(data_bmi_type_long_all, aes(x = bmi_type, y = Hours, fill = Activity_Type)) +
geom_col(position = "dodge") +
labs(title= "Figure 15: Comparing Various Daily Activity and Sedentary Time")
Figures 14-15 Analysis
Figures 14 and 15 clearly show that the overweight_bmi users spent >2x as much time on very_active activities as the normal_weight_bmi users, and thus burnt calories at a higher rate. This, together with the fact that overweight_bmi users carry more weight (on average, 47 lb/23 kg) than the normal_weight_bmi users, could explain why the overweight_bmi users burnt more calories (as explained above in the Figures 11-13 analysis).
This observation could also account for the reverse correlation observed in Step counts (Figure 8) and calories burnt (Figure 9). It’s because the overweight_bmi users spent more lightly_active time to accrue step counts and thus burnt calories at a lower rate.
This creates inconsistent sedentary hours among users. So, we’ll focus on the percentage of total daily activity hours that each type of BMI user spends on very_, fairly_ and lightly_active activities.
### Calculate percent daily activity time spent on very_, fairly_ and lightly_ active activities.
data_bmi_type_long_all_percent <- data_bmi_type %>%
group_by(bmi_type) %>%
summarise(percent_very_active = mean(very_active_minutes)*100/(mean(very_active_minutes) +
mean(fairly_active_minutes) +
mean(lightly_active_minutes)),
percent_fairly_active = mean(fairly_active_minutes)*100/(mean(very_active_minutes) +
mean(fairly_active_minutes) +
mean(lightly_active_minutes)) ,
percent_lightly_active = mean(lightly_active_minutes)*100/(mean(very_active_minutes) +
mean(fairly_active_minutes) +
mean(lightly_active_minutes))) %>%
pivot_longer(cols = percent_very_active:percent_lightly_active, names_to = "Percent_Activity_Type", values_to = "Percent")
colnames(data_bmi_type_long_all_percent)
## [1] "bmi_type" "Percent_Activity_Type" "Percent"
head(data_bmi_type_long_all_percent)
## # A tibble: 6 × 3
## bmi_type Percent_Activity_Type Percent
## <fct> <chr> <dbl>
## 1 normal_weight_bmi percent_very_active 7.70
## 2 normal_weight_bmi percent_fairly_active 6.21
## 3 normal_weight_bmi percent_lightly_active 86.1
## 4 overweight_bmi percent_very_active 18.9
## 5 overweight_bmi percent_fairly_active 3.43
## 6 overweight_bmi percent_lightly_active 77.7
glimpse(data_bmi_type_long_all_percent)
## Rows: 9
## Columns: 3
## $ bmi_type <fct> normal_weight_bmi, normal_weight_bmi, normal_wei…
## $ Percent_Activity_Type <chr> "percent_very_active", "percent_fairly_active", …
## $ Percent <dbl> 7.704720, 6.212182, 86.083098, 18.899618, 3.4298…
data_bmi_type_long_all_percent
## # A tibble: 9 × 3
## bmi_type Percent_Activity_Type Percent
## <fct> <chr> <dbl>
## 1 normal_weight_bmi percent_very_active 7.70
## 2 normal_weight_bmi percent_fairly_active 6.21
## 3 normal_weight_bmi percent_lightly_active 86.1
## 4 overweight_bmi percent_very_active 18.9
## 5 overweight_bmi percent_fairly_active 3.43
## 6 overweight_bmi percent_lightly_active 77.7
## 7 obese_bmi percent_very_active 0
## 8 obese_bmi percent_fairly_active 0
## 9 obese_bmi percent_lightly_active 100
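The percentages for the normal_weight_bmi group can be reproduced from the average daily hours printed by glimpse(data_bmi_type_long). This is a minimal base-R sketch of that arithmetic; the variable names `hours` and `percent` are introduced here for illustration.

```r
# Average daily hours for normal_weight_bmi, copied from glimpse(data_bmi_type_long).
hours <- c(
  very_active    = 0.3745098,
  fairly_active  = 0.3019608,
  lightly_active = 4.1843137
)

# Share of total active time spent at each intensity, in percent.
percent <- 100 * hours / sum(hours)

print(round(percent, 2))
```

Because each share is divided by the same total, the three percentages always sum to 100 within a BMI type.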
### Percent activity time each BMI type users spend on very, fairly and lightly active activities
ggplot(data_bmi_type_long_all_percent, aes(x = bmi_type, y = Percent, fill = Percent_Activity_Type)) +
geom_col(position = "dodge") +
labs(title = "Figure 16: Percent Time on Various Activity Intensities")
Figure 16 Analysis
In this figure, the overweight_bmi users spent ~20% of their activity time on very_active activities, whereas the normal_weight_bmi users spent <8% of their time on very_active activities and thus burnt calories at a lower rate. For lightly_active activities, the normal_weight_bmi users spent a somewhat larger share of their time (>85%) than the overweight_bmi users (~78%). This small increase in lightly_active time is not enough to make up for the calories burnt during even a short time on very_active activities. The figure also makes it clearer that the obese_bmi user spent 100% of activity time on lightly_active activities. This is not surprising, considering that the large extra body weight burns calories at a much higher rate.
```
In this Act Phase the main tasks are to:
Based on the insights discussed above in the Share and Analysis sections, Bellabeat is strongly encouraged to do the following:
The data’s age, size, measurement duration, sampling bias, and lack of information on age and ethnicity need improvement, as suggested below:
Collect more recent and reliable data over a longer duration - e.g. make data collection automatic for at least 6 months and store the data on the Bellabeat server for all registered users.
Make it mandatory for registered users to record their age, sex (if not female), and ethnicity so that Bellabeat can better advise users on their health and activity needs based on this information.
Require users to record their weight and height, along with waist and hip circumference, by attaching a photo of each reading, to be saved on the Bellabeat server. This would allow Bellabeat to determine users’ BMI and the possibility of abdominal adiposity for predicting cardiovascular disease risk.
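The two measures mentioned above can be derived directly from the recorded values: BMI is weight in kilograms divided by height in metres squared, and the waist-to-hip ratio is waist circumference divided by hip circumference. The sketch below is illustrative only; the function names and all input values are made up.

```r
# BMI = weight (kg) / height (m)^2
bmi_from <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

# Waist-to-hip ratio, both measurements in the same unit.
waist_hip_ratio <- function(waist_cm, hip_cm) {
  waist_cm / hip_cm
}

# Made-up measurements for a single hypothetical user.
bmi_example <- bmi_from(weight_kg = 70, height_m = 1.75)
whr_example <- waist_hip_ratio(waist_cm = 80, hip_cm = 100)

print(round(c(bmi = bmi_example, waist_hip_ratio = whr_example), 2))
```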
Lacking blood pressure data - Bellabeat should be able to register these data automatically like other health trackers and automatically save them. This is especially important for users with a history of high blood pressure, including some pregnant women since high blood pressure might be a sign of preeclampsia that requires a Bed Rest prescription.
Possible data bias: Bellabeat only collects data from users who can afford the device, which lacks inclusiveness and introduces bias into the data. To circumvent this, Bellabeat could run a campaign of 6 months or longer, recruiting volunteers to use the device and collecting data similar to that of paying users. At the same time, Bellabeat would make them aware of the technology and its advantages, and could consider marketing a simpler, lower-cost version to those who cannot afford the current one. Alternatively, Bellabeat could create a low-cost rental program that can also be marketed to clinics as a prescription for dangerously overweight patients.
Most users spend the majority of time on lightly_active activities and little or none on very_active activities. Since a recent study shows that one can significantly reduce the risk of developing cardiovascular diseases by doing at least moderate-intensity exercises, Bellabeat could alert users whose BMI and/or abdominal adiposity data are at risk to include some intense exercise in their daily schedule.
Many users did not record sleep and weight data. Some appear to have sleep time combined with sedentary time, ending up with a total of 24 hours when sedentary time and total activity time are summed. It’s likely that some forgot to wear the device to bed or were not aware of the importance of wearing it all day. To mitigate the lack of sleep and weight data, it’s recommended that Bellabeat do the following:
Clearly define sedentary time - e.g. does it include sleep time, or does it only cover waking time when no steps are accrued? The inconsistency of the sedentary time data makes it difficult to analyze and creates a false appearance that most users spend too much time being inactive.
Create short videos in various languages to show users what data are important to input for a healthy fitness program. For example, recommendations for the amount of daily sleep time, step counts, calorie counts, as well as their BMI and abdominal adiposity risk based on the data they entered.
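The 24-hour pattern described above can be detected by summing the four minute columns for each day and comparing the total with 1,440 minutes. The sketch below applies that check to the six rows printed by head(daily_bmi_activity) earlier in this report; the column names `tracked_minutes` and `fills_whole_day` are hypothetical.

```r
minutes_per_day <- 24 * 60

# Minute columns copied from head(daily_bmi_activity).
days <- data.frame(
  very_active_minutes    = c(41, 50, 0, 2, 0, 0),
  fairly_active_minutes  = c(15, 24, 0, 10, 0, 0),
  lightly_active_minutes = c(277, 254, 32, 371, 268, 3),
  sedentary_minutes      = c(798, 816, 986, 1057, 720, 1363)
)

# Total tracked minutes per day across all four categories.
days$tracked_minutes <- rowSums(days)

# TRUE when activity plus sedentary time fills the whole day,
# leaving no room for separately recorded sleep.
days$fills_whole_day <- days$tracked_minutes == minutes_per_day

print(days[, c("tracked_minutes", "fills_whole_day")])
```

In these six rows only the fourth day sums to exactly 1,440 minutes, which is the kind of record where sleep may have been counted as sedentary time.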
The BMI analysis suggests that users who record weight_log_info data (including all normal_weight, overweight, and obese bmi users) are those who care most about their health, compared to the average users analyzed. These users are observed to accrue more daily steps, burn more calories, and spend more time in activities, especially very_active ones. Therefore, it’s recommended that Bellabeat do the following:
Make recording the weight_log_info data mandatory so that the data are more reliable, since <30% of users registered these data in this dataset. Also, require all users to include waist and hip circumference measurements in the weight_log_info dataset.
Market Bellabeat’s ability to use weight_log_info data to alert users who are at risk of cardiovascular disease, and to give positive feedback to those whose weight_log_info data are in the normal range.