Bellabeat Leaf tracker
Business task: gain insight into how consumers use non-Bellabeat smart devices and apply these insights to one Bellabeat product.
Big data
The FitBit Fitness Tracker Data, made available by Mobius, was recommended by the cofounders. The dataset is stored on the Kaggle platform. It contains 17 CSV files that are available for download in a variety of formats, including narrow and wide formats for calories and steps per minute.
The FitBit Fitness Tracker Data is a publicly available dataset with a CC0: Public Domain designation, and it is hosted on Kaggle, a public platform for sharing datasets.
The data was obtained from people who responded to a survey distributed via Amazon Mechanical Turk between March 12 and May 12, 2016. Data from 30 consenting consumers were submitted to this dataset.
Reliability: Without knowing how the 30 people who
chose to submit their health data for this survey were selected, the
reliability of the data will be questionable. Random selection is
important in data analysis and if this was not done then there may be a
selection bias in the data.
Originality: A link to the original dataset was
provided on the Kaggle website and it showed the original data source on
Zenodo.org https://zenodo.org/record/53894#.Y2v1l-TMJPZ
Comprehensiveness: Gender is not specified. Bellabeat's
consumers are women, and the Fitbit dataset does not specify the gender
of the consumers in the dataset. A dataset of female consumers of
health-related smart devices would be more applicable to the business
task at hand.
Current: The data was collected in 2016.
Cited: The citation for the data source is Furberg, R.,
Brinton, J., Keating, M., & Ortiz, A. (2016). Crowd-sourced Fitbit
datasets 03.12.2016-05.12.2016 [Data set]. Zenodo. https://doi.org/10.5281/zenodo.53894
Overall, the dataset comes from an original, currently cited data source.
Install packages: tidyverse, skimr, and janitor
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("skimr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
Loading packages
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(skimr)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
R was used to process the data. As a first data cleaning step, data frames were created for a) sleep activity, b) day sleep, c) weight log, d) hourly calories, and e) daily calories.
sleepactivity_df <- read_csv("minuteSleep_merged.csv")
## Rows: 188521 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): date
## dbl (3): Id, value, logId
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepday <- read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weightlog <- read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hourlycalories <- read_csv("hourlyCalories_merged.csv")
## Rows: 22099 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, Calories
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dailycalories <- read_csv("dailyCalories_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The head(), str(), colnames(), and glimpse() functions were used to inspect all 5 data frames. The output of head(), colnames(), and glimpse() is displayed below.
head(dailycalories)
## # A tibble: 6 × 3
## Id ActivityDay Calories
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
head(hourlycalories)
## # A tibble: 6 × 3
## Id ActivityHour Calories
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 81
## 2 1503960366 4/12/2016 1:00:00 AM 61
## 3 1503960366 4/12/2016 2:00:00 AM 59
## 4 1503960366 4/12/2016 3:00:00 AM 47
## 5 1503960366 4/12/2016 4:00:00 AM 48
## 6 1503960366 4/12/2016 5:00:00 AM 48
head(sleepactivity_df)
## # A tibble: 6 × 4
## Id date value logId
## <dbl> <chr> <dbl> <dbl>
## 1 1503960366 4/12/2016 2:47:30 AM 3 11380564589
## 2 1503960366 4/12/2016 2:48:30 AM 2 11380564589
## 3 1503960366 4/12/2016 2:49:30 AM 1 11380564589
## 4 1503960366 4/12/2016 2:50:30 AM 1 11380564589
## 5 1503960366 4/12/2016 2:51:30 AM 1 11380564589
## 6 1503960366 4/12/2016 2:52:30 AM 1 11380564589
head(sleepday)
## # A tibble: 6 × 5
## Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalT…¹
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 1 327 346
## 2 1503960366 4/13/2016 12:00:00 AM 2 384 407
## 3 1503960366 4/15/2016 12:00:00 AM 1 412 442
## 4 1503960366 4/16/2016 12:00:00 AM 2 340 367
## 5 1503960366 4/17/2016 12:00:00 AM 1 700 712
## 6 1503960366 4/19/2016 12:00:00 AM 1 304 320
## # … with abbreviated variable name ¹TotalTimeInBed
head(weightlog)
## # A tibble: 6 × 8
## Id Date WeightKg Weight…¹ Fat BMI IsMan…² LogId
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
## 1 1503960366 5/2/2016 11:59:59 PM 52.6 116. 22 22.6 TRUE 1.46e12
## 2 1503960366 5/3/2016 11:59:59 PM 52.6 116. NA 22.6 TRUE 1.46e12
## 3 1927972279 4/13/2016 1:08:52 AM 134. 294. NA 47.5 FALSE 1.46e12
## 4 2873212765 4/21/2016 11:59:59 PM 56.7 125. NA 21.5 TRUE 1.46e12
## 5 2873212765 5/12/2016 11:59:59 PM 57.3 126. NA 21.7 TRUE 1.46e12
## 6 4319703577 4/17/2016 11:59:59 PM 72.4 160. 25 27.5 TRUE 1.46e12
## # … with abbreviated variable names ¹WeightPounds, ²IsManualReport
colnames(dailycalories)
## [1] "Id" "ActivityDay" "Calories"
colnames(hourlycalories)
## [1] "Id" "ActivityHour" "Calories"
colnames(sleepactivity_df)
## [1] "Id" "date" "value" "logId"
colnames(sleepday)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
colnames(weightlog)
## [1] "Id" "Date" "WeightKg" "WeightPounds"
## [5] "Fat" "BMI" "IsManualReport" "LogId"
glimpse(dailycalories)
## Rows: 940
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366…
## $ ActivityDay <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/2016", "4/16/…
## $ Calories <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1775…
glimpse(hourlycalories)
## Rows: 22,099
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityHour <chr> "4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/20…
## $ Calories <dbl> 81, 61, 59, 47, 48, 48, 48, 47, 68, 141, 99, 76, 73, 66, …
glimpse(sleepactivity_df)
## Rows: 188,521
## Columns: 4
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503…
## $ date <chr> "4/12/2016 2:47:30 AM", "4/12/2016 2:48:30 AM", "4/12/2016 2:49:…
## $ value <dbl> 3, 2, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 2, 1, 1, 1, 1, 1, 1…
## $ logId <dbl> 11380564589, 11380564589, 11380564589, 11380564589, 11380564589,…
glimpse(sleepday)
## Rows: 413
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150…
## $ SleepDay <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "…
## $ TotalSleepRecords <dbl> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2…
## $ TotalTimeInBed <dbl> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3…
glimpse(weightlog)
## Rows: 67
## Columns: 8
## $ Id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212…
## $ Date <chr> "5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM", "4/13/2…
## $ WeightKg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, …
## $ WeightPounds <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159.6…
## $ Fat <dbl> 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BMI <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25,…
## $ IsManualReport <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
## $ LogId <dbl> 1.462234e+12, 1.462320e+12, 1.460510e+12, 1.461283e+12,…
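The glimpse() output above shows that every date column was read as character (`<chr>`). A minimal sketch of one way to convert them, using base R on two toy data frames whose rows are copied from the head() output. The `_demo` names are hypothetical, and the UTC time zone is an assumption, since the dataset's time zone is not stated in this report.

```r
# Toy rows copied from the head() output above; the real data frames have more rows.
dailycalories_demo <- data.frame(
  Id = c(1503960366, 1503960366),
  ActivityDay = c("4/12/2016", "4/13/2016"),
  Calories = c(1985, 1797)
)
hourlycalories_demo <- data.frame(
  Id = c(1503960366, 1503960366),
  ActivityHour = c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM"),
  Calories = c(81, 61)
)

# Month/day/year strings become Date values.
dailycalories_demo$ActivityDay <- as.Date(
  dailycalories_demo$ActivityDay,
  format = "%m/%d/%Y"
)

# 12-hour timestamps with AM/PM become POSIXct values.
# tz = "UTC" is an assumption; %p relies on the locale recognising "AM"/"PM".
hourlycalories_demo$ActivityHour <- as.POSIXct(
  hourlycalories_demo$ActivityHour,
  format = "%m/%d/%Y %I:%M:%S %p",
  tz = "UTC"
)

str(dailycalories_demo)
str(hourlycalories_demo)
```

With proper date types, rows can be sorted, filtered by date range, or grouped by weekday without string handling.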
n_distinct(dailycalories$Id)
## [1] 33
n_distinct(hourlycalories$Id)
## [1] 33
n_distinct(sleepactivity_df$Id)
## [1] 24
n_distinct(sleepday$Id)
## [1] 24
n_distinct(weightlog$Id)
## [1] 8
The weight log data was collected for only 8 participants, while the other data was collected from 24 or 33 participants. This suggests that the weight-logging feature of the smart device is used far less than its sleep and activity tracking.
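The participant counts can also be gathered into one small table so the gaps are easier to compare. A minimal sketch in base R; the `ids` list uses made-up Id values as stand-ins for the `Id` columns of the real data frames, so the numbers it prints are illustrative only.

```r
# Hypothetical Id vectors standing in for the Id columns of the real data frames.
ids <- list(
  dailycalories = c(1, 2, 3, 4, 4),
  sleepday      = c(1, 2, 2, 3),
  weightlog     = c(1, 1)
)

# Distinct participants per table (base R equivalent of n_distinct()).
participants <- sapply(ids, function(x) length(unique(x)))

# Share of participants in each table relative to the largest table.
coverage <- data.frame(
  table = names(participants),
  participants = unname(participants),
  share = round(unname(participants) / max(participants), 2)
)
print(coverage)
```

Applied to the counts reported above (33, 24, and 8), the same calculation gives shares of roughly 0.73 for the sleep data and 0.24 for the weight log.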
sleepday_weight <- merge(sleepday, weightlog, by=c('Id'))
head(sleepday_weight)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 5/8/2016 12:00:00 AM 1 594
## 2 1503960366 5/8/2016 12:00:00 AM 1 594
## 3 1503960366 5/7/2016 12:00:00 AM 1 331
## 4 1503960366 5/7/2016 12:00:00 AM 1 331
## 5 1503960366 4/26/2016 12:00:00 AM 1 245
## 6 1503960366 4/26/2016 12:00:00 AM 1 245
## TotalTimeInBed Date WeightKg WeightPounds Fat BMI
## 1 611 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 2 611 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## 3 349 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 4 349 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## 5 274 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 6 274 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## IsManualReport LogId
## 1 TRUE 1.462320e+12
## 2 TRUE 1.462234e+12
## 3 TRUE 1.462320e+12
## 4 TRUE 1.462234e+12
## 5 TRUE 1.462320e+12
## 6 TRUE 1.462234e+12
n_distinct(sleepday_weight$Id)
## [1] 6
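In the merged output above, each sleep day appears once per weight entry, because merging on `Id` alone pairs every sleep row with every weight row of the same participant. A minimal sketch of one alternative, summarising each table to one row per participant before merging. The toy data frames and the choice of the mean as the summary are assumptions for illustration, not the method used in this report.

```r
# Toy data: participant 1 has 3 sleep days and 2 weight entries;
# participant 2 has sleep data only; participant 3 has weight data only.
sleep_demo <- data.frame(
  Id = c(1, 1, 1, 2),
  TotalMinutesAsleep = c(300, 360, 420, 400)
)
weight_demo <- data.frame(
  Id = c(1, 1, 3),
  WeightPounds = c(116, 118, 160)
)

# Merging on Id alone pairs every sleep row with every weight row of that Id.
crossed <- merge(sleep_demo, weight_demo, by = "Id")
nrow(crossed)  # 3 sleep rows x 2 weight rows for participant 1

# Summarising to one row per participant first avoids the duplication.
sleep_avg  <- aggregate(TotalMinutesAsleep ~ Id, data = sleep_demo, FUN = mean)
weight_avg <- aggregate(WeightPounds ~ Id, data = weight_demo, FUN = mean)

# Inner join: only participants present in both tables are kept.
per_participant <- merge(sleep_avg, weight_avg, by = "Id")
per_participant
```

The count of distinct participants is the same either way, but the summarised table has one row per participant, which keeps later averages from being weighted by the number of weight entries.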
sleep_weight <- read_csv("sleep_weight_merged.csv")
## Rows: 6 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (3): ID, Weight, MinutesAsleep
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
colnames(sleep_weight)
## [1] "ID" "Weight" "MinutesAsleep"
glimpse(sleep_weight)
## Rows: 6
## Columns: 3
## $ ID <dbl> 1503960366, 1927972279, 4319703577, 4558609924, 55771503…
## $ Weight <dbl> 115.9631, 294.3171, 159.5045, 153.5299, 199.9593, 135.70…
## $ MinutesAsleep <dbl> 360.2800, 417.0000, 476.6538, 127.6000, 432.0000, 448.00…
Summary analyses showed that the weight log entries came from 8 participants, while the data for minutes of sleep per day and sleep activity came from 24 participants. Merging the data to include only participants who both entered weight data and had sleep information recorded left 6 participants. This sample is too small to support any generalizable conclusions about sleep and weight.
The main finding is that weight log data was gathered for less than 1/4 of the participants (8 of the 33 participants in the calorie data). Why is this? Further research was conducted to see how the weight log data for FitBit is collected. I found that this data has to be manually entered by the consumer (https://community.fitbit.com/t5/Flex-Flex-2/How-does-your-fitbit-track-your-weight/td-p/1811894).
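The weight log also carries an `IsManualReport` column, which offers a way to check how entries were recorded. A minimal sketch that uses only the six rows shown in the head(weightlog) output; `weightlog_demo` is a hypothetical name, and the full 67-row data frame may show a different split.

```r
# The Id and IsManualReport values of the six rows shown in head(weightlog).
weightlog_demo <- data.frame(
  Id = c(1503960366, 1503960366, 1927972279,
         2873212765, 2873212765, 4319703577),
  IsManualReport = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Count of entries flagged as manual (TRUE) versus not manual (FALSE).
manual_counts <- table(weightlog_demo$IsManualReport)
print(manual_counts)

# Proportion of entries flagged as manual.
manual_share <- mean(weightlog_demo$IsManualReport)
manual_share
```

In these six rows, five entries are flagged as manual and one is not. What a FALSE value means in practice (for example, a reading synced from another device) is not stated in this report and would need to be confirmed.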
Key Findings
Manual entry of weight information leads to missing data with the FitBit tracker, because few consumers use this feature. Manual entry may deter consumers, especially if it is time-consuming or they are not meeting their health goals. Weight measurement is often used as a key marker of health and fitness. This is debatable given the recent trend towards focusing on health instead of weight (https://www.franchisehelp.com/industry-reports/weight-loss-industry-analysis-2020-cost-trends/). However, weight measurement remains a key part of health-based smart devices (https://www.health.harvard.edu/staying-healthy/wearable-fitness-trackers-may-aid-weight-loss-efforts). Therefore, if consumers can link their activities to weight goals without manual entry, they may get more utility out of the devices. Automating weight measurement could make weight data more reliable and more useful for finding relationships among all the data measured by the smart devices.
Bellabeat can include a weight measurement feature in its app. The following steps can be used to guide implementation.