Founded in 2013, BellaBeat is known as a high-tech wellness company with multiple health-focused smart products for women. Its app and products collect data on activity, sleep, stress, hydration level and reproductive health to empower women with knowledge about their own health and habits.
Their signature products include:
○ Bellabeat app: The app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. Users can use this data to better understand their current habits and make healthy decisions. The app connects to their line of smart wellness products.
○ Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
○ Time: The timeless look of a classic timepiece is combined with smart technology to track user activity, sleep, and stress. The Time watch also connects to the Bellabeat app to provide you with insights into your daily wellness.
○ Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that users are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
○ Bellabeat membership: A subscription-based membership program gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
○ Analyse smart product usage data.
○ Provide high quality recommendation for Bellabeat’s marketing strategy.
○ What are some trends in smart device usage?
○ How could these trends apply to Bellabeat customers?
○ How could these trends help influence Bellabeat’s marketing strategy?
○ Urška Sršen: Bellabeat’s co-founder and Chief Creative Officer.
○ Sando Mur: Mathematician and Bellabeat’s cofounder; a key member of the Bellabeat executive team
○ Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.
Data is available through the public domain on Kaggle (Dataset made available through Mobius): Fitbit Fitness Tracker Data
The data is collected from 30 fitbit users. Thirty eligible Bellabeat users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits. All the separate 18 files are in csv format.
○ Reliable - LOW - Lack of information on the margin of error and the usage of a small sample size will affect the amount of analysis that can be done.
○ Originality - LOW - These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk between (Third-party provider).
○ Comprehensive - LOW - There is no information about the participants, such as gender, age, location, health state, etc. This could mean that data is not randomized and could pose bias in the analysis.
○ Current - LOW - The data set is collected in 2016 which is already outdated and not relevant to the current trend in smart device usage.
○ Cited - LOW - the data set was generated by the third-party provider Amazon Mechanical Turk which operates as the crowdsourcing website for businesses to hire remote personnel to do the survey collection. This is not clear whether they provide a credible source.
With the low integrity and credibility assessment on the data set, it is clearly not a good quality data source to provide reliable insights to Bellabeat. Therefore, I would recommend treating the following analysis based on this data set as the starting point. Further recommendations should be verified and produced through an analysis of a larger and more reliable dataset.
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.5 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.0.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(janitor)
## Warning: package 'janitor' was built under R version 4.1.2
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggplot2)
library(tidyr)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
The files were imported into R Studio and the data frames were created with simplified names.
daily_activity <- read_csv("dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Daily_Calories <- read_csv("dailyCalories_merged.csv")
## Rows: 940 Columns: 3
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Daily_Intensities <- read_csv("dailyIntensities_merged.csv")
## Rows: 940 Columns: 10
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityDay
## dbl (9): Id, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, Ve...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Daily_Steps <- read_csv("dailySteps_merged.csv")
## Rows: 940 Columns: 3
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, StepTotal
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Heart_rate_seconds <- read_csv("heartrate_seconds_merged.csv")
## Rows: 2483658 Columns: 3
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): Time
## dbl (2): Id, Value
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Minute_METs <- read_csv("minuteMETsNarrow_merged.csv")
## Rows: 1325580 Columns: 3
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, METs
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Sleep_Day <- read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
Weight_log_info <- read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(daily_activity)
## # A tibble: 6 x 15
## Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie~
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 13162 8.5 8.5 0
## 2 1503960366 4/13/2016 10735 6.97 6.97 0
## 3 1503960366 4/14/2016 10460 6.74 6.74 0
## 4 1503960366 4/15/2016 9762 6.28 6.28 0
## 5 1503960366 4/16/2016 12669 8.16 8.16 0
## 6 1503960366 4/17/2016 9705 6.48 6.48 0
## # ... with 9 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>
glimpse(daily_activity)
## Rows: 940
## Columns: 15
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityDate <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/~
## $ TotalSteps <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 13019~
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ VeryActiveMinutes <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
## $ FairlyActiveMinutes <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
## $ LightlyActiveMinutes <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
## $ SedentaryMinutes <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
## $ Calories <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203~
colnames(daily_activity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
head(Daily_Calories)
## # A tibble: 6 x 3
## Id ActivityDay Calories
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
glimpse(Daily_Calories)
## Rows: 940
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366~
## $ ActivityDay <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/2016", "4/16/~
## $ Calories <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1775~
colnames(Daily_Calories)
## [1] "Id" "ActivityDay" "Calories"
head(Daily_Intensities)
## # A tibble: 6 x 10
## Id ActivityDay SedentaryMinutes LightlyActiveMinutes FairlyActiveMinu~
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 728 328 13
## 2 1503960366 4/13/2016 776 217 19
## 3 1503960366 4/14/2016 1218 181 11
## 4 1503960366 4/15/2016 726 209 34
## 5 1503960366 4/16/2016 773 221 10
## 6 1503960366 4/17/2016 539 164 20
## # ... with 5 more variables: VeryActiveMinutes <dbl>,
## # SedentaryActiveDistance <dbl>, LightActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, VeryActiveDistance <dbl>
glimpse(Daily_Intensities)
## Rows: 940
## Columns: 10
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036~
## $ ActivityDay <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/~
## $ SedentaryMinutes <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
## $ LightlyActiveMinutes <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
## $ FairlyActiveMinutes <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
## $ VeryActiveMinutes <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
colnames(Daily_Intensities)
## [1] "Id" "ActivityDay"
## [3] "SedentaryMinutes" "LightlyActiveMinutes"
## [5] "FairlyActiveMinutes" "VeryActiveMinutes"
## [7] "SedentaryActiveDistance" "LightActiveDistance"
## [9] "ModeratelyActiveDistance" "VeryActiveDistance"
head(Daily_Steps)
## # A tibble: 6 x 3
## Id ActivityDay StepTotal
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
glimpse(Daily_Steps)
## Rows: 940
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366~
## $ ActivityDay <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/2016", "4/16/~
## $ StepTotal <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506, 1054~
colnames(Daily_Steps)
## [1] "Id" "ActivityDay" "StepTotal"
head(Heart_rate_seconds)
## # A tibble: 6 x 3
## Id Time Value
## <dbl> <chr> <dbl>
## 1 2022484408 4/12/2016 7:21:00 AM 97
## 2 2022484408 4/12/2016 7:21:05 AM 102
## 3 2022484408 4/12/2016 7:21:10 AM 105
## 4 2022484408 4/12/2016 7:21:20 AM 103
## 5 2022484408 4/12/2016 7:21:25 AM 101
## 6 2022484408 4/12/2016 7:22:05 AM 95
glimpse(Heart_rate_seconds)
## Rows: 2,483,658
## Columns: 3
## $ Id <dbl> 2022484408, 2022484408, 2022484408, 2022484408, 2022484408, 2022~
## $ Time <chr> "4/12/2016 7:21:00 AM", "4/12/2016 7:21:05 AM", "4/12/2016 7:21:~
## $ Value <dbl> 97, 102, 105, 103, 101, 95, 91, 93, 94, 93, 92, 89, 83, 61, 60, ~
colnames(Heart_rate_seconds)
## [1] "Id" "Time" "Value"
head(Minute_METs)
## # A tibble: 6 x 3
## Id ActivityMinute METs
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 10
## 2 1503960366 4/12/2016 12:01:00 AM 10
## 3 1503960366 4/12/2016 12:02:00 AM 10
## 4 1503960366 4/12/2016 12:03:00 AM 10
## 5 1503960366 4/12/2016 12:04:00 AM 10
## 6 1503960366 4/12/2016 12:05:00 AM 12
glimpse(Minute_METs)
## Rows: 1,325,580
## Columns: 3
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960~
## $ ActivityMinute <chr> "4/12/2016 12:00:00 AM", "4/12/2016 12:01:00 AM", "4/12~
## $ METs <dbl> 10, 10, 10, 10, 10, 12, 12, 12, 12, 12, 12, 12, 10, 10,~
colnames(Minute_METs)
## [1] "Id" "ActivityMinute" "METs"
head(Sleep_Day)
## # A tibble: 6 x 5
## Id SleepDay TotalSleepRecor~ TotalMinutesAsl~ TotalTimeInBed
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 1 327 346
## 2 1503960366 4/13/2016 12:00:00 AM 2 384 407
## 3 1503960366 4/15/2016 12:00:00 AM 1 412 442
## 4 1503960366 4/16/2016 12:00:00 AM 2 340 367
## 5 1503960366 4/17/2016 12:00:00 AM 1 700 712
## 6 1503960366 4/19/2016 12:00:00 AM 1 304 320
glimpse(Sleep_Day)
## Rows: 413
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150~
## $ SleepDay <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "~
## $ TotalSleepRecords <dbl> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ TotalMinutesAsleep <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2~
## $ TotalTimeInBed <dbl> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3~
colnames(Sleep_Day)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
head(Weight_log_info)
## # A tibble: 6 x 8
## Id Date WeightKg WeightPounds Fat BMI IsManualReport LogId
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
## 1 1503960366 5/2/2016~ 52.6 116. 22 22.6 TRUE 1.46e12
## 2 1503960366 5/3/2016~ 52.6 116. NA 22.6 TRUE 1.46e12
## 3 1927972279 4/13/201~ 134. 294. NA 47.5 FALSE 1.46e12
## 4 2873212765 4/21/201~ 56.7 125. NA 21.5 TRUE 1.46e12
## 5 2873212765 5/12/201~ 57.3 126. NA 21.7 TRUE 1.46e12
## 6 4319703577 4/17/201~ 72.4 160. 25 27.5 TRUE 1.46e12
glimpse(Weight_log_info)
## Rows: 67
## Columns: 8
## $ Id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212~
## $ Date <chr> "5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM", "4/13/2~
## $ WeightKg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, ~
## $ WeightPounds <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159.6~
## $ Fat <dbl> 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, NA,~
## $ BMI <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25,~
## $ IsManualReport <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, ~
## $ LogId <dbl> 1.462234e+12, 1.462320e+12, 1.460510e+12, 1.461283e+12,~
colnames(Weight_log_info)
## [1] "Id" "Date" "WeightKg" "WeightPounds"
## [5] "Fat" "BMI" "IsManualReport" "LogId"
From a quick overview of column names, the daily_activity data frame appears to incorporate data from Daily_Calories, Daily_Intensities, Daily_Steps data frames. All 4 data frames contain ‘Id’ columns which we can use to compare and determine whether the same number of observations match those Ids.
The sqldf package is loaded to determine whether the values of Daily_Calories, Daily_Intensities, and Daily_Steps are contained in daily_activity using SQL syntax. However, in order to compare those data frames, it is required that the number of columns in the data frames must be the same. Therefore, a temporary data frame with the important columns would be created first.
library(sqldf)
## Warning: package 'sqldf' was built under R version 4.1.2
## Loading required package: gsubfn
## Warning: package 'gsubfn' was built under R version 4.1.2
## Loading required package: proto
## Warning: package 'proto' was built under R version 4.1.2
## Loading required package: RSQLite
## Warning: package 'RSQLite' was built under R version 4.1.2
daily_activity_2 <- daily_activity %>%
select(Id,ActivityDate,Calories)
head(daily_activity_2)
## # A tibble: 6 x 3
## Id ActivityDate Calories
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
Check_Activity_Calories <- sqldf('SELECT * FROM daily_activity_2 INTERSECT SELECT * FROM Daily_Calories')
head(Check_Activity_Calories)
## Id ActivityDate Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
nrow(Check_Activity_Calories)
## [1] 940
daily_activity_3 <- daily_activity %>%
select(Id, ActivityDate,SedentaryMinutes,LightlyActiveMinutes,FairlyActiveMinutes,VeryActiveMinutes,SedentaryActiveDistance,LightActiveDistance,ModeratelyActiveDistance,VeryActiveDistance)
head(daily_activity_3)
## # A tibble: 6 x 10
## Id ActivityDate SedentaryMinutes LightlyActiveMinutes FairlyActiveMin~
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 728 328 13
## 2 1503960366 4/13/2016 776 217 19
## 3 1503960366 4/14/2016 1218 181 11
## 4 1503960366 4/15/2016 726 209 34
## 5 1503960366 4/16/2016 773 221 10
## 6 1503960366 4/17/2016 539 164 20
## # ... with 5 more variables: VeryActiveMinutes <dbl>,
## # SedentaryActiveDistance <dbl>, LightActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, VeryActiveDistance <dbl>
Check_Activity_Intensities <- sqldf('SELECT * FROM daily_activity_3 INTERSECT SELECT * FROM Daily_Intensities')
head(Check_Activity_Intensities)
## Id ActivityDate SedentaryMinutes LightlyActiveMinutes
## 1 1503960366 4/12/2016 728 328
## 2 1503960366 4/13/2016 776 217
## 3 1503960366 4/14/2016 1218 181
## 4 1503960366 4/15/2016 726 209
## 5 1503960366 4/16/2016 773 221
## 6 1503960366 4/17/2016 539 164
## FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## 1 13 25 0
## 2 19 21 0
## 3 11 30 0
## 4 34 29 0
## 5 10 36 0
## 6 20 38 0
## LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
## 1 6.06 0.55 1.88
## 2 4.71 0.69 1.57
## 3 3.91 0.40 2.44
## 4 2.83 1.26 2.14
## 5 5.04 0.41 2.71
## 6 2.51 0.78 3.19
nrow(Check_Activity_Intensities)
## [1] 940
daily_activity_4 <- daily_activity %>%
select(Id,ActivityDate,TotalSteps)
head(daily_activity_4)
## # A tibble: 6 x 3
## Id ActivityDate TotalSteps
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
Check_Activity_Steps <- sqldf('SELECT * FROM daily_activity_4 INTERSECT SELECT * FROM Daily_Steps')
head(Check_Activity_Steps)
## Id ActivityDate TotalSteps
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
nrow(Check_Activity_Steps)
## [1] 940
nrow(daily_activity[duplicated(daily_activity),])
## [1] 0
0 duplicate row was found.
colSums(daily_activity != 0)
## Id ActivityDate TotalSteps
## 940 940 863
## TotalDistance TrackerDistance LoggedActivitiesDistance
## 862 862 32
## VeryActiveDistance ModeratelyActiveDistance LightActiveDistance
## 527 554 855
## SedentaryActiveDistance VeryActiveMinutes FairlyActiveMinutes
## 82 531 556
## LightlyActiveMinutes SedentaryMinutes Calories
## 856 939 936
The daily_activity data frame appears to contain a lot of cell equal to ‘0’. In order to prevent skewed results, the filter() function will be used to omit those zero cells.
daily_activity_new <- daily_activity %>%
filter(TotalSteps !=0)
daily_activity_new <- daily_activity %>%
filter(TotalDistance != 0)
Sleep_Day data frame has the column SleepDay with combined data of Date and Time. The function ‘separate()’ will be used to divide the data into 2 columns to use later in analysis.
Sleep_Day_New <- Sleep_Day %>%
separate(SleepDay,c("Date","Time")," ")
## Warning: Expected 2 pieces. Additional pieces discarded in 413 rows [1, 2, 3, 4,
## 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].
head(Sleep_Day_New)
## # A tibble: 6 x 6
## Id Date Time TotalSleepRecor~ TotalMinutesAsl~ TotalTimeInBed
## <dbl> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:00 1 327 346
## 2 1503960366 4/13/2016 12:00:00 2 384 407
## 3 1503960366 4/15/2016 12:00:00 1 412 442
## 4 1503960366 4/16/2016 12:00:00 2 340 367
## 5 1503960366 4/17/2016 12:00:00 1 700 712
## 6 1503960366 4/19/2016 12:00:00 1 304 320
nrow(Sleep_Day_New[duplicated(Sleep_Day_New),])
## [1] 3
3 duplicate rows were found and should be removed.
Sleep_Day_New <- unique(Sleep_Day_New)
nrow(Sleep_Day_New)
## [1] 410
n_distinct(daily_activity_new$Id)
## [1] 33
n_distinct(Heart_rate_seconds$Id)
## [1] 14
n_distinct(Minute_METs$Id)
## [1] 33
n_distinct(Sleep_Day_New$Id)
## [1] 24
n_distinct(Weight_log_info$Id)
## [1] 8
nrow(daily_activity_new)
## [1] 862
nrow(Heart_rate_seconds)
## [1] 2483658
nrow(Minute_METs)
## [1] 1325580
nrow(Sleep_Day_New)
## [1] 410
nrow(Weight_log_info)
## [1] 67
The Heart_rate_seconds and the Weight_log_info data frames appears to show a low number of distinct participants compared to other data frames (using n_distinct() function). Therefore, it would not be sensible to use these two data frames to draw conclusions and recommendations for this project.
daily_activity_new %>%
select(TotalSteps,TotalDistance,SedentaryMinutes,LightlyActiveMinutes,FairlyActiveMinutes,VeryActiveMinutes,SedentaryActiveDistance,LightActiveDistance,ModeratelyActiveDistance,VeryActiveDistance,Calories) %>%
summary()
## TotalSteps TotalDistance SedentaryMinutes LightlyActiveMinutes
## Min. : 8 Min. : 0.010 Min. : 0.0 Min. : 0.0
## 1st Qu.: 4927 1st Qu.: 3.373 1st Qu.: 721.2 1st Qu.:147.0
## Median : 8054 Median : 5.590 Median :1020.5 Median :208.5
## Mean : 8329 Mean : 5.986 Mean : 955.2 Mean :210.3
## 3rd Qu.:11096 3rd Qu.: 7.905 3rd Qu.:1189.0 3rd Qu.:272.0
## Max. :36019 Max. :28.030 Max. :1440.0 Max. :518.0
## FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## Min. : 0.00 Min. : 0.00 Min. :0.000000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:0.000000
## Median : 8.00 Median : 7.00 Median :0.000000
## Mean : 14.79 Mean : 23.04 Mean :0.001752
## 3rd Qu.: 21.00 3rd Qu.: 35.00 3rd Qu.:0.000000
## Max. :143.00 Max. :210.00 Max. :0.110000
## LightActiveDistance ModeratelyActiveDistance VeryActiveDistance Calories
## Min. : 0.000 Min. :0.0000 Min. : 0.000 Min. : 52
## 1st Qu.: 2.350 1st Qu.:0.0000 1st Qu.: 0.000 1st Qu.:1857
## Median : 3.580 Median :0.3100 Median : 0.410 Median :2220
## Mean : 3.643 Mean :0.6189 Mean : 1.639 Mean :2362
## 3rd Qu.: 4.897 3rd Qu.:0.8675 3rd Qu.: 2.277 3rd Qu.:2832
## Max. :10.710 Max. :6.4800 Max. :21.920 Max. :4900
Total Steps: The average daily steps recorded from the participants are 8,329 steps. It is believed through decades that the daily 10,000-step meter is recommended for health and longevity. Even though the participants do not maintain the optimal step count, new studies showed that walking at least 7,000 steps daily still yield health benefits. Therefore, Bellabeat users are doing well in this area. They can easily achieve the 10,000 daily steps goal if they can connect with a walking partner or walking group on BellaBeat app.
Sedentary Minutes: The average sedentary time is 955.2 minutes or 16 hours. The BellaBeat users have spent a significant time for little or no exercise,probably involving lots of lying down and sitting which poses more health risks, such as metabolic syndrome.
Very Active Minutes (full-time athlete) and Fairly Active Minutes (active lifestyle): Average very active minutes and fairly active minutes were at 23.04 minutes and 14.79 minutes respectively. In term of weekly time (5 days/week), average very active minutes and fairly active minutes were 115.2 minutes (~1.92 hours) and 73.95 minutes(~1.2 hours). According to CDC, adults should do 75 minutes (1 hour and 15 minutes) to 150 minutes (2 hours and 30 minutes) a week of vigorous-intensity physical activity or at least 150 minutes (2 hours and 30 minutes) to 300 minutes (5 hours) a week of moderate-intensity. The participants have been doing relatively well in this area.
Calories: The average amount of daily calories burned is 2,362 kcal. Studies suggested that individuals need to burn 3500 calories a day to lose one pound a day, though this 3,500-calorie-burning workout in one day may not realistic for some people. It is said that the average person burns around 1800 calories a day doing absolutely nothing. The BellaBeat users are burning calories more than normal norms, but they can shed more pounds if they can achieve the daily calories burnt target.
Heart_rate_seconds %>%
select(Value) %>%
summary()
## Value
## Min. : 36.00
## 1st Qu.: 63.00
## Median : 73.00
## Mean : 77.33
## 3rd Qu.: 88.00
## Max. :203.00
According to Mayoclinic, a normal resting heart rate for adults ranges from 60 to 100 beats per minute. Given the low participant in the data frame, the average heart rate is recorded at 77 beats per minute (bpm) from the BellaBeat users which is normal in recommended range. Heart rate will vary depending on age, emotions, gender, fitness activity, stress level, medications, body mass, body position. It is said that the more fit users are, the lower heart rate is.
Minute_METs %>%
select(METs) %>%
summary()
## METs
## Min. : 0.00
## 1st Qu.: 10.00
## Median : 10.00
## Mean : 14.69
## 3rd Qu.: 11.00
## Max. :157.00
METs is known as metabolic equivalent to task, the ratio of the metabolic rate during exercise to the metabolic rate at rest. A MET score of 1 represents the amount of energy used when a person is at rest. Light activity requires less than 3 METs, moderate activities require 3 - 6 METs whereas vigorous activitues require more than 6 METs. The average METs recorded from the BellaBeat participant throughout the day is 14.69 METs which is unbelievably high. Given that the participants are low active (high sedentary minutes), this METs figure is conflicting with other results and raises concerns for its data accuracy.
Sleep_Day_New %>%
select(TotalSleepRecords,TotalMinutesAsleep,TotalTimeInBed) %>%
summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.00 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.00 1st Qu.:361.0 1st Qu.:403.8
## Median :1.00 Median :432.5 Median :463.0
## Mean :1.12 Mean :419.2 Mean :458.5
## 3rd Qu.:1.00 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.00 Max. :796.0 Max. :961.0
The average total minutes being asleep is recorded as 419.2 minutes equivalent to 7 hours. The average total time BellaBeat users spending in bed is 458.5 minutes, equivalent to 7.64 hours. That said, 38 minutes is the average time BellaBeat users spent restless or awake in bed. Adults are recommended to aim for 7 - 9 hours of sleep to function at their best. BellaBeat users are doing well in this area. However, regularly waking up for at least 20 to 30 minutes at night or taking more than 30 minutes to drift off to sleep might be a sign insomnia. Further data should be collected to understand more about sleep phases and other factors contributing to a good sleep.
Weight_log_info %>%
select(WeightKg,WeightPounds,BMI) %>%
summary()
## WeightKg WeightPounds BMI
## Min. : 52.60 Min. :116.0 Min. :21.45
## 1st Qu.: 61.40 1st Qu.:135.4 1st Qu.:23.96
## Median : 62.50 Median :137.8 Median :24.39
## Mean : 72.04 Mean :158.8 Mean :25.19
## 3rd Qu.: 85.05 3rd Qu.:187.5 3rd Qu.:25.56
## Max. :133.50 Max. :294.3 Max. :47.54
According to CDC, healthy weight has BMI values in the range of 18.5 - 24.9. Overweight has BMI values in the range of 25.0 - 29.9. Despite the low number of participants in the data frame, the average BMI is recorded at 25.19 which is an overweight value. Even though BMI gives a good estimate of total body fat for most people, it does not apply for everybody. For example, very muscular people can have a high BMI because of their muscle mass, even though they’re not necessarily overweight. A person with a high BMI should be evaluated by a health care provider, who might use other factors such as skin fold thickness (a measure of body fat), waist size, evaluations of diet and family health problems, and other factors to find out if a person’s weight might pose a health risk.
Since its launch, BellaBeat empowers women by providing insights on their activity, sleep, stress, hydration, and reproductive health. Based on an analysis of BellaBeat users’ experiences with features and products, some recommendations for Bellabeat’s future growth will be suggested.