Bellabeat is a high-tech company that manufactures health-focused smart products, such as a smart bracelet and a watch, that collect data on activity, sleep, stress, and reproductive health. This allows Bellabeat to empower women with knowledge about their own health and habits.
Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. Bellabeat believes that an analysis of its available consumer data would reveal more opportunities for growth.
The team was tasked with analyzing smart device usage data to gain insight into how consumers use non-Bellabeat smart devices, and then with selecting one Bellabeat product to which those insights can be applied.
Information about the data
These data are open source and public. They were generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016.
Because of the limitations of the data, not all 18 data sets will be used.
Are there issues with bias or credibility in this data?
No data on gender or age is provided, so the data might be biased towards certain groups.
Another set of data can be used to address the limitations. However, in this case study only this set of data will be used.
The sample size is too small to support unbiased decisions and conclusions. Assuming a common confidence level of 95%, a margin of error of 5%, and a population proportion of 50%, at least 385 measurements/surveys are needed. However, this dataset covers only about 30 participants (33 distinct Ids appear in the daily activity data).
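The figure of 385 follows from the standard sample-size formula for a proportion, n = z^2 * p * (1 - p) / e^2. A minimal sketch of that arithmetic, assuming a very large population (no finite population correction); the variable names are illustrative only:

```r
# Minimum sample size for estimating a proportion (Cochran's formula),
# assuming a very large population (no finite population correction).
confidence <- 0.95   # confidence level
margin     <- 0.05   # margin of error
p          <- 0.50   # assumed population proportion (most conservative choice)

z <- qnorm(1 - (1 - confidence) / 2)              # two-sided critical value, about 1.96
n_min <- ceiling(z^2 * p * (1 - p) / margin^2)    # round up to a whole respondent
n_min
## [1] 385
```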
Are there any problems with the data?
There are 18 separate data sets, which makes the data confusing to navigate.
The data is more than 5 years old (2016). The habits and performance of Fitbit users might have changed since then.
install.packages("tidyverse")
install.packages("ggplot2")
install.packages("dplyr")
install.packages("tidyr")
install.packages("lubridate")
install.packages("janitor")
library(tidyverse)
library(ggplot2)
library(dplyr)
library(tidyr)
library(lubridate)
library(janitor)
dailyActivity_merged <- read_csv("dailyActivity_merged.csv")
sleepDay_merged <- read_csv("sleepDay_merged.csv")
dailySteps_merged <- read_csv("dailySteps_merged.csv")
hourlySteps_merged <- read_csv("hourlySteps_merged.csv")
head(dailyActivity_merged)
str(dailyActivity_merged)
## spec_tbl_df [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : num [1:940] 13162 10735 10460 9762 12669 ...
## $ TotalDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : num [1:940] 728 776 1218 726 773 ...
## $ Calories : num [1:940] 1985 1797 1776 1745 1863 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. ActivityDate = col_character(),
## .. TotalSteps = col_double(),
## .. TotalDistance = col_double(),
## .. TrackerDistance = col_double(),
## .. LoggedActivitiesDistance = col_double(),
## .. VeryActiveDistance = col_double(),
## .. ModeratelyActiveDistance = col_double(),
## .. LightActiveDistance = col_double(),
## .. SedentaryActiveDistance = col_double(),
## .. VeryActiveMinutes = col_double(),
## .. FairlyActiveMinutes = col_double(),
## .. LightlyActiveMinutes = col_double(),
## .. SedentaryMinutes = col_double(),
## .. Calories = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
n_distinct(dailyActivity_merged$Id)
## [1] 33
is.null(dailyActivity_merged)
## [1] FALSE
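Note that `is.null()` only reports whether the object itself is `NULL`; it returns `FALSE` for any data frame, including one that contains missing values. If the intent is to check for missing values, a sketch along the following lines may be closer to it (the small data frame `toy` is made up for illustration):

```r
# Hypothetical toy data with one missing value.
toy <- data.frame(Id = c(1, 2, 3), TotalSteps = c(13162, NA, 10460))

is.null(toy)          # FALSE: the object exists; this says nothing about NAs
any(is.na(toy))       # TRUE: at least one missing value somewhere
colSums(is.na(toy))   # number of missing values per column
```

The same two calls could be applied to `dailyActivity_merged` and the other tables.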
head(dailyActivity_merged$ActivityDate)
## [1] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" "4/16/2016" "4/17/2016"
dailyActivity_merged$ActivityDate <- as.Date(dailyActivity_merged$ActivityDate, format = "%m/%d/%Y")
head(dailyActivity_merged$ActivityDate)
## [1] "2016-04-12" "2016-04-13" "2016-04-14" "2016-04-15" "2016-04-16"
## [6] "2016-04-17"
head(sleepDay_merged)
str(sleepDay_merged)
## spec_tbl_df [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. SleepDay = col_character(),
## .. TotalSleepRecords = col_double(),
## .. TotalMinutesAsleep = col_double(),
## .. TotalTimeInBed = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
n_distinct(sleepDay_merged$Id)
## [1] 24
is.null(sleepDay_merged)
## [1] FALSE
sleepDay_merged$SleepDay <- as.Date(sleepDay_merged$SleepDay, format = "%m/%d/%Y")
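`SleepDay` carries a time component ("4/12/2016 12:00:00 AM"), yet the format string covers only the date. This works because `as.Date()` reads a string only as far as the format requires and ignores trailing characters. A small check, using the first value shown in the `str()` output above:

```r
# as.Date() stops parsing once the format is satisfied, so the
# trailing " 12:00:00 AM" is ignored.
sleep_day <- "4/12/2016 12:00:00 AM"
parsed <- as.Date(sleep_day, format = "%m/%d/%Y")
parsed
## [1] "2016-04-12"
class(parsed)
## [1] "Date"
```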
head(dailySteps_merged)
str(dailySteps_merged)
## spec_tbl_df [940 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ StepTotal : num [1:940] 13162 10735 10460 9762 12669 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. ActivityDay = col_character(),
## .. StepTotal = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
n_distinct(dailySteps_merged$Id)
## [1] 33
is.null(dailySteps_merged)
## [1] FALSE
dailySteps_merged$ActivityDay <- as.Date(dailySteps_merged$ActivityDay, format = "%m/%d/%Y")
head(hourlySteps_merged)
str(hourlySteps_merged)
## spec_tbl_df [22,099 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityHour: chr [1:22099] "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
## $ StepTotal : num [1:22099] 373 160 151 0 0 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. ActivityHour = col_character(),
## .. StepTotal = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
n_distinct(hourlySteps_merged$Id)
## [1] 33
is.null(hourlySteps_merged)
## [1] FALSE
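# Parse the 12-hour timestamp, then derive separate date and time columns.
# format() keeps the time component even at midnight, which splitting the
# printed timestamp on a space does not guarantee in every R version.
Hourly_steps <- hourlySteps_merged %>%
  rename(date_time = ActivityHour) %>%
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())) %>%
  transmute(Id, date = as_date(date_time), time = format(date_time, "%H:%M:%S"), StepTotal)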
head(Hourly_steps)
str(Hourly_steps)
## tibble [22,099 × 4] (S3: tbl_df/tbl/data.frame)
## $ Id : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ date : Date[1:22099], format: "2016-04-12" "2016-04-12" ...
## $ time : chr [1:22099] "00:00:00" "01:00:00" "02:00:00" "03:00:00" ...
## $ StepTotal: num [1:22099] 373 160 151 0 0 ...
sleepDay_merged <- rename(sleepDay_merged, ActivityDate = SleepDay)
data_merged <- merge(dailyActivity_merged, sleepDay_merged, by = c("Id", "ActivityDate"))
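By default `merge()` performs an inner join, so `data_merged` keeps only the Id and date combinations present in both tables; days with activity but no sleep record are dropped. A toy illustration with made-up values (`activity` and `sleep` are hypothetical stand-ins for the real tables):

```r
# Hypothetical miniature versions of the two tables.
activity <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(13162, 10735, 9762)
)
sleep <- data.frame(
  Id = c(1, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(327, 384)
)

# Inner join (default): only rows whose Id and ActivityDate match in both.
inner <- merge(activity, sleep, by = c("Id", "ActivityDate"))
nrow(inner)
## [1] 2

# all.x = TRUE keeps every activity row and fills missing sleep data with NA.
left <- merge(activity, sleep, by = c("Id", "ActivityDate"), all.x = TRUE)
nrow(left)
## [1] 3
```

Whether the inner join is the right choice depends on the question: it suits comparisons of steps against sleep, but it discards activity days that have no sleep record.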
From the graph above, it can be seen that:
Total steps generally fall between 0 and 15000.
Steps and sedentary minutes are negatively correlated: as steps increase, sedentary minutes decrease.
Above 15000 total steps there is a slight increase in sedentary minutes. However, there are too few measurements at 15000 steps and above, so more data is needed to confirm this observation.
From the graph above, it can be seen that:
Steps are positively correlated with calories: as total steps increase, calories increase.
From the graph above, it can be seen that there is no correlation between steps and sleep minutes
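These visual impressions could be backed by a correlation coefficient. A minimal sketch with `cor()`, using only the first five rows printed by `str(dailyActivity_merged)` above; five rows are far too few to conclude anything, so with the real data one would pass the full columns of `dailyActivity_merged` (or `data_merged` for sleep):

```r
# First five rows as printed by str(dailyActivity_merged); illustration only.
steps     <- c(13162, 10735, 10460, 9762, 12669)
calories  <- c(1985, 1797, 1776, 1745, 1863)
sedentary <- c(728, 776, 1218, 726, 773)

# Pearson correlation: values near +1 or -1 indicate a strong linear
# relationship, values near 0 indicate little or none.
r_steps_calories  <- cor(steps, calories)
r_steps_sedentary <- cor(steps, sedentary)
round(c(calories = r_steps_calories, sedentary = r_steps_sedentary), 2)
```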
According to Healthline, the minimum total steps for a fairly active day is 7500.
From the bar chart above, the days that are fairly active are Monday, Tuesday and Saturday.
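A weekday comparison of this kind can be produced by averaging `TotalSteps` per weekday and comparing each mean with the 7500-step threshold. A sketch with made-up step counts (the `daily` data frame and its values are hypothetical; only the dates are real calendar dates):

```r
# Hypothetical daily step counts for two weeks starting Tuesday 2016-04-12.
daily <- data.frame(
  ActivityDate = as.Date("2016-04-12") + 0:13,
  TotalSteps = c(8200, 7100, 6900, 7300, 9000, 6500, 7800,
                 8400, 7000, 7200, 7400, 8800, 6700, 8000)
)

# wday is 0 for Sunday through 6 for Saturday, independent of locale.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
daily$weekday <- factor(day_names[as.POSIXlt(daily$ActivityDate)$wday + 1],
                        levels = day_names)

# Mean steps per weekday, flagged against the 7500-step threshold.
by_day <- aggregate(TotalSteps ~ weekday, data = daily, FUN = mean)
by_day$meets_threshold <- by_day$TotalSteps >= 7500
by_day
```

Using `as.POSIXlt()$wday` rather than `weekdays()` keeps the day labels the same regardless of the system locale.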
According to the National Sleep Foundation, healthy adults need between 7 and 9 hours of sleep per night. Taking the midpoint, a target of 8 hours of sleep is used.
From the bar chart above, it can be observed that none of the users meet the advised hours of sleep.
It can be observed that users are generally most active in the early afternoon (1200-1300) and the early evening (1700-1800).
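An hourly pattern like this comes from averaging `StepTotal` for each value of `time` across all dates. A sketch with hypothetical numbers (only the column names follow `Hourly_steps`):

```r
# Hypothetical hourly records for two days; real data would be Hourly_steps.
hourly <- data.frame(
  date = rep(as.Date(c("2016-04-12", "2016-04-13")), each = 3),
  time = rep(c("08:00:00", "12:00:00", "18:00:00"), times = 2),
  StepTotal = c(300, 650, 700, 200, 550, 800)
)

# Average steps for each hour of the day across all dates.
avg_by_hour <- aggregate(StepTotal ~ time, data = hourly, FUN = mean)
avg_by_hour[order(-avg_by_hour$StepTotal), ]   # busiest hours first
```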
Activeness can be measured through heart rate or total steps. According to the Centers for Disease Control and Prevention, activity intensity depends on age and on the percentage of maximum heart rate. In this data set, the time spent at each level of activeness is recorded in minutes, based on heart rate, and was captured using a Fitbit.
From the pie chart, it can be observed that:
81% of the users are sedentary most of the time. This could mean that most Fitbit users are either students or office workers with desk jobs.
16% of the users have a lightly active lifestyle. This might be due to commuting to work or school.
1% are fairly active and 2% are very active users. Marketing could help increase the share of these two groups.
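Percentages of this kind are commonly computed as each intensity level's share of all tracked minutes; that this is how the pie chart was built is an assumption here. A sketch of the calculation, using only the first four rows printed by `str(dailyActivity_merged)` above, so the resulting shares will differ from the full-data figures:

```r
# First four rows of the minute columns as shown in the str() output.
minutes <- data.frame(
  VeryActiveMinutes    = c(25, 21, 30, 29),
  FairlyActiveMinutes  = c(13, 19, 11, 34),
  LightlyActiveMinutes = c(328, 217, 181, 209),
  SedentaryMinutes     = c(728, 776, 1218, 726)
)

totals <- colSums(minutes)                       # minutes per intensity level
share  <- round(100 * totals / sum(totals), 1)   # percentage of all minutes
share
```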
Below are the future actions recommended to Bellabeat:
In the Bellabeat app, include a daily goal reminder and congratulate users once they reach the recommended total steps. If possible, partner with other companies to sponsor rewards for users who reach the goal, to motivate them further.
Use the Bellabeat app to automatically generate a weekly report so users can see their physical activity and sleep performance.
Use Bellabeat’s Leaf/Time to remind users who have been sitting for a long time to stretch, and inform them of the disadvantages of an inactive lifestyle.
Use Bellabeat’s Leaf/Time to set a sleep reminder so that users go to sleep on time.
There could be many reasons for a user's low step count and inactive lifestyle, such as not carrying the tracker, finding the tracker unfashionable, or short battery life. Hence, the company is strongly advised to: