Bellabeat is a high-tech manufacturer of health-focused products for women. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Although Bellabeat is a successful small company, they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company.
Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. The company has five focus products: the Bellabeat app, Leaf, Time, Spring, and the Bellabeat membership. Our team has been asked to analyze smart device data to gain insight into how consumers use their smart devices. The insights we discover will then help guide the company's marketing strategy.
The business task is to identify potential opportunities for growth and to recommend improvements to the Bellabeat marketing strategy based on trends in smart device usage.
Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat’s cofounder
Questions to guide the analysis:
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?
The data used in this case study is available at https://www.kaggle.com/datasets/arashnic/fitbit. It was stored locally and loaded into RStudio. This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits. The dataset contains 18 CSV files organized in long format.
The dataset is assessed against the ROCCC criteria (Reliable, Original, Comprehensive, Current, Cited) below:
Reliability - LOW: The data comes from only 30 Fitbit users who consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. A sample this small may not represent the wider population.
Original - LOW: Third-party data collected using Amazon Mechanical Turk.
Comprehensive - MED: The dataset contains multiple fields on daily activity intensity, calories used, daily steps taken, daily sleep time and weight record.
Current - LOW: This data is from March 2016 through May 2016. The data is not current, meaning that user habits may have changed over the years.
Cited - LOW: The data was collected by a third party, so its original source is unknown.
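# Install tidyverse only if it is missing, then load it
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
library(tidyverse)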
Importing the datasets
# Folder that holds the Fitabase CSV exports (adjust to your machine)
data_dir <- "C:/Users/HP/OneDrive/Desktop/Capstone/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16"
activity <- read.csv(file.path(data_dir, "dailyActivity_merged.csv"))
calories <- read.csv(file.path(data_dir, "hourlyCalories_merged.csv"))
intensity <- read.csv(file.path(data_dir, "hourlyIntensities_merged.csv"))
sleep <- read.csv(file.path(data_dir, "sleepDay_merged.csv"))
weight <- read.csv(file.path(data_dir, "weightLogInfo_merged.csv"))
Date Formatting
# intensity: parse the 12-hour timestamp, then split it into time and date
intensity$ActivityHour <- as.POSIXct(intensity$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
intensity$time <- format(intensity$ActivityHour, format = "%H:%M:%S")
intensity$date <- format(intensity$ActivityHour, format = "%m/%d/%y")
# calories
calories$ActivityHour <- as.POSIXct(calories$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
calories$time <- format(calories$ActivityHour, format = "%H:%M:%S")
calories$date <- format(calories$ActivityHour, format = "%m/%d/%y")
# activity: daily records carry a date only
activity$ActivityDate <- as.POSIXct(activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
activity$date <- format(activity$ActivityDate, format = "%m/%d/%y")
# sleep
sleep$SleepDay <- as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")
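The format string above can be checked on a few toy values before it is applied to a whole column. The sketch below uses made-up timestamps in the same layout as ActivityHour and a fixed "UTC" time zone (an assumption, chosen so the result does not depend on the machine; the analysis itself uses Sys.timezone()). It also assumes an English or C locale, since %p matches the locale's AM/PM strings.

```r
# Toy timestamps in the same 12-hour layout as ActivityHour (values are made up)
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM", "not a date")

# Fixed time zone for reproducibility (an assumption, not the notebook's setting)
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Values that do not match the format come back as NA, so count them
sum(is.na(parsed))

# 12 AM becomes hour 00 and 1 PM becomes hour 13 in 24-hour time
format(parsed, "%H:%M:%S")
```

Running the same NA count on the real columns after conversion is a quick way to confirm that no rows were lost to a mismatched format.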
To begin the analysis phase, we will first see how many participants there are in each category.
# Finding number of participants in each category
n_distinct(activity$Id)
n_distinct(calories$Id)
n_distinct(intensity$Id)
n_distinct(sleep$Id)
n_distinct(weight$Id)
There are 33 participants in the activity, calories, and intensities datasets, 24 in the sleep dataset, and only 8 in the weight dataset. The fact that there are only 8 participants in the weight dataset means that more data would be needed to make a strong recommendation or conclusion.
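The same count can be produced for all tables in one pass, which makes the gap between them easier to compare. This is a minimal sketch in base R with hypothetical stand-in data frames (activity_demo, sleep_demo, weight_demo are invented here); length(unique(x)) plays the role of n_distinct(x).

```r
# Hypothetical stand-ins for the imported tables; only the Id column matters here
activity_demo <- data.frame(Id = c(1, 1, 2, 3))
sleep_demo    <- data.frame(Id = c(1, 2, 2))
weight_demo   <- data.frame(Id = c(3))

tables <- list(activity = activity_demo, sleep = sleep_demo, weight = weight_demo)

# Number of distinct participants in each table
participants <- sapply(tables, function(df) length(unique(df$Id)))
participants

# Coverage of each table relative to the largest one
round(participants / max(participants), 2)
```

With the real tables in the list, the coverage line would show directly how small the weight sample is compared with the activity sample.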
Summary of the datasets:
# activity
activity %>%
  select(TotalSteps, TotalDistance, SedentaryMinutes, Calories) %>%
  summary()
# active minutes per category
activity %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  summary()
# calories
calories %>%
  select(Calories) %>%
  summary()
# sleep
sleep %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()
# weight
weight %>%
  select(WeightKg, BMI) %>%
  summary()
Observations made from the above summaries:
On average, participants are sedentary for 16.5 hours per day.
The average number of steps per day is 7,638. This is below the commonly cited target of 10,000 steps per day, which is often attributed to the CDC.
The majority of the participants are lightly active.
The average participant burns 97 calories per hour.
On average, participants sleep for 7 hours.
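The arithmetic behind observations like these can be sketched as follows: average the minute columns, convert them to hours, and compare the share of active time in each intensity level. The rows below are made up for illustration and only reuse the column names of the activity table.

```r
# Made-up daily rows with the same column names as the activity table
demo <- data.frame(
  VeryActiveMinutes    = c(20, 0, 35),
  FairlyActiveMinutes  = c(15, 5, 10),
  LightlyActiveMinutes = c(200, 150, 245),
  SedentaryMinutes     = c(900, 1100, 970)
)

# Mean minutes per day in each category
avg_minutes <- colMeans(demo)

# Convert minutes to hours so the sedentary figure is easier to read
round(avg_minutes / 60, 1)

# Share of active (non-sedentary) time spent at each intensity level
active <- avg_minutes[c("VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes")]
round(active / sum(active), 2)

# Intensity level that accounts for most of the active time
names(which.max(active))
```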
Before beginning to visualize the data, we need to merge the sleep and activity datasets.
merged_data <- merge(sleep, activity, by=c('Id', 'date'))
head(merged_data)
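Two properties of merge() are worth keeping in mind here: it performs an inner join by default, so days without a match in both tables are dropped, and any duplicate rows in either table are carried into the result. The sketch below illustrates both with made-up rows (sleep_demo and activity_demo are hypothetical); whether the real tables contain duplicates should be checked rather than assumed.

```r
# Made-up rows; the real tables have many more columns
sleep_demo <- data.frame(
  Id = c(1, 1, 2, 2),
  date = c("04/12/16", "04/13/16", "04/12/16", "04/12/16"),  # last row is a duplicate
  TotalMinutesAsleep = c(400, 420, 380, 380)
)
activity_demo <- data.frame(
  Id = c(1, 1, 2, 3),
  date = c("04/12/16", "04/13/16", "04/12/16", "04/12/16"),
  SedentaryMinutes = c(700, 650, 900, 1000)
)

# Drop exact duplicate rows so one day is not counted twice after the join
sleep_demo <- unique(sleep_demo)

# Default inner join: participant 3 has no sleep row and is dropped
merged_demo <- merge(sleep_demo, activity_demo, by = c("Id", "date"))
nrow(merged_demo)

# all.y = TRUE keeps every activity row and fills missing sleep values with NA
merged_all <- merge(sleep_demo, activity_demo, by = c("Id", "date"), all.y = TRUE)
nrow(merged_all)
```

Comparing nrow() before and after the merge shows how many days were lost to the inner join.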
ggplot(data = activity, aes(x = TotalSteps, y = Calories)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Total Steps vs. Calories")
ggplot(data = sleep, aes(x = TotalMinutesAsleep, y = TotalTimeInBed)) +
  geom_point() +
  labs(title = "Total Minutes Asleep vs. Total Time in Bed")
ggplot(data = merged_data, aes(x = TotalMinutesAsleep, y = SedentaryMinutes)) +
  geom_point(color = 'darkblue') +
  geom_smooth() +
  labs(title = "Minutes Asleep vs. Sedentary Minutes")
Plots:
Judging from the Total Steps vs. Calories scatter plot, there is a positive correlation between the total number of steps taken and calories burned.
The more steps each participant takes, the more calories they burn.
The Minutes Asleep vs. Sedentary Minutes plot shows a negative relationship between sedentary minutes and sleep time.
Note: if Bellabeat users want to improve their sleep, the Bellabeat app could recommend reducing sedentary time, although a correlation alone does not show that one causes the other.
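The relationships read off the plots can also be quantified with a correlation coefficient. The sketch below uses made-up values that only borrow the column names from the analysis, so the numbers it prints say nothing about the real data; on the actual tables the same calls would be applied to activity and merged_data.

```r
# Made-up values chosen only to illustrate the calls
demo <- data.frame(
  TotalSteps         = c(2000, 5000, 8000, 11000, 14000),
  Calories           = c(1700, 2000, 2300, 2500, 3000),
  TotalMinutesAsleep = c(480, 450, 400, 380, 330),
  SedentaryMinutes   = c(600, 680, 760, 800, 900)
)

# Pearson correlation: +1 is a perfect positive, -1 a perfect negative relationship
steps_cal <- cor(demo$TotalSteps, demo$Calories)
sleep_sed <- cor(demo$TotalMinutesAsleep, demo$SedentaryMinutes)
round(c(steps_cal = steps_cal, sleep_sed = sleep_sed), 2)

# cor.test() adds a confidence interval for the same estimate
cor.test(demo$TotalMinutesAsleep, demo$SedentaryMinutes)$conf.int
```

A coefficient close to zero or a wide confidence interval would be a reason to soften any recommendation built on the plot.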
As we already know, collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.
After analyzing the Fitbit Fitness Tracker Data, we found some insights that could help shape Bellabeat's marketing strategy.
As stated previously, the average number of steps per day is 7,638, which is below 8,000. According to the CDC’s official website, taking 8,000 steps per day was associated with a 51% lower risk of all-cause mortality (death from all causes) compared with taking 4,000 steps, and taking 12,000 steps per day was associated with a 65% lower risk. One thing Bellabeat can do is suggest that users take at least 8,000 steps per day and explain the benefits that come with it.
The majority of participants are lightly active. Bellabeat should offer a progression system in the app to encourage participants to become at least fairly active.
Bellabeat can suggest ideas for low-calorie breakfast, lunch, and dinner foods to help users who want to lose weight.
If users want to improve the quality of their sleep, Bellabeat should consider using app notifications reminding users to get enough rest, as well as recommending reducing sedentary time.