Bellabeat is a high-tech company, founded by Urška Sršen and Sandro Mur, that manufactures health-focused smart products. It draws on the founders’ artistic background to develop beautifully designed technology that informs and inspires women around the world.
List of their products:
Bellabeat app: Provides users data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
Bellabeat membership: Gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
The analysis aims to provide the following:
Conduct a thorough data analysis to uncover key patterns and trends in Bellabeat device usage. This analysis should aim to identify:
User behavior: Analyze activity levels across different demographics, identify peak usage times, and understand how usage patterns vary across different product lines.
Product performance: Evaluate key metrics like step count, sleep duration, heart rate variability, and app engagement. Identify areas for product improvement and new feature development.
Customer segmentation: Identify distinct user segments based on their usage patterns, demographics, and fitness goals. This will enable targeted marketing campaigns and personalized user experiences.
By uncovering these hidden insights, Bellabeat can develop more effective marketing strategies, tailor product offerings to specific user needs, and ultimately enhance customer satisfaction and brand loyalty.
We use a public data set that explores smart device users’ daily habits (CC0: Public Domain, made available through Mobius). This Kaggle data set contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
The dataset can be found in the following -> link
The next steps focus on loading the packages and the data, cleaning the data, and analyzing it:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(lubridate)
library(readr)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggbeeswarm)
library(ggridges)
library(data.table)
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
##
## The following objects are masked from 'package:dplyr':
##
## between, first, last
##
## The following object is masked from 'package:purrr':
##
## transpose
heartrate_seconds<- read_csv("C:/Users/Moi/Desktop/Certificados/Google Certificate/Capstones Projects/Bellabeat Proyect/Fitabase Data 3.12.16-4.11.16/heartrate_seconds_merged.csv")
## Rows: 1154681 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Time
## dbl (2): Id, Value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(heartrate_seconds)
hourlyCalories<- read_csv("C:/Users/Moi/Desktop/Certificados/Google Certificate/Capstones Projects/Bellabeat Proyect/Fitabase Data 3.12.16-4.11.16/hourlyCalories_merged.csv")
## Rows: 24084 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, Calories
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hourlyCalories)
hourlySteps<- read_csv("C:/Users/Moi/Desktop/Certificados/Google Certificate/Capstones Projects/Bellabeat Proyect/Fitabase Data 3.12.16-4.11.16/hourlySteps_merged.csv")
## Rows: 24084 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityHour
## dbl (2): Id, StepTotal
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hourlySteps)
sleepDay<- read_csv("C:/Users/Moi/Desktop/Certificados/Google Certificate/Capstones Projects/Bellabeat Proyect/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(sleepDay)
str(heartrate_seconds)
## spc_tbl_ [1,154,681 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:1154681] 2.02e+09 2.02e+09 2.02e+09 2.02e+09 2.02e+09 ...
## $ Time : chr [1:1154681] "4/1/2016 7:54:00 AM" "4/1/2016 7:54:05 AM" "4/1/2016 7:54:10 AM" "4/1/2016 7:54:15 AM" ...
## $ Value: num [1:1154681] 93 91 96 98 100 101 104 105 102 106 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. Time = col_character(),
## .. Value = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(heartrate_seconds)
## Id Time Value
## Min. :2.022e+09 Length:1154681 Min. : 36.00
## 1st Qu.:4.020e+09 Class :character 1st Qu.: 66.00
## Median :5.554e+09 Mode :character Median : 77.00
## Mean :5.352e+09 Mean : 79.76
## 3rd Qu.:6.962e+09 3rd Qu.: 90.00
## Max. :8.878e+09 Max. :185.00
head(heartrate_seconds)
## # A tibble: 6 × 3
## Id Time Value
## <dbl> <chr> <dbl>
## 1 2022484408 4/1/2016 7:54:00 AM 93
## 2 2022484408 4/1/2016 7:54:05 AM 91
## 3 2022484408 4/1/2016 7:54:10 AM 96
## 4 2022484408 4/1/2016 7:54:15 AM 98
## 5 2022484408 4/1/2016 7:54:20 AM 100
## 6 2022484408 4/1/2016 7:54:25 AM 101
str(hourlyCalories)
## spc_tbl_ [24,084 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:24084] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityHour: chr [1:24084] "3/12/2016 12:00:00 AM" "3/12/2016 1:00:00 AM" "3/12/2016 2:00:00 AM" "3/12/2016 3:00:00 AM" ...
## $ Calories : num [1:24084] 48 48 48 48 48 48 48 48 48 49 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. ActivityHour = col_character(),
## .. Calories = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(hourlyCalories)
## Id ActivityHour Calories
## Min. :1.504e+09 Length:24084 Min. : 42.00
## 1st Qu.:2.347e+09 Class :character 1st Qu.: 61.00
## Median :4.559e+09 Mode :character Median : 77.00
## Mean :4.889e+09 Mean : 94.27
## 3rd Qu.:6.962e+09 3rd Qu.:104.00
## Max. :8.878e+09 Max. :933.00
head(hourlyCalories)
## # A tibble: 6 × 3
## Id ActivityHour Calories
## <dbl> <chr> <dbl>
## 1 1503960366 3/12/2016 12:00:00 AM 48
## 2 1503960366 3/12/2016 1:00:00 AM 48
## 3 1503960366 3/12/2016 2:00:00 AM 48
## 4 1503960366 3/12/2016 3:00:00 AM 48
## 5 1503960366 3/12/2016 4:00:00 AM 48
## 6 1503960366 3/12/2016 5:00:00 AM 48
str(hourlySteps)
## spc_tbl_ [24,084 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:24084] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityHour: chr [1:24084] "3/12/2016 12:00:00 AM" "3/12/2016 1:00:00 AM" "3/12/2016 2:00:00 AM" "3/12/2016 3:00:00 AM" ...
## $ StepTotal : num [1:24084] 0 0 0 0 0 0 0 0 0 8 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. ActivityHour = col_character(),
## .. StepTotal = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(hourlySteps)
## Id ActivityHour StepTotal
## Min. :1.504e+09 Length:24084 Min. : 0.0
## 1st Qu.:2.347e+09 Class :character 1st Qu.: 0.0
## Median :4.559e+09 Mode :character Median : 10.0
## Mean :4.889e+09 Mean : 286.2
## 3rd Qu.:6.962e+09 3rd Qu.: 289.0
## Max. :8.878e+09 Max. :10565.0
head(hourlySteps)
## # A tibble: 6 × 3
## Id ActivityHour StepTotal
## <dbl> <chr> <dbl>
## 1 1503960366 3/12/2016 12:00:00 AM 0
## 2 1503960366 3/12/2016 1:00:00 AM 0
## 3 1503960366 3/12/2016 2:00:00 AM 0
## 4 1503960366 3/12/2016 3:00:00 AM 0
## 5 1503960366 3/12/2016 4:00:00 AM 0
## 6 1503960366 3/12/2016 5:00:00 AM 0
str(sleepDay)
## spc_tbl_ [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. SleepDay = col_character(),
## .. TotalSleepRecords = col_double(),
## .. TotalMinutesAsleep = col_double(),
## .. TotalTimeInBed = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
summary(sleepDay)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## Min. :1.504e+09 Length:413 Min. :1.000 Min. : 58.0
## 1st Qu.:3.977e+09 Class :character 1st Qu.:1.000 1st Qu.:361.0
## Median :4.703e+09 Mode :character Median :1.000 Median :433.0
## Mean :5.001e+09 Mean :1.119 Mean :419.5
## 3rd Qu.:6.962e+09 3rd Qu.:1.000 3rd Qu.:490.0
## Max. :8.792e+09 Max. :3.000 Max. :796.0
## TotalTimeInBed
## Min. : 61.0
## 1st Qu.:403.0
## Median :463.0
## Mean :458.6
## 3rd Qu.:526.0
## Max. :961.0
head(sleepDay)
## # A tibble: 6 × 5
## Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:0… 1 327 346
## 2 1503960366 4/13/2016 12:0… 2 384 407
## 3 1503960366 4/15/2016 12:0… 1 412 442
## 4 1503960366 4/16/2016 12:0… 2 340 367
## 5 1503960366 4/17/2016 12:0… 1 700 712
## 6 1503960366 4/19/2016 12:0… 1 304 320
Next, we check each data set for missing values (NAs).
colSums(is.na(heartrate_seconds))
## Id Time Value
## 0 0 0
colSums(is.na(hourlyCalories))
## Id ActivityHour Calories
## 0 0 0
colSums(is.na(hourlySteps))
## Id ActivityHour StepTotal
## 0 0 0
colSums(is.na(sleepDay))
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 0 0 0 0
## TotalTimeInBed
## 0
Let’s check each data set for duplicated rows. Only sleepDay contains duplicates (3 rows), which we remove.
summary(duplicated(heartrate_seconds))
## Mode FALSE
## logical 1154681
summary(duplicated(hourlyCalories))
## Mode FALSE
## logical 24084
summary(duplicated(hourlySteps))
## Mode FALSE
## logical 24084
summary(duplicated(sleepDay))
## Mode FALSE TRUE
## logical 410 3
sleepDay<-sleepDay[!duplicated(sleepDay),]
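As a quick illustration of what `duplicated()` flags, the sketch below uses a small made-up data frame (`toy_sleep` and its values are illustrative, not rows from the data set). Only the second and later copies of a fully identical row are marked, so filtering with `!duplicated()` keeps the first occurrence.

```r
# Toy data frame with column names borrowed from sleepDay (values are made up)
toy_sleep <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 327, 384, 327)
)

# duplicated() marks a row only when every column matches an earlier row
dup_flags <- duplicated(toy_sleep)

# Keep the first occurrence of each row
toy_sleep_clean <- toy_sleep[!dup_flags, ]
nrow(toy_sleep_clean)
```

Row 4 shares its date and minutes with row 1 but has a different Id, so it is not treated as a duplicate.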
The Calories and Steps data sets have the same number of rows and share the Id and ActivityHour columns, so we merge them on those two keys.
hourly_calories_steps<-merge(hourlyCalories, hourlySteps,by=c("Id","ActivityHour"),all=TRUE)
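Equal row counts alone do not guarantee that every key appears in both tables. With `all = TRUE`, a key present in only one table is kept and the other table's column is filled with NA. The sketch below shows one way to check for that on made-up toy tables (`toy_calories`, `toy_steps`, and their values are assumptions for illustration).

```r
# Toy hourly tables (values are made up); one key deliberately differs
toy_calories <- data.frame(
  Id = c(1, 1, 2),
  ActivityHour = c("3/12/2016 12:00:00 AM", "3/12/2016 1:00:00 AM",
                   "3/12/2016 12:00:00 AM"),
  Calories = c(48, 50, 61)
)
toy_steps <- data.frame(
  Id = c(1, 1, 2),
  ActivityHour = c("3/12/2016 12:00:00 AM", "3/12/2016 1:00:00 AM",
                   "3/12/2016 2:00:00 AM"),
  StepTotal = c(0, 8, 120)
)

# Full outer join on both keys, as in the analysis
toy_merged <- merge(toy_calories, toy_steps,
                    by = c("Id", "ActivityHour"), all = TRUE)

# Rows found in only one table have NA in the other table's column
unmatched <- sum(is.na(toy_merged$Calories) | is.na(toy_merged$StepTotal))
nrow(toy_merged)
unmatched
```

If the same check on the real merged table returns zero unmatched rows and the row count equals that of the inputs, the two tables line up one to one.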
We convert the date columns from character (chr) to date and date-time formats.
heartrate_seconds$Time <- mdy_hms(heartrate_seconds$Time)
hourlyCalories$ActivityHour<-mdy_hms(hourlyCalories$ActivityHour)
hourlySteps$ActivityHour<-mdy_hms(hourlySteps$ActivityHour)
hourly_calories_steps$ActivityHour<-mdy_hms(hourly_calories_steps$ActivityHour)
sleepDay$SleepDay<-as.Date(sleepDay$SleepDay, format = '%m/%d/%Y')
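As a cross-check on the conversion, the same timestamp format can be parsed with base R. This is a minimal sketch, not the method used in the analysis: the sample strings mirror the format shown by `head()`, and the `tz = "UTC"` setting and the switch to the C locale are assumptions made so that the example behaves the same on any machine.

```r
# Use the C locale so that "%p" recognises AM/PM regardless of system language
invisible(Sys.setlocale("LC_TIME", "C"))

# Sample strings in the same format as the Time / ActivityHour columns
raw_times <- c("4/1/2016 7:54:00 AM", "3/12/2016 12:00:00 AM",
               "3/12/2016 1:00:00 PM")

# %I = hour on a 12-hour clock, %p = AM/PM marker; tz = "UTC" is an assumption
parsed <- as.POSIXct(raw_times, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Any NA here would mean a string did not match the expected format
sum(is.na(parsed))
format(parsed, "%Y-%m-%d %H:%M:%S")
```

Counting NAs after parsing is a cheap way to confirm that no timestamp was silently dropped by a format mismatch.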
Now let’s add the day of the week to each data set for a better daily analysis.
sleepDay$Day_week<-weekdays(sleepDay$SleepDay)
hourlyCalories$Day_week<-weekdays(hourlyCalories$ActivityHour)
hourlySteps$Day_week<-weekdays(hourlySteps$ActivityHour)
hourly_calories_steps$Day_week<-weekdays(hourly_calories_steps$ActivityHour)
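One thing to keep in mind is that `weekdays()` returns names in the system language, while the plots further down order the days using English names. The sketch below shows a locale-independent alternative in base R; `english_weekday()` is a hypothetical helper introduced here for illustration, not part of the original analysis.

```r
# English day names indexed by POSIXlt's wday field (0 = Sunday ... 6 = Saturday)
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Hypothetical helper: English weekday names regardless of the system locale
english_weekday <- function(x) {
  day_names[as.POSIXlt(x)$wday + 1]
}

sample_days <- as.Date(c("2016-04-12", "2016-04-16", "2016-04-17"))
english_weekday(sample_days)
```

On a machine with a non-English locale, a helper like this avoids the factor levels in the later plots turning into NA.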
For a more detailed analysis of the heart rate, steps, and calories data sets, we extract the hour, minute, and second from each timestamp into separate columns.
heartrate_seconds$Hour<-hour(heartrate_seconds$Time)
heartrate_seconds$Minute<-minute(heartrate_seconds$Time)
heartrate_seconds$Seconds<-second(heartrate_seconds$Time)
hourlyCalories$Hour<-hour(hourlyCalories$ActivityHour)
hourlySteps$Hour<-hour(hourlySteps$ActivityHour)
hourly_calories_steps$Hour<-hour(hourly_calories_steps$ActivityHour)
We plot the distribution of heart rate for every hour of the day to see during which periods the heart is most active.
ggplot(heartrate_seconds, aes(x = Value)) +
geom_density() +
facet_wrap(~ Hour) +
labs(x = "Heart Rate", y = "Density", title = "Heart Rate Density by Hour") +
theme_bw()
The density plots show periods of lower variation in heart rate, typically between 2:00 AM and 6:00 AM. A narrow spread of values during these hours is consistent with deeper sleep and a more relaxed state, as the heart rate remains relatively steady. On the other hand, the wider spread observed between 10:00 AM and 8:00 PM likely reflects heightened activity levels during the day. The flatter appearance of the curves during these hours indicates a wider range of heart rates, consistent with increased physiological activity and engagement in daily routines.
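The visual impression from the density plots can be backed by numbers: the standard deviation per hour measures how spread out the readings are. The sketch below uses made-up readings (`toy_hr` is an assumption for illustration) with the same `Hour` and `Value` column names as the heart rate table.

```r
# Toy heart-rate readings (made up): hour 3 is tightly clustered, hour 15 is spread out
toy_hr <- data.frame(
  Hour  = c(3, 3, 3, 3, 15, 15, 15, 15),
  Value = c(58, 60, 59, 61, 70, 95, 120, 85)
)

# Mean and standard deviation of heart rate for each hour
hr_mean <- tapply(toy_hr$Value, toy_hr$Hour, mean)
hr_sd   <- tapply(toy_hr$Value, toy_hr$Hour, sd)

hr_by_hour <- data.frame(Hour = as.integer(names(hr_mean)),
                         mean_hr = as.vector(hr_mean),
                         sd_hr = as.vector(hr_sd))
hr_by_hour
```

Applied to the real table, a small `sd_hr` for the night hours and a larger one for daytime hours would support the reading of the plots given above.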
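We classify each record in the sleepDay data set by sleep quality, based on the total number of minutes asleep: up to 240 minutes (4 hours) is a bad sleeper, up to 420 minutes (7 hours) a regular sleeper, up to 500 minutes a good sleeper, and anything above that counts as oversleeping.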
sleepDay$sleep_quality <- cut(sleepDay$TotalMinutesAsleep,
breaks = c(-Inf, 240, 420, 500, Inf),
labels = c("Bad sleeper", "Regular sleeper", "Good sleeper", "Oversleeping"))
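By default `cut()` builds intervals that are open on the left and closed on the right, so a value exactly on a break falls into the lower category. The sketch below applies the same breaks and labels to a few boundary values (the input vector is illustrative, not taken from the data set).

```r
# Boundary values in minutes (illustrative inputs)
minutes_asleep <- c(58, 240, 241, 420, 421, 500, 501)

sleep_quality <- cut(minutes_asleep,
                     breaks = c(-Inf, 240, 420, 500, Inf),
                     labels = c("Bad sleeper", "Regular sleeper",
                                "Good sleeper", "Oversleeping"))

# Pair each input with its label to see where the boundaries fall
data.frame(minutes_asleep, sleep_quality)
```

Here exactly 240 minutes is still labeled "Bad sleeper" and exactly 420 minutes "Regular sleeper"; passing `right = FALSE` would move those boundary values into the higher category.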
We illustrate the percentage distribution for a better understanding of the records.
# Count the records in each sleep quality category
sleep_quantity_count <- sleepDay %>%
  group_by(sleep_quality) %>%
  summarize(count_sleepers = n())
total_sleepers<-sum(sleep_quantity_count$count_sleepers)
sleep_quantity_counts_percentage <- sleep_quantity_count %>%
mutate(percentage = (count_sleepers / total_sleepers) * 100)
ggplot(sleep_quantity_counts_percentage, aes(x = "", y = percentage, fill = sleep_quality)) +
geom_bar(stat = "identity", width = 1) +
coord_polar(theta = "y") +
labs(title = "Distribution of Sleep Quality", fill = "Sleep Quality") +
theme_void() +
geom_text(aes(label = paste0(round(percentage, 1), "% (", count_sleepers, ")")),
position = position_stack(vjust = 0.5))
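The count and percentage table behind the chart can also be produced with base R, which makes the arithmetic easy to verify. This is a minimal sketch on a made-up vector of categories (`toy_quality` is an assumption for illustration).

```r
# Toy vector of sleep categories (made up for illustration)
toy_quality <- factor(
  c("Bad sleeper", "Regular sleeper", "Regular sleeper", "Good sleeper",
    "Good sleeper", "Good sleeper", "Oversleeping", "Regular sleeper"),
  levels = c("Bad sleeper", "Regular sleeper", "Good sleeper", "Oversleeping")
)

# table() counts each category; prop.table() converts counts to shares
counts <- table(toy_quality)
percentages <- round(100 * prop.table(counts), 1)

data.frame(sleep_quality = names(counts),
           count_sleepers = as.vector(counts),
           percentage = as.vector(percentages))
```

The percentages are each category's count divided by the total number of records, so they should add up to 100 apart from rounding.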
Sleep quality box plot
ggplot(sleepDay, aes(x = factor(Day_week,
levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")),
y = TotalMinutesAsleep,
color = sleep_quality)) +
geom_boxplot()+labs(x = "Days of the Week", y = "Total Minutes Asleep")
Now we divide the types of users based on their hourly calorie expenditure to determine their activity levels. Each hourly record is labeled separately, so the same user can fall into different categories at different hours. The more calories burned, the more active the body is.
hourly_calories_steps$type_of_user<-cut(hourly_calories_steps$Calories,
breaks = c(-Inf, 100, 200, 400, Inf),
labels = c("LowActivity", "RegularActivity", "HighlyActive", "VeryActive"))
We compute another percentage distribution for a better understanding of the records.
# Count the hourly records in each activity category
users_quantity <- hourly_calories_steps %>%
  group_by(type_of_user) %>%
  summarize(count_users = n())
total_users<-sum(users_quantity$count_users)
users_quantity_counts_percentage <- users_quantity %>%
mutate(percentage = (count_users / total_users) * 100)
ggplot(users_quantity_counts_percentage, aes(x = type_of_user, y = percentage, fill = type_of_user)) +
geom_col() +
labs(x = "Type of User", y = "Percentage", fill = "Type of User") +
geom_text(aes(label = paste0(round(percentage, 1), "% (", count_users, ")")),
position = position_stack(vjust = 0.7))
cor(hourly_calories_steps$Calories,hourly_calories_steps$StepTotal)
## [1] 0.8257203
ggplot(hourly_calories_steps)+geom_smooth(mapping = aes(x=StepTotal,
y=Calories))+geom_point(mapping=aes(x=StepTotal, y=Calories, color=type_of_user))
## `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'
ggplot(hourly_calories_steps, aes(x = factor(Hour))) +
stat_summary(fun = "mean", geom = "bar", aes(y = Calories), fill='red') +
labs(x = "Hour", y = "Mean Calories")
ggplot(hourly_calories_steps, aes(x = factor(Hour))) +
stat_summary(fun = "mean", geom = "bar", aes(y = StepTotal), fill='orange') +
labs(x = "Hour", y = "Mean Steps")
calories <- hourly_calories_steps %>%
group_by(Day_week, type_of_user)%>%
summarise(mean_calories = mean(Calories))
## `summarise()` has grouped output by 'Day_week'. You can override using the
## `.groups` argument.
steps <- hourly_calories_steps %>%
group_by(Day_week, type_of_user)%>%
summarise(mean_steps = mean(StepTotal))
## `summarise()` has grouped output by 'Day_week'. You can override using the
## `.groups` argument.
ggplot(calories, aes(x = factor(Day_week,levels = c("Monday", "Tuesday",
"Wednesday", "Thursday", "Friday", "Saturday", "Sunday")), y = mean_calories,
fill = type_of_user)) + geom_col(position = "dodge") +
labs(x = "Day of the Week", y = "Mean Calories", fill = "User Type")+geom_text(aes(label = round(mean_calories, 1)),
position = position_dodge(width = 0.9), vjust = -0.5, size = 3)
ggplot(steps, aes(x = factor(Day_week,levels = c("Monday", "Tuesday",
"Wednesday", "Thursday", "Friday", "Saturday", "Sunday")), y = mean_steps,
fill = type_of_user)) + geom_col(position = "dodge") +
labs(x = "Day of the Week", y = "Mean Steps", fill = "User Type")+geom_text(aes(label = round(mean_steps, 1)),
position = position_dodge(width = 0.9), vjust = -0.5, size = 3)
In conclusion, after segmenting the users by sleeping minutes, burned calories, heart rate, and number of steps, I highly recommend a more personalized marketing strategy for each customer through the Bellabeat app, or promoting the membership during specific periods when customers are likely to be more interested in their health.
Segmentation: Grouping people by shared characteristics makes decision making more effective and easier to track. Bellabeat can send motivational messages aimed at people who are not very active, or at people who tend to achieve positive goals such as more time asleep or some extra steps during the day. This applies especially to customers in the LowActivity or RegularActivity groups.
A/B test: It would be interesting to test a hypothesis about customers who are getting started with the app, using pop-up messages with reminders about daily and weekly achievements, especially on days with low activity such as Tuesdays or Thursdays. Group A would not receive the pop-up, and Group B would receive a pop-up message through the app about how well the user is achieving their health goals. We want to measure the influence of these messages on how well users keep up a healthy routine.
Ads during peak moments of use: The results show that people are most active from 12:00 to 19:00. During those hours, ads related to fitness or health care can be shown more frequently.
Sleep quality campaign: Small reminders about the importance of sleep quality, especially on the days with the highest number of users sleeping less than 4 hours, such as Monday, Thursday, and Saturday.
Customer insights: Through the app membership, it would be possible to show the user’s heart rate pattern for every hour and use it to suggest daily diets for better heart care.
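The A/B test recommended above could be evaluated with a two-sample comparison of an outcome such as average daily steps. The sketch below is only an illustration under stated assumptions: the numbers in `group_a` and `group_b` are invented, no such experiment has been run, and a Welch t-test is one reasonable choice among several.

```r
# Hypothetical outcome: mean daily steps per user over the test period (made-up numbers)
group_a <- c(6200, 7100, 5800, 6900, 6400, 7000, 6100, 6600)  # no pop-up
group_b <- c(6900, 7600, 6300, 7400, 7200, 7800, 6500, 7100)  # pop-up reminders

# Welch two-sample t-test (does not assume equal variances)
ab_result <- t.test(group_b, group_a)

ab_result$estimate  # mean of each group
ab_result$p.value   # p-value for the null hypothesis of equal means
```

In a real test, users would need to be assigned to the two groups at random, and the sample size should be decided before the experiment starts.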