Founded in 2014, Bellabeat is the company that developed one of the first wearables specifically designed for women and has since gone on to create a portfolio of digital products for tracking and improving the health of women.
Focusing on creating innovative health and wellness products for women, their mission is to empower women to take control of their health by providing them with technology-driven solutions that blend design and function.
💡 BUSINESS TASK : Analyze Fitbit data to gain insight and help guide marketing strategy for Bellabeat to grow as a global player.
Primary stakeholders: Urška Sršen and Sando Mur, executive team members.
Secondary stakeholders: Bellabeat marketing analytics team.
🗄️ DATA SOURCE : 30 participants FitBit Fitness Tracker Data from Mobius: https://www.kaggle.com/arashnic/fitbit
The dataset has 18 CSV. The data also follow a ROCCC approach:
⚠️ The data set has limitations:
n_distinct() to check
for unique user Id, the set has 33 user data from daily activity, 24
from sleep and only 8 from weight. There are 3 extra users and some
users did not record their data for tracking daily activity and
sleep.Importing the 3 CSVs : daily_activity, sleep, weight into R using
read.csv.
daily_activity <- read.csv("/Users/karanarora/Documents/Learning/Projects/Bellabeat/dailyActivity_merged.csv")
sleep_day <- read.csv("/Users/karanarora/Documents/Learning/Projects/Bellabeat/sleepDay_merged.csv")
weight <-read.csv("/Users/karanarora/Documents/Learning/Projects/Bellabeat/weightLogInfo_merged.csv")
Examining the three tables (daily_activity,sleep,weight) for any duplicates or null values :
sum(is.na(daily_activity))
## [1] 0
sum(is.na(sleep_day))
## [1] 0
sum(is.na(weight))
## [1] 65
sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(sleep_day))
## [1] 3
sum(duplicated(weight))
## [1] 0
Removing the 3 duplicates from sleep_day :
sleep_day <- sleep_day[!duplicated(sleep_day), ]
sum(duplicated(sleep_day))
## [1] 0
Convert ActivityDate into date format and add column for the weekday :
daily_activity <- daily_activity %>%
mutate( Weekday = weekdays(as.Date(ActivityDate,"%m/%d/%Y")))
Checking to see if we have 30 users using n_distinct.
The data set has 33 users from daily_activity, 24 from sleep and only 8
from weight. If there is a discrepency such as in the weight table,
checking to see how the data are recorded. The way the user record the
data may give us insight on why there is missing data.
weight %>%
filter(IsManualReport == "TRUE") %>%
group_by(Id) %>%
summarise("Manual Weight Report" = n()) %>%
distinct()
## # A tibble: 0 × 2
## # ℹ 2 variables: Id <dbl>, Manual Weight Report <int>
Merging the 3 data tables :
merged1 <- merge(daily_activity,sleep_day,by = c("Id"), all=TRUE)
merged_data <- merge(merged1,weight,by = c("Id"), all = TRUE)
Checking how often users record their data. We can see most of the
users record their data from Tuesday to Thursday using
ggplot. We need to ask why the users are not recording
their data on Mondays or Fridays?
ggplot(data=merged_data, aes(x= Weekday)) + geom_bar(fill = "blue") +
labs(title = "Data Recording During the Week")
⛔ From weekday’s total asleep minutes, we can see the graph look almost same as the graph above! We can confirmed that most sleep data is also recorded during Tuesday to Thursday. This raised a question “how comprehensive are the data to form an accurate analysis?”
ggplot(data = merged_data,aes(x = Weekday,y = TotalMinutesAsleep, fill = Weekday))+
geom_bar(stat="identity",fill="blue")+
labs(title = "Total Minutes Asleep During the Week", y = "Total Minutes Asleep")
Check min, max, mean, median and any outliers. Avg weight is 135 pounds with BMI of 24 and burn 2050 calories. Avg steps is 10200, max is almost triple that 36000 steps. Users spend on avg 12 hours a day in sedentary minutes, 4 hours lightly active, only half hour in fairly+very active! Users also gets about 7 hour of sleep.
merged_data %>%
select(TotalSteps,
TotalDistance,
VeryActiveMinutes,
FairlyActiveMinutes,
LightlyActiveMinutes,
SedentaryMinutes,
Calories,
TotalSleepRecords,
TotalMinutesAsleep,
TotalTimeInBed,
WeightPounds,
BMI) %>%
summary()
## TotalSteps TotalDistance VeryActiveMinutes FairlyActiveMinutes
## Min. : 0 Min. : 0.000 Min. : 0.00 Min. : 0.00
## 1st Qu.: 5832 1st Qu.: 3.910 1st Qu.: 0.00 1st Qu.: 3.00
## Median :10199 Median : 6.820 Median : 15.00 Median : 14.00
## Mean : 9373 Mean : 6.415 Mean : 23.57 Mean : 17.82
## 3rd Qu.:12109 3rd Qu.: 8.350 3rd Qu.: 38.00 3rd Qu.: 31.00
## Max. :36019 Max. :28.030 Max. :210.00 Max. :143.00
##
## LightlyActiveMinutes SedentaryMinutes Calories TotalSleepRecords
## Min. : 0.0 Min. : 0.0 Min. : 0 Min. :1.000
## 1st Qu.:194.0 1st Qu.: 637.0 1st Qu.:1850 1st Qu.:1.000
## Median :238.0 Median : 697.0 Median :2046 Median :1.000
## Mean :232.2 Mean : 722.6 Mean :2103 Mean :1.101
## 3rd Qu.:288.0 3rd Qu.: 745.0 3rd Qu.:2182 3rd Qu.:1.000
## Max. :518.0 Max. :1440.0 Max. :4900 Max. :3.000
## NA's :971
## TotalMinutesAsleep TotalTimeInBed WeightPounds BMI
## Min. : 58.0 Min. : 61.0 Min. :116.0 Min. :21.45
## 1st Qu.:400.0 1st Qu.:421.0 1st Qu.:134.9 1st Qu.:23.89
## Median :442.0 Median :457.0 Median :135.6 Median :24.00
## Mean :433.8 Mean :458.2 Mean :139.6 Mean :24.42
## 3rd Qu.:477.0 3rd Qu.:510.0 3rd Qu.:136.7 3rd Qu.:24.21
## Max. :796.0 Max. :961.0 Max. :294.3 Max. :47.54
## NA's :971 NA's :971 NA's :8881 NA's :8881
total_minutes <- sum(daily_activity$SedentaryMinutes, daily_activity$VeryActiveMinutes,
daily_activity$FairlyActiveMinutes, daily_activity$LightlyActiveMinutes)
sedentary_percentage <- sum(daily_activity$SedentaryMinutes)/total_minutes*100
lightly_percentage <- sum(daily_activity$LightlyActiveMinutes)/total_minutes*100
fairly_percentage <- sum(daily_activity$FairlyActiveMinutes)/total_minutes*100
active_percentage <- sum(daily_activity$VeryActiveMinutes)/total_minutes*100
percentage <- data.frame(
level = c("Sedentary", "Lightly", "Fairly", "Active"),
minutes = c(sedentary_percentage,lightly_percentage,fairly_percentage,active_percentage)
)
plot_ly(percentage, labels = ~level, values = ~minutes, type = 'pie',textposition = 'outside',textinfo = 'label+percent') %>%
layout(title = 'Activity Level Minutes',
xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
The American Heart Association and World Health Organization recommend at least 150 minutes of moderate-intensity activity or 75 minutes of vigorous activity, or a combination of both, each week. That means it needs an daily goal of 21.4 minutes of FairlyActiveMinutes or 10.7 minutes of VeryActiveMinutes.
In our dataset, 30 users met fairly active minutes or very active minutes.
active_users <- daily_activity %>%
filter(FairlyActiveMinutes >= 21.4 | VeryActiveMinutes>= 10.7) %>%
group_by(Id) %>%
count(Id)
active_users
## # A tibble: 30 × 2
## # Groups: Id [30]
## Id n
## <dbl> <int>
## 1 1503960366 30
## 2 1624580081 6
## 3 1644430081 16
## 4 1927972279 2
## 5 2022484408 28
## 6 2320127002 2
## 7 2347167796 12
## 8 2873212765 11
## 9 3372868164 11
## 10 3977333714 27
## # ℹ 20 more rows
The bar graph shows that there is a jump on Tuesday: user spent LESS time in sedentary minutes and take MORE steps. Users are out and about on Tuesday.
ggplot(data = merged_data,aes(x = Weekday,y = TotalSteps, fill = Weekday))+
geom_bar(stat="identity")+
labs(title = "More Steps on Tuesday", y = "Total Steps")
The more active that you’re, the more steps you take, and the more calories you will burn. This is an obvious fact, but we can still look into the data to find any interesting. Here we see that some users who are sedentary, take minimal steps, but still able to burn over 1500 to 2500 calories compare to users who are more active, take more steps, but still burn similar calories.
According to article: Fitbit Sleep Study, 55 minutes are spent awake in bed before going to sleep. We have 13 users in our dataset spend 55 minutes awake before alseep.
sleep_day_in_hour <-sleep_day
sleep_day_in_hour$TotalMinutesAsleep <- sleep_day_in_hour$TotalMinutesAsleep/60
sleep_day_in_hour$TotalTimeInBed <- sleep_day_in_hour$TotalTimeInBed/60
head(sleep_day_in_hour)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 5.450000
## 2 1503960366 4/13/2016 12:00:00 AM 2 6.400000
## 3 1503960366 4/15/2016 12:00:00 AM 1 6.866667
## 4 1503960366 4/16/2016 12:00:00 AM 2 5.666667
## 5 1503960366 4/17/2016 12:00:00 AM 1 11.666667
## 6 1503960366 4/19/2016 12:00:00 AM 1 5.066667
## TotalTimeInBed
## 1 5.766667
## 2 6.783333
## 3 7.366667
## 4 6.116667
## 5 11.866667
## 6 5.333333
13 users spend more than 55 minutes in bed before falling asleep :
awake_in_bed <- mutate(sleep_day, AwakeTime = TotalTimeInBed - TotalMinutesAsleep)
awake_in_bed <- awake_in_bed %>%
filter(AwakeTime >= 55) %>%
group_by(Id) %>%
arrange(AwakeTime, desc=TRUE)
n_distinct(awake_in_bed$Id)
## [1] 13
Majority of the users sleep between 5 to 10 hours burns around 1500 to 4500 calories a day.
Conclusion based on our analysis:
Sedentary make up a significant portion, 81% of users daily active minutes. Users spend on avg 12 hours a day in sedentary minutes, 4 hours lightly active, and only half-hour in fairly+very active! We see the most change on Saturday: users take more steps, burn more calories, and spend less time sedentary. Sunday is the most “lazy” day for users. 54% of the users who recorded their sleep data spent 55 minutes awake in bed before falling asleep. Users takes the most steps from 5 PM to 7 PM Users who are sedentary take minimal steps and burn 1500 to 2500 calories compared to users who are more active, take more steps, but still burn similar calories. Marketing recommendations to expand globally:
🔢 Obtain more data for an accurate analysis, encouraging users to use a wifi-connected scale instead of manual weight entries.
🚲 Educational healthy style campaign encourages users to have short active exercises during the week, longer during the weekends, especially on Sunday where we see the lowest steps and most sedentary minutes.
🎁 Educational healthy style campaign can pair with a point-award incentive system. Users completing the whole week’s exercise will receive Bellabeat points on products/memberships.
🏃♂️ The product, such as Leaf wellness tracker, can beat or vibrate after a prolonged period of sedentary minutes, signaling the user it’s time to get active! Similarly, it can also remind the user it’s time to sleep after sensing a prolonged awake time in bed.