Fitbit Analysis
This analysis is framed as a scenario in which a data scientist is assigned to assist a fitness technology unicorn, named “BugarBahagia”, in improving its market penetration by analyzing Fitbit customer data.
Preface
Data Source and Goals
The dataset I am using for this case study is an open dataset: https://www.kaggle.com/datasets/arashnic/fitbit
It’s a public dataset generated by respondents to a survey distributed via Amazon Mechanical Turk. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. The data frames we will be working with are:
daily activity information
daily sleep information
weight log information
Goals of the case study
Our goals for the project are:
Identify the trends in smart device usage
Determine how those trends could apply to BugarBahagia products and customers
Explore how those trends could influence BugarBahagia’s marketing strategy
Libraries Used
library(tidyverse)
library(plotly)
library(scales)
library(glue)
library(lubridate)
library(hrbrthemes)
library(ggplot2)
library(ggcorrplot)
Data Reading / CSV Loading
setwd("c:/Users/ASUS/Documents/Algoritma/3_DV_LBB")
daily_activity <- read.csv("dailyActivity_merged.csv")
sleeping_day <- read.csv("sleepDay_merged.csv")
weight_info <- read.csv("weightLogInfo_merged.csv")
The Daily Activities
head(daily_activity)
#> Id ActivityDate TotalSteps TotalDistance TrackerDistance
#> 1 1503960366 4/12/2016 13162 8.50 8.50
#> 2 1503960366 4/13/2016 10735 6.97 6.97
#> 3 1503960366 4/14/2016 10460 6.74 6.74
#> 4 1503960366 4/15/2016 9762 6.28 6.28
#> 5 1503960366 4/16/2016 12669 8.16 8.16
#> 6 1503960366 4/17/2016 9705 6.48 6.48
#> LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
#> 1 0 1.88 0.55
#> 2 0 1.57 0.69
#> 3 0 2.44 0.40
#> 4 0 2.14 1.26
#> 5 0 2.71 0.41
#> 6 0 3.19 0.78
#> LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
#> 1 6.06 0 25
#> 2 4.71 0 21
#> 3 3.91 0 30
#> 4 2.83 0 29
#> 5 5.04 0 36
#> 6 2.51 0 38
#> FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
#> 1 13 328 728 1985
#> 2 19 217 776 1797
#> 3 11 181 1218 1776
#> 4 34 209 726 1745
#> 5 10 221 773 1863
#> 6 20 164 539 1728
colnames(daily_activity)
#> [1] "Id" "ActivityDate"
#> [3] "TotalSteps" "TotalDistance"
#> [5] "TrackerDistance" "LoggedActivitiesDistance"
#> [7] "VeryActiveDistance" "ModeratelyActiveDistance"
#> [9] "LightActiveDistance" "SedentaryActiveDistance"
#> [11] "VeryActiveMinutes" "FairlyActiveMinutes"
#> [13] "LightlyActiveMinutes" "SedentaryMinutes"
#> [15] "Calories"
glimpse(daily_activity)
#> Rows: 940
#> Columns: 15
#> $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036~
#> $ ActivityDate <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/~
#> $ TotalSteps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019~
#> $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
#> $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
#> $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
#> $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
#> $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
#> $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
#> $ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
#> $ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
#> $ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
#> $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203~
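The `glimpse()` output shows `ActivityDate` stored as character (`<chr>`) in month/day/year form, so any date-based grouping would need a conversion first. A minimal sketch, assuming base R's `as.Date()` is acceptable here; the toy vector is copied from the first rows shown above, and on the real data the same call would take `daily_activity$ActivityDate`:

```r
# Toy vector copied from the first rows of head(daily_activity)
activity_date <- c("4/12/2016", "4/13/2016", "4/14/2016")

# Parse month/day/year strings into Date objects
parsed_date <- as.Date(activity_date, format = "%m/%d/%Y")

class(parsed_date)  # now a Date vector instead of character
diff(parsed_date)   # gap between consecutive records, in days
```

`lubridate::mdy()`, from a package already loaded above, would be an equivalent alternative.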
The Daily Sleep
head(sleeping_day)
#> Id SleepDay TotalSleepRecords TotalMinutesAsleep
#> 1 1503960366 4/12/2016 12:00:00 AM 1 327
#> 2 1503960366 4/13/2016 12:00:00 AM 2 384
#> 3 1503960366 4/15/2016 12:00:00 AM 1 412
#> 4 1503960366 4/16/2016 12:00:00 AM 2 340
#> 5 1503960366 4/17/2016 12:00:00 AM 1 700
#> 6 1503960366 4/19/2016 12:00:00 AM 1 304
#> TotalTimeInBed
#> 1 346
#> 2 407
#> 3 442
#> 4 367
#> 5 712
#> 6 320
colnames(sleeping_day)
#> [1] "Id" "SleepDay" "TotalSleepRecords"
#> [4] "TotalMinutesAsleep" "TotalTimeInBed"
glimpse(sleeping_day)
#> Rows: 413
#> Columns: 5
#> $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150~
#> $ SleepDay <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "~
#> $ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
#> $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2~
#> $ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3~
Weight Information
head(weight_info)
colnames(weight_info)
glimpse(weight_info)
Data Measurements
Total Steps, Distance and Sedentary Minutes on each daily_activity
daily_activity %>%
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes) %>%
  summary()
#> TotalSteps TotalDistance SedentaryMinutes
#> Min. : 0 Min. : 0.000 Min. : 0.0
#> 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8
#> Median : 7406 Median : 5.245 Median :1057.5
#> Mean : 7638 Mean : 5.490 Mean : 991.2
#> 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5
#> Max. :36019 Max. :28.030 Max. :1440.0
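A minimum of 0 steps and a maximum of 1440 sedentary minutes (a full 24 hours) may indicate days on which the tracker was not worn. This is an assumption, since the dataset does not flag such days. A hedged sketch of excluding zero-step days before summarizing, using a toy data frame in which the first two rows come from `head(daily_activity)` and the third is a made-up non-wear day:

```r
# Toy data: rows 1-2 from head(daily_activity), row 3 is a hypothetical non-wear day
toy_activity <- data.frame(
  TotalSteps       = c(13162, 10735, 0),
  SedentaryMinutes = c(728, 776, 1440)
)

# Keep only days with at least one recorded step
worn_days <- subset(toy_activity, TotalSteps > 0)

nrow(worn_days)
summary(worn_days$TotalSteps)
```

Whether to drop such days depends on the question being asked; for usage analysis, the non-wear days themselves may be the interesting signal.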
Sleep Records, Minutes Asleep and Time in Bed for sleep day
sleeping_day %>%
  select(TotalSleepRecords,
         TotalMinutesAsleep,
         TotalTimeInBed) %>%
  summary()
#> TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
#> Min. :1.000 Min. : 58.0 Min. : 61.0
#> 1st Qu.:1.000 1st Qu.:361.0 1st Qu.:403.0
#> Median :1.000 Median :433.0 Median :463.0
#> Mean :1.119 Mean :419.5 Mean :458.6
#> 3rd Qu.:1.000 3rd Qu.:490.0 3rd Qu.:526.0
#> Max. :3.000 Max. :796.0 Max. :961.0
BMI and weight information
weight_info %>%
  select(WeightPounds,
         BMI) %>%
  summary()
#> WeightPounds BMI
#> Min. :116.0 Min. :21.45
#> 1st Qu.:135.4 1st Qu.:23.96
#> Median :137.8 Median :24.39
#> Mean :158.8 Mean :25.19
#> 3rd Qu.:187.5 3rd Qu.:25.56
#> Max. :294.3 Max. :47.54
Data Visualization - POV of Steps
Total Steps Vs Calories Burned
We assume there is a relationship between total steps taken in a day and calories burned, and likewise between sedentary minutes in a day and total steps. The first assumption is examined in the graph below:
ggplot(data = daily_activity, aes(x = TotalSteps, y = Calories)) +
  geom_point(col = "darkgreen") +
  stat_smooth(method = lm, col = "darkred") +
  labs(title = "Total Steps vs. Calories Burned",
       x = "Total Steps",
       y = "Calories") +
  theme_minimal()
The plot supports the assumption: calories burned trend upward as
the total number of steps increases. This would be a good opportunity
for a marketing strategy. The more the subjects move, the more calories
they burn!
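The upward trend can also be expressed as a number. A minimal sketch, assuming a simple linear model is an acceptable summary of the relationship; the toy data frame holds only the six rows printed by `head(daily_activity)`, so its coefficients are illustrative and will differ from a fit on all 940 rows:

```r
# Toy data: the six rows printed by head(daily_activity)
toy_activity <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  Calories   = c(1985, 1797, 1776, 1745, 1863, 1728)
)

# Fit Calories as a linear function of TotalSteps
fit <- lm(Calories ~ TotalSteps, data = toy_activity)

coef(fit)[["TotalSteps"]]  # estimated extra calories per additional step
summary(fit)$r.squared     # share of variance in Calories explained by steps
```

On the full data, passing `data = daily_activity` to the same `lm()` call would give the slope that `stat_smooth(method = lm)` draws in the plot.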
Total Steps Vs Sedentary Minutes
Next, we check whether there is a relationship between total steps and time spent sedentary.
ggplot(data = daily_activity, aes(x = TotalSteps, y = SedentaryMinutes)) +
  geom_point(col = "darkgreen") +
  stat_smooth(method = lm, col = "darkred") +
  labs(title = "Total Steps vs Sedentary Minutes",
       x = "Total Steps",
       y = "Sedentary Minutes") +
  theme_minimal()
Interestingly, total steps are not strongly related to time spent
sedentary. The startup we are helping could market its
devices as notifying users when they have been stationary for some
period of time.
Data Visualization - POV of Activity Intensity
Is there any relationship between activity intensity and calories burned?
We consider three levels of activity intensity: light, moderate and very active. Let’s test whether very intense activity burns the most calories. We will plot each intensity category and see if there is a correlation between activity and calories.
Light Active
ggplot(data = daily_activity, aes(x = LightActiveDistance, y = Calories)) +
  geom_point(col = "navy") +
  stat_smooth(method = lm, col = "blue") +
  labs(title = "Calories Burned From Light Activity",
       x = "Light Active Distance",
       y = "Calories") +
  theme_ipsum_tw()
Next, we measure the strength of the relationship between
light activity and calories burned:
cor(daily_activity$LightActiveDistance, daily_activity$Calories, method = "pearson")
#> [1] 0.4669168
Moderate Active
ggplot(data = daily_activity, aes(x = ModeratelyActiveDistance, y = Calories)) +
  geom_point(col = "navy") +
  stat_smooth(method = lm, col = "blue") +
  labs(title = "Calories Burned From Moderate Activity",
       x = "Moderately Active Distance",
       y = "Calories") +
  theme_ipsum_tw()
Next, we measure the strength of the relationship between
moderate activity and calories burned:
cor(daily_activity$ModeratelyActiveDistance, daily_activity$Calories, method = "pearson")
#> [1] 0.2167899
Intensely Active
ggplot(data = daily_activity, aes(x = VeryActiveDistance, y = Calories)) +
  geom_point(col = "navy") +
  stat_smooth(method = lm) +
  labs(title = "Calories Burned From Intense Activity",
       x = "Very Active Distance",
       y = "Calories") +
  theme_ipsum_tw()
Next, we measure the strength of the relationship between
high-intensity activity and calories burned:
cor(daily_activity$VeryActiveDistance, daily_activity$Calories, method = "pearson")
#> [1] 0.4919586
Summary
Summary for the three levels of intensity
The results are quite interesting. Very active distance had the highest correlation with calories burned, at 0.49. The second highest was actually light active distance, at 0.47. Moderately active distance had the lowest correlation, at 0.22.
Since light active distance correlates with calories burned almost as strongly as very active distance does, a marketing strategy could focus on getting up and moving rather than on high-intensity workouts.
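The three correlations can also be computed in one pass so they are easy to compare side by side. A minimal sketch, assuming Pearson correlation as in the calls above; the toy data frame contains only the six rows printed by `head(daily_activity)`, so its values will not reproduce the full-data correlations:

```r
# Toy data: the six rows printed by head(daily_activity)
toy_activity <- data.frame(
  LightActiveDistance      = c(6.06, 4.71, 3.91, 2.83, 5.04, 2.51),
  ModeratelyActiveDistance = c(0.55, 0.69, 0.40, 1.26, 0.41, 0.78),
  VeryActiveDistance       = c(1.88, 1.57, 2.44, 2.14, 2.71, 3.19),
  Calories                 = c(1985, 1797, 1776, 1745, 1863, 1728)
)

intensity_cols <- c("LightActiveDistance",
                    "ModeratelyActiveDistance",
                    "VeryActiveDistance")

# Pearson correlation of each intensity column with Calories
intensity_cor <- sapply(intensity_cols, function(col) {
  cor(toy_activity[[col]], toy_activity$Calories, method = "pearson")
})

# Rank the intensity levels from strongest to weakest correlation
sort(round(intensity_cor, 2), decreasing = TRUE)
```

Replacing `toy_activity` with `daily_activity` would reproduce the three full-data figures in a single named vector.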
Sleep and Bed Time
You might think it is obvious that time asleep and time in bed are linearly related. Check this out:
ggplot(data=sleeping_day, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) + geom_point(col = "navy") + stat_smooth(method=lm) +
labs(title = "Total Minutes Asleep vs. Total Time in Bed",
x= "Total Minutes Asleep",
y="Total Time in Bed")+
theme_light()
See something? There are some outliers in the data.
Some of the data points show people spending much more time in bed than
time asleep. In other words, they are what we call Mager* :D
- *Mager is short for Malas Gerak, Indonesian for “too lazy to move”: people who love their beds so much! :D
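The gap between time in bed and time asleep can be computed directly rather than read off the plot. A minimal sketch using the six rows printed by `head(sleeping_day)`; the column names `MinutesAwakeInBed` and `SleepEfficiency` and the 60-minute cutoff are assumptions introduced for illustration, not part of the dataset:

```r
# Toy data: the six rows printed by head(sleeping_day)
toy_sleep <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  TotalTimeInBed     = c(346, 407, 442, 367, 712, 320)
)

# Minutes spent in bed but not asleep (hypothetical helper column)
toy_sleep$MinutesAwakeInBed <- toy_sleep$TotalTimeInBed - toy_sleep$TotalMinutesAsleep

# Share of time in bed actually spent asleep (hypothetical helper column)
toy_sleep$SleepEfficiency <- toy_sleep$TotalMinutesAsleep / toy_sleep$TotalTimeInBed

# Flag records with more than 60 minutes awake in bed (arbitrary cutoff)
long_awake <- subset(toy_sleep, MinutesAwakeInBed > 60)

toy_sleep
nrow(long_awake)
```

None of the six toy rows crosses the cutoff; the outliers visible in the plot would only appear when the same steps are applied to the full `sleeping_day` data frame.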
Weight Vs Activity
Are people who weigh more less active?
Merging the datasets
combined_weight_act <- merge(weight_info, daily_activity, by="Id")
n_distinct(combined_weight_act$Id)
#> [1] 8
There are 8 unique Ids in the combined dataset. This matches the total for weight_info.
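Note that `merge(..., by = "Id")` pairs every weight log of a user with every activity day of that user, so rows repeat and heavily logged users carry more weight in any later correlation. A hedged sketch of a date-aligned alternative, assuming the weight `Date` and the `ActivityDate` columns refer to the same calendar day; the `Day` column, the Id value and all toy values are made up for illustration:

```r
# Toy data with hypothetical values: one user, two weight logs, three activity days
weight_toy <- data.frame(
  Id           = c(1, 1),
  Date         = c("5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM"),
  WeightPounds = c(116.0, 115.5)
)
activity_toy <- data.frame(
  Id           = c(1, 1, 1),
  ActivityDate = c("5/1/2016", "5/2/2016", "5/3/2016"),
  TotalSteps   = c(9000, 11000, 10000)
)

# Merging on Id alone yields every weight/activity combination: 2 x 3 = 6 rows
by_id <- merge(weight_toy, activity_toy, by = "Id")

# Derive a shared calendar-day key (as.Date ignores the trailing time text)
weight_toy$Day   <- as.Date(weight_toy$Date, format = "%m/%d/%Y")
activity_toy$Day <- as.Date(activity_toy$ActivityDate, format = "%m/%d/%Y")

# Merging on Id and Day keeps only same-day pairs: 2 rows
by_day <- merge(weight_toy, activity_toy, by = c("Id", "Day"))

nrow(by_id)
nrow(by_day)
```

The date-aligned version compares each weigh-in only with that day's steps, at the cost of far fewer rows, which matters with only 8 users logging weight.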
The Graph
Weight compared to total steps taken
ggplot(data=combined_weight_act, aes(x=WeightPounds, y=TotalSteps)) + geom_point(col = "navy") +
labs(title = "Weight vs. Total Steps",
x= "Weight (lbs)",
y="Total Steps") +
theme_light()
cor(combined_weight_act$WeightPounds, combined_weight_act$TotalSteps, method = "pearson")
#> [1] 0.2647917
There is a weak correlation between weight and total steps, but it is not strong enough to build a marketing strategy around a weight-loss angle.
Overall Relationships
combined_weight_heat <- merge(combined_weight_act, sleeping_day, by="Id")
head(combined_weight_heat)
#> Id Date WeightKg WeightPounds Fat BMI
#> 1 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
#> 2 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
#> 3 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
#> 4 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
#> 5 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
#> 6 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
#> IsManualReport LogId ActivityDate TotalSteps TotalDistance
#> 1 True 1462319999000 4/17/2016 9705 6.48
#> 2 True 1462319999000 4/17/2016 9705 6.48
#> 3 True 1462319999000 4/17/2016 9705 6.48
#> 4 True 1462319999000 4/17/2016 9705 6.48
#> 5 True 1462319999000 4/17/2016 9705 6.48
#> 6 True 1462319999000 4/17/2016 9705 6.48
#> TrackerDistance LoggedActivitiesDistance VeryActiveDistance
#> 1 6.48 0 3.19
#> 2 6.48 0 3.19
#> 3 6.48 0 3.19
#> 4 6.48 0 3.19
#> 5 6.48 0 3.19
#> 6 6.48 0 3.19
#> ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
#> 1 0.78 2.51 0
#> 2 0.78 2.51 0
#> 3 0.78 2.51 0
#> 4 0.78 2.51 0
#> 5 0.78 2.51 0
#> 6 0.78 2.51 0
#> VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
#> 1 38 20 164 539
#> 2 38 20 164 539
#> 3 38 20 164 539
#> 4 38 20 164 539
#> 5 38 20 164 539
#> 6 38 20 164 539
#> Calories SleepDay TotalSleepRecords TotalMinutesAsleep
#> 1 1728 5/8/2016 12:00:00 AM 1 594
#> 2 1728 5/7/2016 12:00:00 AM 1 331
#> 3 1728 4/26/2016 12:00:00 AM 1 245
#> 4 1728 4/16/2016 12:00:00 AM 2 340
#> 5 1728 4/12/2016 12:00:00 AM 1 327
#> 6 1728 4/13/2016 12:00:00 AM 2 384
#> TotalTimeInBed
#> 1 611
#> 2 349
#> 3 274
#> 4 367
#> 5 346
#> 6 407
all <- combined_weight_heat %>%
  select(WeightKg, BMI, TotalSteps, TotalDistance, TotalMinutesAsleep)
ggcorrplot(cor(all), method = "circle", ggtheme = ggplot2::theme_minimal(),
           legend.title = "Correlation Strength", colors = c("blue", "yellow", "darkgreen"))
Summary
Trends in smart device usage
We saw that steps, activity and calories burned were among the most popular metrics tracked by users. Sleep was the second most popular, and only a few individuals tracked their weight.
We also saw that calories burned is related to total steps taken throughout the day. Typically, the higher the steps, the more calories burned. Interestingly, time spent sedentary was not strongly related to total steps.
One important thing to note is that this Fitbit dataset does not include water intake data.
Knowledge
BugarBahagia can put more focus on activity and sleep tracking when it comes to products and marketing strategy. Users are interested in tracking daily steps, so marketing this aspect of the products could be a good way to appeal to customers. Since even light activity was associated with burning calories, a marketing strategy could center on getting up and moving.
Since this Fitbit dataset contains no hydration or water intake data, this may be a good opportunity for BugarBahagia to differentiate itself with a new product. Additional market analysis would be needed to see whether competitors already provide a hydration tracker.