Fitbit Analysis

This analysis follows a scenario in which a data scientist is assigned to help a fitness technology unicorn named “BugarBahagia” improve its market penetration by analyzing Fitbit customer data.

Preface

Data Source and Goals

The dataset used for this case study is an open dataset: https://www.kaggle.com/datasets/arashnic/fitbit

It is a public dataset generated by respondents to a survey distributed via Amazon Mechanical Turk. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Of those datasets, the data frames we will be working with are:

  • daily activity information

  • daily sleep information

  • weight log information

Goals of the case study

Our goals for the project are:

  • Identify the trends in smart device usage

  • Determine how those trends could apply to BugarBahagia products and customers

  • Explore how those trends could influence BugarBahagia’s marketing strategy

Libraries Used

library(tidyverse)
library(plotly)
library(scales)
library(glue)
library(lubridate)
library(hrbrthemes)
library(ggplot2)
library(ggcorrplot)

Data Reading / CSV Loading

setwd("c:/Users/ASUS/Documents/Algoritma/3_DV_LBB")
daily_activity <-  read.csv("dailyActivity_merged.csv") 
sleeping_day <- read.csv("sleepDay_merged.csv")
weight_info <- read.csv("weightLogInfo_merged.csv")

The Daily Activities

head(daily_activity) 
#>           Id ActivityDate TotalSteps TotalDistance TrackerDistance
#> 1 1503960366    4/12/2016      13162          8.50            8.50
#> 2 1503960366    4/13/2016      10735          6.97            6.97
#> 3 1503960366    4/14/2016      10460          6.74            6.74
#> 4 1503960366    4/15/2016       9762          6.28            6.28
#> 5 1503960366    4/16/2016      12669          8.16            8.16
#> 6 1503960366    4/17/2016       9705          6.48            6.48
#>   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
#> 1                        0               1.88                     0.55
#> 2                        0               1.57                     0.69
#> 3                        0               2.44                     0.40
#> 4                        0               2.14                     1.26
#> 5                        0               2.71                     0.41
#> 6                        0               3.19                     0.78
#>   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
#> 1                6.06                       0                25
#> 2                4.71                       0                21
#> 3                3.91                       0                30
#> 4                2.83                       0                29
#> 5                5.04                       0                36
#> 6                2.51                       0                38
#>   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
#> 1                  13                  328              728     1985
#> 2                  19                  217              776     1797
#> 3                  11                  181             1218     1776
#> 4                  34                  209              726     1745
#> 5                  10                  221              773     1863
#> 6                  20                  164              539     1728
colnames(daily_activity) 
#>  [1] "Id"                       "ActivityDate"            
#>  [3] "TotalSteps"               "TotalDistance"           
#>  [5] "TrackerDistance"          "LoggedActivitiesDistance"
#>  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
#>  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
#> [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
#> [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
#> [15] "Calories"
glimpse(daily_activity)
#> Rows: 940
#> Columns: 15
#> $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036~
#> $ ActivityDate             <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/~
#> $ TotalSteps               <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019~
#> $ TotalDistance            <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
#> $ TrackerDistance          <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8~
#> $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5~
#> $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3~
#> $ LightActiveDistance      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0~
#> $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
#> $ VeryActiveMinutes        <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4~
#> $ FairlyActiveMinutes      <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21~
#> $ LightlyActiveMinutes     <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, ~
#> $ SedentaryMinutes         <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818~
#> $ Calories                 <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203~

The Daily Sleep

head(sleeping_day) 
#>           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
#> 1 1503960366 4/12/2016 12:00:00 AM                 1                327
#> 2 1503960366 4/13/2016 12:00:00 AM                 2                384
#> 3 1503960366 4/15/2016 12:00:00 AM                 1                412
#> 4 1503960366 4/16/2016 12:00:00 AM                 2                340
#> 5 1503960366 4/17/2016 12:00:00 AM                 1                700
#> 6 1503960366 4/19/2016 12:00:00 AM                 1                304
#>   TotalTimeInBed
#> 1            346
#> 2            407
#> 3            442
#> 4            367
#> 5            712
#> 6            320
colnames(sleeping_day) 
#> [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
#> [4] "TotalMinutesAsleep" "TotalTimeInBed"
glimpse(sleeping_day)
#> Rows: 413
#> Columns: 5
#> $ Id                 <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150~
#> $ SleepDay           <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "~
#> $ TotalSleepRecords  <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
#> $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2~
#> $ TotalTimeInBed     <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3~
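The glimpse() outputs above show that both ActivityDate and SleepDay are read in as character columns. Before comparing days across data frames it helps to turn them into real dates. Below is a minimal sketch, assuming the month/day/year layout seen in the output; the small `activity` and `sleep` data frames and the new `date` column are stand-ins introduced only for this illustration, not part of the original analysis.

```r
# Toy rows copied from the head() outputs above (stand-ins for the full data).
activity <- data.frame(
  Id = c(1503960366, 1503960366),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  stringsAsFactors = FALSE
)
sleep <- data.frame(
  Id = c(1503960366, 1503960366),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  stringsAsFactors = FALSE
)

# Month/day/year strings -> Date. The format only consumes the date part;
# trailing characters such as " 12:00:00 AM" are ignored by strptime().
activity$date <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")
sleep$date    <- as.Date(sleep$SleepDay, format = "%m/%d/%Y")

class(activity$date)
format(sleep$date)
```

Since lubridate is already loaded, mdy() and mdy_hms() would be a reasonable alternative to the base as.Date() call used here.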

Weight Information

head(weight_info) 
colnames(weight_info) 
glimpse(weight_info)

Data Measurements

Total Steps, Distance and Sedentary Minutes in daily_activity

daily_activity %>%
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes)%>%
  summary()
#>    TotalSteps    TotalDistance    SedentaryMinutes
#>  Min.   :    0   Min.   : 0.000   Min.   :   0.0  
#>  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8  
#>  Median : 7406   Median : 5.245   Median :1057.5  
#>  Mean   : 7638   Mean   : 5.490   Mean   : 991.2  
#>  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5  
#>  Max.   :36019   Max.   :28.030   Max.   :1440.0

Sleep Records, Minutes Asleep and Time in Bed for sleep day

sleeping_day %>%
  select(TotalSleepRecords,
         TotalMinutesAsleep,
         TotalTimeInBed)%>%
  summary()
#>  TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
#>  Min.   :1.000     Min.   : 58.0      Min.   : 61.0  
#>  1st Qu.:1.000     1st Qu.:361.0      1st Qu.:403.0  
#>  Median :1.000     Median :433.0      Median :463.0  
#>  Mean   :1.119     Mean   :419.5      Mean   :458.6  
#>  3rd Qu.:1.000     3rd Qu.:490.0      3rd Qu.:526.0  
#>  Max.   :3.000     Max.   :796.0      Max.   :961.0

BMI and weight information

weight_info %>%
  select(WeightPounds,
         BMI)%>%
  summary()
#>   WeightPounds        BMI       
#>  Min.   :116.0   Min.   :21.45  
#>  1st Qu.:135.4   1st Qu.:23.96  
#>  Median :137.8   Median :24.39  
#>  Mean   :158.8   Mean   :25.19  
#>  3rd Qu.:187.5   3rd Qu.:25.56  
#>  Max.   :294.3   Max.   :47.54

Data Visualization - POV of Steps

Total Steps Vs Calories Burned

We assume there is a relationship between total steps taken in a day and calories burned, and likewise between sedentary minutes in a day and total steps. The graph below examines the first assumption:

ggplot(data=daily_activity, aes(x=TotalSteps, y=Calories)) + geom_point(col = "darkgreen")+ stat_smooth(method=lm, col = "darkred") +
  labs(title = "Total Steps vs. Calories Burned",
       x= "Total Steps",
       y="Calories")+
theme_minimal()

The plot supports the first assumption: calories burned trend upward as the total number of steps increases. This is a good opportunity for a marketing strategy. The more the subjects move, the more calories they burn!
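The upward trend drawn by stat_smooth(method=lm) can also be put into numbers. Here is a minimal sketch, assuming a plain linear model is an acceptable summary; the `toy` data frame holds only the six rows shown in head(daily_activity), so the values it produces are illustrative and not results for the full dataset.

```r
# Six rows copied from head(daily_activity); a stand-in for the full data frame.
toy <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  Calories   = c(1985, 1797, 1776, 1745, 1863, 1728)
)

# Same model that stat_smooth(method = lm) draws: Calories as a linear function of steps.
fit <- lm(Calories ~ TotalSteps, data = toy)

slope <- coef(fit)[["TotalSteps"]]        # extra calories per additional step
r     <- cor(toy$TotalSteps, toy$Calories)  # Pearson correlation

slope
r
summary(fit)$r.squared  # for a one-predictor model this equals r^2
```

Swapping `toy` for daily_activity would give the slope and correlation behind the plot above.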

Total Steps Vs Sedentary Minutes

Next, we want to understand whether there is a relationship between total steps and time spent sedentary.

ggplot(data=daily_activity, aes(x=TotalSteps, y=SedentaryMinutes)) + geom_point(col = "darkgreen") +
   stat_smooth(method=lm, col = "darkred") +
   labs(title = "Total Steps vs Sedentary Minutes",
      x= "Total Steps",
      y="Sedentary Minutes")+
theme_minimal()

Interestingly, total steps are only weakly related to time spent sedentary. The startup we are helping could therefore market devices that notify users when they have been stationary for some period of time.
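How weak the relationship is can be checked with a correlation test rather than judged by eye. The sketch below is only an illustration, assuming a Pearson correlation is appropriate; `toy` again holds just the six rows from head(daily_activity), so the numbers it prints do not describe the full dataset.

```r
# Six rows copied from head(daily_activity); a stand-in for the full data frame.
toy <- data.frame(
  TotalSteps       = c(13162, 10735, 10460, 9762, 12669, 9705),
  SedentaryMinutes = c(728, 776, 1218, 726, 773, 539)
)

# cor.test() returns the correlation together with a confidence interval and p-value.
res <- cor.test(toy$TotalSteps, toy$SedentaryMinutes, method = "pearson")

res$estimate  # Pearson r
res$conf.int  # 95% confidence interval for r
res$p.value   # p-value for the null hypothesis r = 0
```

A confidence interval that comfortably spans zero on the real data would back up the "only weakly related" reading of the scatter plot.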

Data Visualization - POV of Activity Intensity

Is there any relationship between activity intensity and calories burned?

We consider three levels of activity intensity: light, moderate and very active. Let’s test whether very intense activity burns the most calories. We will plot each intensity category and check for a correlation between activity distance and calories.

Light Active

ggplot(data=daily_activity, aes(x=LightActiveDistance, y=Calories)) + geom_point(col = "navy") + stat_smooth(method=lm,col = "blue") +
  labs(title = "Calories Burned From Light Activity",
       x= "Light Active Distance",
       y="Calories")+
theme_ipsum_tw()

Next, we measure the strength of the relationship between light activity and calories burned:

cor(daily_activity$LightActiveDistance, daily_activity$Calories, method = "pearson")
#> [1] 0.4669168

Moderate Active

ggplot(data=daily_activity, aes(x=ModeratelyActiveDistance, y=Calories)) + geom_point(col = "navy") + stat_smooth(method=lm,col = "blue") +
  labs(title = "Calories Burned From Moderate Activity",
       x= "Moderate Active Distance",
       y="Calories")+
theme_ipsum_tw()

Next, we measure the strength of the relationship between moderate activity and calories burned:

cor(daily_activity$ModeratelyActiveDistance, daily_activity$Calories, method = "pearson")
#> [1] 0.2167899

Intensely Active

ggplot(data=daily_activity, aes(x=VeryActiveDistance, y=Calories)) + geom_point(col = "navy")+ stat_smooth(method=lm) +
  labs(title = "Calories Burned From Intense Activity",
       x= "Very Active Distance",
       y="Calories")+
theme_ipsum_tw()

Next, we measure the strength of the relationship between high-intensity activity and calories burned:

cor(daily_activity$VeryActiveDistance, daily_activity$Calories, method = "pearson")
#> [1] 0.4919586

Summary

Summary for the three levels of intensity

The results are quite interesting. Very active distance had the highest correlation with calories, at 0.49. The second highest was actually light active distance, at 0.47. Moderately active distance had the lowest correlation, at 0.22.

Since the correlation for light active distance is close to that for very active distance, a marketing strategy could focus on getting up and moving rather than on high-intensity workouts.
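The three cor() calls above can be collapsed into one step, which makes the comparison easier to read side by side. This is a minimal sketch on the six rows from head(daily_activity); `toy`, `intensity_cols` and `cors` are names introduced here for illustration, and the toy values will not reproduce the correlations reported for the full data.

```r
# Six rows copied from head(daily_activity); a stand-in for the full data frame.
toy <- data.frame(
  LightActiveDistance      = c(6.06, 4.71, 3.91, 2.83, 5.04, 2.51),
  ModeratelyActiveDistance = c(0.55, 0.69, 0.40, 1.26, 0.41, 0.78),
  VeryActiveDistance       = c(1.88, 1.57, 2.44, 2.14, 2.71, 3.19),
  Calories                 = c(1985, 1797, 1776, 1745, 1863, 1728)
)

intensity_cols <- c("LightActiveDistance",
                    "ModeratelyActiveDistance",
                    "VeryActiveDistance")

# Pearson correlation of each intensity column with Calories, as a named vector.
cors <- sapply(toy[intensity_cols], cor, y = toy$Calories)

# Strongest relationship first.
sort(round(cors, 2), decreasing = TRUE)
```

Running the same sapply() call on daily_activity should return the three correlations printed earlier in this section.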

Sleep and Bed Time

You might think it is obvious that time asleep and time in bed are linearly related. Check this out:

ggplot(data=sleeping_day, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) + geom_point(col = "navy") + stat_smooth(method=lm) +
  labs(title = "Total Minutes Asleep vs. Total Time in Bed",
       x= "Total Minutes Asleep",
       y="Total Time in Bed")+
theme_light()

See something? There are some outliers in the data: some points show people spending much more time in bed than time asleep. In other words, they are what we call Mager* :D

  • Mager is short for “malas gerak”, Indonesian for people who are too lazy to move and love their beds so much! :D
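To find those outliers instead of just spotting them on the plot, we can look at the minutes spent in bed but awake. The sketch below flags unusually long awake-in-bed times with the common 1.5 × IQR rule; that rule is an assumption chosen for illustration, not something the analysis above used. The first six rows come from head(sleeping_day) and the seventh is a made-up outlier added so the flag has something to catch.

```r
# Rows 1-6 copied from head(sleeping_day); row 7 is a made-up outlier.
sleep <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304, 300),
  TotalTimeInBed     = c(346, 407, 442, 367, 712, 320, 480)
)

# Minutes in bed but not asleep, and the share of bed time actually slept.
sleep$MinutesAwakeInBed <- sleep$TotalTimeInBed - sleep$TotalMinutesAsleep
sleep$SleepEfficiency   <- sleep$TotalMinutesAsleep / sleep$TotalTimeInBed

# 1.5 * IQR rule: anything above Q3 + 1.5 * IQR counts as unusually long.
q3    <- quantile(sleep$MinutesAwakeInBed, 0.75)[[1]]
upper <- q3 + 1.5 * IQR(sleep$MinutesAwakeInBed)
sleep$LongAwake <- sleep$MinutesAwakeInBed > upper

sleep[sleep$LongAwake, ]
```

On this toy data only the made-up seventh row is flagged; on sleeping_day the same steps would list the nights behind the points far above the trend line.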

Weight Vs Activity

Are people who weigh more less active?

Merging the datasets

combined_weight_act <- merge(weight_info, daily_activity, by="Id") 

n_distinct(combined_weight_act$Id)
#> [1] 8

There are 8 unique Ids in the combined data set. This matches the total for weight_info.
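One thing to keep in mind is that merging on Id alone pairs every weight log of a user with every activity day of that user, so rows get repeated. If the goal is to compare weight and steps on the same day, the join can also use the date. The sketch below shows the difference on made-up data; the `weight` and `activity` data frames, their values and the `date` column are all hypothetical and only illustrate the idea.

```r
# Made-up data: one user with two weight logs and three activity days.
weight <- data.frame(
  Id = c(1, 1),
  Date = c("5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM"),
  WeightPounds = c(116.0, 115.9),
  stringsAsFactors = FALSE
)
activity <- data.frame(
  Id = c(1, 1, 1),
  ActivityDate = c("5/1/2016", "5/2/2016", "5/3/2016"),
  TotalSteps = c(9000, 10000, 11000),
  stringsAsFactors = FALSE
)

# Merging on Id only: every weight row is paired with every activity row.
by_id <- merge(weight, activity, by = "Id")
nrow(by_id)       # 2 weight logs x 3 activity days = 6 rows

# Parse both date columns to Date, then merge on Id and date.
weight$date   <- as.Date(weight$Date, format = "%m/%d/%Y")
activity$date <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")
by_id_date <- merge(weight, activity, by = c("Id", "date"))
nrow(by_id_date)  # only the 2 days that have both a weight log and activity
```

With the Id-and-date join, each weight log contributes one row, so users with many activity days no longer dominate a correlation.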

The Graph

Weight compared to total steps taken

ggplot(data=combined_weight_act, aes(x=WeightPounds, y=TotalSteps)) + geom_point(col = "navy") +
  labs(title = "Weight vs. Total Steps",
       x= "Weight (lbs)",
       y="Total Steps") +
  theme_light()

cor(combined_weight_act$WeightPounds, combined_weight_act$TotalSteps, method = "pearson")
#> [1] 0.2647917

There is a small correlation between weight and total steps, but it is not strong enough to build a marketing strategy around a weight-loss gimmick.

Overall Relationships

combined_weight_heat <- merge(combined_weight_act, sleeping_day, by="Id") 
head(combined_weight_heat)
#>           Id                 Date WeightKg WeightPounds Fat   BMI
#> 1 1503960366 5/3/2016 11:59:59 PM     52.6     115.9631  NA 22.65
#> 2 1503960366 5/3/2016 11:59:59 PM     52.6     115.9631  NA 22.65
#> 3 1503960366 5/3/2016 11:59:59 PM     52.6     115.9631  NA 22.65
#> 4 1503960366 5/3/2016 11:59:59 PM     52.6     115.9631  NA 22.65
#> 5 1503960366 5/3/2016 11:59:59 PM     52.6     115.9631  NA 22.65
#> 6 1503960366 5/3/2016 11:59:59 PM     52.6     115.9631  NA 22.65
#>   IsManualReport         LogId ActivityDate TotalSteps TotalDistance
#> 1           True 1462319999000    4/17/2016       9705          6.48
#> 2           True 1462319999000    4/17/2016       9705          6.48
#> 3           True 1462319999000    4/17/2016       9705          6.48
#> 4           True 1462319999000    4/17/2016       9705          6.48
#> 5           True 1462319999000    4/17/2016       9705          6.48
#> 6           True 1462319999000    4/17/2016       9705          6.48
#>   TrackerDistance LoggedActivitiesDistance VeryActiveDistance
#> 1            6.48                        0               3.19
#> 2            6.48                        0               3.19
#> 3            6.48                        0               3.19
#> 4            6.48                        0               3.19
#> 5            6.48                        0               3.19
#> 6            6.48                        0               3.19
#>   ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
#> 1                     0.78                2.51                       0
#> 2                     0.78                2.51                       0
#> 3                     0.78                2.51                       0
#> 4                     0.78                2.51                       0
#> 5                     0.78                2.51                       0
#> 6                     0.78                2.51                       0
#>   VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
#> 1                38                  20                  164              539
#> 2                38                  20                  164              539
#> 3                38                  20                  164              539
#> 4                38                  20                  164              539
#> 5                38                  20                  164              539
#> 6                38                  20                  164              539
#>   Calories              SleepDay TotalSleepRecords TotalMinutesAsleep
#> 1     1728  5/8/2016 12:00:00 AM                 1                594
#> 2     1728  5/7/2016 12:00:00 AM                 1                331
#> 3     1728 4/26/2016 12:00:00 AM                 1                245
#> 4     1728 4/16/2016 12:00:00 AM                 2                340
#> 5     1728 4/12/2016 12:00:00 AM                 1                327
#> 6     1728 4/13/2016 12:00:00 AM                 2                384
#>   TotalTimeInBed
#> 1            611
#> 2            349
#> 3            274
#> 4            367
#> 5            346
#> 6            407
# Keep only the numeric columns of interest (named corr_vars to avoid masking base::all)
corr_vars <- combined_weight_heat %>% 
  select(WeightKg, BMI, TotalSteps, TotalDistance, TotalMinutesAsleep) 
ggcorrplot(cor(corr_vars), method = "circle", ggtheme = ggplot2::theme_minimal(),
           legend.title = "Correlation Strength", colors = c("blue", "yellow", "darkgreen"))

Summary

Trends in smart device usage

We saw that tracking steps, activity and calories burned were among the most popular metrics being tracked by users. Sleep was the second most popular, and only a few individuals tracked their weight.
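The popularity ranking above comes down to how many distinct users appear in each data frame. A minimal sketch of that count follows; the three small data frames hold made-up Ids and are only stand-ins for daily_activity, sleeping_day and weight_info.

```r
# Made-up Ids standing in for the three real data frames.
activity <- data.frame(Id = c(1, 1, 2, 3, 3))
sleep    <- data.frame(Id = c(1, 2, 2))
weight   <- data.frame(Id = c(3))

# Number of distinct users contributing to each data frame.
users_per_dataset <- sapply(
  list(daily_activity = activity, sleeping_day = sleep, weight_info = weight),
  function(d) length(unique(d$Id))
)

users_per_dataset
```

Applied to the real data frames, the same call would show how many users logged activity, sleep and weight, which is the basis for the ranking.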

We also saw that calories burned are related to total steps taken throughout the day: typically, the more steps, the more calories burned. Interestingly, time spent sedentary showed no clear inverse relationship with total steps.

One important thing to note is that the Fitbit data analyzed here does not include water intake.

Knowledge

BugarBahagia can put more focus on activity and sleep tracking when it comes to products and marketing strategy. Users are interested in tracking daily steps, so marketing this aspect of the products could be a good way to appeal to customers. Since even light activity was effective at burning calories, a marketing strategy could center on getting up and moving.

Since the Fitbit data does not cover hydration or water intake, this provides a good opportunity for BugarBahagia to differentiate itself by building a new product. Additional market analysis may be needed to see whether competitors already provide a hydration tracker.