Google Capstone Project - Bellabeat case study

1. About The Company

Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

Products

Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.

Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

Business Task

Analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices. Three questions guide the analysis:

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Datasets used

FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius). This Kaggle data set was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Individual reports can be parsed by export session ID (column A) or timestamp (column B). Variation between outputs reflects the use of different types of Fitbit trackers and individual tracking behaviors and preferences.

Installed Packages

install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("lubridate")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("tidyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("here")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("skimr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0     ✔ purrr   1.0.1
## ✔ tibble  3.1.8     ✔ stringr 1.5.0
## ✔ tidyr   1.2.1     ✔ forcats 0.5.2
## ✔ readr   2.1.3     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

Importing Data

Importing the required data from the Fitbit dataset

Activity <- read.csv("/cloud/project/fitbit/dailyActivity_merged.csv")
Calories <- read.csv("/cloud/project/fitbit/dailyCalories_merged.csv")
Intensities <- read.csv("/cloud/project/fitbit/dailyIntensities_merged.csv")
Steps <- read.csv("/cloud/project/fitbit/dailySteps_merged.csv")
Heartrate <- read.csv("/cloud/project/fitbit/heartrate_seconds_merged.csv")
Sleepday <- read.csv("/cloud/project/fitbit/sleepDay_merged.csv")
Weightlog_info <- read.csv("/cloud/project/fitbit/weightLogInfo_merged.csv")

Verifying the imported data

head (Activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728
head (Sleepday)
##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320
head (Heartrate)
##           Id                 Time Value
## 1 2022484408 4/12/2016 7:21:00 AM    97
## 2 2022484408 4/12/2016 7:21:05 AM   102
## 3 2022484408 4/12/2016 7:21:10 AM   105
## 4 2022484408 4/12/2016 7:21:20 AM   103
## 5 2022484408 4/12/2016 7:21:25 AM   101
## 6 2022484408 4/12/2016 7:22:05 AM    95
head (Calories)
##           Id ActivityDay Calories
## 1 1503960366   4/12/2016     1985
## 2 1503960366   4/13/2016     1797
## 3 1503960366   4/14/2016     1776
## 4 1503960366   4/15/2016     1745
## 5 1503960366   4/16/2016     1863
## 6 1503960366   4/17/2016     1728
head (Intensities)
##           Id ActivityDay SedentaryMinutes LightlyActiveMinutes
## 1 1503960366   4/12/2016              728                  328
## 2 1503960366   4/13/2016              776                  217
## 3 1503960366   4/14/2016             1218                  181
## 4 1503960366   4/15/2016              726                  209
## 5 1503960366   4/16/2016              773                  221
## 6 1503960366   4/17/2016              539                  164
##   FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## 1                  13                25                       0
## 2                  19                21                       0
## 3                  11                30                       0
## 4                  34                29                       0
## 5                  10                36                       0
## 6                  20                38                       0
##   LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
## 1                6.06                     0.55               1.88
## 2                4.71                     0.69               1.57
## 3                3.91                     0.40               2.44
## 4                2.83                     1.26               2.14
## 5                5.04                     0.41               2.71
## 6                2.51                     0.78               3.19
head (Steps)
##           Id ActivityDay StepTotal
## 1 1503960366   4/12/2016     13162
## 2 1503960366   4/13/2016     10735
## 3 1503960366   4/14/2016     10460
## 4 1503960366   4/15/2016      9762
## 5 1503960366   4/16/2016     12669
## 6 1503960366   4/17/2016      9705
head (Weightlog_info)
##           Id                  Date WeightKg WeightPounds Fat   BMI
## 1 1503960366  5/2/2016 11:59:59 PM     52.6     115.9631  22 22.65
## 2 1503960366  5/3/2016 11:59:59 PM     52.6     115.9631  NA 22.65
## 3 1927972279  4/13/2016 1:08:52 AM    133.5     294.3171  NA 47.54
## 4 2873212765 4/21/2016 11:59:59 PM     56.7     125.0021  NA 21.45
## 5 2873212765 5/12/2016 11:59:59 PM     57.3     126.3249  NA 21.69
## 6 4319703577 4/17/2016 11:59:59 PM     72.4     159.6147  25 27.45
##   IsManualReport        LogId
## 1           True 1.462234e+12
## 2           True 1.462320e+12
## 3          False 1.460510e+12
## 4           True 1.461283e+12
## 5           True 1.463098e+12
## 6           True 1.460938e+12
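Insight: The Id column identifies the participant in every data frame and can be used as the join key. Because each participant has many rows, Id is unique only in combination with the date column. The Activity data frame already contains the calories, intensities, and steps columns.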
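To illustrate the point about keys, here is a minimal sketch in base R. It rebuilds three rows from the head(Activity) output as a toy data frame (the name `activity` is illustrative, not the imported object) and checks whether Id alone, or Id together with the date, identifies a row.

```r
# Toy rows copied from the head(Activity) output; the full data frame is larger.
activity <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016"),
  TotalSteps = c(13162, 10735, 10460)
)

# anyDuplicated() returns 0 when no value (or row) repeats.
id_is_unique <- anyDuplicated(activity$Id) == 0
pair_is_unique <- anyDuplicated(activity[, c("Id", "ActivityDate")]) == 0

print(c(id_is_unique = id_is_unique, pair_is_unique = pair_is_unique))
```

In this sample Id repeats on every row, while each (Id, ActivityDate) pair occurs once, which is why the pair is the safer key for joins.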

Identify the number of participants in each data frame

Activity %>% 
  summarise(participants = n_distinct(Id))
##   participants
## 1           33
Sleepday %>% 
  summarise(participants = n_distinct(Id))
##   participants
## 1           24
Heartrate %>% 
  summarise(participants = n_distinct(Id))
##   participants
## 1           14
Weightlog_info %>% 
  summarise(participants = n_distinct(Id))
##   participants
## 1            8
Insight: Only 8 participants logged their weight, far fewer than the 33 in the Activity data. Including the weight log in the analysis would bias the results.
Sleepday contains duplicate rows, which should be removed.
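# Count rows that exactly repeat an earlier row
sum(duplicated(Sleepday))
## [1] 3
# Sleep holds the de-duplicated rows; Sleepday itself is left unchanged
Sleep <- unique(Sleepday)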
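The duplicate check can be tried on a small made-up data frame. The sketch below uses invented values and the hypothetical names `sleep_toy` and `sleep_clean`; it only shows how `duplicated()` and `unique()` behave, not the result for the real data.

```r
# Toy sleep log with invented values: the third row repeats the second exactly.
sleep_toy <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016", "4/13/2016", "4/13/2016", "4/12/2016"),
  TotalMinutesAsleep = c(327, 384, 384, 412)
)

# duplicated() flags each row that matches an earlier row in every column.
n_duplicates <- sum(duplicated(sleep_toy))

# unique() keeps the first copy of each distinct row.
sleep_clean <- unique(sleep_toy)

# A second check on the cleaned data should find nothing left to remove.
n_remaining <- sum(duplicated(sleep_clean))
```

Running the count again on the cleaned data frame is a cheap way to confirm that the removal worked.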

Summarized Data

Activity %>% 
  select(TotalSteps,TotalDistance,TrackerDistance,Calories,SedentaryMinutes) %>% 
  drop_na() %>% 
  summary()
##    TotalSteps    TotalDistance    TrackerDistance     Calories   
##  Min.   :    0   Min.   : 0.000   Min.   : 0.000   Min.   :   0  
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 2.620   1st Qu.:1828  
##  Median : 7406   Median : 5.245   Median : 5.245   Median :2134  
##  Mean   : 7638   Mean   : 5.490   Mean   : 5.475   Mean   :2304  
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.: 7.710   3rd Qu.:2793  
##  Max.   :36019   Max.   :28.030   Max.   :28.030   Max.   :4900  
##  SedentaryMinutes
##  Min.   :   0.0  
##  1st Qu.: 729.8  
##  Median :1057.5  
##  Mean   : 991.2  
##  3rd Qu.:1229.5  
##  Max.   :1440.0
Sleepday %>% 
  select(TotalMinutesAsleep,TotalTimeInBed) %>% 
  drop_na() %>% 
  summary()
##  TotalMinutesAsleep TotalTimeInBed 
##  Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:361.0      1st Qu.:403.0  
##  Median :433.0      Median :463.0  
##  Mean   :419.5      Mean   :458.6  
##  3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :796.0      Max.   :961.0
Insight:
  • The average number of steps tracked per day is 7,638. A daily count between 7,500 and 10,000 steps is considered active, and reaching 10,000 steps is a common target for reducing health risks and staying fit.

  • The average sedentary time is 991.2 minutes, or about 16.5 hours. This means users are inactive for more than half of the day on average.

  • The average number of calories burned per day is 2,304. This depends on body type and the type of workout.

  • TotalDistance and TrackerDistance are nearly identical (means of 5.490 and 5.475), which suggests that the tracker captures almost all of the recorded distance.
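The arithmetic behind these insights can be sketched in a few lines of base R. The mean of 991.2 minutes and the 7,500 and 10,000 step cut points come from the text above; the six step counts are the rows shown by head(Activity), and the band labels are illustrative assumptions.

```r
# Mean sedentary time per day, taken from the summary() output above.
mean_sedentary_min <- 991.2

sedentary_hours <- mean_sedentary_min / 60         # minutes to hours
sedentary_share <- mean_sedentary_min / (24 * 60)  # fraction of a 24-hour day

# TotalSteps for the six rows shown by head(Activity).
steps <- c(13162, 10735, 10460, 9762, 12669, 9705)

# Cut points 7,500 and 10,000 come from the text; the labels are assumptions.
step_band <- cut(
  steps,
  breaks = c(-Inf, 7500, 10000, Inf),
  labels = c("under 7,500", "7,500 to 9,999", "10,000 or more"),
  right = FALSE  # left-closed intervals, so exactly 10,000 lands in the top band
)

band_counts <- table(step_band)
print(round(sedentary_hours, 1))
print(band_counts)
```

Applying the same `cut()` call to the full TotalSteps column would show how many tracked days fall into each band, which is more informative than the mean alone.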

Merge Data

# Joining on Id alone pairs every activity row with every sleep row of the same participant
activity_sleep <- merge(Activity, Sleepday, by= c("Id"))
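A possible alternative is to join on both the participant and the day. The sketch below is a toy example in base R, built from rows shown in the head() output; the names `activity`, `sleep`, and the added `Date` column are illustrative and not part of the original analysis.

```r
# Toy rows copied from the head() output for participant 1503960366.
activity <- data.frame(
  Id = rep(1503960366, 3),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016"),
  TotalSteps = c(13162, 10735, 10460)
)
sleep <- data.frame(
  Id = rep(1503960366, 3),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/15/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384, 412)
)

# Parse both date columns into Date objects so they can be compared.
# as.Date() ignores the trailing time text once the format has matched.
activity$Date <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")
sleep$Date <- as.Date(sleep$SleepDay, format = "%m/%d/%Y")

# Id only: every activity row is paired with every sleep row (3 x 3 rows).
by_id <- merge(activity, sleep, by = "Id")

# Id and Date: only days present in both data frames are kept.
by_id_date <- merge(activity, sleep, by = c("Id", "Date"))

print(nrow(by_id))
print(by_id_date[, c("Id", "Date", "TotalSteps", "TotalMinutesAsleep")])
```

With the two-column key, each row of the result describes one participant on one day, so steps and sleep from different days are never paired.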

ANALYSIS

TOTAL STEPS vs CALORIES

ggplot(data=Activity, aes(x=TotalSteps, y=Calories, color=SedentaryMinutes)) +
  geom_point() +
  stat_smooth(method=lm) +
  scale_color_gradient(low="steelblue", high="orange") +
  labs(title="The Relationship Between Total Steps and Calories")
## `geom_smooth()` using formula = 'y ~ x'

Insights:
  • Users who take between 7,500 and 10,000 steps a day burn roughly 2,000 to 2,500 calories.
  • Total steps and calories are positively correlated: taking more steps leads to burning more calories.
  • The strength of this relationship differs between individuals.
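The positive relationship can also be expressed as a number. As a rough sketch, the code below computes a Pearson correlation and a straight-line fit for the six rows shown by head(Activity). These rows all belong to one participant, so the values are only illustrative and say nothing certain about the full dataset.

```r
# TotalSteps and Calories for the six rows shown by head(Activity).
steps <- c(13162, 10735, 10460, 9762, 12669, 9705)
calories <- c(1985, 1797, 1776, 1745, 1863, 1728)

# Pearson correlation: values near +1 indicate a strong positive linear relationship.
r <- cor(steps, calories)

# Straight-line fit, the same kind of model stat_smooth(method = lm) draws.
fit <- lm(calories ~ steps)
slope <- unname(coef(fit)["steps"])  # extra calories per additional step

print(r)
print(slope)
```

On the real data the same two calls would take `Activity$TotalSteps` and `Activity$Calories` as inputs.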

STEPS vs INACTIVE MINUTES

library(ggplot2)

ggplot(data=Activity)+
  geom_point(mapping=aes(x=TotalSteps, y=SedentaryMinutes), color="orange") +
  geom_smooth(mapping=aes(x=TotalSteps, y=SedentaryMinutes)) +
  labs(title="The Relationship Between Total Steps and Sedentary Minutes", x="Total Steps", y="Sedentary Minutes")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Insights:

This graph shows a negative correlation between total steps and sedentary minutes - the lower the total steps, the higher the sedentary minutes.

SLEEP MINUTES vs TIME IN BED

ggplot(data=Sleepday, mapping=aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) +
  geom_point(color="blue") +
  geom_smooth() +
  labs(title="The Relationship Between Total Minutes Asleep and Total Time in Bed", x="Total Minutes Asleep", y="Total Time in Bed (min)")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Insight:
  • There is a positive correlation between total minutes asleep and time spent in bed. Based on this, the Bellabeat app could notify users when it is time to go to bed so that they get an adequate amount of sleep.
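One way such a bedtime reminder could be derived is sketched below. The two means come from the summary(Sleepday) output above; the 8-hour sleep target and the 07:00 wake-up time are hypothetical inputs chosen for illustration, not values from the dataset or from Bellabeat.

```r
# Means taken from the summary(Sleepday) output, in minutes.
mean_time_in_bed <- 458.6
mean_minutes_asleep <- 419.5

# Average time spent in bed but not asleep.
awake_in_bed <- mean_time_in_bed - mean_minutes_asleep

# Hypothetical inputs, not from the dataset.
target_sleep_min <- 8 * 60
wake_time <- as.POSIXct("2016-04-13 07:00:00", tz = "UTC")

# POSIXct arithmetic works in seconds, so convert minutes before subtracting.
bedtime <- wake_time - (target_sleep_min + awake_in_bed) * 60
bedtime_label <- format(bedtime, "%H:%M")

print(bedtime_label)
```

The design choice here is to add the average time awake in bed to the sleep target, so the reminder accounts for the gap between going to bed and being asleep.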

Conclusion

  1. Average sedentary time is very high. The tracker software could detect long inactive periods and notify users; setting targets and awarding points for reaching them would help motivate users.

  2. Social features and leaderboards connected to social media would give users extra motivation to become more active.

  3. The average sleep time is about 7 hours. To encourage better sleeping habits, Bellabeat could add app reminders that tell users the best time to go to sleep and wake up, so that they get an adequate amount of sleep and feel refreshed in the morning.

  4. Create short workout modes that can be done anywhere.

  5. Design: merging fitness tracking with fashion is a very good initiative, and more interesting designs could increase market share. Compared with other fitness brands, Bellabeat products such as IVY Leaf (bracelet) and Leaf Urban (pendant) are more fashionable.

  6. Creating a community that is interactive with members would be beneficial.

Note:
  • Use a larger sample size to improve the statistical significance of the analysis.

  • Collect tracking data over a longer period, ideally 6 months to a year, to account for behavioural changes across seasons.

  • Obtain current data to better reflect current consumer behaviours and trends in smart device usage.