Bellabeat

INTRODUCTION

Bellabeat: How Can a Wellness Technology Company Play It Smart?

Founded in 2013, Bellabeat is a high-tech manufacturer of beautifully designed, health-focused smart products for women. By inspiring and empowering women with knowledge about their own health and habits, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

Bellabeat Products

  • Bellabeat App: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to the company's line of smart wellness products.

  • Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

  • Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.

  • Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

  • Bellabeat Membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

STEP 1: ASK

In this step, we define the problem and objectives of our case study and its desired outcome.

1.1 Business Objectives:

  • What trends can be identified in the smart device usage data?
  • How could these trends apply to Bellabeat customers?
  • How could these trends help influence Bellabeat marketing strategy?

1.2 Deliverables:

  • A summary of the business task.
  • A description of all data sources used.
  • Documentation of any cleaning or manipulation of data.
  • A summary of analysis.
  • Supporting visualizations and key findings.

STEP 2: PREPARE

2.1 Data Source:

  • The data is publicly available on Kaggle (FitBit Fitness Tracker Data) and is stored in 18 CSV files.
  • Data collected: the daily activity data set (‘daily_activity’), which merges data from other provided files such as daily calories, daily intensities, and daily steps; the weight data set (‘weight_info’); and the daily sleep data set (‘daily_sleep’).
  • Participants: 30 FitBit users who consented to the submission of their personal tracker data.

2.2 Credibility:

A good data source should be Reliable, Original, Comprehensive, Current, and Cited (ROCCC).

  • Reliable: NOT reliable. The data covers only about 30 individuals, which is a small sample.
  • Original: NOT original. The data set was generated by respondents to a survey distributed via Amazon Mechanical Turk.
  • Comprehensive: NOT comprehensive. Other data that would allow a more accurate analysis is missing (e.g., sex, age, height).
  • Current: NOT current. The data was collected in 2016, about seven years before this analysis.
  • Cited: The source is cited, but for the reasons above it is NOT considered credible.

STEP 3: PROCESS

I first cleaned the datasets in Excel: I removed duplicates and empty rows and columns, trimmed extra spaces, and applied some formatting.

I then did some further cleaning in R.
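The Excel cleaning steps above can also be expressed in R. The function below is a minimal base-R sketch, not the process actually used in this notebook; `clean_basic()` is a hypothetical helper name, and treating blank strings as empty cells is an assumption.

```r
# Hypothetical helper: a base-R version of the Excel cleaning steps.
clean_basic <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    # Trim leading/trailing spaces in text columns
    df[is_chr] <- lapply(df[is_chr], trimws)
    # Assumption: a blank string means the cell is empty
    df[is_chr] <- lapply(df[is_chr], function(x) {
      x[!is.na(x) & x == ""] <- NA
      x
    })
  }
  # Drop columns that are entirely missing
  df <- df[, colSums(!is.na(df)) > 0, drop = FALSE]
  # Drop rows that are entirely missing
  df <- df[rowSums(!is.na(df)) > 0, , drop = FALSE]
  # Drop exact duplicate rows (checked after trimming, so " a" and "a" match)
  df <- df[!duplicated(df), , drop = FALSE]
  rownames(df) <- NULL
  df
}
```

Trimming happens before the duplicate check on purpose, so rows that differ only by stray spaces are treated as duplicates.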

3.1 Loading the Packages:

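# tidyverse attaches ggplot2, readr, tidyr and dplyr (among others)
library("tidyverse")
library("lubridate")  # date parsing helpers
library("skimr")      # skim_without_charts() summaries
library("janitor")    # data-cleaning helpers
library("scales")     # axis and label formatting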

3.2 Importing Datasets:

# Read the three Kaggle CSV files (assumed to be in the working directory)
daily_activity <- read_csv("dailyActivity_merged.csv")
weight_info <- read_csv("weightLogInfo_merged.csv")
daily_sleep <- read_csv("sleepDay_merged.csv")

3.3 Viewing the data:

head(daily_activity)
## # A tibble: 6 × 15
##       Id Activ…¹ Total…² Total…³ Track…⁴ Logge…⁵ VeryA…⁶ Moder…⁷ Light…⁸ Seden…⁹
##    <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 1.50e9 4/12/2…   13162    8.5     8.5        0    1.88   0.550    6.06       0
## 2 1.50e9 4/13/2…   10735    6.97    6.97       0    1.57   0.690    4.71       0
## 3 1.50e9 4/14/2…   10460    6.74    6.74       0    2.44   0.400    3.91       0
## 4 1.50e9 4/15/2…    9762    6.28    6.28       0    2.14   1.26     2.83       0
## 5 1.50e9 4/16/2…   12669    8.16    8.16       0    2.71   0.410    5.04       0
## 6 1.50e9 4/17/2…    9705    6.48    6.48       0    3.19   0.780    2.51       0
## # … with 5 more variables: VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>, and
## #   abbreviated variable names ¹​ActivityDate, ²​TotalSteps, ³​TotalDistance,
## #   ⁴​TrackerDistance, ⁵​LoggedActivitiesDistance, ⁶​VeryActiveDistance,
## #   ⁷​ModeratelyActiveDistance, ⁸​LightActiveDistance, ⁹​SedentaryActiveDistance
head(daily_sleep)
## # A tibble: 6 × 5
##           Id SleepDay              TotalSleepRecords TotalMinutesAsleep TotalT…¹
##        <dbl> <chr>                             <dbl>              <dbl>    <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327      346
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384      407
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412      442
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340      367
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700      712
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304      320
## # … with abbreviated variable name ¹​TotalTimeInBed
head(weight_info)
## # A tibble: 6 × 8
##           Id Date                  WeightKg Weight…¹   Fat   BMI IsMan…²   LogId
##        <dbl> <chr>                    <dbl>    <dbl> <dbl> <dbl> <lgl>     <dbl>
## 1 1503960366 5/2/2016 11:59:59 PM      52.6     116.    22  22.6 TRUE    1.46e12
## 2 1503960366 5/3/2016 11:59:59 PM      52.6     116.    NA  22.6 TRUE    1.46e12
## 3 1927972279 4/13/2016 1:08:52 AM     134.      294.    NA  47.5 FALSE   1.46e12
## 4 2873212765 4/21/2016 11:59:59 PM     56.7     125.    NA  21.5 TRUE    1.46e12
## 5 2873212765 5/12/2016 11:59:59 PM     57.3     126.    NA  21.7 TRUE    1.46e12
## 6 4319703577 4/17/2016 11:59:59 PM     72.4     160.    25  27.5 TRUE    1.46e12
## # … with abbreviated variable names ¹​WeightPounds, ²​IsManualReport
colnames(daily_activity)
##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"
colnames(weight_info)
## [1] "Id"             "Date"           "WeightKg"       "WeightPounds"  
## [5] "Fat"            "BMI"            "IsManualReport" "LogId"
colnames(daily_sleep)
## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
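The `head()` output above shows that `ActivityDate`, `SleepDay` and `Date` are read as character columns (`<chr>`) in month/day/year form, sometimes followed by a time of day. Below is a minimal sketch of converting them to R `Date` values, assuming every value starts with a date in that form; `parse_fitbit_date()` is a hypothetical helper name, and the time of day is deliberately discarded.

```r
# Hypothetical helper: turn "4/12/2016" or "4/12/2016 12:00:00 AM" into a Date.
parse_fitbit_date <- function(x) {
  # Keep only the text before the first space (the month/day/year part)
  as.Date(sub(" .*$", "", x), format = "%m/%d/%Y")
}
```

For example, `daily_activity$ActivityDate <- parse_fitbit_date(daily_activity$ActivityDate)`. In tidyverse style, lubridate's `mdy()` (date only) and `mdy_hms()` (date and time) are alternatives.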

3.4 Cleaning the data:

Here, I look for any remaining missing values, rename columns where useful, and check how many distinct users each dataset contains.
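Missing values can also be counted directly, column by column. This base-R sketch uses a hypothetical helper, `missing_report()`, that returns only the columns containing at least one `NA`; on this data it should flag `Fat` in `weight_info`, which the skim summary in Step 4 reports as missing in 65 of 67 rows.

```r
# Hypothetical helper: number of NAs per column, keeping only columns with any.
missing_report <- function(df) {
  n_missing <- colSums(is.na(df))
  n_missing[n_missing > 0]
}
```

Usage: `missing_report(weight_info)`.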

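# Preview only the days with at least one recorded step (863 of 940 rows).
# The result is printed, not saved, so later steps still use all 940 rows.
daily_activity %>% 
  filter(TotalSteps != 0)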
## # A tibble: 863 × 15
##            Id Activity…¹ Total…² Total…³ Track…⁴ Logge…⁵ VeryA…⁶ Moder…⁷ Light…⁸
##         <dbl> <chr>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 1503960366 4/12/2016    13162    8.5     8.5        0    1.88   0.550    6.06
##  2 1503960366 4/13/2016    10735    6.97    6.97       0    1.57   0.690    4.71
##  3 1503960366 4/14/2016    10460    6.74    6.74       0    2.44   0.400    3.91
##  4 1503960366 4/15/2016     9762    6.28    6.28       0    2.14   1.26     2.83
##  5 1503960366 4/16/2016    12669    8.16    8.16       0    2.71   0.410    5.04
##  6 1503960366 4/17/2016     9705    6.48    6.48       0    3.19   0.780    2.51
##  7 1503960366 4/18/2016    13019    8.59    8.59       0    3.25   0.640    4.71
##  8 1503960366 4/19/2016    15506    9.88    9.88       0    3.53   1.32     5.03
##  9 1503960366 4/20/2016    10544    6.68    6.68       0    1.96   0.480    4.24
## 10 1503960366 4/21/2016     9819    6.34    6.34       0    1.34   0.350    4.65
## # … with 853 more rows, 6 more variables: SedentaryActiveDistance <dbl>,
## #   VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>, and
## #   abbreviated variable names ¹​ActivityDate, ²​TotalSteps, ³​TotalDistance,
## #   ⁴​TrackerDistance, ⁵​LoggedActivitiesDistance, ⁶​VeryActiveDistance,
## #   ⁷​ModeratelyActiveDistance, ⁸​LightActiveDistance
# Rename TotalTimeInBed to the shorter TotalBedTime
daily_sleep <- daily_sleep %>% 
  rename(TotalBedTime = TotalTimeInBed)
n_distinct(daily_activity$Id)
## [1] 33
n_distinct(daily_sleep$Id)
## [1] 24
n_distinct(weight_info$Id)
## [1] 8
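The three data sets contain different numbers of distinct users (33, 24 and 8), so only some users can be followed across all of them. A minimal sketch for finding the shared Ids, using a hypothetical helper `common_ids()` built on base R's `Reduce()` and `intersect()`:

```r
# Hypothetical helper: Ids that appear in every data frame passed in.
common_ids <- function(...) {
  id_sets <- lapply(list(...), function(df) unique(df$Id))
  Reduce(intersect, id_sets)
}
```

Usage: `length(common_ids(daily_activity, daily_sleep, weight_info))` gives the number of users present in all three data sets.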

STEP 4: ANALYZE

Here, I perform statistical analysis: I summarize the data and look for trends.

Summary statistics for each dataset:

skim_without_charts(daily_activity)
Data summary
Name                     daily_activity
Number of rows           940
Number of columns        15
Column type frequency:
  character              1
  numeric                14
Group variables          None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ActivityDate 0 1 8 9 0 31 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
Id 0 1 4.855407e+09 2.424805e+09 1503960366 2.320127e+09 4.445115e+09 6.962181e+09 8.877689e+09
TotalSteps 0 1 7.637910e+03 5.087150e+03 0 3.789750e+03 7.405500e+03 1.072700e+04 3.601900e+04
TotalDistance 0 1 5.490000e+00 3.920000e+00 0 2.620000e+00 5.240000e+00 7.710000e+00 2.803000e+01
TrackerDistance 0 1 5.480000e+00 3.910000e+00 0 2.620000e+00 5.240000e+00 7.710000e+00 2.803000e+01
LoggedActivitiesDistance 0 1 1.100000e-01 6.200000e-01 0 0.000000e+00 0.000000e+00 0.000000e+00 4.940000e+00
VeryActiveDistance 0 1 1.500000e+00 2.660000e+00 0 0.000000e+00 2.100000e-01 2.050000e+00 2.192000e+01
ModeratelyActiveDistance 0 1 5.700000e-01 8.800000e-01 0 0.000000e+00 2.400000e-01 8.000000e-01 6.480000e+00
LightActiveDistance 0 1 3.340000e+00 2.040000e+00 0 1.950000e+00 3.360000e+00 4.780000e+00 1.071000e+01
SedentaryActiveDistance 0 1 0.000000e+00 1.000000e-02 0 0.000000e+00 0.000000e+00 0.000000e+00 1.100000e-01
VeryActiveMinutes 0 1 2.116000e+01 3.284000e+01 0 0.000000e+00 4.000000e+00 3.200000e+01 2.100000e+02
FairlyActiveMinutes 0 1 1.356000e+01 1.999000e+01 0 0.000000e+00 6.000000e+00 1.900000e+01 1.430000e+02
LightlyActiveMinutes 0 1 1.928100e+02 1.091700e+02 0 1.270000e+02 1.990000e+02 2.640000e+02 5.180000e+02
SedentaryMinutes 0 1 9.912100e+02 3.012700e+02 0 7.297500e+02 1.057500e+03 1.229500e+03 1.440000e+03
Calories 0 1 2.303610e+03 7.181700e+02 0 1.828500e+03 2.134000e+03 2.793250e+03 4.900000e+03
skim_without_charts(daily_sleep)
Data summary
Name                     daily_sleep
Number of rows           413
Number of columns        5
Column type frequency:
  character              1
  numeric                4
Group variables          None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
SleepDay 0 1 20 21 0 31 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
Id 0 1 5.000979e+09 2.06036e+09 1503960366 3977333714 4702921684 6962181067 8792009665
TotalSleepRecords 0 1 1.120000e+00 3.50000e-01 1 1 1 1 3
TotalMinutesAsleep 0 1 4.194700e+02 1.18340e+02 58 361 433 490 796
TotalBedTime 0 1 4.586400e+02 1.27100e+02 61 403 463 526 961
skim_without_charts(weight_info)
Data summary
Name                     weight_info
Number of rows           67
Number of columns        8
Column type frequency:
  character              1
  logical                1
  numeric                6
Group variables          None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Date 0 1 19 21 0 56 0

Variable type: logical

skim_variable n_missing complete_rate mean count
IsManualReport 0 1 0.61 TRU: 41, FAL: 26

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
Id 0 1.00 7.009282e+09 1.950322e+09 1.503960e+09 6.962181e+09 6.962181e+09 8.877689e+09 8.877689e+09
WeightKg 0 1.00 7.204000e+01 1.392000e+01 5.260000e+01 6.140000e+01 6.250000e+01 8.505000e+01 1.335000e+02
WeightPounds 0 1.00 1.588100e+02 3.070000e+01 1.159600e+02 1.353600e+02 1.377900e+02 1.875000e+02 2.943200e+02
Fat 65 0.03 2.350000e+01 2.120000e+00 2.200000e+01 2.275000e+01 2.350000e+01 2.425000e+01 2.500000e+01
BMI 0 1.00 2.519000e+01 3.070000e+00 2.145000e+01 2.396000e+01 2.439000e+01 2.556000e+01 4.754000e+01
LogId 0 1.00 1.461772e+12 7.829948e+08 1.460444e+12 1.461079e+12 1.461802e+12 1.462375e+12 1.463098e+12

Selecting the columns needed for visualization:

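# Keep the activity columns used in the plots and rename ActivityDate to Date
daily_activity_new <- daily_activity %>% 
  select(Id, ActivityDate, TotalSteps, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories) %>% 
  rename(Date = ActivityDate)

# Keep BMI, weight and how the weight was recorded
weight_info_new <- weight_info %>% 
  select(Id, Date, BMI, WeightPounds, IsManualReport)

# Keep sleep duration and time in bed (TotalBedTime was renamed earlier)
daily_sleep_new <- daily_sleep %>% 
  select(Id, SleepDay, TotalMinutesAsleep, TotalBedTime)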
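The overall means in the skim output weight each user by the number of days they logged. If a per-user view is wanted, the daily rows can first be averaged per `Id`. This is an optional sketch using base R's `aggregate()`, not part of the original analysis; `per_user_means()` is a hypothetical helper.

```r
# Hypothetical helper: mean of each listed column for every user (Id).
per_user_means <- function(df, cols) {
  aggregate(df[cols], by = list(Id = df$Id), FUN = mean, na.rm = TRUE)
}
```

Usage: `per_user_means(daily_activity_new, c("TotalSteps", "SedentaryMinutes", "Calories"))`. The dplyr equivalent would be `group_by(Id)` followed by `summarise()`.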

Findings:

  • A commonly cited daily target is 10,000 steps. The individuals in this data set average 7,632 total steps per day.
  • The average BMI is 25.2, which falls in the overweight category (a BMI of 25 or above) according to the World Health Organization (WHO). Note that only 8 users logged weight data.
  • Individuals spent an average of 955.8 minutes a day being sedentary, which is about 16 hours a day.
  • ‘Fairly Active Minutes’ average 13.63 minutes per day. According to the WHO, adults should do at least 150–300 minutes of moderate-intensity aerobic physical activity per week. At 13.63 minutes a day, participants’ activity in this range comes to only 95.41 minutes per week (13.63 × 7).
  • The average participant sleeps just under the minimum of 7 hours per night recommended by the National Sleep Foundation (NSF); the skim output above shows a mean of 419.5 minutes asleep.
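The findings above compare daily averages with guideline thresholds, for example 955.8 / 60 ≈ 15.9 sedentary hours and 13.63 × 7 = 95.41 fairly active minutes per week. The hypothetical helper below sketches the same arithmetic on a daily activity table and a daily sleep table. It uses the thresholds quoted above and, like the findings, counts only Fairly Active Minutes toward the weekly target.

```r
# Hypothetical helper: compare average daily figures with the quoted targets
# (10,000 steps/day, 150 min/week moderate activity, 7 h sleep/night).
guideline_check <- function(activity, sleep) {
  mean_steps  <- mean(activity$TotalSteps)
  weekly_fair <- mean(activity$FairlyActiveMinutes) * 7   # minutes per week
  sleep_hours <- mean(sleep$TotalMinutesAsleep) / 60      # hours per night
  list(
    mean_steps          = mean_steps,
    pct_of_10k_steps    = 100 * mean_steps / 10000,
    sedentary_hours     = mean(activity$SedentaryMinutes) / 60,
    weekly_fair_minutes = weekly_fair,
    meets_150_min_week  = weekly_fair >= 150,
    sleep_hours         = sleep_hours,
    meets_7h_sleep      = sleep_hours >= 7
  )
}
```

Usage: `guideline_check(daily_activity_new, daily_sleep_new)`.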

STEP 5: SHARE

Creating visualizations to communicate insights and findings:

ggplot(data=daily_activity) +
  geom_point(mapping=aes(x=TotalSteps, y=Calories), color="purple") +
  geom_smooth(mapping=aes(x=TotalSteps, y=Calories)) +
  labs(title="The Relationship Between Total Steps and Calories Burned", x="Total Steps", y="Calories Burned (kcal)")

Total steps and calories burned are positively related: days with more steps tend to show more calories burned, and more active people tend to take more steps. The average person in this data set reaches only about 8,000 total steps a day, which corresponds to just under 2,500 calories burned.
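The scatter plot shows the relationship visually; a straight-line fit puts a rough number on it. This sketch uses `lm()` and `cor()` through a hypothetical helper, `steps_calorie_fit()`. Note that `geom_smooth()` with default settings draws a loess curve for fewer than 1,000 points, not this straight line, and a correlation alone does not show that steps cause the extra calories.

```r
# Hypothetical helper: linear fit of Calories on TotalSteps.
steps_calorie_fit <- function(df) {
  fit <- lm(Calories ~ TotalSteps, data = df)
  list(
    # Estimated change in calories per 1,000 additional steps
    kcal_per_1000_steps = unname(coef(fit)["TotalSteps"]) * 1000,
    # Pearson correlation between steps and calories
    r = cor(df$TotalSteps, df$Calories)
  )
}
```

Usage: `steps_calorie_fit(daily_activity)`.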

A graph of mean steps per hour, created in Excel.

Step counts are higher during the daytime and peak between 5 pm and 7 pm, possibly when people finish work and travel home.
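The hourly chart was built in Excel from hourly step records, which this notebook does not import. The sketch below computes the same means in R, assuming a data frame with an `ActivityHour` text column (values such as "4/12/2016 5:00:00 PM") and a `StepTotal` column; these column names are assumed to match the Kaggle file hourlySteps_merged.csv and should be checked against it. The hour is extracted with a regular expression rather than `strptime()`'s `%p` code, because AM/PM parsing is locale-dependent. `mean_steps_by_hour()` is a hypothetical helper.

```r
# Hypothetical helper: mean StepTotal for each hour of the day (0-23).
mean_steps_by_hour <- function(hourly) {
  x <- hourly$ActivityHour
  # Hour on the 12-hour clock, e.g. 5 from "4/12/2016 5:00:00 PM"
  h12 <- as.integer(sub("^\\S+ (\\d{1,2}):.*$", "\\1", x, perl = TRUE))
  # Convert to 0-23: 12 AM -> 0, 12 PM -> 12, 5 PM -> 17
  is_pm <- grepl("PM$", x)
  hourly$Hour <- h12 %% 12L + ifelse(is_pm, 12L, 0L)
  aggregate(StepTotal ~ Hour, data = hourly, FUN = mean)
}
```

The peak hour is then `r$Hour[which.max(r$StepTotal)]` for `r <- mean_steps_by_hour(hourly_steps)`, where `hourly_steps` is the imported hourly file.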


# Scatter plot of total steps against calories, with a smoothed trend line
ggplot(data=daily_activity_new) +
  geom_point(mapping=aes(x=Calories, y=TotalSteps), color="purple") +
  geom_smooth(mapping=aes(x=Calories, y=TotalSteps), color="blue") +
  labs(title="Total Steps vs. Calories Burned", x="Calories Burned", y="Total Steps")

# Scatter plot of very active minutes against calories, with a smoothed trend line
ggplot(data=daily_activity_new) +
  geom_point(mapping=aes(x=Calories, y=VeryActiveMinutes), color="purple") +
  geom_smooth(mapping=aes(x=Calories, y=VeryActiveMinutes), color="blue") +
  labs(title="Very Active Minutes vs. Calories Burned", x="Calories Burned", y="Very Active Minutes")
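To compare how strongly different activity measures track calories burned, their correlations with `Calories` can be computed side by side. A minimal sketch using Pearson correlation from `cor()`; `calorie_correlations()` is a hypothetical helper.

```r
# Hypothetical helper: Pearson correlation of Calories with each listed
# column, sorted from most positive to most negative.
calorie_correlations <- function(df, cols) {
  r <- vapply(cols, function(col) cor(df[[col]], df$Calories), numeric(1))
  sort(r, decreasing = TRUE)
}
```

Usage: `calorie_correlations(daily_activity_new, c("TotalSteps", "VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes", "SedentaryMinutes"))`.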

# Total minutes spent at each activity level across all user-days
VeryActiveMin <- sum(daily_activity_new$VeryActiveMinutes)
FairlyActiveMin <- sum(daily_activity_new$FairlyActiveMinutes)
LightlyActiveMin <- sum(daily_activity_new$LightlyActiveMinutes)
SedentaryMin <- sum(daily_activity_new$SedentaryMinutes)

# Pie chart of each level's share of all tracked minutes
slices <- c(VeryActiveMin, FairlyActiveMin, LightlyActiveMin, SedentaryMin)
pct <- round(slices / sum(slices) * 100)
lbls <- paste0(c("VeryActive", "FairlyActive", "LightlyActive", "Sedentary"), " ", pct, "%")
pie(slices, labels = lbls, col = rainbow(length(lbls)), main = "Percentage of Activity in Minutes")

STEP 6: ACT

RECOMMENDATIONS:

  • Introduce reminders, for example to exercise, go to sleep, take a walk or drink water. This would benefit users and increase app usage.
  • The Bellabeat marketing team can encourage users by offering incentives for consistent tracking, such as in-app competitions against friends or other users in the same area.
  • Implement a reward system for users, such as a new electronic badge or trophy when they complete their goal for the day. Small rewards like these could boost user participation and make users more likely to achieve their goals.
  • Introduce real-time graphs and dashboards so users can see their activity streaks and trends.
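To illustrate the streaks idea, here is a hypothetical sketch of how a streak could be computed from daily step data: the longest run of consecutive calendar days on which a step goal was met. The 10,000-step goal and the helper name `longest_streak()` are assumptions for illustration only.

```r
# Hypothetical helper: longest run of consecutive days meeting a step goal.
longest_streak <- function(dates, steps, goal = 10000) {
  # Calendar days on which the goal was met, in date order
  met_days <- sort(unique(dates[steps >= goal]))
  if (length(met_days) == 0) return(0L)
  # TRUE where a met day directly follows the previous met day
  consecutive <- c(FALSE, diff(as.integer(met_days)) == 1)
  # Each FALSE starts a new streak; cumsum() gives every streak an id
  streak_id <- cumsum(!consecutive)
  # Length of the longest streak
  max(tabulate(streak_id))
}
```

Because a missed goal and a skipped calendar day both break the run, a gap in tracking ends the streak just as an inactive day does.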
THANK YOU