Table of Contents

  1. Scenario

  2. Ask

    1. Identifying the business task
    2. Key stakeholders
  3. Prepare

    1. Dataset quality & ROCCC analysis
    2. Data verification
    3. Analysis limitations
  4. Process

    1. Data cleaning & data selection
  5. Analyze

    1. Relationship between calories vs. total no. of steps
    2. Relationship between sedentary minutes vs. minutes slept
    3. Quantifying sedentary minutes
    4. Measuring user smart device usage
  6. Act



1. Scenario

Bellabeat is a high-tech company founded in 2013 that manufactures health-focused smart products. The company has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. It currently offers five products: the Bellabeat App, Leaf, Time, Spring, and Bellabeat Membership.


Although Bellabeat is a relatively small company, analyzing how consumers use smart devices can drive its growth in the health and wellness technology sector. Insights from this behavior can help establish Bellabeat as a leading competitor in the global smart device market.



2. Ask

i - Identifying the business task:

Gain insights from the FitBit Fitness Tracker Data into the fitness habits that consumers track with their FitBit smart devices, and use these insights to help Bellabeat’s marketing team build a strategic, data-driven plan for business growth.


ii - Key stakeholders:
  • Primary: Urška Sršen & Sando Mur
  • Secondary: Bellabeat Marketing Analytics Team



3. Prepare

i - Dataset Quality & ROCCC Analysis:

The dataset is the FitBit Fitness Tracker Data from Kaggle, released under a CC0: Public Domain license and made available through Mobius. It contains fitness data from 30 eligible FitBit users, spread across 18 .csv files.


To determine whether the dataset is acceptable, we use the ROCCC approach:

Criteria | Rating (1 - Lowest, 5 - Highest) | Comment
Reliable | 2 | Fitness data was collected from only 30 individuals, which is often cited as a rule-of-thumb minimum sample size.
Original | 1 | Data was collected by a third party, Amazon Mechanical Turk.
Comprehensive | 4 | The dataset covers minute, hourly and daily levels. Most importantly, insights can already be derived from the daily-level data for activity, intensity, calories, steps and sleep.
Current | 3 | Collected from Mar 2016 to May 2016, approximately 7 years ago. 2016 smart devices were most likely base models for the current devices (2023), so data collected from these older devices already contains “foundational” information (such as weight, steps taken, activity intensity, etc.).
Cited | 2 | The dataset originally comes from zenodo.org, which shows at least 1 citation.



ii - Data verification:

Note: I will first set up my R environment by loading the packages below to prepare for my data analysis (ggplot2 is also loaded explicitly, although it is already part of tidyverse):

  • tidyverse - for general data manipulation, cleaning, exploration and visualization

  • janitor - for cleaning and transforming data

  • ggpmisc - for additional geoms/stats/scales

  • ggpubr - for additional tools for formatting plots, including themes, labels, etc.

  • lubridate - for converting the date columns (as_date)

library(tidyverse)
library(janitor)
library(ggpmisc)
library(ggplot2)
library(ggpubr)
library(lubridate)


I will be focusing on the daily csv files, as they should provide the high-level insights Bellabeat needs to design an effective marketing strategy. The following files were selected for my data preparation:

  • dailyActivity_merged.csv

  • dailyCalories_merged.csv

  • dailyIntensities_merged.csv

  • dailySteps_merged.csv

  • sleepDay_merged.csv

  • weightLogInfo_merged.csv


Since the dataset contains numerous tables of varying size, I will be using R for a quicker scan of the dataset. It is important to first identify the parameter or category that connects the tables.

Running the code below was enough for me to identify that ID and Date are the two parameters which connect the daily csv files. These parameters will be the basis for my analysis.

dailyActivity <- read_csv("~/R Projects/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dailyCalories <- read_csv("~/R Projects/dailyCalories_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dailyIntensities <- read_csv("~/R Projects/dailyIntensities_merged.csv")
## Rows: 940 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (9): Id, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, Ve...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dailySteps <- read_csv("~/R Projects/dailySteps_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepDay <- read_csv("~/R Projects/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weightInfo <- read_csv("~/R Projects/weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


To verify the number of unique user IDs in each table, the following code was run:

distinct(dailyActivity, Id)
## # A tibble: 33 × 1
##            Id
##         <dbl>
##  1 1503960366
##  2 1624580081
##  3 1644430081
##  4 1844505072
##  5 1927972279
##  6 2022484408
##  7 2026352035
##  8 2320127002
##  9 2347167796
## 10 2873212765
## # … with 23 more rows
distinct(dailyCalories, Id)
## # A tibble: 33 × 1
##            Id
##         <dbl>
##  1 1503960366
##  2 1624580081
##  3 1644430081
##  4 1844505072
##  5 1927972279
##  6 2022484408
##  7 2026352035
##  8 2320127002
##  9 2347167796
## 10 2873212765
## # … with 23 more rows
distinct(dailyIntensities, Id)
## # A tibble: 33 × 1
##            Id
##         <dbl>
##  1 1503960366
##  2 1624580081
##  3 1644430081
##  4 1844505072
##  5 1927972279
##  6 2022484408
##  7 2026352035
##  8 2320127002
##  9 2347167796
## 10 2873212765
## # … with 23 more rows
distinct(dailySteps, Id)
## # A tibble: 33 × 1
##            Id
##         <dbl>
##  1 1503960366
##  2 1624580081
##  3 1644430081
##  4 1844505072
##  5 1927972279
##  6 2022484408
##  7 2026352035
##  8 2320127002
##  9 2347167796
## 10 2873212765
## # … with 23 more rows
distinct(sleepDay, Id)
## # A tibble: 24 × 1
##            Id
##         <dbl>
##  1 1503960366
##  2 1644430081
##  3 1844505072
##  4 1927972279
##  5 2026352035
##  6 2320127002
##  7 2347167796
##  8 3977333714
##  9 4020332650
## 10 4319703577
## # … with 14 more rows
distinct(weightInfo, Id)
## # A tibble: 8 × 1
##           Id
##        <dbl>
## 1 1503960366
## 2 1927972279
## 3 2873212765
## 4 4319703577
## 5 4558609924
## 6 5577150313
## 7 6962181067
## 8 8877689391


Notes:

  • Since the weightLogInfo_merged.csv table has only 8 distinct user IDs, we will disregard this data due to the small sample size.

  • The number of unique user IDs is inconsistent across the tables (33 in the activity tables, 24 in sleepDay), even though the dataset description mentions only 30 users.
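
The ID counts above can also be collected in one step, together with the IDs that are missing from a given table. The following is a minimal sketch in base R on made-up toy tables (`activity` and `sleep` are hypothetical stand-ins, not the real data), shown only to illustrate the idea:

```r
# Toy stand-ins for the real tables; the IDs below are made up for illustration.
activity <- data.frame(Id = c(1, 1, 2, 3, 3, 4))
sleep    <- data.frame(Id = c(1, 1, 3))

tables <- list(activity = activity, sleep = sleep)

# Number of distinct user IDs in each table, returned as a named vector
id_counts <- sapply(tables, function(tbl) length(unique(tbl$Id)))
print(id_counts)

# IDs that appear in the activity table but never in the sleep table
missing_from_sleep <- setdiff(unique(activity$Id), unique(sleep$Id))
print(missing_from_sleep)
```

Applied to the real tables, the same pattern would list exactly which users have activity records but no sleep records.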



iii - Analysis limitations:
  • Based on the data verification, the sample size is relatively small (n = 30 as stated, with only 24 users in the sleep table). However, this is around the commonly cited minimum sample size for a meaningful analysis. (Source)

  • Since the user demographics are unknown or not specified, the data may be subject to sampling bias.

  • The duration of the data collection is also limited (2 months), which may not be enough to track habits accurately. As per MindOwl, it takes roughly 66 days to form a habit. (Source)

  • The dataset is described as comprising the responses of 30 FitBit users, but upon investigation there were 33 users (based on the unique user IDs). The dataset does not explain the 3 additional users, who could come from erroneous or duplicated data.



4. Process

i - Data cleaning & data selection:

To detect NA and duplicate values in the 5 tables (dailyActivity, dailyCalories, dailyIntensities, dailySteps, and sleepDay), the code below was run. No NA or duplicate values were found in dailyActivity, dailyCalories, dailyIntensities, or dailySteps.

dim(dailyActivity)
## [1] 940  15
sum(is.na(dailyActivity))
## [1] 0
sum(duplicated(dailyActivity))
## [1] 0
dim(dailyCalories)
## [1] 940   3
sum(is.na(dailyCalories))
## [1] 0
sum(duplicated(dailyCalories))
## [1] 0
dim(dailyIntensities)
## [1] 940  10
sum(is.na(dailyIntensities))
## [1] 0
sum(duplicated(dailyIntensities))
## [1] 0
dim(dailySteps)
## [1] 940   3
sum(is.na(dailySteps))
## [1] 0
sum(duplicated(dailySteps))
## [1] 0


For sleepDay, there are 3 duplicated records, which were removed using the code below:

dim(sleepDay)
## [1] 413   5
sum(is.na(sleepDay))
## [1] 0
sum(duplicated(sleepDay))
## [1] 3
## Deleting duplicate records
sleepDay1 <- sleepDay[!duplicated(sleepDay), ]

## Rechecking if duplicate records still exist
dim(sleepDay1)
## [1] 410   5


Summary of removing NA and duplicate values:

Previous Table Name | New Table Name | NA Values Exist | Duplicates Exist | Changes Made
dailyActivity | dailyActivity | No | No | None
dailyCalories | dailyCalories | No | No | None
dailyIntensities | dailyIntensities | No | No | None
dailySteps | dailySteps | No | No | None
sleepDay | sleepDay1 | No | Yes | Duplicates removed (3 records)
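
The same checks can be run over several tables at once instead of repeating three calls per table. This is a sketch in base R on hypothetical toy tables; `check_table()` is an illustrative helper defined here, not a function from any package:

```r
# Hypothetical toy tables; the second one contains one repeated row.
t1 <- data.frame(id = c(1, 2, 3), steps = c(100, 200, 300))
t2 <- data.frame(id = c(1, 1, 2), minutes = c(400, 400, 380))

# check_table() is a hypothetical helper: row count, NA count, duplicate-row count
check_table <- function(tbl) {
  c(rows       = nrow(tbl),
    na_values  = sum(is.na(tbl)),
    duplicates = sum(duplicated(tbl)))
}

# One row per table, one column per check
quality <- t(sapply(list(t1 = t1, t2 = t2), check_table))
print(quality)
```

Passing the five real tables in the list would produce a matrix with the same information as the summary table above.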


To further narrow down which data to use, we will look at the tables dailyActivity, dailySteps, dailyCalories, and dailyIntensities.


For dailySteps, a visual inspection shows that data from the table is already incorporated in the dailyActivity table. To confirm, we will create a subset of dailyActivity using the first 3 columns, and compare it to dailySteps.

# Comparing dailyActivity with dailySteps data

dailyActivityTEST1 <- select(dailyActivity, Id, ActivityDate, TotalSteps)
dailyActivityTEST1 <- rename(dailyActivityTEST1, ActivityDay = ActivityDate)

dailyStepsTEST <- rename(dailySteps, TotalSteps = StepTotal)

all.equal(dailyActivityTEST1, dailyStepsTEST, check.attributes = FALSE)
## [1] TRUE


Similar to the dailySteps table, we will be repeating this step with the dailyCalories table:

# Comparing dailyActivity with dailyCalories data

dailyActivityTEST2 <- select(dailyActivity, Id, ActivityDate, Calories)
dailyActivityTEST2 <- rename(dailyActivityTEST2, ActivityDay = ActivityDate)

all.equal(dailyActivityTEST2, dailyCalories, check.attributes = FALSE)
## [1] TRUE


Proceeding with the same checks for dailyIntensities table:

# Comparing dailyActivity with dailyIntensities data

dailyActivityTEST3 <- select(dailyActivity, Id, ActivityDate, SedentaryMinutes, 
                             LightlyActiveMinutes, FairlyActiveMinutes, VeryActiveMinutes, 
                             SedentaryActiveDistance, LightActiveDistance, 
                             ModeratelyActiveDistance, VeryActiveDistance)
dailyActivityTEST3 <- rename(dailyActivityTEST3, ActivityDay = ActivityDate)
all.equal(dailyActivityTEST3, dailyIntensities, check.attributes = FALSE)
## [1] TRUE


This confirms that the dailySteps, dailyCalories, and dailyIntensities tables are already incorporated in the dailyActivity table, so any analysis of these three tables can be done directly via dailyActivity. We will therefore omit these three tables.
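
One detail worth knowing about the comparison pattern used above: `all.equal()` returns TRUE when the objects match, but a character description of the differences (not FALSE) when they do not, so wrapping it in `isTRUE()` gives a clean logical. A minimal sketch in base R on made-up data, where `same_values()` is a hypothetical helper:

```r
# Hypothetical toy data standing in for dailyActivity and dailySteps
wide <- data.frame(Id = c(1, 1, 2),
                   ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
                   TotalSteps = c(100, 200, 300),
                   Calories = c(1800, 1900, 2000))
narrow <- data.frame(Id = c(1, 1, 2),
                     ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),
                     StepTotal = c(100, 200, 300))

# same_values() is a hypothetical helper: it picks the given columns from the
# wide table, renames them to match the narrow table, and compares the values.
same_values <- function(wide_tbl, narrow_tbl, wide_cols) {
  subset_tbl <- wide_tbl[, wide_cols]
  names(subset_tbl) <- names(narrow_tbl)
  isTRUE(all.equal(subset_tbl, narrow_tbl, check.attributes = FALSE))
}

cols <- c("Id", "ActivityDate", "TotalSteps")
print(same_values(wide, narrow, cols))

# Change a single value to confirm that a mismatch is detected
narrow_changed <- narrow
narrow_changed$StepTotal[3] <- 301
print(same_values(wide, narrow_changed, cols))
```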


Final tables to be used for the analysis:

  • dailyActivity

  • sleepDay1


We will further clean this data by ensuring that the 2 key columns used for joining, ID and Date, have the same column names and the same date format in both tables:

# Cleaning dailyActivity and sleepDay1 data

# Preview only: the result is not assigned, so dailyActivity itself is unchanged here
clean_names(dailyActivity)
## # A tibble: 940 × 15
##            id activity…¹ total…² total…³ track…⁴ logge…⁵ very_…⁶ moder…⁷ light…⁸
##         <dbl> <chr>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1 1503960366 4/12/2016    13162    8.5     8.5        0    1.88   0.550    6.06
##  2 1503960366 4/13/2016    10735    6.97    6.97       0    1.57   0.690    4.71
##  3 1503960366 4/14/2016    10460    6.74    6.74       0    2.44   0.400    3.91
##  4 1503960366 4/15/2016     9762    6.28    6.28       0    2.14   1.26     2.83
##  5 1503960366 4/16/2016    12669    8.16    8.16       0    2.71   0.410    5.04
##  6 1503960366 4/17/2016     9705    6.48    6.48       0    3.19   0.780    2.51
##  7 1503960366 4/18/2016    13019    8.59    8.59       0    3.25   0.640    4.71
##  8 1503960366 4/19/2016    15506    9.88    9.88       0    3.53   1.32     5.03
##  9 1503960366 4/20/2016    10544    6.68    6.68       0    1.96   0.480    4.24
## 10 1503960366 4/21/2016     9819    6.34    6.34       0    1.34   0.350    4.65
## # … with 930 more rows, 6 more variables: sedentary_active_distance <dbl>,
## #   very_active_minutes <dbl>, fairly_active_minutes <dbl>,
## #   lightly_active_minutes <dbl>, sedentary_minutes <dbl>, calories <dbl>, and
## #   abbreviated variable names ¹​activity_date, ²​total_steps, ³​total_distance,
## #   ⁴​tracker_distance, ⁵​logged_activities_distance, ⁶​very_active_distance,
## #   ⁷​moderately_active_distance, ⁸​light_active_distance
dailyActivity <- rename_with(dailyActivity, tolower)
dailyActivity <- rename(dailyActivity, date = activitydate)
dailyActivity$date <- as_date(dailyActivity$date, format = "%m/%d/%Y")

# Preview only: the result is not assigned, so sleepDay1 itself is unchanged here
clean_names(sleepDay1)
## # A tibble: 410 × 5
##            id sleep_day             total_sleep_records total_minutes_…¹ total…²
##         <dbl> <chr>                               <dbl>            <dbl>   <dbl>
##  1 1503960366 4/12/2016 12:00:00 AM                   1              327     346
##  2 1503960366 4/13/2016 12:00:00 AM                   2              384     407
##  3 1503960366 4/15/2016 12:00:00 AM                   1              412     442
##  4 1503960366 4/16/2016 12:00:00 AM                   2              340     367
##  5 1503960366 4/17/2016 12:00:00 AM                   1              700     712
##  6 1503960366 4/19/2016 12:00:00 AM                   1              304     320
##  7 1503960366 4/20/2016 12:00:00 AM                   1              360     377
##  8 1503960366 4/21/2016 12:00:00 AM                   1              325     364
##  9 1503960366 4/23/2016 12:00:00 AM                   1              361     384
## 10 1503960366 4/24/2016 12:00:00 AM                   1              430     449
## # … with 400 more rows, and abbreviated variable names ¹​total_minutes_asleep,
## #   ²​total_time_in_bed
sleepDay1 <- rename_with(sleepDay1, tolower)
sleepDay1 <- rename(sleepDay1, date = sleepday)
sleepDay1$date <- as_date(sleepDay1$date, format = "%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
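
As an aside, the two date formats can also be brought to a common Date type with base R alone. This sketch uses sample strings copied from the printed tables above and swaps in `as.Date()` for lubridate's `as_date()`; it relies on the documented behavior that trailing characters not covered by the format are ignored:

```r
# Sample strings in the two formats shown in the printed tables
activity_dates <- c("4/12/2016", "4/13/2016")
sleep_dates    <- c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM")

# as.Date() reads month/day/year; the trailing time of day in sleep_dates
# is not part of the format and is ignored
activity_parsed <- as.Date(activity_dates, format = "%m/%d/%Y")
sleep_parsed    <- as.Date(sleep_dates, format = "%m/%d/%Y")

print(sleep_parsed)
print(identical(activity_parsed, sleep_parsed))
```

Because only the calendar date is read, this variant does not depend on the session time zone.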


Checking our cleaned data tables:

# Checking cleaned data tables

head(dailyActivity, 3)
## # A tibble: 3 × 15
##           id date       totals…¹ total…² track…³ logge…⁴ verya…⁵ moder…⁶ light…⁷
##        <dbl> <date>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 1503960366 2016-04-12    13162    8.5     8.5        0    1.88   0.550    6.06
## 2 1503960366 2016-04-13    10735    6.97    6.97       0    1.57   0.690    4.71
## 3 1503960366 2016-04-14    10460    6.74    6.74       0    2.44   0.400    3.91
## # … with 6 more variables: sedentaryactivedistance <dbl>,
## #   veryactiveminutes <dbl>, fairlyactiveminutes <dbl>,
## #   lightlyactiveminutes <dbl>, sedentaryminutes <dbl>, calories <dbl>, and
## #   abbreviated variable names ¹​totalsteps, ²​totaldistance, ³​trackerdistance,
## #   ⁴​loggedactivitiesdistance, ⁵​veryactivedistance, ⁶​moderatelyactivedistance,
## #   ⁷​lightactivedistance
head(sleepDay1, 3)
## # A tibble: 3 × 5
##           id date       totalsleeprecords totalminutesasleep totaltimeinbed
##        <dbl> <date>                 <dbl>              <dbl>          <dbl>
## 1 1503960366 2016-04-12                 1                327            346
## 2 1503960366 2016-04-13                 2                384            407
## 3 1503960366 2016-04-15                 1                412            442


Merging our data tables:

daily_activity_sleep <- merge(dailyActivity, sleepDay1, by = c("id", "date"))
head(daily_activity_sleep)
##           id       date totalsteps totaldistance trackerdistance
## 1 1503960366 2016-04-12      13162          8.50            8.50
## 2 1503960366 2016-04-13      10735          6.97            6.97
## 3 1503960366 2016-04-15       9762          6.28            6.28
## 4 1503960366 2016-04-16      12669          8.16            8.16
## 5 1503960366 2016-04-17       9705          6.48            6.48
## 6 1503960366 2016-04-19      15506          9.88            9.88
##   loggedactivitiesdistance veryactivedistance moderatelyactivedistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.14                     1.26
## 4                        0               2.71                     0.41
## 5                        0               3.19                     0.78
## 6                        0               3.53                     1.32
##   lightactivedistance sedentaryactivedistance veryactiveminutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                2.83                       0                29
## 4                5.04                       0                36
## 5                2.51                       0                38
## 6                5.03                       0                50
##   fairlyactiveminutes lightlyactiveminutes sedentaryminutes calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  34                  209              726     1745
## 4                  10                  221              773     1863
## 5                  20                  164              539     1728
## 6                  31                  264              775     2035
##   totalsleeprecords totalminutesasleep totaltimeinbed
## 1                 1                327            346
## 2                 2                384            407
## 3                 1                412            442
## 4                 2                340            367
## 5                 1                700            712
## 6                 1                304            320


Now that the tables have been selected, cleaned, and merged, we will now move forward with the analysis.
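
Note that `merge()` with its default arguments performs an inner join, so the merged table keeps only the id/date pairs that exist in both tables; activity days without a sleep record are dropped. The sketch below illustrates the difference on toy data (two step counts are taken from the output above, the third row is made up):

```r
activity <- data.frame(id = c(1, 1, 2),
                       date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
                       totalsteps = c(13162, 10735, 5000))  # last value is made up
sleep <- data.frame(id = 1,
                    date = as.Date("2016-04-12"),
                    totalminutesasleep = 327)

# Default merge: inner join, only rows present in both tables
inner <- merge(activity, sleep, by = c("id", "date"))

# all.x = TRUE: left join, every activity row is kept and missing sleep becomes NA
left <- merge(activity, sleep, by = c("id", "date"), all.x = TRUE)

print(nrow(inner))
print(nrow(left))
print(sum(is.na(left$totalminutesasleep)))
```

For this case study the consequence is that every result in the Analyze section describes only days on which a user logged both activity and sleep.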



5. Analyze

Our analysis will be focusing on understanding the following:


i - Relationship between calories vs. total no. of steps:

We will first analyze the data between the amount of calories burned vs. total number of steps taken on a daily basis. To do this, we have utilized the code below:

d_steps_calories <- select(daily_activity_sleep, totalsteps, calories)

ggplot(data = d_steps_calories, aes(x = totalsteps, y = calories)) +
  geom_point() +
  stat_cor(method = "pearson", type = "text", label.x = 0.5, label.y = 0.9) +
  stat_smooth(method = "lm", se = FALSE) +
  labs(title = "Calories Burned vs. Total Steps")

Overall, we can see that as the number of steps increases, the calories burned also increase, so there is a positive correlation. However, the calculated R value (0.41) indicates only a low positive correlation. This means that although total steps may have an effect on calories burned, other factors such as body size and composition, birth sex, age, etc. also play a role. (Source)
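
To put the R value in perspective, squaring it gives the share of variance in calories that a straight-line fit on steps accounts for. The sketch below works through that arithmetic for the reported 0.41 and then checks the relationship between r and R-squared on made-up numbers (the `steps` and `calories` vectors are hypothetical):

```r
r <- 0.41          # Pearson correlation reported in the text
r_squared <- r^2   # share of variance explained by a straight-line fit
print(round(r_squared, 2))

# Toy illustration with made-up numbers: for a simple linear regression,
# the squared Pearson correlation equals the model's R-squared
steps    <- c(2000, 4000, 6000, 8000, 10000)
calories <- c(1900, 1750, 2300, 2100, 2600)
r_toy <- cor(steps, calories, method = "pearson")
fit   <- lm(calories ~ steps)
print(all.equal(r_toy^2, summary(fit)$r.squared))
```

With r = 0.41, steps account for roughly 17% of the variation in calories, which is consistent with the reading that other factors matter as well.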



ii - Relationship between sedentary minutes vs. minutes slept:

An interesting analysis would be to understand how sedentary minutes may affect the minutes slept during the day. To form a comparison, the following code was used:

d_sedentary_sleep <- select(daily_activity_sleep, sedentaryminutes, totalminutesasleep)

ggplot(data = d_sedentary_sleep, aes(x = sedentaryminutes, y = totalminutesasleep)) +
  geom_point() +
  stat_cor(method = "pearson", type = "text", label.x = 0.5, label.y = 0.9) +
  stat_smooth(method = "lm", se = FALSE) +
  labs(title = "Sedentary Minutes vs. Total Minutes Slept")

From the plot, we can see an inverse (negative) relationship between sedentary minutes and time spent asleep at the daily level. It is important to note that the R value of -0.6 indicates only a moderate negative correlation, and a correlation by itself does not establish cause and effect. However, an article on sedentary behavior and sleep quality reports that sedentary behavior was associated with a higher risk of insomnia, sleep disturbance, and poorer sleep quality (Source). This supports the idea that sleep quality, or in this case “time slept”, is negatively related to increased sedentary time.
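
A correlation coefficient on its own says nothing about its uncertainty. Base R's `cor.test()` reports the estimate together with a confidence interval and a p-value; the sketch below runs it on made-up numbers (the `sedentary` and `asleep` vectors are hypothetical, not the study data):

```r
# Made-up daily values for six observations
sedentary <- c(500, 600, 700, 800, 900, 1000)
asleep    <- c(480, 470, 430, 400, 390, 330)

result <- cor.test(sedentary, asleep, method = "pearson")

print(result$estimate)   # Pearson r
print(result$conf.int)   # 95% confidence interval for r
print(result$p.value)    # p-value for the null hypothesis r = 0
```

Running the same call on the sedentaryminutes and totalminutesasleep columns of the merged table would show how tightly the -0.6 estimate is pinned down by the available days.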



iii - Quantifying sedentary minutes:

According to the Department of Health (Govt. of Australia), the maximum sedentary time for adults is between 7 – 10 hours (420 – 600 minutes) (Source). To see how much of the sample stays within this recommendation, we use the code below to generate a bar graph of each user’s average sedentary minutes; the dashed red line marks the 600-minute upper bound.


We first compute the average sedentary minutes of each user and assign each user a sequential ID (1, 2, 3, etc.) for easier plotting:

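d_sedentary_total <- select(daily_activity_sleep, id, sedentaryminutes) %>% 
  group_by(id) %>%
  summarise(sedentaryminutes_avg = mean(sedentaryminutes))

# Sequential ID for plotting, derived from the row count instead of a hard-coded 24
d_sedentary_total$id_series <- seq_len(nrow(d_sedentary_total))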
ggplot(d_sedentary_total, aes(x = id_series, y = sedentaryminutes_avg)) +
  geom_bar(stat="identity", fill = "darkblue", colour = "black") +
  geom_hline(yintercept = 600, linetype = "dashed", color = "red") +
  labs(title = "Average Sedentary Minutes")

Note: Each bar in the “id_series” axis represents a unique user (24 bars for 24 unique users)
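
The share of users above the limit can be computed directly rather than read off the chart. This is a sketch on made-up per-user averages (`avg_sedentary` is hypothetical); the last line repeats the arithmetic for the 22 out of 24 users quoted in the Act section:

```r
avg_sedentary <- c(550, 620, 700, 810, 590, 905)  # made-up per-user averages
limit <- 600                                      # upper bound of the recommendation

over_limit <- avg_sedentary > limit
print(sum(over_limit))                    # number of users above the limit
print(round(100 * mean(over_limit), 1))   # share of users above the limit, in percent

# Same arithmetic for the counts quoted in this case study
print(round(100 * 22 / 24, 1))
```

With the real table, `avg_sedentary` would be the sedentaryminutes_avg column of d_sedentary_total.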



iv - Measuring user smart device usage:

The last part of the analysis looks at how frequently the 24 unique users used their smart devices. This was done by counting the number of records per user in the merged table, that is, the number of days on which a user logged both activity and sleep during the data collection period.
Smart device usage was categorized into 3 groups:

  • High Usage = 21 or more days

  • Average Usage = 11 to 20 days

  • Low Usage = 1 to 10 days


First, we will create a new table to count the number of days each unique user ID had a record.

d_usage <- daily_activity_sleep %>% 
  group_by(id) %>% 
  summarise(days_used = n())


Then we will modify the table to categorize the users by usage frequency, and compute the percentage each category represents:

d_usage_summary <- d_usage %>%  
  mutate(usage = ifelse(days_used <= 10, "Low Usage",
                 ifelse(days_used <= 20, "Average Usage", "High Usage"))) %>%
  group_by(usage) %>%
  summarise(count = n()) %>% 
  mutate(percent_usage = round((count / sum(count)) * 100))
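
An alternative to nested `ifelse()` calls is base R's `cut()`, which bins a numeric vector into labeled intervals in one call. The sketch below applies the same three categories to made-up day counts (`days_used` here is a hypothetical vector, not the real table):

```r
days_used <- c(3, 10, 11, 20, 21, 31)  # made-up number of days per user

# Intervals are (0, 10], (10, 20] and (20, Inf], matching the categories above
usage <- cut(days_used,
             breaks = c(0, 10, 20, Inf),
             labels = c("Low Usage", "Average Usage", "High Usage"))

print(table(usage))
print(round(100 * prop.table(table(usage))))
```

A side benefit is that `cut()` returns a factor whose levels keep the Low, Average, High order, which also controls the legend order in a plot.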


Plotting a pie chart to visualize the categorization of users on usage frequency:

d_usage_summary %>%
  ggplot(aes(x = "",y = percent_usage, fill = usage)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size = 14, face = "bold")) +
  geom_text(aes(label = paste0(percent_usage, "%")), position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = c("#3982b8","#6abce2","#b8e2f0")) +
  labs(title="Daily Usage of Smart Device")

The chart shows that only 50% of the users fall in the high-usage category. This indicates that not all users consistently use their smart devices to track their health and fitness.



6. Act

Bellabeat’s dedication to empowering women with knowledge about their own health and habits is critical for continuing the company’s rapid growth. With the analysis provided in this case study, Bellabeat will be able to make informed, data-driven decisions, which will further cement its presence in the tech-driven wellness space for women.


Data collection: One key aspect of understanding consumer trends is the collection of the data itself. Bellabeat could use the Bellabeat app to store and track user data instead of relying on third-party data collection sources. The tracked data could then include user demographic information, which may be a vital factor in building an informed marketing strategy.


Ultimately, the findings from the analysis phase enabled us to draw specific recommendations to help guide Bellabeat’s marketing strategy:

Recommendation | Description
Enable notifications for sedentary time | We found that 22 out of 24 users in the sample (about 92%) exceeded the maximum recommended daily sedentary minutes. To tackle this, Bellabeat could notify users of their sedentary minutes by using the geolocation services feature (to track distance moved). Note that sedentary minutes were negatively correlated with minutes slept, suggesting that fewer sedentary minutes may go along with longer, and possibly better-quality, sleep. Bellabeat could build its marketing around “longer and quality” sleep by advocating for a more active lifestyle (reduced sedentary minutes).
Enable notifications for inactivity & give daily incentives for smart device usage (daily log-in tracking) | We found that 50% of the sample (12% - Average Usage, 38% - Low Usage) were not using their smart device trackers as often as possible. This insight could help Bellabeat campaign for the importance of using its smart devices as often as possible to accurately track and record each individual’s wellness level. Bellabeat could include a daily “log-in” incentive that increases exponentially (Day 1 – Earn 0.25 coins, Day 2 – Earn 0.5 coins, etc.), which caps and resets on a weekly basis (Day 7 – Earn 2 coins). Assuming that Bellabeat has partner organizations, these coins could then be used as a discount voucher for purchases made within those partner businesses.
Trophy system for users | The analysis suggests that more steps go with more calories burned, and less sedentary time goes with longer sleep. This implies that a healthier lifestyle involves more “movement” and activity. Bellabeat could market the importance of a healthier lifestyle by including a trophy system in which users earn specific badges for specific milestones. A sample idea would be a Roadrunner Badge (achieved when a user completes 50,000 steps in a week) and a Speed Demon Badge (achieved when a user completes 100,000 steps in a week). These badges could be categorized in tiers (to highlight degree of difficulty) and should also be publicly visible, so that users may display them in the profile section of the Bellabeat app and share them with people in their community. This encourages a healthy sense of competition amongst users.
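
The log-in incentive in the table can be read in more than one way. The sketch below assumes one possible reading, which is an assumption and not something the recommendation specifies: the reward starts at 0.25 coins, doubles each day, and is capped at 2 coins until the weekly reset. The `daily_reward()` helper is hypothetical:

```r
# daily_reward() is a hypothetical helper implementing the assumed schedule:
# start at 0.25 coins, double each day, never exceed the 2-coin cap
daily_reward <- function(day, start = 0.25, cap = 2) {
  pmin(start * 2^(day - 1), cap)
}

week <- 1:7
rewards <- daily_reward(week)
print(rewards)        # coins earned on days 1 to 7
print(sum(rewards))   # total coins for a full week of log-ins
```

Under this reading the cap is reached on day 4 and a user who logs in every day earns 9.75 coins per week; a different growth rule would change both numbers.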



All in all, the recommendations listed in the table above can be reinforced by reiterating the general must-haves for Bellabeat’s smart devices:

  1. Include various designs – so users can wear the devices on any occasion

  2. Promote durable and long-lasting batteries – so users will not worry about battery life

  3. Weather-proof build – so the feature-packed devices are complemented by a robust build

  4. Warranty – so the trust and loyalty of consumers remain