Bellabeat Case Study

How Can a Wellness Technology Company Play It Smart?

1. Summary (Ask Phase)

This capstone case study is for the Google Data Analytics Professional Certificate. The scenario is as follows: You are a junior data analyst working on the marketing analyst team at Bellabeat. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide a marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your recommendations for Bellabeat’s marketing strategy.

1.1 About the company

Bellabeat is a high-tech wellness company that manufactures beautifully designed health-focused smart products for women. Even though Bellabeat is considered a small company, they have grown rapidly and have offices around the world.

The current advertising strategy consists of traditional methods but focuses on digital marketing. You may see or hear about Bellabeat on Facebook, Instagram, Twitter, YouTube, Google Search, Google Ads, radio stations, billboards, print, and television.

1.2 Bellabeats products

Bellabeat app: provides health related data for activity, sleep, menstrual cycle, and habits. The Bellabeat app connects to Bellabeat’s other smart wellness products.
Leaf: wellness tracker that can be worn as a bracelet, necklace, or clip
Time: wellness watch with a timeless classic look
Spring: smart water bottle that tracks daily water intake
Bellabeat membership: 24/7 personalized guidance on health & wellness based on lifestyle and goals.

1.3 Stakeholders

Urška Sršen - Cofounder and
Chief Creative Officer Sando Mur - Mathematician, Cofounder, and key member of the executive team
Bellabeat marketing analytics team

1.4 Business Task

Business Task: Discover growth opportunities to improve Bellabeat and their marketing strategy by analyzing smart device data.

Stakeholder Urška Sršen has asked for us to focus on one of Bellabeat’s products, analyze smart device usage data, find trends, and gain insights into how consumers use non-Bellabeat smart devices.

The Bellabeat product I’ve chosen for this case study is the Bellabeat app.

2. Prepare phase

Stakeholder Urška Sršen encouraged the use of a specific public data set, FitBit Fitness Tracker Data. Sršen tells us that this data set may have some limitations and encourages us to add another data set to help address those limitations.

2.1 FitBit Fitness Tracker Metadata

FitBit Fitness Tracker Data is an open-source data set made available by Mobius on Kaggle and was last updated two years ago. A CC0: Public Domain license link is located on Fitbit Fitness Tracker Data page and has been verified.

Surveys for Fitbit Fitness Tracker Data were generated and distributed by Amazon Mechanical Turk between March 12, 2016 - May 12, 2016 to thirty(30) eligible Fitbit users. The thirty(30) Fitbit users consented to submitting personal tracker data, including minute-level output for physical activity, heart rate, steps, and sleep monitoring.

Fitabase was used to collect and store survey data. Fitabase is a comprehensive data management platform designed to support innovative research projects using wearable and internet-connected devices. The Fitbit Fitness Tracker Data made available to us dates from April 12, 2016 - May 12, 2016.

2.2 Data Set Information

FitBit Fitness Tracker Data is stored in 18 CSV Microsoft Excel files. To better understand these files and what they contain I downloaded them to my computer and opened them in Excel. The data format is long and structured. These are the files:

dailyActivity_merged.csv
dailyCalories_merged.csv
dailyIntensities_merged.csv
dailySteps_merged.csv
heartrate_seconds_merged.csv
hourlyCalories_merged.csv
hourlyIntensities_merged.csv
hourlySteps_merged.csv
minuteCaloriesNarrow_merged.csv
minuteCaloriesWide_merged.csv
minuteIntensitiesNarrow_merged.csv
minuteIntensitiesWide_merged.csv
minuteMETsNarrow_merged.csv
minuteSleep_merged.csv
minuteStepsNarrow_merged.csv
minuteStepsWide_merged.csv
sleepDay_merged.csv
weightLogInfo_merged.csv

2.3 Sort & Filter Data

These are the steps I took to sort and filter the data:

Made copies of all the CSV files in Excel.
Made each file into its own tab on one Excel spreadsheet called “preparing fitbit data sheets”.
These three data sets, dailyCalories_merged, dailyIntensities_merged, and dailySteps_merged, can be removed because their data is in dailyActivity_merged.
These six data sets, minuteCaloriesNarrow_merged, minuteCaloriesWide_merged, minuteIntensitiesNarrow_merged, minuteIntensitiesWide_merged, minuteStepsNarrow_merged, minuteStepsWide_merged I have chosen not to use after reviewing their data because the important information that we need from them is already located in: hourlySteps_merged, hourlyCalories_merged, and hourlyIntensities_merged.
The data set minuteSleep_merged has a column called ‘value’ that contains the only useful information for this particular data set. Each cell indicates the sleep state for that minute of the participant. 1=asleep, 2=restless, 3=awake. (Information provided by Fitabase Data Dictionary page 17). While this is good information, it will not be necessary to use for this case study so it will be removed.
The data sets heartrate_seconds_merged and minuteMETsNarrow_merged will also be removed. While it is good that there is lots of data for these data sets, it won’t be necessary to use them for this analysis.

There are now six data sets to work with.
In the data set dailyActivity_merged:
- Columns ‘TotalDistance’ and ‘TrackerDistance’ have the same exact data so I’ll remove the column ‘TrackerDistance’.
- Columns ‘LoggedActivitiesDistance’ and ‘SedentaryActiveDistance’ have almost no data, so I’ll remove these columns. They will not impact this analysis.
Column ‘LogId’ in data set weightLogInfo_merged and column ‘logId’ in data set minuteSleep_merged refer to unique log Id numbers in Fitbit’s system for that sleep record. This information isn’t useful to us because we already have the participants Id numbers in another column that is consistent with all the data sets. So these two columns will be removed.
Column ‘Fat’ in data set weightLogInfo_merged has two entries which is not enough data to compare anything to, so this column will be removed.
Filtered all data sets from the smallest Id number to the largest Id number.
Saved each data set with “prepared” at the end of the file name. These are the six data sets I’ll work with:
- dailyActivity_merged_prepared
- hourlyCalories_merged_prepared
- hourlyIntensities_merged_prepared
- hourlySteps_merged_prepared
- sleepDay_merged_prepared
- weightLogInfo_merged_prepared

2.4 Limitations

There are many limitations for this data.

The survey takes place over a short duration of time, two months, and the data collected spans over one month. A longer period of time for surveys and data collection would provide more useful and accurate results.
The data was created six years ago, it’s currently 2022. Even though Mobius updated the notebook on Kaggle two years ago, the data is still old.
It is unknown what the gender, age, and location is for the participants.
There seems to be data for 33 participants but only 30 were mentioned in the metadata. This could be considered sample bias. Having only 33 participants is an under-representation of all the fitbit users out there.
In the data there is evidence that suggests some participants take their fitbit fitness tracker off which can affect the data and results.
There are a few terms and abbreviations used in the data that are unclear. Fortunately, in the discussion section on the Fitbit Fitness Tracker Data notebook on Kaggle, Laimis Andrijauskas provided a link to the Fitabase Data Dictionary which clears up a lot of confusion.
In the Fitabase Data Dictionary there is an important note on page 15 for the file minuteMETsNarrow_merged that states, “All MET values exported from Fitabase are multiplied by 10. Please divide by 10 to get accurate MET values.” This would have affected my results and understanding of this data if I didn’t know about the Fitabase Data Dictionary and I were to have used the data.
The integrity of the Fitbit Fitness Tracker Data is questionable based on the above limitations.

3. Process Phase

Some data sets have large amounts of data, so I will switch from Excel to R because it can handle large data sets easily. These are the packages I will be using in R:

tidyverse
tidyr
dplyr
lubridate
janitor
skimr
readr
tibble
ggplot2
yaml

3.1 Cleaning Data

Loading packages

library(tidyverse)
library(tidyr)
library(dplyr)
library(ggplot2)
library(janitor)
library(lubridate)
library(readr)
library(skimr)
library(tibble)
library(yaml)

Loading the data sets, cleaning the file names, and cleaning the column names so we have consistent, easy to read information.

#Changed 'dailyActivity_merged_prepared.csv' to 'daily_activity'
daily_activity <- read_csv("C:/Users/jessi/OneDrive/Desktop/prepared data/dailyActivity_merged_prepared.csv") %>%
  clean_names()

#Changed 'hourlyCalories_merged_prepared.csv' to 'hourly_calories'
hourly_calories <- read_csv("C:/Users/jessi/OneDrive/Desktop/prepared data/hourlyCalories_merged_prepared.csv") %>% 
  clean_names()

#Changed 'hourlyIntensities_merged_prepared.csv' to 'hourly_intensities'
hourly_intensities <- read_csv("C:/Users/jessi/OneDrive/Desktop/prepared data/hourlyIntensities_merged_prepared.csv") %>% 
  clean_names()

#Changed 'hourlySteps_merged_prepared.csv'  to 'hourly_steps'
hourly_steps <- read_csv("C:/Users/jessi/OneDrive/Desktop/prepared data/hourlySteps_merged_prepared.csv") %>% 
  clean_names()

#Changed 'sleepDay_merged_prepared.csv' to 'daily_sleep'
daily_sleep <- read_csv("C:/Users/jessi/OneDrive/Desktop/prepared data/sleepDay_merged_prepared.csv") %>% 
  clean_names()

#Changed 'weightLogInfo_merged_prepared.csv' to 'weight_log'
weight_log <- read_csv("C:/Users/jessi/OneDrive/Desktop/prepared data/weightLogInfo_merged_prepared.csv") %>% 
  clean_names()

Summary of the data sets.

str(daily_activity)

## spec_tbl_df [940 × 12] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id                        : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ activity_date             : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ total_steps               : num [1:940] 13162 10735 10460 9762 12669 ...
##  $ total_distance            : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ very_active_distance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
##  $ moderately_active_distance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
##  $ light_active_distance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
##  $ very_active_minutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
##  $ fairly_active_minutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
##  $ lightly_active_minutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
##  $ sedentary_minutes         : num [1:940] 728 776 1218 726 773 ...
##  $ calories                  : num [1:940] 1985 1797 1776 1745 1863 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityDate = col_character(),
##   ..   TotalSteps = col_double(),
##   ..   TotalDistance = col_double(),
##   ..   VeryActiveDistance = col_double(),
##   ..   ModeratelyActiveDistance = col_double(),
##   ..   LightActiveDistance = col_double(),
##   ..   VeryActiveMinutes = col_double(),
##   ..   FairlyActiveMinutes = col_double(),
##   ..   LightlyActiveMinutes = col_double(),
##   ..   SedentaryMinutes = col_double(),
##   ..   Calories = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

str(daily_sleep)

## spec_tbl_df [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id                  : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ sleep_day           : chr [1:413] "4/12/2016" "4/13/2016" "4/15/2016" "4/16/2016" ...
##  $ total_sleep_records : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
##  $ total_minutes_asleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
##  $ total_time_in_bed   : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   SleepDay = col_character(),
##   ..   TotalSleepRecords = col_double(),
##   ..   TotalMinutesAsleep = col_double(),
##   ..   TotalTimeInBed = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

str(hourly_calories)

## spec_tbl_df [22,099 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id           : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ activity_hour: chr [1:22099] "4/12/2016 0:00" "4/12/2016 1:00" "4/12/2016 2:00" "4/12/2016 3:00" ...
##  $ calories     : num [1:22099] 81 61 59 47 48 48 48 47 68 141 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityHour = col_character(),
##   ..   Calories = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

str(hourly_intensities)

## spec_tbl_df [22,099 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id               : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ activity_hour    : chr [1:22099] "4/12/2016 0:00" "4/12/2016 1:00" "4/12/2016 2:00" "4/12/2016 3:00" ...
##  $ total_intensity  : num [1:22099] 20 8 7 0 0 0 0 0 13 30 ...
##  $ average_intensity: num [1:22099] 0.333 0.133 0.117 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityHour = col_character(),
##   ..   TotalIntensity = col_double(),
##   ..   AverageIntensity = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

str(hourly_steps)

## spec_tbl_df [22,099 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id           : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ activity_hour: chr [1:22099] "4/12/2016 0:00" "4/12/2016 1:00" "4/12/2016 2:00" "4/12/2016 3:00" ...
##  $ step_total   : num [1:22099] 373 160 151 0 0 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   ActivityHour = col_character(),
##   ..   StepTotal = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>

str(weight_log)

## spec_tbl_df [67 × 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ id              : num [1:67] 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
##  $ date            : chr [1:67] "5/2/2016 23:59" "5/3/2016 23:59" "4/13/2016 1:08" "4/21/2016 23:59" ...
##  $ weight_kg       : num [1:67] 52.6 52.6 133.5 56.7 57.3 ...
##  $ weight_pounds   : num [1:67] 116 116 294 125 126 ...
##  $ bmi             : num [1:67] 22.6 22.6 47.5 21.5 21.7 ...
##  $ is_manual_report: logi [1:67] TRUE TRUE FALSE TRUE TRUE TRUE ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Id = col_double(),
##   ..   Date = col_character(),
##   ..   WeightKg = col_double(),
##   ..   WeightPounds = col_double(),
##   ..   BMI = col_double(),
##   ..   IsManualReport = col_logical()
##   .. )
##  - attr(*, "problems")=<externalptr>

Viewing column names.

colnames(daily_activity)

##  [1] "id"                         "activity_date"             
##  [3] "total_steps"                "total_distance"            
##  [5] "very_active_distance"       "moderately_active_distance"
##  [7] "light_active_distance"      "very_active_minutes"       
##  [9] "fairly_active_minutes"      "lightly_active_minutes"    
## [11] "sedentary_minutes"          "calories"

colnames(daily_sleep)

## [1] "id"                   "sleep_day"            "total_sleep_records" 
## [4] "total_minutes_asleep" "total_time_in_bed"

colnames(hourly_calories)

## [1] "id"            "activity_hour" "calories"

colnames(hourly_intensities)

## [1] "id"                "activity_hour"     "total_intensity"  
## [4] "average_intensity"

colnames(hourly_steps)

## [1] "id"            "activity_hour" "step_total"

colnames(weight_log)

## [1] "id"               "date"             "weight_kg"        "weight_pounds"   
## [5] "bmi"              "is_manual_report"

Counting how many unique participant Id’s there actually are in each data set.

n_unique(daily_activity$id)

## [1] 33

n_unique(daily_sleep$id)

## [1] 24

n_unique(hourly_calories$id)

## [1] 33

n_unique(hourly_intensities$id)

## [1] 33

n_unique(hourly_steps$id)

## [1] 33

n_unique(weight_log$id)

## [1] 8

Some data sets have more than 30 participants and some have less. This can cause our data to be skewed since the metadata claimed there were 30 participants.

Looking for any missing values.

skim_without_charts(daily_activity)

Data summary
Name	daily_activity
Number of rows	940
Number of columns	12
_______________________
Column type frequency:
character	1
numeric	11
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
activity_date	0	1	8	9	0	31	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
id	1	4.855407e+09	2.424805e+09	1503960366	2.320127e+09	4.445115e+09	6.962181e+09	8.877689e+09
total_steps	1	7.637910e+03	5.087150e+03	0	3.789750e+03	7.405500e+03	1.072700e+04	3.601900e+04
total_distance	1	5.490000e+00	3.920000e+00	0	2.620000e+00	5.240000e+00	7.710000e+00	2.803000e+01
very_active_distance	1	1.500000e+00	2.660000e+00	0	0.000000e+00	2.100000e-01	2.050000e+00	2.192000e+01
moderately_active_distance	1	5.700000e-01	8.800000e-01	0	0.000000e+00	2.400000e-01	8.000000e-01	6.480000e+00
light_active_distance	1	3.340000e+00	2.040000e+00	0	1.950000e+00	3.360000e+00	4.780000e+00	1.071000e+01
very_active_minutes	1	2.116000e+01	3.284000e+01	0	0.000000e+00	4.000000e+00	3.200000e+01	2.100000e+02
fairly_active_minutes	1	1.356000e+01	1.999000e+01	0	0.000000e+00	6.000000e+00	1.900000e+01	1.430000e+02
lightly_active_minutes	1	1.928100e+02	1.091700e+02	0	1.270000e+02	1.990000e+02	2.640000e+02	5.180000e+02
sedentary_minutes	1	9.912100e+02	3.012700e+02	0	7.297500e+02	1.057500e+03	1.229500e+03	1.440000e+03
calories	1	2.303610e+03	7.181700e+02	0	1.828500e+03	2.134000e+03	2.793250e+03	4.900000e+03

skim_without_charts(daily_sleep)

Data summary
Name	daily_sleep
Number of rows	413
Number of columns	5
_______________________
Column type frequency:
character	1
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
sleep_day	0	1	8	9	0	31	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
id	1	5.000979e+09	2.06036e+09	1503960366	3977333714	4702921684	6962181067	8792009665
total_sleep_records	1	1.120000e+00	3.50000e-01	1	1	1	1	3
total_minutes_asleep	1	4.194700e+02	1.18340e+02	58	361	433	490	796
total_time_in_bed	1	4.586400e+02	1.27100e+02	61	403	463	526	961

skim_without_charts(hourly_calories)

Data summary
Name	hourly_calories
Number of rows	22099
Number of columns	3
_______________________
Column type frequency:
character	1
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
activity_hour	0	1	13	15	0	736	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
id	0	1	4.848235e+09	2.4225e+09	1503960366	2320127002	4445114986	6962181067	8877689391
calories	0	1	9.739000e+01	6.0700e+01	42	63	83	108	948

skim_without_charts(hourly_intensities)

Data summary
Name	hourly_intensities
Number of rows	22099
Number of columns	4
_______________________
Column type frequency:
character	1
numeric	3
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
activity_hour	0	1	13	15	0	736	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
id	1	4.848235e+09	2.4225e+09	1503960366	2320127002	4.445115e+09	6.962181e+09	8877689391
total_intensity	1	1.204000e+01	2.1130e+01	0	0	3.000000e+00	1.600000e+01	180
average_intensity	1	2.000000e-01	3.5000e-01	0	0	5.000000e-02	2.700000e-01	3

skim_without_charts(hourly_steps)

Data summary
Name	hourly_steps
Number of rows	22099
Number of columns	3
_______________________
Column type frequency:
character	1
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
activity_hour	0	1	13	15	0	736	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
id	0	1	4.848235e+09	2.4225e+09	1503960366	2320127002	4445114986	6962181067	8877689391
step_total	0	1	3.201700e+02	6.9038e+02	0	0	40	357	10554

skim_without_charts(weight_log)

Data summary
Name	weight_log
Number of rows	67
Number of columns	6
_______________________
Column type frequency:
character	1
logical	1
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
date	0	1	13	15	0	56	0

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
is_manual_report	0	1	0.61	TRU: 41, FAL: 26

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
id	1	7.009282e+09	1.950322e+09	1.50396e+09	6.962181e+09	6.962181e+09	8.877689e+09	8.877689e+09
weight_kg	1	7.204000e+01	1.392000e+01	5.26000e+01	6.140000e+01	6.250000e+01	8.505000e+01	1.335000e+02
weight_pounds	1	1.588100e+02	3.070000e+01	1.15960e+02	1.353600e+02	1.377900e+02	1.875000e+02	2.943200e+02
bmi	1	2.519000e+01	3.070000e+00	2.14500e+01	2.396000e+01	2.439000e+01	2.556000e+01	4.754000e+01

The data tells us that there are no missing values.

Checking for duplicates.

sum(duplicated(daily_activity))

## [1] 0

sum(duplicated(daily_sleep))

## [1] 3

sum(duplicated(hourly_calories))

## [1] 0

sum(duplicated(hourly_intensities))

## [1] 0

sum(duplicated(hourly_steps))

## [1] 0

sum(duplicated(weight_log))

## [1] 0

There are 3 duplicates in ‘daily_sleep’ so those will need to be removed.

#Removing duplicates.
daily_sleep <- daily_sleep %>% 
  distinct()

#Checking that the duplicates were removed.
sum(duplicated(daily_sleep))

## [1] 0

All of the data sets have some kind of change that needs to happen with the column that contains the date or date and time. For 2 of the data sets I’m going to change the format of the date and change the column name to ‘date’. For 4 data sets I will change the format of the date and time, move date and time into their own columns called ‘date’ and ‘time’, and then remove the original column that contained both date and time.

daily_activity <- daily_activity %>% 
  rename(date = activity_date) %>% 
  mutate(date = as_date(date, format = "%m/%d/%Y"))

daily_sleep <- daily_sleep %>%
  rename(date = sleep_day) %>%
  mutate(date = as_date(date, format = "%m/%d/%Y"))

hourly_calories <- hourly_calories %>% 
  mutate(activity_hour = as.POSIXct(activity_hour, format = "%m/%d/%Y %H:%M" , tz=Sys.timezone())) %>%
  separate(activity_hour, into = c('date', 'time'), sep = ' ', remove = TRUE) 

hourly_intensities <- hourly_intensities %>%
  mutate(activity_hour = as.POSIXct(activity_hour, format = "%m/%d/%Y %H:%M" , tz=Sys.timezone())) %>%
  separate(activity_hour, into = c('date', 'time'), sep = ' ', remove = TRUE)

hourly_steps <- hourly_steps %>%
  mutate(activity_hour = as.POSIXct(activity_hour, format = "%m/%d/%Y %H:%M" , tz=Sys.timezone())) %>%
  separate(activity_hour, into = c('date', 'time'), sep = ' ', remove = TRUE)

weight_log <- weight_log %>%
  mutate(date = as.POSIXct(date, format = "%m/%d/%Y %H:%M" , tz=Sys.timezone())) %>%
  separate(date, into = c('date', 'time'), sep = ' ', remove = TRUE)

Double checking that the codes ran correctly

head(daily_activity)

## # A tibble: 6 × 12
##           id date       total_…¹ total…² very_…³ moder…⁴ light…⁵ very_…⁶ fairl…⁷
##        <dbl> <date>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 1503960366 2016-04-12    13162    8.5     1.88   0.550    6.06      25      13
## 2 1503960366 2016-04-13    10735    6.97    1.57   0.690    4.71      21      19
## 3 1503960366 2016-04-14    10460    6.74    2.44   0.400    3.91      30      11
## 4 1503960366 2016-04-15     9762    6.28    2.14   1.26     2.83      29      34
## 5 1503960366 2016-04-16    12669    8.16    2.71   0.410    5.04      36      10
## 6 1503960366 2016-04-17     9705    6.48    3.19   0.780    2.51      38      20
## # … with 3 more variables: lightly_active_minutes <dbl>,
## #   sedentary_minutes <dbl>, calories <dbl>, and abbreviated variable names
## #   ¹total_steps, ²total_distance, ³very_active_distance,
## #   ⁴moderately_active_distance, ⁵light_active_distance, ⁶very_active_minutes,
## #   ⁷fairly_active_minutes
## # ℹ Use `colnames()` to see all variable names

head(daily_sleep)

## # A tibble: 6 × 5
##           id date       total_sleep_records total_minutes_asleep total_time_in…¹
##        <dbl> <date>                   <dbl>                <dbl>           <dbl>
## 1 1503960366 2016-04-12                   1                  327             346
## 2 1503960366 2016-04-13                   2                  384             407
## 3 1503960366 2016-04-15                   1                  412             442
## 4 1503960366 2016-04-16                   2                  340             367
## 5 1503960366 2016-04-17                   1                  700             712
## 6 1503960366 2016-04-19                   1                  304             320
## # … with abbreviated variable name ¹total_time_in_bed

head(hourly_calories)

## # A tibble: 6 × 4
##           id date       time     calories
##        <dbl> <chr>      <chr>       <dbl>
## 1 1503960366 2016-04-12 00:00:00       81
## 2 1503960366 2016-04-12 01:00:00       61
## 3 1503960366 2016-04-12 02:00:00       59
## 4 1503960366 2016-04-12 03:00:00       47
## 5 1503960366 2016-04-12 04:00:00       48
## 6 1503960366 2016-04-12 05:00:00       48

head(hourly_intensities)

## # A tibble: 6 × 5
##           id date       time     total_intensity average_intensity
##        <dbl> <chr>      <chr>              <dbl>             <dbl>
## 1 1503960366 2016-04-12 00:00:00              20             0.333
## 2 1503960366 2016-04-12 01:00:00               8             0.133
## 3 1503960366 2016-04-12 02:00:00               7             0.117
## 4 1503960366 2016-04-12 03:00:00               0             0    
## 5 1503960366 2016-04-12 04:00:00               0             0    
## 6 1503960366 2016-04-12 05:00:00               0             0

head(hourly_steps)

## # A tibble: 6 × 4
##           id date       time     step_total
##        <dbl> <chr>      <chr>         <dbl>
## 1 1503960366 2016-04-12 00:00:00        373
## 2 1503960366 2016-04-12 01:00:00        160
## 3 1503960366 2016-04-12 02:00:00        151
## 4 1503960366 2016-04-12 03:00:00          0
## 5 1503960366 2016-04-12 04:00:00          0
## 6 1503960366 2016-04-12 05:00:00          0

head(weight_log)

## # A tibble: 6 × 7
##           id date       time     weight_kg weight_pounds   bmi is_manual_report
##        <dbl> <chr>      <chr>        <dbl>         <dbl> <dbl> <lgl>           
## 1 1503960366 2016-05-02 23:59:00      52.6          116.  22.6 TRUE            
## 2 1503960366 2016-05-03 23:59:00      52.6          116.  22.6 TRUE            
## 3 1927972279 2016-04-13 01:08:00     134.           294.  47.5 FALSE           
## 4 2873212765 2016-04-21 23:59:00      56.7          125.  21.5 TRUE            
## 5 2873212765 2016-05-12 23:59:00      57.3          126.  21.7 TRUE            
## 6 4319703577 2016-04-17 23:59:00      72.4          160.  27.5 TRUE

Merging daily_activity and daily_sleep into a new data set called daily_log using full_join() function with ‘id’ and ‘date’ as the common column. Also, daily_activity has 530 more rows than daily_sleep so zeros will need to fill in the blanks instead of NA so that our daily_activity data doesn’t get cut off where daily_sleep ends (at row 410).

daily_log <- full_join(daily_activity, daily_sleep, by = c('id', 'date'))
daily_log[is.na(daily_log)] <- 0
str(daily_log)

## tibble [940 × 15] (S3: tbl_df/tbl/data.frame)
##  $ id                        : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ date                      : Date[1:940], format: "2016-04-12" "2016-04-13" ...
##  $ total_steps               : num [1:940] 13162 10735 10460 9762 12669 ...
##  $ total_distance            : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ very_active_distance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
##  $ moderately_active_distance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
##  $ light_active_distance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
##  $ very_active_minutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
##  $ fairly_active_minutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
##  $ lightly_active_minutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
##  $ sedentary_minutes         : num [1:940] 728 776 1218 726 773 ...
##  $ calories                  : num [1:940] 1985 1797 1776 1745 1863 ...
##  $ total_sleep_records       : num [1:940] 1 2 0 1 2 1 0 1 1 1 ...
##  $ total_minutes_asleep      : num [1:940] 327 384 0 412 340 700 0 304 360 325 ...
##  $ total_time_in_bed         : num [1:940] 346 407 0 442 367 712 0 320 377 364 ...

Merging hourly_calories, hourly_intensities, and hourly_steps into a new data set called hourly_log.

hourly_log <- print(list(hourly_steps, hourly_calories, hourly_intensities) %>%
        reduce(full_join, by = c('id', 'date', 'time')))

## # A tibble: 22,099 × 7
##            id date       time     step_total calories total_intensity average_…¹
##         <dbl> <chr>      <chr>         <dbl>    <dbl>           <dbl>      <dbl>
##  1 1503960366 2016-04-12 00:00:00        373       81              20      0.333
##  2 1503960366 2016-04-12 01:00:00        160       61               8      0.133
##  3 1503960366 2016-04-12 02:00:00        151       59               7      0.117
##  4 1503960366 2016-04-12 03:00:00          0       47               0      0    
##  5 1503960366 2016-04-12 04:00:00          0       48               0      0    
##  6 1503960366 2016-04-12 05:00:00          0       48               0      0    
##  7 1503960366 2016-04-12 06:00:00          0       48               0      0    
##  8 1503960366 2016-04-12 07:00:00          0       47               0      0    
##  9 1503960366 2016-04-12 08:00:00        250       68              13      0.217
## 10 1503960366 2016-04-12 09:00:00       1864      141              30      0.5  
## # … with 22,089 more rows, and abbreviated variable name ¹average_intensity
## # ℹ Use `print(n = ...)` to see more rows

str(hourly_log)

## tibble [22,099 × 7] (S3: tbl_df/tbl/data.frame)
##  $ id               : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ date             : chr [1:22099] "2016-04-12" "2016-04-12" "2016-04-12" "2016-04-12" ...
##  $ time             : chr [1:22099] "00:00:00" "01:00:00" "02:00:00" "03:00:00" ...
##  $ step_total       : num [1:22099] 373 160 151 0 0 ...
##  $ calories         : num [1:22099] 81 61 59 47 48 48 48 47 68 141 ...
##  $ total_intensity  : num [1:22099] 20 8 7 0 0 0 0 0 13 30 ...
##  $ average_intensity: num [1:22099] 0.333 0.133 0.117 0 0 ...

The data has been cleaned and there are now 3 data sets to work with:

daily_log
hourly_log
weight_log

4. Analyzing Phase

The goal of this analysis is to gain insights into how consumers are using non-Bellabeat smart devices and give Bellabeat recommendations for improvements and new marketing strategies.

The Bellabeat product I chose to focus on was the Bellabeat App. Keeping that in mind, I focused on what it is participants tracked the most on their fitness tracker. This information would be stored directly in to an app for the participant to view. I’ll start my analysis by creating a few data sets that will show more detailed information.

4.1 Creating Data Sets

Adding weekdays to all data sets. This will help delegate individual days of the week to all categories of information.

daily_log$weekday <- weekdays(as.Date(daily_log$date))
hourly_log$weekday <- weekdays(as.Date(hourly_log$date))
weight_log$weekday <- weekdays(as.Date(weight_log$date))
str(daily_log)

## tibble [940 × 16] (S3: tbl_df/tbl/data.frame)
##  $ id                        : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ date                      : Date[1:940], format: "2016-04-12" "2016-04-13" ...
##  $ total_steps               : num [1:940] 13162 10735 10460 9762 12669 ...
##  $ total_distance            : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
##  $ very_active_distance      : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
##  $ moderately_active_distance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
##  $ light_active_distance     : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
##  $ very_active_minutes       : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
##  $ fairly_active_minutes     : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
##  $ lightly_active_minutes    : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
##  $ sedentary_minutes         : num [1:940] 728 776 1218 726 773 ...
##  $ calories                  : num [1:940] 1985 1797 1776 1745 1863 ...
##  $ total_sleep_records       : num [1:940] 1 2 0 1 2 1 0 1 1 1 ...
##  $ total_minutes_asleep      : num [1:940] 327 384 0 412 340 700 0 304 360 325 ...
##  $ total_time_in_bed         : num [1:940] 346 407 0 442 367 712 0 320 377 364 ...
##  $ weekday                   : chr [1:940] "Tuesday" "Wednesday" "Thursday" "Friday" ...

str(hourly_log)

## tibble [22,099 × 8] (S3: tbl_df/tbl/data.frame)
##  $ id               : num [1:22099] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ date             : chr [1:22099] "2016-04-12" "2016-04-12" "2016-04-12" "2016-04-12" ...
##  $ time             : chr [1:22099] "00:00:00" "01:00:00" "02:00:00" "03:00:00" ...
##  $ step_total       : num [1:22099] 373 160 151 0 0 ...
##  $ calories         : num [1:22099] 81 61 59 47 48 48 48 47 68 141 ...
##  $ total_intensity  : num [1:22099] 20 8 7 0 0 0 0 0 13 30 ...
##  $ average_intensity: num [1:22099] 0.333 0.133 0.117 0 0 ...
##  $ weekday          : chr [1:22099] "Tuesday" "Tuesday" "Tuesday" "Tuesday" ...

str(weight_log)

## tibble [67 × 8] (S3: tbl_df/tbl/data.frame)
##  $ id              : num [1:67] 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
##  $ date            : chr [1:67] "2016-05-02" "2016-05-03" "2016-04-13" "2016-04-21" ...
##  $ time            : chr [1:67] "23:59:00" "23:59:00" "01:08:00" "23:59:00" ...
##  $ weight_kg       : num [1:67] 52.6 52.6 133.5 56.7 57.3 ...
##  $ weight_pounds   : num [1:67] 116 116 294 125 126 ...
##  $ bmi             : num [1:67] 22.6 22.6 47.5 21.5 21.7 ...
##  $ is_manual_report: logi [1:67] TRUE TRUE FALSE TRUE TRUE TRUE ...
##  $ weekday         : chr [1:67] "Monday" "Tuesday" "Wednesday" "Thursday" ...

Creating a column called ‘healthy_activity_minutes’ in the data set daily_log by adding ‘very_active_minutes’ and ‘fairly_active_minutes’ together. This will show how many participants exert themselves more and fall into a healthy activity level category.

daily_log$healthy_activity_minutes <- c(daily_log$very_active_minutes + daily_log$fairly_active_minutes)

head(daily_log)

## # A tibble: 6 × 17
##           id date       total_…¹ total…² very_…³ moder…⁴ light…⁵ very_…⁶ fairl…⁷
##        <dbl> <date>        <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 1503960366 2016-04-12    13162    8.5     1.88   0.550    6.06      25      13
## 2 1503960366 2016-04-13    10735    6.97    1.57   0.690    4.71      21      19
## 3 1503960366 2016-04-14    10460    6.74    2.44   0.400    3.91      30      11
## 4 1503960366 2016-04-15     9762    6.28    2.14   1.26     2.83      29      34
## 5 1503960366 2016-04-16    12669    8.16    2.71   0.410    5.04      36      10
## 6 1503960366 2016-04-17     9705    6.48    3.19   0.780    2.51      38      20
## # … with 8 more variables: lightly_active_minutes <dbl>,
## #   sedentary_minutes <dbl>, calories <dbl>, total_sleep_records <dbl>,
## #   total_minutes_asleep <dbl>, total_time_in_bed <dbl>, weekday <chr>,
## #   healthy_activity_minutes <dbl>, and abbreviated variable names
## #   ¹total_steps, ²total_distance, ³very_active_distance,
## #   ⁴moderately_active_distance, ⁵light_active_distance, ⁶very_active_minutes,
## #   ⁷fairly_active_minutes
## # ℹ Use `colnames()` to see all variable names

Creating a data set for totals by individual Id number. This will show some statistics for each individual participant for the total duration that data was submitted (31 days).

totals_per_id <- daily_log %>%
  group_by(id) %>%
  summarise(total_steps_per_id = sum(total_steps), total_healthy_activity_minutes = sum(very_active_minutes + fairly_active_minutes), total_days_tracked = sum(total_steps > 0), total_nights_tracked = sum(total_minutes_asleep > 0))
str(totals_per_id)

## tibble [33 × 5] (S3: tbl_df/tbl/data.frame)
##  $ id                            : num [1:33] 1.50e+09 1.62e+09 1.64e+09 1.84e+09 1.93e+09 ...
##  $ total_steps_per_id            : num [1:33] 375619 178061 218489 79982 28400 ...
##  $ total_healthy_activity_minutes: num [1:33] 1794 449 928 44 65 ...
##  $ total_days_tracked            : int [1:33] 30 31 30 21 17 31 31 31 18 31 ...
##  $ total_nights_tracked          : int [1:33] 25 0 4 3 5 0 28 1 15 0 ...

Creating a data set called weekday_log to see average daily steps based on the individual day of the week. This will show us the participants as a group for the average steps taken for each day of the week.

weekday_log <- daily_log %>%
  mutate(weekday = weekdays(date))

weekday_log$weekday <-ordered(weekday_log$weekday, levels=c("Monday", "Tuesday", "Wednesday", "Thursday",
"Friday", "Saturday", "Sunday"))

 weekday_log <-weekday_log %>%
  group_by(weekday) %>%
  summarize (average_daily_steps = mean(total_steps))

head(weekday_log)

## # A tibble: 6 × 2
##   weekday   average_daily_steps
##   <ord>                   <dbl>
## 1 Monday                  7781.
## 2 Tuesday                 8125.
## 3 Wednesday               7559.
## 4 Thursday                7406.
## 5 Friday                  7448.
## 6 Saturday                8153.

4.2 Analysis Summary

After creating new data sets some trends I noticed were: some days of the week were more popular than others for participants tracking steps, half of the participants engaged in healthy activity minutes, some participants hardly tracked anything, and very few participants tracked their weight.

My goal was to find trends in which categories participants tracked the most to see where their interests are when it comes to having a fitness tracker. Needing this information is imperative to being able to provide recommendations for Bellabeat.

5. Visualizations (Share Phase)

The visualizations I created include information about what participants tracked, what was most important to them. To give recommendations to Bellabeat I would first want to know why participants would purchase a fitness tracker in the first place. What features in the fitness tracker are the most popular?

5.1 Days Tracked

totals_per_id$id <-as.character(totals_per_id$id)
ggplot(data = totals_per_id) +
  geom_col(mapping = aes(id, total_days_tracked, fill =total_days_tracked))+
  theme(axis.text.x = element_text(angle = 90))+
  scale_fill_gradient(low = "plum1", high = "royalblue3")+
  labs(title = "The number of days participants tracked",
       subtitle="Days Tracked by Individual ID", x="ID", y="Number of days tracked")

This graph shows us the number of days participants tracked. Most of the participants wore their fitness tracker during the day, one third of the participants didn’t track multiple days, and one participant tracked only a couple days. From this information we can conclude that most participants are interested in the health related data that fitness trackers provide thus why they wear them pretty consistently.

5.2 Nights Tracked

totals_per_id$id <-as.character(totals_per_id$id)
ggplot(data = totals_per_id) +
  geom_col(mapping = aes(id, total_nights_tracked, fill =total_nights_tracked))+
  theme(axis.text.x = element_text(angle = 90))+
  scale_fill_gradient(low = "dodgerblue3", high = "mediumaquamarine")+
  labs(title = "The number of nights participants tracked",
       subtitle="Nights Tracked by Individual ID", x="ID", y="Number of nights tracked")

This graph shows us the number of nights participants wore their fitness tracker to bed. From this visual we can see that less than half of the participants wear their fitness tracker to bed which could be for so many reasons.

5.4 Weight Tracked

weight_tracked %>% 
ggplot(aes(x="",y=percent, fill=participants)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
  scale_fill_manual(values = c("royalblue3", "plum")) +
  geom_text(aes(label = percent),
            position = position_stack(vjust = 0.5))+
  labs(title="How many participants tracked weight?")

In this pie chart we can see that a large percentage of participants did not track their weight.

5.5 Weekday Average Steps

ggplot(data=weekday_log, aes(x=weekday, y=average_daily_steps))+ 
  geom_bar(stat="identity", fill="plum4")+
  labs(title = "Days of the week participants get the most steps", 
       subtitle = "Average Total Steps Per Day", x="Weekday", y="Average Daily Steps")

This visual shows the average steps per day of the week for the participants as a whole. What’s interesting is that we can see that there are more steps taken on Tuesday and Saturday, then Monday and Wednesday following closely behind. With Sunday having the lowest average steps I would assume that participants use this day to rest and get ready for the work week.

5.6 Hourly Total Steps

hourly_log %>%
  group_by(time) %>%
  summarize(average_steps = mean(step_total)) %>%
  ggplot() +
  geom_col(mapping = aes(x=time, y = average_steps, fill = average_steps)) + 
  labs(title = "Most active time of day for participants", 
       subtitle = "Average Total Steps Per Hour", x="Hour", y="Steps") + 
  scale_fill_gradient(low = "blue", high = "pink")+
  theme(axis.text.x = element_text(angle = 90))

This graph shows us the average daily steps for participants per hour. We can see that most active times of day are from 12pm-2pm and from 5pm-7pm.

5.7 Steps & Intensity

ggplot(data = hourly_log)+
  geom_point(mapping = aes(x = step_total, y = total_intensity, color = id),)+
  geom_smooth(mapping = aes(x = step_total, y = total_intensity))+
  labs(title="Correlation between steps and intensity levels", 
       subtitle="Total steps & total intensity each hour by Id", 
       x="Total Steps", y="Total Intensity")

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

This graph shows us a representation of every hour tracked by Id for total steps and total intensity. After looking at this graph I realized that some participants reached Amazing totals for steps within one hour. I went back through the data and found some participants tracked over 8,000 steps within one hour. This makes me question the integrity of the data. But, there is a correlation between total steps and total intensity, the more steps in the hour the higher the intensity.

5.8 Healthy Activity Minutes

totals_per_id$id <-as.character(totals_per_id$id)
ggplot(data = totals_per_id) +
  geom_col(mapping = aes(id, total_healthy_activity_minutes, fill =total_healthy_activity_minutes))+
  theme(axis.text.x = element_text(angle = 90))+
  scale_fill_gradient(low = "slateblue2", high = "indianred2")+
  labs(title = "Participants total healthy activity minutes for 31 days",
       subtitle="Very active minute + fairly active minutes", x="ID", y="Healthy Activity Minutes")

For this graph I wanted to see how many of the participants engaged in “healthy” activity. As a reminder I added ‘very active minutes’ to ‘fairly active minutes’ to get “healthy active minutes”. This graph shows us that a very small group of participants exerted themselves to healthy activity levels.

6. Conclusion (Act Phase)

My analysis and visuals shows that participants wore their fitness tracker during the day to track steps. Counting steps is a base feature that leads to other statistics such as calories lost, distance traveled and weight loss.

76% of participants did not track their weight, more than 50% of participants did not track their sleep for more than 10 days, and more than 60% of participants did not engaged in healthy activity minutes exceeding 1,000 minutes a month. Based on these statistics, tracking steps is what’s important to participants.

Bellabeat can use this information and my analysis to promote their products, focus on promoting what features consumers look for in a fitness tracker, and showcase other features that stand out from the rest. Bellabeat can offer more than what your typical fitness tracker can and in a more comfortable way.

7. Recommendations

Bellabeat is a wellness brand focused on women. With that being said, it is hard to analyze data with so many limitations and not knowing the genders of our participants. This knowledge would help us make better recommendations for Bellabeat. I would recommend that Bellabeat conduct it’s own survey over a longer period of time to produce more accurate data results.

By analyzing the smart device data that was suggested, these are my recommendation for Bellabeat to discover growth opportunities to improve the company and their marketing strategy:

Target Clientele - Knowing that the consumers of Bellabeat products are young adult women, discover advertising techniques targeting that demographic would be benificial.
Notifications -
- Sending personalized notifications and messages makes a person feel more relevant and is more likely to be opened. This could be great motivation to perform the task that is asked of them.
- Create a notification that alerts the person when they have been in a sedentary state for a long period of time during the day.
- Remind them of their goals and suggest movement activity
- Set up a weekly notification to record weight and reward them for doing so.
- Promote healthy sleep habits with notifications suggesting winding down techniques, meditation, relaxing music or podcasts.
- Notifications for drinking more water, bonuses if they have Bellabeats smart water bottle.
Social Media - Having a online community of women share their success stories, achievements, workouts, and meal plans brings more people together. Adding the ability to ask questions anonymously and creating challenges would be important too.
Challenges - Re-engagement after someone is sick or goes on vacation can be hard for that person to get back into routine. Creating different kinds of challenges can keep users interested and engaged. Continuous motivation is key when it comes to consistently tracking your health. Creating games to keep users maintaining healthy habits and reward them for passing challenges can keep people engaged too.

Promote Bellabeat Products through social media reaching out to groups with women, companies that have fitness programs for employees, fitness facilities, and young moms. Bellabeat has many products that are amazing and not only help women track their health but are comfortable to wear.

Having the option to not wear something on your wrist constantly had me immediately interested. From personal experience, I hated wearing my fitness tracker to bed and I forget it sometimes when I put it on the charger. Having a necklace or clipping something to a shirt is a game changer. I’m sure I’m not alone with this feeling and many other women out there could benefit from knowing these items exist.

Bellabeats smart water bottle and products you don’t have to charge constantly like the Leaf or Time should be highly advertised. I look forward to seeing more of these products on the market soon.

*THANK YOU FOR TAKING THE TIME TO READ MY CASE STUDY

Bellabeat Case Study

Capstone Case Study for Google Data Analytics Professional Certificate

2022-08-17

How Can a Wellness Technology Company Play It Smart?

1. Summary (Ask Phase)

1.1 About the company

1.2 Bellabeats products

1.3 Stakeholders

1.4 Business Task

2. Prepare phase

2.1 FitBit Fitness Tracker Metadata

2.2 Data Set Information

2.3 Sort & Filter Data

2.4 Limitations

3. Process Phase

3.1 Cleaning Data

4. Analyzing Phase

4.1 Creating Data Sets

4.2 Analysis Summary

6. Conclusion (Act Phase)

7. Recommendations