Welcome to my Google Data Analytics Professional Capstone Case Study. In this case study, I will act as a Data Analyst for Bellabeat and use data analytics to answer business questions posed by an executive stakeholder. I will work through the following steps to complete this analysis:
* Decide what metrics to measure
* Locate data in your database
* Create security measures to protect that data
* Use spreadsheet functions to find incorrectly entered data
* Use SQL functions to check for extra spaces
* Remove repeated entries
* Check as much as possible for bias in the data
* Perform calculations
* Combine data from multiple sources
* Create tables with your results
* Make better, more informed decisions that lead to stronger outcomes
* Successfully communicate your findings
Bellabeat is a wellness technology company that manufactures health-focused products that are marketed towards women.
What are the problems I am trying to solve?
Urska Srsen, the CEO of Bellabeat, has asked me to focus on a non-Bellabeat smart device and analyze that device’s data to see how consumers use it. Her hope is that analyzing consumer data will reveal opportunities for growth that can shape new marketing strategies.
I will import the dataset in the “Prepare Phase.” First, I will make sure the data is credible; then I will clean and organize it.
Srsen encourages me to use a specific dataset:
FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius)
This Kaggle dataset contains personal fitness tracker data from 30 FitBit users. These eligible FitBit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate and sleep monitoring. It includes information about daily activity, steps and heart rate that can be used to explore user habits.
About: the FitBit Fitness Tracker Data was generated by FitBit users who responded to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. It includes 18 CSV files.
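Before importing anything, here is an optional sanity check (a minimal sketch of my own; the folder path is assumed to match the one used in the read.csv() calls below) that lists every CSV in the unzipped Kaggle folder:
# List every CSV file in the unzipped Kaggle folder
# (path assumed from the read.csv() calls later in this notebook)
list.files("Desktop/Fitabase Data 4.12.16-5.12.16", pattern = "\\.csv$", recursive = TRUE)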
I’m installing the R packages that will assist with my data cleaning and analysis.
install.packages("tidyverse")
install.packages("lubridate")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("tidyr")
install.packages("here")
install.packages("skimr")
install.packages("janitor")
Now I load the packages I just installed.
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
library(skimr)
library(janitor)
I am now going to import the dataset. After looking through the data, I decided to download only the CSV files that would be useful in my analysis.
Activity <- read.csv("Desktop/Fitabase Data 4.12.16-5.12.16/Files_used/dailyActivity_merged.csv")
head(Activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
colnames(Activity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
str(Activity)
## 'data.frame': 940 obs. of 15 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
Calories <- read.csv("Desktop/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
head(Calories)
## Id ActivityDay Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
colnames(Calories)
## [1] "Id" "ActivityDay" "Calories"
str(Calories)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
Intensities <- read.csv("Desktop/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
head(Intensities)
## Id ActivityDay SedentaryMinutes LightlyActiveMinutes
## 1 1503960366 4/12/2016 728 328
## 2 1503960366 4/13/2016 776 217
## 3 1503960366 4/14/2016 1218 181
## 4 1503960366 4/15/2016 726 209
## 5 1503960366 4/16/2016 773 221
## 6 1503960366 4/17/2016 539 164
## FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## 1 13 25 0
## 2 19 21 0
## 3 11 30 0
## 4 34 29 0
## 5 10 36 0
## 6 20 38 0
## LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
## 1 6.06 0.55 1.88
## 2 4.71 0.69 1.57
## 3 3.91 0.40 2.44
## 4 2.83 1.26 2.14
## 5 5.04 0.41 2.71
## 6 2.51 0.78 3.19
colnames(Intensities)
## [1] "Id" "ActivityDay"
## [3] "SedentaryMinutes" "LightlyActiveMinutes"
## [5] "FairlyActiveMinutes" "VeryActiveMinutes"
## [7] "SedentaryActiveDistance" "LightActiveDistance"
## [9] "ModeratelyActiveDistance" "VeryActiveDistance"
str(Intensities)
## 'data.frame': 940 obs. of 10 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
Heartrate <- read.csv("Desktop/Fitabase Data 4.12.16-5.12.16/Files_used/heartrate_seconds_merged.csv")
head(Heartrate)
## Id Time Value
## 1 2022484408 4/12/2016 7:21:00 AM 97
## 2 2022484408 4/12/2016 7:21:05 AM 102
## 3 2022484408 4/12/2016 7:21:10 AM 105
## 4 2022484408 4/12/2016 7:21:20 AM 103
## 5 2022484408 4/12/2016 7:21:25 AM 101
## 6 2022484408 4/12/2016 7:22:05 AM 95
colnames(Heartrate)
## [1] "Id" "Time" "Value"
str(Heartrate)
## 'data.frame': 2483658 obs. of 3 variables:
## $ Id : num 2.02e+09 2.02e+09 2.02e+09 2.02e+09 2.02e+09 ...
## $ Time : chr "4/12/2016 7:21:00 AM" "4/12/2016 7:21:05 AM" "4/12/2016 7:21:10 AM" "4/12/2016 7:21:20 AM" ...
## $ Value: int 97 102 105 103 101 95 91 93 94 93 ...
Sleep <- read.csv("Desktop/Fitabase Data 4.12.16-5.12.16/Files_used/sleepDay_merged.csv")
head(Sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
colnames(Sleep)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
str(Sleep)
## 'data.frame': 413 obs. of 5 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : int 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: int 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : int 346 407 442 367 712 320 377 364 384 449 ...
Weight <- read.csv("Desktop/Fitabase Data 4.12.16-5.12.16/Files_used/weightLogInfo_merged.csv")
head(Weight)
## Id Date WeightKg WeightPounds Fat BMI
## 1 1503960366 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## 2 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 3 1927972279 4/13/2016 1:08:52 AM 133.5 294.3171 NA 47.54
## 4 2873212765 4/21/2016 11:59:59 PM 56.7 125.0021 NA 21.45
## 5 2873212765 5/12/2016 11:59:59 PM 57.3 126.3249 NA 21.69
## 6 4319703577 4/17/2016 11:59:59 PM 72.4 159.6147 25 27.45
## IsManualReport LogId
## 1 True 1.462234e+12
## 2 True 1.462320e+12
## 3 False 1.460510e+12
## 4 True 1.461283e+12
## 5 True 1.463098e+12
## 6 True 1.460938e+12
colnames(Weight)
## [1] "Id" "Date" "WeightKg" "WeightPounds"
## [5] "Fat" "BMI" "IsManualReport" "LogId"
str(Weight)
## 'data.frame': 67 obs. of 8 variables:
## $ Id : num 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
## $ Date : chr "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
## $ WeightKg : num 52.6 52.6 133.5 56.7 57.3 ...
## $ WeightPounds : num 116 116 294 125 126 ...
## $ Fat : int 22 NA NA NA NA 25 NA NA NA NA ...
## $ BMI : num 22.6 22.6 47.5 21.5 21.7 ...
## $ IsManualReport: chr "True" "True" "False" "True" ...
## $ LogId : num 1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...
We can now see that the CSV files were correctly imported into R.
I noticed some consistency issues with the dates and times across the datasets, so I reformatted them to be consistent throughout. I then reviewed and cleaned the data using functions like glimpse(), skim_without_charts(), distinct() and clean_names().
The steps I took when cleaning the data (a sketch of this cleaning is shown after the list):
* The Activity, Calories and Intensities data had no issues with duplicates, spelling, missing values or blank spaces.
* Converted the dates and times to a consistent format across all tables.
* Found 3 duplicate rows in the Sleep dataset and removed them.
* Removed the ‘Fat’ column from the Weight dataset because it had too many missing values.
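The cleaning code itself is not included in the knitted output above, so the following is a minimal sketch of how those steps could be carried out with the packages loaded earlier; it is an illustration under those assumptions, not the exact code I ran.
# Review the structure and completeness of a table
glimpse(Sleep)
skim_without_charts(Sleep)
clean_names(Sleep)   # preview snake_case names (not reassigned, so the CamelCase names used below still apply)
# Count and remove exact duplicate rows (3 were found in Sleep)
sum(duplicated(Sleep))
Sleep <- distinct(Sleep)
# Drop the mostly-empty Fat column from the Weight log
Weight <- select(Weight, -Fat)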
As noted above, there were issues with the date and time formats in some of the tables, so I reformatted the date/time columns in the Activity, Intensities and Sleep tables so they are now consistent.
# Activity: parse the m/d/Y strings, then keep both the original column and a plain Date column
Activity$ActivityDate <- as.POSIXct(Activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
Activity$date <- as.Date(Activity$ActivityDate, tz = Sys.timezone())
Activity$ActivityDate <- as.Date(Activity$ActivityDate, tz = Sys.timezone())
# Intensities: date only, no time component
Intensities$ActivityDay <- as.Date(Intensities$ActivityDay, format = "%m/%d/%Y")
# Sleep: SleepDay includes a time component, so parse it as POSIXct first
Sleep$SleepDay <- as.POSIXct(Sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
Sleep$date <- as.Date(Sleep$SleepDay, tz = Sys.timezone())
Now that the dates and times are formatted consistently and the data is cleaned, I can start analyzing it.
How many participants contributed to each dataset?
#Activity
Activity %>%
  summarise(Activity_participants = n_distinct(Id))
## Activity_participants
## 1 33
n_distinct(Calories$Id)
## [1] 33
n_distinct(Intensities$Id)
## [1] 33
n_distinct(Heartrate$Id)
## [1] 14
n_distinct(Sleep$Id)
## [1] 24
n_distinct(Weight$Id)
## [1] 8
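As an optional aside (a sketch of my own, not part of the original output), the same participant counts can be collected in a single call:
# Count distinct participants in every imported table at once
sapply(list(Activity = Activity, Calories = Calories, Intensities = Intensities,
            Heartrate = Heartrate, Sleep = Sleep, Weight = Weight),
       function(df) n_distinct(df$Id))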
From these counts, there are 33 participants in the Activity, Calories and Intensities datasets, 24 in the Sleep dataset, 14 in the Heartrate dataset and 8 in the Weight dataset.
The Heartrate and Weight datasets do not have enough participants to draw useful, unbiased conclusions, so from here on I will focus only on the Activity, Calories, Intensities and Sleep datasets.
Statistics from each data frame:
# Activity
Activity %>%
select(TotalSteps,
TotalDistance,
SedentaryMinutes, Calories) %>%
summary()
## TotalSteps TotalDistance SedentaryMinutes Calories
## Min. : 0 Min. : 0.000 Min. : 0.0 Min. : 0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8 1st Qu.:1828
## Median : 7406 Median : 5.245 Median :1057.5 Median :2134
## Mean : 7638 Mean : 5.490 Mean : 991.2 Mean :2304
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5 3rd Qu.:2793
## Max. :36019 Max. :28.030 Max. :1440.0 Max. :4900
# Intensities
Intensities %>%
select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes) %>%
summary()
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
# Calories
Calories %>%
select(Calories) %>%
summary()
## Calories
## Min. : 0
## 1st Qu.:1828
## Median :2134
## Mean :2304
## 3rd Qu.:2793
## Max. :4900
# Sleep
Sleep %>%
select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.000 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.000 1st Qu.:361.0 1st Qu.:403.0
## Median :1.000 Median :433.0 Median :463.0
## Mean :1.119 Mean :419.5 Mean :458.6
## 3rd Qu.:1.000 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.000 Max. :796.0 Max. :961.0
I need to merge some datasets before beginning to visualize the data. I will merge the Sleep and Activity tables on the Id column. There are more participants in the Activity dataset than in the Sleep dataset, so I will use a full (outer) join via merge() with all = TRUE, which keeps every participant from both tables.
Combined_data_outer <- merge(Sleep, Activity, by="Id", all = TRUE)
n_distinct(Combined_data_outer$Id)
## [1] 33
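A caveat worth noting (my own aside, not part of the original analysis): merging on Id alone pairs every sleep record with every activity record for the same participant, which inflates the row count. If the daily records need to line up one-to-one, a sketch of merging on both Id and the date columns created during cleaning might look like this (Combined_daily is a hypothetical name):
# Assumes both tables carry the Date-class 'date' column created earlier
Combined_daily <- merge(Sleep, Activity, by = c("Id", "date"), all = TRUE)
n_distinct(Combined_daily$Id)   # should still include all 33 participants from Activity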