1 Company Summary

1.1 About the company

Bellabeat is a high-tech manufacturer of health-focused products for women, founded by Urška Sršen and Sando Mur in 2013. Their products collect data on activity, sleep, stress, and reproductive health to empower women with knowledge about their own health and habits. Ever since its founding, Bellabeat has rapidly grown and positioned itself as a tech-driven wellness company for women.

By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital marketing extensively. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

1.2 Company Objective

Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.

1.3 Company Products

  • Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits.
  • Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip.
  • Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day.
  • Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
  • Bellabeat membership: Bellabeat also offers a subscription-based membership program for users.

2 Ask Phase

2.1 Business Task

Sršen asks you to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. She then wants you to select one Bellabeat product to apply these insights to in your presentation.

2.2 Questions guiding my analysis

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

2.3 Stakeholders

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
  • Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
  • Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

3 Prepare Phase

3.1 About Dataset

The dataset used for this case study is the FitBit Fitness Tracker Data, available on Kaggle. It contains personal fitness tracker data from Fitbit users who consented to the submission of their data, including minute-level output for physical activity, heart rate, and sleep monitoring. The dataset is described as covering thirty users, although the activity tables explored in section 4.2 contain 33 distinct participant IDs. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

3.2 Data Limitations

The sample for this dataset is limited to 33 participants, while the market for fitness trackers is much larger.

3.3 Preparing Dataset

Loading Packages

library(tidyverse)
## Warning in Sys.timezone(): unable to identify current timezone 'H':
## please set environment variable 'TZ'
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.0     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.1     ✔ tibble    3.1.8
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(readxl)

Loading Data from Excel Files

# readxl is already loaded above, so one library() call covers all six imports
weightLogInfo_merged <- read_excel("C:/Users/Juan Quintero/Downloads/weightLogInfo_merged.xlsx")
View(weightLogInfo_merged)
heartrate_seconds_merged <- read_excel("C:/Users/Juan Quintero/Downloads/heartrate_seconds_merged.xlsx")
View(heartrate_seconds_merged)
sleepDay_merged <- read_excel("C:/Users/Juan Quintero/Downloads/sleepDay_merged.xlsx")
View(sleepDay_merged)
hourlySteps_merged <- read_excel("C:/Users/Juan Quintero/Downloads/hourlySteps_merged.xlsx")
View(hourlySteps_merged)
hourlyCalories_merged <- read_excel("C:/Users/Juan Quintero/Downloads/hourlyCalories_merged.xlsx")
View(hourlyCalories_merged)
DailyActivites_Merged <- read_excel("C:/Users/Juan Quintero/Downloads/DailyActivites_Merged.xlsx")
View(DailyActivites_Merged)
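
The imports above all read from one hard-coded folder. As a minimal sketch of a more portable pattern, the paths can be built from a single folder variable and read in one pass. The names `data_dir` and `load_tables()` are hypothetical, and the sketch assumes the readxl package is installed and that the files keep the names used above.

```r
# Hypothetical folder holding the six Excel exports; adjust to your machine.
data_dir <- file.path("~", "Downloads")

# Table names as used in this report (file name = table name + ".xlsx").
table_names <- c("weightLogInfo_merged", "heartrate_seconds_merged",
                 "sleepDay_merged", "hourlySteps_merged",
                 "hourlyCalories_merged", "DailyActivites_Merged")

# Build one full path per table, named by table.
paths <- setNames(file.path(data_dir, paste0(table_names, ".xlsx")), table_names)

# Hypothetical helper: read only the files that exist, return a named list.
load_tables <- function(paths) {
  found <- paths[file.exists(paths)]
  lapply(found, function(p) readxl::read_excel(p))
}

raw <- load_tables(paths)   # empty list if none of the files are present
names(raw)
```

Each table is then available as `raw$sleepDay_merged` and so on, and moving the files only requires changing `data_dir`.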

4 Processing Phase

4.1 Data Cleaning

WeightLog <- distinct(weightLogInfo_merged) %>% select(-Fat) %>% rename(DateTime=Date) 
WeightLog$Date <- sapply(strsplit(as.character(WeightLog$DateTime), " "), "[",1)  
WeightLog$Time <- sapply(strsplit(as.character(WeightLog$DateTime), " "), "[",2)  
WeightLog <- mutate(subset(WeightLog,select= c(1,2,8,9,3,4,5,6,7)))
WeightLog <- WeightLog %>% select(-DateTime) 
View(WeightLog) 

I removed duplicate rows and dropped the “Fat” column because its missing (NA) values made it unreliable. I also renamed columns so that they stay consistent when tables are combined. The datetime column was split into separate date and time columns, and the columns were rearranged to give a cleaner, easier-to-read table.
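
The date and time split above works on the text form of each timestamp. As an illustration only, the sketch below does the same split with `format()` on a POSIXct column, using base R and two timestamps taken from the WeightLog output in section 4.2. The `toy` data frame is a stand-in, not the real table.

```r
# Stand-in for the imported table: two timestamps copied from the report output.
toy <- data.frame(
  Id = c(1503960366, 1927972279),
  DateTime = as.POSIXct(c("2016-05-02 23:59:59", "2016-04-13 01:08:52"), tz = "UTC")
)

# format() extracts each part with an explicit pattern.
toy$Date <- format(toy$DateTime, "%Y-%m-%d")
toy$Time <- format(toy$DateTime, "%H:%M:%S")

# Drop the combined column and keep Id, Date, Time in that order.
toy <- toy[, c("Id", "Date", "Time")]
toy
```

An explicit format string keeps the result independent of how timestamps happen to be printed, and selecting columns by name avoids having to count column positions.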

HeartRateSeconds <- distinct(heartrate_seconds_merged) %>% rename(RatePerSecond=Value) %>% rename(DateTime=Time) 
HeartRateSeconds$Date <- sapply(strsplit(as.character(HeartRateSeconds$DateTime), " "), "[",1)
HeartRateSeconds$Time <- sapply(strsplit(as.character(HeartRateSeconds$DateTime), " "), "[",2)
HeartRateSeconds <- mutate(subset(HeartRateSeconds,select=c(1,2,4,5,3)))
HeartRateSeconds <- HeartRateSeconds %>% select(-DateTime) 
View(HeartRateSeconds) # Viewing Clean Data Frame #

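Removed duplicates and renamed the “Value” column to “RatePerSecond”, which tells me more about what I am reading than “Value” does. The readings themselves (36 to 203 in the summary in section 4.2) look like beats per minute sampled every few seconds, so the name describes the sampling interval rather than the unit. I also split the datetime column into date and time columns and rearranged the columns.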

SleepDays <- distinct(sleepDay_merged) %>% rename(Date=SleepDay) 
View(SleepDays) 

Removed duplicates and renamed the SleepDay column to Date for consistency.

HourlySteps <- distinct(hourlySteps_merged) %>% rename(DateTime=ActivityHour) %>% rename(StepsTotal=StepTotal) 
HourlySteps$Date <- sapply(strsplit(as.character(HourlySteps$DateTime), " "), "[",1) 
HourlySteps$Time <- sapply(strsplit(as.character(HourlySteps$DateTime), " "), "[",2) 
HourlySteps <- mutate(subset(HourlySteps,select=c(1,2,4,5,3))) 
HourlySteps <- HourlySteps %>% select(-DateTime)
View(HourlySteps)

Removed duplicates, renamed columns, split datetime into two columns, and rearranged columns.

HourlyCalories <- distinct(hourlyCalories_merged) %>% rename(DateTime=ActivityHour) %>% rename(CaloriesBurnt=Calories) 
HourlyCalories$Date <- sapply(strsplit(as.character(HourlyCalories$DateTime), " "), "[",1) 
HourlyCalories$Time <- sapply(strsplit(as.character(HourlyCalories$DateTime), " "), "[",2) 
HourlyCalories <- mutate(subset(HourlyCalories,select=c(1,2,4,5,3))) 
HourlyCalories <- HourlyCalories %>% select(-DateTime) 
View(HourlyCalories) 

Removed duplicates, renamed columns, split the datetime column into date and time columns, and rearranged columns.

DailyActivities <- distinct(DailyActivites_Merged) %>% rename(Date=ActivityDate) 
View(DailyActivities)

Same process: removed duplicates and renamed the ActivityDate column to Date for consistency.

4.2 Exploring Tables

First 6 rows of each table

head(WeightLog)                  
## # A tibble: 6 × 8
##           Id Date       Time     WeightKg WeightPounds   BMI IsManualR…¹   LogId
##        <dbl> <chr>      <chr>       <dbl>        <dbl> <dbl> <lgl>         <dbl>
## 1 1503960366 2016-05-02 23:59:59     52.6         116.  22.6 TRUE        1.46e12
## 2 1503960366 2016-05-03 23:59:59     52.6         116.  22.6 TRUE        1.46e12
## 3 1927972279 2016-04-13 01:08:52    134.          294.  47.5 FALSE       1.46e12
## 4 2873212765 2016-04-21 23:59:59     56.7         125.  21.5 TRUE        1.46e12
## 5 2873212765 2016-05-12 23:59:59     57.3         126.  21.7 TRUE        1.46e12
## 6 4319703577 2016-04-17 23:59:59     72.4         160.  27.5 TRUE        1.46e12
## # … with abbreviated variable name ¹​IsManualReport
head(HeartRateSeconds)
## # A tibble: 6 × 4
##           Id Date       Time     RatePerSecond
##        <dbl> <chr>      <chr>            <dbl>
## 1 2022484408 2016-04-12 07:21:00            97
## 2 2022484408 2016-04-12 07:21:05           102
## 3 2022484408 2016-04-12 07:21:10           105
## 4 2022484408 2016-04-12 07:21:20           103
## 5 2022484408 2016-04-12 07:21:25           101
## 6 2022484408 2016-04-12 07:22:05            95
head(SleepDays)
## # A tibble: 6 × 5
##           Id Date                TotalSleepRecords TotalMinutesAsleep TotalTim…¹
##        <dbl> <dttm>                          <dbl>              <dbl>      <dbl>
## 1 1503960366 2016-04-12 00:00:00                 1                327        346
## 2 1503960366 2016-04-13 00:00:00                 2                384        407
## 3 1503960366 2016-04-15 00:00:00                 1                412        442
## 4 1503960366 2016-04-16 00:00:00                 2                340        367
## 5 1503960366 2016-04-17 00:00:00                 1                700        712
## 6 1503960366 2016-04-19 00:00:00                 1                304        320
## # … with abbreviated variable name ¹​TotalTimeInBed
head(HourlySteps)
## # A tibble: 6 × 4
##           Id Date       Time     StepsTotal
##        <dbl> <chr>      <chr>         <dbl>
## 1 1503960366 2016-04-12 00:00:00        373
## 2 1503960366 2016-04-12 01:00:00        160
## 3 1503960366 2016-04-12 02:00:00        151
## 4 1503960366 2016-04-12 03:00:00          0
## 5 1503960366 2016-04-12 04:00:00          0
## 6 1503960366 2016-04-12 05:00:00          0
head(HourlyCalories)
## # A tibble: 6 × 4
##           Id Date       Time     CaloriesBurnt
##        <dbl> <chr>      <chr>            <dbl>
## 1 1503960366 2016-04-12 00:00:00            81
## 2 1503960366 2016-04-12 01:00:00            61
## 3 1503960366 2016-04-12 02:00:00            59
## 4 1503960366 2016-04-12 03:00:00            47
## 5 1503960366 2016-04-12 04:00:00            48
## 6 1503960366 2016-04-12 05:00:00            48
head(DailyActivities)
## # A tibble: 6 × 15
##           Id Date                Total…¹ Total…² Track…³ Logge…⁴ VeryA…⁵ Moder…⁶
##        <dbl> <dttm>                <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 1503960366 2016-04-12 00:00:00   13162    8.5     8.5        0    1.88   0.550
## 2 1503960366 2016-04-13 00:00:00   10735    6.97    6.97       0    1.57   0.690
## 3 1503960366 2016-04-14 00:00:00   10460    6.74    6.74       0    2.44   0.400
## 4 1503960366 2016-04-15 00:00:00    9762    6.28    6.28       0    2.14   1.26 
## 5 1503960366 2016-04-16 00:00:00   12669    8.16    8.16       0    2.71   0.410
## 6 1503960366 2016-04-17 00:00:00    9705    6.48    6.48       0    3.19   0.780
## # … with 7 more variables: LightActiveDistance <dbl>,
## #   SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## #   FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## #   SedentaryMinutes <dbl>, Calories <dbl>, and abbreviated variable names
## #   ¹​TotalSteps, ²​TotalDistance, ³​TrackerDistance, ⁴​LoggedActivitiesDistance,
## #   ⁵​VeryActiveDistance, ⁶​ModeratelyActiveDistance

Identifying all the Columns

colnames(WeightLog)            
## [1] "Id"             "Date"           "Time"           "WeightKg"      
## [5] "WeightPounds"   "BMI"            "IsManualReport" "LogId"
colnames(HeartRateSeconds)
## [1] "Id"            "Date"          "Time"          "RatePerSecond"
colnames(SleepDays)
## [1] "Id"                 "Date"               "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
colnames(HourlySteps)
## [1] "Id"         "Date"       "Time"       "StepsTotal"
colnames(HourlyCalories)
## [1] "Id"            "Date"          "Time"          "CaloriesBurnt"
colnames(DailyActivities)
##  [1] "Id"                       "Date"                    
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"

Identifying the Number of Participants

n_distinct(WeightLog$Id)  
## [1] 8
n_distinct(HeartRateSeconds$Id)             
## [1] 14
n_distinct(SleepDays$Id) 
## [1] 24
n_distinct(HourlySteps$Id) 
## [1] 33
n_distinct(HourlyCalories$Id) 
## [1] 33
n_distinct(DailyActivities$Id) 
## [1] 33

Identifying the Number of Rows for each Data Set

nrow(WeightLog)  
## [1] 67
nrow(HeartRateSeconds)                 
## [1] 2483658
nrow(SleepDays) 
## [1] 410
nrow(HourlySteps)  
## [1] 22099
nrow(HourlyCalories)
## [1] 22099
nrow(DailyActivities)
## [1] 940

Brief Summary for each Data Frame

WeightLog %>% select(WeightKg, WeightPounds, BMI) %>% summary()
##     WeightKg       WeightPounds        BMI       
##  Min.   : 52.60   Min.   :116.0   Min.   :21.45  
##  1st Qu.: 61.40   1st Qu.:135.4   1st Qu.:23.96  
##  Median : 62.50   Median :137.8   Median :24.39  
##  Mean   : 72.04   Mean   :158.8   Mean   :25.19  
##  3rd Qu.: 85.05   3rd Qu.:187.5   3rd Qu.:25.56  
##  Max.   :133.50   Max.   :294.3   Max.   :47.54
HeartRateSeconds %>% select(RatePerSecond) %>% summary()
##  RatePerSecond   
##  Min.   : 36.00  
##  1st Qu.: 63.00  
##  Median : 73.00  
##  Mean   : 77.33  
##  3rd Qu.: 88.00  
##  Max.   :203.00
SleepDays %>% select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>% summary()  
##  TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
##  Min.   :1.00      Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:1.00      1st Qu.:361.0      1st Qu.:403.8  
##  Median :1.00      Median :432.5      Median :463.0  
##  Mean   :1.12      Mean   :419.2      Mean   :458.5  
##  3rd Qu.:1.00      3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :3.00      Max.   :796.0      Max.   :961.0
HourlySteps %>% select(StepsTotal) %>% summary()
##    StepsTotal     
##  Min.   :    0.0  
##  1st Qu.:    0.0  
##  Median :   40.0  
##  Mean   :  320.2  
##  3rd Qu.:  357.0  
##  Max.   :10554.0
HourlyCalories %>% select(CaloriesBurnt) %>% summary()
##  CaloriesBurnt   
##  Min.   : 42.00  
##  1st Qu.: 63.00  
##  Median : 83.00  
##  Mean   : 97.39  
##  3rd Qu.:108.00  
##  Max.   :948.00
DailyActivities %>% select(TotalSteps,TotalDistance,SedentaryMinutes) %>% summary()
##    TotalSteps    TotalDistance    SedentaryMinutes
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0  
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8  
##  Median : 7406   Median : 5.245   Median :1057.5  
##  Mean   : 7638   Mean   : 5.490   Mean   : 991.2  
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5  
##  Max.   :36019   Max.   :28.030   Max.   :1440.0

5 Analysis Phase

Creating Health Reports Based on Participants' BMI

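WeightLog <- WeightLog %>% select(Id,Date,Time,WeightKg,WeightPounds,BMI,IsManualReport,LogId) %>% mutate(HealthStatus = case_when(
  is.na(BMI) ~ NA_character_,   # keep a missing BMI as missing instead of labelling it 'AtRisk'
  BMI < 18.5 ~ 'UnderWeight',
  BMI < 25 ~ 'Normal',          # conditions are checked in order, so this covers 18.5 up to 25 with no gap
  BMI < 30 ~ 'OverWeight',
  BMI < 40 ~ 'Obese',
  TRUE ~ 'AtRisk'               # BMI of 40 or above
   ))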
WeightLog <- mutate(subset(WeightLog,select= c(1,2,3,4,5,6,9,7,8)))
HealthReport <- WeightLog %>% distinct(Id,HealthStatus)
View(HealthReport)

Because the weight table only offers body mass index and weight measurements, I used case_when() to create a new column describing the health status of each record. I then copied the distinct Id and status pairs into a new data frame, HealthReport, which covers the only 8 participants who logged their weight. For more information on BMI categories, see https://www.nhlbi.nih.gov/health/educational/lose_wt/BMI/bmicalc.htm.
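
The BMI bands can also be expressed with base R's `cut()`, which makes the boundaries explicit and leaves no gap between bands. This is a sketch of an alternative, not the code used to produce the tables in this report. `bmi_status()` is a hypothetical helper, and the cut points (18.5, 25, 30, 40) are my reading of the bands above.

```r
# Hypothetical helper: map BMI values to the labels used in this report.
bmi_status <- function(bmi) {
  as.character(cut(
    bmi,
    breaks = c(-Inf, 18.5, 25, 30, 40, Inf),
    labels = c("UnderWeight", "Normal", "OverWeight", "Obese", "AtRisk"),
    right = FALSE   # intervals are [lower, upper), so 25 counts as OverWeight
  ))
}

# BMI values taken from the WeightLog output above, plus boundary and missing cases.
bmi_status(c(22.6, 47.5, 27.5, 24.95, 25, NA))
```

A value such as 24.95 lands in "Normal" and a missing BMI stays missing, which are the two cases most easily mislabelled when bands are written as closed ranges with a catch-all at the end.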

Looking into steps taken by each participant throughout the day

AverageHourlySteps <- HourlySteps %>% group_by(Time) %>% summarize(AverageSteps=mean(StepsTotal))
View(AverageHourlySteps)

I wanted to know the average number of steps taken at each hour of the day across all participants.

HourlyStepsViz <- ggplot(data=AverageHourlySteps)+
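  # fill is set outside aes() so the bars are drawn in coral1 instead of being mapped as a data value
  geom_col(mapping=aes(x=Time,y=AverageSteps), fill='coral1')+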
  labs(title="Hourly Steps Throughout The Day")+
  theme(axis.text.x=element_text(angle=90))
plot(HourlyStepsViz)

Participants seem to be most active between 5 pm and 7 pm on a typical day.
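
To back the reading of the chart with numbers, the hourly averages can be sorted so the busiest hours come first. The sketch below uses made-up averages purely to show the pattern; `avg_steps` stands in for AverageHourlySteps and its values are not taken from this dataset.

```r
# Made-up hourly averages for illustration; not the values from this dataset.
avg_steps <- data.frame(
  Time = c("16:00:00", "17:00:00", "18:00:00", "19:00:00", "20:00:00"),
  AverageSteps = c(400, 550, 600, 580, 350)
)

# Sort from most to least active hour and keep the top three.
top_hours <- head(avg_steps[order(-avg_steps$AverageSteps), ], 3)
top_hours$Time
```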

Combining total steps taken and calories burnt by each participant

HourlyStepsTotal <- HourlySteps %>% group_by(Id) %>% summarise(HourlySteps=sum(StepsTotal))    
HourlyStepsTotal <- HourlyStepsTotal %>% rename(TotalSteps=HourlySteps)
head(HourlyStepsTotal)
## # A tibble: 6 × 2
##           Id TotalSteps
##        <dbl>      <dbl>
## 1 1503960366     374546
## 2 1624580081     177750
## 3 1644430081     217927
## 4 1844505072      79942
## 5 1927972279      28400
## 6 2022484408     351712
HourlyCaloriesTotal <- HourlyCalories %>% group_by(Id) %>% summarise(HourlyCalories=sum(CaloriesBurnt))    
HourlyCaloriesTotal <- HourlyCaloriesTotal %>% rename(TotalBurntCalories=HourlyCalories)
head(HourlyCaloriesTotal)
## # A tibble: 6 × 2
##           Id TotalBurntCalories
##        <dbl>              <dbl>
## 1 1503960366              56287
## 2 1624580081              45980
## 3 1644430081              84125
## 4 1844505072              48681
## 5 1927972279              67347
## 6 2022484408              77633
CaloriesBurntByStep <- merge(HourlyStepsTotal,HourlyCaloriesTotal, by="Id", all=TRUE)
head(CaloriesBurntByStep)
##           Id TotalSteps TotalBurntCalories
## 1 1503960366     374546              56287
## 2 1624580081     177750              45980
## 3 1644430081     217927              84125
## 4 1844505072      79942              48681
## 5 1927972279      28400              67347
## 6 2022484408     351712              77633
HealthEvaluation <- merge(CaloriesBurntByStep, HealthReport, by="Id")
(HealthEvaluation)
##           Id TotalSteps TotalBurntCalories HealthStatus
## 1 1503960366     374546              56287       Normal
## 2 1927972279      28400              67347       AtRisk
## 3 2873212765     234130              59059       Normal
## 4 4319703577     209464              61904   OverWeight
## 5 4558609924     237909              63046   OverWeight
## 6 5577150313     248519             100816   OverWeight
## 7 6962181067     303621              61461       Normal
## 8 8877689391     495623             105746   OverWeight

Merging with the health report narrows the table to the 8 of 33 participants who have step totals, calorie totals, and a logged weight. The remaining participants have no weight records, so they drop out of the merge.
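
The row counts change because `merge()` keeps only matching Ids unless `all=TRUE` is given. A small sketch with toy tables (the Ids and values are made up) shows the difference between the two calls used above:

```r
# Two toy tables: steps for three participants, weight status for two.
steps  <- data.frame(Id = c(1, 2, 3), TotalSteps = c(1000, 2000, 3000))
status <- data.frame(Id = c(2, 4), HealthStatus = c("Normal", "OverWeight"))

inner <- merge(steps, status, by = "Id")               # only Ids found in both tables
outer <- merge(steps, status, by = "Id", all = TRUE)   # every Id, NA where data is missing

nrow(inner)
nrow(outer)
```

Only Id 2 appears in both tables, so the default merge returns one row, while `all = TRUE` returns four rows with NA filling the gaps.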

Weekday with most sleep time

SleepDate <- SleepDays %>% distinct(Date)
SleepDate$weekday <- weekdays(SleepDate$Date)          

SleepDays <- merge(SleepDays,SleepDate, by= "Date")
SleepDays <- mutate(subset(SleepDays,select=c(2,1,6,3,4,5)))
SleepDays <- SleepDays %>% rename(Weekday=weekday)
options(width=150)
head(SleepDays)
##           Id       Date Weekday TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## 1 1503960366 2016-04-12 Tuesday                 1                327            346
## 2 8378563200 2016-04-12 Tuesday                 1                338            356
## 3 5577150313 2016-04-12 Tuesday                 1                419            438
## 4 4020332650 2016-04-12 Tuesday                 1                501            541
## 5 5553957443 2016-04-12 Tuesday                 1                441            464
## 6 4445114986 2016-04-12 Tuesday                 2                429            457

To better understand each participant's sleep schedule, I added a Weekday column.

WeekDays <- SleepDays$Weekday %>% factor(levels=c("Sunday","Monday","Tuesday", "Wednesday", "Thursday","Friday","Saturday"))
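# fill is set outside aes() so the bars are drawn in coral1 instead of being mapped as a data value
WeeklySleepViz <- ggplot(data=SleepDays)+geom_col(mapping=aes(x=WeekDays,y=TotalMinutesAsleep), fill='coral1') +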
  labs(title= "Total Minutes Asleep Throughout the Week") + 
  annotate("text",x="Thursday",y=30000, label = "Most Sleep seems to be in the middle of the week")
plot(WeeklySleepViz)

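There seems to be early burnout, as the highest total of minutes asleep falls on Wednesday. The weekend comes second, which makes sense if most of these participants have day jobs and work five days a week. Because these are totals rather than averages, a weekday with more sleep records will also show a larger total, so the pattern should be checked against per-night averages.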
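
A sketch of that check, using only the six rows shown by head(SleepDays) in section 4.2 (a single participant, so the numbers are illustrative rather than a result). The weekday names are derived from the dates in that output.

```r
# Six rows copied from head(SleepDays) in section 4.2 (one participant only).
sleep <- data.frame(
  Weekday = c("Tuesday", "Wednesday", "Friday", "Saturday", "Sunday", "Tuesday"),
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304)
)

# Total, number of records, and mean per weekday.
total   <- tapply(sleep$TotalMinutesAsleep, sleep$Weekday, sum)
records <- tapply(sleep$TotalMinutesAsleep, sleep$Weekday, length)

by_day <- data.frame(
  Weekday = names(total),
  Total   = as.vector(total),
  Records = as.vector(records),
  Mean    = as.vector(total / records)
)
by_day
```

In these rows Tuesday's total (631 minutes) is well above Friday's (412) only because Tuesday appears twice; its mean (315.5) is the lowest of the five weekdays.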

Correlation Between Total Time in Bed and Sleep Time

SleepTimeViz <- ggplot(data=SleepDays, aes(x=TotalMinutesAsleep,y=TotalTimeInBed))+
  geom_point(color='coral1')+geom_smooth(method=lm, se=FALSE,color = 'black', linewidth=1)+ 
  labs(title="Correlation between Minutes In Bed and Being Asleep")
plot(SleepTimeViz)
## `geom_smooth()` using formula = 'y ~ x'

There is a positive correlation between total time in bed and total minutes asleep. Since the two are closely related, Bellabeat can further explore tracking users' sleep cycles to better understand how users function with more or less sleep.
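
The relationship in the scatter plot can also be put into numbers. The sketch below uses the six rows from head(SleepDays) in section 4.2 to compute a Pearson correlation and a simple "sleep efficiency" ratio (minutes asleep divided by minutes in bed). The ratio is my own illustrative measure, not a column in the dataset.

```r
# Six rows copied from head(SleepDays) in section 4.2.
asleep <- c(327, 384, 412, 340, 700, 304)   # TotalMinutesAsleep
in_bed <- c(346, 407, 442, 367, 712, 320)   # TotalTimeInBed

# Pearson correlation between the two columns.
r <- cor(asleep, in_bed)

# Share of the time in bed that was spent asleep, per record.
efficiency <- asleep / in_bed

round(r, 3)
round(efficiency, 2)
```

Run on the full SleepDays table, the same two lines would show how strong the correlation is and which nights had a large gap between time in bed and time asleep.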

Sleep Report to Health Evaluation

OverallTimeInBed <- SleepDays %>% group_by(Id) %>% summarise(OverallTimeInBed=sum(TotalTimeInBed))
OverallMinutesAsleep <- SleepDays %>% group_by(Id) %>% summarise(OverallMinutesAsleep=sum(TotalMinutesAsleep))
SleepReport <- merge(OverallMinutesAsleep, OverallTimeInBed, by="Id")
HealthEvaluation <- merge(SleepReport,HealthEvaluation, by="Id")

Identifying Daily Activities to Health Performance

n_distinct(DailyActivities$Date)
## [1] 31

Participants' daily activities were tracked over 31 days, or about a month.

ActivityDate <- DailyActivities %>% distinct(Date)
ActivityDate$WeekDay <- weekdays(ActivityDate$Date)                             
DailyActivities <- merge(DailyActivities, ActivityDate, by= "Date")
DailyActivities <- mutate(subset(DailyActivities,select=c(2,1,16,3,4,5,6,7,8,9,10,11,12,13,14,15)))
ActiveViz <- ggplot(data=DailyActivities, aes(x=TotalSteps, y=Calories))+ 
  geom_point(color='coral1') + geom_smooth(method=lm, se=FALSE,color = 'black', linewidth=1)+     
  labs(title = 'Calories Burnt By Steps Taken')
ActiveViz
## `geom_smooth()` using formula = 'y ~ x'

There is also a positive correlation between steps taken and calories burnt. This can encourage Bellabeat users to track their day-to-day steps and see how many calories they burn as a result. Health should be the focal point for Bellabeat.
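
The black line in the plot is a linear fit, and its coefficients can be read directly with `lm()`. The sketch below uses made-up daily values constructed so that the answer is known in advance; `daily` is a stand-in for DailyActivities, not real data.

```r
# Made-up daily values built as Calories = 1500 + 0.1 * TotalSteps,
# so the fitted coefficients are known in advance.
daily <- data.frame(TotalSteps = c(0, 2500, 5000, 7500, 10000))
daily$Calories <- 1500 + 0.1 * daily$TotalSteps

# Same model form that geom_smooth(method = lm) draws: Calories ~ TotalSteps.
fit <- lm(Calories ~ TotalSteps, data = daily)
coef(fit)   # intercept = calories at zero steps, slope = extra calories per step
```

Fitting the same formula to DailyActivities would give the slope behind the plotted line, which is a concrete figure for how many additional calories go with each extra step.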

Heart Rate Per Second for each Participant

HeartDate <- HeartRateSeconds %>% distinct(Date)
HeartDate$Date = as.Date(strptime(HeartDate$Date, "%Y-%m-%d"))    
HeartRateSeconds$Date = as.Date(strptime(HeartRateSeconds$Date, "%Y-%m-%d"))
str(HeartRateSeconds)
## tibble [2,483,658 × 4] (S3: tbl_df/tbl/data.frame)
##  $ Id           : num [1:2483658] 2.02e+09 2.02e+09 2.02e+09 2.02e+09 2.02e+09 ...
##  $ Date         : Date[1:2483658], format: "2016-04-12" "2016-04-12" "2016-04-12" "2016-04-12" ...
##  $ Time         : chr [1:2483658] "07:21:00" "07:21:05" "07:21:10" "07:21:20" ...
##  $ RatePerSecond: num [1:2483658] 97 102 105 103 101 95 91 93 94 93 ...
HeartDate$WeekDay <- weekdays(HeartDate$Date)   # derive weekdays from HeartDate's own dates rather than SleepDate's
HeartRateSeconds <- merge(HeartRateSeconds,HeartDate, by= "Date")
HeartRateSeconds <- mutate(subset(HeartRateSeconds, select=c(2,1,5,3,4)))
HeartRateMonitorViz <- ggplot(data=HeartRateSeconds)+
  # fill is set outside aes() so the bars are drawn in coral1 instead of being mapped as a data value
  geom_col(mapping=aes(x=factor(WeekDay, levels=c("Sunday","Monday","Tuesday", "Wednesday", "Thursday","Friday","Saturday")),
                       y=RatePerSecond), fill='coral1')+
  facet_wrap(~Id)+
  labs(title="Monitoring Heart Rate Per Second Throughout The Week",x="WeekDays")+
  theme(axis.text.x=element_text(angle=90))
plot(HeartRateMonitorViz)

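Monitoring heart rate can help users improve their fitness level. The chart shows the recorded heart rate readings by weekday for each of the 14 participants with heart rate records. Because geom_col() stacks every reading, the bar heights reflect the number of readings as well as their values. Continuing the health theme, Bellabeat's marketing can highlight how heart rate tracking may help surface developing health problems.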
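
The weekday columns in this report come from `weekdays()`, which returns day names in the language of the R session. As a sketch of a locale-independent alternative, the weekday can be derived from a table's own Date column through the numeric weekday stored in POSIXlt. The dates below are copied from head(SleepDays) in section 4.2, and `day_names` is a helper vector introduced here for illustration.

```r
# Dates copied from head(SleepDays) in section 4.2.
dates <- as.Date(c("2016-04-12", "2016-04-13", "2016-04-15",
                   "2016-04-16", "2016-04-17", "2016-04-19"))

# POSIXlt stores the weekday as 0 (Sunday) to 6 (Saturday), independent of locale.
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
weekday <- factor(day_names[as.POSIXlt(dates)$wday + 1], levels = day_names)

as.character(weekday)
```

Because the result is already a factor with Sunday-to-Saturday levels, it can be mapped to a plot axis directly without re-specifying the level order in each chart.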

Monitoring Inactivity

ggplot(data=DailyActivities)+
  # fill is set outside aes() so the bars are drawn in coral1 instead of being mapped as a data value
  geom_col(mapping=aes(x=SedentaryMinutes,y=factor(WeekDay,levels=c("Sunday","Monday","Tuesday", "Wednesday", "Thursday","Friday","Saturday"))),fill='coral1')+labs(title="Weekly Inactive Minutes",y='WeekDays')

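Total sedentary minutes are highest from Tuesday to Thursday; these three days stand well above the rest of the week. Part of this may be a counting effect: if the 31 tracked days run from Tuesday 2016-04-12 without gaps, then Tuesday, Wednesday, and Thursday each occur five times and the other weekdays four times, which by itself raises those three totals. Reminders based on inactivity patterns can still go a long way toward encouraging consistent activity throughout the week. Bellabeat could send a notification when a user has been inactive for too long.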

InactiveMinutes <- DailyActivities %>% group_by(Id) %>% summarise(Inactivity=sum(SedentaryMinutes))
HealthEvaluation <- merge(HealthEvaluation, InactiveMinutes, by="Id")  
HealthEvaluation <- mutate(subset(HealthEvaluation,select=c(1,2,3,7,4,5,6)))
n_distinct(HealthEvaluation$Id)
## [1] 6
head(HealthEvaluation)
##           Id OverallMinutesAsleep OverallTimeInBed Inactivity TotalSteps TotalBurntCalories HealthStatus
## 1 1503960366                 9007             9580      26293     374546              56287       Normal
## 2 1927972279                 2085             2189      40840      28400              67347       AtRisk
## 3 4319703577                12393            13051      22810     209464              61904   OverWeight
## 4 4558609924                  638              700      33902     237909              63046   OverWeight
## 5 5577150313                11232            11976      22633     248519             100816   OverWeight
## 6 6962181067                13888            14450      20532     303621              61461       Normal

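To continue the health evaluation, I added total sedentary minutes as well. The table now covers the 6 participants who have records in every category; the drop from 8 comes from the inner merges, which keep only Ids present in both tables being combined.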
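
The set of participants that survives a chain of inner merges can be found up front by intersecting the Id columns. A minimal sketch with toy Id sets (the numbers are made up and `ids` is a stand-in for the Id columns of the real tables):

```r
# Toy Id sets standing in for the Id columns of four tables.
ids <- list(
  weight   = c(1, 2, 3, 4),
  sleep    = c(1, 2, 4, 5),
  steps    = c(1, 2, 3, 4, 5, 6),
  activity = c(1, 2, 3, 4, 5, 6)
)

# Ids present in every table, i.e. the ones that survive a chain of inner merges.
complete_ids <- Reduce(intersect, ids)
complete_ids
```

Here only Ids 1, 2, and 4 appear in all four sets. Applied to the real tables, the same call would list the participants with complete records before any merging is done.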

6 Share Phase

7 Act Phase

7.1 Conclusion

Bellabeat's main focus should be to increase engagement with its users by providing reminders, notifications, and health evaluations. It is easy to gain a following, but the goal is to maintain one as well and to prevent any decrease in monthly users. That is why it is important to show customers how our services will benefit them in the long run and to offer constant motivation to improve, even when results show a decline in activity. Small things, such as congratulating a customer for walking farther today than yesterday, will go a long way. Diet plans based on BMI or health status can help users become more knowledgeable about their health and what they eat. Reassurance prevents giving up, and that is what Bellabeat should strive for.

7.2 Recommendations

  • Social Media Page
  • Free 1 month trial
  • Notifications and Reminders
  • Weekly or Monthly Health Evaluation
  • Dietary and Nutritional Advice
  • Adding more tracking features such as a timer, or creating schedules