Bellabeat was founded by Urska Srsen and Sando Mur in 2013. It is a high-tech company focused on women's wellness, using beautifully designed technology that informs and inspires.
Bellabeat app: provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits.
Leaf: Bellabeat's wellness tracker, which can be worn as a bracelet, necklace, or clip. It connects to the Bellabeat app to track activity, sleep, and stress.
Time: a wellness smart watch that also connects to the Bellabeat app and tracks user activity, sleep, and stress.
Spring: a water bottle that tracks daily water intake and connects to the Bellabeat app.
Bellabeat membership: a subscription-based membership that gives users 24/7 access to personalized guidance on nutrition, activity, sleep, health, beauty, and mindfulness, based on their lifestyle and goals.
Urska Srsen: cofounder and Chief Creative Officer of Bellabeat
Sando Mur: mathematician and Bellabeat cofounder
Bellabeat marketing analytics team
I am a junior data analyst on the Bellabeat marketing team, tasked with analyzing smart device data to help unlock new growth opportunities. The analysis focuses on one Bellabeat product and uses Fitbit data to gain insight into how consumers are using their smart devices.
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?
FitBit Fitness Tracker Data by Mobius is a well-documented dataset generated from a survey distributed via Amazon Mechanical Turk between March 12, 2016 and May 12, 2016. Thirty users consented to submit personal tracker data covering physical activity, sleep, and heart rate.
Data from 2016 with a small sample of 30 participants is outdated and too limited to be representative for business decisions, but it will be used here to showcase my data analysis skills and provide insight.
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(ggplot2)
library(readr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ stringr 1.5.1
## ✔ forcats 1.0.0 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
dailyActivity <- read_csv("dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
dailyCalories <- read_csv("dailyCalories_merged.csv")
## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepDay <- read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weightLogInfo <- read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Take a look at what type of data is collected
head(dailyActivity)
## # A tibble: 6 × 15
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 13162 8.5 8.5
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## # ℹ 10 more variables: LoggedActivitiesDistance <dbl>,
## # VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## # LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## # VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## # LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>
head(dailyCalories)
## # A tibble: 6 × 3
## Id ActivityDay Calories
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
head(sleepDay)
## # A tibble: 6 × 5
## Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:0… 1 327 346
## 2 1503960366 4/13/2016 12:0… 2 384 407
## 3 1503960366 4/15/2016 12:0… 1 412 442
## 4 1503960366 4/16/2016 12:0… 2 340 367
## 5 1503960366 4/17/2016 12:0… 1 700 712
## 6 1503960366 4/19/2016 12:0… 1 304 320
head(weightLogInfo)
## # A tibble: 6 × 8
## Id Date WeightKg WeightPounds Fat BMI IsManualReport LogId
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
## 1 1503960366 5/2/2016 … 52.6 116. 22 22.6 TRUE 1.46e12
## 2 1503960366 5/3/2016 … 52.6 116. NA 22.6 TRUE 1.46e12
## 3 1927972279 4/13/2016… 134. 294. NA 47.5 FALSE 1.46e12
## 4 2873212765 4/21/2016… 56.7 125. NA 21.5 TRUE 1.46e12
## 5 2873212765 5/12/2016… 57.3 126. NA 21.7 TRUE 1.46e12
## 6 4319703577 4/17/2016… 72.4 160. 25 27.5 TRUE 1.46e12
Identify all columns in the data
colnames(dailyActivity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
colnames(dailyCalories)
## [1] "Id" "ActivityDay" "Calories"
colnames(sleepDay)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
colnames(weightLogInfo)
## [1] "Id" "Date" "WeightKg" "WeightPounds"
## [5] "Fat" "BMI" "IsManualReport" "LogId"
n_distinct(dailyActivity$Id)
## [1] 33
n_distinct(sleepDay$Id)
## [1] 24
n_distinct(dailyCalories$Id)
## [1] 33
n_distinct(weightLogInfo$Id)
## [1] 8
Because only 8 participants logged their weight, we will not use the weight data.
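As a rough illustration of the participant check above, the sketch below counts distinct participants and log entries per participant. The data frame weight_toy is a hypothetical stand-in that only reuses the six Id values shown in the head(weightLogInfo) output; it is not the full weight log.

```r
library(dplyr)

# Hypothetical toy stand-in for weightLogInfo: only the Id column,
# with values copied from the head() output above.
weight_toy <- tibble(
  Id = c(1503960366, 1503960366, 1927972279,
         2873212765, 2873212765, 4319703577)
)

# How many distinct participants appear in the toy data
n_participants <- n_distinct(weight_toy$Id)

# How many log entries each participant contributed
entries_per_id <- weight_toy %>%
  count(Id, name = "n_entries")

print(n_participants)
print(entries_per_id)
```

Looking at entries per participant, and not only the participant count, can help judge whether a small table is worth keeping.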
sum(duplicated(dailyActivity))
## [1] 0
sum(duplicated(sleepDay))
## [1] 3
sum(duplicated(dailyCalories))
## [1] 0
Let's remove the duplicates, and any rows with missing values, from sleepDay.
sleepDay <- sleepDay %>%
  distinct() %>%
  drop_na()
sum(duplicated(sleepDay))
## [1] 0
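Before dropping rows, it can be useful to look at what will be removed. The sketch below does this on a hypothetical toy table, sleep_toy, whose values are made up for illustration (one row is deliberately repeated); it is not the real sleepDay data.

```r
library(dplyr)

# Hypothetical toy data: the third row is an exact copy of the second.
sleep_toy <- tibble(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384, 384, 412)
)

# Count and inspect fully duplicated rows
n_dupes <- sum(duplicated(sleep_toy))
dupe_rows <- sleep_toy[duplicated(sleep_toy), ]

# Count rows that contain at least one missing value
n_incomplete <- sum(!complete.cases(sleep_toy))

# Keep one copy of each row
sleep_clean <- distinct(sleep_toy)

print(dupe_rows)
print(nrow(sleep_clean))
```

Checking n_incomplete first shows how many rows drop_na() would remove in addition to the duplicates.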
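Preview the column names in snake_case with clean_names(). The results are not assigned, so the original column names are kept for the rest of the analysis.
clean_names(dailyActivity)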
## # A tibble: 940 × 15
## id activity_date total_steps total_distance tracker_distance
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 13162 8.5 8.5
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## 7 1503960366 4/18/2016 13019 8.59 8.59
## 8 1503960366 4/19/2016 15506 9.88 9.88
## 9 1503960366 4/20/2016 10544 6.68 6.68
## 10 1503960366 4/21/2016 9819 6.34 6.34
## # ℹ 930 more rows
## # ℹ 10 more variables: logged_activities_distance <dbl>,
## # very_active_distance <dbl>, moderately_active_distance <dbl>,
## # light_active_distance <dbl>, sedentary_active_distance <dbl>,
## # very_active_minutes <dbl>, fairly_active_minutes <dbl>,
## # lightly_active_minutes <dbl>, sedentary_minutes <dbl>, calories <dbl>
clean_names(sleepDay)
## # A tibble: 410 × 5
## id sleep_day total_sleep_records total_minutes_asleep total_time_in_bed
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1.50e9 4/12/201… 1 327 346
## 2 1.50e9 4/13/201… 2 384 407
## 3 1.50e9 4/15/201… 1 412 442
## 4 1.50e9 4/16/201… 2 340 367
## 5 1.50e9 4/17/201… 1 700 712
## 6 1.50e9 4/19/201… 1 304 320
## 7 1.50e9 4/20/201… 1 360 377
## 8 1.50e9 4/21/201… 1 325 364
## 9 1.50e9 4/23/201… 1 361 384
## 10 1.50e9 4/24/201… 1 430 449
## # ℹ 400 more rows
clean_names(dailyCalories)
## # A tibble: 940 × 3
## id activity_day calories
## <dbl> <chr> <dbl>
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
## 7 1503960366 4/18/2016 1921
## 8 1503960366 4/19/2016 2035
## 9 1503960366 4/20/2016 1786
## 10 1503960366 4/21/2016 1775
## # ℹ 930 more rows
Storing dates as character strings could cause problems, so we convert them to proper dates in each data frame.
dailyActivity <- dailyActivity %>%
  mutate(ActivityDate = as_date(ActivityDate, format = "%m/%d/%Y")) %>%
  rename(date = ActivityDate)
sleepDay <- sleepDay %>%
  mutate(SleepDay = as_date(SleepDay, format = "%m/%d/%Y %I:%M:%S %p")) %>%
  rename(date = SleepDay)
dailyCalories <- dailyCalories %>%
  mutate(ActivityDay = as_date(ActivityDay, format = "%m/%d/%Y")) %>%
  rename(date = ActivityDay)
Now we check to make sure it is all properly formatted
str(dailyActivity)
## tibble [940 × 15] (S3: tbl_df/tbl/data.frame)
## $ Id : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ date : Date[1:940], format: "2016-04-12" "2016-04-13" ...
## $ TotalSteps : num [1:940] 13162 10735 10460 9762 12669 ...
## $ TotalDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : num [1:940] 728 776 1218 726 773 ...
## $ Calories : num [1:940] 1985 1797 1776 1745 1863 ...
str(sleepDay)
## tibble [410 × 5] (S3: tbl_df/tbl/data.frame)
## $ Id : num [1:410] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ date : Date[1:410], format: "2016-04-12" "2016-04-13" ...
## $ TotalSleepRecords : num [1:410] 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: num [1:410] 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : num [1:410] 346 407 442 367 712 320 377 364 384 449 ...
str(dailyCalories)
## tibble [940 × 3] (S3: tbl_df/tbl/data.frame)
## $ Id : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ date : Date[1:940], format: "2016-04-12" "2016-04-13" ...
## $ Calories: num [1:940] 1985 1797 1776 1745 1863 ...
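The date conversion used above could also be sketched with lubridate's order-based parsers, which avoid writing format strings by hand. This is an alternative under the assumption that every value follows the month/day/year layout seen in the head() output; the two input strings are illustrative examples in that layout.

```r
library(lubridate)

# Date-only strings, as in dailyActivity and dailyCalories
activity_date <- mdy("4/12/2016")

# Date-time strings with AM/PM, as in sleepDay; keep only the date part
sleep_date <- as_date(mdy_hms("4/12/2016 12:00:00 AM"))

print(activity_date)
print(sleep_date)
```

Either approach should yield a Date column that can serve as a join key alongside Id.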
dailyActivity %>%
  select(TotalSteps, TotalDistance, SedentaryMinutes) %>%
  summary()
## TotalSteps TotalDistance SedentaryMinutes
## Min. : 0 Min. : 0.000 Min. : 0.0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8
## Median : 7406 Median : 5.245 Median :1057.5
## Mean : 7638 Mean : 5.490 Mean : 991.2
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5
## Max. :36019 Max. :28.030 Max. :1440.0
Using the median because of potential outliers, we can see that users typically walk 7,406 steps a day and travel a distance of 5.245 miles. The median sedentary time is 1,057.5 minutes, or about 17.6 hours a day.
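The minutes-to-hours conversion can be checked with a few lines of arithmetic. The only input is the median SedentaryMinutes from the summary() output; the variable names are illustrative.

```r
# Median sedentary minutes, taken from the summary() output above
median_sedentary_min <- 1057.5

# Convert to hours and to a share of the 1,440 minutes in a day
median_sedentary_hr <- median_sedentary_min / 60
share_of_day <- median_sedentary_min / 1440

print(median_sedentary_hr)           # 17.625 hours
print(round(100 * share_of_day, 1))  # percent of the day
```

Expressed as a share of the day, the median record is sedentary for roughly three quarters of the 24 hours.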
sleepDay %>%
  select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
  summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.00 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.00 1st Qu.:361.0 1st Qu.:403.8
## Median :1.00 Median :432.5 Median :463.0
## Mean :1.12 Mean :419.2 Mean :458.5
## 3rd Qu.:1.00 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.00 Max. :796.0 Max. :961.0
Using the median because of potential outliers, we can see that users typically spend 432.5 minutes (7 hours and 12.5 minutes) asleep and 463 minutes in bed. Subtracting time asleep from time in bed gives roughly 30.5 minutes spent awake in bed.
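The time awake in bed can also be computed per record before summarizing, which is not always the same as subtracting one median from the other. The sketch below uses only the six rows shown in the head(sleepDay) output, so its numbers will not match the full-data figures; MinutesAwakeInBed is a hypothetical column name.

```r
library(dplyr)

# Six rows copied from the head(sleepDay) output above
sleep_toy <- tibble(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  TotalTimeInBed     = c(346, 407, 442, 367, 712, 320)
)

# Per-record time awake in bed (hypothetical column name)
sleep_toy <- sleep_toy %>%
  mutate(MinutesAwakeInBed = TotalTimeInBed - TotalMinutesAsleep)

# Median of the per-record differences
median_awake <- median(sleep_toy$MinutesAwakeInBed)

# Difference of the two column medians, for comparison
diff_of_medians <- median(sleep_toy$TotalTimeInBed) -
  median(sleep_toy$TotalMinutesAsleep)

print(median_awake)
print(diff_of_medians)
```

On these six rows the two approaches give different answers, so the per-record version is the safer way to describe a typical night.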
dailyCalories %>%
  select(Calories) %>%
  summary()
## Calories
## Min. : 0
## 1st Qu.:1828
## Median :2134
## Mean :2304
## 3rd Qu.:2793
## Max. :4900
The median number of calories burned per day is 2,134 (the mean is 2,304).
dailyActivity %>%
  select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
  summary()
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0
## Median : 4.00 Median : 6.00 Median :199.0
## Mean : 21.16 Mean : 13.56 Mean :192.8
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0
## Max. :210.00 Max. :143.00 Max. :518.0
Using the median again, users typically spend 4 minutes a day very active, 6 minutes fairly active, and 199 minutes lightly active. The higher the intensity, the fewer minutes users spend at it.
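For a side-by-side comparison of the three intensity levels, the columns can be reshaped to long format and summarized in one table. The sketch below uses only the first six values of each column from the str(dailyActivity) output, so it shows the mechanics and not the full-data medians; the table name intensity_summary is illustrative.

```r
library(dplyr)
library(tidyr)

# First six values of each column, copied from the str() output above
activity_toy <- tibble(
  VeryActiveMinutes    = c(25, 21, 30, 29, 36, 38),
  FairlyActiveMinutes  = c(13, 19, 11, 34, 10, 20),
  LightlyActiveMinutes = c(328, 217, 181, 209, 221, 164)
)

# One row per (intensity, minutes) pair, then median and mean per intensity
intensity_summary <- activity_toy %>%
  pivot_longer(everything(), names_to = "intensity", values_to = "minutes") %>%
  group_by(intensity) %>%
  summarise(
    median_minutes = median(minutes),
    mean_minutes = mean(minutes)
  )

print(intensity_summary)
```

Keeping the median and mean next to each other makes it easier to see where a few very active days pull the mean upward.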
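# dailyCalories repeats the Calories column of dailyActivity, so all three columns are used as keys
calorie_activity <- merge(dailyActivity, dailyCalories, by = c("Id", "date", "Calories"))
# merge() is an inner join by default, so only days that also have a sleep record are kept
all_data <- merge(calorie_activity, sleepDay, by = c("Id", "date"))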
head(all_data)
## Id date Calories TotalSteps TotalDistance TrackerDistance
## 1 1503960366 2016-04-12 1985 13162 8.50 8.50
## 2 1503960366 2016-04-13 1797 10735 6.97 6.97
## 3 1503960366 2016-04-15 1745 9762 6.28 6.28
## 4 1503960366 2016-04-16 1863 12669 8.16 8.16
## 5 1503960366 2016-04-17 1728 9705 6.48 6.48
## 6 1503960366 2016-04-19 2035 15506 9.88 9.88
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.14 1.26
## 4 0 2.71 0.41
## 5 0 3.19 0.78
## 6 0 3.53 1.32
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 2.83 0 29
## 4 5.04 0 36
## 5 2.51 0 38
## 6 5.03 0 50
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes TotalSleepRecords
## 1 13 328 728 1
## 2 19 217 776 2
## 3 34 209 726 1
## 4 10 221 773 2
## 5 20 164 539 1
## 6 31 264 775 1
## TotalMinutesAsleep TotalTimeInBed
## 1 327 346
## 2 384 407
## 3 412 442
## 4 340 367
## 5 700 712
## 6 304 320
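Because the merge above keeps only days that have a sleep record, it can help to see which activity days drop out. The sketch below compares an inner join with a left join on a toy subset: the activity rows and sleep rows reuse values from the head() outputs for one participant, where April 14 has activity but no sleep record. The names activity_toy and sleep_toy are illustrative.

```r
library(dplyr)

# Four activity days and three sleep days for one participant,
# with values copied from the head() outputs above
activity_toy <- tibble(
  Id = 1503960366,
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-14", "2016-04-15")),
  TotalSteps = c(13162, 10735, 10460, 9762)
)
sleep_toy <- tibble(
  Id = 1503960366,
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-15")),
  TotalMinutesAsleep = c(327, 384, 412)
)

# Inner join: only days present in both tables
inner <- inner_join(activity_toy, sleep_toy, by = c("Id", "date"))

# Left join: every activity day, with NA where sleep is missing
left <- left_join(activity_toy, sleep_toy, by = c("Id", "date"))

# Activity days that have no matching sleep record
unmatched <- anti_join(activity_toy, sleep_toy, by = c("Id", "date"))

print(nrow(inner))
print(nrow(left))
print(unmatched)
```

An inner join suits questions that need both activity and sleep on the same day, while a left join keeps every activity day for activity-only summaries.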
In conclusion, I found that participants typically spent most of their day sedentary. They sleep a little over 7 hours a night and spend roughly 30 minutes awake in bed. Time spent on activity is limited: light activity accounts for the most minutes, followed by fairly active time, with very active time the shortest.
Based on this data, I recommend using the Bellabeat app to encourage users to spend more of the day being active. Alerts and reminders sent to smart devices such as the Time or Leaf, tied to activity goals, could motivate users to stay active. Improving sleep is another angle for Bellabeat marketing: a dim-light option in the app could help users fall asleep faster with fewer distractions.