Bellabeat, a high-tech company that manufactures health-focused smart products, was founded by Urška Sršen and Sando Mur. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.
By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to the company's own e-commerce channel on its website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.
Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
Business Task
Analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
dailyActivity <- read_csv("dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleep_data <- read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weight_info <- read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
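The messages above suggest specifying the column types explicitly. The following is a minimal sketch of that idea, using inline CSV text in place of the real file (two rows and three columns taken from the data shown in this report). Reading `Id` as character is an assumption made here for illustration, since it is an identifier rather than a quantity; the name `activity_sample` is hypothetical.

```r
library(readr)

# Inline CSV text standing in for dailyActivity_merged.csv (two rows, three columns only)
csv_text <- I("Id,ActivityDate,TotalSteps\n1503960366,4/12/2016,13162\n1503960366,4/13/2016,10735\n")

# Supplying col_types declares the types up front and silences the specification message
activity_sample <- read_csv(
  csv_text,
  col_types = cols(
    Id = col_character(),           # identifier, kept as text (assumption for this sketch)
    ActivityDate = col_character(), # left as text here; converted to a date later
    TotalSteps = col_double()
  )
)

activity_sample
```

With `Id` stored as text, the identifier prints in full instead of in abbreviated scientific notation such as `1.50e9`.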
head(dailyActivity)
## # A tibble: 6 × 15
## Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie…
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 4/12/2016 13162 8.5 8.5 0
## 2 1.50e9 4/13/2016 10735 6.97 6.97 0
## 3 1.50e9 4/14/2016 10460 6.74 6.74 0
## 4 1.50e9 4/15/2016 9762 6.28 6.28 0
## 5 1.50e9 4/16/2016 12669 8.16 8.16 0
## 6 1.50e9 4/17/2016 9705 6.48 6.48 0
## # … with 9 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>
head(sleep_data)
## # A tibble: 6 × 5
## Id SleepDay TotalSleepRecor… TotalMinutesAsl… TotalTimeInBed
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:0… 1 327 346
## 2 1503960366 4/13/2016 12:00:0… 2 384 407
## 3 1503960366 4/15/2016 12:00:0… 1 412 442
## 4 1503960366 4/16/2016 12:00:0… 2 340 367
## 5 1503960366 4/17/2016 12:00:0… 1 700 712
## 6 1503960366 4/19/2016 12:00:0… 1 304 320
head(weight_info)
## # A tibble: 6 × 8
## Id Date WeightKg WeightPounds Fat BMI IsManualReport LogId
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
## 1 1503960366 5/2/2016 … 52.6 116. 22 22.6 TRUE 1.46e12
## 2 1503960366 5/3/2016 … 52.6 116. NA 22.6 TRUE 1.46e12
## 3 1927972279 4/13/2016… 134. 294. NA 47.5 FALSE 1.46e12
## 4 2873212765 4/21/2016… 56.7 125. NA 21.5 TRUE 1.46e12
## 5 2873212765 5/12/2016… 57.3 126. NA 21.7 TRUE 1.46e12
## 6 4319703577 4/17/2016… 72.4 160. 25 27.5 TRUE 1.46e12
Clean the data
Find the number of unique Ids in each dataset
n_distinct(dailyActivity$Id)
## [1] 33
n_distinct(sleep_data$Id)
## [1] 24
n_distinct(weight_info$Id)
## [1] 8
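The counts above show that not every participant appears in every dataset. One way to list who is missing from a given table is an anti-join on `Id`. The sketch below uses toy tables with made-up Ids (`activity_toy`, `sleep_toy`, and `missing_sleep` are hypothetical names), not the real data.

```r
library(dplyr)

# Toy stand-ins for the real tables; the Ids and values are made up for illustration
activity_toy <- tibble(Id = c(1, 1, 2, 3), TotalSteps = c(100, 200, 300, 400))
sleep_toy    <- tibble(Id = c(1, 3), TotalMinutesAsleep = c(400, 420))

# Participants that have activity records but no sleep records
missing_sleep <- activity_toy %>%
  distinct(Id) %>%
  anti_join(sleep_toy, by = "Id")

missing_sleep$Id
```

Applied to the real tables, the same pattern would return the Ids that have activity data but no sleep data.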
dailyActivity <- dailyActivity %>%
  rename(date = ActivityDate)
sleep_data <- sleep_data %>%
  rename(Date = SleepDay)
str(dailyActivity)
## spec_tbl_df [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ date : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : num [1:940] 13162 10735 10460 9762 12669 ...
## $ TotalDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : num [1:940] 728 776 1218 726 773 ...
## $ Calories : num [1:940] 1985 1797 1776 1745 1863 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. ActivityDate = col_character(),
## .. TotalSteps = col_double(),
## .. TotalDistance = col_double(),
## .. TrackerDistance = col_double(),
## .. LoggedActivitiesDistance = col_double(),
## .. VeryActiveDistance = col_double(),
## .. ModeratelyActiveDistance = col_double(),
## .. LightActiveDistance = col_double(),
## .. SedentaryActiveDistance = col_double(),
## .. VeryActiveMinutes = col_double(),
## .. FairlyActiveMinutes = col_double(),
## .. LightlyActiveMinutes = col_double(),
## .. SedentaryMinutes = col_double(),
## .. Calories = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
str(sleep_data)
## spec_tbl_df [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ Date : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. SleepDay = col_character(),
## .. TotalSleepRecords = col_double(),
## .. TotalMinutesAsleep = col_double(),
## .. TotalTimeInBed = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
str(weight_info)
## spec_tbl_df [67 × 8] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:67] 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
## $ Date : chr [1:67] "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
## $ WeightKg : num [1:67] 52.6 52.6 133.5 56.7 57.3 ...
## $ WeightPounds : num [1:67] 116 116 294 125 126 ...
## $ Fat : num [1:67] 22 NA NA NA NA 25 NA NA NA NA ...
## $ BMI : num [1:67] 22.6 22.6 47.5 21.5 21.7 ...
## $ IsManualReport: logi [1:67] TRUE TRUE FALSE TRUE TRUE TRUE ...
## $ LogId : num [1:67] 1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. Date = col_character(),
## .. WeightKg = col_double(),
## .. WeightPounds = col_double(),
## .. Fat = col_double(),
## .. BMI = col_double(),
## .. IsManualReport = col_logical(),
## .. LogId = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
# Parse the full timestamps first so the time of day is still available
sleep_datetime <- mdy_hms(sleep_data$Date)
weight_datetime <- mdy_hms(weight_info$Date)
dailyActivity$date <- as.Date(dailyActivity$date, format = "%m/%d/%Y")
sleep_data$Date <- as.Date(sleep_datetime)
weight_info$Date <- as.Date(weight_datetime)
# Add a lowercase date column to use as the join key, plus the time of day
sleep_data$date <- sleep_data$Date
sleep_data$time <- format(sleep_datetime, format = "%H:%M:%S")
weight_info$date <- weight_info$Date
weight_info$time <- format(weight_datetime, format = "%H:%M:%S")
# Day of the week as an ordered factor
dailyActivity$weekday <- wday(dailyActivity$date, label = TRUE)
sleep_data$weekday <- wday(sleep_data$date, label = TRUE)
weight_info$weekday <- wday(weight_info$date, label = TRUE)
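One detail worth checking in the conversions above is what happens to the time of day. The sketch below uses two timestamp strings in the format shown by `str(weight_info)` and compares converting straight to a date with parsing the full timestamp first. The variable names (`raw`, `stamp`, `day`, `clock`) are hypothetical.

```r
library(lubridate)

# Two timestamp strings in the format shown by str(weight_info)
raw <- c("5/2/2016 11:59:59 PM", "4/13/2016 1:08:52 AM")

# Converting straight to Date keeps the day but discards the time of day
as.Date(raw, format = "%m/%d/%Y")

# Parsing the full timestamp first keeps both parts available
stamp <- mdy_hms(raw)               # POSIXct, UTC by default
day   <- as.Date(stamp)             # calendar date
clock <- format(stamp, "%H:%M:%S")  # time of day as text

day
clock
wday(day, label = TRUE)             # weekday labels depend on the locale
```

Once a column has been reduced to a `Date`, any time extracted from it can only be midnight, so the time has to be taken from the parsed timestamp.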
head(dailyActivity)
## # A tibble: 6 × 16
## Id date TotalSteps TotalDistance TrackerDistance LoggedActivitie…
## <dbl> <date> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 2016-04-12 13162 8.5 8.5 0
## 2 1.50e9 2016-04-13 10735 6.97 6.97 0
## 3 1.50e9 2016-04-14 10460 6.74 6.74 0
## 4 1.50e9 2016-04-15 9762 6.28 6.28 0
## 5 1.50e9 2016-04-16 12669 8.16 8.16 0
## 6 1.50e9 2016-04-17 9705 6.48 6.48 0
## # … with 10 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>, weekday <ord>
head(sleep_data)
## # A tibble: 6 × 8
## Id Date TotalSleepRecor… TotalMinutesAsl… TotalTimeInBed date
## <dbl> <date> <dbl> <dbl> <dbl> <date>
## 1 1.50e9 2016-04-12 1 327 346 2016-04-12
## 2 1.50e9 2016-04-13 2 384 407 2016-04-13
## 3 1.50e9 2016-04-15 1 412 442 2016-04-15
## 4 1.50e9 2016-04-16 2 340 367 2016-04-16
## 5 1.50e9 2016-04-17 1 700 712 2016-04-17
## 6 1.50e9 2016-04-19 1 304 320 2016-04-19
## # … with 2 more variables: time <chr>, weekday <ord>
head(weight_info)
## # A tibble: 6 × 11
## Id Date WeightKg WeightPounds Fat BMI IsManualReport LogId
## <dbl> <date> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl>
## 1 1503960366 2016-05-02 52.6 116. 22 22.6 TRUE 1.46e12
## 2 1503960366 2016-05-03 52.6 116. NA 22.6 TRUE 1.46e12
## 3 1927972279 2016-04-13 134. 294. NA 47.5 FALSE 1.46e12
## 4 2873212765 2016-04-21 56.7 125. NA 21.5 TRUE 1.46e12
## 5 2873212765 2016-05-12 57.3 126. NA 21.7 TRUE 1.46e12
## 6 4319703577 2016-04-17 72.4 160. 25 27.5 TRUE 1.46e12
## # … with 3 more variables: date <date>, time <chr>, weekday <ord>
dailyActivity %>%
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes) %>%
  summary()
## TotalSteps TotalDistance SedentaryMinutes
## Min. : 0 Min. : 0.000 Min. : 0.0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8
## Median : 7406 Median : 5.245 Median :1057.5
## Mean : 7638 Mean : 5.490 Mean : 991.2
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5
## Max. :36019 Max. :28.030 Max. :1440.0
sleep_data %>%
  select(TotalMinutesAsleep,
         TotalTimeInBed) %>%
  summary()
## TotalMinutesAsleep TotalTimeInBed
## Min. : 58.0 Min. : 61.0
## 1st Qu.:361.0 1st Qu.:403.0
## Median :433.0 Median :463.0
## Mean :419.5 Mean :458.6
## 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :796.0 Max. :961.0
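The summary shows a mean of 458.6 minutes in bed against 419.5 minutes asleep, a gap of about 39 minutes. The same comparison can be made per record by deriving two extra columns. This is a minimal sketch on the first three rows printed by `head(sleep_data)` earlier; the column names `MinutesAwakeInBed` and `SleepEfficiency` are hypothetical and not part of the dataset.

```r
library(dplyr)

# First three rows of sleep_data as printed by head() earlier in this report
sleep_sample <- tibble(
  TotalMinutesAsleep = c(327, 384, 412),
  TotalTimeInBed     = c(346, 407, 442)
)

sleep_derived <- sleep_sample %>%
  mutate(
    MinutesAwakeInBed = TotalTimeInBed - TotalMinutesAsleep,  # time in bed but not asleep
    SleepEfficiency   = TotalMinutesAsleep / TotalTimeInBed   # share of bed time spent asleep
  )

sleep_derived
```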
weight_info %>%
  select(BMI,
         WeightKg) %>%
  summary()
## BMI WeightKg
## Min. :21.45 Min. : 52.60
## 1st Qu.:23.96 1st Qu.: 61.40
## Median :24.39 Median : 62.50
## Mean :25.19 Mean : 72.04
## 3rd Qu.:25.56 3rd Qu.: 85.05
## Max. :47.54 Max. :133.50
dailyActivity_Sleep_Merged <- inner_join(dailyActivity, sleep_data, by = c("Id", "date"))
dailyActivity_Sleep_Merged %>%
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes,
         TotalMinutesAsleep,
         TotalTimeInBed) %>%
  summary()
## TotalSteps TotalDistance SedentaryMinutes TotalMinutesAsleep
## Min. : 17 Min. : 0.010 Min. : 0.0 Min. : 58.0
## 1st Qu.: 5206 1st Qu.: 3.600 1st Qu.: 631.0 1st Qu.:361.0
## Median : 8925 Median : 6.290 Median : 717.0 Median :433.0
## Mean : 8541 Mean : 6.039 Mean : 712.2 Mean :419.5
## 3rd Qu.:11393 3rd Qu.: 8.030 3rd Qu.: 783.0 3rd Qu.:490.0
## Max. :22770 Max. :17.540 Max. :1265.0 Max. :796.0
## TotalTimeInBed
## Min. : 61.0
## 1st Qu.:403.0
## Median :463.0
## Mean :458.6
## 3rd Qu.:526.0
## Max. :961.0
dailyActivity_Weight_Merged <- inner_join(dailyActivity, weight_info, by = c("Id", "date"))
dailyActivity_Sleep_Weight_Merged <- inner_join(dailyActivity_Sleep_Merged, weight_info, by = c("Id", "date"))
dailyActivity_Sleep_Weight_Merged %>%
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes,
         TotalMinutesAsleep,
         TotalTimeInBed,
         BMI,
         WeightKg) %>%
  summary()
## TotalSteps TotalDistance SedentaryMinutes TotalMinutesAsleep
## Min. : 356 Min. : 0.250 Min. : 127.0 Min. :115.0
## 1st Qu.: 5780 1st Qu.: 3.825 1st Qu.: 635.5 1st Qu.:399.0
## Median :10524 Median : 6.960 Median : 689.0 Median :442.0
## Mean : 9687 Mean : 6.523 Mean : 688.5 Mean :430.3
## 3rd Qu.:12484 3rd Qu.: 8.730 3rd Qu.: 736.0 3rd Qu.:472.5
## Max. :20031 Max. :13.240 Max. :1121.0 Max. :630.0
## TotalTimeInBed BMI WeightKg
## Min. :129.0 Min. :22.65 Min. : 52.60
## 1st Qu.:420.0 1st Qu.:23.89 1st Qu.: 61.20
## Median :455.0 Median :24.00 Median : 61.50
## Mean :449.8 Mean :24.83 Mean : 64.17
## 3rd Qu.:494.0 3rd Qu.:24.17 3rd Qu.: 61.90
## Max. :679.0 Max. :47.54 Max. :133.50
head(dailyActivity_Sleep_Merged)
## # A tibble: 6 × 22
## Id date TotalSteps TotalDistance TrackerDistance LoggedActivitie…
## <dbl> <date> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 2016-04-12 13162 8.5 8.5 0
## 2 1.50e9 2016-04-13 10735 6.97 6.97 0
## 3 1.50e9 2016-04-15 9762 6.28 6.28 0
## 4 1.50e9 2016-04-16 12669 8.16 8.16 0
## 5 1.50e9 2016-04-17 9705 6.48 6.48 0
## 6 1.50e9 2016-04-19 15506 9.88 9.88 0
## # … with 16 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>, weekday.x <ord>, Date <date>,
## # TotalSleepRecords <dbl>, TotalMinutesAsleep <dbl>, TotalTimeInBed <dbl>,
## # time <chr>, weekday.y <ord>
head(dailyActivity_Weight_Merged)
## # A tibble: 6 × 25
## Id date TotalSteps TotalDistance TrackerDistance LoggedActivitie…
## <dbl> <date> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 2016-05-02 14727 9.71 9.71 0
## 2 1.50e9 2016-05-03 15103 9.66 9.66 0
## 3 1.93e9 2016-04-13 356 0.25 0.25 0
## 4 2.87e9 2016-04-21 8859 5.98 5.98 0
## 5 2.87e9 2016-05-12 7566 5.11 5.11 0
## 6 4.32e9 2016-04-17 29 0.0200 0.0200 0
## # … with 19 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>, weekday.x <ord>, Date <date>,
## # WeightKg <dbl>, WeightPounds <dbl>, Fat <dbl>, BMI <dbl>,
## # IsManualReport <lgl>, LogId <dbl>, time <chr>, weekday.y <ord>
head(dailyActivity_Sleep_Weight_Merged)
## # A tibble: 6 × 31
## Id date TotalSteps TotalDistance TrackerDistance LoggedActivitie…
## <dbl> <date> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 2016-05-02 14727 9.71 9.71 0
## 2 1.50e9 2016-05-03 15103 9.66 9.66 0
## 3 1.93e9 2016-04-13 356 0.25 0.25 0
## 4 4.56e9 2016-05-01 3428 2.27 2.27 0
## 5 5.58e9 2016-04-17 12231 9.14 9.14 0
## 6 6.96e9 2016-04-12 10199 6.74 6.74 0
## # … with 25 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>, weekday.x <ord>, Date.x <date>,
## # TotalSleepRecords <dbl>, TotalMinutesAsleep <dbl>, TotalTimeInBed <dbl>,
## # time.x <chr>, weekday.y <ord>, Date.y <date>, WeightKg <dbl>, …
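The merged tables above contain columns such as `weekday.x` and `weekday.y`. These suffixes appear because both inputs carry a column with the same name that is not part of the join key. The sketch below reproduces this with one-row tables built from values printed earlier in this report (the weekday labels are illustrative) and shows one possible way to keep a single copy. The names `activity_toy`, `sleep_toy`, and `merged` are hypothetical.

```r
library(dplyr)

# One-row tables that both carry a derived weekday column
activity_toy <- tibble(Id = 1503960366, date = as.Date("2016-04-12"),
                       TotalSteps = 13162, weekday = "Tue")
sleep_toy    <- tibble(Id = 1503960366, date = as.Date("2016-04-12"),
                       TotalMinutesAsleep = 327, weekday = "Tue")

# Joining on Id and date only duplicates the shared column with .x/.y suffixes
names(inner_join(activity_toy, sleep_toy, by = c("Id", "date")))

# Dropping the redundant column from one side keeps a single weekday column
merged <- inner_join(activity_toy, select(sleep_toy, -weekday), by = c("Id", "date"))
names(merged)
```

Because `weekday` is derived from `date`, the two copies always agree, so dropping one loses no information.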
uniqueId_dailyActivity <- filter(dailyActivity, Id == 1503960366)
uniqueId_dailyActivity_Sleep_Merged <- filter(dailyActivity_Sleep_Merged, Id == 1503960366)
n_distinct(dailyActivity$Id)
## [1] 33
n_distinct(sleep_data$Id)
## [1] 24
n_distinct(weight_info$Id)
## [1] 8
n_distinct(dailyActivity_Sleep_Merged$Id)
## [1] 24
n_distinct(dailyActivity_Weight_Merged$Id)
## [1] 8
n_distinct(dailyActivity_Sleep_Weight_Merged$Id)
## [1] 5
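The participant count shrinks with each join. One possible reason is that joining on both `Id` and `date` drops a participant who appears in both tables but never on the same day. The sketch below illustrates that effect with made-up toy data (`activity_toy`, `weight_toy`, `shared_ids`, and `joined` are hypothetical names); it does not claim to explain the exact counts above.

```r
library(dplyr)

# Toy data: participant 2 appears in both tables, but never on the same date
activity_toy <- tibble(Id = c(1, 2), date = as.Date(c("2016-04-12", "2016-04-12")))
weight_toy   <- tibble(Id = c(1, 2), date = as.Date(c("2016-04-12", "2016-04-20")))

# Participants present in both tables, ignoring dates
shared_ids <- intersect(activity_toy$Id, weight_toy$Id)

# Participants that survive a join on Id and date
joined <- inner_join(activity_toy, weight_toy, by = c("Id", "date"))

length(shared_ids)     # participants shared by Id alone
n_distinct(joined$Id)  # participants left after the join
```

Comparing the two numbers on the real tables would show how many participants are lost to mismatched dates rather than to missing Ids.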