Bellabeat is a high-tech company founded in 2013 that manufactures health-focused smart products for women. By collecting data on activity, sleep, stress, and reproductive health, Bellabeat has been able to empower women with knowledge about their health and daily habits. The company has 5 products: the Bellabeat app, Leaf, Time, Spring and the Bellabeat membership. Our team has been asked to analyse smart device data to gain insight into how consumers are using their devices. The insights from this analysis will help to guide Bellabeat’s marketing strategy.
Analyse a given data set and present recommendations to improve Bellabeat’s marketing strategy.
This data set has 2 main limitations: it covers only about 30 users, and the data was collected from March to May of 2016, which is a short time frame.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.0
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
## Loading required package: timechange
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
library(ggplot2)
library(tidyr)
I have already set my working directory.
dailyActivity_merged <- read_csv("Fitbit_Fitness_Tracker/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Activity <- dailyActivity_merged
HeartRate <- read_csv("Fitbit_Fitness_Tracker/heartrate_seconds_merged.csv")
## Rows: 2483658 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Time
## dbl (2): Id, Value
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Weight <- read_csv("Fitbit_Fitness_Tracker/weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Sleep <- read.csv("Fitbit_Fitness_Tracker/sleepDay_merged.csv")
Calories <- read.csv("Fitbit_Fitness_Tracker/dailyCalories_merged.csv")
class(Sleep$SleepDay)
## [1] "character"
# Parse the timestamp columns and add a common date key (mm/dd/yy) for joining
Sleep$SleepDay <- as.POSIXct(Sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
Sleep$date <- format(Sleep$SleepDay, format = "%m/%d/%y")
Activity$ActivityDate <- as.POSIXct(Activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
Activity$date <- format(Activity$ActivityDate, format = "%m/%d/%y")
# Heart-rate timestamps include a time of day (assumed to use the same 12-hour format as SleepDay)
HeartRate$Time <- as.POSIXct(HeartRate$Time, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
HeartRate$date <- format(HeartRate$Time, format = "%m/%d/%y")
n_distinct(Activity$Id)
## [1] 33
n_distinct(Calories$Id)
## [1] 33
n_distinct(Sleep$Id)
## [1] 24
n_distinct(HeartRate$Id)
## [1] 14
n_distinct(Weight$Id)
## [1] 8
The activity and calories data cover 33 users, the sleep data 24 and the heart rate data 14. The weight data frame has only 8 users, which is not enough to draw conclusions about weight and fat percentage.
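The same per-table user count can be produced in one step. This is a minimal sketch on toy data: the `tables` list and its `Id` values are hypothetical stand-ins, not the Fitbit files loaded above.

```r
# Toy stand-ins for the real data frames (hypothetical Ids, for illustration only)
tables <- list(
  Activity = data.frame(Id = c(1, 1, 2, 3)),
  Sleep    = data.frame(Id = c(1, 2, 2)),
  Weight   = data.frame(Id = c(3, 3))
)

# Count the distinct users in each table
users_per_table <- sapply(tables, function(df) length(unique(df$Id)))
print(users_per_table)
```

With the real data, the list would hold `Activity`, `Calories`, `Sleep`, `HeartRate` and `Weight` instead of the toy frames.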
Let’s see what the activity data frame tells us:
Activity %>%
select(TotalSteps,
VeryActiveMinutes,
LightlyActiveMinutes,
SedentaryMinutes) %>%
summary()
## TotalSteps VeryActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 3790 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 7406 Median : 4.00 Median :199.0 Median :1057.5
## Mean : 7638 Mean : 21.16 Mean :192.8 Mean : 991.2
## 3rd Qu.:10727 3rd Qu.: 32.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :36019 Max. :210.00 Max. :518.0 Max. :1440.0
If we assume these users are more active than the average person, a median of 7,406 steps a day is not satisfactory and could be improved. Although moderate and intense exercise has more health benefits than walking 10,000-12,000 steps a day, achieving both is something to strive for. The mean sedentary time is 991 minutes (about 16.5 hours) and the maximum sedentary time is 1,440 minutes (a full 24 hours), which may partly reflect days when the device was not worn. This needs to be addressed.
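The minute-to-hour conversion behind these figures can be checked with a few lines of arithmetic. This is only a sketch: the two input values are copied from the summary output above, and the variable names are my own.

```r
# Figures copied from the activity summary above (minutes per day)
mean_sedentary_min <- 991.2
max_sedentary_min  <- 1440

minutes_per_day <- 24 * 60

# Convert to hours and to a share of a 24-hour day
mean_sedentary_hours <- mean_sedentary_min / 60
max_sedentary_hours  <- max_sedentary_min / 60
mean_share_of_day    <- mean_sedentary_min / minutes_per_day

print(round(c(mean_hours = mean_sedentary_hours,
              max_hours  = max_sedentary_hours,
              mean_share = mean_share_of_day), 2))
```

The mean works out to roughly 16.5 hours, or about 69% of the day, while the maximum corresponds to an entire day recorded as sedentary.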
HeartRate %>%
select(Value) %>%
summary()
## Value
## Min. : 36.00
## 1st Qu.: 63.00
## Median : 73.00
## Mean : 77.33
## 3rd Qu.: 88.00
## Max. :203.00
These are average values, and there is no information about whether a reading was taken while exercising or sleeping.
# List any rows of Sleep that contain missing values
Sleep %>%
filter(!complete.cases(.))
Sleep %>%
select(TotalMinutesAsleep) %>%
summarise(mean(TotalMinutesAsleep))
## mean(TotalMinutesAsleep)
## 1 419.4673
Sleep %>%
select(TotalMinutesAsleep) %>%
summarise(median(TotalMinutesAsleep))
## median(TotalMinutesAsleep)
## 1 433
The mean sleeping time is just under 7 hours and the median sleeping time is 7h13mins.
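To make the conversion from minutes to hours and minutes explicit, here is a small sketch. The helper `to_hours_minutes()` is hypothetical (it is not part of the analysis above) and rounds to the nearest whole minute before splitting.

```r
# Hypothetical helper: format a duration in minutes as hours and minutes
to_hours_minutes <- function(minutes) {
  total <- round(minutes)  # nearest whole minute
  sprintf("%dh %02dmin", as.integer(total %/% 60), as.integer(total %% 60))
}

# Mean and median sleep time reported above, in minutes
sleep_summary <- to_hours_minutes(c(419.4673, 433))
print(sleep_summary)
```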
merged_data <- merge(Sleep, Activity, by=c('Id', 'date'))
head(merged_data)
## Id date SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 04/12/16 2016-04-12 1 327
## 2 1503960366 04/13/16 2016-04-13 2 384
## 3 1503960366 04/15/16 2016-04-15 1 412
## 4 1503960366 04/16/16 2016-04-16 2 340
## 5 1503960366 04/17/16 2016-04-17 1 700
## 6 1503960366 04/19/16 2016-04-19 1 304
## TotalTimeInBed ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 346 2016-04-12 13162 8.50 8.50
## 2 407 2016-04-13 10735 6.97 6.97
## 3 442 2016-04-15 9762 6.28 6.28
## 4 367 2016-04-16 12669 8.16 8.16
## 5 712 2016-04-17 9705 6.48 6.48
## 6 320 2016-04-19 15506 9.88 9.88
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.14 1.26
## 4 0 2.71 0.41
## 5 0 3.19 0.78
## 6 0 3.53 1.32
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 2.83 0 29
## 4 5.04 0 36
## 5 2.51 0 38
## 6 5.03 0 50
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 34 209 726 1745
## 4 10 221 773 1863
## 5 20 164 539 1728
## 6 31 264 775 2035
ggplot(data = Activity, aes(x = TotalDistance, y = VeryActiveDistance)) +
geom_point(color = 'darkgreen') + labs(title = "Very Active Distance in relation to Total Distance")
cor(Activity$TotalDistance, Activity$VeryActiveDistance)
## [1] 0.7945816
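There is a fairly strong positive correlation (r ≈ 0.79) between total distance and very active distance. Part of this is expected, because very active distance is itself a component of total distance. Even so, people may tend to exercise more vigorously if they increase their walking/moving time.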
plot( TotalMinutesAsleep ~ TotalTimeInBed,
data = Sleep,
main = "Do people sleep if they go to bed?",
col.main = 'navyblue',
fg = 'blue',
col = 'darkorange',
col.axis = 'navyblue',
col.lab = 'darkorange',
xlab = "TotalTimeInBed",
ylab = "TotalMinutesAsleep"
)
cor(Sleep$TotalTimeInBed, Sleep$TotalMinutesAsleep)
## [1] 0.9304575
There is a strong correlation (r ≈ 0.93) between time spent in bed and sleeping time. Bellabeat could recommend going to bed half an hour earlier to improve sleeping time. Customers could receive points for good quality sleep and get discounts on other Bellabeat products.
ggplot(data=merged_data, aes(x = TotalMinutesAsleep, y = LightlyActiveMinutes)) +
geom_point( color = 'darkgreen') + geom_smooth( color = 'red') +
labs(title = 'Sleep and Activity')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
ggplot(data=merged_data, aes(x = TotalMinutesAsleep, y = FairlyActiveMinutes)) +
geom_point( color = 'darkgreen') + geom_smooth( color = 'red') +
labs(title = 'Sleep and Activity')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
ggplot(data=merged_data, aes(x = TotalMinutesAsleep, y = VeryActiveMinutes)) +
geom_point( color = 'darkgreen') + geom_smooth( color = 'red') +
labs(title = 'Sleep and Activity')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
ggplot(data=merged_data, aes(x = TotalMinutesAsleep, y = SedentaryMinutes)) +
geom_point( color = 'darkgreen') + geom_smooth( color = 'red') +
labs(title = 'Sleep and Activity')
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
cor(merged_data$TotalMinutesAsleep, merged_data$SedentaryMinutes)
## [1] -0.599394
There seems to be no clear correlation between sleep time and active minutes. Sleep time has a moderate negative correlation with sedentary minutes (r ≈ -0.60). However, according to this data, sleep time should not be the focus when trying to encourage a more active lifestyle.
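When describing coefficients in words, it helps to apply one rule consistently. The sketch below uses a common rule of thumb for |r| (below 0.3 negligible, 0.3 to 0.5 weak, 0.5 to 0.7 moderate, above 0.7 strong). These cut-offs are an assumption chosen for illustration, not something derived from this data set, and `describe_cor()` is a hypothetical helper.

```r
# Hypothetical helper: label a correlation coefficient by strength and direction.
# The thresholds are a rule-of-thumb assumption, not a fixed standard.
describe_cor <- function(r) {
  strength <- cut(
    abs(r),
    breaks = c(0, 0.3, 0.5, 0.7, 1),
    labels = c("negligible", "weak", "moderate", "strong"),
    include.lowest = TRUE
  )
  direction <- ifelse(r < 0, "negative", "positive")
  paste(as.character(strength), direction)
}

# Coefficients reported in the cor() outputs above
reported <- c(
  distance_vs_very_active = 0.7945816,
  bed_vs_asleep           = 0.9304575,
  asleep_vs_sedentary     = -0.599394
)

described <- describe_cor(reported)
print(data.frame(r = round(reported, 2), label = described))
```

Different fields use different thresholds, so the labels should be read as a rough guide rather than a formal classification.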
There is no gender information about the participants, which may bias these recommendations as Bellabeat’s target audience is women. Given the limited data set analysed, I can suggest the following:
sessionInfo()
## R version 4.2.2 (2022-10-31 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 22621)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8
## [2] LC_CTYPE=English_United Kingdom.utf8
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.utf8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lubridate_1.9.0 timechange_0.1.1 forcats_0.5.2 stringr_1.5.0
## [5] dplyr_1.0.10 purrr_1.0.0 readr_2.1.3 tidyr_1.2.1
## [9] tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
##
## loaded via a namespace (and not attached):
## [1] lattice_0.20-45 assertthat_0.2.1 digest_0.6.31
## [4] utf8_1.2.2 R6_2.5.1 cellranger_1.1.0
## [7] backports_1.4.1 reprex_2.0.2 evaluate_0.19
## [10] highr_0.10 httr_1.4.4 pillar_1.8.1
## [13] rlang_1.0.6 googlesheets4_1.0.1 readxl_1.4.1
## [16] rstudioapi_0.14 jquerylib_0.1.4 Matrix_1.5-1
## [19] rmarkdown_2.19 splines_4.2.2 labeling_0.4.2
## [22] googledrive_2.0.0 bit_4.0.5 munsell_0.5.0
## [25] broom_1.0.2 compiler_4.2.2 modelr_0.1.10
## [28] xfun_0.36 pkgconfig_2.0.3 mgcv_1.8-41
## [31] htmltools_0.5.4 tidyselect_1.2.0 fansi_1.0.3
## [34] crayon_1.5.2 tzdb_0.3.0 dbplyr_2.2.1
## [37] withr_2.5.0 grid_4.2.2 nlme_3.1-160
## [40] jsonlite_1.8.4 gtable_0.3.1 lifecycle_1.0.3
## [43] DBI_1.1.3 magrittr_2.0.3 scales_1.2.1
## [46] cli_3.5.0 stringi_1.7.8 vroom_1.6.0
## [49] cachem_1.0.6 farver_2.1.1 fs_1.5.2
## [52] xml2_1.3.3 bslib_0.4.2 ellipsis_0.3.2
## [55] generics_0.1.3 vctrs_0.5.1 tools_4.2.2
## [58] bit64_4.0.5 glue_1.6.2 hms_1.1.2
## [61] parallel_4.2.2 fastmap_1.1.0 yaml_2.3.6
## [64] colorspace_2.0-3 gargle_1.2.1 rvest_1.0.3
## [67] knitr_1.41 haven_2.5.1 sass_0.4.4