Welcome to my case study. My analysis focuses on Bellabeat, a high-tech manufacturer of health-focused products for women.
Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and chief creative officer at Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. I have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices.
Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
Analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices.
Fitbit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius): This dataset was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It contains a total of 18 CSV files and includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(readr)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(dplyr)
install.packages("here")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(here)
## here() starts at /cloud/project
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
install.packages("skimr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(skimr)
install.packages("lubridate")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(ggplot2)
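The setup above reinstalls every package on each run. As a minimal sketch, assuming the goal is only to make sure the packages are available, the check below lists which packages are still missing so that only those need installing. The helper name `missing_packages` is hypothetical and not part of any library.

```r
# Hypothetical helper: return the names in `required` that are not in `installed`.
# By default `installed` is the set of packages already present on this machine.
missing_packages <- function(required,
                             installed = rownames(installed.packages())) {
  setdiff(required, installed)
}

# Example with an explicit `installed` vector so the behaviour is easy to follow.
example_missing <- missing_packages(
  required  = c("tidyverse", "here", "janitor"),
  installed = c("here", "stats")
)
print(example_missing)

# Typical use (commented out because it needs network access):
# to_install <- missing_packages(c("tidyverse", "here", "janitor", "skimr", "lubridate"))
# if (length(to_install) > 0) install.packages(to_install)
```

Since tidyverse already installs readr, dplyr, ggplot2, and lubridate, a check like this would usually leave only here, janitor, and skimr to install separately.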
Two files are used in this analysis: dailyActivity_merged and sleepDay_merged.
dailyActivity_merged <- read_csv("Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleepDay_merged <- read_csv("Fitabase Data 4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(dailyActivity_merged)
## spec_tbl_df [940 × 15] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:940] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr [1:940] "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : num [1:940] 13162 10735 10460 9762 12669 ...
## $ TotalDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num [1:940] 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num [1:940] 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num [1:940] 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num [1:940] 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num [1:940] 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : num [1:940] 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : num [1:940] 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : num [1:940] 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : num [1:940] 728 776 1218 726 773 ...
## $ Calories : num [1:940] 1985 1797 1776 1745 1863 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. ActivityDate = col_character(),
## .. TotalSteps = col_double(),
## .. TotalDistance = col_double(),
## .. TrackerDistance = col_double(),
## .. LoggedActivitiesDistance = col_double(),
## .. VeryActiveDistance = col_double(),
## .. ModeratelyActiveDistance = col_double(),
## .. LightActiveDistance = col_double(),
## .. SedentaryActiveDistance = col_double(),
## .. VeryActiveMinutes = col_double(),
## .. FairlyActiveMinutes = col_double(),
## .. LightlyActiveMinutes = col_double(),
## .. SedentaryMinutes = col_double(),
## .. Calories = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
str(sleepDay_merged)
## spec_tbl_df [413 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ Id : num [1:413] 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr [1:413] "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : num [1:413] 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: num [1:413] 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : num [1:413] 346 407 442 367 712 320 377 364 384 449 ...
## - attr(*, "spec")=
## .. cols(
## .. Id = col_double(),
## .. SleepDay = col_character(),
## .. TotalSleepRecords = col_double(),
## .. TotalMinutesAsleep = col_double(),
## .. TotalTimeInBed = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
head(dailyActivity_merged)
## # A tibble: 6 × 15
## Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie…
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 4/12/2016 13162 8.5 8.5 0
## 2 1.50e9 4/13/2016 10735 6.97 6.97 0
## 3 1.50e9 4/14/2016 10460 6.74 6.74 0
## 4 1.50e9 4/15/2016 9762 6.28 6.28 0
## 5 1.50e9 4/16/2016 12669 8.16 8.16 0
## 6 1.50e9 4/17/2016 9705 6.48 6.48 0
## # … with 9 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>
head(sleepDay_merged)
## # A tibble: 6 × 5
## Id SleepDay TotalSleepRecor… TotalMinutesAsl… TotalTimeInBed
## <dbl> <chr> <dbl> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:0… 1 327 346
## 2 1503960366 4/13/2016 12:00:0… 2 384 407
## 3 1503960366 4/15/2016 12:00:0… 1 412 442
## 4 1503960366 4/16/2016 12:00:0… 2 340 367
## 5 1503960366 4/17/2016 12:00:0… 1 700 712
## 6 1503960366 4/19/2016 12:00:0… 1 304 320
colnames(dailyActivity_merged)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
colnames(sleepDay_merged)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
n_distinct(dailyActivity_merged$Id)
## [1] 33
n_distinct(sleepDay_merged$Id)
## [1] 24
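The two counts differ (33 users with activity data, 24 with sleep data), so some users have no sleep records at all. A small sketch of how those users could be listed, using made-up toy IDs rather than the real dataset:

```r
# Toy data: the Id values here are invented for illustration only.
activity <- data.frame(Id = c(1, 1, 2, 3))
sleep    <- data.frame(Id = c(1, 3, 3))

# Ids that appear in the activity data but never in the sleep data.
ids_without_sleep <- setdiff(unique(activity$Id), unique(sleep$Id))
print(ids_without_sleep)
```

Applied to the real data frames, the same `setdiff()` call on `dailyActivity_merged$Id` and `sleepDay_merged$Id` would presumably return the users who never logged sleep.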
nrow(dailyActivity_merged)
## [1] 940
nrow(sleepDay_merged)
## [1] 413
nrow(unique(dailyActivity_merged))
## [1] 940
nrow(unique(sleepDay_merged))
## [1] 410
sleepDay <- unique(sleepDay_merged)
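The row counts above show that the sleep data loses three rows once duplicates are removed (413 down to 410). Before dropping them, it can be useful to look at which rows are repeated. This is a sketch on invented toy rows, not the real file:

```r
# Toy sleep log with one exact duplicate (rows 2 and 3 are identical).
sleep <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384, 384, 412)
)

# duplicated() flags the second and later copies of each repeated row.
dup_rows  <- sleep[duplicated(sleep), ]
n_removed <- nrow(sleep) - nrow(unique(sleep))

print(dup_rows)
print(n_removed)
```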
daily_activity_1 <- dailyActivity_merged %>%
select("Id","Date"= "ActivityDate","TotalSteps", "SedentaryMinutes", "VeryActiveMinutes","FairlyActiveMinutes", "LightlyActiveMinutes", "Calories")
view(daily_activity_1)
daily_activity_1$date <- mdy(daily_activity_1$Date)
sleep_1 <- sleepDay %>%
select("Id", "SleepDay", "TotalMinutesAsleep", "TotalTimeInBed")%>%
filter(TotalMinutesAsleep !=0)
sleep_1$Total_hrs_asleep <- round(sleep_1$TotalMinutesAsleep/60)
merged_data <- merge(daily_activity_1, sleep_1, by = "Id")
summary(merged_data)
## Id Date TotalSteps SedentaryMinutes
## Min. :1.504e+09 Length:12348 Min. : 0 Min. : 0.0
## 1st Qu.:3.977e+09 Class :character 1st Qu.: 4660 1st Qu.: 659.0
## Median :4.703e+09 Mode :character Median : 8585 Median : 734.0
## Mean :5.021e+09 Mean : 8108 Mean : 799.4
## 3rd Qu.:6.962e+09 3rd Qu.:11317 3rd Qu.: 853.0
## Max. :8.792e+09 Max. :22988 Max. :1440.0
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes Calories
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:144.0 1st Qu.:1776
## Median : 8.00 Median : 10.00 Median :200.0 Median :2158
## Mean : 23.94 Mean : 17.34 Mean :199.8 Mean :2323
## 3rd Qu.: 36.00 3rd Qu.: 24.00 3rd Qu.:258.0 3rd Qu.:2859
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :4900
## date SleepDay TotalMinutesAsleep TotalTimeInBed
## Min. :2016-04-12 Length:12348 Min. : 58.0 Min. : 61.0
## 1st Qu.:2016-04-19 Class :character 1st Qu.:361.0 1st Qu.:402.0
## Median :2016-04-27 Mode :character Median :432.0 Median :462.0
## Mean :2016-04-26 Mean :419.1 Mean :458.2
## 3rd Qu.:2016-05-04 3rd Qu.:492.0 3rd Qu.:526.0
## Max. :2016-05-12 Max. :796.0 Max. :961.0
## Total_hrs_asleep
## Min. : 1.00
## 1st Qu.: 6.00
## Median : 7.00
## Mean : 6.99
## 3rd Qu.: 8.00
## Max. :13.00
n_distinct(merged_data$Id)
## [1] 24
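The summary reports 12,348 rows, far more than either input (940 activity rows, 410 sleep rows). That is what happens when two daily tables are merged on Id alone: every activity day for a user is paired with every sleep day for the same user. The sketch below, on invented toy rows, compares that with a merge on both Id and date. The `date` column on the sleep side is an assumption added here for illustration.

```r
# Toy data: two users, two days each (values are illustrative only).
activity <- data.frame(
  Id = c(1, 1, 2, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016", "4/13/2016"),
  TotalSteps = c(13162, 10735, 4000, 5000)
)
sleep <- data.frame(
  Id = c(1, 1, 2, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384, 400, 420)
)

# Parse both date columns to Date; as.Date() ignores the trailing time text.
activity$date <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")
sleep$date    <- as.Date(sleep$SleepDay, format = "%m/%d/%Y")

# Merge on Id only: each user's rows are cross-multiplied (2 x 2 per user).
by_id <- merge(activity, sleep, by = "Id")

# Merge on Id and date: one row per user per day.
by_id_date <- merge(activity, sleep, by = c("Id", "date"))

print(nrow(by_id))
print(nrow(by_id_date))
```

With a merge on both keys, each row pairs a day's steps with that same night's sleep, which is what per-day comparisons between activity and sleep need.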
Looking at the summary, it seems the participants may not have used their devices consistently: the columns tracking steps, calories, and active minutes all have a minimum of 0, which is implausible for a full day of wear. The sleep data frame has only 24 participants, while daily activity has 33. Mean total steps was 8,108, which is below the commonly cited target of 10,000 steps per day. Mean time asleep was 6.99 hours, just under the 7-9 hours per day recommended by the National Sleep Foundation. Mean sedentary time was 799.4 minutes, which is over 13 hours.
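The minute-based means in the summary output are easier to interpret once converted to hours. A short sketch of that arithmetic, using the means printed by `summary(merged_data)` above:

```r
# Means copied from the summary() output, in minutes.
mean_sedentary_min <- 799.4
mean_asleep_min    <- 419.1

# Convert to hours, and express sedentary time as a share of a 24-hour day.
sedentary_hours        <- mean_sedentary_min / 60
asleep_hours           <- mean_asleep_min / 60
share_of_day_sedentary <- mean_sedentary_min / (24 * 60)

print(round(c(sedentary_hours = sedentary_hours,
              asleep_hours = asleep_hours,
              share_of_day_sedentary = share_of_day_sedentary), 2))
```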
merged_data_2 <- merged_data %>%
filter(TotalSteps != 0) %>%
filter(Calories != 0)
view(merged_data_2)
VeryActiveMins <- sum(daily_activity_1$VeryActiveMinutes)
FairlyActiveMins <- sum(daily_activity_1$FairlyActiveMinutes)
LightlyActiveMins <- sum(daily_activity_1$LightlyActiveMinutes)
SedentaryMins <- sum(daily_activity_1$SedentaryMinutes)
TotalMinsActivity <- VeryActiveMins + FairlyActiveMins + LightlyActiveMins + SedentaryMins
ggplot(data = daily_activity_1)+
geom_point(mapping = aes(x = SedentaryMinutes, y = TotalSteps), color = "red")+
labs(title = "Total Steps vs. Sedentary Minutes")
ggplot(data = daily_activity_1)+
geom_point(mapping = aes(x = LightlyActiveMinutes, y = TotalSteps), color = "dark green")+
labs(title = "Total Steps vs. Lightly Active Minutes")
ggplot(data = daily_activity_1)+
geom_point(mapping = aes(x = VeryActiveMinutes, y = TotalSteps), color = "orange")+
labs(title = "Total Steps vs. Very Active Minutes")
The graphs above plot daily steps against sedentary, lightly active, and very active minutes. Most participants seem to be sedentary to lightly active.
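Scatter plots show the shape of a relationship, and a correlation coefficient can put a number on it. The sketch below uses only the first five rows shown in the `str()` output above, so the values it produces illustrate the method and say nothing about the full dataset.

```r
# First five rows of the relevant columns, copied from the str() output.
first_rows <- data.frame(
  TotalSteps           = c(13162, 10735, 10460, 9762, 12669),
  VeryActiveMinutes    = c(25, 21, 30, 29, 36),
  LightlyActiveMinutes = c(328, 217, 181, 209, 221),
  SedentaryMinutes     = c(728, 776, 1218, 726, 773)
)

# Pearson correlation of TotalSteps with each activity-minutes column.
minute_cols <- c("VeryActiveMinutes", "LightlyActiveMinutes", "SedentaryMinutes")
cors <- sapply(first_rows[minute_cols],
               function(minutes) cor(first_rows$TotalSteps, minutes))

print(round(cors, 2))
```

On the real data, replacing `first_rows` with `daily_activity_1` would give one coefficient per plot.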
slices <- c(VeryActiveMins,FairlyActiveMins,LightlyActiveMins,SedentaryMins)
lbls <- c("VeryActive","FairlyActive","LightlyActive","Sedentary")
pct <- round(slices/sum(slices)*100)
lbls <- paste(lbls, pct)
lbls <- paste(lbls, "%", sep="")
pie(slices, labels = lbls, col = topo.colors(length(lbls)), main = "Percentage of Activity")
This pie chart shows how much of the time recorded by participants over one month was sedentary compared with the three active categories.
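The percentage labels for the pie chart can also be built from a named vector, which keeps each label tied to its value. This is a sketch with made-up totals chosen to sum to 1,000 so the arithmetic is easy to check. They are not the dataset's real totals.

```r
# Invented minute totals per activity level (illustrative only).
minutes <- c(VeryActive = 20, FairlyActive = 10,
             LightlyActive = 190, Sedentary = 780)

# prop.table() divides each entry by the overall sum and keeps the names.
pct <- round(prop.table(minutes) * 100, 1)

# Build labels such as "Sedentary 78%".
pie_labels <- paste0(names(pct), " ", pct, "%")
print(pie_labels)
```

Keeping one decimal place avoids small categories being rounded to the same whole number.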
ggplot(data = merged_data_2)+
geom_point(mapping= aes(x= TotalMinutesAsleep, y= TotalTimeInBed), color = "blue")+
labs(title = "Time in Bed vs. Minutes Asleep")
This graph shows that, in general, participants spent most of their time in bed asleep.
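One way to quantify the share of time in bed spent asleep is the ratio of minutes asleep to minutes in bed. The sketch below computes it for the six rows shown in `head(sleepDay_merged)` above. The column name `efficiency` is an assumption introduced here.

```r
# Six rows copied from the head(sleepDay_merged) output.
sleep_head <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  TotalTimeInBed     = c(346, 407, 442, 367, 712, 320)
)

# Share of time in bed that was spent asleep (1 means asleep the whole time).
sleep_head$efficiency <- sleep_head$TotalMinutesAsleep / sleep_head$TotalTimeInBed

print(round(sleep_head$efficiency, 2))
print(round(mean(sleep_head$efficiency), 2))
```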
ggplot(data=daily_activity_1)+
geom_point(mapping = aes(x =TotalSteps, y = Calories), color = "purple")+
geom_smooth(mapping = aes(x = TotalSteps, y = Calories)) +
labs(title = "Total Steps vs. Calories")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
This graph shows the positive relationship between total steps and calories burned.
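A simple linear model is one way to put a number on the slope of this relationship. The sketch below fits one to the first five rows shown in the `str()` output above. With so few rows from a single user, the estimate only illustrates the method.

```r
# First five rows of TotalSteps and Calories, copied from the str() output.
first_rows <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669),
  Calories   = c(1985, 1797, 1776, 1745, 1863)
)

# Ordinary least squares: Calories as a linear function of TotalSteps.
fit   <- lm(Calories ~ TotalSteps, data = first_rows)
slope <- unname(coef(fit)["TotalSteps"])

# Estimated extra calories per additional 1,000 steps (for these rows only).
print(slope * 1000)
```

Fitting the same formula to `daily_activity_1` would give the estimate for all participants.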
ggplot(data = merged_data_2)+
geom_bar(mapping = aes(x = Total_hrs_asleep, y = TotalSteps), stat = "summary", fun = "mean", fill = "steelblue")+
labs(title="Mean Total Steps vs. Hours of Sleep", x="Hours Sleep", y="Mean Total Steps")
This graph suggests a positive relationship between hours slept and daily activity.
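To check a relationship between sleep and activity numerically, the mean step count can be tabulated for each value of hours slept. This is a sketch on made-up toy rows, reusing the column names from `merged_data_2`:

```r
# Toy data: hours asleep (rounded) and total steps for five person-days.
toy <- data.frame(
  Total_hrs_asleep = c(6, 6, 7, 7, 8),
  TotalSteps       = c(4000, 6000, 8000, 10000, 9000)
)

# Mean steps for each distinct value of hours asleep.
steps_by_sleep <- aggregate(TotalSteps ~ Total_hrs_asleep, data = toy, FUN = mean)
print(steps_by_sleep)
```

A table like this makes it easy to see whether mean steps actually rise with hours of sleep or level off.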