Bellabeat is a high-tech company that manufactures health-focused smart products. Since 2013, Bellabeat's smart products have been designed to inform and inspire women around the world by collecting data on activity, sleep, stress, and reproductive health.
This analysis examines smart device fitness data to gain insight into how customers use their smart devices. The findings are meant to help guide Bellabeat's future marketing strategy toward the growth opportunities we identify.
Stakeholders
- Urška Sršen: Bellabeat's cofounder and Chief Creative Officer
- Sando Mur: Mathematician, Bellabeat's cofounder, and a key member of the Bellabeat executive team
- Bellabeat marketing analytics team
Products
The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. It connects to Bellabeat's other smart wellness products. In this way, the Bellabeat app helps customers make healthier decisions based on the data it collects.
Our task is to analyze usage data from non-Bellabeat smart devices and use the insights to draw high-level recommendations for Bellabeat's marketing strategy.
This analysis focuses on one question: how can current user trends guide Bellabeat's marketing strategy?
The data source is FitBit Fitness Tracker Data.zip. The folder inside it, Fitabase Data 4.12.16-5.12.16, contains 18 CSV files.
We will install (if needed) and load the R packages used in this analysis. Three of them, "here", "skimr", and "janitor", are used for cleaning data.
# Load libraries (tidyverse already attaches ggplot2 and dplyr, so those two calls are redundant but harmless)
library(tidyverse)
library(lubridate)
library(scales)
library(ggplot2)
library(dplyr)
library(here)
library(skimr)
library(janitor)
We will import the CSV files relevant to this analysis and then view, clean, format, and organize the data.
For each dataset, we look at the first rows, the column names, and the structure of each column.
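The folder holds 18 CSV files, so reading them one by one gets repetitive. Here is a minimal sketch that assumes every file is a comma-separated table with a header row. The hypothetical helper `read_all_csv()` reads every `.csv` in a folder into a named list. The usage example builds a temporary folder with two small invented files instead of pointing at the real `~/Downloads` path.

```r
# Hypothetical helper: read every CSV in a folder into a named list.
# Assumes comma-separated files with a header row.
read_all_csv <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  datasets <- lapply(files, read.csv)
  # Name each element after its file, e.g. "dailySteps_merged"
  names(datasets) <- tools::file_path_sans_ext(basename(files))
  datasets
}

# Usage with a temporary folder standing in for the Fitabase folder (invented data)
demo_dir <- file.path(tempdir(), "fitabase_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Id = c(1, 2), StepTotal = c(100, 200)),
          file.path(demo_dir, "dailySteps_merged.csv"), row.names = FALSE)
write.csv(data.frame(Id = 1, Calories = 1985),
          file.path(demo_dir, "dailyCalories_merged.csv"), row.names = FALSE)

datasets <- read_all_csv(demo_dir)
names(datasets)
```

Each table is then available as, for example, `datasets$dailySteps_merged`. Keeping them in one list makes it easy to apply the same check to every file.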
#Importing Daily Activity dataset:
daily_activity <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
head(daily_activity)
colnames(daily_activity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
str(daily_activity)
## 'data.frame': 940 obs. of 15 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
#Importing Daily Calories dataset:
daily_calories <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
head(daily_calories)
colnames(daily_calories)
## [1] "Id" "ActivityDay" "Calories"
str(daily_calories)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
#Importing Daily Intensities dataset:
daily_intensities <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
head(daily_intensities)
colnames(daily_intensities)
## [1] "Id" "ActivityDay"
## [3] "SedentaryMinutes" "LightlyActiveMinutes"
## [5] "FairlyActiveMinutes" "VeryActiveMinutes"
## [7] "SedentaryActiveDistance" "LightActiveDistance"
## [9] "ModeratelyActiveDistance" "VeryActiveDistance"
str(daily_intensities)
## 'data.frame': 940 obs. of 10 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
#Importing Daily Steps dataset:
daily_steps <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
head(daily_steps)
colnames(daily_steps)
## [1] "Id" "ActivityDay" "StepTotal"
str(daily_steps)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ StepTotal : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
#Importing Heart Rate Seconds dataset:
heart_rate_seconds <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
head(heart_rate_seconds)
colnames(heart_rate_seconds)
## [1] "Id" "Time" "Value"
str(heart_rate_seconds)
## 'data.frame': 2483658 obs. of 3 variables:
## $ Id : num 2.02e+09 2.02e+09 2.02e+09 2.02e+09 2.02e+09 ...
## $ Time : chr "4/12/2016 7:21:00 AM" "4/12/2016 7:21:05 AM" "4/12/2016 7:21:10 AM" "4/12/2016 7:21:20 AM" ...
## $ Value: int 97 102 105 103 101 95 91 93 94 93 ...
#Importing Sleep Day dataset:
sleep_day <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
head(sleep_day)
colnames(sleep_day)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
str(sleep_day)
## 'data.frame': 413 obs. of 5 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : int 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: int 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : int 346 407 442 367 712 320 377 364 384 449 ...
#Importing Weight Log Info dataset:
weight_log <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
head(weight_log)
colnames(weight_log)
## [1] "Id" "Date" "WeightKg" "WeightPounds"
## [5] "Fat" "BMI" "IsManualReport" "LogId"
str(weight_log)
## 'data.frame': 67 obs. of 8 variables:
## $ Id : num 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
## $ Date : chr "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
## $ WeightKg : num 52.6 52.6 133.5 56.7 57.3 ...
## $ WeightPounds : num 116 116 294 125 126 ...
## $ Fat : int 22 NA NA NA NA 25 NA NA NA NA ...
## $ BMI : num 22.6 22.6 47.5 21.5 21.7 ...
## $ IsManualReport: chr "True" "True" "False" "True" ...
## $ LogId : num 1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...
#Importing Hourly Steps dataset:
hourly_steps <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
head(hourly_steps)
colnames(hourly_steps)
## [1] "Id" "ActivityHour" "StepTotal"
str(hourly_steps)
## 'data.frame': 22099 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityHour: chr "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
## $ StepTotal : int 373 160 151 0 0 0 0 0 250 1864 ...
After using functions such as glimpse() and skim_without_charts() to take a quick look at each dataset, we found that both dailyCalories_merged.csv and dailyIntensities_merged.csv contain the same data as the corresponding columns of dailyActivity_merged.csv.
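The claim that one file repeats another's columns can be checked in code rather than by eye. Here is a minimal base R sketch. It assumes the two tables are keyed by Id plus a date column (ActivityDate in one, ActivityDay in the other). `same_daily_values()` is a hypothetical helper, and the toy rows are invented.

```r
# Hypothetical check: do two tables report the same value for every user-day?
same_daily_values <- function(a, b, by_a, by_b, col) {
  # Keep only the join keys and the column being compared
  m <- merge(a[, c(by_a, col)], b[, c(by_b, col)],
             by.x = by_a, by.y = by_b, suffixes = c(".a", ".b"))
  # Every row must find a partner, and the compared values must be identical
  nrow(m) == nrow(a) && nrow(m) == nrow(b) &&
    all(m[[paste0(col, ".a")]] == m[[paste0(col, ".b")]])
}

# Invented toy rows shaped like dailyActivity_merged and dailyCalories_merged
activity <- data.frame(Id = c(1, 1, 2),
                       ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
                       Calories = c(1985, 1797, 2100))
calories <- data.frame(Id = c(1, 1, 2),
                       ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),
                       Calories = c(1985, 1797, 2100))

same_daily_values(activity, calories,
                  by_a = c("Id", "ActivityDate"), by_b = c("Id", "ActivityDay"),
                  col = "Calories")  # TRUE
```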
Although the dataset is described as covering 33 users, we use n_distinct() to verify the number of unique users in each file.
n_distinct(daily_activity$Id)
## [1] 33
n_distinct(weight_log$Id)
## [1] 8
n_distinct(daily_steps$Id)
## [1] 33
n_distinct(heart_rate_seconds$Id)
## [1] 14
n_distinct(sleep_day$Id)
## [1] 24
n_distinct(hourly_steps$Id)
## [1] 33
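The separate n_distinct() calls can be collapsed into one pass over a named list of tables. This base R sketch uses `length(unique())`, which counts the same thing as n_distinct(). The toy tables are invented stand-ins for the real data frames.

```r
# Count distinct users per table in one pass
count_users <- function(tables) {
  sapply(tables, function(d) length(unique(d$Id)))
}

# Invented toy tables standing in for daily_activity, sleep_day and weight_log
tables <- list(
  daily_activity = data.frame(Id = c(1, 1, 2, 3)),
  sleep_day      = data.frame(Id = c(1, 2, 2)),
  weight_log     = data.frame(Id = c(3, 3))
)
count_users(tables)  # daily_activity = 3, sleep_day = 2, weight_log = 1
```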
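After verifying the unique users, we decided to focus on dailyActivity_merged.csv, sleepDay_merged.csv, and hourlySteps_merged.csv in this case study. The weight log (8 users) and the heart-rate data (14 users) cover only a fraction of the 33 participants.
We first remove rows with missing values using drop_na(), then count duplicate rows with duplicated(). Where duplicates exist, we use distinct() with .keep_all = TRUE to remove rows that repeat the selected columns.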
daily_activity <- daily_activity %>%
drop_na()
sleep_day <- sleep_day %>%
drop_na()
hourly_steps <- hourly_steps %>%
drop_na()
sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(sleep_day))
## [1] 3
sum(duplicated(hourly_steps))
## [1] 0
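drop_na() and a duplicate check do different jobs. The first removes rows with missing values; the second finds repeated rows. The toy example below uses invented values, with base R's `complete.cases()` standing in for drop_na(). It shows that an exact duplicate survives the NA filter and has to be removed separately.

```r
# Invented data: row 2 has a missing value, row 4 repeats row 1 exactly
df <- data.frame(Id = c(1, 1, 2, 1),
                 TotalMinutesAsleep = c(327, NA, 412, 327))

# Dropping incomplete rows (like drop_na()) removes only row 2
no_na <- df[complete.cases(df), ]
nrow(no_na)             # 3

# The exact duplicate is still present and must be handled separately
sum(duplicated(no_na))  # 1
deduped <- no_na[!duplicated(no_na), ]
nrow(deduped)           # 2
```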
Next, we remove the duplicate rows found in sleepDay_merged.csv.
sleep_day_2<- sleep_day %>% distinct(SleepDay, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed, .keep_all = TRUE)
sum(duplicated(sleep_day_2))
## [1] 0
str(sleep_day_2)
## 'data.frame': 409 obs. of 5 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : int 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: int 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : int 346 407 442 367 712 320 377 364 384 449 ...
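The distinct() call above keys on SleepDay, TotalSleepRecords, TotalMinutesAsleep and TotalTimeInBed but not on Id. If two different users logged identical values on the same day, they would be merged into one row. That may explain why str() reports 409 rows even though duplicated() flagged only 3 of the original 413 (this assumes drop_na() removed nothing). Two alternatives remove only exact duplicates: include Id in the key, or call distinct() with no column arguments so whole rows are compared. The base R sketch below shows the difference, using duplicated() on a column subset, which behaves like distinct(..., .keep_all = TRUE). The rows are invented.

```r
# Invented rows: user 1 has an exact repeat, user 2 happens to log the same values
sleep <- data.frame(
  Id = c(1, 1, 2),
  SleepDay = rep("4/12/2016 12:00:00 AM", 3),
  TotalMinutesAsleep = c(327, 327, 327)
)
sleep_cols <- c("SleepDay", "TotalMinutesAsleep")

# Deduplicating on the sleep columns only keeps the first row per value combination
without_id <- sleep[!duplicated(sleep[, sleep_cols]), ]
# Adding Id to the key removes only user 1's exact repeat
with_id <- sleep[!duplicated(sleep[, c("Id", sleep_cols)]), ]

nrow(without_id)  # 1: user 2's record is lost
nrow(with_id)     # 2
```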
The column names StepTotal (in hourlySteps_merged.csv) and TotalSteps (in dailyActivity_merged.csv) are too similar to tell apart easily, so we rename StepTotal to hourlyStep. We also rename ActivityDate and SleepDay to Date so that the two daily tables share a join key.
To align the date format of hourlySteps_merged.csv and sleepDay_merged.csv with dailyActivity_merged.csv, we use as.Date() and as.POSIXct(). The daily tables get Date values, while the hourly table keeps a full date-time.
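Parsing dates deserves a quick check before the conversion code. `as.Date()` with a format string parses the date part of a string directly; `strptime()`, which it uses internally, ignores trailing characters such as " 12:00:00 AM". Going through `as.POSIXct()` first and then calling `as.Date()` can shift the date by one day if `as.Date()` evaluates the value in a different time zone (its `tz` argument). The sketch below uses Asia/Shanghai (UTC+8) purely as an example zone.

```r
# Parse the date part directly; trailing time text is ignored
as.Date("4/12/2016 12:00:00 AM", format = "%m/%d/%Y")  # "2016-04-12"
as.Date("4/12/2016", format = "%m/%d/%Y")              # "2016-04-12"

# Local midnight in UTC+8 is 16:00 on the previous day in UTC,
# so converting in UTC moves the date back by one day
midnight_local <- as.POSIXct("2016-04-12 00:00:00", tz = "Asia/Shanghai")
as.Date(midnight_local, tz = "UTC")            # "2016-04-11"
as.Date(midnight_local, tz = "Asia/Shanghai")  # "2016-04-12"
```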
daily_activity <- daily_activity %>%
rename(Date = ActivityDate)
head(daily_activity)
sleep_day_2 <- sleep_day_2 %>%
rename(Date = SleepDay)
head(sleep_day_2)
hourly_steps <- hourly_steps %>%
rename(hourlyStep = StepTotal)
head(hourly_steps)
# Parse the date part directly; as.Date() ignores the trailing " 12:00:00 AM".
# Converting via as.POSIXct() first can shift dates back by one day when
# as.Date() evaluates the value in a different time zone.
sleep_day_2$Date <- as.Date(sleep_day_2$Date, format = "%m/%d/%Y")
head(sleep_day_2)
daily_activity$Date <- as.Date(daily_activity$Date, format = "%m/%d/%Y")
head(daily_activity)
hourly_steps <- hourly_steps %>%
rename(date_time = ActivityHour) %>%
mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))
head(hourly_steps)
We will merge dailyActivity_merged.csv and sleepDay_merged.csv on the primary key of Id and Date to look for correlations between activity and sleep. Because merge() is called without all = TRUE, it performs an inner join: joined_df keeps only the user-days that appear in both tables.
joined_df <- merge(daily_activity,sleep_day_2,by=c("Id","Date"))
glimpse(joined_df)
## Rows: 409
## Columns: 18
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-…
## $ TotalSteps <int> 13162, 10735, 9762, 12669, 9705, 15506, 10544…
## $ TotalDistance <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.14, 2.71, 3.19, 3.53, 1.96, 1.3…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 1.26, 0.41, 0.78, 1.32, 0.48, 0.3…
## $ LightActiveDistance <dbl> 6.06, 4.71, 2.83, 5.04, 2.51, 5.03, 4.24, 4.6…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <int> 25, 21, 29, 36, 38, 50, 28, 19, 41, 39, 73, 3…
## $ FairlyActiveMinutes <int> 13, 19, 34, 10, 20, 31, 12, 8, 21, 5, 14, 23,…
## $ LightlyActiveMinutes <int> 328, 217, 209, 221, 164, 264, 205, 211, 262, …
## $ SedentaryMinutes <int> 728, 776, 726, 773, 539, 775, 818, 838, 732, …
## $ Calories <int> 1985, 1797, 1745, 1863, 1728, 2035, 1786, 177…
## $ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, …
## $ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, …
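The choice of join decides which days survive. `merge()` without `all = TRUE` is an inner join and keeps only Id/Date pairs present in both tables. With `all = TRUE` (the base R counterpart of dplyr's `full_join()`), it keeps every pair from either table and fills the gaps with NA. A small sketch on invented rows:

```r
# Invented rows: user 1 has no sleep record for 2016-04-13
activity <- data.frame(Id = c(1, 1, 2),
                       Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
                       TotalSteps = c(13162, 10735, 9000))
sleep <- data.frame(Id = c(1, 2),
                    Date = as.Date(c("2016-04-12", "2016-04-12")),
                    TotalMinutesAsleep = c(327, 400))

inner <- merge(activity, sleep, by = c("Id", "Date"))              # days with both records
full  <- merge(activity, sleep, by = c("Id", "Date"), all = TRUE)  # every day from either table

nrow(inner)                          # 2
nrow(full)                           # 3
sum(is.na(full$TotalMinutesAsleep))  # 1
```

An inner join suits per-user averages of sleep and activity taken over the same days. A full join is the better fit when days without sleep data should still count toward activity totals.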
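Now it's time to analyze the trends in Fitbit users' activity and identify findings that could help Bellabeat's marketing strategy.
We can classify users by the intensity of their activity, using average daily steps as the measure:

| User type | Average daily steps |
|---|---|
| Sedentary | Fewer than 5,000 |
| Lightly Active | 5,000 to 7,499 |
| Fairly Active | 7,500 to 9,999 |
| Very Active | 10,000 or more |

These step bands follow 10000steps.org.au.
To assign each user a type, we first calculate each user's average daily steps, calories, and minutes asleep from the joined data.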
daily_record<- joined_df %>%
group_by(Id) %>%
summarise(daily_steps = mean(TotalSteps), daily_calories = mean(Calories), daily_sleep = mean(TotalMinutesAsleep))
head(daily_record)
With each user's average daily steps calculated, we can now classify the user type.
user_type <- daily_record %>%
  mutate(user_type = case_when(
    # Conditions are checked in order and leave no gaps, so non-integer
    # averages such as 7499.5 are still classified
    daily_steps < 5000 ~ "Sedentary",
    daily_steps < 7500 ~ "Lightly Active",
    daily_steps < 10000 ~ "Fairly Active",
    daily_steps >= 10000 ~ "Very Active"
  ))
head(user_type)
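daily_steps is an average, so it is usually not a whole number. The step bands therefore have to be contiguous, with no gap between, say, 7499 and 7500. As an alternative to case_when(), base R's `cut()` with `right = FALSE` builds half-open intervals [lower, upper) that cover every value. The helper name `classify_steps()` is hypothetical.

```r
# Hypothetical helper: map average daily steps to the four user types
classify_steps <- function(daily_steps) {
  cut(daily_steps,
      breaks = c(-Inf, 5000, 7500, 10000, Inf),
      labels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"),
      right = FALSE)  # [lower, upper): 7499.6 is Lightly Active, 7500 is Fairly Active
}

steps <- c(4200, 7499.6, 7500, 9999.5, 12000)
as.character(classify_steps(steps))
# "Sedentary" "Lightly Active" "Fairly Active" "Fairly Active" "Very Active"
```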
Next, we calculate the percentage of users in each type.
user_type_percent <- user_type %>%
  count(user_type, name = "total") %>%
  mutate(total_percent = total / sum(total),
         Percent = percent(total_percent)) %>%
  select(user_type, total_percent, Percent)
head(user_type_percent)
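As a quick cross-check of the percentages, base R's `prop.table(table(...))` gives the share of each user type in one line. The type labels below are invented toy values. `sprintf()` is used here in place of `scales::percent()` only to keep the sketch dependency-free.

```r
# Invented user types for four users
user_types <- c("Sedentary", "Lightly Active", "Very Active", "Lightly Active")

shares <- prop.table(table(user_types))  # proportions, sorted by type name
shares
sprintf("%.0f%%", 100 * as.numeric(shares))  # "50%" "25%" "25%"
```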
Bellabeat is a high-tech company whose goal is to inform and inspire women with knowledge about their own health and habits. Accordingly, I would advise Bellabeat to collect data from its own users, including demographic information, for further investigation. This would support marketing strategies focused on specific customer segments in addition to the broader target audience.
As the graphs above show, we can draw the following conclusions:
Based on these findings, we propose the following recommendations.
| Recommendation | Interpretation |
|---|---|
| 1. Activity notification & recommended exercise | We classified users' activity into four types based on their daily steps. Building on this, the Bellabeat app can send a reminder when a user has walked fewer than 8,000 steps in a day. The app can also suggest workouts to help the user reach the daily step goal. |
| 2. Bedtime notification and other sleep resources | Since most users sleep less than 8 hours, we suggest that the Bellabeat app send bedtime notifications with an alarm, along with other resources that help users sleep. |
| 3. Reward mechanism | To encourage users to adopt a healthy lifestyle, we propose a game with redeemable rewards for users who complete a certain amount of exercise within a limited period. The app would award eligible users virtual medals, and a set number of medals could be converted into gift cards for other Bellabeat products. |
| 4. Water-resistant mode | To help users record more activity-tracker data, Bellabeat products with a water-resistant feature would meet users' needs. |
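Recommendations 1 and 2 amount to two simple threshold rules. The sketch below encodes them with the thresholds from the table (8,000 steps and 8 hours of sleep). The function name `notifications_for()` and the message texts are hypothetical and do not describe any existing Bellabeat app API.

```r
# Hypothetical rule set for recommendations 1 and 2
step_goal <- 8000         # reminder threshold from recommendation 1
sleep_goal_min <- 8 * 60  # 8 hours in minutes, from recommendation 2

notifications_for <- function(steps_today, minutes_asleep_last_night) {
  msgs <- character(0)
  if (steps_today < step_goal) {
    short_by <- as.integer(step_goal - steps_today)
    msgs <- c(msgs, sprintf("You are %d steps short of today's goal.", short_by))
  }
  if (minutes_asleep_last_night < sleep_goal_min) {
    msgs <- c(msgs, "Consider an earlier bedtime tonight.")
  }
  msgs
}

notifications_for(6500, 400)  # both a step reminder and a bedtime nudge
notifications_for(9000, 500)  # character(0): no notification needed
```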