3. Process Phase
Upload and import dataset files
The public dataset uploaded by Fitabase on the Kaggle platform has
been identified by the founder and deemed suitable for the analysis.
All the files have unique names, and there are no corrupted files or
duplicates in the dataset. The files are in the wide format.
3.0 Loading Packages/Set up Environment
Installing packages and opening libraries
The following packages will be used for our analysis: ‘tidyverse’,
‘lubridate’, ‘plotly’.
Install the packages. Then import the data, transform and
analyze.
library(tidyverse) The packages under the tidyverse umbrella help
us import, tidy, and interact with the data. We use tidyverse for
tasks such as subsetting, transforming, and visualizing.
library(lubridate) Functions to work with date-times and
time-spans: fast and user-friendly parsing of date-time data, extraction
and updating of components of a date-time (years, months, days, hours,
minutes, and seconds), and manipulation of date-time and time-span
objects.
library(plotly) The Plotly R graphing library makes interactive,
publication-quality graphs.
knitr::opts_chunk$set(echo = TRUE, eval = TRUE)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
options(scipen = 100)
3.1 Upload and import dataset files
I decided to analyze daily and hourly usage habits and to exclude
minute-level activity, heart rate, and weight records from the analysis.
I continued with R to show the analysis process step by step and to
create interactive visualizations for the Share phase.
With the available data I will explore patterns and trends to help
identify potential opportunities for growth in untapped spaces,
imparting informed insights to the marketing strategy and
stakeholders.
Data Organization and verification:
The dataset is a collection of 18 .csv files: 15 in long format and 3
in wide format. The files cover a wide range of information, from
activity metrics to sleep and calories expended. Each of them shows data
related to a different function of the device (calories, sleep records,
heart rate, and steps) in timeframes of seconds, minutes, hours, and
days.
I viewed and manipulated the data in Google Sheets; the size of the
files was acceptable for analysis in a spreadsheet.
All data was checked for relevancy and reviewed for spelling, column
naming, duplicate data, unnecessary decimal places, and blanks (NA).
The subsets were then narrowed to the most viable and relevant ones and
finally uploaded to RStudio for further study and analysis.
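The spreadsheet checks described above can also be reproduced in R. The following is a minimal base-R sketch on a small made-up data frame; the values and the helper name `check_quality` are illustrative assumptions, not part of the original analysis. It counts fully duplicated rows and missing values per column.

```r
# Toy data frame standing in for one of the CSV files (values are made up).
toy <- data.frame(
  Id = c(1, 1, 2, 2, 2),
  ActivityDate = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016", "4/14/2016"),
  TotalSteps = c(1000, 1000, 5000, NA, 7000)
)

# Hypothetical helper: summarise duplicate rows and NA counts for a data frame.
check_quality <- function(df) {
  list(
    duplicate_rows = sum(duplicated(df)),  # rows identical to an earlier row
    na_per_column  = colSums(is.na(df))    # missing values in each column
  )
}

quality <- check_quality(toy)
print(quality)

# Drop exact duplicates; rows with NA are kept so they can be inspected.
toy_clean <- toy[!duplicated(toy), ]
```

Running the same helper on each loaded table would give a quick record of what was removed and why.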
The analysis will focus on the following datasets:
33 IDs:
dailyActivity_merged.csv
(contains information about daily activity, calories, intensities
and steps)
hourlyCalories_merged.csv
hourlyIntensities_merged.csv
dailySteps_merged.csv
hourlySteps_merged.csv
24 IDs:
sleepDay_merged.csv
Due to the small sample sizes, we won’t consider the following data in
the analysis:
Weight (8 users) and Heart Rate (7 users)
## ========= Loading the data sets =============================================
dailyActivity_merged = read.csv("dailyActivity_merged.csv")
dailySteps_merged = read.csv("dailySteps_merged.csv")
hourlyCalories_merged = read.csv("hourlyCalories_merged.csv")
hourlySteps_merged = read.csv("hourlySteps_merged.csv")
hourlyIntensities_merged = read.csv("hourlyIntensities_merged.csv")
sleepDay_merged = read.csv("sleepDay_merged.csv")
The possible problems with the data are:
Small Sample Size/Set- The data does not come from a
Bellabeat sample set. 33 non-Bellabeat users is not an ideal sample size
or participant pool. Many subsets had poor participation, which made
them unsuitable for analysis. Insights gained may not apply to all
Bellabeat users. Further analysis is necessary.
No Metadata Provided- Details on user location,
lifestyle, weather, temperature, humidity etc. are missing. Basic
metadata would provide a richer, deeper understanding of and context
for the data.
Data Collection Duration- Unfortunately, the data
collected is outdated (04-12-2016 to 05-12-2016), and 31 days of data
is limited in supporting high-level recommendations. Seasonality is a
major factor when considering trends, and it heavily affects user
activity and lifestyle choices; e.g. users’ exercise habits differ
between hotter months and cooler months depending on the climate and
geography of the users. Finally, the data is not originally from
Bellabeat, which is a missed opportunity to home in on users’ feedback
and needs firsthand and learn more about the heart and soul of the
Bellabeat user.
Demographics- Key data such as gender, age, and
ethnicity were not identified. This is unfortunate because information
pertinent to Bellabeat’s female-centric user base is excluded. No
insights can be drawn about the nuances of women’s physiology, activity
patterns, and menstrual cycle or lack thereof.
We will use the Fitbit data to analyze usage habits and search for
trends/patterns within the subsets.
3.2 Identify unique IDs to indicate participation levels and
determine data integrity
There are 33 participants in the activity, calories, intensities and
steps data sets, 24 in the sleep data set and only 8 in the weight data
set. 8 participants is too few to support any recommendations or
conclusions based on this data.
n_distinct(dailyActivity_merged$Id)
## [1] 33
n_distinct(dailySteps_merged$Id)
## [1] 33
n_distinct(hourlyCalories_merged$Id)
## [1] 33
n_distinct(hourlyIntensities_merged$Id)
## [1] 33
n_distinct(hourlySteps_merged$Id)
## [1] 33
n_distinct(sleepDay_merged$Id)
## [1] 24
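The repeated calls above can be collapsed into a single pass over a list of tables. This is a small base-R sketch with made-up Ids; the list name `tables` and its contents are assumptions for illustration. It also shows how to list the participants that are missing from one table.

```r
# Toy stand-ins for the loaded tables (made-up Ids).
tables <- list(
  daily = data.frame(Id = c(1, 1, 2, 3)),
  sleep = data.frame(Id = c(1, 1, 2))
)

# Count distinct participant Ids in every table at once.
participants <- sapply(tables, function(df) length(unique(df$Id)))
print(participants)

# Ids present in the daily table but missing from the sleep table.
missing_from_sleep <- setdiff(unique(tables$daily$Id), unique(tables$sleep$Id))
print(missing_from_sleep)
```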
3.3 Formatting date and time columns
Converting the date and time columns from character strings into Date
(daily tables) and POSIXct (hourly and sleep tables) formats.
## ================== fixing date and time for each ============================
# str(dailyActivity_merged)
dailyActivity_merged$ActivityDate = as.Date(dailyActivity_merged$ActivityDate, format = "%m/%d/%Y")
# str(dailySteps_merged)
dailySteps_merged$ActivityDay = as.Date(dailySteps_merged$ActivityDay, format = "%m/%d/%Y")
names(dailySteps_merged)[2] = "ActivityDate"
# str(hourlyCalories_merged)
hourlyCalories_merged$ActivityHour = as.POSIXct(hourlyCalories_merged$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p")
# str(hourlySteps_merged)
hourlySteps_merged$ActivityHour = as.POSIXct(hourlySteps_merged$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p")
# str(hourlyIntensities_merged)
hourlyIntensities_merged$ActivityHour = as.POSIXct(hourlyIntensities_merged$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p")
# str(sleepDay_merged)
sleepDay_merged$SleepDay = as.POSIXct(sleepDay_merged$SleepDay, format = "%m/%d/%Y %I:%M:%S %p")
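A quick sanity check after parsing is worthwhile, because `as.POSIXct()` silently returns `NA` for strings that do not match the format. The sketch below uses made-up timestamps and fixes the time zone to UTC, which is an assumption; the original code relies on the session default.

```r
# Two timestamps in the format used by the hourly files, plus one malformed value.
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM", "2016-04-12 13:00")

# Fixing the time zone (UTC is an assumption) keeps results reproducible.
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Values that do not match the format become NA instead of raising an error.
n_failed <- sum(is.na(parsed))
hours <- as.integer(format(parsed, "%H"))

print(parsed)
print(n_failed)
```

Note that `%p` (AM/PM) parsing depends on the session locale, so the check is most useful when the report is re-run on another machine.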
3.4 Merging Hourly Data: Calories, Intensity &
Steps
## ============ merging hourly data ============================================
MergedHourly = merge(hourlyCalories_merged,hourlyIntensities_merged,c("ActivityHour","Id"))
MergedHourly = merge(MergedHourly,hourlySteps_merged,c("ActivityHour","Id"))
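By default `merge()` performs an inner join, so any Id/hour pair missing from one table is dropped without warning. The following base-R sketch, on made-up tables, compares the inner join with a full outer join (`all = TRUE`) to count how many rows would be lost.

```r
# Toy hourly tables (made-up values); the steps table lacks one Id/hour pair.
calories <- data.frame(Id = c(1, 1, 2), ActivityHour = c(0, 1, 0),
                       Calories = c(60, 80, 70))
steps <- data.frame(Id = c(1, 1), ActivityHour = c(0, 1),
                    StepTotal = c(0, 150))

keys <- c("ActivityHour", "Id")

# Default merge() is an inner join: rows without a match in both tables are dropped.
inner <- merge(calories, steps, by = keys)

# all = TRUE keeps unmatched rows and fills the gaps with NA (full outer join).
outer <- merge(calories, steps, by = keys, all = TRUE)

dropped <- nrow(outer) - nrow(inner)
print(dropped)
```

If `dropped` is zero for the real tables, the inner join used above loses nothing.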
3.5 Merging Daily Data: Calories, Intensity &
Steps
## =========== merging daily data ==============================================
MergedDaily = merge(dailyActivity_merged,dailySteps_merged, c("ActivityDate","Id"))
4. Analyze/Share
4.1 Analyze Phase and Share Phase
Here trends will be analyzed and visualized with interactive charts
and legends by extracting Fitbit user activity and applying the findings
to the original task questions.
Usage Distribution
Here we will ascertain how often the participants use their device.
With the daily activity data, we can assume that days with < 200 total
steps are days where users have not worn their device. Such days are
not removed in the code below, which counts every recorded day per user
and assigns the following designations:
Low Use - up to 20 days
Medium Use - 21 to 30 days
High Use - 31 days
Quantifying the analysis in this way will help to see the different
trends underlying the usage groups.
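One way to apply the 200-step threshold mentioned above before counting days is sketched here in base R. The data frame is made up, and the names `min_steps`, `worn` and `days_used` are assumptions for illustration only.

```r
# Toy daily records (made-up values) for two users.
daily <- data.frame(
  Id = c(1, 1, 1, 2, 2),
  TotalSteps = c(8000, 150, 9500, 0, 4000)
)

# Assumption: fewer than 200 steps means the device was not worn that day.
min_steps <- 200
worn <- daily[daily$TotalSteps >= min_steps, ]

# Number of days of actual use per user.
days_used <- aggregate(TotalSteps ~ Id, data = worn, FUN = length)
names(days_used)[2] <- "DaysUsed"
print(days_used)
```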
## ============ Analysis =======================================================
## =============== Usage =======================================================
DailyCount = MergedDaily %>%
group_by(Id) %>%
summarise(Count = n())%>%
arrange(desc(Count))
bins = c(-Inf,20, 30, Inf)
bin_names = c("Low", "Medium", "High")
DailyCount$Freq <- cut(DailyCount$Count, breaks = bins, labels = bin_names)
# summary(DailyCount$Freq)
DailyCount %>%
group_by(Freq)%>%
summarise(Count = n())%>%
plot_ly(type='pie', labels=~Freq , values=~Count,
textinfo='label+percent',
insidetextorientation='radial') %>%
layout(title = 'Usage Distribution of the Users')
Distribution of Usage
00 to 20 ========= Low
21 to 30 ========= Medium
31 ========= High
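How the boundary values 20 and 30 are assigned depends on the `right` argument of `cut()`. The sketch below uses the same breaks and labels as the code above on a few illustrative counts and compares the default (`right = TRUE`) with `right = FALSE`.

```r
bins <- c(-Inf, 20, 30, Inf)
bin_names <- c("Low", "Medium", "High")
counts <- c(19, 20, 29, 30, 31)

# Default right = TRUE: intervals are (-Inf, 20], (20, 30], (30, Inf).
right_closed <- cut(counts, breaks = bins, labels = bin_names)

# right = FALSE: intervals are [-Inf, 20), [20, 30), [30, Inf).
left_closed <- cut(counts, breaks = bins, labels = bin_names, right = FALSE)

print(data.frame(counts, right_closed, left_closed))
```

With the default, a user with exactly 20 recorded days is "Low" and one with exactly 30 is "Medium"; with `right = FALSE` they become "Medium" and "High".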
Steps
## =========== Steps ===========================================================
StepsData = MergedDaily %>%
group_by(Id)%>%
summarise(StepsSum = sum(TotalSteps), StepsAverage = mean(TotalSteps))
StepsData = merge(StepsData,DailyCount[,c("Id","Freq")], c("Id"))
plot_ly(StepsData,y = ~StepsSum, type = "box", name =~Freq )%>%
layout(title = 'Sum of Steps with respect to Usage',
xaxis = list(title = ""),
yaxis = list (title = "Sum of Steps"),
showlegend = TRUE
)
plot_ly(StepsData,y = ~StepsAverage, type = "box", name =~Freq ) %>%
layout(title = 'Average of Steps with respect to Usage',
xaxis = list(title = ""),
yaxis = list (title = "Average of Steps"),
showlegend = TRUE
)
DayWiseSteps = MergedDaily
DayWiseSteps$Day = weekdays(DayWiseSteps$ActivityDate)
DayWiseSteps = DayWiseSteps %>%
group_by(Day)%>%
summarise(StepsSum = sum(TotalSteps), StepsAverage = mean(TotalSteps))
plot_ly(DayWiseSteps,x = ~Day, y = ~StepsSum,type = "bar",text =~StepsSum, textposition = 'auto') %>%
layout(title = 'Sum of Steps with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Sum of Steps")
)
Average of Steps with respect to Day
Since we don’t have any demographic variables from our sample set, we
want to determine the type of users with the data we have. We can
classify the users by activity level based on their daily number of
steps, as follows:
Sedentary (non-active)- Less than 5000 steps a day
Lightly active- Between 5000 and 7499 steps a day
Fairly active- Between 7500 and 8999 steps a day
Very active- 9000 or more steps a day
The classification follows this article: https://www.10000steps.org.au/articles/counting-steps/
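The categories above can be assigned with `cut()`. This is a minimal sketch: the helper name `classify_steps` is hypothetical, the example values are made up, and treating exactly 9000 steps as "Very active" is an assumption.

```r
# Hypothetical helper that maps average daily steps to the categories listed above.
classify_steps <- function(avg_steps) {
  cut(
    avg_steps,
    breaks = c(-Inf, 5000, 7500, 9000, Inf),
    labels = c("Sedentary", "Lightly active", "Fairly active", "Very active"),
    right = FALSE  # lower bound inclusive, so 5000 counts as "Lightly active"
  )
}

example_steps <- c(3200, 5000, 7499, 8999, 12000)
print(data.frame(example_steps, category = classify_steps(example_steps)))
```

Applied to a per-user `StepsAverage` column, this would give the share of users in each activity class.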
DayWiseSteps$StepsAverage = round(DayWiseSteps$StepsAverage )
plot_ly(DayWiseSteps,x = ~Day, y = ~StepsAverage,type = "bar",text =~StepsAverage, textposition = 'auto') %>%
layout(title = 'Average of Steps with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average of Steps")
)
Distance
## =========== distance ========================================================
DistanceData = MergedDaily %>%
group_by(Id)%>%
summarise(DistanceSum = sum(TotalDistance), DistanceAverage = mean(TotalDistance))
DistanceData = merge(DistanceData,DailyCount[,c("Id","Freq")], c("Id"))
plot_ly(DistanceData,y = ~DistanceSum, type = "box", name =~Freq )%>%
layout(title = 'Sum of Distance with respect to Usage',
xaxis = list(title = ""),
yaxis = list (title = "Sum of Distance"),
showlegend = TRUE
)
plot_ly(DistanceData,y = ~DistanceAverage, type = "box", name =~Freq ) %>%
layout(title = 'Average of Distance with respect to Usage',
xaxis = list(title = ""),
yaxis = list (title = "Average of Distance"),
showlegend = TRUE
)
DayWiseDistance = MergedDaily
DayWiseDistance$Day = weekdays(DayWiseDistance$ActivityDate)
DayWiseDistance = DayWiseDistance %>%
group_by(Day)%>%
summarise(DistanceSum = sum(TotalDistance), DistanceAverage = mean(TotalDistance))
plot_ly(DayWiseDistance,x = ~Day, y = ~DistanceSum,type = "bar",text =~DistanceSum, textposition = 'auto') %>%
layout(title = 'Sum of Distance with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Sum of Distance")
)
DayWiseDistance$DistanceAverage = round(DayWiseDistance$DistanceAverage ,2)
plot_ly(DayWiseDistance,x = ~Day, y = ~DistanceAverage,type = "bar",text =~DistanceAverage, textposition = 'auto') %>%
layout(title = 'Average Distance with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average Distance")
)
Calories
The ‘High Use’ group has the highest median calories burnt as well as
a noticeably higher Q3, at slightly under 3000 calories burnt. The
longer upper whisker of this group shows a wide spread, an indicator of
the wide range of activities/exercise that users in this group take part
in. It also suggests that users in this group are generally active, but
some are taking part in very high-intensity physical activities.
## =========== calories ========================================================
CaloriesData = MergedDaily %>%
group_by(Id)%>%
summarise(CaloriesSum = sum(Calories), CaloriesAverage = mean(Calories))
CaloriesData = merge(CaloriesData,DailyCount[,c("Id","Freq")], c("Id"))
plot_ly(CaloriesData,y = ~CaloriesSum, type = "box", name =~Freq )%>%
layout(title = 'Sum of Calories with respect to Usage',
xaxis = list(title = ""),
yaxis = list (title = "Sum of Calories"),
showlegend = TRUE
)
plot_ly(CaloriesData,y = ~CaloriesAverage, type = "box", name =~Freq ) %>%
layout(title = 'Average of Calories with respect to Usage',
xaxis = list(title = ""),
yaxis = list (title = "Average of Calories"),
showlegend = TRUE
)
DayWiseCalories = MergedDaily
DayWiseCalories$Day = weekdays(DayWiseCalories$ActivityDate)
DayWiseCalories = DayWiseCalories %>%
group_by(Day)%>%
summarise(CaloriesSum = sum(Calories), CaloriesAverage = mean(Calories))
plot_ly(DayWiseCalories,x = ~Day, y = ~CaloriesSum,type = "bar",text =~CaloriesSum, textposition = 'auto') %>%
layout(title = 'Sum of Calories with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Sum of Calories")
)
DayWiseCalories$CaloriesAverage = round(DayWiseCalories$CaloriesAverage ,2)
plot_ly(DayWiseCalories,x = ~Day, y = ~CaloriesAverage,type = "bar",text =~CaloriesAverage, textposition = 'auto') %>%
layout(title = 'Average Calories with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average Calories")
)
## ============== hourly analysis ==============================================
## extracting Day out of the time
MergedHourly$Day = weekdays(MergedHourly$ActivityHour)
## extracting Hour out of the time
MergedHourly$Hour = hour(MergedHourly$ActivityHour)
### ============ calories ======================================================
plot_ly(MergedHourly,y = ~Calories, type = "box", name =~Day ) %>%
layout(title = 'Calories with respect to Day',
xaxis = list(title = ""),
yaxis = list (title = "Calories"),
showlegend = TRUE
)
HourlyCalories = MergedHourly
HourlyCalories = HourlyCalories %>%
group_by(Day)%>%
summarise(CaloriesSum = sum(Calories), CaloriesAverage = mean(Calories))
plot_ly(HourlyCalories,x = ~Day, y = ~CaloriesSum,type = "bar",text =~CaloriesSum, textposition = 'auto') %>%
layout(title = 'Sum of Calories with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Sum of Calories")
)
Average calories burned are approximately 2,200 - 2,300 kcal per day.
This is in line with the average number of calories a person burns in a
day according to:
https://www.healthline.com/health/fitness-exercise/how-many-calories-do-i-burn-a-day.
However, it varies from person to person, depending on factors such as
age, sex, height, weight, and activity level.
Calories burned do not differ strikingly between days of the week. This
might be because users did not spend much more time being very active
on any specific day.
HourlyCalories$CaloriesAverage = round(HourlyCalories$CaloriesAverage ,2)
plot_ly(HourlyCalories,x = ~Day, y = ~CaloriesAverage,type = "bar",text =~CaloriesAverage, textposition = 'auto') %>%
layout(title = 'Average Calories with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average Calories")
)
plot_ly(MergedHourly,y = ~Calories, type = "box", name =~Hour ) %>%
layout(title = 'Calories with respect to Hour',
xaxis = list(title = ""),
yaxis = list (title = "Calories"),
showlegend = TRUE
)
HourlyCaloriesHour = MergedHourly
HourlyCaloriesHour = HourlyCaloriesHour %>%
group_by(Hour)%>%
summarise(CaloriesSum = sum(Calories), CaloriesAverage = mean(Calories))
# HourlyCaloriesHour$Hour = as.factor(HourlyCaloriesHour$Hour)
plot_ly(HourlyCaloriesHour,x = ~Hour, y = ~CaloriesSum,type = "bar",text =~CaloriesSum, textposition = 'auto') %>%
layout(title = 'Sum of Calories with respect to Hour',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Sum of Calories")
)
The highest number of calories was burned during 17:00 - 19:00.
The lowest number of calories was burned during 00:00 - 04:00. The
trends are similar to the “Steps vs. Hours” and “Total Intensity vs.
Hours” charts. However, on closer inspection, the number of calories
burned does not contrast sharply between waking hours and sleeping
hours, because calories are burned even while inactive.
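The peak and trough hours can be read off programmatically rather than from the chart. The sketch below uses base R on made-up hourly values; the data frame and the variable names are assumptions for illustration. The ratio of the busiest to the quietest hour gives a rough measure of how much of the burn is resting burn.

```r
# Toy hourly records (made-up values) for two users over four hours of the day.
hourly <- data.frame(
  Id = rep(c(1, 2), each = 4),
  Hour = rep(c(3, 12, 18, 23), times = 2),
  Calories = c(60, 110, 150, 70, 65, 100, 160, 75)
)

# Mean calories per hour of day across users.
by_hour <- aggregate(Calories ~ Hour, data = hourly, FUN = mean)

peak_hour <- by_hour$Hour[which.max(by_hour$Calories)]
low_hour  <- by_hour$Hour[which.min(by_hour$Calories)]

# Ratio of the busiest to the quietest hour; a small ratio reflects resting burn.
ratio <- max(by_hour$Calories) / min(by_hour$Calories)
print(by_hour)
```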
HourlyCaloriesHour$CaloriesAverage = round(HourlyCaloriesHour$CaloriesAverage ,2)
plot_ly(HourlyCaloriesHour,x = ~Hour, y = ~CaloriesAverage,type = "bar",text =~CaloriesAverage, textposition = 'auto') %>%
layout(title = 'Average Calories with respect to Hour',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average Calories")
)
Total Intensity
### ============ TotalIntensity ================================================
# names(MergedHourly)
plot_ly(MergedHourly,y = ~TotalIntensity, type = "box", name =~Day ) %>%
layout(title = 'Total Intensity with respect to Day',
xaxis = list(title = ""),
yaxis = list (title = "TotalIntensity"),
showlegend = TRUE
)
HourlyTotalIntensity = MergedHourly
HourlyTotalIntensity = HourlyTotalIntensity %>%
group_by(Day)%>%
summarise(TotalIntensitySum = sum(TotalIntensity), TotalIntensityAverage = mean(TotalIntensity))
plot_ly(HourlyTotalIntensity,x = ~Day, y = ~TotalIntensitySum,type = "bar",text =~TotalIntensitySum, textposition = 'auto') %>%
layout(title = 'Sum of Total Intensity with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Sum of TotalIntensity")
)
Tuesday and Saturday have the highest total intensity. Sunday has the
lowest total intensity; this might be the day when many users took a
rest, similar to the chart “Steps vs. Days”.
HourlyTotalIntensity$TotalIntensityAverage = round(HourlyTotalIntensity$TotalIntensityAverage ,2)
plot_ly(HourlyTotalIntensity,x = ~Day, y = ~TotalIntensityAverage,type = "bar",text =~TotalIntensityAverage, textposition = 'auto') %>%
layout(title = 'Average Total Intensity with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average TotalIntensity")
)
plot_ly(MergedHourly,y = ~TotalIntensity, type = "box", name =~Hour ) %>%
layout(title = 'Total Intensity with respect to Hour',
xaxis = list(title = ""),
yaxis = list (title = "TotalIntensity"),
showlegend = TRUE
)
HourlyTotalIntensityHour = MergedHourly
HourlyTotalIntensityHour = HourlyTotalIntensityHour %>%
group_by(Hour)%>%
summarise(TotalIntensitySum = sum(TotalIntensity), TotalIntensityAverage = mean(TotalIntensity))
# HourlyTotalIntensityHour$Hour = as.factor(HourlyTotalIntensityHour$Hour)
plot_ly(HourlyTotalIntensityHour,x = ~Hour, y = ~TotalIntensitySum,type = "bar",text =~TotalIntensitySum, textposition = 'auto') %>%
layout(title = 'Sum of Total Intensity with respect to Hour',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Sum of TotalIntensity")
)
High activity intensity occurred from 17:00 to 19:00 (5pm - 7pm),
followed by activity between 12:00 and 14:00 (12pm - 2pm), while low
activity intensity occurred from 00:00 to 04:00 (12am - 4am). The trends
are similar to the chart “Steps vs. Hours”.
HourlyTotalIntensityHour$TotalIntensityAverage = round(HourlyTotalIntensityHour$TotalIntensityAverage ,2)
plot_ly(HourlyTotalIntensityHour,x = ~Hour, y = ~TotalIntensityAverage,type = "bar",text =~TotalIntensityAverage, textposition = 'auto') %>%
layout(title = 'Average Total Intensity with respect to Hour',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average TotalIntensity")
)
Total Steps
Tuesday and Saturday have the highest number of steps, at about 8,900
steps. Sunday has the lowest number of steps; this might be the day
when many users took a rest.
The average number of steps users took per day is 8,319, which is less
than the recommended amount of 10,000 according to
Counting Your Steps - 10000 Steps:
https://www.10000steps.org.au/articles/healthy-lifestyles/counting-steps/
It is recommended that adults walk 10,000 steps a day. However, there
is no data about users’ age or health/physical condition, so we assume
that they are all adults. More data (e.g. gender, age, physical
limitations) is needed to make more conclusive recommendations.
### ============ StepTotal ================================================
# names(MergedHourly)
plot_ly(MergedHourly,y = ~StepTotal, type = "box", name =~Day ) %>%
layout(title = 'Total Steps with respect to Day',
xaxis = list(title = ""),
yaxis = list (title = "StepTotal"),
showlegend = TRUE
)
HourlyStepTotal = MergedHourly
HourlyStepTotal = HourlyStepTotal %>%
group_by(Day)%>%
summarise(StepTotalSum = sum(StepTotal), StepTotalAverage = mean(StepTotal))
plot_ly(HourlyStepTotal,x = ~Day, y = ~StepTotalSum,type = "bar",text =~StepTotalSum, textposition = 'auto') %>%
layout(title = 'Sum of Total Steps with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Sum of StepTotal")
)
HourlyStepTotal$StepTotalAverage = round(HourlyStepTotal$StepTotalAverage ,2)
plot_ly(HourlyStepTotal,x = ~Day, y = ~StepTotalAverage,type = "bar",text =~StepTotalAverage, textposition = 'auto') %>%
layout(title = 'Average Total Steps with respect to Day',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average StepTotal")
)
plot_ly(MergedHourly,y = ~StepTotal, type = "box", name =~Hour ) %>%
layout(title = 'Total Steps with respect to Hour',
xaxis = list(title = ""),
yaxis = list (title = "StepTotal"),
showlegend = TRUE
)
A high number of steps occurred in the evening during 17:00 - 19:00.
This is usually the time after work. The number of steps is extremely
low during the hours of 00:00 - 05:00, when most people are asleep.
HourlyStepTotalHour = MergedHourly
HourlyStepTotalHour = HourlyStepTotalHour %>%
group_by(Hour)%>%
summarise(StepTotalSum = sum(StepTotal), StepTotalAverage = mean(StepTotal))
# HourlyStepTotalHour$Hour = as.factor(HourlyStepTotalHour$Hour)
plot_ly(HourlyStepTotalHour,x = ~Hour, y = ~StepTotalSum,type = "bar",text =~StepTotalSum, textposition = 'auto') %>%
layout(title = 'Sum of Total Steps with respect to Hour',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Sum of StepTotal")
)
HourlyStepTotalHour$StepTotalAverage = round(HourlyStepTotalHour$StepTotalAverage ,2)
plot_ly(HourlyStepTotalHour,x = ~Hour, y = ~StepTotalAverage,type = "bar",text =~StepTotalAverage, textposition = 'auto') %>%
layout(title = 'Average Total Steps with respect to Hour',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average StepTotal")
)
Linear Models Hourly Data
# MergedHourly%>% names()
ModelIntensity = lm(Calories ~ TotalIntensity, data = MergedHourly)
summary(ModelIntensity)
##
## Call:
## lm(formula = Calories ~ TotalIntensity, data = MergedHourly)
##
## Residuals:
## Min 1Q Median 3Q Max
## -204.98 -15.30 -1.39 17.61 418.03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 66.390525 0.208088 319 <0.0000000000000002 ***
## TotalIntensity 2.575435 0.008556 301 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 26.88 on 22097 degrees of freedom
## Multiple R-squared: 0.8039, Adjusted R-squared: 0.8039
## F-statistic: 9.06e+04 on 1 and 22097 DF, p-value: < 0.00000000000000022
MergedHourly %>%
plot_ly(x = ~TotalIntensity ) %>%
add_markers(y = ~Calories,name = "") %>%
add_lines(x = ~TotalIntensity, y = fitted(ModelIntensity),name = "Linear Regression")
ModelStep = lm(Calories ~ StepTotal, data = MergedHourly)
summary(ModelStep)
##
## Call:
## lm(formula = Calories ~ StepTotal, data = MergedHourly)
##
## Residuals:
## Min 1Q Median 3Q Max
## -343.64 -18.44 -3.44 11.56 594.56
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 74.4446790 0.2608497 285.4 <0.0000000000000002 ***
## StepTotal 0.0716568 0.0003428 209.1 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 35.18 on 22097 degrees of freedom
## Multiple R-squared: 0.6642, Adjusted R-squared: 0.6642
## F-statistic: 4.37e+04 on 1 and 22097 DF, p-value: < 0.00000000000000022
MergedHourly %>%
plot_ly(x = ~StepTotal ) %>%
add_markers(y = ~Calories,name = "") %>%
add_lines(x = ~StepTotal, y = fitted(ModelStep),name = "Linear Regression")
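To make the two fitted lines concrete, the coefficients reported in the summaries above can be plugged into the linear equation by hand. In this sketch the coefficient values are copied from the output; the helper `predict_calories` and the example inputs (intensity 20, 1000 steps) are illustrative assumptions.

```r
# Coefficients copied from the two model summaries above.
intensity_coef <- c(intercept = 66.390525, slope = 2.575435)
steps_coef     <- c(intercept = 74.4446790, slope = 0.0716568)

# Predicted hourly calories for a simple linear model: intercept + slope * x.
predict_calories <- function(coef, x) {
  unname(coef["intercept"] + coef["slope"] * x)
}

# Illustrative inputs: an hour with intensity 20 and an hour with 1000 steps.
cal_at_intensity_20 <- predict_calories(intensity_coef, 20)
cal_at_1000_steps   <- predict_calories(steps_coef, 1000)

print(round(cal_at_intensity_20, 1))
print(round(cal_at_1000_steps, 1))
```

The intercepts (about 66 and 74 kcal) are the estimated calories for an hour with no recorded activity. The higher R-squared of the intensity model (0.8039 against 0.6642) indicates that intensity explains hourly calories better than step count does.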
Sleep Data
## ================= sleep data ================================================
sleepDay_merged %>% names()
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
## extracting Day out of the Date
sleepDay_merged$Day = weekdays(sleepDay_merged$SleepDay)
Total Minutes Asleep
## ==================== TotalMinutesAsleep ===========================
plot_ly(sleepDay_merged,y = ~TotalMinutesAsleep, type = "box", name =~Day ) %>%
layout(title = 'Total Minutes Asleep with respect to Day',
xaxis = list(title = ""),
yaxis = list (title = "Total Minutes Asleep"),
showlegend = TRUE
)
sleepDay_Day = sleepDay_merged
sleepDay_Day = sleepDay_Day %>%
group_by(Day)%>%
summarise(TotalMinutesAsleepSum = sum(TotalMinutesAsleep), TotalMinutesAsleepAverage = mean(TotalMinutesAsleep))
sleepDay_Day$TotalMinutesAsleepAverage = round(sleepDay_Day$TotalMinutesAsleepAverage ,2)
plot_ly(sleepDay_Day,x = ~Day, y = ~TotalMinutesAsleepAverage,type = "bar",text =~TotalMinutesAsleepAverage, textposition = 'auto') %>%
layout(title = 'Average Total Minutes Asleep with respect to Days',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average Minutes Asleep")
)
Total Time In Bed
Users spent more time on Sunday sleeping and relaxing in bed before
the work week begins.
Average total time in bed is 0.6 hours (36 minutes) more than average
time asleep. It is possible that users took about half an hour to fall
asleep, which is a bit longer than the 10-20 minutes most healthy adults
take to fall asleep, as suggested by the National Sleep Foundation.
Users might also spend time in bed reading or checking phones before
sleep or after waking up.
## ==================== TotalTimeInBed ===============================
plot_ly(sleepDay_merged,y = ~TotalTimeInBed, type = "box", name =~Day ) %>%
layout(title = 'Total Time In Bed with respect to Day',
xaxis = list(title = ""),
yaxis = list (title = "Total Time In Bed"),
showlegend = TRUE
)
Thursday corresponds with the lowest amount of sleep, while Sunday
corresponds with the highest. The variability of sleep is also highest
on the weekend, understandably, as these are non-working rest days. It
is worth noting that Wednesday has the highest average sleep among the
weekdays. Perhaps participants are catching up on sleep after a busy
start to the work week.
sleepDay_DayBed = sleepDay_merged
sleepDay_DayBed = sleepDay_DayBed %>%
group_by(Day)%>%
summarise(TotalTimeInBedSum = sum(TotalTimeInBed), TotalTimeInBedAverage = mean(TotalTimeInBed))
sleepDay_DayBed$TotalTimeInBedAverage = round(sleepDay_DayBed$TotalTimeInBedAverage ,2)
plot_ly(sleepDay_DayBed,x = ~Day, y = ~TotalTimeInBedAverage,type = "bar",text =~TotalTimeInBedAverage, textposition = 'auto') %>%
layout(title = 'Average Total Time In Bed with respect to Days',
xaxis = list(title = "",categoryorder = "total descending"),
yaxis = list (title = "Average Minutes In Bed")
)
Linear Model of Sleep Data
## ===================== Linear Model Sleep =====================================
ModelSleep = lm(TotalMinutesAsleep ~ TotalTimeInBed, data = sleepDay_merged)
summary(ModelSleep)
##
## Call:
## lm(formula = TotalMinutesAsleep ~ TotalTimeInBed, data = sleepDay_merged)
##
## Residuals:
## Min 1Q Median 3Q Max
## -264.688 -5.565 10.953 21.793 61.033
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 22.12446 8.00817 2.763 0.00599 **
## TotalTimeInBed 0.86635 0.01683 51.483 < 0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 43.41 on 411 degrees of freedom
## Multiple R-squared: 0.8658, Adjusted R-squared: 0.8654
## F-statistic: 2650 on 1 and 411 DF, p-value: < 0.00000000000000022
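The fitted coefficients can be read as follows: each additional minute in bed is associated with about 0.87 additional minutes asleep. The sketch below plugs the reported values into the model equation; the input of 480 minutes (8 hours) in bed is an illustrative assumption.

```r
# Coefficients copied from summary(ModelSleep) above.
intercept <- 22.12446
slope     <- 0.86635
r_squared <- 0.8658

# Predicted minutes asleep for 8 hours (480 minutes) in bed; the input is illustrative.
time_in_bed <- 480
predicted_asleep <- intercept + slope * time_in_bed
predicted_awake  <- time_in_bed - predicted_asleep

# For a one-predictor model with a positive slope, correlation = sqrt(R^2).
correlation <- sqrt(r_squared)

print(round(c(predicted_asleep, predicted_awake, correlation), 2))
```

Under this model, 8 hours in bed corresponds to roughly 438 minutes asleep and about 42 minutes awake in bed.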
Sleep Distribution
Plot of the distribution of sleep for all participants based on the
number of hours of sleep and time in bed recommended by the National
Sleep Foundation
https://www.sleepfoundation.org/how-sleep-works/how-much-sleep-do-we-really-need:
Below Recommended - < 6 hours of sleep
Fairly Recommended - between 6 and 7 hours of sleep
Recommended - between 7 and 9 hours of sleep
Above Recommended - > 9 hours of sleep
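The four categories can be assigned from `TotalMinutesAsleep` with a small helper. This is a sketch only: the function name `classify_sleep` is hypothetical, the example values are made up, and the handling of the exact boundaries (6, 7 and 9 hours) is an assumption.

```r
# Hypothetical helper: map minutes asleep to the four categories listed above.
# Boundary handling (exactly 6, 7 or 9 hours) is an assumption.
classify_sleep <- function(minutes_asleep) {
  hours <- minutes_asleep / 60
  ifelse(hours < 6, "Below Recommended",
  ifelse(hours < 7, "Fairly Recommended",
  ifelse(hours <= 9, "Recommended", "Above Recommended")))
}

example_minutes <- c(300, 390, 480, 540, 600)
categories <- classify_sleep(example_minutes)
print(table(categories))
```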
The relationship between Total Minutes Asleep and Total Time in Bed
looks linear. So if Bellabeat users want to improve their sleep,
Bellabeat app designers should strongly consider using notification
alerts to notify users of the optimum hour to begin rest/sleep.
Participants’ total minutes asleep roughly follow a normal
distribution, with the majority sleeping:
- 320-530 minutes, or about 5.3-8.8 hours.
There are more participants who receive ‘Below Recommended’ amounts
of sleep than those who receive ‘Above Recommended’ amounts of
sleep according to the National Sleep Foundation.
sleepDay_merged %>%
plot_ly(x = ~TotalTimeInBed ) %>%
add_markers(y = ~TotalMinutesAsleep,name = "") %>%
add_lines(x = ~TotalTimeInBed, y = fitted(ModelSleep),name = "Linear Regression")
Code Summaries
On average, Fitbit participants log 1 sleep record per day and sleep
about 7 hours (mean 419.5 minutes).
The majority of the Fitbit participants are lightly
active.
Average sedentary time is 991 minutes, or about 16.5 hours, per day.
This isn’t ideal and may be a sign that users aren’t motivated to use
the product and/or exercise.
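The hour figures quoted here follow from the means in the summary() outputs that follow. A minimal sketch of the conversion, with the means typed in by hand (the helper `to_hours` is a hypothetical name):

```r
# Means copied from the summary() outputs in this section
mean_sedentary_min <- 991.2   # SedentaryMinutes in MergedDaily
mean_asleep_min    <- 419.5   # TotalMinutesAsleep in sleepDay_merged

# Hypothetical helper: convert minutes to hours
to_hours <- function(minutes) minutes / 60

sedentary_hours <- to_hours(mean_sedentary_min)
asleep_hours    <- to_hours(mean_asleep_min)
sedentary_share <- mean_sedentary_min / 1440   # 1440 minutes in a day

print(round(sedentary_hours, 1))        # 16.5
print(round(asleep_hours, 1))           # 7
print(round(100 * sedentary_share))     # 69 (percent of a 24-hour day)
```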
summary(MergedHourly)
## ActivityHour Id Calories
## Min. :2016-04-12 00:00:00.00 Min. :1503960366 Min. : 42.00
## 1st Qu.:2016-04-19 01:00:00.00 1st Qu.:2320127002 1st Qu.: 63.00
## Median :2016-04-26 06:00:00.00 Median :4445114986 Median : 83.00
## Mean :2016-04-26 11:46:42.58 Mean :4848235270 Mean : 97.39
## 3rd Qu.:2016-05-03 19:00:00.00 3rd Qu.:6962181067 3rd Qu.:108.00
## Max. :2016-05-12 15:00:00.00 Max. :8877689391 Max. :948.00
## TotalIntensity AverageIntensity StepTotal Day
## Min. : 0.00 Min. :0.0000 Min. : 0.0 Length:22099
## 1st Qu.: 0.00 1st Qu.:0.0000 1st Qu.: 0.0 Class :character
## Median : 3.00 Median :0.0500 Median : 40.0 Mode :character
## Mean : 12.04 Mean :0.2006 Mean : 320.2
## 3rd Qu.: 16.00 3rd Qu.:0.2667 3rd Qu.: 357.0
## Max. :180.00 Max. :3.0000 Max. :10554.0
## Hour
## Min. : 0.00
## 1st Qu.: 5.00
## Median :11.00
## Mean :11.42
## 3rd Qu.:17.00
## Max. :23.00
summary(MergedDaily)
## ActivityDate Id TotalSteps TotalDistance
## Min. :2016-04-12 Min. :1503960366 Min. : 0 Min. : 0.000
## 1st Qu.:2016-04-19 1st Qu.:2320127002 1st Qu.: 3790 1st Qu.: 2.620
## Median :2016-04-26 Median :4445114986 Median : 7406 Median : 5.245
## Mean :2016-04-26 Mean :4855407369 Mean : 7638 Mean : 5.490
## 3rd Qu.:2016-05-04 3rd Qu.:6962181067 3rd Qu.:10727 3rd Qu.: 7.713
## Max. :2016-05-12 Max. :8877689391 Max. :36019 Max. :28.030
## TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 2.620 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 5.245 Median :0.0000 Median : 0.210
## Mean : 5.475 Mean :0.1082 Mean : 1.503
## 3rd Qu.: 7.710 3rd Qu.:0.0000 3rd Qu.: 2.053
## Max. :28.030 Max. :4.9421 Max. :21.920
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.945 1st Qu.:0.000000
## Median :0.2400 Median : 3.365 Median :0.000000
## Mean :0.5675 Mean : 3.341 Mean :0.001606
## 3rd Qu.:0.8000 3rd Qu.: 4.782 3rd Qu.:0.000000
## Max. :6.4800 Max. :10.710 Max. :0.110000
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
## Calories StepTotal
## Min. : 0 Min. : 0
## 1st Qu.:1828 1st Qu.: 3790
## Median :2134 Median : 7406
## Mean :2304 Mean : 7638
## 3rd Qu.:2793 3rd Qu.:10727
## Max. :4900 Max. :36019
summary(sleepDay_merged)
## Id SleepDay TotalSleepRecords
## Min. :1503960366 Min. :2016-04-12 00:00:00.00 Min. :1.000
## 1st Qu.:3977333714 1st Qu.:2016-04-19 00:00:00.00 1st Qu.:1.000
## Median :4702921684 Median :2016-04-27 00:00:00.00 Median :1.000
## Mean :5000979403 Mean :2016-04-26 12:40:05.80 Mean :1.119
## 3rd Qu.:6962181067 3rd Qu.:2016-05-04 00:00:00.00 3rd Qu.:1.000
## Max. :8792009665 Max. :2016-05-12 00:00:00.00 Max. :3.000
## TotalMinutesAsleep TotalTimeInBed Day
## Min. : 58.0 Min. : 61.0 Length:413
## 1st Qu.:361.0 1st Qu.:403.0 Class :character
## Median :433.0 Median :463.0 Mode :character
## Mean :419.5 Mean :458.6
## 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :796.0 Max. :961.0
Share Findings
What are some trends in smart device usage?
The 6 data frames cover 5 characteristics, classified by activity
type: Distance, Steps, Intensities, Calories, and Sleep. There are 33
distinct IDs with steps, intensities and calories records, and 24
distinct IDs with sleep records.
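Counts of distinct IDs like these can be checked with a few lines of base R. The sketch below uses two tiny made-up data frames (`activity` and `sleep` are stand-ins, not the real tables) to show the idea, including which users have activity records but no sleep records.

```r
# Toy stand-ins for the activity and sleep tables (made-up IDs)
activity <- data.frame(Id = c(1, 1, 2, 3, 3, 3))
sleep    <- data.frame(Id = c(1, 3, 3))

# Number of distinct users in each table
n_activity <- length(unique(activity$Id))
n_sleep    <- length(unique(sleep$Id))

# Users with activity records but no sleep records
no_sleep_ids <- setdiff(unique(activity$Id), unique(sleep$Id))

print(c(activity = n_activity, sleep = n_sleep))
print(no_sleep_ids)
```

Applied to the real tables, the same pattern would show which of the 33 users never logged sleep.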
4.2 Activity Types
Steps, Intensities, Calories
Steps, intensities and calories burned follow similar patterns. Users
were most active between 17:00 and 19:00 in the evening, and least
active between 00:00 and 04:00. They were the most active on Tuesday and
Saturday, and least active on Sunday. This means they probably spent
Saturday doing house chores and grocery shopping before resting on
Sunday.
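A peak-hour finding of this kind comes from averaging an activity measure by hour of day. The following is a minimal sketch on made-up values; only the column names `Hour` and `StepTotal` are taken from MergedHourly, and the numbers are purely illustrative.

```r
# Toy hourly records (made-up values) with the same column names
# as MergedHourly
hourly <- data.frame(
  Hour      = c(2, 2, 12, 12, 18, 18),
  StepTotal = c(0, 10, 300, 400, 700, 900)
)

# Average steps for each hour of the day
mean_by_hour <- aggregate(StepTotal ~ Hour, data = hourly, FUN = mean)

# Hours with the highest and lowest average step count
peak_hour  <- mean_by_hour$Hour[which.max(mean_by_hour$StepTotal)]
quiet_hour <- mean_by_hour$Hour[which.min(mean_by_hour$StepTotal)]

print(mean_by_hour)
print(c(peak = peak_hour, quiet = quiet_hour))
```

Replacing `Hour` with the `Day` column would give the weekday comparison in the same way.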
Sleep
Users slept for about 7 hours each day on average (mean 419.5
minutes), which is at the lower edge of the recommended 7 to 9 hours.
They were awake in bed for about 0.65 hours, or 39 minutes (mean time
in bed of 458.6 minutes minus 419.5 minutes asleep), which might
indicate time to fall asleep or leisure time after waking up.
Furthermore, they spent the most time in bed on Sunday, perhaps to
relax before the week began. At least a quarter of sleep records are
about 6 hours or less (first quartile 361 minutes), which is below the
amount recommended by the National Sleep Foundation
https://www.sleepfoundation.org/how-sleep-works/how-much-sleep-do-we-really-need
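The time spent awake in bed can be recomputed from the column means in summary(sleepDay_merged). A minimal sketch, with the two means typed in by hand; the ratio of time asleep to time in bed (often called sleep efficiency) is an extra figure not reported in the original analysis.

```r
# Means copied from summary(sleepDay_merged)
mean_time_in_bed <- 458.6   # TotalTimeInBed, minutes
mean_asleep      <- 419.5   # TotalMinutesAsleep, minutes

# Average minutes in bed but not asleep
awake_in_bed_min <- mean_time_in_bed - mean_asleep

# Share of time in bed actually spent asleep
sleep_efficiency <- mean_asleep / mean_time_in_bed

print(round(awake_in_bed_min, 1))         # 39.1
print(round(100 * sleep_efficiency, 1))   # 91.5
```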
4.3 Relationships Between Activity Types
Calories burned, Steps, Intensity minutes
Calories burned and steps taken have a positive correlation; the more
steps taken, the more calories burned. The trend is similar for time
spent being very active. If users want to burn more calories, they
should spend less time in lightly active and sedentary activities,
since we found that calories burned increased mostly during the first
period of being less active, and only increased steadily after that.
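A positive correlation of this kind can be quantified with `cor()` and a simple linear fit. The sketch below runs on five made-up rows, so its numbers say nothing about the real data; only the column names `TotalSteps` and `Calories` come from MergedDaily.

```r
# Toy daily records (made-up values) with MergedDaily column names
daily <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 11000, 14000),
  Calories   = c(1700, 2000, 2300, 2500, 3000)
)

# Pearson correlation between steps and calories
r <- cor(daily$TotalSteps, daily$Calories)

# Slope of a linear fit: extra calories per extra step
fit   <- lm(Calories ~ TotalSteps, data = daily)
slope <- unname(coef(fit)["TotalSteps"])

print(round(r, 2))
print(round(slope, 3))
```

A value of `r` near 0, as reported for steps versus minutes asleep, would indicate no linear relationship.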
Sleep, Steps and Sedentary Minutes
There is no correlation between the number of steps taken and minutes
spent sleeping. However, we found that the more time users spent
being inactive, the fewer minutes they spent sleeping. After staying
sedentary during the day, users might not be able to sleep much. It is
unlikely that sedentary minutes overlap with sleep minutes, but the
data is insufficient to draw any conclusions.
Smart Device Usage
Smart device usage is not consistent across the 33 users. There were
days when 24 unique IDs wore the device throughout the day, including
during sleep hours. There were days when 30 unique IDs wore the device
ONLY in waking hours and took it off during sleep hours. And there
were days when the device was not worn by 15 unique IDs. Most users
wore the device ONLY during the day and took it off before bed.
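The text does not state how the three wear patterns were derived. One possible rule, purely an assumption for illustration, is: a day with zero steps counts as "not worn", a day with steps and a matching sleep record counts as "all day", and a day with steps but no sleep record counts as "waking hours only". The sketch below applies that rule to made-up rows; the `has_sleep` column is hypothetical.

```r
# Toy daily records (made-up values); has_sleep is a hypothetical flag
# marking whether a sleep record exists for that user and day
days <- data.frame(
  Id         = c(1, 1, 2, 2, 3, 3),
  TotalSteps = c(8000, 0, 5000, 6000, 0, 7000),
  has_sleep  = c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)
)

# Assumed classification rule for each user-day
days$wear <- ifelse(days$TotalSteps == 0, "not worn",
             ifelse(days$has_sleep, "all day", "waking hours only"))

# Distinct users with at least one day in each category
users_per_category <- tapply(days$Id, days$wear,
                             function(id) length(unique(id)))
print(users_per_category)
```

Because one user can fall into different categories on different days, the per-category user counts can add up to more than the total number of users, as they do in the figures above (24, 30 and 15 out of 33).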
4.5 Smart Device Usage Groups and Their Behaviors
Based on 24 users who wore the device all day, 50% (12 users) are in
the high use group, 12.5% (3 users) are in the average use group, and
37.5% (9 users) are in the low use group.
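The group shares can be verified from the user counts given above. A minimal sketch (the vector name `group_counts` is illustrative):

```r
# User counts per usage group, as stated in the text
group_counts <- c(high = 12, average = 3, low = 9)

# Share of each group among the users who wore the device all day
shares <- 100 * prop.table(group_counts)

print(sum(group_counts))   # 24
print(shares)              # high 50, average 12.5, low 37.5
```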
The trends of the 3 usage groups are broadly similar, with some
differences. All 3 groups traveled the farthest, around 3-5 km, during
‘light active’ activity, which might be activities like a leisurely
walk or a casual bike ride. Interestingly, users in the low use group
are more engaged in ‘very active’ activity than users in the other
groups. Overall, the majority of users were not highly active people.
The trends of intensities and steps are similar. The high use group
has the highest number of steps and intensities on Monday at 18:00, the
average use group has the highest number of steps on Saturday at 08:00,
while the low use group shows no clear peak of its own.
Recommendations
The Bellabeat app is a unique fitness activity app. By becoming a
companion guide (like a friend) to its users and customers, Bellabeat
can help them balance their personal and professional life with
healthy habits.
Taking into account the current situation with the pandemic, and given
that many people are working remotely, recommendations can be based on
activity at home. Even during the workday women could do simple
exercises every 1 to 1.5 hours, or at their desired interval.
Users need to be informed that, to get the most out of the device,
frequent use facilitates better data collection, which can bring more
insights and actionable recommendations to their daily routines and
give the user an optimum Bellabeat experience. One possible conclusion
is that users perceive the devices as only useful when worn during
exercise or physical activities.
“THE STANDOUT”
Differentiate the “Leaf Urban” device from the competition by marketing
its unique metrics made for women, such as the benefits of tracking
one’s stress resistance and menstrual cycles. In addition, introducing
more features could improve usership.
Test And Research
Run yearly product evaluations to learn which Bellabeat features are
most beneficial, which ones aren’t, and what can be improved. Set up
periodic short customer surveys for feedback on Bellabeat products
through the Bellabeat app with reward points.
Bellabeat has to invest in first-party data. When a customer first
downloads the Bellabeat app, give the user the option to ‘opt in’ to
the collection of anonymous usage data for research purposes. This is
imperative to gain deeper insights and to reduce bias in further
analysis. Collecting demographic buyer data (ethnicity, age, location,
financial and health data) would also prove useful to strengthen each
marketing message and yield further insights.
Notifications
Provide notifications within the wearable device or app of the user’s
daily progress towards steps goal or weekly sleep goal. Notifications
could also be sent an hour before the user’s desired time to go to bed
to help them prepare for sleep. Ideally, the application should use
algorithms based on user preferences and habits.
A daily calendar/to-do list that can be synced up with the rest of
the user’s day.
Pre-scheduled and/or intuitive impromptu mindful-moment prompts to help
users center and realign throughout the workday.
Meal, snack, and water reminders, creating healthier habits
throughout the day. Bellabeat can use this information to suggest
healthy eating options, e.g. recommend restaurants/healthy options
based on geolocation.
Gamification
Since some users are not motivated by notifications, there could be
some who respond better to rewards. Bellabeat could create a kind of
game on the app for a limited period of time. The game would consist of
reaching different levels based on the number of steps walked or
calories burned every day. Users need to maintain certain activity
levels for a period of time (maybe a month) to pass to the next level.
For each level users would win a certain amount of points. Once enough
points are accumulated, users can redeem them for free merchandise or
a set percentage discount on a future purchase of a wearable device.
Users can also choose to donate their points to a charity.
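The level-and-points idea could be prototyped as follows. This is only a sketch of one possible scheme: the step thresholds, the points per level and the function names are all hypothetical, chosen purely for illustration.

```r
# Hypothetical thresholds: average daily steps needed to hold each level
level_thresholds <- c(5000, 7500, 10000)
points_per_level <- 100   # hypothetical points awarded per level reached

# Level = number of thresholds met by the user's average daily steps
level_for <- function(avg_daily_steps) {
  sum(avg_daily_steps >= level_thresholds)
}

# Points earned for the level reached
points_for <- function(avg_daily_steps) {
  level_for(avg_daily_steps) * points_per_level
}

# Example: the mean TotalSteps from summary(MergedDaily) is 7638
print(level_for(7638))    # 2
print(points_for(7638))   # 200
```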
Partnerships with health technology companies for integration of
their products with Bellabeat wearables are a clever way to expand
exposure. Partnerships could include scales for weight tracking,
insulin trackers, or water bottles for water intake tracking.
Competitions
For highly active users, Bellabeat could create regional and global
leaderboards ranking each user’s weekly/monthly number of steps
against other users. At the end of each term, the top 3 users are
awarded merchandise or discount codes for a future wearable purchase,
coaching, vacation, spa days, etc.
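A leaderboard of this kind amounts to sorting users by step count and keeping the top 3. A minimal sketch on made-up users and step totals (all names and values are hypothetical):

```r
# Toy weekly step totals (made-up users and values)
weekly <- data.frame(
  user  = c("a", "b", "c", "d", "e"),
  steps = c(52000, 71000, 48000, 66000, 59000),
  stringsAsFactors = FALSE
)

# Sort from most to fewest steps and attach a rank
ranked <- weekly[order(-weekly$steps), ]
ranked$rank <- seq_len(nrow(ranked))

# Top 3 users for the term
top3 <- head(ranked, 3)
print(top3)
```

Ties would need an explicit rule (for example, earliest to reach the total), which this sketch does not handle.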
Incorporate a Referral Program
Current wearable users can receive incremental discounts on
merchandise or future wearable purchases based on how many of their
referrals actually purchase a wearable.
Blockchain NFT Technology
Create a community token/NFT that can circulate in the community, the
Bellabeat store, or with various partners. Perhaps even a Bellabeat
“DAO” as an investment strategy. An investment DAO is a decentralized
organization that invests funds as a group. Anyone who owns the
investment DAO’s governance token can participate in the decision-making
process. The more of the token you hold, the larger your voting
power.
Some women may choose to make Bellabeat a part of their extended
family and friends. Building a user community in an app can increase
motivation.
BellaCommunity Niche
Establishing a community like no other! Moderators and guest speakers
giving real professional advice. Fostering a safe space to address
sensitive questions and to exchange solutions and testimonies with
health professionals and peers alike. A place to convene meetings
and events, exercise classes, concerts, and digital flea markets in
different cities and/or virtually via Zoom, metaverse meetups, etc.
A “friend” chat support (additional payment) from a Bellabeat team
member.
A notebook/Journal in which women could write down their feelings,
emotions, or just something that inspires and motivates them. Women
could choose to view their own written motivation in a pop-up window (if
they selected that option in the app).
Experts In The Field
Hire an herbalist, psychotherapist, nutritionist, personal trainer
and a digital organizer who could support and motivate women and provide
real advice (additional payment).
Isolate and define each user segment’s needs.
Bring the needs of ALL women to the table, including full-figured women
and women of color, i.e. Black, Latina, and Asian women. Findings show
these demographics are not a significant part of the marketing
campaign. This is an untapped market with many marketing avenues and
pipelines to explore.
Women who are health motivated will want more sophisticated metrics
in tracking their daily habits and wellbeing, e.g. highly stressed women
who need wellness tracker support for sleep and hormonal changes
vs. health motivated women who want to track daily activity levels and
calories wearing a classic removable jewelry piece.
Other recommendations:
Possibility to create a playlist of favorite music, podcasts, movies,
books, and other media that will help women stay motivated.
Special recommendations for prenatal and childbearing women.
Reminders about health issues and health screenings.
Clearly articulate different customer segments within the marketing
strategy.
55 and older: senior users need exploration. Researching and
collecting data in this demographic is often overlooked, e.g. for
dementia and Alzheimer’s disease (also called senile dementia). A
device like this would be excellent for keeping family, friends and
neighbors aware of the user’s whereabouts and health vitals through
geolocation, and for creating a ledger of all health stats and
exercise ready for a health care provider to review. Just another
untapped space; the partnerships are endless!