How Can a Wellness Technology Company Play It Smart?
INTRODUCTION
Bellabeat is a high-tech manufacturer of health-focused products for women. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since its founding in 2013, the company has grown rapidly and positioned itself as a tech-driven wellness company for women.
PRODUCTS
•Bellabeat App: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions.
•Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. It connects to the Bellabeat app to track activity, sleep, and stress.
•Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress.
•Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
•Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
DEFINITION OF THE PROBLEM
Since its inception, Bellabeat has grown rapidly. By 2016, it had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to the company's own e-commerce channel on its website. The founders, Urška Sršen and Sando Mur, believe that although Bellabeat is a successful small company, it has the potential to become a larger player in the global smart device market. They suggest that analyzing Bellabeat's available consumer data, together with smart device fitness data, could reveal new growth opportunities for the company.
BUSINESS TASK
• To analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices.
• To select one Bellabeat product to apply these insights to in the presentation.
• To provide high-level recommendations for how these trends can inform Bellabeat marketing strategy.
PROBLEM QUESTIONS
•What are some trends in smart device usage?
•How could these trends apply to Bellabeat customers?
•How could these trends help influence Bellabeat marketing strategy?
DATA COLLECTION
The data used in this analysis is a public dataset that explores the daily habits of smart device users: the Fitbit Fitness Tracker Data. This dataset was made available through Mobius under a CC0: Public Domain license. The data was generated by respondents to a distributed survey via Amazon Mechanical Turk.
DATA PREPARATION AND ORGANIZATION
The Fitbit Fitness Tracker Data contains 18 files that include information about daily activity, steps, and heart rate that can be used to explore users' habits. The data was collected from thirty-three eligible Fitbit users who consented to the submission of their personal tracker data. The users' names have been replaced by ID numbers to preserve privacy and anonymity.
PROCESS
1. Installing Packages
# ggplot2 and lubridate are installed as part of tidyverse, so they need no separate call
install.packages(c("tidyverse", "scales", "janitor"))
2. Loading Packages
library('tidyverse')
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library('ggplot2')
library('scales')
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
library('lubridate')
library('janitor')
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
3. Importing ‘daily_activity’ and ‘sleep_day’ Datasets.
daily_activity <- read.csv("dailyActivity_merged.csv")
sleep_day <- read.csv("sleepDay_merged.csv")
4. Previewing the Datasets
colnames(daily_activity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
head(daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
colnames(sleep_day)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
head(sleep_day)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
5. Checking the Number of Rows in Each Dataset
nrow(daily_activity)
## [1] 940
nrow(sleep_day)
## [1] 413
6. Checking the Number of Duplicated Rows in Each Dataset
sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(sleep_day))
## [1] 3
7. Removing Duplicate Rows from the sleep_day Dataset
library(tidyr)
sleep_day <- sleep_day %>%
distinct() %>%
drop_na()
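A quick way to confirm that the cleaning step worked is to re-run the duplicate check afterwards and compare row counts. The sketch below is a minimal illustration on a hypothetical three-row toy table (values copied from the sleep_day preview, with one row deliberately repeated); on the real data, the same checks would run against sleep_day.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy table shaped like sleepDay_merged.csv; rows 1 and 2 are exact duplicates
sleep_toy <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  TotalSleepRecords = c(1, 1, 2),
  TotalMinutesAsleep = c(327, 327, 384),
  TotalTimeInBed = c(346, 346, 407)
)

rows_before <- nrow(sleep_toy)
dupes_before <- sum(duplicated(sleep_toy))  # 1 duplicated row

# Same cleaning step as in the report: keep distinct rows, then drop rows containing NA
sleep_clean <- sleep_toy %>%
  distinct() %>%
  drop_na()

# After cleaning there should be no duplicates, and the row count should fall by at
# least the number of duplicates removed (drop_na() may remove additional rows)
stopifnot(sum(duplicated(sleep_clean)) == 0)
stopifnot(nrow(sleep_clean) <= rows_before - dupes_before)
```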
8. Merging Both Datasets on the ‘Id’ Column
combined_data <- merge(sleep_day, daily_activity, by="Id")
n_distinct(combined_data$Id)
## [1] 24
glimpse(combined_data)
## Rows: 12,348
## Columns: 19
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ SleepDay <chr> "4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 …
## $ TotalSleepRecords <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <int> 327, 327, 327, 327, 327, 327, 327, 327, 327, …
## $ TotalTimeInBed <int> 346, 346, 346, 346, 346, 346, 346, 346, 346, …
## $ ActivityDate <chr> "5/7/2016", "5/6/2016", "5/1/2016", "4/30/201…
## $ TotalSteps <int> 11992, 12159, 10602, 14673, 13162, 10735, 153…
## $ TotalDistance <dbl> 7.71, 8.03, 6.81, 9.25, 8.50, 6.97, 9.80, 8.9…
## $ TrackerDistance <dbl> 7.71, 8.03, 6.81, 9.25, 8.50, 6.97, 9.80, 8.9…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 2.46, 1.97, 2.29, 3.56, 1.88, 1.57, 5.29, 2.9…
## $ ModeratelyActiveDistance <dbl> 2.12, 0.25, 1.60, 1.42, 0.55, 0.69, 0.57, 1.0…
## $ LightActiveDistance <dbl> 3.13, 5.81, 2.92, 4.27, 6.06, 4.71, 3.94, 4.8…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <int> 37, 24, 33, 52, 25, 21, 73, 45, 48, 16, 31, 7…
## $ FairlyActiveMinutes <int> 46, 6, 35, 34, 13, 19, 14, 24, 28, 12, 23, 11…
## $ LightlyActiveMinutes <int> 175, 289, 246, 217, 328, 217, 216, 250, 189, …
## $ SedentaryMinutes <int> 833, 754, 730, 712, 728, 776, 814, 857, 782, …
## $ Calories <int> 1821, 1896, 1820, 1947, 1985, 1797, 2013, 195…
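Merging on Id alone pairs each sleep record with every activity day recorded for the same user, which is why TotalMinutesAsleep repeats across different ActivityDate values in the glimpse output above and why the merged table grows to 12,348 rows. A minimal sketch of a date-aware join follows, assuming the date formats shown in the previews (M/D/YYYY for ActivityDate, M/D/YYYY h:mm:ss AM for SleepDay). It uses lubridate's mdy() and mdy_hms() with dplyr's inner_join(); the toy tables, their Id values, and the second user's step count are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy rows using the column names and date formats of
# dailyActivity_merged.csv and sleepDay_merged.csv (values are illustrative only)
activity_toy <- data.frame(
  Id = c(1, 1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016", "4/12/2016"),
  TotalSteps = c(13162, 10735, 10460, 5000)
)
sleep_toy <- data.frame(
  Id = c(1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/12/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384, 400)
)

# Merging on Id alone pairs every sleep row with every activity row of the same user:
# user 1 gives 2 x 3 = 6 rows, user 2 gives 1 x 1 = 1 row
id_only <- merge(sleep_toy, activity_toy, by = "Id")
nrow(id_only)  # 7

# Parse both date columns to Date so the calendar day can serve as a second join key
activity_dated <- activity_toy %>% mutate(Date = mdy(ActivityDate))
sleep_dated <- sleep_toy %>% mutate(Date = as_date(mdy_hms(SleepDay)))

# Joining on Id and Date keeps one row per user-day that has both activity and sleep data
id_and_date <- inner_join(sleep_dated, activity_dated, by = c("Id", "Date"))
nrow(id_and_date)  # 3
```

An inner join keeps only the user-days present in both tables; a left_join() from the activity table would instead keep every activity day and fill the sleep columns with NA where no sleep record exists.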
ANALYZING DATA
1. Summary statistics for selected columns of each dataset and of the combined dataset. Each code chunk below reports the minimum, first quartile, median, mean, third quartile, and maximum of the selected columns.
daily_activity %>%
select(TotalSteps,
TotalDistance,
SedentaryMinutes) %>%
summary()
## TotalSteps TotalDistance SedentaryMinutes
## Min. : 0 Min. : 0.000 Min. : 0.0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.: 729.8
## Median : 7406 Median : 5.245 Median :1057.5
## Mean : 7638 Mean : 5.490 Mean : 991.2
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:1229.5
## Max. :36019 Max. :28.030 Max. :1440.0
sleep_day %>%
select(TotalSleepRecords,
TotalMinutesAsleep,
TotalTimeInBed) %>%
summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.00 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.00 1st Qu.:361.0 1st Qu.:403.8
## Median :1.00 Median :432.5 Median :463.0
## Mean :1.12 Mean :419.2 Mean :458.5
## 3rd Qu.:1.00 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.00 Max. :796.0 Max. :961.0
combined_data %>%
select(TotalSteps,TotalDistance,SedentaryMinutes,TotalTimeInBed,
TotalSleepRecords,TotalMinutesAsleep) %>%
summary()
## TotalSteps TotalDistance SedentaryMinutes TotalTimeInBed
## Min. : 0 Min. : 0.000 Min. : 0.0 Min. : 61.0
## 1st Qu.: 4660 1st Qu.: 3.160 1st Qu.: 659.0 1st Qu.:402.0
## Median : 8585 Median : 6.120 Median : 734.0 Median :462.0
## Mean : 8108 Mean : 5.722 Mean : 799.4 Mean :458.2
## 3rd Qu.:11317 3rd Qu.: 7.920 3rd Qu.: 853.0 3rd Qu.:526.0
## Max. :22988 Max. :17.950 Max. :1440.0 Max. :961.0
## TotalSleepRecords TotalMinutesAsleep
## Min. :1.000 Min. : 58.0
## 1st Qu.:1.000 1st Qu.:361.0
## Median :1.000 Median :432.0
## Mean :1.122 Mean :419.1
## 3rd Qu.:1.000 3rd Qu.:492.0
## Max. :3.000 Max. :796.0
2. Relationship between physical activities (like steps, active minutes) and calories burnt.
ggplot(data=combined_data) +
geom_point(mapping =aes(x=TotalSteps, y=Calories) ) +
labs(title="Relationship between total steps and calories",
subtitle="Calories burned tend to increase as steps increase")
combined_data <- combined_data %>%
mutate(TotalActiveMinutes = FairlyActiveMinutes +
LightlyActiveMinutes + VeryActiveMinutes)
ggplot(data=combined_data ) +
geom_point(mapping = aes(x=TotalActiveMinutes, y=Calories)) +
labs(title="Relationship between total active minutes and calories")
The two visualizations above suggest a positive relationship between activity and calories burned: both more steps and more active minutes go along with more calories. A user who wants to burn more calories per day would likely benefit from increasing their steps and activity level.
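The scatter plots show the pattern visually; a correlation coefficient puts a number on it. The sketch below is a minimal illustration that uses only the six rows for user 1503960366 from the head(daily_activity) preview as a toy sample, far too few to support conclusions. On the full data, the same cor() calls can be applied to the columns of daily_activity. Pearson's r measures linear association and Spearman's rho measures monotonic (rank) association; neither shows that the extra activity causes the extra calories.

```r
# Six rows for user 1503960366, copied from the head(daily_activity) preview (toy sample only)
toy <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  VeryActiveMinutes = c(25, 21, 30, 29, 36, 38),
  FairlyActiveMinutes = c(13, 19, 11, 34, 10, 20),
  LightlyActiveMinutes = c(328, 217, 181, 209, 221, 164),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728)
)

# Same definition of total active minutes as the mutate() step in the report
toy$TotalActiveMinutes <- toy$VeryActiveMinutes + toy$FairlyActiveMinutes +
  toy$LightlyActiveMinutes

# Pearson: linear association; Spearman: association between ranks
steps_pearson <- cor(toy$TotalSteps, toy$Calories)
steps_spearman <- cor(toy$TotalSteps, toy$Calories, method = "spearman")
active_pearson <- cor(toy$TotalActiveMinutes, toy$Calories)

round(c(steps_pearson = steps_pearson,
        steps_spearman = steps_spearman,
        active_pearson = active_pearson), 2)
```

In this toy sample, calories rise with TotalSteps on every day, so the Spearman coefficient comes out at 1 (up to floating-point rounding).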
3. Users' Sleep
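# Use sleep_day so each sleep record is counted once; an explicit binwidth
# of 30 minutes replaces the default of 30 bins
ggplot(data = sleep_day) +
  geom_histogram(mapping = aes(x = TotalMinutesAsleep), binwidth = 30) +
  labs(title = "Total minutes asleep", x = "Total minutes asleep", y = "Count")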
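This histogram shows the distribution of total minutes asleep. Most records fall between about 5 and 8 hours (300 to 480 minutes) of sleep, consistent with the interquartile range of 361 to 490 minutes (about 6 to 8 hours) in the summary statistics above.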
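The "5 to 8 hours" reading can also be checked numerically rather than estimated from bar heights. The sketch below is a minimal illustration using the six TotalMinutesAsleep values from the head(sleep_day) preview as a toy vector; on the real data, the same expression would use sleep_day$TotalMinutesAsleep.

```r
# Six TotalMinutesAsleep values copied from the head(sleep_day) preview (toy sample only)
minutes_asleep <- c(327, 384, 412, 340, 700, 304)

hours_asleep <- minutes_asleep / 60

# Share of records between 5 and 8 hours (300 to 480 minutes), inclusive;
# mean() of a logical vector gives the proportion of TRUE values
share_5_to_8h <- mean(hours_asleep >= 5 & hours_asleep <= 8)
round(share_5_to_8h, 2)  # 0.83 (5 of 6 records)
```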
4. What are the average daily calories, steps, and sleep of each user?
This step creates a new data frame, ‘daily_average’, containing each user's average daily steps, calories, and sleep.
daily_average <- combined_data %>%
  group_by(Id) %>%
  summarise(mean_daily_steps = mean(TotalSteps),
            mean_daily_calories = mean(Calories),
            mean_daily_sleep = mean(TotalMinutesAsleep))
head(daily_average)
## # A tibble: 6 × 4
## Id mean_daily_steps mean_daily_calories mean_daily_sleep
## <dbl> <dbl> <dbl> <dbl>
## 1 1503960366 12117. 1816. 360.
## 2 1644430081 7283. 2811. 294
## 3 1844505072 2580. 1573. 652
## 4 1927972279 916. 2173. 417
## 5 2026352035 5567. 1541. 506.
## 6 2320127002 4717. 1724. 61
5. How active are the users?
user_type <- daily_average %>%
mutate(user_type = case_when(
mean_daily_steps < 5000 ~ "sedentary",
# Adjacent categories share a boundary so no average falls between them
mean_daily_steps >= 5000 & mean_daily_steps < 7500 ~ "lightly active",
mean_daily_steps >= 7500 & mean_daily_steps < 10000 ~ "fairly active",
mean_daily_steps >= 10000 ~ "very active"
))
head(user_type)
## # A tibble: 6 × 5
## Id mean_daily_steps mean_daily_calories mean_daily_sleep user_type
## <dbl> <dbl> <dbl> <dbl> <chr>
## 1 1503960366 12117. 1816. 360. very active
## 2 1644430081 7283. 2811. 294 lightly acti…
## 3 1844505072 2580. 1573. 652 sedentary
## 4 1927972279 916. 2173. 417 sedentary
## 5 2026352035 5567. 1541. 506. lightly acti…
## 6 2320127002 4717. 1724. 61 sedentary
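# Order the bars from least to most active instead of alphabetically
ggplot(data=user_type) +
geom_bar(mapping =aes(x=factor(user_type, levels=c("sedentary", "lightly active", "fairly active", "very active")))) +
labs(title="User distribution based on average steps taken daily", x="User type")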
The bar graph above shows that fairly active users make up the largest group of
smart device users in the sample.
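Hand-written case_when() bounds are easy to leave with gaps between categories. An alternative sketch uses base R's cut(), where consecutive breaks share an edge so every value lands in exactly one bin. It uses the mean daily steps as displayed in the head(user_type) output plus one hypothetical boundary value (7499.5); the variable name activity_level is illustrative.

```r
# Mean daily steps as displayed for the six users in head(user_type), plus one
# hypothetical value (7499.5) sitting just below a category boundary
steps <- c(12117, 7283, 2580, 916, 5567, 4717, 7499.5)

# right = FALSE closes each interval on the left: [-Inf, 5000), [5000, 7500),
# [7500, 10000), [10000, Inf), so adjacent bins share an edge and never leave a gap
activity_level <- cut(
  steps,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("sedentary", "lightly active", "fairly active", "very active"),
  right = FALSE
)
as.character(activity_level)
```

A side benefit is that cut() returns a factor whose levels follow the activity order, so a bar chart built from it lists the categories from sedentary to very active.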
ACT
Problems with the dataset:
The dataset was last updated 5 years ago, and its records date from 2016 (see the ActivityDate values above). It may therefore be out of date and no longer relevant to Bellabeat.
We cannot verify the accuracy of the data because it was collected by a third party. In addition, because the data comes from a small sample of 33 Fitbit users (only 24 of whom also have sleep records), the analysis may be biased and may not generalize to Bellabeat's customers.
Recommendations:
• All Bellabeat products should be integrated into the Bellabeat app so that users can quickly access every service from one place. People rarely go anywhere without their mobile phone, so users who leave a Bellabeat device at home would still have the app to rely on.
• Once integrated with the other products, the app should be designed to collect all of the user data it needs, including product usage data.
• The Bellabeat app should generate a monthly report for each user and send it to them. The report would summarize the relationships, patterns, and trends found in the user's collected app data, along with tips for living a healthier life.
• A nutrition plan should also be integrated into the app, making it easier to give users more suggestions on how to improve their health and wellness.
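The monthly report recommendation implies aggregating each user's daily records by calendar month. The sketch below is a minimal illustration with hypothetical rows and an assumed choice of metrics (days logged, mean steps, mean sleep in hours); the recommendation itself does not specify which metrics the report would contain. It uses lubridate's floor_date() to map each date to the first day of its month before grouping.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily records for one user spanning two months (values are illustrative)
daily <- data.frame(
  Id = 1503960366,
  ActivityDate = c("4/29/2016", "4/30/2016", "5/1/2016", "5/2/2016"),
  TotalSteps = c(10000, 12000, 8000, 6000),
  TotalMinutesAsleep = c(420, 480, 360, 390)
)

# One row per user and calendar month, holding the assumed report metrics
monthly_report <- daily %>%
  mutate(Month = floor_date(mdy(ActivityDate), unit = "month")) %>%
  group_by(Id, Month) %>%
  summarise(
    days_logged = n(),
    mean_steps = mean(TotalSteps),
    mean_sleep_hours = mean(TotalMinutesAsleep) / 60,
    .groups = "drop"
  )
monthly_report
```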