Bellabeat is a high-tech company co-founded by Urška Sršen and Sando Mur, known for manufacturing health-focused smart products designed to inspire and empower women. Established in 2013, Bellabeat has experienced significant growth, positioning itself as a leading tech-driven wellness company for women. By collecting data on activity, sleep, stress, and reproductive health, Bellabeat equips women with valuable insights about their own well-being. Recognizing the importance of consumer data analysis, Sršen has tasked the marketing analytics team with studying smart device usage data for a Bellabeat product. This analysis will inform high-level recommendations for the company’s marketing strategy, allowing them to better serve their customers’ needs and preferences.
Characters
○ Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
○ Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
○ Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.
Products
○ Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
○ Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
○ Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
○ Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
○ Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
⦁ Business Task
Analyze the combined data set to segment users by behavior pattern and discover usage trends, so that personalized recommendations can be provided to users.
⦁ Questions
What are some trends in smart device usage? How could these trends apply to Bellabeat customers? How could these trends help influence Bellabeat marketing strategy?
Source and License
⦁ The data set consists of 18 .csv files containing personal fitness tracker data from 33 Fitbit users.
⦁ It comes from a third party (Amazon Mechanical Turk).
⦁ The information was gathered in 2016 (not recent).
⦁ License is under CC0: Public Domain, dataset made available through Mobius.
Setting Up Packages
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("lubridate")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("scales")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(ggplot2)
library(dplyr)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
setwd("/cloud/project/Bellabeat_project")
# Importing Datasets
daily1 <- read.csv("/cloud/project/Bellabeat_project/dailyActivity_merged.csv")
daily2 <- read.csv("/cloud/project/Bellabeat_project/dailyCalories_merged.csv")
daily3 <- read.csv("/cloud/project/Bellabeat_project/dailySteps_merged.csv")
daily4 <- read.csv("/cloud/project/Bellabeat_project/sleepDay_merged.csv")
Inspecting each data set with the following functions:
colnames() - lists the column names.
head() - displays the first 6 rows.
colnames(daily1)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
colnames(daily2)
## [1] "Id" "ActivityDay" "Calories"
colnames(daily3)
## [1] "Id" "ActivityDay" "StepTotal"
colnames(daily4)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
head(daily1) # dailyActivity
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
head(daily2) # dailyCalories
## Id ActivityDay Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
head(daily3) # dailySteps
## Id ActivityDay StepTotal
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
head(daily4) # SleepDay
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
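The previews suggest that dailyCalories and dailySteps repeat values already present in dailyActivity. A minimal sketch of how that could be checked is shown here. It uses three toy data frames (`activity`, `calories`, `steps`) built from the first rows printed above as stand-ins for `daily1`, `daily2`, and `daily3`; these names are assumptions for illustration, and the check on the full files may turn out differently.

```r
# Toy stand-ins for daily1, daily2 and daily3 (values taken from the previews above)
activity <- data.frame(
  Id           = c(1503960366, 1503960366, 1503960366),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016"),
  TotalSteps   = c(13162L, 10735L, 10460L),
  Calories     = c(1985L, 1797L, 1776L)
)
calories <- data.frame(
  Id          = activity$Id,
  ActivityDay = activity$ActivityDate,
  Calories    = c(1985L, 1797L, 1776L)
)
steps <- data.frame(
  Id          = activity$Id,
  ActivityDay = activity$ActivityDate,
  StepTotal   = c(13162L, 10735L, 10460L)
)

# Join on user and date; the shared Calories column gets .x / .y suffixes
check <- merge(activity, calories,
               by.x = c("Id", "ActivityDate"), by.y = c("Id", "ActivityDay"))
check <- merge(check, steps,
               by.x = c("Id", "ActivityDate"), by.y = c("Id", "ActivityDay"))

# TRUE when every matched row carries identical values in both sources
calories_match <- all(check$Calories.x == check$Calories.y)
steps_match    <- all(check$TotalSteps == check$StepTotal)
# TRUE when no activity row was lost in the joins
same_rows      <- nrow(check) == nrow(activity)
```

If all three flags are TRUE on the real files, the analysis could rely on dailyActivity alone for steps and calories.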
# Changing date formats and adding column names
daily1$mdy <- mdy(daily1$ActivityDate)
str(daily1)
## 'data.frame': 940 obs. of 16 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
## $ mdy : Date, format: "2016-04-12" "2016-04-13" ...
daily2$mdy <- mdy(daily2$ActivityDay)
str(daily2)
## 'data.frame': 940 obs. of 4 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
## $ mdy : Date, format: "2016-04-12" "2016-04-13" ...
daily3$mdy <- mdy(daily3$ActivityDay)
str(daily3)
## 'data.frame': 940 obs. of 4 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ StepTotal : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ mdy : Date, format: "2016-04-12" "2016-04-13" ...
daily4_new <- daily4 %>%
  rename_with(tolower) %>%
  rename(activitydate = sleepday) %>%
  # %Y (four-digit year) is required here; %y would read only "20" from "2016" and return 2020
  mutate(date = as.Date(activitydate, format = "%m/%d/%Y"))
str(daily4_new)
## 'data.frame': 413 obs. of 6 variables:
## $ id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ activitydate : chr "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ totalsleeprecords : int 1 2 1 2 1 1 1 1 1 1 ...
## $ totalminutesasleep: int 327 384 412 340 700 304 360 325 361 430 ...
## $ totaltimeinbed : int 346 407 442 367 712 320 377 364 384 449 ...
## $ date : Date, format: "2016-04-12" "2016-04-13" ...
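Because the SleepDay strings carry a four-digit year, the year specifier in the format string decides which year comes out. The short sketch below parses one string of the same shape with both specifiers; the variable names are illustrative only.

```r
raw <- "4/12/2016 12:00:00 AM"  # same shape as the SleepDay values

# %y consumes two digits ("20") and interprets them as the year 2020
two_digit <- as.Date(raw, format = "%m/%d/%y")

# %Y consumes the full four-digit year; the trailing time text is ignored
four_digit <- as.Date(raw, format = "%m/%d/%Y")

print(two_digit)   # "2020-04-12"
print(four_digit)  # "2016-04-12"
```

Checking `range()` of a parsed date column against the known collection period (2016) is a quick way to catch this kind of mismatch.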
# Let's check the summary.
summary(daily1$TotalDistance)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.620 5.245 5.490 7.713 28.030
n_distinct(daily1$Id)
## [1] 33
summary(daily1$VeryActiveMinutes)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 4.00 21.16 32.00 210.00
n_distinct(daily1$Id)
## [1] 33
summary(daily2$Calories)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 1828 2134 2304 2793 4900
n_distinct(daily2$Id)
## [1] 33
summary(daily3$StepTotal)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 3790 7406 7638 10727 36019
n_distinct(daily3$Id)
## [1] 33
summary(daily4$TotalMinutesAsleep)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 58.0 361.0 433.0 419.5 490.0 796.0
n_distinct(daily4$Id)
## [1] 24
Observations
daily1, daily2, and daily3 each contain records from 33 distinct participants.
daily4 (sleep) contains records from only 24 participants, so 9 users have no sleep data.
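To see which users are absent from the sleep data, the Id values of the two tables can be compared directly. The sketch below uses small made-up data frames (`activity` and `sleep`, with hypothetical Ids 101 to 104) in place of `daily1` and `daily4`.

```r
# Hypothetical stand-ins for daily1 and daily4; Ids are invented for illustration
activity <- data.frame(Id = c(101, 101, 102, 103, 104))
sleep    <- data.frame(Id = c(101, 103, 103))

# Ids that appear in the activity data but never in the sleep data
missing_ids <- setdiff(unique(activity$Id), unique(sleep$Id))

# Number of users without any sleep record
n_missing <- length(missing_ids)
```

Applied to the real tables, `setdiff(unique(daily1$Id), unique(daily4$Id))` would be expected to return the 9 users noted above.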
Plot 1
daily1 %>%
ggplot(aes(x=TotalSteps, y= TotalDistance, color=Id)) + geom_point() + geom_smooth(color="Purple") +
scale_color_gradient(low="Blue",high="Yellow") +
labs(title= "Steps And Distance By Users", x= "Steps", y="Distance")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Plot 2
daily1 %>%
ggplot(aes(x= TotalSteps, y= Calories, color= Id)) + geom_point(shape=20, size=2) + geom_smooth(method="loess", color="Purple") +
scale_color_gradient(low ="Yellow", high= "Blue") +
labs(title= "Calories Per Steps",x= "Steps", y="Calories")
## `geom_smooth()` using formula = 'y ~ x'
Plot 3
# group_by() has no effect on ggplot(), so the data is piped in directly
daily1 %>%
ggplot(aes(x=mdy, y= VeryActiveMinutes, color=Id)) + geom_col() +
scale_color_gradient(low="Blue",high="Yellow") +
labs(title ="Very Active Minutes By Day", x="Date", y="Very Active Minutes")
Plot 4
# group_by() has no effect on ggplot(), so the data is piped in directly
daily1 %>%
ggplot(aes(x=mdy, y= Calories, color=Id)) + geom_col() +
scale_color_gradient(low="Blue",high="Yellow") +
labs(title = "Calories Burned By Day", x= "Date", y= "Calories Burned")
Plot 5
# group_by() has no effect on ggplot(), so the data is piped in directly
daily1 %>%
ggplot(aes(x=mdy, y= VeryActiveMinutes)) + geom_smooth(method="loess", color="Yellow") +
labs(title = "Very Active Minutes Over Time", x="Date", y="Very Active Minutes")
## `geom_smooth()` using formula = 'y ~ x'
Plot 6
daily4_new %>%
ggplot(aes(x=date, y=totalminutesasleep, color=id)) + geom_point() +geom_smooth(color="Red")+
scale_color_gradient(low ="Blue", high="Green") +
labs(title = "Total Minutes Asleep", x="Date", y="Minutes Asleep")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
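One way to check how engagement changes over the recording period is to count how many distinct users logged activity on each day. The sketch below does this on a small invented table (`logs`, with hypothetical Ids and step counts). Treating a day with zero steps as "not logged" is an assumption made here, motivated by the zero minimum in the step summary.

```r
# Invented daily log: three users, with fewer of them recording steps each day
logs <- data.frame(
  Id = c(101, 102, 103, 101, 102, 103, 101),
  date = as.Date(c("2016-04-12", "2016-04-12", "2016-04-12",
                   "2016-04-13", "2016-04-13", "2016-04-13",
                   "2016-04-14")),
  TotalSteps = c(9000L, 5000L, 4000L, 8000L, 0L, 3000L, 7000L)
)

# Assumption: a user counts as active on a day only if some steps were recorded
active <- logs[logs$TotalSteps > 0, ]

# Number of distinct active users per date, sorted by date
users_per_day <- aggregate(Id ~ date, data = active,
                           FUN = function(x) length(unique(x)))
names(users_per_day)[2] <- "active_users"

print(users_per_day)
```

On the real data, a falling `active_users` count toward the end of the period would indicate a drop-off in participation.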
⦁ Of the 33 individuals in this data set, some did not record their data consistently or did not participate at all.
⦁ The graphics show us that users were engaged within the first three weeks of the data before gradually becoming inactive and, in some cases, stopping altogether.
⦁ There isn’t enough evidence to provide a firm conclusion about trends in activity levels or sleeping habits.
⦁ When consumers are inactive over an extended period of time, the device should encourage them to be active.
⦁ Additionally, it should encourage users to consume eight glasses of water per day, obtain more than seven hours of sleep, and walk 10,000 steps every day.
⦁ Bellabeat will be able to improve the user experience and maintain its position as a market leader by continuously gathering customer feedback and iterating on improvements.
⦁ To reach their target demographic, they must promote the product more using a mix of traditional and digital marketing tactics, including Google Search, social media interaction, video ads on YouTube, and display ads on the Google Display Network.
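The targets named above (10,000 steps and more than seven hours of sleep per day) can be turned into simple per-user flags that a reminder feature might act on. The following is a rough sketch on invented data; the data frame `daily`, the threshold variables, and the flag column names are all assumptions for illustration, not part of the Bellabeat app.

```r
# Invented per-day records for two hypothetical users
daily <- data.frame(
  id             = c(101, 101, 102, 102),
  steps          = c(12000, 11000, 4000, 6000),
  minutes_asleep = c(450, 430, 380, 400)
)

step_goal          <- 10000   # steps per day
sleep_goal_minutes <- 7 * 60  # seven hours, in minutes

# Average steps and sleep per user
per_user <- aggregate(cbind(steps, minutes_asleep) ~ id, data = daily, FUN = mean)

# Flag users whose averages fall short of the targets
per_user$needs_step_nudge  <- per_user$steps < step_goal
per_user$needs_sleep_nudge <- per_user$minutes_asleep <= sleep_goal_minutes

print(per_user)
```

Averages are only one possible rule; a production feature might instead look at streaks of consecutive low-activity days.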