About the company

Bellabeat is a high-tech company co-founded by Urška Sršen and Sando Mur, known for manufacturing health-focused smart products designed to inspire and empower women. Established in 2013, Bellabeat has experienced significant growth, positioning itself as a leading tech-driven wellness company for women. By collecting data on activity, sleep, stress, and reproductive health, Bellabeat equips women with valuable insights about their own well-being. Recognizing the importance of consumer data analysis, Sršen has tasked the marketing analytics team with studying smart device usage data for a Bellabeat product. This analysis will inform high-level recommendations for the company’s marketing strategy, allowing them to better serve their customers’ needs and preferences.

Characters and products

Characters

○ Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer

○ Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team

○ Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Products

○ Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

○ Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

○ Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.

○ Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

○ Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

ASK

Business Task

Analyze the combined data set to segment users by behavior pattern and discover trends in smart device usage, so that personalized recommendations can be provided to users.

Questions

  1.  What are some trends in smart device usage?
  2.  How could these trends apply to Bellabeat customers?
  3.  How could these trends help influence Bellabeat marketing strategy?

PREPARE

Source and License

FitBit Fitness Tracker Data

⦁ The data set includes 18 .csv files from 33 Fitbit users who use personal fitness trackers.

⦁ It comes from a third party (Amazon Mechanical Turk).

⦁ The information was gathered in 2016 (not recent).

⦁ The license is CC0: Public Domain; the dataset was made available through Mobius.

Setting Up Packages

install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("lubridate")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("scales")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(ggplot2)
library(dplyr)
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
setwd("/cloud/project/Bellabeat_project")
# Importing Datasets

daily1 <- read.csv("/cloud/project/Bellabeat_project/dailyActivity_merged.csv")
daily2 <- read.csv("/cloud/project/Bellabeat_project/dailyCalories_merged.csv")
daily3 <- read.csv("/cloud/project/Bellabeat_project/dailySteps_merged.csv")
daily4 <- read.csv("/cloud/project/Bellabeat_project/sleepDay_merged.csv")

Checking the imported data with colnames() and head():

colnames(daily1)
##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"
colnames(daily2)
## [1] "Id"          "ActivityDay" "Calories"
colnames(daily3)
## [1] "Id"          "ActivityDay" "StepTotal"
colnames(daily4)
## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
head(daily1) # dailyActivity
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728
head(daily2) # dailyCalories
##           Id ActivityDay Calories
## 1 1503960366   4/12/2016     1985
## 2 1503960366   4/13/2016     1797
## 3 1503960366   4/14/2016     1776
## 4 1503960366   4/15/2016     1745
## 5 1503960366   4/16/2016     1863
## 6 1503960366   4/17/2016     1728
head(daily3) # dailySteps
##           Id ActivityDay StepTotal
## 1 1503960366   4/12/2016     13162
## 2 1503960366   4/13/2016     10735
## 3 1503960366   4/14/2016     10460
## 4 1503960366   4/15/2016      9762
## 5 1503960366   4/16/2016     12669
## 6 1503960366   4/17/2016      9705
head(daily4) # SleepDay
##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320
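
Before processing, it can also help to check each table for missing values and repeated rows. The sketch below shows one way to do that in base R. Note that `toy_sleep` is a small invented data frame standing in for `daily4`, so the counts it produces say nothing about the real Fitbit files.

```r
# Hypothetical stand-in for daily4 (sleepDay); all values are invented.
toy_sleep <- data.frame(
  Id = c(1, 1, 1, 2, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM",
               "4/14/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384, 384, NA, 412)
)

# Count missing values across the whole table.
n_missing <- sum(is.na(toy_sleep))

# Count rows that exactly repeat an earlier row.
n_duplicated <- sum(duplicated(toy_sleep))

# Drop the repeated rows; rows containing NA are kept so they can be reviewed.
toy_sleep_clean <- toy_sleep[!duplicated(toy_sleep), ]

n_missing
n_duplicated
nrow(toy_sleep_clean)
```

The same three checks could be run on `daily1` through `daily4` before any columns are changed.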

PROCESS

# Changing date formats and renaming columns

daily1$mdy <- mdy(daily1$ActivityDate) 
str(daily1)
## 'data.frame':    940 obs. of  16 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
##  $ mdy                     : Date, format: "2016-04-12" "2016-04-13" ...
daily2$mdy <- mdy(daily2$ActivityDay)
str(daily2)
## 'data.frame':    940 obs. of  4 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ Calories   : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
##  $ mdy        : Date, format: "2016-04-12" "2016-04-13" ...
daily3$mdy <- mdy(daily3$ActivityDay)
str(daily3)
## 'data.frame':    940 obs. of  4 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ StepTotal  : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ mdy        : Date, format: "2016-04-12" "2016-04-13" ...
daily4_new <- daily4 %>%
    rename_with(tolower) %>%
    rename(activitydate = sleepday) %>%
    mutate(date = as.Date(activitydate, "%m/%d/%Y"))
str(daily4_new)
## 'data.frame':    413 obs. of  6 variables:
##  $ id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ activitydate      : chr  "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ totalsleeprecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ totalminutesasleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ totaltimeinbed    : int  346 407 442 367 712 320 377 364 384 449 ...
##  $ date              : Date, format: "2016-04-12" "2016-04-13" ...
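
The year specifier in the format string matters for this column: `%Y` matches a four-digit year, while `%y` matches only two digits, so "2016" would be read as "20" and the date would land in 2020. A minimal base R sketch using one value copied from the `SleepDay` column; the variable names are illustrative only.

```r
# One raw value as it appears in the sleepDay file.
sample_day <- "4/12/2016 12:00:00 AM"

# %y consumes only the first two digits of the year ("20").
two_digit  <- as.Date(sample_day, format = "%m/%d/%y")

# %Y consumes the full four-digit year ("2016").
four_digit <- as.Date(sample_day, format = "%m/%d/%Y")

format(two_digit, "%Y")
format(four_digit, "%Y")
```

Checking the parsed year against the raw text, as in the last two lines, is a quick way to catch this kind of mismatch.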

Analyzing

# Let's check the summary.
summary(daily1$TotalDistance) 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.620   5.245   5.490   7.713  28.030
n_distinct(daily1$Id)
## [1] 33
summary(daily1$VeryActiveMinutes)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    0.00    4.00   21.16   32.00  210.00
n_distinct(daily1$Id)
## [1] 33
summary(daily2$Calories)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    1828    2134    2304    2793    4900
n_distinct(daily2$Id)
## [1] 33
summary(daily3$StepTotal)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       0    3790    7406    7638   10727   36019
n_distinct(daily3$Id)
## [1] 33
summary(daily4$TotalMinutesAsleep)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    58.0   361.0   433.0   419.5   490.0   796.0
n_distinct(daily4$Id)
## [1] 24
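
The business task mentions segmenting users by behavior pattern. One possible approach, sketched below, is to average each user's daily steps and bin the result. The cut-offs (5,000, 7,500, and 10,000 steps) and the segment labels are assumptions chosen for illustration, and `toy_activity` is invented data standing in for `daily1`.

```r
library(dplyr)

# Hypothetical stand-in for daily1; all values are invented.
toy_activity <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3, 4, 4),
  TotalSteps = c(3000, 4000, 6000, 7000, 8000, 9000, 12000, 11000)
)

user_segments <- toy_activity %>%
  group_by(Id) %>%
  # One row per user with their average daily step count.
  summarise(mean_steps = mean(TotalSteps), .groups = "drop") %>%
  # Assumed thresholds; adjust to whatever definition the team agrees on.
  mutate(segment = case_when(
    mean_steps < 5000  ~ "sedentary",
    mean_steps < 7500  ~ "lightly active",
    mean_steps < 10000 ~ "fairly active",
    TRUE               ~ "very active"
  ))

user_segments
```

With only 33 users in the real data, any segment found this way would be small, so the result is better read as a rough grouping than as a reliable market segment.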

Observations

Visualizing Data

Plot 1

daily1 %>% 
  ggplot(aes(x=TotalSteps, y= TotalDistance, color=Id)) + geom_point() + geom_smooth(color="Purple") + 
  scale_color_gradient(low="Blue",high="Yellow") +
  labs(title= "Steps And Distance By Users", x= "Steps", y="Distance")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Plot 2

daily1 %>% 
  ggplot(aes(x= TotalSteps, y= Calories, color= Id)) + geom_point(shape=20, size=2) + geom_smooth(method="loess", color="Purple") + 
  scale_color_gradient(low ="Yellow", high= "Blue") +
  labs(title= "Calories Per Steps",x= "Steps", y="Calories")
## `geom_smooth()` using formula = 'y ~ x'

Plot 3

daily1 %>%
  ggplot(aes(x=mdy, y= VeryActiveMinutes, color=Id)) + geom_col() +
  scale_color_gradient(low="Blue",high="Yellow") +
  labs(title ="Very Active Minutes By Day", x="Date", y="Very Active Minutes")

Plot 4

daily1 %>%
  ggplot(aes(x=mdy, y= Calories, color=Id)) + geom_col() +
  scale_color_gradient(low="Blue",high="Yellow") +
  labs(title = "Calories Burned By Day", x= "Date", y= "Calories Burned")

Plot 5

daily1 %>%
  ggplot(aes(x=mdy, y= VeryActiveMinutes)) + geom_smooth(method="loess", color="Yellow") +
  labs(title = "Very Active Minutes By Day", x="Date", y="Very Active Minutes")
## `geom_smooth()` using formula = 'y ~ x'
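
The daily values can also be rolled up into weekly totals before plotting, which gives a true per-week view. The sketch below uses `floor_date()` from lubridate on a small illustrative data frame. `toy_daily`, `week_start`, and `weekly_minutes` are names made up for this example, and weeks are assumed to start on Monday.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for one user's rows in daily1.
# 2016-04-11 is a Monday, so these 14 days cover two full weeks.
toy_daily <- data.frame(
  mdy = as.Date("2016-04-11") + 0:13,
  VeryActiveMinutes = c(25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 41, 39, 73)
)

weekly_minutes <- toy_daily %>%
  # Map every date to the Monday that starts its week.
  mutate(week_start = floor_date(mdy, unit = "week", week_start = 1)) %>%
  group_by(week_start) %>%
  summarise(very_active_minutes = sum(VeryActiveMinutes), .groups = "drop")

weekly_minutes
```

The resulting table has one row per week and can be passed to `ggplot()` with `week_start` on the x-axis.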

Plot 6

daily4_new %>%
ggplot(aes(x=date, y=totalminutesasleep, color=id)) + geom_point() +geom_smooth(color="Red")+ 
   scale_color_gradient(low ="Blue", high="Green") +
  labs(title = "Total Minutes Asleep", x="Date", y="Minutes Asleep")
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

Conclusion

⦁ Of the 33 individuals included in this data, some did not record their data consistently, and only 24 logged any sleep data.

⦁ The plots suggest that users were engaged during the first three weeks of the data, then gradually became less active and, in some cases, stopped altogether.

⦁ There isn’t enough evidence to provide a firm conclusion about trends in activity levels or sleeping habits.
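
The participation points above could be quantified by counting how many days each user actually logged. A minimal sketch on invented data follows. It assumes that a day with zero steps means the tracker was not worn, which is an assumption and not something the dataset documents.

```r
library(dplyr)

# Hypothetical stand-in for daily1; all values are invented.
toy_log <- data.frame(
  Id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  TotalSteps = c(9000, 8500, 0, 0, 0, 4000, 7000, 7200, 6900)
)

days_logged <- toy_log %>%
  group_by(Id) %>%
  summarise(
    days_total   = n(),                    # rows present for the user
    days_active  = sum(TotalSteps > 0),    # assumed rule: steps > 0 means worn
    share_active = days_active / days_total,
    .groups = "drop"
  )

days_logged
```

Sorting this table by `share_active` would show which users dropped off, and how many, instead of relying on the plots alone.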

Recommendations

⦁ When consumers are inactive over an extended period of time, the device should encourage them to be active.

⦁ Additionally, it should encourage users to consume eight glasses of water per day, obtain more than seven hours of sleep, and walk 10,000 steps every day.

⦁ Bellabeat will be able to improve the user experience and maintain its position as a market leader by continuously gathering customer feedback and iterating on improvements.

⦁ To reach their target demographic, they must promote the product more using a mix of traditional and digital marketing tactics, including Google Search, social media interaction, video ads on YouTube, and display ads on the Google Display Network.