Bellabeat is a high-tech company that manufactures health-focused smart products. It offers smart devices that collect data on activity, sleep, stress, and reproductive health to empower women with knowledge about their own health and habits.
The task is to analyze smart device fitness data to gain insight into how consumers use their smart devices, and to use these insights to guide Bellabeat’s marketing strategy for growth in the global smart device market.
The primary business goal is to utilize external data on smart water bottle usage to refine Bellabeat’s product development and marketing strategies within the smart water bottle niche. This effort is focused on gaining in-depth insights into consumer behaviors and preferences specific to smart water bottles. By achieving this objective, Bellabeat aims to optimize its approach, cater effectively to potential smart water bottle customers, and strategically position its products for success in the ever-evolving smart water bottle market.
Spring - water bottle https://bellabeat.com/product/spring/
These trends highlight the evolving landscape of smart water bottle usage, where personalization, motivation, and technological advancements play pivotal roles in enhancing the user experience and promoting better hydration habits.
Easy Customization: Modern smart water bottles offer easy customization of hydration targets based on individual factors like age, weight, activity levels, and environmental conditions. Users can set their own hydration preferences and reminders for a personalized experience.
Motivational and Fun Features: These bottles incorporate motivational elements such as visual cues (e.g., illumination, color changes) to encourage users to drink water regularly. Social integration features allow users to engage with friends on social media, fostering friendly challenges and adding excitement to hydration routines.
Long Battery Life: Many smart water bottles feature long-lasting batteries, making them suitable for daily use, travel, and outdoor activities. While smaller models typically have battery capacities of 200-500 mAh for portability, larger and more advanced bottles can reach up to 1500 mAh, providing weeks or months of usage without recharging.
Backed by Science: Clinical trials support the efficacy of smart water bottles in promoting healthy hydration habits. Beyond reminders, these bottles hold potential in the medical field, addressing the challenge of maintaining proper hydration and contributing to overall well-being.
Spring’s app and smart technology calculate the optimal water intake for the user’s body and send reminders based on the user’s age, height, weight, local weather, activity level, and pregnancy or breastfeeding status. Because Spring helps users avoid dehydration and establish and maintain healthy hydration habits, it is a strong candidate for further development in this market.
The main stakeholders are Urška Sršen, Bellabeat’s co-founder and Chief Creative Officer; Sando Mur, mathematician and Bellabeat’s co-founder; and the rest of the Bellabeat marketing analytics team.
The data source used for our case study is FitBit Fitness Tracker Data. This dataset is hosted on Kaggle and was made available through Mobius.
Verifying the metadata of our dataset, we can confirm it is open-source. The owner has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.
These datasets were generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016-05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Variation between output represents use of different types of Fitbit trackers and individual tracking behaviors / preferences.
Available to us are 6 CSV documents and an Excel sheet compiled from research on other smart water bottle products. Each CSV document represents different quantitative data tracked by Fitbit. The data is in long format: each row is one time point per subject, so each subject has data in multiple rows. Every user has a unique ID and multiple rows, since data is tracked by day and time. We counted the sample size (users) of each table and verified the length of the analysis period: 31 days.
data <- data.frame(
  Filename = c("dailyActivity_merged.csv", "sleepDay_merged.csv", "dailySteps_merged.csv", "dailyIntensities_merged.csv", "dailyCalories_merged.csv", "weightLogInfo_merged.csv", "Smart_waterbottles.xlsx"),
  TypeOfFile = c("CSV", "CSV", "CSV", "CSV", "CSV", "CSV", "XLSX"),
  Description = c(
    "Daily Activity over 31 days of 33 users. Tracking daily: Steps, Distance, Intensities, Calories",
    "Daily sleep logs, tracked by: Total count of sleeps a day, Total minutes, Total Time in Bed",
    "Daily Steps over 31 days of 33 users",
    "Daily Intensity over 31 days of 33 users. Measured in Minutes and Distance, dividing groups in 4 categories: Sedentary, Lightly Active, Fairly Active, Very Active",
    "Daily Calories over 31 days of 33 users",
    "Weight tracked by day in Kg and Pounds over 30 days. Calculation of BMI. 5 users report weight manually, 3 users do not. In total there are 8 users",
    "Data on 6 other smart water bottle products"
  )
)
print(data)
## Filename TypeOfFile
## 1 dailyActivity_merged.csv CSV
## 2 sleepDay_merged.csv CSV
## 3 dailySteps_merged.csv CSV
## 4 dailyIntensities_merged.csv CSV
## 5 dailyCalories_merged.csv CSV
## 6 weightLogInfo_merged.csv CSV
## 7 Smart_waterbottles.xlsx XLSX
## Description
## 1 Daily Activity over 31 days of 33 users. Tracking daily: Steps, Distance, Intensities, Calories
## 2 Daily sleep logs, tracked by: Total count of sleeps a day, Total minutes, Total Time in Bed
## 3 Daily Steps over 31 days of 33 users
## 4 Daily Intensity over 31 days of 33 users. Measured in Minutes and Distance, dividing groups in 4 categories: Sedentary, Lightly Active, Fairly Active, Very Active
## 5 Daily Calories over 31 days of 33 users
## 6 Weight tracked by day in Kg and Pounds over 30 days. Calculation of BMI. 5 users report weight manually, 3 users do not. In total there are 8 users
## 7 Data on 6 other smart water bottle products
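As a minimal sketch of the long layout described above, the toy table below (made-up IDs and step counts, not values from the Fitbit files) has one row per user per day. It shows one way to count users and measure the tracking window using base R only:

```r
# Toy long-format table: one row per user per day (all values are made up).
long_steps <- data.frame(
  Id          = c(1, 1, 1, 2, 2),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/14/2016", "4/12/2016", "4/13/2016"),
  StepTotal   = c(9000, 11000, 8000, 4000, 6000)
)

# Number of rows (tracked days) per user.
rows_per_user <- table(long_steps$Id)

# Number of distinct users, i.e. the sample size of the table.
n_users <- length(unique(long_steps$Id))

# Length of the tracking window in days, first to last date inclusive.
days <- as.Date(long_steps$ActivityDay, format = "%m/%d/%Y")
window_days <- as.integer(max(days) - min(days)) + 1L

print(rows_per_user)
print(n_users)
print(window_days)
```

Applied to the real tables, the same two quantities correspond to the user counts and the 31-day period reported in the table above.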
Because of the small sample size (30 users) and the lack of demographic information, we could encounter sampling bias: we cannot be sure the sample is representative of the population as a whole. The dataset is also not current, and the survey covered a limited period (2 months). That is why we will give our case study an operational approach.
The entire analysis is done in RStudio.
We will choose the packages that will help with our analysis and load them. We will use the following packages:
library(ggpubr)
## Loading required package: ggplot2
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
## here() starts at C:/Users/Harish/OneDrive/Documents/Rcase_studies/bellabeat_casestudy
library(skimr)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(lubridate)
library(ggrepel)
library(readxl)
Knowing the datasets we have, we will load those that help us answer our business task. Our analysis focuses on the following datasets.
Due to the small sample, we won’t consider Weight (8 users) in this analysis.
daily_steps <- read.csv("dailySteps_merged.csv")
daily_intensities <- read.csv("dailyIntensities_merged.csv")
daily_calories <- read.csv("dailyCalories_merged.csv")
hourly_steps <- read.csv("hourlySteps_merged.csv")
daily_activity <- read.csv("dailyActivity_merged.csv")
products <- read_excel("smart_waterbottles.xlsx")
We will preview our selected data frames and check the summary of each column.
head(daily_steps)
## Id ActivityDay StepTotal
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
str(daily_steps)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ StepTotal : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
head(daily_intensities)
## Id ActivityDay SedentaryMinutes LightlyActiveMinutes
## 1 1503960366 4/12/2016 728 328
## 2 1503960366 4/13/2016 776 217
## 3 1503960366 4/14/2016 1218 181
## 4 1503960366 4/15/2016 726 209
## 5 1503960366 4/16/2016 773 221
## 6 1503960366 4/17/2016 539 164
## FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## 1 13 25 0
## 2 19 21 0
## 3 11 30 0
## 4 34 29 0
## 5 10 36 0
## 6 20 38 0
## LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
## 1 6.06 0.55 1.88
## 2 4.71 0.69 1.57
## 3 3.91 0.40 2.44
## 4 2.83 1.26 2.14
## 5 5.04 0.41 2.71
## 6 2.51 0.78 3.19
str(daily_intensities)
## 'data.frame': 940 obs. of 10 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
head(daily_calories)
## Id ActivityDay Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
str(daily_calories)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
head(hourly_steps)
## Id ActivityHour StepTotal
## 1 1503960366 4/12/2016 12:00:00 AM 373
## 2 1503960366 4/12/2016 1:00:00 AM 160
## 3 1503960366 4/12/2016 2:00:00 AM 151
## 4 1503960366 4/12/2016 3:00:00 AM 0
## 5 1503960366 4/12/2016 4:00:00 AM 0
## 6 1503960366 4/12/2016 5:00:00 AM 0
str(hourly_steps)
## 'data.frame': 22099 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityHour: chr "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
## $ StepTotal : int 373 160 151 0 0 0 0 0 250 1864 ...
head(daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
str(daily_activity)
## 'data.frame': 940 obs. of 15 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
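The str() output above shows Id read as a numeric column and displayed in scientific notation (1.5e+09). Since Id is a label rather than a quantity, one option is to read it as text. The sketch below is an assumption about how this could be done, not a step taken in this analysis; it uses inline sample text instead of the real CSV file so that it runs on its own:

```r
# Two sample rows in the same layout as dailySteps_merged.csv.
csv_text <- "Id,ActivityDay,StepTotal
1503960366,4/12/2016,13162
1503960366,4/13/2016,10735"

# colClasses forces Id to character; the remaining columns are still
# type-detected as usual (StepTotal becomes integer).
steps_sample <- read.csv(text = csv_text, colClasses = c(Id = "character"))

str(steps_sample)
```

If this approach were adopted, every table would need to read Id the same way so that later merges on Id still match.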
Now that we know more about our data structures, we will process them to look for errors and inconsistencies.
Before we continue with our cleaning, we want to check how many unique users each data frame contains.
n_unique(daily_steps$Id)
## [1] 33
n_unique(daily_intensities$Id)
## [1] 33
n_unique(daily_calories$Id)
## [1] 33
n_unique(hourly_steps$Id)
## [1] 33
n_unique(daily_activity$Id)
## [1] 33
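The same check can be run over all tables in one pass. The sketch below uses base R's length(unique()) in place of n_unique() and toy data frames (hypothetical values) so that it runs without the CSV files. It also tests whether every table contains the same set of users, which equal counts alone do not guarantee:

```r
# Toy stand-ins for the real data frames (values are made up).
daily_steps_toy    <- data.frame(Id = c(1, 1, 2, 3), StepTotal = c(10, 20, 30, 40))
daily_calories_toy <- data.frame(Id = c(1, 2, 2),    Calories  = c(1800, 2000, 2100))

tables <- list(daily_steps = daily_steps_toy, daily_calories = daily_calories_toy)

# Count distinct Ids in every table at once.
users_per_table <- sapply(tables, function(df) length(unique(df$Id)))
print(users_per_table)

# TRUE only if every table covers exactly the same set of users as the first one.
same_users <- all(sapply(tables, function(df) setequal(df$Id, tables[[1]]$Id)))
print(same_users)
```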
We will now look for any duplicates.
sum(duplicated(daily_steps))
## [1] 0
sum(duplicated(daily_calories))
## [1] 0
sum(duplicated(daily_intensities))
## [1] 0
sum(duplicated(hourly_steps))
## [1] 0
sum(duplicated(daily_activity))
## [1] 0
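duplicated() flags rows that repeat an earlier row in every column, so a count of 0 means there are no exact copies. A stricter check, sketched below on toy data (made-up values), is whether any user has more than one row for the same day, even if the measurements differ:

```r
# Toy table: row 2 is an exact copy of row 1; row 4 repeats the
# user/day key of row 3 but carries a different step count.
toy <- data.frame(
  Id          = c(1, 1, 2, 2),
  ActivityDay = c("4/12/2016", "4/12/2016", "4/12/2016", "4/12/2016"),
  StepTotal   = c(100, 100, 250, 300)
)

# Rows identical to an earlier row in every column.
n_exact_dupes <- sum(duplicated(toy))

# Rows that repeat an earlier (Id, ActivityDay) pair.
n_key_dupes <- sum(duplicated(toy[, c("Id", "ActivityDay")]))

# Drop exact copies only; key conflicts need a decision on which row to keep.
deduped <- toy[!duplicated(toy), ]

print(c(exact = n_exact_dupes, key = n_key_dupes))
```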
We want to ensure that column names use the right syntax and the same format in all datasets, since we will merge them later on. We change all column names to lower case.
# Lower-case every column name (e.g. ActivityDay -> activityday) so the join keys match across data frames.
daily_steps <- rename_with(daily_steps, tolower)
daily_calories <- rename_with(daily_calories, tolower)
daily_intensities <- rename_with(daily_intensities, tolower)
hourly_steps <- rename_with(hourly_steps, tolower)
daily_activity <- rename_with(daily_activity, tolower)
products <- rename_with(products, tolower)
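For reference, the renaming step is equivalent to the base R sketch below, shown on a one-row toy data frame (made-up values). The final check confirms that the columns needed for the later merge exist under their lower-case names:

```r
# One-row toy data frame with the original mixed-case column names.
toy <- data.frame(Id = 1, ActivityDay = "4/12/2016", StepTotal = 13162)

# Base R equivalent of rename_with(toy, tolower).
names(toy) <- tolower(names(toy))
print(names(toy))

# Confirm the merge keys are present after renaming.
keys_present <- all(c("id", "activityday") %in% names(toy))
print(keys_present)
```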
Make sure the column names are consistent across the files used and check date format.
hourly_steps <- hourly_steps %>%
  rename(date_time = activityhour) %>%
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p"))
head(hourly_steps)
## id date_time steptotal
## 1 1503960366 2016-04-12 00:00:00 373
## 2 1503960366 2016-04-12 01:00:00 160
## 3 1503960366 2016-04-12 02:00:00 151
## 4 1503960366 2016-04-12 03:00:00 0
## 5 1503960366 2016-04-12 04:00:00 0
## 6 1503960366 2016-04-12 05:00:00 0
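The format string used above can be checked in isolation. In the sketch below, %I is the 12-hour clock and %p the AM/PM marker; the two input strings are illustrative, and tz = "UTC" is an assumption added here so the result does not depend on the machine's time zone (the original call uses the session time zone). Note that %p matching depends on the locale, so AM/PM strings may fail to parse in a non-English locale:

```r
# Two timestamps in the same layout as the ActivityHour column.
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM")

# Parse with an explicit time zone (an assumption for reproducibility).
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Any NA here means a string did not match the format.
parse_failed <- anyNA(parsed)

# Extract the hour of day on a 24-hour clock.
hours <- as.integer(format(parsed, "%H"))

print(parsed)
print(hours)
```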
We will merge daily_intensities, daily_steps, and daily_calories pairwise, using id and activityday together as the join key, so that we can look at correlations between variables.
user_dailyx <- merge(daily_intensities, daily_calories, by = c("id", "activityday"))
user_dailyx$activityday <- as.Date(user_dailyx$activityday, format = "%m/%d/%Y")
glimpse(user_dailyx)
## Rows: 940
## Columns: 11
## $ id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ activityday <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-…
## $ sedentaryminutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ lightlyactiveminutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ fairlyactiveminutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ veryactiveminutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ sedentaryactivedistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ lightactivedistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ moderatelyactivedistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ veryactivedistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
user_dailyy <- merge(daily_steps, daily_calories, by = c("id", "activityday"))
user_dailyy$activityday <- as.Date(user_dailyy$activityday, format = "%m/%d/%Y")
glimpse(user_dailyy)
## Rows: 940
## Columns: 4
## $ id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960366…
## $ activityday <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-15, 2016-04-1…
## $ steptotal <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506, 1054…
## $ calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1775…
user_dailyz <- merge(daily_intensities, daily_steps, by = c("id", "activityday"))
user_dailyz$activityday <- as.Date(user_dailyz$activityday, format = "%m/%d/%Y")
glimpse(user_dailyz)
## Rows: 940
## Columns: 11
## $ id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ activityday <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-…
## $ sedentaryminutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ lightlyactiveminutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ fairlyactiveminutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ veryactiveminutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ sedentaryactivedistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ lightactivedistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ moderatelyactivedistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ veryactivedistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ steptotal <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
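Once two tables are merged, the correlation between their variables can be computed with cor(). The sketch below uses toy data (made-up values, constructed so that calories rise exactly linearly with steps) and also checks that the inner join performed by merge() did not drop any rows:

```r
# Toy tables sharing the (id, activityday) key; all values are made up.
steps_toy <- data.frame(
  id          = c(1, 1, 2, 2),
  activityday = c("4/12/2016", "4/13/2016", "4/12/2016", "4/13/2016"),
  steptotal   = c(4000, 8000, 6000, 12000)
)
calories_toy <- data.frame(
  id          = c(1, 1, 2, 2),
  activityday = c("4/12/2016", "4/13/2016", "4/12/2016", "4/13/2016"),
  calories    = c(1500, 1900, 1700, 2300)
)

# merge() performs an inner join by default: unmatched rows are dropped.
merged_toy <- merge(steps_toy, calories_toy, by = c("id", "activityday"))
rows_kept <- nrow(merged_toy) == nrow(steps_toy)

# Pearson correlation between steps and calories.
r <- cor(merged_toy$steptotal, merged_toy$calories)

print(rows_kept)
print(r)
```

On the real data, a row count below 940 after merging would indicate keys present in one table but missing from the other.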
Bellabeat’s mission is deeply rooted in empowering women through data-driven insights. To effectively support Bellabeat’s mission and address our business objectives, it is imperative to harness our own comprehensive tracking data for in-depth analysis. The data sets we’ve employed thus far have limitations, primarily their small sample size and the absence of user demographic information. As our primary target demographic comprises young and adult women, it is crucial to persist in uncovering actionable trends within our data sets. This ongoing pursuit of insights will enable us to craft a focused and effective marketing strategy that resonates with our core audience, ensuring that we continue to serve and empower women in their health and wellness journeys.
That being said, our analysis has found several trends that may help our online campaign and improve Bellabeat Spring.
In our analysis we did not only check trends in users’ daily habits; we also found that 88% of users use their device on a daily basis and that 50% of users wear the device all day on the days they use it. We can continue to promote the features of Bellabeat’s products: