Bellabeat is a high-tech manufacturer of health-focused products for women. It is a successful small company with the potential to become a larger player in the global smart device market. Co-founders of Bellabeat believe that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat’s marketing strategy.
In this phase, I examine the business objective and try to ask the right questions.
Key Objectives:
Identify trends in smart device usage among women
Consider the stakeholders
Based on these trends, develop a high-level marketing strategy for Bellabeat
In this phase, I prepare the data for further exploration.
The data set was generated by respondents to a distributed survey via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.
Source: Kaggle, FitBit Fitness Tracker Data
# Loading packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
In R Cloud, I uploaded the data set to the working directory. See the image below:
daily_activity <- read.csv("fitabase_data/dailyActivity_merged.csv")
day_sleep <- read.csv("fitabase_data/sleepDay_merged.csv")
hourly_steps <- read.csv("fitabase_data/hourlySteps_merged.csv")
After examining the data, I decided to focus on the users' daily activity and sleep data.
head(daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
head(day_sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
unique(daily_activity$Id)
## [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
## [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391
n_distinct(day_sleep$Id)
## [1] 24
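The ID counts above show 33 users in the daily activity file and 24 in the sleep file, which differs from the thirty participants stated in the data description. A minimal sketch of how one might tabulate coverage per user follows; the small data frame is made up for illustration and only stands in for `daily_activity`.

```r
library(dplyr)

# Hypothetical toy data standing in for daily_activity (not real Fitbit values)
toy_activity <- data.frame(
  Id = c(1, 1, 1, 2, 2, 3),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/14/2016",
                   "4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(8000, 9000, 7000, 12000, 0, 5000)
)

# Number of distinct users in the data set
n_users <- n_distinct(toy_activity$Id)

# Number of days logged per user, users with the most days first
days_per_user <- toy_activity %>%
  count(Id, name = "days_logged") %>%
  arrange(desc(days_logged))

print(n_users)
print(days_per_user)
```

Applied to the real `daily_activity` and `day_sleep` frames, the same `count()` call shows whether a few users contribute most of the records.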
In this phase I process the data to make it ready for the next phase of analysis.
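Before merging, it is common to check each data frame for exact duplicate rows and missing values. The sketch below uses a small, made-up sleep table that contains one repeated row; `distinct()` keeps only the first copy of each identical row. Whether the real files contain duplicates is not shown here and would need to be checked on the actual data.

```r
library(dplyr)

# Hypothetical sleep records; the second and third rows are exact duplicates
toy_sleep <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(400, 380, 380, 450)
)

# Count exact duplicate rows and missing values
n_duplicates <- sum(duplicated(toy_sleep))
n_missing <- sum(is.na(toy_sleep))

# Drop exact duplicates, keeping the first occurrence of each row
toy_sleep_clean <- distinct(toy_sleep)

cat("Duplicates:", n_duplicates, "Missing:", n_missing,
    "Rows after cleaning:", nrow(toy_sleep_clean), "\n")
```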
# Next, I merge the daily activity data frame with the daily sleep data frame, so I first make sure both use the same date format
daily_activity$ActivityDate <- as.POSIXct(daily_activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
daily_activity$date <- format(daily_activity$ActivityDate, format = "%m/%d/%y")
day_sleep$SleepDay <- as.POSIXct(day_sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
day_sleep$date <- format(day_sleep$SleepDay, format = "%m/%d/%y")
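An alternative, assuming the lubridate package loaded above, is to parse both columns directly to `Date` objects, which sidesteps time zones and two-digit-year strings. The toy vectors below only mimic the two formats seen in the `head()` output; they are not taken from the files.

```r
library(lubridate)

# Date strings in the two formats seen in the raw files (toy values)
activity_dates <- c("4/12/2016", "4/13/2016")
sleep_dates <- c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM")

# mdy() parses month/day/year; mdy_hms() also reads the clock time,
# and as_date() then drops the time part
activity_key <- mdy(activity_dates)
sleep_key <- as_date(mdy_hms(sleep_dates))

print(activity_key)
print(all(activity_key == sleep_key))
```

Keeping the join key as a `Date` rather than a character string also lets later steps sort and filter by date without re-parsing.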
# Check the classes of the date and Id columns before merging
class(day_sleep$date)
## [1] "character"
class(day_sleep$Id)
## [1] "numeric"
class(daily_activity$date)
## [1] "character"
class(daily_activity$Id)
## [1] "numeric"
# Merge the two data frames on matching Id and date values
daily_activity_merged <- merge(day_sleep, daily_activity, by=c('Id', 'date'))
glimpse(daily_activity_merged)
## Rows: 413
## Columns: 20
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ date <chr> "04/12/16", "04/13/16", "04/15/16", "04/16/16…
## $ SleepDay <dttm> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-…
## $ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, …
## $ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, …
## $ ActivityDate <dttm> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-…
## $ TotalSteps <int> 13162, 10735, 9762, 12669, 9705, 15506, 10544…
## $ TotalDistance <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.14, 2.71, 3.19, 3.53, 1.96, 1.3…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 1.26, 0.41, 0.78, 1.32, 0.48, 0.3…
## $ LightActiveDistance <dbl> 6.06, 4.71, 2.83, 5.04, 2.51, 5.03, 4.24, 4.6…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <int> 25, 21, 29, 36, 38, 50, 28, 19, 41, 39, 73, 3…
## $ FairlyActiveMinutes <int> 13, 19, 34, 10, 20, 31, 12, 8, 21, 5, 14, 23,…
## $ LightlyActiveMinutes <int> 328, 217, 209, 221, 164, 264, 205, 211, 262, …
## $ SedentaryMinutes <int> 728, 776, 726, 773, 539, 775, 818, 838, 732, …
## $ Calories <int> 1985, 1797, 1745, 1863, 1728, 2035, 1786, 177…
n_distinct(daily_activity_merged$Id)
## [1] 24
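The merged data frame now has 413 observations from 24 unique IDs. Because merge() performs an inner join by default, it keeps only the days that have both an activity record and a sleep record.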
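Since an inner join silently drops unmatched rows, it can help to list what was left out. The sketch below uses made-up activity and sleep tables and `dplyr::inner_join()`, which returns the same rows as `merge()` on these keys (row order may differ); `anti_join()` then lists the activity days that had no matching sleep record.

```r
library(dplyr)

# Hypothetical activity and sleep tables keyed by Id and date
toy_activity <- data.frame(
  Id = c(1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/12/16"),
  TotalSteps = c(8000, 5000, 12000)
)
toy_sleep <- data.frame(
  Id = c(1, 2),
  date = c("04/12/16", "04/12/16"),
  TotalMinutesAsleep = c(420, 390)
)

# Inner join: only Id/date pairs present in both tables survive
merged <- inner_join(toy_sleep, toy_activity, by = c("Id", "date"))

# Activity days with no matching sleep record, dropped by the join
dropped <- anti_join(toy_activity, toy_sleep, by = c("Id", "date"))

print(nrow(merged))
print(dropped)
```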
In the sixth and final phase, “Act,” I use the analyses conducted so far to formulate actionable recommendations for stakeholders. These recommendations aim to address the remaining business questions and help stakeholders make well-informed decisions.
Based on the above insights and conclusions, here are high-level marketing strategy recommendations for Bellabeat:
These recommendations align with Bellabeat’s mission to empower women through health-focused smart devices and capitalize on the potential for growth in the global smart device market.