The Mission Statement

Bellabeat is a high-tech manufacturer of health-focused products for women. It is a successful small company with the potential to become a larger player in the global smart device market. The co-founders of Bellabeat believe that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat’s marketing strategy.

Phase 1: Ask

In this phase I look into the business objective and try to ask the right questions.
Key Objectives:

  1. Identify some of the trends in the smart device usage among women

  2. Consider the stakeholders

  3. Based on the trends, come up with a high-level marketing strategy for Bellabeat

Phase 2: Prepare

In this phase I prepare the data for further exploration.

1. Info about the data set

The data set was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.

Source at: Kaggle/FitBit Fitness Tracker Data

2. Setting up R for further analysis by loading packages

#Loading Packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)

3. Importing the data set to R

I uploaded the data set to the working directory in R cloud. See the image below:

daily_activity <- read.csv("fitabase_data/dailyActivity_merged.csv")
day_sleep <- read.csv("fitabase_data/sleepDay_merged.csv")
hourly_steps <- read.csv("fitabase_data/hourlySteps_merged.csv")

After examining the data, I decided to look closely into the data on daily activity and sleep of the users.

4. Let's look at the overview and the summary of the data set

head(daily_activity)
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728
head(day_sleep)
##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320
unique(daily_activity$Id)
##  [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
##  [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391
n_distinct(day_sleep$Id)
## [1] 24
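
The two ID counts above differ (33 in daily_activity, 24 in day_sleep), so it can be worth checking which users have no sleep records and whether any rows are repeated before merging. The following is a minimal sketch on made-up toy data; `activity_toy` and `sleep_toy` are hypothetical stand-ins for the real data frames, and only base R functions are used.

```r
# Toy stand-ins for daily_activity and day_sleep (values are made up for illustration)
activity_toy <- data.frame(
  Id = c(1, 1, 2, 3),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016", "4/12/2016")
)
sleep_toy <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM")
)

# Users who logged activity but never logged sleep
ids_without_sleep <- setdiff(unique(activity_toy$Id), unique(sleep_toy$Id))

# Exact duplicate rows, which would otherwise be counted twice after a merge
n_duplicates <- sum(duplicated(sleep_toy))
sleep_toy_clean <- sleep_toy[!duplicated(sleep_toy), ]

# Missing values per column
na_counts <- colSums(is.na(sleep_toy_clean))
```

Running the same three checks on the real data frames would show whether any cleaning is needed before Phase 3.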

Phase 3: Process

In this phase I process the data to make it ready for the next phase of analysis.

1. Fixing date formatting by adding a new column

# Later I want to merge the daily activity data frame with the daily sleep data frame,
# so I make sure that both data frames use the same date format
daily_activity$ActivityDate <- as.POSIXct(daily_activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
daily_activity$date <- format(daily_activity$ActivityDate, format = "%m/%d/%y")

day_sleep$SleepDay <- as.POSIXct(day_sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
day_sleep$date <- format(day_sleep$SleepDay, format = "%m/%d/%y")

# Now I want to check the type of dates and IDs
class(day_sleep$date)
## [1] "character"
class(day_sleep$Id)
## [1] "numeric"
class(daily_activity$date)
## [1] "character"
class(daily_activity$Id)
## [1] "numeric"
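
To make the date step concrete, here is a small sketch that parses one sample string in each of the two formats seen in the head() output and confirms that both produce the same character key. The fixed `tz = "UTC"` is an assumption made only so the example is reproducible; the analysis itself uses Sys.timezone().

```r
# Sample strings in the two formats seen in the head() output above
activity_date <- "4/12/2016"
sleep_day <- "4/12/2016 12:00:00 AM"

# Parse each with its own format; a fixed tz keeps the result reproducible
activity_parsed <- as.POSIXct(activity_date, format = "%m/%d/%Y", tz = "UTC")
sleep_parsed <- as.POSIXct(sleep_day, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Both collapse to the same character key, which is what the merge relies on
activity_key <- format(activity_parsed, format = "%m/%d/%y")
sleep_key <- format(sleep_parsed, format = "%m/%d/%y")
keys_match <- identical(activity_key, sleep_key)

# A string that does not match the format yields NA rather than an error,
# so counting NAs after parsing is a cheap sanity check
bad_parse <- as.POSIXct("2016-04-12", format = "%m/%d/%Y", tz = "UTC")
is.na(bad_parse)
```

Note that the `%p` (AM/PM) specifier is locale-dependent, so this sketch assumes an English or C locale.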

2. Merging data frames

# Merging these two data frames by Id and date matching
daily_activity_merged <- merge(day_sleep, daily_activity, by=c('Id', 'date'))

glimpse(daily_activity_merged)
## Rows: 413
## Columns: 20
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ date                     <chr> "04/12/16", "04/13/16", "04/15/16", "04/16/16…
## $ SleepDay                 <dttm> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-…
## $ TotalSleepRecords        <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep       <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, …
## $ TotalTimeInBed           <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, …
## $ ActivityDate             <dttm> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-…
## $ TotalSteps               <int> 13162, 10735, 9762, 12669, 9705, 15506, 10544…
## $ TotalDistance            <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.14, 2.71, 3.19, 3.53, 1.96, 1.3…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 1.26, 0.41, 0.78, 1.32, 0.48, 0.3…
## $ LightActiveDistance      <dbl> 6.06, 4.71, 2.83, 5.04, 2.51, 5.03, 4.24, 4.6…
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes        <int> 25, 21, 29, 36, 38, 50, 28, 19, 41, 39, 73, 3…
## $ FairlyActiveMinutes      <int> 13, 19, 34, 10, 20, 31, 12, 8, 21, 5, 14, 23,…
## $ LightlyActiveMinutes     <int> 328, 217, 209, 221, 164, 264, 205, 211, 262, …
## $ SedentaryMinutes         <int> 728, 776, 726, 773, 539, 775, 818, 838, 732, …
## $ Calories                 <int> 1985, 1797, 1745, 1863, 1728, 2035, 1786, 177…
n_distinct(daily_activity_merged$Id)
## [1] 24

The merged data frame now has 413 observations and 24 unique IDs.
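
The drop from 33 to 24 IDs follows from merge() performing an inner join by default: only Id and date pairs present in both data frames survive. The sketch below illustrates this on made-up toy data (`sleep_toy` and `activity_toy` are hypothetical) and shows `all.x = TRUE` as one alternative if activity days without sleep records should be kept.

```r
# Toy data: user 3 has activity but no sleep record (values are for illustration only)
sleep_toy <- data.frame(
  Id = c(1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/12/16"),
  TotalMinutesAsleep = c(327, 384, 412)
)
activity_toy <- data.frame(
  Id = c(1, 1, 2, 3),
  date = c("04/12/16", "04/13/16", "04/12/16", "04/12/16"),
  TotalSteps = c(13162, 10735, 9762, 12669)
)

# Inner join (the merge() default): user 3 is dropped
inner <- merge(sleep_toy, activity_toy, by = c("Id", "date"))

# Left join on activity: user 3 is kept, with NA in the sleep column
left <- merge(activity_toy, sleep_toy, by = c("Id", "date"), all.x = TRUE)

nrow(inner)
nrow(left)
```

For the sleep-related questions in this report the inner join is sufficient; the left join would matter only if the activity analysis should cover all 33 users.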

Phase 4 & 5: Analyze & Share

In these phases I analyze the data and share my findings. The primary goal of data analysis is to find the relationships, trends, and patterns that will help me solve the business problem more accurately. The share phase, meanwhile, is about communicating the results with the help of visualizations.

1. Asleep time vs time in bed relationship

ggplot(data = daily_activity_merged)+
  geom_point(mapping = aes(x= TotalTimeInBed, y = TotalMinutesAsleep), color = "#FE8F77")+
  labs(title = "Total Minutes in Bed x Total Minutes Asleep", x = "Time in bed", y = "Total minutes asleep")+
  theme(plot.title = element_text(color = "#FE8F77"))

cor(x= daily_activity_merged$TotalTimeInBed, y = daily_activity_merged$TotalMinutesAsleep)
## [1] 0.9304575
  • Outcome: We see a linear relationship between minutes spent in bed and total minutes asleep, with a correlation coefficient of 0.93.

  • Conclusion: According to Harvard Health Publishing, more than 60% of women fell short of the recommended 8 hours of sleep per day. Because time asleep tracks time in bed so closely, a bedtime notification based on when the individual needs to be in bed to get a given number of hours of sleep is a good idea. A Bellabeat device could send an individually tailored message such as: ‘If you want to get 8 hours of sleep and wake up at 7 A.M. tomorrow, we recommend you go to bed in 30 minutes.’
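
One possible way to turn this idea into a number is sketched below. It uses the six nights shown in the head(day_sleep) output to estimate the share of time in bed that is spent asleep, then works backwards from a wake-up time. The helper `recommend_bedtime()` and the wake-up time are hypothetical, and this is an illustration of the arithmetic rather than a description of any Bellabeat feature.

```r
# First six nights from the head(day_sleep) output above
minutes_asleep <- c(327, 384, 412, 340, 700, 304)
minutes_in_bed <- c(346, 407, 442, 367, 712, 320)

# Share of time in bed that is actually spent asleep
sleep_efficiency <- sum(minutes_asleep) / sum(minutes_in_bed)

# Hypothetical helper: bedtime needed to get `target_sleep_min` minutes of sleep
# before `wake_time`, given the user's sleep efficiency
recommend_bedtime <- function(wake_time, target_sleep_min, efficiency) {
  minutes_in_bed_needed <- round(target_sleep_min / efficiency)
  wake_time - minutes_in_bed_needed * 60  # POSIXct arithmetic is in seconds
}

wake_time <- as.POSIXct("2016-04-13 07:00:00", tz = "UTC")
bedtime <- recommend_bedtime(wake_time, target_sleep_min = 8 * 60,
                             efficiency = sleep_efficiency)
format(bedtime, "%H:%M")
# → "22:35"
```

With an efficiency of roughly 0.95, eight hours of sleep require about 505 minutes in bed, which is why the suggested bedtime is earlier than 23:00.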

2. Activity vs calories burnt per day relationship

# I want to select a target (calories vs steps) from the data for a specific user
daily_activity_merged %>% 
  select(Id, Calories, TotalSteps, date) %>% 
  filter(Calories > 4000, TotalSteps > 15000)
##           Id Calories TotalSteps     date
## 1 4388161847     4022      22770 05/07/16
## 2 5577150313     4392      15764 04/24/16
## 3 6117666160     4900      19542 04/21/16
## 4 8378563200     4236      15148 04/21/16

I will pick the user Id 8378563200 with a target of 4236 calories a day.

ggplot(data = daily_activity_merged)+
  geom_smooth(mapping = aes(x = TotalSteps, y = Calories), color = "#FEE9EE")+
  geom_point(mapping = aes(x = TotalSteps, y = Calories), color = "#FE8F77")+
  geom_point(data = data.frame(x = 15148, y = 4236), aes(x = x, y = y), shape = 17, color = "black",  size = 3)+ 
  annotate("text", x = 15000, y = 4000, label = "Target", color  ="black" )+
  labs(title = "Steps x Calories")+
  theme(plot.title = element_text(color = "#FE8F77"))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

  • Outcome: The plot shows the relationship between activity level (measured in number of steps) and calories burnt each day. I also handpicked a target goal for user ID 8378563200.

  • Conclusion: More active days go with higher calorie burn. Bellabeat can help women set a daily calorie-burn target and send individually tailored reminders of how many steps the user still needs to reach the goal. In the case of user ID 8378563200, the day she burnt 4236 calories was a day with 15148 steps, so roughly that many steps would be her reminder target.
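
Instead of reading the step target off a single day, one could fit a simple per-user line and invert it. The sketch below does this with lm() on the six days shown in the head(daily_activity) output; the helper `steps_for_target()` and the 1900-calorie target are hypothetical, and a straight line through six points is only an illustration, not a validated model.

```r
# First six days for one user, from the head(daily_activity) output above
user_days <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728)
)

# Straight-line fit: Calories = intercept + slope * TotalSteps
fit <- lm(Calories ~ TotalSteps, data = user_days)
intercept <- coef(fit)[["(Intercept)"]]
slope <- coef(fit)[["TotalSteps"]]

# Hypothetical helper: invert the fitted line to get the steps for a calorie target
steps_for_target <- function(target_calories) {
  ceiling((target_calories - intercept) / slope)
}

target <- 1900  # assumed daily calorie goal, chosen inside the observed range
steps_needed <- steps_for_target(target)
predicted <- predict(fit, newdata = data.frame(TotalSteps = steps_needed))
```

A reminder could then report `steps_needed` minus the steps already taken that day. Targets far outside the observed range would be extrapolation and should be treated with caution.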

3. Activity intensity throughout the day

I decided to use the number of steps throughout the day as activity intensity. I will use the data frame ‘hourly_steps’.

# First I need to format the date and create a new column with hours data
# (as.POSIXct keeps the column a plain date-time vector that dplyr can group on)
hourly_steps$Hours <- as.POSIXct(hourly_steps$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p")
hourly_steps$DayHours <- hour(hourly_steps$Hours)
# Now I create a new data frame with average number of steps in each hour
hourly_activity <- hourly_steps %>% 
  group_by(DayHours) %>% 
  summarize(mean_steps = round(mean(StepTotal), 0))
# Let's finally plot our bar chart 
hourly_activity$DayHours <- factor(hourly_activity$DayHours)
ggplot(data = hourly_activity, aes(x = DayHours, y = mean_steps)) +
  geom_bar(stat = "identity", fill = "#FE8F77") +
  labs(title = "Hourly Activity", x = "24 Hours", y = "Average steps") +
  theme(plot.title = element_text(color = "#FE8F77"))

  • Outcome: The graph shows that activity over the day is roughly bell-shaped, with most activity during the day and after work hours.

  • Conclusion: Activity at 15:00 is relatively low, however. Bellabeat can take this into account when promoting an active lifestyle to its users: extra encouragement and a reminder to move at around 15:00 is recommended.
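
The hour extraction and averaging can be checked on a small example. The sketch below uses base R only and made-up step counts (`hourly_toy` is a hypothetical stand-in for hourly_steps); it shows in particular that "12:00:00 AM" maps to hour 0 on the 24-hour scale.

```r
# Toy stand-in for hourly_steps (step counts are made up for illustration)
hourly_toy <- data.frame(
  ActivityHour = c("4/12/2016 12:00:00 AM", "4/12/2016 3:00:00 PM",
                   "4/12/2016 6:00:00 PM", "4/13/2016 12:00:00 AM",
                   "4/13/2016 3:00:00 PM", "4/13/2016 6:00:00 PM"),
  StepTotal = c(40, 300, 620, 60, 500, 580)
)

# Parse the 12-hour timestamps and keep the hour on a 24-hour scale
parsed <- as.POSIXct(hourly_toy$ActivityHour,
                     format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
hourly_toy$DayHours <- as.integer(format(parsed, "%H"))

# Average steps per hour of day
mean_steps <- tapply(hourly_toy$StepTotal, hourly_toy$DayHours, mean)

# Hour with the highest average
peak_hour <- as.integer(names(mean_steps)[which.max(mean_steps)])
```

On the real data, the same `mean_steps` vector could be inspected directly to confirm the afternoon dip that the bar chart suggests.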

4. Sedentary time by weekdays

I want to see whether the users are more sedentary on certain days of the week. I will use the ‘daily_activity_merged’ data frame, which contains the dates and sedentary minutes. I already formatted the date above (Phase 3, Section 1).

# Here I am extracting weekdays from the ActivityDate column
daily_activity_merged$Weekday <- weekdays(daily_activity_merged$ActivityDate)

# Now I create a new data frame with the average sedentary minutes per weekday
daily_sedentary_2 <- daily_activity_merged %>% 
  group_by(Weekday) %>% 
  summarize(MeanSedentaryMinutes = mean(SedentaryMinutes), StandardDeviation = sd(SedentaryMinutes))  

# It's time to plot the means as points with standard-deviation error bars
ggplot(data = daily_sedentary_2, aes(x = Weekday, y = MeanSedentaryMinutes)) +
  geom_point() +  # Plot the mean as points
  geom_errorbar(aes(ymin = MeanSedentaryMinutes - StandardDeviation, ymax = MeanSedentaryMinutes + StandardDeviation), width = 0.2, color = "#FE8F77") +
  labs(title = "Weekly Sedentary Time", x = "Weekdays", y = "Sedentary Minutes") +
  theme(plot.title = element_text(color = "#FE8F77"))+
  scale_x_discrete(limits = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))

  • Outcome: The plot shows the mean sedentary minutes as black dots in the middle of each error bar. The standard deviation is quite high, especially on weekends.

  • Conclusion: My expectation was that users would be more sedentary on workdays than on weekends. Since this is not the case, Bellabeat can instead focus on the individual habits of its users.
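
The workday versus weekend comparison can also be made directly. The sketch below uses the six days shown in the head(daily_activity) output and the `%u` format code (ISO weekday number, 1 = Monday to 7 = Sunday), which does not depend on the locale the way weekdays() names do. Six days from one user are far too few to draw a conclusion; the point is only to show the grouping step.

```r
# First six days from the head(daily_activity) output above
days <- data.frame(
  ActivityDate = as.Date(c("4/12/2016", "4/13/2016", "4/14/2016",
                           "4/15/2016", "4/16/2016", "4/17/2016"),
                         format = "%m/%d/%Y"),
  SedentaryMinutes = c(728, 776, 1218, 726, 773, 539)
)

# ISO weekday number: 1 = Monday ... 7 = Sunday, independent of locale
days$IsoWeekday <- as.integer(format(days$ActivityDate, "%u"))
days$IsWeekend <- days$IsoWeekday >= 6

# Mean sedentary minutes for workdays (FALSE) and weekend days (TRUE)
mean_by_type <- tapply(days$SedentaryMinutes, days$IsWeekend, mean)
```

Applied to the full merged data frame, the same grouping would give one number per group to set beside the per-weekday plot above.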

Phase 6: Act

In the sixth and final phase, “Act,” I will leverage the analyses conducted thus far to formulate actionable recommendations for stakeholders. These recommendations aim to address the lingering business challenges and empower stakeholders to make well-informed decisions.

Summarized Outcomes and Conclusions:

1. Sleep Analysis:

  • Outcome: Linear relationship found between minutes in bed and total minutes asleep (correlation coefficient: 0.93).
  • Conclusion: Over 60% of women don’t achieve the recommended 8 hours of sleep. Suggest implementing bedtime notifications tailored to individual sleep needs.

2. Activity vs. Calorie Burn:

  • Outcome: Activity level (steps) vs. calorie burn relationship analyzed for 24 different users, with a target goal for a specific user.
  • Conclusion: Active days lead to increased calorie burn. Recommend setting daily calorie burn goals and providing tailored step reminders to achieve these goals.

3. Daily Activity Pattern:

  • Outcome: Daily activity follows a bell curve, peaking during the day and post-work hours.
  • Conclusion: Note a dip in activity around 15:00. Suggest extra encouragement and reminders for movement around this time to promote an active lifestyle.

Business Recommendations:

Based on the above insights and conclusions, here are high-level marketing strategy recommendations for Bellabeat:

1. Product Features and Notifications:

  • Develop a feature that provides personalized bedtime notifications to help users achieve the recommended 8 hours of sleep. Tailor notifications to individual sleep needs and wake-up times.

2. Fitness Goal Setting and Reminders:

  • Introduce a feature allowing users to set daily calorie burn goals and track their progress. Send tailored reminders to encourage users to achieve their goals, specifying the number of steps needed.

3. Enhanced User Engagement:

  • Optimize user engagement by strategically sending reminders for physical activity, especially during the afternoon slump around 15:00. Encourage movement and activity during this period.

These recommendations align with Bellabeat’s mission to empower women through health-focused smart devices and capitalize on the potential for growth in the global smart device market.