A new growth opportunity for Bellabeat

Summary

This data analytics report is the second case study of the Google professional analytics course. The purpose of the analysis is to identify a new growth opportunity for Bellabeat, a high-tech manufacturer of health-focused products for women. Based on an analysis of existing smart health device data, new insights are presented in the “Act” section after working through six basic steps: “Ask”, “Prepare”, “Process”, “Analyze”, “Share”, and “Act”.

Objectives

  • An analytical report
  • Including:
    • Process of an analysis
    • Visualization
    • Insights

1.ASK

This section clarifies the basic requirements and business questions related to the analysis.

1.1.Business tasks

According to Ms. Urška Sršen, co-founder and Chief Creative Officer of Bellabeat, the business tasks are as follows:

  1. To analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices.
  2. To identify which Bellabeat product these insights can be applied to as a potential business opportunity.

1.2.Questions

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

1.3.Stakeholders

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
  • Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
  • Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

1.4.Products

  1. Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
  2. Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
  3. Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
  4. Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
  5. Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

2.PREPARE

Before the analysis can start, the smart device data must be prepared. Sršen recommends using public data that explores smart device users’ daily habits.

2.1.Data sets

  • FitBit Fitness Tracker Data (CC0: Public Domain, a data set made available through Mobius): This Kaggle data set contains personal fitness tracker data from thirty Fitbit users.
  • The data sets on “daily activity”, “sleep day”, “hourly calories”, and “heart rate” can be used to explore users’ habits.
  • If necessary, additional data will be used to address the limitations of the analysis.

2.2.Collect data

To prepare for the analysis, the following actions are taken.

  1. Download the data from the source named above and store it.

  2. Identify how it’s organized.

  3. Sort and filter the data.

  4. Determine the credibility of the data.

2.3.Package installation

To carry out the data preparation, the following packages are installed and loaded.

library(tidyverse) #helps wrangle data
library(lubridate) #helps wrangle date attributes
library(ggplot2) #helps visualize data
library(cowplot) #helps visualize data

#For data cleaning, the following packages are loaded.
library(here)
library(janitor)
library(skimr)
library(dplyr)
getwd() #displays your working directory
#Data location: "C:/Users/satos/Documents/project/case-2dataset"

2.4.Identify

Identify how the data sets are organized. After downloading the data sets, we confirmed that they are CSV files collected by Fitbit. They are loaded as shown below.

activity <- read_csv("dailyActivity_merged.csv")
sleep <- read_csv("sleepDay_merged.csv")
heartrate <- read_csv("heartrate_seconds_merged.csv")
hourly_calories <- read_csv("hourlyCalories_merged.csv")

Preview each data frame.

glimpse(activity) #Rows: 940, Columns: 15, ActivityDate <chr>
glimpse(sleep) #Rows: 413, Columns: 5, SleepDay <chr>
glimpse(heartrate) #Rows: 2,483,658, Columns: 3, Time <chr>
glimpse(hourly_calories) #Rows: 22,099, Columns: 3, ActivityHour <chr>

The preview shows that the activity data is in wide format, while the heart rate and hourly calories data are in long format. In addition, to integrate multiple data sets into one, the following actions are required in the next step, “PROCESS”.

  1. Unify the chronological column as “date”.

  2. Rename “StepTotal” to “TotalStep”.

  3. Split the time data to year, month, day, hour, minute, and seconds.

  4. Convert the data type of “date” from character to POSIXct (using as.POSIXct).
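
The conversion in step 4 can be sketched on a few hand-written strings in the same "month/day/year hour:minute:second AM/PM" layout. The sample values and the fixed time zone "UTC" are assumptions for illustration only, not values taken from the data set.

```r
# Sample strings in the layout used by the hourly files (values made up for illustration).
raw_time <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 PM")

# Parse to POSIXct; %I is the 12-hour clock and %p the AM/PM marker.
# Parsing %p depends on the locale; an English or C locale is assumed here.
parsed <- as.POSIXct(raw_time, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Split the result into a calendar date and a 24-hour clock time.
date_part <- as.Date(parsed, tz = "UTC")
time_part <- format(parsed, format = "%H:%M:%S")

print(date_part)
print(time_part)
```

Using a fixed time zone keeps the example reproducible across machines; whether "UTC" or the local zone is appropriate for the real data is a separate decision.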

2.5.Credibility and integrity of data

With only about 30 users and two months of collected data, the credibility of the data is not high. For example, the data was collected in April and May, when outdoor temperatures are generally comfortable for exercise, so the tendencies might differ in other months. In addition, not every user recorded every measure: as verified in section 3.4, only 24 users have sleep data and only 14 have heart rate data. The sample size is therefore limited for the analysis.

3.PROCESS

To process the data for analysis, the following tasks are carried out.

  1. Check the data for errors using RStudio.

  2. Transform the data effectively.

  3. Document the cleaning process.

3.1.Unify column names across the data sets

Rename columns to common names, for example “ActivityDate” to “date” and “ActivityHour” to “date_time”. In addition, make the column names consistent with the “clean_names” function.

#Rename to common names.
activity <- activity %>%  rename(date = ActivityDate)
heartrate <- heartrate %>%  rename(date_time= Time)
heartrate <- heartrate %>%  rename(heart_rate = Value)
sleep <- sleep %>% rename(date = SleepDay )
hourly_calories <- hourly_calories %>% rename(id = Id)
hourly_calories <- hourly_calories %>% rename(date_time = ActivityHour)
hourly_calories <- hourly_calories %>% rename(hourly_calories = Calories)


#Use the clean_names function for readable, consistent column names.
activity <- clean_names(activity)
heartrate <- clean_names(heartrate)
sleep <- clean_names(sleep)
hourly_calories <- clean_names(hourly_calories)

Confirm that the column names changed.

colnames(activity)
colnames(heartrate)
colnames(sleep)
colnames(hourly_calories)
# [1] "id"                         "date"                       "total_steps"               
# [4] "total_distance"             "tracker_distance"           "logged_activities_distance"
# [7] "very_active_distance"       "moderately_active_distance" "light_active_distance"     
#[10] "sedentary_active_distance"  "very_active_minutes"        "fairly_active_minutes"     
#[13] "lightly_active_minutes"     "sedentary_minutes"          "calories"                  
#[1] "id"         "date_time"  "heart_rate"
#[1] "id"                   "date"                 "total_sleep_records"  "total_minutes_asleep"
#[5] "total_time_in_bed"   
#[1] "id"              "date_time"       "hourly_calories"

3.2.Split the time data into year, month, day, hour, minute, and second

Before splitting the time data, confirm the format in each data frame.

head(activity) #<date>, Year,Month, Day
tail(activity)
head(heartrate) #<S3: POSIXct>  Year,Month, Day, hour,minute,second, AM/PM
tail(heartrate) 
head(sleep) #<date>, Year,Month, Day
tail(sleep)
head(hourly_calories) #<S3: POSIXct>  Year,Month, Day, hour,minute,second, AM/PM
tail(hourly_calories)

The parts that should be transformed are as follows:

  1. Transform the character vector to date vector.

  2. Split and add columns of day, month, and year.

3.2.1.Transform to date vector and split columns
#activity
activity <- activity %>%
  mutate(date = as_date(date, format = "%m/%d/%Y")) #The default format is yyyy-mm-dd

#Hourly calories
hourly_calories<- hourly_calories %>% 
  mutate(date_time = as.POSIXct(date_time,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone()))
#Split date and time
hourly_calories$time <- format(hourly_calories$date_time, format = "%H:%M:%S")
#Take the calendar date from the formatted date-time, so no further time zone conversion is involved.
hourly_calories$date <- as.Date(format(hourly_calories$date_time, format = "%Y-%m-%d"))

#Heart rate
heartrate <- heartrate %>% 
  mutate(date_time= as.POSIXct(date_time,format ="%m/%d/%Y %I:%M:%S %p" , tz=Sys.timezone()))
#Split date and time
heartrate$time <- format(heartrate$date_time, format = "%H:%M:%S")
#Take the calendar date from the formatted date-time, so no further time zone conversion is involved.
heartrate$date <- as.Date(format(heartrate$date_time, format = "%Y-%m-%d"))

#sleep
sleep <- sleep %>% mutate(date = as_date(date,format ="%m/%d/%Y %I:%M:%S %p")) #The default format is yyyy-mm-dd
head(activity)
## # A tibble: 6 × 15
##           id date       total_steps total_distance tracker_distance
##        <dbl> <date>           <dbl>          <dbl>            <dbl>
## 1 1503960366 2016-04-12       13162           8.5              8.5 
## 2 1503960366 2016-04-13       10735           6.97             6.97
## 3 1503960366 2016-04-14       10460           6.74             6.74
## 4 1503960366 2016-04-15        9762           6.28             6.28
## 5 1503960366 2016-04-16       12669           8.16             8.16
## 6 1503960366 2016-04-17        9705           6.48             6.48
## # ℹ 10 more variables: logged_activities_distance <dbl>,
## #   very_active_distance <dbl>, moderately_active_distance <dbl>,
## #   light_active_distance <dbl>, sedentary_active_distance <dbl>,
## #   very_active_minutes <dbl>, fairly_active_minutes <dbl>,
## #   lightly_active_minutes <dbl>, sedentary_minutes <dbl>, calories <dbl>
head(heartrate)
## # A tibble: 6 × 5
##           id date_time           heart_rate time     date      
##        <dbl> <dttm>                   <dbl> <chr>    <date>    
## 1 2022484408 2016-04-12 07:21:00         97 07:21:00 2016-04-12
## 2 2022484408 2016-04-12 07:21:05        102 07:21:05 2016-04-12
## 3 2022484408 2016-04-12 07:21:10        105 07:21:10 2016-04-12
## 4 2022484408 2016-04-12 07:21:20        103 07:21:20 2016-04-12
## 5 2022484408 2016-04-12 07:21:25        101 07:21:25 2016-04-12
## 6 2022484408 2016-04-12 07:22:05         95 07:22:05 2016-04-12
head(sleep)
## # A tibble: 6 × 5
##         id date       total_sleep_records total_minutes_asleep total_time_in_bed
##      <dbl> <date>                   <dbl>                <dbl>             <dbl>
## 1   1.50e9 2016-04-12                   1                  327               346
## 2   1.50e9 2016-04-13                   2                  384               407
## 3   1.50e9 2016-04-15                   1                  412               442
## 4   1.50e9 2016-04-16                   2                  340               367
## 5   1.50e9 2016-04-17                   1                  700               712
## 6   1.50e9 2016-04-19                   1                  304               320
head(hourly_calories)
## # A tibble: 6 × 5
##           id date_time           hourly_calories time     date      
##        <dbl> <dttm>                        <dbl> <chr>    <date>    
## 1 1503960366 2016-04-12 00:00:00              81 00:00:00 2016-04-12
## 2 1503960366 2016-04-12 01:00:00              61 01:00:00 2016-04-12
## 3 1503960366 2016-04-12 02:00:00              59 02:00:00 2016-04-12
## 4 1503960366 2016-04-12 03:00:00              47 03:00:00 2016-04-12
## 5 1503960366 2016-04-12 04:00:00              48 04:00:00 2016-04-12
## 6 1503960366 2016-04-12 05:00:00              48 05:00:00 2016-04-12

3.3.Eliminate unnecessary data, duplicates, and NA

Check whether there are any duplicated rows.

sum(duplicated(activity)) #0
sum(duplicated(sleep))  #3
sum(duplicated(heartrate)) #0
sum(duplicated(hourly_calories)) #0

We found three duplicated rows in the sleep data.
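
How duplicated rows are counted and removed can be sketched on a small made-up table. The data frame toy_sleep and its values are assumptions for illustration; only duplicated() and dplyr's distinct() are functions used in this report.

```r
library(dplyr)

# Toy sleep log with one fully duplicated row (values made up for illustration).
toy_sleep <- data.frame(
  id = c(1, 1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-13", "2016-04-12")),
  total_minutes_asleep = c(327, 384, 384, 412)
)

# duplicated() flags the second and later copies of a row,
# so the sum is the number of extra rows.
n_dup <- sum(duplicated(toy_sleep))

# distinct() keeps one copy of each row.
toy_sleep_clean <- distinct(toy_sleep)

print(n_dup)
print(nrow(toy_sleep_clean))
```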

3.4.Number of users

We verify the number of users in each data set.

n_unique(activity$id) #33
n_unique(sleep$id) #24
n_unique(heartrate$id) #14
n_unique(hourly_calories$id) #33

Accordingly, the “heartrate” data set contains only a small number of users. To keep as many users as possible, we may not use the “heartrate” data in this analysis.
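
Comparing the id sets directly shows which users would be lost when two data sets are combined. This is a minimal base R sketch; the id vectors are made up and only stand in for columns such as activity$id and sleep$id.

```r
# Toy id vectors standing in for two data sets (values made up for illustration).
activity_ids <- c(101, 102, 103, 104)
sleep_ids <- c(101, 103)

# Users present in both data sets survive a merge on id.
common_ids <- intersect(activity_ids, sleep_ids)

# Users that would be lost because they have no sleep records.
lost_ids <- setdiff(activity_ids, sleep_ids)

print(common_ids)
print(lost_ids)
```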

3.5.Drop duplicates and NA data

#Activity
activity <- activity %>%
  distinct() %>%
  drop_na()

#Heartrate
heartrate <- heartrate %>% 
  distinct() %>%
  drop_na()

#sleep
sleep <- sleep %>%
  distinct() %>%
  drop_na()

#Hourly calories
hourly_calories <- hourly_calories %>%
  distinct() %>%
  drop_na()

Verify that all unnecessary data has been removed.

## [1] 0
## [1] 0
## [1] 0
## [1] 0
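
The code that produced the four zeros above is not shown. A check of this kind could look like the sketch below, which is an assumption rather than the original code. The helper check_clean is hypothetical, and base R's unique() and na.omit() stand in for distinct() and drop_na().

```r
# Hypothetical helper (not part of the original report):
# counts leftover duplicate rows and NA cells in a data frame.
check_clean <- function(df) {
  c(duplicates = sum(duplicated(df)), missing = sum(is.na(df)))
}

# Toy data with one duplicated row and one NA (values made up for illustration).
toy <- data.frame(id = c(1, 2, 2, 3), calories = c(81, 61, 61, NA))

before <- check_clean(toy)
after <- check_clean(na.omit(unique(toy)))

print(before)
print(after)
```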

In addition, check the four cleaned data frames.

head(activity)
head(heartrate)
head(sleep)
head(hourly_calories)

Accordingly, we verified that all four data frames are clean.

3.6.Merge data sets

First, we merge the “activity” and “sleep” data sets, using id and date as the join keys, to look for correlations between variables. Because merge() keeps only matching rows by default, days without a sleep record are dropped.

#Merge "activity" and "sleep"; the merged data is named "activity_sleep".
activity_sleep <- merge(activity, sleep, by= c("id", "date"))
glimpse(activity_sleep)
#Confirm any duplicated data.
sum(duplicated(activity_sleep)) #[1] 0

We have now confirmed that the merged data set contains no duplicates.
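
The effect of merging on id and date can be sketched on made-up data. By default base R's merge() returns only rows whose keys appear in both inputs, while all.x = TRUE keeps every row of the first input. The toy data frames below are assumptions for illustration.

```r
# Toy inputs (values made up for illustration).
toy_activity <- data.frame(
  id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  total_steps = c(13162, 10735, 8000)
)
toy_sleep <- data.frame(
  id = c(1, 1),
  date = as.Date(c("2016-04-12", "2016-04-13")),
  total_minutes_asleep = c(327, 384)
)

# Default merge: only id/date pairs found in both inputs are kept.
inner <- merge(toy_activity, toy_sleep, by = c("id", "date"))

# all.x = TRUE keeps every activity row and fills missing sleep values with NA.
left <- merge(toy_activity, toy_sleep, by = c("id", "date"), all.x = TRUE)

print(nrow(inner))
print(nrow(left))
print(sum(is.na(left$total_minutes_asleep)))
```

Comparing the two row counts is a quick way to see how many activity days have no matching sleep record.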

4.ANALYZE

In this part, by organizing the data and performing calculations, we reveal trends among the FitBit users that can inform Bellabeat’s marketing strategy. The analysis proceeds in the following steps:

  1. Understand the meaning of the data variables.

  2. Perform calculations.

  3. Identify trends and relationships.

4.1.Knowing the meaning of data variables

Before analyzing, it is important to understand the meaning of the data variables.

colnames(activity_sleep)
#############################################################################################
#[1] "id"                         "date"                       "total_steps"               
#[4] "total_distance"             "tracker_distance"           "logged_activities_distance"
#[7] "very_active_distance"       "moderately_active_distance" "light_active_distance"     
#[10] "sedentary_active_distance"  "very_active_minutes"        "fairly_active_minutes"     
#[13] "lightly_active_minutes"     "sedentary_minutes"          "calories"                  
#[16] "total_sleep_records"        "total_minutes_asleep"       "total_time_in_bed"
#############################################################################################

The following activity levels, expressed in steps per day, are attributed to the Centers for Disease Control and Prevention (CDC). Reference: Tomwaltersfitness

df_level <- data.frame(
  activity_level = c("Very active", "Moderately active", "Fairly active", "Lightly active", "Sedentary active"),
  steps_per_day = c(">12000", "7500-12000", "5000-7499", "2500-4999","<2500")
)
print(df_level)
##      activity_level steps_per_day
## 1       Very active        >12000
## 2 Moderately active    7500-12000
## 3     Fairly active     5000-7499
## 4    Lightly active     2500-4999
## 5  Sedentary active         <2500
knitr::kable(df_level, caption = "Table with user type")
Table with user type

activity_level     steps_per_day
Very active        >12000
Moderately active  7500-12000
Fairly active      5000-7499
Lightly active     2500-4999
Sedentary active   <2500
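
The thresholds in the table can be turned into a classification rule. The sketch below uses base R's cut() and assumes that a boundary value belongs to the higher level (so exactly 7,500 steps counts as moderately active). The step counts are illustrative values, and the lower-case labels are a naming choice made for this example.

```r
# Illustrative mean daily step counts for five users.
mean_daily_steps <- c(1490, 3477, 5619, 7968, 12406)

# right = FALSE makes each interval closed on the left, e.g. [7500, 12000).
activity_level <- cut(
  mean_daily_steps,
  breaks = c(-Inf, 2500, 5000, 7500, 12000, Inf),
  labels = c("sedentary", "lightly active", "fairly active",
             "moderately active", "very active"),
  right = FALSE
)

print(data.frame(mean_daily_steps, activity_level))
```

cut() returns an ordered set of factor levels in the order given by labels, which is convenient when the levels later need a fixed order in a chart legend.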

4.2.Perform calculations

To dig deeper into the FitBit users, calculate the average steps, average calories, and average minutes asleep per user.

4.2.1.Average steps, calories, and minutes asleep

We calculate the average values of steps, calories, and minutes asleep.

daily_average <- activity_sleep %>%
  group_by(id) %>%
  summarise(mean_daily_steps = mean(total_steps), mean_daily_calories = mean(calories), mean_minutes_asleep = mean(total_minutes_asleep))

head(daily_average)
## # A tibble: 6 × 4
##           id mean_daily_steps mean_daily_calories mean_minutes_asleep
##        <dbl>            <dbl>               <dbl>               <dbl>
## 1 1503960366           12406.               1872.                360.
## 2 1644430081            7968.               2978.                294 
## 3 1844505072            3477                1676.                652 
## 4 1927972279            1490                2316.                417 
## 5 2026352035            5619.               1541.                506.
## 6 2320127002            5079                1804                  61

4.2.2.User type (average steps)

Based on the average data above, users are classified by the daily average steps.

user_type <- daily_average %>%
  mutate(user_type = case_when(
    mean_daily_steps < 2500 ~ "sedentary",
    mean_daily_steps >= 2500 & mean_daily_steps < 5000 ~ "lightly active", 
    mean_daily_steps >= 5000 & mean_daily_steps < 7500 ~ "fairly active", 
    mean_daily_steps >= 7500 & mean_daily_steps < 12000 ~ "moderately active", 
    mean_daily_steps >= 12000 ~ "very active"
  ))

head(user_type)
## # A tibble: 6 × 5
##           id mean_daily_steps mean_daily_calories mean_minutes_asleep user_type 
##        <dbl>            <dbl>               <dbl>               <dbl> <chr>     
## 1 1503960366           12406.               1872.                360. very acti…
## 2 1644430081            7968.               2978.                294  moderatel…
## 3 1844505072            3477                1676.                652  lightly a…
## 4 1927972279            1490                2316.                417  sedentary 
## 5 2026352035            5619.               1541.                506. fairly ac…
## 6 2320127002            5079                1804                  61  fairly ac…
sum(is.na(user_type)) # 0
## [1] 0

Then, calculate the share of each user type in the data.

user_type_ratio <- user_type %>%
  group_by(user_type) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(user_type) %>%
  summarise(total_ratio = total / totals) %>%
  mutate(labels = scales::percent(total_ratio))

user_type_ratio$user_type <- factor(user_type_ratio$user_type , levels = c("very active", "moderately active", "fairly active", "lightly active", "sedentary"))

head(user_type_ratio)
## # A tibble: 5 × 3
##   user_type         total_ratio labels
##   <fct>                   <dbl> <chr> 
## 1 fairly active          0.208  20.8% 
## 2 lightly active         0.167  16.7% 
## 3 moderately active      0.5    50.0% 
## 4 sedentary              0.0417 4.2%  
## 5 very active            0.0833 8.3%
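
The same kind of ratio table can be computed in fewer steps with dplyr's count(). This is an alternative sketch on made-up data, not the code used above. The data frame toy_users is an assumption for illustration, and accuracy = 0.1 is chosen here so that every label shows one decimal place.

```r
library(dplyr)

# Toy classification result (values made up for illustration).
toy_users <- data.frame(
  id = 1:6,
  user_type = c("moderately active", "moderately active", "moderately active",
                "fairly active", "fairly active", "sedentary")
)

# count() tallies rows per user type; the ratio is each tally over the grand total.
toy_ratio <- toy_users %>%
  count(user_type, name = "total") %>%
  mutate(
    total_ratio = total / sum(total),
    labels = scales::percent(total_ratio, accuracy = 0.1)
  )

print(toy_ratio)
```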

4.2.3.Distribution of user type

Visualize the distribution of user type as follows.

user_type_ratio %>%
  ggplot(aes(x="",y=total_ratio, fill=user_type)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+  #Convert the plot to polar coordinates
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
  scale_fill_manual(values = c("#66CDAA","#8EEBEC", "#ffd480","#ffa07a", "#e55451")) +
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5, reverse = FALSE))+
  labs(title="Distribution of users", fill = "User type")

4.2.4.Sleeper type

Categorize the type of sleeper.

df_asleep <- data.frame(
  sleeper_type = c("Long sleeper", "Normal sleeper", "Short sleeper"),
  sleeping_minutes = c(">540", "420-539", "<420")
)

knitr::kable(df_asleep, caption = "Table with sleeper type")
Table with sleeper type

sleeper_type    sleeping_minutes
Long sleeper    >540
Normal sleeper  420-539
Short sleeper   <420

sleeper_type <- daily_average %>%
  mutate(sleeper_type = case_when(
    mean_minutes_asleep < 420 ~ "short sleeper",
    mean_minutes_asleep >= 420 & mean_minutes_asleep < 540 ~ "normal sleeper", 
    mean_minutes_asleep >= 540 ~ "long sleeper"
  ))

head(sleeper_type)
## # A tibble: 6 × 5
##         id mean_daily_steps mean_daily_calories mean_minutes_asleep sleeper_type
##      <dbl>            <dbl>               <dbl>               <dbl> <chr>       
## 1   1.50e9           12406.               1872.                360. short sleep…
## 2   1.64e9            7968.               2978.                294  short sleep…
## 3   1.84e9            3477                1676.                652  long sleeper
## 4   1.93e9            1490                2316.                417  short sleep…
## 5   2.03e9            5619.               1541.                506. normal slee…
## 6   2.32e9            5079                1804                  61  short sleep…

Then, calculate the share of each sleeper type in the data.

sleeper_type_ratio <- sleeper_type %>%
  group_by(sleeper_type) %>%
  summarise(total = n()) %>%
  mutate(totals = sum(total)) %>%
  group_by(sleeper_type) %>%
  summarise(total_ratio = total / totals) %>%
  mutate(labels = scales::percent(total_ratio))

sleeper_type_ratio$sleeper_type <- factor(sleeper_type_ratio$sleeper_type , levels = c("short sleeper", "normal sleeper", "long sleeper"))
head(sleeper_type_ratio)
## # A tibble: 3 × 3
##   sleeper_type   total_ratio labels
##   <fct>                <dbl> <chr> 
## 1 long sleeper        0.0417 4%    
## 2 normal sleeper      0.417  42%   
## 3 short sleeper       0.542  54%

4.2.5.Distribution of sleeper type

Visualize the distribution of sleeper type as follows.

sleeper_type_ratio %>%
  ggplot(aes(x="",y=total_ratio, fill=sleeper_type)) +
  geom_bar(stat = "identity", width = 1)+
  coord_polar("y", start=0)+
  theme_minimal()+
  theme(axis.title.x= element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(), 
        panel.grid = element_blank(), 
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust = 0.5, size=14, face = "bold")) +
  scale_fill_manual(values = c("#66CDAA", "#ffd480", "#e55451")) +
  geom_text(aes(label = labels),
            position = position_stack(vjust = 0.5))+
  labs(title="Distribution of sleepers", fill = "Sleeper type")

5.SHARE

In this section, we visualize detailed trends in the calorie consumption habits of users who are both “moderately active” and “normal sleepers”.
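
The plots in the following subsections use a data frame named df_standard_user whose construction is not shown in this report. Judging from the columns that appear in the later glimpse output (the activity_sleep columns plus user_type, sleeper_type, and weekday), one plausible way to build such a table is sketched below. All toy_ names and values are assumptions for illustration, and the original code may differ.

```r
library(dplyr)
library(lubridate)

# Toy stand-ins for activity_sleep, user_type, and sleeper_type (values made up).
toy_activity_sleep <- data.frame(
  id = c(1, 1, 2),
  date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  calories = c(2038, 2100, 1800),
  total_minutes_asleep = c(467, 430, 300)
)
toy_user_type <- data.frame(
  id = c(1, 2),
  user_type = c("moderately active", "sedentary")
)
toy_sleeper_type <- data.frame(
  id = c(1, 2),
  sleeper_type = c("normal sleeper", "short sleeper")
)

# Attach both per-user labels, keep the user class of interest,
# and add the day of the week as an ordered factor starting on Monday.
toy_standard_user <- toy_activity_sleep %>%
  inner_join(toy_user_type, by = "id") %>%
  inner_join(toy_sleeper_type, by = "id") %>%
  filter(user_type == "moderately active", sleeper_type == "normal sleeper") %>%
  mutate(weekday = wday(date, label = TRUE, abbr = FALSE, week_start = 1))

print(toy_standard_user)
```

The weekday labels produced by wday() follow the session locale, so the English day names used in the plots assume an English locale.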

5.2.Calories per weekday

#Mean values vs Median values
p4 <- ggplot(data = df_standard_user)+
  geom_bar(mapping = aes(x=weekday, y=calories), stat = "summary", fun = "mean") +
  annotate("rect", xmin = "Monday", xmax= "Sunday", ymin = 2200, ymax = 2400, alpha = .2, fill = "green")+
  labs(title = "Average calories per weekday", x= "Weekday", y = "Calorie") +
  theme(axis.text.x = element_text(angle = 30,vjust = 0.5, hjust = 1))
 

p5 <- ggplot(data = df_standard_user)+
  geom_bar(mapping = aes(x=weekday, y=calories), stat = "summary", fun = "median") +
  annotate("rect", xmin = "Monday", xmax= "Sunday", ymin = 2200, ymax = 2400, alpha = .2, fill = "green")+
  labs(title = "Median calories per weekday", x= "Weekday", y = "Calorie") +
  theme(axis.text.x = element_text(angle = 30,vjust = 0.5, hjust = 1))

plot_grid(p4,p5)

#2,200 calories is the minimum value of calorie consumption for men according to the Dietary Guidelines for Americans.
#2,400 calories is the maximum value of calorie consumption for women according to the Dietary Guidelines for Americans.

In general, median values are less sensitive to outliers than mean values, so we use the median values.
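
A small numeric example shows why the median is preferred when a single unusual day is present. The calorie values are made up for illustration.

```r
# Hypothetical daily calorie values with one unusually high day (values made up).
calories <- c(2000, 2100, 2050, 1950, 4000)

# The single outlier pulls the mean upward but leaves the median at a typical value.
mean_calories <- mean(calories)
median_calories <- median(calories)

print(mean_calories)   # 2420
print(median_calories) # 2050
```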

According to the bar graph of median calories, moderately active users with normal sleeping time tend to be most active on Tuesday, Wednesday, and Saturday.

On the other hand, they are less active on Monday, Thursday, and Sunday.

Next, we ask how long moderately active users with normal sleeping time sleep on each day of the week.

5.3.Minutes asleep per weekday

# Mean minutes asleep vs Median minutes asleep 
p6 <- ggplot(data = df_standard_user)+
  geom_bar(mapping = aes(x=weekday, y=total_minutes_asleep), stat = "summary", fun = "mean") +
  annotate("rect", xmin = "Monday", xmax= "Sunday", ymin = 420, ymax = 539, alpha = .2, fill = "green")+
  labs(title = "Average minutes asleep per weekday", x= "Weekday", y = "Minute") +
  theme(axis.text.x = element_text(angle = 30,vjust = 0.5, hjust = 1))
 

p7 <- ggplot(data = df_standard_user)+
  geom_bar(mapping = aes(x=weekday, y=total_minutes_asleep), stat = "summary", fun = "median") +
  annotate("rect", xmin = "Monday", xmax= "Sunday", ymin = 420, ymax = 539, alpha = .2, fill = "green")+
  labs(title = "Median minutes asleep per weekday", x= "Weekday", y = "Minute") +
  theme(axis.text.x = element_text(angle = 30,vjust = 0.5, hjust = 1))

plot_grid(p6,p7)

#The green range indicates sleeping time of at least 420 minutes (7 hours) but below 540 minutes (9 hours).

As with the graph of calorie consumption, we use median values for minutes asleep across the week.

According to the graph of median minutes asleep per weekday, the users rest more on Sunday, Saturday, and Wednesday.

The shortest sleeping day of the week is Thursday.

5.5.Tuesday vs Saturday in terms of calorie consumption behavior

To understand the behavior of “moderately active” users with normal sleeping time, we compare calorie consumption on Tuesday and Saturday.

#Merge the data frames "hourly_calories" and "df_standard_user".
df_standard_user1 <- merge(hourly_calories, df_standard_user, by= c("id", "date"))
glimpse(df_standard_user1)
## Rows: 4,136
## Columns: 27
## $ id                         <dbl> 2347167796, 2347167796, 2347167796, 2347167…
## $ date                       <date> 2016-04-13, 2016-04-13, 2016-04-13, 2016-0…
## $ date_time                  <dttm> 2016-04-13 18:00:00, 2016-04-13 16:00:00, …
## $ hourly_calories            <dbl> 140, 91, 57, 121, 73, 63, 59, 75, 86, 77, 5…
## $ time                       <chr> "18:00:00", "16:00:00", "05:00:00", "13:00:…
## $ total_steps                <dbl> 10352, 10352, 10352, 10352, 10352, 10352, 1…
## $ total_distance             <dbl> 7.01, 7.01, 7.01, 7.01, 7.01, 7.01, 7.01, 7…
## $ tracker_distance           <dbl> 7.01, 7.01, 7.01, 7.01, 7.01, 7.01, 7.01, 7…
## $ logged_activities_distance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_distance       <dbl> 1.66, 1.66, 1.66, 1.66, 1.66, 1.66, 1.66, 1…
## $ moderately_active_distance <dbl> 1.94, 1.94, 1.94, 1.94, 1.94, 1.94, 1.94, 1…
## $ light_active_distance      <dbl> 3.41, 3.41, 3.41, 3.41, 3.41, 3.41, 3.41, 3…
## $ sedentary_active_distance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ very_active_minutes        <dbl> 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19,…
## $ fairly_active_minutes      <dbl> 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32,…
## $ lightly_active_minutes     <dbl> 195, 195, 195, 195, 195, 195, 195, 195, 195…
## $ sedentary_minutes          <dbl> 676, 676, 676, 676, 676, 676, 676, 676, 676…
## $ calories                   <dbl> 2038, 2038, 2038, 2038, 2038, 2038, 2038, 2…
## $ total_sleep_records        <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ total_minutes_asleep       <dbl> 467, 467, 467, 467, 467, 467, 467, 467, 467…
## $ total_time_in_bed          <dbl> 531, 531, 531, 531, 531, 531, 531, 531, 531…
## $ mean_daily_steps           <dbl> 8533.2, 8533.2, 8533.2, 8533.2, 8533.2, 853…
## $ mean_daily_calories        <dbl> 1971.333, 1971.333, 1971.333, 1971.333, 197…
## $ mean_minutes_asleep        <dbl> 446.8, 446.8, 446.8, 446.8, 446.8, 446.8, 4…
## $ sleeper_type               <chr> "normal sleeper", "normal sleeper", "normal…
## $ user_type                  <chr> "moderately active", "moderately active", "…
## $ weekday                    <ord> Wednesday, Wednesday, Wednesday, Wednesday,…
n_unique(df_standard_user1$id) #7 users
## [1] 7
#Hourly calorie consumption on Tuesday.
df_standard_user1 %>%
  filter(weekday == "Tuesday") %>%
  group_by(time) %>%
  summarize(median_hourly_calories = median(hourly_calories)) %>%
  ggplot() +
  geom_col(mapping = aes(x=time, y = median_hourly_calories, fill = median_hourly_calories)) + 
  labs(title = "Hourly calories throughout Tuesday", x="hour", y="cal", fill = "Calorie") + 
  scale_fill_gradient(low = "green", high = "orange")+
  theme(axis.text.x = element_text(angle = 90))

#Hourly calorie consumption on Saturday.
df_standard_user1 %>%
  filter(weekday == "Saturday") %>%
  group_by(time) %>%
  summarize(median_hourly_calories = median(hourly_calories)) %>%
  ggplot() +
  geom_col(mapping = aes(x=time, y = median_hourly_calories, fill = median_hourly_calories)) + 
  labs(title = "Hourly calories throughout Saturday", x="hour", y="cal", fill = "Calorie") + 
  scale_fill_gradient(low = "green", high = "orange")+
  theme(axis.text.x = element_text(angle = 90))

For reference, we check the tendencies of calorie consumption on each day of the week.

#All days of the week.
df_standard_user1 %>%
  group_by(time, weekday) %>%
  summarize(median_hourly_calories = median(hourly_calories), .groups = "drop") %>%
  ggplot() +
  geom_col(mapping = aes(x=time, y = median_hourly_calories, fill = median_hourly_calories)) + 
  facet_wrap(~weekday) +
  labs(title = "Hourly calories throughout each day of the week", x="hour", y="cal", fill = "Calorie") + 
  scale_fill_gradient(low = "green", high = "orange")+
  theme(axis.text.x = element_text(angle = 90))

According to the graphs, Tuesday and Saturday are both days of high calorie consumption, but the timing of exercise differs.

  • On Tuesday, the users were active at 6:00 and at 16:00.

  • On Saturday, the users were active around noon. In the morning and evening, they were relatively relaxed.

  • Broadly, the 7 users exercise in the morning and in the evening on weekdays. On weekends, they are relatively active around noon and in the afternoon.
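
The peak hours read off the charts can also be ranked numerically. The sketch below computes the median calories per hour on made-up data and sorts the hours from most to least active. The data frame toy_hourly is an assumption that stands in for one weekday's rows of the merged hourly data.

```r
library(dplyr)

# Toy hourly records for two users on one weekday (values made up for illustration).
toy_hourly <- data.frame(
  id = c(1, 2, 1, 2, 1, 2),
  time = c("06:00:00", "06:00:00", "12:00:00", "12:00:00", "16:00:00", "16:00:00"),
  hourly_calories = c(150, 130, 90, 80, 140, 160)
)

# Median calories per hour across users, sorted from the most to the least active hour.
peak_hours <- toy_hourly %>%
  group_by(time) %>%
  summarize(median_hourly_calories = median(hourly_calories), .groups = "drop") %>%
  arrange(desc(median_hourly_calories))

print(peak_hours)
```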

6.ACT

In this section, based on the business tasks and the analysis, we describe our recommended actions for new business growth and value for Bellabeat.

Bellabeat’s value is “Empowering Women to Unlock Their Full Potential”.

The data sets contain at most 33 users. In addition, the user class of interest, “moderately active with normal sleeping time”, contains only 7 users (21%). This limits how far the results of the data analysis can be extended into tips for a new Bellabeat marketing strategy. With that limitation in mind, our answers to the business questions are as follows.

6.4.Suggestions for potential business opportunities

In conclusion, we suggest the following ideas for potential business opportunities.

  • An online live coaching service can be a potential product for Bellabeat.

    • The online “Live coaching” service could be scheduled for Tuesday evenings and Saturday afternoons, according to the results of trends 5.1.3 and 5.1.4. This service could be a good opportunity for Bellabeat to connect interactively with its customers.
  • Informative articles for subscribed membership users only

    • New articles on SNS or in mailing news are more likely to be read if the marketing team releases them on less active days, such as Sunday, Monday, and Thursday.

If the raw data offered more variety for the analysis, such as gender, age, and the category of sport for each activity, further insights could be drawn. We hope Bellabeat can empower women’s potential.