1. Introduction 🗂

1.1 About Bellabeat

Bellabeat is a high-tech company that manufactures health-focused smart products. Since 2013, Bellabeat smart products have been designed to inform and inspire women around the world by collecting data on activity, sleep, stress, and reproductive health.

This analysis examines smart device fitness data to gain insight into how customers use their smart devices. It is meant to help guide Bellabeat’s future marketing strategy based on the growth opportunities we discovered.

1.2 Stakeholders and products

Stakeholders

Urška Sršen: Bellabeat’s co-founder and Chief Creative Officer
Sando Mur: Mathematician, Bellabeat’s co-founder, and key member of the Bellabeat executive team
Bellabeat marketing analytics team

Products

The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. It connects to other Bellabeat smart wellness products. In this way, the Bellabeat app helps customers make healthy decisions based on the data it collects.

2. Ask Phase 📋

2.1 Business task

Our task is to analyze usage data from non-Bellabeat smart devices and gain insights to draw high-level recommendations for Bellabeat’s marketing strategy.

This analysis focuses on answering this question: How can current user trends guide marketing strategy?

3. Prepare Phase 🗄

3.1 About dataset

The data source used in this analysis is FitBit Fitness Tracker Data.
This dataset was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016.
After unzipping FitBit Fitness Tracker Data.zip, the folder Fitabase Data 4.12.16-5.12.16 contains 18 CSV files.
It contains personal tracker data from thirty eligible Fitbit users, including minute-level output for daily activity, heart rate, and sleep monitoring.

3.2 Accessibility and privacy of data

The dataset is open-source and dedicated to the public domain.
It is hosted on Kaggle and made available through MÖBIUS.
Thirty eligible Fitbit users consented to the submission of personal tracker data.

3.3 Installing packages and loading libraries

We will install and load the R packages used in this analysis.

Three of them are used for cleaning data: “here”, “skimr”, and “janitor”.

#Loading libraries
library(tidyverse)
library(lubridate)
library(scales)
library(ggplot2)
library(dplyr)
library(here)
library(skimr)
library(janitor)

3.4 Importing and previewing dataset

We will import the CSV files used in this analysis and then view, clean, format, and organize the data.

We will take a look at the summary of each column.

dailyActivity_merged.csv

#Importing Daily Activity dataset:
daily_activity <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
head(daily_activity)

colnames(daily_activity)

##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"

str(daily_activity)

## 'data.frame':    940 obs. of  15 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDate            : chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ TotalSteps              : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ TotalDistance           : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ TrackerDistance         : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ LoggedActivitiesDistance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ Calories                : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

dailyCalories_merged.csv

#Importing Daily Calories dataset:
daily_calories <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
head(daily_calories)

colnames(daily_calories)

## [1] "Id"          "ActivityDay" "Calories"

str(daily_calories)

## 'data.frame':    940 obs. of  3 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ Calories   : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

dailyIntensities_merged.csv

#Importing Daily Intensities dataset:
daily_intensities <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/dailyIntensities_merged.csv")
head(daily_intensities)

colnames(daily_intensities)

##  [1] "Id"                       "ActivityDay"             
##  [3] "SedentaryMinutes"         "LightlyActiveMinutes"    
##  [5] "FairlyActiveMinutes"      "VeryActiveMinutes"       
##  [7] "SedentaryActiveDistance"  "LightActiveDistance"     
##  [9] "ModeratelyActiveDistance" "VeryActiveDistance"

str(daily_intensities)

## 'data.frame':    940 obs. of  10 variables:
##  $ Id                      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay             : chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ SedentaryMinutes        : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ LightlyActiveMinutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ FairlyActiveMinutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ VeryActiveMinutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ SedentaryActiveDistance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ LightActiveDistance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ ModeratelyActiveDistance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ VeryActiveDistance      : num  1.88 1.57 2.44 2.14 2.71 ...

dailySteps_merged.csv

#Importing Daily Steps dataset:
daily_steps <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
head(daily_steps)

colnames(daily_steps)

## [1] "Id"          "ActivityDay" "StepTotal"

str(daily_steps)

## 'data.frame':    940 obs. of  3 variables:
##  $ Id         : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityDay: chr  "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
##  $ StepTotal  : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...

heartrate_seconds_merged.csv

#Importing Heart Rate Seconds dataset:
heart_rate_seconds <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/heartrate_seconds_merged.csv")
head(heart_rate_seconds)

colnames(heart_rate_seconds)

## [1] "Id"    "Time"  "Value"

str(heart_rate_seconds)

## 'data.frame':    2483658 obs. of  3 variables:
##  $ Id   : num  2.02e+09 2.02e+09 2.02e+09 2.02e+09 2.02e+09 ...
##  $ Time : chr  "4/12/2016 7:21:00 AM" "4/12/2016 7:21:05 AM" "4/12/2016 7:21:10 AM" "4/12/2016 7:21:20 AM" ...
##  $ Value: int  97 102 105 103 101 95 91 93 94 93 ...

sleepDay_merged.csv

#Importing Sleep Day dataset:
sleep_day <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
head(sleep_day)

colnames(sleep_day)

## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"

str(sleep_day)

## 'data.frame':    413 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr  "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...

weightLogInfo_merged.csv

#Importing Weight Log Info dataset:
weight_log <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
head(weight_log)

colnames(weight_log)

## [1] "Id"             "Date"           "WeightKg"       "WeightPounds"  
## [5] "Fat"            "BMI"            "IsManualReport" "LogId"

str(weight_log)

## 'data.frame':    67 obs. of  8 variables:
##  $ Id            : num  1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
##  $ Date          : chr  "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
##  $ WeightKg      : num  52.6 52.6 133.5 56.7 57.3 ...
##  $ WeightPounds  : num  116 116 294 125 126 ...
##  $ Fat           : int  22 NA NA NA NA 25 NA NA NA NA ...
##  $ BMI           : num  22.6 22.6 47.5 21.5 21.7 ...
##  $ IsManualReport: chr  "True" "True" "False" "True" ...
##  $ LogId         : num  1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...

hourlySteps_merged.csv

#Importing Hourly Steps dataset:
hourly_steps <- read.csv(file = "~/Downloads/Fitabase Data 4.12.16-5.12.16/hourlySteps_merged.csv")
head(hourly_steps)

colnames(hourly_steps)

## [1] "Id"           "ActivityHour" "StepTotal"

str(hourly_steps)

## 'data.frame':    22099 obs. of  3 variables:
##  $ Id          : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ ActivityHour: chr  "4/12/2016 12:00:00 AM" "4/12/2016 1:00:00 AM" "4/12/2016 2:00:00 AM" "4/12/2016 3:00:00 AM" ...
##  $ StepTotal   : int  373 160 151 0 0 0 0 0 250 1864 ...
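The imports above repeat the same folder path for every file. As a minimal sketch of one way to factor that out, the block below assumes a hypothetical helper `read_fitabase()` and a `data_dir` variable (neither is part of the original analysis). The demo writes a tiny CSV with made-up layout to a temporary folder so that it runs without the Kaggle download.

```r
# Hypothetical helper (not from the original analysis):
# read one Fitabase CSV from a given folder.
read_fitabase <- function(data_dir, file_name) {
  read.csv(file.path(data_dir, file_name))
}

# Temporary folder standing in for
# "~/Downloads/Fitabase Data 4.12.16-5.12.16".
data_dir <- tempfile("fitabase_demo_")
dir.create(data_dir)

# Write a two-row stand-in for dailySteps_merged.csv.
write.csv(
  data.frame(
    Id          = c(1503960366, 1503960366),
    ActivityDay = c("4/12/2016", "4/13/2016"),
    StepTotal   = c(13162L, 10735L)
  ),
  file.path(data_dir, "dailySteps_merged.csv"),
  row.names = FALSE
)

demo_steps <- read_fitabase(data_dir, "dailySteps_merged.csv")
str(demo_steps)
```

With the real data, `data_dir` would point at the unzipped folder and each import becomes a single call with only the file name changing.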

4. Process Phase 🔍

4.1 Checking for redundancy

After using functions like glimpse() and skim_without_charts() to get a quick view of the datasets, we found that both dailyCalories_merged.csv and dailyIntensities_merged.csv contain the same data as dailyActivity_merged.csv.

The dataset description states a sample size of thirty users, so we use n_distinct() to verify the number of unique users in each file.

n_distinct(daily_activity$Id)

## [1] 33

n_distinct(weight_log$Id)

## [1] 8

n_distinct(daily_steps$Id)

## [1] 33

n_distinct(heart_rate_seconds$Id)

## [1] 14

n_distinct(sleep_day$Id)

## [1] 24

n_distinct(hourly_steps$Id)

## [1] 33
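The repeated n_distinct() calls above can also be condensed into one step. The sketch below uses base R and small made-up data frames (the `_toy` names and their values are assumptions for illustration, not the real files) to count distinct Id values across several tables at once.

```r
# Toy stand-ins for the imported data frames (values are made up).
daily_activity_toy <- data.frame(Id = c(1, 1, 2, 3))
sleep_day_toy      <- data.frame(Id = c(1, 2, 2))
weight_log_toy     <- data.frame(Id = c(3))

tables <- list(
  daily_activity = daily_activity_toy,
  sleep_day      = sleep_day_toy,
  weight_log     = weight_log_toy
)

# Number of distinct Id values per table, as a named vector.
unique_users <- sapply(tables, function(df) length(unique(df$Id)))
print(unique_users)
```

A single named vector makes it easier to see at a glance which files cover the full sample and which cover only a subset of users.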

After verifying the unique users, we decided to focus on the files dailyActivity_merged.csv, sleepDay_merged.csv, and hourlySteps_merged.csv in this case study.

4.2 Checking and removing duplicates

We will remove rows with missing values by using drop_na() and count duplicate rows with sum(duplicated()). We then use distinct(.keep_all = TRUE) to remove duplicate rows based on certain columns.

daily_activity <- daily_activity %>%
  drop_na()
sleep_day <- sleep_day  %>%
  drop_na()
hourly_steps <- hourly_steps %>%
  drop_na()

sum(duplicated(daily_activity))

## [1] 0

sum(duplicated(sleep_day))

## [1] 3

sum(duplicated(hourly_steps))

## [1] 0

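Now we will remove the duplicate values in sleepDay_merged.csv. Note that distinct() is applied here to the four sleep columns without Id, so rows from different users that share the same date and values are also treated as duplicates; this is why 409 rows remain rather than 413 - 3 = 410.

sleep_day_2 <- sleep_day %>%
  distinct(SleepDay, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed, .keep_all = TRUE)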

sum(duplicated(sleep_day_2))

## [1] 0

str(sleep_day_2)

## 'data.frame':    409 obs. of  5 variables:
##  $ Id                : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ SleepDay          : chr  "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
##  $ TotalSleepRecords : int  1 2 1 2 1 1 1 1 1 1 ...
##  $ TotalMinutesAsleep: int  327 384 412 340 700 304 360 325 361 430 ...
##  $ TotalTimeInBed    : int  346 407 442 367 712 320 377 364 384 449 ...
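How many rows distinct() removes depends on which columns are passed to it. The sketch below uses a small made-up sleep log (`sleep_toy` and its values are assumptions for illustration) to compare full-row de-duplication with de-duplication on a subset of columns, with and without the user Id.

```r
library(dplyr)

# Toy sleep log (made-up values): rows 1 and 2 are exact duplicates,
# row 3 belongs to a different user but shares the date and minutes.
sleep_toy <- data.frame(
  Id                 = c(1, 1, 2, 2),
  SleepDay           = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  TotalMinutesAsleep = c(400, 400, 400, 380)
)

# Full-row de-duplication: only the exact duplicate is dropped.
full_row <- distinct(sleep_toy)

# Without Id among the columns: user 2's row for 4/12 is dropped as well.
without_id <- distinct(sleep_toy, SleepDay, TotalMinutesAsleep, .keep_all = TRUE)

# With Id included: same result as the full-row version for this data.
with_id <- distinct(sleep_toy, Id, SleepDay, TotalMinutesAsleep, .keep_all = TRUE)

c(full_row = nrow(full_row), without_id = nrow(without_id), with_id = nrow(with_id))
```

Including the key column keeps records from different users apart even when their measurements happen to coincide.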

4.3 Formatting date_time and renaming columns

The column names StepTotal in hourlySteps_merged.csv and TotalSteps in dailyActivity_merged.csv are not distinctive enough, so we will rename StepTotal to hourlyStep.

To align the date format of hourlySteps_merged.csv and sleepDay_merged.csv with dailyActivity_merged.csv, we will rename the date columns and use as.Date() and as.POSIXct() to format them.

daily_activity <- daily_activity %>%
  rename(Date = ActivityDate)
head(daily_activity)

sleep_day_2 <- sleep_day_2 %>%
  rename(Date = SleepDay)
head(sleep_day_2)

hourly_steps <- hourly_steps %>% 
  rename(hourlyStep = StepTotal) 
head(hourly_steps)

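# as.POSIXct() interprets the text in the local time zone. If the following
# as.Date() call converts in a different time zone (such as UTC), a midnight
# timestamp can shift to the previous calendar day.
sleep_day_2$Date <- as.POSIXct(sleep_day_2$Date, format = "%m/%d/%Y %I:%M:%S %p")
sleep_day_2$Date <- as.Date(sleep_day_2$Date, format = "%m/%d/%Y")

head(sleep_day_2)

daily_activity$Date <- as.POSIXct(daily_activity$Date, format = "%m/%d/%Y")
daily_activity$Date <- as.Date(daily_activity$Date, format = "%m/%d/%Y")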

head(daily_activity)

hourly_steps <- hourly_steps %>%
  rename(date_time = ActivityHour) %>%
  mutate(date_time = as.POSIXct(date_time, format = "%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone()))

head(hourly_steps)
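Converting through as.POSIXct() and then as.Date() can move a local-midnight timestamp to a neighbouring calendar day when the two steps use different time zones. As a hedged alternative sketch (not the method used in this analysis), the date strings can be parsed straight to Date. The variable names below are made up for illustration; the string formats are those shown in the str() output earlier.

```r
# Raw strings in the formats used by the CSV files.
sleep_raw    <- c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM")
activity_raw <- c("4/12/2016", "4/13/2016")

# as.Date() on character input reads only what the format needs and
# ignores trailing characters, so the time part is simply dropped.
sleep_date    <- as.Date(sleep_raw, format = "%m/%d/%Y")
activity_date <- as.Date(activity_raw, format = "%m/%d/%Y")

print(sleep_date)
print(identical(sleep_date, activity_date))
```

Because no time zone is involved in this path, the calendar day in the file is the calendar day in the result.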

4.4 Merging datasets

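We will merge dailyActivity_merged.csv and sleepDay_merged.csv on the key columns Id and Date to look for correlations. We use merge() with its default inner join, which keeps only the days that have both an activity record and a sleep record; adding all = TRUE (or using full_join()) would keep every row from both datasets instead.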

joined_df <- merge(daily_activity,sleep_day_2,by=c("Id","Date"))
glimpse(joined_df)

## Rows: 409
## Columns: 18
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ Date                     <date> 2016-04-11, 2016-04-12, 2016-04-14, 2016-04-…
## $ TotalSteps               <int> 13162, 10735, 9762, 12669, 9705, 15506, 10544…
## $ TotalDistance            <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.14, 2.71, 3.19, 3.53, 1.96, 1.3…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 1.26, 0.41, 0.78, 1.32, 0.48, 0.3…
## $ LightActiveDistance      <dbl> 6.06, 4.71, 2.83, 5.04, 2.51, 5.03, 4.24, 4.6…
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes        <int> 25, 21, 29, 36, 38, 50, 28, 19, 41, 39, 73, 3…
## $ FairlyActiveMinutes      <int> 13, 19, 34, 10, 20, 31, 12, 8, 21, 5, 14, 23,…
## $ LightlyActiveMinutes     <int> 328, 217, 209, 221, 164, 264, 205, 211, 262, …
## $ SedentaryMinutes         <int> 728, 776, 726, 773, 539, 775, 818, 838, 732, …
## $ Calories                 <int> 1985, 1797, 1745, 1863, 1728, 2035, 1786, 177…
## $ TotalSleepRecords        <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep       <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, …
## $ TotalTimeInBed           <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, …
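The difference between the default inner join and a full join is easiest to see on a small example. The sketch below uses base R merge() on two made-up tables (`activity_toy` and `sleep_toy` are assumptions for illustration, not the real data).

```r
# Toy activity and sleep tables (made-up values).
activity_toy <- data.frame(
  Id         = c(1, 1, 2),
  Date       = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(13162, 10735, 9000)
)
sleep_toy <- data.frame(
  Id                 = c(1, 2, 3),
  Date               = as.Date(c("2016-04-12", "2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(327, 400, 410)
)

# Default merge(): inner join, only Id/Date pairs present in both tables.
inner_df <- merge(activity_toy, sleep_toy, by = c("Id", "Date"))

# all = TRUE: full join, unmatched rows are kept and padded with NA.
full_df <- merge(activity_toy, sleep_toy, by = c("Id", "Date"), all = TRUE)

c(inner = nrow(inner_df), full = nrow(full_df))
```

The inner join suits questions that need both measurements on the same day, while the full join preserves days where only one of the two was recorded.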

5. Analyze Phase 💻

Now it’s time to analyze the trends in Fitbit users’ activity and identify any discoveries that would help Bellabeat’s marketing strategy.

5.1 Define user type

We can classify users by their activity intensity, which maps to daily steps as follows:

Sedentary: fewer than 5,000 steps per day
Lightly Active: 5,000 to 7,499 steps per day
Fairly Active: 7,500 to 9,999 steps per day
Very Active: 10,000 or more steps per day

The mapping between intensity and steps follows 10000steps.org.au.

5.2 Calculate user daily steps

Having defined the user types, we will calculate each user’s average daily steps, calories, and sleep.

daily_record <- joined_df %>%
  group_by(Id) %>%
  summarise(daily_steps = mean(TotalSteps), daily_calories = mean(Calories), daily_sleep = mean(TotalMinutesAsleep))

head(daily_record)

5.3 Classify user type by daily steps

Since we have the value of daily steps, we will use it to classify the user type.

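# daily_steps is an average, so it can fall between whole numbers
# (for example 7499.5). case_when() uses the first matching condition,
# so each upper bound is written as a strict "<" to leave no gaps.
user_type <- daily_record %>%
  mutate(user_type = case_when(
    daily_steps < 5000 ~ "Sedentary",
    daily_steps < 7500 ~ "Lightly Active",
    daily_steps < 10000 ~ "Fairly Active",
    daily_steps >= 10000 ~ "Very Active"
  ))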
head(user_type)
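The same banding can be sketched in base R with cut(), which makes the interval boundaries explicit. The step values and the `_toy` names below are made up for illustration; the break points are the ones defined in section 5.1.

```r
# Made-up average daily steps, including a non-integer mean.
avg_steps_toy <- c(4999, 5000, 7499.5, 7500, 10000)

user_type_toy <- cut(
  avg_steps_toy,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"),
  right  = FALSE  # intervals are [lower, upper), so 7500 is "Fairly Active"
)

data.frame(avg_steps_toy, user_type_toy)
```

With right = FALSE every value falls into exactly one band, including averages such as 7499.5 that lie between whole step counts.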

5.4 User type distribution

Then we would like to know the percentage of each type of user.

user_type_percent <- user_type %>%
  group_by(user_type) %>%
  summarise(total = n()) %>%
  mutate(total_users = sum(total)) %>%
  group_by(user_type) %>%
  summarise(total_percent = total / total_users) %>%
  mutate(Percent = percent(total_percent))
  
head(user_type_percent)
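As a cross-check on the percentages, the share of each user type can also be computed in base R with table() and prop.table(). The labels below are a made-up sample (an assumption for illustration), not the actual classification result.

```r
# Made-up user type labels for eight users.
user_type_labels <- c("Sedentary", "Very Active", "Fairly Active",
                      "Fairly Active", "Lightly Active", "Fairly Active",
                      "Very Active", "Sedentary")

counts <- table(user_type_labels)       # users per type
shares <- prop.table(counts)            # counts divided by their total

share_table <- data.frame(
  user_type = names(shares),
  share     = as.numeric(shares),
  label     = paste0(round(100 * as.numeric(shares), 1), "%")
)
print(share_table)
```

The shares always sum to 1, which is a quick sanity check before plotting them in a pie chart.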

6. Share Phase 📥

We will use data visualization to clearly display the trend of Fitbit users’ activity and other relevant discoveries.

6.1 Data visualization: User type distribution

user_type_percent %>%
  ggplot(aes(x="", y=total_percent, fill=user_type))+
  geom_bar(width = 1, stat = "identity")+
  coord_polar("y")+
  theme_minimal()+
    theme(axis.title.x= element_blank(),
          axis.title.y = element_blank(),
          panel.border = element_blank(), 
          panel.grid = element_blank(), 
          axis.ticks = element_blank(),
          axis.text.x = element_blank(),
          plot.title = element_text(hjust=0.5, size=12, face = "bold")) +
  geom_text(aes(label= Percent), size=3.5, position = position_stack(vjust = 0.5))+
  scale_fill_manual(values=c("Very Active"= "#ff80b3", "Fairly Active" = "#ff99c2", "Lightly Active" = "#ffb3d1", "Sedentary" = "#ffcce0"))+
  labs(title = "User type distribution")

6.2 Data visualization: User daily sleep and daily steps of the week

step_sleep_week <- joined_df %>%
  mutate(WeekDay = weekdays(Date))

step_sleep_week$WeekDay <- ordered(step_sleep_week$WeekDay, c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))

step_sleep_week <- step_sleep_week %>%
  group_by(WeekDay) %>%
  summarise(daily_step = mean(TotalSteps), daily_sleep = mean(TotalMinutesAsleep))

head(step_sleep_week)

 ggplot(step_sleep_week) +
      geom_col(aes(WeekDay, daily_step), fill = "#ff8566") +
      geom_hline(yintercept = 7500) +
      labs(title = "Daily steps of the week", x= "", y = "") +
      theme(axis.text.x = element_text(angle = 45,vjust = 0.5, hjust = 1))

ggplot(step_sleep_week, aes(WeekDay, daily_sleep)) +
      geom_col(fill = "#99CCFF") +
      geom_hline(yintercept = 480) +
      labs(title = "Minutes asleep of the week", x= "", y = "") +
      theme(axis.text.x = element_text(angle = 45,vjust = 0.5, hjust = 1))
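The weekday ordering used for these charts relies on weekdays(), which returns day names in the session’s locale, so the English labels passed to ordered() may not match on a non-English system. A minimal sketch of a locale-independent alternative, assuming made-up example dates, derives the order from the ISO weekday number instead:

```r
# Example dates (chosen for illustration).
dates <- as.Date(c("2016-04-12", "2016-04-16", "2016-04-17", "2016-04-18"))

# "%u" gives the ISO weekday number as text: "1" = Monday ... "7" = Sunday.
iso_day <- as.integer(format(dates, "%u"))

day_labels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Ordered factor whose labels do not depend on the system language.
week_day <- factor(day_labels[iso_day], levels = day_labels, ordered = TRUE)

data.frame(dates, iso_day, week_day)
```

The resulting factor can be used as the x aesthetic in the same way as the WeekDay column.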

From the graphs above, we can identify the following:

Fitbit users average at least 7,500 daily steps on every day of the week except Saturday.
Users sleep less than 8 hours on average.

6.3 Hourly steps of the day

We will look at how user activity is distributed over the course of a day by using hourlySteps_merged.csv.

# Split date_time into Date and Time text columns with format(), which always
# writes the time part, including 00:00:00 for midnight.
hourly_steps_day <- hourly_steps %>%
  mutate(Date = format(date_time, "%Y-%m-%d"),
         Time = format(date_time, "%H:%M:%S")) %>%
  select(-date_time)

head(hourly_steps_day)

hourly_steps_day %>%
  group_by(Time) %>%
  summarise(average_step = mean(hourlyStep)) %>%
  ggplot()+
  geom_col(mapping = aes(x = Time, y= average_step, fill= average_step))+
  labs(title="Average steps of the day", x="", y="")+
  scale_fill_gradient(low = "green", high = "orange")+
  theme(axis.text.x = element_text(angle = 90))

From this graph, we can draw two takeaways:

Users are active between 7:00 am and 7:00 pm.
The two most active periods of the day are from 11:00 am to 2:00 pm and from 5:00 pm to 7:00 pm.

6.4 The frequency of smart device usage

In this part, we would like to know how often users use their smart devices. We will classify users into three groups (use mostly, use moderately, use occasionally) based on the number of days of use.

use mostly: users who use Fitbit between 21 and 31 days.
use moderately: users who use Fitbit between 10 and 20 days.
use occasionally: users who use Fitbit between 1 and 9 days.

use_of_smart_device <- joined_df %>%
  group_by(Id) %>%
  summarise(daily_used = n()) %>%
  mutate(frequency = case_when(
    daily_used >= 21 & daily_used <= 31 ~ "use mostly",
    daily_used >= 10 & daily_used <= 20 ~ "use moderately",
    daily_used >= 1 & daily_used < 10 ~ "use occasionally"
  ))
  
head(use_of_smart_device)

We will create a data frame to show the percentage of each frequency group.

percent_of_smart_device <- use_of_smart_device %>%
  group_by(frequency) %>%
  summarise(number= n()) %>%
  mutate(total_number = sum(number)) %>%
  group_by(frequency) %>%
  summarise(use_percent = number/total_number) %>%
  mutate(use_percent_total= percent(use_percent)) 

head(percent_of_smart_device)

We will display the percentage in a pie chart shown as follows.

percent_of_smart_device %>%
  ggplot(aes(x="", y=use_percent, fill= frequency))+
  geom_bar(width = 1, stat="identity")+
    coord_polar("y")+
  theme_minimal()+
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust=0.5, size = 12, face = "bold"))+
  labs(title = "Percentage of smart device usage")+
  geom_text(aes(label=use_percent_total), size=3.5, position=position_stack(vjust=0.5))+
  scale_fill_manual(values=c("use mostly" = "#ffcc80", "use moderately" = "#ffd699", "use occasionally" = "#ffebcc"))

Looking at the graph, we can see that:

50% of users use smart devices mostly, between 21 and 31 days.
38% of users use smart devices moderately, between 10 and 20 days.
12% of users use smart devices occasionally, between 1 and 9 days.

joined_use_merged <- merge(daily_activity, use_of_smart_device, by=c("Id"))
head(joined_use_merged)

minutes_worn <- joined_use_merged %>%
  mutate(total_minute_worn = VeryActiveMinutes+FairlyActiveMinutes+LightlyActiveMinutes+SedentaryMinutes) %>% 
  mutate(percent_worn = (total_minute_worn/1440)*100) %>% 
  mutate(worn = case_when(
      percent_worn == 100 ~ "Whole day",
      percent_worn < 100 & percent_worn >= 50 ~ "More than half day",
      percent_worn < 50 & percent_worn >0 ~ "Less than half day"))

head(minutes_worn)
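To make the wear-time calculation above concrete, the sketch below works through one record by hand. The four minute values are taken from the first row of dailyActivity_merged.csv shown in the str() output earlier; the scalar variable names are made up for illustration.

```r
# Minutes from the first row of dailyActivity_merged.csv.
very_active    <- 25
fairly_active  <- 13
lightly_active <- 328
sedentary      <- 728

# Total tracked minutes and their share of a 1,440-minute day.
total_minute_worn <- very_active + fairly_active + lightly_active + sedentary
percent_worn <- (total_minute_worn / 1440) * 100

# Same thresholds as the case_when() above, for a single value.
worn <- if (percent_worn == 100) {
  "Whole day"
} else if (percent_worn >= 50) {
  "More than half day"
} else if (percent_worn > 0) {
  "Less than half day"
} else {
  NA_character_
}

c(total = total_minute_worn, percent = round(percent_worn, 1))
print(worn)
```

This record sums to 1,094 tracked minutes, about 76% of the day, so it falls in the "More than half day" group.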

Whole day - users wore the device all day.
More than half day - users wore the device for more than half of the day.
Less than half day - users wore the device for less than half of the day.

minutes_worn_percent <- minutes_worn %>%
  group_by(worn) %>%
  summarise(total=n()) %>%
  mutate(totals=sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent=total/totals) %>%
  mutate(percent= percent(total_percent))

head(minutes_worn_percent)

minutes_worn_usemostly <- minutes_worn %>%
  filter(frequency == "use mostly") %>%
  group_by(worn) %>%
  summarise(total=n()) %>%
  mutate(totals=sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent=total/totals) %>%
  mutate(percent_usemostly= percent(total_percent))

head(minutes_worn_usemostly)

minutes_worn_usemoderately <- minutes_worn %>%
  filter(frequency == "use moderately") %>%
  group_by(worn) %>%
  summarise(total=n()) %>%
  mutate(totals=sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent=total/totals) %>%
  mutate(percent_usemoderately= percent(total_percent))

head(minutes_worn_usemoderately)

minutes_worn_useoccasionally <- minutes_worn %>%
  filter(frequency == "use occasionally") %>%
  group_by(worn) %>%
  summarise(total=n()) %>%
  mutate(totals=sum(total)) %>%
  group_by(worn) %>%
  summarise(total_percent=total/totals) %>%
  mutate(percent_useoccasionally= percent(total_percent))

head(minutes_worn_useoccasionally)
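The four summaries above share the same pipeline and differ only in the filter. As a hedged sketch of how the repetition could be reduced, the block below defines a hypothetical helper `worn_share()` (not part of the original analysis) and runs it on a small made-up table with the same `frequency` and `worn` columns.

```r
library(dplyr)

# Hypothetical helper: share of records per `worn` category,
# optionally restricted to one usage-frequency group.
worn_share <- function(data, freq_group = NULL) {
  if (!is.null(freq_group)) {
    data <- filter(data, frequency == freq_group)
  }
  data %>%
    count(worn, name = "total") %>%
    mutate(total_percent = total / sum(total))
}

# Toy stand-in for minutes_worn (made-up values).
minutes_worn_toy <- data.frame(
  frequency = c("use mostly", "use mostly", "use mostly", "use moderately"),
  worn      = c("Whole day", "More than half day", "More than half day", "Whole day")
)

overall_toy <- worn_share(minutes_worn_toy)
mostly_toy  <- worn_share(minutes_worn_toy, "use mostly")

print(overall_toy)
print(mostly_toy)
```

A formatted label column could then be added with percent(), as in the pipelines above.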

We will visualize these four data frames in the following graphs.

minutes_worn_percent %>%
  ggplot(aes(x="", y=total_percent, fill=worn))+
  geom_bar(width=1, stat="identity")+
  coord_polar("y")+
  theme_minimal()+
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust=0.5, size=12, face="bold"))+
  labs(title="Utility time of smart devices")+
  geom_text(aes(label=percent), size=3.5, color = "#ffffff", position = position_stack(vjust=0.5))+
  scale_fill_manual(values = c("Whole day" = "#0033cc", "More than half day" = "#3366ff", "Less than half day" = "#809fff"))

minutes_worn_usemostly %>%
  ggplot(aes(x="", y=total_percent, fill=worn))+
  geom_bar(width=1, stat="identity")+
  coord_polar("y")+
  theme_minimal()+
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust=0.5,size=12, face="bold"))+
  labs(title="Use mostly")+
  geom_text(aes(label=percent_usemostly), size=3.5, color = "#ffffff", position = position_stack(vjust=0.8))+
  scale_fill_manual(values = c("Whole day" = "#0033cc", "More than half day" = "#3366ff", "Less than half day" = "#809fff"))

minutes_worn_usemoderately %>%
  ggplot(aes(x="", y=total_percent, fill=worn))+
  geom_bar(width=1, stat="identity")+
  coord_polar("y")+
  theme_minimal()+
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust=0.5, size=12, face="bold"))+
  labs(title="Use moderately")+
  geom_text(aes(label=percent_usemoderately), size=3.5, color = "#ffffff", position = position_stack(vjust=0.5))+
  scale_fill_manual(values = c("Whole day" = "#0033cc", "More than half day" = "#3366ff", "Less than half day" = "#809fff"))

minutes_worn_useoccasionally %>%
  ggplot(aes(x="", y=total_percent, fill=worn))+
  geom_bar(width=1, stat="identity")+
  coord_polar("y")+
  theme_minimal()+
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        panel.border = element_blank(),
        panel.grid = element_blank(),
        axis.ticks = element_blank(),
        axis.text.x = element_blank(),
        plot.title = element_text(hjust=0.5, size=12, face="bold"))+
  labs(title="Use occasionally")+
  geom_text(aes(label=percent_useoccasionally), size=3.5, color = "#ffffff", position = position_stack(vjust=0.5))+
  scale_fill_manual(values = c("Whole day" = "#0033cc", "More than half day" = "#3366ff", "Less than half day" = "#809fff"))

As shown in the first pie chart, 36% of all users wear smart devices all day long, 60% wear them for more than half of the day, and only 4% wear them for less than half of the day.

Broken down by usage frequency:

Among users who use the device most often, 88% wear it for more than half of the day, while those who wear it the whole day and less than half of the day make up only 6.8% and 4.3%.
Among moderate users, the majority (69%) wear it for more than half of the day.
Among occasional users, most wear it the whole day, a larger share than for the other types of users.

7. Act Phase ⛳️

Bellabeat is a high-tech company with the goal of informing and inspiring women with knowledge about their own health and habits. To support that goal, I advise Bellabeat to collect its own users’ data, including demographic information, for further investigation. This would enable a focused marketing strategy based on customer segmentation in addition to the broader target audience.

As described in the graphs above, we can draw the following conclusions:

Fitbit users average at least 7,500 daily steps on every day of the week except Saturday.
Users sleep less than 8 hours on average.
Half of the sample used smart devices between 21 and 31 days.
Occasional users tend to wear smart devices all day.
Both frequent users and moderate users wear smart devices for more than half of the day.

Based on these findings, we propose the following recommendations.

Recommendation	Interpretation
1. Activity notification and recommended exercise	We classified user activity into four types based on daily steps. Leveraging this, the Bellabeat app can send a reminder if a user walks fewer than 8,000 steps in a day. On top of that, the app can recommend workouts to help the user reach the daily step goal.
2. Bedtime notification and other sleep resources	Since most users sleep less than 8 hours, we suggest that the Bellabeat app send a bedtime notification with an alarm, along with other resources that help with sleep.
3. Reward mechanism	To encourage users to adopt a healthy lifestyle, we propose a game with redeemable rewards for users who complete a certain amount of workouts within a limited period. The app would reward eligible users with virtual medals, and a certain number of medals could be converted into gift cards for other Bellabeat products.
4. Water-resistant mode	So that users can record more activity data, a Bellabeat product with a water-resistant feature would meet users’ needs.

Google Data Analytics Capstone: Bellabeat

Meihui(Florence) Zhang

03-10-2022