Case Study: How Can a Wellness Technology Company Play It Smart?

Introduction

Bellabeat is a successful small company, a high-tech manufacturer of health-focused products for women, with the potential to become a larger player in the global smart device market. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits.

The main goals of this analysis are to analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices, and to select one Bellabeat product to apply these insights to in my presentation.

I am a junior data analyst on the marketing analytics team at Bellabeat. To answer the key business questions and uncover insights, patterns and trends that will help guide the company’s marketing strategy, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

Stakeholders

⦁ Urška Sršen - Bellabeat’s cofounder and Chief Creative Officer

⦁ Sando Mur - Bellabeat’s cofounder; key member of the Bellabeat executive team

⦁ Bellabeat marketing analytics team - A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Product

⦁ Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits.

⦁ Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

⦁ Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress.

⦁ Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day.

The data the Bellabeat app collects can help users better understand their current habits and make healthy decisions. The app connects to Bellabeat’s line of smart wellness products.

Ask

Business Task

⦁ What are some trends in smart device usage?

⦁ How could these trends apply to Bellabeat customers?

⦁ How could these trends help influence Bellabeat marketing strategy?

Prepare

Data Source Description

The dataset used in this study, FitBit Fitness Tracker Data, was made available through Mobius on Kaggle: https://www.kaggle.com/datasets/arashnic/fitbit. It comprises 18 CSV files and is described as combining personal fitness tracker data from thirty (30) Fitbit users who consented to submit their personal data, including heart rate, sleep details, intensities, physical activity and other data needed to assess their habits. The activity files loaded below actually contain 33 distinct user ids.

Data Integrity (ROCCC)

Reliable: Not reliable. The sample covers only about 30 consenting Fitbit users, which is small enough to bias the analysis.

Original: Not original. The data comes from a third-party provider, collected through a survey distributed via Amazon Mechanical Turk, and its accuracy cannot be verified.

Comprehensive: The data falls within the scope required for the Bellabeat business task.

Current: Not current. The data was collected in 2016 (7 years old), so it is out of date and might be irrelevant to Bellabeat.

Cited: Available through Mobius via Kaggle.

Because of these quality issues, the dataset is not recommended as a basis for business recommendations. It also lacks key participant characteristics such as age and lifestyle.

Dataset

I will use the following files:

  • dailyActivity_merged.csv

  • hourlyCalories_merged.csv

  • sleepDay_merged.csv

  • hourlySteps_merged.csv

Process

R Programming

I will perform data cleaning operations in R to ensure the dataset is correct, complete and error-free:

⦁ Explore and observe data

⦁ Check for missing or null values

⦁ Transform data — format data type

⦁ Conduct statistical analysis

I imported and loaded the data into RStudio.

Setting up my R environment and loading packages

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'tidyr' was built under R version 4.3.2
## Warning: package 'dplyr' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(janitor)
## Warning: package 'janitor' was built under R version 4.3.2
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(lubridate)
library(readr)
library(skimr)
## Warning: package 'skimr' was built under R version 4.3.2
library(dplyr)
library(tidyr)
library(plotrix)
## Warning: package 'plotrix' was built under R version 4.3.2

Assigning and importing the datasets

daily_activity <- read.csv("dailyActivity_merged.csv")
hourly_calories <- read.csv("hourlyCalories_merged.csv")
sleep_day <- read.csv("sleepDay_merged.csv")
hourly_steps <- read.csv("hourlySteps_merged.csv")

Data Exploration

Preview the imported datasets using the head function

head(daily_activity, n = 15) #overview of first 15 records
##            Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1  1503960366    4/12/2016      13162          8.50            8.50
## 2  1503960366    4/13/2016      10735          6.97            6.97
## 3  1503960366    4/14/2016      10460          6.74            6.74
## 4  1503960366    4/15/2016       9762          6.28            6.28
## 5  1503960366    4/16/2016      12669          8.16            8.16
## 6  1503960366    4/17/2016       9705          6.48            6.48
## 7  1503960366    4/18/2016      13019          8.59            8.59
## 8  1503960366    4/19/2016      15506          9.88            9.88
## 9  1503960366    4/20/2016      10544          6.68            6.68
## 10 1503960366    4/21/2016       9819          6.34            6.34
## 11 1503960366    4/22/2016      12764          8.13            8.13
## 12 1503960366    4/23/2016      14371          9.04            9.04
## 13 1503960366    4/24/2016      10039          6.41            6.41
## 14 1503960366    4/25/2016      15355          9.80            9.80
## 15 1503960366    4/26/2016      13755          8.79            8.79
##    LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                         0               1.88                     0.55
## 2                         0               1.57                     0.69
## 3                         0               2.44                     0.40
## 4                         0               2.14                     1.26
## 5                         0               2.71                     0.41
## 6                         0               3.19                     0.78
## 7                         0               3.25                     0.64
## 8                         0               3.53                     1.32
## 9                         0               1.96                     0.48
## 10                        0               1.34                     0.35
## 11                        0               4.76                     1.12
## 12                        0               2.81                     0.87
## 13                        0               2.92                     0.21
## 14                        0               5.29                     0.57
## 15                        0               2.33                     0.92
##    LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                 6.06                       0                25
## 2                 4.71                       0                21
## 3                 3.91                       0                30
## 4                 2.83                       0                29
## 5                 5.04                       0                36
## 6                 2.51                       0                38
## 7                 4.71                       0                42
## 8                 5.03                       0                50
## 9                 4.24                       0                28
## 10                4.65                       0                19
## 11                2.24                       0                66
## 12                5.36                       0                41
## 13                3.28                       0                39
## 14                3.94                       0                73
## 15                5.54                       0                31
##    FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                   13                  328              728     1985
## 2                   19                  217              776     1797
## 3                   11                  181             1218     1776
## 4                   34                  209              726     1745
## 5                   10                  221              773     1863
## 6                   20                  164              539     1728
## 7                   16                  233             1149     1921
## 8                   31                  264              775     2035
## 9                   12                  205              818     1786
## 10                   8                  211              838     1775
## 11                  27                  130             1217     1827
## 12                  21                  262              732     1949
## 13                   5                  238              709     1788
## 14                  14                  216              814     2013
## 15                  23                  279              833     1970
head(hourly_calories)
##           Id          ActivityHour Calories
## 1 1503960366 4/12/2016 12:00:00 AM       81
## 2 1503960366  4/12/2016 1:00:00 AM       61
## 3 1503960366  4/12/2016 2:00:00 AM       59
## 4 1503960366  4/12/2016 3:00:00 AM       47
## 5 1503960366  4/12/2016 4:00:00 AM       48
## 6 1503960366  4/12/2016 5:00:00 AM       48
head(hourly_steps)
##           Id          ActivityHour StepTotal
## 1 1503960366 4/12/2016 12:00:00 AM       373
## 2 1503960366  4/12/2016 1:00:00 AM       160
## 3 1503960366  4/12/2016 2:00:00 AM       151
## 4 1503960366  4/12/2016 3:00:00 AM         0
## 5 1503960366  4/12/2016 4:00:00 AM         0
## 6 1503960366  4/12/2016 5:00:00 AM         0
head(sleep_day)
##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320

Let’s find out if there are null or missing values in each dataset using skim_without_charts

## broad overview of each dataset

skim_without_charts(daily_activity)
Data summary
Name daily_activity
Number of rows 940
Number of columns 15
_______________________
Column type frequency:
character 1
numeric 14
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ActivityDate 0 1 8 9 0 31 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
Id 0 1 4.855407e+09 2.424805e+09 1503960366 2.320127e+09 4.445115e+09 6.962181e+09 8.877689e+09
TotalSteps 0 1 7.637910e+03 5.087150e+03 0 3.789750e+03 7.405500e+03 1.072700e+04 3.601900e+04
TotalDistance 0 1 5.490000e+00 3.920000e+00 0 2.620000e+00 5.240000e+00 7.710000e+00 2.803000e+01
TrackerDistance 0 1 5.480000e+00 3.910000e+00 0 2.620000e+00 5.240000e+00 7.710000e+00 2.803000e+01
LoggedActivitiesDistance 0 1 1.100000e-01 6.200000e-01 0 0.000000e+00 0.000000e+00 0.000000e+00 4.940000e+00
VeryActiveDistance 0 1 1.500000e+00 2.660000e+00 0 0.000000e+00 2.100000e-01 2.050000e+00 2.192000e+01
ModeratelyActiveDistance 0 1 5.700000e-01 8.800000e-01 0 0.000000e+00 2.400000e-01 8.000000e-01 6.480000e+00
LightActiveDistance 0 1 3.340000e+00 2.040000e+00 0 1.950000e+00 3.360000e+00 4.780000e+00 1.071000e+01
SedentaryActiveDistance 0 1 0.000000e+00 1.000000e-02 0 0.000000e+00 0.000000e+00 0.000000e+00 1.100000e-01
VeryActiveMinutes 0 1 2.116000e+01 3.284000e+01 0 0.000000e+00 4.000000e+00 3.200000e+01 2.100000e+02
FairlyActiveMinutes 0 1 1.356000e+01 1.999000e+01 0 0.000000e+00 6.000000e+00 1.900000e+01 1.430000e+02
LightlyActiveMinutes 0 1 1.928100e+02 1.091700e+02 0 1.270000e+02 1.990000e+02 2.640000e+02 5.180000e+02
SedentaryMinutes 0 1 9.912100e+02 3.012700e+02 0 7.297500e+02 1.057500e+03 1.229500e+03 1.440000e+03
Calories 0 1 2.303610e+03 7.181700e+02 0 1.828500e+03 2.134000e+03 2.793250e+03 4.900000e+03
skim_without_charts(hourly_calories)
Data summary
Name hourly_calories
Number of rows 22099
Number of columns 3
_______________________
Column type frequency:
character 1
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ActivityHour 0 1 19 21 0 736 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
Id 0 1 4.848235e+09 2.4225e+09 1503960366 2320127002 4445114986 6962181067 8877689391
Calories 0 1 9.739000e+01 6.0700e+01 42 63 83 108 948
skim_without_charts(hourly_steps)
Data summary
Name hourly_steps
Number of rows 22099
Number of columns 3
_______________________
Column type frequency:
character 1
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ActivityHour 0 1 19 21 0 736 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
Id 0 1 4.848235e+09 2.4225e+09 1503960366 2320127002 4445114986 6962181067 8877689391
StepTotal 0 1 3.201700e+02 6.9038e+02 0 0 40 357 10554
skim_without_charts(sleep_day)
Data summary
Name sleep_day
Number of rows 413
Number of columns 5
_______________________
Column type frequency:
character 1
numeric 4
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
SleepDay 0 1 20 21 0 31 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
Id 0 1 5.000979e+09 2.06036e+09 1503960366 3977333714 4702921684 6962181067 8792009665
TotalSleepRecords 0 1 1.120000e+00 3.50000e-01 1 1 1 1 3
TotalMinutesAsleep 0 1 4.194700e+02 1.18340e+02 58 361 433 490 796
TotalTimeInBed 0 1 4.586400e+02 1.27100e+02 61 403 463 526 961
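
As a cross-check on the skim output, missing values can also be counted per column with base R. This is a minimal sketch on a made-up data frame (the `toy` values are hypothetical and are not taken from the Fitbit files):

```r
# Toy data frame with one missing value (hypothetical values, for illustration only)
toy <- data.frame(
  id = c(1, 1, 2),
  total_steps = c(1200, NA, 3400),
  calories = c(1800, 1750, 2100)
)

# Count missing values in every column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# TRUE only when the whole data frame is free of missing values
no_missing <- !anyNA(toy)
print(no_missing)
```

Applied to the real tables, a result of all zeros would agree with the n_missing column reported by skim_without_charts.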

Making column names consistent and lower case

daily_activity <- daily_activity %>% clean_names()
sleep_day <- sleep_day %>% clean_names()
hourly_calories <- hourly_calories %>% clean_names()
hourly_steps <- hourly_steps %>% clean_names()

Checking the number of distinct users in each dataset

n_distinct(daily_activity$id) # 33 unique users
## [1] 33
n_distinct(sleep_day$id) # 24 unique users
## [1] 24
n_distinct(hourly_calories$id) # 33 unique users
## [1] 33
n_distinct(hourly_steps$id) # 33 unique users
## [1] 33
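
Only 24 of the 33 users appear in the sleep table, so it can help to list who is missing before merging. The sketch below shows one possible way to do that with base R set operations. The id vectors are hypothetical stand-ins for `daily_activity$id` and `sleep_day$id`:

```r
# Hypothetical id vectors standing in for daily_activity$id and sleep_day$id
activity_ids <- c(101, 101, 102, 103, 104)
sleep_ids <- c(101, 103, 103)

# Users present in the activity data but absent from the sleep data
no_sleep_data <- setdiff(unique(activity_ids), unique(sleep_ids))
print(no_sleep_data)

# Share of activity users who also logged sleep
sleep_coverage <- length(intersect(activity_ids, sleep_ids)) / length(unique(activity_ids))
print(sleep_coverage)
```

Any analysis built on an inner join of the two tables describes only the users who logged sleep, which is worth keeping in mind when reading the merged results.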

Checking for duplicate records in each dataset

sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(sleep_day)) #3 duplicate
## [1] 3
sum(duplicated(hourly_calories))
## [1] 0
sum(duplicated(hourly_steps))
## [1] 0

Viewing duplicate rows in the ‘sleep_day’ dataset

sleep_day[duplicated(sleep_day), ] # 3 duplicated rows
##             id             sleep_day total_sleep_records total_minutes_asleep
## 162 4388161847  5/5/2016 12:00:00 AM                   1                  471
## 224 4702921684  5/7/2016 12:00:00 AM                   1                  520
## 381 8378563200 4/25/2016 12:00:00 AM                   1                  388
##     total_time_in_bed
## 162               495
## 224               543
## 381               402

Removing duplicated rows in the ‘sleep_day’ dataset

sleep_day <- sleep_day %>%  #dropping duplicates
  distinct() %>% drop_na()

sum(duplicated(sleep_day)) # zero(0) duplicate
## [1] 0

Data Transformation

Renaming the ‘activity_date’, ‘activity_hour’ and ‘sleep_day’ columns to ‘date’ for readability and converting them to a date type

# rename 'activity_date' to 'date' and convert it to the Date type
daily_activity <- daily_activity %>% 
  rename(date = activity_date) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))
# rename 'sleep_day' to 'date' and convert it to the Date type
sleep_day <- sleep_day %>% 
  rename(date = sleep_day) %>% 
  mutate(date = as.Date(date, format = "%m/%d/%Y"))
# rename 'activity_hour' to 'date' and parse it as a date-time
hourly_calories <- hourly_calories %>% 
  rename(date = activity_hour) %>% 
  mutate(date = mdy_hms(date))

hourly_calories$hr <- format(as.POSIXct(hourly_calories$date), "%H") # create hour of the day column 
# rename 'activity_hour' to 'date' and parse it as a date-time; as.Date() would drop the hour and leave 24 identical keys per user and day
hourly_steps <- hourly_steps %>% 
  rename(date = activity_hour) %>% 
  mutate(date = mdy_hms(date))
# adding 'weekday' column to represent Day of the Week
daily_activity <- daily_activity %>%
  mutate(weekday = lubridate::wday(date, label = TRUE, abbr = FALSE))

hourly_calories <- hourly_calories %>%
  mutate(weekday = lubridate::wday(date, label = TRUE, abbr = FALSE))
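
The hourly timestamps carry both a date and a time of day, and date-time parsing and date-only parsing treat them differently. Here is a base-R sketch of the difference, assuming an English or C locale for the AM/PM marker and UTC as the time zone (both are assumptions of this example, not settings taken from the analysis):

```r
# One timestamp in the same text format as the ActivityHour column
raw <- "4/12/2016 1:00:00 PM"

# Parse as a date-time; tz = "UTC" is an assumption made to keep the example reproducible
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Hour of day as a two-character string, like the 'hr' column above
hr <- format(parsed, "%H")
print(hr)

# as.Date() with a date-only format silently discards the time of day
date_only <- as.Date(raw, format = "%m/%d/%Y")
print(date_only)
```

For daily tables the date-only form is enough. For hourly tables the hour has to survive parsing if it is later used as part of a join key.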

Data Merging

Combining the datasets to find trends

# inner join 'daily_activity' and 'sleep_day' on id and date column
merged_daily <-  merge(daily_activity, sleep_day, by= c("id", "date"))
merged_daily %>% head(n = 10)
##            id       date total_steps total_distance tracker_distance
## 1  1503960366 2016-04-12       13162           8.50             8.50
## 2  1503960366 2016-04-13       10735           6.97             6.97
## 3  1503960366 2016-04-15        9762           6.28             6.28
## 4  1503960366 2016-04-16       12669           8.16             8.16
## 5  1503960366 2016-04-17        9705           6.48             6.48
## 6  1503960366 2016-04-19       15506           9.88             9.88
## 7  1503960366 2016-04-20       10544           6.68             6.68
## 8  1503960366 2016-04-21        9819           6.34             6.34
## 9  1503960366 2016-04-23       14371           9.04             9.04
## 10 1503960366 2016-04-24       10039           6.41             6.41
##    logged_activities_distance very_active_distance moderately_active_distance
## 1                           0                 1.88                       0.55
## 2                           0                 1.57                       0.69
## 3                           0                 2.14                       1.26
## 4                           0                 2.71                       0.41
## 5                           0                 3.19                       0.78
## 6                           0                 3.53                       1.32
## 7                           0                 1.96                       0.48
## 8                           0                 1.34                       0.35
## 9                           0                 2.81                       0.87
## 10                          0                 2.92                       0.21
##    light_active_distance sedentary_active_distance very_active_minutes
## 1                   6.06                         0                  25
## 2                   4.71                         0                  21
## 3                   2.83                         0                  29
## 4                   5.04                         0                  36
## 5                   2.51                         0                  38
## 6                   5.03                         0                  50
## 7                   4.24                         0                  28
## 8                   4.65                         0                  19
## 9                   5.36                         0                  41
## 10                  3.28                         0                  39
##    fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1                     13                    328               728     1985
## 2                     19                    217               776     1797
## 3                     34                    209               726     1745
## 4                     10                    221               773     1863
## 5                     20                    164               539     1728
## 6                     31                    264               775     2035
## 7                     12                    205               818     1786
## 8                      8                    211               838     1775
## 9                     21                    262               732     1949
## 10                     5                    238               709     1788
##      weekday total_sleep_records total_minutes_asleep total_time_in_bed
## 1    Tuesday                   1                  327               346
## 2  Wednesday                   2                  384               407
## 3     Friday                   1                  412               442
## 4   Saturday                   2                  340               367
## 5     Sunday                   1                  700               712
## 6    Tuesday                   1                  304               320
## 7  Wednesday                   1                  360               377
## 8   Thursday                   1                  325               364
## 9   Saturday                   1                  361               384
## 10    Sunday                   1                  430               449
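# inner join 'hourly_steps' and 'hourly_calories' on id and date column
# note: both 'date' keys must keep the hour; with date-only keys each step record pairs with every calorie record of the same day, which is why step_total repeats the midnight value (373) in the preview below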
merged_hour <- merge(hourly_steps, hourly_calories, by = c("id", "date")) 
merged_hour %>% head(n = 10)
##            id       date step_total calories hr weekday
## 1  1503960366 2016-04-12        373       81 00 Tuesday
## 2  1503960366 2016-04-12        373       61 01 Tuesday
## 3  1503960366 2016-04-12        373       59 02 Tuesday
## 4  1503960366 2016-04-12        373       47 03 Tuesday
## 5  1503960366 2016-04-12        373       48 04 Tuesday
## 6  1503960366 2016-04-12        373       48 05 Tuesday
## 7  1503960366 2016-04-12        373       48 06 Tuesday
## 8  1503960366 2016-04-12        373       47 07 Tuesday
## 9  1503960366 2016-04-12        373       68 08 Tuesday
## 10 1503960366 2016-04-12        373      141 09 Tuesday
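
Join keys determine how many rows a merge returns. The sketch below uses made-up values for a single user and day (the `steps` and `cals` tables are hypothetical) to contrast keys that keep the hour with keys truncated to the calendar date:

```r
# Two hourly tables for one user and one day, three hours each (made-up values)
hours <- as.POSIXct(c("2016-04-12 00:00:00", "2016-04-12 01:00:00",
                      "2016-04-12 02:00:00"), tz = "UTC")
steps <- data.frame(id = 1, date = hours, step_total = c(10, 20, 30))
cals  <- data.frame(id = 1, date = hours, calories = c(80, 60, 59))

# Keys that keep the hour: each step row meets exactly one calorie row
by_hour <- merge(steps, cals, by = c("id", "date"))
print(nrow(by_hour))

# Keys truncated to the calendar date: every step row meets every calorie row
steps_day <- transform(steps, date = as.Date(date))
cals_day  <- transform(cals, date = as.Date(date))
by_day <- merge(steps_day, cals_day, by = c("id", "date"))
print(nrow(by_day))
```

With three hours per table the date-only join returns nine rows instead of three. With 24 hours per day the same effect would produce 576 rows per user and day, so comparing `nrow()` before and after a merge is a cheap sanity check.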

Analyze

Overview of the general statistics of merged_daily

merged_daily %>% summary()
##        id                 date             total_steps    total_distance  
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :   17   Min.   : 0.010  
##  1st Qu.:3.977e+09   1st Qu.:2016-04-19   1st Qu.: 5189   1st Qu.: 3.592  
##  Median :4.703e+09   Median :2016-04-27   Median : 8913   Median : 6.270  
##  Mean   :4.995e+09   Mean   :2016-04-26   Mean   : 8515   Mean   : 6.012  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:11370   3rd Qu.: 8.005  
##  Max.   :8.792e+09   Max.   :2016-05-12   Max.   :22770   Max.   :17.540  
##                                                                           
##  tracker_distance logged_activities_distance very_active_distance
##  Min.   : 0.010   Min.   :0.0000             Min.   : 0.000      
##  1st Qu.: 3.592   1st Qu.:0.0000             1st Qu.: 0.000      
##  Median : 6.270   Median :0.0000             Median : 0.570      
##  Mean   : 6.007   Mean   :0.1089             Mean   : 1.446      
##  3rd Qu.: 7.950   3rd Qu.:0.0000             3rd Qu.: 2.360      
##  Max.   :17.540   Max.   :4.0817             Max.   :12.540      
##                                                                  
##  moderately_active_distance light_active_distance sedentary_active_distance
##  Min.   :0.0000             Min.   :0.010         Min.   :0.0000000        
##  1st Qu.:0.0000             1st Qu.:2.540         1st Qu.:0.0000000        
##  Median :0.4200             Median :3.665         Median :0.0000000        
##  Mean   :0.7439             Mean   :3.791         Mean   :0.0009268        
##  3rd Qu.:1.0375             3rd Qu.:4.918         3rd Qu.:0.0000000        
##  Max.   :6.4800             Max.   :9.480         Max.   :0.1100000        
##                                                                            
##  very_active_minutes fairly_active_minutes lightly_active_minutes
##  Min.   :  0.00      Min.   :  0.00        Min.   :  2.0         
##  1st Qu.:  0.00      1st Qu.:  0.00        1st Qu.:158.0         
##  Median :  9.00      Median : 11.00        Median :208.0         
##  Mean   : 25.05      Mean   : 17.92        Mean   :216.5         
##  3rd Qu.: 38.00      3rd Qu.: 26.75        3rd Qu.:263.0         
##  Max.   :210.00      Max.   :143.00        Max.   :518.0         
##                                                                  
##  sedentary_minutes    calories         weekday   total_sleep_records
##  Min.   :   0.0    Min.   : 257   Sunday   :55   Min.   :1.00       
##  1st Qu.: 631.2    1st Qu.:1841   Monday   :46   1st Qu.:1.00       
##  Median : 717.0    Median :2207   Tuesday  :65   Median :1.00       
##  Mean   : 712.1    Mean   :2389   Wednesday:66   Mean   :1.12       
##  3rd Qu.: 782.8    3rd Qu.:2920   Thursday :64   3rd Qu.:1.00       
##  Max.   :1265.0    Max.   :4900   Friday   :57   Max.   :3.00       
##                                   Saturday :57                      
##  total_minutes_asleep total_time_in_bed
##  Min.   : 58.0        Min.   : 61.0    
##  1st Qu.:361.0        1st Qu.:403.8    
##  Median :432.5        Median :463.0    
##  Mean   :419.2        Mean   :458.5    
##  3rd Qu.:490.0        3rd Qu.:526.0    
##  Max.   :796.0        Max.   :961.0    
## 

Observation:-

  1. Across the days with both activity and sleep records, users take 8,515 steps a day on average and cover an average distance of 6.01 km. This is below the widely cited target of 10,000 steps a day, a figure often attributed to the CDC.

  2. Users spend an average of 712 minutes a day sedentary (idle), which is 11 hours 52 minutes.

  3. Users burn an average of 2,389 calories a day, which corresponds to about 0.31 kg assuming roughly 7,700 kcal per kg of body fat.

  4. Participants are lightly active for an average of 216 minutes a day, which is 3 hours 36 minutes.

  5. On average, participants sleep for 6 hours 59 minutes and spend 7 hours 38 minutes in bed.
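
The unit conversions behind these observations can be reproduced with a few lines of R. In this sketch `to_hours_minutes` is a hypothetical helper written for the example, and the 7,700 kcal per kg factor is an assumed rule of thumb rather than a value from the dataset:

```r
# Format a duration given in minutes as "H h MM min"
to_hours_minutes <- function(minutes) {
  total <- floor(minutes)  # whole minutes, dropping any fraction
  sprintf("%d h %02d min", as.integer(total %/% 60), as.integer(total %% 60))
}

# Means taken from the summary() output above
print(to_hours_minutes(712.1))  # sedentary minutes
print(to_hours_minutes(216.5))  # lightly active minutes
print(to_hours_minutes(419.2))  # minutes asleep
print(to_hours_minutes(458.5))  # time in bed

# Calorie-to-mass conversion; 7700 kcal per kg is an assumed rule-of-thumb factor
kcal_per_kg <- 7700
kg_equivalent <- round(2389 / kcal_per_kg, 2)
print(kg_equivalent)
```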

Activity Duration

# daily activity by user
daily_activity %>%
  group_by(id) %>% 
  summarize(fairly_active = sum(fairly_active_minutes),
            lightly_active = sum(lightly_active_minutes),
            very_active = sum(very_active_minutes),
            sedentary = sum(sedentary_minutes))
## # A tibble: 33 × 5
##            id fairly_active lightly_active very_active sedentary
##         <dbl>         <int>          <int>       <int>     <int>
##  1 1503960366           594           6818        1200     26293
##  2 1624580081           180           4758         269     38990
##  3 1644430081           641           5354         287     34856
##  4 1844505072            40           3579           4     37405
##  5 1927972279            24           1196          41     40840
##  6 2022484408           600           7981        1125     34490
##  7 2026352035             8           7956           3     21372
##  8 2320127002            80           6144          42     37823
##  9 2347167796           370           4545         243     12369
## 10 2873212765           190           9548         437     34013
## # ℹ 23 more rows

Activity by Day of the Week

#daily activity by day of the week
merged_daily %>%
  group_by(weekday) %>% 
  summarize(fairly_active = sum(fairly_active_minutes),
            lightly_active = sum(lightly_active_minutes),
            very_active = sum(very_active_minutes),
            sedentary = sum(sedentary_minutes))
## # A tibble: 7 × 5
##   weekday   fairly_active lightly_active very_active sedentary
##   <ord>             <int>          <int>       <int>     <int>
## 1 Sunday              922          11002        1218     37820
## 2 Monday              878          10229        1413     33047
## 3 Tuesday            1303          14078        1990     48103
## 4 Wednesday          1105          13726        1408     47154
## 5 Thursday           1015          12988        1463     44696
## 6 Friday              831          12693        1206     42356
## 7 Saturday           1295          14066        1571     38785

Avg. Activity by Day of the Week

# Average daily activity by day of the week
merged_daily %>%
  group_by(weekday) %>% 
  summarize(fairly_active = mean(fairly_active_minutes),
            lightly_active = mean(lightly_active_minutes),
            very_active = mean(very_active_minutes),
            sedentary = mean(sedentary_minutes))
## # A tibble: 7 × 5
##   weekday   fairly_active lightly_active very_active sedentary
##   <ord>             <dbl>          <dbl>       <dbl>     <dbl>
## 1 Sunday             16.8           200.        22.1      688.
## 2 Monday             19.1           222.        30.7      718.
## 3 Tuesday            20.0           217.        30.6      740.
## 4 Wednesday          16.7           208.        21.3      714.
## 5 Thursday           15.9           203.        22.9      698.
## 6 Friday             14.6           223.        21.2      743.
## 7 Saturday           22.7           247.        27.6      680.
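
Weekday totals and weekday averages can rank the days differently because the number of records per weekday is not equal (46 on Monday against 66 on Wednesday in the summary of merged_daily). The sketch below copies the sedentary totals and record counts from the outputs above and shows that dividing one by the other gives the averages:

```r
# Sedentary-minute totals and record counts per weekday, copied from the outputs above
weekday <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
sedentary_sum <- c(37820, 33047, 48103, 47154, 44696, 42356, 38785)
n_records <- c(55, 46, 65, 66, 64, 57, 57)

# Dividing each total by its record count recovers the per-day averages
sedentary_mean <- round(sedentary_sum / n_records)
names(sedentary_mean) <- weekday
print(sedentary_mean)
```

Tuesday has the largest sedentary total, yet Friday has the highest average, so the averages are the safer basis for comparing weekdays.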

Daily Avg. of Total Minutes Asleep And Total Time In Bed By Day Of The Week

# How long do users spend asleep and in bed?
merged_daily %>%
  group_by(weekday) %>% 
  summarize(total_minutes_asleep = mean(total_minutes_asleep),
            total_time_in_bed = mean(total_time_in_bed))
## # A tibble: 7 × 3
##   weekday   total_minutes_asleep total_time_in_bed
##   <ord>                    <dbl>             <dbl>
## 1 Sunday                    453.              504.
## 2 Monday                    420.              457.
## 3 Tuesday                   405.              443.
## 4 Wednesday                 435.              470.
## 5 Thursday                  401.              435.
## 6 Friday                    405.              445.
## 7 Saturday                  419.              460.

Daily Avg. of total_distance, Calories Burnt & Total Steps By Day Of The Week

merged_daily %>% 
  group_by(weekday) %>% 
  summarize(Avg.distance = mean(total_distance),
            Avg.Calories = mean(calories),
            Avg.Totalstep = mean(total_steps))
## # A tibble: 7 × 4
##   weekday   Avg.distance Avg.Calories Avg.Totalstep
##   <ord>            <dbl>        <dbl>         <dbl>
## 1 Sunday            5.18        2277.         7298.
## 2 Monday            6.54        2432.         9273.
## 3 Tuesday           6.43        2496.         9183.
## 4 Wednesday         5.72        2378.         8023.
## 5 Thursday          5.77        2307.         8184.
## 6 Friday            5.51        2330.         7901.
## 7 Saturday          7.02        2507.         9871.

Avg. Calories Burnt By Hour of Day

merged_hour %>% 
  group_by(hr) %>% 
  summarize(Avg.Calories= mean(calories))
## # A tibble: 24 × 2
##    hr    Avg.Calories
##    <chr>        <dbl>
##  1 00            71.8
##  2 01            70.2
##  3 02            69.2
##  4 03            67.5
##  5 04            68.3
##  6 05            81.9
##  7 06            86.9
##  8 07            94.4
##  9 08           103. 
## 10 09           106. 
## # ℹ 14 more rows

Share

Percentage of Activity Minutes

# user activity percentage

merged_daily %>% # SUM() of daily activity
  summarise(fairly_active = sum(fairly_active_minutes),
            lightly_active = sum(lightly_active_minutes),
            very_active = sum(very_active_minutes),
            sedentary = sum(sedentary_minutes))
##   fairly_active lightly_active very_active sedentary
## 1          7349          88782       10269    291961
# Create the data for the chart
x <- c(7349, 88782, 10269, 291961)
piepercent <- paste0(round(x / sum(x) * 100, 1), "%")
labels <- c("fairly_active_minutes", "lightly_active_minutes",
            "very_active_minutes", "sedentary_minutes")
# Plot the chart, reusing `labels` for the legend
pie(x, labels = piepercent, radius = 1, main = "Percentage of Activity", col = rainbow(length(x)))
legend("bottomright", labels, cex = 0.7, yjust = 0.1, xjust = -0.15,
       fill = rainbow(length(x)), bty = "n")

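Observation:- Sedentary time accounts for about 73.3% of all tracked minutes, followed by lightly active (22.3%), very active (2.6%) and fairly active (1.8%) minutes.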
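
The activity shares can also be computed from named totals, which keeps each percentage attached to its category. This is a minimal sketch, assuming only the four totals printed by the `summarise()` call above; the names `activity_totals` and `activity_share` are illustrative.

```r
# Totals copied from the summarise() output above
activity_totals <- c(fairly_active  = 7349,
                     lightly_active = 88782,
                     very_active    = 10269,
                     sedentary      = 291961)

# Share of each category in percent, rounded to one decimal place
activity_share <- round(activity_totals / sum(activity_totals) * 100, 1)

# Print the categories from largest to smallest share
print(sort(activity_share, decreasing = TRUE))
```

In the notebook itself, the totals would more likely be taken from the summarised data frame than typed in by hand.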

Relationship between Total Steps and Calories

merged_daily %>% 
  ggplot(aes(x = total_steps, y = calories)) +
  geom_point() +
  labs(title="Total Steps Vs. Calories Burnt")

Observation:-
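
A correlation coefficient is one way to put a number on the relationship shown in the scatter plot. The sketch below is only an approximation: it uses the seven rounded weekday averages from the summary table earlier in this case study, not the individual daily records, so it will tend to overstate the strength of the relationship. The variable names are illustrative.

```r
# Weekday averages (Sunday to Saturday) copied from the summary table;
# values are rounded as printed, so the result is approximate.
avg_steps    <- c(7298, 9273, 9183, 8023, 8184, 7901, 9871)
avg_calories <- c(2277, 2432, 2496, 2378, 2307, 2330, 2507)

# Pearson correlation between the two series (base R)
steps_calories_cor <- cor(avg_steps, avg_calories)

# A value close to +1 means days with more steps also show more calories burnt
print(round(steps_calories_cor, 2))
```

With the full data, the equivalent call would be `cor(merged_daily$total_steps, merged_daily$calories)`.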

Relationship between Time in Bed and Time Asleep

merged_daily %>% 
  ggplot(aes(x = total_minutes_asleep, y = total_time_in_bed))+
  geom_point()+
  labs(title = "Time in Bed Vs Total Minutes Asleep")

Observation:-

Avg. Calories Burnt by Day of the week

merged_hour %>% 
  group_by(weekday) %>% 
  summarize(calories_burnt = mean(calories)) %>% 
  ggplot(aes(x = weekday, y= calories_burnt, fill = weekday)) +
  geom_col(position = "dodge") +
  labs(title = "Avg. Calories Burnt By Day of the Week", x = "Weekday", y = "Calories Burnt")

Observation:-

Average Calories by time of Day

merged_hour %>% 
  group_by(hr) %>% 
  summarize(calories_burnt = mean(calories)) %>% 
  ggplot(aes(x = hr, y= calories_burnt, fill = calories_burnt)) +
  geom_col(position = "dodge") +
  labs(title = "Avg. Calories Burnt By Hour of Day", x="Hours", y="Calories Burnt") 

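Observation:- Average calories burnt stay at about 68 to 72 per hour between 00 and 04, then climb steadily from 05 onward, passing 100 by 08.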

Busiest Day of the Week

merged_daily %>% 
  group_by(weekday) %>% 
  summarize(Average_step = mean(total_steps)) %>% 
  ggplot(aes(x = weekday, y = Average_step, fill = weekday))+
  geom_col()

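Observation:- Users take the most steps on Saturday (about 9,871 on average), followed by Monday and Tuesday; Sunday is the least active day (about 7,298 steps).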

Act

In this final phase, we will answer the key business question and provide recommendations based on our analysis to guide Bellabeat’s marketing strategy.

What are some trends in smart device usage?

Recommendations

The recommendations below are intended to help Bellabeat’s marketing strategy succeed:

  • A timer can be added to the Bellabeat app to remind users to take a few steps after a certain period of inactivity.

  • A fitness challenge group can be added as a new feature, where the user’s friends or family can compete to finish weekly goals, especially on weekends. Digital tokens can be awarded to the winners.

  • A short, intense exercise or jogging session can be offered as a feature, especially in the morning between 6 am and 7 am, since user activity starts to pick up around this time.

  • A weekly customer satisfaction survey, informed by the previous week’s tracked data, can help identify the causes of inactive periods (for example, illness).

  • A “User Nearby” feature can be added as a premium option, letting users search for a running partner near their location. This feature would be fun for users and could generate revenue for Bellabeat.
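
The inactivity reminder in the first recommendation could work roughly as follows. This is only a sketch under stated assumptions: the function `needs_reminder`, the step threshold of 100, the three-hour idle window and the `toy_steps` values are all hypothetical and are not taken from the dataset or from any Bellabeat product.

```r
# Hypothetical helper: flags each hour that completes a run of
# `max_idle_hours` consecutive hours with fewer than `step_threshold` steps.
needs_reminder <- function(hourly_steps, step_threshold = 100, max_idle_hours = 3) {
  idle <- hourly_steps < step_threshold
  streak <- integer(length(idle))
  run <- 0L
  for (i in seq_along(idle)) {
    # Extend the idle streak, or reset it once the user becomes active
    run <- if (idle[i]) run + 1L else 0L
    streak[i] <- run
  }
  # Fire once every `max_idle_hours` hours of uninterrupted inactivity
  streak > 0 & streak %% max_idle_hours == 0
}

# Toy hourly step counts (made-up values for illustration only)
toy_steps <- c(500, 20, 0, 50, 10, 800, 30, 40, 0, 0, 0, 0)

# Positions of the hours at which a reminder would be sent
print(which(needs_reminder(toy_steps)))
```

Repeating the reminder every `max_idle_hours` hours, rather than every hour, is a design choice meant to keep notifications from becoming intrusive.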