Bellabeat

Case Study: How Can a Wellness Technology Company Play It Smart?

Introduction

Bellabeat is a successful small company, a high-tech manufacturer of health-focused products for women, with the potential to become a larger player in the global smart device market. Bellabeat products collect data on activity, sleep, stress, and reproductive health, which has allowed Bellabeat to empower women with knowledge about their own health and habits.

The main goal of this analysis is to examine smart device usage data to gain insight into how consumers use non-Bellabeat smart devices, and then to select one Bellabeat product to apply these insights to in my presentation.

I am a junior data analyst working on the marketing analytics team at Bellabeat. To answer the key business questions and uncover insights, patterns and trends that will help guide the company’s marketing strategy, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

Stakeholders

⦁ Urška Sršen - Bellabeat’s cofounder and Chief Creative Officer

⦁ Sando Mur - Bellabeat’s cofounder; key member of the Bellabeat executive team

⦁ Bellabeat marketing analytics team - A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Products

⦁ Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

⦁ Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

⦁ Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress.

⦁ Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day.

Ask

Business Task

⦁ What are some trends in smart device usage?

⦁ How could these trends apply to Bellabeat customers?

⦁ How could these trends help influence Bellabeat marketing strategy?

Prepare

Data Source Description

The dataset used in this study, FitBit Fitness Tracker Data, was made available by Mobius on Kaggle: https://www.kaggle.com/datasets/arashnic/fitbit. It comprises 18 CSV files containing the combined personal fitness tracker data from thirty (30) Fitbit users who consented to submit their personal data, including heart rate, sleep details, intensities, physical activities and other related data needed to assess their habits.

Data Integrity (ROCCC)

Reliable: Not reliable. The sample covers only about 30 consenting Fitbit users, which is small and can bias the analysis.

Original: Not original. The data comes from a third-party provider (a survey run via Amazon Mechanical Turk), and its accuracy cannot be verified.

Comprehensive: The data falls within the scope required for the Bellabeat business task.

Current: Not current. The data was collected in 2016 (7 years old), so it is out of date and might be irrelevant to Bellabeat.

Cited: Available through Mobius via Kaggle.

Because of these data quality issues, the dataset is not recommended as a basis for business recommendations. It also lacks information on key participant characteristics such as age and lifestyle.

Dataset

I will use the following files:

dailyActivity_merged.csv
hourlyCalories_merged.csv
sleepDay_merged.csv
hourlySteps_merged.csv

Process

R Programming

I will perform data cleaning operations using R to ensure the dataset is correct, complete and error-free:

⦁ Explore and observe data

⦁ Check for missing or null values

⦁ Transform data — format data type

⦁ Conduct statistical analysis

I imported and loaded the data into RStudio.

Setting up my R environment and loading packages

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.3.2

## Warning: package 'ggplot2' was built under R version 4.3.2

## Warning: package 'tidyr' was built under R version 4.3.2

## Warning: package 'dplyr' was built under R version 4.3.2

## Warning: package 'lubridate' was built under R version 4.3.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(janitor)

## Warning: package 'janitor' was built under R version 4.3.2

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(lubridate)
library(readr)
library(skimr)

## Warning: package 'skimr' was built under R version 4.3.2

library(dplyr)
library(tidyr)
library(plotrix)

## Warning: package 'plotrix' was built under R version 4.3.2

Assigning and importing the datasets

daily_activity <- read.csv("dailyActivity_merged.csv")
hourly_calories <- read.csv("hourlyCalories_merged.csv")
sleep_day <- read.csv("sleepDay_merged.csv")
hourly_steps <- read.csv("hourlySteps_merged.csv")

Data Exploration

Preview the imported datasets using the head function

head(daily_activity, n = 15) #overview of first 15 records

##            Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1  1503960366    4/12/2016      13162          8.50            8.50
## 2  1503960366    4/13/2016      10735          6.97            6.97
## 3  1503960366    4/14/2016      10460          6.74            6.74
## 4  1503960366    4/15/2016       9762          6.28            6.28
## 5  1503960366    4/16/2016      12669          8.16            8.16
## 6  1503960366    4/17/2016       9705          6.48            6.48
## 7  1503960366    4/18/2016      13019          8.59            8.59
## 8  1503960366    4/19/2016      15506          9.88            9.88
## 9  1503960366    4/20/2016      10544          6.68            6.68
## 10 1503960366    4/21/2016       9819          6.34            6.34
## 11 1503960366    4/22/2016      12764          8.13            8.13
## 12 1503960366    4/23/2016      14371          9.04            9.04
## 13 1503960366    4/24/2016      10039          6.41            6.41
## 14 1503960366    4/25/2016      15355          9.80            9.80
## 15 1503960366    4/26/2016      13755          8.79            8.79
##    LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                         0               1.88                     0.55
## 2                         0               1.57                     0.69
## 3                         0               2.44                     0.40
## 4                         0               2.14                     1.26
## 5                         0               2.71                     0.41
## 6                         0               3.19                     0.78
## 7                         0               3.25                     0.64
## 8                         0               3.53                     1.32
## 9                         0               1.96                     0.48
## 10                        0               1.34                     0.35
## 11                        0               4.76                     1.12
## 12                        0               2.81                     0.87
## 13                        0               2.92                     0.21
## 14                        0               5.29                     0.57
## 15                        0               2.33                     0.92
##    LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                 6.06                       0                25
## 2                 4.71                       0                21
## 3                 3.91                       0                30
## 4                 2.83                       0                29
## 5                 5.04                       0                36
## 6                 2.51                       0                38
## 7                 4.71                       0                42
## 8                 5.03                       0                50
## 9                 4.24                       0                28
## 10                4.65                       0                19
## 11                2.24                       0                66
## 12                5.36                       0                41
## 13                3.28                       0                39
## 14                3.94                       0                73
## 15                5.54                       0                31
##    FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                   13                  328              728     1985
## 2                   19                  217              776     1797
## 3                   11                  181             1218     1776
## 4                   34                  209              726     1745
## 5                   10                  221              773     1863
## 6                   20                  164              539     1728
## 7                   16                  233             1149     1921
## 8                   31                  264              775     2035
## 9                   12                  205              818     1786
## 10                   8                  211              838     1775
## 11                  27                  130             1217     1827
## 12                  21                  262              732     1949
## 13                   5                  238              709     1788
## 14                  14                  216              814     2013
## 15                  23                  279              833     1970

head(hourly_calories)

##           Id          ActivityHour Calories
## 1 1503960366 4/12/2016 12:00:00 AM       81
## 2 1503960366  4/12/2016 1:00:00 AM       61
## 3 1503960366  4/12/2016 2:00:00 AM       59
## 4 1503960366  4/12/2016 3:00:00 AM       47
## 5 1503960366  4/12/2016 4:00:00 AM       48
## 6 1503960366  4/12/2016 5:00:00 AM       48

head(hourly_steps)

##           Id          ActivityHour StepTotal
## 1 1503960366 4/12/2016 12:00:00 AM       373
## 2 1503960366  4/12/2016 1:00:00 AM       160
## 3 1503960366  4/12/2016 2:00:00 AM       151
## 4 1503960366  4/12/2016 3:00:00 AM         0
## 5 1503960366  4/12/2016 4:00:00 AM         0
## 6 1503960366  4/12/2016 5:00:00 AM         0

head(sleep_day)

##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320

Let’s find out whether there are null or missing values in each dataset using skim_without_charts

## broad overview of each dataset

skim_without_charts(daily_activity)

Data summary
Name	daily_activity
Number of rows	940
Number of columns	15
_______________________
Column type frequency:
character	1
numeric	14
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
ActivityDate	0	1	8	9	0	31	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	1	4.855407e+09	2.424805e+09	1503960366	2.320127e+09	4.445115e+09	6.962181e+09	8.877689e+09
TotalSteps	1	7.637910e+03	5.087150e+03	0	3.789750e+03	7.405500e+03	1.072700e+04	3.601900e+04
TotalDistance	1	5.490000e+00	3.920000e+00	0	2.620000e+00	5.240000e+00	7.710000e+00	2.803000e+01
TrackerDistance	1	5.480000e+00	3.910000e+00	0	2.620000e+00	5.240000e+00	7.710000e+00	2.803000e+01
LoggedActivitiesDistance	1	1.100000e-01	6.200000e-01	0	0.000000e+00	0.000000e+00	0.000000e+00	4.940000e+00
VeryActiveDistance	1	1.500000e+00	2.660000e+00	0	0.000000e+00	2.100000e-01	2.050000e+00	2.192000e+01
ModeratelyActiveDistance	1	5.700000e-01	8.800000e-01	0	0.000000e+00	2.400000e-01	8.000000e-01	6.480000e+00
LightActiveDistance	1	3.340000e+00	2.040000e+00	0	1.950000e+00	3.360000e+00	4.780000e+00	1.071000e+01
SedentaryActiveDistance	1	0.000000e+00	1.000000e-02	0	0.000000e+00	0.000000e+00	0.000000e+00	1.100000e-01
VeryActiveMinutes	1	2.116000e+01	3.284000e+01	0	0.000000e+00	4.000000e+00	3.200000e+01	2.100000e+02
FairlyActiveMinutes	1	1.356000e+01	1.999000e+01	0	0.000000e+00	6.000000e+00	1.900000e+01	1.430000e+02
LightlyActiveMinutes	1	1.928100e+02	1.091700e+02	0	1.270000e+02	1.990000e+02	2.640000e+02	5.180000e+02
SedentaryMinutes	1	9.912100e+02	3.012700e+02	0	7.297500e+02	1.057500e+03	1.229500e+03	1.440000e+03
Calories	1	2.303610e+03	7.181700e+02	0	1.828500e+03	2.134000e+03	2.793250e+03	4.900000e+03

skim_without_charts(hourly_calories)

Data summary
Name	hourly_calories
Number of rows	22099
Number of columns	3
_______________________
Column type frequency:
character	1
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
ActivityHour	0	1	19	21	0	736	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	0	1	4.848235e+09	2.4225e+09	1503960366	2320127002	4445114986	6962181067	8877689391
Calories	0	1	9.739000e+01	6.0700e+01	42	63	83	108	948

skim_without_charts(hourly_steps)

Data summary
Name	hourly_steps
Number of rows	22099
Number of columns	3
_______________________
Column type frequency:
character	1
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
ActivityHour	0	1	19	21	0	736	0

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	0	1	4.848235e+09	2.4225e+09	1503960366	2320127002	4445114986	6962181067	8877689391
StepTotal	0	1	3.201700e+02	6.9038e+02	0	0	40	357	10554

skim_without_charts(sleep_day)

Data summary
Name	sleep_day
Number of rows	413
Number of columns	5
_______________________
Column type frequency:
character	1
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
SleepDay	0	1	20	21	0	31	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	1	5.000979e+09	2.06036e+09	1503960366	3977333714	4702921684	6962181067	8792009665
TotalSleepRecords	1	1.120000e+00	3.50000e-01	1	1	1	1	3
TotalMinutesAsleep	1	4.194700e+02	1.18340e+02	58	361	433	490	796
TotalTimeInBed	1	4.586400e+02	1.27100e+02	61	403	463	526	961
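The same missing-value check can be sketched in base R without skimr. This is a minimal example on a made-up toy data frame (the values below are illustrative and not taken from the Fitbit files); `colSums(is.na(...))` counts missing values per column, and the derived share is comparable to the complete_rate column reported above.

```r
# Toy data frame standing in for one of the imported tables (values are made up)
toy <- data.frame(
  Id = c(1, 2, 3),
  TotalSteps = c(1000, NA, 3000),
  Calories = c(1800, 1900, NA)
)

# Count missing values per column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# Share of non-missing values per column, comparable to skimr's complete_rate
complete_rate <- 1 - missing_per_column / nrow(toy)
print(round(complete_rate, 2))
```

On the real tables, a count of zero for every column would match the complete_rate of 1 shown in the summaries.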

Making column names consistent and lower case

daily_activity <- daily_activity %>% clean_names()
sleep_day <- sleep_day %>% clean_names()
hourly_calories <- hourly_calories %>% clean_names()
hourly_steps <- hourly_steps %>% clean_names()

Checking for distinct users in each dataset

n_distinct(daily_activity$id) # 33 unique users

## [1] 33

n_distinct(sleep_day$id) # 24 unique users

## [1] 24

n_distinct(hourly_calories$id) # 33 unique users

## [1] 33

n_distinct(hourly_steps$id) # 33 unique users

## [1] 33
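Since fewer users appear in the sleep table than in the activity table, it can help to see which ids are missing before joining the two. Below is a minimal base-R sketch; the id vectors are hypothetical placeholders, and in this analysis they would be `unique(daily_activity$id)` and `unique(sleep_day$id)`.

```r
# Hypothetical id vectors (placeholders, not real participant ids)
activity_ids <- c(101, 102, 103, 104)
sleep_ids <- c(101, 103)

# Users who logged activity but never logged sleep
no_sleep_ids <- setdiff(activity_ids, sleep_ids)
print(no_sleep_ids)

# Share of activity users that also appear in the sleep table
sleep_coverage <- length(intersect(activity_ids, sleep_ids)) / length(activity_ids)
print(sleep_coverage)
```

Any user without sleep records is dropped by an inner join on the two tables, so statistics computed after such a join describe only the users who tracked their sleep.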

Checking for duplicate records in each dataset

sum(duplicated(daily_activity))

## [1] 0

sum(duplicated(sleep_day)) #3 duplicate

## [1] 3

sum(duplicated(hourly_calories))

## [1] 0

sum(duplicated(hourly_steps))

## [1] 0

Viewing duplicate rows in the ‘sleep_day’ dataset

sleep_day[duplicated(sleep_day), ] # 3 duplicated rows

##             id             sleep_day total_sleep_records total_minutes_asleep
## 162 4388161847  5/5/2016 12:00:00 AM                   1                  471
## 224 4702921684  5/7/2016 12:00:00 AM                   1                  520
## 381 8378563200 4/25/2016 12:00:00 AM                   1                  388
##     total_time_in_bed
## 162               495
## 224               543
## 381               402

Removing duplicated rows in the ‘sleep_day’ dataset

sleep_day <- sleep_day %>%  # drop duplicate rows, then rows with missing values
  distinct() %>% drop_na()

sum(duplicated(sleep_day)) # zero (0) duplicates

## [1] 0
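To illustrate what the de-duplication step does, here is a minimal base-R sketch on a made-up table (the rows are illustrative only). `duplicated()` flags every repeat of an earlier row, and `unique()` on a data frame keeps the first copy of each row, which is the base-R counterpart of `distinct()` when all columns are compared.

```r
# Toy sleep table with one fully repeated row (values are made up)
toy_sleep <- data.frame(
  id = c(1, 1, 2, 2),
  sleep_day = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016"),
  total_minutes_asleep = c(400, 400, 380, 410)
)

# Number of rows that repeat an earlier row exactly
n_dupes <- sum(duplicated(toy_sleep))
print(n_dupes)

# Keep only the first copy of each row
deduped <- unique(toy_sleep)
print(nrow(deduped))
```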

Data Transformation

Renaming ‘activity_date’, ‘sleep_day’ and ‘activity_hour’ to ‘date’ for readability and converting the renamed columns to a date or date-time data type

# rename 'activity_date' to 'date' and convert it to the Date data type
daily_activity <- daily_activity %>% 
  rename(date = activity_date) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))

# rename 'sleep_day' to 'date' and convert it to the Date data type
sleep_day <- sleep_day %>% 
  rename(date = sleep_day) %>% 
  mutate(date = as.Date(date, format = "%m/%d/%Y"))

# rename 'activity_hour' to 'date' and parse it as a date-time
hourly_calories <- hourly_calories %>% 
  rename(date = activity_hour) %>% 
  mutate(date = mdy_hms(date))

hourly_calories$hr <- format(as.POSIXct(hourly_calories$date), "%H") # create hour of the day column
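The difference between parsing an hourly timestamp as a date-time and as a plain date matters for the hourly tables. Here is a minimal base-R sketch, using timestamp strings in the same format as the ActivityHour column. It assumes an English or C locale so that `%p` recognises AM/PM (lubridate’s `mdy_hms()` is the equivalent used in this analysis).

```r
# Timestamps in the same format as the ActivityHour column
raw <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM", "4/12/2016 3:00:00 PM")

# Full date-time parse: the hour is kept
stamp <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
hr <- format(stamp, "%H")
print(hr)

# Date-only parse: the time part is ignored, so all three rows collapse to one day
day_only <- as.Date(raw, format = "%m/%d/%Y")
print(day_only)
```

A column parsed with `as.Date()` can no longer distinguish the hours within a day, so it is only suitable for daily tables.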

# rename 'activity_hour' to 'date' and parse it as a date-time, keeping the hour so that it matches 'hourly_calories'
hourly_steps <- hourly_steps %>% 
  rename(date = activity_hour) %>% 
  mutate(date = mdy_hms(date))

# adding 'weekday' column to represent Day of the Week
daily_activity <- daily_activity %>%
  mutate(weekday = lubridate::wday(date, label = TRUE, abbr = FALSE))

hourly_calories <- hourly_calories %>%
  mutate(weekday = lubridate::wday(date, label = TRUE, abbr = FALSE))

Data Merging

Combining the datasets to find trends

# inner join 'daily_activity' and 'sleep_day' on id and date column
merged_daily <-  merge(daily_activity, sleep_day, by= c("id", "date"))
merged_daily %>% head(n = 10)

##            id       date total_steps total_distance tracker_distance
## 1  1503960366 2016-04-12       13162           8.50             8.50
## 2  1503960366 2016-04-13       10735           6.97             6.97
## 3  1503960366 2016-04-15        9762           6.28             6.28
## 4  1503960366 2016-04-16       12669           8.16             8.16
## 5  1503960366 2016-04-17        9705           6.48             6.48
## 6  1503960366 2016-04-19       15506           9.88             9.88
## 7  1503960366 2016-04-20       10544           6.68             6.68
## 8  1503960366 2016-04-21        9819           6.34             6.34
## 9  1503960366 2016-04-23       14371           9.04             9.04
## 10 1503960366 2016-04-24       10039           6.41             6.41
##    logged_activities_distance very_active_distance moderately_active_distance
## 1                           0                 1.88                       0.55
## 2                           0                 1.57                       0.69
## 3                           0                 2.14                       1.26
## 4                           0                 2.71                       0.41
## 5                           0                 3.19                       0.78
## 6                           0                 3.53                       1.32
## 7                           0                 1.96                       0.48
## 8                           0                 1.34                       0.35
## 9                           0                 2.81                       0.87
## 10                          0                 2.92                       0.21
##    light_active_distance sedentary_active_distance very_active_minutes
## 1                   6.06                         0                  25
## 2                   4.71                         0                  21
## 3                   2.83                         0                  29
## 4                   5.04                         0                  36
## 5                   2.51                         0                  38
## 6                   5.03                         0                  50
## 7                   4.24                         0                  28
## 8                   4.65                         0                  19
## 9                   5.36                         0                  41
## 10                  3.28                         0                  39
##    fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1                     13                    328               728     1985
## 2                     19                    217               776     1797
## 3                     34                    209               726     1745
## 4                     10                    221               773     1863
## 5                     20                    164               539     1728
## 6                     31                    264               775     2035
## 7                     12                    205               818     1786
## 8                      8                    211               838     1775
## 9                     21                    262               732     1949
## 10                     5                    238               709     1788
##      weekday total_sleep_records total_minutes_asleep total_time_in_bed
## 1    Tuesday                   1                  327               346
## 2  Wednesday                   2                  384               407
## 3     Friday                   1                  412               442
## 4   Saturday                   2                  340               367
## 5     Sunday                   1                  700               712
## 6    Tuesday                   1                  304               320
## 7  Wednesday                   1                  360               377
## 8   Thursday                   1                  325               364
## 9   Saturday                   1                  361               384
## 10    Sunday                   1                  430               449

# inner join 'hourly_steps' and 'hourly_calories' on id and the hourly 'date' timestamp
merged_hour <- merge(hourly_steps, hourly_calories, by = c("id", "date")) 
merged_hour %>% head(n = 6)

##           id                date step_total calories hr weekday
## 1 1503960366 2016-04-12 00:00:00        373       81 00 Tuesday
## 2 1503960366 2016-04-12 01:00:00        160       61 01 Tuesday
## 3 1503960366 2016-04-12 02:00:00        151       59 02 Tuesday
## 4 1503960366 2016-04-12 03:00:00          0       47 03 Tuesday
## 5 1503960366 2016-04-12 04:00:00          0       48 04 Tuesday
## 6 1503960366 2016-04-12 05:00:00          0       48 05 Tuesday
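A join between two hourly tables is only one-to-one when the key keeps the hour. The base-R sketch below uses the first two hours from the previews printed earlier (373 and 160 steps, 81 and 61 calories) and compares a join on the full timestamp with a join on a date-only key. The date-only variant is shown purely to illustrate the row blow-up.

```r
# Two hours of data for one user, taken from the head() previews above
steps <- data.frame(
  id = 1503960366,
  date = as.POSIXct(c("2016-04-12 00:00:00", "2016-04-12 01:00:00"), tz = "UTC"),
  step_total = c(373, 160)
)
calories <- data.frame(
  id = 1503960366,
  date = as.POSIXct(c("2016-04-12 00:00:00", "2016-04-12 01:00:00"), tz = "UTC"),
  calories = c(81, 61)
)

# Key keeps the hour: each step row matches exactly one calorie row
hourly_join <- merge(steps, calories, by = c("id", "date"))
print(nrow(hourly_join))

# Key reduced to the day: every step row matches every calorie row of that day
steps_day <- transform(steps, date = as.Date(date))
calories_day <- transform(calories, date = as.Date(date))
daily_key_join <- merge(steps_day, calories_day, by = c("id", "date"))
print(nrow(daily_key_join))
```

A quick sanity check on the real data would be to compare `nrow(merged_hour)` with `nrow(hourly_steps)`: after an inner join on a unique key, the merged table cannot have more rows than either input.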

Analyze

Overview of the general statistics of merged_daily

merged_daily %>% summary()

##        id                 date             total_steps    total_distance  
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :   17   Min.   : 0.010  
##  1st Qu.:3.977e+09   1st Qu.:2016-04-19   1st Qu.: 5189   1st Qu.: 3.592  
##  Median :4.703e+09   Median :2016-04-27   Median : 8913   Median : 6.270  
##  Mean   :4.995e+09   Mean   :2016-04-26   Mean   : 8515   Mean   : 6.012  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:11370   3rd Qu.: 8.005  
##  Max.   :8.792e+09   Max.   :2016-05-12   Max.   :22770   Max.   :17.540  
##                                                                           
##  tracker_distance logged_activities_distance very_active_distance
##  Min.   : 0.010   Min.   :0.0000             Min.   : 0.000      
##  1st Qu.: 3.592   1st Qu.:0.0000             1st Qu.: 0.000      
##  Median : 6.270   Median :0.0000             Median : 0.570      
##  Mean   : 6.007   Mean   :0.1089             Mean   : 1.446      
##  3rd Qu.: 7.950   3rd Qu.:0.0000             3rd Qu.: 2.360      
##  Max.   :17.540   Max.   :4.0817             Max.   :12.540      
##                                                                  
##  moderately_active_distance light_active_distance sedentary_active_distance
##  Min.   :0.0000             Min.   :0.010         Min.   :0.0000000        
##  1st Qu.:0.0000             1st Qu.:2.540         1st Qu.:0.0000000        
##  Median :0.4200             Median :3.665         Median :0.0000000        
##  Mean   :0.7439             Mean   :3.791         Mean   :0.0009268        
##  3rd Qu.:1.0375             3rd Qu.:4.918         3rd Qu.:0.0000000        
##  Max.   :6.4800             Max.   :9.480         Max.   :0.1100000        
##                                                                            
##  very_active_minutes fairly_active_minutes lightly_active_minutes
##  Min.   :  0.00      Min.   :  0.00        Min.   :  2.0         
##  1st Qu.:  0.00      1st Qu.:  0.00        1st Qu.:158.0         
##  Median :  9.00      Median : 11.00        Median :208.0         
##  Mean   : 25.05      Mean   : 17.92        Mean   :216.5         
##  3rd Qu.: 38.00      3rd Qu.: 26.75        3rd Qu.:263.0         
##  Max.   :210.00      Max.   :143.00        Max.   :518.0         
##                                                                  
##  sedentary_minutes    calories         weekday   total_sleep_records
##  Min.   :   0.0    Min.   : 257   Sunday   :55   Min.   :1.00       
##  1st Qu.: 631.2    1st Qu.:1841   Monday   :46   1st Qu.:1.00       
##  Median : 717.0    Median :2207   Tuesday  :65   Median :1.00       
##  Mean   : 712.1    Mean   :2389   Wednesday:66   Mean   :1.12       
##  3rd Qu.: 782.8    3rd Qu.:2920   Thursday :64   3rd Qu.:1.00       
##  Max.   :1265.0    Max.   :4900   Friday   :57   Max.   :3.00       
##                                   Saturday :57                      
##  total_minutes_asleep total_time_in_bed
##  Min.   : 58.0        Min.   : 61.0    
##  1st Qu.:361.0        1st Qu.:403.8    
##  Median :432.5        Median :463.0    
##  Mean   :419.2        Mean   :458.5    
##  3rd Qu.:490.0        3rd Qu.:526.0    
##  Max.   :796.0        Max.   :961.0    
##

Observation:-

On average, users take 8,515 steps per day and cover a distance of 6.01 km. A widely cited target, often attributed to the CDC, is 10,000 steps per day, so the average user falls short of it.
Users spend an average of 712 minutes (11 hours 52 minutes) sedentary (idle) each day.
Users burn an average of 2,389 calories per day, which is equivalent to about 0.31 kg.
On average, participants are lightly active for 216 minutes (3 hours 36 minutes) per day.
On average, participants sleep for 6 hours 59 minutes and spend 7 hours 38 minutes in bed.
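The unit conversions behind these observations can be reproduced with a few lines of base R. In this minimal sketch, `to_hours_minutes()` is a hypothetical helper written for this example, and the factor of 7,700 kcal per kg is an assumption: it is a common rule of thumb that happens to reproduce the 0.31 kg figure, but the conversion actually used is not stated in the analysis.

```r
# Hypothetical helper: convert a number of minutes to "H h M min" (fractions of a minute are dropped)
to_hours_minutes <- function(total_minutes) {
  whole <- as.integer(floor(total_minutes))
  sprintf("%d h %d min", whole %/% 60L, whole %% 60L)
}

# Means taken from the summary() output of merged_daily
print(to_hours_minutes(712.1))  # sedentary minutes
print(to_hours_minutes(216.5))  # lightly active minutes
print(to_hours_minutes(419.2))  # total minutes asleep
print(to_hours_minutes(458.5))  # total time in bed

# Assumed rule of thumb: roughly 7,700 kcal per kg
kcal_per_kg <- 7700
kg_equivalent <- 2389 / kcal_per_kg
print(round(kg_equivalent, 2))
```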

Activity Duration

# daily activity by user
daily_activity %>%
  group_by(id) %>% 
  summarize(fairly_active = sum(fairly_active_minutes),
            lightly_active = sum(lightly_active_minutes),
            very_active = sum(very_active_minutes),
            sedentary = sum(sedentary_minutes))

## # A tibble: 33 × 5
##            id fairly_active lightly_active very_active sedentary
##         <dbl>         <int>          <int>       <int>     <int>
##  1 1503960366           594           6818        1200     26293
##  2 1624580081           180           4758         269     38990
##  3 1644430081           641           5354         287     34856
##  4 1844505072            40           3579           4     37405
##  5 1927972279            24           1196          41     40840
##  6 2022484408           600           7981        1125     34490
##  7 2026352035             8           7956           3     21372
##  8 2320127002            80           6144          42     37823
##  9 2347167796           370           4545         243     12369
## 10 2873212765           190           9548         437     34013
## # ℹ 23 more rows

Activity by Day of the Week

#daily activity by day of the week
merged_daily %>%
  group_by(weekday) %>% 
  summarize(fairly_active = sum(fairly_active_minutes),
            lightly_active = sum(lightly_active_minutes),
            very_active = sum(very_active_minutes),
            sedentary = sum(sedentary_minutes))

## # A tibble: 7 × 5
##   weekday   fairly_active lightly_active very_active sedentary
##   <ord>             <int>          <int>       <int>     <int>
## 1 Sunday              922          11002        1218     37820
## 2 Monday              878          10229        1413     33047
## 3 Tuesday            1303          14078        1990     48103
## 4 Wednesday          1105          13726        1408     47154
## 5 Thursday           1015          12988        1463     44696
## 6 Friday              831          12693        1206     42356
## 7 Saturday           1295          14066        1571     38785

Avg. Activity by Day of the Week

# Average daily activity by day of the week
merged_daily %>%
  group_by(weekday) %>% 
  summarize(fairly_active = mean(fairly_active_minutes),
            lightly_active = mean(lightly_active_minutes),
            very_active = mean(very_active_minutes),
            sedentary = mean(sedentary_minutes))

## # A tibble: 7 × 5
##   weekday   fairly_active lightly_active very_active sedentary
##   <ord>             <dbl>          <dbl>       <dbl>     <dbl>
## 1 Sunday             16.8           200.        22.1      688.
## 2 Monday             19.1           222.        30.7      718.
## 3 Tuesday            20.0           217.        30.6      740.
## 4 Wednesday          16.7           208.        21.3      714.
## 5 Thursday           15.9           203.        22.9      698.
## 6 Friday             14.6           223.        21.2      743.
## 7 Saturday           22.7           247.        27.6      680.

Daily Avg. of Total Minutes Asleep And Total Time In Bed By Day Of The Week

# How long do users spend asleep and in bed?
merged_daily %>%
  group_by(weekday) %>% 
  summarize(total_minutes_asleep = mean(total_minutes_asleep),
            total_time_in_bed = mean(total_time_in_bed))

## # A tibble: 7 × 3
##   weekday   total_minutes_asleep total_time_in_bed
##   <ord>                    <dbl>             <dbl>
## 1 Sunday                    453.              504.
## 2 Monday                    420.              457.
## 3 Tuesday                   405.              443.
## 4 Wednesday                 435.              470.
## 5 Thursday                  401.              435.
## 6 Friday                    405.              445.
## 7 Saturday                  419.              460.

Daily Avg. of total_distance, Calories Burnt & Total Steps By Day Of The Week

merged_daily %>% 
  group_by(weekday) %>% 
  summarize(Avg.distance = mean(total_distance),
            Avg.Calories = mean(calories),
            Avg.Totalstep = mean(total_steps))

## # A tibble: 7 × 4
##   weekday   Avg.distance Avg.Calories Avg.Totalstep
##   <ord>            <dbl>        <dbl>         <dbl>
## 1 Sunday            5.18        2277.         7298.
## 2 Monday            6.54        2432.         9273.
## 3 Tuesday           6.43        2496.         9183.
## 4 Wednesday         5.72        2378.         8023.
## 5 Thursday          5.77        2307.         8184.
## 6 Friday            5.51        2330.         7901.
## 7 Saturday          7.02        2507.         9871.

Avg. Calories Burnt By Hour of Day

merged_hour %>% 
  group_by(hr) %>% 
  summarize(Avg.Calories= mean(calories))

## # A tibble: 24 × 2
##    hr    Avg.Calories
##    <chr>        <dbl>
##  1 00            71.8
##  2 01            70.2
##  3 02            69.2
##  4 03            67.5
##  5 04            68.3
##  6 05            81.9
##  7 06            86.9
##  8 07            94.4
##  9 08           103. 
## 10 09           106. 
## # ℹ 14 more rows

Percentage of Activity Minutes

# user activity percentage

merged_daily %>% # SUM() of daily activity
  summarise(fairly_active = sum(fairly_active_minutes),
            lightly_active = sum(lightly_active_minutes),
            very_active = sum(very_active_minutes),
            sedentary = sum(sedentary_minutes))

##   fairly_active lightly_active very_active sedentary
## 1          7349          88782       10269    291961

# Creating data for the graph (totals taken from the summary above)
x <- c(7349, 88782, 10269, 291961)
piepercent <- paste0(round(x / sum(x) * 100, 1), "%")
labels <- c("fairly_active_minutes", "lightly_active_minutes",
            "very_active_minutes", "sedentary_minutes")
# Plot the chart, reusing `labels` for the legend.
pie(x, labels = piepercent, radius = 1, main = "Percentage of Activity", col = rainbow(length(x)))
legend("bottomright", legend = labels, cex = 0.7, yjust = 0.1, xjust = -0.15,
       fill = rainbow(length(x)), bty = "n")

Observation:-

Sedentary minutes make up 73.3% of all tracked minutes and lightly active minutes 22.3%, while very active minutes (2.6%) and fairly active minutes (1.8%) account for only a small share.
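The shares quoted in this observation can be recomputed directly from the four totals printed by the summary above. The following is a minimal base R sketch; the vector name `activity_totals` is introduced here for illustration only, and in the actual workflow the totals would presumably come from the `merged_daily` summary rather than being typed in.

```r
# Totals copied from the summarise() output above (minutes per activity level).
activity_totals <- c(
  fairly_active  = 7349,
  lightly_active = 88782,
  very_active    = 10269,
  sedentary      = 291961
)

# Express each total as a percentage of all tracked minutes, rounded to 1 decimal.
activity_share <- round(activity_totals / sum(activity_totals) * 100, 1)
print(activity_share)
```

Deriving the percentages from the totals in one step avoids having to retype numbers if the underlying data changes.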

Relationship between Total Steps and Calories

merged_daily %>% 
  ggplot(aes(x = total_steps, y = calories)) +
  geom_point() +
  labs(title="Total Steps Vs. Calories Burnt")

Observation:-

The more steps users take, the more calories they burn. Most users burn between 1,000 and 3,000 calories as their steps range from 0 to 15,000. The plot shows a positive correlation between the two variables.
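A scatter plot suggests the direction of a relationship, and a correlation coefficient can put a number on it. The sketch below uses base R's `cor()` on a small made-up data frame (`toy_daily` and its values are hypothetical, not taken from the dataset); with the real data the same call would be applied to the `total_steps` and `calories` columns of `merged_daily`.

```r
# Hypothetical values for illustration only; not drawn from the Fitbit data.
toy_daily <- data.frame(
  total_steps = c(2000, 5000, 8000, 11000, 14000),
  calories    = c(1500, 1900, 2200, 2600, 3000)
)

# Pearson correlation: values near +1 indicate a strong positive linear relationship.
steps_calories_r <- cor(toy_daily$total_steps, toy_daily$calories)
print(steps_calories_r)
```

A coefficient close to 1 would support the "positive correlation" reading of the plot, while a value near 0 would suggest the visual trend is weaker than it looks.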

Relationship between Time in Bed and Time Asleep

merged_daily %>% 
  ggplot(aes(x = total_minutes_asleep, y = total_time_in_bed))+
  geom_point()+
  labs(title = "Time in Bed Vs Total Minutes Asleep")

Observation:-

There is a strong positive correlation between the two variables. Most users are in bed only while they are asleep, but on a few occasions they spend a noticeably longer time in bed awake.
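The gap between time in bed and time asleep can be made explicit by subtracting one from the other. This sketch uses the weekday averages printed in the sleep table earlier (rounded as shown there), so it illustrates the calculation rather than reproducing per-user results; the column names `awake_in_bed` and `efficiency_pct` are introduced here as assumptions.

```r
# Weekday averages copied from the tibble output above (rounded values).
sleep_avgs <- data.frame(
  weekday = c("Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday"),
  asleep  = c(453, 420, 405, 435, 401, 405, 419),
  in_bed  = c(504, 457, 443, 470, 435, 445, 460),
  stringsAsFactors = FALSE
)

# Minutes spent in bed but awake, and the share of bed time spent asleep.
sleep_avgs$awake_in_bed   <- sleep_avgs$in_bed - sleep_avgs$asleep
sleep_avgs$efficiency_pct <- round(sleep_avgs$asleep / sleep_avgs$in_bed * 100, 1)
print(sleep_avgs)
```

On these averages, Sunday shows the largest gap between time in bed and time asleep.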

Avg. Calories Burnt by Day of the week

merged_hour %>% 
  group_by(weekday) %>% 
  summarize(calories_burnt = mean(calories)) %>% 
  ggplot(aes(x = weekday, y= calories_burnt, fill = weekday)) +
  geom_col(position = "dodge") +
  labs(title = "Avg. Calories Burnt by Day of the Week", x = "Day of the Week", y = "Calories Burnt")

Observation:-

The most active days are Tuesday and Saturday.
The least active day is Sunday.

Average Calories by time of Day

merged_hour %>% 
  group_by(hr) %>% 
  summarize(calories_burnt = mean(calories)) %>% 
  ggplot(aes(x = hr, y= calories_burnt, fill = calories_burnt)) +
  geom_col(position = "dodge") +
  labs(title = "Avg. Calories Burnt By Hour of Day", x="Hours", y="Calories Burnt")

Observation:-

Calories burnt closely follow the hours in which users are active.
Calories burnt start to rise in the early morning, at around 5:00 AM, and fall off at around 9:00 PM.

Busiest Day of the Week

merged_daily %>% 
  group_by(weekday) %>% 
  summarize(Average_step = mean(total_steps)) %>% 
  ggplot(aes(x = weekday, y = Average_step, fill = weekday))+
  geom_col()

Observation:-

The most active days are Monday and Saturday.
The least active day is Sunday.
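The weekday rankings read off the bar charts can be cross-checked against the daily averages table shown earlier. The sketch below copies those averages into a small data frame (the name `weekday_avgs` is hypothetical) and sorts it with base R; note that it uses the daily table, not the hourly data behind the calories chart.

```r
# Daily averages copied from the "By Day Of The Week" table above.
weekday_avgs <- data.frame(
  weekday      = c("Sunday", "Monday", "Tuesday", "Wednesday",
                   "Thursday", "Friday", "Saturday"),
  avg_steps    = c(7298, 9273, 9183, 8023, 8184, 7901, 9871),
  avg_calories = c(2277, 2432, 2496, 2378, 2307, 2330, 2507),
  stringsAsFactors = FALSE
)

# Rank weekdays from most to least steps, then pick the extremes.
steps_ranked  <- weekday_avgs[order(-weekday_avgs$avg_steps), ]
top_two_steps <- head(steps_ranked$weekday, 2)
least_steps   <- tail(steps_ranked$weekday, 1)

# Same ranking for average calories.
calories_ranked  <- weekday_avgs[order(-weekday_avgs$avg_calories), ]
top_two_calories <- head(calories_ranked$weekday, 2)

print(top_two_steps)
print(least_steps)
print(top_two_calories)
```

Sorting the table this way confirms the observations without relying on judging bar heights by eye.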

Act

In this final phase, we will answer the key business question and provide recommendations based on our analysis to guide Bellabeat’s marketing strategy.

What are some trends in smart device usage?

⦁ Users spend most of their tracked time inactive: sedentary minutes make up 73.3% of all recorded activity minutes.
⦁ There is a positive relationship between the total number of steps and the total number of calories burnt. The more steps a user takes, the more calories they burn.
⦁ Users start their day between 6 am and 8 am. They are most active between 12 pm and 2 pm and between 5 pm and 7 pm, and become less active from 8 pm.

Recommendations

The recommendations below are intended to support Bellabeat's marketing strategy:

⦁ A timer can be added to the Bellabeat app to remind users to take a few steps after a set period of inactivity.
⦁ A fitness challenge group can be added as a new feature in which the user's friends and family compete to finish weekly goals, especially on weekends; winners can be rewarded with digital tokens.
⦁ A short, intense exercise or jogging session can be offered as a feature in the morning, between 6 am and 7 am, since most users become active around this time.
⦁ A weekly customer satisfaction survey, informed by the previous week's tracked data, can be used to understand the causes of inactive periods, for example illness.
⦁ A "User Nearby" feature can be offered as a premium option that lets users find a running partner near their location. This feature is fun for users and generates revenue for Bellabeat.
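To make the inactivity reminder idea more concrete, here is a rough base R sketch of how such a rule might be evaluated on hourly step counts. Everything in it is an assumption for illustration: the helper `reminder_hours()`, the `min_steps` and `max_idle` thresholds, and the sample vector `toy_steps` are hypothetical and are not part of the Bellabeat app or the dataset.

```r
# Hypothetical helper: returns the positions (hours) at which a reminder would
# fire, i.e. each time `max_idle` consecutive hours fall below `min_steps`.
reminder_hours <- function(hourly_steps, min_steps = 250, max_idle = 3) {
  idle_run <- 0
  fire_at  <- integer(0)
  for (i in seq_along(hourly_steps)) {
    if (hourly_steps[i] < min_steps) {
      idle_run <- idle_run + 1
      if (idle_run == max_idle) {
        fire_at  <- c(fire_at, i)
        idle_run <- 0  # restart the count once a reminder has been sent
      }
    } else {
      idle_run <- 0    # any active hour resets the idle streak
    }
  }
  fire_at
}

# Made-up hourly step counts for one user.
toy_steps <- c(900, 100, 50, 0, 1200, 30, 20, 10, 5, 0, 0, 800)
reminders <- reminder_hours(toy_steps)
print(reminders)
```

The thresholds would need tuning against real usage, for example to avoid sending reminders during typical sleeping hours.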

Bellabeat

Olorunfemi Taiwo

2023-11-17
