Bellabeat is a successful small company, a high-tech manufacturer of health-focused products for women, with the potential to become a larger player in the global smart device market. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits.
The main goal of this analysis is to examine smart device usage data to gain insight into how consumers use non-Bellabeat smart devices, and then to select one Bellabeat product to which these insights can be applied in my presentation.
I am a junior data analyst on the marketing analytics team at Bellabeat. To answer the key business questions and uncover insights, patterns, and trends that will help guide the company’s marketing strategy, I will follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.
⦁ Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
⦁ Sando Mur: Bellabeat’s cofounder; key member of the Bellabeat executive team
⦁ Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.
⦁ Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
⦁ Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
⦁ Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress.
⦁ Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day.
⦁ What are some trends in smart device usage?
⦁ How could these trends apply to Bellabeat customers?
⦁ How could these trends help influence Bellabeat marketing strategy?
The dataset used in this study is the FitBit Fitness Tracker Data, made available through Mobius on Kaggle: https://www.kaggle.com/datasets/arashnic/fitbit. It comprises 18 .csv files containing personal fitness tracker data from thirty (30) FitBit users who consented to submit their personal data, including heart rate, sleep details, intensities, physical activity, and other data needed to assess their habits.
Reliable: not reliable. The data comes from only 30 consenting Fitbit users, a sample small enough to bias the analysis.
Original: not original. It was collected by a third-party provider through a survey distributed via Amazon Mechanical Turk, so its accuracy cannot be verified.
Comprehensive: the data falls within the scope required for the Bellabeat business task.
Current: not current. The data was collected in 2016 (7 years old), so it is out of date and may not be relevant to Bellabeat.
Cited: available through Mobius via Kaggle.
Because of these quality issues, the dataset is not recommended as a basis for business recommendations. It also lacks key participant characteristics such as age and lifestyle.
I will use the following files:
⦁ dailyActivity_merged.csv
⦁ hourlyCalories_merged.csv
⦁ sleepDay_merged.csv
⦁ hourlySteps_merged.csv
I will perform data cleaning operations in R to ensure the dataset is correct, complete, and error free:
⦁ Explore and observe data
⦁ Check for missing or null values
⦁ Transform data: format data types
⦁ Conduct statistical analysis
I imported and loaded the data into RStudio.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.3.2
## Warning: package 'ggplot2' was built under R version 4.3.2
## Warning: package 'tidyr' was built under R version 4.3.2
## Warning: package 'dplyr' was built under R version 4.3.2
## Warning: package 'lubridate' was built under R version 4.3.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(janitor)
## Warning: package 'janitor' was built under R version 4.3.2
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(lubridate)
library(readr)
library(skimr)
## Warning: package 'skimr' was built under R version 4.3.2
library(dplyr)
library(tidyr)
library(plotrix)
## Warning: package 'plotrix' was built under R version 4.3.2
daily_activity <- read.csv("dailyActivity_merged.csv")
hourly_calories <- read.csv("hourlyCalories_merged.csv")
sleep_day <- read.csv("sleepDay_merged.csv")
hourly_steps <- read.csv("hourlySteps_merged.csv")
Preview the imported datasets using the head function
head(daily_activity, n = 15) #overview of first 15 records
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## 7 1503960366 4/18/2016 13019 8.59 8.59
## 8 1503960366 4/19/2016 15506 9.88 9.88
## 9 1503960366 4/20/2016 10544 6.68 6.68
## 10 1503960366 4/21/2016 9819 6.34 6.34
## 11 1503960366 4/22/2016 12764 8.13 8.13
## 12 1503960366 4/23/2016 14371 9.04 9.04
## 13 1503960366 4/24/2016 10039 6.41 6.41
## 14 1503960366 4/25/2016 15355 9.80 9.80
## 15 1503960366 4/26/2016 13755 8.79 8.79
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## 7 0 3.25 0.64
## 8 0 3.53 1.32
## 9 0 1.96 0.48
## 10 0 1.34 0.35
## 11 0 4.76 1.12
## 12 0 2.81 0.87
## 13 0 2.92 0.21
## 14 0 5.29 0.57
## 15 0 2.33 0.92
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## 7 4.71 0 42
## 8 5.03 0 50
## 9 4.24 0 28
## 10 4.65 0 19
## 11 2.24 0 66
## 12 5.36 0 41
## 13 3.28 0 39
## 14 3.94 0 73
## 15 5.54 0 31
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
## 7 16 233 1149 1921
## 8 31 264 775 2035
## 9 12 205 818 1786
## 10 8 211 838 1775
## 11 27 130 1217 1827
## 12 21 262 732 1949
## 13 5 238 709 1788
## 14 14 216 814 2013
## 15 23 279 833 1970
head(hourly_calories)
## Id ActivityHour Calories
## 1 1503960366 4/12/2016 12:00:00 AM 81
## 2 1503960366 4/12/2016 1:00:00 AM 61
## 3 1503960366 4/12/2016 2:00:00 AM 59
## 4 1503960366 4/12/2016 3:00:00 AM 47
## 5 1503960366 4/12/2016 4:00:00 AM 48
## 6 1503960366 4/12/2016 5:00:00 AM 48
head(hourly_steps)
## Id ActivityHour StepTotal
## 1 1503960366 4/12/2016 12:00:00 AM 373
## 2 1503960366 4/12/2016 1:00:00 AM 160
## 3 1503960366 4/12/2016 2:00:00 AM 151
## 4 1503960366 4/12/2016 3:00:00 AM 0
## 5 1503960366 4/12/2016 4:00:00 AM 0
## 6 1503960366 4/12/2016 5:00:00 AM 0
head(sleep_day)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
Let’s check each dataset for null or missing values using skim_without_charts.
## broad overview of each dataset
skim_without_charts(daily_activity)
| Name | daily_activity |
| Number of rows | 940 |
| Number of columns | 15 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 14 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| ActivityDate | 0 | 1 | 8 | 9 | 0 | 31 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| Id | 0 | 1 | 4.855407e+09 | 2.424805e+09 | 1503960366 | 2.320127e+09 | 4.445115e+09 | 6.962181e+09 | 8.877689e+09 |
| TotalSteps | 0 | 1 | 7.637910e+03 | 5.087150e+03 | 0 | 3.789750e+03 | 7.405500e+03 | 1.072700e+04 | 3.601900e+04 |
| TotalDistance | 0 | 1 | 5.490000e+00 | 3.920000e+00 | 0 | 2.620000e+00 | 5.240000e+00 | 7.710000e+00 | 2.803000e+01 |
| TrackerDistance | 0 | 1 | 5.480000e+00 | 3.910000e+00 | 0 | 2.620000e+00 | 5.240000e+00 | 7.710000e+00 | 2.803000e+01 |
| LoggedActivitiesDistance | 0 | 1 | 1.100000e-01 | 6.200000e-01 | 0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 4.940000e+00 |
| VeryActiveDistance | 0 | 1 | 1.500000e+00 | 2.660000e+00 | 0 | 0.000000e+00 | 2.100000e-01 | 2.050000e+00 | 2.192000e+01 |
| ModeratelyActiveDistance | 0 | 1 | 5.700000e-01 | 8.800000e-01 | 0 | 0.000000e+00 | 2.400000e-01 | 8.000000e-01 | 6.480000e+00 |
| LightActiveDistance | 0 | 1 | 3.340000e+00 | 2.040000e+00 | 0 | 1.950000e+00 | 3.360000e+00 | 4.780000e+00 | 1.071000e+01 |
| SedentaryActiveDistance | 0 | 1 | 0.000000e+00 | 1.000000e-02 | 0 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 1.100000e-01 |
| VeryActiveMinutes | 0 | 1 | 2.116000e+01 | 3.284000e+01 | 0 | 0.000000e+00 | 4.000000e+00 | 3.200000e+01 | 2.100000e+02 |
| FairlyActiveMinutes | 0 | 1 | 1.356000e+01 | 1.999000e+01 | 0 | 0.000000e+00 | 6.000000e+00 | 1.900000e+01 | 1.430000e+02 |
| LightlyActiveMinutes | 0 | 1 | 1.928100e+02 | 1.091700e+02 | 0 | 1.270000e+02 | 1.990000e+02 | 2.640000e+02 | 5.180000e+02 |
| SedentaryMinutes | 0 | 1 | 9.912100e+02 | 3.012700e+02 | 0 | 7.297500e+02 | 1.057500e+03 | 1.229500e+03 | 1.440000e+03 |
| Calories | 0 | 1 | 2.303610e+03 | 7.181700e+02 | 0 | 1.828500e+03 | 2.134000e+03 | 2.793250e+03 | 4.900000e+03 |
skim_without_charts(hourly_calories)
| Name | hourly_calories |
| Number of rows | 22099 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| ActivityHour | 0 | 1 | 19 | 21 | 0 | 736 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| Id | 0 | 1 | 4.848235e+09 | 2.4225e+09 | 1503960366 | 2320127002 | 4445114986 | 6962181067 | 8877689391 |
| Calories | 0 | 1 | 9.739000e+01 | 6.0700e+01 | 42 | 63 | 83 | 108 | 948 |
skim_without_charts(hourly_steps)
| Name | hourly_steps |
| Number of rows | 22099 |
| Number of columns | 3 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 2 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| ActivityHour | 0 | 1 | 19 | 21 | 0 | 736 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| Id | 0 | 1 | 4.848235e+09 | 2.4225e+09 | 1503960366 | 2320127002 | 4445114986 | 6962181067 | 8877689391 |
| StepTotal | 0 | 1 | 3.201700e+02 | 6.9038e+02 | 0 | 0 | 40 | 357 | 10554 |
skim_without_charts(sleep_day)
| Name | sleep_day |
| Number of rows | 413 |
| Number of columns | 5 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 4 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| SleepDay | 0 | 1 | 20 | 21 | 0 | 31 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| Id | 0 | 1 | 5.000979e+09 | 2.06036e+09 | 1503960366 | 3977333714 | 4702921684 | 6962181067 | 8792009665 |
| TotalSleepRecords | 0 | 1 | 1.120000e+00 | 3.50000e-01 | 1 | 1 | 1 | 1 | 3 |
| TotalMinutesAsleep | 0 | 1 | 4.194700e+02 | 1.18340e+02 | 58 | 361 | 433 | 490 | 796 |
| TotalTimeInBed | 0 | 1 | 4.586400e+02 | 1.27100e+02 | 61 | 403 | 463 | 526 | 961 |
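The skim output reports n_missing = 0 for every column. As a lighter cross-check, the same question can be asked in base R. The sketch below uses a small hypothetical data frame (`toy`, with made-up gaps, not the Fitbit files) purely to show the pattern; on the real data one would pass `daily_activity` and the other imported objects instead.

```r
# Hypothetical toy data frame standing in for one of the imported files;
# the NA values are invented for illustration only
toy <- data.frame(
  id = c(1, 1, 2, 2),
  total_steps = c(13162, NA, 10460, 9762),
  calories = c(1985, 1797, NA, NA)
)

# Count missing values per column
missing_per_column <- colSums(is.na(toy))
print(missing_per_column)

# TRUE only when no column contains an NA
no_missing <- !anyNA(toy)
print(no_missing)
```

A column-wise count like this is quick to read, while skim adds the distribution summary in the same call, so the two complement each other.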
Making column names consistent and lower case
daily_activity <- daily_activity %>% clean_names()
sleep_day <- sleep_day %>% clean_names()
hourly_calories <- hourly_calories %>% clean_names()
hourly_steps <- hourly_steps %>% clean_names()
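To make the effect of the renaming step concrete, here is a rough base R approximation of the CamelCase to snake_case conversion. It is not janitor's actual algorithm (which handles many more cases); the helper name `to_snake` is hypothetical and only covers simple names like the ones in these files.

```r
# Column names as they appear in dailyActivity_merged.csv
raw_names <- c("Id", "ActivityDate", "TotalSteps", "LoggedActivitiesDistance")

# Rough approximation, NOT janitor's implementation: insert "_" between a
# lower-case letter or digit and the upper-case letter that follows it,
# then convert everything to lower case
to_snake <- function(x) {
  tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", x, perl = TRUE))
}

snake_names <- to_snake(raw_names)
print(snake_names)
```

This is why later code refers to columns such as `total_steps` and `activity_date` rather than `TotalSteps` and `ActivityDate`.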
Checking the number of distinct users in each dataset
n_distinct(daily_activity$id) # 33 unique users
## [1] 33
n_distinct(sleep_day$id) # 24 unique users
## [1] 24
n_distinct(hourly_calories$id) # 33 unique users
## [1] 33
n_distinct(hourly_steps$id) # 33 unique users
## [1] 33
Checking for duplicate records in each dataset
sum(duplicated(daily_activity))
## [1] 0
sum(duplicated(sleep_day)) # 3 duplicates
## [1] 3
sum(duplicated(hourly_calories))
## [1] 0
sum(duplicated(hourly_steps))
## [1] 0
sleep_day[duplicated(sleep_day), ] # 3 duplicated rows
## id sleep_day total_sleep_records total_minutes_asleep
## 162 4388161847 5/5/2016 12:00:00 AM 1 471
## 224 4702921684 5/7/2016 12:00:00 AM 1 520
## 381 8378563200 4/25/2016 12:00:00 AM 1 388
## total_time_in_bed
## 162 495
## 224 543
## 381 402
Removing duplicated rows in the ‘sleep_day’ dataset
sleep_day <- sleep_day %>%
  distinct() %>% # drop duplicate rows
  drop_na() # drop rows with missing values
sum(duplicated(sleep_day)) # zero (0) duplicates
## [1] 0
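The same de-duplication can be sketched in base R. The small data frame below is illustrative: it reuses one of the duplicated sleep records reported in the output and one other record, arranged by hand, so the row positions are an assumption and do not match the real file.

```r
# Illustrative sample: the first record appears twice, as in the real data
sleep_sample <- data.frame(
  id = c(4388161847, 4388161847, 4702921684),
  sleep_day = c("5/5/2016 12:00:00 AM", "5/5/2016 12:00:00 AM",
                "5/7/2016 12:00:00 AM"),
  total_minutes_asleep = c(471, 471, 520),
  total_time_in_bed = c(495, 495, 543)
)

# duplicated() flags every row that repeats an earlier row exactly
n_dupes <- sum(duplicated(sleep_sample))

# Keep only the first occurrence of each row
deduped <- sleep_sample[!duplicated(sleep_sample), ]
print(n_dupes)
print(nrow(deduped))
```

Because `duplicated()` compares whole rows, two nights with the same duration for different users are not treated as duplicates.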
Renaming ‘activity_date’, ‘activity_hour’, and ‘sleep_day’ to ‘date’ for readability and converting the renamed columns to a date datatype
# rename 'activity_date' to 'date' and convert it to the Date datatype
daily_activity <- daily_activity %>%
  rename(date = activity_date) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))
# rename 'sleep_day' to 'date' and convert it to the Date datatype
sleep_day <- sleep_day %>%
  rename(date = sleep_day) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))
# rename 'activity_hour' to 'date' and parse it as a date-time
hourly_calories <- hourly_calories %>%
  rename(date = activity_hour) %>%
  mutate(date = mdy_hms(date))
hourly_calories$hr <- format(as.POSIXct(hourly_calories$date), "%H") # create hour of the day column
# rename 'activity_hour' to 'date' and convert it to the Date datatype
# note: as.Date() keeps only the calendar date, so the hour of day is dropped here
hourly_steps <- hourly_steps %>%
  rename(date = activity_hour) %>%
  mutate(date = as.Date(date, format = "%m/%d/%Y"))
# adding 'weekday' column to represent day of the week
daily_activity <- daily_activity %>%
  mutate(weekday = lubridate::wday(date, label = TRUE, abbr = FALSE))
hourly_calories <- hourly_calories %>%
  mutate(weekday = lubridate::wday(date, label = TRUE, abbr = FALSE))
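The two conversion styles used here behave differently on the hourly timestamps, which matters for any later hour-level work. The base R sketch below parses one timestamp in the files' format both ways. The `LC_TIME` setting and the `tz = "UTC"` choice are assumptions made so the example behaves the same on any machine; the files themselves carry no time zone.

```r
# Use the C locale so "AM"/"PM" and day names parse and print consistently
invisible(Sys.setlocale("LC_TIME", "C"))

stamp <- "4/12/2016 1:00:00 AM" # format used in the hourly files

# as.Date() reads only the fields named in `format`; the time is ignored
as_date <- as.Date(stamp, format = "%m/%d/%Y")

# as.POSIXct() with a full format keeps the time of day
# (tz = "UTC" is an assumption for this sketch)
as_time <- as.POSIXct(stamp, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

hour_of_day <- format(as_time, "%H") # two-digit hour, like the 'hr' column
day_name <- format(as_date, "%A")    # weekday name

print(as_date)
print(hour_of_day)
print(day_name)
```

A column converted with `as.Date()` can still be grouped by day or weekday, but it can no longer distinguish one hour from another.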
Combining the datasets to find trends
# inner join 'daily_activity' and 'sleep_day' on id and date column
merged_daily <- merge(daily_activity, sleep_day, by= c("id", "date"))
merged_daily %>% head(n = 10)
## id date total_steps total_distance tracker_distance
## 1 1503960366 2016-04-12 13162 8.50 8.50
## 2 1503960366 2016-04-13 10735 6.97 6.97
## 3 1503960366 2016-04-15 9762 6.28 6.28
## 4 1503960366 2016-04-16 12669 8.16 8.16
## 5 1503960366 2016-04-17 9705 6.48 6.48
## 6 1503960366 2016-04-19 15506 9.88 9.88
## 7 1503960366 2016-04-20 10544 6.68 6.68
## 8 1503960366 2016-04-21 9819 6.34 6.34
## 9 1503960366 2016-04-23 14371 9.04 9.04
## 10 1503960366 2016-04-24 10039 6.41 6.41
## logged_activities_distance very_active_distance moderately_active_distance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.14 1.26
## 4 0 2.71 0.41
## 5 0 3.19 0.78
## 6 0 3.53 1.32
## 7 0 1.96 0.48
## 8 0 1.34 0.35
## 9 0 2.81 0.87
## 10 0 2.92 0.21
## light_active_distance sedentary_active_distance very_active_minutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 2.83 0 29
## 4 5.04 0 36
## 5 2.51 0 38
## 6 5.03 0 50
## 7 4.24 0 28
## 8 4.65 0 19
## 9 5.36 0 41
## 10 3.28 0 39
## fairly_active_minutes lightly_active_minutes sedentary_minutes calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 34 209 726 1745
## 4 10 221 773 1863
## 5 20 164 539 1728
## 6 31 264 775 2035
## 7 12 205 818 1786
## 8 8 211 838 1775
## 9 21 262 732 1949
## 10 5 238 709 1788
## weekday total_sleep_records total_minutes_asleep total_time_in_bed
## 1 Tuesday 1 327 346
## 2 Wednesday 2 384 407
## 3 Friday 1 412 442
## 4 Saturday 2 340 367
## 5 Sunday 1 700 712
## 6 Tuesday 1 304 320
## 7 Wednesday 1 360 377
## 8 Thursday 1 325 364
## 9 Saturday 1 361 384
## 10 Sunday 1 430 449
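# inner join 'hourly_steps' and 'hourly_calories' on id and date column
# note: hourly_steps$date holds only the calendar date, so each steps row pairs with
# every calories row of the same day; step_total repeats below and is not an hourly value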
merged_hour <- merge(hourly_steps, hourly_calories, by = c("id", "date"))
merged_hour %>% head(n = 10)
## id date step_total calories hr weekday
## 1 1503960366 2016-04-12 373 81 00 Tuesday
## 2 1503960366 2016-04-12 373 61 01 Tuesday
## 3 1503960366 2016-04-12 373 59 02 Tuesday
## 4 1503960366 2016-04-12 373 47 03 Tuesday
## 5 1503960366 2016-04-12 373 48 04 Tuesday
## 6 1503960366 2016-04-12 373 48 05 Tuesday
## 7 1503960366 2016-04-12 373 48 06 Tuesday
## 8 1503960366 2016-04-12 373 47 07 Tuesday
## 9 1503960366 2016-04-12 373 68 08 Tuesday
## 10 1503960366 2016-04-12 373 141 09 Tuesday
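In the output above, step_total stays at 373 while calories changes from hour to hour, which is what a day-level key produces when both tables are hourly. A minimal base R sketch of a one-to-one hourly join follows, using the six rows shown by head(hourly_steps) and head(hourly_calories). The helper `parse_stamp`, the `tz = "UTC"` choice, and the object names are assumptions for illustration, not part of the analysis above.

```r
# Use the C locale so "AM"/"PM" parse consistently
invisible(Sys.setlocale("LC_TIME", "C"))

stamps <- c("4/12/2016 12:00:00 AM", "4/12/2016 1:00:00 AM",
            "4/12/2016 2:00:00 AM", "4/12/2016 3:00:00 AM",
            "4/12/2016 4:00:00 AM", "4/12/2016 5:00:00 AM")

# Hypothetical helper: parse the files' timestamp format, keeping the hour
parse_stamp <- function(x) {
  as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
}

# First six hourly records for one user, taken from the previews
steps <- data.frame(id = 1503960366, date = parse_stamp(stamps),
                    step_total = c(373, 160, 151, 0, 0, 0))
calories <- data.frame(id = 1503960366, date = parse_stamp(stamps),
                       calories = c(81, 61, 59, 47, 48, 48))

# Joining on id and the full timestamp gives exactly one match per hour
hourly <- merge(steps, calories, by = c("id", "date"))
hourly <- hourly[order(hourly$id, hourly$date), ]
print(hourly)
```

With a timestamp key the joined table has as many rows as there are hours, so per-hour steps and calories line up and can be compared directly.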
Overview of the general statistics of merged_daily
merged_daily %>% summary()
## id date total_steps total_distance
## Min. :1.504e+09 Min. :2016-04-12 Min. : 17 Min. : 0.010
## 1st Qu.:3.977e+09 1st Qu.:2016-04-19 1st Qu.: 5189 1st Qu.: 3.592
## Median :4.703e+09 Median :2016-04-27 Median : 8913 Median : 6.270
## Mean :4.995e+09 Mean :2016-04-26 Mean : 8515 Mean : 6.012
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:11370 3rd Qu.: 8.005
## Max. :8.792e+09 Max. :2016-05-12 Max. :22770 Max. :17.540
##
## tracker_distance logged_activities_distance very_active_distance
## Min. : 0.010 Min. :0.0000 Min. : 0.000
## 1st Qu.: 3.592 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 6.270 Median :0.0000 Median : 0.570
## Mean : 6.007 Mean :0.1089 Mean : 1.446
## 3rd Qu.: 7.950 3rd Qu.:0.0000 3rd Qu.: 2.360
## Max. :17.540 Max. :4.0817 Max. :12.540
##
## moderately_active_distance light_active_distance sedentary_active_distance
## Min. :0.0000 Min. :0.010 Min. :0.0000000
## 1st Qu.:0.0000 1st Qu.:2.540 1st Qu.:0.0000000
## Median :0.4200 Median :3.665 Median :0.0000000
## Mean :0.7439 Mean :3.791 Mean :0.0009268
## 3rd Qu.:1.0375 3rd Qu.:4.918 3rd Qu.:0.0000000
## Max. :6.4800 Max. :9.480 Max. :0.1100000
##
## very_active_minutes fairly_active_minutes lightly_active_minutes
## Min. : 0.00 Min. : 0.00 Min. : 2.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:158.0
## Median : 9.00 Median : 11.00 Median :208.0
## Mean : 25.05 Mean : 17.92 Mean :216.5
## 3rd Qu.: 38.00 3rd Qu.: 26.75 3rd Qu.:263.0
## Max. :210.00 Max. :143.00 Max. :518.0
##
## sedentary_minutes calories weekday total_sleep_records
## Min. : 0.0 Min. : 257 Sunday :55 Min. :1.00
## 1st Qu.: 631.2 1st Qu.:1841 Monday :46 1st Qu.:1.00
## Median : 717.0 Median :2207 Tuesday :65 Median :1.00
## Mean : 712.1 Mean :2389 Wednesday:66 Mean :1.12
## 3rd Qu.: 782.8 3rd Qu.:2920 Thursday :64 3rd Qu.:1.00
## Max. :1265.0 Max. :4900 Friday :57 Max. :3.00
## Saturday :57
## total_minutes_asleep total_time_in_bed
## Min. : 58.0 Min. : 61.0
## 1st Qu.:361.0 1st Qu.:403.8
## Median :432.5 Median :463.0
## Mean :419.2 Mean :458.5
## 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :796.0 Max. :961.0
##
Observations:
On average, users take 8,515 steps per day and cover 6.01 km, which is below the commonly cited target of 10,000 steps per day.
Users are sedentary (idle) for 712 minutes per day on average, which is 11 hours 52 minutes.
Users burn an average of 2,389 calories per day, roughly equivalent to 0.31 kg of body fat under the common approximation of 7,700 kcal per kg.
Participants are lightly active for 216 minutes per day on average, which is 3 hours 36 minutes.
On average, participants sleep for 6 hours 59 minutes and spend 7 hours 38 minutes in bed.
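The unit conversions behind these observations can be reproduced with a few lines of base R. The helper `to_hours_minutes` is a hypothetical name, and the figure of 7,700 kcal per kg of body fat is a widely used approximation adopted here as an assumption; the input values are the means from the summary output.

```r
# Hypothetical helper: convert minutes to "H h M min", truncating to whole minutes
to_hours_minutes <- function(minutes) {
  whole <- floor(minutes)
  sprintf("%d h %d min", as.integer(whole %/% 60), as.integer(whole %% 60))
}

sedentary <- to_hours_minutes(712.1)      # mean sedentary_minutes
lightly_active <- to_hours_minutes(216.5) # mean lightly_active_minutes
asleep <- to_hours_minutes(419.2)         # mean total_minutes_asleep
in_bed <- to_hours_minutes(458.5)         # mean total_time_in_bed

# Assumption: about 7,700 kcal corresponds to 1 kg of body fat
kcal_per_kg <- 7700
kg_equivalent <- round(2389 / kcal_per_kg, 2) # mean calories

print(c(sedentary, lightly_active, asleep, in_bed))
print(kg_equivalent)
```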
Activity Duration
# daily activity by user
daily_activity %>%
  group_by(id) %>%
  summarize(fairly_active = sum(fairly_active_minutes),
            lightly_active = sum(lightly_active_minutes),
            very_active = sum(very_active_minutes),
            sedentary = sum(sedentary_minutes))
## # A tibble: 33 × 5
## id fairly_active lightly_active very_active sedentary
## <dbl> <int> <int> <int> <int>
## 1 1503960366 594 6818 1200 26293
## 2 1624580081 180 4758 269 38990
## 3 1644430081 641 5354 287 34856
## 4 1844505072 40 3579 4 37405
## 5 1927972279 24 1196 41 40840
## 6 2022484408 600 7981 1125 34490
## 7 2026352035 8 7956 3 21372
## 8 2320127002 80 6144 42 37823
## 9 2347167796 370 4545 243 12369
## 10 2873212765 190 9548 437 34013
## # ℹ 23 more rows
Activity by Day of the Week
# daily activity by day of the week
merged_daily %>%
  group_by(weekday) %>%
  summarize(fairly_active = sum(fairly_active_minutes),
            lightly_active = sum(lightly_active_minutes),
            very_active = sum(very_active_minutes),
            sedentary = sum(sedentary_minutes))
## # A tibble: 7 × 5
## weekday fairly_active lightly_active very_active sedentary
## <ord> <int> <int> <int> <int>
## 1 Sunday 922 11002 1218 37820
## 2 Monday 878 10229 1413 33047
## 3 Tuesday 1303 14078 1990 48103
## 4 Wednesday 1105 13726 1408 47154
## 5 Thursday 1015 12988 1463 44696
## 6 Friday 831 12693 1206 42356
## 7 Saturday 1295 14066 1571 38785
Avg. Activity by Day of the Week
# Average daily activity by day of the week
merged_daily %>%
  group_by(weekday) %>%
  summarize(fairly_active = mean(fairly_active_minutes),
            lightly_active = mean(lightly_active_minutes),
            very_active = mean(very_active_minutes),
            sedentary = mean(sedentary_minutes))
## # A tibble: 7 × 5
## weekday fairly_active lightly_active very_active sedentary
## <ord> <dbl> <dbl> <dbl> <dbl>
## 1 Sunday 16.8 200. 22.1 688.
## 2 Monday 19.1 222. 30.7 718.
## 3 Tuesday 20.0 217. 30.6 740.
## 4 Wednesday 16.7 208. 21.3 714.
## 5 Thursday 15.9 203. 22.9 698.
## 6 Friday 14.6 223. 21.2 743.
## 7 Saturday 22.7 247. 27.6 680.
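The weekday totals and weekday averages can rank days differently because the number of records per weekday is not equal. The base R sketch below recomputes the sedentary averages from the totals shown earlier and the per-weekday record counts in the summary() output (Sunday 55 through Saturday 57); the vector names are illustrative.

```r
weekday <- c("Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday")

# Total sedentary minutes per weekday, from the summed table
sedentary_total <- c(37820, 33047, 48103, 47154, 44696, 42356, 38785)

# Number of merged_daily records per weekday, from the summary() output
n_records <- c(55, 46, 65, 66, 64, 57, 57)

# Mean = total / count, which reproduces the averaged table
sedentary_mean <- sedentary_total / n_records
names(sedentary_total) <- weekday
names(sedentary_mean) <- weekday
print(round(sedentary_mean))

# The day with the largest total is not the day with the largest average
largest_total <- names(which.max(sedentary_total))
largest_mean <- names(which.max(sedentary_mean))
print(c(largest_total, largest_mean))
```

Tuesday has the largest total partly because it has more records, while Friday has the highest average, so the averaged table is the safer basis for comparing days.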
Daily Avg. of Total Minutes Asleep and Total Time in Bed by Day of the Week
# How long do users spend asleep and in bed?
merged_daily %>%
  group_by(weekday) %>%
  summarize(total_minutes_asleep = mean(total_minutes_asleep),
            total_time_in_bed = mean(total_time_in_bed))
## # A tibble: 7 × 3
## weekday total_minutes_asleep total_time_in_bed
## <ord> <dbl> <dbl>
## 1 Sunday 453. 504.
## 2 Monday 420. 457.
## 3 Tuesday 405. 443.
## 4 Wednesday 435. 470.
## 5 Thursday 401. 435.
## 6 Friday 405. 445.
## 7 Saturday 419. 460.
Daily Avg. of Total Distance, Calories Burnt & Total Steps by Day of the Week
merged_daily %>%
  group_by(weekday) %>%
  summarize(Avg.distance = mean(total_distance),
            Avg.Calories = mean(calories),
            Avg.Totalstep = mean(total_steps))
## # A tibble: 7 × 4
## weekday Avg.distance Avg.Calories Avg.Totalstep
## <ord> <dbl> <dbl> <dbl>
## 1 Sunday 5.18 2277. 7298.
## 2 Monday 6.54 2432. 9273.
## 3 Tuesday 6.43 2496. 9183.
## 4 Wednesday 5.72 2378. 8023.
## 5 Thursday 5.77 2307. 8184.
## 6 Friday 5.51 2330. 7901.
## 7 Saturday 7.02 2507. 9871.
Avg. Calories Burnt By Hour of Day
merged_hour %>%
  group_by(hr) %>%
  summarize(Avg.Calories = mean(calories))
## # A tibble: 24 × 2
## hr Avg.Calories
## <chr> <dbl>
## 1 00 71.8
## 2 01 70.2
## 3 02 69.2
## 4 03 67.5
## 5 04 68.3
## 6 05 81.9
## 7 06 86.9
## 8 07 94.4
## 9 08 103.
## 10 09 106.
## # ℹ 14 more rows
In this final phase, we will answer the key business questions and provide recommendations based on our analysis to guide Bellabeat’s marketing strategy.
What are some trends in smart device usage?
Users spend 81.2% of their tracked time inactive.
There is a positive relationship between the total number of steps and the total number of calories burned: the more steps a user takes, the more calories they burn.
Users start their day between 6 am and 8 am. They are most active from 12 pm to 2 pm and from 5 pm to 7 pm, and become less active from 8 pm.
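The first two findings can be sanity-checked with numbers already shown in this report. The base R sketch below is illustrative only: the inactive share is recomputed from the rounded per-day means in the skim summary of daily_activity, so it lands near, not exactly on, the quoted figure, and the correlation uses just the 15 previewed rows of a single user rather than the full dataset.

```r
# Mean minutes per day in each intensity band (skim summary of daily_activity)
mean_minutes <- c(very_active = 21.16, fairly_active = 13.56,
                  lightly_active = 192.81, sedentary = 991.21)

# Share of tracked minutes that are sedentary, in percent
sedentary_share <- 100 * mean_minutes[["sedentary"]] / sum(mean_minutes)
print(round(sedentary_share, 1))

# First 15 rows of daily_activity (one user), from the head() preview
total_steps <- c(13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506,
                 10544, 9819, 12764, 14371, 10039, 15355, 13755)
calories <- c(1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035,
              1786, 1775, 1827, 1949, 1788, 2013, 1970)

# Pearson correlation; a value above 0 indicates a positive relationship
steps_calories_cor <- cor(total_steps, calories)
print(steps_calories_cor)
```

Running the same `cor()` call on the full `daily_activity` columns would be the natural next step, since one user's two weeks cannot speak for all participants.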
The recommendations below are intended to support Bellabeat’s marketing strategy:
A timer can be added to the Bellabeat app to remind users to take a few steps after a certain period of inactivity.
A fitness challenge group can be added as a new feature in which the user’s friends or family compete to finish weekly goals, especially on weekends, with digital tokens awarded to winners.
Short, intense exercise or jogging sessions should be offered as a feature, especially in the morning between 6 am and 7 am, since most users become active around this time.
A customer satisfaction survey can be conducted weekly, using tracked data from the previous week, to assess the causes of inactive periods, since a user might have been sick.
A User Nearby feature can be added as a premium option that lets users search for a running partner near their location. This feature is fun for users and generates revenue for Bellabeat.