The business task was to analyse smart device usage data to gain insight into how consumers use non-Bellabeat smart devices, and then to select one Bellabeat product to apply these insights to in the presentation. The business questions are:
What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?
This is a public dataset that explores smart device users’ daily habits.
● FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius) contains personal fitness tracker data from thirty Fitbit users (the daily activity file loaded below lists 33 distinct Ids). Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
Reliable: low, because the data were collected from users without demographic information.
Original: low, because the data were collected by a third party through Amazon Mechanical Turk.
Comprehensive: high, because the data contain personal health metrics that allowed me to answer the business questions.
Current: low, because the data were collected between 04.12.2016 and 05.12.2016 (12 April to 12 May 2016).
Cited: high, because the data source is well documented.
Urška Sršen: Bellabeat’s co-founder and Chief Creative Officer.
Sando Mur: mathematician and Bellabeat’s co-founder.
Bellabeat marketing analytics team.
To start, let’s set up the environment by loading the libraries needed for the analysis.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(ggplot2)
library(lubridate)
library(readr)
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
Here we use three datasets covering 12 April 2016 to 12 May 2016 (4.12.16 to 5.12.16): daily activity, daily sleep, and the weight log.
data1 <- read.csv("Raw datsets for Analysis/bellabelt case study dataset/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
data2<- read.csv("Raw datsets for Analysis/bellabelt case study dataset/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
data3<- read.csv("Raw datsets for Analysis/bellabelt case study dataset/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
str(data1)
## 'data.frame': 940 obs. of 15 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
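The `str()` output shows `Id` as `1.5e+09` because `read.csv()` reads it as a number. One option, not used in this analysis, is to read the identifier as text so it always prints in full and cannot be altered by numeric formatting. The sketch below uses a small inline CSV in place of the real file; the second user's step count is a made-up value for illustration.

```r
# Inline stand-in for dailyActivity_merged.csv, trimmed to three columns.
# The first two rows copy values from the str() output; the third row's
# step count is hypothetical.
csv_text <- "Id,ActivityDate,TotalSteps
1503960366,4/12/2016,13162
1503960366,4/13/2016,10735
1624580081,4/12/2016,8000"

activity <- read.csv(
  text = csv_text,
  colClasses = c(Id = "character"),  # keep the identifier as text, not a number
  stringsAsFactors = FALSE
)

str(activity)
```

Columns not named in `colClasses` are still type-detected automatically, so `TotalSteps` remains an integer.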
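data1 <- data1 %>%
  distinct() %>%                                   # drop exact duplicate rows
  mutate(Date = lubridate::mdy(ActivityDate)) %>%  # parse "4/12/2016" into a Date
  fill(everything(), .direction = "down") %>%      # fill any NA with the value from the row above
  select(-ActivityDate)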
glimpse(data1)
## Rows: 940
## Columns: 15
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ TotalSteps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-…
# Check for missing values
sum(is.na(data1))
## [1] 0
# List the distinct user Ids and dates
unique(data1$Id)
## [1] 1503960366 1624580081 1644430081 1844505072 1927972279 2022484408
## [7] 2026352035 2320127002 2347167796 2873212765 3372868164 3977333714
## [13] 4020332650 4057192912 4319703577 4388161847 4445114986 4558609924
## [19] 4702921684 5553957443 5577150313 6117666160 6290855005 6775888955
## [25] 6962181067 7007744171 7086361926 8053475328 8253242879 8378563200
## [31] 8583815059 8792009665 8877689391
unique(data1$Date)
## [1] "2016-04-12" "2016-04-13" "2016-04-14" "2016-04-15" "2016-04-16"
## [6] "2016-04-17" "2016-04-18" "2016-04-19" "2016-04-20" "2016-04-21"
## [11] "2016-04-22" "2016-04-23" "2016-04-24" "2016-04-25" "2016-04-26"
## [16] "2016-04-27" "2016-04-28" "2016-04-29" "2016-04-30" "2016-05-01"
## [21] "2016-05-02" "2016-05-03" "2016-05-04" "2016-05-05" "2016-05-06"
## [26] "2016-05-07" "2016-05-08" "2016-05-09" "2016-05-10" "2016-05-11"
## [31] "2016-05-12"
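Reading the printed vectors works, but counting is less error-prone: the `unique()` output above lists 33 Ids, which is easy to miss by eye. A minimal sketch of the same check on a toy data frame follows; the Ids `A`, `B`, and `C` are placeholders, not real users.

```r
# Toy stand-in for data1: three placeholder users with a few tracked days each.
activity <- data.frame(
  Id   = c("A", "A", "A", "B", "B", "C"),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-14",
                   "2016-04-12", "2016-04-13", "2016-04-12"))
)

n_users       <- length(unique(activity$Id))  # number of distinct users
days_per_user <- table(activity$Id)           # tracked days for each user
date_range    <- range(activity$Date)         # first and last tracked date

print(n_users)
print(days_per_user)
print(date_range)
```

The per-user counts also show how evenly users contributed, which matters when averages are later taken across all rows.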
str(data2)
## 'data.frame': 413 obs. of 5 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : int 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: int 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : int 346 407 442 367 712 320 377 364 384 449 ...
# Parse Date from SleepDay
data2 <- data2 %>%
  distinct() %>%                                   # removes 3 exact duplicate rows (413 -> 410)
  mutate(Date = lubridate::mdy_hms(SleepDay)) %>%  # parse "4/12/2016 12:00:00 AM" into a date-time
  fill(everything(), .direction = "down") %>%
  select(-SleepDay)
glimpse(data2)
## Rows: 410
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150…
## $ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2…
## $ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3…
## $ Date <dttm> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-16, 20…
# Check for missing values
sum(is.na(data2))
## [1] 0
unique(data2$Id)
## [1] 1503960366 1644430081 1844505072 1927972279 2026352035 2320127002
## [7] 2347167796 3977333714 4020332650 4319703577 4388161847 4445114986
## [13] 4558609924 4702921684 5553957443 5577150313 6117666160 6775888955
## [19] 6962181067 7007744171 7086361926 8053475328 8378563200 8792009665
unique(data2$Date)
## [1] "2016-04-12 UTC" "2016-04-13 UTC" "2016-04-15 UTC" "2016-04-16 UTC"
## [5] "2016-04-17 UTC" "2016-04-19 UTC" "2016-04-20 UTC" "2016-04-21 UTC"
## [9] "2016-04-23 UTC" "2016-04-24 UTC" "2016-04-25 UTC" "2016-04-26 UTC"
## [13] "2016-04-28 UTC" "2016-04-29 UTC" "2016-04-30 UTC" "2016-05-01 UTC"
## [17] "2016-05-02 UTC" "2016-05-03 UTC" "2016-05-05 UTC" "2016-05-06 UTC"
## [21] "2016-05-07 UTC" "2016-05-08 UTC" "2016-05-09 UTC" "2016-05-10 UTC"
## [25] "2016-05-11 UTC" "2016-04-14 UTC" "2016-04-22 UTC" "2016-04-27 UTC"
## [29] "2016-05-04 UTC" "2016-05-12 UTC" "2016-04-18 UTC"
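The sleep file stores its day as a date-time string, so `Date` here is a `dttm`, while the activity table holds a plain `date`. A sketch of converting such strings to plain dates in base R is shown below, so that join keys share one class. The helper name `parse_fitabase` is hypothetical, and the `%p` (AM/PM) specifier depends on the session locale.

```r
# Timestamp strings in the format shown in the str() output of the sleep
# and weight files.
sleep_day  <- c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM")
weight_log <- c("5/2/2016 11:59:59 PM", "4/13/2016 1:08:52 AM")

# Hypothetical helper: parse month/day/year with a 12-hour clock, in UTC.
parse_fitabase <- function(x) {
  as.POSIXct(x, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
}

# Dropping the time of day leaves a Date that matches the activity table's key.
sleep_date  <- as.Date(parse_fitabase(sleep_day))
weight_date <- as.Date(parse_fitabase(weight_log))

print(sleep_date)
print(weight_date)
```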
str(data3)
## 'data.frame': 67 obs. of 8 variables:
## $ Id : num 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
## $ Date : chr "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
## $ WeightKg : num 52.6 52.6 133.5 56.7 57.3 ...
## $ WeightPounds : num 116 116 294 125 126 ...
## $ Fat : int 22 NA NA NA NA 25 NA NA NA NA ...
## $ BMI : num 22.6 22.6 47.5 21.5 21.7 ...
## $ IsManualReport: chr "True" "True" "False" "True" ...
## $ LogId : num 1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...
sum(is.na(data3))
## [1] 65
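A single total says how many values are missing but not where. Counting per column makes it clear which variables to drop or keep. The sketch below uses the first three rows shown in the `str()` output, trimmed to four columns.

```r
# First three weight-log rows from the str() output above (subset of columns).
weight <- data.frame(
  Id       = c("1503960366", "1503960366", "1927972279"),
  WeightKg = c(52.6, 52.6, 133.5),
  Fat      = c(22, NA, NA),
  BMI      = c(22.65, 22.65, 47.54)
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(weight))
print(na_per_column)
```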
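data3$datetime <- mdy_hms(data3$Date)  # parse "5/2/2016 11:59:59 PM" into a date-time
# Drop the columns not used in the analysis; this also removes every missing value
data3 <- data3 %>%
  select(-Fat, -Date, -LogId, -WeightPounds, -IsManualReport)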
glimpse(data3)
## Rows: 67
## Columns: 4
## $ Id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212765, 4…
## $ WeightKg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, 69.9, …
## $ BMI <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25, 27.46…
## $ datetime <dttm> 2016-05-02 23:59:59, 2016-05-03 23:59:59, 2016-04-13 01:08:5…
data3$Date <- as.Date(data3$datetime)
data3$Time <- format(data3$datetime, "%H:%M:%S")
glimpse(data3)
## Rows: 67
## Columns: 6
## $ Id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212765, 4…
## $ WeightKg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, 69.9, …
## $ BMI <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25, 27.46…
## $ datetime <dttm> 2016-05-02 23:59:59, 2016-05-03 23:59:59, 2016-04-13 01:08:5…
## $ Date <date> 2016-05-02, 2016-05-03, 2016-04-13, 2016-04-21, 2016-05-12, …
## $ Time <chr> "23:59:59", "23:59:59", "01:08:52", "23:59:59", "23:59:59", "…
data3 <- data3 %>%
  arrange(Date) %>%
  select(-datetime)
glimpse(data3)
## Rows: 67
## Columns: 5
## $ Id <dbl> 6962181067, 8877689391, 1927972279, 6962181067, 8877689391, 6…
## $ WeightKg <dbl> 62.5, 85.8, 133.5, 62.1, 84.9, 61.7, 84.5, 61.5, 62.0, 85.5, …
## $ BMI <dbl> 24.39, 25.68, 47.54, 24.24, 25.41, 24.10, 25.31, 24.00, 24.21…
## $ Date <date> 2016-04-12, 2016-04-12, 2016-04-13, 2016-04-13, 2016-04-13, …
## $ Time <chr> "23:59:59", "06:47:11", "01:08:52", "23:59:59", "06:55:00", "…
sum(is.na(data3))
## [1] 0
unique(data3$Id)
## [1] 6962181067 8877689391 1927972279 4319703577 5577150313 4558609924 2873212765
## [8] 1503960366
unique(data3$Date)
## [1] "2016-04-12" "2016-04-13" "2016-04-14" "2016-04-15" "2016-04-16"
## [6] "2016-04-17" "2016-04-18" "2016-04-19" "2016-04-20" "2016-04-21"
## [11] "2016-04-22" "2016-04-23" "2016-04-24" "2016-04-25" "2016-04-26"
## [16] "2016-04-27" "2016-04-28" "2016-04-29" "2016-04-30" "2016-05-01"
## [21] "2016-05-02" "2016-05-03" "2016-05-04" "2016-05-05" "2016-05-06"
## [26] "2016-05-07" "2016-05-08" "2016-05-09" "2016-05-10" "2016-05-11"
## [31] "2016-05-12"
#Analyze
# Full outer join of the activity, sleep, and weight tables on Id and Date
final_df <- merge(merge(data1, data2, by = c('Id', 'Date'), all = TRUE),
                  data3, by = c('Id', 'Date'), all = TRUE)
glimpse(final_df)
## Rows: 940
## Columns: 21
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-…
## $ TotalSteps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ LightActiveDistance <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
## $ TotalSleepRecords <int> 1, 2, NA, 1, 2, 1, NA, 1, 1, 1, NA, 1, 1, 1, …
## $ TotalMinutesAsleep <int> 327, 384, NA, 412, 340, 700, NA, 304, 360, 32…
## $ TotalTimeInBed <int> 346, 407, NA, 442, 367, 712, NA, 320, 377, 36…
## $ WeightKg <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ BMI <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ Time <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
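The merged table keeps all 940 activity rows and fills sleep and weight columns with `NA` where a user logged nothing that day. A toy sketch of the same full outer join is given below, with a key-uniqueness check before merging. The Ids `A` and `B` and the values for user `B` are illustrative only.

```r
# Toy activity and sleep tables keyed by Id and Date.
activity <- data.frame(
  Id         = c("A", "A", "B"),
  Date       = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalSteps = c(13162L, 10735L, 5000L)
)
sleep <- data.frame(
  Id                 = c("A", "B"),
  Date               = as.Date(c("2016-04-12", "2016-04-12")),
  TotalMinutesAsleep = c(327L, 400L)
)

# A join only preserves the row count if each (Id, Date) pair appears once.
stopifnot(anyDuplicated(activity[c("Id", "Date")]) == 0)
stopifnot(anyDuplicated(sleep[c("Id", "Date")]) == 0)

# all = TRUE keeps unmatched rows from both sides (full outer join).
merged <- merge(activity, sleep, by = c("Id", "Date"), all = TRUE)
print(merged)
```

If a key were duplicated on either side, the merge would multiply rows, so comparing `nrow()` before and after is a cheap sanity check.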
View(final_df)
summary(final_df)
## Id Date TotalSteps TotalDistance
## Min. :1.504e+09 Min. :2016-04-12 Min. : 0 Min. : 0.000
## 1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.: 3790 1st Qu.: 2.620
## Median :4.445e+09 Median :2016-04-26 Median : 7406 Median : 5.245
## Mean :4.855e+09 Mean :2016-04-26 Mean : 7638 Mean : 5.490
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:10727 3rd Qu.: 7.713
## Max. :8.878e+09 Max. :2016-05-12 Max. :36019 Max. :28.030
##
## TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 2.620 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 5.245 Median :0.0000 Median : 0.210
## Mean : 5.475 Mean :0.1082 Mean : 1.503
## 3rd Qu.: 7.710 3rd Qu.:0.0000 3rd Qu.: 2.053
## Max. :28.030 Max. :4.9421 Max. :21.920
##
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.945 1st Qu.:0.000000
## Median :0.2400 Median : 3.365 Median :0.000000
## Mean :0.5675 Mean : 3.341 Mean :0.001606
## 3rd Qu.:0.8000 3rd Qu.: 4.782 3rd Qu.:0.000000
## Max. :6.4800 Max. :10.710 Max. :0.110000
##
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
##
## Calories TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. : 0 Min. :1.000 Min. : 58.0 Min. : 61.0
## 1st Qu.:1828 1st Qu.:1.000 1st Qu.:361.0 1st Qu.:403.8
## Median :2134 Median :1.000 Median :432.5 Median :463.0
## Mean :2304 Mean :1.119 Mean :419.2 Mean :458.5
## 3rd Qu.:2793 3rd Qu.:1.000 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :4900 Max. :3.000 Max. :796.0 Max. :961.0
## NA's :530 NA's :530 NA's :530
## WeightKg BMI Time
## Min. : 52.60 Min. :21.45 Length:940
## 1st Qu.: 61.40 1st Qu.:23.96 Class :character
## Median : 62.50 Median :24.39 Mode :character
## Mean : 72.04 Mean :25.19
## 3rd Qu.: 85.05 3rd Qu.:25.56
## Max. :133.50 Max. :47.54
## NA's :873 NA's :873
bella_daily <- final_df %>%
  select(Id, Date, TrackerDistance, TotalSteps, Calories) %>%
  filter(complete.cases(.))  # keep only rows with no NA in the selected columns
glimpse(bella_daily)
## Rows: 940
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-15, 2016-…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.88, 6.68, …
## $ TotalSteps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506, …
## $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, …
summary(bella_daily)
## Id Date TrackerDistance TotalSteps
## Min. :1.504e+09 Min. :2016-04-12 Min. : 0.000 Min. : 0
## 1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.: 2.620 1st Qu.: 3790
## Median :4.445e+09 Median :2016-04-26 Median : 5.245 Median : 7406
## Mean :4.855e+09 Mean :2016-04-26 Mean : 5.475 Mean : 7638
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.: 7.710 3rd Qu.:10727
## Max. :8.878e+09 Max. :2016-05-12 Max. :28.030 Max. :36019
## Calories
## Min. : 0
## 1st Qu.:1828
## Median :2134
## Mean :2304
## 3rd Qu.:2793
## Max. :4900
# One row per Id and Date already exists, so the sums below leave the values unchanged
bella_daily <- bella_daily %>%
  distinct(Id, Date, .keep_all = TRUE) %>%
  group_by(Id, Date) %>%
  summarize(Total_Distance = sum(TrackerDistance),
            Total_Steps = sum(TotalSteps),
            Total_calories = sum(Calories)) %>%
  ungroup()
## `summarise()` has grouped output by 'Id'. You can override using the `.groups`
## argument.
glimpse(bella_daily)
## Rows: 940
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-15, 2016-0…
## $ Total_Distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.88, 6.68, 6…
## $ Total_Steps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506, 1…
## $ Total_calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1…
# Week holds the weekday name ("Monday", "Tuesday", ...), not a week number
bella_daily <- bella_daily %>%
  mutate(Week = weekdays(Date))
glimpse(bella_daily)
## Rows: 940
## Columns: 6
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 1503960…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-15, 2016-0…
## $ Total_Distance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.88, 6.68, 6…
## $ Total_Steps <int> 13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506, 1…
## $ Total_calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1…
## $ Week <chr> "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday…
bella_daily_week <- bella_daily %>%
  distinct(Id, Date, .keep_all = TRUE) %>%
  group_by(Id, Week) %>%
  summarize(average_calories = mean(Total_calories),
            average_steps = mean(Total_Steps),
            average_distance = mean(Total_Distance)) %>%
  ungroup()
## `summarise()` has grouped output by 'Id'. You can override using the `.groups`
## argument.
glimpse(bella_daily_week)
## Rows: 228
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 15039…
## $ Week <chr> "Friday", "Monday", "Saturday", "Sunday", "Thursday",…
## $ average_calories <dbl> 1826.25, 1939.25, 1895.00, 1769.00, 1481.60, 1967.80,…
## $ average_steps <dbl> 11466.50, 13780.75, 13426.25, 10101.50, 9500.60, 1394…
## $ average_distance <dbl> 7.3975, 8.9550, 8.5400, 6.5700, 6.1020, 8.9200, 8.228…
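In the output above the weekdays sort alphabetically (Friday, Monday, Saturday, ...), which makes weekly patterns hard to read. A sketch of averaging by weekday in calendar order follows, using the first eight days of steps shown for the first user. It relies on `format(Date, "%u")` (ISO weekday number, 1 = Monday) rather than `weekdays()`, so it does not depend on the session language.

```r
# First eight days of TotalSteps for the first user, as printed above.
daily <- data.frame(
  Date = seq(as.Date("2016-04-12"), by = "day", length.out = 8),
  Total_Steps = c(13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506)
)

day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

# Map the ISO weekday number to a name and fix the level order to Mon..Sun.
daily$Weekday <- factor(day_names[as.integer(format(daily$Date, "%u"))],
                        levels = day_names)

# Mean steps per weekday; rows come back in factor-level order.
by_weekday <- aggregate(Total_Steps ~ Weekday, data = daily, FUN = mean)
print(by_weekday)
```

Only Tuesday occurs twice in this sample (12 and 19 April), so it is the only row that averages two values.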
bella_data_sed <- final_df %>%
  select(Id, Date, Calories, ends_with("Minutes")) %>%
  # Sum the four activity-level minute columns, then split into whole hours and leftover minutes
  mutate(total_minutes = rowSums(across(ends_with("Minutes"))),
         total_hours = total_minutes %/% 60,
         total_remaining_minutes = total_minutes %% 60)
glimpse(bella_data_sed)
## Rows: 940
## Columns: 10
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-1…
## $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035…
## $ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 41…
## $ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21,…
## $ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, 2…
## $ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818,…
## $ total_minutes <dbl> 1094, 1033, 1440, 998, 1040, 761, 1440, 1120, …
## $ total_hours <dbl> 18, 17, 24, 16, 17, 12, 24, 18, 17, 17, 24, 17…
## $ total_remaining_minutes <dbl> 14, 13, 0, 38, 20, 41, 0, 40, 43, 56, 0, 36, 3…
summary(bella_data_sed)
## Id Date Calories VeryActiveMinutes
## Min. :1.504e+09 Min. :2016-04-12 Min. : 0 Min. : 0.00
## 1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.:1828 1st Qu.: 0.00
## Median :4.445e+09 Median :2016-04-26 Median :2134 Median : 4.00
## Mean :4.855e+09 Mean :2016-04-26 Mean :2304 Mean : 21.16
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:2793 3rd Qu.: 32.00
## Max. :8.878e+09 Max. :2016-05-12 Max. :4900 Max. :210.00
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes total_minutes
## Min. : 0.00 Min. : 0.0 Min. : 0.0 Min. : 2.0
## 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8 1st Qu.: 989.8
## Median : 6.00 Median :199.0 Median :1057.5 Median :1440.0
## Mean : 13.56 Mean :192.8 Mean : 991.2 Mean :1218.8
## 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5 3rd Qu.:1440.0
## Max. :143.00 Max. :518.0 Max. :1440.0 Max. :1440.0
## total_hours total_remaining_minutes
## Min. : 0.00 Min. : 0.00
## 1st Qu.:16.00 1st Qu.: 0.00
## Median :24.00 Median : 0.00
## Mean :20.07 Mean :14.73
## 3rd Qu.:24.00 3rd Qu.:30.00
## Max. :24.00 Max. :59.00
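The split into hours and leftover minutes is integer division and remainder by 60. A short sketch using the first three `total_minutes` values from the output above shows the arithmetic and turns it into a readable label. A total of 1440 minutes is a full 24-hour day, so lower totals suggest the device was not recording for part of the day.

```r
# First three total_minutes values from the glimpse() output above.
total_minutes <- c(1094, 1033, 1440)

hours   <- total_minutes %/% 60  # whole hours
minutes <- total_minutes %% 60   # minutes left over after the whole hours

# Readable label, e.g. for a plot axis or table.
label <- sprintf("%d h %02d min", as.integer(hours), as.integer(minutes))
print(label)

# Share of the day that was tracked (1440 minutes = 24 hours).
tracked_share <- total_minutes / 1440
print(round(tracked_share, 2))
```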
bella_daily_sed <- bella_data_sed %>%
  distinct(Id, Date, .keep_all = TRUE) %>%
  group_by(Id, Date) %>%
  summarize(Max_Active = sum(VeryActiveMinutes),       # very active minutes
            Median_Active = sum(FairlyActiveMinutes),  # fairly active minutes
            min_Active = sum(LightlyActiveMinutes),    # lightly active minutes
            Not_ACtive = sum(SedentaryMinutes),        # sedentary minutes
            total_active_hr = sum(total_hours)) %>%    # whole hours tracked, sedentary time included
  ungroup()
## `summarise()` has grouped output by 'Id'. You can override using the `.groups`
## argument.
glimpse(bella_daily_sed)
## Rows: 940
## Columns: 7
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-15, 2016-…
## $ Max_Active <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 41, 39, 73…
## $ Median_Active <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21, 5, 14, …
## $ min_Active <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, 211, 130,…
## $ Not_ACtive <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818, 838, 12…
## $ total_active_hr <dbl> 18, 17, 24, 16, 17, 12, 24, 18, 17, 17, 24, 17, 16, 18…
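# Rebuild bella_daily_sed from bella_data_sed with a weekday column (this replaces the per-day summary above)
bella_daily_sed <- bella_data_sed %>%
  mutate(Week = weekdays(Date))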
glimpse(bella_daily_sed)
## Rows: 940
## Columns: 11
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-14, 2016-04-1…
## $ Calories <int> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035…
## $ VeryActiveMinutes <int> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 41…
## $ FairlyActiveMinutes <int> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21,…
## $ LightlyActiveMinutes <int> 328, 217, 181, 209, 221, 164, 233, 264, 205, 2…
## $ SedentaryMinutes <int> 728, 776, 1218, 726, 773, 539, 1149, 775, 818,…
## $ total_minutes <dbl> 1094, 1033, 1440, 998, 1040, 761, 1440, 1120, …
## $ total_hours <dbl> 18, 17, 24, 16, 17, 12, 24, 18, 17, 17, 24, 17…
## $ total_remaining_minutes <dbl> 14, 13, 0, 38, 20, 41, 0, 40, 43, 56, 0, 36, 3…
## $ Week <chr> "Tuesday", "Wednesday", "Thursday", "Friday", …
DailySleep_quality <- final_df %>%
  select(Id, Date, TotalMinutesAsleep) %>%
  drop_na() %>%
  mutate(Week = weekdays(Date)) %>%
  # Bucket nightly sleep: up to 420 min, 421 to 540 min, and more than 540 min
  mutate(sleep_quality = ifelse(TotalMinutesAsleep <= 420, 'Less than 7h',
                         ifelse(TotalMinutesAsleep <= 540, '7h to 9h',
                                'More than 9h'))) %>%
  mutate(sleep_quality = factor(sleep_quality,
                                levels = c('Less than 7h', '7h to 9h',
                                           'More than 9h'))) %>%
  ungroup()
glimpse(DailySleep_quality)
## Rows: 410
## Columns: 5
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-16, 20…
## $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2…
## $ Week <chr> "Tuesday", "Wednesday", "Friday", "Saturday", "Sund…
## $ sleep_quality <fct> Less than 7h, Less than 7h, Less than 7h, Less than…
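The nested `ifelse()` plus `factor()` step can also be written with base R's `cut()`, which bins a numeric vector and returns an ordered set of levels in one call. The sketch below is an alternative formulation, not the code used in this analysis. It reproduces the same boundaries, including the edge case that exactly 420 minutes falls into 'Less than 7h'.

```r
# Sample nightly sleep durations in minutes, including both boundary values.
minutes_asleep <- c(327, 420, 430, 540, 700)

# right = TRUE makes each bin include its upper bound:
# (-Inf, 420], (420, 540], (540, Inf)
sleep_quality <- cut(
  minutes_asleep,
  breaks = c(-Inf, 420, 540, Inf),
  labels = c("Less than 7h", "7h to 9h", "More than 9h"),
  right  = TRUE
)

print(data.frame(minutes_asleep, sleep_quality))
print(table(sleep_quality))
```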
Avrg_sleep <- final_df %>%
  select(Id, Date, TotalDistance, TotalMinutesAsleep, TotalTimeInBed) %>%
  drop_na() %>%
  group_by(Id, Date) %>%
  mutate(Week = weekdays(Date)) %>%  # not kept: summarise() returns only the grouping and summary columns
  summarise(Average_distance = mean(TotalDistance),
            Average_sleep = mean(TotalMinutesAsleep)) %>%
  ungroup()
## `summarise()` has grouped output by 'Id'. You can override using the `.groups`
## argument.
glimpse(Avrg_sleep)
## Rows: 410
## Columns: 4
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 15039…
## $ Date <date> 2016-04-12, 2016-04-13, 2016-04-15, 2016-04-16, 2016…
## $ Average_distance <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.34, 9.04,…
## $ Average_sleep <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 277…
weight_data <- final_df %>%
  select(Id, Date, WeightKg, TotalSteps) %>%
  drop_na() %>%
  group_by(Id, Date) %>%
  summarise(Average_weight = mean(WeightKg),
            Average_Steps = mean(TotalSteps)) %>%
  ungroup()
## `summarise()` has grouped output by 'Id'. You can override using the `.groups`
## argument.
glimpse(weight_data)
## Rows: 67
## Columns: 4
## $ Id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212…
## $ Date <date> 2016-05-02, 2016-05-03, 2016-04-13, 2016-04-21, 2016-0…
## $ Average_weight <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, …
## $ Average_Steps <dbl> 14727, 15103, 356, 8859, 7566, 29, 10429, 8940, 8095, 3…
Based on the analysis, we have identified several key trends and insights that can be leveraged to enhance marketing strategies:
Calories Burned and Steps Taken:
Insight: There is a statistically significant relationship between the number of steps taken and calories burned.
Application: Highlight this relationship in marketing campaigns to emphasize the importance of physical activity for effective calorie burning.
Weekly Distance Patterns:
Insight: Users tend to cover more distance on Saturdays compared to other days of the week.
Application: Promote weekend challenges or special Saturday fitness events to capitalize on higher activity levels.
Daily Activity Patterns:
Insight: Users are more active in the first half of the day (6 AM to 1 PM) and show a decrease in activity and calories burned in the later part of the day.
Application: Schedule motivational messages and fitness reminders for the morning hours to encourage users to maintain their activity levels throughout the day.
Sleep Patterns:
Insight: Users spend more time in bed on weekends, with average sleep durations higher on Fridays and Sundays. However, not all users have good sleep quality.
Application: Introduce educational content about the benefits of good sleep and how to achieve it, and promote the sleep tracking features of the Bellabeat device.
Active Minutes and Weight:
Insight: On average, users log about 20 very active minutes per day. Users with a higher BMI tend to log more active minutes to manage their weight.
Application: Create targeted fitness programs that encourage short bursts of high-intensity activity, and share success stories of users who have effectively managed their weight through consistent activity.
Weight Management:
Insight: The data shows that users maintain a constant weight with little improvement over time.
Application: Offer personalized weight management plans and tools to help users achieve their weight goals more effectively.
Average Steps and Weight:
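Insight: The average steps taken by users are 12,102, and the average weight is 72 kg. Both figures presumably describe the days with a weight log; across all 940 tracked days the summary above gives a lower mean of 7,638 steps.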
Application: Use these averages as benchmarks in marketing materials, encouraging new users to join the community and reach these fitness milestones.
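The steps and calories insight above refers to statistical significance. One way such a claim could be checked is a Pearson correlation test, sketched below on the first ten days of steps and calories printed earlier for the first user. This is an illustration under that assumption, not the test behind the reported insight. Ten days from one user say nothing about the whole sample, and repeated days from the same user are not independent observations.

```r
# First ten TotalSteps and Calories values from the str() output for data1.
steps    <- c(13162, 10735, 10460, 9762, 12669, 9705, 13019, 15506, 10544, 9819)
calories <- c(1985, 1797, 1776, 1745, 1863, 1728, 1921, 2035, 1786, 1775)

# Pearson correlation with a two-sided test of "no linear association".
result <- cor.test(steps, calories, method = "pearson")

r_estimate <- unname(result$estimate)  # sample correlation coefficient
p_value    <- result$p.value           # p-value of the test

cat(sprintf("r = %.2f, p = %.4g\n", r_estimate, p_value))
```

On the full data the same call would take `bella_daily$Total_Steps` and `bella_daily$Total_calories`. A per-user average, or a model that accounts for repeated measures, would be a safer basis for a significance claim.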
Promote Activity Challenges:
Morning Motivation Campaigns:
Sleep Health Education:
Personalized Fitness Plans:
Success Stories and Testimonials:
Weekend Wellness Programs:
By aligning marketing strategies with these insights, Bellabeat can enhance user engagement, promote healthy habits, and ultimately drive product adoption and satisfaction.