In this case study, you will imagine you are working for Bellabeat, a high-tech manufacturer of health-focused products for women, and meet different characters and team members. You are a junior data analyst working on the marketing analyst team at Bellabeat. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market.
Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights you discover will then help guide marketing strategy for the company. You will present your analysis to the Bellabeat executive team along with your high-level recommendations for Bellabeat’s marketing strategy.
Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat cofounder; key member of the Bellabeat executive team
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy. You joined this team six months ago and have been busy learning about Bellabeat’s mission and business goals, as well as how you, as a junior data analyst, can help Bellabeat achieve them.
Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
Urška Sršen (Bellabeat’s cofounder and Chief Creative Officer) asks you to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. She then wants you to select one Bellabeat product to apply these insights to in your presentation.
Analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices and use those insights to create high-level recommendations for how these trends can inform Bellabeat marketing strategy.
In this scenario, Sršen encourages you to use public data that explores smart device users’ daily habits. Specifically, she requests that you use the FitBit Fitness Tracker Data (https://www.kaggle.com/datasets/arashnic/fitbit; CC0: Public Domain, dataset made available through Mobius, https://www.kaggle.com/arashnic). This Kaggle data set contains personal fitness tracker data from thirty FitBit users. Thirty eligible FitBit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
Loaded the packages needed for the analysis using the library() function.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(ggplot2)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(dplyr)
library(skimr)
library(here)
## here() starts at C:/Users/yeiro/OneDrive/Desktop/Bellabeat Case Study data
Imported the CSV files from the Kaggle dataset that were used in the analysis.
activity_daily <- read.csv("dailyActivity_merged.csv")
calories_daily <- read.csv("dailyCalories_merged.csv")
daily_steps <- read.csv("dailySteps_merged.csv")
sleep_daily <- read.csv("sleepDay_merged.csv")
weight_log <- read.csv("weightLogInfo_merged.csv")
Checked the data to see whether any structural changes were needed, how many observations each data frame contained, and what the column names were.
head(activity_daily)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
head(calories_daily)
## Id ActivityDay Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
head(daily_steps)
## Id ActivityDay StepTotal
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
head(sleep_daily)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
head(weight_log)
## Id Date WeightKg WeightPounds Fat BMI
## 1 1503960366 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## 2 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 3 1927972279 4/13/2016 1:08:52 AM 133.5 294.3171 NA 47.54
## 4 2873212765 4/21/2016 11:59:59 PM 56.7 125.0021 NA 21.45
## 5 2873212765 5/12/2016 11:59:59 PM 57.3 126.3249 NA 21.69
## 6 4319703577 4/17/2016 11:59:59 PM 72.4 159.6147 25 27.45
## IsManualReport LogId
## 1 True 1.462234e+12
## 2 True 1.462320e+12
## 3 False 1.460510e+12
## 4 True 1.461283e+12
## 5 True 1.463098e+12
## 6 True 1.460938e+12
str(activity_daily)
## 'data.frame': 940 obs. of 15 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(calories_daily)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
str(daily_steps)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ StepTotal : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
str(sleep_daily)
## 'data.frame': 413 obs. of 5 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : int 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: int 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : int 346 407 442 367 712 320 377 364 384 449 ...
str(weight_log)
## 'data.frame': 67 obs. of 8 variables:
## $ Id : num 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
## $ Date : chr "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
## $ WeightKg : num 52.6 52.6 133.5 56.7 57.3 ...
## $ WeightPounds : num 116 116 294 125 126 ...
## $ Fat : int 22 NA NA NA NA 25 NA NA NA NA ...
## $ BMI : num 22.6 22.6 47.5 21.5 21.7 ...
## $ IsManualReport: chr "True" "True" "False" "True" ...
## $ LogId : num 1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...
Looked at the number of distinct ids to confirm how many unique participants there were for each data frame.
n_distinct(activity_daily$Id)
## [1] 33
n_distinct(calories_daily$Id)
## [1] 33
n_distinct(daily_steps$Id)
## [1] 33
n_distinct(sleep_daily$Id)
## [1] 24
n_distinct(weight_log$Id)
## [1] 8
Also checked consistency across the data frames by counting the observations (rows) in each one.
nrow(activity_daily)
## [1] 940
nrow(calories_daily)
## [1] 940
nrow(daily_steps)
## [1] 940
nrow(sleep_daily)
## [1] 413
nrow(weight_log)
## [1] 67
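The row and Id checks above can be gathered into one table, which makes it easier to compare data frames at a glance. Below is a minimal base-R sketch that assumes the data frames are passed in as a named list. The helper name `profile_frames` and the toy data are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: one row per data frame with its row count and number of distinct Ids
profile_frames <- function(frames) {
  data.frame(
    frame = names(frames),
    rows = unname(vapply(frames, nrow, integer(1))),
    distinct_ids = unname(vapply(frames, function(df) length(unique(df$Id)), integer(1))),
    stringsAsFactors = FALSE
  )
}

# Toy data standing in for the real CSV imports (values are illustrative only)
toy_activity <- data.frame(Id = c(1, 1, 2, 3), TotalSteps = c(100, 200, 300, 400))
toy_weight <- data.frame(Id = c(1, 1), WeightKg = c(52.6, 52.6))

profile_frames(list(activity = toy_activity, weight = toy_weight))
```

On the real data, the same call would take `list(activity_daily = activity_daily, sleep_daily = sleep_daily, weight_log = weight_log)` and so on.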
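After looking at the column names, I made them consistent. In the other data frames, I renamed the ActivityDay, SleepDay, and Date columns to ActivityDate to match activity_daily. I also renamed Calories to calories and StepTotal to TotalSteps. Finally, I converted ActivityDate to the Date type in every data frame.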
calories_daily <- calories_daily %>%
rename(ActivityDate = ActivityDay) %>%
rename(calories = Calories) %>%
mutate(ActivityDate = as_date(ActivityDate, format = "%m/%d/%Y"))
daily_steps <- daily_steps %>%
rename(ActivityDate = ActivityDay) %>%
rename(TotalSteps = StepTotal) %>%
mutate(ActivityDate = as_date(ActivityDate, format = "%m/%d/%Y"))
activity_daily <- activity_daily %>%
rename(calories = Calories) %>%
mutate(ActivityDate = as_date(ActivityDate, format = "%m/%d/%Y"))
sleep_daily <- sleep_daily %>%
rename(ActivityDate = SleepDay) %>%
mutate(ActivityDate = as_date(ActivityDate, format = "%m/%d/%Y"))
weight_log <- weight_log %>%
rename(ActivityDate = Date) %>%
mutate(ActivityDate = as_date(ActivityDate, format = "%m/%d/%Y"))
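The SleepDay and Date strings include a time of day, yet the format "%m/%d/%Y" still parses them. This is because base R's strptime() ignores any characters left over once the format has been matched, and the summaries below confirm that the dates came out as expected. Below is a minimal base-R check on sample strings copied from the head() output above. Here as.Date() stands in for lubridate's as_date().

```r
# Sample strings copied from the head() output of sleepDay_merged.csv and weightLogInfo_merged.csv
raw <- c("4/12/2016 12:00:00 AM", "5/2/2016 11:59:59 PM", "4/13/2016 1:08:52 AM")

# Only month/day/year are read; the trailing time of day is ignored
parsed <- as.Date(raw, format = "%m/%d/%Y")
print(parsed)
```

Dropping the time is harmless here, because every analysis step works at the daily level.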
Removed duplicate rows and rows containing missing values (NA) from each data frame.
activity_daily <- activity_daily %>%
distinct() %>%
drop_na()
calories_daily <- calories_daily %>%
distinct() %>%
drop_na()
daily_steps <- daily_steps %>%
distinct() %>%
drop_na()
sleep_daily <- sleep_daily %>%
distinct() %>%
drop_na()
weight_log <- weight_log %>%
distinct() %>%
drop_na() # drops any row with an NA in any column, including the many rows with no Fat value
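When called with no column arguments, tidyr's drop_na() removes any row that has an NA in any column. In weight_log, Fat is missing on most rows (see the str() output above), so this step throws away most of the weight entries. Below is a minimal base-R sketch on a toy frame that contrasts this with dropping NAs only in the columns the analysis actually needs. The toy values echo the head() output but are illustrative. With tidyr, the narrower version would be `drop_na(weight_log, WeightKg)`.

```r
# Toy weight log: Fat is missing on most rows, as in weightLogInfo_merged.csv
toy_weight <- data.frame(
  Id = c(1503960366, 1503960366, 1927972279, 2873212765),
  WeightKg = c(52.6, 52.6, 133.5, 56.7),
  Fat = c(22L, NA, NA, NA)
)

# Equivalent of drop_na() with no columns: keep only rows with no NA anywhere
all_complete <- toy_weight[complete.cases(toy_weight), ]

# Keep rows that are complete only in the columns actually needed
weight_complete <- toy_weight[complete.cases(toy_weight[, c("Id", "WeightKg")]), ]

nrow(all_complete)     # 1 row survives
nrow(weight_complete)  # all 4 rows survive
```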
Next, each data frame was summarized to get a high-level look at the data.
summary(activity_daily)
## Id ActivityDate TotalSteps TotalDistance
## Min. :1.504e+09 Min. :2016-04-12 Min. : 0 Min. : 0.000
## 1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.: 3790 1st Qu.: 2.620
## Median :4.445e+09 Median :2016-04-26 Median : 7406 Median : 5.245
## Mean :4.855e+09 Mean :2016-04-26 Mean : 7638 Mean : 5.490
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:10727 3rd Qu.: 7.713
## Max. :8.878e+09 Max. :2016-05-12 Max. :36019 Max. :28.030
## TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 2.620 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 5.245 Median :0.0000 Median : 0.210
## Mean : 5.475 Mean :0.1082 Mean : 1.503
## 3rd Qu.: 7.710 3rd Qu.:0.0000 3rd Qu.: 2.053
## Max. :28.030 Max. :4.9421 Max. :21.920
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.945 1st Qu.:0.000000
## Median :0.2400 Median : 3.365 Median :0.000000
## Mean :0.5675 Mean : 3.341 Mean :0.001606
## 3rd Qu.:0.8000 3rd Qu.: 4.782 3rd Qu.:0.000000
## Max. :6.4800 Max. :10.710 Max. :0.110000
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
## calories
## Min. : 0
## 1st Qu.:1828
## Median :2134
## Mean :2304
## 3rd Qu.:2793
## Max. :4900
summary(calories_daily)
## Id ActivityDate calories
## Min. :1.504e+09 Min. :2016-04-12 Min. : 0
## 1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.:1828
## Median :4.445e+09 Median :2016-04-26 Median :2134
## Mean :4.855e+09 Mean :2016-04-26 Mean :2304
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:2793
## Max. :8.878e+09 Max. :2016-05-12 Max. :4900
summary(daily_steps)
## Id ActivityDate TotalSteps
## Min. :1.504e+09 Min. :2016-04-12 Min. : 0
## 1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.: 3790
## Median :4.445e+09 Median :2016-04-26 Median : 7406
## Mean :4.855e+09 Mean :2016-04-26 Mean : 7638
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:10727
## Max. :8.878e+09 Max. :2016-05-12 Max. :36019
summary(sleep_daily)
## Id ActivityDate TotalSleepRecords TotalMinutesAsleep
## Min. :1.504e+09 Min. :2016-04-12 Min. :1.00 Min. : 58.0
## 1st Qu.:3.977e+09 1st Qu.:2016-04-19 1st Qu.:1.00 1st Qu.:361.0
## Median :4.703e+09 Median :2016-04-27 Median :1.00 Median :432.5
## Mean :4.995e+09 Mean :2016-04-26 Mean :1.12 Mean :419.2
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:1.00 3rd Qu.:490.0
## Max. :8.792e+09 Max. :2016-05-12 Max. :3.00 Max. :796.0
## TotalTimeInBed
## Min. : 61.0
## 1st Qu.:403.8
## Median :463.0
## Mean :458.5
## 3rd Qu.:526.0
## Max. :961.0
summary(weight_log)
## Id ActivityDate WeightKg WeightPounds
## Min. :1.504e+09 Min. :2016-04-17 Min. :52.60 Min. :116.0
## 1st Qu.:2.208e+09 1st Qu.:2016-04-20 1st Qu.:57.55 1st Qu.:126.9
## Median :2.912e+09 Median :2016-04-24 Median :62.50 Median :137.8
## Mean :2.912e+09 Mean :2016-04-24 Mean :62.50 Mean :137.8
## 3rd Qu.:3.616e+09 3rd Qu.:2016-04-28 3rd Qu.:67.45 3rd Qu.:148.7
## Max. :4.320e+09 Max. :2016-05-02 Max. :72.40 Max. :159.6
## Fat BMI IsManualReport LogId
## Min. :22.00 Min. :22.65 Length:2 Min. :1.461e+12
## 1st Qu.:22.75 1st Qu.:23.85 Class :character 1st Qu.:1.461e+12
## Median :23.50 Median :25.05 Mode :character Median :1.462e+12
## Mean :23.50 Mean :25.05 Mean :1.462e+12
## 3rd Qu.:24.25 3rd Qu.:26.25 3rd Qu.:1.462e+12
## Max. :25.00 Max. :27.45 Max. :1.462e+12
After reviewing the summaries for each data frame, I merged the data frames together.
merged_calories_activity <- merge(activity_daily, calories_daily, by=c("Id", "ActivityDate", "calories"))
user_activity <- merge(merged_calories_activity, daily_steps, by= c("Id", "ActivityDate", "TotalSteps"))
weight_and_activity <- merge(user_activity, weight_log, by=c("Id", "ActivityDate"))
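By default, merge() performs an inner join (all = FALSE), so it drops any row without a match in both data frames. That is why weight_and_activity keeps only the days that also appear in the cleaned weight_log. Below is a minimal sketch on toy data (all values illustrative) that contrasts this with a left join, where all.x = TRUE keeps every activity row and fills missing weights with NA.

```r
# Toy frames keyed by Id and ActivityDate (values are illustrative only)
activity <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = as.Date(c("2016-05-02", "2016-05-03", "2016-04-17")),
  TotalSteps = c(100L, 200L, 300L)
)
weight <- data.frame(
  Id = 1,
  ActivityDate = as.Date("2016-05-02"),
  WeightKg = 52.6
)

inner <- merge(activity, weight, by = c("Id", "ActivityDate"))               # default: matching rows only
left  <- merge(activity, weight, by = c("Id", "ActivityDate"), all.x = TRUE) # keep all activity rows

nrow(inner)                # 1
nrow(left)                 # 3
sum(is.na(left$WeightKg))  # 2
```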
str(user_activity)
## 'data.frame': 940 obs. of 15 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : Date, format: "2016-04-12" "2016-04-13" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
str(weight_and_activity)
## 'data.frame': 2 obs. of 21 variables:
## $ Id : num 1.50e+09 4.32e+09
## $ ActivityDate : Date, format: "2016-05-02" "2016-04-17"
## $ TotalSteps : int 14727 29
## $ calories : int 2004 1464
## $ TotalDistance : num 9.71 0.02
## $ TrackerDistance : num 9.71 0.02
## $ LoggedActivitiesDistance: num 0 0
## $ VeryActiveDistance : num 3.21 0
## $ ModeratelyActiveDistance: num 0.57 0
## $ LightActiveDistance : num 5.92 0.02
## $ SedentaryActiveDistance : num 0 0
## $ VeryActiveMinutes : int 41 0
## $ FairlyActiveMinutes : int 15 0
## $ LightlyActiveMinutes : int 277 3
## $ SedentaryMinutes : int 798 1363
## $ WeightKg : num 52.6 72.4
## $ WeightPounds : num 116 160
## $ Fat : int 22 25
## $ BMI : num 22.6 27.5
## $ IsManualReport : chr "True" "True"
## $ LogId : num 1.46e+12 1.46e+12
Then checked the summary of the merged user_activity data frame.
summary(user_activity)
## Id ActivityDate TotalSteps calories
## Min. :1.504e+09 Min. :2016-04-12 Min. : 0 Min. : 0
## 1st Qu.:2.320e+09 1st Qu.:2016-04-19 1st Qu.: 3790 1st Qu.:1828
## Median :4.445e+09 Median :2016-04-26 Median : 7406 Median :2134
## Mean :4.855e+09 Mean :2016-04-26 Mean : 7638 Mean :2304
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:10727 3rd Qu.:2793
## Max. :8.878e+09 Max. :2016-05-12 Max. :36019 Max. :4900
## TotalDistance TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.000 Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 2.620 1st Qu.: 2.620 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 5.245 Median : 5.245 Median :0.0000 Median : 0.210
## Mean : 5.490 Mean : 5.475 Mean :0.1082 Mean : 1.503
## 3rd Qu.: 7.713 3rd Qu.: 7.710 3rd Qu.:0.0000 3rd Qu.: 2.053
## Max. :28.030 Max. :28.030 Max. :4.9421 Max. :21.920
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.945 1st Qu.:0.000000
## Median :0.2400 Median : 3.365 Median :0.000000
## Mean :0.5675 Mean : 3.341 Mean :0.001606
## 3rd Qu.:0.8000 3rd Qu.: 4.782 3rd Qu.:0.000000
## Max. :6.4800 Max. :10.710 Max. :0.110000
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
The summaries revealed several notable findings:
Average daily sedentary time was 991.2 minutes (16 hours and 31 minutes), which is more than two-thirds of the day. Work and sleep could partly explain this if the average user’s day involved about 8 hours of each. However, the maximum of 1,440 sedentary minutes (a full day) suggests that some days may also include time when the tracker was not worn.
The weight_and_activity data frame had only 2 observations, from two distinct Ids. This is partly a side effect of cleaning: drop_na() removed every weight_log row with a missing Fat value, leaving only 2 of the original 67 entries, both of which include a Fat value. Before cleaning, weight_log held 67 entries from 8 distinct users, so 8 of the 33 users in the study logged their weight during the collection period.
Even so, only 8 of 33 users logged their weight at all, and body fat was almost never recorded. This suggests users aren’t actively logging their weight, which could be an area of interest.
Obtaining user feedback on how users feel about the weight-logging feature, and how likely they are to use it, would be helpful. Weight plays a significant part in a fitness journey, but users might not want to enter it, and understanding why could help improve the product.
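The conversion in the sedentary-time finding can be checked directly: 991.2 minutes is 16 whole hours plus 31.2 minutes, and 991.2 / 1,440 is about 0.688 of a day. Below is a small base-R sketch. The helper name `describe_minutes` is hypothetical.

```r
# Hypothetical helper: express a daily minute count as hours/minutes and as a share of a day
describe_minutes <- function(minutes) {
  hours <- minutes %/% 60          # whole hours
  mins <- round(minutes %% 60)     # remaining minutes, rounded
  share <- minutes / (24 * 60)     # fraction of a 1,440-minute day
  sprintf("%d h %d min (%.1f%% of the day)", as.integer(hours), as.integer(mins), 100 * share)
}

describe_minutes(991.2)  # "16 h 31 min (68.8% of the day)"
```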
Additional columns for day, month, year, and day of the week were derived from the ActivityDate column.
user_activity$date <- as.Date(user_activity$ActivityDate)
user_activity$month <- format(as.Date(user_activity$date), "%m")
user_activity$day <- format(as.Date(user_activity$date), "%d")
user_activity$year <- format(as.Date(user_activity$date), "%Y")
user_activity$day_of_week <- format(as.Date(user_activity$date), "%A")
user_activity$day_of_week <- ordered(user_activity$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
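Note that format(..., "%A") returns weekday names in the R session's locale, so the English levels passed to ordered() above only match on an English-language system. Below is a locale-independent sketch, assuming English labels are wanted. It uses as.POSIXlt(dates)$wday (0 = Sunday ... 6 = Saturday) to index the same Sunday-first level vector. The helper name `weekday_factor` is hypothetical.

```r
# Sunday-first weekday labels, matching the levels used above
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

# Hypothetical helper: ordered weekday factor that does not depend on the locale
weekday_factor <- function(dates) {
  idx <- as.POSIXlt(dates)$wday + 1  # wday is 0 (Sunday) to 6 (Saturday)
  factor(day_levels[idx], levels = day_levels, ordered = TRUE)
}

weekday_factor(as.Date(c("2016-04-12", "2016-04-17")))
```

The two sample dates come from the data: 2016-04-12 appears as Tuesday and 2016-04-17 as Sunday in the str() output of the merged data below.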
Average calories, steps, and distance by day of the week, using the user_activity data frame:
user_activity %>%
mutate(weekday = wday(ActivityDate, label = TRUE)) %>%
group_by(weekday) %>%
summarise(avg_calories = mean(calories), avg_steps = mean(TotalSteps), avg_distance = mean(TotalDistance)) %>%
arrange(weekday)
## # A tibble: 7 × 4
## weekday avg_calories avg_steps avg_distance
## <ord> <dbl> <dbl> <dbl>
## 1 Sun 2263 6933. 5.03
## 2 Mon 2324. 7781. 5.55
## 3 Tue 2356. 8125. 5.83
## 4 Wed 2303. 7559. 5.49
## 5 Thu 2200. 7406. 5.31
## 6 Fri 2332. 7448. 5.31
## 7 Sat 2355. 8153. 5.85
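The same grouped means can be cross-checked in base R with aggregate(). Its output columns keep the names of the input columns they average, which makes a mislabeled summary column easy to spot. Below is a minimal sketch on toy data (values are illustrative only).

```r
# Toy data: two days per weekday label, values are illustrative only
toy <- data.frame(
  weekday = factor(c("Sun", "Sun", "Mon", "Mon"), levels = c("Sun", "Mon")),
  calories = c(2000, 2200, 2400, 2600),
  TotalSteps = c(6000, 8000, 9000, 11000),
  TotalDistance = c(4, 6, 6, 8)
)

# Mean of each column within each weekday; output columns keep the input names
by_day <- aggregate(cbind(calories, TotalSteps, TotalDistance) ~ weekday, data = toy, FUN = mean)
by_day
```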
Merged user_activity with sleep_daily and checked the structure and summary of the result:
user_sleep_and_activity <- merge(user_activity, sleep_daily, by= c("Id", "ActivityDate"))
str(user_sleep_and_activity)
## 'data.frame': 410 obs. of 23 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : Date, format: "2016-04-12" "2016-04-13" ...
## $ TotalSteps : int 13162 10735 9762 12669 9705 15506 10544 9819 14371 10039 ...
## $ calories : int 1985 1797 1745 1863 1728 2035 1786 1775 1949 1788 ...
## $ TotalDistance : num 8.5 6.97 6.28 8.16 6.48 ...
## $ TrackerDistance : num 8.5 6.97 6.28 8.16 6.48 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.14 2.71 3.19 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 1.26 0.41 0.78 ...
## $ LightActiveDistance : num 6.06 4.71 2.83 5.04 2.51 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 29 36 38 50 28 19 41 39 ...
## $ FairlyActiveMinutes : int 13 19 34 10 20 31 12 8 21 5 ...
## $ LightlyActiveMinutes : int 328 217 209 221 164 264 205 211 262 238 ...
## $ SedentaryMinutes : int 728 776 726 773 539 775 818 838 732 709 ...
## $ date : Date, format: "2016-04-12" "2016-04-13" ...
## $ month : chr "04" "04" "04" "04" ...
## $ day : chr "12" "13" "15" "16" ...
## $ year : chr "2016" "2016" "2016" "2016" ...
## $ day_of_week : Ord.factor w/ 7 levels "Sunday"<"Monday"<..: 3 4 6 7 1 3 4 5 7 1 ...
## $ TotalSleepRecords : int 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep : int 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : int 346 407 442 367 712 320 377 364 384 449 ...
summary(user_sleep_and_activity)
## Id ActivityDate TotalSteps calories
## Min. :1.504e+09 Min. :2016-04-12 Min. : 17 Min. : 257
## 1st Qu.:3.977e+09 1st Qu.:2016-04-19 1st Qu.: 5189 1st Qu.:1841
## Median :4.703e+09 Median :2016-04-27 Median : 8913 Median :2207
## Mean :4.995e+09 Mean :2016-04-26 Mean : 8515 Mean :2389
## 3rd Qu.:6.962e+09 3rd Qu.:2016-05-04 3rd Qu.:11370 3rd Qu.:2920
## Max. :8.792e+09 Max. :2016-05-12 Max. :22770 Max. :4900
##
## TotalDistance TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.010 Min. : 0.010 Min. :0.0000 Min. : 0.000
## 1st Qu.: 3.592 1st Qu.: 3.592 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 6.270 Median : 6.270 Median :0.0000 Median : 0.570
## Mean : 6.012 Mean : 6.007 Mean :0.1089 Mean : 1.446
## 3rd Qu.: 8.005 3rd Qu.: 7.950 3rd Qu.:0.0000 3rd Qu.: 2.360
## Max. :17.540 Max. :17.540 Max. :4.0817 Max. :12.540
##
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. :0.010 Min. :0.0000000
## 1st Qu.:0.0000 1st Qu.:2.540 1st Qu.:0.0000000
## Median :0.4200 Median :3.665 Median :0.0000000
## Mean :0.7439 Mean :3.791 Mean :0.0009268
## 3rd Qu.:1.0375 3rd Qu.:4.918 3rd Qu.:0.0000000
## Max. :6.4800 Max. :9.480 Max. :0.1100000
##
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 2.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:158.0 1st Qu.: 631.2
## Median : 9.00 Median : 11.00 Median :208.0 Median : 717.0
## Mean : 25.05 Mean : 17.92 Mean :216.5 Mean : 712.1
## 3rd Qu.: 38.00 3rd Qu.: 26.75 3rd Qu.:263.0 3rd Qu.: 782.8
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1265.0
##
## date month day year
## Min. :2016-04-12 Length:410 Length:410 Length:410
## 1st Qu.:2016-04-19 Class :character Class :character Class :character
## Median :2016-04-27 Mode :character Mode :character Mode :character
## Mean :2016-04-26
## 3rd Qu.:2016-05-04
## Max. :2016-05-12
##
## day_of_week TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Sunday :55 Min. :1.00 Min. : 58.0 Min. : 61.0
## Monday :46 1st Qu.:1.00 1st Qu.:361.0 1st Qu.:403.8
## Tuesday :65 Median :1.00 Median :432.5 Median :463.0
## Wednesday:66 Mean :1.12 Mean :419.2 Mean :458.5
## Thursday :64 3rd Qu.:1.00 3rd Qu.:490.0 3rd Qu.:526.0
## Friday :57 Max. :3.00 Max. :796.0 Max. :961.0
## Saturday :57
Averages for all recorded days (user_activity):
calories: 2304
distance: 5.49
steps: 7638
Averages for days with a sleep record (user_sleep_and_activity):
calories: 2389
distance: 6.012
steps: 8515
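Days with sleep records averaged more calories burned, more steps, and a longer total distance than all recorded days. This may suggest that users who track their sleep are also more active. However, it shows an association, not evidence that the sleep feature made users more active. Part of the gap may also come from the all-user data including days with 0 steps and 0 calories (see the minimums in the user_activity summary), which may be days the tracker was not worn.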
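These averages pool days rather than users. A like-for-like comparison can be made inside a single table by flagging each activity day that has a matching sleep record for the same Id and date. Below is a minimal base-R sketch on toy data. The helper `key` and all values are hypothetical.

```r
# Toy activity days and the (Id, date) pairs that have a sleep record (values are illustrative only)
activity <- data.frame(
  Id = c(1, 1, 2, 2),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12", "2016-04-13")),
  TotalSteps = c(9000, 11000, 0, 5000)
)
sleep <- data.frame(
  Id = c(1, 1),
  ActivityDate = as.Date(c("2016-04-12", "2016-04-13"))
)

# Hypothetical helper: build an "Id date" key for matching rows across tables
key <- function(df) paste(df$Id, df$ActivityDate)

# Flag each activity day that has a matching sleep record
activity$has_sleep <- key(activity) %in% key(sleep)

# Mean steps for days without (FALSE) and with (TRUE) a sleep record
tapply(activity$TotalSteps, activity$has_sleep, mean)
```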
Average hours of sleep, hours in bed, steps, calories burned, and distance by day of the week for the user_sleep_and_activity data frame:
user_sleep_and_activity %>%
mutate(weekday = wday(ActivityDate, label = TRUE)) %>%
group_by(weekday) %>%
summarise(avg_hours_of_sleep = mean(TotalMinutesAsleep)/60, avg_hours_in_bed = mean(TotalTimeInBed)/60, avg_steps = mean(TotalSteps), avg_calories_burned = mean(calories), avg_distance = mean(TotalDistance)) %>%
arrange(weekday)
## # A tibble: 7 × 6
## weekday avg_hours_of_sleep avg_hours_in_bed avg_steps avg_calories_burned
## <ord> <dbl> <dbl> <dbl> <dbl>
## 1 Sun 7.55 8.39 7298. 2277.
## 2 Mon 6.99 7.62 9273. 2432.
## 3 Tue 6.74 7.39 9183. 2496.
## 4 Wed 7.24 7.83 8023. 2378.
## 5 Thu 6.69 7.25 8184. 2307.
## 6 Fri 6.76 7.42 7901. 2330.
## 7 Sat 6.98 7.66 9871. 2507.
## # … with 1 more variable: avg_distance <dbl>
The visualization below shows the average number of hours spent in bed on each day of the week. The bar fill shows the average number of hours asleep.
hours_in_bed_day_of_week <- user_sleep_and_activity %>%
mutate(weekday = wday(ActivityDate, label = TRUE)) %>%
group_by(weekday) %>%
summarise(avg_hours_of_sleep = mean(TotalMinutesAsleep)/60, avg_hours_in_bed = mean(TotalTimeInBed)/60, avg_steps = mean(TotalSteps), avg_calories_burned = mean(calories), avg_distance = mean(TotalDistance)) %>%
arrange(weekday) %>%
ggplot(aes(x = weekday, y = avg_hours_in_bed, fill = avg_hours_of_sleep)) +
geom_col(position = "dodge") +
labs(title = "Average Number of Hours Spent in Bed by Day of the Week", subtitle = "Color filled in by average amount of hours slept")
hours_in_bed_day_of_week
Observations:
Sunday was the day of the week on which these FitBit users averaged the most time in bed (8.39 hours) and the most sleep (7.55 hours).
Next, the average number of steps taken on each day of the week was visualized. Distance was used as the fill because the two are correlated.
steps_day_of_week <- user_sleep_and_activity %>%
mutate(weekday = wday(ActivityDate, label = TRUE)) %>%
group_by(weekday) %>%
summarise(avg_hours_of_sleep = mean(TotalMinutesAsleep)/60, avg_hours_in_bed = mean(TotalTimeInBed)/60, avg_steps = mean(TotalSteps), avg_calories_burned = mean(calories), avg_distance = mean(TotalDistance)) %>%
arrange(weekday) %>%
ggplot(aes(x = weekday, y = avg_steps, fill = avg_distance)) +
geom_col(position = "dodge") +
labs(title = "Average Number of Steps Taken by Day of the Week", subtitle = "Color filled in by average total distance", caption = "The metrics above are representative of the 24 users who logged sleep information")
steps_day_of_week
Next, all users were included to check the average number of steps taken on each day of the week.
all_steps_day_of_week <- user_activity %>%
mutate(weekday = wday(ActivityDate, label = TRUE)) %>%
group_by(weekday) %>%
summarise(avg_steps = mean(TotalSteps), avg_calories_burned = mean(calories), avg_distance = mean(TotalDistance)) %>%
arrange(weekday) %>%
ggplot(aes(x = weekday, y = avg_steps, fill = avg_distance)) +
geom_col(position = "dodge") +
labs(title = "Average Number of Steps Taken by Day of the Week", subtitle = "Color filled in by average total distance", caption = "The metrics above are representative of all users who logged their information")
all_steps_day_of_week
Observations from comparing both charts for each user group’s steps:
Including all 940 recorded days from all 33 users, rather than only the 410 days with a sleep record from the 24 sleep-logging users, lowered the average for every day of the week.
Both groups followed a similar weekly pattern, though the 24 users who logged sleep showed larger swings from day to day.
Saturdays were the day of the week that users in both groups averaged their highest amount of steps and Sunday was the lowest.
Relationship between calories and steps
cal_steps <- ggplot(data=user_activity, aes(x=calories, y=TotalSteps)) +
geom_point() + geom_smooth() + labs(title="Calories vs. Total Steps")
cal_steps
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
There was a positive relationship between calories and total steps, which was to be expected.
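The strength of this relationship can be summarized with a correlation coefficient, which is a useful complement to the scatter plot. Below is a minimal sketch using base R's cor() on toy vectors. On the real data the call would be `cor(user_activity$calories, user_activity$TotalSteps)`; no result for the real data is claimed here.

```r
# Toy vectors with an exact linear relationship (illustrative only)
calories <- c(1500, 1800, 2100, 2400)
steps <- c(4000, 6000, 8000, 10000)

cor(calories, steps)                       # Pearson correlation: 1 for an exact straight line
cor(calories, steps, method = "spearman")  # rank-based alternative, less sensitive to outliers
```

The Spearman option may be a better fit for this data, given the long right tail in TotalSteps visible in the summaries (median 7406, maximum 36019).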