Bellabeat is a high-tech manufacturer of health-focused products for women. A successful small company, Bellabeat believes that analyzing smart device fitness data could unlock new growth opportunities. You have been asked to focus on one of Bellabeat’s products and analyze smart device data to understand how consumers use their smart devices, so that these insights can guide new marketing strategies for the company.
Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.
In this step of the analysis process, we identify the business task and desired outcome.
Bellabeat wants an analysis of its available consumer data in the hope that it will reveal more opportunities for growth. The company has asked the marketing analytics team to focus on one Bellabeat product and analyze smart device usage data to gain insight into how people already use their smart devices. Using this information, it would like high-level recommendations for how these trends can inform Bellabeat's marketing strategy.
The data used here is a public FitBit fitness tracker dataset from Kaggle.
I first downloaded the dataset from Kaggle and cleaned it in Excel before starting the analysis in RStudio. In Excel, I removed duplicate rows, added columns converting minutes to hours, and applied number and date formatting.
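The Excel cleaning steps can also be scripted in R, which makes them repeatable. The sketch below is illustrative only: the data frame is hypothetical (two rows taken from the sleep table shown further down plus one deliberately duplicated row), and rounding the hour columns to two decimals is an assumption about how the Excel columns were built.

```r
library(dplyr)

# Hypothetical raw sleep records; the third row is an exact copy of the second
raw_sleep <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  SleepDay = c("04/12/16", "04/13/16", "04/13/16"),
  TotalMinutesAsleep = c(327, 384, 384),
  TotalTimeInBed = c(346, 407, 407)
)

sleep_cleaned <- raw_sleep %>%
  # Remove exact duplicate rows (the equivalent of Excel's "Remove Duplicates")
  distinct() %>%
  # Add hour columns derived from the minute columns, rounded to 2 decimals
  mutate(
    TotalHoursAsleep = round(TotalMinutesAsleep / 60, 2),
    TotalHoursInBed = round(TotalTimeInBed / 60, 2)
  )

print(sleep_cleaned)
```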
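# Install the packages used in this analysis (needed only once per R installation)
install.packages(c("tidyverse", "lubridate", "dplyr", "ggplot2",
                   "tidyr", "here", "skimr", "janitor"))
# Load the packages. tidyverse already attaches dplyr, ggplot2 and tidyr,
# so the separate library() calls for them are redundant but harmless.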
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
library(skimr)
library(janitor)
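Calling `install.packages()` at the top of a script re-downloads every package each time it runs. A common alternative, offered here as a suggestion rather than part of the original workflow, is to install only the packages that are not already available. `missing_packages()` and `install_if_missing()` are hypothetical helper names; `requireNamespace()` and `install.packages()` are standard base R functions.

```r
# Return the packages in `pkgs` that cannot be loaded in this R session
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Hypothetical helper: install only what is missing and return those names invisibly
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

For this analysis the call would be `install_if_missing(c("tidyverse", "here", "skimr", "janitor"))`, since dplyr, ggplot2, tidyr and lubridate are installed along with the tidyverse.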
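# Read the three cleaned CSV files exported from Excel (read_csv() comes from readr, part of the tidyverse)
activity <- read_csv("/cloud/lib/dailyActivity_cleaned.csv")
sleep <- read_csv("/cloud/lib/sleep_day_cleaned.csv")
weight <- read_csv("/cloud/lib/weightLogInfo_cleaned.csv")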
head(activity)
## # A tibble: 6 × 17
##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
##        <dbl> <chr>             <dbl>         <dbl>           <dbl>
## 1 1503960366 04/12/16          13162          8.5             8.5
## 2 1503960366 04/13/16          10735          6.97            6.97
## 3 1503960366 04/14/16          10460          6.74            6.74
## 4 1503960366 04/15/16           9762          6.28            6.28
## 5 1503960366 04/16/16          12669          8.16            8.16
## 6 1503960366 04/17/16           9705          6.48            6.48
## # ℹ 12 more variables: LoggedActivitiesDistance <dbl>,
## # VeryActiveDistance <dbl>, ModeratelyActiveDistance <dbl>,
## # LightActiveDistance <dbl>, SedentaryActiveDistance <dbl>,
## # VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## # LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>,
## # LightHours <dbl>, SedHours <dbl>
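The `head()` output shows that `ActivityDate` was read as text (`<chr>`), even though it was date-formatted in Excel: the CSV stores it as `MM/DD/YY`, which `read_csv()` did not parse as a date here. Converting it to a real `Date` makes sorting, filtering by day and plotting over time behave correctly. A minimal sketch on three rows copied from the output above; `lubridate::mdy()` would give the same result as the base `as.Date()` call used here.

```r
library(dplyr)

# Rows copied from the head(activity) output; the real table has many more columns
activity_demo <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  ActivityDate = c("04/12/16", "04/13/16", "04/14/16"),
  TotalSteps = c(13162, 10735, 10460)
)

activity_demo <- activity_demo %>%
  # "%m/%d/%y" = month/day/two-digit year; base R reads "16" as 2016
  mutate(ActivityDate = as.Date(ActivityDate, format = "%m/%d/%y"))

str(activity_demo)
```

The same format applies to `SleepDay` and `Date` in the other two tables. Alternatively, the format can be supplied when reading, for example `read_csv(..., col_types = cols(ActivityDate = col_date("%m/%d/%y")))`.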
head(sleep)
## # A tibble: 6 × 7
##           Id SleepDay TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
##        <dbl> <chr>                <dbl>              <dbl>          <dbl>
## 1 1503960366 04/12/16                 1                327            346
## 2 1503960366 04/13/16                 2                384            407
## 3 1503960366 04/15/16                 1                412            442
## 4 1503960366 04/16/16                 2                340            367
## 5 1503960366 04/17/16                 1                700            712
## 6 1503960366 04/19/16                 1                304            320
## # ℹ 2 more variables: TotalHoursAsleep <dbl>, TotalHoursInBed <dbl>
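Each sleep row appears to summarize one user's sleep on one day: a day with two sleep records shows `TotalSleepRecords` of 2 in a single row rather than two rows. Removing exact duplicates does not by itself guarantee one row per user per day, because two rows for the same day with slightly different values would both survive. A sketch of a key check on hypothetical rows (the 380-minute row is invented for illustration):

```r
library(dplyr)

# Hypothetical sleep rows: two different rows share the same user and day
sleep_demo <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  SleepDay = c("04/12/16", "04/13/16", "04/13/16"),
  TotalMinutesAsleep = c(327, 384, 380)
)

nrow(distinct(sleep_demo))  # 3: the rows differ, so distinct() keeps both 04/13/16 rows

# User/day pairs that occur more than once
duplicate_keys <- sleep_demo %>%
  count(Id, SleepDay) %>%
  filter(n > 1)

print(duplicate_keys)
```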
head(weight)
## # A tibble: 6 × 8
##           Id Date     WeightKg WeightPounds   Fat   BMI IsManualReport   LogId
##        <dbl> <chr>       <dbl>        <dbl> <dbl> <dbl> <lgl>            <dbl>
## 1 1503960366 05/02/16     52.6         116.    22  22.6 TRUE           1.46e12
## 2 1503960366 05/03/16     52.6         116.    NA  22.6 TRUE           1.46e12
## 3 1927972279 04/13/16    134.          294.    NA  47.5 FALSE          1.46e12
## 4 2873212765 04/21/16     56.7         125     NA  21.4 TRUE           1.46e12
## 5 2873212765 05/12/16     57.3         126.    NA  21.7 TRUE           1.46e12
## 6 4319703577 04/17/16     72.4         160.    25  27.4 TRUE           1.46e12
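Four of the six `Fat` values above are already `NA`, so it is worth counting missing values per column before relying on a field. A minimal base R sketch using the `Id` and `Fat` values copied from the rows above:

```r
# Id and Fat values copied from the head(weight) output
weight_demo <- data.frame(
  Id  = c(1503960366, 1503960366, 1927972279, 2873212765, 2873212765, 4319703577),
  Fat = c(22, NA, NA, NA, NA, 25)
)

# Number of missing values in each column
na_counts <- colSums(is.na(weight_demo))
print(na_counts)
```

On the full table the same call is `colSums(is.na(weight))`.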
n_distinct(activity$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 25
n_distinct(weight$Id)
## [1] 8
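The three tables cover different numbers of participants: 33 in activity, 25 in sleep and 8 in weight. Before comparing across tables, it helps to know how many participants in each table also appear in the activity table. A sketch with hypothetical IDs; on the real data the vectors would be `activity$Id`, `sleep$Id` and `weight$Id`.

```r
# Hypothetical participant IDs standing in for activity$Id, sleep$Id and weight$Id
activity_ids <- c(1, 2, 3, 4, 5, 5, 6)
sleep_ids    <- c(1, 2, 2, 4)
weight_ids   <- c(2, 6, 7)

# Distinct participants per table, and how many of them also appear in activity
coverage <- data.frame(
  table = c("activity", "sleep", "weight"),
  users = c(length(unique(activity_ids)),
            length(unique(sleep_ids)),
            length(unique(weight_ids))),
  also_in_activity = c(length(unique(activity_ids)),
                       length(intersect(sleep_ids, activity_ids)),
                       length(intersect(weight_ids, activity_ids)))
)

print(coverage)
```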
activity %>%
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes,
         VeryActiveMinutes,
         LightlyActiveMinutes,
         Calories) %>%
  summary()
##    TotalSteps     TotalDistance   SedentaryMinutes VeryActiveMinutes
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0   Min.   :  0.00
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8   1st Qu.:  0.00
##  Median : 7406   Median : 5.245   Median :1057.5   Median :  4.00
##  Mean   : 7638   Mean   : 5.490   Mean   : 991.2   Mean   : 21.16
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5   3rd Qu.: 32.00
##  Max.   :36019   Max.   :28.030   Max.   :1440.0   Max.   :210.00
##  LightlyActiveMinutes    Calories
##  Min.   :  0.0        Min.   :   0
##  1st Qu.:127.0        1st Qu.:1828
##  Median :199.0        Median :2134
##  Mean   :192.8        Mean   :2304
##  3rd Qu.:264.0        3rd Qu.:2793
##  Max.   :518.0        Max.   :4900
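The summary includes days with 0 steps, 0 calories, and 1440 sedentary minutes, which is the entire day (24 × 60 = 1440). One possible reading, and it is only an assumption, is that the tracker was not worn on some of those days, which would pull the step and activity averages down. A sketch that counts such days on hypothetical rows:

```r
library(dplyr)

# Hypothetical daily rows; the second looks like a day the device was not worn
activity_demo <- data.frame(
  Id = c(1, 1, 2, 2),
  TotalSteps = c(13162, 0, 0, 7406),
  SedentaryMinutes = c(728, 1440, 1200, 1057)
)

# Count zero-step days, fully sedentary days, and days that are both
possible_non_wear <- activity_demo %>%
  summarise(
    zero_step_days = sum(TotalSteps == 0),
    all_sedentary_days = sum(SedentaryMinutes == 1440),
    both = sum(TotalSteps == 0 & SedentaryMinutes == 1440)
  )

print(possible_non_wear)
```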
sleep %>%
  select(TotalHoursAsleep,
         TotalHoursInBed,
         TotalSleepRecords) %>%
  summary()
##  TotalHoursAsleep TotalHoursInBed  TotalSleepRecords
##  Min.   : 0.970   Min.   : 1.020   Min.   :1.00
##  1st Qu.: 6.020   1st Qu.: 6.732   1st Qu.:1.00
##  Median : 7.210   Median : 7.720   Median :1.00
##  Mean   : 6.987   Mean   : 7.641   Mean   :1.12
##  3rd Qu.: 8.170   3rd Qu.: 8.770   3rd Qu.:1.00
##  Max.   :13.270   Max.   :16.020   Max.   :3.00
##  NA's   :3        NA's   :3        NA's   :3
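Each sleep column has 3 `NA` values. They may be the same three rows, for example blank rows left over from the Excel export, but that is an assumption worth checking. `summary()` skips them, yet they can still affect row counts and joins, so they can be dropped explicitly. A sketch on hypothetical rows:

```r
library(dplyr)

# Hypothetical sleep rows, including one fully blank row
sleep_demo <- data.frame(
  Id = c(1503960366, NA, 1503960366),
  TotalHoursAsleep = c(5.45, NA, 6.4),
  TotalHoursInBed = c(5.77, NA, 6.78)
)

# Keep only rows where both sleep measurements are present
sleep_complete <- sleep_demo %>%
  filter(!is.na(TotalHoursAsleep), !is.na(TotalHoursInBed))

print(nrow(sleep_complete))
```

`tidyr::drop_na()` without column arguments does the same across every column.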
weight %>%
  select(WeightPounds,
         BMI) %>%
  summary()
##   WeightPounds         BMI
##  Min.   :116.0   Min.   :21.45
##  1st Qu.:135.4   1st Qu.:23.96
##  Median :137.8   Median :24.39
##  Mean   :158.8   Mean   :25.19
##  3rd Qu.:187.5   3rd Qu.:25.56
##  Max.   :294.3   Max.   :47.54
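These statistics are computed over individual weight logs, not over the 8 users, so a user who logs many times counts more than one who logs once. The mean (158.8 lb) sitting well above the median (137.8 lb), with a maximum of 294.3 lb, also suggests a few heavy readings pull the average up. Averaging per user first gives each participant equal weight. A sketch on hypothetical logs (user labels and values are invented):

```r
library(dplyr)

# Hypothetical weight logs: user "A" logged four times, users "B" and "C" once each
weight_demo <- data.frame(
  Id = c("A", "A", "A", "A", "B", "C"),
  WeightPounds = c(135, 136, 137, 138, 294, 160)
)

# Average per user first, keeping the number of logs for reference
per_user <- weight_demo %>%
  group_by(Id) %>%
  summarise(mean_weight = mean(WeightPounds), logs = n())

mean(weight_demo$WeightPounds)  # mean over all logs
mean(per_user$mean_weight)      # mean over users
```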