Bellabeat is a successful small company that develops high-tech, health-focused products for women. It has the potential to become a larger player in the smart device market, so Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, asked the data analytics team to analyze smart device data in order to find new growth opportunities for the company. The team was also asked to focus on one of the five Bellabeat products. Insights from this analysis will help guide marketing strategy, and the analysis and high-level recommendations will be presented to the executive team.
Urška Sršen suggested using the following data set and disclosed that it might have some limitations: FitBit Fitness Tracker Data (CC0: Public Domain, made available through Mobius). This Kaggle data set contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.
Because the merged daily activity file already includes daily steps and calories, I did not import those files separately.

### Install Packages and Data Sets in R
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
install.packages("readr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(readr)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(dplyr)
install.packages("tidyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyr)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(janitor)
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
install.packages("skimr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(skimr)
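The attach message above shows that `tidyverse` already loads `readr`, `dplyr`, and `tidyr`, so the separate installs of those three are redundant, and rerunning `install.packages()` on every knit is slow. A minimal base R sketch of installing only what is missing follows; the helper names `missing_packages()` and `install_if_missing()` are hypothetical and not part of any package.

```r
# Packages this notebook loads; tidyverse already covers readr, dplyr, and tidyr.
required <- c("tidyverse", "janitor", "skimr")

# Hypothetical helper: return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the missing packages and return their names.
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Inspect what would be installed without touching the library.
print(missing_packages(required))
```

Calling `install_if_missing(required)` once, followed by the `library()` calls, would stand in for the six separate `install.packages()` lines.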
Import Data set
daily_activity <- read_csv("dailyActivity_merged - dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, Very Active Distan...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weight_info <- read_csv("dailyActivity_merged - weightLogInfo_merged.csv")
## New names:
## Rows: 67 Columns: 10
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (2): Date...2, Date...10 dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport time (1): Time
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `Date` -> `Date...2`
## • `Date` -> `Date...10`
sleep <- read_csv("dailyActivity_merged - sleepDay_merged.csv")
## Rows: 410 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Sleep Day, SleepDay
## dbl (5): Id, TotalSleepRecords, TotalMinutesAsleep, Hours in Bed, TotalTime...
## time (1): Sleep Time
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(daily_activity)
## Rows: 940
## Columns: 15
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 15039…
## $ ActivityDate <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4…
## $ TotalSteps <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 1…
## $ TotalDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59,…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59,…
## $ `Very Active Distance` <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25,…
## $ `Moderately Active Distance` <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64,…
## $ `Light Active Distance` <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71,…
## $ `Sedentary Active Distance` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Very Active Minutes` <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 6…
## $ `Fairly Active Minutes` <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27…
## $ `Lightly Active Minutes` <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 2…
## $ `Sedentary Minutes` <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775,…
## $ Calories <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921,…
## $ `Logged Activities Distance` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
glimpse(weight_info)
## Rows: 67
## Columns: 10
## $ Id <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212…
## $ Date...2 <chr> "5/2/2016", "5/3/2016", "4/13/2016", "4/21/2016", "5/12…
## $ Time <time> 23:59:59, 23:59:59, 01:08:52, 23:59:59, 23:59:59, 23:5…
## $ WeightKg <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, …
## $ WeightPounds <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159.6…
## $ Fat <dbl> 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BMI <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25,…
## $ IsManualReport <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
## $ LogId <dbl> 1.462234e+12, 1.462320e+12, 1.460510e+12, 1.461283e+12,…
## $ Date...10 <chr> "5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM", "4/13/2…
glimpse(sleep)
## Rows: 410
## Columns: 8
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150…
## $ `Sleep Day` <chr> "4/12/2016", "4/13/2016", "4/15/2016", "4/16/2016",…
## $ `Sleep Time` <time> 00:00:00, 00:00:00, 00:00:00, 00:00:00, 00:00:00, …
## $ TotalSleepRecords <dbl> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2…
## $ `Hours in Bed` <dbl> 5.450000, 6.400000, 6.866667, 5.666667, 11.666667, …
## $ TotalTimeInBed <dbl> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3…
## $ SleepDay <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "…
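The `glimpse()` output shows that the date columns (`ActivityDate`, `Sleep Day`, `Date...2`) were read as character strings in month/day/year order, while `Date...10` and `SleepDay` carry a 12-hour timestamp. A minimal base R sketch of converting such strings to proper date types follows. The sample values are copied from the output above, and the UTC time zone is an assumption, since the data set's time zone is not stated.

```r
# Sample values copied from the glimpse() output above.
activity_date <- c("4/12/2016", "4/13/2016", "4/14/2016")
weight_stamp  <- c("5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM")

# Month/day/year strings -> Date.
parsed_date <- as.Date(activity_date, format = "%m/%d/%Y")

# 12-hour timestamps -> POSIXct; %I is the 12-hour clock and %p is AM/PM.
# tz = "UTC" is an assumption. Note that %p parsing depends on the session locale.
parsed_stamp <- as.POSIXct(weight_stamp,
                           format = "%m/%d/%Y %I:%M:%S %p",
                           tz = "UTC")

print(parsed_date)
print(parsed_stamp)

# Strings that do not match the format become NA, which makes bad rows easy to count.
n_unparsed <- sum(is.na(parsed_date))
print(n_unparsed)
```

With real dates in place, the columns sort chronologically and can be grouped by weekday, which character strings do not allow.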
## Process

Disclaimer: Due to technical difficulties in R, most of the cleaning was done in a Google spreadsheet. The following steps were applied there to the daily_activity, sleep, and weight data:

- Checked for and removed duplicate rows
- Adjusted the formatting
- Checked for blanks
- Separated the Sleep Day column into “Sleep Day” and “Sleep Time”
- Separated the Date column into Date and Time
- Made times and dates consistent
- Trimmed white space

In R, I standardized the column names with janitor's clean_names().
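The spreadsheet steps described above also have direct R equivalents, which would keep the cleaning reproducible alongside the rest of the notebook. A minimal sketch follows, using `dplyr` and `tidyr` (both already loaded through `tidyverse`). The `raw` data frame is made up for illustration and is not taken from the Fitbit files.

```r
library(dplyr)
library(tidyr)

# Illustrative rows only: one exact duplicate, stray white space, and a missing value.
raw <- tibble(
  id = c(1, 1, 2),
  sleep_day = c("4/12/2016 12:00:00 AM",
                "4/12/2016 12:00:00 AM",
                " 4/13/2016 12:00:00 AM "),
  total_minutes_asleep = c(327, 327, NA)
)

cleaned <- raw %>%
  # Trim leading and trailing white space in every character column.
  mutate(across(where(is.character), trimws)) %>%
  # Drop rows that are exact duplicates of an earlier row.
  distinct() %>%
  # Split at the first space only, so "12:00:00 AM" stays in one piece.
  separate(sleep_day, into = c("sleep_day", "sleep_time"),
           sep = " ", extra = "merge") %>%
  # Make the date a real Date value.
  mutate(sleep_day = as.Date(sleep_day, format = "%m/%d/%Y"))

# Count blanks (NA) per column.
blank_counts <- colSums(is.na(cleaned))

print(cleaned)
print(blank_counts)
```

Keeping these steps in code means the raw CSV files stay untouched and anyone rerunning the notebook gets the same cleaned tables.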
clean_daily_activity <- clean_names(daily_activity)
clean_weight_info <- clean_names(weight_info)
clean_sleep <- clean_names(sleep)
I then checked that the column names were corrected.
colnames(clean_daily_activity)
## [1] "id" "activity_date"
## [3] "total_steps" "total_distance"
## [5] "tracker_distance" "very_active_distance"
## [7] "moderately_active_distance" "light_active_distance"
## [9] "sedentary_active_distance" "very_active_minutes"
## [11] "fairly_active_minutes" "lightly_active_minutes"
## [13] "sedentary_minutes" "calories"
## [15] "logged_activities_distance"
colnames(clean_weight_info)
## [1] "id" "date_2" "time" "weight_kg"
## [5] "weight_pounds" "fat" "bmi" "is_manual_report"
## [9] "log_id" "date_10"
colnames(clean_sleep)
## [1] "id" "sleep_day" "sleep_time"
## [4] "total_sleep_records" "total_minutes_asleep" "hours_in_bed"
## [7] "total_time_in_bed" "sleep_day_2"
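Beyond reading the `colnames()` output by eye, the check can be made automatic. A minimal base R sketch follows, assuming that "corrected" means lowercase snake_case (lowercase letters, digits, and underscores only); the helper name `is_snake_case()` is hypothetical.

```r
# Hypothetical helper: TRUE for names that start with a lowercase letter and
# contain only lowercase letters, digits, and underscores.
is_snake_case <- function(x) {
  grepl("^[a-z][a-z0-9_]*$", x)
}

# Names copied from the output above, before and after clean_names().
before <- c("Id", "Sleep Day", "TotalMinutesAsleep", "Hours in Bed")
after  <- c("id", "sleep_day", "total_minutes_asleep", "hours_in_bed")

print(is_snake_case(before))
print(all(is_snake_case(after)))

# stopifnot() halts the script if any cleaned name still breaks the convention.
stopifnot(all(is_snake_case(after)))
```

In the notebook the same check could be written as `stopifnot(all(is_snake_case(colnames(clean_sleep))))`.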
Smart devices can track a range of data that benefits users' health and wellness. The trends noted in the Analysis and Share section will help users make more conscious decisions about when to exercise, how effective high-intensity workouts are compared with low-intensity workouts, and weight loss. Although all of Bellabeat's products would benefit users, based on my analysis I believe the Bellabeat app is the right product to market because it works on all devices. I suggest adding a reminder feature that prompts users to enter their weight in pounds or kilograms (depending on their preference), as women tend to focus on losing weight as their main reason for using devices like these.
This is my first data analytics case study. I have wrestled with this project, and I am open to any critiques that more experienced data scientists and analysts might have for me. Please let me know how I can improve.