Introduction

Bellabeat is a successful small company that develops high-tech, health-focused products for women. It has the potential to become a larger player in the smart device market, so Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, asked the data analytics team to analyze smart device data in order to find new growth opportunities for the company. In addition, the team was asked to focus on one of the five Bellabeat products. Insights gathered from this analysis will help guide marketing strategy. The analysis and high-level recommendations will be presented to the executive team.

Characters

  • Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
  • Sando Mur: Mathematician and Bellabeat cofounder; key member of the Bellabeat executive team
  • Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Products

  • Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
  • Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
  • Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
  • Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
  • Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

Ask

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat marketing strategy?

Prepare

Urška Sršen suggested the following dataset and disclosed that it might have some limitations: FitBit Fitness Tracker Data (CC0: Public Domain, dataset made available through Mobius). This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits.

Because the dailyActivity_merged file already includes daily steps and calories, I did not import the separate daily steps and calories data.

Install Packages and dataset in R

install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
install.packages("readr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(readr)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(dplyr)
install.packages("tidyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(tidyr)
install.packages("janitor")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
install.packages("skimr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(skimr)
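
The calls above reinstall every package on each run, and readr, dplyr, tidyr, and ggplot2 are already attached by library(tidyverse), as the attach message shows. A minimal sketch of an install-only-if-missing pattern follows; `missing_packages` is a hypothetical helper name, not part of the original analysis, and the install call is left commented out so the sketch has no side effects.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages this analysis relies on (tidyverse already bundles readr, dplyr, tidyr, ggplot2).
needed <- c("tidyverse", "janitor", "skimr")
to_install <- missing_packages(needed)

# Install only what is missing.
# if (length(to_install) > 0) install.packages(to_install)
print(to_install)
```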

Import Data set

daily_activity <- read_csv("dailyActivity_merged - dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, Very Active Distan...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
weight_info <- read_csv("dailyActivity_merged - weightLogInfo_merged.csv")
## New names:
## Rows: 67 Columns: 10
## ── Column specification
## ──────────────────────────────────────────────────────── Delimiter: "," chr
## (2): Date...2, Date...10 dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport time (1): Time
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
## Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `Date` -> `Date...2`
## • `Date` -> `Date...10`
sleep <- read_csv("dailyActivity_merged - sleepDay_merged.csv")
## Rows: 410 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): Sleep Day, SleepDay
## dbl  (5): Id, TotalSleepRecords, TotalMinutesAsleep, Hours in Bed, TotalTime...
## time (1): Sleep Time
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(daily_activity)
## Rows: 940
## Columns: 15
## $ Id                           <dbl> 1503960366, 1503960366, 1503960366, 15039…
## $ ActivityDate                 <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4…
## $ TotalSteps                   <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 1…
## $ TotalDistance                <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59,…
## $ TrackerDistance              <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59,…
## $ `Very Active Distance`       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25,…
## $ `Moderately Active Distance` <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64,…
## $ `Light Active Distance`      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71,…
## $ `Sedentary Active Distance`  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ `Very Active Minutes`        <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 6…
## $ `Fairly Active Minutes`      <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27…
## $ `Lightly Active Minutes`     <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 2…
## $ `Sedentary Minutes`          <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775,…
## $ Calories                     <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921,…
## $ `Logged Activities Distance` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
glimpse(weight_info)
## Rows: 67
## Columns: 10
## $ Id             <dbl> 1503960366, 1503960366, 1927972279, 2873212765, 2873212…
## $ Date...2       <chr> "5/2/2016", "5/3/2016", "4/13/2016", "4/21/2016", "5/12…
## $ Time           <time> 23:59:59, 23:59:59, 01:08:52, 23:59:59, 23:59:59, 23:5…
## $ WeightKg       <dbl> 52.6, 52.6, 133.5, 56.7, 57.3, 72.4, 72.3, 69.7, 70.3, …
## $ WeightPounds   <dbl> 115.9631, 115.9631, 294.3171, 125.0021, 126.3249, 159.6…
## $ Fat            <dbl> 22, NA, NA, NA, NA, 25, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ BMI            <dbl> 22.65, 22.65, 47.54, 21.45, 21.69, 27.45, 27.38, 27.25,…
## $ IsManualReport <lgl> TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, …
## $ LogId          <dbl> 1.462234e+12, 1.462320e+12, 1.460510e+12, 1.461283e+12,…
## $ Date...10      <chr> "5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM", "4/13/2…
glimpse(sleep)
## Rows: 410
## Columns: 8
## $ Id                 <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150…
## $ `Sleep Day`        <chr> "4/12/2016", "4/13/2016", "4/15/2016", "4/16/2016",…
## $ `Sleep Time`       <time> 00:00:00, 00:00:00, 00:00:00, 00:00:00, 00:00:00, …
## $ TotalSleepRecords  <dbl> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2…
## $ `Hours in Bed`     <dbl> 5.450000, 6.400000, 6.866667, 5.666667, 11.666667, …
## $ TotalTimeInBed     <dbl> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3…
## $ SleepDay           <chr> "4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM", "…

Process

Disclaimer: Due to technical difficulties in R, most of the cleaning was done in a Google spreadsheet. The following was done to the daily_activity, sleep, and weight data in the spreadsheet:

  • Checked for and removed duplicates
  • Adjusted the format
  • Checked for blanks
  • Separated the Sleep Day column into “Sleep Day” and “Sleep Time”
  • Separated the Date column into Time and Date
  • Made times and dates consistent
  • Trimmed white space
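
For comparison, the spreadsheet steps listed above could also be carried out in R. The following is only a sketch in base R on toy rows (the ids are made up, and `raw_sleep`, `sleep_clean`, `sleep_date`, and `sleep_time` are illustrative names); it is not the cleaning that was actually performed. Note that parsing AM/PM with %p depends on the session locale.

```r
# Toy rows mimicking the raw sleep file: one exact duplicate, one value with stray spaces.
raw_sleep <- data.frame(
  Id = c(1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 AM", " 4/13/2016 12:00:00 AM "),
  TotalMinutesAsleep = c(327, 327, 384),
  stringsAsFactors = FALSE
)

# Trim white space in the text column.
raw_sleep$SleepDay <- trimws(raw_sleep$SleepDay)

# Remove exact duplicate rows.
sleep_clean <- unique(raw_sleep)

# Parse the timestamp, then split it into separate date and time columns.
stamp <- as.POSIXct(sleep_clean$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
sleep_clean$sleep_date <- as.Date(stamp)
sleep_clean$sleep_time <- format(stamp, "%H:%M:%S")

# Check for blanks (missing values) in each column.
print(colSums(is.na(sleep_clean)))
print(sleep_clean)
```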

In R, I will correct the column name conventions.

clean_daily_activity <- clean_names(daily_activity)
clean_weight_info <- clean_names(weight_info)
clean_sleep <- clean_names(sleep)

I checked to make sure the column names were corrected.

colnames(clean_daily_activity)
##  [1] "id"                         "activity_date"             
##  [3] "total_steps"                "total_distance"            
##  [5] "tracker_distance"           "very_active_distance"      
##  [7] "moderately_active_distance" "light_active_distance"     
##  [9] "sedentary_active_distance"  "very_active_minutes"       
## [11] "fairly_active_minutes"      "lightly_active_minutes"    
## [13] "sedentary_minutes"          "calories"                  
## [15] "logged_activities_distance"
colnames(clean_weight_info)
##  [1] "id"               "date_2"           "time"             "weight_kg"       
##  [5] "weight_pounds"    "fat"              "bmi"              "is_manual_report"
##  [9] "log_id"           "date_10"
colnames(clean_sleep)
## [1] "id"                   "sleep_day"            "sleep_time"          
## [4] "total_sleep_records"  "total_minutes_asleep" "hours_in_bed"        
## [7] "total_time_in_bed"    "sleep_day_2"

Analyze & Share

I will look at the averages of the data to find trends. I will also join the data frames that contain data I want to visualize.

mean_daily_activity <- clean_daily_activity %>% 
  group_by(id) %>% 
  summarise(
    mean_very_active_minutes = mean(very_active_minutes),
    mean_lightly_active_minutes = mean(lightly_active_minutes),
    mean_fairly_active_minutes = mean(fairly_active_minutes),
    mean_sedentary_minutes = mean(sedentary_minutes),
    mean_total_steps = mean(total_steps),
    mean_calories = mean(calories)
  )

head(mean_daily_activity)
## # A tibble: 6 × 7
##           id mean_very_activ… mean_lightly_ac… mean_fairly_act… mean_sedentary_…
##        <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
## 1 1503960366           38.7              220.            19.2               848.
## 2 1624580081            8.68             153.             5.81             1258.
## 3 1644430081            9.57             178.            21.4              1162.
## 4 1844505072            0.129            115.             1.29             1207.
## 5 1927972279            1.32              38.6            0.774            1317.
## 6 2022484408           36.3              257.            19.4              1113.
## # … with 2 more variables: mean_total_steps <dbl>, mean_calories <dbl>
mean_weight_info <- clean_weight_info %>% 
  group_by(id) %>% 
  summarise(mean_weight_pounds = mean(weight_pounds), mean_bmi = mean(bmi))

head(mean_weight_info)
## # A tibble: 6 × 3
##           id mean_weight_pounds mean_bmi
##        <dbl>              <dbl>    <dbl>
## 1 1503960366               116.     22.6
## 2 1927972279               294.     47.5
## 3 2873212765               126.     21.6
## 4 4319703577               160.     27.4
## 5 4558609924               154.     27.2
## 6 5577150313               200.     28
mean_sleep <- clean_sleep %>%
  group_by(id) %>% 
  summarise(mean_total_time_in_bed = mean(total_time_in_bed), mean_total_hours_in_bed = mean(hours_in_bed))

head(mean_sleep)
## # A tibble: 6 × 3
##           id mean_total_time_in_bed mean_total_hours_in_bed
##        <dbl>                  <dbl>                   <dbl>
## 1 1503960366                   383.                    6.00
## 2 1644430081                   346                     4.90
## 3 1844505072                   961                    10.9 
## 4 1927972279                   438.                    6.95
## 5 2026352035                   538.                    8.44
## 6 2320127002                    69                     1.02
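
The hours column in the output above comes from a minutes-to-hours conversion. A short sketch, using the first five total_minutes_asleep values from the glimpse(sleep) output, shows that dividing by 60 reproduces the corresponding hours_in_bed values (5.45, 6.4, 6.866667, ...):

```r
# First five total_minutes_asleep values shown in the glimpse(sleep) output.
total_minutes_asleep <- c(327, 384, 412, 340, 700)

# Convert minutes to hours.
hours <- total_minutes_asleep / 60
print(round(hours, 6))
```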

Despite its name, hours_in_bed is a conversion made in the spreadsheet that divides total_minutes_asleep by 60 to make it easier to read, so it measures hours asleep. The total_time_in_bed column remains in minutes.

Merge all data to visualize correlating columns.

total_activity <- merge(x=mean_daily_activity, y=mean_weight_info, all = TRUE)
head(total_activity)
##           id mean_very_active_minutes mean_lightly_active_minutes
## 1 1503960366               38.7096774                   219.93548
## 2 1624580081                8.6774194                   153.48387
## 3 1644430081                9.5666667                   178.46667
## 4 1844505072                0.1290323                   115.45161
## 5 1927972279                1.3225806                    38.58065
## 6 2022484408               36.2903226                   257.45161
##   mean_fairly_active_minutes mean_sedentary_minutes mean_total_steps
## 1                 19.1612903               848.1613        12116.742
## 2                  5.8064516              1257.7419         5743.903
## 3                 21.3666667              1161.8667         7282.967
## 4                  1.2903226              1206.6129         2580.065
## 5                  0.7741935              1317.4194          916.129
## 6                 19.3548387              1112.5806        11370.645
##   mean_calories mean_weight_pounds mean_bmi
## 1      1816.419           115.9631    22.65
## 2      1483.355                 NA       NA
## 3      2811.300                 NA       NA
## 4      1573.484                 NA       NA
## 5      2172.806           294.3171    47.54
## 6      2509.968                 NA       NA
total_activity <- merge(x=total_activity, y=mean_sleep, all = TRUE)
head(total_activity)
##           id mean_very_active_minutes mean_lightly_active_minutes
## 1 1503960366               38.7096774                   219.93548
## 2 1624580081                8.6774194                   153.48387
## 3 1644430081                9.5666667                   178.46667
## 4 1844505072                0.1290323                   115.45161
## 5 1927972279                1.3225806                    38.58065
## 6 2022484408               36.2903226                   257.45161
##   mean_fairly_active_minutes mean_sedentary_minutes mean_total_steps
## 1                 19.1612903               848.1613        12116.742
## 2                  5.8064516              1257.7419         5743.903
## 3                 21.3666667              1161.8667         7282.967
## 4                  1.2903226              1206.6129         2580.065
## 5                  0.7741935              1317.4194          916.129
## 6                 19.3548387              1112.5806        11370.645
##   mean_calories mean_weight_pounds mean_bmi mean_total_time_in_bed
## 1      1816.419           115.9631    22.65                  383.2
## 2      1483.355                 NA       NA                     NA
## 3      2811.300                 NA       NA                  346.0
## 4      1573.484                 NA       NA                  961.0
## 5      2172.806           294.3171    47.54                  437.8
## 6      2509.968                 NA       NA                     NA
##   mean_total_hours_in_bed
## 1                6.004667
## 2                      NA
## 3                4.900000
## 4               10.866667
## 5                6.950000
## 6                      NA
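
The NA values in the merged output follow from all = TRUE, which makes merge() keep every id from both data frames (a full outer join) and fill the gaps with NA. A small sketch on toy data, where the ids and values are made up for illustration:

```r
# Toy per-user summaries; ids and values are made up for illustration.
activity <- data.frame(id = c(1, 2, 3), mean_total_steps = c(12000, 5700, 7300))
weight   <- data.frame(id = c(1, 3), mean_weight_pounds = c(116, 160))

# all = TRUE keeps every id from both sides; users without weight logs get NA.
full <- merge(x = activity, y = weight, all = TRUE)
print(full)

# The default (all = FALSE) keeps only ids present in both data frames.
inner <- merge(x = activity, y = weight)
print(inner)
```

Keeping all users preserves the activity data for people who never logged weight or sleep, at the cost of ggplot2 dropping those rows in any plot that uses the missing columns.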

Install ggplot2 to visualize the data. Insights will follow each data visualization displayed.

install.packages("ggplot2")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
library(ggplot2)
ggplot(data = total_activity) + 
   geom_point(mapping = aes(x=mean_very_active_minutes, y=mean_calories))+
  geom_smooth(mapping = aes(x=mean_very_active_minutes, y=mean_calories))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = total_activity) +
  geom_point(mapping = aes(x=mean_lightly_active_minutes, y=mean_calories))+
  geom_smooth(mapping = aes(x=mean_lightly_active_minutes, y=mean_calories))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = total_activity) +
  geom_point(mapping = aes(x=mean_fairly_active_minutes, y=mean_calories))+
  geom_smooth(mapping = aes(x=mean_fairly_active_minutes, y=mean_calories))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = total_activity) +
  geom_point(mapping = aes(x=mean_sedentary_minutes, y=mean_calories))+
  geom_smooth(mapping = aes(x=mean_sedentary_minutes, y=mean_calories))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

When the type of minutes recorded had an impact on how much calories were burned. Fairly active minutes, and lightly active minutes had more of an impact on calories burned, than very active minutes. This could help inform the user of what types of exercises that they should do more often to burn more calories.
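
One way to put numbers on the relationships read from the plots is a Pearson correlation between each minutes column and mean_calories. The sketch below uses only the six users shown in the head(total_activity) output, so its results are illustrative and say nothing reliable about the full data set; on the real data the same call would be applied to total_activity. Pearson correlation also captures linear association only.

```r
# The six users shown in head(total_activity); illustration only, not the full data.
first_six <- data.frame(
  mean_very_active_minutes    = c(38.7096774, 8.6774194, 9.5666667, 0.1290323, 1.3225806, 36.2903226),
  mean_lightly_active_minutes = c(219.93548, 153.48387, 178.46667, 115.45161, 38.58065, 257.45161),
  mean_fairly_active_minutes  = c(19.1612903, 5.8064516, 21.3666667, 1.2903226, 0.7741935, 19.3548387),
  mean_calories               = c(1816.419, 1483.355, 2811.300, 1573.484, 2172.806, 2509.968)
)

minute_cols <- c("mean_very_active_minutes",
                 "mean_lightly_active_minutes",
                 "mean_fairly_active_minutes")

# Correlation of each minutes column with mean calories.
cors <- sapply(first_six[minute_cols], cor, y = first_six$mean_calories)
print(round(cors, 2))
```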

ggplot(data = total_activity) +
  geom_point(mapping = aes(x=mean_total_steps, y=mean_calories))+
  geom_smooth(mapping = aes(x=mean_total_steps, y=mean_calories))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = total_activity) +
  geom_point(mapping = aes(x=mean_weight_pounds, y=mean_calories))+
  geom_smooth(mapping = aes(x=mean_weight_pounds, y=mean_calories))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 25 rows containing non-finite values (stat_smooth).
## Warning: Removed 25 rows containing missing values (geom_point).

The total calories a user burned correlated with the number of steps they took: the more steps recorded, the more calories burned.

Twenty-five users did not record their weight (the plot warnings above report 25 rows removed), but I wanted to check, with the data that was available, whether there was a correlation between calories burned and weight. There is a slight correlation: the lower the calories, the lower the weight.
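
The number of users without weight data can be counted directly from the merged table rather than by hand. A sketch using the first six mean_weight_pounds values from the head(total_activity) output; on the full data the same calls would be applied to total_activity$mean_weight_pounds:

```r
# mean_weight_pounds for the six users shown in head(total_activity).
mean_weight_pounds <- c(115.9631, NA, NA, NA, 294.3171, NA)

# Count users with and without a recorded weight.
n_missing <- sum(is.na(mean_weight_pounds))
n_logged  <- sum(!is.na(mean_weight_pounds))
print(c(missing = n_missing, logged = n_logged))
```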

ggplot(data = total_activity) +
  geom_point(mapping = aes(x=mean_total_time_in_bed, y=mean_total_hours_in_bed))+
  geom_smooth(mapping = aes(x=mean_total_time_in_bed, y=mean_total_hours_in_bed))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 9 rows containing non-finite values (stat_smooth).
## Warning: Removed 9 rows containing missing values (geom_point).

ggplot(data = total_activity) +
  geom_point(mapping = aes(x=mean_total_steps, y=mean_total_hours_in_bed))+
  geom_smooth(mapping = aes(x=mean_total_steps, y=mean_total_hours_in_bed))
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## Warning: Removed 9 rows containing non-finite values (stat_smooth).
## Warning: Removed 9 rows containing missing values (geom_point).

As expected, users spend most of their time in bed asleep rather than lying awake.

When comparing total steps to total hours in bed, users with more sleep tend to record more steps.
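
The share of time in bed that is spent asleep can be made explicit as a ratio of minutes asleep to minutes in bed. A sketch using the first five total_minutes_asleep and total_time_in_bed values from the glimpse(sleep) output; `sleep_share` is an illustrative name, not a column in the data:

```r
# First five values of each column from the glimpse(sleep) output (both in minutes).
total_minutes_asleep <- c(327, 384, 412, 340, 700)
total_time_in_bed    <- c(346, 407, 442, 367, 712)

# Fraction of time in bed spent asleep, per record.
sleep_share <- total_minutes_asleep / total_time_in_bed
print(round(sleep_share, 3))
```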

Act

Smart devices are able to track a range of data that is beneficial to the user’s health and wellness. The trends noted in the Analyze & Share section will help users make more conscious decisions about when they should exercise, how effective high-intensity workouts are compared with low-intensity workouts, and weight loss. Although all of Bellabeat’s products would be beneficial to users, I believe, based on my analysis, that the Bellabeat app would be the right product to market, as it can be used with all devices. I suggest adding a reminder feature for users to input their weight in pounds or kilograms (depending on their preference), as women tend to focus on losing weight as their main reason for using devices like these.

This is my first data analytics case study. I have wrestled with this project, and I am open to the many critiques that more experienced data scientists and analysts might have for me. Please let me know how I can improve.