Google Data Analysis Case Study: Bellabeat

Introduction

Bellabeat is a high-tech company that prides itself as ‘the go-to wellness brand for women with an ecosystem of products and services focused on women’s health’. They manufacture health-focused smart products that collect data on activity, sleep, stress and hydration levels as well as the reproductive health of women with the goal of empowering them with an understanding of their health and hitherto unknown habits.

Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to the company’s own e-commerce channel on its website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

Products

· Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

· Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

· Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.

· Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

· Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

It is envisaged that a focus on Bellabeat products and an analysis of the FitBit Fitness Tracker Data will help key stakeholders at Bellabeat gain insights into how people already use their smart devices and reveal more opportunities for growth.

To analyze these data, answer the key business questions and make recommendations, I will follow the key steps of the data analysis process: Ask, Prepare, Process, Analyze, Share and Act.

ASK

a. Business task

Analyze data from selected users of FitBit smart devices to draw useful insights into trends, patterns and relationships between health parameters, identify potential opportunities for growth, and make high-level marketing recommendations to Bellabeat’s marketing department.

b. Questions

What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?

c. Key Stakeholders:

· Urška Sršen — Bellabeat’s co-founder and Chief Creative Officer

· Sando Mur — Mathematician and Bellabeat’s co-founder

· Bellabeat marketing analytics team — A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

PREPARE

a. Data Source

For the purpose of this analysis, Bellabeat’s Chief Creative Officer, Urška Sršen, approved the use of a public dataset that explores smart device users’ daily habits: the FitBit Fitness Tracker Data.

The FitBit Fitness Tracker Data is a public domain dataset made available by Möbius under a CC0 license. The dataset comprises 18 .csv files containing personal fitness tracker data from thirty (30) FitBit users who consented to submit their personal data, including heart rate, sleep details, intensities, physical activity and other data needed to assess their habits.

b. Data Assessment for credibility & integrity

To determine the credibility, reliability and integrity of the dataset presented, I will utilize the ROCCC (Reliable, Original, Comprehensive, Current & Cited) data test model.

Reliability: (LOW) Only 30 individuals took part in this survey. This is a very small sample for far-reaching analysis and recommendations on the required business task.

Originality: (LOW) The data comes from a third-party survey distributed via Amazon Mechanical Turk.

Comprehensive: (MEDIUM) The data covers the parameters required for Bellabeat’s business task.

Current: (LOW) The dataset was collected in 2016 (over 8 years ago) and covers a short period, March to May 2016. In my opinion this data is stale, given how much health tracking methods have improved since then. Moreover, a 2-month collection window is too short for such dynamic data.

Cited: (MEDIUM) The third-party dataset was made available by Möbius via Kaggle.

I also observed some limitations in the data: it gives no information on key participant characteristics such as gender, age, location or lifestyle.

c. Data Selection

The 18 datasets were first opened in Excel for preliminary review, filtering and sorting to check for blanks, inconsistent naming conventions, missing data and possible duplication of data.

From the review, I observed that the data in the daily_calories, daily_intensities and daily_steps data frames is already contained in the daily_activity data frame. These three data frames are therefore treated as represented by daily_activity and left out of the analysis to avoid duplication.

I also noticed many empty cells in the Fat column of the weight_log_data data frame, so the Fat column was removed.
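The overlap described above can also be checked in R rather than by eye. The sketch below is only an illustration on made-up stand-in rows, not the real files. It assumes the column names reported in the import messages (ActivityDate and Calories in daily_activity, ActivityDay and Calories in daily_calories), and the helper name is_contained_in is hypothetical.

```r
library(dplyr)

# Toy stand-ins for the imported tables (values are illustrative only).
activity_toy <- tibble(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(13162, 10735, 9762),
  Calories = c(1985, 1797, 1745)
)
calories_toy <- tibble(
  Id = c(1, 1, 2),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016"),
  Calories = c(1985, 1797, 1745)
)

# Hypothetical helper: TRUE when every row of `small` has a matching row in `big`.
# `by` maps column names in `small` to the corresponding names in `big`.
is_contained_in <- function(small, big, by) {
  unmatched <- anti_join(small, big, by = by)
  nrow(unmatched) == 0
}

key_map <- c("Id" = "Id", "ActivityDay" = "ActivityDate", "Calories" = "Calories")
calories_is_redundant <- is_contained_in(calories_toy, activity_toy, by = key_map)
calories_is_redundant
```

If the check returns TRUE on the real tables, dropping the smaller table loses no information. The same pattern would apply to daily_intensities and daily_steps with their own column mappings.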

d. RStudio Cloud: Installation and Loading of packages

I will use RStudio Cloud for this analysis because of its wide range of functionality for data manipulation, cleaning, analysis and visualization. The key packages needed for the analysis are installed first.

install.packages("tidyverse")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("janitor")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("dplyr")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("ggplot2")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("readr")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("forcats")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("scales")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("lubridate")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages ("geosphere")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("plotrix")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("here")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages ("skimr")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)

install.packages("tidyr")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.2'
## (as 'lib' is unspecified)
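The separate install calls above can be collapsed into one step that installs only what is missing. The sketch below is one possible way to do that, not the method used in this report. It relies on the tidyverse attach message in this report, which lists ggplot2, dplyr, tidyr, readr and forcats as part of tidyverse; the helper name missing_packages is hypothetical.

```r
# Packages the analysis loads; tidyverse already brings in ggplot2, dplyr,
# tidyr, readr and forcats, so they are not listed separately.
needed <- c("tidyverse", "janitor", "scales", "lubridate",
            "geosphere", "plotrix", "here", "skimr")

# Hypothetical helper: return the packages from `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!installed])
}

to_install <- missing_packages(needed)

# Install only in an interactive session, and only what is missing.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Guarding the install with interactive() keeps a knitted report from reinstalling packages on every render.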

These installed packages are then loaded to use their functionalities.

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(dplyr)

library(ggplot2)

library(readr)

library(forcats)

library(scales)

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

library(lubridate)

## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(geosphere)

library(plotrix)

## 
## Attaching package: 'plotrix'
## 
## The following object is masked from 'package:scales':
## 
##     rescale

library(here)

## here() starts at /cloud/project

library(skimr)

library(tidyr)

e. Data Importation

The datasets are now imported into RStudio and given simplified names using the assignment operator.

daily_activity <- read_csv("dailyActivity.csv")

## Rows: 940 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

daily_calories <- read_csv("dailyCalories.csv")

## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, Calories
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

daily_intensities <- read_csv("dailyIntensities.csv")

## Rows: 940 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (9): Id, SedentaryMinutes, LightlyActiveMinutes, FairlyActiveMinutes, Ve...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

daily_steps <- read_csv("dailySteps.csv")

## Rows: 940 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDay
## dbl (2): Id, StepTotal
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

heart_rate <- read_csv("heartrate_seconds.csv")

## Rows: 1048575 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Time
## dbl (2): Id, Value
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

daily_sleep <- read_csv("sleepDay.csv")

## Rows: 413 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

weight_log_data <- read_csv("weightLogInfo.csv")

## Rows: 67 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Date
## dbl (5): Id, WeightKg, WeightPounds, BMI, LogId
## lgl (1): IsManualReport
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

minute_METs <- read_csv("minuteMETsNarrow.csv")

## Rows: 1048575 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityMinute
## dbl (2): Id, METs
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
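The import messages above show that Id is guessed as a number and every date column as text. One option, sketched below under the assumption that Id is only ever used as a label, is to declare Id as character at import and convert the date afterwards. The temporary file is a stand-in holding three rows copied from the daily_activity preview, so the example runs without the real data.

```r
library(readr)

# Write three rows in the layout of dailyActivity.csv to a temporary file
# (a stand-in so the example runs without the real data).
sample_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Id,ActivityDate,TotalSteps,Calories",
  "1503960366,4/12/2016,13162,1985",
  "1503960366,4/13/2016,10735,1797",
  "1503960366,4/14/2016,10460,1776"
), sample_path)

# Read Id as text (it is a label, not a quantity); other columns are guessed.
activity_sample <- read_csv(
  sample_path,
  col_types = cols(Id = col_character())
)

# Convert the date text with the same format string used later in the report.
activity_sample$ActivityDate <- as.Date(activity_sample$ActivityDate,
                                        format = "%m/%d/%Y")

activity_sample
```

Keeping Id as text avoids the scientific-notation display (1.50e9) seen in the previews and rules out accidental arithmetic on an identifier.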

PROCESS

Here, we will explore the data frames to find areas of commonality and confirm that the data were appropriately imported. Functions like head(), colnames(), glimpse() and str() will be used.

daily_activity:

head(daily_activity)

head(daily_activity)

## # A tibble: 6 × 15
##       Id Activ…¹ Total…² Total…³ Track…⁴ Logge…⁵ VeryA…⁶ Moder…⁷ Light…⁸ Seden…⁹
##    <dbl> <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 1.50e9 4/12/2…   13162    8.5     8.5        0    1.88   0.550    6.06       0
## 2 1.50e9 4/13/2…   10735    6.97    6.97       0    1.57   0.690    4.71       0
## 3 1.50e9 4/14/2…   10460    6.74    6.74       0    2.44   0.400    3.91       0
## 4 1.50e9 4/15/2…    9762    6.28    6.28       0    2.14   1.26     2.83       0
## 5 1.50e9 4/16/2…   12669    8.16    8.16       0    2.71   0.410    5.04       0
## 6 1.50e9 4/17/2…    9705    6.48    6.48       0    3.19   0.780    2.51       0
## # … with 5 more variables: VeryActiveMinutes <dbl>, FairlyActiveMinutes <dbl>,
## #   LightlyActiveMinutes <dbl>, SedentaryMinutes <dbl>, Calories <dbl>, and
## #   abbreviated variable names ¹ActivityDate, ²TotalSteps, ³TotalDistance,
## #   ⁴TrackerDistance, ⁵LoggedActivitiesDistance, ⁶VeryActiveDistance,
## #   ⁷ModeratelyActiveDistance, ⁸LightActiveDistance, ⁹SedentaryActiveDistance

colnames(daily_activity)

colnames(daily_activity)

##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"

glimpse(daily_activity)

glimpse(daily_activity)

## Rows: 940
## Columns: 15
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate             <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/…
## $ TotalSteps               <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
## $ TotalDistance            <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ LightActiveDistance      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes        <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ FairlyActiveMinutes      <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ LightlyActiveMinutes     <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ SedentaryMinutes         <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ Calories                 <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…

heart_rate:

head(heart_rate)

head(heart_rate)

## # A tibble: 6 × 3
##           Id Time           Value
##        <dbl> <chr>          <dbl>
## 1 2022484408 4/12/2016 7:21    97
## 2 2022484408 4/12/2016 7:21   102
## 3 2022484408 4/12/2016 7:21   105
## 4 2022484408 4/12/2016 7:21   103
## 5 2022484408 4/12/2016 7:21   101
## 6 2022484408 4/12/2016 7:22    95

colnames(heart_rate)

colnames(heart_rate)

## [1] "Id"    "Time"  "Value"

daily_sleep:

head(daily_sleep)

head(daily_sleep)

## # A tibble: 6 × 5
##           Id SleepDay       TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
##        <dbl> <chr>                      <dbl>              <dbl>          <dbl>
## 1 1503960366 4/12/2016 0:00                 1                327            346
## 2 1503960366 4/13/2016 0:00                 2                384            407
## 3 1503960366 4/15/2016 0:00                 1                412            442
## 4 1503960366 4/16/2016 0:00                 2                340            367
## 5 1503960366 4/17/2016 0:00                 1                700            712
## 6 1503960366 4/19/2016 0:00                 1                304            320

colnames(daily_sleep)

colnames(daily_sleep)

## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"

weight_log_data:

head(weight_log_data)

head(weight_log_data)

## # A tibble: 6 × 7
##           Id Date            WeightKg WeightPounds   BMI IsManualReport    LogId
##        <dbl> <chr>              <dbl>        <dbl> <dbl> <lgl>             <dbl>
## 1 1503960366 5/2/2016 23:59      52.6         116.  22.6 TRUE            1.46e12
## 2 1503960366 5/3/2016 23:59      52.6         116.  22.6 TRUE            1.46e12
## 3 1927972279 4/13/2016 1:08     134.          294.  47.5 FALSE           1.46e12
## 4 2873212765 4/21/2016 23:59     56.7         125.  21.5 TRUE            1.46e12
## 5 2873212765 5/12/2016 23:59     57.3         126.  21.7 TRUE            1.46e12
## 6 4319703577 4/17/2016 23:59     72.4         160.  27.5 TRUE            1.46e12

colnames(weight_log_data)

colnames(weight_log_data)

## [1] "Id"             "Date"           "WeightKg"       "WeightPounds"  
## [5] "BMI"            "IsManualReport" "LogId"

minute_METs:

head(minute_METs)

head(minute_METs)

## # A tibble: 6 × 3
##           Id ActivityMinute  METs
##        <dbl> <chr>          <dbl>
## 1 1503960366 4/12/2016 0:00    10
## 2 1503960366 4/12/2016 0:01    10
## 3 1503960366 4/12/2016 0:02    10
## 4 1503960366 4/12/2016 0:03    10
## 5 1503960366 4/12/2016 0:04    10
## 6 1503960366 4/12/2016 0:05    12

colnames(minute_METs)

colnames(minute_METs)

## [1] "Id"             "ActivityMinute" "METs"
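The previews above confirm the shape of each table but not its integrity. The blank and duplicate checks done earlier in Excel could be repeated in R along the lines below. This is a minimal sketch on a toy data frame with illustrative values only, and the helper name integrity_report is hypothetical.

```r
# Hypothetical helper: basic integrity counts for one data frame.
integrity_report <- function(df) {
  list(
    rows = nrow(df),
    duplicate_rows = sum(duplicated(df)),  # rows identical to an earlier row
    missing_values = sum(is.na(df))        # NA cells across all columns
  )
}

# Toy sleep log with one repeated row and one missing value (illustrative only).
sleep_toy <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016 0:00", "4/13/2016 0:00",
               "4/13/2016 0:00", "4/12/2016 0:00"),
  TotalMinutesAsleep = c(327, 384, 384, NA)
)

report <- integrity_report(sleep_toy)
report
```

Calling the helper on each imported data frame in turn would give a quick view of where duplicates or gaps need handling before analysis.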

ANALYZE

We then run a quick summary of the data frames using the skim_without_charts() function, which gives a broader overview of each data frame.

skim_without_charts(daily_activity)

skim_without_charts(daily_activity)

Data summary
Name	daily_activity
Number of rows	940
Number of columns	15
_______________________
Column type frequency:
character	1
numeric	14
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
ActivityDate	0	1	8	9	0	31	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	1	4.855407e+09	2.424805e+09	1503960366	2.320127e+09	4.445115e+09	6.962181e+09	8.877689e+09
TotalSteps	1	7.637910e+03	5.087150e+03	0	3.789750e+03	7.405500e+03	1.072700e+04	3.601900e+04
TotalDistance	1	5.490000e+00	3.920000e+00	0	2.620000e+00	5.240000e+00	7.710000e+00	2.803000e+01
TrackerDistance	1	5.480000e+00	3.910000e+00	0	2.620000e+00	5.240000e+00	7.710000e+00	2.803000e+01
LoggedActivitiesDistance	1	1.100000e-01	6.200000e-01	0	0.000000e+00	0.000000e+00	0.000000e+00	4.940000e+00
VeryActiveDistance	1	1.500000e+00	2.660000e+00	0	0.000000e+00	2.100000e-01	2.050000e+00	2.192000e+01
ModeratelyActiveDistance	1	5.700000e-01	8.800000e-01	0	0.000000e+00	2.400000e-01	8.000000e-01	6.480000e+00
LightActiveDistance	1	3.340000e+00	2.040000e+00	0	1.950000e+00	3.360000e+00	4.780000e+00	1.071000e+01
SedentaryActiveDistance	1	0.000000e+00	1.000000e-02	0	0.000000e+00	0.000000e+00	0.000000e+00	1.100000e-01
VeryActiveMinutes	1	2.116000e+01	3.284000e+01	0	0.000000e+00	4.000000e+00	3.200000e+01	2.100000e+02
FairlyActiveMinutes	1	1.356000e+01	1.999000e+01	0	0.000000e+00	6.000000e+00	1.900000e+01	1.430000e+02
LightlyActiveMinutes	1	1.928100e+02	1.091700e+02	0	1.270000e+02	1.990000e+02	2.640000e+02	5.180000e+02
SedentaryMinutes	1	9.912100e+02	3.012700e+02	0	7.297500e+02	1.057500e+03	1.229500e+03	1.440000e+03
Calories	1	2.303610e+03	7.181700e+02	0	1.828500e+03	2.134000e+03	2.793250e+03	4.900000e+03

skim_without_charts(daily_sleep)

skim_without_charts(daily_sleep)

Data summary
Name	daily_sleep
Number of rows	413
Number of columns	5
_______________________
Column type frequency:
character	1
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
SleepDay	0	1	13	14	0	31	0

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	1	5.000979e+09	2.06036e+09	1503960366	3977333714	4702921684	6962181067	8792009665
TotalSleepRecords	1	1.120000e+00	3.50000e-01	1	1	1	1	3
TotalMinutesAsleep	1	4.194700e+02	1.18340e+02	58	361	433	490	796
TotalTimeInBed	1	4.586400e+02	1.27100e+02	61	403	463	526	961

skim_without_charts(weight_log_data)

skim_without_charts(weight_log_data)

Data summary
Name	weight_log_data
Number of rows	67
Number of columns	7
_______________________
Column type frequency:
character	1
logical	1
numeric	5
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	empty	n_unique	whitespace
Date	0	1	13	15	0	56	0

Variable type: logical

skim_variable	n_missing	complete_rate	mean	count
IsManualReport	0	1	0.61	TRU: 41, FAL: 26

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100
Id	1	7.009282e+09	1.950322e+09	1.50396e+09	6.962181e+09	6.962181e+09	8.877689e+09	8.877689e+09
WeightKg	1	7.204000e+01	1.392000e+01	5.26000e+01	6.140000e+01	6.250000e+01	8.505000e+01	1.335000e+02
WeightPounds	1	1.588100e+02	3.070000e+01	1.15960e+02	1.353600e+02	1.377900e+02	1.875000e+02	2.943200e+02
BMI	1	2.519000e+01	3.070000e+00	2.14500e+01	2.396000e+01	2.439000e+01	2.556000e+01	4.754000e+01
LogId	1	1.460000e+12	0.000000e+00	1.46000e+12	1.460000e+12	1.460000e+12	1.460000e+12	1.460000e+12

Running these summaries, we discovered that:

daily_activity has 940 observations

daily_sleep has 413 observations

weight_log_data has 67 observations

Explore the number of users in each dataset using the common key, Id

n_distinct(daily_activity$Id)

n_distinct(daily_activity$Id)

## [1] 33

This gives 33 distinct IDs, 3 more than the 30 users described for the dataset; some users may have been recorded under more than one ID.

n_distinct(daily_sleep$Id)

n_distinct(daily_sleep$Id)

## [1] 24

This gives 24 IDs, so at least 6 of the 30 described participants have no sleep records.

n_distinct(weight_log_data$Id)

n_distinct(weight_log_data$Id)

## [1] 8

Only 8 IDs have weight records, so at least 22 of the 30 described participants logged no weight data.
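Beyond counting users per table, it can help to see which IDs are absent from a given table. The sketch below uses short made-up ID vectors as stand-ins for daily_activity$Id, daily_sleep$Id and weight_log_data$Id; the values are illustrative only.

```r
# Toy Id columns standing in for the three tables (illustrative values).
activity_ids <- c(101, 101, 102, 103, 104)
sleep_ids <- c(101, 103, 103)
weight_ids <- c(104)

# Distinct users per table.
user_counts <- c(
  activity = length(unique(activity_ids)),
  sleep = length(unique(sleep_ids)),
  weight = length(unique(weight_ids))
)
user_counts

# Users who tracked activity but never logged sleep.
no_sleep_ids <- setdiff(unique(activity_ids), sleep_ids)
no_sleep_ids
```

On the real data, the length of the setdiff() result would show how many activity users have no sleep records at all, measured against the IDs present rather than against the nominal sample size.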

Explore Average Sleep time & Average Time in Bed

Avg_minutes_asleep <- daily_sleep %>% summarize(avg_sleeptime = mean(TotalMinutesAsleep))

Avg_minutes_asleep

## # A tibble: 1 × 1
##   avg_sleeptime
##           <dbl>
## 1          419.

Avg_TimeInBed <- daily_sleep %>%
  summarize(avg_TimeInBed = mean(TotalTimeInBed))

Avg_TimeInBed

## # A tibble: 1 × 1
##   avg_TimeInBed
##           <dbl>
## 1          459.

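The above exploration shows that participants spent about 40 minutes a day in bed without being asleep (an average of 459 minutes in bed against 419 minutes asleep). The averages alone do not show whether that time falls before falling asleep or after waking.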
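The gap between time in bed and time asleep can also be computed per record instead of from two separate averages. The sketch below does this for the six daily_sleep rows printed by head() earlier, so the numbers describe only those six rows. The column names minutes_awake_in_bed and sleep_efficiency are my own labels, not fields in the dataset.

```r
library(dplyr)

# First six rows of daily_sleep as printed by head() earlier in the report.
sleep_head <- tibble(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  TotalTimeInBed = c(346, 407, 442, 367, 712, 320)
)

# Per-record gap and the share of time in bed actually spent asleep.
sleep_gap <- sleep_head %>%
  mutate(
    minutes_awake_in_bed = TotalTimeInBed - TotalMinutesAsleep,
    sleep_efficiency = TotalMinutesAsleep / TotalTimeInBed
  )

gap_summary <- sleep_gap %>%
  summarize(
    avg_awake_in_bed = mean(minutes_awake_in_bed),
    median_awake_in_bed = median(minutes_awake_in_bed)
  )
gap_summary
```

Working per record makes it possible to report a median as well as a mean, which is useful if a few long lie-ins pull the average up.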

I converted the ActivityDate column of the daily_activity data set to days of the week (Monday to Sunday).

daily_activity <- daily_activity %>% 
  mutate(weekday1 = weekdays(as.Date(ActivityDate, "%m/%d/%Y")))

glimpse(daily_activity)

## Rows: 940
## Columns: 16
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate             <chr> "4/12/2016", "4/13/2016", "4/14/2016", "4/15/…
## $ TotalSteps               <dbl> 13162, 10735, 10460, 9762, 12669, 9705, 13019…
## $ TotalDistance            <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ TrackerDistance          <dbl> 8.50, 6.97, 6.74, 6.28, 8.16, 6.48, 8.59, 9.8…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance       <dbl> 1.88, 1.57, 2.44, 2.14, 2.71, 3.19, 3.25, 3.5…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 0.40, 1.26, 0.41, 0.78, 0.64, 1.3…
## $ LightActiveDistance      <dbl> 6.06, 4.71, 3.91, 2.83, 5.04, 2.51, 4.71, 5.0…
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes        <dbl> 25, 21, 30, 29, 36, 38, 42, 50, 28, 19, 66, 4…
## $ FairlyActiveMinutes      <dbl> 13, 19, 11, 34, 10, 20, 16, 31, 12, 8, 27, 21…
## $ LightlyActiveMinutes     <dbl> 328, 217, 181, 209, 221, 164, 233, 264, 205, …
## $ SedentaryMinutes         <dbl> 728, 776, 1218, 726, 773, 539, 1149, 775, 818…
## $ Calories                 <dbl> 1985, 1797, 1776, 1745, 1863, 1728, 1921, 203…
## $ weekday1                 <chr> "Tuesday", "Wednesday", "Thursday", "Friday",…

daily_activity$weekday1 <- ordered(daily_activity$weekday1, levels=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))

activity_data <- daily_activity %>% 
  group_by(weekday1) %>% 
  summarize(count_of = n())

glimpse(activity_data)

## Rows: 7
## Columns: 2
## $ weekday1 <ord> Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday
## $ count_of <int> 120, 152, 150, 147, 126, 124, 121
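One caveat with the conversion above: weekdays() returns day names in the session's locale, so the English levels passed to ordered() would produce NA values on a non-English system. The sketch below is a locale-independent alternative in base R; the helper name to_weekday is hypothetical.

```r
# English day names in Monday-first order, independent of the session locale.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

# Hypothetical helper: map "m/d/Y" strings to an ordered weekday factor.
to_weekday <- function(date_strings) {
  dates <- as.Date(date_strings, format = "%m/%d/%Y")
  # as.POSIXlt()$wday counts 0 = Sunday ... 6 = Saturday; shift so Monday = 1.
  monday_first <- (as.POSIXlt(dates)$wday + 6) %% 7 + 1
  factor(day_levels[monday_first], levels = day_levels, ordered = TRUE)
}

sample_days <- to_weekday(c("4/12/2016", "4/16/2016", "4/17/2016"))
sample_days
```

The result could be assigned to the weekday1 column directly, replacing both the weekdays() call and the separate ordered() step.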

VISUALIZATION

Explore number of times FitBit users track their activity throughout the week

ggplot(activity_data, aes(x=weekday1, y=count_of)) +
  geom_bar(stat="identity",color="black",fill="#b75dab") +
  labs(title="Tracker user count across the week", x="Day of the week", y="Count") +
  geom_label(aes(label=count_of),color="black")

The visualization above shows that more tracker records were captured on Tuesday, Wednesday and Thursday.

Total Steps vs. Sedentary Minutes

ggplot(data=daily_activity,aes(x=TotalSteps,y=SedentaryMinutes, color=Calories)) +
  geom_point() +
  geom_smooth(method="lm",color="blue") +
  labs(title="Total Steps vs. Sedentary Minutes",x="Total Steps",y="Sedentary Minutes")+
  scale_color_gradient(low="#ffdca7",high="#422d9e")

## `geom_smooth()` using formula 'y ~ x'

From the visualization above, we can see an inverse relationship between total steps taken and sedentary time on any given day.
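A relationship read off a scatter plot can be backed with a correlation coefficient. The sketch below shows the calls on the six daily_activity rows visible in the glimpse() output earlier. Six rows are far too few to draw a conclusion, so the values here only demonstrate the mechanics. On the full data the inputs would be daily_activity$TotalSteps and daily_activity$SedentaryMinutes.

```r
# First six daily_activity rows as shown by glimpse() (a tiny illustrative sample).
total_steps <- c(13162, 10735, 10460, 9762, 12669, 9705)
sedentary_minutes <- c(728, 776, 1218, 726, 773, 539)

# Pearson correlation: the sign gives the direction, the magnitude the
# strength of the linear relationship.
pearson_r <- cor(total_steps, sedentary_minutes)
pearson_r

# Spearman correlation is rank-based and less sensitive to outliers.
spearman_rho <- cor(total_steps, sedentary_minutes, method = "spearman")
spearman_rho
```

A coefficient close to zero would suggest the slope of the fitted line is weak even if it looks clear on the plot.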

Observing relationship between steps taken and calories burned

mean_steps <- mean(daily_activity$TotalSteps)
mean_steps

## [1] 7637.911

mean_calories <- mean(daily_activity$Calories)
mean_calories

## [1] 2303.61

ggplot(data=daily_activity, aes(x=TotalSteps,y=Calories,color=Calories)) +
  geom_point() +
  labs(title="Calories burned for every step taken",x="Total Steps Taken",y="Calories Burned") +
  geom_smooth(method="lm") +
  geom_hline(yintercept=mean_calories,color="yellow",lwd=1.0) +
  geom_vline(xintercept=mean_steps,color="red",lwd=1.0) +
  annotate("text",x=10000,y=500,label="Average Steps",angle=-90) +
  annotate("text",x=29000,y=2500,label="Average Calories") +
  scale_color_gradient(low="#ffdca7",high="#422d9e")

## `geom_smooth()` using formula 'y ~ x'

The visualization above shows a positive correlation between the steps taken and the calories burnt.
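The straight line drawn by geom_smooth(method="lm") is an ordinary linear regression, so its slope can be extracted to say roughly how many calories go with each additional step. The sketch below fits that model to the six daily_activity rows visible in the glimpse() output. The resulting slope describes only those six rows and is not an estimate for the full dataset.

```r
# First six daily_activity rows as shown by glimpse() (illustrative sample only).
steps <- c(13162, 10735, 10460, 9762, 12669, 9705)
calories <- c(1985, 1797, 1776, 1745, 1863, 1728)

# Same model that geom_smooth(method = "lm") fits: calories as a linear function of steps.
fit <- lm(calories ~ steps)

# Slope of the line, expressed per step and per 1,000 steps.
calories_per_step <- unname(coef(fit)["steps"])
calories_per_1000_steps <- 1000 * calories_per_step
calories_per_1000_steps

# Share of the variation in calories explained by steps in this sample.
r_squared <- summary(fit)$r.squared
r_squared
```

On the full data, lm(Calories ~ TotalSteps, data = daily_activity) would give the corresponding figures for all 940 records.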

Total Minutes Asleep vs. Total Time in Bed

ggplot(data=daily_sleep, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) +
  geom_point() +
  labs(title="Time Asleep vs. Time in Bed",x="Time Asleep",y="Time in Bed") +
  geom_smooth(method="lm") + geom_jitter()

## `geom_smooth()` using formula 'y ~ x'

The visualization above shows that total time in bed is positively correlated with total time asleep.

Relationship between being very active and calories burned

ggplot(data=daily_activity, aes(x=VeryActiveMinutes, y=Calories, color=Calories)) +
  geom_point() +
  geom_smooth(method="loess",color="blue") +
  labs(title="Very Active Minutes vs. Calories",x="Very Active Minutes",y="Calories") + 
  scale_color_gradient(low="#ffdca7",high="#422d9e")

## `geom_smooth()` using formula 'y ~ x'

The visualization above shows a positive correlation between very active minutes and calories burned.

Relationship between sedentary minutes and calories burned

ggplot(data=daily_activity, aes(x=SedentaryMinutes,y=Calories,color=Calories)) +
  geom_point() +
  geom_smooth(method="loess",color="blue") +
  labs(title="Sedentary Minutes vs. Calories Burned",x="Sedentary Minutes",y="Calories") + 
  scale_color_gradient(low="#ffdca7",high="#422d9e")

## `geom_smooth()` using formula 'y ~ x'

The visualization above shows that calories burned rise with sedentary minutes at first, then fall as sedentary minutes continue to increase.
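A rise-then-fall pattern like this can be checked without a smoother by averaging calories within bands of sedentary minutes. The sketch below uses made-up numbers shaped to show the idea, not values from the dataset, and the 300-minute band width is an arbitrary choice.

```r
# Toy data (illustrative only): calories rise, then fall, as sedentary minutes grow.
toy <- data.frame(
  SedentaryMinutes = c(200, 350, 500, 650, 800, 950, 1100, 1250, 1400),
  Calories = c(1800, 2100, 2400, 2600, 2700, 2500, 2200, 2000, 1800)
)

# Group days into 300-minute bands and average calories within each band.
toy$sedentary_band <- cut(
  toy$SedentaryMinutes,
  breaks = seq(0, 1500, by = 300),
  include.lowest = TRUE
)
band_means <- tapply(toy$Calories, toy$sedentary_band, mean)
band_means

# Index of the band with the highest average calories (the turning point).
peak_band <- which.max(band_means)
names(band_means)[peak_band]
```

Applied to daily_activity, the band where the mean peaks would give a concrete figure for where the relationship turns.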

SHARE/RECOMMENDATION

From the insights gained from the FitBit datasets, here are some recommendations for Bellabeat’s marketing strategy team.

Bellabeat App

Redesign the Bellabeat app to give more insights into users’ health, activity, sleep, hydration levels and reproductive health. These insights would encourage users to meet their fitness goals, and the app could become a social interface for community health goals.

· The heart rate monitor should help users understand whether they are exercising efficiently, given individual factors such as BMI, work rate and lifestyle.

· The sleep tracker, which is just as important as the exercise tracker, should provide a deeper analysis of users’ nightly sleep, showing how much time they spend asleep, restless or in deep sleep.

· Notifications of successes and landmark achievements, along with weekly fitness challenges, should also be integrated into the app. These alerts provide positive reinforcement that will spur users to achieve more.

· An electrocardiogram (ECG) feature should also be added, as this sensor helps with early detection of atrial fibrillation.

· Water intake reminders should recommend appropriate volumes of water and prompt users at set intervals.

· A social networking interface should be added, as group activities tend to promote group challenges, group health tracking and accountability, which motivate users to succeed. Users could also share their favorite workouts, healthy meals and wellness tips.

· The app should also encourage users to take time out for stress management routines such as breathing exercises, yoga and mindful meditation.

· The app should encourage users to meet the recommended 10,000 steps and at least 7 hours of sleep daily.

· The app should also have a chat channel for users to interact directly with health specialists or coaches for counseling on health, nutrition and exercise when the need arises.

· Menstrual health guidance should help users note their symptoms throughout the cycle.

Bellabeat Membership

· Offer a free trial so users can try the premium service, driving more subscriptions.

· Introduce a referral and reward program to drive membership.

· Offer members discounts on smart devices.

· Partner with health and fitness companies for added market presence and membership growth.

Bellabeat Products

Bellabeat products fit best in the classy, luxury lifestyle niche. In addition to mainstream, print and social media campaigns, Bellabeat should establish a presence in executive lounges, workspace hubs, bars, business and social clubs, and other luxury leisure venues known to attract users who fit this niche.

Thank you.

Ekene Okechukwu

2022-11-02
