BELLA BEAT PROJECT

How Can a Wellness Technology Company Play It Smart?

INTRODUCTION

Bellabeat is a high-tech manufacturer of health-focused products for women. By collecting data on activity, sleep, stress, and reproductive health, Bellabeat empowers women with knowledge about their own health and habits. Since its founding in 2013, the company has grown rapidly and positioned itself as a tech-driven wellness company for women.

PRODUCT

•Bellabeat App: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions.

•Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. It connects to the Bellabeat app to track activity, sleep, and stress.

•Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress.

•Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

•Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

DEFINITION OF THE PROBLEM

Bellabeat has grown rapidly since its inception. By 2016, it had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to the company’s own e-commerce channel on its website. The founders, Urška Sršen and Sando Mur, believe that although Bellabeat is a successful small company, it has the potential to become a larger player in the global smart device market. They suggest that analyzing Bellabeat’s available consumer data, together with smart device fitness data, could help unlock new growth opportunities for the company.

BUSINESS TASK

• To analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices.

• To select one Bellabeat product to apply these insights to in the presentation.

• To provide high-level recommendations for how these trends can inform Bellabeat’s marketing strategy.

PROBLEM QUESTIONS

•What are some trends in smart device usage?

•How could these trends apply to Bellabeat customers?

•How could these trends help influence Bellabeat marketing strategy?

DATA COLLECTION

The data used for this analysis is a public dataset that explores smart device users’ daily habits: the Fitbit Fitness Tracker Data. The dataset was made available through Mobius under a CC0: Public Domain license. The data was generated by respondents to a survey distributed via Amazon Mechanical Turk.

DATA PREPARATION AND ORGANIZATION

The Fitbit Fitness Tracker Data contains 18 files that include information about daily activity, steps, and heart rate that can be used to explore users’ habits. The data was collected from 33 eligible Fitbit users who consented to the submission of their personal tracker data. The users’ names have been replaced by ID numbers for privacy and anonymity.

PROCESS

1. Installing Packages

install.packages('tidyverse')

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)

install.packages('ggplot2')

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)

install.packages("scales")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)

install.packages("lubridate")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)

install.packages("janitor")

## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.3'
## (as 'lib' is unspecified)

2. Loading Packages

library('tidyverse')

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.2     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library('ggplot2')
library('scales')

## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor

library('lubridate')
library('janitor')

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

3. Importing ‘daily_activity’ and ‘sleep_day’ Datasets.

daily_activity <- read.csv("dailyActivity_merged.csv")

sleep_day <- read.csv("sleepDay_merged.csv")

4. Previewing The Dataset

colnames(daily_activity)

##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"

head(daily_activity)

##           Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366    4/12/2016      13162          8.50            8.50
## 2 1503960366    4/13/2016      10735          6.97            6.97
## 3 1503960366    4/14/2016      10460          6.74            6.74
## 4 1503960366    4/15/2016       9762          6.28            6.28
## 5 1503960366    4/16/2016      12669          8.16            8.16
## 6 1503960366    4/17/2016       9705          6.48            6.48
##   LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1                        0               1.88                     0.55
## 2                        0               1.57                     0.69
## 3                        0               2.44                     0.40
## 4                        0               2.14                     1.26
## 5                        0               2.71                     0.41
## 6                        0               3.19                     0.78
##   LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1                6.06                       0                25
## 2                4.71                       0                21
## 3                3.91                       0                30
## 4                2.83                       0                29
## 5                5.04                       0                36
## 6                2.51                       0                38
##   FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1                  13                  328              728     1985
## 2                  19                  217              776     1797
## 3                  11                  181             1218     1776
## 4                  34                  209              726     1745
## 5                  10                  221              773     1863
## 6                  20                  164              539     1728

colnames(sleep_day)

## [1] "Id"                 "SleepDay"           "TotalSleepRecords" 
## [4] "TotalMinutesAsleep" "TotalTimeInBed"

head(sleep_day)

##           Id              SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM                 1                327
## 2 1503960366 4/13/2016 12:00:00 AM                 2                384
## 3 1503960366 4/15/2016 12:00:00 AM                 1                412
## 4 1503960366 4/16/2016 12:00:00 AM                 2                340
## 5 1503960366 4/17/2016 12:00:00 AM                 1                700
## 6 1503960366 4/19/2016 12:00:00 AM                 1                304
##   TotalTimeInBed
## 1            346
## 2            407
## 3            442
## 4            367
## 5            712
## 6            320
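The previews above show that `ActivityDate` and `SleepDay` are stored as text, and that `SleepDay` carries a constant `12:00:00 AM` time component. Below is a minimal sketch, assuming the month/day/year layout seen in the previews, of how both columns could be converted to proper dates with base R. The toy rows are copied from the previews; the `activity_date` and `sleep_date` column names are hypothetical.

```r
# Toy rows copied from the previews above (not the full dataset).
activity <- data.frame(
  Id = c(1503960366, 1503960366),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  stringsAsFactors = FALSE
)
sleep <- data.frame(
  Id = c(1503960366, 1503960366),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  stringsAsFactors = FALSE
)

# as.Date() reads only as much of each string as the format needs,
# so the trailing "12:00:00 AM" in SleepDay is ignored.
# activity_date and sleep_date are hypothetical column names.
activity$activity_date <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")
sleep$sleep_date <- as.Date(sleep$SleepDay, format = "%m/%d/%Y")

class(activity$activity_date)
sleep$sleep_date
```

With real date columns, the two tables can be compared or joined day by day instead of as free text.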

5. Checking For The Number of Rows in the Dataset

nrow(daily_activity)

## [1] 940

nrow(sleep_day)

## [1] 413
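Row counts alone do not say how many participants each file covers. The sketch below, using made-up toy data rather than the real files, shows one way to count distinct users per table and to list users who appear in one table but not the other.

```r
# Toy data for illustration only; Ids and values are invented.
activity <- data.frame(Id = c(1, 1, 2, 3), TotalSteps = c(100, 200, 300, 400))
sleep <- data.frame(Id = c(1, 1, 3), TotalMinutesAsleep = c(400, 420, 390))

# Number of distinct users in each table.
n_activity_users <- length(unique(activity$Id))
n_sleep_users <- length(unique(sleep$Id))

# Users who logged activity but have no sleep records.
no_sleep <- setdiff(activity$Id, sleep$Id)

n_activity_users
n_sleep_users
no_sleep
```

The same calls on `daily_activity$Id` and `sleep_day$Id` would show how many participants contribute to each dataset.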

6. Checking For the Number of Duplicated Rows in Each Dataset

sum(duplicated(daily_activity))

## [1] 0

sum(duplicated(sleep_day))

## [1] 3
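Before dropping duplicates, it can help to look at which rows are repeated. A minimal sketch on toy data (the values are illustrative, not taken from the real duplicates):

```r
# Toy sleep table with one repeated row (rows 1 and 2 are identical).
sleep <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 327, 384, 400),
  stringsAsFactors = FALSE
)

# duplicated() flags only the second and later copies, so combine it with
# fromLast = TRUE to show every copy of a repeated row.
is_dupe <- duplicated(sleep) | duplicated(sleep, fromLast = TRUE)
dupes <- sleep[is_dupe, ]

sum(duplicated(sleep))  # number of rows that would be removed
dupes
```

Applied to `sleep_day`, this would list the repeated records alongside their originals.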

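7. Removing Duplicate Rows from the ‘sleep_day’ Dataset

# distinct() drops the duplicated rows; drop_na() drops any rows with missing values
# (tidyr and dplyr are already attached by library('tidyverse'))
sleep_day <- sleep_day %>% 
  distinct() %>% 
  drop_na()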

8. Merging Both Datasets on the ‘Id’ Column

combined_data <- merge(sleep_day, daily_activity, by="Id")
n_distinct(combined_data$Id)

## [1] 24

glimpse(combined_data)

## Rows: 12,348
## Columns: 19
## $ Id                       <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ SleepDay                 <chr> "4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 …
## $ TotalSleepRecords        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep       <int> 327, 327, 327, 327, 327, 327, 327, 327, 327, …
## $ TotalTimeInBed           <int> 346, 346, 346, 346, 346, 346, 346, 346, 346, …
## $ ActivityDate             <chr> "5/7/2016", "5/6/2016", "5/1/2016", "4/30/201…
## $ TotalSteps               <int> 11992, 12159, 10602, 14673, 13162, 10735, 153…
## $ TotalDistance            <dbl> 7.71, 8.03, 6.81, 9.25, 8.50, 6.97, 9.80, 8.9…
## $ TrackerDistance          <dbl> 7.71, 8.03, 6.81, 9.25, 8.50, 6.97, 9.80, 8.9…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance       <dbl> 2.46, 1.97, 2.29, 3.56, 1.88, 1.57, 5.29, 2.9…
## $ ModeratelyActiveDistance <dbl> 2.12, 0.25, 1.60, 1.42, 0.55, 0.69, 0.57, 1.0…
## $ LightActiveDistance      <dbl> 3.13, 5.81, 2.92, 4.27, 6.06, 4.71, 3.94, 4.8…
## $ SedentaryActiveDistance  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes        <int> 37, 24, 33, 52, 25, 21, 73, 45, 48, 16, 31, 7…
## $ FairlyActiveMinutes      <int> 46, 6, 35, 34, 13, 19, 14, 24, 28, 12, 23, 11…
## $ LightlyActiveMinutes     <int> 175, 289, 246, 217, 328, 217, 216, 250, 189, …
## $ SedentaryMinutes         <int> 833, 754, 730, 712, 728, 776, 814, 857, 782, …
## $ Calories                 <int> 1821, 1896, 1820, 1947, 1985, 1797, 2013, 195…
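The merged table has 12,348 rows, far more than either input (940 and 413 rows). Merging on `Id` alone pairs every sleep record of a user with every activity record of that same user, which is also why the first rows above combine a 4/12/2016 sleep record with activity from 5/7/2016. The sketch below, on toy data, contrasts that with a merge on both user and day. It is an illustration under assumptions, not the analysis actually run in this report: the `date` column name is hypothetical, and the step count for user 2 is invented.

```r
# Toy data: user 1 has two days of activity and sleep; user 2 has activity only.
activity <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalSteps = c(13162, 10735, 8000),
  stringsAsFactors = FALSE
)
sleep <- data.frame(
  Id = c(1, 1),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384),
  stringsAsFactors = FALSE
)

# Merge on Id only: 2 sleep rows x 2 activity rows for user 1 = 4 rows.
by_id <- merge(sleep, activity, by = "Id")

# Add a shared date key (hypothetical column name), then merge on user and day.
activity$date <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")
sleep$date <- as.Date(sleep$SleepDay, format = "%m/%d/%Y")
by_id_date <- merge(sleep, activity, by = c("Id", "date"))

nrow(by_id)       # 4
nrow(by_id_date)  # 2
```

With the two-key merge, each row describes one user on one day, so sleep and activity values in the same row refer to the same date.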

ANALYZING DATA

1. Summary statistics for each dataset and for selected columns of the combined dataset. These code chunks display the minimum, quartiles, mean, and maximum of each selected column.

daily_activity %>% 
  select(TotalSteps,
         TotalDistance,
         SedentaryMinutes) %>%
  summary()

##    TotalSteps    TotalDistance    SedentaryMinutes
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0  
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.: 729.8  
##  Median : 7406   Median : 5.245   Median :1057.5  
##  Mean   : 7638   Mean   : 5.490   Mean   : 991.2  
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:1229.5  
##  Max.   :36019   Max.   :28.030   Max.   :1440.0

sleep_day %>%
  select(TotalSleepRecords,
         TotalMinutesAsleep,
         TotalTimeInBed) %>%
  summary()

##  TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
##  Min.   :1.00      Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:1.00      1st Qu.:361.0      1st Qu.:403.8  
##  Median :1.00      Median :432.5      Median :463.0  
##  Mean   :1.12      Mean   :419.2      Mean   :458.5  
##  3rd Qu.:1.00      3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :3.00      Max.   :796.0      Max.   :961.0

combined_data %>% 
  select(TotalSteps,TotalDistance,SedentaryMinutes,TotalTimeInBed,
         TotalSleepRecords,TotalMinutesAsleep) %>%
  summary()

##    TotalSteps    TotalDistance    SedentaryMinutes TotalTimeInBed 
##  Min.   :    0   Min.   : 0.000   Min.   :   0.0   Min.   : 61.0  
##  1st Qu.: 4660   1st Qu.: 3.160   1st Qu.: 659.0   1st Qu.:402.0  
##  Median : 8585   Median : 6.120   Median : 734.0   Median :462.0  
##  Mean   : 8108   Mean   : 5.722   Mean   : 799.4   Mean   :458.2  
##  3rd Qu.:11317   3rd Qu.: 7.920   3rd Qu.: 853.0   3rd Qu.:526.0  
##  Max.   :22988   Max.   :17.950   Max.   :1440.0   Max.   :961.0  
##  TotalSleepRecords TotalMinutesAsleep
##  Min.   :1.000     Min.   : 58.0     
##  1st Qu.:1.000     1st Qu.:361.0     
##  Median :1.000     Median :432.0     
##  Mean   :1.122     Mean   :419.1     
##  3rd Qu.:1.000     3rd Qu.:492.0     
##  Max.   :3.000     Max.   :796.0

2. Relationship between physical activities (like steps, active minutes) and calories burnt.

ggplot(data=combined_data) +
  geom_point(mapping = aes(x=TotalSteps, y=Calories)) +
  labs(title="Relationship between total steps and calories",
       subtitle="Calories burnt increase as steps increase")

combined_data <- combined_data %>%
  mutate(TotalActiveMinutes = FairlyActiveMinutes + 
           LightlyActiveMinutes + VeryActiveMinutes)

ggplot(data=combined_data ) + 
  geom_point(mapping = aes(x=TotalActiveMinutes, y=Calories)) +
  labs(title="Relationship between total active minutes and calories")

The two visualizations above show a positive relationship in both cases: calories burnt increase with total steps and with total active minutes. A user who wants to burn more calories per day will want to increase their steps and activity level.
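A scatter plot shows the direction of a relationship; a correlation coefficient puts a number on its strength. The sketch below is one way this could be done with base R's `cor()`, using only the six rows shown in the `head(daily_activity)` preview as toy input, so the resulting value says nothing about the full dataset.

```r
# Six rows copied from head(daily_activity); a toy sample, not the full data.
steps <- c(13162, 10735, 10460, 9762, 12669, 9705)
calories <- c(1985, 1797, 1776, 1745, 1863, 1728)

# Pearson correlation: values near 1 indicate a strong positive linear relationship,
# values near 0 indicate little or no linear relationship.
r <- cor(steps, calories)
r
```

On the real data the equivalent calls would be `cor(combined_data$TotalSteps, combined_data$Calories)` and the same with `TotalActiveMinutes`, which would allow the two relationships to be compared directly.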

3. Users’ Sleep

ggplot(data=combined_data) +
  geom_histogram(mapping = aes(x=TotalMinutesAsleep)) +
  labs(title="Total minutes asleep")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

This histogram shows the distribution of total minutes asleep per sleep record. According to the histogram, most users get 5-8 hours (300-480 minutes) of sleep.
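The histogram is in minutes while the reading is in hours, so it can help to convert the units and compute the share of records in the 5-8 hour band directly. A minimal sketch, using the six values from the `head(sleep_day)` preview as toy input (the share it produces applies to those six rows only):

```r
# Six values copied from head(sleep_day); a toy sample, not the full data.
minutes_asleep <- c(327, 384, 412, 340, 700, 304)

# Convert minutes to hours.
hours_asleep <- minutes_asleep / 60

# Share of records with between 5 and 8 hours of sleep (inclusive).
share_5_to_8 <- mean(hours_asleep >= 5 & hours_asleep <= 8)
share_5_to_8
```

Running the same calculation on `sleep_day$TotalMinutesAsleep` would give the share for the full dataset. Passing an explicit `binwidth` (for example 30 minutes) to `geom_histogram()` would also address the `stat_bin()` message above.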

4. What are the average daily calories, steps, and sleep of each user?

Another data frame, ‘daily_average’, was created to show the average daily steps, calories, and sleep of each user.

daily_average <- combined_data %>%
  group_by(Id) %>%
  summarise(mean_daily_steps = mean(TotalSteps),
            mean_daily_calories = mean(Calories),
            mean_daily_sleep = mean(TotalMinutesAsleep))

head(daily_average)

## # A tibble: 6 × 4
##           Id mean_daily_steps mean_daily_calories mean_daily_sleep
##        <dbl>            <dbl>               <dbl>            <dbl>
## 1 1503960366           12117.               1816.             360.
## 2 1644430081            7283.               2811.             294 
## 3 1844505072            2580.               1573.             652 
## 4 1927972279             916.               2173.             417 
## 5 2026352035            5567.               1541.             506.
## 6 2320127002            4717.               1724.              61

5. How active are the users?

# Upper bounds use the next category's lower bound so that fractional means
# (e.g. 7499.5) do not fall between categories and become NA.
user_type <- daily_average %>%
  mutate(user_type = case_when(
    mean_daily_steps < 5000 ~ "sedentary",
    mean_daily_steps >= 5000 & mean_daily_steps < 7500 ~ "lightly active", 
    mean_daily_steps >= 7500 & mean_daily_steps < 10000 ~ "fairly active", 
    mean_daily_steps >= 10000 ~ "very active"
  ))
head(user_type)

## # A tibble: 6 × 5
##           Id mean_daily_steps mean_daily_calories mean_daily_sleep user_type    
##        <dbl>            <dbl>               <dbl>            <dbl> <chr>        
## 1 1503960366           12117.               1816.             360. very active  
## 2 1644430081            7283.               2811.             294  lightly acti…
## 3 1844505072            2580.               1573.             652  sedentary    
## 4 1927972279             916.               2173.             417  sedentary    
## 5 2026352035            5567.               1541.             506. lightly acti…
## 6 2320127002            4717.               1724.              61  sedentary

ggplot(data=user_type) + 
  geom_bar(mapping =aes(x=user_type)) +
  labs(title="User distribution based on average steps taken daily" )

The bar graph above shows that fairly active users make up the largest group of smart device users.
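The group sizes behind a bar chart like this can also be tabulated directly. The sketch below is a base R alternative to the `case_when()` step, assuming the same step thresholds; it uses the six rounded means from the `head(daily_average)` preview as toy input, so the counts describe only those six users.

```r
# Rounded means copied from head(daily_average); a toy sample of six users.
mean_daily_steps <- c(12117, 7283, 2580, 916, 5567, 4717)

# cut() with right = FALSE builds left-closed bins: [5000, 7500), [7500, 10000), ...
# so every value falls into exactly one category.
user_type <- cut(
  mean_daily_steps,
  breaks = c(-Inf, 5000, 7500, 10000, Inf),
  labels = c("sedentary", "lightly active", "fairly active", "very active"),
  right = FALSE
)

# Counts and proportions per category, in activity order.
counts <- table(user_type)
counts
prop.table(counts)
```

Because `cut()` returns a factor with ordered levels, a bar chart built from it would list the categories from least to most active rather than alphabetically.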

ACT

Problems with the dataset:

The dataset was last updated 5 years ago. It is out of date and may no longer be relevant to Bellabeat.

We cannot check the accuracy of the data because it is provided by a third party. Also, because the data was gathered from only 33 Fitbit users (24 of whom appear in the combined dataset), the analysis may be biased.

Recommendations:

• All Bellabeat products should be integrated into the Bellabeat app so that anyone with the app can quickly access all services. People rarely go anywhere without their mobile phone, so users who leave their other devices at home will still have the app to rely on.

• After being integrated with other products, the app should be developed in such a way that it collects all of the user’s necessary data, including product usage data.

• The Bellabeat app should generate a monthly report for each user, which will be mailed to them. This report will include a monthly summary on the relationship, pattern, and trends of their app usage data collected and analyzed, as well as tips on how to live a healthier life.

• A nutrition plan should also be integrated into the app, making it easier to give users more suggestions on how to improve their health and wellness.

MICHELLE DE-VEER

2023-08-24