Scenario :

This is my version of Google Data Analytics Case Study 2, the ‘Wellness Technology’ company case study.

Bellabeat is a high-tech company that manufactures health-focused smart products. The case study is pitched as another ‘tangible’ way to demonstrate my knowledge and skills, so here we are:

I will be performing many real-world tasks of a junior data analyst working on the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. “Bellabeat” is a successful small company, but they have the potential to become a larger player in the global smart device market.

I joined this team six months ago and have been busy learning about Bellabeat’s mission and business goals, as well as how I, as a junior data analyst, can help Bellabeat achieve them.

Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. I have been asked to focus on one of Bellabeat’s products and analyze smart device data to gain insight into how consumers are using their smart devices. The insights I discover will then help guide marketing strategy for the company. I will present my analysis to the Bellabeat executive team, along with my high-level recommendations for Bellabeat’s marketing strategy.

Characters and Products :

Characters

Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer.

Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team.

Bellabeat Marketing Analytics Team: A team of Data Analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy.

Products

Bellabeat app : The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits.

Leaf : Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

Time : This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress.

Spring : This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

Bellabeat membership : Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

Before starting the project, here are the phases I’ll follow to carry the data analysis through to completion:

PHASE 1 : Ask

Sršen asks me to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. She then wants me to select one Bellabeat product to apply these insights to, in my presentation.

About the company:

Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to their own e-commerce channel on their website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses on digital marketing extensively. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.

Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.

Deliverable

Insights from the data that help solve the business problem.

PHASE 2 : Prepare

I’ll be using a public dataset to analyze and identify trends.

The data is the FitBit Fitness Tracker Data (CC0: Public Domain, made available through Mobius).

I downloaded all 18 datasets and stored a copy on my Google Drive too.

I will use the ROCCC system to determine the credibility and integrity of the data.

Reliability: The data is not reliable. No supporting information is given (for example margin of error or smart device type), and the sample size is small, which limits the analysis that can be done.
Originality: This is not an original dataset, as it was originally collected through Amazon Mechanical Turk.
Comprehensiveness: The data is not comprehensive. There is no information about the participants, such as gender, age, or health state. If the data is biased, the insights from the analysis will be skewed and of little use.
Current: The data was collected back in 2016, so it is now outdated.
Cited: Amazon Mechanical Turk created the dataset, but we have no information on whether this is a credible source.

The dataset clearly does not meet the ROCCC criteria. Therefore, insights from the analysis can only provide some general direction.

Downloaded data and stored it appropriately.
Identified how it’s organized.
Determined the credibility of the data.

PHASE 3 : Process

R is primarily used for statistical analysis and data visualization, so I chose RStudio to clean and merge the relevant datasets for further analysis.

Setting up the Environment

Dependencies

# install.packages("tidyverse")
# install.packages("lubridate")
# install.packages("ggplot2")
# install.packages("janitor")

Libraries

library(tidyverse)
library(lubridate)
library(ggplot2)
library(janitor)

Working Directory

setwd("D:/Case_Study/Data/Bellabeat/Fitabase Data 4.12.16-5.12.16")
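As an alternative to typing one read.csv() call per file, every CSV in a folder can be read into a named list in one go. The following is only a minimal sketch under assumptions: the folder, the two tiny files and their values are made up for the demonstration, and `data_dir`, `csv_files` and `datasets` are hypothetical names, not objects used elsewhere in this write-up.

```r
# Build a throwaway folder with two tiny CSV files (made-up values).
data_dir <- file.path(tempdir(), "fitabase_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(Id = 1:2, Calories = c(81, 61)),
          file.path(data_dir, "hourlyCalories_merged.csv"), row.names = FALSE)
write.csv(data.frame(Id = 1:2, StepTotal = c(373, 160)),
          file.path(data_dir, "hourlySteps_merged.csv"), row.names = FALSE)

# List every CSV in the folder and read each one into a named list.
csv_files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
datasets <- lapply(csv_files, read.csv)
names(datasets) <- tools::file_path_sans_ext(basename(csv_files))

# Each data frame is now reachable by its file name.
names(datasets)
datasets$hourlySteps_merged
```

Separate, explicitly named data frames (as used in the rest of this post) are easier to read in a write-up; the list form mainly helps when the same step has to be applied to many files.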

> Categorized Data Collection

Hourly Data

hour_cal <- read.csv("hourlyCalories_merged.csv")
hour_inten <- read.csv("hourlyIntensities_merged.csv")
hour_steps <- read.csv("hourlySteps_merged.csv")

Minute Data

min_cal <- read.csv("minuteCaloriesNarrow_merged.csv")
min_cal_wide <- read.csv("minuteCaloriesWide_merged.csv")
min_inten <- read.csv("minuteIntensitiesNarrow_merged.csv")
min_inten_wide <- read.csv("minuteIntensitiesWide_merged.csv")
min_mets <- read.csv("minuteMetsNarrow_merged.csv")
min_sleep <- read.csv("minuteSleep_merged.csv")
min_steps <- read.csv("minuteStepsNarrow_merged.csv")
min_steps_wide <- read.csv("minuteStepsWide_merged.csv")

Daily Data

daily_sleep <- read.csv("sleepDay_merged.csv")
daily_act <- read.csv("dailyActivity_merged.csv")
daily_cal <- read.csv("dailyCalories_merged.csv")
daily_inten <- read.csv("dailyIntensities_merged.csv")
daily_steps <- read.csv("dailySteps_merged.csv")

Weight Data (Incomplete)

weight_log <- read.csv("weightLogInfo_merged.csv")

Heart Rate Data (Incomplete)

heart_sec <- read.csv("heartrate_seconds_merged.csv")

I’m going to drop these datasets:

weight_log
all the data frames whose names end with “wide”
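Dropping those objects could look roughly like the sketch below. It runs inside a separate, hypothetical environment (`demo_env`) filled with placeholder data frames, so it is safe to try; in a real session the same rm() call would target the global environment instead.

```r
# Work in a separate environment so the demo does not touch real objects.
demo_env <- new.env()
assign("min_cal", data.frame(x = 1), envir = demo_env)
assign("min_cal_wide", data.frame(x = 1), envir = demo_env)
assign("min_steps_wide", data.frame(x = 1), envir = demo_env)
assign("weight_log", data.frame(x = 1), envir = demo_env)

# Collect the names to drop: anything ending in "_wide", plus weight_log.
to_drop <- c(ls(envir = demo_env, pattern = "_wide$"), "weight_log")
rm(list = to_drop, envir = demo_env)

# Only the narrow data frame is left.
ls(envir = demo_env)
```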

PHASE 4 : Analyse

Working on Daily Datasets :

Data Cleaning & Manipulation on “daily_sleep” :

daily_sleep <- daily_sleep %>% 
  clean_names() %>% 
  rename(act_date = sleep_day,
         sleep_min = total_minutes_asleep,
         inbed_min = total_time_in_bed) %>%
  distinct()

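# %I (12-hour clock) is the hour code that pairs with %p (AM/PM); as.Date() keeps only the date part
daily_sleep$act_date <- as.Date(daily_sleep$act_date, format = "%m/%d/%Y %I:%M:%S %p")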
str(daily_sleep)

## 'data.frame':    410 obs. of  5 variables:
##  $ id                 : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ act_date           : Date, format: "2016-04-12" "2016-04-13" ...
##  $ total_sleep_records: int  1 2 1 2 1 1 1 1 1 1 ...
##  $ sleep_min          : int  327 384 412 340 700 304 360 325 361 430 ...
##  $ inbed_min          : int  346 407 442 367 712 320 377 364 384 449 ...
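A note on the date format string: in R, `%p` (AM/PM) is meant to be used together with `%I` (12-hour clock) when parsing, not with `%H` (24-hour clock). For a date-only result the difference does not show, but it matters as soon as the time of day is kept. A minimal sketch, assuming the raw values look like the made-up string below and that the session uses an English or C locale:

```r
# One raw value in the shape implied by the format string (12-hour clock).
raw <- "4/12/2016 12:00:00 AM"

# %I reads a 12-hour clock hour and pairs with %p (AM/PM).
# Parsing of %p depends on the locale; an English or C locale is assumed.
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

format(parsed, "%Y-%m-%d %H:%M:%S")  # 12:00:00 AM is midnight, hour 00
as.Date(parsed)                       # the date part only
```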

Statistical Summary

summary(daily_sleep)

##        id               act_date          total_sleep_records   sleep_min    
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :1.00        Min.   : 58.0  
##  1st Qu.:3.977e+09   1st Qu.:2016-04-19   1st Qu.:1.00        1st Qu.:361.0  
##  Median :4.703e+09   Median :2016-04-27   Median :1.00        Median :432.5  
##  Mean   :4.995e+09   Mean   :2016-04-26   Mean   :1.12        Mean   :419.2  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:1.00        3rd Qu.:490.0  
##  Max.   :8.792e+09   Max.   :2016-05-12   Max.   :3.00        Max.   :796.0  
##    inbed_min    
##  Min.   : 61.0  
##  1st Qu.:403.8  
##  Median :463.0  
##  Mean   :458.5  
##  3rd Qu.:526.0  
##  Max.   :961.0

Now daily_sleep is ready for the next phase.

The average sleep duration is almost 7 hours (419.2 minutes), which is good for health overall.
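The figures behind that statement can be checked with a few lines of arithmetic. The two means are copied from the summary(daily_sleep) output above; the variable names are only illustrative, and the last figure is a ratio of the two means, not an average of per-night ratios.

```r
mean_sleep_min <- 419.2   # mean of sleep_min from summary(daily_sleep)
mean_inbed_min <- 458.5   # mean of inbed_min from summary(daily_sleep)

mean_sleep_hours <- mean_sleep_min / 60               # minutes to hours
awake_in_bed_min <- mean_inbed_min - mean_sleep_min   # time in bed but not asleep
asleep_share     <- mean_sleep_min / mean_inbed_min   # share of bed time spent asleep

round(mean_sleep_hours, 2)    # 6.99
round(awake_in_bed_min, 1)    # 39.3
round(100 * asleep_share, 1)  # 91.4
```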

Data Cleaning & Manipulation on “daily_act” :

daily_act <- daily_act %>% 
  clean_names() %>% 
  rename(act_date = activity_date) %>% 
  mutate(act_date = as.Date(act_date, format = "%m/%d/%Y")) %>% 
  distinct()
  
str(daily_act)

## 'data.frame':    940 obs. of  15 variables:
##  $ id                        : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ act_date                  : Date, format: "2016-04-12" "2016-04-13" ...
##  $ total_steps               : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ total_distance            : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ tracker_distance          : num  8.5 6.97 6.74 6.28 8.16 ...
##  $ logged_activities_distance: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ very_active_distance      : num  1.88 1.57 2.44 2.14 2.71 ...
##  $ moderately_active_distance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ light_active_distance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ sedentary_active_distance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ very_active_minutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ fairly_active_minutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ lightly_active_minutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ sedentary_minutes         : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ calories                  : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

Statistical Summary

summary(daily_act)

##        id               act_date           total_steps    total_distance  
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :    0   Min.   : 0.000  
##  1st Qu.:2.320e+09   1st Qu.:2016-04-19   1st Qu.: 3790   1st Qu.: 2.620  
##  Median :4.445e+09   Median :2016-04-26   Median : 7406   Median : 5.245  
##  Mean   :4.855e+09   Mean   :2016-04-26   Mean   : 7638   Mean   : 5.490  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:10727   3rd Qu.: 7.713  
##  Max.   :8.878e+09   Max.   :2016-05-12   Max.   :36019   Max.   :28.030  
##  tracker_distance logged_activities_distance very_active_distance
##  Min.   : 0.000   Min.   :0.0000             Min.   : 0.000      
##  1st Qu.: 2.620   1st Qu.:0.0000             1st Qu.: 0.000      
##  Median : 5.245   Median :0.0000             Median : 0.210      
##  Mean   : 5.475   Mean   :0.1082             Mean   : 1.503      
##  3rd Qu.: 7.710   3rd Qu.:0.0000             3rd Qu.: 2.053      
##  Max.   :28.030   Max.   :4.9421             Max.   :21.920      
##  moderately_active_distance light_active_distance sedentary_active_distance
##  Min.   :0.0000             Min.   : 0.000        Min.   :0.000000         
##  1st Qu.:0.0000             1st Qu.: 1.945        1st Qu.:0.000000         
##  Median :0.2400             Median : 3.365        Median :0.000000         
##  Mean   :0.5675             Mean   : 3.341        Mean   :0.001606         
##  3rd Qu.:0.8000             3rd Qu.: 4.782        3rd Qu.:0.000000         
##  Max.   :6.4800             Max.   :10.710        Max.   :0.110000         
##  very_active_minutes fairly_active_minutes lightly_active_minutes
##  Min.   :  0.00      Min.   :  0.00        Min.   :  0.0         
##  1st Qu.:  0.00      1st Qu.:  0.00        1st Qu.:127.0         
##  Median :  4.00      Median :  6.00        Median :199.0         
##  Mean   : 21.16      Mean   : 13.56        Mean   :192.8         
##  3rd Qu.: 32.00      3rd Qu.: 19.00        3rd Qu.:264.0         
##  Max.   :210.00      Max.   :143.00        Max.   :518.0         
##  sedentary_minutes    calories   
##  Min.   :   0.0    Min.   :   0  
##  1st Qu.: 729.8    1st Qu.:1828  
##  Median :1057.5    Median :2134  
##  Mean   : 991.2    Mean   :2304  
##  3rd Qu.:1229.5    3rd Qu.:2793  
##  Max.   :1440.0    Max.   :4900

The average sedentary time is about 16.5 hours a day (991.2 minutes), which looks like a serious problem, whether for the users or for the company whose device recorded it. Taken literally, it is hard to believe that everyone sits for that long. Looking closer, the minimum and maximum sedentary minutes are 0 and 1,440 (a full 24 hours), and neither can describe a real day. These extremes suggest that the figures partly reflect tracking gaps, such as days when the device was not worn, rather than real behaviour. So the sedentary numbers should be treated with caution, and the sedentary tracking itself may need a fix or an upgrade.
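One way to probe this is to flag the suspicious days before averaging. The sketch below uses a made-up toy data frame with the same column names as daily_act; the `likely_not_worn` rule (zero steps together with 1,440 sedentary minutes) is my own assumption about what a non-wear day looks like, not something documented with the dataset.

```r
# Toy rows with the same column names as daily_act (values are made up).
toy_act <- data.frame(
  total_steps            = c(13162, 0, 7406),
  very_active_minutes    = c(25, 0, 4),
  fairly_active_minutes  = c(13, 0, 6),
  lightly_active_minutes = c(328, 0, 199),
  sedentary_minutes      = c(728, 1440, 1057)
)

# Total minutes the tracker accounted for on each day (1440 = a full day).
toy_act$tracked_minutes <- with(
  toy_act,
  very_active_minutes + fairly_active_minutes +
    lightly_active_minutes + sedentary_minutes
)

# Hypothetical flag: a full sedentary day with zero steps looks like a day
# on which the device was not worn, rather than 24 hours of sitting.
toy_act$likely_not_worn <- toy_act$total_steps == 0 &
  toy_act$sedentary_minutes == 1440

toy_act[, c("total_steps", "tracked_minutes", "likely_not_worn")]

# Mean sedentary time on the remaining days, in hours.
mean(toy_act$sedentary_minutes[!toy_act$likely_not_worn]) / 60
```

Days where `tracked_minutes` falls well short of 1,440 are also worth a look, since part of the day was not recorded at all.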

Data Cleaning & Manipulation on “daily_cal” :

daily_cal <- daily_cal %>% 
  clean_names() %>% 
  rename(act_date = activity_day) %>%
  mutate(act_date = as.Date(act_date, format = "%m/%d/%Y")) %>% 
  distinct()

str(daily_cal)

## 'data.frame':    940 obs. of  3 variables:
##  $ id      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ act_date: Date, format: "2016-04-12" "2016-04-13" ...
##  $ calories: int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

Statistical Summary

summary(daily_cal)

##        id               act_date             calories   
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :   0  
##  1st Qu.:2.320e+09   1st Qu.:2016-04-19   1st Qu.:1828  
##  Median :4.445e+09   Median :2016-04-26   Median :2134  
##  Mean   :4.855e+09   Mean   :2016-04-26   Mean   :2304  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:2793  
##  Max.   :8.878e+09   Max.   :2016-05-12   Max.   :4900

Looks good

Data Cleaning & Manipulation on “daily_inten” :

daily_inten <- daily_inten %>% 
  clean_names() %>% 
  rename(act_date = activity_day) %>% 
  mutate(act_date = as.Date(act_date, format = "%m/%d/%Y")) %>% 
  distinct()

str(daily_inten)

## 'data.frame':    940 obs. of  10 variables:
##  $ id                        : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ act_date                  : Date, format: "2016-04-12" "2016-04-13" ...
##  $ sedentary_minutes         : int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ lightly_active_minutes    : int  328 217 181 209 221 164 233 264 205 211 ...
##  $ fairly_active_minutes     : int  13 19 11 34 10 20 16 31 12 8 ...
##  $ very_active_minutes       : int  25 21 30 29 36 38 42 50 28 19 ...
##  $ sedentary_active_distance : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ light_active_distance     : num  6.06 4.71 3.91 2.83 5.04 ...
##  $ moderately_active_distance: num  0.55 0.69 0.4 1.26 0.41 ...
##  $ very_active_distance      : num  1.88 1.57 2.44 2.14 2.71 ...

Statistical Summary

summary(daily_inten)

##        id               act_date          sedentary_minutes
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :   0.0   
##  1st Qu.:2.320e+09   1st Qu.:2016-04-19   1st Qu.: 729.8   
##  Median :4.445e+09   Median :2016-04-26   Median :1057.5   
##  Mean   :4.855e+09   Mean   :2016-04-26   Mean   : 991.2   
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:1229.5   
##  Max.   :8.878e+09   Max.   :2016-05-12   Max.   :1440.0   
##  lightly_active_minutes fairly_active_minutes very_active_minutes
##  Min.   :  0.0          Min.   :  0.00        Min.   :  0.00     
##  1st Qu.:127.0          1st Qu.:  0.00        1st Qu.:  0.00     
##  Median :199.0          Median :  6.00        Median :  4.00     
##  Mean   :192.8          Mean   : 13.56        Mean   : 21.16     
##  3rd Qu.:264.0          3rd Qu.: 19.00        3rd Qu.: 32.00     
##  Max.   :518.0          Max.   :143.00        Max.   :210.00     
##  sedentary_active_distance light_active_distance moderately_active_distance
##  Min.   :0.000000          Min.   : 0.000        Min.   :0.0000            
##  1st Qu.:0.000000          1st Qu.: 1.945        1st Qu.:0.0000            
##  Median :0.000000          Median : 3.365        Median :0.2400            
##  Mean   :0.001606          Mean   : 3.341        Mean   :0.5675            
##  3rd Qu.:0.000000          3rd Qu.: 4.782        3rd Qu.:0.8000            
##  Max.   :0.110000          Max.   :10.710        Max.   :6.4800            
##  very_active_distance
##  Min.   : 0.000      
##  1st Qu.: 0.000      
##  Median : 0.210      
##  Mean   : 1.503      
##  3rd Qu.: 2.053      
##  Max.   :21.920

Looks good

Data Cleaning & Manipulation on “daily_steps” :

daily_steps <- daily_steps %>% 
  clean_names() %>% 
  rename(act_date = activity_day) %>% 
  mutate(act_date = as.Date(act_date, format = "%m/%d/%Y")) %>% 
  distinct()

str(daily_steps)

## 'data.frame':    940 obs. of  3 variables:
##  $ id        : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ act_date  : Date, format: "2016-04-12" "2016-04-13" ...
##  $ step_total: int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...

Statistical Summary

summary(daily_steps)

##        id               act_date            step_total   
##  Min.   :1.504e+09   Min.   :2016-04-12   Min.   :    0  
##  1st Qu.:2.320e+09   1st Qu.:2016-04-19   1st Qu.: 3790  
##  Median :4.445e+09   Median :2016-04-26   Median : 7406  
##  Mean   :4.855e+09   Mean   :2016-04-26   Mean   : 7638  
##  3rd Qu.:6.962e+09   3rd Qu.:2016-05-04   3rd Qu.:10727  
##  Max.   :8.878e+09   Max.   :2016-05-12   Max.   :36019

Looks good
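The three daily tables above (daily_cal, daily_inten, daily_steps) go through the same steps: tidy the column names, rename the date column, parse it, and drop duplicates. That repetition could be folded into one small helper. The sketch below is written in base R only so that it runs without extra packages; `clean_daily()` is a hypothetical function, its regular expression is just a rough stand-in for janitor's clean_names(), and the toy values are made up.

```r
# Hypothetical helper: snake_case names, rename the date column to act_date,
# parse it as a Date, and drop duplicate rows.
clean_daily <- function(df, date_col) {
  # Rough stand-in for janitor::clean_names(): "ActivityDay" -> "activity_day"
  names(df) <- tolower(gsub("([a-z0-9])([A-Z])", "\\1_\\2", names(df)))
  names(df)[names(df) == date_col] <- "act_date"
  df$act_date <- as.Date(df$act_date, format = "%m/%d/%Y")
  unique(df)
}

# Toy input with one duplicated row (values are made up).
toy_steps <- data.frame(
  Id          = c(1, 1, 2),
  ActivityDay = c("4/12/2016", "4/12/2016", "4/13/2016"),
  StepTotal   = c(10, 10, 20)
)

toy_clean <- clean_daily(toy_steps, date_col = "activity_day")
str(toy_clean)
```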

Working on Minutes Datasets :

Data Cleaning & Manipulation on “min_cal” :

min_cal <- min_cal %>% 
  clean_names() %>% 
  rename(act_date = activity_minute)%>% 
  distinct()


min_cal$date <- mdy_hms(min_cal$act_date)
min_cal$time <- format(as.POSIXct(min_cal$date), format = "%H:%M %p")

min_cal <- min_cal %>% 
  select(-c(act_date))

str(min_cal)

## 'data.frame':    1325580 obs. of  4 variables:
##  $ id      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ calories: num  0.786 0.786 0.786 0.786 0.786 ...
##  $ date    : POSIXct, format: "2016-04-12 00:00:00" "2016-04-12 00:01:00" ...
##  $ time    : chr  "00:00 AM" "00:01 AM" "00:02 AM" "00:03 AM" ...

Statistical Summary

summary(min_cal)

##        id               calories            date                       
##  Min.   :1.504e+09   Min.   : 0.0000   Min.   :2016-04-12 00:00:00.00  
##  1st Qu.:2.320e+09   1st Qu.: 0.9357   1st Qu.:2016-04-19 01:51:00.00  
##  Median :4.445e+09   Median : 1.2176   Median :2016-04-26 06:27:00.00  
##  Mean   :4.848e+09   Mean   : 1.6231   Mean   :2016-04-26 12:09:55.15  
##  3rd Qu.:6.962e+09   3rd Qu.: 1.4327   3rd Qu.:2016-05-03 18:55:00.00  
##  Max.   :8.878e+09   Max.   :19.7499   Max.   :2016-05-12 15:59:00.00  
##      time          
##  Length:1325580    
##  Class :character  
##  Mode  :character  
##                    
##                    
##

Looks good
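One detail in the output above: the time column reads "00:00 AM" because the format string combines `%H` (24-hour clock) with `%p` (AM/PM). The values still sort correctly, but the label mixes two conventions. A short sketch of the alternatives, using two made-up timestamps; the text produced by `%p` depends on the locale, so no output is shown for it here.

```r
x <- as.POSIXct(c("2016-04-12 00:01:00", "2016-04-12 15:59:00"), tz = "UTC")

format(x, "%H:%M")      # 24-hour clock, no AM/PM needed: "00:01" "15:59"
format(x, "%I:%M %p")   # 12-hour clock with AM/PM (text of %p depends on locale)

# An integer hour is often easier to group and sort by than a text label.
as.integer(format(x, "%H"))   # 0 15
```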

Data Cleaning & Manipulation on “min_inten” :

min_inten <- min_inten %>% 
  clean_names() %>% 
  rename(act_date = activity_minute) %>% 
  distinct()
  
min_inten$date <- mdy_hms(min_inten$act_date)
min_inten$time <- format(as.POSIXct(min_inten$date), format = "%H:%M %p")
min_inten <- min_inten %>% 
  select(-c(act_date))

str(min_inten)

## 'data.frame':    1325580 obs. of  4 variables:
##  $ id       : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ intensity: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ date     : POSIXct, format: "2016-04-12 00:00:00" "2016-04-12 00:01:00" ...
##  $ time     : chr  "00:00 AM" "00:01 AM" "00:02 AM" "00:03 AM" ...

Statistical Summary

summary(min_inten)

##        id              intensity           date                       
##  Min.   :1.504e+09   Min.   :0.0000   Min.   :2016-04-12 00:00:00.00  
##  1st Qu.:2.320e+09   1st Qu.:0.0000   1st Qu.:2016-04-19 01:51:00.00  
##  Median :4.445e+09   Median :0.0000   Median :2016-04-26 06:27:00.00  
##  Mean   :4.848e+09   Mean   :0.2006   Mean   :2016-04-26 12:09:55.15  
##  3rd Qu.:6.962e+09   3rd Qu.:0.0000   3rd Qu.:2016-05-03 18:55:00.00  
##  Max.   :8.878e+09   Max.   :3.0000   Max.   :2016-05-12 15:59:00.00  
##      time          
##  Length:1325580    
##  Class :character  
##  Mode  :character  
##                    
##                    
##

Looks good

Data Cleaning & Manipulation on “min_mets” :

min_mets <- min_mets %>% 
  clean_names() %>% 
  rename(act_date = activity_minute) %>%
  rename(mets = me_ts) %>% 
  distinct()

min_mets$date <- mdy_hms(min_mets$act_date)
min_mets$time <- format(as.POSIXct(min_mets$date), format = "%H:%M %p")

min_mets <- min_mets %>% 
  select(-c(act_date))

str(min_mets)

## 'data.frame':    1325580 obs. of  4 variables:
##  $ id  : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ mets: int  10 10 10 10 10 12 12 12 12 12 ...
##  $ date: POSIXct, format: "2016-04-12 00:00:00" "2016-04-12 00:01:00" ...
##  $ time: chr  "00:00 AM" "00:01 AM" "00:02 AM" "00:03 AM" ...

Statistical Summary

summary(min_mets)

##        id                 mets             date                       
##  Min.   :1.504e+09   Min.   :  0.00   Min.   :2016-04-12 00:00:00.00  
##  1st Qu.:2.320e+09   1st Qu.: 10.00   1st Qu.:2016-04-19 01:51:00.00  
##  Median :4.445e+09   Median : 10.00   Median :2016-04-26 06:27:00.00  
##  Mean   :4.848e+09   Mean   : 14.69   Mean   :2016-04-26 12:09:55.15  
##  3rd Qu.:6.962e+09   3rd Qu.: 11.00   3rd Qu.:2016-05-03 18:55:00.00  
##  Max.   :8.878e+09   Max.   :157.00   Max.   :2016-05-12 15:59:00.00  
##      time          
##  Length:1325580    
##  Class :character  
##  Mode  :character  
##                    
##                    
##

Looks good

Data Cleaning & Manipulation on “min_sleep” :

min_sleep <- min_sleep %>% 
  clean_names() %>% 
  rename(act_date = date) %>% 
  distinct()

min_sleep$date <- mdy_hms(min_sleep$act_date)
min_sleep$time <- format(as.POSIXct(min_sleep$date), format = "%H:%M %p")

min_sleep <- min_sleep %>% 
  select(-c(act_date))

str(min_sleep)

## 'data.frame':    187978 obs. of  5 variables:
##  $ id    : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ value : int  3 2 1 1 1 1 1 2 2 2 ...
##  $ log_id: num  1.14e+10 1.14e+10 1.14e+10 1.14e+10 1.14e+10 ...
##  $ date  : POSIXct, format: "2016-04-12 02:47:30" "2016-04-12 02:48:30" ...
##  $ time  : chr  "02:47 AM" "02:48 AM" "02:49 AM" "02:50 AM" ...

Statistical Summary

summary(min_sleep)

##        id                value           log_id         
##  Min.   :1.504e+09   Min.   :1.000   Min.   :1.137e+10  
##  1st Qu.:3.977e+09   1st Qu.:1.000   1st Qu.:1.144e+10  
##  Median :4.703e+09   Median :1.000   Median :1.150e+10  
##  Mean   :4.997e+09   Mean   :1.096   Mean   :1.150e+10  
##  3rd Qu.:6.962e+09   3rd Qu.:1.000   3rd Qu.:1.155e+10  
##  Max.   :8.792e+09   Max.   :3.000   Max.   :1.162e+10  
##       date                            time          
##  Min.   :2016-04-11 20:48:00.00   Length:187978     
##  1st Qu.:2016-04-19 02:48:00.00   Class :character  
##  Median :2016-04-26 21:48:00.00   Mode  :character  
##  Mean   :2016-04-26 13:31:23.11                     
##  3rd Qu.:2016-05-03 23:47:00.00                     
##  Max.   :2016-05-12 09:56:00.00

The “min_sleep” dataset comes with no further information about what its values mean, which makes it an example of bad data. After seeing this, I went through the dataset and noticed some odd records as well, so we cannot use it for the data analysis.

Dropping ‘min_sleep’ dataset

rm(min_sleep)

Data Cleaning & Manipulation on “min_steps” :

min_steps <- min_steps %>% 
  clean_names() %>% 
  rename(act_date = activity_minute) %>% 
  distinct()

min_steps$date <- mdy_hms(min_steps$act_date)
min_steps$time <- format(as.POSIXct(min_steps$date), format = "%H:%M %p")

min_steps <- min_steps %>% 
  select(-c(act_date))
str(min_steps)

## 'data.frame':    1325580 obs. of  4 variables:
##  $ id   : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ steps: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ date : POSIXct, format: "2016-04-12 00:00:00" "2016-04-12 00:01:00" ...
##  $ time : chr  "00:00 AM" "00:01 AM" "00:02 AM" "00:03 AM" ...

Statistical Summary

summary(min_steps)

##        id                steps              date                       
##  Min.   :1.504e+09   Min.   :  0.000   Min.   :2016-04-12 00:00:00.00  
##  1st Qu.:2.320e+09   1st Qu.:  0.000   1st Qu.:2016-04-19 01:51:00.00  
##  Median :4.445e+09   Median :  0.000   Median :2016-04-26 06:27:00.00  
##  Mean   :4.848e+09   Mean   :  5.336   Mean   :2016-04-26 12:09:55.15  
##  3rd Qu.:6.962e+09   3rd Qu.:  0.000   3rd Qu.:2016-05-03 18:55:00.00  
##  Max.   :8.878e+09   Max.   :220.000   Max.   :2016-05-12 15:59:00.00  
##      time          
##  Length:1325580    
##  Class :character  
##  Mode  :character  
##                    
##                    
##

Looks good

Working on heart_sec Datasets :

Data Cleaning & Manipulation on “heart_sec” :

heart_sec <- heart_sec %>% 
  clean_names() %>% 
  rename(act_date = time, bpm = value) %>%
  distinct()

heart_sec$date <- mdy_hms(heart_sec$act_date)
heart_sec$time <- format(as.POSIXct(heart_sec$date), format = "%H:%M:%S %p")

heart_sec <- heart_sec %>% 
  select(-c(act_date))

str(heart_sec)

## 'data.frame':    2483658 obs. of  4 variables:
##  $ id  : num  2.02e+09 2.02e+09 2.02e+09 2.02e+09 2.02e+09 ...
##  $ bpm : int  97 102 105 103 101 95 91 93 94 93 ...
##  $ date: POSIXct, format: "2016-04-12 07:21:00" "2016-04-12 07:21:05" ...
##  $ time: chr  "07:21:00 AM" "07:21:05 AM" "07:21:10 AM" "07:21:20 AM" ...

Statistical Summary

summary(heart_sec)

##        id                 bpm              date                       
##  Min.   :2.022e+09   Min.   : 36.00   Min.   :2016-04-12 00:00:00.00  
##  1st Qu.:4.388e+09   1st Qu.: 63.00   1st Qu.:2016-04-19 06:18:10.00  
##  Median :5.554e+09   Median : 73.00   Median :2016-04-26 20:28:50.00  
##  Mean   :5.514e+09   Mean   : 77.33   Mean   :2016-04-26 19:43:52.24  
##  3rd Qu.:6.962e+09   3rd Qu.: 88.00   3rd Qu.:2016-05-04 08:00:20.00  
##  Max.   :8.878e+09   Max.   :203.00   Max.   :2016-05-12 16:20:00.00  
##      time          
##  Length:2483658    
##  Class :character  
##  Mode  :character  
##                    
##                    
##

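Looks good. The extremes stand out, though: the minimum is 36 bpm and the maximum is 203 bpm. Readings this low or this high could point to a health problem, but they could equally come from sleep, intense exercise, or sensor noise, so this dataset alone cannot support any medical conclusion.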
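Since the heart rate data is marked as incomplete, it helps to look at it per participant instead of only overall. The sketch below uses a made-up toy data frame with the same id and bpm columns as heart_sec; `per_id` is a hypothetical name.

```r
# Toy readings for two participants (values are made up).
toy_heart <- data.frame(
  id  = c(1, 1, 1, 2, 2, 2),
  bpm = c(58, 72, 140, 36, 65, 203)
)

# Per-participant minimum, median and maximum heart rate.
# tapply() returns results in sorted id order, matching sort(unique(id)).
per_id <- data.frame(
  id         = sort(unique(toy_heart$id)),
  min_bpm    = as.vector(tapply(toy_heart$bpm, toy_heart$id, min)),
  median_bpm = as.vector(tapply(toy_heart$bpm, toy_heart$id, median)),
  max_bpm    = as.vector(tapply(toy_heart$bpm, toy_heart$id, max))
)
per_id

# Number of participants who contributed heart-rate readings.
length(unique(toy_heart$id))
```

On the real data, the participant count shows how many of the users are represented at all, and the per-id table shows whether the extreme values come from one person or from many.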

Working on Hourly Datasets :

Data Cleaning & Manipulation on “hour_steps” :

hour_steps <- hour_steps %>% 
  clean_names() %>% 
  rename(act_date = activity_hour) %>% 
  distinct()

hour_steps$date <- mdy_hms(hour_steps$act_date)
hour_steps$time <- format(as.POSIXct(hour_steps$date), format = "%H:%M %p")

hour_steps <- hour_steps %>% 
  select(-c(act_date))

str(hour_steps)

## 'data.frame':    22099 obs. of  4 variables:
##  $ id        : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ step_total: int  373 160 151 0 0 0 0 0 250 1864 ...
##  $ date      : POSIXct, format: "2016-04-12 00:00:00" "2016-04-12 01:00:00" ...
##  $ time      : chr  "00:00 AM" "01:00 AM" "02:00 AM" "03:00 AM" ...

Statistical Summary

summary(hour_steps)

##        id              step_total           date                       
##  Min.   :1.504e+09   Min.   :    0.0   Min.   :2016-04-12 00:00:00.00  
##  1st Qu.:2.320e+09   1st Qu.:    0.0   1st Qu.:2016-04-19 01:00:00.00  
##  Median :4.445e+09   Median :   40.0   Median :2016-04-26 06:00:00.00  
##  Mean   :4.848e+09   Mean   :  320.2   Mean   :2016-04-26 11:46:42.58  
##  3rd Qu.:6.962e+09   3rd Qu.:  357.0   3rd Qu.:2016-05-03 19:00:00.00  
##  Max.   :8.878e+09   Max.   :10554.0   Max.   :2016-05-12 15:00:00.00  
##      time          
##  Length:22099      
##  Class :character  
##  Mode  :character  
##                    
##                    
##

Looks good

Data Cleaning & Manipulation on “hour_inten” :

hour_inten <- hour_inten %>% 
  clean_names() %>% 
  rename(act_date = activity_hour) %>% 
  distinct()

hour_inten$date <- mdy_hms(hour_inten$act_date)
hour_inten$time <- format(as.POSIXct(hour_inten$date), format = "%H:%M %p")

hour_inten <- hour_inten %>% 
  select(-c(act_date))

str(hour_inten)

## 'data.frame':    22099 obs. of  5 variables:
##  $ id               : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ total_intensity  : int  20 8 7 0 0 0 0 0 13 30 ...
##  $ average_intensity: num  0.333 0.133 0.117 0 0 ...
##  $ date             : POSIXct, format: "2016-04-12 00:00:00" "2016-04-12 01:00:00" ...
##  $ time             : chr  "00:00 AM" "01:00 AM" "02:00 AM" "03:00 AM" ...

Statistical Summary

summary(hour_inten)

##        id            total_intensity  average_intensity
##  Min.   :1.504e+09   Min.   :  0.00   Min.   :0.0000   
##  1st Qu.:2.320e+09   1st Qu.:  0.00   1st Qu.:0.0000   
##  Median :4.445e+09   Median :  3.00   Median :0.0500   
##  Mean   :4.848e+09   Mean   : 12.04   Mean   :0.2006   
##  3rd Qu.:6.962e+09   3rd Qu.: 16.00   3rd Qu.:0.2667   
##  Max.   :8.878e+09   Max.   :180.00   Max.   :3.0000   
##       date                            time          
##  Min.   :2016-04-12 00:00:00.00   Length:22099      
##  1st Qu.:2016-04-19 01:00:00.00   Class :character  
##  Median :2016-04-26 06:00:00.00   Mode  :character  
##  Mean   :2016-04-26 11:46:42.58                     
##  3rd Qu.:2016-05-03 19:00:00.00                     
##  Max.   :2016-05-12 15:00:00.00

Looks good

Data Cleaning & Manipulation on “hour_cal” :

hour_cal <- hour_cal %>% 
  clean_names() %>% 
  rename(act_date = activity_hour)%>% 
  distinct()

hour_cal$date <- mdy_hms(hour_cal$act_date)
hour_cal$time <- format(as.POSIXct(hour_cal$date), format = "%H:%M %p")


hour_cal <- hour_cal %>% 
  select(-c(act_date))

str(hour_cal)

## 'data.frame':    22099 obs. of  4 variables:
##  $ id      : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ calories: int  81 61 59 47 48 48 48 47 68 141 ...
##  $ date    : POSIXct, format: "2016-04-12 00:00:00" "2016-04-12 01:00:00" ...
##  $ time    : chr  "00:00 AM" "01:00 AM" "02:00 AM" "03:00 AM" ...

Statistical Summary

summary(hour_cal)

##        id               calories           date                       
##  Min.   :1.504e+09   Min.   : 42.00   Min.   :2016-04-12 00:00:00.00  
##  1st Qu.:2.320e+09   1st Qu.: 63.00   1st Qu.:2016-04-19 01:00:00.00  
##  Median :4.445e+09   Median : 83.00   Median :2016-04-26 06:00:00.00  
##  Mean   :4.848e+09   Mean   : 97.39   Mean   :2016-04-26 11:46:42.58  
##  3rd Qu.:6.962e+09   3rd Qu.:108.00   3rd Qu.:2016-05-03 19:00:00.00  
##  Max.   :8.878e+09   Max.   :948.00   Max.   :2016-05-12 15:00:00.00  
##      time          
##  Length:22099      
##  Class :character  
##  Mode  :character  
##                    
##                    
##

Looks good

Finding relationships between different datasets and merging :

Hourly data

hourly_data <- hour_cal %>% 
  left_join(hour_inten, by = c("id", "date", "time")) %>% 
  left_join(hour_steps, by = c("id", "date", "time")) %>% 
  arrange(time) %>% 
  distinct()

hourly_data <- hourly_data %>% 
  select(-c(average_intensity))
str(hourly_data)

## 'data.frame':    22099 obs. of  6 variables:
##  $ id             : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ calories       : int  81 69 56 60 77 47 82 47 54 54 ...
##  $ date           : POSIXct, format: "2016-04-12 00:00:00" "2016-04-13 00:00:00" ...
##  $ time           : chr  "00:00 AM" "00:00 AM" "00:00 AM" "00:00 AM" ...
##  $ total_intensity: int  20 14 4 6 15 0 21 0 2 2 ...
##  $ step_total     : int  373 144 81 83 459 0 416 0 16 17 ...

Statistical Summary

summary(hourly_data)

##        id               calories           date                       
##  Min.   :1.504e+09   Min.   : 42.00   Min.   :2016-04-12 00:00:00.00  
##  1st Qu.:2.320e+09   1st Qu.: 63.00   1st Qu.:2016-04-19 01:00:00.00  
##  Median :4.445e+09   Median : 83.00   Median :2016-04-26 06:00:00.00  
##  Mean   :4.848e+09   Mean   : 97.39   Mean   :2016-04-26 11:46:42.58  
##  3rd Qu.:6.962e+09   3rd Qu.:108.00   3rd Qu.:2016-05-03 19:00:00.00  
##  Max.   :8.878e+09   Max.   :948.00   Max.   :2016-05-12 15:00:00.00  
##      time           total_intensity    step_total     
##  Length:22099       Min.   :  0.00   Min.   :    0.0  
##  Class :character   1st Qu.:  0.00   1st Qu.:    0.0  
##  Mode  :character   Median :  3.00   Median :   40.0  
##                     Mean   : 12.04   Mean   :  320.2  
##                     3rd Qu.: 16.00   3rd Qu.:  357.0  
##                     Max.   :180.00   Max.   :10554.0

Looks good
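
The merged table keeps the same 22099 rows as "hour_cal" and the summary shows no NA's, which is what a clean left join should look like. A rough way to check this explicitly is sketched below with base R `merge()` on toy tables (the values and the two small data frames are made up, standing in for the real hourly tables):

```r
# Toy tables standing in for hour_cal and hour_steps (values are made up)
cal <- data.frame(id = c(1, 1, 2),
                  time = c("00:00 AM", "01:00 AM", "00:00 AM"),
                  calories = c(81, 61, 70))
steps <- data.frame(id = c(1, 2),
                    time = c("00:00 AM", "00:00 AM"),
                    step_total = c(373, 0))

# Base R equivalent of a left join on the shared keys
joined <- merge(cal, steps, by = c("id", "time"), all.x = TRUE)

# Same row count as the left table means no key matched more than once
same_rows <- nrow(joined) == nrow(cal)
# NA in a joined column means that key was missing on the right side
unmatched <- sum(is.na(joined$step_total))

print(same_rows)
print(unmatched)
```

In this toy case one row has no match on the right side, so `unmatched` is 1. On the real tables both checks should come back clean.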

Minute data

minute_data <- min_mets %>%
  left_join(min_cal, by = c("id", "date", "time")) %>% 
  left_join(min_inten, by = c("id", "date", "time")) %>% 
  left_join(min_steps, by = c("id", "date", "time")) %>% 
  distinct()

str(minute_data)

## 'data.frame':    1325580 obs. of  7 variables:
##  $ id       : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ mets     : int  10 10 10 10 10 12 12 12 12 12 ...
##  $ date     : POSIXct, format: "2016-04-12 00:00:00" "2016-04-12 00:01:00" ...
##  $ time     : chr  "00:00 AM" "00:01 AM" "00:02 AM" "00:03 AM" ...
##  $ calories : num  0.786 0.786 0.786 0.786 0.786 ...
##  $ intensity: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ steps    : int  0 0 0 0 0 0 0 0 0 0 ...

Statistical Summary

summary(minute_data)

##        id                 mets             date                       
##  Min.   :1.504e+09   Min.   :  0.00   Min.   :2016-04-12 00:00:00.00  
##  1st Qu.:2.320e+09   1st Qu.: 10.00   1st Qu.:2016-04-19 01:51:00.00  
##  Median :4.445e+09   Median : 10.00   Median :2016-04-26 06:27:00.00  
##  Mean   :4.848e+09   Mean   : 14.69   Mean   :2016-04-26 12:09:55.15  
##  3rd Qu.:6.962e+09   3rd Qu.: 11.00   3rd Qu.:2016-05-03 18:55:00.00  
##  Max.   :8.878e+09   Max.   :157.00   Max.   :2016-05-12 15:59:00.00  
##      time              calories         intensity          steps        
##  Length:1325580     Min.   : 0.0000   Min.   :0.0000   Min.   :  0.000  
##  Class :character   1st Qu.: 0.9357   1st Qu.:0.0000   1st Qu.:  0.000  
##  Mode  :character   Median : 1.2176   Median :0.0000   Median :  0.000  
##                     Mean   : 1.6231   Mean   :0.2006   Mean   :  5.336  
##                     3rd Qu.: 1.4327   3rd Qu.:0.0000   3rd Qu.:  0.000  
##                     Max.   :19.7499   Max.   :3.0000   Max.   :220.000

Looks good

Creating some data frames for data visualization :

Usable datasets (so far) :

daily_act
hourly_data
minute_data
min_sleep
daily_sleep
sec_heart

PHASE 5 : Data Visualization

1. daily_act

daily_act <- daily_act %>% 
  select(c(id, act_date, 
           total_steps, 
           sedentary_minutes,
           calories))
str(daily_act)

## 'data.frame':    940 obs. of  5 variables:
##  $ id               : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ act_date         : Date, format: "2016-04-12" "2016-04-13" ...
##  $ total_steps      : int  13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
##  $ sedentary_minutes: int  728 776 1218 726 773 539 1149 775 818 838 ...
##  $ calories         : int  1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...

for sedentary vs total steps

daily_act %>% 
  ggplot(aes(sedentary_minutes, total_steps, 
             color = sedentary_minutes)) +
  geom_point(size = 2, alpha = 0.5) +
  geom_smooth(method = "lm", se = FALSE, color = 'purple') +
  labs( x = "Total Sedentary Minutes", y = "Total Steps Taken",
        color = "Sedentary Minutes", 
        title = "Relationship Between Daily Sedentary Time and Steps Taken",
        caption = "Data Analyst : JP") +
  annotate("text", x = 220, y = 30000, label = "R^2 =  0.11", color = "red",
           fontface = "bold"  , size = 5, angle = 25) +
  theme_bw()

## `geom_smooth()` using formula = 'y ~ x'

summary(lm(sedentary_minutes ~ total_steps, daily_act))$r.squared

## [1] 0.1072455

Therefore, the linear relationship between sedentary time and steps taken is weak: with R² ≈ 0.11, sedentary minutes explain only about 11% of the variation in steps.
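
The plot puts steps on the y-axis while the `lm()` call uses sedentary minutes as the response. With a single predictor this does not change R², because it equals the squared correlation in both directions. A small sketch on simulated data (not the Fitbit data) to show this:

```r
# Simulated data, for illustration only
set.seed(1)
x <- rnorm(100)
y <- 0.3 * x + rnorm(100)

# R-squared with either variable as the response
r2_y_on_x <- summary(lm(y ~ x))$r.squared
r2_x_on_y <- summary(lm(x ~ y))$r.squared
# Squared Pearson correlation
r2_from_cor <- cor(x, y)^2

print(c(r2_y_on_x, r2_x_on_y, r2_from_cor))
```

All three numbers agree, so the R² values reported in this section apply to the plots as drawn.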

for total steps and calories

daily_act %>% 
  ggplot(aes(calories, total_steps, color = sedentary_minutes)) +
  geom_point(size = 2, alpha = 0.4) +
  geom_smooth(method = 'lm' , se = FALSE, color = 'purple') +
  labs(x = "Calories",
       y = "Total Steps Taken",
       color = "Sedentary Minutes",
       title = "Relationship Between Calories and Steps Taken",
       subtitle = "Linear Regression Model has Modest fit for this relationship",
       caption = "Data Analyst : JP") +
  annotate("text", x=500, y= 30000, label = "R^2 =  0.35", color = "darkgreen",
           fontface = "bold", size = 5, angle = 25 ) +
  theme_bw()

## `geom_smooth()` using formula = 'y ~ x'

summary(lm(calories ~ total_steps , daily_act))$r.squared

## [1] 0.3499528

Calories burned have a modest linear relationship with steps taken (R² ≈ 0.35).

For daily_sleep

daily_sleep %>% 
  ggplot(aes(sleep_min, inbed_min, color = factor(total_sleep_records))) +
  geom_point(size = 2, alpha = 0.5) + 
  geom_smooth(method = 'lm' , se = FALSE, color = "#330000") +
  labs(x = "Total Minutes Asleep",
       y = "Total Minutes in Bed",
       color = "Sleep(s)",
       title = "Relationship Between Sleep and In Bed Time",
       subtitle = "Linear Regression Model has Strong fit for this relationship",
       caption = "Data Analyst : JP") +
  annotate("text", x=175, y= 775, label = "R^2 =  0.87", color = "darkgreen",
           fontface = "bold", size = 5, angle = 25 ) +
  theme_bw()

## `geom_smooth()` using formula = 'y ~ x'

summary(lm(sleep_min~inbed_min, daily_sleep))$r.squared

## [1] 0.8656858

Strong fit: time in bed closely tracks time asleep. Customers who spend extra time in bed without sleeping are idle, so we can advertise our ‘Membership Plan’ to them.
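
The R² labels in these plots are typed in by hand, so they can drift from the computed values. One possible way around that is to build the label text from the number itself. The sketch below uses the three R² values printed above and base R `sprintf()`; the variable names are my own:

```r
# R-squared values printed in this report:
# sedentary vs steps, calories vs steps, sleep vs in-bed time
r2_values <- c(0.1072455, 0.3499528, 0.8656858)

# Round to two decimals and build the label text
r2_labels <- sprintf("R^2 = %.2f", r2_values)

print(r2_labels)
```

Each label could then be passed to `annotate()` in place of the hard-coded string.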

Merging daily_sleep & daily_act

daily_sleep_act <- daily_sleep %>% 
  left_join(daily_act, by = c("id", "act_date")) %>% 
  distinct()

str(daily_sleep_act)

## 'data.frame':    410 obs. of  8 variables:
##  $ id                 : num  1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
##  $ act_date           : Date, format: "2016-04-12" "2016-04-13" ...
##  $ total_sleep_records: int  1 2 1 2 1 1 1 1 1 1 ...
##  $ sleep_min          : int  327 384 412 340 700 304 360 325 361 430 ...
##  $ inbed_min          : int  346 407 442 367 712 320 377 364 384 449 ...
##  $ total_steps        : int  13162 10735 9762 12669 9705 15506 10544 9819 14371 10039 ...
##  $ sedentary_minutes  : int  728 776 726 773 539 775 818 838 732 709 ...
##  $ calories           : int  1985 1797 1745 1863 1728 2035 1786 1775 1949 1788 ...

After merging and running the linear regressions one by one on “daily_sleep_act”, none of them showed a meaningful effect, so I am dropping it. This may suggest that the sedentary tracking in the device is unreliable.

rm(daily_sleep_act)
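
The one-by-one regressions are not shown in this report. A rough sketch of how such a check could be looped is below. It assumes `sleep_min` as the response and uses a simulated stand-in for "daily_sleep_act", so the numbers are not results from the real data:

```r
# Simulated stand-in for daily_sleep_act (values are made up)
set.seed(42)
toy <- data.frame(
  sleep_min         = rnorm(50, 420, 60),
  total_steps       = rnorm(50, 8000, 3000),
  sedentary_minutes = rnorm(50, 750, 150),
  calories          = rnorm(50, 2200, 400)
)

predictors <- c("total_steps", "sedentary_minutes", "calories")

# Fit sleep_min ~ <predictor> for each predictor and keep the R-squared
r2_by_predictor <- sapply(predictors, function(p) {
  fit <- lm(reformulate(p, response = "sleep_min"), data = toy)
  summary(fit)$r.squared
})

print(round(r2_by_predictor, 3))
```

Because the toy columns are independent random draws, the R² values come out small, which is roughly the pattern described above.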

2. hourly_data

for intensity and steps

hourly_data %>% 
  filter(step_total < 6000) %>% 
  ggplot(aes(total_intensity, step_total)) +
  geom_point(color = "blue", size = 2, alpha = 0.3)+
  geom_smooth(method = 'lm', se = FALSE, color = "red") +
  labs(x = "Intensity",
       y = "Steps Taken",
       title = "Relationship Between Intensity & Total Steps",
       subtitle = "Linear Regression Model has Strong fit for this relationship",
       caption = "Data Analyst : JP") +
  annotate("text", x=10, y= 5000, label = "R^2 =  0.80", color = "darkgreen",
           fontface = "bold", size = 5, angle = 25 ) +
  theme_bw()

## `geom_smooth()` using formula = 'y ~ x'

summary(lm(total_intensity~step_total, hourly_data))$r.squared

## [1] 0.8027856
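
Note that the plot filters to hours with fewer than 6000 steps, while the `lm()` call uses every row. If the label should describe exactly what is drawn, the model can be fitted on the same subset. A minimal sketch on simulated data (a made-up stand-in for "hourly_data", with one extreme hour added on purpose):

```r
# Simulated stand-in for hourly_data (values are made up)
set.seed(7)
toy <- data.frame(total_intensity = rpois(200, 12))
toy$step_total <- pmax(0, round(27 * toy$total_intensity + rnorm(200, 0, 80)))
# One extreme hour, similar in spirit to the 10554-step maximum
toy <- rbind(toy, data.frame(total_intensity = 60, step_total = 10000))

# R-squared on all rows versus only the rows the plot would show
r2_all <- summary(lm(total_intensity ~ step_total, data = toy))$r.squared
r2_plotted <- summary(lm(total_intensity ~ step_total,
                         data = subset(toy, step_total < 6000)))$r.squared

print(c(all_rows = r2_all, plotted_rows = r2_plotted))
```

The two values can differ noticeably when a few extreme hours are excluded, so it is worth stating which one the label refers to.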

for intensity and calories

hourly_data %>% 
  filter(calories < 600) %>% 
  ggplot(aes(total_intensity, calories)) +
  geom_point(color = "blue", size = 2, alpha = 0.3)+
  geom_smooth(method = 'lm', se = FALSE, color = "red") +
  labs(x = "Intensity",
       y = "Calories",
       title = "Relationship Between Intensity & Calories",
       subtitle = "Linear Regression Model has Strong fit for this relationship",
       caption = "Data Analyst : JP") +
  annotate("text", x=10, y= 475, label = "R^2 =  0.80", color = "darkgreen",
           fontface = "bold", size = 5, angle = 25 ) +
  theme_bw()

## `geom_smooth()` using formula = 'y ~ x'

summary(lm(total_intensity~calories, hourly_data))$r.squared

## [1] 0.8039204

Looks good. Intensity level has a strong relationship with calories burned.

for steps and calories

hourly_data %>% 
  filter(calories < 600) %>% 
  ggplot(aes(step_total, calories)) +
  geom_point(color = "blue", size = 2, alpha = 0.3)+
  geom_smooth(method = 'lm', se = FALSE, color = "red") +
  labs(x = "Steps Taken",
       y = "Calories",
       title = "Relationship Between Steps Taken & Calories",
       subtitle = "Linear Regression Model has Strong fit for this relationship",
       caption = "Data Analyst : JP") +
  annotate("text", x=6800, y= 150, label = "R^2 =  0.66", color = "darkgreen",
           fontface = "bold", size = 5, angle = 25 ) +
  theme_bw()

## `geom_smooth()` using formula = 'y ~ x'

summary(lm(step_total~calories, hourly_data))$r.squared

## [1] 0.6641728

Looks good. Steps taken also have a fairly strong relationship with calories burned.

3. Minute Data

for METs and Calories

minute_data %>% 
  ggplot(aes(mets, calories)) +
  geom_line(color = "blue", size = 0.5, alpha = 0.3)+
  geom_smooth(method = 'lm', se = FALSE, color = "red") +
  labs(x = "METs",
       y = "Calories",
       title = "Relationship Between METs & Calories",
       subtitle = "Linear Regression Model has Strong fit for this relationship",
       caption = "Data Analyst : JP") +
  annotate("text", x=25, y= 17, label = "R^2 =  0.91", color = "darkgreen",
           fontface = "bold", size = 5, angle = 25 ) +
  theme_bw()

## `geom_smooth()` using formula = 'y ~ x'

summary(lm(mets~calories, minute_data))$r.squared

## [1] 0.9138607

Looks good. METs (metabolic equivalents) have a very strong relationship with calories burned.
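
One caveat on units: the median of 10 in `mets` suggests the values are stored at ten times the true MET value, which is how I understand the Fitabase export to work. Treat that as an assumption to verify against the data dictionary. Rescaling would only change the axis, not the fit, as this sketch on made-up minute values shows:

```r
# Made-up minute-level values; mets is on the stored scale
toy <- data.frame(mets = c(10, 10, 12, 35, 60, 11),
                  calories = c(0.79, 0.80, 0.95, 2.70, 4.80, 0.88))

# Assumption: stored METs are 10x the true value
toy$mets_true <- toy$mets / 10

# R-squared is unchanged by rescaling a variable
r2_stored <- summary(lm(mets ~ calories, data = toy))$r.squared
r2_rescaled <- summary(lm(mets_true ~ calories, data = toy))$r.squared

print(c(r2_stored, r2_rescaled))
```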

PHASE 6 : Act

The act phase will be carried out by the executive team of the company. So, I am passing the documented report to :

Urška Sršen & the Team

Recommendations for Bellabeat Marketing Strategy:

Bellabeat should use an improved notification system that reminds users to exercise more and helps them keep track of their health. The app should also focus more on engagement, for example with a marketing scheme in which steps taken can be redeemed in the app for discounts on products.

Bellabeat should upgrade its sedentary tracking and advertise an exchange offer for a new device with more functions and upgraded hardware that provides more precise tracking. The company would benefit from having more customers and more data for the next analysis.

Bellabeat should encourage its customers to build better sleeping habits, such as the best time to go to sleep, and to sign up for the “Bellabeat membership”, since it offers 24/7 access to fully personalized guidance. Bellabeat should also upgrade its system to notify users when their heart rate rises or drops, as these are real engagement prompts that customers will not ignore.

Recommendations based on the limitations of the datasets:

A larger sample size and a longer data collection period are needed for an in-depth, precise statistical analysis.
Data should also be collected from additional primary/secondary sources to increase the credibility and reliability of the datasets.
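
To put numbers on these limitations, the number of distinct users and the date span of a table can be counted directly. A minimal base R sketch on a toy table (made-up ids and dates, with the same `id` and `act_date` column names as "daily_act"):

```r
# Toy table with the same key columns as daily_act (values are made up)
toy <- data.frame(
  id = c(1, 1, 2, 2, 2, 3),
  act_date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12",
                       "2016-04-13", "2016-04-14", "2016-04-12"))
)

# Number of distinct users
n_users <- length(unique(toy$id))
# Number of calendar days covered, inclusive of both ends
span_days <- as.integer(diff(range(toy$act_date))) + 1L
# Number of recorded days per user
days_per_user <- table(toy$id)

print(n_users)
print(span_days)
print(days_per_user)
```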

Saved

Exported data

write.csv(hourly_data, file = "hourly_data.csv")
write.csv(minute_data, file = "minute_data.csv")
write.csv(daily_act, file = "daily_act.csv")
write.csv(daily_sleep, file = "daily_sleep.csv")
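
By default `write.csv()` also writes the row names as an extra unnamed first column, which comes back as a column called `X` when the file is read again. If that is not wanted, `row.names = FALSE` avoids it. A small round-trip sketch on a toy data frame, written to a temporary file rather than the real export paths:

```r
# Toy data frame (values are made up)
toy <- data.frame(id = c(1, 2), calories = c(81L, 61L))

# Write to a temporary file without the row-name column
path <- tempfile(fileext = ".csv")
write.csv(toy, file = path, row.names = FALSE)

# Read it back and confirm only the original columns are present
back <- read.csv(path)
print(names(back))
```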

Google Data Analytics : Case Study 2

How Can a Wellness Technology Company Play It Smart?

Jayprakash Kumar

2023-06-11