‘Behind every great man there’s a great woman’

                                        Port Arthur News 1946

Introducing

BELLABEAT A health tracker made for women.

Background

This project is part of a case study for the Google Data Analytics Professional Certificate program. In this case study I’ll be analyzing a public data set provided in the course, acting as a data analyst on the marketing analyst team of a health and wellness company named Bellabeat.

I’ll be using the R programming language and following the six phases of data analysis:

  • Ask
  • Prepare
  • Process
  • Analyze
  • Share
  • Act

About the company

Bellabeat is a high-tech company that manufactures health-focused smart products. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

The company has a strong online and offline presence thanks to its ad campaigns across various platforms. Urška Sršen, co-founder and Chief Creative Officer of Bellabeat, believes that the company can grow and expand its business by analyzing trends in current smart device usage. She would therefore like high-level recommendations on how these trends can improve Bellabeat’s marketing strategy.

Ask

  1. What are some trends in smart device usage?
  2. How could these trends apply to Bellabeat customers?
  3. How could these trends help influence Bellabeat’s marketing strategy?

Business Task

Identify potential opportunities and recommendations for improving Bellabeat’s marketing strategy by analyzing trends in existing smart device usage data and applying them to one of Bellabeat’s many smart products.

Prepare

The data set is a public data set available through Kaggle and contains personal fitness tracker data from thirty Fitbit users. It includes 18 CSV files that capture daily activity, calories (daily, hourly and by minute), intensities (daily, hourly and by minute), number of steps (daily, hourly and by minute), heart rate, minute METs, sleep (daily and by minute) and weight log info. For the scope of this analysis, only the files deemed relevant to the business task were imported into RStudio.

Loading Packages

library(ggplot2)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ tibble  3.1.7     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ✔ purrr   0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(ggpubr)
library(rJava)
library(xlsxjars)
library(xlsx)

Importing the data set: FITBITCSV.csv

library(readr)
FITBITCSV <- read_csv("FITBITCSV.csv")
## Rows: 940 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (17): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Process. Exploring Data Using tidyverse R.

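View(FITBITCSV), attach(FITBITCSV), class(FITBITCSV) and length(FITBITCSV): open the data in the viewer, add its columns to the search path, report its class, and return its number of columns. R is case-sensitive, so the object name must match the one created by read_csv().

names(FITBITCSV): lists the column names. Copy and paste them into your code to avoid misspellings.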

names(FITBITCSV)
##  [1] "Id"                       "ActivityDate"            
##  [3] "TotalSteps"               "TotalDistance"           
##  [5] "TrackerDistance"          "LoggedActivitiesDistance"
##  [7] "VeryActiveDistance"       "ModeratelyActiveDistance"
##  [9] "LightActiveDistance"      "SedentaryActiveDistance" 
## [11] "VeryActiveMinutes"        "FairlyActiveMinutes"     
## [13] "LightlyActiveMinutes"     "SedentaryMinutes"        
## [15] "Calories"                 "TotalSleepRecords"       
## [17] "TotalMinutesAsleep"       "TotalTimeInBed"

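unique(FITBITCSV): returns the distinct rows of the data, in which any NA values remain visible.

missing <- !complete.cases(FITBITCSV): flags every row that contains at least one missing value. The sleep columns of this data set have missing values.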

head(FITBITCSV, 10): previews the first 10 rows of data.

tail(FITBITCSV, 10): previews the last 10 rows of data.

summary(FITBITCSV): gives summary statistics for every column, including the number of NA values.

summary(FITBITCSV)
##        Id            ActivityDate         TotalSteps    TotalDistance   
##  Min.   :1.504e+09   Length:940         Min.   :    0   Min.   : 0.000  
##  1st Qu.:2.320e+09   Class :character   1st Qu.: 3790   1st Qu.: 2.620  
##  Median :4.445e+09   Mode  :character   Median : 7406   Median : 5.245  
##  Mean   :4.855e+09                      Mean   : 7638   Mean   : 5.490  
##  3rd Qu.:6.962e+09                      3rd Qu.:10727   3rd Qu.: 7.713  
##  Max.   :8.878e+09                      Max.   :36019   Max.   :28.030  
##                                                                         
##  TrackerDistance  LoggedActivitiesDistance VeryActiveDistance
##  Min.   : 0.000   Min.   :0.0000           Min.   : 0.000    
##  1st Qu.: 2.620   1st Qu.:0.0000           1st Qu.: 0.000    
##  Median : 5.245   Median :0.0000           Median : 0.210    
##  Mean   : 5.475   Mean   :0.1082           Mean   : 1.503    
##  3rd Qu.: 7.710   3rd Qu.:0.0000           3rd Qu.: 2.053    
##  Max.   :28.030   Max.   :4.9421           Max.   :21.920    
##                                                              
##  ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
##  Min.   :0.0000           Min.   : 0.000      Min.   :0.000000       
##  1st Qu.:0.0000           1st Qu.: 1.945      1st Qu.:0.000000       
##  Median :0.2400           Median : 3.365      Median :0.000000       
##  Mean   :0.5675           Mean   : 3.341      Mean   :0.001606       
##  3rd Qu.:0.8000           3rd Qu.: 4.782      3rd Qu.:0.000000       
##  Max.   :6.4800           Max.   :10.710      Max.   :0.110000       
##                                                                      
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127.0        1st Qu.: 729.8  
##  Median :  4.00    Median :  6.00      Median :199.0        Median :1057.5  
##  Mean   : 21.16    Mean   : 13.56      Mean   :192.8        Mean   : 991.2  
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264.0        3rd Qu.:1229.5  
##  Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1440.0  
##                                                                             
##     Calories    TotalSleepRecords TotalMinutesAsleep TotalTimeInBed 
##  Min.   :   0   Min.   :1.000     Min.   : 58.0      Min.   : 61.0  
##  1st Qu.:1828   1st Qu.:1.000     1st Qu.:361.0      1st Qu.:403.0  
##  Median :2134   Median :1.000     Median :433.0      Median :463.0  
##  Mean   :2304   Mean   :1.119     Mean   :419.5      Mean   :458.6  
##  3rd Qu.:2793   3rd Qu.:1.000     3rd Qu.:490.0      3rd Qu.:526.0  
##  Max.   :4900   Max.   :3.000     Max.   :796.0      Max.   :961.0  
##                 NA's   :527       NA's   :527        NA's   :527
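The NA counts reported by summary() can also be obtained directly. The sketch below uses a small hypothetical data frame named toy as a stand-in for FITBITCSV (its values are made up for illustration); only the column names are taken from the real data.

```r
# Hypothetical stand-in for FITBITCSV: four rows, two of them without sleep data.
toy <- data.frame(
  Id                 = c(1, 1, 2, 2),
  TotalSteps         = c(9800, 4200, 12050, 0),
  TotalMinutesAsleep = c(410, NA, 385, NA),
  TotalTimeInBed     = c(450, NA, 402, NA)
)

# Number of NA values in each column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Number of rows with at least one NA (the same test as !complete.cases()).
rows_with_na <- sum(!complete.cases(toy))
print(rows_with_na)
```

Replacing toy with FITBITCSV should give a per-column count comparable to the NA's row of the summary.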

Interpreting summary statistics

An inconsistency was found in the time stamp data: ActivityDate is stored as a character data type instead of a date.
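One way to address this is to convert the column with as.Date(). The sketch below assumes the dates are written as month/day/year (for example "4/12/2016"); that format is an assumption, not something shown in the output above, and the sample strings are made up.

```r
# Assumption: dates are stored as month/day/year text, e.g. "4/12/2016".
activity_date_chr <- c("4/12/2016", "4/13/2016", "5/1/2016")

# Convert the character vector to the Date class.
activity_date <- as.Date(activity_date_chr, format = "%m/%d/%Y")

# Check the result: the class should now be "Date" and no value should be NA.
print(class(activity_date))
print(anyNA(activity_date))
```

With the real data this would be applied as FITBITCSV$ActivityDate <- as.Date(FITBITCSV$ActivityDate, format = "%m/%d/%Y"). A format string that does not match the text produces NA values, so the anyNA() check is worth keeping.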

Key summary observations.

  • Mean sedentary time is 991 minutes, roughly 16.5 hours a day, which is higher than the 7-8 hours a day considered normal.
  • Average total steps per day (7638) are below the recommended 10000 daily steps.
  • The majority of participants are only lightly active.
  • Participants sleep for about 7 hours on average.
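The unit conversions behind these observations can be checked with a few lines of base R. The three input values are the means reported in the summary() output; the variable names are chosen here for illustration.

```r
# Means copied from the summary() output (minutes and steps per day).
mean_sedentary_min <- 991.2
mean_asleep_min    <- 419.5
mean_steps         <- 7638

# Convert minutes to hours.
sedentary_hours <- mean_sedentary_min / 60
asleep_hours    <- mean_asleep_min / 60

# Shortfall against the 10000-step daily recommendation.
step_gap <- 10000 - mean_steps

print(round(c(sedentary = sedentary_hours, asleep = asleep_hours), 1))
print(step_gap)
```

This gives about 16.5 sedentary hours, about 7 hours asleep, and a gap of 2362 steps per day.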

Analyzing and Sharing-Visualization

Trends and relationships.

Sedentary Minutes vs Steps taken

Negative relationship between total steps taken and sedentary minutes.

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
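The plot code is not shown in this report, so the following is only a sketch of how such a figure could be drawn with ggplot2, using the loess smoother named in the message above. The data frame toy is simulated and is not the real FITBITCSV data; only the column names TotalSteps and SedentaryMinutes come from the data set.

```r
library(ggplot2)

# Simulated stand-in for FITBITCSV (not the real data): sedentary minutes
# are generated to fall as steps rise, plus random noise.
set.seed(42)
toy <- data.frame(TotalSteps = round(runif(60, 0, 20000)))
toy$SedentaryMinutes <- 1300 - 0.03 * toy$TotalSteps + rnorm(60, sd = 80)

# Scatter plot with a loess trend line.
p <- ggplot(toy, aes(x = TotalSteps, y = SedentaryMinutes)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "loess", formula = y ~ x) +
  labs(title = "Sedentary Minutes vs Steps taken",
       x = "Total steps", y = "Sedentary minutes")

print(p)
```

Passing FITBITCSV instead of toy would draw the same kind of figure for the real data. Stating method and formula explicitly also silences the geom_smooth() message.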

Number of Steps vs Calories burned.

There is a positive correlation between the number of steps and the amount of calories burned.
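A correlation like this can be quantified with cor() and a simple linear model. The values below are hypothetical and chosen only to illustrate the calls; they are not taken from FITBITCSV.

```r
# Hypothetical values, not taken from FITBITCSV.
toy <- data.frame(
  TotalSteps = c(0, 2500, 5000, 7500, 10000, 12500, 15000),
  Calories   = c(1500, 1720, 1900, 2150, 2300, 2580, 2750)
)

# Pearson correlation coefficient: values near +1 mean a strong positive relationship.
r <- cor(toy$TotalSteps, toy$Calories)
print(r)

# Linear model: the slope is the change in calories per additional step.
fit <- lm(Calories ~ TotalSteps, data = toy)
slope <- coef(fit)[["TotalSteps"]]
print(slope)
```

On the real data the same two calls would be cor(FITBITCSV$TotalSteps, FITBITCSV$Calories) and lm(Calories ~ TotalSteps, data = FITBITCSV).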

Total Distance, Total Steps, Very Active Minutes vs Calories

Positive overall trend as shown below

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Very Active Distance, Moderately Active Distance and Sedentary Active Distance vs Logged Activities Distance

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The figure above shows that, overall, there is no consistent relationship between active distance and logged activities distance. The Sedentary Active Distance data is more inconsistent than the others.

Guiding questions:

What surprises did you discover in the data?

There is a negative relationship between total steps taken and sedentary minutes.

How will these insights help answer your business questions?

We now know which categories of users use Bellabeat the most.

Key findings

Irrespective of activity level, the majority of participants seem not to be getting enough sleep. The average sedentary time of 991 minutes (about 16.5 hours) per day seems higher than normal.

Recommendations. ACT

During data cleaning, we found that there is far less sleep data than daily activity data. One reason could be that many people find it uncomfortable to wear a watch in bed.
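The size of this gap can be worked out from two numbers already reported in the outputs: the 940 rows read by read_csv() and the 527 NA values in the sleep columns. The variable names below are illustrative.

```r
# Figures taken from the outputs earlier in this report.
total_rows <- 940   # "Rows: 940" in the read_csv() message
sleep_na   <- 527   # NA's reported for TotalSleepRecords in summary()

# Daily records that include sleep data.
rows_with_sleep <- total_rows - sleep_na

# Share of daily records with sleep data, in percent.
share_with_sleep <- round(100 * rows_with_sleep / total_rows, 1)

print(rows_with_sleep)
print(share_with_sleep)
```

Only 413 of the 940 daily records, about 43.9 percent, include sleep data.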

We recommend informing end users that the Leaf, Bellabeat’s classic wellness tracker, can be worn as a bracelet during sleep hours.

Promoting Bellabeat’s unique features, such as the Leaf’s long-lasting battery, phone or text notifications, and consistent reminders, is also recommended.

Conclusion.

Based on the data provided by Bellabeat.com, the company seems to offer many products to choose from. Greater public awareness and more Nutrition and Exercise Science institutions are needed.

Citation:

Bellabeat

Kaggle

Fitbit

Google Data Analytics Professional Certificate (Coursera)