‘Behind every great man there’s a great woman’
Port Arthur News 1946
BELLABEAT A health tracker made for women.
This project is part of a case study for the Google Professional Data Analytics certification program. In this case study I analyze a public data set provided in the course, working as a data analyst on the marketing analyst team of a health and wellness company named Bellabeat.
Bellabeat is a high-tech company that manufactures health-focused smart products. Since it was founded in 2013, Bellabeat has grown rapidly and positioned itself as a tech-driven wellness company for women.
The company has a strong online and offline presence thanks to its ad campaigns across platforms. Urška Sršen, co-founder and Chief Creative Officer of Bellabeat, believes the company can grow and expand its business by analyzing trends in current smart device usage, and would therefore like high-level recommendations on how these trends can improve Bellabeat's marketing strategy.
Identify potential opportunities and recommendations for improving Bellabeat's marketing strategy by analyzing trends in existing smart device usage data and applying the findings to one of Bellabeat's smart products.
Prepare.
The data set is a public data set available through Kaggle and contains personal fitness tracker data from thirty Fitbit users. It includes 18 CSV files that capture daily activity, calories (daily, hourly, and by minute), intensities (daily, hourly, and by minute), number of steps (daily, hourly, and by minute), heart rate, minute METs, sleep (daily and by minute), and weight log information. For this analysis, only the files deemed relevant to the business task were imported into RStudio.
library(ggplot2)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ✔ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggpubr)
library(rJava)
library(xlsxjars)
library(xlsx)
library(readr)
FITBITCSV <- read_csv("FITBITCSV.csv")
## Rows: 940 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): ActivityDate
## dbl (17): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
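View(FITBITCSV)    # open the data in the viewer
attach(FITBITCSV)  # make the columns accessible by name
class(FITBITCSV)   # class of the imported object
length(FITBITCSV)  # number of columns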
names(FITBITCSV)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories" "TotalSleepRecords"
## [17] "TotalMinutesAsleep" "TotalTimeInBed"
unique(FITBITCSV)                       # distinct rows; the NA values are visible in the output
missing <- !complete.cases(FITBITCSV)   # TRUE for every row that has at least one missing value
head(FITBITCSV, 10)                     # preview the first 10 rows of data
tail(FITBITCSV, 10)                     # preview the last 10 rows of data
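The missing-value check can be made more explicit by counting the incomplete rows and the NA values per column. The sketch below is a minimal illustration on a small hypothetical data frame (`toy`, with invented values), not the original data set; only `complete.cases()`, `is.na()`, and `colSums()` from base R are used.

```r
# Hypothetical miniature of the merged table; the values are invented for illustration.
toy <- data.frame(
  Id = c(1, 1, 2, 2),
  TotalSteps = c(5000, 7200, 0, 10400),
  TotalMinutesAsleep = c(410, NA, NA, 385)
)

# Logical vector: TRUE where a row has at least one NA.
missing <- !complete.cases(toy)

# Number of incomplete rows, and number of NA values in each column.
n_incomplete <- sum(missing)
na_per_column <- colSums(is.na(toy))

print(n_incomplete)
print(na_per_column)
```

Run against `FITBITCSV`, the same two calls would show which columns carry the missing values and how many rows they affect.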
summary(FITBITCSV)
## Id ActivityDate TotalSteps TotalDistance
## Min. :1.504e+09 Length:940 Min. : 0 Min. : 0.000
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 3790 1st Qu.: 2.620
## Median :4.445e+09 Mode :character Median : 7406 Median : 5.245
## Mean :4.855e+09 Mean : 7638 Mean : 5.490
## 3rd Qu.:6.962e+09 3rd Qu.:10727 3rd Qu.: 7.713
## Max. :8.878e+09 Max. :36019 Max. :28.030
##
## TrackerDistance LoggedActivitiesDistance VeryActiveDistance
## Min. : 0.000 Min. :0.0000 Min. : 0.000
## 1st Qu.: 2.620 1st Qu.:0.0000 1st Qu.: 0.000
## Median : 5.245 Median :0.0000 Median : 0.210
## Mean : 5.475 Mean :0.1082 Mean : 1.503
## 3rd Qu.: 7.710 3rd Qu.:0.0000 3rd Qu.: 2.053
## Max. :28.030 Max. :4.9421 Max. :21.920
##
## ModeratelyActiveDistance LightActiveDistance SedentaryActiveDistance
## Min. :0.0000 Min. : 0.000 Min. :0.000000
## 1st Qu.:0.0000 1st Qu.: 1.945 1st Qu.:0.000000
## Median :0.2400 Median : 3.365 Median :0.000000
## Mean :0.5675 Mean : 3.341 Mean :0.001606
## 3rd Qu.:0.8000 3rd Qu.: 4.782 3rd Qu.:0.000000
## Max. :6.4800 Max. :10.710 Max. :0.110000
##
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
##
## Calories TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. : 0 Min. :1.000 Min. : 58.0 Min. : 61.0
## 1st Qu.:1828 1st Qu.:1.000 1st Qu.:361.0 1st Qu.:403.0
## Median :2134 Median :1.000 Median :433.0 Median :463.0
## Mean :2304 Mean :1.119 Mean :419.5 Mean :458.6
## 3rd Qu.:2793 3rd Qu.:1.000 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :4900 Max. :3.000 Max. :796.0 Max. :961.0
## NA's :527 NA's :527 NA's :527
An inconsistency was found: the ActivityDate time stamps are stored as a character data type instead of a date.
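One way to address this is to parse the character column into a `Date` column. The sketch below uses a small hypothetical data frame (`activity`); the sample values and the month/day/year layout (`"%m/%d/%Y"`) are assumptions and should be checked against the actual file before applying the same call to `FITBITCSV`.

```r
# Hypothetical sample values; the "%m/%d/%Y" layout is an assumption about the file.
activity <- data.frame(
  ActivityDate = c("4/12/2016", "4/13/2016", "5/1/2016"),
  stringsAsFactors = FALSE
)

# Parse the character column into a Date column.
activity$ActivityDate <- as.Date(activity$ActivityDate, format = "%m/%d/%Y")

# Confirm the new column type; values that fail to parse would become NA.
print(class(activity$ActivityDate))
print(sum(is.na(activity$ActivityDate)))
```

Counting the NA values after the conversion is a quick check that the assumed format matches every row.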
The mean of 991 sedentary minutes per day, roughly 16.5 hours, is higher than the normal daily 7-8 hours. The average of 7,638 total steps per day is below the recommended 10,000 daily steps. Most participants are only lightly active. Participants sleep about 7 hours on average.
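The unit conversions behind these observations can be reproduced directly from the means reported in the summary() output. This is a small arithmetic sketch; the variable names are illustrative and the 10,000-step target is the guideline cited in the text.

```r
# Means copied from the summary() output above.
mean_sedentary_min <- 991.2
mean_asleep_min <- 419.5
mean_steps <- 7638
step_target <- 10000  # the daily step guideline cited in the text

# Convert minutes to hours and compute the gap to the step target.
sedentary_hours <- mean_sedentary_min / 60
asleep_hours <- mean_asleep_min / 60
step_shortfall <- step_target - mean_steps

print(round(sedentary_hours, 1))  # 16.5
print(round(asleep_hours, 1))     # 7
print(step_shortfall)             # 2362
```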
Trends and relationships.
Negative relationship between total steps taken and sedentary minutes.
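A relationship like this can be checked with a correlation coefficient and a scatter plot with a smoothed trend line. The sketch below is only an illustration: the `toy` data frame contains invented values so the code runs on its own, and the plot settings are assumptions rather than the original chart code. With the real data, `toy` would be replaced by `FITBITCSV`.

```r
library(ggplot2)

# Hypothetical daily records; the numbers are invented purely to make the sketch runnable.
toy <- data.frame(
  TotalSteps = c(800, 2100, 3400, 4700, 6000, 7300, 8600, 9900, 11200, 12500),
  SedentaryMinutes = c(1350, 1290, 1210, 1160, 1080, 1010, 960, 900, 830, 780)
)

# Pearson correlation: a value below zero indicates a negative linear relationship.
steps_sedentary_cor <- cor(toy$TotalSteps, toy$SedentaryMinutes)

# Scatter plot with a loess trend line; print(p) draws it.
p <- ggplot(toy, aes(x = TotalSteps, y = SedentaryMinutes)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x) +
  labs(title = "Total steps vs. sedentary minutes",
       x = "Total steps per day", y = "Sedentary minutes per day")
```

Passing `method` and `formula` explicitly to `geom_smooth()` also silences the informational message that ggplot2 prints when it picks these defaults itself.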
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'