Nikhil Sharma
Center for Computer and Communication Technology, Chisopani, Sikkim
Google Data Analytics Professional Certificate - Coursera
Data Analytics, Bellabeat, wellness, Case Study
Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data to gain insight into how people are already using their smart devices. Using this information, she would like high-level recommendations for how these trends can inform Bellabeat’s marketing strategy. The analytic techniques will provide both descriptive and predictive analysis. In addition, data from the company’s ERP system is integrated in the analysis. The proposed techniques will help the company understand how consumers use non-Bellabeat smart devices and how these trends can improve Bellabeat’s marketing strategy.
This exploratory case study fulfils the capstone project requirement of the Google Data Analytics Professional Certificate. It involves analyzing smart device data to gain insight into how consumers use their smart devices. The insights I discover will then help guide the company’s marketing strategy. The dataset has been made available in the public domain by Möbius.
The analysis follows the six phases of the data analysis process defined by Google: Ask, Prepare, Process, Analyze, Share, and Act.
The focus of this section is to understand the basic concepts of our wellness company ‘Bellabeat’. Project objectives are derived from the Director of Marketing and later converted into data science problem definitions.
Please note that this is a fictional case study.
Urška Sršen and Sando Mur founded Bellabeat, a high-tech company that manufactures health-focused smart products. Sršen used her background as an artist to develop beautifully designed technology that informs and inspires women around the world. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.
By 2016, Bellabeat had opened offices around the world and launched multiple products. Bellabeat products became available through a growing number of online retailers in addition to the company’s own e-commerce channel on its website. The company has invested in traditional advertising media, such as radio, out-of-home billboards, print, and television, but focuses extensively on digital marketing. Bellabeat invests year-round in Google Search, maintains active Facebook and Instagram pages, and consistently engages consumers on Twitter. Additionally, Bellabeat runs video ads on YouTube and display ads on the Google Display Network to support campaigns around key marketing dates.
Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat’s marketing strategy.
The marketing team needs to design strategies based on how consumers actually use smart devices. To do that, however, the marketing analytics team first needs to understand usage trends among users of non-Bellabeat smart devices.
The Director of Marketing has assigned me the following question to answer: how do consumers use non-Bellabeat smart devices, and how can these trends inform Bellabeat’s marketing strategy?
Hence, the objective of this analysis is to gain insight into how consumers use non-Bellabeat smart devices and to apply it to one of the Bellabeat products. This would be helpful for Bellabeat’s marketing strategy.
This section starts with initial data collection and proceeds with activities that target understanding the data. These activities include gaining first insight into the data, identifying data for analytic purposes, discovering data quality issues, and detecting interesting subsets to form hypotheses about previously undetected patterns.
I would like to thank Möbius for providing this dataset, which makes it possible to study smart wellness device usage and its trends.
Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.
Reliability: This Kaggle dataset contains personal fitness tracker data from 30 eligible Fitbit users. The sample is too small to reflect the overall population, so there is a risk of bias. Adding data from another source could help address this limitation. Furthermore, the dataset description states that ‘Thirty eligible Fitbit users consented to the submission of personal tracker data’; further investigation is needed to find out the criteria that made users ‘eligible’.
Original: The datasets are third-party information placed in the public domain by Möbius, not obtained directly from the service provider, Amazon Mechanical Turk. Hence, the originality of the datasets is low.
Comprehensive: Information on age, gender, the device type used for tracking, etc. is missing. Hence, these datasets are not comprehensive.
Current: These datasets were generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016, so the data are several years old.
Cited: These datasets are crowd-sourced data generated by respondents to a survey distributed via Amazon Mechanical Turk. Hence, the data source is considered properly cited.
A total of 18 datasets have been made available, one for each type of health-focused data collected by the smart devices. Each dataset captures the details of the health-focused activity logged by the Fitbit users. The publicly available data has been scrubbed to omit the users’ personal information.
The combined size of the 18 datasets is large. Data cleaning in spreadsheets would be slow and time-consuming compared to SQL or R. I am choosing R simply because I can do both data wrangling and analysis/visualization on the same platform. It is also an opportunity for me to learn R better.
This section provides insight into the business problems before performing data modeling. The data preparation phase includes activities such as data selection, data transformation, data cleaning and data validation. Data preparation tasks may be performed several times and in no fixed order. During this phase, important issues are addressed, such as selecting the relevant data, cleaning the data, discarding unacceptable data, and deciding how the ERP system data can be integrated into the final datasets.
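The cleaning and validation steps named above can be illustrated with a minimal base R sketch. The three rows are a made-up stand-in, not the real file: the two date formats mirror those that appear in the printed head of the daily activity table, and the duplicated row is added only for illustration.

```r
# Made-up stand-in rows; the duplicate row is hypothetical
raw <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  ActivityDate = c("04-12-2016", "4/13/2016", "4/13/2016"),
  TotalSteps = c(13162, 10735, 10735))

# Cleaning: drop exact duplicate rows
clean <- unique(raw)

# Transformation: bring both date spellings to one format, then parse
clean$ActivityDate <- as.Date(gsub("-", "/", clean$ActivityDate),
                              format = "%m/%d/%Y")

# Validation: stop if any date failed to parse
stopifnot(!anyNA(clean$ActivityDate))
print(clean)
```

With the tidyverse packages loaded in this report, `lubridate::mdy()` and `dplyr::distinct()` would be the natural equivalents; base R is used here only so the sketch runs on its own.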
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
library(geosphere)
library(readr)
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(data.table)
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
##
## The following objects are masked from 'package:dplyr':
##
## between, first, last
##
## The following object is masked from 'package:purrr':
##
## transpose
library(tidyr)
library(scales)
##
## Attaching package: 'scales'
##
## The following object is masked from 'package:purrr':
##
## discard
##
## The following object is masked from 'package:readr':
##
## col_factor
The 18 health-focused datasets (in CSV format) are extracted and stored in one folder titled “fitbase_data”.
daily_activity <- read.csv("../fitbase_data/dailyActivity_merged.csv")
daily_calories <- read.csv("../fitbase_data/dailyCalories_merged.csv")
daily_intensities <- read.csv("../fitbase_data/dailyIntensities_merged.csv")
daily_steps <- read.csv("../fitbase_data/dailySteps_merged.csv")
heartrate_seconds <- read.csv("../fitbase_data/heartrate_seconds_merged.csv")
hourly_calories <- read.csv("../fitbase_data/hourlyCalories_merged.csv")
hourly_intensities <- read.csv("../fitbase_data/hourlyIntensities_merged.csv")
hourly_steps <- read.csv("../fitbase_data/hourlySteps_merged.csv")
minute_calories_narrow <- read.csv("../fitbase_data/minuteCaloriesNarrow_merged.csv")
minute_calories_wide <- read.csv("../fitbase_data/minuteCaloriesWide_merged.csv")
minute_intensities_narrow <- read.csv("../fitbase_data/minuteIntensitiesNarrow_merged.csv")
minute_intensities_wide <- read.csv("../fitbase_data/minuteIntensitiesWide_merged.csv")
minute_METs_narrow <- read.csv("../fitbase_data/minuteMETsNarrow_merged.csv")
minute_sleep <- read.csv("../fitbase_data/minuteSleep_merged.csv")
minute_steps_narrow <- read.csv("../fitbase_data/minuteStepsNarrow_merged.csv")
minute_steps_wide <- read.csv("../fitbase_data/minuteStepsWide_merged.csv")
sleep_day <- read.csv("../fitbase_data/sleepDay_merged.csv")
weight_log_info <- read.csv("../fitbase_data/weightLogInfo_merged.csv")
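As an alternative to eighteen separate calls, all files could be read into one named list. This is only a sketch: `read_fitbit_folder` is a hypothetical helper, and the demo runs on two tiny stand-in files written to a temporary folder rather than on the real data.

```r
# Hypothetical helper: read every "*_merged.csv" file in a folder into a named list
read_fitbit_folder <- function(dir) {
  paths <- list.files(dir, pattern = "_merged\\.csv$", full.names = TRUE)
  tables <- lapply(paths, read.csv)
  # Name each table after its file, e.g. "dailySteps_merged.csv" -> "dailySteps"
  names(tables) <- sub("_merged\\.csv$", "", basename(paths))
  tables
}

# Demo on a temporary folder with two stand-in files (not the real data)
demo_dir <- file.path(tempdir(), "fitbase_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Id = 1:2, StepTotal = c(100, 200)),
          file.path(demo_dir, "dailySteps_merged.csv"), row.names = FALSE)
write.csv(data.frame(Id = 1:3, Calories = c(1800, 1900, 2000)),
          file.path(demo_dir, "dailyCalories_merged.csv"), row.names = FALSE)

fitbit <- read_fitbit_folder(demo_dir)
print(names(fitbit))
print(sapply(fitbit, nrow))
```

On the real folder the call would be `read_fitbit_folder("../fitbase_data")`, after which a table is reached as, for example, `fitbit$dailySteps`.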
head(daily_activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 04-12-2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
head(daily_calories)
## Id ActivityDay Calories
## 1 1503960366 4/12/2016 1985
## 2 1503960366 4/13/2016 1797
## 3 1503960366 4/14/2016 1776
## 4 1503960366 4/15/2016 1745
## 5 1503960366 4/16/2016 1863
## 6 1503960366 4/17/2016 1728
head(daily_intensities)
## Id ActivityDay SedentaryMinutes LightlyActiveMinutes
## 1 1503960366 4/12/2016 728 328
## 2 1503960366 4/13/2016 776 217
## 3 1503960366 4/14/2016 1218 181
## 4 1503960366 4/15/2016 726 209
## 5 1503960366 4/16/2016 773 221
## 6 1503960366 4/17/2016 539 164
## FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## 1 13 25 0
## 2 19 21 0
## 3 11 30 0
## 4 34 29 0
## 5 10 36 0
## 6 20 38 0
## LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
## 1 6.06 0.55 1.88
## 2 4.71 0.69 1.57
## 3 3.91 0.40 2.44
## 4 2.83 1.26 2.14
## 5 5.04 0.41 2.71
## 6 2.51 0.78 3.19
head(daily_steps)
## Id ActivityDay StepTotal
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
head(heartrate_seconds)
## Id Time Value
## 1 2022484408 4/12/2016 7:21:00 AM 97
## 2 2022484408 4/12/2016 7:21:05 AM 102
## 3 2022484408 4/12/2016 7:21:10 AM 105
## 4 2022484408 4/12/2016 7:21:20 AM 103
## 5 2022484408 4/12/2016 7:21:25 AM 101
## 6 2022484408 4/12/2016 7:22:05 AM 95
head(hourly_calories)
## Id ActivityHour Calories
## 1 1503960366 4/12/2016 12:00:00 AM 81
## 2 1503960366 4/12/2016 1:00:00 AM 61
## 3 1503960366 4/12/2016 2:00:00 AM 59
## 4 1503960366 4/12/2016 3:00:00 AM 47
## 5 1503960366 4/12/2016 4:00:00 AM 48
## 6 1503960366 4/12/2016 5:00:00 AM 48
head(hourly_intensities)
## Id ActivityHour TotalIntensity AverageIntensity
## 1 1503960366 4/12/2016 12:00:00 AM 20 0.333333
## 2 1503960366 4/12/2016 1:00:00 AM 8 0.133333
## 3 1503960366 4/12/2016 2:00:00 AM 7 0.116667
## 4 1503960366 4/12/2016 3:00:00 AM 0 0.000000
## 5 1503960366 4/12/2016 4:00:00 AM 0 0.000000
## 6 1503960366 4/12/2016 5:00:00 AM 0 0.000000
head(hourly_steps)
## Id ActivityHour StepTotal
## 1 1503960366 4/12/2016 12:00:00 AM 373
## 2 1503960366 4/12/2016 1:00:00 AM 160
## 3 1503960366 4/12/2016 2:00:00 AM 151
## 4 1503960366 4/12/2016 3:00:00 AM 0
## 5 1503960366 4/12/2016 4:00:00 AM 0
## 6 1503960366 4/12/2016 5:00:00 AM 0
head(minute_calories_narrow)
## Id ActivityMinute Calories
## 1 1503960366 4/12/2016 12:00:00 AM 0.7865
## 2 1503960366 4/12/2016 12:01:00 AM 0.7865
## 3 1503960366 4/12/2016 12:02:00 AM 0.7865
## 4 1503960366 4/12/2016 12:03:00 AM 0.7865
## 5 1503960366 4/12/2016 12:04:00 AM 0.7865
## 6 1503960366 4/12/2016 12:05:00 AM 0.9438
head(minute_calories_wide)
## Id ActivityHour Calories00 Calories01 Calories02 Calories03
## 1 1503960366 4/13/2016 12:00:00 AM 1.8876 2.2022 0.9438 0.9438
## 2 1503960366 4/13/2016 1:00:00 AM 0.7865 0.7865 0.7865 0.7865
## 3 1503960366 4/13/2016 2:00:00 AM 0.7865 0.7865 0.7865 0.7865
## 4 1503960366 4/13/2016 3:00:00 AM 0.7865 0.7865 0.7865 0.7865
## 5 1503960366 4/13/2016 4:00:00 AM 0.7865 0.7865 0.7865 0.7865
## 6 1503960366 4/13/2016 5:00:00 AM 0.7865 0.7865 0.7865 0.7865
## Calories04 Calories05 Calories06 Calories07 Calories08 Calories09 Calories10
## 1 0.9438 2.0449 0.9438 2.2022 0.9438 0.7865 0.9438
## 2 0.9438 0.9438 0.9438 0.7865 0.9438 0.7865 0.9438
## 3 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 4 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 5 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 6 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## Calories11 Calories12 Calories13 Calories14 Calories15 Calories16 Calories17
## 1 0.7865 0.7865 0.7865 0.7865 0.9438 0.9438 0.7865
## 2 0.7865 0.9438 0.7865 0.7865 0.7865 0.7865 0.7865
## 3 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 4 2.0449 0.9438 0.7865 0.7865 0.9438 0.7865 0.9438
## 5 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 6 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## Calories18 Calories19 Calories20 Calories21 Calories22 Calories23 Calories24
## 1 0.7865 0.7865 1.8876 0.9438 0.9438 0.9438 0.9438
## 2 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 3 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 4 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 5 0.7865 0.9438 0.7865 0.7865 0.7865 0.7865 0.7865
## 6 0.7865 0.9438 0.7865 0.7865 0.7865 0.7865 0.7865
## Calories25 Calories26 Calories27 Calories28 Calories29 Calories30 Calories31
## 1 2.0449 2.3595 0.9438 2.0449 0.9438 0.9438 0.9438
## 2 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 3 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 4 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 5 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 6 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## Calories32 Calories33 Calories34 Calories35 Calories36 Calories37 Calories38
## 1 2.0449 1.8876 0.9438 0.7865 0.7865 0.7865 0.7865
## 2 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 3 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 4 0.7865 0.9438 2.0449 2.0449 1.8876 0.7865 0.7865
## 5 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 6 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## Calories39 Calories40 Calories41 Calories42 Calories43 Calories44 Calories45
## 1 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 2 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 3 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 4 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 5 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 6 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## Calories46 Calories47 Calories48 Calories49 Calories50 Calories51 Calories52
## 1 0.7865 0.7865 0.7865 0.7865 0.9438 2.0449 2.0449
## 2 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 3 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 4 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 5 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 6 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## Calories53 Calories54 Calories55 Calories56 Calories57 Calories58 Calories59
## 1 0.9438 2.3595 1.8876 0.9438 0.9438 0.9438 0.9438
## 2 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 3 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 4 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 5 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
## 6 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865 0.7865
head(minute_intensities_narrow)
## Id ActivityMinute Intensity
## 1 1503960366 4/12/2016 12:00:00 AM 0
## 2 1503960366 4/12/2016 12:01:00 AM 0
## 3 1503960366 4/12/2016 12:02:00 AM 0
## 4 1503960366 4/12/2016 12:03:00 AM 0
## 5 1503960366 4/12/2016 12:04:00 AM 0
## 6 1503960366 4/12/2016 12:05:00 AM 0
head(minute_intensities_wide)
## Id ActivityHour Intensity00 Intensity01 Intensity02
## 1 1503960366 4/13/2016 12:00:00 AM 1 1 0
## 2 1503960366 4/13/2016 1:00:00 AM 0 0 0
## 3 1503960366 4/13/2016 2:00:00 AM 0 0 0
## 4 1503960366 4/13/2016 3:00:00 AM 0 0 0
## 5 1503960366 4/13/2016 4:00:00 AM 0 0 0
## 6 1503960366 4/13/2016 5:00:00 AM 0 0 0
## Intensity03 Intensity04 Intensity05 Intensity06 Intensity07 Intensity08
## 1 0 0 1 0 1 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Intensity09 Intensity10 Intensity11 Intensity12 Intensity13 Intensity14
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 1 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Intensity15 Intensity16 Intensity17 Intensity18 Intensity19 Intensity20
## 1 0 0 0 0 0 1
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Intensity21 Intensity22 Intensity23 Intensity24 Intensity25 Intensity26
## 1 0 0 0 0 1 1
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Intensity27 Intensity28 Intensity29 Intensity30 Intensity31 Intensity32
## 1 0 1 0 0 0 1
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Intensity33 Intensity34 Intensity35 Intensity36 Intensity37 Intensity38
## 1 1 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 1 1 1 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Intensity39 Intensity40 Intensity41 Intensity42 Intensity43 Intensity44
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Intensity45 Intensity46 Intensity47 Intensity48 Intensity49 Intensity50
## 1 0 0 0 0 0 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Intensity51 Intensity52 Intensity53 Intensity54 Intensity55 Intensity56
## 1 1 1 0 1 1 0
## 2 0 0 0 0 0 0
## 3 0 0 0 0 0 0
## 4 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 0 0
## Intensity57 Intensity58 Intensity59
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
head(minute_METs_narrow)
## Id ActivityMinute METs
## 1 1503960366 4/12/2016 12:00:00 AM 10
## 2 1503960366 4/12/2016 12:01:00 AM 10
## 3 1503960366 4/12/2016 12:02:00 AM 10
## 4 1503960366 4/12/2016 12:03:00 AM 10
## 5 1503960366 4/12/2016 12:04:00 AM 10
## 6 1503960366 4/12/2016 12:05:00 AM 12
head(minute_sleep)
## Id date value logId
## 1 1503960366 4/12/2016 2:47:30 AM 3 11380564589
## 2 1503960366 4/12/2016 2:48:30 AM 2 11380564589
## 3 1503960366 4/12/2016 2:49:30 AM 1 11380564589
## 4 1503960366 4/12/2016 2:50:30 AM 1 11380564589
## 5 1503960366 4/12/2016 2:51:30 AM 1 11380564589
## 6 1503960366 4/12/2016 2:52:30 AM 1 11380564589
head(minute_steps_narrow)
## Id ActivityMinute Steps
## 1 1503960366 4/12/2016 12:00:00 AM 0
## 2 1503960366 4/12/2016 12:01:00 AM 0
## 3 1503960366 4/12/2016 12:02:00 AM 0
## 4 1503960366 4/12/2016 12:03:00 AM 0
## 5 1503960366 4/12/2016 12:04:00 AM 0
## 6 1503960366 4/12/2016 12:05:00 AM 0
head(minute_steps_wide)
## Id ActivityHour Steps00 Steps01 Steps02 Steps03 Steps04
## 1 1503960366 4/13/2016 12:00:00 AM 4 16 0 0 0
## 2 1503960366 4/13/2016 1:00:00 AM 0 0 0 0 0
## 3 1503960366 4/13/2016 2:00:00 AM 0 0 0 0 0
## 4 1503960366 4/13/2016 3:00:00 AM 0 0 0 0 0
## 5 1503960366 4/13/2016 4:00:00 AM 0 0 0 0 0
## 6 1503960366 4/13/2016 5:00:00 AM 0 0 0 0 0
## Steps05 Steps06 Steps07 Steps08 Steps09 Steps10 Steps11 Steps12 Steps13
## 1 9 0 17 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 10 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0
## Steps14 Steps15 Steps16 Steps17 Steps18 Steps19 Steps20 Steps21 Steps22
## 1 0 0 0 0 0 0 6 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0
## Steps23 Steps24 Steps25 Steps26 Steps27 Steps28 Steps29 Steps30 Steps31
## 1 0 0 11 21 0 8 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0
## Steps32 Steps33 Steps34 Steps35 Steps36 Steps37 Steps38 Steps39 Steps40
## 1 8 6 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 11 9 6 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0
## Steps41 Steps42 Steps43 Steps44 Steps45 Steps46 Steps47 Steps48 Steps49
## 1 0 0 0 0 0 0 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0
## Steps50 Steps51 Steps52 Steps53 Steps54 Steps55 Steps56 Steps57 Steps58
## 1 0 9 8 0 20 1 0 0 0
## 2 0 0 0 0 0 0 0 0 0
## 3 0 0 0 0 0 0 0 0 0
## 4 0 0 0 0 0 0 0 0 0
## 5 0 0 0 0 0 0 0 0 0
## 6 0 0 0 0 0 0 0 0 0
## Steps59
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
head(sleep_day)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
head(weight_log_info)
## Id Date WeightKg WeightPounds Fat BMI
## 1 1503960366 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## 2 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 3 1927972279 4/13/2016 1:08:52 AM 133.5 294.3171 NA 47.54
## 4 2873212765 4/21/2016 11:59:59 PM 56.7 125.0021 NA 21.45
## 5 2873212765 5/12/2016 11:59:59 PM 57.3 126.3249 NA 21.69
## 6 4319703577 4/17/2016 11:59:59 PM 72.4 159.6147 25 27.45
## IsManualReport LogId
## 1 True 1.462234e+12
## 2 True 1.462320e+12
## 3 False 1.460510e+12
## 4 True 1.461283e+12
## 5 True 1.463098e+12
## 6 True 1.460938e+12
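Beyond looking at the heads, a quick quality check per table can show how many distinct users it covers and whether it contains duplicate rows or missing values. The sketch below uses small stand-in tables: the Ids are taken from the printed heads, but the rows are illustrative and one duplicate is planted on purpose.

```r
# Stand-in tables; rows are illustrative, one duplicate is planted in sleep_demo
daily_steps_demo <- data.frame(
  Id = c(1503960366, 1503960366, 1927972279, 2873212765),
  ActivityDay = c("4/12/2016", "4/13/2016", "4/12/2016", "4/12/2016"))
sleep_demo <- data.frame(
  Id = c(1503960366, 1503960366, 1503960366),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM"))
weight_demo <- data.frame(Id = 1927972279, Date = "4/13/2016 1:08:52 AM")

tables <- list(steps = daily_steps_demo, sleep = sleep_demo, weight = weight_demo)

# Distinct participants, exact duplicate rows and missing values per table
quality <- data.frame(
  users      = sapply(tables, function(d) length(unique(d$Id))),
  duplicates = sapply(tables, function(d) sum(duplicated(d))),
  missing    = sapply(tables, function(d) sum(is.na(d))))
print(quality)
```

Run against the real tables, the `users` column would show directly how many participants logged steps, sleep and weight.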
The following 3 datasets will be used for trend analysis:
First, we will look at the kind of users who use smart devices. According to the U.S. Department of Health and Human Services, the average woman expends roughly 1,600 to 2,400 calories per day. We thus consider any day with over 2,400 calories expended to be an active day.
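# Keep only days with more than 500 calories recorded
dailyCalories <- subset(daily_calories, Calories > 500)
# Per user: days above 2,400 calories and total days recorded
days_active <- dailyCalories %>%
  group_by(Id) %>%
  summarise(is_active = sum(Calories > 2400), days_recorded = sum(Calories > 0))
# Keep users with more than 15 recorded days
days_active <- subset(days_active, days_recorded > 15)
print(days_active)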
## # A tibble: 32 × 3
## Id is_active days_recorded
## <dbl> <int> <int>
## 1 1503960366 0 30
## 2 1624580081 1 31
## 3 1644430081 26 30
## 4 1844505072 0 31
## 5 1927972279 4 31
## 6 2022484408 22 31
## 7 2026352035 0 31
## 8 2320127002 0 31
## 9 2347167796 2 17
## 10 2873212765 0 31
## # … with 22 more rows
## # ℹ Use `print(n = ...)` to see more rows
Of the subject group, 12 spent more than half of their recorded days doing some form of exercise, while 15 did not exercise on more than 5 days. This shows that while some of these smart device users track their frequent physical activities, a significant proportion of users use the devices to track their normal daily lifestyle.
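Counts like these can be derived from the per-user summary by computing the share of active days. A minimal base R sketch on three hypothetical users (not the real participants), using the same 2,400-calorie threshold:

```r
# Illustrative data: three hypothetical users, four days each
demo <- data.frame(
  Id = rep(c("A", "B", "C"), each = 4),
  Calories = c(2500, 2600, 2300, 2450,   # A: 3 of 4 days above 2,400
               1800, 1900, 2000, 2100,   # B: 0 of 4
               2401, 2000, 2000, 2000))  # C: 1 of 4

# Per user: active days, recorded days and the share of active days
is_active     <- tapply(demo$Calories > 2400, demo$Id, sum)
days_recorded <- tapply(demo$Calories, demo$Id, length)
active <- data.frame(Id = names(is_active),
                     is_active = as.vector(is_active),
                     days_recorded = as.vector(days_recorded))
active$share_active <- active$is_active / active$days_recorded
print(active)

# Users active on more than half of their recorded days
print(sum(active$share_active > 0.5))
# Users active on no more than one day (the cut-off is arbitrary here)
print(sum(active$is_active <= 1))
```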
Exploratory data analysis (EDA) is primarily a graphic approach that provides a first insight into the data. There is no formal set of rules for EDA; however, common approaches are summary statistics, correlation, visualization and aggregation. Summary statistics, or univariate analysis, is the first step that helps us understand the data. Univariate analysis is the simplest form of data analysis, where the data being analyzed contains only one variable. Further, data correlation, or multivariate analysis, helps us find relationships between two or more variables.
Finding connections between variables also has a crucial impact on choosing and building the predictive model(s). Data visualization helps us gain perspective on the data, for example to find anomalies and detect outliers. Finally, data aggregation helps us group data from finer to coarser granularity in order to improve understanding.
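Three of these approaches can be shown on the six rows printed by head(daily_activity). This is a sketch on those six rows only, so the numbers say nothing about the full dataset, and the 10,000-step grouping is an assumption made for illustration.

```r
# Rows copied from the printed head(daily_activity)
activity <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  SedentaryMinutes = c(728, 776, 1218, 726, 773, 539),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728))

# 1. Summary statistics (univariate)
print(summary(activity$TotalSteps))

# 2. Correlation between every pair of variables (multivariate)
print(cor(activity))

# 3. Aggregation: mean calories for days below / at or above 10,000 steps
#    (the 10,000-step cut-off is an assumption for this example)
activity$reached_10k <- activity$TotalSteps >= 10000
print(aggregate(Calories ~ reached_10k, data = activity, FUN = mean))
```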
Looking only at the heads and a quick summary, one cannot see the full trends in the data frames, so I will plot some graphs to see the relationships properly.
I would like to start with the relationship between steps taken in a day and sedentary minutes (the time people were inactive).
ggplot(data = daily_activity, aes(x = TotalSteps, y = SedentaryMinutes, color = Calories)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Total Steps Vs Sedentary minutes",
       subtitle = "This plot is based on the value from Daily Activity Dataset",
       x = "Total Steps", y = "Sedentary Minutes")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The plot suggests a negative relation between total steps and sedentary minutes, which makes sense because one does not move while inactive.
So we can market this to consumers by telling them that smart devices could help them start their journey by measuring how much they are already moving.
They can also learn about their sedentary time.
One can note that sedentary time is not necessarily related to calories burned.
Now I will plot the graph between calories and total steps to see the relationship between them.
ggplot(data = daily_activity, aes(x = TotalSteps, y = Calories)) +
  geom_point() +
  stat_smooth(method = lm) +
  labs(title = "Total Steps Vs Calories",
       subtitle = "This plot is based on the value from Daily Activity Dataset",
       x = "Total Steps", y = "Calories")
## `geom_smooth()` using formula 'y ~ x'
We can clearly see that people who took the most steps tend to burn the most calories, but there is a lot of spread in the values.
Now let’s look at the residuals, i.e. the differences between the observed values and the values estimated by a linear model.
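# Fit a simple linear model of calories on total steps and extract the residuals
calories.lm <- lm(Calories ~ TotalSteps, data = daily_activity)
calories.res <- resid(calories.lm)
# Residuals against the predictor, with a horizontal reference line at zero
plot(daily_activity$TotalSteps, calories.res, ylab = "Residuals",
     xlab = "Total Steps", main = "Calories Burned")
abline(0, 0)
# Distribution of the residuals
plot(density(calories.res))
# Checking for normality
qqnorm(calories.res)
qqline(calories.res)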
So it looks like the spread is not as wide, statistically, as it first appeared.
Given the linear relationship in the graphs, we can market the idea that burning calories does not require high-intensity workouts; one just needs to walk.
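The visual impression from the residual plots can be backed by a few numbers. The sketch below runs on simulated data, not the Fitbit records: the baseline of 1,700 calories, the 0.08 calories per step and the noise level are assumptions chosen only to make the example run.

```r
set.seed(42)  # reproducible simulated data, not the Fitbit records
steps <- runif(200, min = 0, max = 20000)
# Assumed relationship for the simulation only:
# 1,700 kcal baseline + 0.08 kcal per step + random noise
calories <- 1700 + 0.08 * steps + rnorm(200, mean = 0, sd = 300)

fit <- lm(calories ~ steps)
res <- resid(fit)

print(coef(fit))                   # estimated intercept and slope
print(summary(fit)$r.squared)      # share of variance explained by steps
print(sd(res))                     # typical size of a residual, in kcal
print(shapiro.test(res)$p.value)   # a small p-value suggests non-normal residuals
```

Applied to `calories.lm`, the same calls would quantify how much of the variation in calories is explained by steps and how large the remaining spread is.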
ggplot(data = sleep_day, aes(x = TotalMinutesAsleep, y = TotalTimeInBed)) +
  geom_point() +
  stat_smooth(method = lm) +
  labs(title = "Relation between sleep and time in bed",
       subtitle = "This plot is based on the value from Sleep Day Dataset",
       x = "Total Minutes Asleep", y = "Total Minutes in Bed")
## `geom_smooth()` using formula 'y ~ x'
As we can see, there are some outliers: some people spent a lot of time in bed without actually sleeping, and a small group both slept a lot and spent a lot of time in bed.
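One way to make these outliers explicit is to compute the minutes spent awake in bed and a sleep efficiency ratio. A small sketch on the six rows printed by head(sleep_day); the 25-minute cut-off is a hypothetical threshold chosen for illustration.

```r
# Rows copied from the printed head(sleep_day)
sleep <- data.frame(
  TotalMinutesAsleep = c(327, 384, 412, 340, 700, 304),
  TotalTimeInBed     = c(346, 407, 442, 367, 712, 320))

# Time in bed that was not spent asleep, and the share of bed time asleep
sleep$MinutesAwakeInBed <- sleep$TotalTimeInBed - sleep$TotalMinutesAsleep
sleep$SleepEfficiency   <- sleep$TotalMinutesAsleep / sleep$TotalTimeInBed

# Hypothetical cut-off: flag nights with more than 25 minutes awake in bed
sleep$LongAwake <- sleep$MinutesAwakeInBed > 25
print(sleep)
```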
ggplot(data = weight_log_info, aes(x = WeightKg, y = BMI)) +
  geom_line() +
  labs(title = "Weight VS BMI",
       subtitle = "This plot is based on Weight Log Info Dataset",
       x = "Weight (Kg)", y = "BMI")
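Since BMI is defined as weight in kilograms divided by height in metres squared, the two plotted columns are tied together through each user's height. The sketch below uses four rows from the printed head(weight_log_info) to recover the implied height and to check the two weight columns against each other; the conversion factor of 2.20462 pounds per kilogram is the standard approximation.

```r
# Rows copied from the printed head(weight_log_info), one per distinct weight
weight <- data.frame(
  WeightKg     = c(52.6, 133.5, 56.7, 72.4),
  WeightPounds = c(115.9631, 294.3171, 125.0021, 159.6147),
  BMI          = c(22.65, 47.54, 21.45, 27.45))

# BMI = kg / m^2, so the implied height is sqrt(kg / BMI)
weight$implied_height_m <- sqrt(weight$WeightKg / weight$BMI)

# Consistency check between the two weight columns (1 kg is about 2.20462 lb)
weight$pounds_diff <- weight$WeightKg * 2.20462 - weight$WeightPounds
print(round(weight, 3))
```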
Next, we look at how the average user spends their day. Based on the average calories spent per hour for different activities, we assume:
1. Sleep: < 80 calories per hour
2. Normal: >= 80 and < 150 calories per hour
3. Moderate exercise: >= 150 and < 400 calories per hour
4. Intense exercise: >= 400 calories per hour
# Parse the timestamps, then derive the calendar date and the hour of day
hourlyCalories <- hourly_calories
hourlyCalories$ActivityHour <- mdy_hms(hourlyCalories$ActivityHour)
hourlyCalories$date <- date(hourlyCalories$ActivityHour)
hourlyCalories$hour <- hour(hourlyCalories$ActivityHour)
# Label each hour using the calorie thresholds listed above
hourlyCalories$activity <- ifelse(hourlyCalories$Calories < 80, "sleep",
  ifelse(hourlyCalories$Calories < 150, "normal",
    ifelse(hourlyCalories$Calories < 400, "moderate_exercise", "intense_exercise")))
# Number of records per hour of day
usage <- count(hourlyCalories, vars = hour)
# Count the hours in each activity class, per hour of day
activity <- hourlyCalories %>%
  group_by(hour) %>%
  summarise(count_sleep = sum(activity == "sleep"),
            count_normal = sum(activity == "normal"),
            count_moderate_exercise = sum(activity == "moderate_exercise"),
            count_intense_exercise = sum(activity == "intense_exercise"))
# Reshape to long format for plotting (sleep hours are left out of the plot)
activity_sorted_long <- activity %>%
  gather(c("count_normal", "count_moderate_exercise", "count_intense_exercise"),
         key = "Activity", value = "Count")
ggplot(data = activity_sorted_long, aes(x = hour, y = Count, colour = Activity)) +
  geom_line() +
  labs(title = "How the Average User Spends Their Day",
       subtitle = "Based on the average calories spent per hour for different activities",
       x = "Hour", y = "Count")
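The nested ifelse() calls that assign the activity labels could also be written with base R's cut(), which keeps the thresholds in one place. A sketch on a handful of made-up calorie values that sit on and around the boundaries:

```r
# Made-up hourly calorie values around the class boundaries
calories <- c(47, 80, 149.9, 150, 399, 400, 81)

# right = FALSE makes each interval [lower, upper),
# which matches ">= lower and < upper" in the list of assumptions
activity <- cut(calories,
                breaks = c(-Inf, 80, 150, 400, Inf),
                labels = c("sleep", "normal", "moderate_exercise", "intense_exercise"),
                right = FALSE)

# The same labels via nested ifelse(), for comparison
nested <- ifelse(calories < 80, "sleep",
            ifelse(calories < 150, "normal",
              ifelse(calories < 400, "moderate_exercise", "intense_exercise")))

print(table(activity))
print(all(as.character(activity) == nested))
```

cut() returns a factor whose levels follow the order of `labels`, which also gives plots and tables a sensible ordering of the classes.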
Finally, we try to look at whether the users change their habits over the course of their smart device usage. We do this by tracking their daily calories usage and observing whether they change over time.
# Parse the date strings so that days are plotted in chronological order;
# group = 1 fits a single smooth line across all users
ggplot(data = dailyCalories, aes(x = mdy(ActivityDay), y = Calories, colour = factor(Id), group = 1)) +
  theme(axis.text.x = element_blank()) +
  geom_point() +
  geom_smooth() +
  labs(title = "Changing habits of users over the course of smart device usage",
       subtitle = "This plot is based on the Daily Calories dataset",
       x = "Activity Day", y = "Calories")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
We can tell that there is little to no change in their habits over the period of about 30 days recorded for each test subject.
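A change in habits could also be checked per user by fitting a straight line to each user's daily calories and looking at the slope. The sketch below uses three hypothetical users with known trends, so the expected slopes are built into the data; it is not a result from the Fitbit records.

```r
# Hypothetical users with built-in trends over 10 days
demo <- data.frame(
  Id  = rep(c("A", "B", "C"), each = 10),
  day = rep(1:10, times = 3),
  Calories = c(rep(2000, 10),         # A: no change
               1800 + 50 * (1:10),    # B: +50 kcal per day
               2500 - 20 * (1:10)))   # C: -20 kcal per day

# Slope of calories over time (kcal per day), fitted separately for each user
slopes <- sapply(split(demo, demo$Id), function(d) {
  unname(coef(lm(Calories ~ day, data = d))["day"])
})
print(round(slopes, 2))
```

On the real data, `day` would be the parsed activity date converted to a day number, and slopes close to zero for most users would support the reading that habits barely change.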
Bellabeat devices should be marketed as a fashion piece or statement, with the secondary benefit of tracking health indicators, to promote a self-confident, self-sufficient and independent lifestyle rather than a fitness or healthy lifestyle. The focus should be on “Taking good care and pampering yourself” rather than “Staying healthy”.
The Bellabeat app should focus on the social aspects of users’ lifestyles and provide minor goals or recommendations to improve their wellness. The app can highlight what users have done well and let them publish these successes on social media, showcasing their use of the Bellabeat device and an exemplary image of themselves.
One of the most beneficial features of smart wearable devices is motivating customers to lead healthier lifestyles. A peer comparison feature might be developed to encourage customers to increase their activity level and improve their health.
As the data quality is not great based on the ROCCC review, all the above recommendations require further validation.
According to the case study scenario, Bellabeat collects hydration data because one of its products tracks hydration. This gives Bellabeat an advantage, because the Fitbit data does not include hydration data.
A marketing strategy can be implemented to explain how much sleep the body needs, how it can be achieved, and how Bellabeat can help users track and improve it.
We can see that more people log their calories, steps taken, etc, and fewer users log their sleep data, and only a select few are logging their weight.
The daily activities of the users are mostly sedentary in nature, and few change their lifestyle significantly based on the Fitbit data provided. As such, most users do not appear to wear these devices to move towards a healthier lifestyle.