Introduction
Welcome to my Bellabeat data analysis case study. In this project, I
will take on the role of a data analyst, applying real-world analytical
techniques to uncover valuable insights. To address key business
questions, I will follow a structured data analysis process: ask,
prepare, process, analyze, share, and act.
About the Company Bellabeat
Bellabeat is a high-tech manufacturer of health-focused smart products
for women, founded in 2013 by Urška Sršen and Sando Mur. It is a
successful small company with the potential to become a larger player
in the global smart device market. By collecting data on activity,
sleep, stress, and reproductive health, Bellabeat has been able to
empower women with knowledge about their own health and habits. Since
its founding, the company has grown rapidly and quickly positioned
itself as a tech-driven wellness company for women.
Scenario of the Study
In this study, I will focus on one of Bellabeat’s products and analyze
smart device data to gain insight into how consumers are using their
smart devices. These insights will then help me guide marketing
strategies for the company.
Step 1: Ask
Objective: Analyze the provided Fitbit fitness tracker data to uncover key insights that can help Bellabeat enhance its marketing strategy.
Primary Stakeholders: Bellabeat’s Co-Founders, Urška Sršen and Sando Mur
Key Questions: What trends are emerging in smart device usage? How do these trends relate to Bellabeat’s customer base? In what ways can these insights shape Bellabeat’s marketing strategy?
Step 2: Data Preparation (loading the required packages and chosen data sets)
The data set used was downloaded from Kaggle. It contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits. The dataset was generated by respondents to a survey distributed via Amazon Mechanical Turk between 03.12.2016 and 05.12.2016 (month.day.year), and it comprises 18 CSV files. The link to the data set is below.
FitBit Fitness Tracker Data: https://www.kaggle.com/arashnic/fitbit
I installed the tidyverse, lubridate, dplyr, ggplot2, and tidyr libraries to help with analysis. The here, skimr, and janitor libraries are for data cleaning.
# Install packages:
install.packages("tidyverse")
install.packages("lubridate")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("tidyr")
install.packages("here")
install.packages("skimr")
install.packages("janitor")
# Load libraries:
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
library(here)
library(skimr)
library(janitor)
The available data covers activity, calories, intensity, steps, heart rate, weight, and sleep, recorded at daily, hourly, and minute levels. I chose to focus on the following data sets: Activity, Steps, Intensity, Sleep, Weight, and Calories.
1. I kept the “daily” data and omitted the “minute” and “hourly” data, with one exception: Calories is read from the hourly file (hourlyCalories_merged.csv).
2. The “Sleep” dataset has 24 users, while the rest of the datasets contain 33.
3. After further review, I chose to drop the “Weight” dataset as it contained only 8 users.
4. I reformatted the time columns.
Activity <- read.csv("dailyActivity_merged.csv", stringsAsFactors = FALSE)
Steps <- read.csv("dailySteps_merged.csv")
Intensity <- read.csv("dailyIntensities_merged.csv")
Sleep <- read.csv("sleepDay_merged.csv")
Weight <- read.csv("weightLogInfo_merged.csv")
Calories <- read.csv("hourlyCalories_merged.csv", stringsAsFactors = FALSE)
head(Activity)
## Id ActivityDate TotalSteps TotalDistance TrackerDistance
## 1 1503960366 4/12/2016 13162 8.50 8.50
## 2 1503960366 4/13/2016 10735 6.97 6.97
## 3 1503960366 4/14/2016 10460 6.74 6.74
## 4 1503960366 4/15/2016 9762 6.28 6.28
## 5 1503960366 4/16/2016 12669 8.16 8.16
## 6 1503960366 4/17/2016 9705 6.48 6.48
## LoggedActivitiesDistance VeryActiveDistance ModeratelyActiveDistance
## 1 0 1.88 0.55
## 2 0 1.57 0.69
## 3 0 2.44 0.40
## 4 0 2.14 1.26
## 5 0 2.71 0.41
## 6 0 3.19 0.78
## LightActiveDistance SedentaryActiveDistance VeryActiveMinutes
## 1 6.06 0 25
## 2 4.71 0 21
## 3 3.91 0 30
## 4 2.83 0 29
## 5 5.04 0 36
## 6 2.51 0 38
## FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes Calories
## 1 13 328 728 1985
## 2 19 217 776 1797
## 3 11 181 1218 1776
## 4 34 209 726 1745
## 5 10 221 773 1863
## 6 20 164 539 1728
colnames(Activity)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
str(Activity)
## 'data.frame': 940 obs. of 15 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDate : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ TotalSteps : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
## $ TotalDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ TrackerDistance : num 8.5 6.97 6.74 6.28 8.16 ...
## $ LoggedActivitiesDistance: num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ Calories : int 1985 1797 1776 1745 1863 1728 1921 2035 1786 1775 ...
summary(Calories)
## Id ActivityHour Calories
## Min. :1.504e+09 Length:22099 Min. : 42.00
## 1st Qu.:2.320e+09 Class :character 1st Qu.: 63.00
## Median :4.445e+09 Mode :character Median : 83.00
## Mean :4.848e+09 Mean : 97.39
## 3rd Qu.:6.962e+09 3rd Qu.:108.00
## Max. :8.878e+09 Max. :948.00
head(Steps)
## Id ActivityDay StepTotal
## 1 1503960366 4/12/2016 13162
## 2 1503960366 4/13/2016 10735
## 3 1503960366 4/14/2016 10460
## 4 1503960366 4/15/2016 9762
## 5 1503960366 4/16/2016 12669
## 6 1503960366 4/17/2016 9705
colnames(Steps)
## [1] "Id" "ActivityDay" "StepTotal"
str(Steps)
## 'data.frame': 940 obs. of 3 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay: chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ StepTotal : int 13162 10735 10460 9762 12669 9705 13019 15506 10544 9819 ...
head(Intensity)
## Id ActivityDay SedentaryMinutes LightlyActiveMinutes
## 1 1503960366 4/12/2016 728 328
## 2 1503960366 4/13/2016 776 217
## 3 1503960366 4/14/2016 1218 181
## 4 1503960366 4/15/2016 726 209
## 5 1503960366 4/16/2016 773 221
## 6 1503960366 4/17/2016 539 164
## FairlyActiveMinutes VeryActiveMinutes SedentaryActiveDistance
## 1 13 25 0
## 2 19 21 0
## 3 11 30 0
## 4 34 29 0
## 5 10 36 0
## 6 20 38 0
## LightActiveDistance ModeratelyActiveDistance VeryActiveDistance
## 1 6.06 0.55 1.88
## 2 4.71 0.69 1.57
## 3 3.91 0.40 2.44
## 4 2.83 1.26 2.14
## 5 5.04 0.41 2.71
## 6 2.51 0.78 3.19
colnames(Intensity)
## [1] "Id" "ActivityDay"
## [3] "SedentaryMinutes" "LightlyActiveMinutes"
## [5] "FairlyActiveMinutes" "VeryActiveMinutes"
## [7] "SedentaryActiveDistance" "LightActiveDistance"
## [9] "ModeratelyActiveDistance" "VeryActiveDistance"
str(Intensity)
## 'data.frame': 940 obs. of 10 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ ActivityDay : chr "4/12/2016" "4/13/2016" "4/14/2016" "4/15/2016" ...
## $ SedentaryMinutes : int 728 776 1218 726 773 539 1149 775 818 838 ...
## $ LightlyActiveMinutes : int 328 217 181 209 221 164 233 264 205 211 ...
## $ FairlyActiveMinutes : int 13 19 11 34 10 20 16 31 12 8 ...
## $ VeryActiveMinutes : int 25 21 30 29 36 38 42 50 28 19 ...
## $ SedentaryActiveDistance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ LightActiveDistance : num 6.06 4.71 3.91 2.83 5.04 ...
## $ ModeratelyActiveDistance: num 0.55 0.69 0.4 1.26 0.41 ...
## $ VeryActiveDistance : num 1.88 1.57 2.44 2.14 2.71 ...
head(Sleep)
## Id SleepDay TotalSleepRecords TotalMinutesAsleep
## 1 1503960366 4/12/2016 12:00:00 AM 1 327
## 2 1503960366 4/13/2016 12:00:00 AM 2 384
## 3 1503960366 4/15/2016 12:00:00 AM 1 412
## 4 1503960366 4/16/2016 12:00:00 AM 2 340
## 5 1503960366 4/17/2016 12:00:00 AM 1 700
## 6 1503960366 4/19/2016 12:00:00 AM 1 304
## TotalTimeInBed
## 1 346
## 2 407
## 3 442
## 4 367
## 5 712
## 6 320
colnames(Sleep)
## [1] "Id" "SleepDay" "TotalSleepRecords"
## [4] "TotalMinutesAsleep" "TotalTimeInBed"
str(Sleep)
## 'data.frame': 413 obs. of 5 variables:
## $ Id : num 1.5e+09 1.5e+09 1.5e+09 1.5e+09 1.5e+09 ...
## $ SleepDay : chr "4/12/2016 12:00:00 AM" "4/13/2016 12:00:00 AM" "4/15/2016 12:00:00 AM" "4/16/2016 12:00:00 AM" ...
## $ TotalSleepRecords : int 1 2 1 2 1 1 1 1 1 1 ...
## $ TotalMinutesAsleep: int 327 384 412 340 700 304 360 325 361 430 ...
## $ TotalTimeInBed : int 346 407 442 367 712 320 377 364 384 449 ...
head(Weight)
## Id Date WeightKg WeightPounds Fat BMI
## 1 1503960366 5/2/2016 11:59:59 PM 52.6 115.9631 22 22.65
## 2 1503960366 5/3/2016 11:59:59 PM 52.6 115.9631 NA 22.65
## 3 1927972279 4/13/2016 1:08:52 AM 133.5 294.3171 NA 47.54
## 4 2873212765 4/21/2016 11:59:59 PM 56.7 125.0021 NA 21.45
## 5 2873212765 5/12/2016 11:59:59 PM 57.3 126.3249 NA 21.69
## 6 4319703577 4/17/2016 11:59:59 PM 72.4 159.6147 25 27.45
## IsManualReport LogId
## 1 True 1.462234e+12
## 2 True 1.462320e+12
## 3 False 1.460510e+12
## 4 True 1.461283e+12
## 5 True 1.463098e+12
## 6 True 1.460938e+12
colnames(Weight)
## [1] "Id" "Date" "WeightKg" "WeightPounds"
## [5] "Fat" "BMI" "IsManualReport" "LogId"
str(Weight)
## 'data.frame': 67 obs. of 8 variables:
## $ Id : num 1.50e+09 1.50e+09 1.93e+09 2.87e+09 2.87e+09 ...
## $ Date : chr "5/2/2016 11:59:59 PM" "5/3/2016 11:59:59 PM" "4/13/2016 1:08:52 AM" "4/21/2016 11:59:59 PM" ...
## $ WeightKg : num 52.6 52.6 133.5 56.7 57.3 ...
## $ WeightPounds : num 116 116 294 125 126 ...
## $ Fat : int 22 NA NA NA NA 25 NA NA NA NA ...
## $ BMI : num 22.6 22.6 47.5 21.5 21.7 ...
## $ IsManualReport: chr "True" "True" "False" "True" ...
## $ LogId : num 1.46e+12 1.46e+12 1.46e+12 1.46e+12 1.46e+12 ...
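The participant counts quoted earlier in this step (how many users appear in Sleep, Weight, and the other tables) can be checked by counting distinct Id values. The sketch below is a minimal illustration in base R: `count_users`, `sleep_toy`, and `weight_toy` are hypothetical names, and the toy rows are placeholders rather than the Kaggle data.

```r
# Toy stand-ins for the real tables (hypothetical values, not the Kaggle data).
sleep_toy <- data.frame(
  Id = c(1, 1, 2, 3),
  TotalMinutesAsleep = c(327, 384, 412, 340)
)
weight_toy <- data.frame(
  Id = c(1, 1, 3),
  WeightKg = c(52.6, 52.6, 72.4)
)

# Count the distinct participants in a table by its Id column.
count_users <- function(df) length(unique(df$Id))

user_counts <- c(
  Sleep = count_users(sleep_toy),
  Weight = count_users(weight_toy)
)
print(user_counts)
```

Applied to the loaded tables (for example `count_users(Sleep)`), the same helper should give the per-dataset user counts, assuming the Id column identifies one participant per value.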
Step 3: Process (Cleaning the Dataset)
Next, I processed, cleaned, and organized the dataset for analysis. I used the functions glimpse() and skim_without_charts() to quickly review the data, and I cleaned the column names using clean_names().
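As a rough illustration of the review and renaming step, the sketch below applies clean_names() and glimpse() to a small toy table. The table `activity_toy` is hypothetical; it reuses three Activity column names and the first two rows from the head(Activity) output, and it is not the exact code used in this study.

```r
library(dplyr)    # glimpse()
library(janitor)  # clean_names()

# Toy table reusing three Activity column names and its first two rows.
activity_toy <- data.frame(
  Id = c(1503960366, 1503960366),
  ActivityDate = c("4/12/2016", "4/13/2016"),
  TotalSteps = c(13162L, 10735L)
)

# Convert the CamelCase column names to snake_case.
activity_clean <- clean_names(activity_toy)

# Print a compact overview of the columns and their types.
glimpse(activity_clean)
```

Consistent snake_case names make later joins and plots less error-prone, since every table then follows the same naming pattern.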
For the Activity, Calories, and Intensity datasets: I found no spelling errors, misfielded values, missing values, extra or blank spaces, or duplicates. For formatting, I cleared any inconsistent formatting. For data types, some columns were converted to numeric, and date columns were converted to the Date type.
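The type conversions described here might look like the following base R sketch. It assumes the dates use the month/day/year layout seen in the head(Activity) output; `activity_toy` is a hypothetical table, and the steps column is stored as text only to show the numeric conversion.

```r
# Toy table: dates as text (as read by read.csv) and steps stored as text.
activity_toy <- data.frame(
  ActivityDate = c("4/12/2016", "4/13/2016"),
  TotalSteps = c("13162", "10735"),
  stringsAsFactors = FALSE
)

# Convert the character dates (month/day/year) to the Date type.
activity_toy$ActivityDate <- as.Date(activity_toy$ActivityDate, format = "%m/%d/%Y")

# Convert the text column to numeric.
activity_toy$TotalSteps <- as.numeric(activity_toy$TotalSteps)

str(activity_toy)
```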
Sleep data: 3 duplicate rows were found and removed.
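One way to find and drop fully duplicated rows is base R's duplicated(), sketched below on a hypothetical `sleep_toy` table with one repeated row. The values are placeholders, and this is not necessarily the method used in this study.

```r
# Toy sleep table; the third row repeats the second exactly.
sleep_toy <- data.frame(
  Id = c(1, 1, 1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM",
               "4/13/2016 12:00:00 AM", "4/12/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384, 384, 412)
)

# duplicated() flags every row that repeats an earlier row.
n_dupes <- sum(duplicated(sleep_toy))

# Keep only the first occurrence of each row.
sleep_dedup <- sleep_toy[!duplicated(sleep_toy), ]

print(n_dupes)
print(nrow(sleep_dedup))
```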
Weight data: one column contained too many missing values, so I removed it. I also spotted problems with the timestamp format, so before analysis I converted the timestamps to date-time format and split them into separate date and time columns.
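The two Weight fixes can be sketched as follows, using the first three rows shown in the head(Weight) output. The 50% missing-value threshold, the new column names `date` and `time`, and the choice of UTC are assumptions made for illustration only.

```r
# Use the C locale so that %p recognises the AM/PM markers.
Sys.setlocale("LC_TIME", "C")

# First three Weight rows from the head(Weight) output (selected columns).
weight_toy <- data.frame(
  Id = c(1503960366, 1503960366, 1927972279),
  Date = c("5/2/2016 11:59:59 PM", "5/3/2016 11:59:59 PM", "4/13/2016 1:08:52 AM"),
  WeightKg = c(52.6, 52.6, 133.5),
  Fat = c(22, NA, NA)
)

# Share of missing values per column; drop columns above the assumed 50% cut-off.
na_share <- colMeans(is.na(weight_toy))
weight_toy <- weight_toy[, na_share <= 0.5, drop = FALSE]

# Parse the 12-hour timestamps, then split them into date and time columns.
stamp <- as.POSIXct(weight_toy$Date, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")
weight_toy$date <- as.Date(stamp, tz = "UTC")
weight_toy$time <- format(stamp, "%H:%M:%S")

print(weight_toy)
```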
Step 4: Analysis
I chose to start by looking at total steps vs. sedentary minutes.
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
At first look, a few outliers are noticeable, but nothing that holds back the analysis. It is interesting to note that the blue trend line dives sharply from 0 to 5,000 steps and drops again from 5,000 to 10,000 steps. It appears to bottom out close to the 10,000-step mark and then stays nearly flat or climbs back slowly beyond it.
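A scatter plot with a smoothed trend line of the kind described above could be built roughly as follows. The data here is simulated placeholder data (not the Fitbit records), the object names are hypothetical, and the original chart may have used different settings.

```r
library(ggplot2)

# Simulated placeholder data; the real chart would use the Activity table.
set.seed(1)
activity_toy <- data.frame(TotalSteps = round(runif(60, min = 0, max = 20000)))
activity_toy$SedentaryMinutes <- round(
  1200 - 0.03 * activity_toy$TotalSteps + rnorm(60, sd = 100)
)

# Points plus a smoothed trend line (loess is the default for small data sets).
steps_vs_sedentary <- ggplot(
  activity_toy,
  aes(x = TotalSteps, y = SedentaryMinutes)
) +
  geom_point() +
  geom_smooth() +
  labs(title = "Total Steps vs. Sedentary Minutes")
```

Calling `print(steps_vs_sedentary)` draws the chart and emits the `geom_smooth()` message about the loess method.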
I then wanted to see if there was any correlation between the total steps taken and the calories burned.
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
As expected, more activity results in more calories burned. The majority of observations fall in the 1,000-3,000 calorie range and under 10,000 steps. Following the blue trend line, 10,000 steps typically corresponds to about 2,500 calories burned.
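The strength of the steps-to-calories relationship can also be summarized with a correlation coefficient. The sketch below uses only the six rows shown in the head(Activity) output, which all belong to a single user, so it illustrates the calculation rather than a result for the whole dataset.

```r
# First six Activity rows as shown in the head(Activity) output (one user).
activity_head <- data.frame(
  TotalSteps = c(13162, 10735, 10460, 9762, 12669, 9705),
  Calories = c(1985, 1797, 1776, 1745, 1863, 1728)
)

# Pearson correlation (the default method of cor()).
steps_calories_r <- cor(activity_head$TotalSteps, activity_head$Calories)
print(round(steps_calories_r, 2))
```

Running the same call on the full Activity table would give a figure to report alongside the chart.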
Step 5: Share & Act
The analysis of the Fitbit tracker data has revealed key trends in user behavior regarding activity levels, sleep patterns, and calorie expenditure. By summarizing these findings, we can present actionable insights that will help Bellabeat refine its marketing strategies and product development.
Key Findings:
Activity Levels: Users who consistently engage in moderate-to-high activity levels burn more calories and exhibit healthier sleep patterns. However, a significant portion of users have sedentary lifestyles, indicating a potential market segment for motivation-based engagement strategies.
Sleep Patterns: Users with higher step counts tend to have better sleep quality. Encouraging consistent movement throughout the day may improve overall wellness.
Caloric Expenditure: Calorie burn correlates with activity levels, but variations exist among users, suggesting opportunities to tailor fitness recommendations based on individual user profiles.
Engagement Gaps: Data reveals periods of low engagement, such as weekends or specific time slots, which present opportunities for targeted messaging and engagement campaigns.
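To make the activity-level segments in these findings concrete, users could be grouped by their average daily steps. The sketch below is one possible approach: the step cut-offs (5,000, 7,500, and 10,000), the segment labels, and the toy records are all assumptions, not values taken from this analysis.

```r
# Hypothetical per-day records for four users (placeholder values).
daily_toy <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3, 4, 4),
  TotalSteps = c(3000, 4000, 6000, 7000, 8000, 9000, 11000, 13000)
)

# Average daily steps per user.
avg_steps <- aggregate(TotalSteps ~ Id, data = daily_toy, FUN = mean)

# Assumed cut-offs; right = FALSE makes each bin include its lower bound.
avg_steps$segment <- cut(
  avg_steps$TotalSteps,
  breaks = c(0, 5000, 7500, 10000, Inf),
  labels = c("sedentary", "lightly active", "fairly active", "very active"),
  right = FALSE
)

print(avg_steps)
```

Segments like these could then feed the targeted messaging ideas that follow, with the thresholds tuned to whatever definition of activity level Bellabeat prefers.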
ACT: Implementing Data-Driven Strategies
Based on the insights derived from the analysis, the following action steps can help Bellabeat improve user engagement and product effectiveness:
Personalized Recommendations:
Develop AI-driven recommendations for users based on their activity and sleep patterns.
Provide customized wellness tips through the Bellabeat app to encourage better habits.
Gamification & Incentives:
Introduce reward-based challenges to increase daily step count and reduce sedentary behavior.
Implement badges and milestones that users can unlock based on their activity levels.
Targeted Marketing Campaigns:
Segment users based on their activity levels and send personalized messages encouraging engagement.
Use social media and email campaigns to educate users on the benefits of maintaining consistent activity.
Enhanced Product Features:
Integrate smart reminders for movement and hydration based on user inactivity.
Improve sleep tracking insights with tailored advice for better rest quality.
Strategic Partnerships:
Collaborate with fitness influencers and wellness coaches to promote Bellabeat’s benefits.
Develop partnerships with health-focused brands to provide exclusive offers and integrations.
By implementing these strategies, Bellabeat can strengthen its market position, enhance user engagement, and continue its growth as a leader in women’s wellness technology.