I used the Bellabeat case study as my capstone project for the Google Data Analytics certificate. In this case study, I used R to clean, analyse and visualise the data.
Bellabeat is a high-tech manufacturer of health products. Its products are aimed at women who want to lead a healthy lifestyle.
To date, Bellabeat's innovative health products include the Bellabeat app, Leaf, Time and Spring.
Each of these products has its own functions, but all share the aim of tracking the user's lifestyle, including daily activity, sleep, stress, menstrual cycle and hydration.
Bellabeat wants to use the data produced by smart devices to gain insight into how consumers use them. These insights could then guide the company's marketing strategy. In particular, Bellabeat would like to know how usage trends among women can inform its marketing strategy. The analysis addresses three questions:
1. What are some trends in smart device usage?
2. How could these trends apply to Bellabeat customers?
3. How could these trends guide marketing strategy?
Based on the insights, we could suggest new smart device features, custom profiles based on age, promotions and consultations.
The data were retrieved from https://www.kaggle.com/datasets/arashnic/fitbit. The dataset contains 18 files covering different types of data, such as activity, calories, steps and intensity.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.2 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(tidyr)
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
library(stringi)
library(stringr)
library(ggplot2)
# Load the five files used in this analysis
d_act <- read.csv("~/Desktop/Coursera/Capstone/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
d_cal <- read.csv("~/Desktop/Coursera/Capstone/Fitabase Data 4.12.16-5.12.16/dailyCalories_merged.csv")
d_step <- read.csv("~/Desktop/Coursera/Capstone/Fitabase Data 4.12.16-5.12.16/dailySteps_merged.csv")
weight <- read.csv("~/Desktop/Coursera/Capstone/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
sleep <- read.csv("~/Desktop/Coursera/Capstone/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
colnames(d_act)
## [1] "Id" "ActivityDate"
## [3] "TotalSteps" "TotalDistance"
## [5] "TrackerDistance" "LoggedActivitiesDistance"
## [7] "VeryActiveDistance" "ModeratelyActiveDistance"
## [9] "LightActiveDistance" "SedentaryActiveDistance"
## [11] "VeryActiveMinutes" "FairlyActiveMinutes"
## [13] "LightlyActiveMinutes" "SedentaryMinutes"
## [15] "Calories"
colnames(d_cal)
## [1] "Id" "ActivityDay" "Calories"
colnames(d_step)
## [1] "Id" "ActivityDay" "StepTotal"
user_count <- unique(select(d_act, Id))  # 33 users wore the tracker
user_sleep <- unique(select(sleep, Id))  # 24 users logged their sleep
m_user <- anti_join(user_count, user_sleep, by = "Id")  # the 9 user IDs with no sleep records
b <- unique(d_act)  # check for duplicate rows: none in the daily activity file
sleep <- unique(sleep)  # sleep has duplicate rows; keep only the distinct ones
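To see how many rows a de-duplication step like this removes, the duplicates can be counted first. The following is a minimal sketch in base R on a made-up data frame; `toy_sleep` and its values are invented for illustration and are not taken from the Fitbit files.

```r
# Toy stand-in for the sleep table; all values are invented.
toy_sleep <- data.frame(
  Id = c(1, 1, 2, 2, 2),
  SleepDay = c("4/12/2016", "4/12/2016", "4/12/2016", "4/13/2016", "4/13/2016"),
  TotalMinutesAsleep = c(327, 327, 400, 380, 380)
)

# duplicated() flags every row that exactly repeats an earlier row.
n_dupes <- sum(duplicated(toy_sleep))

# unique() keeps the first copy of each distinct row.
toy_clean <- unique(toy_sleep)

print(n_dupes)
print(nrow(toy_clean))
```

Comparing `nrow()` before and after gives the same count and is a quick check that nothing other than exact repeats was dropped.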
The dataset covers only 33 individuals, which is a small sample. More participants are needed to ensure the clarity and integrity of the results. In addition, the data were collected over just two months. However, the data can still give early insights into how Bellabeat customers use the app. For instance:
* how many users consistently used the Bellabeat app
* how many users tracked their sleep and weight
* which users actively used the Bellabeat app
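One way to answer the sleep and weight question is to flag, for each user in the activity table, whether the same ID appears in the other tables. This is only a sketch in base R: the three `toy_*` data frames and their IDs are invented stand-ins, not the real files.

```r
# Toy stand-ins for the activity, sleep and weight tables (invented IDs).
toy_act    <- data.frame(Id = c(1, 1, 2, 3, 3, 4))
toy_sleep  <- data.frame(Id = c(1, 2, 2))
toy_weight <- data.frame(Id = c(2))

# Every distinct user seen in the activity table.
ids <- sort(unique(toy_act$Id))

# One row per user, with a flag for each kind of record they logged.
coverage <- data.frame(
  Id = ids,
  logs_sleep  = ids %in% toy_sleep$Id,
  logs_weight = ids %in% toy_weight$Id
)

print(coverage)
print(sum(coverage$logs_sleep))   # users with at least one sleep record
print(sum(coverage$logs_weight))  # users with at least one weight record
```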
# Merge sleep and daily activity
# SleepDay holds a date followed by a time; split it and keep only the date part
sleep <- separate(data = sleep, col = SleepDay, into = c("SleepDay", "rmv", "rmv2"), sep = "\\ ")
sleep <- select(sleep, Id, SleepDay, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed)
sleep$SleepDay <- str_trim(sleep$SleepDay)
colnames(sleep)[2] <- "ActivityDate"  # rename so the key matches d_act
act_sleep <- merge(d_act, sleep, by = c("Id", "ActivityDate"))
glimpse(act_sleep)
## Rows: 410
## Columns: 18
## $ Id <dbl> 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate <chr> "4/12/2016", "4/13/2016", "4/15/2016", "4/16/…
## $ TotalSteps <int> 13162, 10735, 9762, 12669, 9705, 15506, 10544…
## $ TotalDistance <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ TrackerDistance <dbl> 8.50, 6.97, 6.28, 8.16, 6.48, 9.88, 6.68, 6.3…
## $ LoggedActivitiesDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveDistance <dbl> 1.88, 1.57, 2.14, 2.71, 3.19, 3.53, 1.96, 1.3…
## $ ModeratelyActiveDistance <dbl> 0.55, 0.69, 1.26, 0.41, 0.78, 1.32, 0.48, 0.3…
## $ LightActiveDistance <dbl> 6.06, 4.71, 2.83, 5.04, 2.51, 5.03, 4.24, 4.6…
## $ SedentaryActiveDistance <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
## $ VeryActiveMinutes <int> 25, 21, 29, 36, 38, 50, 28, 19, 41, 39, 73, 3…
## $ FairlyActiveMinutes <int> 13, 19, 34, 10, 20, 31, 12, 8, 21, 5, 14, 23,…
## $ LightlyActiveMinutes <int> 328, 217, 209, 221, 164, 264, 205, 211, 262, …
## $ SedentaryMinutes <int> 728, 776, 726, 773, 539, 775, 818, 838, 732, …
## $ Calories <int> 1985, 1797, 1745, 1863, 1728, 2035, 1786, 177…
## $ TotalSleepRecords <int> 1, 2, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
## $ TotalMinutesAsleep <int> 327, 384, 412, 340, 700, 304, 360, 325, 361, …
## $ TotalTimeInBed <int> 346, 407, 442, 367, 712, 320, 377, 364, 384, …
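The merge above matches dates as text, which works as long as both columns use exactly the same layout. As a hedged alternative, the sketch below parses both columns to `Date` values so that a trailing timestamp no longer matters. The sample strings are assumptions modelled on the values in the output above, not rows read from the files.

```r
# Sample strings modelled on the two date columns (assumed formats).
activity_date <- c("4/12/2016", "4/13/2016")
sleep_day     <- c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM")

# as.Date() with an explicit format reads the leading month/day/year and
# ignores any trailing characters, so the time does not need splitting off.
activity_parsed <- as.Date(activity_date, format = "%m/%d/%Y")
sleep_parsed    <- as.Date(sleep_day, format = "%m/%d/%Y")

# TRUE when both columns now hold the same calendar dates.
print(identical(activity_parsed, sleep_parsed))
```

Parsed dates also sort chronologically, which text such as "4/9/2016" and "4/12/2016" does not.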
# Merge weight and daily activity
weight <- separate(data = weight, col = Date, into = c("ActivityDate", "rmv", "rmv2"), sep = "\\ ")
weight <- weight[, c(1, 2, 5:10)]  # drop the helper columns rmv and rmv2
weight$ActivityDate <- str_trim(weight$ActivityDate)
act_weight <- merge(weight, d_act, by = c("Id", "ActivityDate"))
m_weight <- anti_join(d_act, act_weight, by = c("Id", "ActivityDate"))  # activity days with no weight record
The data cover 33 users across two months (April and May).
29 users were active, 3 were moderate users and 1 was a low user.
24 users tracked their sleep, but not consistently.
8 active users consistently tracked their daily activity (sleep, walking) and weight.
These findings show that users need to be more consistent in tracking their activity.
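Usage levels such as active, moderate and low can be derived from the number of days each user logged. The sketch below shows one way to do that in base R. The cut-offs (1-10, 11-20 and 21 or more days) are hypothetical and are not the ones used for the counts above; the toy log is invented as well.

```r
# Toy activity log: one row per user per day logged (invented values).
toy_act <- data.frame(
  Id = c(rep(1, 30), rep(2, 18), rep(3, 5))
)

# Count how many days each user logged.
days_logged <- as.data.frame(table(Id = toy_act$Id), responseName = "days")

# Hypothetical cut-offs: 1-10 days = low, 11-20 = moderate, 21+ = active.
days_logged$usage <- cut(
  days_logged$days,
  breaks = c(0, 10, 20, Inf),
  labels = c("low", "moderate", "active")
)

print(days_logged)
```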
8 users entered their weight, and the records show:
1 user with obesity (User ID: 1927972279)
3 users with a healthy weight (User ID: 6962181067, 1503960366, 2873212765)
4 users who are overweight (User ID: 4319703577, 4558609924, 5577150313, 8877689391)
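Weight groups like these can be assigned from a BMI value. The sketch below assumes the standard WHO adult bands (under 18.5 underweight, 18.5 to under 25 healthy weight, 25 to under 30 overweight, 30 and above obesity). Whether the grouping above used exactly these bands is an assumption, and the BMI values in `toy_bmi` are invented.

```r
# Invented BMI values, one per hypothetical user.
toy_bmi <- data.frame(
  Id  = c("a", "b", "c", "d"),
  BMI = c(17.9, 22.6, 27.4, 47.5)
)

# right = FALSE makes each band include its lower bound,
# e.g. a BMI of exactly 25 counts as overweight.
toy_bmi$category <- cut(
  toy_bmi$BMI,
  breaks = c(0, 18.5, 25, 30, Inf),
  labels = c("underweight", "healthy weight", "overweight", "obesity"),
  right = FALSE
)

print(toy_bmi)
```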
One might presume that heavier individuals need more sleep. However, in this dataset, an individual's weight does not reflect the amount of sleep recorded.
A daily target of 10,000 steps is commonly recommended and could burn 300-400 calories. However, 18 users in the Light Active category take fewer than 10,000 steps, and 8 users in the Sedentary category take fewer than 5,000 steps. The results show that individuals who weigh less took more daily steps than those who weigh more.
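The step categories can be sketched as follows, using the 5,000 and 10,000 step thresholds mentioned above. Two things here are assumptions: that each user is classified by their mean daily steps, and the label "Active" for users at or above 10,000 steps. The toy values are invented.

```r
# Toy daily step log (invented values).
toy_steps <- data.frame(
  Id = c(1, 1, 2, 2, 3, 3),
  TotalSteps = c(3000, 4000, 8000, 9000, 11000, 13000)
)

# Mean daily steps per user.
mean_steps <- aggregate(TotalSteps ~ Id, data = toy_steps, FUN = mean)

# Below 5,000 = Sedentary, 5,000 to below 10,000 = Light Active,
# 10,000 and above = Active (this last label is an assumption).
mean_steps$level <- cut(
  mean_steps$TotalSteps,
  breaks = c(0, 5000, 10000, Inf),
  labels = c("Sedentary", "Light Active", "Active"),
  right = FALSE
)

print(mean_steps)
print(table(mean_steps$level))
```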
There is a relationship between daily steps, calories and weight for each individual. For instance, the more steps a user takes, the more calories they burn.
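A relationship like the one between steps and calories can be quantified with a correlation coefficient and a simple linear fit. The sketch below uses invented numbers, so its output says nothing about the real dataset. The column names mirror those in `d_act`.

```r
# Invented step and calorie values for illustration only.
toy <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 11000, 14000),
  Calories   = c(1500, 1800, 2050, 2400, 2600)
)

# Pearson correlation: values near 1 mean calories rise with steps.
r <- cor(toy$TotalSteps, toy$Calories)

# Linear fit: the slope estimates extra calories burned per extra step.
fit <- lm(Calories ~ TotalSteps, data = toy)
slope_per_1000 <- unname(coef(fit)["TotalSteps"]) * 1000

print(r)
print(slope_per_1000)
```

A strong correlation here does not by itself show that steps cause the calorie burn. With the real data, body weight and activity intensity would also play a part.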
Other suggestions:
* Increase the size of the dataset and the types of data collected. For instance, include profile information for each individual, such as age, marital status and occupation. This information could help Bellabeat customise activities or features for selected users.