Bellabeat is a high-tech company that manufactures health-focused smart products. They are a small company but have the potential to make an impact in the global smart device market. Sršen, co-founder of Bellabeat, believes that analyzing fitness data collected from smart devices can give them an inside look on how to market their products and find opportunities for future growth.
Identify potential opportunities for growth and recommend improvements to Bellabeat's marketing strategy based on trends in smart device usage.
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(tidyr)
Before importing the data sets into R, I saved them in Google Sheets. There, I cleaned the data by removing duplicates, trimming white space, and formatting the times and dates in each data set.
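The same three cleaning steps can also be done in R, which keeps the cleaning reproducible alongside the analysis. The block below is a minimal sketch on made-up rows, not the cleaning that was actually performed. The toy values, the `raw` and `cleaned` names, and the month/day/year date format are assumptions.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Toy rows standing in for a raw export (values are made up for illustration).
raw <- tibble(
  Id         = c("1503960366 ", "1503960366 ", " 1624580081"),
  Date       = c("4/12/2016", "4/12/2016", "4/13/2016"),
  TotalSteps = c(13162, 13162, 8163)
)

cleaned <- raw %>%
  mutate(
    Id   = str_trim(Id),  # trim leading and trailing white space
    Date = mdy(Date)      # assumes month/day/year text; converts to Date
  ) %>%
  distinct()              # drop rows that are exact duplicates

cleaned
```

Trimming comes before `distinct()` on purpose: two rows that differ only by stray white space would otherwise not be recognized as duplicates.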
activity <- read_csv("Bellabeat_Capstone - daily_activity.csv")
sleep <- read_csv("Bellabeat_Capstone - sleep_day (1).csv")
weight <- read_csv("Bellabeat_Capstone - weight_log_info.csv")
steps <- read_csv("Bellabeat_Capstone - daily_steps.csv")
calories <- read_csv("Bellabeat_Capstone - daily_calories.csv")
hourly_steps <- read_csv("Bellabeat_Capstone - hourly_steps (1).csv")
Since I had already reviewed the data in Google Sheets, I only needed to confirm that everything was imported properly, using View() and checking the column names.
colnames(activity)
## [1] "Id" "Date"
## [3] "ActivityTime" "TotalSteps"
## [5] "TotalDistance" "TrackerDistance"
## [7] "LoggedActivitiesDistance" "VeryActiveDistance"
## [9] "ModeratelyActiveDistance" "LightActiveDistance"
## [11] "SedentaryActiveDistance" "VeryActiveMinutes"
## [13] "FairlyActiveMinutes" "LightlyActiveMinutes"
## [15] "SedentaryMinutes" "Calories"
Now that everything is ready, it is time to explore the data.
n_distinct(activity$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 24
n_distinct(steps$Id)
## [1] 33
n_distinct(weight$Id)
## [1] 8
n_distinct(calories$Id)
## [1] 33
This code shows the number of participants for each smart device tracking feature.
There are 33 participants using the daily activity feature, 24 participants using the sleep tracking feature, 33 participants using the step tracking feature, 8 participants using the weight tracking feature, and 33 participants using the calorie tracking feature.
Eight participants do not provide enough data to draw reliable conclusions about weight tracking.
Now, we’ll take a look at the data set summaries:
activity %>%
select(VeryActiveMinutes,FairlyActiveMinutes, LightlyActiveMinutes,
SedentaryMinutes) %>%
summary()
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
activity %>%
select(TotalSteps, TotalDistance) %>%
summary()
## TotalSteps TotalDistance
## Min. : 0 Min. : 0.000
## 1st Qu.: 3790 1st Qu.: 2.620
## Median : 7406 Median : 5.245
## Mean : 7638 Mean : 5.490
## 3rd Qu.:10727 3rd Qu.: 7.713
## Max. :36019 Max. :28.030
calories %>%
select(Calories) %>%
summary()
## Calories
## Min. : 0
## 1st Qu.:1828
## Median :2134
## Mean :2304
## 3rd Qu.:2793
## Max. :4900
hourly_steps %>%
select(ActivityDate, ActivityHour, StepTotal) %>%
summary()
## ActivityDate ActivityHour StepTotal
## Length:22099 Length:22099 Min. : 0.0
## Class :character Class1:hms 1st Qu.: 0.0
## Mode :character Class2:difftime Median : 40.0
## Mode :numeric Mean : 320.2
## 3rd Qu.: 357.0
## Max. :10554.0
sleep %>%
select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
summary()
## TotalSleepRecords TotalMinutesAsleep TotalTimeInBed
## Min. :1.00 Min. : 58.0 Min. : 61.0
## 1st Qu.:1.00 1st Qu.:361.0 1st Qu.:403.8
## Median :1.00 Median :432.5 Median :463.0
## Mean :1.12 Mean :419.2 Mean :458.5
## 3rd Qu.:1.00 3rd Qu.:490.0 3rd Qu.:526.0
## Max. :3.00 Max. :796.0 Max. :961.0
weight %>%
select(WeightKg, BMI) %>%
summary()
## WeightKg BMI
## Min. : 52.60 Min. :21.45
## 1st Qu.: 61.40 1st Qu.:23.96
## Median : 61.90 Median :24.17
## Mean : 69.48 Mean :25.13
## 3rd Qu.: 84.50 3rd Qu.:25.57
## Max. :133.50 Max. :47.54
Interesting findings from these summaries:
I merged the activity and sleep data sets by ‘Id’ and ‘Date’ before starting my data visualizations:
merged_data <- merge(sleep, activity, by=c('Id', 'Date'))
head(merged_data)
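By default, `merge()` keeps only rows whose 'Id' and 'Date' appear in both data sets, so any sleep record without a matching activity record is dropped. The sketch below uses made-up rows to show how the dplyr equivalents `inner_join()` and `anti_join()` can reveal what a merge discards. The `sleep_toy` and `activity_toy` tables are hypothetical stand-ins, not the project data.

```r
library(dplyr)

# Toy stand-ins for the sleep and activity tables (values are made up).
sleep_toy <- tibble(
  Id   = c(1, 1, 2),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-12")),
  TotalMinutesAsleep = c(327, 384, 412)
)
activity_toy <- tibble(
  Id   = c(1, 1, 2, 3),
  Date = as.Date(c("2016-04-12", "2016-04-13", "2016-04-13", "2016-04-12")),
  SedentaryMinutes = c(728, 776, 1218, 1149)
)

# Same matching rule as merge(..., by = c("Id", "Date")): keep matches only.
merged_toy <- inner_join(sleep_toy, activity_toy, by = c("Id", "Date"))

# Sleep rows that found no partner in the activity table.
unmatched_sleep <- anti_join(sleep_toy, activity_toy, by = c("Id", "Date"))

nrow(merged_toy)
unmatched_sleep
```

Running the `anti_join()` check on the real tables would show whether the merged data covers every sleep record or only a subset.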
Here, I filtered the sleep data set to see how many people got 7 hours of sleep or less (420 minutes or fewer).
filtered_sleep <- sleep %>%
filter(TotalMinutesAsleep <= 420) %>%
group_by(Id)
count_fs_id <- count(filtered_sleep, Id)
as_tibble(count_fs_id)
This tibble shows how many times each participant slept for 7 hours or less.
unique_count_ids <- unique(count_fs_id$Id)
n_distinct(unique_count_ids)
## [1] 23
According to these findings, only one participant got more than 7 hours of sleep every time they slept.
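A count of short nights is easier to interpret next to the total number of nights each participant logged. The following sketch, on made-up values, computes both in one pass and lists the participants who never had a short night. The `sleep_toy` table and the `short_nights` and `always_over` names are assumptions for illustration.

```r
library(dplyr)

# Toy sleep log (values are made up for illustration).
sleep_toy <- tibble(
  Id = c(1, 1, 1, 2, 2, 3),
  TotalMinutesAsleep = c(400, 450, 380, 430, 500, 420)
)

short_nights <- sleep_toy %>%
  group_by(Id) %>%
  summarise(
    nights      = n(),                           # nights logged
    short       = sum(TotalMinutesAsleep <= 420), # nights of 7 hours or less
    share_short = short / nights                  # fraction of short nights
  )

# Participants with no night at or below the 420-minute threshold.
always_over <- short_nights %>% filter(short == 0)

short_nights
always_over
```

The `share_short` column separates a participant with one short night out of thirty from one who sleeps 7 hours or less most nights, which a raw count cannot do.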
ggplot(data = activity)+
geom_smooth(mapping = aes(x=TotalSteps, y=Calories)) +
geom_jitter(mapping = aes(x=TotalSteps, y=Calories)) +
labs(title= "Total Steps VS Calories", x="Total Steps", y="Calories")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Here we see a positive correlation between the number of steps participants took and the number of calories they burned.
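The strength of the relationship seen in the plot can be put into a single number with base R's `cor()`. This is a small sketch on made-up values; `steps_toy` and `calories_toy` are hypothetical vectors, not the project data.

```r
# Toy values standing in for daily steps and calories (made up).
steps_toy    <- c(2000, 5000, 8000, 11000, 15000)
calories_toy <- c(1700, 2000, 2300, 2500, 3100)

# Pearson correlation coefficient: +1 is a perfect positive linear
# relationship, 0 is none, -1 is a perfect negative one.
r <- cor(steps_toy, calories_toy)
r
```

On the real data the equivalent call would be `cor(activity$TotalSteps, activity$Calories)`. A coefficient only describes a linear association; it does not say that steps alone drive calories burned.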
ggplot(data=sleep) +
geom_jitter(mapping = aes(x=TotalMinutesAsleep, y=TotalTimeInBed))+
labs(title="Time In Bed VS Time Asleep", x="Total Minutes Asleep", y="Total Time In Bed")
According to the CDC, adults should get 7 or more hours of sleep per night. Bellabeat could add a feature such as sleep reminders to the sleep tracking portion of its app to help users reach their sleep goals and meet the CDC recommendation. This could improve participants' overall productivity and health.
Now, let’s take a look at how active participants are per hour:
hourly_steps_new <- hourly_steps %>%
group_by(ActivityHour) %>%
drop_na() %>%
summarise(mean_step_total = mean(StepTotal))
ggplot(data=hourly_steps_new, aes(x=ActivityHour, y=mean_step_total)) +
geom_bar(stat='identity', fill='blue') +
theme(axis.text.x = element_text(angle = 45)) +
labs(title="Average Step Total VS Time", x="Activity Hour", y="Average Step Total")
This visualization shows that participants are active from 6 AM to 11 PM and most active from 4 PM to 10 PM. This could be because participants work out after work. To encourage them, Bellabeat could send motivational messages as notifications during these hours.
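The busiest hours can also be read from the summarised table instead of from the chart. The sketch below picks the top three hours by average steps with dplyr's `slice_max()`. The `hourly_toy` table is made up, and its integer `hour` column is an assumption (the project's `ActivityHour` column is a time value).

```r
library(dplyr)

# Toy hourly averages (values are made up for illustration).
hourly_toy <- tibble(
  hour            = c(6, 12, 17, 18, 19, 23),
  mean_step_total = c(150, 540, 550, 600, 580, 120)
)

top_hours <- hourly_toy %>%
  slice_max(mean_step_total, n = 3) %>%  # three highest averages
  arrange(hour)                          # back into clock order

top_hours
```

Listing the peak hours this way makes the notification window explicit and easy to recheck if more data is collected.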
Next, we’ll look at the relationship between total minutes asleep and sedentary minutes.
ggplot(data=merged_data)+
geom_point(mapping=aes(x=TotalMinutesAsleep, y=SedentaryMinutes))+
geom_smooth(mapping=aes(x=TotalMinutesAsleep, y=SedentaryMinutes))+
labs(title="Minutes Asleep VS Sedentary Minutes Per User",
x="Total Minutes Asleep", y="Sedentary Minutes")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
Here, we can see a negative relationship between minutes asleep and sedentary minutes: participants who sleep more tend to log fewer sedentary minutes. Bellabeat could use push notifications to remind users to be more active throughout the day and reduce their sedentary time. This might also help users increase their total sleep.
Note: I recommend gathering more data to support this insight since correlation does not mean causation.
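One way to probe this insight further is to average each participant's days before correlating, so that participants with many logged days do not dominate the result. The sketch below does this on made-up rows. The `merged_toy` table and the `per_user` and `r_user` names are assumptions, and this is a swapped-in check, not the analysis behind the plot above.

```r
library(dplyr)

# Toy merged rows: two days for each of three participants (made up).
merged_toy <- tibble(
  Id                 = c(1, 1, 2, 2, 3, 3),
  TotalMinutesAsleep = c(300, 340, 420, 440, 500, 480),
  SedentaryMinutes   = c(1100, 1060, 800, 820, 700, 720)
)

# One row per participant, holding that participant's averages.
per_user <- merged_toy %>%
  group_by(Id) %>%
  summarise(
    mean_asleep    = mean(TotalMinutesAsleep),
    mean_sedentary = mean(SedentaryMinutes)
  )

# Correlation across participants rather than across individual days.
r_user <- cor(per_user$mean_asleep, per_user$mean_sedentary)
r_user
```

If the per-participant coefficient on the real data stays clearly negative, the pattern is less likely to come from a handful of heavy loggers. It still would not establish which variable influences the other.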
By collecting data on activity, sleep, stress, and reproductive health, Bellabeat can help empower women through insights into their habits and their overall health.
After analyzing the FitBit Fitness Tracker Data, I discovered some insights that can help Bellabeat’s marketing strategy.
According to my analysis, Bellabeat's target audience would be women who work daytime office jobs. These women likely attend meetings and spend most of their working hours sitting. The audience could also include women who attend college during the day.
Add a feature such as sleep reminders to the sleep tracking portion of the app to help users reach their sleep goals and meet the CDC recommendation of 7 or more hours of sleep. This could improve participants' overall productivity and health.
To encourage participants to stay active, Bellabeat could send motivational messages as notifications during the after-work hours between 4 PM and 10 PM.
To reduce sedentary minutes (the average is 991.2 minutes, or about 16.5 hours, per day), Bellabeat could use push notifications to remind users to be more active throughout the day. This might also help users increase their total sleep.
To encourage users to stay active or to increase their active minutes, Bellabeat could also educate them through "fun fact" notifications that list the many benefits of an active lifestyle. For example, according to a study available through the National Library of Medicine, as few as approximately 4,400 steps per day was significantly related to lower mortality rates compared with approximately 2,700 steps per day.
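The sedentary figure quoted in the recommendations comes straight from the summary table and can be rechecked with a few lines of arithmetic. The variable names below are assumptions; the 991.2 value is the mean of SedentaryMinutes reported earlier.

```r
# Mean sedentary minutes per day, taken from the summary() output.
mean_sedentary_min <- 991.2

hours_sedentary <- mean_sedentary_min / 60         # minutes to hours
share_of_day    <- mean_sedentary_min / (24 * 60)  # fraction of a 1440-minute day

round(hours_sedentary, 1)   # 16.5
round(share_of_day * 100)   # 69
```

Expressed as a share, the average participant is sedentary for roughly 69 percent of the day.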
For Bellabeat to make an impact in the high-tech fitness industry, the Bellabeat app cannot be just another fitness tracking app. It has to make women feel like they have someone in their corner. The app needs to help empower women through motivational and educational daily or hourly notifications, helping them transition to a healthy lifestyle and build healthy habits.
Thank you for taking the time to view my first ever capstone project using R programming.