Bellabeat is a high-tech company that manufactures health-focused smart products designed to improve women’s overall health. The company’s founder, Urška Sršen, believes that user data will drive their marketing strategy and provide more opportunities for growth. To learn more, you can visit their website: https://bellabeat.com/
The goal of this analysis is to use public data about daily smart device usage to:
find trends in smart device usage
apply these trends to Bellabeat customers
determine marketing strategies influenced by these trends
My analysis will focus on modifications to the mobile application, the Bellabeat app.
The data I used came from a public dataset on Kaggle. It was provided by volunteers who logged their FitBit data. I downloaded all eighteen csv files. After converting the files to Google Sheets and previewing the data, I decided to analyze the data on daily activity, hourly intensities, and sleep. Here is the dataset: https://www.kaggle.com/datasets/arashnic/fitbit and credit to MÖBIUS for uploading the data.
I decided to use R to analyze the data so I could clean, manipulate, and create visualizations all in one spot.
library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(readr)
dailyActivity_merged <- read_csv("FitbitData/dailyActivity_merged.csv")
hourlyIntensities <- read_csv("FitbitData/hourlyIntensities_merged.csv")
sleepDay <- read_csv("FitbitData/sleepDay_merged.csv")
My first thought was to look for connections in the daily activity data and for any summary statistics that might be useful.
head(dailyActivity_merged)
## # A tibble: 6 × 15
## Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie…
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1.50e9 4/12/2016 13162 8.5 8.5 0
## 2 1.50e9 4/13/2016 10735 6.97 6.97 0
## 3 1.50e9 4/14/2016 10460 6.74 6.74 0
## 4 1.50e9 4/15/2016 9762 6.28 6.28 0
## 5 1.50e9 4/16/2016 12669 8.16 8.16 0
## 6 1.50e9 4/17/2016 9705 6.48 6.48 0
## # … with 9 more variables: VeryActiveDistance <dbl>,
## # ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## # SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## # FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## # SedentaryMinutes <dbl>, Calories <dbl>
dailyActivity_merged %>% select(TotalSteps, TotalDistance, Calories) %>% summary()
## TotalSteps TotalDistance Calories
## Min. : 0 Min. : 0.000 Min. : 0
## 1st Qu.: 3790 1st Qu.: 2.620 1st Qu.:1828
## Median : 7406 Median : 5.245 Median :2134
## Mean : 7638 Mean : 5.490 Mean :2304
## 3rd Qu.:10727 3rd Qu.: 7.713 3rd Qu.:2793
## Max. :36019 Max. :28.030 Max. :4900
dailyActivity_merged %>% select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes) %>% summary()
## VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
## Min. : 0.00 Min. : 0.00 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.:127.0 1st Qu.: 729.8
## Median : 4.00 Median : 6.00 Median :199.0 Median :1057.5
## Mean : 21.16 Mean : 13.56 Mean :192.8 Mean : 991.2
## 3rd Qu.: 32.00 3rd Qu.: 19.00 3rd Qu.:264.0 3rd Qu.:1229.5
## Max. :210.00 Max. :143.00 Max. :518.0 Max. :1440.0
The average calories burned per day for this sample group is around 2,300 (mean 2,304). As a general rule, men should burn 2,500 calories a day while women should burn 2,000. Assuming this is an unbiased dataset with roughly equal representation of men and women, an average near 2,300 is plausible, since it sits close to the midpoint of those two figures (2,250).
I wanted to see if there was any connection between calories and the time categorized as active, so I created a new column totaling all the active minutes logged.
dailyActivity_merged$totalActiveMinutes = rowSums(dailyActivity_merged[,c("VeryActiveMinutes","FairlyActiveMinutes","LightlyActiveMinutes")])
Then I created a plot showing the relationship between active minutes and calories.
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
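The plotting code itself is not shown here. A minimal sketch of how such a plot could be produced is below. The rows are made-up stand-ins for `dailyActivity_merged`, and the object names `activity`, `active_vs_calories`, and `r` are illustrative assumptions rather than the original code.

```r
library(ggplot2)

# Hypothetical stand-in rows; the real analysis would use dailyActivity_merged.
activity <- data.frame(
  totalActiveMinutes = c(0, 120, 180, 240, 300, 360, 420),
  Calories           = c(1500, 1800, 2100, 2300, 2600, 2900, 3300)
)

# Scatter plot with a smoothed trend line. Leaving `method` unset lets
# geom_smooth() choose loess for small data sets and report that choice
# in a message when the plot is drawn.
active_vs_calories <- ggplot(activity, aes(x = totalActiveMinutes, y = Calories)) +
  geom_point() +
  geom_smooth() +
  labs(title = "Active minutes vs. calories",
       x = "Total active minutes", y = "Calories")

# Pearson correlation as a quick numeric check on the trend in the plot.
r <- cor(activity$totalActiveMinutes, activity$Calories)
```

Calling `print(active_vs_calories)` draws the plot. A value of `r` close to 1 would indicate a strong positive linear relationship.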
There is a positive correlation between active time and calories burned. One suggestion for Bellabeat is to offer more options for daily tracking. Many apps use steps as their main metric, but a step count includes short trips, like walking to the bathroom and back, that don’t really count as being active. Bellabeat could use ‘active time’ as its metric instead. If ‘active time’ is defined as sustained activity, it would give the app a more accurate measure of how active a user is.
Bellabeat could also add a notification to help users know when they have reached enough active minutes to burn the desired number of calories.
Next, I wanted to see at what times of day FitBit users were logging the highest intensity of activity.
head(hourlyIntensities)
## # A tibble: 6 × 4
## Id ActivityHour TotalIntensity AverageIntensity
## <dbl> <chr> <dbl> <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM 20 0.333
## 2 1503960366 4/12/2016 1:00:00 AM 8 0.133
## 3 1503960366 4/12/2016 2:00:00 AM 7 0.117
## 4 1503960366 4/12/2016 3:00:00 AM 0 0
## 5 1503960366 4/12/2016 4:00:00 AM 0 0
## 6 1503960366 4/12/2016 5:00:00 AM 0 0
I wanted to change the formatting of the ‘ActivityHour’ to date and time so I could aggregate the data by hour.
hourlyIntensities$ActivityHour=as.POSIXct(hourlyIntensities$ActivityHour, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
hourlyIntensities$time <- format(hourlyIntensities$ActivityHour, format = "%H:%M:%S")
hourlyIntensities$date <- format(hourlyIntensities$ActivityHour, format = "%m/%d/%y")
#aggregating intensity by hour
average_hourly_intensity <- aggregate(hourlyIntensities$AverageIntensity, by=list(time_hour=hourlyIntensities$time), mean)
colnames(average_hourly_intensity)<- c('hour_military', 'average_intensity')
head(average_hourly_intensity)
## hour_military average_intensity
## 1 00:00:00 0.035492524
## 2 01:00:00 0.023651328
## 3 02:00:00 0.017399091
## 4 03:00:00 0.007395519
## 5 04:00:00 0.010550802
## 6 05:00:00 0.082510751
From here, I created a plot to show at what times of day FitBit users are most active.
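The code for this plot is not shown. A minimal sketch, assuming a simple bar chart, might look like the following. It uses only the six rows printed by `head(average_hourly_intensity)` above, so it covers midnight to 5 am only. The names `hourly_plot` and `peak_hour` are illustrative assumptions.

```r
library(ggplot2)

# The six rows printed by head(average_hourly_intensity) above; the full
# table would have one row per hour of the day.
average_hourly_intensity <- data.frame(
  hour_military     = c("00:00:00", "01:00:00", "02:00:00",
                        "03:00:00", "04:00:00", "05:00:00"),
  average_intensity = c(0.035492524, 0.023651328, 0.017399091,
                        0.007395519, 0.010550802, 0.082510751)
)

# One bar per hour; labels are rotated so the HH:MM:SS strings stay readable.
hourly_plot <- ggplot(average_hourly_intensity,
                      aes(x = hour_military, y = average_intensity)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Average intensity by hour",
       x = "Hour (24-hour clock)", y = "Average intensity")

# Hour with the highest mean intensity among the rows supplied.
peak_hour <- average_hourly_intensity$hour_military[
  which.max(average_hourly_intensity$average_intensity)]
```

With the full 24-row table in place of the six rows, `peak_hour` would give the single busiest hour and `print(hourly_plot)` would show the whole day.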
As we can see, activity intensity peaks between about 5:00 pm and 7:00 pm, most likely when users are getting off work. For a working adult, many factors affect how time is spent during the day, such as jobs, families, and commutes. Bellabeat could help users make health and fitness a higher priority by offering workouts of different lengths, quick tips for incorporating movement, or even fast and easy recipes.
For this analysis, I thought there would be some connection between time asleep and intensity of activity. I aggregated the data based on users for both sleep and average intensity, merged the data, and converted minutes asleep into hours.
#aggregating sleep time based on user id
userr_avg_time_asleep <- aggregate(sleepDay$TotalMinutesAsleep, by=list(user=sleepDay$Id), mean)
#filtering out hours when average intensity was low due to sleep (see graphic above)
filtered_avg_int <- hourlyIntensities %>% filter(time != '00:00:00', time != '01:00:00', time != '02:00:00', time != '03:00:00', time != '04:00:00')
#aggregating intensity based on user id
user_avg_intensity <- aggregate(filtered_avg_int$AverageIntensity, by=list(Avg_Int=filtered_avg_int$Id),mean)
colnames(userr_avg_time_asleep)<-c('user_ID', 'avg_mins_asleep')
colnames(user_avg_intensity)<-c('user_ID','avg_active_intensity')
#merging sleep and intensity data to view correlation
merged_sleep_intensity <- merge(userr_avg_time_asleep, user_avg_intensity, by=('user_ID'))
merged_sleep_intensity$avg_hours_asleep <- (merged_sleep_intensity$avg_mins_asleep)/60
head(merged_sleep_intensity)
## user_ID avg_mins_asleep avg_active_intensity avg_hours_asleep
## 1 1503960366 360.2800 0.32851266 6.004667
## 2 1644430081 294.0000 0.21323179 4.900000
## 3 1844505072 652.0000 0.10164932 10.866667
## 4 1927972279 417.0000 0.03869766 6.950000
## 5 2026352035 506.1786 0.22722317 8.436310
## 6 2320127002 61.0000 0.16893679 1.016667
Surprisingly, I did not find a strong correlation between the intensity of activity and time spent sleeping. Note, however, that only 24 participants provided sleep data, so the sample is small.
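One way to put a number on that relationship is a Pearson correlation test. The sketch below is only an illustration: it uses just the six rows printed by `head(merged_sleep_intensity)` above, not the full table, and the name `sleep_cor` is an assumption.

```r
# The six rows printed by head(merged_sleep_intensity) above; the full
# table has one row per user who logged sleep.
merged_sleep_intensity <- data.frame(
  avg_hours_asleep     = c(6.004667, 4.900000, 10.866667,
                           6.950000, 8.436310, 1.016667),
  avg_active_intensity = c(0.32851266, 0.21323179, 0.10164932,
                           0.03869766, 0.22722317, 0.16893679)
)

# Pearson correlation with a significance test (base R, stats package).
sleep_cor <- cor.test(merged_sleep_intensity$avg_hours_asleep,
                      merged_sleep_intensity$avg_active_intensity)

sleep_cor$estimate  # correlation coefficient, always between -1 and 1
sleep_cor$p.value   # p-value for the null hypothesis of zero correlation
```

A coefficient near 0 together with a large p-value would be consistent with the lack of a strong correlation described above. With so few users, the test has little power either way.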
Credit to JULEN ARANGUREN for this next chunk of code. I still wanted to explore how sleep might be affected by activity, so I took inspiration from JULEN and categorized each user by the type of activity they logged the most. I merged the sleep data with the daily activity data so I could make a plot.
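The merge that builds `daily_sleep_activity` is not shown. A minimal sketch of one way to do it is below, using made-up rows. It assumes the date columns are named `ActivityDate` and `SleepDay` as in the Kaggle files, and it adds a helper column called `date` purely for illustration.

```r
# Hypothetical stand-in rows; the real tables come from the CSV files.
dailyActivity_merged <- data.frame(
  Id               = c(1503960366, 1503960366, 1644430081),
  ActivityDate     = c("4/12/2016", "4/13/2016", "4/12/2016"),
  SedentaryMinutes = c(728, 776, 1100)
)
sleepDay <- data.frame(
  Id                 = c(1503960366, 1503960366),
  SleepDay           = c("4/12/2016 12:00:00 AM", "4/13/2016 12:00:00 AM"),
  TotalMinutesAsleep = c(327, 384)
)

# Parse both date columns to Date so the two tables share a join key.
# as.Date() ignores the trailing time-of-day text in SleepDay.
dailyActivity_merged$date <- as.Date(dailyActivity_merged$ActivityDate,
                                     format = "%m/%d/%Y")
sleepDay$date <- as.Date(sleepDay$SleepDay, format = "%m/%d/%Y")

# Inner join: keeps only the user-days present in both tables.
daily_sleep_activity <- merge(dailyActivity_merged, sleepDay,
                              by = c("Id", "date"))
```

Because this is an inner join, days on which a user logged activity but no sleep drop out of the result.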
#classifying each logged day by comparing its minutes against the overall means
data_by_usertype <- daily_sleep_activity %>%
  mutate(
    user_type = factor(case_when(
      SedentaryMinutes > mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Sedentary",
      SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes > mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Lightly Active",
      SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes > mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Fairly Active",
      SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes > mean(VeryActiveMinutes) ~ "Very Active"
    ), levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"))
  ) %>%
  #keeping the user id next to the classification (the original call passed .group=Id, which only created a column named .group)
  select(Id, user_type, TotalMinutesAsleep) %>%
  #rows that match none of the four patterns get NA and are dropped here
  drop_na()
Now I can see how sleep times are distributed by user type.
There is a larger number of users that fall under the ‘Sedentary’ and ‘Lightly Active’ user types.
When the sleep data is added, we can see that there are more outliers for ‘Sedentary’ and ‘Lightly Active’ users. This may mean that users of these types are less likely to fall within the recommended hours of sleep.
According to the National Sleep Foundation, it is recommended that adults between the ages of 18 and 64 get seven to nine hours of sleep. I indicated this range with red horizontal lines. To help their users, Bellabeat can emphasize the effects a fairly or very active lifestyle can have on a user’s sleep.
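The plot code is not shown. A minimal sketch of a box plot with the recommended range marked by red horizontal lines is below. The rows are made up and stand in for `data_by_usertype`. Converting to hours, and the names `sleep_by_type`, `sleep_plot`, and `share_in_range`, are assumptions for illustration.

```r
library(ggplot2)

# Hypothetical stand-in rows for data_by_usertype; not real measurements.
sleep_by_type <- data.frame(
  user_type = factor(
    c("Sedentary", "Sedentary", "Lightly Active", "Lightly Active",
      "Fairly Active", "Fairly Active", "Very Active", "Very Active"),
    levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active")),
  TotalMinutesAsleep = c(300, 560, 340, 520, 430, 470, 420, 480)
)
sleep_by_type$hours_asleep <- sleep_by_type$TotalMinutesAsleep / 60

# Recommended range for adults, as cited in the text: 7 to 9 hours.
recommended_hours <- c(7, 9)

# Box plot per user type, with one red line at each end of the range.
sleep_plot <- ggplot(sleep_by_type, aes(x = user_type, y = hours_asleep)) +
  geom_boxplot() +
  geom_hline(yintercept = recommended_hours, colour = "red") +
  labs(title = "Sleep by user type", x = "User type", y = "Hours asleep")

# Share of logged nights inside the recommended range, per user type.
in_range <- sleep_by_type$hours_asleep >= recommended_hours[1] &
  sleep_by_type$hours_asleep <= recommended_hours[2]
share_in_range <- tapply(in_range, sleep_by_type$user_type, mean)
```

The `share_in_range` vector gives a numeric companion to the plot: for each user type, the fraction of nights that fall between the two red lines.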
If Bellabeat uses ‘active time’ as a metric for their users to track, this can connect to the sleep data and help users more clearly visualize their behavior.
We already know that Bellabeat has given women knowledge about their own habits and health by gathering data on activity, sleep, stress, and reproductive health. Bellabeat has rapidly expanded since its founding in 2013 and established itself as a tech-driven health firm for women.
The hourly intensity data, with its peak between 5:00 pm and 7:00 pm, is consistent with women who work full-time spending much of the day at the computer, in meetings, or otherwise concentrated on their work.
According to the user type analysis, many of these women engage in little to no exercise, even though increasing their daily activity would promote their health. They may need motivation to keep going, or guidance on forming healthy habits.
Being active can mean different things to different people. Not everyone has time for two hours in a gym or a long ten-mile run. With Bellabeat, doing what works for you and reaching those goals can be easier than ever. Get recommendations for activities, meals, and healthy habits that fit your lifestyle.
Bellabeat can use ‘active time’ as their metric to track
Bellabeat can add a notification to help users know when they have reached enough active minutes to burn the desired number of calories
Bellabeat can include a library for different lengths of workouts, quick tips for incorporating movement, or even fast and easy recipes
Bellabeat can more clearly connect sleep data to the ‘active time’ metric to show progress over time