About the Company

Bellabeat is a high-tech company that manufactures health-focused smart products designed to improve women’s overall health. The company’s founder, Urška Sršen, believes that user data can guide the company’s marketing strategy and open up more opportunities for growth. To learn more, you can visit their website: https://bellabeat.com/

Business Task

Use public data about daily smart device usage to find trends that can guide recommendations for Bellabeat.

My analysis will focus on modifications to the mobile application, the Bellabeat app.

Data Sources

The data I used comes from a public dataset on Kaggle. It was provided by volunteers who logged their FitBit data. I downloaded all eighteen CSV files. After converting the files to Google Sheets and previewing the data, I decided to analyze daily activity, hourly intensities, and sleep. Here is the dataset: https://www.kaggle.com/datasets/arashnic/fitbit, with credit to MÖBIUS for uploading the data.

Importing Libraries

I decided to use R to analyze the data so I could clean, manipulate, and create visualizations all in one spot.

library(tidyverse)
library(lubridate)
library(dplyr)
library(ggplot2)
library(readr)

Importing the Data

dailyActivity_merged <- read_csv("FitbitData/dailyActivity_merged.csv")
hourlyIntensities <- read_csv("FitbitData/hourlyIntensities_merged.csv")
sleepDay <- read_csv("FitbitData/sleepDay_merged.csv")

Data Cleaning, Manipulation, and Exploration

Daily Activity

My first step was to explore the daily activity data for relationships and summary statistics that might be useful.

head(dailyActivity_merged)
## # A tibble: 6 × 15
##        Id ActivityDate TotalSteps TotalDistance TrackerDistance LoggedActivitie…
##     <dbl> <chr>             <dbl>         <dbl>           <dbl>            <dbl>
## 1  1.50e9 4/12/2016         13162          8.5             8.5                 0
## 2  1.50e9 4/13/2016         10735          6.97            6.97                0
## 3  1.50e9 4/14/2016         10460          6.74            6.74                0
## 4  1.50e9 4/15/2016          9762          6.28            6.28                0
## 5  1.50e9 4/16/2016         12669          8.16            8.16                0
## 6  1.50e9 4/17/2016          9705          6.48            6.48                0
## # … with 9 more variables: VeryActiveDistance <dbl>,
## #   ModeratelyActiveDistance <dbl>, LightActiveDistance <dbl>,
## #   SedentaryActiveDistance <dbl>, VeryActiveMinutes <dbl>,
## #   FairlyActiveMinutes <dbl>, LightlyActiveMinutes <dbl>,
## #   SedentaryMinutes <dbl>, Calories <dbl>
dailyActivity_merged %>% select(TotalSteps, TotalDistance, Calories) %>% summary()
##    TotalSteps    TotalDistance       Calories   
##  Min.   :    0   Min.   : 0.000   Min.   :   0  
##  1st Qu.: 3790   1st Qu.: 2.620   1st Qu.:1828  
##  Median : 7406   Median : 5.245   Median :2134  
##  Mean   : 7638   Mean   : 5.490   Mean   :2304  
##  3rd Qu.:10727   3rd Qu.: 7.713   3rd Qu.:2793  
##  Max.   :36019   Max.   :28.030   Max.   :4900
dailyActivity_merged %>% select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes) %>% summary()
##  VeryActiveMinutes FairlyActiveMinutes LightlyActiveMinutes SedentaryMinutes
##  Min.   :  0.00    Min.   :  0.00      Min.   :  0.0        Min.   :   0.0  
##  1st Qu.:  0.00    1st Qu.:  0.00      1st Qu.:127.0        1st Qu.: 729.8  
##  Median :  4.00    Median :  6.00      Median :199.0        Median :1057.5  
##  Mean   : 21.16    Mean   : 13.56      Mean   :192.8        Mean   : 991.2  
##  3rd Qu.: 32.00    3rd Qu.: 19.00      3rd Qu.:264.0        3rd Qu.:1229.5  
##  Max.   :210.00    Max.   :143.00      Max.   :518.0        Max.   :1440.0

The average calories burned for this sample group is around 2,300 per day. As a general rule of thumb, men burn about 2,500 calories a day and women about 2,000. An even mix of the two would average 2,250, so a mean of 2,300 is plausible for a sample with roughly equal representation, although the daily activity table has no gender column to confirm this.

I wanted to see whether total active time was related to calories burned, so I created a new column totaling all the active minutes logged.

dailyActivity_merged$totalActiveMinutes = rowSums(dailyActivity_merged[,c("VeryActiveMinutes","FairlyActiveMinutes","LightlyActiveMinutes")])

Then I created a plot showing the relationship between active minutes and calories.

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
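
A plot like this could be produced with ggplot2 along the following lines. This is a minimal sketch rather than the original plotting code: the `activity_sample` data frame holds made-up values purely so the snippet runs on its own, and the title and axis labels are my own assumptions. With the real data, `dailyActivity_merged` would take the place of `activity_sample`.

```r
library(ggplot2)

# Made-up rows standing in for dailyActivity_merged (illustration only)
activity_sample <- data.frame(totalActiveMinutes = seq(0, 480, by = 20))
activity_sample$Calories <- 1500 + 4 * activity_sample$totalActiveMinutes +
  rep(c(-100, 50, 120, -60, 0), times = 5)

# Scatter plot with a smoothed trend line; for fewer than 1,000 rows
# geom_smooth() falls back to the loess method named in the message above
active_vs_calories <- ggplot(activity_sample,
                             aes(x = totalActiveMinutes, y = Calories)) +
  geom_point(alpha = 0.6) +
  geom_smooth() +
  labs(title = "Active minutes vs. calories burned",
       x = "Total active minutes per day",
       y = "Calories")

print(active_vs_calories)
```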

There is a positive correlation between active time and calories burned. One suggestion for Bellabeat is to add more options for daily tracking. Many apps use steps as their main metric, but a step count includes short walks, such as to the bathroom and back, that are not really exercise. Bellabeat could use ‘active time’ as its metric instead. If ‘active time’ is defined as sustained activity, it would give users a more accurate measure of how active they are.
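
The strength of this relationship can also be put into a single number with a correlation coefficient. The sketch below uses base R's `cor()`. `correlation_between` is a hypothetical helper name and `activity_sample` contains made-up values for illustration only, so the printed figure says nothing about the real dataset.

```r
# Pearson correlation between two named columns of a data frame.
# `correlation_between` is a hypothetical helper, not part of the original analysis.
correlation_between <- function(df, x_col, y_col) {
  cor(df[[x_col]], df[[y_col]], use = "complete.obs")
}

# Made-up rows standing in for dailyActivity_merged (illustration only)
activity_sample <- data.frame(totalActiveMinutes = seq(0, 480, by = 20))
activity_sample$Calories <- 1500 + 4 * activity_sample$totalActiveMinutes +
  rep(c(-100, 50, 120, -60, 0), times = 5)

r <- correlation_between(activity_sample, "totalActiveMinutes", "Calories")
print(r)
```

On the real data the call would be `correlation_between(dailyActivity_merged, "totalActiveMinutes", "Calories")`. A value close to 1 indicates a strong positive linear relationship.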

Bellabeat could also add a notification to help users know when they have reached enough active minutes to burn the desired number of calories.

Hourly Intensities

I wanted to analyze and see what times of day FitBit users were logging the highest intensity of activity.

head(hourlyIntensities)
## # A tibble: 6 × 4
##           Id ActivityHour          TotalIntensity AverageIntensity
##        <dbl> <chr>                          <dbl>            <dbl>
## 1 1503960366 4/12/2016 12:00:00 AM             20            0.333
## 2 1503960366 4/12/2016 1:00:00 AM               8            0.133
## 3 1503960366 4/12/2016 2:00:00 AM               7            0.117
## 4 1503960366 4/12/2016 3:00:00 AM               0            0    
## 5 1503960366 4/12/2016 4:00:00 AM               0            0    
## 6 1503960366 4/12/2016 5:00:00 AM               0            0

I wanted to change the formatting of the ‘ActivityHour’ to date and time so I could aggregate the data by hour.

hourlyIntensities$ActivityHour=as.POSIXct(hourlyIntensities$ActivityHour, format="%m/%d/%Y %I:%M:%S %p", tz=Sys.timezone())
hourlyIntensities$time <- format(hourlyIntensities$ActivityHour, format = "%H:%M:%S")
hourlyIntensities$date <- format(hourlyIntensities$ActivityHour, format = "%m/%d/%y")
#aggregating intensity by hour
average_hourly_intensity <- aggregate(hourlyIntensities$AverageIntensity, by=list(time_hour=hourlyIntensities$time), mean)
colnames(average_hourly_intensity)<- c('hour_military', 'average_intensity')
head(average_hourly_intensity)
##   hour_military average_intensity
## 1      00:00:00       0.035492524
## 2      01:00:00       0.023651328
## 3      02:00:00       0.017399091
## 4      03:00:00       0.007395519
## 5      04:00:00       0.010550802
## 6      05:00:00       0.082510751

From here, I created a plot to show what times of day FitBit users are most active.

As we can see, activity intensity peaks between about 5:00 pm and 7:00 pm, most likely when users are getting off work. For working adults, many factors affect how time is spent during the day, such as jobs, families, and commutes. Bellabeat could help users make health and fitness a higher priority by offering suggestions for workouts of different lengths, quick tips for incorporating movement, or even fast and easy recipes.
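
A bar chart is one way to draw this. The sketch below is an assumption about how the plot could be built, not the original code. To keep it self-contained, `hourly_sample` repeats only the six rows printed above, so it covers midnight to 5:00 am. On the full table, `average_hourly_intensity` would be passed to `ggplot()` instead.

```r
library(ggplot2)

# The six rows of average_hourly_intensity printed above;
# the full table has one row per hour of the day
hourly_sample <- data.frame(
  hour_military = c("00:00:00", "01:00:00", "02:00:00",
                    "03:00:00", "04:00:00", "05:00:00"),
  average_intensity = c(0.035492524, 0.023651328, 0.017399091,
                        0.007395519, 0.010550802, 0.082510751)
)

# One bar per hour; x-axis labels are rotated so the times stay readable
hourly_plot <- ggplot(hourly_sample,
                      aes(x = hour_military, y = average_intensity)) +
  geom_col(fill = "steelblue") +
  labs(title = "Average intensity by hour of day",
       x = "Hour (24-hour clock)",
       y = "Average intensity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5))

print(hourly_plot)
```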

Sleep Versus Intensity

For this analysis, I thought there would be some connection between time asleep and intensity of activity. I aggregated the data based on users for both sleep and average intensity, merged the data, and converted minutes asleep into hours.

# aggregating sleep time based on user id
user_avg_time_asleep <- aggregate(sleepDay$TotalMinutesAsleep, by = list(user = sleepDay$Id), mean)
# filtering out hours when average intensity was low due to sleep (see graphic above)
night_hours <- c('00:00:00', '01:00:00', '02:00:00', '03:00:00', '04:00:00')
filtered_avg_int <- hourlyIntensities %>% filter(!time %in% night_hours)
# aggregating intensity based on user id
user_avg_intensity <- aggregate(filtered_avg_int$AverageIntensity, by = list(Avg_Int = filtered_avg_int$Id), mean)
colnames(user_avg_time_asleep) <- c('user_ID', 'avg_mins_asleep')
colnames(user_avg_intensity) <- c('user_ID', 'avg_active_intensity')
# merging sleep and intensity data to view correlation
merged_sleep_intensity <- merge(user_avg_time_asleep, user_avg_intensity, by = 'user_ID')
merged_sleep_intensity$avg_hours_asleep <- merged_sleep_intensity$avg_mins_asleep / 60
head(merged_sleep_intensity)
##      user_ID avg_mins_asleep avg_active_intensity avg_hours_asleep
## 1 1503960366        360.2800           0.32851266         6.004667
## 2 1644430081        294.0000           0.21323179         4.900000
## 3 1844505072        652.0000           0.10164932        10.866667
## 4 1927972279        417.0000           0.03869766         6.950000
## 5 2026352035        506.1786           0.22722317         8.436310
## 6 2320127002         61.0000           0.16893679         1.016667

Surprisingly, I did not find a strong correlation between the intensity of activity and time spent sleeping. Keep in mind that only 24 participants provided sleep data, so this is a small sample.
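
One way to check a relationship like this numerically is `cor()`. The sketch below uses only the six rows printed above so that it runs on its own, which is far too few to be representative. On the full table the same call would use the columns of `merged_sleep_intensity`. `sleep_intensity_sample` and `sleep_intensity_cor` are names introduced here for illustration.

```r
# The six rows of merged_sleep_intensity printed above (not the full table)
sleep_intensity_sample <- data.frame(
  avg_active_intensity = c(0.32851266, 0.21323179, 0.10164932,
                           0.03869766, 0.22722317, 0.16893679),
  avg_hours_asleep = c(6.004667, 4.900000, 10.866667,
                       6.950000, 8.436310, 1.016667)
)

# Pearson correlation; values near 0 mean little linear relationship
sleep_intensity_cor <- cor(sleep_intensity_sample$avg_hours_asleep,
                           sleep_intensity_sample$avg_active_intensity)
print(sleep_intensity_cor)
```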

Sleep Versus User Type

Credit to JULEN ARANGUREN for this next chunk of code, which comes from his analysis of the same data. I still wanted to explore how sleep might be affected, so I took inspiration from JULEN and categorized users by the type of activity they logged the most. I merged sleep activity with daily activity so I could make a plot.
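
The merge itself could look something like the sketch below. It is not the original code, and it rests on two assumptions: that the sleep table has a `SleepDay` column holding strings like `"4/12/2016 12:00:00 AM"`, and that joining on user and calendar date is the intended key. The two small data frames are made up so the snippet runs on its own, and `daily_sleep_activity_sketch` is a name introduced here.

```r
library(dplyr)

# Made-up stand-ins for dailyActivity_merged and sleepDay (illustration only)
activity_sample <- data.frame(
  Id = c(1, 1, 2),
  ActivityDate = c("4/12/2016", "4/13/2016", "4/12/2016"),
  SedentaryMinutes = c(700, 800, 1200),
  stringsAsFactors = FALSE
)
sleep_sample <- data.frame(
  Id = c(1, 2),
  SleepDay = c("4/12/2016 12:00:00 AM", "4/12/2016 12:00:00 AM"),  # assumed column name and format
  TotalMinutesAsleep = c(420, 380),
  stringsAsFactors = FALSE
)

# Convert both date columns to Date so they can serve as a shared join key;
# the time-of-day text after the date in SleepDay is ignored by the format
activity_keyed <- activity_sample %>%
  mutate(date = as.Date(ActivityDate, format = "%m/%d/%Y"))
sleep_keyed <- sleep_sample %>%
  mutate(date = as.Date(SleepDay, format = "%m/%d/%Y"))

# Keep only the user-days that appear in both tables
daily_sleep_activity_sketch <- inner_join(activity_keyed, sleep_keyed,
                                          by = c("Id", "date"))
print(daily_sleep_activity_sketch)
```

An inner join drops days on which a user logged activity but no sleep, which suits a plot that needs both measurements for every point.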

# Label each daily record by the one activity level that is above the sample mean
# while the other three are below it; records matching no rule get NA and are dropped
data_by_usertype <- daily_sleep_activity %>%
  mutate(user_type = factor(case_when(
    SedentaryMinutes > mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Sedentary",
    SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes > mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Lightly Active",
    SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes > mean(FairlyActiveMinutes) & VeryActiveMinutes < mean(VeryActiveMinutes) ~ "Fairly Active",
    SedentaryMinutes < mean(SedentaryMinutes) & LightlyActiveMinutes < mean(LightlyActiveMinutes) & FairlyActiveMinutes < mean(FairlyActiveMinutes) & VeryActiveMinutes > mean(VeryActiveMinutes) ~ "Very Active"
  ), levels = c("Sedentary", "Lightly Active", "Fairly Active", "Very Active"))) %>%
  select(Id, user_type, TotalMinutesAsleep) %>%
  drop_na()

Now I can see how sleep times are distributed by user type.

More users fall under the ‘Sedentary’ and ‘Lightly Active’ user types than under the other two.

When the sleep data is added, we can see that there are more outliers for ‘Sedentary’ and ‘Lightly Active’ users. This suggests that users in these groups are less likely to fall within the recommended hours of sleep.

According to the National Sleep Foundation, it is recommended that adults between the ages of 18 and 64 get seven to nine hours of sleep. I indicated this range with red horizontal lines. To help its users, Bellabeat can emphasize the effects a fairly or highly active lifestyle can have on a user’s sleep.
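
A plot of this kind could be drawn as a box plot with two reference lines. The sketch below is an assumption about the chart type and styling, not the original code. `usertype_sample` holds made-up values so the snippet runs on its own, and with the real data `data_by_usertype` would take its place.

```r
library(ggplot2)

# Made-up rows standing in for data_by_usertype (illustration only)
activity_levels <- c("Sedentary", "Lightly Active", "Fairly Active", "Very Active")
usertype_sample <- data.frame(
  user_type = factor(rep(activity_levels, each = 4), levels = activity_levels),
  TotalMinutesAsleep = c(300, 410, 520, 650,
                         350, 430, 480, 600,
                         400, 440, 470, 500,
                         410, 450, 480, 510)
)

# Box plot of hours asleep per user type, with red lines marking
# the recommended range of seven to nine hours
sleep_by_usertype_plot <- ggplot(usertype_sample,
                                 aes(x = user_type, y = TotalMinutesAsleep / 60)) +
  geom_boxplot() +
  geom_hline(yintercept = c(7, 9), colour = "red") +
  labs(title = "Sleep by user type",
       x = "User type",
       y = "Hours asleep")

print(sleep_by_usertype_plot)
```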

If Bellabeat uses ‘active time’ as a metric for their users to track, this can connect to the sleep data and help users more clearly visualize their behavior.

Summary

We already know that Bellabeat has given women knowledge about their own habits and health by gathering data on activity, sleep, stress, and reproductive health. Bellabeat has rapidly expanded since its founding in 2013 and established itself as a tech-driven health firm for women.

Target Audience

The hourly intensity data suggests a target audience of women who work full-time and spend much of the day at a computer, in meetings, or otherwise focused on their work.

The user type analysis suggests that many of these women get little to no exercise, even though they need to increase their daily activity to support their health. They may need motivation to keep going or guidance on forming healthy habits.

Message for Bellabeat marketing strategy

Being active can mean different things to different people. Not everyone has time for two hours in a gym or a long ten-mile run. With Bellabeat, doing what works for you and reaching those goals can be easier than ever. Get recommendations for activities, meals, and healthy habits that fit your lifestyle.

Recommendations for the Bellabeat app

  1. Bellabeat can use ‘active time’ as their metric to track

  2. Bellabeat can add a notification to help users know when they have reached enough active minutes to burn the desired number of calories

  3. Bellabeat can include a library for different lengths of workouts, quick tips for incorporating movement, or even fast and easy recipes

  4. Bellabeat can more clearly connect sleep data to the ‘active time’ metric to show progress over time

This was my first project running with R. I hope you enjoyed it!