Scenario

Bellabeat is a high-tech company, founded by Urska Srsen and Sando Mur, that manufactures health-focused smart products. These products inform and inspire women by collecting data on activity, sleep, stress, and reproductive health. Since its founding in 2013, Bellabeat has grown rapidly and positioned itself as a tech-driven wellness company for women.

The company has five main products in its lineup:

  1. Bellabeat App: Provides users with comprehensive health data and connects to Bellabeat's line of smart wellness products.

  2. Leaf: Basic wellness tracker that can be worn as a bracelet, necklace, or clip. The Leaf connects to the Bellabeat app to track activity, sleep, and stress.

  3. Time: Wellness watch that connects to the Bellabeat app to track activity, sleep, and stress.

  4. Spring: Water bottle that tracks daily water intake. Spring connects to the Bellabeat app to track hydration levels.

  5. Bellabeat Membership: Subscription-based membership that gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health, beauty, and mindfulness based on lifestyle and goals.

Ask

The task is to analyze smart device usage data to gain insight into how consumers use non-Bellabeat smart devices. These insights will be used to answer important questions about Bellabeat users.

Prepare

The data for this analysis was collected through Fitbit. Thirty eligible Fitbit users consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring. Because the data was submitted voluntarily, sampling bias is likely. Depending on the population being studied, a sample size of 30 may not be sufficient to test a hypothesis. Furthermore, the credibility of the data is described below:

The data is published and can be found here.

The following data sets were used in the study:

To continue the analysis, the necessary packages need to be installed and loaded, followed by the needed data sets.
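The setup step could look roughly like the sketch below. It is not the code used for this report: the package list is an assumption (the later code uses `%>%`, `rename()`, `gather()`, `glimpse()`, and `ggplot()`, all of which ship with the tidyverse), and the file names are hypothetical placeholders. The sketch uses base R's `read.csv()` so that it runs without extra packages.

```r
# Assumption: the tidyverse covers every function used later in the report.
required_packages <- c("tidyverse")

# Return the subset of packages that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required_packages)
# Uncomment to install and load:
# if (length(to_install) > 0) install.packages(to_install)
# library(tidyverse)

# Hypothetical file names; replace them with the paths of the actual CSV files.
data_files <- c(
  activity_daily  = "activity_daily.csv",
  sleep_day       = "sleep_day.csv",
  calories_hourly = "calories_hourly.csv",
  steps_hourly    = "steps_hourly.csv"
)

# Read each file only if it exists, so the sketch runs even without the data.
datasets <- lapply(data_files, function(path) {
  if (file.exists(path)) read.csv(path, stringsAsFactors = FALSE) else NULL
})
```

Each element of `datasets` is either a data frame or `NULL` when the file was not found, which makes a missing download easy to spot before the analysis starts.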

Process

To begin, it is beneficial to have an idea of how the tables are formatted.

# A glimpse into the data.
glimpse(activity_daily)
## Rows: 940
## Columns: 8
## $ Id                   <dbl> 1503960366, 1624580081, 1844505072, 1927972279, 2…
## $ ActivityDate         <chr> "5/12/2016", "5/12/2016", "5/12/2016", "5/12/2016…
## $ TotalSteps           <dbl> 0, 2971, 0, 0, 9117, 8891, 2661, 7566, 590, 17, 3…
## $ VeryActiveMinutes    <dbl> 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 8, 0, …
## $ FairlyActiveMinutes  <dbl> 0, 0, 0, 0, 16, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 8, …
## $ LightlyActiveMinutes <dbl> 0, 107, 0, 0, 236, 343, 128, 268, 21, 2, 108, 58,…
## $ SedentaryMinutes     <dbl> 1440, 890, 711, 966, 728, 330, 830, 720, 721, 0, …
## $ Calories             <dbl> 0, 1002, 665, 1383, 1853, 1364, 1125, 1431, 1120,…
glimpse(sleep_day)
## Rows: 413
## Columns: 4
## $ Id                 <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150…
## $ SleepDay           <chr> "4/12/2016", "4/13/2016", "4/15/2016", "4/16/2016",…
## $ TotalMinutesAsleep <dbl> 327, 384, 412, 340, 700, 304, 360, 325, 361, 430, 2…
## $ TotalTimeInBed     <dbl> 346, 407, 442, 367, 712, 320, 377, 364, 384, 449, 3…
glimpse(calories_hourly)
## Rows: 22,099
## Columns: 4
## $ Id           <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate <chr> "4/12/2016", "4/12/2016", "4/12/2016", "4/12/2016", "4/12…
## $ ActivityHour <time> 00:00:00, 01:00:00, 02:00:00, 03:00:00, 04:00:00, 05:00:…
## $ Calories     <dbl> 81, 61, 59, 47, 48, 48, 48, 47, 68, 141, 99, 76, 73, 66, …
glimpse(steps_hourly)
## Rows: 22,099
## Columns: 4
## $ Id           <dbl> 1503960366, 1503960366, 1503960366, 1503960366, 150396036…
## $ ActivityDate <chr> "4/12/2016", "4/12/2016", "4/12/2016", "4/12/2016", "4/12…
## $ ActivityHour <time> 00:00:00, 01:00:00, 02:00:00, 03:00:00, 04:00:00, 05:00:…
## $ StepTotal    <dbl> 373, 160, 151, 0, 0, 0, 0, 0, 250, 1864, 676, 360, 253, 2…

Adjusting format of tables

With an understanding of the structure, the next step is to remove any duplicates. This was completed using Excel: three duplicates were found and removed in the Heartrate data. Redundant columns were also dropped from the following data sets:

  • Daily Activity
    • Total Distance, Tracker Distance, Logged Activities, Very Active Distance, Moderately Active Distance, Light Active Distance
  • Daily Sleep
    • Total recorded sleeps
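The duplicate removal and column trimming were done in Excel, but the same steps can be reproduced in R so that the cleaning is scripted. The sketch below uses small synthetic tables, and the raw column names (`Time`, `Value`, `TotalSleepRecords`) are assumptions about the original files, not values confirmed by this report.

```r
# Synthetic stand-in for the heart rate table (column names are assumptions).
heartrate <- data.frame(
  Id    = c("A", "A", "A", "B"),
  Time  = c("4/12/2016 7:21:00", "4/12/2016 7:21:00",
            "4/12/2016 7:21:05", "4/12/2016 7:21:00"),
  Value = c(97, 97, 102, 85),
  stringsAsFactors = FALSE
)

# Count fully identical rows, then keep only the first occurrence of each.
n_duplicates     <- sum(duplicated(heartrate))
heartrate_unique <- heartrate[!duplicated(heartrate), ]

# Synthetic stand-in for the daily sleep table before trimming.
sleep_raw <- data.frame(
  Id                 = "A",
  SleepDay           = "4/12/2016",
  TotalSleepRecords  = 1,
  TotalMinutesAsleep = 327,
  TotalTimeInBed     = 346,
  stringsAsFactors   = FALSE
)

# Drop redundant columns by name.
drop_cols     <- c("TotalSleepRecords")
sleep_trimmed <- sleep_raw[, !(names(sleep_raw) %in% drop_cols), drop = FALSE]
```

Doing this in code rather than in a spreadsheet leaves a record of exactly which rows and columns were removed.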

To continue, the column names and values need to be addressed.

# Renaming columns that will be used for analysis
activity_daily <- activity_daily %>%
  rename(
    participant = Id,
    date = ActivityDate,
    steps = TotalSteps,
    very_active_minutes = VeryActiveMinutes,
    moderate_active_minutes = FairlyActiveMinutes,
    light_active_minutes = LightlyActiveMinutes,
    sed_active_minutes = SedentaryMinutes,
    calories = Calories
  )

sleep_day <- sleep_day %>%
  rename(
    participant = Id,
    date = SleepDay,
    sleep_minutes = TotalMinutesAsleep,
    time_in_bed = TotalTimeInBed
  )

calories_hourly <- calories_hourly %>%
  rename(
    participant = Id,
    date = ActivityDate,
    hour = ActivityHour,
    calories = Calories
  )

steps_hourly <- steps_hourly %>%
  rename(
    participant = Id,
    date = ActivityDate,
    hour = ActivityHour,
    steps = StepTotal
  )

# Change Date columns so that the values are recognized as dates.
activity_daily$date <- as.Date(activity_daily$date, format = "%m/%d/%Y")
sleep_day$date <- as.Date(sleep_day$date, format = "%m/%d/%Y")
calories_hourly$date <- as.Date(calories_hourly$date, format = "%m/%d/%Y")
steps_hourly$date <- as.Date(steps_hourly$date, format = "%m/%d/%Y")

# Change ID column to characters so they are representative of individuals as opposed to values.
activity_daily$participant <- as.character(activity_daily$participant)
sleep_day$participant <- as.character(sleep_day$participant)
calories_hourly$participant <- as.character(calories_hourly$participant)
steps_hourly$participant <- as.character(steps_hourly$participant)

# Merge daily sleep with daily activity, and hourly calories with hourly steps, to gain insight into their relationships.
activity_sleep_daily <- merge(sleep_day, activity_daily, by = c("participant", "date"))
calories_step_hourly <- merge(calories_hourly, steps_hourly, by = c("participant", "date", "hour"))


# Make Daily Activity table long. This will help dive deeper into the different activity levels.
activity_sleep_daily_long <- activity_sleep_daily %>%
  gather(activity_level, value, very_active_minutes:sed_active_minutes)
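Two things in the steps above can fail silently: `as.Date()` returns `NA` for strings that do not match the format, and `merge()` performs an inner join by default, so days without a match in both tables are dropped. A minimal sketch of how both could be checked, using synthetic rows in place of the real tables and assuming each participant and date pair appears at most once per table:

```r
# Synthetic stand-ins for the renamed daily tables.
sleep <- data.frame(
  participant   = c("1", "1", "2"),
  date          = c("4/12/2016", "4/13/2016", "4/12/2016"),
  sleep_minutes = c(327, 384, 412),
  stringsAsFactors = FALSE
)
activity <- data.frame(
  participant = c("1", "1", "2", "3"),
  date        = c("4/12/2016", "4/13/2016", "4/14/2016", "4/12/2016"),
  steps       = c(13162, 10735, 0, 4000),
  stringsAsFactors = FALSE
)

# Parse dates and confirm that nothing failed to parse.
sleep$date    <- as.Date(sleep$date, format = "%m/%d/%Y")
activity$date <- as.Date(activity$date, format = "%m/%d/%Y")
stopifnot(!anyNA(sleep$date), !anyNA(activity$date))

# Inner join: only participant/date pairs present in both tables survive.
merged <- merge(sleep, activity, by = c("participant", "date"))

# Rows from each table that found no partner in the other.
unmatched_sleep    <- nrow(sleep) - nrow(merged)
unmatched_activity <- nrow(activity) - nrow(merged)
```

If the unmatched counts are large, conclusions drawn from the merged table apply only to the days on which both sleep and activity were recorded.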

Analyze

Begin the analysis by calculating summary statistics (the five-number summary plus the mean) for the variables of interest in the Daily Activity data set. Daily steps, calories, and active minutes are summarized.

activity_daily %>%
  select(steps, calories, very_active_minutes, moderate_active_minutes, light_active_minutes, sed_active_minutes) %>%
  summary()
##      steps          calories    very_active_minutes moderate_active_minutes
##  Min.   :    0   Min.   :   0   Min.   :  0.00      Min.   :  0.00         
##  1st Qu.: 3790   1st Qu.:1828   1st Qu.:  0.00      1st Qu.:  0.00         
##  Median : 7406   Median :2134   Median :  4.00      Median :  6.00         
##  Mean   : 7638   Mean   :2304   Mean   : 21.16      Mean   : 13.56         
##  3rd Qu.:10727   3rd Qu.:2793   3rd Qu.: 32.00      3rd Qu.: 19.00         
##  Max.   :36019   Max.   :4900   Max.   :210.00      Max.   :143.00         
##  light_active_minutes sed_active_minutes
##  Min.   :  0.0        Min.   :   0.0    
##  1st Qu.:127.0        1st Qu.: 729.8    
##  Median :199.0        Median :1057.5    
##  Mean   :192.8        Mean   : 991.2    
##  3rd Qu.:264.0        3rd Qu.:1229.5    
##  Max.   :518.0        Max.   :1440.0

Next, the same summary for the daily sleep data.

sleep_day %>%
  select(sleep_minutes, time_in_bed) %>%
  summary()
##  sleep_minutes    time_in_bed   
##  Min.   : 58.0   Min.   : 61.0  
##  1st Qu.:361.0   1st Qu.:403.0  
##  Median :433.0   Median :463.0  
##  Mean   :419.5   Mean   :458.6  
##  3rd Qu.:490.0   3rd Qu.:526.0  
##  Max.   :796.0   Max.   :961.0

Finally, the same summary for calories and steps by the hour.

calories_step_hourly %>%
  select(steps, calories) %>%
  summary()
##      steps            calories     
##  Min.   :    0.0   Min.   : 42.00  
##  1st Qu.:    0.0   1st Qu.: 63.00  
##  Median :   40.0   Median : 83.00  
##  Mean   :  320.2   Mean   : 97.39  
##  3rd Qu.:  357.0   3rd Qu.:108.00  
##  Max.   :10554.0   Max.   :948.00

Analysis Summary

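  • Daily Steps and Calories seem to be roughly normally distributed: the mean and median are close (7,638 vs. 7,406 steps; 2,304 vs. 2,134 calories).
  • Very Active and Moderately Active minutes are skewed to the right: the means (21.16 and 13.56) are well above the medians (4 and 6).
  • Lightly Active and Sedentary minutes seem to be roughly normally distributed, with means slightly below the medians.
  • Minutes Asleep and Time in Bed seem to be roughly normally distributed.
  • Calories per Hour and Steps per Hour are skewed to the right: mean hourly steps (320.2) far exceed the median (40).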
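Comparing the mean with the median only hints at skew. A sample skewness coefficient gives a number to go with each statement above. The sketch below is a hand-written moment-based estimator (not a function from the report or from a package), demonstrated on made-up vectors; values near zero suggest symmetry, positive values a right skew, and negative values a left skew.

```r
# Moment-based sample skewness: third central moment divided by the
# cube of the (population) standard deviation.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

# Made-up example vectors, not values from the Fitbit data.
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(0, 0, 0, 4, 32, 210)

skew_symmetric <- skewness(symmetric)
skew_right     <- skewness(right_skewed)
```

Applied to the real columns, for example `skewness(activity_daily$very_active_minutes)`, this would give a quick check of each bullet. A histogram or Q-Q plot would still be needed before treating any variable as normally distributed.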

Share

The following scatter plots show the relationship between sleep and daily activity.

ggplot(activity_sleep_daily_long, aes(x = sleep_minutes, y = value, color = steps)) +
  geom_point(size = 1) +
  labs(x = "Minutes Asleep", y = "Activity Minutes", color = "Steps") +
  facet_grid(~activity_level) +
  scale_color_gradient(low = "green", high = "red") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        strip.text = element_text(size = 7))

ggplot(activity_sleep_daily_long, aes(x = sleep_minutes, y = value, color = calories)) +
  geom_point(size = 1) +
  labs(x = "Minutes Asleep", y = "Activity Minutes", color = "Calories") +
  facet_grid(~activity_level) +
  scale_color_gradient(low = "yellow", high = "blue") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        strip.text = element_text(size = 7))

It is beneficial to view the relationship between daily steps taken and the daily calories.

# Calculate the correlation coefficient.
correlation_coefficient <- cor(activity_sleep_daily$steps, activity_sleep_daily$calories)

# Scatter plot with regression line and value of r.
ggplot(activity_sleep_daily, aes(x = steps, y = calories)) +
  geom_point(size = 1) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
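  # annotate() draws the label once; geom_text() with fixed x/y would redraw it for every row.
  annotate("text",
           x = min(activity_sleep_daily$steps) + 0.1 * diff(range(activity_sleep_daily$steps)),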
            y = min(activity_sleep_daily$calories) + 0.9 * diff(range(activity_sleep_daily$calories)),
            label = paste("Correlation:", round(correlation_coefficient, 2)),
            color = "red",
            hjust = -2.7,
            vjust = 8) +
  labs(x = "Daily Steps", y = "Daily Calories") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), strip.text = element_text(size = 7))
## `geom_smooth()` using formula = 'y ~ x'
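The value returned by `cor()` measures the strength of the linear relationship but says nothing about statistical significance. Base R's `cor.test()` reports a p-value and confidence interval for the same coefficient. The sketch below runs it on synthetic data, since the real merged table is not reproduced here; the slope, intercept, and noise term are invented for illustration only.

```r
# Synthetic daily data: calories rise with steps, plus deterministic "noise".
steps    <- seq(0, 20000, length.out = 50)
calories <- 1800 + 0.08 * steps + 200 * sin(seq_along(steps))

# Pearson correlation test: H0 is that the true correlation is zero.
result <- cor.test(steps, calories, method = "pearson")

r_value     <- unname(result$estimate)  # sample correlation coefficient
p_value     <- result$p.value           # p-value for H0: rho = 0
conf_int    <- result$conf.int          # 95% confidence interval for rho
significant <- p_value < 0.05
```

With the real data the call would be `cor.test(activity_sleep_daily$steps, activity_sleep_daily$calories)`. Repeated measurements from the same participant are not independent, so the resulting p-value should be read with caution.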

The following scatter plot shows how hourly steps and calories fluctuate throughout the day.

ggplot(calories_step_hourly, aes(x = hour, y = steps, color = calories)) +
  geom_point(size = 1) +
  labs(x = "Time of Day", y = "Step Count", color = "Calories") +
  scale_color_gradient(low = "orange", high = "purple") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
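Because the scatter plot overlays every participant-hour, the daily pattern can be hard to read. Averaging steps by hour is one way to summarize it. The sketch below uses base R's `aggregate()` on a small synthetic table; the `hour` column is a plain number here, which is an assumption, since the report's `hour` column is stored as a time value.

```r
# Synthetic hourly data for two participants at four times of day.
hourly <- data.frame(
  participant = rep(c("1", "2"), each = 4),
  hour        = rep(c(0, 8, 12, 20), times = 2),
  steps       = c(0, 250, 900, 300, 10, 400, 1100, 200)
)

# Mean step count for each hour across all participants and days.
mean_steps_by_hour <- aggregate(steps ~ hour, data = hourly, FUN = mean)

# Hour of day with the highest average step count.
peak_hour <- mean_steps_by_hour$hour[which.max(mean_steps_by_hour$steps)]
```

The resulting table has one row per hour and could be drawn as a bar or line chart in place of, or alongside, the scatter plot.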

Key Findings

  • For the majority of users, logged activity falls in the Sedentary category, with Light activity following in frequency. Moderate and Very Active activity are minimal. These values were compared to the amount of sleep the individual experienced the night before. Based on the information present, the amount of sleep appears to have little effect on the level of activity.

    • It is important to note that the following information is needed to draw a firm conclusion:
      • How frequently the health trackers are worn. Individuals may remove their tracker during exercise to limit damage to the device, which could cause the findings to understate Very Active and Moderate activity.
  • There is a correlation between Daily Steps and Daily Calories (the value of r is shown on the scatter plot). Statistical significance was not formally tested.

  • Step Count vs. Time of Day has a left skew, which is not surprising, as the majority of daily steps are expected to occur between 8:00 and 20:00.
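The question of how often the trackers are worn can be explored, roughly, from the daily activity table itself. The sketch below sums the four activity-minute columns per day and flags days with zero steps and 1,440 sedentary minutes. Treating such days as possible non-wear is an assumption of this sketch, not something the data documents. The three rows are taken from the `glimpse(activity_daily)` output shown earlier.

```r
# Three daily rows copied from the glimpse() output above.
activity <- data.frame(
  steps                   = c(9117, 2971, 0),
  very_active_minutes     = c(16, 0, 0),
  moderate_active_minutes = c(16, 0, 0),
  light_active_minutes    = c(236, 107, 0),
  sed_active_minutes      = c(728, 890, 1440)
)

minute_cols <- c("very_active_minutes", "moderate_active_minutes",
                 "light_active_minutes", "sed_active_minutes")

# Total minutes accounted for by the four activity levels (1440 = full day).
activity$logged_minutes <- rowSums(activity[, minute_cols])

# Assumption: zero steps with a fully sedentary day may mean the device was not worn.
activity$possible_non_wear <- activity$steps == 0 &
  activity$sed_active_minutes == 1440

share_possible_non_wear <- mean(activity$possible_non_wear)
```

Days with far fewer than 1,440 logged minutes may include sleep time or periods when the device was removed, so `logged_minutes` is only a proxy for wear time.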

Act

Conclusion

Bellabeat intends to become a strong competitor in the wearable health tracker space. To do this, Bellabeat should focus on areas where Fitbit falls short. It seems that Fitbit has issues with wearability. Factors that reduce wearability include:

  • Needing to be charged for long periods of time
  • Lack of style in the device design
  • Sensitivity of the device to damage

Bellabeat should design a wearable fitness tracker that offers:

  • Minimal charge time
  • A design that can be considered a wardrobe staple
  • High durability

A reduction in charge time will allow the device to be worn more consistently throughout the day. A design that can be worn with any outfit will allow users to wear the device on any occasion without causing distraction. A highly durable device can be worn through all activities, ranging from sedentary to highly active.