Background

As per their website: “Bellabeat is the go-to wellness brand for women with an ecosystem of products and services focused on women’s health” (Bellabeat (n.d.) https://bellabeat.com/about/). Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company.

Products:

  • Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.

  • Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.

  • Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.

  • Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.

Business task

Use data about trends in smart device usage to develop effective marketing strategies for Bellabeat products.

Data used

I worked with the following data set: FitBit Fitness Tracker Data, https://www.kaggle.com/arashnic/fitbit (CC0: Public Domain, dataset made available through Mobius).

It contains freely available data from 33 Fitbit users, recorded from 4/12/2016 to 5/12/2016. From this dataset, I used the following tables:

  • Daily Activity. It includes the following columns: Id, Date, Steps, Distance, Tracker Distance, Logged Activities, Very Active Distance, Moderately Active Distance, Light Active Distance, Sedentary Active Distance, Very Active Minutes, Fairly Active Minutes, Lightly Active Minutes, Sedentary Minutes, Calories.

  • Weight. It includes the following columns: Id, Date, Weight in Kg, Weight in Pounds, Fat, BMI, Is Manual (boolean) and Log ID.

  • Sleep. It includes the following columns: Id, SleepDay, Total Sleep Records, Total Minutes Asleep and Total Time in Bed.

Setting up work space

install.packages("tidyverse", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/silvi/OneDrive/Documentos/R/win-library/4.1'
## (as 'lib' is unspecified)
## package 'tidyverse' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\silvi\AppData\Local\Temp\RtmpOOOj6k\downloaded_packages
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.2 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.1.1     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Loading datasets

activity <- read_csv("dailyActivity_merged.csv")
## Rows: 940 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (1): ActivityDate
## dbl (14): Id, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDi...
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
sleep <- read_csv("sleepDay_merged.csv")
## Rows: 413 Columns: 5
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): SleepDay
## dbl (4): Id, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
weight <- read_csv("weightLogInfo_merged.csv")
## Rows: 67 Columns: 8
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): Date
## dbl (6): Id, WeightKg, WeightPounds, Fat, BMI, LogId
## lgl (1): IsManualReport
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

Cleaning Process

While previewing the files in a spreadsheet, I noticed that all data sets have the columns Id and Date in common, which could allow for merging them. However, the date columns didn’t have a consistent format. The following code was used to standardize the date formats:

# Parse each date column as POSIXct, then derive a shared "date" key (mm/dd/yyyy) for merging
activity$ActivityDate <- as.POSIXct(activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
activity$date <- format(activity$ActivityDate, format = "%m/%d/%Y")
# Sleep and weight timestamps also carry a 12-hour clock time with AM/PM
sleep$SleepDay <- as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%Y")
weight$Date <- as.POSIXct(weight$Date, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
weight$date <- format(weight$Date, format = "%m/%d/%Y")
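
As a quick sanity check on the format strings above, the sketch below parses one sample value of each shape and confirms that both collapse to the same date key. The sample strings and the fixed "UTC" time zone are assumptions for illustration only. Note that %p matches the AM/PM text of the current locale, so on a non-English locale the second parse may return NA.

```r
# Hypothetical sample values shaped like the ActivityDate and SleepDay columns
activity_raw <- "4/12/2016"
sleep_raw <- "4/12/2016 12:00:00 AM"

# Same format strings as in the cleaning step; tz fixed to UTC for reproducibility
activity_parsed <- as.POSIXct(activity_raw, format = "%m/%d/%Y", tz = "UTC")
sleep_parsed <- as.POSIXct(sleep_raw, format = "%m/%d/%Y %I:%M:%S %p", tz = "UTC")

# Derive the merge key the same way the cleaning step does
activity_key <- format(activity_parsed, format = "%m/%d/%Y")
sleep_key <- format(sleep_parsed, format = "%m/%d/%Y")

# A failed parse yields NA, so count NAs before relying on the key
parse_failures <- sum(is.na(c(activity_parsed, sleep_parsed)))
print(c(activity_key, sleep_key))
print(parse_failures)
```

On the real tables, the same check would be sum(is.na(sleep$SleepDay)) and sum(is.na(weight$Date)) right after the conversion.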

Exploring the data

Let’s double-check the number of participants in each table:

n_distinct(activity$Id)
## [1] 33
n_distinct(sleep$Id)
## [1] 24
n_distinct(weight$Id)
## [1] 8

We can confirm here that 33 distinct users recorded activities, 24 used the sleep features and only 8 recorded weight. This tells us that the weight sample is too small to support any conclusions. However, we can still take a look at the weight data for those 8 users.
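
To put those counts in proportion, a minimal sketch can express each one as a share of the 33 users in the activity table. The counts are the ones reported above; the variable names are illustrative.

```r
# Distinct user counts reported by n_distinct() above
users <- c(activity = 33, sleep = 24, weight = 8)

# Share of each feature relative to all users with activity records, in percent
share <- round(100 * users / users[["activity"]], 1)
print(share)
```

Roughly three out of four users logged sleep, while only about one in four logged weight.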

Weight Data

Fig 1. Weight loss per user id

We can see that 5 out of the 8 users who recorded weight experienced a weight loss. This may be worth investigating further with a larger pool of subjects to see if that’s a trend.
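
The code behind Fig 1 is not part of this report. One way such a per-user weight change could be computed is sketched below, assuming the change is defined as the last recorded weight minus the first. The toy data frame uses made-up values; only the column names Id, Date and WeightKg come from the weight table.

```r
# Hypothetical weight log: values are made up for illustration
weight_toy <- data.frame(
  Id = c(1, 1, 1, 2, 2, 3),
  Date = as.Date(c("2016-04-12", "2016-04-20", "2016-05-12",
                   "2016-04-13", "2016-05-01", "2016-04-15")),
  WeightKg = c(70.0, 69.5, 68.8, 80.0, 80.6, 65.0)
)

# Order by user and date so that first and last are chronological
weight_toy <- weight_toy[order(weight_toy$Id, weight_toy$Date), ]

# Change = last recorded weight minus first recorded weight, per user
change <- sapply(split(weight_toy$WeightKg, weight_toy$Id),
                 function(w) w[length(w)] - w[1])

# Negative values indicate weight loss; users with a single record get 0
lost_weight <- names(change)[change < 0]
print(change)
print(lost_weight)
```

Users with a single record show no change by construction, which is another reason to treat a sample of 8 with caution.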

Calories

On the other hand, all users recorded calories burned using the activity features of their smart device. This chart shows the minimum, maximum and average calories burned daily.

Fig 2. Min, max and average of Calories burned daily
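
The summary behind a chart like Fig 2 could be produced along these lines. This is a sketch on made-up values, assuming the statistics are taken per user; only the column names Id and Calories come from the activity table.

```r
# Hypothetical daily calorie records for two users
activity_toy <- data.frame(
  Id = c(1, 1, 1, 2, 2, 2),
  Calories = c(1800, 2100, 1950, 2500, 2300, 2700)
)

# One row per user with the minimum, maximum and mean of daily calories
calorie_summary <- t(sapply(
  split(activity_toy$Calories, activity_toy$Id),
  function(x) c(min = min(x), max = max(x), mean = mean(x))
))
print(calorie_summary)
```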

Number of activities recorded per user

Fig 3. Total of activities recorded per user

This graph shows that only 1 user (3%) recorded fewer than 10 activities, 3 (9%) recorded between 10 and 20 activities, and the rest (88%) used their Fitbit every day or almost every day. That leads us to the question: what type of activities are being recorded?
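
A grouping like the one described above can be sketched with cut(). The per-user record counts below are made up; with the real table they could come from table(activity$Id). The last lines recompute the quoted percentages from the counts out of 33 users.

```r
# Hypothetical number of daily records per user (one value per user)
records_per_user <- c(4, 12, 18, 19, 26, 30, 31, 31, 31, 31)

# Buckets: fewer than 10, 10 to 20, more than 20 records
bucket <- cut(records_per_user,
              breaks = c(0, 9, 20, Inf),
              labels = c("<10", "10-20", ">20"))

counts <- table(bucket)
shares <- round(100 * counts / sum(counts))
print(counts)
print(shares)

# Percentages quoted in the text: 1, 3 and the remaining 29 of 33 users
reported <- round(100 * c(1, 3, 29) / 33)
print(reported)
```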

Activity per type

Fig 4. Light, Moderate, and Very active activities recorded per user

All users recorded light activity, which is also the category with the most minutes recorded per user. Very few minutes of moderate or intense activity were recorded. This suggests that users of this tracker are not very active athletes but people interested in recording and improving their regular light activity.
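
To compare the three intensity levels numerically, the minutes can be totalled per user and per category. The sketch below uses made-up values with the column names from the activity table; it is not the code behind Fig 4.

```r
# Hypothetical daily activity minutes for two users
activity_toy <- data.frame(
  Id = c(1, 1, 2, 2),
  VeryActiveMinutes = c(10, 0, 25, 5),
  FairlyActiveMinutes = c(15, 5, 20, 10),
  LightlyActiveMinutes = c(200, 180, 250, 230)
)
minute_cols <- c("VeryActiveMinutes", "FairlyActiveMinutes", "LightlyActiveMinutes")

# Total minutes per user for each intensity level
totals <- aggregate(activity_toy[minute_cols],
                    by = list(Id = activity_toy$Id), FUN = sum)

# Share of each intensity level in all active minutes, in percent
share <- round(100 * colSums(totals[minute_cols]) / sum(totals[minute_cols]), 1)
print(totals)
print(share)
```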

Sleep Records

Fig 5. Number of sleep records per user

Out of the 24 users who recorded sleep, about half registered at least one sleep record per day during the month.

Merging data

merge_1 <- merge(activity, sleep, by = c('Id', 'date'))
all_data <- merge(merge_1, weight, by = c('Id', 'date'))
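
By default, merge() performs an inner join, so rows without a match in both tables are dropped. As a result, merge_1 only keeps days that have both activity and sleep records, and all_data only keeps days that also have a weight record. The toy tables below (made-up values) illustrate the difference from a left join with all.x = TRUE.

```r
# Hypothetical tables sharing the Id and date keys
activity_toy <- data.frame(
  Id = c(1, 1, 2),
  date = c("04/12/2016", "04/13/2016", "04/12/2016"),
  TotalSteps = c(5000, 7000, 9000)
)
sleep_toy <- data.frame(
  Id = c(1, 2),
  date = c("04/12/2016", "04/12/2016"),
  TotalMinutesAsleep = c(400, 420)
)

# Inner join (default): only Id/date pairs present in both tables survive
inner <- merge(activity_toy, sleep_toy, by = c("Id", "date"))

# Left join: every activity row is kept, missing sleep values become NA
left <- merge(activity_toy, sleep_toy, by = c("Id", "date"), all.x = TRUE)

print(nrow(inner))
print(nrow(left))
```

Comparing nrow() before and after each merge is a cheap way to see how many days fall out of the analysis.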

Visualizing some data

ggplot(data = merge_1, aes(x = VeryActiveMinutes, y = TotalMinutesAsleep)) + geom_point() + geom_smooth() +
  labs(title = "Very Active Minutes vs Total Minutes Asleep")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

ggplot(data = merge_1, aes(x = TotalMinutesAsleep, y = SedentaryMinutes)) + geom_point() + geom_smooth() +
  labs(title = "Minutes Asleep vs Sedentary Minutes")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
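
A smoothed scatter plot shows the shape of a relationship but not its strength. One way to put a number on it is a Pearson correlation, sketched below on made-up values, so the result says nothing about the Fitbit data itself. On the real table, the same calls would take merge_1$TotalMinutesAsleep and merge_1$SedentaryMinutes.

```r
# Hypothetical merged records; values are made up for illustration
merged_toy <- data.frame(
  TotalMinutesAsleep = c(300, 360, 400, 420, 450, 480, 500),
  SedentaryMinutes = c(900, 850, 780, 760, 700, 690, 640)
)

# Pearson correlation coefficient: sign gives direction, magnitude gives strength
r <- cor(merged_toy$TotalMinutesAsleep, merged_toy$SedentaryMinutes)

# cor.test() adds a p-value and a confidence interval for the coefficient
result <- cor.test(merged_toy$TotalMinutesAsleep, merged_toy$SedentaryMinutes)

print(r)
print(result$p.value)
```

Pearson's coefficient only captures linear association, so a curved loess line with a coefficient near zero can still hide a pattern.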

Caveats

The data provided doesn’t indicate gender or gender identity. Since Bellabeat’s products are targeted exclusively at people with menstrual cycles, more research on that population is necessary.

Conclusions

  • Most users seem to use the device during their normal day, not for strenuous activities
  • Most users of this type of device are not typically active athletes, but people with light to moderate activities built into their daily lives
  • There’s no apparent correlation between very active minutes and total sleep time
  • There is, however, an inverse correlation between the number of sedentary minutes and the number of minutes asleep

Recommendations for the marketing campaign

  • The target audience should be people looking to make small changes to their routine
  • The focus of the campaign can be on how adding small increments of activity to your day can result in more quality sleep