Introduction

Bellabeat is a high-tech manufacturer of health-focused products for women. Collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Although Bellabeat is a successful small company, they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company.

Background

Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women. The company has five focus products: the Bellabeat app, Leaf, Time, Spring, and the Bellabeat membership. Our team has been asked to analyze smart device data to gain insight into how consumers use their smart devices. These insights will then help guide the company's marketing strategy.

Major Objective

To identify potential growth opportunities and recommend improvements to Bellabeat's marketing strategy based on trends in smart device usage.

Key Stakeholders:

Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat’s cofounder

Questions to guide the analysis:

Prepare

The data used in this case study can be found here: https://www.kaggle.com/datasets/arashnic/fitbit. The data was downloaded and loaded into RStudio. This Kaggle dataset contains personal fitness tracker data from thirty Fitbit users who consented to the submission of their data, including minute-level output for physical activity, heart rate, and sleep monitoring. It includes information about daily activity, steps, and heart rate that can be used to explore users’ habits. The dataset contains 18 CSV files organized in long format.

The dataset is assessed against the ROCCC criteria below:

Reliability - LOW: The data comes from only 30 Fitbit users, a small sample, who consented to the submission of personal tracker data, including minute-level output for physical activity, heart rate, and sleep monitoring.

Original - LOW: Third-party data collected using Amazon Mechanical Turk.

Comprehensive - MED: The dataset contains multiple fields on daily activity intensity, calories used, daily steps taken, daily sleep time, and weight records.

Current - LOW: This data is from March 2016 through May 2016. The data is not current, meaning that user habits may have changed over the years.

Cited - LOW: The data was collected by a third party, so its original source is unknown.

Install & load packages in R

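# Install tidyverse only if it is missing, then load it
if (!requireNamespace("tidyverse", quietly = TRUE)) install.packages("tidyverse")
library(tidyverse)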

Process

Importing the datasets

activity <- read.csv("C:/Users/HP/OneDrive/Desktop/Capstone/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/dailyActivity_merged.csv")
calories <- read.csv("C:/Users/HP/OneDrive/Desktop/Capstone/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlyCalories_merged.csv")
intensity <- read.csv("C:/Users/HP/OneDrive/Desktop/Capstone/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/hourlyIntensities_merged.csv")
sleep <- read.csv("C:/Users/HP/OneDrive/Desktop/Capstone/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/sleepDay_merged.csv")
weight <- read.csv("C:/Users/HP/OneDrive/Desktop/Capstone/mturkfitbit_export_4.12.16-5.12.16/Fitabase Data 4.12.16-5.12.16/weightLogInfo_merged.csv")
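
The five imports above repeat the same long folder path. As a minimal sketch, the path can be built once from a base directory with `file.path()`. The helper name `read_fitbit` and the `_merged.csv` suffix handling are assumptions made for illustration, not part of the original analysis, and the demonstration uses a throwaway file with made-up values rather than the real dataset.

```r
# Hypothetical helper: build the file path from one base directory and a short file stem
read_fitbit <- function(data_dir, stem) {
  path <- file.path(data_dir, paste0(stem, "_merged.csv"))
  if (!file.exists(path)) stop("File not found: ", path)
  read.csv(path)
}

# Self-contained demonstration with a temporary file (made-up values)
demo_dir <- tempfile("fitbit_demo")
dir.create(demo_dir)
write.csv(
  data.frame(Id = c(1, 2), TotalSteps = c(100, 200)),
  file.path(demo_dir, "dailyActivity_merged.csv"),
  row.names = FALSE
)
demo_activity <- read_fitbit(demo_dir, "dailyActivity")
```

With the real data, `data_dir` would point at the folder that holds the Fitabase CSV files, and each dataset would be read with a call such as `read_fitbit(data_dir, "sleepDay")`.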

Cleaning The data

Date Formatting

# intensity: parse the hourly timestamp, then split it into time and date strings

intensity$ActivityHour <- as.POSIXct(intensity$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
intensity$time <- format(intensity$ActivityHour, format = "%H:%M:%S")
intensity$date <- format(intensity$ActivityHour, format = "%m/%d/%y")

# calories: same hourly timestamp layout as intensity

calories$ActivityHour <- as.POSIXct(calories$ActivityHour, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
calories$time <- format(calories$ActivityHour, format = "%H:%M:%S")
calories$date <- format(calories$ActivityHour, format = "%m/%d/%y")

# activity: daily records, so only a date is present

activity$ActivityDate <- as.POSIXct(activity$ActivityDate, format = "%m/%d/%Y", tz = Sys.timezone())
activity$date <- format(activity$ActivityDate, format = "%m/%d/%y")

# sleep: one record per day; keep the date string for merging later

sleep$SleepDay <- as.POSIXct(sleep$SleepDay, format = "%m/%d/%Y %I:%M:%S %p", tz = Sys.timezone())
sleep$date <- format(sleep$SleepDay, format = "%m/%d/%y")
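
The parse-and-format steps above follow the same pattern for each dataset, so they could be wrapped in one function. The sketch below is an illustration under stated assumptions: the helper name `add_date_time` is hypothetical, and it fixes the time zone to "UTC" for reproducibility instead of using `Sys.timezone()` as the original code does. It also warns when a value fails to parse, which would otherwise silently become `NA`.

```r
# Hypothetical helper: parse a timestamp column and add `time` and `date` string columns
add_date_time <- function(df, col, fmt = "%m/%d/%Y %I:%M:%S %p", tz = "UTC") {
  parsed <- as.POSIXct(df[[col]], format = fmt, tz = tz)
  # Values that do not match `fmt` become NA, so flag them
  if (anyNA(parsed)) warning("Some values in '", col, "' did not match the format")
  df[[col]] <- parsed
  df$time <- format(parsed, format = "%H:%M:%S")
  df$date <- format(parsed, format = "%m/%d/%y")
  df
}

# Made-up rows that mimic the hourly timestamp layout
demo <- data.frame(
  Id = c(1, 1),
  ActivityHour = c("4/12/2016 1:00:00 PM", "4/13/2016 12:00:00 AM")
)
demo <- add_date_time(demo, "ActivityHour")
```

The `%p` (AM/PM) specifier depends on the session locale, so the parsed result should be checked for `NA` values on systems that do not use an English locale.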

Analyze

To begin the analysis phase, we will first see how many participants there are in each dataset.

# Finding number of participants in each category
n_distinct(activity$Id)  
n_distinct(calories$Id)   
n_distinct(intensity$Id)
n_distinct(sleep$Id)
n_distinct(weight$Id)

There are 33 participants in the activity, calories, and intensities datasets, 24 in the sleep dataset, and only 8 in the weight dataset. The fact that there are only 8 participants in the weight dataset means that more data would be needed to make a strong recommendation or conclusion.
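
The participant counts can also be collected in one named vector, which makes the datasets easier to compare. This is a sketch on made-up data frames; it uses base R's `length(unique(...))`, which gives the same count as `n_distinct()` for a plain vector, so the block runs without extra packages.

```r
# Made-up data frames standing in for the real datasets
datasets <- list(
  activity = data.frame(Id = c(1, 1, 2, 3)),
  sleep    = data.frame(Id = c(1, 2, 2)),
  weight   = data.frame(Id = c(3))
)

# Count distinct participant Ids in each dataset
participants <- sapply(datasets, function(df) length(unique(df$Id)))
print(participants)
```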

Summary of the datasets:

# activity
activity %>%  
 select(TotalSteps,
        TotalDistance,
        SedentaryMinutes, Calories) %>%
 summary()

# active minutes per category
activity %>%
 select(VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes) %>%
 summary()

# calories
calories %>%
 select(Calories) %>%
 summary()
 
# sleep
sleep %>%
 select(TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed) %>%
 summary()
 
# weight
weight %>%
 select(WeightKg, BMI) %>%
 summary()
 

Observations made from the above summaries:

Merging data

Before beginning to visualize the data, we need to merge the sleep and activity datasets.

merged_data <- merge(sleep, activity, by=c('Id', 'date'))
head(merged_data)
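
By default, `merge()` performs an inner join, so only rows whose `Id` and `date` appear in both datasets are kept. The sketch below uses made-up rows to show the difference between the default and `all.x = TRUE`, which keeps every sleep record; the `_demo` data frames are illustrative and not taken from the Fitbit data.

```r
# Made-up rows: participant 2 has sleep and activity on different dates,
# and participant 3 has activity only
sleep_demo <- data.frame(
  Id = c(1, 1, 2),
  date = c("04/12/16", "04/13/16", "04/12/16"),
  TotalMinutesAsleep = c(400, 380, 420)
)
activity_demo <- data.frame(
  Id = c(1, 1, 2, 3),
  date = c("04/12/16", "04/13/16", "04/13/16", "04/12/16"),
  SedentaryMinutes = c(700, 650, 800, 900)
)

# Default inner join: unmatched rows are dropped from both sides
inner <- merge(sleep_demo, activity_demo, by = c("Id", "date"))

# Left join: every sleep row is kept; missing activity values become NA
left <- merge(sleep_demo, activity_demo, by = c("Id", "date"), all.x = TRUE)

# Compare row counts to see how many sleep records the inner join dropped
c(sleep = nrow(sleep_demo), inner = nrow(inner), left = nrow(left))
```

Comparing `nrow(sleep)` with `nrow(merged_data)` in the same way would show how many sleep records, if any, are lost in the real merge.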

Visualization

ggplot(data=activity, aes(x=TotalSteps, y=Calories)) + 
  geom_point() + geom_smooth() + labs(title="Total Steps vs. Calories")

ggplot(data=sleep, aes(x=TotalMinutesAsleep, y=TotalTimeInBed)) + 
  geom_point()+ labs(title="Total Minutes Asleep vs. Total Time in Bed")

ggplot(data=merged_data, aes(x=TotalMinutesAsleep, y=SedentaryMinutes)) + 
  geom_point(color='darkblue') + geom_smooth() +
  labs(title="Minutes Asleep vs. Sedentary Minutes")
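
A scatter plot shows the shape of a relationship, and a correlation coefficient can put a number on its strength. As a sketch, and not part of the original analysis, the Pearson correlation can be computed with `cor()`. The values below are made up for illustration and are not taken from the Fitbit dataset.

```r
# Made-up values for illustration only
demo <- data.frame(
  TotalSteps = c(2000, 5000, 8000, 12000, NA),
  Calories   = c(1600, 1900, 2300, 2600, 2100)
)

# use = "complete.obs" drops rows where either value is missing
steps_calories_r <- cor(demo$TotalSteps, demo$Calories, use = "complete.obs")
print(steps_calories_r)
```

On the real data, the equivalent call would be `cor(activity$TotalSteps, activity$Calories, use = "complete.obs")`. A value close to 1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.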
  

Plots:

Fig1
Fig2
Fig3

Recommendations

As we already know, collecting data on activity, sleep, stress, and reproductive health has allowed Bellabeat to empower women with knowledge about their own health and habits. Since it was founded in 2013, Bellabeat has grown rapidly and quickly positioned itself as a tech-driven wellness company for women.

After analyzing the Fitbit Fitness Tracker data, we found several insights that could help shape Bellabeat's marketing strategy.